# Courses Demo
This Jupyter notebook explores the data set courses20-21.json,
which contains every Brandeis course in the 20-21 academic year (Fall20, Spr21, Sum21)
that had at least one student enrolled.

First we need to read the JSON file into a list of Python dictionaries.

In [1]:
import json
import statistics

In [2]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)

## Structure of a course
Next we look at the fields of each course dictionary and their values.

In [3]:
print('there are',len(courses),'courses in the dataset')
print('here is the data for course 1246')
courses[1246]

there are 7813 courses in the dataset
here is the data for course 1246


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ['An', 'Huang', 'anhuang@brandeis.edu'],
 'coinstructors': [],
 'code': ['MATH', '223A'],
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}
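
The `start` and `end` values in `times` look like minutes after midnight (1080 would be 18:00 and 1170 would be 19:30). That reading is an inference from this one record, not something the data set documents. Under that assumption, a small hypothetical helper named `format_minutes` makes meeting times easier to read:

```python
def format_minutes(minutes):
    """Convert minutes after midnight to an HH:MM string (assumed encoding)."""
    hours, mins = divmod(minutes, 60)
    return f"{hours:02d}:{mins:02d}"

# Meeting time copied from course 1246 above
meeting = {'start': 1080, 'end': 1170, 'days': ['w', 'm']}
print(format_minutes(meeting['start']), '-', format_minutes(meeting['end']))
# prints: 18:00 - 19:30
```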

## Cleaning the data
If we want to collect instructors or course codes in sets, or use them as dictionary keys, we need to replace the lists with tuples. Tuples are immutable sequences, which makes them hashable; lists are not.

In [4]:
for course in courses:
    course['instructor'] = tuple(course['instructor'])
    course['coinstructors'] = tuple(tuple(f) for f in course['coinstructors'])
    course['code'] = tuple(course['code'])
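
One concrete reason the conversion helps: unlike lists, tuples can be stored in a set, which the set comprehensions later in this notebook rely on. The short sketch below illustrates the difference; the variable names are made up for this example.

```python
code_as_list = ['MATH', '223A']
code_as_tuple = tuple(code_as_list)

try:
    {code_as_list}            # lists are mutable, so they cannot be hashed
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# Tuples hash by value, so the duplicate ('MATH', '223A') collapses to one entry
unique_codes = {code_as_tuple, ('MATH', '223A'), ('COSI', '10A')}

print(list_is_hashable)   # False
print(len(unique_codes))  # 2
```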

In [5]:
print('notice that the instructor and code are tuples now')
courses[1246]

notice that the instructor and code are tuples now


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ('An', 'Huang', 'anhuang@brandeis.edu'),
 'coinstructors': (),
 'code': ('MATH', '223A'),
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}

# Exploring the data set
Now we will show how to use plain Python to explore the data set and answer some interesting questions. Next week we will start learning Pandas and NumPy, packages that make it easier to explore large data sets efficiently.

Here are some questions we can try to answer:
* what are all of the subjects of courses (e.g. COSI, MATH, JAPN, PHIL, ...)?
* which terms are represented?
* how many instructors taught at Brandeis last year?
* what were the five largest course sections?
* what were the five largest courses (where we combine sections)?
* which are the five largest subjects measured by number of courses offered?
* which are the five largest courses measured by number of students taught?
* which course had the most sections taught in 20-21?
* who are the top five faculty in terms of number of students taught?
* etc.
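
As a sketch of how a couple of these questions can be approached in plain Python, the example below uses a tiny made-up list named `sample`. The records and their values are hypothetical, not taken from the data set, and carry only the fields needed here.

```python
# Hypothetical records for illustration only
sample = [
    {'subject': 'COSI', 'coursenum': '10A', 'section': '1', 'term': '1203', 'enrolled': 120},
    {'subject': 'COSI', 'coursenum': '10A', 'section': '2', 'term': '1211', 'enrolled': 95},
    {'subject': 'MATH', 'coursenum': '15A', 'section': '1', 'term': '1203', 'enrolled': 60},
    {'subject': 'PHIL', 'coursenum': '1A', 'section': '1', 'term': '1211', 'enrolled': 35},
]

# Which terms are represented? A set removes duplicates; sorted() gives a stable order.
terms = sorted({c['term'] for c in sample})

# What are the largest sections? reverse=True puts the biggest enrollment first.
largest_sections = sorted(sample, key=lambda c: c['enrolled'], reverse=True)[:2]
largest = [(c['subject'], c['coursenum'], c['section'], c['enrolled'])
           for c in largest_sections]

print(terms)    # ['1203', '1211']
print(largest)  # [('COSI', '10A', '1', 120), ('COSI', '10A', '2', 95)]
```

Swapping `sample` for `courses` and `[:2]` for `[:5]` would apply the same idea to the full data set.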

In [6]:
#5a Jason
len({c['instructor'] for c in courses if c['subject'] == 'COSI'})

27

In [7]:
#5b Tiancheng
sum([c['enrolled'] for c in courses if c['subject'] == 'COSI'])
# what if one student is taking multiple CS classes? I guess there is no way to tell...

2223

In [8]:
#5c Tiancheng
statistics.median([c['enrolled'] for c in courses if (c['enrolled'] >= 10 and c['subject'] == 'COSI')])

37

In [9]:
#5d Iria
# total enrollment per subject, largest first
subjects = {c['subject'] for c in courses}
output = [(sum(c['enrolled'] for c in courses if c['subject'] == subject), subject) for subject in subjects]
output.sort(key=lambda pair: -pair[0])
print(output[:10])

[(5318, 'HS'), (3085, 'BIOL'), (2766, 'BUS'), (2734, 'HWL'), (2322, 'CHEM'), (2315, 'ECON'), (2223, 'COSI'), (1785, 'MATH'), (1704, 'PSYC'), (1144, 'ANTH')]
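
The list comprehension above rescans `courses` once for every subject. A possible alternative, sketched here on a small made-up list (`sample_courses` is hypothetical data), accumulates the totals in a single pass with `collections.Counter`:

```python
from collections import Counter

# Hypothetical records for illustration only
sample_courses = [
    {'subject': 'COSI', 'enrolled': 120},
    {'subject': 'MATH', 'enrolled': 60},
    {'subject': 'COSI', 'enrolled': 95},
    {'subject': 'PHIL', 'enrolled': 35},
]

# Add each section's enrollment to the running total for its subject
enrollment_by_subject = Counter()
for c in sample_courses:
    enrollment_by_subject[c['subject']] += c['enrolled']

# most_common(n) returns the n largest totals as (subject, total) pairs
top = enrollment_by_subject.most_common(2)
print(top)  # [('COSI', 215), ('MATH', 60)]
```

Note that the pairs come back as (subject, total), the reverse of the (total, subject) order used in the cell above.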


In [10]:
#5e Iria
output = [(len([c for c in courses if c['subject'] == subject]),subject) for subject in subjects]
output.sort(key = lambda pair: -pair[0])
print(output[:10])

[(613, 'BIOL'), (498, 'HIST'), (417, 'PSYC'), (403, 'NEUR'), (296, 'BCHM'), (288, 'PHYS'), (274, 'HS'), (272, 'COSI'), (266, 'MUS'), (265, 'ENG')]


In [11]:
#5f Iria
output = [(len({c['instructor'] for c in courses if c['subject'] == subject}),subject) for subject in subjects]
output.sort(key = lambda pair: -pair[0])
print(output[:10])

[(87, 'HS'), (67, 'BIOL'), (52, 'ECON'), (49, 'BCHM'), (47, 'BUS'), (47, 'HIST'), (46, 'BCBP'), (42, 'HWL'), (37, 'MATH'), (37, 'NEJS')]


In [12]:
#5g Iria
profs = {c['instructor'] for c in courses} 
output = [(sum([c['enrolled'] for c in courses if c['instructor'] == prof]),prof) for prof in profs]
output.sort(key = lambda pair: -pair[0])
print([prof for (num, prof) in output[:20]])

[('Leah', 'Berkenwald', 'leahb@brandeis.edu'), ('Kene Nathan', 'Piasta', 'kpiasta@brandeis.edu'), ('Stephanie', 'Murray', 'murray@brandeis.edu'), ('Milos', 'Dolnik', 'dolnik@brandeis.edu'), ('Maria', 'de Boef Miara', 'mmiara@brandeis.edu'), ('Bryan', 'Ingoglia', 'ingoglia@brandeis.edu'), ('Rachel V.E.', 'Woodruff', 'woodruff@brandeis.edu'), ('Timothy J', 'Hickey', 'tjhickey@brandeis.edu'), ('Daniel', 'Breen', 'dbreen91@brandeis.edu'), ('Melissa', 'Kosinski-Collins', 'kosinski@brandeis.edu'), ('Claudia', 'Novack', 'novack@brandeis.edu'), ('Antonella', 'DiLillo', 'dilant@brandeis.edu'), ('Jon', 'Chilingerian', 'chilinge@brandeis.edu'), ('Ahmad', 'Namini', 'anamini@brandeis.edu'), ('Iraklis', 'Tsekourakis', 'tsekourakis@brandeis.edu'), ('Geoffrey', 'Clarke', 'geoffclarke@brandeis.edu'), ('Peter', 'Mistark', 'pmistark@brandeis.edu'), ('Brenda', 'Anderson', 'banders@brandeis.edu'), ('Colleen', 'Hitchcock', 'hitchcock@brandeis.edu'), ('Scott A.', 'Redenius', 'redenius@brandeis.edu')]


In [13]:
#5h Iria
courseIDs = {c['subject'] + c['coursenum'] for c in courses}
output = [(sum([c['enrolled'] for c in courses if c['subject'] + c['coursenum'] == courseID]),courseID) 
          for courseID in courseIDs]
output.sort(key = lambda pair: -pair[0])
print([course for (num, course) in output[:20]])

['HWL1', 'HWL1-PRE', 'BIOL14A', 'COSI10A', 'PSYC10A', 'BIOL15B', 'MATH10A', 'BIOL18B', 'BIOL18A', 'CHEM29A', 'CHEM29B', 'CHEM25A', 'PSYC51A', 'CHEM25B', 'COSI12B', 'BUS6A', 'CHEM18A', 'ECON10A', 'MATH15A', 'COSI21A']


In [14]:
#5i, Jason's interesting question: How many courses were taught by Professor Hickey?
len([c for c in courses if 'Hickey' in c['instructor']])

21
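
A detail worth knowing about the cell above: `in` applied to a tuple tests whether some element equals the value exactly. It is not a substring search, so it would also match an instructor whose first name happened to be 'Hickey'. The sketch below contrasts the two kinds of test, using the instructor tuple that appears in the 5g output.

```python
instructor = ('Timothy J', 'Hickey', 'tjhickey@brandeis.edu')

# Tuple membership: True only if an element equals 'Hickey' exactly
exact_match = 'Hickey' in instructor

# Lowercase 'hickey' is not an element, even though the email contains it
lowercase_match = 'hickey' in instructor

# A case-insensitive substring search has to look inside each field
substring_match = any('hickey' in field.lower() for field in instructor)

print(exact_match, lowercase_match, substring_match)  # True False True
```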

In [15]:
#5i, Tiancheng's interesting question: What COSI courses do not require students to be in person?
print({'COSI: ' + c['coursenum'] for c in courses 
       if (c['subject'] == 'COSI' and ('remote' in c['details'] or 'hybrid' in c['details']))})

{'COSI: 143B', 'COSI: 166B', 'COSI: 131A', 'COSI: 102A', 'COSI: 164A', 'COSI: 21A', 'COSI: 217B', 'COSI: 123A', 'COSI: 149B', 'COSI: 101A', 'COSI: 29A', 'COSI: 190A', 'COSI: 12B', 'COSI: 121B', 'COSI: 114A', 'COSI: 118A', 'COSI: 130A', 'COSI: 233A', 'COSI: 140B', 'COSI: 134A', 'COSI: 114B', 'COSI: 138A', 'COSI: 10A', 'COSI: 159A', 'COSI: 165B', 'COSI: 119A', 'COSI: 137B', 'COSI: 132A', 'COSI: 175A', 'COSI: 139B'}


In [16]:
#5i, Iria's interesting question: How many courses meet twice a week?
len([c for c in courses if c['times'] != [] and len(c['times'][0]['days']) == 2])

1224
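
The cell above looks only at the first entry in `times` and at one specific count. As a possible extension, the sketch below tallies how many sections meet on each number of days, again using only the first `times` entry. The `sample_courses` list is made up for illustration, and the day codes other than 'm' and 'w' are assumptions about how the data set spells them.

```python
from collections import Counter

# Hypothetical records for illustration only
sample_courses = [
    {'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}]},
    {'times': [{'start': 600, 'end': 650, 'days': ['m', 'w', 'th']}]},
    {'times': []},   # no scheduled meeting, skipped below
    {'times': [{'start': 840, 'end': 920, 'days': ['tu', 'f']}]},
]

# Count sections by the number of days in their first meeting pattern
days_per_week = Counter(
    len(c['times'][0]['days']) for c in sample_courses if c['times']
)

print(days_per_week[2], days_per_week[3])  # 2 1
```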
