# Courses Demo
This Jupyter notebook explores the data set courses20-21.json,
which consists of all Brandeis courses in the 20-21 academic year (Fall20, Spr21, Sum21)
that had at least one student enrolled.

First we need to read the JSON file into a list of Python dictionaries.

In [None]:
import json
import statistics

In [None]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)
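
To see what `json.load` produces without the data file at hand, here is a minimal sketch that parses a small made-up JSON string with `json.loads`. The two records and their values are invented for illustration; only the field names `subject`, `coursenum`, and `enrolled` are taken from the cells in this notebook.

```python
import json

# Hypothetical two-record sample; the real file has more fields per course.
sample_text = '''
[
  {"subject": "COSI", "coursenum": "10A", "enrolled": 120},
  {"subject": "MATH", "coursenum": "15A", "enrolled": 45}
]
'''

sample_courses = json.loads(sample_text)   # a JSON array becomes a Python list

print(type(sample_courses).__name__)       # list
print(type(sample_courses[0]).__name__)    # dict
print(sample_courses[0]["subject"])        # COSI
```

Each JSON object becomes a dictionary, so fields are read with the usual `course["field"]` syntax.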

## Structure of a course
Next we look at the fields of each course dictionary and their values.

In [None]:
print('there are',len(courses),'courses in the dataset')
print('here is the data for course 1246')
courses[1246]

## Cleaning the data
If we want to collect instructors or codes into sets (or use them as dictionary keys), we need to replace the lists with tuples. Tuples behave like immutable lists, and unlike lists they are hashable.

In [None]:
for course in courses:
    course['instructor'] = tuple(course['instructor'])
    course['coinstructors'] = tuple(tuple(f) for f in course['coinstructors'])
    course['code'] = tuple(course['code'])

In [None]:
print('notice that the instructor and code are tuples now')
courses[1246]
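
The following sketch illustrates why the conversion matters when we later build sets of instructors. The name used here is made up, and the actual layout of the `instructor` field may differ; the point is only that a list cannot be a set element while a tuple can.

```python
# Hypothetical instructor value; the real field layout may differ.
as_list = ["Ada", "Lovelace"]
as_tuple = tuple(as_list)

try:
    {as_list}                  # lists are mutable, hence unhashable
    list_in_set = True
except TypeError:
    list_in_set = False

# Equal tuples hash the same, so a set keeps only one copy.
unique = {as_tuple, tuple(["Ada", "Lovelace"])}

print(list_in_set)   # False
print(len(unique))   # 1
```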

## Exploring the data set
Now we will show how to use plain Python to explore the data set and answer some interesting questions. Next week we will start learning Pandas and NumPy, which are packages that make it easier to explore large datasets efficiently.

Here are some questions we can try to answer:
* what are all of the subjects of courses (e.g. COSI, MATH, JAPN, PHIL, ...)?
* which terms are represented?
* how many instructors taught at Brandeis in 20-21?
* what were the five largest course sections?
* what were the five largest courses (where we combine sections)?
* which are the five largest subjects measured by number of courses offered?
* which are the five largest courses measured by number of students taught?
* which course had the most sections taught in 20-21?
* who are the top five faculty in terms of number of students taught?
* etc.
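
As one possible way to approach a question such as "what were the five largest course sections?", here is a small sketch using `sorted` with a key function. The sample records are invented; only the field names `subject`, `coursenum`, and `enrolled` come from this notebook.

```python
# Hypothetical sections; enrollment numbers are made up for illustration.
sample = [
    {"subject": "COSI", "coursenum": "10A", "enrolled": 120},
    {"subject": "MATH", "coursenum": "15A", "enrolled": 45},
    {"subject": "PHIL", "coursenum": "1A", "enrolled": 20},
    {"subject": "COSI", "coursenum": "12B", "enrolled": 180},
    {"subject": "JAPN", "coursenum": "10A", "enrolled": 15},
    {"subject": "MATH", "coursenum": "10A", "enrolled": 60},
    {"subject": "COSI", "coursenum": "21A", "enrolled": 95},
]

# Sort by enrollment, largest first, and keep the first five.
largest = sorted(sample, key=lambda c: c["enrolled"], reverse=True)[:5]

for c in largest:
    print(c["subject"], c["coursenum"], c["enrolled"])
```

On the real data the same expression would be applied to `courses` instead of `sample`.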

In [None]:
#5a Jason
len({c['instructor'] for c in courses if c['subject'] == 'COSI'})

In [None]:
#5b Tiancheng
sum([c['enrolled'] for c in courses if c['subject'] == 'COSI'])
# Note: a student taking several COSI classes is counted once per class; the data gives no way to tell them apart.

In [None]:
#5c Tiancheng
statistics.median([c['enrolled'] for c in courses if (c['enrolled'] >= 10 and c['subject'] == 'COSI')])

In [None]:
#5d Iria
subjects = {c['subject'] for c in courses}     
output = [(sum([c['enrolled'] for c in courses if c['subject'] == subject]),subject) for subject in subjects]
output.sort(key = lambda pair: -pair[0])
print(output[:10])
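
The cell above rescans the whole course list once per subject. As an alternative sketch (not the approach used in this notebook), `collections.Counter` can accumulate the totals in a single pass. The sample data is invented, and note that `most_common` returns `(subject, total)` pairs, the reverse of the `(total, subject)` order used above.

```python
from collections import Counter

# Hypothetical sections; enrollment numbers are made up for illustration.
sample = [
    {"subject": "COSI", "enrolled": 120},
    {"subject": "MATH", "enrolled": 45},
    {"subject": "COSI", "enrolled": 30},
    {"subject": "PHIL", "enrolled": 20},
    {"subject": "MATH", "enrolled": 60},
]

totals = Counter()
for c in sample:
    totals[c["subject"]] += c["enrolled"]   # add this section to its subject

top = totals.most_common(2)                 # the two largest subjects
print(top)   # [('COSI', 150), ('MATH', 105)]
```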

In [None]:
#5e Iria
output = [(len([c for c in courses if c['subject'] == subject]),subject) for subject in subjects]
output.sort(key = lambda pair: -pair[0])
print(output[:10])

In [None]:
#5f Iria
output = [(len({c['instructor'] for c in courses if c['subject'] == subject}),subject) for subject in subjects]
output.sort(key = lambda pair: -pair[0])
print(output[:10])

In [None]:
#5g Iria
profs = {c['instructor'] for c in courses} 
output = [(sum([c['enrolled'] for c in courses if c['instructor'] == prof]),prof) for prof in profs]
output.sort(key = lambda pair: -pair[0])
print([prof for (num, prof) in output[:20]])

In [None]:
#5h Iria
courseIDs = {c['subject'] + c['coursenum'] for c in courses}
output = [(sum([c['enrolled'] for c in courses if c['subject'] + c['coursenum'] == courseID]),courseID) 
          for courseID in courseIDs]
output.sort(key = lambda pair: -pair[0])
print([course for (num, course) in output[:20]])

In [None]:
#5i, Jason's interesting question: How many courses were taught by professor Hickey?
len([c for c in courses if 'Hickey' in c['instructor']])

In [None]:
#5i, Tiancheng's interesting question: What COSI courses do not require students to be in person?
print({'COSI: ' + c['coursenum'] for c in courses 
       if (c['subject'] == 'COSI' and ('remote' in c['details'] or 'hybrid' in c['details']))})

In [None]:
#5i, Iria's interesting question: How many courses meet twice a week?
len([c for c in courses if c['times'] != [] and len(c['times'][0]['days']) == 2])