# Courses Demo
This Jupyter notebook is for exploring the data set courses20-21.json
which consists of all Brandeis courses in the 20-21 academic year (Fall20, Spr21, Sum21) 
which had at least 1 student enrolled.

First we need to read the json file into a list of Python dictionaries

In [1]:
import json

In [2]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)

## Structure of a course
Next we look at the fields of each course dictionary and their values

In [3]:
print('there are',len(courses),'courses in the dataset')
print('here is the data for course 1246')
courses[1246]

there are 7813 courses in the dataset
here is the data for course 1246


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ['An', 'Huang', 'anhuang@brandeis.edu'],
 'coinstructors': [],
 'code': ['MATH', '223A'],
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}

## Cleaning the data
If we want to sort courses by instructor or by code, we need to replace the lists with tuples (which are immutable lists)

In [4]:
for course in courses:
        course['instructor'] = tuple(course['instructor'])
        course['coinstructors'] = tuple([tuple(f) for f in course['coinstructors']])
        course['code']= tuple(course['code'])

In [5]:
print('notice that the instructor and code are tuples now')
courses[1246]

notice that the instructor and code are tuples now


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ('An', 'Huang', 'anhuang@brandeis.edu'),
 'coinstructors': (),
 'code': ('MATH', '223A'),
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}

# Exploring the data set
Now we will show how to use straight python to explore the data set and answer some interesting questions. Next week we will start learning Pandas/Numpy which are packages that make it easier to explore large dataset efficiently.

Here are some questions we can try to asnwer:
* what are all of the subjects of courses (e.g. COSI, MATH, JAPN, PHIL, ...)
* which terms are represented?
* how many instructors taught at Brandeis last year?
* what were the five largest course sections?
* what were the five largest courses (where we combine sections)?
* which are the five largest subjects measured by number of courses offered?
* which are the five largest courses measured by number of students taught?
* which course had the most sections taught in 20-21?
* who are the top five faculty in terms of number of students taught?
* etc.

In [6]:
{c['term'] for c in courses}

{'1203', '1211', '1212'}

In [7]:
len({s['subject']for s in courses})

120

In [8]:
len({i['instructor']for i in courses})

904

#### 5e

Create a list of tuples (E,S) where S is a subject and E is the number of students enrolled in courses in that subject, sort it and print the top 10 subjects in terms of number of courses offered. 

(# of courses offered per subject) 
Subject with highest number of courses 

In [9]:
sizes_dict = {}
for course in courses:
    sizes_dict[course["subject"]] = (sizes_dict[course["subject"]] + 1) if course['subject'] in sizes_dict else 0 
    #Create a counter  with subjects as keys ie {Biol: 1} and then, then use a sorter to get the b top ten in a for loop
    # and then use that to create a tuples
courses_1 = []
for subject_name, subject_size in sizes_dict.items():
    courses_1.append([subject_name, subject_size])
subjects_by_size = sorted(courses_1, key = lambda subject:- subject[1])
tuple([(course[1], course[0]) for course in subjects_by_size [1:11]])

((497, 'HIST'),
 (416, 'PSYC'),
 (402, 'NEUR'),
 (295, 'BCHM'),
 (287, 'PHYS'),
 (273, 'HS'),
 (271, 'COSI'),
 (265, 'MUS'),
 (264, 'ENG'),
 (262, 'BCBP'))

#5f

Create a list of tuples (E,S) where S is a subject and E is the number of students enrolled in courses in that subject, sort it and print the top 10 subjects in terms of number of faculty teaching courses in that subject

In [10]:
sizes_dict = {}
for course in courses:
    sizes_dict[course["instructor"]] = (sizes_dict[course["instructor"]] + 1) if course["instructor"] in sizes_dict else 0   
courses = []
for subject_name, subject_size in sizes_dict.items():
    courses.append([subject_name, subject_size])
subjects_by_size = sorted(courses, key = lambda subject:- subject[1])
tuple([(course[1], course[0]) for course in subjects_by_size [1:11]])

((48, ('Leslie Claire', 'Griffith', 'griffith@brandeis.edu')),
 (48, ('Kene Nathan', 'Piasta', 'kpiasta@brandeis.edu')),
 (46, ('Sacha', 'Nelson', 'nelson@brandeis.edu')),
 (46, ('Seth', 'Fraden', 'fraden@brandeis.edu')),
 (46, ('Paul', 'DiZio', 'dizio@brandeis.edu')),
 (45, ('John', 'Lisman', 'lisman@brandeis.edu')),
 (44, ('Michael', 'Rosbash', 'rosbash@brandeis.edu')),
 (44, ('Donald B.', 'Katz', 'dbkatz@brandeis.edu')),
 (44, ('Robert W', 'Sekuler', 'sekuler@brandeis.edu')),
 (43, ('Arthur', 'Wingfield', 'wingfiel@brandeis.edu')))

#5i 
Create your own interesting question (each team member creates their own) and use Python to answer that question.

My question:
Calculate the median enrollment taught in biology. 

In [11]:
import statistics
list = {c['enrolled'] for c in courses if c['subject']=='BIOL' if c['enrolled']<=10}
print(statistics.median([list]))

TypeError: list indices must be integers or slices, not str