# Courses Demo
This Jupyter notebook explores the data set courses20-21.json,
which contains all Brandeis courses from the 20-21 academic year (Fall20, Spr21, Sum21)
that had at least one student enrolled.

First we need to read the JSON file into a list of Python dictionaries.

In [10]:
import json

In [11]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)

## Structure of a course
Next we look at the fields of each course dictionary and their values.

In [12]:
print('there are',len(courses),'courses in the dataset')
print('here is the data for course 1246')
courses[1246]

there are 7813 courses in the dataset
here is the data for course 1246


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ['An', 'Huang', 'anhuang@brandeis.edu'],
 'coinstructors': [],
 'code': ['MATH', '223A'],
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}
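
The `start` and `end` values in `times` look like minutes after midnight, so 1080 would be 18:00. That reading is an assumption; the data itself does not confirm it. A minimal sketch under that assumption, using a hypothetical helper `minutes_to_clock`:

```python
def minutes_to_clock(minutes):
    """Format a minutes-after-midnight value as HH:MM (assumed encoding)."""
    hours, mins = divmod(minutes, 60)
    return f"{hours:02d}:{mins:02d}"

# The 'times' entry copied from course 1246 above.
meeting = {'start': 1080, 'end': 1170, 'days': ['w', 'm']}

start = minutes_to_clock(meeting['start'])
end = minutes_to_clock(meeting['end'])
print(sorted(meeting['days']), start, 'to', end)
```

If the assumption holds, this section meets on Monday and Wednesday evenings for 90 minutes.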

## Cleaning the data
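If we want to sort or group courses by instructor or by code, it helps to replace the lists with tuples. Tuples are immutable sequences, so unlike lists they are hashable and can be used as dictionary keys or set members.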

In [4]:
for course in courses:
    course['instructor'] = tuple(course['instructor'])
    course['coinstructors'] = tuple(tuple(f) for f in course['coinstructors'])
    course['code'] = tuple(course['code'])

In [5]:
print('notice that the instructor and code are tuples now')
courses[1246]


notice that the instructor and code are tuples now


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ('An', 'Huang', 'anhuang@brandeis.edu'),
 'coinstructors': (),
 'code': ('MATH', '223A'),
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}
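
One practical benefit of the conversion is that tuples are hashable while lists are not, so a tuple can serve as a dictionary key or set member. The short sketch below illustrates this with the instructor value from course 1246; the variable names are only for illustration.

```python
instructor_list = ['An', 'Huang', 'anhuang@brandeis.edu']
instructor_tuple = tuple(instructor_list)

# Hashing a list raises TypeError, so a list cannot be a dictionary key.
try:
    hash(instructor_list)
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# A tuple works as a key; 4 is the 'enrolled' value shown for course 1246.
students_by_instructor = {instructor_tuple: 4}
print(list_is_hashable, students_by_instructor)
```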

# Exploring the data set
Now we will show how to use plain Python to explore the data set and answer some interesting questions. Next week we will start learning pandas and NumPy, packages that make it easier to explore large data sets efficiently.

Here are some questions we can try to answer:
* what are all of the subjects of courses (e.g. COSI, MATH, JAPN, PHIL, ...)?
* which terms are represented?
* how many instructors taught at Brandeis last year?
* what were the five largest course sections?
* what were the five largest courses (where we combine sections)?
* which are the five largest subjects measured by number of courses offered?
* which are the five largest courses measured by number of students taught?
* which course had the most sections taught in 20-21?
* who are the top five faculty in terms of number of students taught?
* etc.
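
As a rough sketch of how two of these questions might be approached, the example below finds the largest sections and the largest courses with sections combined. It runs on a few records whose `DEMO` entries are made up for illustration (only the MATH 223A values come from the data above), and it assumes that "combining sections" means summing `enrolled` over records that share the same `term` and `code`.

```python
from collections import Counter

# Toy records with the same keys as the real data; DEMO values are made up.
sample_courses = [
    {'code': ('DEMO', '1A'), 'section': '1', 'enrolled': 120, 'term': '1203'},
    {'code': ('DEMO', '1A'), 'section': '2', 'enrolled': 90, 'term': '1203'},
    {'code': ('DEMO', '2B'), 'section': '1', 'enrolled': 150, 'term': '1203'},
    {'code': ('MATH', '223A'), 'section': '1', 'enrolled': 4, 'term': '1203'},
]

# Largest individual sections: sort by enrollment, largest first.
largest_sections = sorted(
    sample_courses, key=lambda c: c['enrolled'], reverse=True
)[:2]

# Largest courses: sum enrollment over all sections of a (term, code) pair.
course_totals = Counter()
for c in sample_courses:
    course_totals[(c['term'], c['code'])] += c['enrolled']
largest_courses = course_totals.most_common(2)

print([(c['code'], c['section'], c['enrolled']) for c in largest_sections])
print(largest_courses)
```

On the real data, replace `sample_courses` with `courses` and the slice size with 5. The `(term, code)` key only works after the `code` lists have been converted to tuples.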

In [6]:
# Collect each subject once, in order of first appearance.
subjects = list(dict.fromkeys(course['subject'] for course in courses))
print(subjects)

['HRNS', 'BUS', 'ECON', 'FIN', 'HS', 'ANTH', 'CHEM', 'PHIL', 'HIST', 'PSYC', 'WGS', 'ENG', 'POL', 'BIOL', 'NBIO', 'CHIN', 'MATH', 'MUS', 'SOC', 'BCBP', 'BIOT', 'ED', 'CLAS', 'COMH', 'COSI', 'GRK', 'LAT', 'THA', 'PHYS', 'NEJS', 'NEUR', 'PMED', 'ESL', 'BUS/FIN', 'BUS/ECON', 'ECON/FIN', 'HS/POL', 'HWL', 'GER', 'JAPN', 'KOR', 'RUS', 'ITAL', 'HISP', 'FREN', 'IGS', 'AMST', 'AAAS', 'BCHM', 'BIPH', 'EAS', 'COML', 'SAS', 'YDSH', 'SJSP', 'REL', 'PAX', 'LGLS', 'QBIO', 'BIBC', 'HSSP', 'ENVS', 'ECS', 'FILM', 'FA', 'HOID', 'IIM', 'CAST', 'IMES', 'LALS', 'LING', 'JOUR', 'HBRW', 'NPSY', 'ARBC', 'INT', 'CA', 'EBIO', 'AAAS/WGS', 'HUM', 'EL', 'CBIO', 'AMST/ENG', 'RECS/THA', 'UWS', 'COMP', 'MERS', 'AAS/AAPI', 'AAPI/HIS', 'RECS', 'AAPI/WGS', 'AAAS/HIS', 'BISC', 'AAPI', 'GS', 'POL/WGS', 'HUM/UWS', 'CHSC', 'AMST/MUS', 'HIST/SOC', 'QR', 'CLAS/ENG', 'ANTH/WGS', 'HIST/WGS', 'CLAS/NEJ', 'ECS/ENG', 'RBIF', 'RBOT', 'RCOM', 'RDFT', 'RDMD', 'RHIN', 'RIAS', 'RIDT', 'RMGT', 'RPJM', 'RSAN', 'RSEG', 'RUCD', 'GECS']
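
To rank subjects by the number of courses offered, one option is `collections.Counter`. The sketch below uses made-up `DEMO` and `TEST` records and shows two possible readings of "number of courses": counting every section, or counting each distinct course code once. Which one is intended is an assumption you would need to settle.

```python
from collections import Counter

# Toy records; the subjects and course numbers are made up for illustration.
sample_courses = [
    {'subject': 'DEMO', 'code': ('DEMO', '1A'), 'section': '1'},
    {'subject': 'DEMO', 'code': ('DEMO', '1A'), 'section': '2'},
    {'subject': 'DEMO', 'code': ('DEMO', '2B'), 'section': '1'},
    {'subject': 'TEST', 'code': ('TEST', '1A'), 'section': '1'},
]

# Reading 1: every section counts as one offering.
sections_per_subject = Counter(c['subject'] for c in sample_courses)

# Reading 2: each distinct code counts once (requires tuple codes).
distinct_codes = {c['code'] for c in sample_courses}
courses_per_subject = Counter(subject for subject, _ in distinct_codes)

print(sections_per_subject.most_common(5))
print(courses_per_subject.most_common(5))
```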


In [7]:
# Collect each term code once, in order of first appearance.
terms = list(dict.fromkeys(course['term'] for course in courses))
print(terms)

['1203', '1211', '1212']
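
When the order of first appearance does not matter, a set comprehension is a compact alternative for collecting unique values. The sketch below also attaches readable labels to the term codes. The mapping in `TERM_LABELS` is an assumption inferred from the terms named in the introduction (Fall20, Spr21, Sum21), not something the data states.

```python
# ASSUMPTION: labels inferred from the notebook introduction, not from the data.
TERM_LABELS = {
    '1203': 'Fall 2020',
    '1211': 'Spring 2021',
    '1212': 'Summer 2021',
}

# Toy records carrying only the 'term' key.
sample_courses = [
    {'term': '1211'}, {'term': '1203'}, {'term': '1203'}, {'term': '1212'},
]

# A set drops duplicates; sorted() gives a stable, readable order.
unique_terms = sorted({c['term'] for c in sample_courses})
labels = [TERM_LABELS.get(term, 'unknown') for term in unique_terms]
print(list(zip(unique_terms, labels)))
```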


In [8]:
# Collect each instructor once, in order of first appearance.
# tuple() makes the value hashable even if the cleaning cell was not run.
instructors = list(dict.fromkeys(tuple(course['instructor']) for course in courses))
print(instructors)

[('Sharon', 'Feiman-Nemser', 'snemser@brandeis.edu'), ('Joseph B', 'Reimer', 'reimer@brandeis.edu'), ('Mark', 'Rosen', 'mirosen@brandeis.edu'), ('Jonathan', 'Sarna', 'sarna@brandeis.edu'), ('Ellen', 'Smith', 'esmith2@brandeis.edu'), ('Leonard', 'Saxe', 'saxe@brandeis.edu'), ('Jon A', 'Levisohn', 'levisohn@brandeis.edu'), ('Matthew E.', 'Boxer', 'mboxer@brandeis.edu'), ('Barry', 'Shrage', 'barryshrage@brandeis.edu'), ('Janet K.', 'Aronson', 'jaronson@brandeis.edu'), ('Andrew L', 'Molinsky', 'molinsky@brandeis.edu'), ('Arnold', 'Kamis', 'arnoldkamis@brandeis.edu'), ('Robert', 'Malenfant', 'robmalenfant@brandeis.edu'), ('Staff', 'Staff', 'no_email'), ('Hamza', 'Abdurezak', 'abdurez@brandeis.edu'), ('John', 'Kolovos', 'jkolovos@brandeis.edu'), ('Ian M.', 'Roy', 'ianroy@brandeis.edu'), ('Ahmad', 'Namini', 'anamini@brandeis.edu'), ('Javier', 'Vidal-Berastain', 'xvidalberastain@brandeis.edu'), ('Shirley', 'Idelson', 'sidelson@brandeis.edu'), ('Anna', 'Scherbina', 'ascherbina@brandeis.edu'), (
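
Printing every instructor is hard to read, so a count and a ranking may be more useful. The sketch below counts distinct instructors and totals the students taught by each. The two named instructors and all enrollment numbers are made up, and skipping the `('Staff', 'Staff', 'no_email')` placeholder is a choice of this sketch, not a rule from the data.

```python
from collections import defaultdict

# Toy records; the named instructors and enrollment numbers are made up.
sample_courses = [
    {'instructor': ('Ada', 'Example', 'ada@example.edu'), 'enrolled': 30},
    {'instructor': ('Bo', 'Sample', 'bo@example.edu'), 'enrolled': 25},
    {'instructor': ('Staff', 'Staff', 'no_email'), 'enrolled': 50},
    {'instructor': ('Ada', 'Example', 'ada@example.edu'), 'enrolled': 12},
]

students_taught = defaultdict(int)
for c in sample_courses:
    if c['instructor'][2] == 'no_email':
        continue  # skip the 'Staff' placeholder (a choice made in this sketch)
    students_taught[c['instructor']] += c['enrolled']

# Rank instructors by total students, largest first.
top_faculty = sorted(
    students_taught.items(), key=lambda item: item[1], reverse=True
)[:5]

print('distinct instructors:', len(students_taught))
print(top_faculty)
```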

In [35]:
# Keep only the sections whose status mentions instructor consent.
consent = [course for course in courses if 'Consent' in course['status_text']]
print(consent)

[{'limit': None, 'times': [], 'enrolled': 0, 'details': "Instructor's Signature Required.\nSee Course Catalog for Special Notes.", 'type': 'section', 'status_text': 'Open Consent Req.', 'section': '2', 'waiting': 0, 'instructor': ['Sharon', 'Feiman-Nemser', 'snemser@brandeis.edu'], 'coinstructors': [], 'code': ['HRNS', '329F'], 'subject': 'HRNS', 'coursenum': '329F', 'name': 'Readings in Jewish Professional Leadership', 'independent_study': True, 'term': '1203', 'description': 'Meets for one-half semester and yields half-course credit.\n\nSharon Feiman-Nemser'}, {'limit': None, 'times': [], 'enrolled': 0, 'details': "Instructor's Signature Required.\nSee Course Catalog for Special Notes.", 'type': 'section', 'status_text': 'Open Consent Req.', 'section': '2', 'waiting': 0, 'instructor': ['Joseph B', 'Reimer', 'reimer@brandeis.edu'], 'coinstructors': [], 'code': ['HRNS', '333F'], 'subject': 'HRNS', 'coursenum': '333F', 'name': 'Readings in Jewish Professional Leadership', 'independent_s
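
Printing whole dictionaries produces output that is hard to scan. A possible alternative is to print one compact line per matching section and to tally the different `status_text` values. The records below are trimmed copies of entries shown earlier in this notebook; the choice of fields in the summary line is only a suggestion.

```python
from collections import Counter

# Trimmed copies of records shown above, keeping only the fields used here.
sample_courses = [
    {'status_text': 'Open Consent Req.', 'code': ('HRNS', '329F'),
     'section': '2', 'name': 'Readings in Jewish Professional Leadership'},
    {'status_text': 'Open', 'code': ('MATH', '223A'),
     'section': '1', 'name': 'Lie Algebras: Representation Theory'},
    {'status_text': 'Open Consent Req.', 'code': ('HRNS', '333F'),
     'section': '2', 'name': 'Readings in Jewish Professional Leadership'},
]

consent = [c for c in sample_courses if 'Consent' in c['status_text']]

# How often does each status appear?
status_counts = Counter(c['status_text'] for c in sample_courses)

for c in consent:
    print(' '.join(c['code']), 'section', c['section'], '-', c['name'])
print(status_counts)
```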