# Courses Demo
This Jupyter notebook explores the data set courses20-21.json,
which contains all Brandeis courses in the 2020-21 academic year (Fall20, Spr21, Sum21)
that had at least one student enrolled.

First we need to read the JSON file into a list of Python dictionaries.

In [1]:
import json

In [2]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)

## Structure of a course
Next we look at the fields of each course dictionary and their values.

In [3]:
print('there are',len(courses),'courses in the dataset')
print('here is the data for course 1246')
courses[1246]

there are 7813 courses in the dataset
here is the data for course 1246


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ['An', 'Huang', 'anhuang@brandeis.edu'],
 'coinstructors': [],
 'code': ['MATH', '223A'],
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}
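
The `start` and `end` values in `times` look like minutes after midnight: 1080 would be 18:00 and 1170 would be 19:30, a 90-minute meeting. That reading is an assumption, since the data set itself does not document the unit. Under that assumption, a small hypothetical helper (`format_minutes` is not part of the original notebook) makes meeting times easier to read:

```python
def format_minutes(minutes):
    """Format a minutes-after-midnight value as HH:MM (24-hour clock).

    Assumes the value counts minutes since midnight; the data set does not say so.
    """
    hours, mins = divmod(minutes, 60)
    return f"{hours:02d}:{mins:02d}"


# A meeting-time entry copied from course 1246 above
meeting = {'start': 1080, 'end': 1170, 'days': ['w', 'm']}
print(format_minutes(meeting['start']), '-', format_minutes(meeting['end']))
```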

## Cleaning the data
Later cells use instructors and course codes as set members and dictionary keys, which requires hashable values. Lists are not hashable, so we replace them with tuples (which are immutable and hashable).

In [4]:
for course in courses:
    course['instructor'] = tuple(course['instructor'])
    course['coinstructors'] = tuple(tuple(f) for f in course['coinstructors'])
    course['code'] = tuple(course['code'])

In [5]:
print('notice that the instructor and code are tuples now')
courses[1246]

notice that the instructor and code are tuples now


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ('An', 'Huang', 'anhuang@brandeis.edu'),
 'coinstructors': (),
 'code': ('MATH', '223A'),
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}
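
One practical effect of the conversion: lists are unhashable, so they cannot be placed in a set or used as dictionary keys, while tuples of strings can. A minimal sketch with a made-up instructor record (not taken from the data set):

```python
# Made-up record for illustration only
as_list = ['Ada', 'Lovelace', 'ada@example.edu']
as_tuple = tuple(as_list)

try:
    {as_list}                 # putting a list in a set raises TypeError
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

unique = {as_tuple, tuple(as_list)}   # equal tuples collapse to one element
print(list_is_hashable, len(unique))
```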

# Question 5: explore the data set

Topics of interest:
* How many faculty taught COSI courses last year?
* What was the total number of students taking COSI courses last year?
* What was the median size of a COSI course last year (counting only those courses with at least 10 students)?
* Create a list of tuples (E,S) where S is a subject and E is the number of students enrolled in courses in that subject, sort it, and print the top 10. This shows the top 10 subjects in terms of number of students taught.
* Do the same as in the previous item, but print the top 10 subjects in terms of number of courses offered.
* Do the same again, but print the top 10 subjects in terms of number of faculty teaching courses in that subject.
* List the top 20 faculty in terms of number of students they taught.
* List the top 20 courses in terms of number of students taking that course (where you combine different sections and semesters, i.e. just use the subject and course number).
* Create your own interesting question (each team member creates their own) and use Python to answer that question.


In [6]:
# How many faculty taught COSI courses last year?
res = set(course['instructor'] for course in courses if course['subject'] == 'COSI')
print(len(res))


27


In [7]:
# What is the total number of students taking COSI courses last year?
res = sum(course['enrolled'] for course in courses if course['subject'] == 'COSI')
print(res)

2223


In [8]:
# What was the median size of a COSI course last year (counting only courses with at least 10 students)?
from statistics import median

res = median(course['enrolled'] for course in courses if course['subject'] == 'COSI' and course['enrolled'] >= 10)
print(res)

37
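
As a small illustration of the filter-then-median pattern used above, here is the same computation on made-up enrollment numbers (not from the data set). Note that `statistics.median` averages the two middle values when the number of data points is even.

```python
from statistics import median

# Made-up enrollments for illustration only
enrollments = [4, 12, 25, 40, 9, 31]

# Keep only sections with at least 10 students, as in the cell above
large = [e for e in enrollments if e >= 10]
print(sorted(large), median(large))
```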


In [9]:
# Create a list of (S, E) tuples where S is a subject and E is the number of students
# enrolled in courses in that subject, sort it by E (largest first), and print the top 10.

from collections import defaultdict

tmp = defaultdict(int)
for course in courses:
    tmp[course['subject']] += course['enrolled']
res = list(tmp.items())
res.sort(key=lambda x: -x[1])
print(res[:10])

[('HS', 5318), ('BIOL', 3085), ('BUS', 2766), ('HWL', 2734), ('CHEM', 2322), ('ECON', 2315), ('COSI', 2223), ('MATH', 1785), ('PSYC', 1704), ('ANTH', 1144)]
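
The same kind of tally can also be written with `collections.Counter`, whose `most_common` method returns the largest totals first. A sketch on a few made-up course records (the field names match the data set; the values do not come from it):

```python
from collections import Counter

# Made-up records for illustration only
sample = [
    {'subject': 'COSI', 'enrolled': 30},
    {'subject': 'MATH', 'enrolled': 20},
    {'subject': 'COSI', 'enrolled': 15},
    {'subject': 'BIOL', 'enrolled': 50},
]

totals = Counter()
for course in sample:
    totals[course['subject']] += course['enrolled']

top = totals.most_common(2)   # list of (subject, total) pairs, largest first
print(top)
```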


In [10]:
# Print the top 10 subjects in terms of number of distinct courses offered

from collections import defaultdict

tmp = defaultdict(set)
for course in courses:
    tmp[course['subject']].add(course['coursenum'])
res = [(k, len(tmp[k])) for k in tmp]
res.sort(key=lambda x: -x[1])
print(res[:10])

[('HS', 188), ('BIOL', 72), ('MUS', 72), ('ENG', 71), ('ANTH', 68), ('BUS', 65), ('NEJS', 64), ('PSYC', 57), ('MATH', 55), ('ECON', 53)]


In [11]:
# Print the top 10 subjects in terms of number of faculty teaching courses in that subject

from collections import defaultdict

tmp = defaultdict(set)
for course in courses:
    tmp[course['subject']].add(course['instructor'])
res = [(k, len(tmp[k])) for k in tmp]
res.sort(key=lambda x: -x[1])
print(res[:10])

[('HS', 87), ('BIOL', 67), ('ECON', 52), ('BCHM', 49), ('BUS', 47), ('HIST', 47), ('BCBP', 46), ('HWL', 42), ('MATH', 37), ('NEJS', 37)]
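
The cell above counts only the primary `instructor` of each section. If co-instructors should also count as faculty teaching in a subject, one possible variant is sketched below on made-up records. It assumes `coinstructors` holds tuples shaped like `instructor`, as produced by the cleaning step; whether co-instructors should be counted at all is a judgment call the original question leaves open.

```python
from collections import defaultdict

# Made-up records for illustration only
sample = [
    {'subject': 'COSI', 'instructor': ('A', 'One', 'a@example.edu'),
     'coinstructors': (('B', 'Two', 'b@example.edu'),)},
    {'subject': 'COSI', 'instructor': ('B', 'Two', 'b@example.edu'),
     'coinstructors': ()},
    {'subject': 'MATH', 'instructor': ('C', 'Three', 'c@example.edu'),
     'coinstructors': ()},
]

faculty = defaultdict(set)
for course in sample:
    faculty[course['subject']].add(course['instructor'])
    faculty[course['subject']].update(course['coinstructors'])  # add each co-instructor tuple

counts = sorted(((s, len(f)) for s, f in faculty.items()),
                key=lambda pair: pair[1], reverse=True)
print(counts)
```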


In [12]:
# list the top 20 faculty in terms of number of students they taught

from collections import defaultdict

tmp = defaultdict(int)
for course in courses:
    tmp[course['instructor']] += course['enrolled']
res = list(((k[0], k[1]), tmp[k]) for k in tmp)
res.sort(key=lambda x: -x[1])
print([item[0] for item in res][:20])



[('Leah', 'Berkenwald'), ('Kene Nathan', 'Piasta'), ('Stephanie', 'Murray'), ('Milos', 'Dolnik'), ('Maria', 'de Boef Miara'), ('Bryan', 'Ingoglia'), ('Rachel V.E.', 'Woodruff'), ('Timothy J', 'Hickey'), ('Daniel', 'Breen'), ('Melissa', 'Kosinski-Collins'), ('Claudia', 'Novack'), ('Antonella', 'DiLillo'), ('Jon', 'Chilingerian'), ('Ahmad', 'Namini'), ('Iraklis', 'Tsekourakis'), ('Geoffrey', 'Clarke'), ('Peter', 'Mistark'), ('Brenda', 'Anderson'), ('Colleen', 'Hitchcock'), ('Scott A.', 'Redenius')]


In [13]:
# list the top 20 courses in terms of number of students taking that course (where you combine different sections 
# and semesters, i.e. just use the subject and course number)

from collections import defaultdict

tmp = defaultdict(int)
for course in courses:
    tmp[course['code']] += course['enrolled']
res = list(tmp.items())
res.sort(key=lambda x: -x[1])
print([item[0] for item in res][:20])

[('HWL', '1'), ('HWL', '1-PRE'), ('BIOL', '14A'), ('COSI', '10A'), ('PSYC', '10A'), ('BIOL', '15B'), ('MATH', '10A'), ('BIOL', '18B'), ('BIOL', '18A'), ('CHEM', '29A'), ('CHEM', '29B'), ('CHEM', '25A'), ('PSYC', '51A'), ('CHEM', '25B'), ('COSI', '12B'), ('BUS', '6A'), ('CHEM', '18A'), ('ECON', '10A'), ('MATH', '15A'), ('ANTH', '1A')]


In [14]:
# Create your own interesting question (each team member creates their own) and use Python to answer that question.

# What are the 10 most popular COSI courses (measured by number of enrolled students)?

from collections import defaultdict

tmp = defaultdict(int)

for course in courses:
    if course['subject'] == 'COSI':
        key = course['coursenum'] + ": " + course['name']
        val = course['enrolled']
        tmp[key] = val  # assignment keeps only the last section seen for each course
res = list(tmp.items())
res.sort(key=lambda x: -x[1])
print(res[:10])

[('164A: Introduction to 3-D Animation', 166), ('21A: Data Structures and the Fundamentals of Computing', 102), ('29A: Discrete Structures', 101), ('131A: Operating Systems', 94), ('130A: Introduction to the Theory of Computation', 87), ('114A: Fundamentals of Natural Language Processing I', 84), ('121B: Structure and Interpretation of Computer Programs', 61), ('123A: Statistical Machine Learning', 56), ('149B: Practical Machine Learning with Big Data', 48), ('102A: Software Entrepreneurship', 47)]
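
Because the cell above stores values with `=`, each course ends up with the enrollment of whichever of its sections is seen last, not a total. Ranking by combined enrollment across sections and terms would need `+=` instead. The sketch below shows the difference on made-up records only (not from the data set), so it says nothing about what the real ranking would be:

```python
from collections import defaultdict

# Made-up sections for illustration only
sample = [
    {'coursenum': '10A', 'name': 'Intro', 'enrolled': 80},
    {'coursenum': '10A', 'name': 'Intro', 'enrolled': 5},
    {'coursenum': '21A', 'name': 'Data Structures', 'enrolled': 60},
]

last_seen = {}
totals = defaultdict(int)
for course in sample:
    key = course['coursenum'] + ": " + course['name']
    last_seen[key] = course['enrolled']    # overwrites earlier sections
    totals[key] += course['enrolled']      # sums over all sections

print(last_seen)
print(dict(totals))
```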
