# Courses Demo
This Jupyter notebook explores the data set courses20-21.json,
which consists of all Brandeis courses in the 20-21 academic year (Fall20, Spr21, Sum21)
that had at least one student enrolled.

First we need to read the JSON file into a list of Python dictionaries.

In [2]:
import json

In [3]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)

## Structure of a course
Next we look at the fields of each course dictionary and their values.

In [4]:
print('there are',len(courses),'courses in the dataset')
print('here is the data for course 1246')
courses[1246]

there are 7813 courses in the dataset
here is the data for course 1246


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ['An', 'Huang', 'anhuang@brandeis.edu'],
 'coinstructors': [],
 'code': ['MATH', '223A'],
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}
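The `times` field stores `start` and `end` as integers. Here is a minimal sketch for making them readable, assuming (the data set does not state this) that they are minutes after midnight. The helper name `format_minutes` is made up for illustration.

```python
def format_minutes(minutes):
    """Convert minutes after midnight (an assumed encoding) to 'HH:MM'."""
    hours, mins = divmod(minutes, 60)
    return f"{hours:02d}:{mins:02d}"

# the 'times' entry of course 1246, copied from the output above
meeting = {'start': 1080, 'end': 1170, 'days': ['w', 'm']}
print(format_minutes(meeting['start']), '-', format_minutes(meeting['end']))
```

Under that assumption, this section would meet from 18:00 to 19:30 on Mondays and Wednesdays.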

## Cleaning the data
If we want to use the instructor or the code as a dictionary key (for example, to count students per instructor or per course), we need to replace the lists with tuples, which are immutable and therefore hashable.

In [5]:
for course in courses:
    course['instructor'] = tuple(course['instructor'])
    course['coinstructors'] = tuple([tuple(f) for f in course['coinstructors']])
    course['code'] = tuple(course['code'])
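One practical benefit of this conversion is that tuples, unlike lists, are hashable, so they can serve as dictionary keys or set members. The short illustration below uses the instructor of course 1246; the dictionary name `students_taught` is made up for the example.

```python
instructor_list = ['An', 'Huang', 'anhuang@brandeis.edu']
instructor_tuple = tuple(instructor_list)

students_taught = {}
students_taught[instructor_tuple] = 4        # works: tuples are hashable

list_error = None
try:
    students_taught[instructor_list] = 4     # fails: lists are mutable, so unhashable
except TypeError as err:
    list_error = str(err)
    print('TypeError:', err)
```

The same restriction applies to `collections.Counter` and to sets, since both are built on hashing.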

In [6]:
print('notice that the instructor and code are tuples now')
courses[1246]

notice that the instructor and code are tuples now


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ('An', 'Huang', 'anhuang@brandeis.edu'),
 'coinstructors': (),
 'code': ('MATH', '223A'),
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}

# Exploring the data set
Now we will show how to use plain Python to explore the data set and answer some interesting questions. Next week we will start learning Pandas and NumPy, which are packages that make it easier to explore large data sets efficiently.

Here are some questions we can try to answer:
* what are all of the subjects of courses (e.g. COSI, MATH, JAPN, PHIL, ...)?
* which terms are represented?
* how many instructors taught at Brandeis last year?
* what were the five largest course sections?
* what were the five largest courses (where we combine sections)?
* which are the five largest subjects measured by number of courses offered?
* which are the five largest courses measured by number of students taught?
* which course had the most sections taught in 20-21?
* who are the top five faculty in terms of number of students taught?
* etc.
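The first two questions in this list come down to collecting the distinct values of one field. A minimal sketch using a set comprehension follows; the function name `distinct_values` and the sample records are made up for illustration, and in the notebook you would pass `courses` instead.

```python
def distinct_values(course_list, field):
    """Return the sorted distinct values of one field across all courses."""
    return sorted({course[field] for course in course_list})

# made-up records for illustration only
sample_courses = [
    {'subject': 'COSI', 'term': '1203'},
    {'subject': 'MATH', 'term': '1203'},
    {'subject': 'COSI', 'term': '1211'},
]
print(distinct_values(sample_courses, 'subject'))
print(distinct_values(sample_courses, 'term'))
```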

**(a) How many faculty taught COSI courses last year?**

**(b) What is the total number of students taking COSI courses last year?**
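One possible approach to (a) and (b), sketched on made-up sample records: collect the instructors of a subject in a set and sum the `enrolled` field. The function name `subject_summary` is hypothetical. Note two assumptions: only the primary `instructor` is counted (not `coinstructors`), and the total counts enrollments, so a student taking two COSI courses is counted twice.

```python
def subject_summary(course_list, subject):
    """Return (number of distinct instructors, total enrollment) for one subject."""
    instructors = set()
    total_enrolled = 0
    for course in course_list:
        if course['subject'] == subject:
            # tuple() makes the instructor hashable even if it is still a list
            instructors.add(tuple(course['instructor']))
            total_enrolled += course['enrolled']
    return len(instructors), total_enrolled

# made-up records for illustration only; in the notebook, pass `courses` instead
sample_courses = [
    {'subject': 'COSI', 'instructor': ['A', 'One', 'a@example.edu'], 'enrolled': 30},
    {'subject': 'COSI', 'instructor': ['A', 'One', 'a@example.edu'], 'enrolled': 12},
    {'subject': 'COSI', 'instructor': ['B', 'Two', 'b@example.edu'], 'enrolled': 8},
    {'subject': 'MATH', 'instructor': ['C', 'Three', 'c@example.edu'], 'enrolled': 20},
]
num_faculty, num_enrollments = subject_summary(sample_courses, 'COSI')
print(num_faculty, num_enrollments)
```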

**(c) What was the median size of a COSI course last year (counting only those courses with at least 10 students)?**

In [7]:
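# collect the enrollments of COSI courses with at least 10 students
cosi_list = []
for course in courses:
    if course['subject'] == 'COSI' and course['enrolled'] >= 10:
        cosi_list.append(course['enrolled'])

sorted_cosi_list = sorted(cosi_list)
# middle index; for an even-length list this picks the upper of the two middle values
median_index = len(sorted_cosi_list) // 2
print('Median size:',sorted_cosi_list[median_index])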

Median size: 37
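Indexing the sorted list at half its length gives the exact median only when the list has an odd number of elements. For an even count, the standard library's `statistics.median` averages the two middle values instead. The small comparison below uses made-up enrollment numbers.

```python
import statistics

sizes = [12, 30, 37, 45]   # made-up enrollments, even count

upper_middle = sorted(sizes)[len(sizes) // 2]   # what the index-based approach returns
true_median = statistics.median(sizes)          # mean of the two middle values
print(upper_middle, true_median)
```

Here the index-based approach returns 37, while `statistics.median` returns 33.5.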


**(d) Create a list of tuples (E,S) where S is a subject and E is the number of students enrolled in courses in that subject, sort it and print the top 10. This shows the top 10 subjects in terms of number of students taught.**

In [8]:
dict_of_ES = {}
list_of_tuples = []

# total the enrolled students per subject in a dictionary
# dictionary keys are unique, so we can check whether a subject has already been added
for course in courses:
    subject = course['subject']
    enrolled = course['enrolled']
    if subject not in dict_of_ES:
        dict_of_ES[subject] = 0
    dict_of_ES[subject] += enrolled

# iterate through dictionary and add each entry as a tuple to list_of_tuples
for key in dict_of_ES.keys():
    list_of_tuples.append((dict_of_ES[key], key))

# sort list_of_tuples descending
sorted_list_of_tuples = sorted(list_of_tuples, key = lambda x: x[0], reverse = True)

# print top 10 subjects in terms of enrolled students
print(sorted_list_of_tuples[:10])

[(5318, 'HS'), (3085, 'BIOL'), (2766, 'BUS'), (2734, 'HWL'), (2322, 'CHEM'), (2315, 'ECON'), (2223, 'COSI'), (1785, 'MATH'), (1704, 'PSYC'), (1144, 'ANTH')]


**(e) Do the same as in (d) but print the top 10 subjects in terms of number of courses offered.**

In [9]:
dict_of_subjects = {}
list_of_tuples = []

# iterate through courses and keep count of the number of courses in each subject
for course in courses:
    subject = course['subject']
    if subject not in dict_of_subjects:
        dict_of_subjects[subject] = 0
    dict_of_subjects[subject] += 1

# iterate through dictionary and add each entry as a tuple to list_of_tuples
for key in dict_of_subjects.keys():
    list_of_tuples.append((dict_of_subjects[key], key))

# sort list_of_tuples descending
sorted_list_of_tuples = sorted(list_of_tuples, key = lambda x: x[0], reverse = True)

# print top 10 subjects in terms of number of courses offered
print(sorted_list_of_tuples[:10])

[(613, 'BIOL'), (498, 'HIST'), (417, 'PSYC'), (403, 'NEUR'), (296, 'BCHM'), (288, 'PHYS'), (274, 'HS'), (272, 'COSI'), (266, 'MUS'), (265, 'ENG')]
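The counting pattern used in (d) and (e) can also be written with `collections.Counter`, whose `most_common` method does the sorting. This is an alternative sketch, not the approach used in the cells above. The function name `top_subjects` and the sample records are made up, and the pairs come out as (subject, total) rather than (total, subject).

```python
from collections import Counter

def top_subjects(course_list, n, by_enrollment=False):
    """Return the n largest (subject, total) pairs.

    Totals count course records by default, or sum 'enrolled' when
    by_enrollment is True.
    """
    totals = Counter()
    for course in course_list:
        totals[course['subject']] += course['enrolled'] if by_enrollment else 1
    return totals.most_common(n)

# made-up records for illustration only; in the notebook, pass `courses` instead
sample_courses = [
    {'subject': 'BIOL', 'enrolled': 5},
    {'subject': 'BIOL', 'enrolled': 7},
    {'subject': 'HS', 'enrolled': 40},
]
print(top_subjects(sample_courses, 2))                      # by number of records
print(top_subjects(sample_courses, 2, by_enrollment=True))  # by total enrollment
```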


**(f) Do the same as (d) but print the top 10 subjects in terms of number of faculty teaching courses in that subject.**

In [10]:
dict_of_subjects = {}
list_of_tuples = []

# iterate through courses and collect the set of instructors teaching in each subject
# (only the primary instructor is counted here, not coinstructors)
for course in courses:
    subject = course['subject']
    if subject not in dict_of_subjects:
        dict_of_subjects[subject] = set()
    dict_of_subjects[subject].add(course['instructor'])

# iterate through dictionary and add each (number of faculty, subject) pair to list_of_tuples
for key in dict_of_subjects.keys():
    list_of_tuples.append((len(dict_of_subjects[key]), key))

# sort list_of_tuples descending
sorted_list_of_tuples = sorted(list_of_tuples, key = lambda x: x[0], reverse = True)

# print top 10 subjects in terms of number of faculty
print(sorted_list_of_tuples[:10])

**(g) List the top 20 faculty in terms of number of students they taught.**

In [11]:
from collections import Counter
count = Counter()
for course in courses:
    count[course['instructor']] += course['enrolled']
    
list_of_tuples = []
for key, value in count.items():
    list_of_tuples.append((key,value))
sorted_list_of_tuples = sorted(list_of_tuples,key = lambda x: x[1],reverse=True)

# print top 20 faculty in terms of enrolled students
print(sorted_list_of_tuples[:20])

[(('Leah', 'Berkenwald', 'leahb@brandeis.edu'), 926), (('Kene Nathan', 'Piasta', 'kpiasta@brandeis.edu'), 583), (('Stephanie', 'Murray', 'murray@brandeis.edu'), 515), (('Milos', 'Dolnik', 'dolnik@brandeis.edu'), 489), (('Maria', 'de Boef Miara', 'mmiara@brandeis.edu'), 450), (('Bryan', 'Ingoglia', 'ingoglia@brandeis.edu'), 439), (('Rachel V.E.', 'Woodruff', 'woodruff@brandeis.edu'), 422), (('Timothy J', 'Hickey', 'tjhickey@brandeis.edu'), 411), (('Daniel', 'Breen', 'dbreen91@brandeis.edu'), 375), (('Melissa', 'Kosinski-Collins', 'kosinski@brandeis.edu'), 365), (('Claudia', 'Novack', 'novack@brandeis.edu'), 355), (('Antonella', 'DiLillo', 'dilant@brandeis.edu'), 342), (('Jon', 'Chilingerian', 'chilinge@brandeis.edu'), 330), (('Ahmad', 'Namini', 'anamini@brandeis.edu'), 327), (('Iraklis', 'Tsekourakis', 'tsekourakis@brandeis.edu'), 316), (('Geoffrey', 'Clarke', 'geoffclarke@brandeis.edu'), 315), (('Peter', 'Mistark', 'pmistark@brandeis.edu'), 277), (('Brenda', 'Anderson', 'banders@brande

**(h) List the top 20 courses in terms of number of students taking that course (where you combine different sections and semesters, i.e. just use the subject and course number).**

In [12]:
count = Counter()
for course in courses:
    count[course['code']] += course['enrolled']
    
list_of_tuples = []
for key, value in count.items():
    list_of_tuples.append((key,value))
sorted_list_of_tuples = sorted(list_of_tuples,key = lambda x: x[1],reverse=True)

# print top 20 courses in terms of enrolled students
print(sorted_list_of_tuples[:20])

[(('HWL', '1'), 940), (('HWL', '1-PRE'), 879), (('BIOL', '14A'), 358), (('COSI', '10A'), 343), (('PSYC', '10A'), 336), (('BIOL', '15B'), 287), (('MATH', '10A'), 280), (('BIOL', '18B'), 274), (('BIOL', '18A'), 262), (('CHEM', '29A'), 245), (('CHEM', '29B'), 239), (('CHEM', '25A'), 236), (('PSYC', '51A'), 231), (('CHEM', '25B'), 226), (('COSI', '12B'), 225), (('BUS', '6A'), 215), (('CHEM', '18A'), 208), (('ECON', '10A'), 207), (('MATH', '15A'), 204), (('ANTH', '1A'), 201)]


**(i) Create your own interesting question (each team member creates their own) and use Python to answer that question.**

Kelly's Question: What are the top 10 courses taught in the COSI department in terms of number of students enrolled in the course?

In [13]:
# get a list of COSI course sections and their number of enrolled students
list_of_tuples = []
for course in courses:
    if course['subject'] == 'COSI':
        name = course['name']
        enrolled = course['enrolled']
        list_of_tuples.append((name, enrolled))

# print top ten based on number of enrolled students
sorted_list_of_tuples = sorted(list_of_tuples, key = lambda x: x[1], reverse = True)
print(sorted_list_of_tuples[:10])

[('Introduction to 3-D Animation', 166), ('Introduction to Problem Solving in Python', 150), ('Data Structures and the Fundamentals of Computing', 102), ('Discrete Structures', 101), ('Data Structures and the Fundamentals of Computing', 99), ('Advanced Programming Techniques in Java', 96), ('Operating Systems', 94), ('Introduction to Problem Solving in Python', 92), ('Introduction to the Theory of Computation', 87), ('Fundamentals of Natural Language Processing I', 84)]
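The list above ranks individual records, so a course offered in several sections or terms appears more than once (for example, 'Introduction to Problem Solving in Python' at 150 and at 92). A sketch that sums enrollment per course name before ranking is shown below. The function name `top_courses_in_subject` and the sample records are made up, and grouping by name is an assumption: two different courses that share a name would be merged.

```python
from collections import Counter

def top_courses_in_subject(course_list, subject, n):
    """Sum enrollment per course name within one subject; return the n largest."""
    totals = Counter()
    for course in course_list:
        if course['subject'] == subject:
            totals[course['name']] += course['enrolled']
    return totals.most_common(n)

# made-up records for illustration only; in the notebook, pass `courses` instead
sample_courses = [
    {'subject': 'COSI', 'name': 'Course A', 'enrolled': 50},
    {'subject': 'COSI', 'name': 'Course B', 'enrolled': 70},
    {'subject': 'COSI', 'name': 'Course A', 'enrolled': 40},
    {'subject': 'MATH', 'name': 'Course C', 'enrolled': 90},
]
print(top_courses_in_subject(sample_courses, 'COSI', 2))
```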


Zheyu's Question: What COSI courses have an enrollment greater than 100?

In [14]:
# list all COSI courses with an enrollment greater than 100
list_of_courses = []
for course in courses:
    if course['subject'] == 'COSI' and course['enrolled'] > 100:
        list_of_courses.append(course['name'])
print(list_of_courses)

['Introduction to Problem Solving in Python', 'Data Structures and the Fundamentals of Computing', 'Discrete Structures', 'Introduction to 3-D Animation']

