# Courses Demo
This Jupyter notebook explores the dataset courses20-21.json,
which consists of all Brandeis courses in the 20-21 academic year (Fall20, Spr21, Sum21)
that had at least 1 student enrolled.

First, we read the JSON file into a list of Python dictionaries.

In [1]:
import json
import math

In [2]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)

## Structure of a course
Next, we look at the fields of a course dictionary and their values.

In [3]:
print('there are', len(courses), 'courses in the dataset')
print('here is the data for course 1051')
courses[1051]

there are 7813 courses in the dataset
here is the data for course 1051


{'limit': None,
 'times': [],
 'enrolled': 0,
 'details': '',
 'type': 'section',
 'status_text': 'Open',
 'section': '3',
 'waiting': 0,
 'instructor': ['Chandler', 'Rosenberger', 'crosen@brandeis.edu'],
 'coinstructors': [],
 'code': ['SOC', '292A'],
 'subject': 'SOC',
 'coursenum': '292A',
 'name': "Master's Graduate Internship",
 'independent_study': True,
 'term': '1203',
 'description': 'Usually offered every year.\nStaff'}

## Cleaning the data
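If we want to group or count courses by instructor or by code (for example, by putting them in a set or using them as dictionary keys), we need to replace the lists with tuples. Unlike lists, tuples are immutable and therefore hashable.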

In [4]:
for course in courses:
    course['instructor'] = tuple(course['instructor'])
    course['coinstructors'] = tuple(tuple(f) for f in course['coinstructors'])
    course['code'] = tuple(course['code'])
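
A minimal check of why the conversion matters: Python refuses to put a list in a set because lists are unhashable, while a tuple with the same contents works, and equal tuples collapse to a single entry. The instructor values are copied from the MATH 223A record in this notebook; the variable names are only for illustration.

```python
# Lists are mutable and therefore unhashable, so they cannot be set members or dict keys.
instructor_as_list = ['An', 'Huang', 'anhuang@brandeis.edu']
try:
    {instructor_as_list}
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# Tuples of strings are hashable, so equal tuples collapse to one set entry.
instructor_as_tuple = tuple(instructor_as_list)
unique_instructors = {instructor_as_tuple, ('An', 'Huang', 'anhuang@brandeis.edu')}

print(list_is_hashable)         # False
print(len(unique_instructors))  # 1
```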

In [5]:
print('notice that the instructor and code are tuples now')
courses[1246]

notice that the instructor and code are tuples now


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ('An', 'Huang', 'anhuang@brandeis.edu'),
 'coinstructors': (),
 'code': ('MATH', '223A'),
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}
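
The `times` field is a list of meetings with `start`, `end` and `days`. Assuming `start` and `end` are minutes after midnight (so 1080 would be 6:00 pm), a small helper can turn each meeting into readable text. That reading is an inference from the values, not something the dataset documents, and the helper names are hypothetical.

```python
def minutes_to_clock(minutes):
    """Format minutes after midnight as HH:MM (assumes that is what 'start'/'end' hold)."""
    hours, mins = divmod(minutes, 60)
    return f"{hours:02d}:{mins:02d}"

def describe_times(course):
    """Return one readable string per meeting in a course's 'times' list."""
    return [
        f"{','.join(slot['days'])} {minutes_to_clock(slot['start'])}-{minutes_to_clock(slot['end'])}"
        for slot in course['times']
    ]

# The meeting below is copied from the MATH 223A record
sample = {'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}]}
print(describe_times(sample))  # ['w,m 18:00-19:30']
```

Records with an empty `times` list, such as the independent study shown first, simply produce an empty list.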

# Exploring the data set
Now we will show how to use plain Python to explore the dataset and answer some interesting questions. Next week we will start learning Pandas and NumPy, packages that make it easier to explore large datasets efficiently.

Here are some questions we can try to answer:
* what are all of the subjects of courses (e.g. COSI, MATH, JAPN, PHIL, ...)
* which terms are represented?
* how many instructors taught at Brandeis last year?
* what were the five largest course sections?
* what were the five largest courses (where we combine sections)?
* which are the five largest subjects measured by number of courses offered?
* which are the five largest courses measured by number of students taught?
* which course had the most sections taught in 20-21?
* who are the top five faculty in terms of number of students taught?
* etc.
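
As a sketch for two of these questions (which subjects appear, and the five largest sections), a set comprehension collects distinct values and `heapq.nlargest` picks the top k without sorting the whole list. The records below are made up for illustration; in the notebook you would pass the real `courses` list instead.

```python
import heapq

# Made-up records with the same field names as the dataset (numbers are NOT real)
sample_courses = [
    {'subject': 'COSI', 'code': ('COSI', '10A'), 'section': '1', 'enrolled': 120},
    {'subject': 'MATH', 'code': ('MATH', '10A'), 'section': '2', 'enrolled': 45},
    {'subject': 'BIOL', 'code': ('BIOL', '14A'), 'section': '1', 'enrolled': 200},
    {'subject': 'PHIL', 'code': ('PHIL', '1A'), 'section': '1', 'enrolled': 30},
    {'subject': 'CHEM', 'code': ('CHEM', '25A'), 'section': '1', 'enrolled': 150},
    {'subject': 'COSI', 'code': ('COSI', '12B'), 'section': '1', 'enrolled': 60},
]

# Which subjects appear? A set comprehension keeps each subject once.
subjects = sorted({c['subject'] for c in sample_courses})

# The five largest sections by enrollment, without sorting the whole list
largest = heapq.nlargest(5, sample_courses, key=lambda c: c['enrolled'])

print(subjects)  # ['BIOL', 'CHEM', 'COSI', 'MATH', 'PHIL']
print([(' '.join(c['code']), c['section'], c['enrolled']) for c in largest])
```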

In [7]:
# number of distinct (primary) instructors who taught COSI, problem a
len({c['instructor']  for c in courses if c['subject'] == 'COSI'})

27
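
The cell above counts only the primary `instructor`. A variant that also counts co-instructors might look like this; it assumes each `coinstructors` entry has the same (first, last, email) shape as `instructor` and was converted to a tuple in the cleaning step. The function name and sample records are hypothetical.

```python
def distinct_teachers(courses, subject):
    """Distinct instructors of a subject, counting co-instructors too.

    Assumes co-instructor entries share the (first, last, email) tuple shape of 'instructor'.
    """
    teachers = set()
    for c in courses:
        if c['subject'] == subject:
            teachers.add(c['instructor'])
            teachers.update(c['coinstructors'])
    return teachers

# Hypothetical records for illustration only
sample = [
    {'subject': 'COSI', 'instructor': ('Ana', 'Lee', 'ana@example.edu'), 'coinstructors': ()},
    {'subject': 'COSI', 'instructor': ('Ben', 'Ortiz', 'ben@example.edu'),
     'coinstructors': (('Ana', 'Lee', 'ana@example.edu'),)},
    {'subject': 'MATH', 'instructor': ('Cy', 'Park', 'cy@example.edu'), 'coinstructors': ()},
]
print(len(distinct_teachers(sample, 'COSI')))  # 2
```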

In [8]:
# total COSI enrollment, problem b (a student in several COSI sections is counted once per section)
totalStudents = 0
for course in courses:
    if course['subject'] == "COSI":
        totalStudents+=course['enrolled']
print(totalStudents)

2223
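
The same total can be written as a single `sum` over a generator expression. The helper name and sample records are hypothetical.

```python
def total_enrollment(courses, subject):
    """Sum 'enrolled' over every section of a subject (a student in two sections counts twice)."""
    return sum(c['enrolled'] for c in courses if c['subject'] == subject)

# Made-up records
sample = [
    {'subject': 'COSI', 'enrolled': 40},
    {'subject': 'COSI', 'enrolled': 25},
    {'subject': 'MATH', 'enrolled': 30},
]
print(total_enrollment(sample, 'COSI'))  # 65
```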


In [10]:
#top 10 subjects in terms of students enrolled, problem d
enrolledDict = {}
for course in courses:
    if course['subject'] not in enrolledDict:
        enrolledDict[course['subject']] = course['enrolled']
    else:
        enrolledDict[course['subject']] += course['enrolled']
sortedEnrolledDict = sorted(enrolledDict.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
print(sortedEnrolledDict[0:10])

[('HS', 5318), ('BIOL', 3085), ('BUS', 2766), ('HWL', 2734), ('CHEM', 2322), ('ECON', 2315), ('COSI', 2223), ('MATH', 1785), ('PSYC', 1704), ('ANTH', 1144)]
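
`collections.Counter` removes the need for the `if ... not in` check, because missing keys count as 0, and `most_common` sorts by count. One difference from the cell above: `most_common` breaks ties by first appearance rather than by subject name. The helper name and records are made up.

```python
from collections import Counter

def enrollment_by_subject(courses):
    """Total enrollment per subject; Counter starts every new key at 0."""
    totals = Counter()
    for c in courses:
        totals[c['subject']] += c['enrolled']
    return totals

# Made-up records
sample = [
    {'subject': 'BIOL', 'enrolled': 50},
    {'subject': 'HS', 'enrolled': 80},
    {'subject': 'BIOL', 'enrolled': 40},
    {'subject': 'COSI', 'enrolled': 60},
]
print(enrollment_by_subject(sample).most_common(2))  # [('BIOL', 90), ('HS', 80)]
```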


In [11]:
# Problem e: subjects ranked by number of sections offered
# (each section is counted separately, so a course with three sections counts three times)
sectionCountDict = {}
for course in courses:
    if course['subject'] not in sectionCountDict:
        sectionCountDict[course['subject']] = 1
    else:
        sectionCountDict[course['subject']] += 1

sortedSectionCounts = sorted(sectionCountDict.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
print(sortedSectionCounts[0:10])

[('BIOL', 613), ('HIST', 498), ('PSYC', 417), ('NEUR', 403), ('BCHM', 296), ('PHYS', 288), ('HS', 274), ('COSI', 272), ('MUS', 266), ('ENG', 265)]
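
Problem e counts sections. To count distinct courses instead, one option is to collapse sections onto their `(subject, coursenum)` pair first, as in this sketch with made-up records and a hypothetical helper name. Note that this also merges the same course number offered in different terms.

```python
from collections import Counter

def sections_and_courses_by_subject(courses):
    """Return (sections per subject, distinct course numbers per subject)."""
    sections = Counter(c['subject'] for c in courses)
    # a set of (subject, coursenum) pairs keeps each course once, however many sections it has
    distinct = {(c['subject'], c['coursenum']) for c in courses}
    courses_per_subject = Counter(subject for subject, _ in distinct)
    return sections, courses_per_subject

# Made-up records
sample = [
    {'subject': 'BIOL', 'coursenum': '14A'},
    {'subject': 'BIOL', 'coursenum': '14A'},
    {'subject': 'BIOL', 'coursenum': '15B'},
    {'subject': 'HIST', 'coursenum': '51A'},
]
sections, distinct_courses = sections_and_courses_by_subject(sample)
print(sections['BIOL'], distinct_courses['BIOL'])  # 3 2
```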


In [12]:
# top 10 professors by students taught (primary instructor only), problem g
enrolledDict = {}
for course in courses:
    name = ' '.join(course['instructor'][:2])
    if name not in enrolledDict:
        enrolledDict[name] = course['enrolled']
    else:
        enrolledDict[name] += course['enrolled']
sortedEnrolledDict = sorted(enrolledDict.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
print(sortedEnrolledDict[0:10])

[('Leah Berkenwald', 926), ('Kene Nathan Piasta', 583), ('Stephanie Murray', 515), ('Milos Dolnik', 489), ('Maria de Boef Miara', 450), ('Bryan Ingoglia', 439), ('Rachel V.E. Woodruff', 422), ('Timothy J Hickey', 411), ('Daniel Breen', 375), ('Melissa Kosinski-Collins', 365)]
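
Problem g credits each section only to its primary instructor. A sketch that also credits co-instructors follows; it assumes co-instructor entries have the same (first, last, email) shape as `instructor`, and uses `defaultdict(int)` so new names start at 0. The function name and records are hypothetical.

```python
from collections import defaultdict

def students_by_teacher(courses):
    """Credit each section's enrollment to its instructor and every co-instructor.

    Names are built from the first two fields, as in the cell above.
    """
    totals = defaultdict(int)
    for c in courses:
        for person in (c['instructor'], *c['coinstructors']):
            totals[' '.join(person[:2])] += c['enrolled']
    return totals

# Hypothetical records
sample = [
    {'instructor': ('Ana', 'Lee', 'ana@example.edu'), 'coinstructors': (), 'enrolled': 30},
    {'instructor': ('Ben', 'Ortiz', 'ben@example.edu'),
     'coinstructors': (('Ana', 'Lee', 'ana@example.edu'),), 'enrolled': 20},
]
totals = students_by_teacher(sample)
print(sorted(totals.items(), key=lambda kv: (kv[1], kv[0]), reverse=True))
# [('Ana Lee', 50), ('Ben Ortiz', 20)]
```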


In [19]:
# median enrollment of COSI sections with more than 9 students
# (a sorted list, not a set, so sections with equal enrollment are all kept)
sizes = sorted(c['enrolled'] for c in courses if c['enrolled'] > 9 and c['subject'] == 'COSI')
n = len(sizes)
if n % 2 == 1:
    median = sizes[n // 2]
else:
    median = (sizes[n // 2 - 1] + sizes[n // 2]) / 2
print(median)


In [None]:
# Problem h: top 20 courses by total enrollment, combining all sections of each course
enrolledDict = {}
for course in courses:
    course_subject_plus_num = course['subject'] + " " + course['coursenum']
    if course_subject_plus_num not in enrolledDict:
        enrolledDict[course_subject_plus_num] = course['enrolled']
    else:
        enrolledDict[course_subject_plus_num] += course['enrolled']
sortedEnrolledDict = sorted(enrolledDict.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
print(sortedEnrolledDict[0:20])

[('HWL 1', 940), ('HWL 1-PRE', 879), ('BIOL 14A', 358), ('COSI 10A', 343), ('PSYC 10A', 336), ('BIOL 15B', 287), ('MATH 10A', 280), ('BIOL 18B', 274), ('BIOL 18A', 262), ('CHEM 29A', 245), ('CHEM 29B', 239), ('CHEM 25A', 236), ('PSYC 51A', 231), ('CHEM 25B', 226), ('COSI 12B', 225), ('BUS 6A', 215), ('CHEM 18A', 208), ('ECON 10A', 207), ('MATH 15A', 204), ('COSI 21A', 201)]
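
Instead of concatenating subject and number into one string, the grouping key can be the `(subject, coursenum)` tuple, which is hashable and keeps the two parts separate. A sketch with made-up records and a hypothetical helper name:

```python
from collections import Counter

def enrollment_by_course(courses):
    """Total enrollment per course, combining sections, keyed by (subject, coursenum)."""
    totals = Counter()
    for c in courses:
        totals[(c['subject'], c['coursenum'])] += c['enrolled']
    return totals

# Made-up records
sample = [
    {'subject': 'COSI', 'coursenum': '10A', 'enrolled': 100},
    {'subject': 'COSI', 'coursenum': '10A', 'enrolled': 90},
    {'subject': 'MATH', 'coursenum': '10A', 'enrolled': 120},
]
totals = enrollment_by_course(sample)
print([(' '.join(key), n) for key, n in totals.most_common()])
# [('COSI 10A', 190), ('MATH 10A', 120)]
```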

In [None]:
# number of sections in term 1203 with more than 14 students whose details mention in-person
# or remote instruction; a section whose details mention both is counted in both totals
inperson = [c for c in courses if 'in person' in c['details'] and c['enrolled'] > 14 and c['term'] == '1203']
print(len(inperson))
remote = [c for c in courses if 'remot' in c['details'] and c['enrolled'] > 14 and c['term'] == '1203']
print(len(remote))
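
The keyword test can be wrapped in a small helper that returns every label found in a `details` string. This is a heuristic sketch: the function name is hypothetical, and it only recognizes the phrases it searches for. The example text is the `details` value from the MATH 223A record.

```python
def delivery_modes(details):
    """Guess delivery mode(s) from a section's free-text 'details' string (keyword heuristic)."""
    text = details.lower()
    modes = set()
    if 'in person' in text or 'in-person' in text:
        modes.add('in person')
    if 'remot' in text:  # matches 'remote' and 'remotely'
        modes.add('remote')
    return modes

example = ('Instruction for this course will be offered remotely. '
           'Meeting times for this course are listed in the schedule of classes (in ET).')
print(delivery_modes(example))  # {'remote'}
```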


In [None]:
# Problem f: number of distinct instructors in each subject (primary instructors only)
instructorsBySubject = {}
for course in courses:
    instructorsBySubject.setdefault(course['subject'], set()).add(course['instructor'])
instructorCounts = {subject: len(names) for subject, names in instructorsBySubject.items()}
sortedInstructorCounts = sorted(instructorCounts.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
print(sortedInstructorCounts[:10])

In [18]:
# how many history professors have their first name start with a D? problem i part 1

len({c['instructor'] for c in courses if c['subject'] == 'HIST' and c['instructor'][0][0] == 'D'})

5

In [17]:
# how many professors teach a subject that starts with the same letter as their first name? problem i, for Vibhu
len({c['instructor'] for c in courses if c['subject'][0] == c['instructor'][0][0]})

57
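
To see the distribution of first initials rather than a single letter, a `Counter` over the distinct instructors works. The helper is hypothetical; it skips records whose instructor tuple or first name is empty, and the sample names are invented.

```python
from collections import Counter

def first_initial_counts(courses, subject):
    """Count a subject's distinct instructors by the first letter of their first name."""
    instructors = {c['instructor'] for c in courses
                   if c['subject'] == subject and c['instructor'] and c['instructor'][0]}
    # each instructor tuple is assumed to start with the first name
    return Counter(person[0][0].upper() for person in instructors)

# Invented records
sample = [
    {'subject': 'HIST', 'instructor': ('Dana', 'Adams', 'dana@example.edu')},
    {'subject': 'HIST', 'instructor': ('Dana', 'Adams', 'dana@example.edu')},
    {'subject': 'HIST', 'instructor': ('Dev', 'Brown', 'dev@example.edu')},
    {'subject': 'HIST', 'instructor': ('Maya', 'Cohen', 'maya@example.edu')},
    {'subject': 'MATH', 'instructor': ('Dora', 'Diaz', 'dora@example.edu')},
]
counts = first_initial_counts(sample, 'HIST')
print(counts['D'], counts['M'])  # 2 1
```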

In [15]:
# median enrollment of COSI sections with more than 9 students
# (the filter uses the comprehension variable i; a sorted list keeps duplicate sizes)
sizes = sorted(i['enrolled'] for i in courses if i['enrolled'] > 9 and i['subject'] == 'COSI')
n = len(sizes)
if n % 2 == 1:
    print(sizes[n // 2])
else:
    print((sizes[n // 2 - 1] + sizes[n // 2]) / 2)
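
Python's `statistics.median` handles the sorting and the odd/even cases, which makes it a convenient cross-check for a hand-written median. The helper name and records are hypothetical; `min_enrolled=10` mirrors the `> 9` filter for integer enrollments, and an empty selection returns `None` instead of raising `statistics.StatisticsError`.

```python
import statistics

def median_section_size(courses, subject, min_enrolled=10):
    """Median enrollment of a subject's sections with at least min_enrolled students."""
    # a list (not a set) so that sections with equal enrollment all count
    sizes = [c['enrolled'] for c in courses
             if c['subject'] == subject and c['enrolled'] >= min_enrolled]
    return statistics.median(sizes) if sizes else None

# Made-up records
sample = [
    {'subject': 'COSI', 'enrolled': 12},
    {'subject': 'COSI', 'enrolled': 40},
    {'subject': 'COSI', 'enrolled': 12},
    {'subject': 'COSI', 'enrolled': 5},
    {'subject': 'COSI', 'enrolled': 30},
]
print(median_section_size(sample, 'COSI'))  # 21.0
```

With a set instead of a list, the duplicate 12 would be dropped and the median would come out as 30, which is why the list matters.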