# Courses Demo
This Jupyter notebook is for exploring the data set courses20-21.json
which consists of all Brandeis courses in the 20-21 academic year (Fall20, Spr21, Sum21) 
which had at least 1 student enrolled.

First we need to read the json file into a list of Python dictionaries

In [3]:
import json

In [4]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)

## Structure of a course
Next we look at the fields of each course dictionary and their values

In [5]:
print('there are',len(courses),'courses in the dataset')
print('here is the data for course 1246')
courses[1246]

there are 7813 courses in the dataset
here is the data for course 1246


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ['An', 'Huang', 'anhuang@brandeis.edu'],
 'coinstructors': [],
 'code': ['MATH', '223A'],
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}

## Cleaning the data
If we want to sort courses by instructor or by code, we need to replace the lists with tuples (which are immutable lists)

In [6]:
for course in courses:
        course['instructor'] = tuple(course['instructor'])
        course['coinstructors'] = tuple([tuple(f) for f in course['coinstructors']])
        course['code']= tuple(course['code'])

In [7]:
print('notice that the instructor and code are tuples now')
courses[1246]


notice that the instructor and code are tuples now


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ('An', 'Huang', 'anhuang@brandeis.edu'),
 'coinstructors': (),
 'code': ('MATH', '223A'),
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}

# Exploring the data set
Now we will show how to use straight python to explore the data set and answer some interesting questions. Next week we will start learning Pandas/Numpy which are packages that make it easier to explore large dataset efficiently.

Here are some questions we can try to asnwer:
* what are all of the subjects of courses (e.g. COSI, MATH, JAPN, PHIL, ...)
* which terms are represented?
* how many instructors taught at Brandeis last year?
* what were the five largest course sections?
* what were the five largest courses (where we combine sections)?
* which are the five largest subjects measured by number of courses offered?
* which are the five largest courses measured by number of students taught?
* which course had the most sections taught in 20-21?
* who are the top five faculty in terms of number of students taught?
* etc.

In [8]:
subjects = []
[subjects.append(course['subject']) for course in courses if course.get('subject') not in subjects]
print(subjects)

['HRNS', 'BUS', 'ECON', 'FIN', 'HS', 'ANTH', 'CHEM', 'PHIL', 'HIST', 'PSYC', 'WGS', 'ENG', 'POL', 'BIOL', 'NBIO', 'CHIN', 'MATH', 'MUS', 'SOC', 'BCBP', 'BIOT', 'ED', 'CLAS', 'COMH', 'COSI', 'GRK', 'LAT', 'THA', 'PHYS', 'NEJS', 'NEUR', 'PMED', 'ESL', 'BUS/FIN', 'BUS/ECON', 'ECON/FIN', 'HS/POL', 'HWL', 'GER', 'JAPN', 'KOR', 'RUS', 'ITAL', 'HISP', 'FREN', 'IGS', 'AMST', 'AAAS', 'BCHM', 'BIPH', 'EAS', 'COML', 'SAS', 'YDSH', 'SJSP', 'REL', 'PAX', 'LGLS', 'QBIO', 'BIBC', 'HSSP', 'ENVS', 'ECS', 'FILM', 'FA', 'HOID', 'IIM', 'CAST', 'IMES', 'LALS', 'LING', 'JOUR', 'HBRW', 'NPSY', 'ARBC', 'INT', 'CA', 'EBIO', 'AAAS/WGS', 'HUM', 'EL', 'CBIO', 'AMST/ENG', 'RECS/THA', 'UWS', 'COMP', 'MERS', 'AAS/AAPI', 'AAPI/HIS', 'RECS', 'AAPI/WGS', 'AAAS/HIS', 'BISC', 'AAPI', 'GS', 'POL/WGS', 'HUM/UWS', 'CHSC', 'AMST/MUS', 'HIST/SOC', 'QR', 'CLAS/ENG', 'ANTH/WGS', 'HIST/WGS', 'CLAS/NEJ', 'ECS/ENG', 'RBIF', 'RBOT', 'RCOM', 'RDFT', 'RDMD', 'RHIN', 'RIAS', 'RIDT', 'RMGT', 'RPJM', 'RSAN', 'RSEG', 'RUCD', 'GECS']


In [9]:
terms = []
[terms.append(course['term']) for course in courses if course.get('term') not in terms]
print(terms)

['1203', '1211', '1212']


In [10]:
instructors = []
[instructors.append(course['instructor']) for course in courses if course.get('instructor') not in instructors]
print(instructors)

[('Sharon', 'Feiman-Nemser', 'snemser@brandeis.edu'), ('Joseph B', 'Reimer', 'reimer@brandeis.edu'), ('Mark', 'Rosen', 'mirosen@brandeis.edu'), ('Jonathan', 'Sarna', 'sarna@brandeis.edu'), ('Ellen', 'Smith', 'esmith2@brandeis.edu'), ('Leonard', 'Saxe', 'saxe@brandeis.edu'), ('Jon A', 'Levisohn', 'levisohn@brandeis.edu'), ('Matthew E.', 'Boxer', 'mboxer@brandeis.edu'), ('Barry', 'Shrage', 'barryshrage@brandeis.edu'), ('Janet K.', 'Aronson', 'jaronson@brandeis.edu'), ('Andrew L', 'Molinsky', 'molinsky@brandeis.edu'), ('Arnold', 'Kamis', 'arnoldkamis@brandeis.edu'), ('Robert', 'Malenfant', 'robmalenfant@brandeis.edu'), ('Staff', 'Staff', 'no_email'), ('Hamza', 'Abdurezak', 'abdurez@brandeis.edu'), ('John', 'Kolovos', 'jkolovos@brandeis.edu'), ('Ian M.', 'Roy', 'ianroy@brandeis.edu'), ('Ahmad', 'Namini', 'anamini@brandeis.edu'), ('Javier', 'Vidal-Berastain', 'xvidalberastain@brandeis.edu'), ('Shirley', 'Idelson', 'sidelson@brandeis.edu'), ('Anna', 'Scherbina', 'ascherbina@brandeis.edu'), (

In [11]:
# Elizabeth 5i
consent = []
[consent.append(course['name']) for course in courses if 'Consent' in course['status_text']]
print(consent[:10])

['Readings in Jewish Professional Leadership', 'Readings in Jewish Professional Leadership', 'Readings in Jewish Professional Leadership', 'Readings in Jewish Professional Leadership', 'Readings in Jewish Professional Leadership', 'Readings in Jewish Professional Leadership', 'Readings in Jewish Professional Leadership', 'Readings in Jewish Professional Leadership', 'Readings in Jewish Professional Leadership', 'Readings in Jewish Professional Leadership']


In [12]:
# Elizabeth 5i
noconsent = []
[noconsent.append(course['name']) for course in courses if 'Consent' not in course['status_text']]
print(noconsent[:10])

['Global Dexterity', 'Information Visualization', 'Sales and Sales Management', 'Corporate Financial Modeling', 'Technical Analysis', 'Digital Fabrication with Robotics', 'Python and Applications to Business Analytics', 'Analyzing Big Data I', 'Python and Applications to Business Analytics II', 'Jewish Community and Jewish Identity']


In [13]:
#Leora 5i - Total Number of students taking Math Courses total and in 2020
a =set(course[('instructor')] for course in courses if "MATH" in course['subject'])
b =tuple(course for course in courses if "MATH" in course['subject'])
c = set(course['instructor'] for course in b if '20' in course['coursenum'])
len(a)
print("There are: "+ str(len(a))+" Math Teachers in total")
print("There are: "+ str(len(c))+" Math Teachers teaching in 2020")


There are: 37 Math Teachers in total
There are: 8 Math Teachers teaching in 2020


In [14]:
#Aarthi 5c
import statistics
def medianSize():
    coursesize =[]

    for course in courses:
       coursesize.append(course['enrolled'])
       coursesize.sort()
    print((statistics.median(coursesize)))
    
medianSize()    

0


In [21]:
#Aarthi 5d
def sortedTuples():
    subjectList = []
    numCoursesList = []
  
   
   #for course in courses:
    gather = dict()
    for course in courses:
        if course['subject'] in gather:
            num = gather[course['subject']]
            gather[course['subject']] = num+course['enrolled']
        else:
            gather[course['subject']] = course['enrolled']
    subjects = list(gather)
    courseNum =[]
    for subject in gather:
        courseNum.append(gather[subject])
    subjectData = sorted((list(zip(subjects, courseNum))), key=lambda x:x[1])[-11:-1]
    print(list(reversed(subjectData)))
           
sortedTuples()

[('BIOL', 3085), ('BUS', 2766), ('HWL', 2734), ('CHEM', 2322), ('ECON', 2315), ('COSI', 2223), ('MATH', 1785), ('PSYC', 1704), ('ANTH', 1144), ('ENG', 1109)]


In [16]:
#Aarthi 5e
#Leora helped figure this out. This was a team effort
def sortedTuples():
    subjectList = []
    numCoursesList = []
  
   
   #for course in courses:
    gather = dict()
    for course in courses:
        if course['subject'] in gather:
            num = gather[course['subject']]
            gather[course['subject']] = num+1
        else:
            gather[course['subject']] =1
    subjects = list(gather)
    courseNum =[]
    for subject in gather:
        courseNum.append(gather[subject])
    subjectData = sorted((list(zip(subjects, courseNum))), key=lambda x:x[1])[-11:-1]
    print(list(reversed(subjectData)))
           
sortedTuples()


[('HIST', 498), ('PSYC', 417), ('NEUR', 403), ('BCHM', 296), ('PHYS', 288), ('HS', 274), ('COSI', 272), ('MUS', 266), ('ENG', 265), ('BCBP', 263)]


In [17]:
#Aarthi 5f
def sortedTuples():
    subjectList = []
    numCoursesList = []
  
   
    #for course in courses:
    gather = dict()
    for course in courses:
        if course['instructor'] in gather:
            num = gather[course['instructor']]
            gather[course['instructor']] = num+1
        else:
            gather[course['instructor']] =1
    subjects = list(gather)
    courseNum =[]
    for subject in gather:
        courseNum.append(gather[subject])
    subjectData = sorted((list(zip(subjects, courseNum))), key=lambda x:x[1])[-11:-1]
    print(list(reversed(subjectData)))
        
           
sortedTuples()


[(('Kene Nathan', 'Piasta', 'kpiasta@brandeis.edu'), 49), (('Leslie Claire', 'Griffith', 'griffith@brandeis.edu'), 49), (('Paul', 'DiZio', 'dizio@brandeis.edu'), 47), (('Seth', 'Fraden', 'fraden@brandeis.edu'), 47), (('Sacha', 'Nelson', 'nelson@brandeis.edu'), 47), (('John', 'Lisman', 'lisman@brandeis.edu'), 46), (('Robert W', 'Sekuler', 'sekuler@brandeis.edu'), 45), (('Donald B.', 'Katz', 'dbkatz@brandeis.edu'), 45), (('Michael', 'Rosbash', 'rosbash@brandeis.edu'), 45), (('Arthur', 'Wingfield', 'wingfiel@brandeis.edu'), 44)]


In [22]:
#Aarthi 5g
def sortedTuples():
    subjectList = []
    numCoursesList = []
  
   
    #for course in courses:
    gather = dict()
    for course in courses:
        if course['instructor'] in gather:
            num = gather[course['instructor']]
            gather[course['instructor']] = num+course['enrolled']
        else:
            gather[course['instructor']] =course['enrolled']
    subjects = list(gather)
    courseNum =[]
    for subject in gather:
        courseNum.append(gather[subject])
    subjectData = sorted((list(zip(subjects, courseNum))), key=lambda x:x[1])[-21:-1]
    print(list(reversed(subjectData)))
sortedTuples()

[(('Kene Nathan', 'Piasta', 'kpiasta@brandeis.edu'), 583), (('Stephanie', 'Murray', 'murray@brandeis.edu'), 515), (('Milos', 'Dolnik', 'dolnik@brandeis.edu'), 489), (('Maria', 'de Boef Miara', 'mmiara@brandeis.edu'), 450), (('Bryan', 'Ingoglia', 'ingoglia@brandeis.edu'), 439), (('Rachel V.E.', 'Woodruff', 'woodruff@brandeis.edu'), 422), (('Timothy J', 'Hickey', 'tjhickey@brandeis.edu'), 411), (('Daniel', 'Breen', 'dbreen91@brandeis.edu'), 375), (('Melissa', 'Kosinski-Collins', 'kosinski@brandeis.edu'), 365), (('Claudia', 'Novack', 'novack@brandeis.edu'), 355), (('Antonella', 'DiLillo', 'dilant@brandeis.edu'), 342), (('Jon', 'Chilingerian', 'chilinge@brandeis.edu'), 330), (('Ahmad', 'Namini', 'anamini@brandeis.edu'), 327), (('Iraklis', 'Tsekourakis', 'tsekourakis@brandeis.edu'), 316), (('Geoffrey', 'Clarke', 'geoffclarke@brandeis.edu'), 315), (('Peter', 'Mistark', 'pmistark@brandeis.edu'), 277), (('Brenda', 'Anderson', 'banders@brandeis.edu'), 275), (('Colleen', 'Hitchcock', 'hitchcock@

In [33]:
#Aarthi 5h
def topTwenty():
    gather = dict()
    for course in courses:
        dict1 =dict()
        if course['subject'] in gather:
            if course['coursenum'] in gather[course['subject']]:
                dict1 = gather[course['subject']]
                num = dict1[course['coursenum']]+course['enrolled']
                dict1[course['coursenum']] = num
                gather[course['subject']] = dict1
            else:
                dict1[course['coursenum']] = course['enrolled']
                gather[course['subject']] = dict1
        else:
            dict1[course['coursenum']] = course['enrolled']
            gather[course['subject']] = dict1
    #for subject in gather:
    print(gather)
topTwenty()

{'HRNS': {'coursenum': 3}, 'BUS': {'coursenum': 6}, 'ECON': {'coursenum': 14}, 'FIN': {'coursenum': 6}, 'HS': {'coursenum': 32}, 'ANTH': {'coursenum': 6}, 'CHEM': {'coursenum': 21}, 'PHIL': {'coursenum': 5}, 'HIST': {'coursenum': 3}, 'PSYC': {'coursenum': 6}, 'WGS': {'coursenum': 1}, 'ENG': {'coursenum': 1}, 'POL': {'coursenum': 7}, 'BIOL': {'coursenum': 12}, 'NBIO': {'coursenum': 29}, 'CHIN': {'coursenum': 19}, 'MATH': {'coursenum': 5}, 'MUS': {'coursenum': 4}, 'SOC': {'coursenum': 13}, 'BCBP': {'coursenum': 10}, 'BIOT': {'coursenum': 3}, 'ED': {'coursenum': 19}, 'CLAS': {'coursenum': 39}, 'COMH': {'coursenum': 0}, 'COSI': {'coursenum': 9}, 'GRK': {'coursenum': 6}, 'LAT': {'coursenum': 8}, 'THA': {'coursenum': 14}, 'PHYS': {'coursenum': 12}, 'NEJS': {'coursenum': 0}, 'NEUR': {'coursenum': 1}, 'PMED': {'coursenum': 0}, 'ESL': {'coursenum': 6}, 'BUS/FIN': {'coursenum': 9}, 'BUS/ECON': {'coursenum': 16}, 'ECON/FIN': {'coursenum': 24}, 'HS/POL': {'coursenum': 6}, 'HWL': {'coursenum': 784}

In [20]:
#Aarthi 5i: Create Your Own Question
#What is the mean number of students taking COSI courses last year?

import statistics
def meanSize():
    coursesize =[]
    
    for course in courses:
       coursesize.append(course['enrolled'])
       coursesize.sort()
    print((statistics.mean(coursesize)))
    
meanSize()


6.009983361064892
