# Courses Demo
This Jupyter notebook is for exploring the data set courses20-21.json
which consists of all Brandeis courses in the 20-21 academic year (Fall20, Spr21, Sum21) 
which had at least 1 student enrolled.

First we need to read the json file into a list of Python dictionaries

In [142]:
import json

In [143]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)

## Structure of a course
Next we look at the fields of each course dictionary and their values

In [144]:
print('there are',len(courses),'courses in the dataset')
print('here is the data for course 1246')
courses[1246]

there are 7813 courses in the dataset
here is the data for course 1246


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ['An', 'Huang', 'anhuang@brandeis.edu'],
 'coinstructors': [],
 'code': ['MATH', '223A'],
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}

## Cleaning the data
If we want to sort courses by instructor or by code, we need to replace the lists with tuples (which are immutable lists)

In [145]:
for course in courses:
        course['instructor'] = tuple(course['instructor'])
        course['coinstructors'] = tuple([tuple(f) for f in course['coinstructors']])
        course['code']= tuple(course['code'])

In [146]:
print('notice that the instructor and code are tuples now')
courses[1246]

notice that the instructor and code are tuples now


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ('An', 'Huang', 'anhuang@brandeis.edu'),
 'coinstructors': (),
 'code': ('MATH', '223A'),
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}

# Exploring the data set
Now we will show how to use straight python to explore the data set and answer some interesting questions. Next week we will start learning Pandas/Numpy which are packages that make it easier to explore large dataset efficiently.

Here are some questions we can try to asnwer:
* what are all of the subjects of courses (e.g. COSI, MATH, JAPN, PHIL, ...)
* which terms are represented?
* how many instructors taught at Brandeis last year?
* what were the five largest course sections?
* what were the five largest courses (where we combine sections)?
* which are the five largest subjects measured by number of courses offered?
* which are the five largest courses measured by number of students taught?
* which course had the most sections taught in 20-21?
* who are the top five faculty in terms of number of students taught?
* etc.

## 5a How many faculty taught COSI courses last year?

In [147]:
list = {c['instructor'] for c in courses if c['subject']=='COSI' if c['enrolled']>0} 
len(list)

25

## 5b What is the total number of students taking COSI courses last year?

In [148]:
list = {c['enrolled'] for c in courses if c['subject']== 'COSI' if c['enrolled'] > 0}
count= 0
for i in list:
    count = count + i
print (count)

1950


## 5i Tenzin - A list of open courses in COSI

In [149]:
{c['name'] for c in courses if c['status_text']=='Closed' if c['subject']=='COSI' if c['enrolled']>10}

{'Capstone Project for Software Engineering',
 'Data Management for Data Science',
 'Deep Learning',
 'Discrete Structures',
 'Fundamentals of Artificial Intelligence',
 'Introduction to Problem Solving in Python',
 'Operating Systems'}

## 5c. 
what was the median size of a COSI course last year (counting only those courses with at least 10 students)

In [150]:
import statistics
list_en = {c['enrolled'] for c in courses if c['subject']=='COSI' if c['enrolled']>=10}
print(statistics.median(list_en))


45.5


## 5d. 
Create a list of tuples (E,S) where S is a subject and E is the number of students enrolled in courses in that subject, sort it and print the top 10. This shows the top 10 subjects in terms of number of students taught.

In [151]:
sizes_dict = {}
for course in courses:
    sizes_dict[course['subject']] = (sizes_dict[course["subject"]] + course["enrolled"]) if course['subject'] in sizes_dict else 0 
    
courses_d = []
for subject_name, subject_size in sizes_dict.items():
    courses_d.append([subject_name, subject_size]) #map subject name to its size.eg,. {Chem, 1}

courses_by_size = sorted(courses_d, key = lambda course:  - course[1])
tuple([(course[1], course[0]) for course in courses_by_size[:11]])

((5298, 'HS'),
 (3075, 'BIOL'),
 (2747, 'BUS'),
 (2717, 'HWL'),
 (2315, 'ECON'),
 (2289, 'CHEM'),
 (2223, 'COSI'),
 (1775, 'MATH'),
 (1693, 'PSYC'),
 (1144, 'ANTH'),
 (1105, 'ENG'))

# 5g and 5h - Gisel

In [152]:
#Questions 5g and 5h below
def get_my_key(courses):
    return courses['enrolled']

sortedFile = sorted(courses, key = lambda course:  - course['enrolled'])

topFaculty = []
for s in sortedFile[:21]:
    topFaculty.append(s['instructor'])
print(topFaculty) 

topCourses = []
for c in sortedFile[:21]:
    topCourses.append(c['name'])
print(topCourses) 

[('Leah', 'Berkenwald', 'leahb@brandeis.edu'), ('Stephanie', 'Murray', 'murray@brandeis.edu'), ('Maria', 'de Boef Miara', 'mmiara@brandeis.edu'), ('Maria', 'de Boef Miara', 'mmiara@brandeis.edu'), ('Stephanie', 'Murray', 'murray@brandeis.edu'), ('Stuart', 'Altman', 'altman@brandeis.edu'), ('Anne S', 'Berry', 'anneberry@brandeis.edu'), ('Ellen J', 'Wright', 'ejwright@brandeis.edu'), ('Timothy J', 'Hickey', 'tjhickey@brandeis.edu'), ('Timothy J', 'Hickey', 'tjhickey@brandeis.edu'), ('Daniel', 'Breen', 'dbreen91@brandeis.edu'), ('Dan L', 'Perlman', 'perlman@brandeis.edu'), ('Colleen', 'Hitchcock', 'hitchcock@brandeis.edu'), ('Teresa Vann', 'Mitchell', 'tmitch@brandeis.edu'), ('Jennifer', 'Gutsell', 'jgutsell@brandeis.edu'), ('Peter', 'Mistark', 'pmistark@brandeis.edu'), ('Peter', 'Mistark', 'pmistark@brandeis.edu'), ('Michael Thomas', 'Marr', 'mmarr@brandeis.edu'), ('Paul', 'DiZio', 'dizio@brandeis.edu'), ('Geoffrey', 'Clarke', 'geoffclarke@brandeis.edu'), ('Michael', 'Strand', 'mstrand@b

# 5i - Gisel

Interesting Question - What coures have zero students enrolled?

In [4]:
zeroEnrolled = [c['name'] for c in courses if c['enrolled']== 0]

for x in zeroEnrolled:
    print(x)

SyntaxError: invalid syntax (3883169933.py, line 1)

## 5i - Sanjna Calculate mean of enrollment for bio

In [154]:
import statistics
list = [c['enrolled'] for c in courses if c['subject']=='BIOL' if c['enrolled']<=10]
print(statistics.mean(list))


0.7247191011235955


## 5i - Sampada List all BUS classes

In [155]:
list_bus = [c['name'] for c in courses if c['subject']=='BUS']
print(list_bus)

['Global Dexterity', 'Information Visualization', 'Sales and Sales Management', 'Digital Fabrication with Robotics', 'Python and Applications to Business Analytics', 'Analyzing Big Data I', 'Independent Study', 'Python and Applications to Business Analytics II', 'Fundamentals of Organizational Behavior', 'Real Estate Fundamentals', 'Information Visualization', 'Digital Marketing', 'Python and Applications to Business Analytics', 'Launching Your Global Career', 'Launching Your Global Career', 'Analyzing Big Data I', 'Corporate Governance: From Colossal Failures to Best Practices', 'Launching Your Global Career', 'Analyzing Big Data II', 'Entrepreneurship', 'Alliance, Acquisition, and Divestment Strategy', 'Business Dynamics: Managing in a Complex World', 'Field Project: Social Impact Innovation', 'Leadership Internships in Social Impact Organizations', 'Consumer Behavior', 'Marketing Management', 'Marketing Analytics', 'Independent Study', 'MBA Career Strategy and Management Communicati

## 5i - Jason list all courses Prof. Hickey Teaches

In [156]:
list_hickey = [c['name'] for c in courses if 'tjhickey@brandeis.edu' in c['instructor']]
print(list_hickey)

['Readings', 'Independent Study', "Master's Project", 'Dissertation Research', "Master's Research Internship", 'Independent Study', 'Senior Research', 'Research Internship and Analysis', 'Introduction to Problem Solving in Python', 'Readings', 'Independent Study', "Master's Research Internship", "Master's Project", 'Dissertation Research', 'Senior Research', 'Independent Study', 'Research Internship and Analysis', 'Introduction to 3-D Animation', 'Introduction to Problem Solving in Python', 'Web Application Development', 'Mobile Application Development']


#### 5e

Create a list of tuples (E,S) where S is a subject and E is the number of students enrolled in courses in that subject, sort it and print the top 10 subjects in terms of number of courses offered. 

(# of courses offered per subject) 
Subject with highest number of courses


In [157]:
sizes_dict = {}
for course in courses:
    sizes_dict[course["subject"]] = (sizes_dict[course["subject"]] + 1) if course['subject'] in sizes_dict else 0 
    #Create a counter  with subjects as keys ie {Biol: 1} and then, then use a sorter to get the b top ten in a for loop
    # and then use that to create a tuples
courses_1 = []
for subject_name, subject_size in sizes_dict.items():
    courses_1.append([subject_name, subject_size])
subjects_by_size = sorted(courses_1, key = lambda subject:- subject[1])
tuple([(course[1], course[0]) for course in subjects_by_size [1:11]])


((497, 'HIST'),
 (416, 'PSYC'),
 (402, 'NEUR'),
 (295, 'BCHM'),
 (287, 'PHYS'),
 (273, 'HS'),
 (271, 'COSI'),
 (265, 'MUS'),
 (264, 'ENG'),
 (262, 'BCBP'))

#### 5f

Create a list of tuples (E,S) where S is a subject and E is the number of students enrolled in courses in that subject, sort it and print the top 10 subjects in terms of number of faculty teaching courses in that subject


In [158]:
sizes_dict = {}
for course in courses:
    sizes_dict[course["instructor"]] = (sizes_dict[course["instructor"]] + 1) if course["instructor"] in sizes_dict else 0   
courses = []
for subject_name, subject_size in sizes_dict.items():
    courses.append([subject_name, subject_size])
subjects_by_size = sorted(courses, key = lambda subject:- subject[1])
tuple([(course[1], course[0]) for course in subjects_by_size [1:11]])


((48, ('Leslie Claire', 'Griffith', 'griffith@brandeis.edu')),
 (48, ('Kene Nathan', 'Piasta', 'kpiasta@brandeis.edu')),
 (46, ('Sacha', 'Nelson', 'nelson@brandeis.edu')),
 (46, ('Seth', 'Fraden', 'fraden@brandeis.edu')),
 (46, ('Paul', 'DiZio', 'dizio@brandeis.edu')),
 (45, ('John', 'Lisman', 'lisman@brandeis.edu')),
 (44, ('Michael', 'Rosbash', 'rosbash@brandeis.edu')),
 (44, ('Donald B.', 'Katz', 'dbkatz@brandeis.edu')),
 (44, ('Robert W', 'Sekuler', 'sekuler@brandeis.edu')),
 (43, ('Arthur', 'Wingfield', 'wingfiel@brandeis.edu')))