# Courses Demo
This Jupyter notebook is for exploring the data set courses20-21.json
which consists of all Brandeis courses in the 20-21 academic year (Fall20, Spr21, Sum21) 
which had at least 1 student enrolled.

First we need to read the JSON file into a list of Python dictionaries.

In [None]:
import json
import statistics

In [None]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)

## Structure of a course
Next we look at the fields of each course dictionary and their values.

In [None]:
print('there are',len(courses),'courses in the dataset')
print('here is the data for course 1246')
courses[1246]

there are 7813 courses in the dataset
here is the data for course 1246


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ['An', 'Huang', 'anhuang@brandeis.edu'],
 'coinstructors': [],
 'code': ['MATH', '223A'],
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}
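The `start` and `end` values in `times` look like minutes after midnight (1080 would then be 18:00 and 1170 would be 19:30). The data set does not document this, so treat it as an assumption. Under that assumption, a small hypothetical helper, `minutes_to_clock`, makes a meeting time readable:

```python
def minutes_to_clock(minutes):
    """Format minutes after midnight as HH:MM (assumed encoding, not documented)."""
    hours, mins = divmod(minutes, 60)
    return f"{hours:02d}:{mins:02d}"

# Meeting record copied from course 1246 above.
meeting = {'start': 1080, 'end': 1170, 'days': ['w', 'm']}
label = f"{minutes_to_clock(meeting['start'])}-{minutes_to_clock(meeting['end'])}"
print(label, meeting['days'])
```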

## Cleaning the data
If we want to collect instructors or codes into sets (or use them as dictionary keys), we need to replace the lists with tuples, which are immutable and therefore hashable.

In [None]:
for course in courses:
    course['instructor'] = tuple(course['instructor'])
    course['coinstructors'] = tuple(tuple(f) for f in course['coinstructors'])
    course['code'] = tuple(course['code'])

In [None]:
print('notice that the instructor and code are tuples now')
courses[1123]

notice that the instructor and code are tuples now


{'limit': None,
 'times': [],
 'enrolled': 0,
 'details': '',
 'type': 'section',
 'status_text': 'Open',
 'section': '7',
 'waiting': 0,
 'instructor': ('Marcelle', 'Soares-Santos', 'marcelle@brandeis.edu'),
 'coinstructors': (),
 'code': ('PHYS', '280A'),
 'subject': 'PHYS',
 'coursenum': '280A',
 'name': 'Advanced Readings and Research',
 'independent_study': True,
 'term': '1203',
 'description': 'Specific sections for individual faculty members as requested. Usually offered every year.\nStaff'}
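The set comprehensions used later in this notebook need hashable elements. A list is not hashable, while a tuple of strings is. Here is a minimal check, using the instructor value from course 1246 as sample input:

```python
instructor_list = ['An', 'Huang', 'anhuang@brandeis.edu']

try:
    {instructor_list}              # building a set hashes each element
    list_is_hashable = True
except TypeError:                  # raised because lists are unhashable
    list_is_hashable = False

instructor_tuple = tuple(instructor_list)
unique_instructors = {instructor_tuple, instructor_tuple}  # duplicates collapse

print(list_is_hashable)
print(len(unique_instructors))
```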

# PA01 - Python Data Analysis I
Now we will show how to use plain Python to explore the data set and answer some interesting questions. Next week we will start learning Pandas/NumPy, which are packages that make it easier to explore large data sets efficiently.

Here are some questions we can try to answer:
* How many faculty taught COSI courses last year?
* What is the total number of students taking COSI courses last year?
* What was the median size of a COSI course last year (counting only those courses with at least 10 students)?
* Create a list of tuples (E,S) where S is a subject and E is the number of students enrolled in courses in that subject, sort it, and print the top 10. This shows the top 10 subjects in terms of number of students taught.
* Do the same as in the previous question, but print the top 10 subjects in terms of number of courses offered.
* Do the same again, but print the top 10 subjects in terms of number of faculty teaching courses in that subject.
* List the top 20 faculty in terms of number of students they taught.
* List the top 20 courses in terms of number of students taking that course (where you combine different sections and semesters, i.e. just use the subject and course number).
* Create your own interesting question (each team member creates their own) and use Python to answer that question.
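Several of these questions share one pattern: add up a value per key, then sort the totals. The sketch below shows that pattern with `collections.Counter`. The helper name `top_by_total` and the toy records are made up for illustration and are not part of the data set.

```python
from collections import Counter

# Toy records (made up), shaped like the course dictionaries.
toy_courses = [
    {'subject': 'COSI', 'enrolled': 30},
    {'subject': 'MATH', 'enrolled': 12},
    {'subject': 'COSI', 'enrolled': 5},
    {'subject': 'BIOL', 'enrolled': 40},
]

def top_by_total(records, key, value, n):
    """Return the n largest (total, key) pairs, summing `value` per `key`."""
    totals = Counter()
    for record in records:
        totals[record[key]] += record[value]
    # Put the total first so that sorting orders by total, then by key.
    return sorted(((total, k) for k, total in totals.items()), reverse=True)[:n]

top_two = top_by_total(toy_courses, 'subject', 'enrolled', 2)
print(top_two)
```

This makes a single pass over the records, whereas the cells below rescan the full list once per subject. Either approach is fine at this data size.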

In [None]:
#Gillian
num_cosi_instructors = len({c['instructor'] for c in courses if (c['subject'] == 'COSI')})
print(num_cosi_instructors)

27


In [None]:
#Gillian
num_cosi_students = sum([e['enrolled'] for e in courses if e['subject'] == 'COSI'])
print(num_cosi_students)

2223


In [None]:
# Gillian
# Note: `> 10` excludes courses with exactly 10 students; "at least 10" would be `>= 10`.
median = statistics.median(e['enrolled'] for e in courses if (e['subject'] == 'COSI') and (e['enrolled'] > 10))
print(median)

38.0
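The question asks for courses with at least 10 students, while the cell above filters with `> 10`. The two filters differ only when some course has exactly 10 students. The toy sizes below are made up to show how the median can shift in that case:

```python
import statistics

# Toy enrollment counts (made up); two courses sit exactly on the boundary.
toy_sizes = [4, 10, 10, 25, 60]

at_least_10 = [s for s in toy_sizes if s >= 10]    # keeps the two 10s
more_than_10 = [s for s in toy_sizes if s > 10]    # drops them

median_at_least = statistics.median(at_least_10)
median_more = statistics.median(more_than_10)
print(median_at_least, median_more)
```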


In [None]:
# Gillian
# Collect every distinct subject, then total the enrollment for each one.
subject_list = list({c['subject'] for c in courses})
enrolled_by_subject = []
es_list = []
for s in subject_list:
    enrolled_by_subject.append(sum([e['enrolled'] for e in courses if e['subject'] == s]))
# Pair each total with its subject so that sorting orders by enrollment.
for i in range(len(subject_list)):
    es_list.append((enrolled_by_subject[i], subject_list[i]))
es_list.sort(reverse = True)
print(es_list[:10])

[(5318, 'HS'), (3085, 'BIOL'), (2766, 'BUS'), (2734, 'HWL'), (2322, 'CHEM'), (2315, 'ECON'), (2223, 'COSI'), (1785, 'MATH'), (1704, 'PSYC'), (1144, 'ANTH')]


In [None]:
# Gillian
names_by_subject = []
ns_list = []
for s in subject_list:
    names_by_subject.append(len({n['name'] for n in courses if n['subject'] == s}))
for i in range(len(subject_list)):
    ns_list.append((names_by_subject[i], subject_list[i]))
ns_list.sort(reverse = True)
print(ns_list[:10])

[(170, 'HS'), (67, 'ENG'), (62, 'BIOL'), (58, 'BUS'), (56, 'MUS'), (56, 'ANTH'), (55, 'PSYC'), (53, 'MATH'), (49, 'FA'), (49, 'ECON')]


In [None]:
#Anjola
#list the top 10 subjects in terms of number of faculty teaching courses in that subject
faculty_by_subject = []
fs_list = []
for s in subject_list:
    faculty_by_subject.append(len({f['instructor'] for f in courses if f['subject'] == s}))
for i in range(len(subject_list)):
    fs_list.append((faculty_by_subject[i], subject_list[i]))
fs_list.sort(reverse = True)
print(fs_list[:10])

[(87, 'HS'), (67, 'BIOL'), (52, 'ECON'), (49, 'BCHM'), (47, 'HIST'), (47, 'BUS'), (46, 'BCBP'), (42, 'HWL'), (37, 'NEJS'), (37, 'MATH')]
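Counting distinct instructors per subject can also be done in one pass by keeping a set of instructors for each subject. This is a sketch on made-up toy records; the names `instructors_for` and `faculty_counts` and the example.edu addresses are hypothetical.

```python
from collections import defaultdict

# Toy records (made up); instructors are tuples so they can live in a set.
toy_courses = [
    {'subject': 'COSI', 'instructor': ('A', 'One', 'a@example.edu')},
    {'subject': 'COSI', 'instructor': ('A', 'One', 'a@example.edu')},
    {'subject': 'COSI', 'instructor': ('B', 'Two', 'b@example.edu')},
    {'subject': 'MATH', 'instructor': ('C', 'Three', 'c@example.edu')},
]

instructors_for = defaultdict(set)
for course in toy_courses:
    # Adding the same instructor twice leaves the set unchanged.
    instructors_for[course['subject']].add(course['instructor'])

faculty_counts = sorted(
    ((len(people), subject) for subject, people in instructors_for.items()),
    reverse=True,
)
print(faculty_counts)
```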


In [None]:
#Anjola
#list the top 20 faculty in terms of number of students they taught

all_faculty_list = list({f['instructor'] for f in courses})
top_fs_list = []
enrolled_by_faculty = []
list_of_faculty = []
# Total the enrollment over every course taught by each instructor.
for faculty in all_faculty_list:
    enrolled_by_faculty.append(sum([e['enrolled'] for e in courses if e['instructor'] == faculty]))
for i in range(len(all_faculty_list)):
    top_fs_list.append((enrolled_by_faculty[i], all_faculty_list[i]))
top_fs_list.sort(reverse = True)
# Keep only "First Last" for the 20 instructors with the most students.
for f in top_fs_list[:20]:
    list_of_faculty.append(f[1][0] + " " + f[1][1])
print(list_of_faculty)

['Leah Berkenwald', 'Kene Nathan Piasta', 'Stephanie Murray', 'Milos Dolnik', 'Maria de Boef Miara', 'Bryan Ingoglia', 'Rachel V.E. Woodruff', 'Timothy J Hickey', 'Daniel Breen', 'Melissa Kosinski-Collins', 'Claudia Novack', 'Antonella DiLillo', 'Jon Chilingerian', 'Ahmad Namini', 'Iraklis Tsekourakis', 'Geoffrey Clarke', 'Peter Mistark', 'Brenda Anderson', 'Colleen Hitchcock', 'Scott A. Redenius']


In [None]:
#Anjola 
#list the top 20 courses in terms of number of students taking that course
#(combining different sections and semesters);
#note: this groups by course name rather than by subject and coursenum
course_num_list = list({number['name'] for number in courses})
student_enrolled_subject_coursenum = []
tuple_top_course_list = []
top_course_list = []
for number in course_num_list:
    student_enrolled_subject_coursenum.append(sum([course['enrolled'] for course in courses if course['name'] == number]))

for i in range(len(course_num_list)):
    tuple_top_course_list.append((student_enrolled_subject_coursenum[i],course_num_list[i]))

tuple_top_course_list.sort(reverse = True)
for f in tuple_top_course_list[:20]:
    top_course_list.append(f[1])
    
print(top_course_list)

['Navigating Health and Safety', 'Introduction to Navigating Health and Safety', 'General Biology Laboratory', 'Dissertation Research', 'Genetics and Genomics', 'Introduction to Problem Solving in Python', 'Introduction to Psychology', 'Cells and Organisms', 'Techniques of Calculus (a)', 'Financial Accounting', 'Organic Chemistry Laboratory I', 'Organic Chemistry Laboratory II', 'Organic Chemistry I', 'Statistics', 'Organic Chemistry II', 'Advanced Programming Techniques in Java', 'General Chemistry Laboratory I', 'Senior Research', 'Introduction to Microeconomics', 'Applied Linear Algebra']
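The question asks to combine sections by subject and course number, while the cell above groups by course name. The two groupings differ whenever different subjects share a course title. The sketch below uses made-up toy records to show the difference: the titles are borrowed from the output above, but the subjects, course numbers, and enrollments are invented.

```python
from collections import Counter

# Toy records (made up): two subjects share the title 'Senior Research'.
toy_courses = [
    {'subject': 'BIOL', 'coursenum': '99A', 'name': 'Senior Research', 'enrolled': 8},
    {'subject': 'CHEM', 'coursenum': '99A', 'name': 'Senior Research', 'enrolled': 5},
    {'subject': 'BIOL', 'coursenum': '14A', 'name': 'Genetics and Genomics', 'enrolled': 10},
    {'subject': 'BIOL', 'coursenum': '14A', 'name': 'Genetics and Genomics', 'enrolled': 2},
]

by_code = Counter()
by_name = Counter()
for c in toy_courses:
    by_code[(c['subject'], c['coursenum'])] += c['enrolled']
    by_name[c['name']] += c['enrolled']

top_by_code = by_code.most_common(1)   # largest (subject, coursenum) total
top_by_name = by_name.most_common(1)   # largest title total
print(top_by_code, top_by_name)
```

Grouping by name merges the two 'Senior Research' offerings into one entry, which is why a generic title can rank above any single course.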


In [20]:
# Which courses have the highest waitlists? This tells us which courses need more sections.
course_list = list({course['name'] for course in courses})
student_waitlist_course = []
tuple_top_course_list = []
# Total the waitlist over every section and semester of each course name.
for name in course_list:
    student_waitlist_course.append(sum([course['waiting'] for course in courses if course['name'] == name]))

for i in range(len(course_list)):
    tuple_top_course_list.append((student_waitlist_course[i], course_list[i]))
tuple_top_course_list.sort(reverse = True)
print(tuple_top_course_list[:20])


[(54, 'Human Anatomy'), (48, 'Biostatistics'), (32, 'Personal Safety/Self-Defense'), (30, 'Yoga'), (26, 'Introduction to Cognitive Neuroscience'), (23, 'The Biology of Morality'), (23, 'Money Management for Beginners'), (22, 'Statistics for Economic Analysis'), (21, 'Introduction to Probability and Statistics'), (19, 'Writing in Economics Practicum'), (18, 'Biomedical Ethics'), (15, 'Econometrics'), (15, 'Darwinian Dating: The Evolution of Human Attraction'), (14, 'Data Management for Data Science'), (13, 'Yogalates: A Fusion of Yoga/Pilates'), (13, 'The Cosmos'), (13, 'Sculpture Foundation: 3-D Design II'), (13, 'Cardio Workout'), (12, 'Oceanography'), (12, 'Morality and Capitalist Society')]
