# Courses Demo
This Jupyter notebook explores the data set `courses20-21.json`,
which contains every Brandeis course in the 2020-21 academic year (Fall 2020, Spring 2021, Summer 2021)
that had at least one student enrolled.

First we read the JSON file into a list of Python dictionaries.

In [None]:
import json
import statistics

In [None]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)
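`json.load` turns a top-level JSON array of objects into a Python list of dictionaries, and any JSON arrays nested inside those objects come back as Python lists. The sketch below shows this on a tiny in-memory sample. The records, their values, and the three-part instructor list are hypothetical; only the field names are borrowed from the cells in this notebook.

```python
import io
import json

# A tiny, hypothetical stand-in for courses20-21.json: a JSON array of objects.
sample_text = '''
[
  {"subject": "COSI", "code": ["COSI", "10A"], "enrolled": 42,
   "instructor": ["Ann", "Lee", "alee@example.edu"], "coinstructors": []},
  {"subject": "MATH", "code": ["MATH", "10A"], "enrolled": 30,
   "instructor": ["Bo", "Kim", "bkim@example.edu"], "coinstructors": []}
]
'''

# json.load reads from any file-like object, so io.StringIO can stand in for open(...)
sample_courses = json.load(io.StringIO(sample_text))

print(type(sample_courses).__name__, type(sample_courses[0]).__name__)  # list dict
print(sample_courses[0]['code'])  # JSON arrays come back as Python lists
```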

## Structure of a course
Next we look at the fields of each course dictionary and their values.

In [None]:
print('there are',len(courses),'courses in the dataset')
print('here is the data for the course at index 1246')
courses[1246]
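Looking at one record shows its fields, but not whether every record has the same ones. A quick check like the sketch below compares the key sets of all records; passing the real `courses` list instead of the sample would reveal any field that is missing from some courses before later cells rely on it. The helper `field_report` and the sample records are hypothetical.

```python
# Hypothetical records; in the notebook you would pass the real `courses` list instead.
sample_courses = [
    {"subject": "COSI", "enrolled": 42, "waiting": 3},
    {"subject": "MATH", "enrolled": 30, "waiting": 0},
    {"subject": "PHIL", "enrolled": 12},  # deliberately missing 'waiting'
]

def field_report(records):
    """Return (fields present in every record, fields present in only some records)."""
    key_sets = [set(record) for record in records]
    in_all = set.intersection(*key_sets)
    in_any = set.union(*key_sets)
    return in_all, in_any - in_all

common, partial = field_report(sample_courses)
print(sorted(common))   # ['enrolled', 'subject']
print(sorted(partial))  # ['waiting']
```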

## Cleaning the data
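If we want to group or count courses by instructor or by code (using those fields as dictionary keys or set elements), we need to replace the lists with tuples. Lists are mutable and therefore unhashable, while tuples (immutable sequences) are hashable as long as their elements are.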
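A minimal illustration of the hashability issue, using a made-up instructor name: the list version fails as a dictionary key, while the same values in a tuple work both as a key and as a set element.

```python
# A list cannot be a dictionary key or a set element because it is mutable.
instructor_as_list = ["Ann", "Lee"]
list_key_failed = False
try:
    {instructor_as_list: 1}
except TypeError as err:
    list_key_failed = True
    print("list key fails:", err)   # unhashable type: 'list'

# The same values in a tuple are hashable, so they work as keys and set members.
instructor_as_tuple = tuple(instructor_as_list)
students_by_instructor = {instructor_as_tuple: 0}
students_by_instructor[instructor_as_tuple] += 25
unique_instructors = {instructor_as_tuple, ("Ann", "Lee"), ("Bo", "Kim")}

print(students_by_instructor)   # {('Ann', 'Lee'): 25}
print(len(unique_instructors))  # 2
```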

In [None]:
for course in courses:
    # convert list-valued fields to tuples so they can serve as dict keys and set elements
    course['instructor'] = tuple(course['instructor'])
    course['coinstructors'] = tuple(tuple(f) for f in course['coinstructors'])
    course['code'] = tuple(course['code'])

In [None]:
print('notice that the instructor, coinstructors, and code fields are tuples now')
courses[1246]

# Exploring the data set
Now we will show how to use plain Python to explore the data set and answer some interesting questions. Next week we will start learning Pandas and NumPy, packages that make it easier to explore large data sets efficiently.

Here are some questions we can try to answer:
* What are all of the course subjects (e.g. COSI, MATH, JAPN, PHIL, ...)?
* Which terms are represented?
* How many instructors taught at Brandeis last year?
* What were the five largest course sections?
* What were the five largest courses (combining all sections)?
* Which are the five largest subjects measured by number of courses offered?
* Which are the five largest courses measured by number of students taught?
* Which course had the most sections taught in 2020-21?
* Who are the top five faculty in terms of number of students taught?
* etc.

In [None]:
#A. how many distinct faculty taught COSI courses last year?

len({course['instructor'] for course in courses if course['subject']=='COSI'})

In [None]:
#B. what is the total number of students taking COSI courses last year?

cosi=[course['enrolled'] for course in courses if course['subject']=="COSI"]

sum(cosi)



In [None]:
#C. what was the median size of a COSI course last year (counting only those courses with at least 10 students)

cosi=[course['enrolled'] for course in courses if course['subject']=="COSI" and course['enrolled'] >= 10]


statistics.median(cosi)
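`statistics.median` sorts the data itself, and when the number of values is even it returns the mean of the two middle values, so the result may not be an enrollment any course actually had. `statistics.median_low` returns the lower of the two middle values instead. The enrollment numbers below are invented for illustration.

```python
import statistics

# Hypothetical enrollment counts for a handful of sections.
enrollments = [4, 12, 35, 9, 18, 60]

# Keep only sections with at least 10 students, as in question C.
large = [n for n in enrollments if n >= 10]   # [12, 35, 18, 60]

# With an even number of values, median() averages the two middle ones:
# sorted -> [12, 18, 35, 60], middle pair 18 and 35 -> 26.5
print(statistics.median(large))        # 26.5
print(statistics.median_low(large))    # 18, always an actual data value
```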

In [None]:
#D. create a list of tuples (E,S) where S is a subject and E is the number of students enrolled in courses in that 
#   subject, sort it and print the top 10. This shows the top 10 subjects in terms of number of students taught.

# collect every subject that appears in the data set
subjects = [course['subject'] for course in courses]

# initialize a dictionary mapping each subject to 0 students
new_dict = dict.fromkeys(subjects, 0)

# record the number of students enrolled in each subject
for course in courses:
    new_dict[course['subject']] += course['enrolled']

# convert the dictionary to a list of tuples (E, S), where S is a subject and
# E is the number of students enrolled in courses in that subject
students_in_subjects = [(enrolled, subj) for subj, enrolled in new_dict.items()]

# tuples compare by E first, so sorting in reverse puts the largest subjects first
top_10 = sorted(students_in_subjects, reverse=True)[:10]

for pair in top_10:
    print(pair)
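The `dict.fromkeys` plus accumulation pattern in (D) can also be written with `collections.Counter`, which treats missing keys as 0. Building the pairs as (E, S), as question (D) asks, lets `sorted` order them without a `key` function, because tuples compare element by element. A sketch on hypothetical records:

```python
from collections import Counter

# Hypothetical course records with only the fields this computation needs.
sample_courses = [
    {"subject": "COSI", "enrolled": 40},
    {"subject": "MATH", "enrolled": 25},
    {"subject": "COSI", "enrolled": 15},
    {"subject": "PHIL", "enrolled": 30},
]

# Counter starts every missing key at 0, so no dict.fromkeys step is needed.
students_per_subject = Counter()
for course in sample_courses:
    students_per_subject[course["subject"]] += course["enrolled"]

# Build (E, S) pairs and sort them; tuples compare by E first, then by S.
pairs = sorted(((e, s) for s, e in students_per_subject.items()), reverse=True)
print(pairs[:10])   # [(55, 'COSI'), (30, 'PHIL'), (25, 'MATH')]
```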

In [None]:
#E. do the same as in (d) but print the top 10 subjects in terms of number of courses offered

subjects = [course['subject'] for course in courses]

new_dict = dict.fromkeys(subjects, 0)

# record the number of courses offered in each subject
for course in courses:
    new_dict[course['subject']] += 1

courses_in_subjects = list(new_dict.items())

# sort the list by the number of courses offered in the subject
top_10 = sorted(courses_in_subjects, key=lambda x: x[1], reverse=True)[:10]

for pair in top_10:
    print(pair)
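Counting courses per subject is exactly what `collections.Counter` does when it is given an iterable, and its `most_common` method replaces the explicit sort and slice (ties keep the order in which subjects were first seen). Shown on hypothetical records:

```python
from collections import Counter

# Hypothetical records; only 'subject' is needed to count courses per subject.
sample_courses = [
    {"subject": "COSI"}, {"subject": "MATH"}, {"subject": "COSI"},
    {"subject": "PHIL"}, {"subject": "COSI"}, {"subject": "MATH"},
]

# Counter counts how often each subject appears, i.e. courses offered per subject.
courses_per_subject = Counter(course["subject"] for course in sample_courses)

# most_common(n) returns the n (subject, count) pairs with the highest counts.
print(courses_per_subject.most_common(2))   # [('COSI', 3), ('MATH', 2)]
```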

In [None]:
#F. do the same as (d) but print the top 10 subjects in terms of number of faculty teaching courses in that subject

subjects = [course['subject'] for course in courses]

new_dict = dict.fromkeys(subjects, 0)

# record the number of distinct faculty teaching courses in each subject
for subject_item in new_dict:
    new_dict[subject_item] = len({course['instructor'] for course in courses if course['subject'] == subject_item})

faculty_in_subjects = list(new_dict.items())

# sort the list by the number of faculty teaching in the subject
top_10 = sorted(faculty_in_subjects, key=lambda x: x[1], reverse=True)[:10]

for pair in top_10:
    print(pair)
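The cell above rescans the whole course list once per subject. A single pass that collects each subject's instructors in a set gives the same counts; this sketch uses `collections.defaultdict` and hypothetical records whose instructors are already tuples, as after the cleaning step.

```python
from collections import defaultdict

# Hypothetical records; instructors are tuples, as after the cleaning step.
sample_courses = [
    {"subject": "COSI", "instructor": ("Ann", "Lee")},
    {"subject": "COSI", "instructor": ("Ann", "Lee")},   # same person, second section
    {"subject": "COSI", "instructor": ("Bo", "Kim")},
    {"subject": "MATH", "instructor": ("Cy", "Ng")},
]

# One pass over the courses: collect the distinct instructors of each subject in a set.
instructors_by_subject = defaultdict(set)
for course in sample_courses:
    instructors_by_subject[course["subject"]].add(course["instructor"])

# Replace each set by its size and sort by that size, largest first.
faculty_counts = sorted(
    ((subject, len(people)) for subject, people in instructors_by_subject.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
print(faculty_counts)   # [('COSI', 2), ('MATH', 1)]
```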

In [None]:
#G. list the top 20 faculty in terms of number of students they taught

instructors = (course['instructor'] for course in courses)

new_dict = dict.fromkeys(instructors, 0)

# add up the enrollment of every course each instructor taught
for course in courses:
    new_dict[course['instructor']] += course['enrolled']

top_20 = sorted(new_dict.items(), key=lambda x: x[1], reverse=True)[:20]

for instructor, total in top_20:
    print(instructor)
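When only the top k entries are needed, `heapq.nlargest` returns them in descending order and is documented as equivalent to `sorted(iterable, key=key, reverse=True)[:n]`; for a data set of this size either is fine. The sketch also prints each total next to the name, which the cell above omits. The totals are hypothetical.

```python
import heapq

# Hypothetical totals of students taught per instructor.
students_by_instructor = {
    ("Ann", "Lee"): 310,
    ("Bo", "Kim"): 95,
    ("Cy", "Ng"): 180,
    ("Di", "Ortiz"): 240,
}

# nlargest(k, iterable, key=...) returns the k largest items without sorting everything.
top_2 = heapq.nlargest(2, students_by_instructor.items(), key=lambda item: item[1])

for instructor, total in top_2:
    print(instructor, total)
# ('Ann', 'Lee') 310
# ('Di', 'Ortiz') 240
```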

In [None]:
# H. list the top 20 courses in terms of number of students taking that course (where you
#    combine different sections and semesters, i.e. just use the subject and course number)

codes = (course['code'] for course in courses)

new_dict = dict.fromkeys(codes, 0)

# add up the enrollment of all sections (in all terms) that share a course code
for course in courses:
    new_dict[course['code']] += course['enrolled']

top_20 = sorted(new_dict.items(), key=lambda x: x[1], reverse=True)[:20]

for code, total in top_20:
    print(code)

In [None]:
#I. Create your own interesting question (each team member creates their own) and use Python
#   to answer that question.
#
#   The top 10 courses measured by the number of students on the waiting list -- Bohan

top_10 = sorted(courses, key = lambda x: x['waiting'], reverse = True)[:10]
for course in top_10:
    print(course['code'])
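Python's sort is stable, so courses with equal `waiting` values keep their order from the JSON file. To make the ranking independent of file order, a sketch like the one below sorts on a tuple key: the negated waiting count first, then the course code. The records are hypothetical, and the code assumes `waiting` holds an integer for every course.

```python
# Hypothetical sections; two of them tie on the number of waiting students.
sample_courses = [
    {"code": ("MATH", "10A"), "waiting": 8},
    {"code": ("COSI", "12B"), "waiting": 15},
    {"code": ("COSI", "10A"), "waiting": 8},
    {"code": ("PHIL", "1A"), "waiting": 2},
]

# Sort by waiting list length (largest first), then by course code (alphabetical)
# so that ties always come out in the same, predictable order.
ranked = sorted(sample_courses, key=lambda c: (-c["waiting"], c["code"]))

for course in ranked[:3]:
    print(course["code"], course["waiting"])
# ('COSI', '12B') 15
# ('COSI', '10A') 8
# ('MATH', '10A') 8
```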

In [None]:
#    list the top 10 faculty in terms of number of courses they taught -- Charlotte 

instructors = [course['instructor'] for course in courses]

new_dict = dict.fromkeys(instructors, 0)

# record the number of courses taught by each instructor
for course in courses:
    new_dict[course['instructor']] += 1

instructor_course_num = list(new_dict.items())

# sort the list by the number of courses taught by the instructor
top_10 = sorted(instructor_course_num, key=lambda x: x[1], reverse=True)[:10]

for pair in top_10:
    print(pair)

In [None]:
## i: how many distinct faculty taught MATH courses last year? (part i)

len({course['instructor'] for course in courses if course['subject']=='MATH'})
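Question A and the MATH question above differ only in the subject, so a small function can answer both. The sketch below (`count_faculty` is a hypothetical helper, not part of the original notebook) also shows one possible extension: optionally counting co-instructors, which the set comprehensions in this notebook ignore. It assumes the cleaned data, where `coinstructors` is a tuple of instructor tuples.

```python
def count_faculty(course_list, subject, include_coinstructors=False):
    """Count distinct instructors of courses in `subject` (hypothetical helper)."""
    people = set()
    for course in course_list:
        if course["subject"] != subject:
            continue
        people.add(course["instructor"])
        if include_coinstructors:
            # assumes 'coinstructors' is a tuple of instructor tuples, as after cleaning
            people.update(course["coinstructors"])
    return len(people)

# Hypothetical records in the cleaned (tuple) form.
sample_courses = [
    {"subject": "MATH", "instructor": ("Cy", "Ng"), "coinstructors": (("Di", "Ortiz"),)},
    {"subject": "MATH", "instructor": ("Cy", "Ng"), "coinstructors": ()},
    {"subject": "COSI", "instructor": ("Ann", "Lee"), "coinstructors": ()},
]

print(count_faculty(sample_courses, "MATH"))                              # 1
print(count_faculty(sample_courses, "MATH", include_coinstructors=True))  # 2
```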


