# Courses Demo
This Jupyter notebook is for exploring the data set courses20-21.json
which consists of all Brandeis courses in the 20-21 academic year (Fall20, Spr21, Sum21) 
which had at least 1 student enrolled.

First, we need to read the JSON file into a list of Python dictionaries.

In [None]:
import json
import statistics

In [None]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)

## Structure of a course
Next, we look at the fields of each course dictionary and their values.

In [None]:
print('there are',len(courses),'courses in the dataset')
print('here is the data for course 1246')
courses[1246]

## Cleaning the data
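If we want to sort courses by instructor or by code, or collect unique instructors in a set, we need to replace the lists with tuples. Tuples are immutable sequences, so (unlike lists) they can be placed in sets or used as dictionary keys.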

In [None]:
for course in courses:
    course['instructor'] = tuple(course['instructor'])
    course['coinstructors'] = tuple(tuple(f) for f in course['coinstructors'])
    course['code'] = tuple(course['code'])

In [None]:
print('notice that the instructor and code are tuples now')
courses[1123]
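
To see why the conversion matters, here is a minimal sketch using two made-up course records (the names and values are hypothetical, not taken from the dataset). A list cannot be placed in a set because it is unhashable, while the equivalent tuple can.

```python
# Hypothetical sample records; the real dataset has many more fields.
sample = [
    {'instructor': ['Ada', 'Lovelace'], 'subject': 'COSI'},
    {'instructor': ['Ada', 'Lovelace'], 'subject': 'MATH'},
]

# Lists are mutable and therefore unhashable, so building a set of lists fails.
try:
    unique = {c['instructor'] for c in sample}
    list_error = None
except TypeError as err:
    list_error = str(err)
print('with lists:', list_error)

# Tuples are hashable (when their elements are), so duplicates collapse.
unique_instructors = {tuple(c['instructor']) for c in sample}
print('with tuples:', unique_instructors)
```

The same property is what lets the later cells use set comprehensions over `c['instructor']`.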

# PA01 - Python Data Analysis I
Now we will show how to use plain Python to explore the data set and answer some interesting questions. Next week we will start learning Pandas and NumPy, which are packages that make it easier to explore large datasets efficiently.

Here are some questions we can try to answer:
* how many faculty taught COSI courses last year?
* what is the total number of students taking COSI courses last year?
* what was the median size of a COSI course last year (counting only those courses with at least 10 students)?
* create a list of tuples (E,S) where S is a subject and E is the number of students enrolled in courses in that subject, sort it and print the top 10. This shows the top 10 subjects in terms of number of students taught.
* do the same as in the previous question, but print the top 10 subjects in terms of number of courses offered
* do the same again, but print the top 10 subjects in terms of number of faculty teaching courses in that subject
* list the top 20 faculty in terms of number of students they taught
* list the top 20 courses in terms of number of students taking that course (where you combine different sections and semesters, i.e. just use the subject and course number)
* Create your own interesting question (each team member creates their own) and use Python to answer that question.

In [None]:
#Gillian
#how many faculty taught COSI courses last year
#used set comprehension to find unique faculty who taught COSI courses
num_cosi_instructors = len({c['instructor'] for c in courses if (c['subject'] == 'COSI')})
print(num_cosi_instructors)

In [None]:
#Gillian
#total number of students taking COSI courses last year
num_cosi_students = sum([e['enrolled'] for e in courses if e['subject'] == 'COSI'])
print(num_cosi_students)

In [None]:
# Gillian
#median size of a COSI course last year, counting only courses with at least 10 students
median = statistics.median(e['enrolled'] for e in courses if (e['subject'] == 'COSI') and (e['enrolled'] >= 10))
print(median)
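
"At least 10 students" means `enrolled >= 10`, so a course with exactly 10 students should be counted. The sketch below uses made-up enrollment figures (hypothetical, not from the dataset) to show how the boundary changes the median.

```python
import statistics

# Hypothetical enrollment figures, for illustration only.
sample = [
    {'subject': 'COSI', 'enrolled': 5},
    {'subject': 'COSI', 'enrolled': 10},
    {'subject': 'COSI', 'enrolled': 30},
    {'subject': 'COSI', 'enrolled': 50},
    {'subject': 'MATH', 'enrolled': 40},
]

# Inclusive filter: keeps the course with exactly 10 students.
sizes_at_least_10 = [c['enrolled'] for c in sample
                     if c['subject'] == 'COSI' and c['enrolled'] >= 10]

# Strict filter: drops the course with exactly 10 students.
sizes_more_than_10 = [c['enrolled'] for c in sample
                      if c['subject'] == 'COSI' and c['enrolled'] > 10]

median_inclusive = statistics.median(sizes_at_least_10)
median_strict = statistics.median(sizes_more_than_10)
print(median_inclusive, median_strict)
```

On this toy data the two filters give different medians, which is why the comparison operator is worth checking against the wording of the question.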

In [None]:
# Gillian
#create a tuple list (Enrolled,Subject), sort it and print the tuple containing top 10
#created empty lists and added elements into them
#created tuple by adding elements from each list and sorted it to get top 10
#set of unique subjects across all courses
all_subjects = {c['subject'] for c in courses}
subject_list = list(all_subjects)
enrolled_by_subject = []
es_list = []
for s in subject_list:
    enrolled_by_subject.append(sum([e['enrolled'] for e in courses if e['subject'] == s]))
for i in range(len(subject_list)):
    es_list.append((enrolled_by_subject[i], subject_list[i]))
es_list.sort(reverse = True)
print(es_list[:10])
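
The loop above rescans `courses` once per subject. A one-pass alternative is sketched here on made-up records (the `sample` list is hypothetical): it accumulates totals in a `collections.defaultdict` and then builds the same kind of (E,S) tuples.

```python
from collections import defaultdict

# Hypothetical records, for illustration only.
sample = [
    {'subject': 'COSI', 'enrolled': 30},
    {'subject': 'MATH', 'enrolled': 40},
    {'subject': 'COSI', 'enrolled': 25},
    {'subject': 'BIOL', 'enrolled': 10},
]

# Add each course's enrollment to the running total for its subject.
totals = defaultdict(int)
for c in sample:
    totals[c['subject']] += c['enrolled']

# Build (enrolled, subject) tuples and sort from largest to smallest.
es_sample = sorted(((e, s) for s, e in totals.items()), reverse=True)
print(es_sample[:10])
```

Both approaches should give the same ranking; the dictionary version simply visits each course once.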

In [None]:
# Gillian
#create a tuple with top 10 subjects in terms of number of courses offered
#used a similar approach: created empty lists for subjects and number of courses offered (counted by distinct course name)
#created tuples by pairing elements from these lists and sorted them to get the top 10
names_by_subject = []
ns_list = []
for s in subject_list:
    names_by_subject.append(len({n['name'] for n in courses if n['subject'] == s}))
for i in range(len(subject_list)):
    ns_list.append((names_by_subject[i], subject_list[i]))
ns_list.sort(reverse = True)
print(ns_list[:10])

In [None]:
#Anjola 
#create a tuple of top 10 subjects in terms of number of faculty teaching courses in that subject
#used empty lists for faculty counts and subjects
#created tuples by pairing elements from these lists and sorted them to get the top 10
faculty_by_subject = []
fs_list = []
for s in subject_list:
    faculty_by_subject.append(len({f['instructor'] for f in courses if f['subject'] == s}))
for i in range(len(subject_list)):
    fs_list.append((faculty_by_subject[i], subject_list[i]))
fs_list.sort(reverse = True)
print(fs_list[:10])

In [None]:
#Anjola
#list the top 20 faculty in terms of number of students they taught
#used set comprehension to get unique faculty list
#created empty list for enrolled students in courses 
#created tuple by adding elements from the enrolled list and faculty list
#sorted the tuple and used for loop and indexing to access only faculty name
all_faculty_list = list({f['instructor'] for f in courses})
top_fs_list = []
enrolled_by_faculty = []
list_of_faculty = []
#total enrollment across all courses taught by each faculty member
for faculty in all_faculty_list:
    enrolled_by_faculty.append(sum([e['enrolled'] for e in courses if e['instructor'] == faculty]))
for i in range(len(all_faculty_list)):
    top_fs_list.append((enrolled_by_faculty[i],all_faculty_list[i]))
top_fs_list.sort(reverse = True)
#keep only the first and last name of each of the top 20 faculty
for f in top_fs_list[:20]:
    list_of_faculty.append(f[1][0]+" " + f[1][1])
print(list_of_faculty)

In [None]:
#Anjola 
#list the top 20 courses in terms of number of students taking that course
#(where you combine different sections and semesters,
#i.e. just use the subject and coursenum)
#Similar approach as previous section
#created empty list, created tuple by adding elements, sorting the tuple, and using index to just access top 20 courses
course_num_list = list({number['name'] for number in courses})
student_enrolled_subject_coursenum = []
tuple_top_course_list = []
top_course_list = []
for number in course_num_list:
    student_enrolled_subject_coursenum.append(sum([course['enrolled'] for course in courses if course['name'] == number]))

for i in range(len(course_num_list)):
    tuple_top_course_list.append((student_enrolled_subject_coursenum[i],course_num_list[i]))

tuple_top_course_list.sort(reverse = True)
for f in tuple_top_course_list[:20]:
    top_course_list.append(f[1])
    
print(top_course_list)
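
The cell above groups sections by course name. As a hedged alternative, the sketch below groups by a (subject, course number) key instead. It assumes the `code` tuple holds exactly that pair, which is an assumption about the dataset, and every record in `sample` is made up for illustration.

```python
from collections import Counter

# Hypothetical records; 'code' is ASSUMED to be (subject, course number).
sample = [
    {'code': ('COSI', '10A'), 'section': '1', 'enrolled': 60},
    {'code': ('COSI', '10A'), 'section': '2', 'enrolled': 45},
    {'code': ('MATH', '15A'), 'section': '1', 'enrolled': 80},
    {'code': ('BIOL', '14A'), 'section': '1', 'enrolled': 20},
]

# Sum enrollment over all sections and semesters sharing the same code.
enrolled_by_code = Counter()
for c in sample:
    enrolled_by_code[c['code']] += c['enrolled']

# most_common(n) returns the n largest (key, total) pairs, largest first.
top_sample = enrolled_by_code.most_common(2)
print(top_sample)
```

Grouping by code rather than name would also merge sections whose titles differ slightly, if the dataset contains any.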

In [None]:
# find the 20 courses with the longest waitlists
# this shows which courses could be expanded by adding more sections
# created empty lists, built (waiting, name) tuples, and sorted them by number of students on the waitlist
course_list = list({course['name'] for course in courses})
student_waitlist_course = []
tuple_top_course_list = []
for name in course_list:
    student_waitlist_course.append(sum([course['waiting'] for course in courses if course['name'] == name]))

for i in range(len(course_list)):
    tuple_top_course_list.append((student_waitlist_course[i],course_list[i]))
tuple_top_course_list.sort(reverse = True)
print(tuple_top_course_list[:20])
