# Courses Demo
This Jupyter notebook explores the data set `courses20-21.json`, which contains every Brandeis course from the 2020-21 academic year (Fall20, Spr21, Sum21) that had at least one student enrolled.

First we need to read the JSON file into a list of Python dictionaries.

In [None]:
import json

In [None]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)
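The same loading step can be wrapped in a function that also checks that the file really holds a list of dictionaries, which every later cell assumes. This is a minimal sketch; the name `load_courses` is hypothetical, and only the filename and encoding come from this notebook.

```python
import json


def load_courses(path):
    """Read a JSON file and return its top-level list of course dictionaries."""
    with open(path, "r", encoding="utf-8") as jsonfile:
        data = json.load(jsonfile)
    # Later cells index into a list of dicts, so fail early if the shape is different
    if not isinstance(data, list) or not all(isinstance(c, dict) for c in data):
        raise ValueError(f"{path} does not contain a list of JSON objects")
    return data
```

Usage would be `courses = load_courses("courses20-21.json")`.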

## Structure of a course
Next we look at the fields of each course dictionary and their values.

In [None]:
print('there are',len(courses),'courses in the dataset')
print('here is the data for course 1246')
courses[1246]
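Course 1246 shows one record; to see every field that appears anywhere in the data set, one option is to take the union of all keys and record which value types occur for each. A sketch follows; the two sample records are made up for illustration and only reuse field names that appear elsewhere in this notebook.

```python
from collections import defaultdict


def field_types(records):
    """Map each key found in any record to the sorted names of its value types."""
    types = defaultdict(set)
    for record in records:
        for key, value in record.items():
            types[key].add(type(value).__name__)
    return {key: sorted(names) for key, names in sorted(types.items())}


# Hypothetical toy records, NOT real Brandeis data
sample = [
    {"code": ["COSI", "10A"], "subject": "COSI", "enrolled": 42},
    {"code": ["MATH", "15A"], "subject": "MATH", "enrolled": 30, "coinstructors": []},
]
print(field_types(sample))
# → {'code': ['list'], 'coinstructors': ['list'], 'enrolled': ['int'], 'subject': ['str']}
```

On the real data, `field_types(courses)` would list every key and flag any field whose values have more than one type.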

## Cleaning the data
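If we want to group or count courses by instructor or by code, we need to replace the lists with tuples (immutable sequences). A tuple whose elements are hashable is itself hashable, so unlike a list it can be used as a set element or a dictionary key.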

In [None]:
for course in courses:
    course['instructor'] = tuple(course['instructor'])
    course['coinstructors'] = tuple(tuple(coinstructor) for coinstructor in course['coinstructors'])
    course['code'] = tuple(course['code'])
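Tuples matter here because later cells put instructors into a set and use instructors and course codes as dictionary keys, and Python only accepts hashable values there. A small demonstration, using a made-up instructor record (the notebook's instructor lists start with first and last name; the values below are hypothetical):

```python
def is_hashable(value):
    """Return True if value can be used as a dict key or set element."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


instructor_as_list = ["Ada", "Lovelace"]        # hypothetical record
instructor_as_tuple = tuple(instructor_as_list)

print(is_hashable(instructor_as_list))   # → False (lists are mutable)
print(is_hashable(instructor_as_tuple))  # → True (tuple of strings)

faculty = {instructor_as_tuple}  # works; {instructor_as_list} would raise TypeError
```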

In [None]:
print('notice that the instructor and code are tuples now')
courses[1246]

# Exploring the data set
Now we will show how to use plain Python to explore the data set and answer some interesting questions. Next week we will start learning Pandas and NumPy, packages that make it easier to explore large datasets efficiently.

Here are some questions we can try to answer:
* what are all of the course subjects (e.g. COSI, MATH, JAPN, PHIL, ...)?
* which terms are represented?
* how many instructors taught at Brandeis last year?
* what were the five largest course sections?
* what were the five largest courses (where we combine sections)?
* which are the five largest subjects measured by number of courses offered?
* which are the five largest courses measured by number of students taught?
* which course had the most sections taught in 20-21?
* who are the top five faculty in terms of number of students taught?
* etc.
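Several of these questions reduce to building a set or a grouped sum. The sketch below answers two of them, the list of subjects and the largest courses when sections are combined, on a few made-up records. It assumes, as the later cells do, that each course has `subject`, `code` and `enrolled` fields, and that sections of the same course share the same `code`; the helper names are hypothetical.

```python
import heapq
from collections import defaultdict


def all_subjects(courses):
    """Sorted list of distinct subject codes."""
    return sorted({course["subject"] for course in courses})


def largest_combined_courses(courses, k=5):
    """Top-k (total_enrolled, code) pairs, summing enrollment over all sections of a code."""
    totals = defaultdict(int)
    for course in courses:
        totals[tuple(course["code"])] += course["enrolled"]
    return heapq.nlargest(k, ((total, " ".join(code)) for code, total in totals.items()))


# Hypothetical toy data, NOT real Brandeis enrollments
sample = [
    {"subject": "COSI", "code": ["COSI", "10A"], "enrolled": 40},
    {"subject": "COSI", "code": ["COSI", "10A"], "enrolled": 35},
    {"subject": "MATH", "code": ["MATH", "10A"], "enrolled": 50},
    {"subject": "PHIL", "code": ["PHIL", "1A"], "enrolled": 20},
]
print(all_subjects(sample))                   # → ['COSI', 'MATH', 'PHIL']
print(largest_combined_courses(sample, k=2))  # → [(75, 'COSI 10A'), (50, 'MATH 10A')]
```

`heapq.nlargest(k, ...)` returns the same result as `sorted(..., reverse=True)[:k]` without sorting the whole list, which can help when there are many distinct codes.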

In [None]:
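# Return the median of a list of numbers without modifying the caller's list
def median(values):
    ordered = sorted(values)  # sorted() makes a copy; list.sort() would reorder the argument
    n = len(ordered)
    if n == 0:
        raise ValueError("median() of an empty list is undefined")
    half = n // 2
    if n % 2 == 0:
        # even length: average the two middle values
        return (ordered[half - 1] + ordered[half]) / 2
    return ordered[half]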
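Python's standard library already provides `statistics.median`, which uses the same rule (the average of the two middle values for even-length input) and leaves its argument unchanged. A quick check on made-up enrollments:

```python
import statistics

enrollments = [12, 40, 25, 18]  # hypothetical values
print(statistics.median(enrollments))  # → 21.5
print(enrollments)                     # → [12, 40, 25, 18] (original order kept)
```

On an empty list, `statistics.median` raises `statistics.StatisticsError` instead of returning a value.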

In [None]:
from collections import defaultdict

faculty_members = set()        # distinct lead instructors across ALL courses
cnt_of_students_in_COSI = 0    # total enrollment over all COSI sections

# enrollments of COSI sections with at least 10 students (input to median())
median_COSI_courses = [course['enrolled'] for course in courses
                       if course['subject'] == 'COSI' and course['enrolled'] >= 10]
# (enrollment, subject) pairs for sections with at least 10 students
subject_and_students = [(course['enrolled'], course['subject'])
                        for course in courses if course['enrolled'] >= 10]

for course in courses:
    faculty_members.add(course['instructor'])
    if course['subject'] == 'COSI':
        cnt_of_students_in_COSI += course['enrolled']

# total number of students taught in each subject
dic = defaultdict(int)
for course in courses:
    dic[course['subject']] += course['enrolled']
tupleDic = [(v, k) for k, v in dic.items()]

# number of course sections offered in each subject
subject_dic = defaultdict(int)
for course in courses:
    subject_dic[course['subject']] += 1
rev_sub_dic = [(v, k) for k, v in subject_dic.items()]

# number of distinct instructors per subject (the set counts each professor once)
instructor_dic = defaultdict(set)
for course in courses:
    instructor_dic[course['subject']].add(course['instructor'])
instructor_dic = [(len(v), k) for k, v in instructor_dic.items()]

# total number of students taught by each instructor, labeled by the first two name fields
instructor_and_students = defaultdict(int)
for course in courses:
    instructor_and_students[course['instructor']] += course['enrolled']
instructor_and_students = [(v, k[0] + ' ' + k[1]) for k, v in instructor_and_students.items()]

# number of sections offered for each course code
top_courses = defaultdict(int)
for course in courses:
    top_courses[course['code']] += 1
top_courses = [(v, ' '.join(k)) for k, v in top_courses.items()]
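Most tallies in the cell above follow one pattern: group by a key, then count or sum. `collections.Counter` expresses that directly, and its `most_common` method replaces the reverse-sort-and-slice step. A sketch on made-up records, assuming the same `subject`, `instructor` and `enrolled` fields (with `instructor` already converted to a tuple):

```python
from collections import Counter, defaultdict

# Hypothetical toy data, NOT real Brandeis records
sample = [
    {"subject": "COSI", "instructor": ("Ada", "Lovelace"), "enrolled": 40},
    {"subject": "COSI", "instructor": ("Ada", "Lovelace"), "enrolled": 35},
    {"subject": "COSI", "instructor": ("Alan", "Turing"), "enrolled": 30},
    {"subject": "MATH", "instructor": ("Emmy", "Noether"), "enrolled": 50},
]

# Students per subject: sum of enrollments
students_per_subject = Counter()
for course in sample:
    students_per_subject[course["subject"]] += course["enrolled"]

# Sections per subject: one count per record
courses_per_subject = Counter(course["subject"] for course in sample)

# Distinct instructors per subject: collect a set, then count its size
instructors = defaultdict(set)
for course in sample:
    instructors[course["subject"]].add(course["instructor"])
instructors_per_subject = Counter({s: len(people) for s, people in instructors.items()})

print(students_per_subject.most_common(1))    # → [('COSI', 105)]
print(courses_per_subject.most_common())      # → [('COSI', 3), ('MATH', 1)]
print(instructors_per_subject.most_common())  # → [('COSI', 2), ('MATH', 1)]
```

One difference from sorting `(count, name)` pairs in reverse: `most_common` orders tied counts by first appearance, not by name, so tied entries may be listed in a different order.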

In [None]:
print("a. Number of faculty who taught courses last year: "+ str(len(faculty_members)))
print()
print("b. Number of students enrolled in COSI classes: " + str(cnt_of_students_in_COSI))
print()
print("c. Median enrollment of COSI classes with at least 10 students: " + str(median(median_COSI_courses)))
print()
print("d. Top 10 subjects by number of students taught:", sorted(tupleDic, reverse=True)[:10])
print()
print("e. Top 10 subjects by number of course sections offered:", sorted(rev_sub_dic, reverse=True)[:10])
print()
print("f. Top 10 subjects by number of distinct instructors:", sorted(instructor_dic, reverse=True)[:10])
print()
print("g. Top 20 faculty by number of students taught:", sorted(instructor_and_students, reverse=True)[:20])
print()
print("h. Top 20 courses by number of sections offered:", sorted(top_courses, reverse=True)[:20])