# Courses Demo
This Jupyter notebook explores the data set `courses20-21.json`,
which contains every Brandeis course from the 2020-21 academic year (Fall20, Spr21, Sum21)
that had at least one student enrolled.

First we need to read the JSON file into a list of Python dictionaries.

In [None]:
import json

In [None]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)

## Structure of a course
Next we look at the fields of each course dictionary and their values.

In [None]:
print('there are', len(courses), 'courses in the dataset')
print('here is the data for the course at index 1246')
courses[1246]

## Cleaning the data
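If we want to group or count courses by instructor or by code (for example, by putting them in sets or using them as dictionary keys), we need to replace the lists with tuples. Tuples are immutable lists, and unlike lists they are hashable.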

In [None]:
for course in courses:
    course['instructor'] = tuple(course['instructor'])
    # coinstructors is a list of lists, so convert each inner list as well
    course['coinstructors'] = tuple(tuple(f) for f in course['coinstructors'])
    course['code'] = tuple(course['code'])
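To see why the conversion matters, here is a small sketch of the difference between lists and tuples. The instructor value is made up and only mimics the list shape of the raw JSON; it is not a real record.

```python
# A made-up instructor value, stored as a list like the raw JSON data
as_list = ["Ada", "Lovelace"]
as_tuple = tuple(as_list)

# Tuples are hashable, so they can be set members or dictionary keys
instructors = {as_tuple}
print(as_tuple in instructors)  # True

# Lists are unhashable, so building a set containing one raises TypeError
try:
    {as_list}
    list_hashable = True
except TypeError as err:
    list_hashable = False
    print("TypeError:", err)  # TypeError: unhashable type: 'list'
```

This is why the set comprehensions over `c['instructor']` later in the notebook only work after the cleaning step.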

In [None]:
print('notice that the instructor and code are tuples now')
courses[1246]

# Exploring the data set
Now we will show how to use plain Python to explore the data set and answer some interesting questions. Next week we will start learning Pandas/NumPy, packages that make it easier to explore large data sets efficiently.

Here are some questions we can try to answer:
* what are all of the subjects of courses (e.g. COSI, MATH, JAPN, PHIL, ...)?
* which terms are represented?
* how many instructors taught at Brandeis last year?
* what were the five largest course sections?
* what were the five largest courses (where we combine sections)?
* which are the five largest subjects measured by number of courses offered?
* which are the five largest courses measured by number of students taught?
* which course had the most sections taught in 20-21?
* who are the top five faculty in terms of number of students taught?
* etc.

In [None]:
len({c['subject'] for c in courses})

In [None]:
len({c['instructor'] for c in courses})
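The count above includes only each course's main `instructor`. A sketch that also folds in the `coinstructors` field, assuming each co-instructor has the same tuple shape as `instructor` after the cleaning step; the function name `all_instructors` and the two sample records are made up for illustration.

```python
def all_instructors(courses):
    """Return the set of everyone listed as instructor or co-instructor."""
    people = {c['instructor'] for c in courses}
    # coinstructors is a tuple of tuples after the cleaning step
    people |= {co for c in courses for co in c['coinstructors']}
    return people

# Made-up records; the real data set has many more fields
sample = [
    {'instructor': ('Ada', 'Lovelace'), 'coinstructors': (('Alan', 'Turing'),)},
    {'instructor': ('Alan', 'Turing'), 'coinstructors': ()},
]
print(len(all_instructors(sample)))  # 2
```

Calling `len(all_instructors(courses))` would give the count for the full data set; whether co-instructors should count as having taught is a judgment call.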

In [None]:
{c['subject'] for c in courses}

In [None]:
{c['term'] for c in courses}

In [None]:
# sort sections from largest to smallest enrollment
enrollment = sorted(courses, key=lambda course: course['enrolled'], reverse=True)
[(c['enrolled'], c['name']) for c in enrollment[:5]]
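The list of questions also asks which course had the most sections. One way to sketch it, assuming each record is one section and that `subject` plus `coursenum` identifies a course (as in 5H below); the function name `most_sections` and the sample records are hypothetical. Because the data spans Fall20, Spr21 and Sum21, this counts sections across all three terms.

```python
from collections import Counter

def most_sections(courses, n=1):
    """Count records per (subject, coursenum) and return the n most common."""
    counts = Counter((c['subject'], c['coursenum']) for c in courses)
    return counts.most_common(n)

# Made-up section records
sample = [
    {'subject': 'COSI', 'coursenum': '10A'},
    {'subject': 'COSI', 'coursenum': '10A'},
    {'subject': 'MATH', 'coursenum': '10A'},
]
print(most_sections(sample))  # [(('COSI', '10A'), 2)]
```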

`title(self, phrase)`: filters courses containing the phrase in their title.
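A standalone sketch of such a title filter, written as a plain function rather than a method. Treating the `name` field as the course title and matching case-insensitively are both assumptions, and the sample records are made up.

```python
def title(courses, phrase):
    """Return the courses whose 'name' contains phrase, ignoring case."""
    phrase = phrase.lower()
    return [c for c in courses if phrase in c['name'].lower()]

# Made-up records
sample = [
    {'name': 'Introduction to Programming in Python'},
    {'name': 'Linear Algebra'},
]
print([c['name'] for c in title(sample, 'python')])
# ['Introduction to Programming in Python']
```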


<h1> 5A: What was the total number of students taking COSI courses last year? </h1>
(Done By John)

In [None]:
total = 0
for x in courses:
    if x['subject'] == "COSI":
        total+=x['enrolled']
print(total)


<h1> 5B: How many faculty taught COSI courses last year? </h1>
(Done By John)

In [None]:
setter = set()
for x in courses:
    if x['subject'] == "COSI":
        setter.add(x['instructor'])
print(len(setter))


<h1> 5C: What was the median size of a COSI course last year (counting only those courses with at least 10 students)? </h1>
(Done By John)

In [None]:
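import statistics

arr = []
for x in courses:
    if x['subject'] == "COSI" and x['enrolled'] >= 10:
        arr.append(x['enrolled'])
# statistics.median sorts the data and averages the two middle values when len(arr) is even
print(statistics.median(arr))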

<h1> 5D: Create a list of tuples (E,S) where S is a subject and E is the number of students enrolled in courses in that subject, sort it and print the top 10. </h1>
(Done By Fritz)

In [None]:
diction = {}
for x in courses:
    # add the number of people enrolled into a dictionary
    if x['subject'] not in diction:
        diction[x['subject']] = x['enrolled']
    else:
        diction[x['subject']] += x['enrolled']
# build (E, S) tuples directly so subjects with equal totals are all kept, then sort
pairs = sorted([(total, subject) for subject, total in diction.items()], reverse=True)
print(pairs[:10])
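The same per-subject tally can be sketched more compactly with `collections.Counter`, where a missing key starts at 0. The function name `enrollment_by_subject` and the sample records are hypothetical.

```python
from collections import Counter

def enrollment_by_subject(courses, top=10):
    """Return the top (E, S) pairs: E students enrolled in subject S."""
    totals = Counter()
    for c in courses:
        totals[c['subject']] += c['enrolled']  # missing keys start at 0
    # most_common yields (S, E) pairs; flip each one to (E, S)
    return [(e, s) for s, e in totals.most_common(top)]

# Made-up records
sample = [
    {'subject': 'COSI', 'enrolled': 30},
    {'subject': 'MATH', 'enrolled': 25},
    {'subject': 'COSI', 'enrolled': 10},
]
print(enrollment_by_subject(sample))  # [(40, 'COSI'), (25, 'MATH')]
```

Adding `1` instead of `c['enrolled']` would count courses per subject, as asked in 5E.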

<h1> 5E: Do the same as in (d) but print the top 10 subjects in terms of number of courses offered. </h1>
(Done By Fritz)

In [None]:
diction = {}
for x in courses:
    # adding 1 to show the number of courses
    if x['subject'] not in diction:
        diction[x['subject']] = 1
    else:
        diction[x['subject']] += 1
# build (E, S) tuples directly so subjects with equal counts are all kept, then sort
pairs = sorted([(count, subject) for subject, count in diction.items()], reverse=True)
print(pairs[:10])

<h1> 5F: Do the same as (d) but print the top 10 subjects in terms of number of faculty teaching courses in that subject.</h1>
(Done By Fritz)

In [None]:
diction = {}
for x in courses:
    # adding instructors into a set to remove repeats
    if x['subject'] not in diction:
        # {...} makes a one-element set holding the whole instructor tuple
        diction[x['subject']] = {x['instructor']}
    else:
        diction[x['subject']].add(x['instructor'])
# setting the values of the dictionary to the length of the set
for key in diction.keys():
    diction[key] = len(diction[key])
# build (E, S) tuples directly so subjects with equal counts are all kept, then sort
pairs = sorted([(count, subject) for subject, count in diction.items()], reverse=True)
print(pairs[:10])
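`collections.defaultdict(set)` removes the need for the first-time check, and `.add` keeps each instructor tuple whole. A sketch; the function name `faculty_by_subject` and the sample records are made up.

```python
from collections import defaultdict

def faculty_by_subject(courses, top=10):
    """Return the top (F, S) pairs: F distinct instructors in subject S."""
    people = defaultdict(set)  # subject -> set of instructor tuples
    for c in courses:
        people[c['subject']].add(c['instructor'])
    return sorted(((len(p), s) for s, p in people.items()), reverse=True)[:top]

# Made-up records; Lovelace teaches two COSI sections but is counted once
sample = [
    {'subject': 'COSI', 'instructor': ('Ada', 'Lovelace')},
    {'subject': 'COSI', 'instructor': ('Alan', 'Turing')},
    {'subject': 'COSI', 'instructor': ('Ada', 'Lovelace')},
    {'subject': 'MATH', 'instructor': ('Emmy', 'Noether')},
]
print(faculty_by_subject(sample))  # [(2, 'COSI'), (1, 'MATH')]
```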

<h1> 5G: List the top 20 faculty in terms of number of students they taught </h1>
(Done By Matthew)

This was me! (Matthew)

In [None]:
faculty = {course['instructor'] for course in courses}
numperProf = [(sum(course['enrolled'] for course in courses if course['instructor'] == c), c) for c in faculty]
numperProf.sort(key = lambda pair: -pair[0])
[(professor, number) for (number, professor) in numperProf[:20]]

<h1> 5H: List the top 20 courses in terms of number of students taking that course (where you combine different sections and semesters, i.e. just use the subject and course number) </h1>
(Done By Matthew)

In [None]:
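totals = {}  # (subject, coursenum) -> students summed over all sections and semesters
for c in courses:
    totals[(c['subject'], c['coursenum'])] = totals.get((c['subject'], c['coursenum']), 0) + c['enrolled']
sorted(totals.items(), key=lambda item: item[1], reverse=True)[:20]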

<h1> Fritz's Interesting Question: Which subjects have the shortest class descriptions (sorted based on how many characters are in the description)? 
Sort and print the top 10 </h1>

In [None]:
diction = {}
for x in courses:
    # adding length of descriptions into a dictionary
    if x['subject'] not in diction:
        diction[x['subject']] = len(x['description'])
    else:
        diction[x['subject']] += len(x['description'])
# sorts and prints out dictionary in a list of tuples
dictionary = dict(sorted(diction.items(), key=lambda item: item[1]))
print(list(dictionary.items())[:10])
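The cell above adds up description lengths per subject, so a subject with many courses gets a large total even if each description is short. A sketch that ranks subjects by average description length instead; the function name `shortest_descriptions` and the sample records are hypothetical.

```python
def shortest_descriptions(courses, top=10):
    """Return (average description length, subject) pairs, shortest first."""
    total = {}
    count = {}
    for c in courses:
        s = c['subject']
        total[s] = total.get(s, 0) + len(c['description'])
        count[s] = count.get(s, 0) + 1
    averages = [(total[s] / count[s], s) for s in total]
    return sorted(averages)[:top]

# Made-up records: COSI and MATH have the same total (6) but different averages
sample = [
    {'subject': 'COSI', 'description': 'abcd'},
    {'subject': 'COSI', 'description': 'ab'},
    {'subject': 'MATH', 'description': 'abcdef'},
    {'subject': 'PHIL', 'description': 'abcdefgh'},
]
print(shortest_descriptions(sample))  # [(3.0, 'COSI'), (6.0, 'MATH'), (8.0, 'PHIL')]
```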

<h1> John's Interesting Question: What courses did Professor Edward Bayone teach last year? </h1>

In [None]:
setter = set()

for x in courses:
    # index 1 of the instructor tuple is the last name
    if x['instructor'][1] == "Bayone":
        setter.add(x['subject'] + " " + x['coursenum'] + " " + x['name'])
print(setter)

<h1> Matthew's Interesting question: Only find classes that are open </h1>

In [None]:
[(c['name'], c['status_text']) for c in courses if c['status_text'] == "Open"]

<h1>Pedro's Interesting Question: Given two professors, how do the classes they taught compare? </h1>

In [None]:
profOne = input("First Professor (last name)? ")
profTwo = input("Second Professor (last name)? ")

setterOne = set()
setterTwo = set()
if profOne != '' and profTwo != '':
    for x in courses:
        # index 1 of the instructor tuple is the last name
        if x['instructor'][1] == profOne:
            setterOne.add(x['subject'] + " " + x['coursenum'] + " " + x['name'])
        if x['instructor'][1] == profTwo:
            setterTwo.add(x['subject'] + " " + x['coursenum'] + " " + x['name'])
    print(profOne + ":\n" + str(setterOne))
    print(profTwo + ":\n" + str(setterTwo))
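Set intersection and difference go one step further than printing the two sets side by side: they show which courses the two professors share and which belong to only one of them. A sketch, reusing the notebook's assumption that index 1 of `instructor` is the last name; the function names `courses_taught` and `compare` and the sample records are hypothetical.

```python
def courses_taught(courses, last_name):
    """Return the set of 'SUBJ NUM Name' strings for instructors with this last name."""
    return {c['subject'] + " " + c['coursenum'] + " " + c['name']
            for c in courses if c['instructor'][1] == last_name}

def compare(courses, prof_one, prof_two):
    """Return (taught by both, only prof_one, only prof_two)."""
    one = courses_taught(courses, prof_one)
    two = courses_taught(courses, prof_two)
    return one & two, one - two, two - one

# Made-up records
sample = [
    {'subject': 'COSI', 'coursenum': '10A', 'name': 'Intro', 'instructor': ('Ada', 'Lovelace')},
    {'subject': 'COSI', 'coursenum': '10A', 'name': 'Intro', 'instructor': ('Alan', 'Turing')},
    {'subject': 'COSI', 'coursenum': '21A', 'name': 'Data Structures', 'instructor': ('Alan', 'Turing')},
]
both, only_one, only_two = compare(sample, 'Lovelace', 'Turing')
print(both)      # {'COSI 10A Intro'}
print(only_two)  # {'COSI 21A Data Structures'}
```

Matching on last name alone could merge two professors who share a surname; comparing the whole instructor tuple would avoid that.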