# Courses Demo
This Jupyter notebook explores the data set `courses20-21.json`,
which contains every Brandeis course from the 2020-21 academic year (Fall20, Spr21, Sum21)
that had at least one student enrolled.

First we read the JSON file into a list of Python dictionaries, one per course.

In [None]:
import json

In [None]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)

## Structure of a course
Next we look at the fields of each course dictionary and their values.

In [None]:
print('there are', len(courses), 'courses in the dataset')
print('here is the data for the course at index 1246')
courses[1246]

## Cleaning the data
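If we want to group or count courses by instructor or by code (for example, by putting them in a set or using them as dictionary keys), we need to replace the lists with tuples. Lists are mutable and therefore unhashable, while tuples are immutable and hashable.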

In [None]:
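# lists are unhashable, so convert them to tuples
for course in courses:
    course['instructor'] = tuple(course['instructor'])
    # coinstructors is a list of instructor lists, so convert each inner list as well
    course['coinstructors'] = tuple(tuple(f) for f in course['coinstructors'])
    course['code'] = tuple(course['code'])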
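A quick check of why this conversion matters: Python refuses to hash a list, so a list cannot be a set element or a dictionary key, while the equivalent tuple can. The instructor values below are made-up placeholders, not records from the data set.

```python
# Made-up instructor record, shaped like the lists in the JSON file (hypothetical values).
instructor_list = ["Ada", "Lovelace"]
instructor_tuple = tuple(instructor_list)

try:
    {instructor_list}          # building a set requires hashing its elements
    list_is_hashable = True
except TypeError:
    list_is_hashable = False   # lists are mutable, so Python refuses to hash them

faculty = {instructor_tuple}   # tuples are immutable and hashable, so this works
counts = {instructor_tuple: 1} # tuples can also serve as dictionary keys

print(list_is_hashable)                                       # False
print(instructor_tuple in faculty, counts[instructor_tuple])  # True 1
```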

In [None]:
print('notice that the instructor, coinstructors, and code fields are tuples now')
courses[1246]

# Question 5: explore the data set

Topics of interest:
* (a) How many faculty taught COSI courses last year?
* (b) What is the total number of students taking COSI courses last year?
* (c) What was the median size of a COSI course last year (counting only those courses with at least 10 students)?
* (d) Create a list of tuples (E,S) where S is a subject and E is the number of students enrolled in courses in that subject, sort it and print the top 10. This shows the top 10 subjects in terms of number of students taught.
* (e) Do the same as in (d) but print the top 10 subjects in terms of number of courses offered.
* (f) Do the same as in (d) but print the top 10 subjects in terms of number of faculty teaching courses in that subject.
* (g) List the top 20 faculty in terms of number of students they taught.
* (h) List the top 20 courses in terms of number of students taking that course (combining different sections and semesters, i.e. using just the subject and course number).
* (i) Create your own interesting question (each team member creates their own) and use Python to answer it.
    * What are the 10 most popular COSI courses (measured by number of enrolled students)?
    * How many students began to learn how to code last year (i.e. how many students took COSI 10a last year)?
    * Which courses had a waitlist greater than 15 people?
    * Which course had the biggest waitlist?
    * Which course had the greatest enrollment limit?
    * How many courses are offered on Mondays and Wednesdays?


# a. How many faculty taught COSI courses last year?

In [None]:
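# distinct primary instructors of COSI courses (co-instructors are not counted here)
cosi_instructors = {course['instructor'] for course in courses if course['subject'] == 'COSI'}
print(len(cosi_instructors))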
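The set above counts only the primary `instructor` of each course. A variant that also counts co-instructors is sketched below; it assumes, as the cleaning step suggests, that `coinstructors` holds a tuple of instructor tuples. The records in `sample_courses` and the helper name `cosi_faculty` are hypothetical.

```python
# Hypothetical sample records with the same fields used in this notebook.
sample_courses = [
    {"subject": "COSI", "instructor": ("Ada", "Lovelace"), "coinstructors": (("Alan", "Turing"),)},
    {"subject": "COSI", "instructor": ("Ada", "Lovelace"), "coinstructors": ()},
    {"subject": "MATH", "instructor": ("Emmy", "Noether"), "coinstructors": ()},
]

def cosi_faculty(course_list):
    """Return the set of everyone who taught a COSI course, as instructor or co-instructor."""
    faculty = set()
    for course in course_list:
        if course["subject"] == "COSI":
            faculty.add(course["instructor"])
            faculty.update(course["coinstructors"])  # adds each co-instructor tuple
    return faculty

print(len(cosi_faculty(sample_courses)))  # 2
```

On the real data, `len(cosi_faculty(courses))` can be compared with the count above to see how many people teach only as co-instructors.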

# b. What is the total number of students taking COSI courses last year?

In [None]:
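# total enrollments in COSI courses; a student taking several COSI courses is counted once per course
cosi_enrollments = [course['enrolled'] for course in courses if course['subject'] == 'COSI']

print(sum(cosi_enrollments))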


# c. What was the median size of a COSI course last year (counting only those courses with at least 10 students)?

In [None]:
from statistics import median

res = median(course['enrolled'] for course in courses if course['subject'] == 'COSI' and course['enrolled'] >= 10)
print(res)
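One detail worth knowing when reading this result: `statistics.median` returns the mean of the two middle values when the number of courses is even, so the answer can be a non-integer. `statistics.median_low` instead always returns an actual data value. The enrollment numbers below are invented for illustration.

```python
from statistics import median, median_low

# Invented enrollments for four courses with at least 10 students.
sizes = [12, 20, 25, 40]

print(median(sizes))      # 22.5  (average of 20 and 25)
print(median_low(sizes))  # 20    (the lower of the two middle values)
```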

# d. Create a list of tuples (E,S) where S is a subject and E is the number of students enrolled in courses in that subject, sort it and print the top 10. This shows the top 10 subjects in terms of number of students taught.

In [None]:
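subjects = {c['subject'] for c in courses}
enrollment_by_subject = []
for subject in subjects:
    total = sum(c['enrolled'] for c in courses if c['subject'] == subject)
    # store (E, S) as the question asks: enrollment first, then subject
    enrollment_by_subject.append((total, subject))
enrollment_by_subject.sort(reverse=True)  # largest enrollment first
print(enrollment_by_subject[:10])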
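The same per-subject totals can be accumulated in a single pass with `collections.Counter`, instead of rescanning the whole list once per subject. This sketch runs on a few hypothetical records (`sample_courses`), and `top_subjects_by_enrollment` is an illustrative helper name; passing `courses` instead would apply it to the real data.

```python
from collections import Counter

# Hypothetical records using the notebook's field names.
sample_courses = [
    {"subject": "COSI", "enrolled": 120},
    {"subject": "COSI", "enrolled": 30},
    {"subject": "ECON", "enrolled": 100},
    {"subject": "MATH", "enrolled": 45},
]

def top_subjects_by_enrollment(course_list, n=10):
    """Return up to n (E, S) tuples, largest enrollment first."""
    totals = Counter()
    for course in course_list:
        totals[course["subject"]] += course["enrolled"]  # one pass over the data
    # most_common yields (subject, total); flip to the (E, S) order the question asks for
    return [(total, subject) for subject, total in totals.most_common(n)]

print(top_subjects_by_enrollment(sample_courses))
# [(150, 'COSI'), (100, 'ECON'), (45, 'MATH')]
```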

# e. Do the same as in (d) but print the top 10 subjects in terms of number of courses offered

In [None]:
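subjects = {c['subject'] for c in courses}
courses_by_subject = []
for subject in subjects:
    # count distinct course numbers, so multiple sections and semesters of one course count once
    n_courses = len({c['coursenum'] for c in courses if c['subject'] == subject})
    courses_by_subject.append((n_courses, subject))
courses_by_subject.sort(reverse=True)
print(courses_by_subject[:10])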
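Counting distinct course numbers per subject can also be done in one pass by collecting them into a `defaultdict(set)`. Duplicates from multiple sections or semesters collapse automatically because a set keeps only one copy of each value. The records and the helper `top_subjects_by_course_count` are hypothetical.

```python
from collections import defaultdict

# Hypothetical records using the notebook's field names.
sample_courses = [
    {"subject": "COSI", "coursenum": "10A"},
    {"subject": "COSI", "coursenum": "10A"},  # a second section of the same course
    {"subject": "COSI", "coursenum": "21A"},
    {"subject": "HIST", "coursenum": "50B"},
]

def top_subjects_by_course_count(course_list, n=10):
    """Return up to n (N, S) tuples, where N counts distinct course numbers in subject S."""
    numbers = defaultdict(set)
    for course in course_list:
        numbers[course["subject"]].add(course["coursenum"])
    counts = [(len(nums), subject) for subject, nums in numbers.items()]
    counts.sort(reverse=True)  # largest count first; ties broken by subject name, descending
    return counts[:n]

print(top_subjects_by_course_count(sample_courses))  # [(2, 'COSI'), (1, 'HIST')]
```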

# f. Do the same as (d) but print the top 10 subjects in terms of number of faculty teaching courses in that subject

In [None]:
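subjects = {c['subject'] for c in courses}
faculty_by_subject = []
for subject in subjects:
    # distinct primary instructors teaching in this subject
    n_faculty = len({c['instructor'] for c in courses if c['subject'] == subject})
    faculty_by_subject.append((n_faculty, subject))
faculty_by_subject.sort(reverse=True)
print(faculty_by_subject[:10])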

# g. List the top 20 faculty in terms of number of students they taught

In [None]:
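from collections import defaultdict

# total students taught by each (primary) instructor
students_by_instructor = defaultdict(int)
for course in courses:
    students_by_instructor[course['instructor']] += course['enrolled']
ranked = sorted(students_by_instructor.items(), key=lambda item: item[1], reverse=True)
# print only the first two fields of each instructor tuple for the top 20
print([(instructor[0], instructor[1]) for instructor, total in ranked[:20]])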
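Because only the top 20 are needed, `heapq.nlargest` can select them without sorting every instructor. It is shown here on a hypothetical `totals` dictionary; in the notebook, the per-instructor dictionary built above would play that role, with `20` in place of `2`.

```python
import heapq

# Hypothetical per-instructor enrollment totals (instructor tuple -> students taught).
totals = {
    ("Ada", "Lovelace"): 310,
    ("Alan", "Turing"): 275,
    ("Grace", "Hopper"): 420,
    ("Emmy", "Noether"): 90,
}

# The 2 largest entries by value, largest first.
top = heapq.nlargest(2, totals.items(), key=lambda item: item[1])
print(top)  # [(('Grace', 'Hopper'), 420), (('Ada', 'Lovelace'), 310)]
```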

# h. List the top 20 courses in terms of number of students taking that course (combining different sections and semesters, i.e. using just the subject and course number)

In [None]:
from collections import defaultdict

tmp = defaultdict(int)
for course in courses:
    tmp[course['code']] += course['enrolled']
res = list(tmp.items())
res.sort(key=lambda x: -x[1])
print([item[0] for item in res][:20])

# i. Create your own interesting question (each team member creates their own) and use Python to answer that question.

What are the 10 most popular COSI courses (measured by number of enrolled students)?

In [None]:
from collections import defaultdict

tmp = defaultdict(int)

for course in courses:
    if course['subject'] == 'COSI':
        key = course['coursenum'] + ": " + course['name']
        # accumulate with +=, so every section and semester of a course is counted
        tmp[key] += course['enrolled']
res = list(tmp.items())
res.sort(key=lambda x: -x[1])
print(res[:10])

How many students began to learn how to code last year (i.e. how many students took COSI 10a last year)?

In [None]:
# COSI 10a is identified here by its course title
total_students = 0
for s in courses:
    if s['name'] == 'Introduction to Problem Solving in Python':
        total_students += s['enrolled']
print(total_students)

Which courses had a waitlist greater than 15 people?

In [None]:
waitlist = {course['name'] for course in courses if course['waiting'] > 15}
print(waitlist)

Which course had the biggest waitlist?

In [None]:
course_name = ""
n_students = 0
for course in courses:
    if course['waiting'] > n_students:
        course_name = course['subject'] + course['coursenum'] + ": " + course['name']
        n_students = course['waiting']
print(f"{course_name} had the biggest waitlist of {n_students} students.")


Which course had the greatest enrollment limit?

In [None]:
name = ''
e_limit = 0
for course in courses:
    # skip courses with no enrollment limit (None); compare with `is not None`
    if course['limit'] is not None and course['limit'] > e_limit:
        e_limit = course['limit']
        name = course['subject'] + course['coursenum'] + ": " + course['name']
print(f"{name} had the greatest enrollment limit of {e_limit}")
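The last two questions follow the same pattern (find the record with the largest value of one field), which the built-in `max` handles directly through its `key` argument. Records whose limit is `None` are filtered out first, mirroring the check above. The sample records and the `label` helper are hypothetical.

```python
# Hypothetical records; 'limit' is None when a course has no enrollment cap.
sample_courses = [
    {"subject": "COSI", "coursenum": "10A", "name": "Intro", "waiting": 20, "limit": 150},
    {"subject": "ECON", "coursenum": "2A", "name": "Principles", "waiting": 35, "limit": None},
    {"subject": "MATH", "coursenum": "10A", "name": "Calculus", "waiting": 5, "limit": 200},
]

def label(course):
    """Format a course the same way the cells above do, e.g. 'COSI10A: Intro'."""
    return course["subject"] + course["coursenum"] + ": " + course["name"]

biggest_waitlist = max(sample_courses, key=lambda c: c["waiting"])

# Skip courses without a limit; default=None avoids a ValueError if none remain.
with_limits = [c for c in sample_courses if c["limit"] is not None]
largest_limit = max(with_limits, key=lambda c: c["limit"], default=None)

print(label(biggest_waitlist), biggest_waitlist["waiting"])  # ECON2A: Principles 35
print(label(largest_limit), largest_limit["limit"])          # MATH10A: Calculus 200
```

Like the loops above, `max` returns the first record it sees when several share the largest value.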

How many courses are offered on Mondays and Wednesdays?

In [None]:
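# only the first meeting time of each course is checked;
# names go into a set, so several sections of one course count once
mw = {course['name'] for course in courses
      if len(course['times']) > 0
      and 'm' in course['times'][0]['days']
      and 'w' in course['times'][0]['days']}
print(len(mw))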
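The cell above only inspects the first entry of `times`, so a course whose Monday/Wednesday meeting is listed second would be missed. A sketch that checks every meeting entry with `any` follows; it assumes, like the original cell, that each entry's `days` contains lowercase day codes such as `'m'` and `'w'`. The records (including the `'tu'`/`'th'` codes) and the helper `meets_mon_and_wed` are hypothetical.

```python
# Hypothetical records; each 'times' entry describes one weekly meeting pattern.
sample_courses = [
    {"name": "Intro", "times": [{"days": ["m", "w"]}]},
    {"name": "Lab course", "times": [{"days": ["f"]}, {"days": ["m", "w"]}]},  # MW slot listed second
    {"name": "Seminar", "times": [{"days": ["tu", "th"]}]},
    {"name": "Independent study", "times": []},  # no scheduled meetings
]

def meets_mon_and_wed(course):
    """True if any meeting pattern of the course includes both Monday and Wednesday."""
    return any("m" in t["days"] and "w" in t["days"] for t in course["times"])

mw = {course["name"] for course in sample_courses if meets_mon_and_wed(course)}
print(len(mw))  # 2
```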
