# Courses Demo
This Jupyter notebook explores the data set courses20-21.json,
which contains every Brandeis course in the 2020-21 academic year (Fall 2020, Spring 2021, Summer 2021)
that had at least one student enrolled.

First we need to read the JSON file into a list of Python dictionaries.

In [43]:
import json

In [44]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)

## Structure of a course
Next we look at the fields of a course dictionary and their values, using course 1246 as an example.

In [45]:
print('there are',len(courses),'courses in the dataset')
print('here is the data for course 1246')
courses[1246]

there are 7813 courses in the dataset
here is the data for course 1246


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ['An', 'Huang', 'anhuang@brandeis.edu'],
 'coinstructors': [],
 'code': ['MATH', '223A'],
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}
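
The `start` and `end` values in `times` look like minutes since midnight (1080 would be 18:00 and 1170 would be 19:30). That reading is an assumption based on this one record, not something the data set documents. Under that assumption, a minimal sketch for printing a meeting time readably could look like this; `format_minutes` and `describe_meeting` are hypothetical helper names.

```python
def format_minutes(minutes):
    """Render minutes since midnight as HH:MM (assumed encoding of 'start'/'end')."""
    hours, mins = divmod(minutes, 60)
    return f"{hours:02d}:{mins:02d}"


def describe_meeting(meeting):
    """Summarize one entry of a course's 'times' list as 'days start-end'."""
    days = ",".join(sorted(meeting["days"]))
    return f"{days} {format_minutes(meeting['start'])}-{format_minutes(meeting['end'])}"


# Meeting entry copied from course 1246 above.
sample_meeting = {"start": 1080, "end": 1170, "days": ["w", "m"]}
print(describe_meeting(sample_meeting))
```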

## Cleaning the data
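If we want to group or count courses by instructor or by code (for example, by using them as dictionary keys or set members), we need to replace the lists with tuples, which are immutable and therefore hashable. Lists can be compared and sorted as they are, but they cannot be used as keys.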

In [46]:
for course in courses:
    course['instructor'] = tuple(course['instructor'])
    course['coinstructors'] = tuple(tuple(f) for f in course['coinstructors'])
    course['code'] = tuple(course['code'])

In [47]:
print('notice that the instructor and code are tuples now')
courses[1246]


notice that the instructor and code are tuples now


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ('An', 'Huang', 'anhuang@brandeis.edu'),
 'coinstructors': (),
 'code': ('MATH', '223A'),
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}
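
The practical difference between the two forms is hashability: a tuple can be a set member or a dictionary key, while a list cannot. The short sketch below illustrates this with the instructor value from course 1246; the variable names are only for illustration.

```python
instructor_as_list = ["An", "Huang", "anhuang@brandeis.edu"]
instructor_as_tuple = tuple(instructor_as_list)

# Putting a list into a set raises TypeError because lists are unhashable.
try:
    {instructor_as_list}
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# The tuple form works, and duplicates collapse to a single entry.
unique_instructors = {instructor_as_tuple, instructor_as_tuple}

print(list_is_hashable, len(unique_instructors))
```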

# Exploring the data set
Now we will show how to use plain Python to explore the data set and answer some interesting questions. Next week we will start learning Pandas and NumPy, packages that make it easier to explore large data sets efficiently.

Here are some questions we can try to answer:
* what are all of the subjects of courses (e.g. COSI, MATH, JAPN, PHIL, ...)?
* which terms are represented?
* how many instructors taught at Brandeis last year?
* what were the five largest course sections?
* what were the five largest courses (where we combine sections)?
* which are the five largest subjects measured by number of courses offered?
* which are the five largest courses measured by number of students taught?
* which course had the most sections taught in 20-21?
* who are the top five faculty in terms of number of students taught?
* etc.
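
Two of these questions (the largest sections, and the course with the most sections) can be sketched with `sorted` and `collections.Counter`. The block below is a minimal illustration, not the notebook's own solution: `sample_courses` is made-up data with the same field names as the real records, and `largest_sections` and `most_sections` are hypothetical helper names. To try it on the real data, pass `courses` instead.

```python
from collections import Counter

# Hypothetical sample records using the same field names as the data set.
sample_courses = [
    {"subject": "COSI", "coursenum": "10A", "section": "1", "enrolled": 120},
    {"subject": "COSI", "coursenum": "10A", "section": "2", "enrolled": 80},
    {"subject": "MATH", "coursenum": "10A", "section": "1", "enrolled": 30},
    {"subject": "MATH", "coursenum": "223A", "section": "1", "enrolled": 4},
]


def largest_sections(courses, n=5):
    """Return the n records with the highest enrollment, largest first."""
    return sorted(courses, key=lambda c: c["enrolled"], reverse=True)[:n]


def most_sections(courses, n=1):
    """Count records per (subject, coursenum) and return the n most common."""
    counts = Counter((c["subject"], c["coursenum"]) for c in courses)
    return counts.most_common(n)


top = largest_sections(sample_courses, n=2)
print([(c["subject"], c["coursenum"], c["section"], c["enrolled"]) for c in top])
print(most_sections(sample_courses))
```

Counting by the pair `(subject, coursenum)` rather than by `coursenum` alone keeps, say, COSI 10A and MATH 10A apart.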

In [48]:
# distinct subjects, in order of first appearance
subjects = list(dict.fromkeys(course['subject'] for course in courses))
print(subjects)

In [49]:
# distinct terms, in order of first appearance
terms = list(dict.fromkeys(course['term'] for course in courses))
print(terms)

In [50]:
# distinct instructors, in order of first appearance
# (instructors are tuples after the cleaning step, so they can be dict keys)
instructors = list(dict.fromkeys(course['instructor'] for course in courses))
print(instructors)

In [51]:
# Elizabeth 5i
# names of courses whose status mentions consent
consent = [course['name'] for course in courses if 'Consent' in course['status_text']]
print(consent[:10])

In [52]:
# Elizabeth 5i
# names of courses whose status does not mention consent
noconsent = [course['name'] for course in courses if 'Consent' not in course['status_text']]
print(noconsent[:10])

In [53]:
#Leora 5i - Number of distinct MATH instructors in total and in Fall 2020
# Assumption: term code '1203' (the term of course 1246 above) is Fall 2020;
# check the output of print(terms) before relying on it.
FALL_2020 = '1203'
math_courses = [course for course in courses if course['subject'] == 'MATH']
a = set(course['instructor'] for course in math_courses)
c = set(course['instructor'] for course in math_courses if course['term'] == FALL_2020)
print("There are: "+ str(len(a))+" Math Teachers in total")
print("There are: "+ str(len(c))+" Math Teachers teaching in Fall 2020")


In [54]:
#Aarthi 5c
import statistics
def medianSize():
    # enrollment of every course section
    coursesize = [course['enrolled'] for course in courses]
    # statistics.median sorts its input, so no explicit sort is needed
    print(statistics.median(coursesize))

medianSize()

In [55]:
#Aarthi 5d
def sortedTuples():
    # total enrollment per subject
    gather = dict()
    for course in courses:
        subject = course['subject']
        gather[subject] = gather.get(subject, 0) + course['enrolled']
    # ten largest totals, biggest first
    subjectData = sorted(gather.items(), key=lambda x: x[1], reverse=True)[:10]
    print(subjectData)

sortedTuples()
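
The same kind of per-key total can also be built with `collections.Counter`, whose `most_common` method returns the largest entries first. The block below is a sketch on made-up data: `sample_courses` and `enrollment_by` are hypothetical names, and only the field names come from the data set.

```python
from collections import Counter

# Hypothetical sample records using the same field names as the data set.
sample_courses = [
    {"subject": "COSI", "enrolled": 120},
    {"subject": "MATH", "enrolled": 30},
    {"subject": "COSI", "enrolled": 80},
    {"subject": "MATH", "enrolled": 4},
    {"subject": "PHIL", "enrolled": 25},
]


def enrollment_by(courses, field):
    """Sum 'enrolled' over all records that share the same value of `field`."""
    totals = Counter()
    for course in courses:
        totals[course[field]] += course["enrolled"]
    return totals


totals = enrollment_by(sample_courses, "subject")
print(totals.most_common(2))
```

Passing `'instructor'` instead of `'subject'` would total students per instructor in the same way, provided the instructor values are hashable (tuples rather than lists).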

In [56]:
#Aarthi 5e
#Leora helped figure this out. This was a team effort
def sortedTuples():
    # number of course sections per subject
    gather = dict()
    for course in courses:
        subject = course['subject']
        gather[subject] = gather.get(subject, 0) + 1
    # ten largest counts, biggest first
    subjectData = sorted(gather.items(), key=lambda x: x[1], reverse=True)[:10]
    print(subjectData)

sortedTuples()


In [57]:
#Aarthi 5f
def sortedTuples():
    # number of course sections per instructor
    gather = dict()
    for course in courses:
        instructor = course['instructor']
        gather[instructor] = gather.get(instructor, 0) + 1
    # ten largest counts, biggest first
    instructorData = sorted(gather.items(), key=lambda x: x[1], reverse=True)[:10]
    print(instructorData)

sortedTuples()


In [58]:
#Aarthi 5g
def sortedTuples():
    # total number of students taught per instructor
    gather = dict()
    for course in courses:
        instructor = course['instructor']
        gather[instructor] = gather.get(instructor, 0) + course['enrolled']
    # twenty largest totals, biggest first
    instructorData = sorted(gather.items(), key=lambda x: x[1], reverse=True)[:20]
    print(instructorData)
sortedTuples()

In [59]:
#Leora 5h
def topTwenty():
    # total enrollment per course, combining sections;
    # the key keeps the subject so the printed course numbers are unambiguous
    gather = dict()
    for course in courses:
        key = (course['subject'], course['coursenum'])
        gather[key] = gather.get(key, 0) + course['enrolled']
    # twenty largest totals, biggest first
    courseData = sorted(gather.items(), key=lambda x: x[1], reverse=True)[:20]
    print(courseData)
topTwenty()



In [60]:
#Aarthi 5i: Create Your Own Question
#What is the mean number of students taking COSI courses last year?

import statistics
def meanSize():
    # enrollment of every COSI course section
    coursesize = [course['enrolled'] for course in courses if course['subject'] == 'COSI']
    print(statistics.mean(coursesize))

meanSize()
