# Courses Demo
This Jupyter notebook explores the data set courses20-21.json,
which consists of all Brandeis courses in the 20-21 academic year (Fall20, Spr21, Sum21)
that had at least 1 student enrolled.

First we need to read the JSON file into a list of Python dictionaries.

In [1]:
import json
from schedule import *
from course_search import *

getting archived regdata from file


In [2]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)

## Structure of a course
Next we look at the fields of a course dictionary and their values.

In [3]:
print('there are',len(courses),'courses in the dataset')
print('here is the data for course 1246')
courses[1246]

there are 7813 courses in the dataset
here is the data for course 1246


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ['An', 'Huang', 'anhuang@brandeis.edu'],
 'coinstructors': [],
 'code': ['MATH', '223A'],
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}
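The `start` and `end` values in `times` look like minutes after midnight (1080 would be 18:00 and 1170 would be 19:30), but the data set itself does not say so. Under that assumption, a small helper (the name `minutes_to_clock` is ours, not part of the course code) could make meeting times readable:

```python
def minutes_to_clock(minutes):
    """Format minutes after midnight as HH:MM (assumed encoding, not confirmed by the data)."""
    hours, mins = divmod(minutes, 60)
    return f"{hours:02d}:{mins:02d}"

# Meeting block copied from the course record shown above
meeting = {'start': 1080, 'end': 1170, 'days': ['w', 'm']}
start = minutes_to_clock(meeting['start'])
end = minutes_to_clock(meeting['end'])
print(start, "-", end, "on", ", ".join(sorted(meeting['days'])))
```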

## Cleaning the data
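If we want to collect instructors or codes in a set, or use them as dictionary keys, we need to replace the lists with tuples. Tuples are immutable sequences and therefore hashable; lists are not. (Sorting by instructor or by code works with either type.)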

In [4]:
for course in courses:
    course['instructor'] = tuple(course['instructor'])
    course['coinstructors'] = tuple(tuple(f) for f in course['coinstructors'])
    course['code'] = tuple(course['code'])
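One practical effect of this conversion is that the values become hashable, which the set comprehensions later in the notebook rely on. A minimal sketch of the difference, using the instructor entry from the record above:

```python
instructor_list = ['An', 'Huang', 'anhuang@brandeis.edu']

# A list cannot be placed in a set: Python raises TypeError (unhashable type)
try:
    {instructor_list}
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# The tuple version works, and duplicates collapse to a single element
instructor_set = {tuple(instructor_list), tuple(instructor_list)}
print(list_is_hashable, len(instructor_set))
```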

In [5]:
print('notice that the instructor and code are tuples now')
courses[1246]

notice that the instructor and code are tuples now


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ('An', 'Huang', 'anhuang@brandeis.edu'),
 'coinstructors': (),
 'code': ('MATH', '223A'),
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}

# Exploring the data set
Now we will show how to use plain Python to explore the data set and answer some interesting questions. Next week we will start learning Pandas and NumPy, packages that make it easier to explore large data sets efficiently.

Here are some questions we can try to answer:
* what are all of the course subjects (e.g. COSI, MATH, JAPN, PHIL, ...)?
* which terms are represented?
* how many instructors taught at Brandeis last year?
* what were the five largest course sections?
* what were the five largest courses (where we combine sections)?
* which are the five largest subjects measured by number of courses offered?
* which are the five largest courses measured by number of students taught?
* which course had the most sections taught in 20-21?
* who are the top five faculty in terms of number of students taught?
* etc.
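As one example, the question about the largest subjects by number of courses offered could be approached with `collections.Counter`. The sketch below uses made-up sample records rather than the real data, and it assumes that "courses offered" means distinct course codes, so multiple sections of one course count once:

```python
from collections import Counter

# Made-up sample records; codes are tuples, as in the cleaned data
sample_courses = [
    {'subject': 'COSI', 'code': ('COSI', '10A'), 'section': '1'},
    {'subject': 'COSI', 'code': ('COSI', '10A'), 'section': '2'},
    {'subject': 'COSI', 'code': ('COSI', '12B'), 'section': '1'},
    {'subject': 'MATH', 'code': ('MATH', '10A'), 'section': '1'},
]

# Collapse sections into distinct codes, then count codes per subject
distinct_codes = {c['code'] for c in sample_courses}
courses_per_subject = Counter(subject for subject, _ in distinct_codes)
print(courses_per_subject.most_common(5))
```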

# Which terms are represented?

In [6]:
terms = {c['term'] for c in courses}
print(terms)

{'1203', '1211', '1212'}


# What are all the subjects?

In [7]:
subjects = {c['subject'] for c in courses}
print("There are " + str(len(subjects)) + " subjects. They are:")
print(subjects)

There are 120 subjects. They are:
{'AMST', 'RUCD', 'EAS', 'ENVS', 'AAPI/HIS', 'POL', 'BUS', 'CAST', 'ECON/FIN', 'BUS/FIN', 'KOR', 'ANTH/WGS', 'SJSP', 'HISP', 'CBIO', 'ECON', 'IMES', 'CHSC', 'HS/POL', 'HUM/UWS', 'RECS', 'YDSH', 'FREN', 'ESL', 'RIAS', 'HIST', 'HRNS', 'HBRW', 'RCOM', 'GECS', 'HS', 'ITAL', 'LAT', 'AAPI/WGS', 'RDFT', 'JOUR', 'LGLS', 'AMST/MUS', 'COMP', 'IGS', 'BUS/ECON', 'CHIN', 'GRK', 'HIST/SOC', 'RPJM', 'PSYC', 'JAPN', 'LING', 'NPSY', 'CLAS/ENG', 'RDMD', 'IIM', 'BIOT', 'RMGT', 'PHIL', 'AMST/ENG', 'HUM', 'NEJS', 'BCBP', 'ECS', 'POL/WGS', 'RHIN', 'REL', 'BIPH', 'HOID', 'EL', 'THA', 'ENG', 'CA', 'MUS', 'AAAS', 'ARBC', 'BISC', 'RSAN', 'BIOL', 'CHEM', 'SOC', 'ECS/ENG', 'INT', 'FIN', 'HIST/WGS', 'EBIO', 'ANTH', 'UWS', 'COSI', 'NEUR', 'PMED', 'CLAS', 'RECS/THA', 'AAAS/HIS', 'MATH', 'AAAS/WGS', 'PAX', 'NBIO', 'PHYS', 'WGS', 'ED', 'CLAS/NEJ', 'BCHM', 'LALS', 'MERS', 'FILM', 'COMH', 'GER', 'QBIO', 'HSSP', 'AAS/AAPI', 'QR', 'HWL', 'SAS', 'RBIF', 'GS', 'FA', 'RUS', 'BIBC', 'COML', 'A

# 5.a: How many instructors taught at Brandeis last year?

In [8]:
# I'm not sure which terms are considered to be last year, so this includes all terms
instructors = {c['instructor'] for c in courses}
print("There are " + str(len(instructors)) + " instructors.")

There are 904 instructors.


In [9]:
instructors = {c['instructor'] for c in courses if c['enrolled']>=10}
print("There are " + str(len(instructors)) + " instructors that taught a class with at least 10 students.")

There are 652 instructors that taught a class with at least 10 students.
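A related question from the list above, the top faculty by number of students taught, could be sketched with a dictionary keyed on the instructor tuple. The records below are made up for illustration, and the sketch credits each section only to its primary instructor, ignoring coinstructors:

```python
from collections import defaultdict

# Made-up sample records; instructor tuples follow the (first, last, email) layout
sample_courses = [
    {'instructor': ('Ada', 'Example', 'ada@example.edu'), 'enrolled': 30},
    {'instructor': ('Bo', 'Sample', 'bo@example.edu'), 'enrolled': 45},
    {'instructor': ('Ada', 'Example', 'ada@example.edu'), 'enrolled': 25},
]

# Sum enrollment per instructor; tuples are hashable, so they work as keys
students_taught = defaultdict(int)
for c in sample_courses:
    students_taught[c['instructor']] += c['enrolled']

top_faculty = sorted(students_taught.items(), key=lambda item: item[1], reverse=True)[:5]
for (first, last, email), total in top_faculty:
    print(first, last, total)
```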


# What are the 5 largest course sections?

In [10]:
largest_courses = sorted(courses, key = lambda course: course['enrolled'], reverse=True)
[(course['enrolled'], course['name']) for course in largest_courses[:5]]

[(784, 'Introduction to Navigating Health and Safety'),
 (186, 'Organic Chemistry I'),
 (186, 'Physiology'),
 (181, 'Cells and Organisms'),
 (180, 'Organic Chemistry II')]
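Sorting the whole list works, but only the first five entries are used. As an alternative sketch, `heapq.nlargest` from the standard library selects the top entries directly. The sample records here are made up:

```python
import heapq

# Made-up sample sections
sample_sections = [
    {'name': 'Section A', 'enrolled': 12},
    {'name': 'Section B', 'enrolled': 186},
    {'name': 'Section C', 'enrolled': 40},
    {'name': 'Section D', 'enrolled': 181},
]

# Pick the two largest sections without sorting the full list
top_two = heapq.nlargest(2, sample_sections, key=lambda c: c['enrolled'])
top_two_summary = [(c['enrolled'], c['name']) for c in top_two]
print(top_two_summary)
```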

# 5.b: What was the total number of students taking COSI courses last year?

In [11]:
cosi_students_nums = [c['enrolled'] for c in courses if c['subject'] == 'COSI']
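# Use the built-in sum() instead of a variable named `sum`, which would shadow it.
# These are enrollments: a student in two COSI sections is counted twice.
total = sum(cosi_students_nums)
print("There were " + str(total) + " students taking COSI last year.")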

There were 2223 students taking COSI last year.


# 5.c: What was the median size of a COSI course last year (counting only those courses with at least 10 students)?

In [12]:
# Keep COSI sections with at least 10 students, sorted by enrollment
cosi_courses = [c for c in courses if c['subject'] == 'COSI' and c['enrolled'] >= 10]
sortedCC = sorted(cosi_courses, key=lambda course: course['enrolled'])
midIndex = len(sortedCC) // 2
if len(sortedCC) % 2 != 0:
    median = sortedCC[midIndex]['enrolled']
else:
    # Even count: average the two middle values, at indices midIndex-1 and midIndex
    a = sortedCC[midIndex - 1]['enrolled']
    b = sortedCC[midIndex]['enrolled']
    median = (a + b) / 2
print("The median size of a COSI course last year was " + str(median) + ".")

The median size of a COSI course last year was 37.
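A hand-written median can be cross-checked against `statistics.median` from the standard library, which handles both odd and even counts. A minimal sketch with made-up enrollment numbers:

```python
import statistics

# Made-up enrollment counts
odd_sizes = [10, 12, 37, 40, 55]
even_sizes = [10, 12, 37, 40, 55, 80]

# Odd count: the middle value; even count: the mean of the two middle values
odd_median = statistics.median(odd_sizes)
even_median = statistics.median(even_sizes)
print(odd_median, even_median)
```

On the real data, the same call would be `statistics.median(c['enrolled'] for c in cosi_courses)`.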


# 5.h: List the top 20 courses in terms of number of students taking that course across semesters and sections

In [54]:
tops = {}
for c in courses:
    name = c['name']
    enrolled = c['enrolled']
    # Add this section's enrollment to the running total for the course name
    tops[name] = tops.get(name, 0) + enrolled
sorted_tops = sorted(tops.items(), key=lambda x: x[1], reverse=True)
# Print the 20 largest totals, numbered from 1
for count, (name, total) in enumerate(sorted_tops[:20], start=1):
    print(str(count) + ". " + name + " - " + str(total))

1. Navigating Health and Safety - 940
2. Introduction to Navigating Health and Safety - 879
3. General Biology Laboratory - 536
4. Dissertation Research - 381
5. Genetics and Genomics - 358
6. Introduction to Problem Solving in Python - 343
7. Introduction to Psychology - 336
8. Cells and Organisms - 287
9. Techniques of Calculus (a) - 280
10. Financial Accounting - 247
11. Organic Chemistry Laboratory I - 245
12. Organic Chemistry Laboratory II - 239
13. Organic Chemistry I - 236
14. Statistics - 231
15. Organic Chemistry II - 226
16. Advanced Programming Techniques in Java - 225
17. General Chemistry Laboratory I - 208
18. Senior Research - 207
19. Introduction to Microeconomics - 207
20. Applied Linear Algebra - 204
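Grouping by name merges any courses that share a title, which may be what happens with generic titles such as "Dissertation Research" or "Senior Research" offered by several departments. A possible variant groups by the `code` tuple instead. The records below are made up to show the difference:

```python
from collections import Counter

# Made-up sample: two departments offer a course with the same title
sample_courses = [
    {'code': ('BIOL', '99A'), 'name': 'Senior Research', 'enrolled': 40},
    {'code': ('CHEM', '99A'), 'name': 'Senior Research', 'enrolled': 25},
    {'code': ('BIOL', '99A'), 'name': 'Senior Research', 'enrolled': 10},
]

by_name = Counter()
by_code = Counter()
for c in sample_courses:
    by_name[c['name']] += c['enrolled']   # merges both departments
    by_code[c['code']] += c['enrolled']   # keeps them separate

print(by_name.most_common(20))
print(by_code.most_common(20))
```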


# 6.a,b,c: Showing the title, description and custom filter

In [14]:
s = Schedule(courses)
#print(s.courses[:3])
titles = s.title("computer")
descriptions = s.description("human")
ind_studies = s.independent_study_filter(True)
print(titles.courses[0]['name'])
print("-----")
print(descriptions.courses[0]['description'])
print("-----")
print(ind_studies.courses[0]['name'] + " - " + str(ind_studies.courses[0]['independent_study']))

Computer Simulations and Risk Assessment
-----
Meets for one-half semester and yields half-course credit. 

Examines the relation between democracy and development geared for development practitioners and policy-makers. Students will discuss if democracy is essential for sustainable development and, if so, what kinds of democracy should be promoted in developing countries. The major critiques of aid and development theory rooted in secular democracy, free-market economies, and human rights will be explored. Usually offered every year.
Rajesh Sampath
-----
Readings in Jewish Professional Leadership - True
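The `Schedule` class comes from the `schedule` module, whose source is not shown in this notebook. The sketch below is a hypothetical stand-in (named `MiniSchedule` to avoid confusion) that is merely consistent with the calls above: each filter returns a new object with a `courses` list. Case-insensitive substring matching is an assumption.

```python
class MiniSchedule:
    """Hypothetical stand-in for Schedule; not the real implementation."""

    def __init__(self, courses):
        self.courses = list(courses)

    def _keep(self, predicate):
        # Return a new MiniSchedule so that filters can be chained
        return MiniSchedule(c for c in self.courses if predicate(c))

    def title(self, phrase):
        return self._keep(lambda c: phrase.lower() in c['name'].lower())

    def description(self, phrase):
        return self._keep(lambda c: phrase.lower() in c['description'].lower())

    def independent_study_filter(self, flag):
        return self._keep(lambda c: c['independent_study'] == flag)


# Made-up sample records
sample = [
    {'name': 'Computer Basics', 'description': 'For human users.', 'independent_study': False},
    {'name': 'Readings', 'description': 'Directed reading.', 'independent_study': True},
]
s = MiniSchedule(sample)
matches = s.title("computer").description("human")
print([c['name'] for c in matches.courses])
```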
