# Courses Demo
This Jupyter notebook is for exploring the data set courses20-21.json
which consists of all Brandeis courses in the 20-21 academic year (Fall20, Spr21, Sum21) 
which had at least 1 student enrolled.

First we need to read the json file into a list of Python dictionaries

In [1]:
import json
from schedule import *
from course_search import *

getting archived regdata from file


In [2]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)

## Structure of a course
Next we look at the fields of each course dictionary and their values

In [3]:
print('there are',len(courses),'courses in the dataset')
print('here is the data for course 1246')
courses[1246]

there are 7813 courses in the dataset
here is the data for course 1246


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ['An', 'Huang', 'anhuang@brandeis.edu'],
 'coinstructors': [],
 'code': ['MATH', '223A'],
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}

## Cleaning the data
If we want to sort courses by instructor or by code, we need to replace the lists with tuples (which are immutable lists)

In [4]:
for course in courses:
        course['instructor'] = tuple(course['instructor'])
        course['coinstructors'] = tuple([tuple(f) for f in course['coinstructors']])
        course['code']= tuple(course['code'])

In [5]:
print('notice that the instructor and code are tuples now')
courses[1246]

notice that the instructor and code are tuples now


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ('An', 'Huang', 'anhuang@brandeis.edu'),
 'coinstructors': (),
 'code': ('MATH', '223A'),
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}

# Which terms are represented?

In [6]:
terms = {c['term'] for c in courses}
print(terms)

{'1211', '1212', '1203'}


# What are all the subjects

In [7]:
subjects = {c['subject'] for c in courses}
print("There are " + str(len(subjects)) + " subjects. They are:")
print(subjects)

There are 120 subjects. They are:
{'ECON', 'THA', 'PAX', 'CLAS/NEJ', 'ENG', 'YDSH', 'AAPI/HIS', 'HS', 'BIOL', 'JAPN', 'NPSY', 'CA', 'HUM/UWS', 'HS/POL', 'QR', 'POL/WGS', 'HUM', 'GER', 'RHIN', 'IIM', 'LAT', 'MATH', 'MERS', 'BUS/FIN', 'QBIO', 'RECS/THA', 'RDFT', 'LGLS', 'CHSC', 'WGS', 'RSAN', 'ECON/FIN', 'UWS', 'BCBP', 'INT', 'HSSP', 'FILM', 'BCHM', 'SAS', 'PHYS', 'ECS/ENG', 'BUS', 'REL', 'NEUR', 'AMST/MUS', 'BIOT', 'RIDT', 'RMGT', 'HIST', 'ED', 'CAST', 'RBOT', 'SJSP', 'AAPI', 'POL', 'CHEM', 'COSI', 'JOUR', 'HIST/SOC', 'ENVS', 'IGS', 'RDMD', 'ITAL', 'ANTH', 'BIPH', 'RUS', 'RUCD', 'FIN', 'GS', 'NBIO', 'AAPI/WGS', 'AAAS/WGS', 'BUS/ECON', 'RIAS', 'ANTH/WGS', 'MUS', 'GECS', 'AAAS', 'HISP', 'AAAS/HIS', 'HBRW', 'RPJM', 'EL', 'HIST/WGS', 'AMST', 'EBIO', 'RBIF', 'LALS', 'ESL', 'CBIO', 'HRNS', 'KOR', 'RSEG', 'GRK', 'FA', 'AMST/ENG', 'RECS', 'AAS/AAPI', 'SOC', 'HWL', 'CLAS/ENG', 'BISC', 'CLAS', 'PHIL', 'CHIN', 'COML', 'ARBC', 'LING', 'EAS', 'PSYC', 'RCOM', 'IMES', 'COMP', 'COMH', 'HOID', 'ECS', 'P

# 5.a: How many instructors taught at Brandeis last year?

In [8]:
# I'm not sure which terms are considered to be last year, so this includes all terms
instructors = {c['instructor'] for c in courses}
print("There are " + str(len(instructors)) + " instuctors.")

There are 904 instuctors.


In [9]:
instructors = {c['instructor'] for c in courses if c['enrolled']>=10}
print("There are " + str(len(instructors)) + " instuctors that taught a class with at least 10 students.")

There are 652 instuctors that taught a class with at least 10 students.


# What are the 5 largest course sections?

In [10]:
largest_courses = sorted(courses, key = lambda course: course['enrolled'], reverse=True)
[(course['enrolled'], course['name']) for course in largest_courses[:5]]

[(784, 'Introduction to Navigating Health and Safety'),
 (186, 'Organic Chemistry I'),
 (186, 'Physiology'),
 (181, 'Cells and Organisms'),
 (180, 'Organic Chemistry II')]

# 5.b: What is the total number of students taking COSI courses last year?

In [11]:
cosi_students=[course['enrolled'] for course in courses if course['subject']=="COSI"]
print("There were "+str(sum(cosi_students))+" students who took COSI courses.")

There were 2223 students who took COSI courses.


# 5.c: What was the median size of a COSI course last year (counting only those courses with at least 10 students)

In [12]:
cosi_enrolled=sorted([course['enrolled'] for course in courses if course['subject']=="COSI" and course['enrolled']>=10])
median = (cosi_enrolled[len(cosi_enrolled)//2-1]/2.0+cosi_enrolled[len(cosi_enrolled)//2]/2.0, (cosi_enrolled)[len(cosi_enrolled)//2])[len(cosi_enrolled) % 2]
print("The median size of a COSI course last year was " + str(median) + ".")

The median size of a COSI course last year was 37.


# 5.d: Create a list of tuples (E,S) where S is a subject and E is the number of students enrolled in courses in that subject, sort it and print the top 10. This shows the top 10 subjects in terms of number of students taught.


In [13]:
def num_enrollments(subject):
    enroll = [course['enrolled'] for course in courses if course['subject']==subject]
    return sum(enroll)

subject_enrollments = sorted([(num_enrollments(subject),subject) for subject in subjects],reverse=True)[:10]
print("Top 10 subjects in terms of number of students taught:\n",[sub[1] for sub in subject_enrollments],"\n---------\n","More info (# of students,subject):\n", subject_enrollments)

Top 10 subjects in terms of number of students taught:
 ['HS', 'BIOL', 'BUS', 'HWL', 'CHEM', 'ECON', 'COSI', 'MATH', 'PSYC', 'ANTH'] 
---------
 More info (# of students,subject):
 [(5318, 'HS'), (3085, 'BIOL'), (2766, 'BUS'), (2734, 'HWL'), (2322, 'CHEM'), (2315, 'ECON'), (2223, 'COSI'), (1785, 'MATH'), (1704, 'PSYC'), (1144, 'ANTH')]


# 5.e: Do the same as in (d) but print the top 10 subjects in terms of number of courses offered

In [14]:
def courses_offered(subject):
    return len([c for c in courses if c['subject']==subject])
subject_courses = sorted([(courses_offered(subject),subject) for subject in subjects],reverse=True)[:10]
print("Top 10 subjects in terms of number of courses offered:\n",[sub[1] for sub in subject_courses],"\n---------\n","More info (# of courses,subject):\n", subject_courses)


Top 10 subjects in terms of number of courses offered:
 ['BIOL', 'HIST', 'PSYC', 'NEUR', 'BCHM', 'PHYS', 'HS', 'COSI', 'MUS', 'ENG'] 
---------
 More info (# of courses,subject):
 [(613, 'BIOL'), (498, 'HIST'), (417, 'PSYC'), (403, 'NEUR'), (296, 'BCHM'), (288, 'PHYS'), (274, 'HS'), (272, 'COSI'), (266, 'MUS'), (265, 'ENG')]


# 5.f: Do the same as (d) but print the top 10 subjects in terms of number of faculty teaching courses in that subject

In [15]:
def num_faculty(subject):
    return len({course['instructor'] for course in courses if course['subject']==subject})
faculty_subject = sorted([(num_faculty(subject),subject) for subject in subjects],reverse=True)[:10]
print("Top 10 subjects in terms of number of faculty teaching courses:\n",[sub[1] for sub in faculty_subject],"\n---------\n","More info (# of faculty,subject):\n", faculty_subject)
# print(sorted(faculty_subject,reverse=True)[:10])

Top 10 subjects in terms of number of faculty teaching courses:
 ['HS', 'BIOL', 'ECON', 'BCHM', 'HIST', 'BUS', 'BCBP', 'HWL', 'NEJS', 'MATH'] 
---------
 More info (# of faculty,subject):
 [(87, 'HS'), (67, 'BIOL'), (52, 'ECON'), (49, 'BCHM'), (47, 'HIST'), (47, 'BUS'), (46, 'BCBP'), (42, 'HWL'), (37, 'NEJS'), (37, 'MATH')]


# 5.g: List the top 20 faculty in terms of number of students they taught

In [16]:
def students_taught(instructor):
    return sum({course['enrolled'] for course in courses if course['instructor']==instructor})
top_facu=sorted([(students_taught(instructor),instructor) for instructor in instructors],reverse=True)[:10]
print("Top 20 faculty in terms of number of students they taught:\n",[sub[1] for sub in top_facu],"\n---------\n","More info (# of enrollment,faculty):\n",top_facu)

Top 20 faculty in terms of number of students they taught:
 [('Leah', 'Berkenwald', 'leahb@brandeis.edu'), ('Stephanie', 'Murray', 'murray@brandeis.edu'), ('Maria', 'de Boef Miara', 'mmiara@brandeis.edu'), ('Milos', 'Dolnik', 'dolnik@brandeis.edu'), ('Timothy J', 'Hickey', 'tjhickey@brandeis.edu'), ('Bryan', 'Ingoglia', 'ingoglia@brandeis.edu'), ('Claudia', 'Novack', 'novack@brandeis.edu'), ('Antonella', 'DiLillo', 'dilant@brandeis.edu'), ('Rachel V.E.', 'Woodruff', 'woodruff@brandeis.edu'), ('Daniel', 'Breen', 'dbreen91@brandeis.edu')] 
---------
 More info (# of enrollment,faculty):
 [(926, ('Leah', 'Berkenwald', 'leahb@brandeis.edu')), (515, ('Stephanie', 'Murray', 'murray@brandeis.edu')), (450, ('Maria', 'de Boef Miara', 'mmiara@brandeis.edu')), (403, ('Milos', 'Dolnik', 'dolnik@brandeis.edu')), (401, ('Timothy J', 'Hickey', 'tjhickey@brandeis.edu')), (388, ('Bryan', 'Ingoglia', 'ingoglia@brandeis.edu')), (355, ('Claudia', 'Novack', 'novack@brandeis.edu')), (341, ('Antonella', 'DiL

# 5.h: List the top 20 courses in terms of number of students taking that course across semesters and sections

In [17]:
tops = {}
for c in courses:
    name = c['name']
    enrolled = c['enrolled']
    if name in tops:
        tops[name] = tops[name] + enrolled
    else:
        tops[name] = enrolled
sorted_tops = [(k,v) for k, v in sorted(tops.items(), key=lambda x: x[1], reverse=True)]
count = 1
for course in sorted_tops:
    if (count > 20):
        break
    print(str(count) + ". " + course[0] + " - " + str(course[1]))
    count += 1

1. Navigating Health and Safety - 940
2. Introduction to Navigating Health and Safety - 879
3. General Biology Laboratory - 536
4. Dissertation Research - 381
5. Genetics and Genomics - 358
6. Introduction to Problem Solving in Python - 343
7. Introduction to Psychology - 336
8. Cells and Organisms - 287
9. Techniques of Calculus (a) - 280
10. Financial Accounting - 247
11. Organic Chemistry Laboratory I - 245
12. Organic Chemistry Laboratory II - 239
13. Organic Chemistry I - 236
14. Statistics - 231
15. Organic Chemistry II - 226
16. Advanced Programming Techniques in Java - 225
17. General Chemistry Laboratory I - 208
18. Senior Research - 207
19. Introduction to Microeconomics - 207
20. Applied Linear Algebra - 204


# 5.i Creative Questions

Angelo's question - What is the average amount of students in a COSI courses that have at least 1 student?

In [18]:
cosi_enrollments = [c['enrolled'] for c in courses if c['subject'] == "COSI" if c['enrolled'] > 0]
sum3 = 0
for val in cosi_enrollments:
    sum3 += val
avg = sum3 // len(cosi_enrollments)
print("The average amount of students in a COSI course that has students is " + str(avg) + "!")

The average amount of students in a COSI course that has students is 18!



Su Lei's question - What percentage of available courses required instrucutor's signature to enroll?

In [19]:
sign_required=[course for course in courses if "Instructor's Signature Required".lower() in course['details'].lower()]
percentage = len(sign_required)/len(courses)
print("{:.2f}".format(percentage),"% of the courses required instrutor's signature.")

0.38 % of the courses required instrutor's signature.


# 6.a,b,c: Showing the title, description and custom filter

In [20]:
s = Schedule(courses)
#print(s.courses[:3])
titles = s.title("computer")
descriptions = s.description("human")
ind_studies = s.independent_study_filter(True)
print(titles.courses[0]['name'])
print("-----")
print(descriptions.courses[0]['description'])
print("-----")
print(ind_studies.courses[0]['name'] + " - " + str(ind_studies.courses[0]['independent_study']))

Computer Simulations and Risk Assessment
-----
Meets for one-half semester and yields half-course credit. 

Examines the relation between democracy and development geared for development practitioners and policy-makers. Students will discuss if democracy is essential for sustainable development and, if so, what kinds of democracy should be promoted in developing countries. The major critiques of aid and development theory rooted in secular democracy, free-market economies, and human rights will be explored. Usually offered every year.
Rajesh Sampath
-----
Readings in Jewish Professional Leadership - True


In [21]:
atest = s.code("223A")
stest = s.code("MATH")
print(atest.courses[0])
print("-----")
print(stest.courses[0])
print("-----")

{'limit': None, 'times': [], 'enrolled': 0, 'details': "Instructor's Signature Required.", 'type': 'section', 'status_text': 'Open Consent Req.', 'section': '1', 'waiting': 0, 'instructor': ('Janet', 'McIntosh', 'janetmc@brandeis.edu'), 'coinstructors': (), 'code': ('ANTH', '223A'), 'subject': 'ANTH', 'coursenum': '223A', 'name': 'Readings and Research in Anthropology', 'independent_study': True, 'term': '1203', 'description': 'Janet McIntosh'}
-----
-----
