# Courses Demo
This Jupyter notebook is for exploring the data set courses20-21.json
which consists of all Brandeis courses in the 20-21 academic year (Fall20, Spr21, Sum21) 
which had at least 1 student enrolled.

First we need to read the JSON file into a list of Python dictionaries.

In [1]:
import json

In [2]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)


## Structure of a course
Next we look at the fields of each course dictionary and their values

print('there are',len(courses),'courses in the dataset')
print('here is the data for course 1246')
courses[1246]

## Cleaning the data
If we want to group or count courses by instructor or by code (using those values as dictionary keys or set elements), we need to replace the lists with tuples, which are immutable and therefore hashable.

In [3]:
for course in courses:
    course['instructor'] = tuple(course['instructor'])
    course['coinstructors'] = tuple(tuple(f) for f in course['coinstructors'])
    course['code'] = tuple(course['code'])
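
A short illustration of why the conversion helps, using a made-up record (the name and email below are hypothetical, not taken from the data set): a list cannot be placed in a set or used as a dictionary key, while the equivalent tuple can.

```python
# Hypothetical record shaped like an entry in courses20-21.json
sample = {'instructor': ['Ada', 'Lovelace', 'ada@example.edu']}

try:
    {sample['instructor']}            # lists are mutable, hence unhashable
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# After converting to a tuple the value can live in a set (or be a dict key)
sample['instructor'] = tuple(sample['instructor'])
instructors = {sample['instructor']}

print(list_is_hashable, instructors)
```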

In [4]:
print('notice that the instructor and code are tuples now')
courses[1246]

notice that the instructor and code are tuples now


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ('An', 'Huang', 'anhuang@brandeis.edu'),
 'coinstructors': (),
 'code': ('MATH', '223A'),
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}

# Exploring the data set
Now we will show how to use plain Python to explore the data set and answer some interesting questions. Next week we will start learning Pandas and NumPy, which are packages that make it easier to explore large data sets efficiently.

Here are some questions we can try to answer:
* what are all of the subjects of courses (e.g. COSI, MATH, JAPN, PHIL, ...)
* which terms are represented?
* how many instructors taught at Brandeis last year?
* what were the five largest course sections?
* what were the five largest courses (where we combine sections)?
* which are the five largest subjects measured by number of courses offered?
* which are the five largest courses measured by number of students taught?
* which course had the most sections taught in 20-21?
* who are the top five faculty in terms of number of students taught?
* etc.
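
Most of these questions reduce to two patterns: collecting distinct values in a set, and sorting by a key and slicing. The sketch below shows both on a hypothetical four-record sample that only borrows the field names from the real data; the values are invented for illustration.

```python
# Hypothetical miniature data set with the same field names as the real one
sample_courses = [
    {'subject': 'COSI', 'coursenum': '10A', 'term': '1203', 'enrolled': 150},
    {'subject': 'MATH', 'coursenum': '10A', 'term': '1203', 'enrolled': 40},
    {'subject': 'COSI', 'coursenum': '12B', 'term': '1211', 'enrolled': 96},
    {'subject': 'PHIL', 'coursenum': '1A', 'term': '1211', 'enrolled': 25},
]

# Distinct values: collect them in a set, then sort for a stable display order
subjects = sorted({c['subject'] for c in sample_courses})
terms = sorted({c['term'] for c in sample_courses})

# Largest sections: sort by enrollment in descending order and slice
largest = sorted(sample_courses, key=lambda c: c['enrolled'], reverse=True)[:2]
largest_codes = [(c['subject'], c['coursenum']) for c in largest]

print(subjects, terms, largest_codes)
```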

In [5]:
[course for course in courses if 'Java' in course['name']]

[{'limit': 80,
  'times': [{'start': 600, 'days': ['m', 'w'], 'end': 690},
   {'start': 630, 'end': 720, 'type': 'Recitiation', 'days': ['f']}],
  'enrolled': 43,
  'details': 'See Course Catalog for prerequisites.\nOpen to graduate students in the post-baccalaureate computer science program.\nInstruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
  'type': 'section',
  'status_text': 'Open',
  'section': '1',
  'waiting': 0,
  'instructor': ('Antonella', 'DiLillo', 'dilant@brandeis.edu'),
  'coinstructors': (),
  'code': ('COSI', '12B'),
  'subject': 'COSI',
  'coursenum': '12B',
  'name': 'Advanced Programming Techniques in Java',
  'independent_study': False,
  'term': '1211',
  'description': 'Prerequisite: COSI 10a or successful completion of the COSI online placement exam.\n\nStudies advanced programming concepts and techniques utilizing the Java programming language. The course covers software engineer

Weidong 5.a:
* how many faculty taught COSI courses last year?

In [6]:
len({c['instructor'] for c in courses if c['subject']=='COSI'})

27

Weidong 5.b
* what is the total number of students taking COSI courses last year?

In [7]:
sum([c['enrolled'] for c in courses if c['subject']=='COSI'])

2223

Jingqian 5.c
* what was the median size of a COSI course last year (counting only those courses with at least 10 students)

In [8]:
course_size = [course for course in courses if course['enrolled'] >= 10 and course['subject'] == 'COSI']
course_size_sorted = sorted(course_size, key = lambda x : x['enrolled'])
length = len(course_size_sorted)
if (length % 2 == 0):
    median_size = (course_size_sorted[length//2]['enrolled'] + course_size_sorted[length//2 - 1]['enrolled'])/2
else:
    median_size = course_size_sorted[length//2]['enrolled'] 
int(median_size)

37
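
The standard library's `statistics.median` handles the odd and even cases in one call. A minimal sketch on hypothetical enrollment numbers (in the notebook the list would be built from `courses` with the same filter as above):

```python
from statistics import median

# Hypothetical enrollment numbers, not taken from the data set
sizes_odd = [12, 48, 37, 94, 20]
sizes_even = [12, 48, 37, 94]

median_odd = median(sizes_odd)     # middle value of the sorted list
median_even = median(sizes_even)   # mean of the two middle values

print(median_odd, median_even)
```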


Jingqian 5.d
* create a list of tuples (S,E) where S is a subject and E is the number of students enrolled in courses in that subject, sort it by E and print the top 10. This shows the top 10 subjects in terms of number of students taught.

In [9]:
subjects = {course['subject'] for course in courses}
tuples = []
for subject in subjects:
    number = 0
    for course in courses:
        if course['subject'] == subject:
            number += course['enrolled']
    tuples.append((subject, number))
tuples_sorted = sorted(tuples, key = lambda x : -x[1])
tuples_sorted[:10]

[('HS', 5318),
 ('BIOL', 3085),
 ('BUS', 2766),
 ('HWL', 2734),
 ('CHEM', 2322),
 ('ECON', 2315),
 ('COSI', 2223),
 ('MATH', 1785),
 ('PSYC', 1704),
 ('ANTH', 1144)]
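
The nested loop above scans the whole course list once per subject. As a sketch of an alternative, `collections.Counter` can accumulate the totals in a single pass, and its `most_common` method returns the largest entries already sorted. The sample records are hypothetical.

```python
from collections import Counter

# Hypothetical records; only the field names match the real data
sample_courses = [
    {'subject': 'HS', 'enrolled': 30},
    {'subject': 'BIOL', 'enrolled': 120},
    {'subject': 'HS', 'enrolled': 45},
    {'subject': 'COSI', 'enrolled': 96},
    {'subject': 'BIOL', 'enrolled': 15},
]

totals = Counter()
for c in sample_courses:
    totals[c['subject']] += c['enrolled']   # missing keys start at 0

top_two = totals.most_common(2)
print(top_two)
```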

Katherine 5.g
* list the top 20 faculty in terms of number of students they taught

In [10]:
all_instructors = {course['instructor']: 0 for course in courses}
for course in courses:
    all_instructors[course['instructor']] = all_instructors[course['instructor']] + int(course['enrolled'])
by_student_size = [instructors for instructors, enrolled in sorted(all_instructors.items(), key=lambda item: -item[1])]
top_twenty = by_student_size[:20]
print(top_twenty)


[('Leah', 'Berkenwald', 'leahb@brandeis.edu'), ('Kene Nathan', 'Piasta', 'kpiasta@brandeis.edu'), ('Stephanie', 'Murray', 'murray@brandeis.edu'), ('Milos', 'Dolnik', 'dolnik@brandeis.edu'), ('Maria', 'de Boef Miara', 'mmiara@brandeis.edu'), ('Bryan', 'Ingoglia', 'ingoglia@brandeis.edu'), ('Rachel V.E.', 'Woodruff', 'woodruff@brandeis.edu'), ('Timothy J', 'Hickey', 'tjhickey@brandeis.edu'), ('Daniel', 'Breen', 'dbreen91@brandeis.edu'), ('Melissa', 'Kosinski-Collins', 'kosinski@brandeis.edu'), ('Claudia', 'Novack', 'novack@brandeis.edu'), ('Antonella', 'DiLillo', 'dilant@brandeis.edu'), ('Jon', 'Chilingerian', 'chilinge@brandeis.edu'), ('Ahmad', 'Namini', 'anamini@brandeis.edu'), ('Iraklis', 'Tsekourakis', 'tsekourakis@brandeis.edu'), ('Geoffrey', 'Clarke', 'geoffclarke@brandeis.edu'), ('Peter', 'Mistark', 'pmistark@brandeis.edu'), ('Brenda', 'Anderson', 'banders@brandeis.edu'), ('Colleen', 'Hitchcock', 'hitchcock@brandeis.edu'), ('Scott A.', 'Redenius', 'redenius@brandeis.edu')]


Katherine 5.h
* list the top 20 courses in terms of number of students taking that course

In [11]:
all_courses = {str(course['subject'] + course['coursenum']): 0 for course in courses}
for course in courses:
    all_courses[str(course['subject'] + course['coursenum'])] = all_courses[str(course['subject'] + course['coursenum'])] + int(course['enrolled'])
by_student_size = [course for course, enrolled in sorted(all_courses.items(), key=lambda item: -item[1])]
top_twenty = by_student_size[:20]
print(top_twenty)

['HWL1', 'HWL1-PRE', 'BIOL14A', 'COSI10A', 'PSYC10A', 'BIOL15B', 'MATH10A', 'BIOL18B', 'BIOL18A', 'CHEM29A', 'CHEM29B', 'CHEM25A', 'PSYC51A', 'CHEM25B', 'COSI12B', 'BUS6A', 'CHEM18A', 'ECON10A', 'MATH15A', 'ANTH1A']


Katherine 5.i
* number of faculty who have classes on Tuesdays (counting only courses with a single meeting block, which is what the filter below checks)

In [12]:
len({course['instructor'] for course in courses if (len(course['times']) == 1 and 'tu' in course['times'][0]['days'])})

398
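
A sketch of a more general version that takes the day code as a parameter and looks at every meeting block of a course rather than only the first. It assumes the day codes that appear in this notebook (`'m'`, `'tu'`, `'w'`, `'th'`, `'f'`); the function name and sample records are hypothetical.

```python
def instructors_teaching_on(courses, day):
    """Return the set of instructors with at least one meeting block on `day`."""
    return {
        c['instructor']
        for c in courses
        if any(day in block['days'] for block in c['times'])
    }

# Hypothetical records; instructors are tuples so they can go in a set
sample_courses = [
    {'instructor': ('A', 'One', 'a1@example.edu'),
     'times': [{'start': 600, 'end': 690, 'days': ['m', 'w']},
               {'start': 630, 'end': 720, 'days': ['f']}]},
    {'instructor': ('B', 'Two', 'b2@example.edu'),
     'times': [{'start': 840, 'end': 930, 'days': ['tu', 'th']}]},
    {'instructor': ('C', 'Three', 'c3@example.edu'), 'times': []},
]

friday = instructors_teaching_on(sample_courses, 'f')
tuesday = instructors_teaching_on(sample_courses, 'tu')
print(len(friday), len(tuesday))
```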

Jingqian 5.i
* Find the top 10 courses that are most difficult to enroll in (those with the most students waiting)

In [13]:
course_with_waiting = [course for course in courses if course['waiting'] > 0]
course_with_waiting_sorted = sorted(course_with_waiting, key = lambda x : -x['waiting'])
course_with_waiting_sorted[:10]

[{'limit': 25,
  'times': [{'days': ['w', 'm'], 'end': 570, 'start': 480},
   {'end': 1170, 'days': ['th'], 'start': 1080}],
  'enrolled': 25,
  'details': 'See Course Catalog for prerequisites.\nInstruction for this course will be offered in a hybrid combination of in person and remote sessions, which may vary by course and over the duration of the semester. Some courses will have sessions at which in-person and remote students will participate at the same time, and others will arrange for some separate class meetings for in-person and remote students. Enrollment is open to students who will be on campus and students who will be studying remotely up to the enrollment limit.',
  'type': 'section',
  'status_text': 'Closed Consent Req.',
  'section': '2',
  'waiting': 27,
  'instructor': ('Kene Nathan', 'Piasta', 'kpiasta@brandeis.edu'),
  'coinstructors': (),
  'code': ('BIOL', '51A'),
  'subject': 'BIOL',
  'coursenum': '51A',
  'name': 'Biostatistics',
  'independent_study': False,
 

Weidong 5.i
* List all courses which contain the keyword 'Java' in their name

In [14]:
[c for c in courses if 'Java' in c['name']]

[{'limit': 80,
  'times': [{'start': 600, 'days': ['m', 'w'], 'end': 690},
   {'start': 630, 'end': 720, 'type': 'Recitiation', 'days': ['f']}],
  'enrolled': 43,
  'details': 'See Course Catalog for prerequisites.\nOpen to graduate students in the post-baccalaureate computer science program.\nInstruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
  'type': 'section',
  'status_text': 'Open',
  'section': '1',
  'waiting': 0,
  'instructor': ('Antonella', 'DiLillo', 'dilant@brandeis.edu'),
  'coinstructors': (),
  'code': ('COSI', '12B'),
  'subject': 'COSI',
  'coursenum': '12B',
  'name': 'Advanced Programming Techniques in Java',
  'independent_study': False,
  'term': '1211',
  'description': 'Prerequisite: COSI 10a or successful completion of the COSI online placement exam.\n\nStudies advanced programming concepts and techniques utilizing the Java programming language. The course covers software engineer
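
A sketch of a case-insensitive search over the `description` field instead of `name`. The helper name and the sample records are hypothetical, and the guard for a missing description is an assumption; the real data may always include one.

```python
def courses_mentioning(courses, keyword):
    """Return courses whose description contains `keyword`, ignoring case."""
    needle = keyword.lower()
    return [
        c for c in courses
        # `or ''` guards against a missing or empty description (assumption)
        if needle in (c.get('description') or '').lower()
    ]

# Hypothetical records; only the field names match the real data
sample_courses = [
    {'name': 'Advanced Programming Techniques in Java',
     'description': 'Uses the Java programming language.'},
    {'name': 'Software Engineering',
     'description': 'Team projects written in JAVA or Python.'},
    {'name': 'Discrete Structures', 'description': 'Logic and proofs.'},
    {'name': 'Reading Course'},
]

matches = [c['name'] for c in courses_mentioning(sample_courses, 'java')]
print(matches)
```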

Weidong 5.i
* List all the COSI courses with at least 10 students, together with the instructor's first name and the number of students, sorted by number of students

In [15]:
# Use a descriptive name rather than shadowing the built-in `list`
cosi_courses = {(c['name'], c['instructor'][0], c['enrolled']) for c in courses if c['subject'] == 'COSI' and c['enrolled'] >= 10}
cosi_courses_sorted = sorted(cosi_courses, key=lambda course: -course[2])
print(cosi_courses_sorted)

[('Introduction to 3-D Animation', 'Timothy J', 166), ('Introduction to Problem Solving in Python', 'Timothy J', 150), ('Data Structures and the Fundamentals of Computing', 'Antonella', 102), ('Discrete Structures', 'Mitch', 101), ('Data Structures and the Fundamentals of Computing', 'Antonella', 99), ('Advanced Programming Techniques in Java', 'Antonella', 96), ('Operating Systems', 'Iraklis', 94), ('Introduction to Problem Solving in Python', 'Jordan', 92), ('Introduction to the Theory of Computation', 'James A', 87), ('Fundamentals of Natural Language Processing I', 'Constantine', 84), ('Operating Systems', 'Iraklis', 78), ('Introduction to Problem Solving in Python', 'Iraklis', 67), ('Advanced Programming Techniques in Java', 'Iraklis', 66), ('Structure and Interpretation of Computer Programs', 'Harry', 61), ('Statistical Machine Learning', 'Pengyu', 56), ('Practical Machine Learning with Big Data', 'Pengyu', 48), ('Software Entrepreneurship', 'Ralph', 47), ('Computer-Supported Coo

Weidong 5.i
* the top 10 most popular instructors in the computer science department, in terms of number of enrolled students

In [16]:
instructor = {c['instructor']:0 for c in courses if c['subject']=='COSI'}
for c in courses:
    if(c['subject']=='COSI'):
        instructor[c['instructor']] = instructor[c['instructor']] + c['enrolled']

instructor_sorted = sorted(instructor.items(),key = lambda x:-x[1])
print(instructor_sorted[:10])

[(('Timothy J', 'Hickey', 'tjhickey@brandeis.edu'), 411), (('Antonella', 'DiLillo', 'dilant@brandeis.edu'), 342), (('Iraklis', 'Tsekourakis', 'tsekourakis@brandeis.edu'), 316), (('Ralph', 'Salas', 'rpsalas@brandeis.edu'), 128), (('Jordan', 'Pollack', 'pollack@brandeis.edu'), 126), (('Pengyu', 'Hong', 'hongpeng@brandeis.edu'), 114), (('Constantine', 'Lignos', 'lignos@brandeis.edu'), 106), (('James A', 'Storer', 'storer@brandeis.edu'), 104), (('Mitch', 'Cherniack', 'mfc@brandeis.edu'), 101), (('Harry', 'Mairson', 'mairson@brandeis.edu'), 85)]


### 5e.	do the same as in (d) but print the top 10 subjects in terms of number of courses offered - Jian He

In [17]:
subject_counts = {}
for c in courses:
    if c["subject"] in subject_counts:
        subject_counts[c["subject"]] += 1
    else:
        # the first course seen for a subject counts as 1, not 0
        subject_counts[c["subject"]] = 1

subject_counts = sorted(subject_counts.items(), key=lambda x: x[1])
for s in reversed(subject_counts[-10:]):
    print(s)


('BIOL', 613)
('HIST', 498)
('PSYC', 417)
('NEUR', 403)
('BCHM', 296)
('PHYS', 288)
('HS', 274)
('COSI', 272)
('MUS', 266)
('ENG', 265)


### 5f.	do the same as in (d) but print the top 10 subjects in terms of number of faculty teaching courses in that subject - Jian He

In [18]:
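faculty_by_subject = {}
for c in courses:
    # setdefault creates the set the first time a subject is seen and still
    # records that course's instructor (identified by email address)
    faculty_by_subject.setdefault(c["subject"], set()).add(c["instructor"][2])

faculty_by_subject = sorted(faculty_by_subject.items(), key=lambda x: len(x[1]))
for s in reversed(faculty_by_subject[-10:]):
    print(s[0], len(s[1]))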

HS 84
BIOL 67
ECON 51
BCHM 49
HIST 47
BCBP 46
BUS 46
HWL 42
NEJS 37
MATH 37
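
A sketch of the same grouping with `collections.defaultdict`, which creates the empty set on first access so that every instructor, including the first one seen for a subject, is recorded. The sample records are hypothetical.

```python
from collections import defaultdict

# Hypothetical records; instructors are (first, last, email) tuples
sample_courses = [
    {'subject': 'COSI', 'instructor': ('A', 'One', 'a1@example.edu')},
    {'subject': 'COSI', 'instructor': ('B', 'Two', 'b2@example.edu')},
    {'subject': 'COSI', 'instructor': ('A', 'One', 'a1@example.edu')},
    {'subject': 'MATH', 'instructor': ('C', 'Three', 'c3@example.edu')},
]

faculty_by_subject = defaultdict(set)
for c in sample_courses:
    # the email address serves as the instructor's identity
    faculty_by_subject[c['subject']].add(c['instructor'][2])

# (subject, number of distinct instructors), largest first
ranked = sorted(
    ((subject, len(emails)) for subject, emails in faculty_by_subject.items()),
    key=lambda pair: -pair[1],
)
print(ranked)
```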


### 5i.	How many independent study courses has Brandeis offered? - Jian He

In [19]:
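count = 0                    # number of independent-study sections
independent_names = set()    # distinct independent-study course names
for c in courses:
    if c["independent_study"]:
        count += 1
        independent_names.add(c["name"])

# report the number of distinct course names rather than the number of sections
print(len(independent_names))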

165
