# Courses Demo
This Jupyter notebook is for exploring the data set courses20-21.json
which consists of all Brandeis courses in the 20-21 academic year (Fall20, Spr21, Sum21) 
which had at least 1 student enrolled.

First we need to read the json file into a list of Python dictionaries

In [1]:
import json

In [2]:
with open("courses20-21.json","r",encoding='utf-8') as jsonfile:
    courses = json.load(jsonfile)

## Structure of a course
Next we look at the fields of each course dictionary and their values

In [3]:
print('there are',len(courses),'courses in the dataset')
print('here is the data for course 1246')
courses[1246]

there are 7813 courses in the dataset
here is the data for course 1246


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ['An', 'Huang', 'anhuang@brandeis.edu'],
 'coinstructors': [],
 'code': ['MATH', '223A'],
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}

## Cleaning the data
If we want to sort courses by instructor or by code, we need to replace the lists with tuples (which are immutable lists)

In [4]:
for course in courses:
        course['instructor'] = tuple(course['instructor'])
        course['coinstructors'] = tuple([tuple(f) for f in course['coinstructors']])
        course['code']= tuple(course['code'])

In [5]:
print('notice that the instructor and code are tuples now')
courses[1246]

notice that the instructor and code are tuples now


{'limit': 28,
 'times': [{'start': 1080, 'end': 1170, 'days': ['w', 'm']}],
 'enrolled': 4,
 'details': 'Instruction for this course will be offered remotely. Meeting times for this course are listed in the schedule of classes (in ET).',
 'type': 'section',
 'status_text': 'Open',
 'section': '1',
 'waiting': 0,
 'instructor': ('An', 'Huang', 'anhuang@brandeis.edu'),
 'coinstructors': (),
 'code': ('MATH', '223A'),
 'subject': 'MATH',
 'coursenum': '223A',
 'name': 'Lie Algebras: Representation Theory',
 'independent_study': False,
 'term': '1203',
 'description': "Theorems of Engel and Lie. Semisimple Lie algebras, Cartan's criterion. Universal enveloping algebras, PBW theorem, Serre's construction. Representation theory. Other topics as time permits. Usually offered every second year.\nAn Huang"}

# Exploring the data set
Now we will show how to use straight python to explore the data set and answer some interesting questions. Next week we will start learning Pandas/Numpy which are packages that make it easier to explore large dataset efficiently.

Here are some questions we can try to asnwer:
* what are all of the subjects of courses (e.g. COSI, MATH, JAPN, PHIL, ...)
* which terms are represented?
* how many instructors taught at Brandeis last year?
* what were the five largest course sections?
* what were the five largest courses (where we combine sections)?
* which are the five largest subjects measured by number of courses offered?
* which are the five largest courses measured by number of students taught?
* which course had the most sections taught in 20-21?
* who are the top five faculty in terms of number of students taught?
* etc.

In [6]:
len({c['subject'] for c in courses})

120

In [7]:
len({c['instructor'] for c in courses})

904

In [8]:
{c['subject']for c in courses}

{'AAAS',
 'AAAS/HIS',
 'AAAS/WGS',
 'AAPI',
 'AAPI/HIS',
 'AAPI/WGS',
 'AAS/AAPI',
 'AMST',
 'AMST/ENG',
 'AMST/MUS',
 'ANTH',
 'ANTH/WGS',
 'ARBC',
 'BCBP',
 'BCHM',
 'BIBC',
 'BIOL',
 'BIOT',
 'BIPH',
 'BISC',
 'BUS',
 'BUS/ECON',
 'BUS/FIN',
 'CA',
 'CAST',
 'CBIO',
 'CHEM',
 'CHIN',
 'CHSC',
 'CLAS',
 'CLAS/ENG',
 'CLAS/NEJ',
 'COMH',
 'COML',
 'COMP',
 'COSI',
 'EAS',
 'EBIO',
 'ECON',
 'ECON/FIN',
 'ECS',
 'ECS/ENG',
 'ED',
 'EL',
 'ENG',
 'ENVS',
 'ESL',
 'FA',
 'FILM',
 'FIN',
 'FREN',
 'GECS',
 'GER',
 'GRK',
 'GS',
 'HBRW',
 'HISP',
 'HIST',
 'HIST/SOC',
 'HIST/WGS',
 'HOID',
 'HRNS',
 'HS',
 'HS/POL',
 'HSSP',
 'HUM',
 'HUM/UWS',
 'HWL',
 'IGS',
 'IIM',
 'IMES',
 'INT',
 'ITAL',
 'JAPN',
 'JOUR',
 'KOR',
 'LALS',
 'LAT',
 'LGLS',
 'LING',
 'MATH',
 'MERS',
 'MUS',
 'NBIO',
 'NEJS',
 'NEUR',
 'NPSY',
 'PAX',
 'PHIL',
 'PHYS',
 'PMED',
 'POL',
 'POL/WGS',
 'PSYC',
 'QBIO',
 'QR',
 'RBIF',
 'RBOT',
 'RCOM',
 'RDFT',
 'RDMD',
 'RECS',
 'RECS/THA',
 'REL',
 'RHIN',
 'RIAS',
 'RID

In [9]:
{c['term']for c in courses}

{'1203', '1211', '1212'}

In [10]:
enrollment = sorted(courses, key = lambda course: -course['enrolled'])
[(courses['enrolled'], courses['name']) for courses in enrollment[:5]]

[(784, 'Introduction to Navigating Health and Safety'),
 (186, 'Organic Chemistry I'),
 (186, 'Physiology'),
 (181, 'Cells and Organisms'),
 (180, 'Organic Chemistry II')]

**<h1>Matthew</h1>**
list the top 20 faculty in terms of number of students they taught

list the top 20 courses in terms of number of students taking that course (where you combine different sections and semesters, i.e. just use the subject and course number)

Create your own interesting question (each team member creates their own) and use Python to answer that question.

Create your own filter (each team member creates their own


In [116]:
faculty = sorted(courses, key = lambda course: -course['enrolled'])
numperProf = [(sum(courses['enrolled'] for course in courses if courses['instructor'] == faculty), courses['instructor']) for courses in faculty]
numperProf.sort(key = lambda pair: -pair[0])
([professor for (number, professor) in numperProf[:20]])

[('Leah', 'Berkenwald', 'leahb@brandeis.edu'),
 ('Stephanie', 'Murray', 'murray@brandeis.edu'),
 ('Maria', 'de Boef Miara', 'mmiara@brandeis.edu'),
 ('Maria', 'de Boef Miara', 'mmiara@brandeis.edu'),
 ('Stephanie', 'Murray', 'murray@brandeis.edu'),
 ('Stuart', 'Altman', 'altman@brandeis.edu'),
 ('Anne S', 'Berry', 'anneberry@brandeis.edu'),
 ('Ellen J', 'Wright', 'ejwright@brandeis.edu'),
 ('Timothy J', 'Hickey', 'tjhickey@brandeis.edu'),
 ('Timothy J', 'Hickey', 'tjhickey@brandeis.edu'),
 ('Daniel', 'Breen', 'dbreen91@brandeis.edu'),
 ('Dan L', 'Perlman', 'perlman@brandeis.edu'),
 ('Colleen', 'Hitchcock', 'hitchcock@brandeis.edu'),
 ('Teresa Vann', 'Mitchell', 'tmitch@brandeis.edu'),
 ('Jennifer', 'Gutsell', 'jgutsell@brandeis.edu'),
 ('Peter', 'Mistark', 'pmistark@brandeis.edu'),
 ('Peter', 'Mistark', 'pmistark@brandeis.edu'),
 ('Michael Thomas', 'Marr', 'mmarr@brandeis.edu'),
 ('Paul', 'DiZio', 'dizio@brandeis.edu'),
 ('Geoffrey', 'Clarke', 'geoffclarke@brandeis.edu')]

In [12]:
enrollment = sorted(courses, key = lambda course: -course['enrolled'])
[(courses['subject'], courses['coursenum']) for courses in enrollment[:5]]

[('HWL', '1-PRE'),
 ('CHEM', '25A'),
 ('BIOL', '42A'),
 ('BIOL', '15B'),
 ('CHEM', '25B')]

Interesting question: Only find classes that are open

In [55]:
[(c['name'], c['status_text']) for c in courses if c['status_text'] == "Open"][:20]

[('Global Dexterity', 'Open'),
 ('Information Visualization', 'Open'),
 ('Sales and Sales Management', 'Open'),
 ('Corporate Financial Modeling', 'Open'),
 ('Technical Analysis', 'Open'),
 ('Python and Applications to Business Analytics', 'Open'),
 ('Python and Applications to Business Analytics II', 'Open'),
 ('Jewish Community and Jewish Identity', 'Open'),
 ('Real Estate Finance', 'Open'),
 ('Private Equity', 'Open'),
 ('Measurement of Inequality to Health and Development', 'Open'),
 ('Coexistence Research Methods', 'Open'),
 ('Democracy and Development', 'Open'),
 ('Evaluating Survey Data Using Stata: Questioning Answers', 'Open'),
 ('Introduction to Geographic Information Systems', 'Open'),
 ('Labor Income, Labor Power, and Labor Markets', 'Open'),
 ('Introduction to Microeconomics in Global Health', 'Open'),
 ('Multilevel Modeling Methods', 'Open'),
 ('Cost-Effectiveness', 'Open'),
 ('Kingian Nonviolence and Reconciliation', 'Open')]

<h1>Pedro</h1>
Create your own interesting question (each team member creates their own) and use Python to answer that question. (Choosing two Professors, compare the classes the professor taught?)

In [28]:
profOne = input("First Professor?")
profTwo = input("Second Professor?")

setterOne = set(())
setterTwo = set(())

for x in courses:
    if x['instructor'][1] == profOne:
        setterOne.add(x['subject'] + " " + x['coursenum'] + " " +  x['name'])
for x in courses:
    if x['instructor'][1] == profTwo:
        setterTwo.add(x['subject'] + " " + x['coursenum'] + " " + x['name'])
print(str(profOne) + ": \n" + str(setterOne))

print(str(profTwo) + ": \n" + str(setterTwo))

First Professor? Hickey
Second Professor? Bayone


Hickey: 
{'COSI 10A Introduction to Problem Solving in Python', 'COSI 210A Independent Study', "COSI 300B Master's Project", "COSI 300A Master's Project", 'COSI 400D Dissertation Research', 'COSI 98A Independent Study', 'COSI 200A Readings', 'COSI 93A Research Internship and Analysis', 'COSI 98B Independent Study', 'COSI 99D Senior Research', 'COSI 200B Readings', 'COSI 152A Web Application Development', 'COSI 153A Mobile Application Development', "COSI 293G Master's Research Internship", 'COSI 164A Introduction to 3-D Animation'}
Bayone: 
{'BUS 10A Business Fundamentals', 'BUS 237F International Real Estate: The Mature Markets', 'BUS 98A Independent Study', 'BUS 98B Independent Study', 'BUS 235F Real Estate Fundamentals', 'FIN 242F Credit Risk Analysis'}


title(self,phrase) -- filters courses containing the phrase in their title


In [14]:
phrase = 'econ'
titled = [c['name'] for c in courses if phrase in c['name']]
print(titled[0])


def title(phrase):
    titled = [c['name'] for c in courses if phrase in c['name']]
    return titled

print(title('econ')[:5])

Introduction to Microeconomics in Global Health
['Introduction to Microeconomics in Global Health', 'Kingian Nonviolence and Reconciliation', 'Advanced Microeconomics I', 'Advanced Macroeconomics I', 'Computational Linguistics Second Year Seminar']


description(self,phrase) - filters courses containing the phrase in the description

In [15]:
def description(phrase):
    descripted = [c['name'] for c in courses if phrase in c['description']]
    return descripted

print(description('econ')[:5])

['Jewish Community and Jewish Identity', 'Democracy and Development', 'Evaluating Survey Data Using Stata: Questioning Answers', 'Introduction to Microeconomics in Global Health', 'Financial Statement Analysis']


description -- filter by phrase in the description


In [16]:
def description(phrase):
    descripted = [c['name'] for c in courses if phrase in c['description']]
    return descripted

print(description('econ')[:5])

['Jewish Community and Jewish Identity', 'Democracy and Development', 'Evaluating Survey Data Using Stata: Questioning Answers', 'Introduction to Microeconomics in Global Health', 'Financial Statement Analysis']


personal filter - check if it is an independent study

In [None]:
def independent(self, phrase):
        '''search the course list to find phrase in description(created by Pedro)'''
        return Schedule([c for c in self.courses if phrase == c['independent_study']])

elif command in ['in', 'independent']:
            '''created by Pedro'''
            phrase = input("Are you searching for independent studies? <y/n>")
            schedule = schedule.independent(phrase)

<h1>Fritz</h1>
Create a list of tuples (E,S) where S is a subject and E is the number of students enrolled in courses in that subject, sort it and print the top 10.

Do the same as in (d) but print the top 10 subjects in terms of number of courses offered.

Do the same as (d) but print the top 10 subjects in terms of number of faculty teaching courses in that subject.

Create your own interesting question (each team member creates their own) and use Python to answer that question.


In [17]:
diction = {}
for x in courses:
    # add the number of people enrolled into a dictionary
    if x['subject'] not in diction:
        diction[x['subject']] = x['enrolled']
    else:
        diction[x['subject']] += x['enrolled']
# sorts and prints out dictionary in a list of tuples
dictionary = dict(sorted(diction.items(), reverse=True, key=lambda item: item[1]))
print(list(dictionary.items())[:10])

[('HS', 5318), ('BIOL', 3085), ('BUS', 2766), ('HWL', 2734), ('CHEM', 2322), ('ECON', 2315), ('COSI', 2223), ('MATH', 1785), ('PSYC', 1704), ('ANTH', 1144)]


In [18]:
diction = {}
for x in courses:
    # adding 1 to show the number of courses
    if x['subject'] not in diction:
        diction[x['subject']] = 1
    else:
        diction[x['subject']] += 1
# sorts and prints out dictionary in a list of tuples
dictionary = dict(sorted(diction.items(), reverse=True, key=lambda item: item[1]))
print(list(dictionary.items())[:10])

[('BIOL', 613), ('HIST', 498), ('PSYC', 417), ('NEUR', 403), ('BCHM', 296), ('PHYS', 288), ('HS', 274), ('COSI', 272), ('MUS', 266), ('ENG', 265)]


In [19]:
diction = {}
for x in courses:
    # adding instructors into a set to remove repeats
    if x['subject'] not in diction:
        diction[x['subject']] = set(x['instructor'])
    else:
        diction[x['subject']].add(x['instructor'])
# setting the values of the dictionary to the length of the set
for key in diction.keys():
    diction[key] = len(diction[key])
# sorts and prints out dictionary in a list of tuples
dictionary = dict(sorted(diction.items(), reverse=True, key=lambda item: item[1]))
print(list(dictionary.items())[:10])

[('HS', 90), ('BIOL', 70), ('ECON', 53), ('BCHM', 52), ('BUS', 50), ('HIST', 50), ('BCBP', 49), ('HWL', 45), ('MATH', 40), ('NEJS', 40)]


Interesting Question: Which subjects have the shortest class descriptions (sorted based on how many characters are in the description)? 
Sort and print the top 10

In [20]:
diction = {}
for x in courses:
    # adding length of descriptions into a dictionary
    if x['subject'] not in diction:
        diction[x['subject']] = len(x['description'])
    else:
        diction[x['subject']] += len(x['description'])
# sorts and prints out dictionary in a list of tuples
dictionary = dict(sorted(diction.items(), key=lambda item: item[1]))
print(list(dictionary.items())[:10])

[('AAPI', 37), ('MERS', 66), ('GS', 126), ('HOID', 132), ('HUM', 220), ('CLAS/ENG', 222), ('GECS', 262), ('ECS/ENG', 346), ('CLAS/NEJ', 356), ('PMED', 380)]


<h1>John</h1>

What is the total number of students taking COSI courses last year?

How many faculty taught COSI courses last year?

what was the median size of a COSI course last year (counting only those courses with at least 10 students)

Create your own interesting question (each team member creates their own) and use Python to answer that question.
(What Courses Did Profressor Edward Bayone teach last year)

Create your own filter (each team member creates their own)



In [21]:
total = 0
for x in courses:
    if x['subject'] == "COSI":
        total+=x['enrolled']
print(total)


2223


In [22]:
setter = set(())
for x in courses:
    if x['subject'] == "COSI":
        setter.add(x['instructor'])
print(len(setter))


27


In [23]:
arr = []
for x in courses:
    if x['subject'] == "COSI" and x['enrolled'] >= 10:
        arr.append(x['enrolled'])
arr.sort()
print(arr[len(arr)//2])

37


In [24]:
setter = set(())

for x in courses:
    if x['instructor'][1] == "Bayone":
        setter.add(x['subject'] + " " + x['coursenum'] + " " +  x['name'])
print(setter)
        

{'BUS 10A Business Fundamentals', 'BUS 237F International Real Estate: The Mature Markets', 'BUS 98A Independent Study', 'BUS 98B Independent Study', 'BUS 235F Real Estate Fundamentals', 'FIN 242F Credit Risk Analysis'}
