# End-To-End Example: Data Analysis of iSchool Classes

In this end-to-end example we will perform a data analysis in Python Pandas we will attempt to answer the following questions:

- What percentage of the schedule are undergrad (course number 500 or lower)?
- What undergrad classes are on Friday? or at 8AM?

Things we will demonstrate:

- `read_html()` for basic web scraping
- dealing with 5 pages of data
- `append()` multiple `DataFrames` together
- Feature engineering (adding a column to the `DataFrame`)

The iSchool schedule of classes can be found here: https://ischool.syr.edu/classes 


In [1]:
import pandas as pd

# this turns off warning messages
import warnings
warnings.filterwarnings('ignore')

In [5]:
# just figure out how to get the data
website = 'https://ischool.syr.edu/classes/?page=1'
data = pd.read_html(website)
data[0]

Unnamed: 0,Course,Section,Class,Credits,Title,Instructor(s),Time,Day,Room(s)
0,GET302,M001,37463,3.0,Global Financial Sys Arch,vcschoon,5:00pm - 7:45pm,Tu,Hinds Hall 011
1,GET460,M001,41946,3.0,Global Technology Abroad,Laurie A Ferger,12:00am - 12:00am,,
2,GET460,M002,41948,3.0,Global Technology Abroad,Paul Brian Gandel,12:00am - 12:00am,,
3,GET602,M001,37464,3.0,Global Financial Sys Arch,vcschoon,5:00pm - 7:45pm,Tu,Hinds Hall 011
4,GET660,M001,41947,3.0,Global Technology Abroad,Laurie A Ferger,12:00am - 12:00am,,
5,GET660,M002,41949,3.0,Global Technology Abroad,Paul Brian Gandel,12:00am - 12:00am,,
6,IDS401,M001,37545,3.0,What's the Big Idea?,Michael A D'Eredita,5:00pm - 7:50pm,Tu,FALK ROOM 201
7,IDS402,M001,41538,3.0,Idea2Startup,John DuRoss Liddy,9:30am - 12:15pm,F,Hinds Hall 011
8,IDS403,M001,37431,1.0,Startup Sandbox,John DuRoss Liddy,2:15pm - 5:05pm,F,Syracuse Technology Garden
9,IDS460,M002,37571,3.0,Entretech - NYC,John DuRoss Liddy,12:00am - 12:00am,,


In [9]:
# let's generate links to the other pages
website = 'https://ischool.syr.edu/classes/?page='
classes = pd.DataFrame()
for i in [1,2,3,4,5,6,7]:
    link = website + str(i)
    page_classes = pd.read_html(link)
    classes = classes.append(page_classes[0], ignore_index=True)

classes.to_csv('ischool-classes.csv')

In [11]:
# let's read them all and append them to a single data frame
classes.sample(5)

Unnamed: 0,Course,Section,Class,Credits,Title,Instructor(s),Time,Day,Room(s)
176,IST625,M402,42612,3.0,Enterprise Risk Management,Michael Larche,12:00am - 8:30pm,M,Online Online
18,IST195,M004,37444,3.0,LAB: Information Technologies,Jeff Rubin,10:35am - 11:30am,F,Hinds Hall 010
172,IST625,M002,37495,3.0,Enterprise Risk Management,Frank Jr Marullo,5:15pm - 8:05pm,W,Hinds Hall 117
256,IST707,M401,42439,3.0,Data Analytics,ProfGates,12:00am - 4:30pm,Su,Online Online
240,IST687,M408,37654,3.0,Introduction to Data Science,John W Santerre,12:00am - 8:30pm,Tu,Online Online


In [15]:
classes['Subject'] = classes['Course'].str[0:3]
classes['Number'] = classes['Course'].str[3:]



In [18]:
classes['Type'] = ""
classes['Type'][ classes['Number'] >= '500'] = 'GRAD'
classes['Type'][ classes['Number'] < '500'] = 'UGRAD'


In [37]:
ist = classes[ classes['Subject'] == 'IST' ]
istug = ist[ ist['Type'] == 'UGRAD'] 
istug_nof = istug [istug['Day'].str.find("F") ==-1] 
istug_nof_or8am = istug_nof[~ istug_nof['Time'].str.startswith('8:00am')] 

In [38]:
istug_nof_or8am

Unnamed: 0,Course,Section,Class,Credits,Title,Instructor(s),Time,Day,Room(s),Subject,Number,Type
15,IST101,M001,37501,1.0,First-Year Forum,Julie Walas,12:45pm - 1:40pm,M,Hinds Hall 120,IST,101,UGRAD
16,IST195,M001,37392,3.0,Information Technologies,Jeff Rubin,9:30am - 10:25am,MW,Huntington Beard Crouse Giff,IST,195,UGRAD
24,IST233,M001,37411,3.0,Intro to Computer Networking,S Bruce Boardman,12:45pm - 1:40pm,MW,Hall of Languages 207,IST,233,UGRAD
25,IST233,M008,37437,3.0,Intro to Computer Networking,Jeffrey T Girard,5:00pm - 6:20pm,TuTh,Hinds Hall 010,IST,233,UGRAD
30,IST256,M001,37499,3.0,Appl.Prog.For Information Syst,Michael Fudge,3:45pm - 5:05pm,M,School of Management 007,IST,256,UGRAD
31,IST256,M003,37537,3.0,Appl.Prog.For Information Syst,Angela Usha Ramnarine-Rieks,3:45pm - 5:05pm,W,CH101,IST,256,UGRAD
32,IST256,M004,37538,3.0,Appl.Prog.For Information Syst,Angela Usha Ramnarine-Rieks,12:30pm - 1:50pm,Th,LINK200,IST,256,UGRAD
33,IST256,M005,37539,3.0,Appl.Prog.For Information Syst,Wade Stringer,5:00pm - 6:20pm,Th,Hinds Hall 117,IST,256,UGRAD
34,IST256,M006,37540,3.0,Appl.Prog.For Information Syst,Wade Stringer,9:30am - 10:50am,W,Hinds Hall 011,IST,256,UGRAD
35,IST256,M007,37546,3.0,Appl.Prog.For Information Syst,Nick Lyga,11:00am - 12:20pm,Th,Hinds Hall 111,IST,256,UGRAD
