# Tuition and Fees from 2010 to 2020

The budget page at http://otcads.umd.edu/bfa/budgetinfo3.htm links to a historical tuition and fees page (http://otcads.umd.edu/bfa/FY20%20Working%20Budget/web/TuitHist%202010%20to%202020%20(Excel).htm).  
That page lists tuition rates and mandatory fees from 2010 to 2020 for undergraduate and graduate students, split by residency and by full-time or part-time status.

In [1]:
import requests
import pandas as pd
from bs4 import BeautifulSoup
import pickle

## Scraping

In [2]:
url = 'http://otcads.umd.edu/bfa/FY20%20Working%20Budget/web/TuitHist%202010%20to%202020%20(Excel)_files/sheet001.htm'
r = requests.get(url, timeout=30)
r.raise_for_status()  # stop early if the page could not be fetched

In [3]:
soup = BeautifulSoup(r.text,'html.parser')

In [4]:
# Confirm that the page contains exactly one table.
len(soup.find_all('table'))

1

In [5]:
table = soup.find('table')

In [6]:
str_table = []
for e in table.find_all('td'):
    if(e.text.strip()):
        str_table.append(e.text.strip())

To find the parts of the text that need to be copied into the dataframe, keywords such as 'Undergrad Resident' are used to locate the starting and ending indexes. Printing `str_table` together with its indexes shows where each keyword sits in the parsed text.
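
The keyword lookup described above can be sketched on a small stand-in list. The `cells` list here is hypothetical and only mimics the shape of `str_table`; the real list is much longer, and the comma-formatted strings are an assumption about how the page renders its numbers.

```python
# Hypothetical miniature of the flattened cell list (not the real scraped data).
cells = ['Undergrad Resident', '6,566', '6,763',
         'Mandatory Fees', 'Undergrad FT', '1,487', '1,653']

# Print every index next to its text so keywords can be located by eye.
for position, text in enumerate(cells):
    print(position, text)

# list.index returns the position of the first exact match.
start = cells.index('Undergrad Resident')
end = cells.index('Mandatory Fees')

# Everything between the two keywords belongs to the first block.
record = cells[start:end]
```

Because `list.index` raises `ValueError` when the keyword is absent, a changed page layout fails loudly instead of silently producing wrong slices.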

## Parsing

In [7]:
tuition_fees = pd.DataFrame(columns=['Fee Type (Total for Fall and Spring)', 'Student Type'] + list(map(str, range(2010, 2021))))

i = str_table.index('Undergrad Resident')
while(i <= str_table.index('Mandatory Fees') - 1):
    descr = str_table[i]
    skip = 0
    if(str_table[i + 12] == '(fee per credit hour)'):
        descr = descr + ' (fee per credit hour)'
        skip = 1
        
    years = list(map(str, range(2010, 2021)))
    values = str_table[i+1:i+12]
    dictionary = dict(zip(years, values))
    dictionary['Student Type'] = descr
    
    dictionary['Fee Type (Total for Fall and Spring)'] = 'Standard Tuition Rates'
    
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
    tuition_fees = pd.concat([tuition_fees, pd.DataFrame([dictionary])], ignore_index=True)
    i = i + 12 + skip 
    
i = str_table.index('Undergrad FT')
while(i <= len(str_table) - 2):
    descr = str_table[i]
    
    years = list(map(str, range(2010, 2021)))
    values = str_table[i+1:i+12]
    dictionary = dict(zip(years, values))
    dictionary['Student Type'] = descr
    
    dictionary['Fee Type (Total for Fall and Spring)'] = 'Mandatory Fees'
    
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
    tuition_fees = pd.concat([tuition_fees, pd.DataFrame([dictionary])], ignore_index=True)
    i = i + 12
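
The row-by-row construction above could also be written by collecting plain dictionaries and building the dataframe once at the end, which avoids copying the frame on every iteration. This is only a sketch: `cells`, `stride`, `rows`, and `fees` are hypothetical names, the year range is shortened for illustration, and the comma-formatted strings are an assumption about the scraped text.

```python
import pandas as pd

years = list(map(str, range(2010, 2013)))  # shortened range, for illustration only

# Hypothetical flat cell list: a label followed by one value per year.
cells = ['Undergrad FT', '1,487', '1,653', '1,689',
         'Undergrad PT', '678', '761', '779']

stride = len(years) + 1  # one label cell plus one cell per year
rows = []
for start in range(0, len(cells), stride):
    # Pair each year with its value, then attach the row label.
    row = dict(zip(years, cells[start + 1:start + stride]))
    row['Student Type'] = cells[start]
    rows.append(row)

# Build the dataframe in one step, fixing the column order explicitly.
fees = pd.DataFrame(rows, columns=['Student Type'] + years)
```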

## Finalizing & Fine-tuning

The year columns still hold strings, so the commas are stripped and each column is cast to float.

In [8]:
for year in list(range(2010, 2021)):
    tuition_fees[str(year)] = tuition_fees[str(year)].apply(lambda x : x.replace(',', ''))
    tuition_fees[str(year)] = tuition_fees[str(year)].astype(float)
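
As an aside, the same cleanup can be expressed with the vectorized string accessor instead of `apply`. The sketch below runs on a hypothetical `raw` series; the comma-formatted strings are assumed, not copied from the page.

```python
import pandas as pd

# Hypothetical column of scraped strings with comma separators.
raw = pd.Series(['6,566', '22,503', '273'])

# Remove the commas literally (no regex), then cast the result to float.
clean = raw.str.replace(',', '', regex=False).astype(float)
```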

In [9]:
tuition_fees['Student Type'] = tuition_fees['Student Type'].apply(lambda x : x
                                                                  .replace('FT', 'Full-Time')
                                                                  .replace('PT', 'Part-Time')
                                                                  .replace('Non-Res', 'Non-Resident'))

In [10]:
tuition_fees

Unnamed: 0,Fee Type (Total for Fall and Spring),Student Type,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
0,Standard Tuition Rates,Undergrad Resident,6566.0,6763.0,6966.0,7175.0,7390.0,7764.0,8152.0,8315.0,8481.0,8651.0,8824.0
1,Standard Tuition Rates,Undergrad Non-Resident,22503.0,23178.0,24337.0,25554.0,26576.0,27905.0,29300.0,30179.0,31688.0,33272.0,34936.0
2,Standard Tuition Rates,Undergrad Part-Time Resident (fee per credit h...,273.0,282.0,290.0,299.0,308.0,324.0,340.0,346.0,353.0,360.0,367.0
3,Standard Tuition Rates,Undergrad Part-Time Non-Resident (fee per cred...,938.0,966.0,1014.0,1065.0,1108.0,1163.0,1221.0,1258.0,1321.0,1387.0,1456.0
4,Standard Tuition Rates,Graduate Resident (fee per credit hour),471.0,500.0,525.0,551.0,573.0,602.0,632.0,651.0,683.0,717.0,731.0
5,Standard Tuition Rates,Graduate Non-Resident (fee per credit hour),1016.0,1077.0,1131.0,1188.0,1236.0,1298.0,1363.0,1404.0,1474.0,1548.0,1625.0
6,Mandatory Fees,Undergrad Full-Time,1487.0,1653.0,1689.0,1733.0,1771.0,1815.0,1844.0,1866.0,1918.0,1944.0,1955.0
7,Mandatory Fees,Undergrad Part-Time,678.0,761.0,779.0,799.0,818.0,840.0,855.0,866.0,893.0,906.0,910.0
8,Mandatory Fees,Graduate Full-Time,1188.0,1351.0,1383.0,1413.0,1446.0,1490.0,1521.0,1538.0,1590.0,1620.0,1635.0
9,Mandatory Fees,Graduate Part-Time,675.0,757.0,773.0,788.0,806.0,829.0,846.0,855.0,881.0,898.0,902.0


## Transpose

The table is transposed so that each year becomes a row and 'Year' becomes a single column, which makes plotting easier. Every other column then holds a time series for one combination of fee type and student type.

In [11]:
tuition_fees = tuition_fees.transpose()
tuition_fees.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
Fee Type (Total for Fall and Spring),Standard Tuition Rates,Standard Tuition Rates,Standard Tuition Rates,Standard Tuition Rates,Standard Tuition Rates,Standard Tuition Rates,Mandatory Fees,Mandatory Fees,Mandatory Fees,Mandatory Fees
Student Type,Undergrad Resident,Undergrad Non-Resident,Undergrad Part-Time Resident (fee per credit h...,Undergrad Part-Time Non-Resident (fee per cred...,Graduate Resident (fee per credit hour),Graduate Non-Resident (fee per credit hour),Undergrad Full-Time,Undergrad Part-Time,Graduate Full-Time,Graduate Part-Time
2010,6566,22503,273,938,471,1016,1487,678,1188,675
2011,6763,23178,282,966,500,1077,1653,761,1351,757
2012,6966,24337,290,1014,525,1131,1689,779,1383,773


In [12]:
list1 = list(tuition_fees.iloc[0].values)
list2 = list(tuition_fees.iloc[1].values)
new_columns = [f'{fee_type}: {student_type}' for fee_type, student_type in zip(list1, list2)]
tuition_fees.columns = new_columns

In [13]:
tuition_fees = tuition_fees.drop(['Fee Type (Total for Fall and Spring)', 'Student Type'])
tuition_fees = tuition_fees.reset_index()
tuition_fees = tuition_fees.rename(columns={'index' : 'Year'})
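
The transpose-and-relabel steps above can also be sketched in a different order: build the combined labels first, drop the two text columns, and only then transpose. Transposing a frame that mixes strings and numbers yields object columns, so dropping the text columns first should keep the values as floats. The `wide` frame, the `labels` and `by_year` names, and the shortened 'Fee Type' column name are hypothetical; the numbers are taken from the first two years of the table above.

```python
import pandas as pd

# Hypothetical two-row frame shaped like the tuition table.
wide = pd.DataFrame({
    'Fee Type': ['Standard Tuition Rates', 'Mandatory Fees'],
    'Student Type': ['Undergrad Resident', 'Undergrad Full-Time'],
    '2010': [6566.0, 1487.0],
    '2011': [6763.0, 1653.0],
})

# One combined label per row, e.g. 'Mandatory Fees: Undergrad Full-Time'.
labels = (wide['Fee Type'] + ': ' + wide['Student Type']).tolist()

# Keep only the numeric year columns and use the labels as the row index.
values = wide.drop(columns=['Fee Type', 'Student Type'])
values.index = labels

# Transpose so each year is a row, then turn the index into a 'Year' column.
by_year = values.T.rename_axis('Year').reset_index()
```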

In [14]:
tuition_fees

Unnamed: 0,Year,Standard Tuition Rates: Undergrad Resident,Standard Tuition Rates: Undergrad Non-Resident,Standard Tuition Rates: Undergrad Part-Time Resident (fee per credit hour),Standard Tuition Rates: Undergrad Part-Time Non-Resident (fee per credit hour),Standard Tuition Rates: Graduate Resident (fee per credit hour),Standard Tuition Rates: Graduate Non-Resident (fee per credit hour),Mandatory Fees: Undergrad Full-Time,Mandatory Fees: Undergrad Part-Time,Mandatory Fees: Graduate Full-Time,Mandatory Fees: Graduate Part-Time
0,2010,6566,22503,273,938,471,1016,1487,678,1188,675
1,2011,6763,23178,282,966,500,1077,1653,761,1351,757
2,2012,6966,24337,290,1014,525,1131,1689,779,1383,773
3,2013,7175,25554,299,1065,551,1188,1733,799,1413,788
4,2014,7390,26576,308,1108,573,1236,1771,818,1446,806
5,2015,7764,27905,324,1163,602,1298,1815,840,1490,829
6,2016,8152,29300,340,1221,632,1363,1844,855,1521,846
7,2017,8315,30179,346,1258,651,1404,1866,866,1538,855
8,2018,8481,31688,353,1321,683,1474,1918,893,1590,881
9,2019,8651,33272,360,1387,717,1548,1944,906,1620,898
10,2020,8824,34936,367,1456,731,1625,1955,910,1635,902


In [15]:
tuition_fees.to_pickle('df/tuition_fees')