# Data Analysis of the Rising Cost of Rent in American College Towns

## _Data Wrangling Part 1_

## Rent Prices Data
This section utilizes Fair Market Rent Data from the U.S. Department of Housing and Urban Development (HUD)'s Office of Office of Policy Development and Research (https://www.huduser.gov/portal/datasets/fmr.html#history).

## Background on the data
Here are the meanings of some of the columns present in the data:
- `areaname`: geographic area name
- `cntyname`: county name
- `pmsaname`: primary metropolitan statistical area name
- `fmrxx_y`: xx represents year; y represents number of bedrooms ranging from 0(efficiency) to 4 bedrooms
- `fmr_area`: shows the fmr percentile measured
- `pop2017`: estimated population 2017
- `pop2000`: population from 2000 census

### First, we import data and start to clean

In [2]:
import numpy as np
import pandas as pd

In [3]:
rent_prices = pd.read_csv('rent_prices.csv')
rent_prices.head(3)

Unnamed: 0,fips2010,fips2000,areaname22,name,msa22,fmr22_0,fmr22_1,fmr22_2,fmr22_3,fmr22_4,...,pop2010,fmr_area,census_region,pmsaname,cntyname,pop2017,pop2000,id_agis3,id_agis2,id
0,100199999.0,100199999.0,"Montgomery, AL MSA",Autauga County,METRO33860M33860,643.0,764.0,914.0,1156.0,1494.0,...,54571.0,5240.0,3.0,"Montgomery, AL MSA",Autauga County,55035.0,43671.0,MSA5240,MSA5240,100000001.0
1,100399999.0,100399999.0,"Daphne-Fairhope-Foley, AL MSA",Baldwin County,METRO19300M19300,772.0,777.0,1017.0,1348.0,1715.0,...,182265.0,5160.0,3.0,"Mobile, AL MSA",Baldwin County,203360.0,140415.0,MSA5160,MSA5160,100000003.0
2,100599999.0,100599999.0,"Barbour County, AL",Barbour County,NCNTY01005N01005,532.0,536.0,705.0,871.0,980.0,...,27457.0,10000005.0,3.0,"Barbour County, AL",Barbour County,26200.0,29038.0,CNTY01005,CNTY01005,100000005.0


### Delete unneeded rows

We only need the district information and the historical rent prices. The rent prices columns all start with fmr. We need to see all the columns that don't start with fmr so that we can know the columns that will be dropped.

In [4]:
to_be_dropped = [columns for columns in rent_prices.columns if not (columns.startswith('fmr'))]
print (to_be_dropped)

['fips2010', 'fips2000', 'areaname22', 'name', 'msa22', 'msa21', 'msa20', 'msa19', 'msa18', 'msa17', 'msa16', 'msa15', 'msa14', 'msa13', 'msa12', 'msa11', 'msa10', 'msa09', 'msa08', 'msa07', 'msa06', 'msa05', 'msa04', 'msa03', 'msa02', 'msa01', 'msa00', 'msa99', 'msa98', 'msa97', 'msa96', 'msa95', 'msa94', 'msa93', 'msa92', 'msa91', 'msa90', 'msa89', 'msa88', 'msa87', 'msa86', 'msa85', 'msa83', 'cbsasub', 'areaname', 'state', 'cousub', 'msa', 'county', 'pop2010', 'census_region', 'pmsaname', 'cntyname', 'pop2017', 'pop2000', 'id_agis3', 'id_agis2', 'id']


Let's take a peep at the columns that we want to drop

In [5]:
rent_prices[['areaname','state','cousub','county','census_region',
             'pmsaname','cntyname','areaname22','name','cbsasub']].head()

Unnamed: 0,areaname,state,cousub,county,census_region,pmsaname,cntyname,areaname22,name,cbsasub
0,"Montgomery, AL MSA",1.0,99999.0,1.0,3.0,"Montgomery, AL MSA",Autauga County,"Montgomery, AL MSA",Autauga County,METRO33860M33860
1,"Daphne-Fairhope-Foley, AL MSA",1.0,99999.0,3.0,3.0,"Mobile, AL MSA",Baldwin County,"Daphne-Fairhope-Foley, AL MSA",Baldwin County,METRO19300M19300
2,"Barbour County, AL",1.0,99999.0,5.0,3.0,"Barbour County, AL",Barbour County,"Barbour County, AL",Barbour County,NCNTY01005N01005
3,"Birmingham-Hoover, AL HUD Metro FMR Area",1.0,99999.0,7.0,3.0,"Bibb County, AL",Bibb County,"Birmingham-Hoover, AL HUD Metro FMR Area",Bibb County,METRO13820M13820
4,"Birmingham-Hoover, AL HUD Metro FMR Area",1.0,99999.0,9.0,3.0,"Birmingham, AL MSA",Blount County,"Birmingham-Hoover, AL HUD Metro FMR Area",Blount County,METRO13820M13820


We want to keep name, cntynme and areaname columns because they are all different identifiers for a region. This will come in handy when we compare with other datasets.

In [6]:
rent_prices.drop(columns = [col for col in rent_prices.columns if (col.startswith(('fips','msa','id','pop')))],inplace=True)
rent_prices.drop(columns = ['state','cousub','county','census_region','areaname22','cbsasub','name'],inplace=True)

The fmr columns that don't have _ represent fmr percentile and are not needed. The fmr_area column is an efficiency measurement that is not needed. We shall try to delete them before proceeding.

In [7]:
rent_prices.drop(columns = [col for col in rent_prices.columns if col.startswith('fmr') if len(col)==5], inplace=True)
rent_prices.drop(columns = 'fmr_area', inplace=True)
rent_prices.head()

Unnamed: 0,fmr22_0,fmr22_1,fmr22_2,fmr22_3,fmr22_4,fmr21_0,fmr21_1,fmr21_2,fmr21_3,fmr21_4,...,fmr85_3,fmr85_4,fmr83_0,fmr83_1,fmr83_2,fmr83_3,fmr83_4,areaname,pmsaname,cntyname
0,643.0,764.0,914.0,1156.0,1494.0,640.0,766.0,908.0,1148.0,1520.0,...,344.0,382.0,186.0,227.0,269.0,332.0,370.0,"Montgomery, AL MSA","Montgomery, AL MSA",Autauga County
1,772.0,777.0,1017.0,1348.0,1715.0,718.0,723.0,922.0,1249.0,1584.0,...,393.0,439.0,217.0,257.0,309.0,380.0,425.0,"Daphne-Fairhope-Foley, AL MSA","Mobile, AL MSA",Baldwin County
2,532.0,536.0,705.0,871.0,980.0,488.0,492.0,648.0,806.0,907.0,...,387.0,426.0,212.0,257.0,300.0,374.0,413.0,"Barbour County, AL","Barbour County, AL",Barbour County
3,765.0,820.0,943.0,1220.0,1316.0,817.0,871.0,1002.0,1303.0,1409.0,...,400.0,447.0,218.0,265.0,312.0,387.0,433.0,"Birmingham-Hoover, AL HUD Metro FMR Area","Bibb County, AL",Bibb County
4,765.0,820.0,943.0,1220.0,1316.0,817.0,871.0,1002.0,1303.0,1409.0,...,417.0,462.0,229.0,280.0,327.0,404.0,448.0,"Birmingham-Hoover, AL HUD Metro FMR Area","Birmingham, AL MSA",Blount County


### Change column names

We should change the fmr columns into dates to better visualize the data

In [8]:
old_col = rent_prices.columns
new_col = ['20'+i[-4:] for i in [col for col in rent_prices.columns if col.startswith('fmr2')]]\
         +['19'+i[-4:] for i in [col for col in rent_prices.columns if col.startswith('fmr9')]]\
         +['19'+i[-4:] for i in [col for col in rent_prices.columns if col.startswith('fmr8')]]\
         +['20'+i[-4:] for i in [col for col in rent_prices.columns if col.startswith('fmr1')]]\
         +['20'+i[-4:] for i in [col for col in rent_prices.columns if col.startswith('fmr0')]]

rent_prices.rename(columns = dict(zip(old_col,new_col)),inplace=True)

In [9]:
rent_prices.sort_index(axis=1,ascending=False,inplace=True)
rent_prices.head(1)

Unnamed: 0,pmsaname,cntyname,areaname,2022_4,2022_3,2022_2,2022_1,2022_0,2021_4,2021_3,...,1985_4,1985_3,1985_2,1985_1,1985_0,1983_4,1983_3,1983_2,1983_1,1983_0
0,"Montgomery, AL MSA",Autauga County,"Montgomery, AL MSA",1494.0,1156.0,914.0,764.0,643.0,1520.0,1148.0,...,1020.0,776.0,583.0,517.0,440.0,881.0,731.0,537.0,454.0,425.0


### Identify null values

We need to check the data for null values and see if the summary statistics make sense

In [10]:
rent_prices.describe(include='all')

Unnamed: 0,pmsaname,cntyname,areaname,2022_4,2022_3,2022_2,2022_1,2022_0,2021_4,2021_3,...,1985_4,1985_3,1985_2,1985_1,1985_0,1983_4,1983_3,1983_2,1983_1,1983_0
count,4757,4757,4766,4765.0,4765.0,4765.0,4765.0,4767.0,4766.0,4766.0,...,4738.0,4738.0,4738.0,4740.0,4738.0,4736.0,4736.0,4736.0,4736.0,4738.0
unique,2674,1959,2598,,,,,,,,...,,,,,,,,,,
top,"Boston, MA--NH PMSA",Washington County,"Boston-Cambridge-Quincy, MA-NH HUD Metro FMR Area",,,,,,,,...,,,,,,,,,,
freq,129,104,114,,,,,,,,...,,,,,,,,,,
mean,,,,1481.819098,1288.199161,991.09276,793.927177,724.39593,1436.278011,1251.494964,...,887.020473,784.441537,608.49599,502.509156,446.455255,885.76837,767.869299,591.293708,473.361909,397.414268
std,,,,503.423202,433.9106,348.472099,287.402187,256.837765,514.150816,447.021624,...,295.683966,250.893247,204.548023,171.364501,154.539266,349.519727,293.129406,228.428755,183.796258,157.456783
min,,,,574.0,549.0,422.0,370.0,356.0,571.0,524.0,...,408.0,396.0,307.0,272.0,215.0,431.0,390.0,309.0,263.0,213.0
25%,,,,1122.0,1000.0,757.0,600.0,555.0,1085.0,972.0,...,681.25,605.0,467.0,385.0,342.0,630.0,556.0,436.0,346.0,296.0
50%,,,,1313.0,1130.0,867.0,699.0,639.0,1260.0,1084.0,...,802.0,707.0,537.0,446.5,407.0,778.0,675.0,531.0,415.0,353.0
75%,,,,1729.0,1472.0,1122.0,884.0,811.5,1648.0,1404.0,...,1035.0,897.0,682.0,558.0,499.0,1037.75,892.0,678.0,539.0,439.0


In [11]:
rent_prices.info(verbose=True,show_counts=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4773 entries, 0 to 4772
Data columns (total 198 columns):
 #    Column    Non-Null Count  Dtype  
---   ------    --------------  -----  
 0    pmsaname  4757 non-null   object 
 1    cntyname  4757 non-null   object 
 2    areaname  4766 non-null   object 
 3    2022_4    4765 non-null   float64
 4    2022_3    4765 non-null   float64
 5    2022_2    4765 non-null   float64
 6    2022_1    4765 non-null   float64
 7    2022_0    4767 non-null   float64
 8    2021_4    4766 non-null   float64
 9    2021_3    4766 non-null   float64
 10   2021_2    4766 non-null   float64
 11   2021_1    4766 non-null   float64
 12   2021_0    4768 non-null   float64
 13   2020_4    4766 non-null   float64
 14   2020_3    4766 non-null   float64
 15   2020_2    4766 non-null   float64
 16   2020_1    4766 non-null   float64
 17   2020_0    4768 non-null   float64
 18   2019_4    4734 non-null   float64
 19   2019_3    4734 non-null   float64
 20   2019_2

In [12]:
rent_prices.shape

(4773, 198)

We can see that almost every column has a nonnull entry and we need to decide what to do with them. We can also see that the name columns have a lot of repeated values but we don't know what that means for the data yet.

First, let's deal with the null values.

In [13]:
rent_prices.columns[rent_prices.isnull().any()]
#results show that each column has at least one missing value.

Index(['pmsaname', 'cntyname', 'areaname', '2022_4', '2022_3', '2022_2',
       '2022_1', '2022_0', '2021_4', '2021_3',
       ...
       '1985_4', '1985_3', '1985_2', '1985_1', '1985_0', '1983_4', '1983_3',
       '1983_2', '1983_1', '1983_0'],
      dtype='object', length=198)

In [14]:
pd.options.display.min_rows = 198
rent_prices.isnull().sum()
#it looks like the number of null values per column is not a lot so they should be easy to deal with.
#to be sure we'll look at the largest number of null values per column

pmsaname    16
cntyname    16
areaname     7
2022_4       8
2022_3       8
2022_2       8
2022_1       8
2022_0       6
2021_4       7
2021_3       7
2021_2       7
2021_1       7
2021_0       5
2020_4       7
2020_3       7
2020_2       7
2020_1       7
2020_0       5
2019_4      39
2019_3      39
2019_2      39
2019_1      39
2019_0      37
2018_4      39
2018_3      39
2018_2      39
2018_1      39
2018_0      37
2017_4      39
2017_3      39
            ..
1989_4      22
1989_3      22
1989_2      22
1989_1      22
1989_0      20
1988_4      22
1988_3      22
1988_2      22
1988_1      22
1988_0      20
1987_4      22
1987_3      22
1987_2      22
1987_1      22
1987_0      20
1986_4      22
1986_3      22
1986_2      22
1986_1      22
1986_0      20
1985_4      35
1985_3      35
1985_2      35
1985_1      33
1985_0      35
1983_4      37
1983_3      37
1983_2      37
1983_1      37
1983_0      35
Length: 198, dtype: int64

In [15]:
rent_prices.isnull().sum().nlargest(30)
#we can see that we have over 50 null values for some years

2002_4    61
2002_3    61
2002_2    61
2002_1    61
2002_0    61
2001_4    60
2001_3    60
2001_2    60
2001_1    60
2001_0    60
2000_4    60
2000_3    60
2000_2    60
2000_1    60
2000_0    60
2006_4    58
2006_3    58
2006_2    58
2006_1    58
2005_4    58
2005_3    58
2005_2    58
2005_1    58
2005_0    58
2004_4    58
2004_3    58
2004_2    58
2004_1    58
2004_0    58
2003_4    58
dtype: int64

We cannot see all the information at a glance because we have 198 rows which is over pandas display limits.
Let's divide the null columns into ranges so we know what we're dealing with.

In [16]:
null_cols = rent_prices.columns[rent_prices.isnull().any()]
print(str(len([col for col in null_cols if rent_prices[col].isnull().sum()>50])) + ' columns with na >50')
print(str(len([col for col in null_cols if rent_prices[col].isnull().sum()>30
              if rent_prices[col].isnull().sum()<50])) + ' columns with 30<na<50')
print(str(len([col for col in null_cols if rent_prices[col].isnull().sum()>0
              if rent_prices[col].isnull().sum()<30])) + ' columns with 0<na<30')

50 columns with na >50
60 columns with 30<na<50
88 columns with 0<na<30


### Fix null values

For the name columns, we can delete all the columns that are blank in all 3 name columns, and replace the na values in the rest with each other. 

In [17]:
#Area name has the least null values so we will fill pmsaname and cntyname with areaname.
rent_prices = rent_prices.assign(pmsaname=lambda x: np.where(x.pmsaname.isnull(),x.areaname,x.pmsaname),
                   cntyname=lambda x: np.where(x.cntyname.isnull(),x.areaname,x.cntyname))
print(rent_prices.pmsaname.isnull().sum())
print(rent_prices.cntyname.isnull().sum())
#All the name columns now have equal number of missing values which makes me suspect that they are the same rows.
#We have no need for rows with no name columns so we will drop them, but first let's check them out

7
7


In [18]:
rent_prices[rent_prices.pmsaname.isnull()]
#let's see what's happening around each row

Unnamed: 0,pmsaname,cntyname,areaname,2022_4,2022_3,2022_2,2022_1,2022_0,2021_4,2021_3,...,1985_4,1985_3,1985_2,1985_1,1985_0,1983_4,1983_3,1983_2,1983_1,1983_0
2090,,,,2990.0,2726.0,2205.0,1826.0,1658.0,,,...,,,,,,,,,,
2149,,,,2990.0,2726.0,2205.0,1826.0,1658.0,,,...,,,,,,,,,,
2159,,,,2505.0,2181.0,1723.0,1309.0,1145.0,,,...,,,,,,,,,,
4769,,,,,,,,935.0,,,...,,,,559.0,,,,,,489.0
4770,,,,,,,,1081.4,,,...,,,,569.4,,,,,,540.8
4771,,,,,,,,,,,...,,,,,,,,,,
4772,,,,,,,,,,,...,,,,,,,,,,


In [19]:
rent_prices.loc[2088:2092,['pmsaname','cntyname','areaname','2022_4','2022_3']]
#we need to delete the rows that have no name as they are not significant to our data.
#we can also see that there are duplicated rows in the data across the entire columns and across the name columns

Unnamed: 0,pmsaname,cntyname,areaname,2022_4,2022_3
2088,"Boston, MA--NH PMSA",Middlesex County,"Boston-Cambridge-Quincy, MA-NH HUD Metro FMR Area",2990.0,2726.0
2089,"Boston, MA--NH PMSA",Middlesex County,"Boston-Cambridge-Quincy, MA-NH HUD Metro FMR Area",,
2090,,,,2990.0,2726.0
2091,"Lowell, MA--NH PMSA",Middlesex County,"Lowell, MA HUD Metro FMR Area",2404.0,2192.0
2092,"Boston, MA--NH PMSA",Middlesex County,"Boston-Cambridge-Quincy, MA-NH HUD Metro FMR Area",2990.0,2726.0


In [20]:
rent_prices.loc[2147:2151,['pmsaname','cntyname','areaname','2022_4','2022_3']]
#same issue as above

Unnamed: 0,pmsaname,cntyname,areaname,2022_4,2022_3
2147,"Boston, MA--NH PMSA",Norfolk County,"Boston-Cambridge-Quincy, MA-NH HUD Metro FMR Area",2990.0,2726.0
2148,"Boston, MA--NH PMSA",Norfolk County,"Boston-Cambridge-Quincy, MA-NH HUD Metro FMR Area",,
2149,,,,2990.0,2726.0
2150,"Boston, MA--NH PMSA",Norfolk County,"Boston-Cambridge-Quincy, MA-NH HUD Metro FMR Area",2990.0,2726.0
2151,"Boston, MA--NH PMSA",Norfolk County,"Boston-Cambridge-Quincy, MA-NH HUD Metro FMR Area",2990.0,2726.0


In [21]:
rent_prices.loc[2157:2161,['pmsaname','cntyname','areaname','2022_4','2022_3']]
#same issue as above

Unnamed: 0,pmsaname,cntyname,areaname,2022_4,2022_3
2157,"Brockton, MA PMSA",Plymouth County,"Brockton, MA HUD Metro FMR Area",2505.0,2181.0
2158,"Brockton, MA PMSA",Plymouth County,"Brockton, MA HUD Metro FMR Area",,
2159,,,,2505.0,2181.0
2160,"Brockton, MA PMSA",Plymouth County,"Brockton, MA HUD Metro FMR Area",2505.0,2181.0
2161,"Boston, MA--NH PMSA",Plymouth County,"Boston-Cambridge-Quincy, MA-NH HUD Metro FMR Area",2990.0,2726.0


In [22]:
rent_prices.loc[4767:4775,['pmsaname','cntyname','areaname','2022_4','2022_3']]
#all nan values

Unnamed: 0,pmsaname,cntyname,areaname,2022_4,2022_3
4767,"St. John/St. Thomas, VI",St. John,"St. John Island, VI",2305.0,2101.0
4768,"St. John/St. Thomas, VI",St. Thomas,"St. Thomas Island, VI",1699.0,1549.0
4769,,,,,
4770,,,,,
4771,,,,,
4772,,,,,


In [23]:
rent_prices = rent_prices.drop([2090,2149,2159,4769,4770,4771,4772])
rent_prices.tail(3)
#we drop the rows that have null values across all name columns

Unnamed: 0,pmsaname,cntyname,areaname,2022_4,2022_3,2022_2,2022_1,2022_0,2021_4,2021_3,...,1985_4,1985_3,1985_2,1985_1,1985_0,1983_4,1983_3,1983_2,1983_1,1983_0
4766,"St. Croix, VI",St. Croix,"St. Croix Island, VI",1467.0,1338.0,1082.0,886.0,868.0,1410.0,1294.0,...,833.0,728.0,583.0,481.0,462.0,1019.0,909.0,729.0,618.0,509.0
4767,"St. John/St. Thomas, VI",St. John,"St. John Island, VI",2305.0,2101.0,1700.0,1368.0,1154.0,2214.0,2031.0,...,1046.0,1001.0,808.0,628.0,525.0,1307.0,1167.0,934.0,792.0,654.0
4768,"St. John/St. Thomas, VI",St. Thomas,"St. Thomas Island, VI",1699.0,1549.0,1253.0,1001.0,832.0,1633.0,1498.0,...,1046.0,1001.0,808.0,628.0,525.0,1307.0,1167.0,934.0,792.0,654.0


Now it's time to delete duplicate values across the rows

In [24]:
rent_prices[rent_prices.duplicated()].shape[0]
#shows the number of duplicated rows

1470

In [25]:
rent_prices[rent_prices.duplicated(subset=['pmsaname','cntyname','areaname'])].shape[0]
#however the names are duplicated 7 more times and we need to delete those as well

1477

In [26]:
rent_prices = rent_prices.drop_duplicates(subset=['pmsaname','cntyname','areaname'])
rent_prices.shape
#we now have 3289 rows
#now let's reanalyze the na values.
#in hindsight, this should have been done first.

(3289, 198)

For the rent prices information, missing data means that US HUDS was unable to get the data from that region or that the data does not exist. We have at most 61 missing values out of 4773. We can replace these missing values with 0. However, doing that will make calculations for average rent prices wrong. 

We will use a fillna method. However, we need to ensure that the fillna method only takes from the same year, otherwise average rent price calculations will be wrong.


In [27]:
coln = rent_prices.columns
for i in coln:
    if str(i[-1])=='0':
        rent_prices.fillna(method='ffill',axis=1)
    else:
        rent_prices.fillna(method='bfill',axis=1)
rent_prices.isnull().sum()
#this function ensures that fill values only belong to that year

pmsaname     0
cntyname     0
areaname     0
2022_4       0
2022_3       0
2022_2       0
2022_1       0
2022_0       0
2021_4       0
2021_3       0
2021_2       0
2021_1       0
2021_0       0
2020_4       0
2020_3       0
2020_2       0
2020_1       0
2020_0       0
2019_4      30
2019_3      30
2019_2      30
2019_1      30
2019_0      30
2018_4      30
2018_3      30
2018_2      30
2018_1      30
2018_0      30
2017_4      30
2017_3      30
            ..
1989_4      13
1989_3      13
1989_2      13
1989_1      13
1989_0      13
1988_4      13
1988_3      13
1988_2      13
1988_1      13
1988_0      13
1987_4      13
1987_3      13
1987_2      13
1987_1      13
1987_0      13
1986_4      13
1986_3      13
1986_2      13
1986_1      13
1986_0      13
1985_4      26
1985_3      26
1985_2      26
1985_1      26
1985_0      26
1983_4      28
1983_3      28
1983_2      28
1983_1      28
1983_0      28
Length: 198, dtype: int64

The function was not enough to completely remove all na values because some towns have no data available for a full year. We need to decide what to do in this case. 

We will use ffill for the remaining na values so that rent price is kept constant for areas where information is missing. We could've done this for all other na values as well since total number is not very significant for analysis. However, been very detailed is essential at this stage of learning experience.

In [28]:
rent_prices = rent_prices.fillna(method='ffill',axis=1)
rent_prices.isnull().sum()
#no more null values.

pmsaname    0
cntyname    0
areaname    0
2022_4      0
2022_3      0
2022_2      0
2022_1      0
2022_0      0
2021_4      0
2021_3      0
2021_2      0
2021_1      0
2021_0      0
2020_4      0
2020_3      0
2020_2      0
2020_1      0
2020_0      0
2019_4      0
2019_3      0
2019_2      0
2019_1      0
2019_0      0
2018_4      0
2018_3      0
2018_2      0
2018_1      0
2018_0      0
2017_4      0
2017_3      0
           ..
1989_4      0
1989_3      0
1989_2      0
1989_1      0
1989_0      0
1988_4      0
1988_3      0
1988_2      0
1988_1      0
1988_0      0
1987_4      0
1987_3      0
1987_2      0
1987_1      0
1987_0      0
1986_4      0
1986_3      0
1986_2      0
1986_1      0
1986_0      0
1985_4      0
1985_3      0
1985_2      0
1985_1      0
1985_0      0
1983_4      0
1983_3      0
1983_2      0
1983_1      0
1983_0      0
Length: 198, dtype: int64

### Reshape Data

To make meaning of our data, we need to e need to group our data by year and bedroom types.

We can start by creating a new row for the bedroom types based on the value of the column names. 

In [29]:
coln = rent_prices.columns
bedrm = []
for i in coln:
    if str(i[-1])=='0':
        bedrm.append('Efficiency')
    elif str(i[-1])=='1':
        bedrm.append('1-bed')
    elif str(i[-1])=='2':
        bedrm.append('2-bed')
    elif str(i[-1])=='3':
        bedrm.append('3-bed')
    elif str(i[-1])=='4':
        bedrm.append('4-bed')
    elif str(i[-1])=='e':
        bedrm.append('Room type')
rent_prices.loc[-1] = bedrm
rent_prices.index = rent_prices.index + 1
rent_prices = rent_prices.sort_index()
#above functions create a new row 'room type' just below the columns

Now that we have the room type column, we can take out the type code from the year values.

In [30]:
new_coln = ['pmsaname','cntyname','areaname']+[col[0:4] for col in rent_prices.columns if not col.endswith('name')]
rent_prices.rename(columns = dict(zip(coln,new_coln)),inplace=True)
rent_prices.head(1)

Unnamed: 0,pmsaname,cntyname,areaname,2022,2022.1,2022.2,2022.3,2022.4,2021,2021.1,...,1985,1985.1,1985.2,1985.3,1985.4,1983,1983.1,1983.2,1983.3,1983.4
0,Room type,Room type,Room type,4-bed,3-bed,2-bed,1-bed,Efficiency,4-bed,3-bed,...,4-bed,3-bed,2-bed,1-bed,Efficiency,4-bed,3-bed,2-bed,1-bed,Efficiency


Our data is complicated because it has several possible indexes: year, bedroom type, pmsaname, cntyname, areaname. We have to reshape in such a way that this is effectively communicated.

We will need to have multiindexes on both rows and columns

In [31]:
rent_prices = rent_prices.T
rent_prices.set_index([rent_prices.index,rent_prices[0]],inplace=True)
rent_prices.drop([0],axis=1,inplace=True)
rent_prices.head(10)
#this creates a row index with year and bedroom type

Unnamed: 0_level_0,Unnamed: 1_level_0,1,2,3,4,5,6,7,8,9,10,...,4760,4761,4762,4763,4764,4765,4766,4767,4768,4769
Unnamed: 0_level_1,0,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
pmsaname,Room type,"Montgomery, AL MSA","Mobile, AL MSA","Barbour County, AL","Bibb County, AL","Birmingham, AL MSA","Bullock County, AL","Butler County, AL","Anniston, AL MSA","Chambers County, AL","Cherokee County, AL",...,Puerto Rico HUD Nonmetro FMR Area,"San Juan--Bayamón, PR PMSA","San Juan--Bayamón, PR PMSA",Puerto Rico HUD Nonmetro FMR Area,"Ponce, PR MSA","San Juan--Bayamón, PR PMSA","Ponce, PR MSA","St. Croix, VI","St. John/St. Thomas, VI","St. John/St. Thomas, VI"
cntyname,Room type,Autauga County,Baldwin County,Barbour County,Bibb County,Blount County,Bullock County,Butler County,Calhoun County,Chambers County,Cherokee County,...,Utuado Municipio,Vega Alta Municipio,Vega Baja Municipio,Vieques Municipio,Villalba Municipio,Yabucoa Municipio,Yauco Municipio,St. Croix,St. John,St. Thomas
areaname,Room type,"Montgomery, AL MSA","Daphne-Fairhope-Foley, AL MSA","Barbour County, AL","Birmingham-Hoover, AL HUD Metro FMR Area","Birmingham-Hoover, AL HUD Metro FMR Area","Bullock County, AL","Butler County, AL","Anniston-Oxford-Jacksonville, AL MSA","Chambers County, AL","Cherokee County, AL",...,"Utuado Municipio, PR HUD Metro FMR Area","San Juan-Guaynabo, PR HUD Metro FMR Area","San Juan-Guaynabo, PR HUD Metro FMR Area",Puerto Rico HUD Nonmetro Area,"Ponce, PR HUD Metro FMR Area","San Juan-Guaynabo, PR HUD Metro FMR Area","Yauco, PR HUD Metro FMR Area","St. Croix Island, VI","St. John Island, VI","St. Thomas Island, VI"
2022,4-bed,1494.0,1715.0,980.0,1316.0,1316.0,1158.0,1043.0,1111.0,1168.0,1006.0,...,676.0,868.0,868.0,631.0,750.0,868.0,574.0,1467.0,2305.0,1699.0
2022,3-bed,1156.0,1348.0,871.0,1220.0,1220.0,968.0,871.0,988.0,1074.0,982.0,...,574.0,719.0,719.0,575.0,646.0,719.0,572.0,1338.0,2101.0,1549.0
2022,2-bed,914.0,1017.0,705.0,943.0,943.0,783.0,705.0,744.0,861.0,705.0,...,462.0,538.0,538.0,422.0,455.0,538.0,422.0,1082.0,1700.0,1253.0
2022,1-bed,764.0,777.0,536.0,820.0,820.0,602.0,619.0,565.0,659.0,536.0,...,402.0,466.0,466.0,370.0,399.0,466.0,370.0,886.0,1368.0,1001.0
2022,Efficiency,643.0,772.0,532.0,765.0,765.0,589.0,530.0,562.0,655.0,530.0,...,387.0,428.0,428.0,364.0,392.0,428.0,362.0,868.0,1154.0,832.0
2021,4-bed,1520.0,1584.0,907.0,1409.0,1409.0,1034.0,930.0,1094.0,1046.0,880.0,...,576.0,909.0,909.0,623.0,738.0,909.0,630.0,1410.0,2214.0,1633.0
2021,3-bed,1148.0,1249.0,806.0,1303.0,1303.0,877.0,789.0,944.0,956.0,877.0,...,529.0,740.0,740.0,561.0,641.0,740.0,596.0,1294.0,2031.0,1498.0


In [32]:
rent_prices.columns = [rent_prices.iloc[0,:].tolist(),rent_prices.iloc[1,:].tolist(),rent_prices.iloc[2,:].tolist()]
rent_prices.head(10)
#this creates a column index with the town names
#however, we have duplicate rows which we will proceed to delete

Unnamed: 0_level_0,Unnamed: 1_level_0,"Montgomery, AL MSA","Mobile, AL MSA","Barbour County, AL","Bibb County, AL","Birmingham, AL MSA","Bullock County, AL","Butler County, AL","Anniston, AL MSA","Chambers County, AL","Cherokee County, AL",...,Puerto Rico HUD Nonmetro FMR Area,"San Juan--Bayamón, PR PMSA","San Juan--Bayamón, PR PMSA",Puerto Rico HUD Nonmetro FMR Area,"Ponce, PR MSA","San Juan--Bayamón, PR PMSA","Ponce, PR MSA","St. Croix, VI","St. John/St. Thomas, VI","St. John/St. Thomas, VI"
Unnamed: 0_level_1,Unnamed: 1_level_1,Autauga County,Baldwin County,Barbour County,Bibb County,Blount County,Bullock County,Butler County,Calhoun County,Chambers County,Cherokee County,...,Utuado Municipio,Vega Alta Municipio,Vega Baja Municipio,Vieques Municipio,Villalba Municipio,Yabucoa Municipio,Yauco Municipio,St. Croix,St. John,St. Thomas
Unnamed: 0_level_2,Unnamed: 1_level_2,"Montgomery, AL MSA","Daphne-Fairhope-Foley, AL MSA","Barbour County, AL","Birmingham-Hoover, AL HUD Metro FMR Area","Birmingham-Hoover, AL HUD Metro FMR Area","Bullock County, AL","Butler County, AL","Anniston-Oxford-Jacksonville, AL MSA","Chambers County, AL","Cherokee County, AL",...,"Utuado Municipio, PR HUD Metro FMR Area","San Juan-Guaynabo, PR HUD Metro FMR Area","San Juan-Guaynabo, PR HUD Metro FMR Area",Puerto Rico HUD Nonmetro Area,"Ponce, PR HUD Metro FMR Area","San Juan-Guaynabo, PR HUD Metro FMR Area","Yauco, PR HUD Metro FMR Area","St. Croix Island, VI","St. John Island, VI","St. Thomas Island, VI"
Unnamed: 0_level_3,0,Unnamed: 2_level_3,Unnamed: 3_level_3,Unnamed: 4_level_3,Unnamed: 5_level_3,Unnamed: 6_level_3,Unnamed: 7_level_3,Unnamed: 8_level_3,Unnamed: 9_level_3,Unnamed: 10_level_3,Unnamed: 11_level_3,Unnamed: 12_level_3,Unnamed: 13_level_3,Unnamed: 14_level_3,Unnamed: 15_level_3,Unnamed: 16_level_3,Unnamed: 17_level_3,Unnamed: 18_level_3,Unnamed: 19_level_3,Unnamed: 20_level_3,Unnamed: 21_level_3,Unnamed: 22_level_3
pmsaname,Room type,"Montgomery, AL MSA","Mobile, AL MSA","Barbour County, AL","Bibb County, AL","Birmingham, AL MSA","Bullock County, AL","Butler County, AL","Anniston, AL MSA","Chambers County, AL","Cherokee County, AL",...,Puerto Rico HUD Nonmetro FMR Area,"San Juan--Bayamón, PR PMSA","San Juan--Bayamón, PR PMSA",Puerto Rico HUD Nonmetro FMR Area,"Ponce, PR MSA","San Juan--Bayamón, PR PMSA","Ponce, PR MSA","St. Croix, VI","St. John/St. Thomas, VI","St. John/St. Thomas, VI"
cntyname,Room type,Autauga County,Baldwin County,Barbour County,Bibb County,Blount County,Bullock County,Butler County,Calhoun County,Chambers County,Cherokee County,...,Utuado Municipio,Vega Alta Municipio,Vega Baja Municipio,Vieques Municipio,Villalba Municipio,Yabucoa Municipio,Yauco Municipio,St. Croix,St. John,St. Thomas
areaname,Room type,"Montgomery, AL MSA","Daphne-Fairhope-Foley, AL MSA","Barbour County, AL","Birmingham-Hoover, AL HUD Metro FMR Area","Birmingham-Hoover, AL HUD Metro FMR Area","Bullock County, AL","Butler County, AL","Anniston-Oxford-Jacksonville, AL MSA","Chambers County, AL","Cherokee County, AL",...,"Utuado Municipio, PR HUD Metro FMR Area","San Juan-Guaynabo, PR HUD Metro FMR Area","San Juan-Guaynabo, PR HUD Metro FMR Area",Puerto Rico HUD Nonmetro Area,"Ponce, PR HUD Metro FMR Area","San Juan-Guaynabo, PR HUD Metro FMR Area","Yauco, PR HUD Metro FMR Area","St. Croix Island, VI","St. John Island, VI","St. Thomas Island, VI"
2022,4-bed,1494.0,1715.0,980.0,1316.0,1316.0,1158.0,1043.0,1111.0,1168.0,1006.0,...,676.0,868.0,868.0,631.0,750.0,868.0,574.0,1467.0,2305.0,1699.0
2022,3-bed,1156.0,1348.0,871.0,1220.0,1220.0,968.0,871.0,988.0,1074.0,982.0,...,574.0,719.0,719.0,575.0,646.0,719.0,572.0,1338.0,2101.0,1549.0
2022,2-bed,914.0,1017.0,705.0,943.0,943.0,783.0,705.0,744.0,861.0,705.0,...,462.0,538.0,538.0,422.0,455.0,538.0,422.0,1082.0,1700.0,1253.0
2022,1-bed,764.0,777.0,536.0,820.0,820.0,602.0,619.0,565.0,659.0,536.0,...,402.0,466.0,466.0,370.0,399.0,466.0,370.0,886.0,1368.0,1001.0
2022,Efficiency,643.0,772.0,532.0,765.0,765.0,589.0,530.0,562.0,655.0,530.0,...,387.0,428.0,428.0,364.0,392.0,428.0,362.0,868.0,1154.0,832.0
2021,4-bed,1520.0,1584.0,907.0,1409.0,1409.0,1034.0,930.0,1094.0,1046.0,880.0,...,576.0,909.0,909.0,623.0,738.0,909.0,630.0,1410.0,2214.0,1633.0
2021,3-bed,1148.0,1249.0,806.0,1303.0,1303.0,877.0,789.0,944.0,956.0,877.0,...,529.0,740.0,740.0,561.0,641.0,740.0,596.0,1294.0,2031.0,1498.0


In [33]:
rent_prices = rent_prices.drop([('pmsaname',  'Room type'),
            ('cntyname',  'Room type'),
            ('areaname',  'Room type')])
rent_prices.head(10)

Unnamed: 0_level_0,Unnamed: 1_level_0,"Montgomery, AL MSA","Mobile, AL MSA","Barbour County, AL","Bibb County, AL","Birmingham, AL MSA","Bullock County, AL","Butler County, AL","Anniston, AL MSA","Chambers County, AL","Cherokee County, AL",...,Puerto Rico HUD Nonmetro FMR Area,"San Juan--Bayamón, PR PMSA","San Juan--Bayamón, PR PMSA",Puerto Rico HUD Nonmetro FMR Area,"Ponce, PR MSA","San Juan--Bayamón, PR PMSA","Ponce, PR MSA","St. Croix, VI","St. John/St. Thomas, VI","St. John/St. Thomas, VI"
Unnamed: 0_level_1,Unnamed: 1_level_1,Autauga County,Baldwin County,Barbour County,Bibb County,Blount County,Bullock County,Butler County,Calhoun County,Chambers County,Cherokee County,...,Utuado Municipio,Vega Alta Municipio,Vega Baja Municipio,Vieques Municipio,Villalba Municipio,Yabucoa Municipio,Yauco Municipio,St. Croix,St. John,St. Thomas
Unnamed: 0_level_2,Unnamed: 1_level_2,"Montgomery, AL MSA","Daphne-Fairhope-Foley, AL MSA","Barbour County, AL","Birmingham-Hoover, AL HUD Metro FMR Area","Birmingham-Hoover, AL HUD Metro FMR Area","Bullock County, AL","Butler County, AL","Anniston-Oxford-Jacksonville, AL MSA","Chambers County, AL","Cherokee County, AL",...,"Utuado Municipio, PR HUD Metro FMR Area","San Juan-Guaynabo, PR HUD Metro FMR Area","San Juan-Guaynabo, PR HUD Metro FMR Area",Puerto Rico HUD Nonmetro Area,"Ponce, PR HUD Metro FMR Area","San Juan-Guaynabo, PR HUD Metro FMR Area","Yauco, PR HUD Metro FMR Area","St. Croix Island, VI","St. John Island, VI","St. Thomas Island, VI"
Unnamed: 0_level_3,0,Unnamed: 2_level_3,Unnamed: 3_level_3,Unnamed: 4_level_3,Unnamed: 5_level_3,Unnamed: 6_level_3,Unnamed: 7_level_3,Unnamed: 8_level_3,Unnamed: 9_level_3,Unnamed: 10_level_3,Unnamed: 11_level_3,Unnamed: 12_level_3,Unnamed: 13_level_3,Unnamed: 14_level_3,Unnamed: 15_level_3,Unnamed: 16_level_3,Unnamed: 17_level_3,Unnamed: 18_level_3,Unnamed: 19_level_3,Unnamed: 20_level_3,Unnamed: 21_level_3,Unnamed: 22_level_3
2022,4-bed,1494.0,1715.0,980.0,1316.0,1316.0,1158.0,1043.0,1111.0,1168.0,1006.0,...,676.0,868.0,868.0,631.0,750.0,868.0,574.0,1467.0,2305.0,1699.0
2022,3-bed,1156.0,1348.0,871.0,1220.0,1220.0,968.0,871.0,988.0,1074.0,982.0,...,574.0,719.0,719.0,575.0,646.0,719.0,572.0,1338.0,2101.0,1549.0
2022,2-bed,914.0,1017.0,705.0,943.0,943.0,783.0,705.0,744.0,861.0,705.0,...,462.0,538.0,538.0,422.0,455.0,538.0,422.0,1082.0,1700.0,1253.0
2022,1-bed,764.0,777.0,536.0,820.0,820.0,602.0,619.0,565.0,659.0,536.0,...,402.0,466.0,466.0,370.0,399.0,466.0,370.0,886.0,1368.0,1001.0
2022,Efficiency,643.0,772.0,532.0,765.0,765.0,589.0,530.0,562.0,655.0,530.0,...,387.0,428.0,428.0,364.0,392.0,428.0,362.0,868.0,1154.0,832.0
2021,4-bed,1520.0,1584.0,907.0,1409.0,1409.0,1034.0,930.0,1094.0,1046.0,880.0,...,576.0,909.0,909.0,623.0,738.0,909.0,630.0,1410.0,2214.0,1633.0
2021,3-bed,1148.0,1249.0,806.0,1303.0,1303.0,877.0,789.0,944.0,956.0,877.0,...,529.0,740.0,740.0,561.0,641.0,740.0,596.0,1294.0,2031.0,1498.0
2021,2-bed,908.0,922.0,648.0,1002.0,1002.0,705.0,634.0,723.0,744.0,634.0,...,425.0,556.0,556.0,421.0,448.0,556.0,421.0,1040.0,1633.0,1204.0
2021,1-bed,766.0,723.0,492.0,871.0,871.0,549.0,556.0,549.0,620.0,481.0,...,373.0,477.0,477.0,369.0,393.0,477.0,369.0,852.0,1314.0,962.0
2021,Efficiency,640.0,718.0,488.0,817.0,817.0,532.0,479.0,482.0,534.0,479.0,...,362.0,436.0,436.0,361.0,384.0,436.0,361.0,835.0,1109.0,797.0


In [34]:
rent_prices.index.set_names(['Year', 'Roomtype'], inplace=True)
rent_prices.head(10)
#we set names for the index so our data looks neat

Unnamed: 0_level_0,Unnamed: 1_level_0,"Montgomery, AL MSA","Mobile, AL MSA","Barbour County, AL","Bibb County, AL","Birmingham, AL MSA","Bullock County, AL","Butler County, AL","Anniston, AL MSA","Chambers County, AL","Cherokee County, AL",...,Puerto Rico HUD Nonmetro FMR Area,"San Juan--Bayamón, PR PMSA","San Juan--Bayamón, PR PMSA",Puerto Rico HUD Nonmetro FMR Area,"Ponce, PR MSA","San Juan--Bayamón, PR PMSA","Ponce, PR MSA","St. Croix, VI","St. John/St. Thomas, VI","St. John/St. Thomas, VI"
Unnamed: 0_level_1,Unnamed: 1_level_1,Autauga County,Baldwin County,Barbour County,Bibb County,Blount County,Bullock County,Butler County,Calhoun County,Chambers County,Cherokee County,...,Utuado Municipio,Vega Alta Municipio,Vega Baja Municipio,Vieques Municipio,Villalba Municipio,Yabucoa Municipio,Yauco Municipio,St. Croix,St. John,St. Thomas
Unnamed: 0_level_2,Unnamed: 1_level_2,"Montgomery, AL MSA","Daphne-Fairhope-Foley, AL MSA","Barbour County, AL","Birmingham-Hoover, AL HUD Metro FMR Area","Birmingham-Hoover, AL HUD Metro FMR Area","Bullock County, AL","Butler County, AL","Anniston-Oxford-Jacksonville, AL MSA","Chambers County, AL","Cherokee County, AL",...,"Utuado Municipio, PR HUD Metro FMR Area","San Juan-Guaynabo, PR HUD Metro FMR Area","San Juan-Guaynabo, PR HUD Metro FMR Area",Puerto Rico HUD Nonmetro Area,"Ponce, PR HUD Metro FMR Area","San Juan-Guaynabo, PR HUD Metro FMR Area","Yauco, PR HUD Metro FMR Area","St. Croix Island, VI","St. John Island, VI","St. Thomas Island, VI"
Year,Roomtype,Unnamed: 2_level_3,Unnamed: 3_level_3,Unnamed: 4_level_3,Unnamed: 5_level_3,Unnamed: 6_level_3,Unnamed: 7_level_3,Unnamed: 8_level_3,Unnamed: 9_level_3,Unnamed: 10_level_3,Unnamed: 11_level_3,Unnamed: 12_level_3,Unnamed: 13_level_3,Unnamed: 14_level_3,Unnamed: 15_level_3,Unnamed: 16_level_3,Unnamed: 17_level_3,Unnamed: 18_level_3,Unnamed: 19_level_3,Unnamed: 20_level_3,Unnamed: 21_level_3,Unnamed: 22_level_3
2022,4-bed,1494.0,1715.0,980.0,1316.0,1316.0,1158.0,1043.0,1111.0,1168.0,1006.0,...,676.0,868.0,868.0,631.0,750.0,868.0,574.0,1467.0,2305.0,1699.0
2022,3-bed,1156.0,1348.0,871.0,1220.0,1220.0,968.0,871.0,988.0,1074.0,982.0,...,574.0,719.0,719.0,575.0,646.0,719.0,572.0,1338.0,2101.0,1549.0
2022,2-bed,914.0,1017.0,705.0,943.0,943.0,783.0,705.0,744.0,861.0,705.0,...,462.0,538.0,538.0,422.0,455.0,538.0,422.0,1082.0,1700.0,1253.0
2022,1-bed,764.0,777.0,536.0,820.0,820.0,602.0,619.0,565.0,659.0,536.0,...,402.0,466.0,466.0,370.0,399.0,466.0,370.0,886.0,1368.0,1001.0
2022,Efficiency,643.0,772.0,532.0,765.0,765.0,589.0,530.0,562.0,655.0,530.0,...,387.0,428.0,428.0,364.0,392.0,428.0,362.0,868.0,1154.0,832.0
2021,4-bed,1520.0,1584.0,907.0,1409.0,1409.0,1034.0,930.0,1094.0,1046.0,880.0,...,576.0,909.0,909.0,623.0,738.0,909.0,630.0,1410.0,2214.0,1633.0
2021,3-bed,1148.0,1249.0,806.0,1303.0,1303.0,877.0,789.0,944.0,956.0,877.0,...,529.0,740.0,740.0,561.0,641.0,740.0,596.0,1294.0,2031.0,1498.0
2021,2-bed,908.0,922.0,648.0,1002.0,1002.0,705.0,634.0,723.0,744.0,634.0,...,425.0,556.0,556.0,421.0,448.0,556.0,421.0,1040.0,1633.0,1204.0
2021,1-bed,766.0,723.0,492.0,871.0,871.0,549.0,556.0,549.0,620.0,481.0,...,373.0,477.0,477.0,369.0,393.0,477.0,369.0,852.0,1314.0,962.0
2021,Efficiency,640.0,718.0,488.0,817.0,817.0,532.0,479.0,482.0,534.0,479.0,...,362.0,436.0,436.0,361.0,384.0,436.0,361.0,835.0,1109.0,797.0


In [162]:
rent_prices.columns.set_names(['pmsaname','cntyname','areaname'],inplace=True)
#we also set the column names

Because we have text data above the year index (in the column index) we cannot convert year to datetime. During analysis we will have to section off our data if we want to use datetime functions.

## University Enrollment Data
This section utilizes Fall Enrollment data from the National Center for Education Statistics' Integrated Post Secondary Education Data System (IPEDS) https://nces.ed.gov/ipeds/datacenter/InstitutionByName.aspx.

## Background on the data
We created a data group that included degree-granting US institutions only, with all degrees of urbanization except:
- City: Large
- City: Midsize
- Suburb: Large
- Suburb: Midsize

Institution locations was downloaded as a separate .UID file which was then converted to a .txt file.

In [35]:
enrollment = pd.read_csv('enrollment.csv')
enrollment.head(3)

Unnamed: 0,UnitID,Institution Name,Full time total (EF2020 All students total),Full time total (EF2019_RV All students total),Full time total (EF2018_RV All students total),Full time total (EF2017_RV All students total),Full time total (EF2016_RV All students total),Full time total (EF2015_RV All students total),Full time total (EF2014_RV All students total),Full time total (EF2013_RV All students total),...,Full time total (EF1991 All students total),Full time total (EF1990 All students total),Full time total (EF1989 All students total),Full time total (EF1988 All students total),Full time total (EF1987 All students total),Full time total (EF1986 All students total),Full time total (EF1985NW All students total),Full time total (EF1984NW All students total),Full time total (EF1980 All students total),Unnamed: 40
0,177834,A T Still University of Health Sciences,3014.0,3026.0,2849.0,2766.0,2679.0,2559.0,2448.0,2177.0,...,557.0,542.0,532.0,514.0,506.0,517.0,515.0,523.0,504.0,
1,180203,Aaniiih Nakoda College,94.0,99.0,108.0,106.0,114.0,100.0,106.0,121.0,...,115.0,94.0,91.0,196.0,,,,,,
2,138558,Abraham Baldwin Agricultural College,2472.0,2480.0,2660.0,2470.0,2514.0,2484.0,2469.0,2350.0,...,1896.0,1799.0,1571.0,1428.0,1298.0,1377.0,1484.0,1656.0,2162.0,


The location of the institutions is present in a separate .txt file. We need to import and merge with our enrollment data.

In [36]:
uni_data = pd.read_csv('uni_data.txt',delimiter='|')
uni_data.to_csv('uni_data.csv',index=None,header=['UnitID','Institution Name','City','State'])
uni_location = pd.read_csv('uni_data.csv')
uni_location.head(3)

Unnamed: 0,UnitID,Institution Name,City,State
0,100812,Athens State University,Athens,AL
1,100858,Auburn University,Auburn,AL
2,101028,Chattahoochee Valley Community College,Phenix City,AL


In [37]:
enroll_data = enrollment.merge(uni_location.drop(['Institution Name'],axis=1), on='UnitID')
enroll_data.head(3)

Unnamed: 0,UnitID,Institution Name,Full time total (EF2020 All students total),Full time total (EF2019_RV All students total),Full time total (EF2018_RV All students total),Full time total (EF2017_RV All students total),Full time total (EF2016_RV All students total),Full time total (EF2015_RV All students total),Full time total (EF2014_RV All students total),Full time total (EF2013_RV All students total),...,Full time total (EF1989 All students total),Full time total (EF1988 All students total),Full time total (EF1987 All students total),Full time total (EF1986 All students total),Full time total (EF1985NW All students total),Full time total (EF1984NW All students total),Full time total (EF1980 All students total),Unnamed: 40,City,State
0,177834,A T Still University of Health Sciences,3014.0,3026.0,2849.0,2766.0,2679.0,2559.0,2448.0,2177.0,...,532.0,514.0,506.0,517.0,515.0,523.0,504.0,,Kirksville,MO
1,180203,Aaniiih Nakoda College,94.0,99.0,108.0,106.0,114.0,100.0,106.0,121.0,...,91.0,196.0,,,,,,,Harlem,MT
2,138558,Abraham Baldwin Agricultural College,2472.0,2480.0,2660.0,2470.0,2514.0,2484.0,2469.0,2350.0,...,1571.0,1428.0,1298.0,1377.0,1484.0,1656.0,2162.0,,Tifton,GA


### Arrange Columns

Let us rearrange the columns a bit to make it look neat

In [38]:
col1 = enroll_data.pop('City')
col2 = enroll_data.pop('State')
enroll_data.insert(2,'City',col1)
enroll_data.insert(3,'State',col2)
enroll_data.head(3)

Unnamed: 0,UnitID,Institution Name,City,State,Full time total (EF2020 All students total),Full time total (EF2019_RV All students total),Full time total (EF2018_RV All students total),Full time total (EF2017_RV All students total),Full time total (EF2016_RV All students total),Full time total (EF2015_RV All students total),...,Full time total (EF1991 All students total),Full time total (EF1990 All students total),Full time total (EF1989 All students total),Full time total (EF1988 All students total),Full time total (EF1987 All students total),Full time total (EF1986 All students total),Full time total (EF1985NW All students total),Full time total (EF1984NW All students total),Full time total (EF1980 All students total),Unnamed: 40
0,177834,A T Still University of Health Sciences,Kirksville,MO,3014.0,3026.0,2849.0,2766.0,2679.0,2559.0,...,557.0,542.0,532.0,514.0,506.0,517.0,515.0,523.0,504.0,
1,180203,Aaniiih Nakoda College,Harlem,MT,94.0,99.0,108.0,106.0,114.0,100.0,...,115.0,94.0,91.0,196.0,,,,,,
2,138558,Abraham Baldwin Agricultural College,Tifton,GA,2472.0,2480.0,2660.0,2470.0,2514.0,2484.0,...,1896.0,1799.0,1571.0,1428.0,1298.0,1377.0,1484.0,1656.0,2162.0,


Let's extract 'Year' from the column headers and we will also drop the unnamed column

In [39]:
enroll_data.drop(columns='Unnamed: 40',inplace=True)
enroll_data.head(1)

Unnamed: 0,UnitID,Institution Name,City,State,Full time total (EF2020 All students total),Full time total (EF2019_RV All students total),Full time total (EF2018_RV All students total),Full time total (EF2017_RV All students total),Full time total (EF2016_RV All students total),Full time total (EF2015_RV All students total),...,Full time total (EF1992 All students total),Full time total (EF1991 All students total),Full time total (EF1990 All students total),Full time total (EF1989 All students total),Full time total (EF1988 All students total),Full time total (EF1987 All students total),Full time total (EF1986 All students total),Full time total (EF1985NW All students total),Full time total (EF1984NW All students total),Full time total (EF1980 All students total)
0,177834,A T Still University of Health Sciences,Kirksville,MO,3014.0,3026.0,2849.0,2766.0,2679.0,2559.0,...,557.0,557.0,542.0,532.0,514.0,506.0,517.0,515.0,523.0,504.0


In [40]:
enroll_data.columns = ['UnitID','Institution Name','City','State']\
                     +[col[19:23] for col in enroll_data.columns if  col.startswith('Full')]
enroll_data.head(2)

Unnamed: 0,UnitID,Institution Name,City,State,2020,2019,2018,2017,2016,2015,...,1992,1991,1990,1989,1988,1987,1986,1985,1984,1980
0,177834,A T Still University of Health Sciences,Kirksville,MO,3014.0,3026.0,2849.0,2766.0,2679.0,2559.0,...,557.0,557.0,542.0,532.0,514.0,506.0,517.0,515.0,523.0,504.0
1,180203,Aaniiih Nakoda College,Harlem,MT,94.0,99.0,108.0,106.0,114.0,100.0,...,135.0,115.0,94.0,91.0,196.0,,,,,


### Identify Problems in the DataFrame

Now let us see the summary statistics of our data and adjust for null values

In [41]:
enroll_data.describe(include='all')

Unnamed: 0,UnitID,Institution Name,City,State,2020,2019,2018,2017,2016,2015,...,1992,1991,1990,1989,1988,1987,1986,1985,1984,1980
count,1655.0,1655,1655,1655,1604.0,1610.0,1615.0,1609.0,1600.0,1592.0,...,1419.0,1411.0,1398.0,1385.0,1381.0,1372.0,1346.0,1248.0,1244.0,1213.0
unique,,1641,1154,50,,,,,,,...,,,,,,,,,,
top,,Stevens-Henager College,Albany,PA,,,,,,,...,,,,,,,,,,
freq,,4,13,92,,,,,,,...,,,,,,,,,,
mean,217579.473112,,,,2398.616584,2483.496273,2494.054489,2523.377253,2556.67375,2590.436558,...,2121.624383,2106.876683,2063.298283,2027.621661,1967.041999,1907.948251,1880.96211,2010.003205,2020.630225,2071.189613
std,98927.042363,,,,4864.879949,4891.920166,4654.226287,4621.963344,4606.026848,4587.824388,...,3502.625082,3546.670384,3535.991936,3511.551573,3444.956798,3388.574577,3301.670263,3452.448693,3432.54743,3471.022008
min,100812.0,,,,3.0,3.0,4.0,5.0,1.0,1.0,...,2.0,12.0,6.0,7.0,3.0,9.0,3.0,13.0,19.0,28.0
25%,156412.5,,,,467.5,502.0,516.5,534.0,552.25,578.5,...,529.5,516.0,495.5,487.0,477.0,445.75,447.25,505.0,500.25,551.0
50%,196121.0,,,,1044.5,1115.0,1124.0,1168.0,1194.5,1228.0,...,1020.0,1015.0,965.0,936.0,882.0,842.0,825.0,885.0,892.5,968.0
75%,228463.5,,,,2120.0,2256.25,2325.0,2346.0,2409.5,2446.5,...,1965.5,1963.5,1868.5,1880.0,1768.0,1721.5,1730.0,1864.5,1879.75,1924.0


From the above, we can see that we have duplicate institution name, city and state. Duplicate city and state is not a problem as there can be more than one institution in a city and state. However, each institution name should be unique. We will have to investigate the possibility that different institutions in different locations have the same.

We do not have even data. There are null values for most years. Information available for institutions lowers as the years go down. 

There are some institutions with as low as 2 people enrolled. This does not make sense for our analysis. We will have to decide what to do about those.

### Fix Identified Problems

First let's see the duplicated institutions

In [42]:
enroll_data[enroll_data.duplicated(subset=['Institution Name','City','State'])].shape
#this shows that we don't have any duplicated institutions. 
#Institutions with the same name in our data are located in different cities.

(0, 42)

In [43]:
enroll_data.info(verbose=True,show_counts=True)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1655 entries, 0 to 1654
Data columns (total 42 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   UnitID            1655 non-null   int64  
 1   Institution Name  1655 non-null   object 
 2   City              1655 non-null   object 
 3   State             1655 non-null   object 
 4   2020              1604 non-null   float64
 5   2019              1610 non-null   float64
 6   2018              1615 non-null   float64
 7   2017              1609 non-null   float64
 8   2016              1600 non-null   float64
 9   2015              1592 non-null   float64
 10  2014              1580 non-null   float64
 11  2013              1570 non-null   float64
 12  2012              1558 non-null   float64
 13  2011              1547 non-null   float64
 14  2010              1541 non-null   float64
 15  2009              1534 non-null   float64
 16  2008              1519 non-null   float64


Null values could mean that enrollment information was not available for that year. Or they could mean that the university was not operating in that year. We'll need to see details of the null values.

An institution cannot have data for periods before its first enrollment.
Therefore, we should identify the first year of report for each institution, and then fill the nan values before that year with 0.

In [44]:
enroll_data = enroll_data[enroll_data.columns[::-1]]
#above reverses the columns so that we have 1980 as first column
enroll_data = enroll_data.mask(enroll_data.notna().cumsum(axis=1).eq(0),0)
#notna shows the values that are null as False, and cumsum takes the cum sum of the notna dataframe
#hence, the first consecutive null values will all have a value of 0 up to the first non null value
#.eq(0) converts the 0 values to True. so the first consecutive null values will all have a value of True
#the mask function then replaces all values of True with 0.
#hence the first consecutive null values will all have a value of True.
enroll_data = enroll_data[enroll_data.columns[::-1]]
#above reverses the columns back
enroll_data.head()

Unnamed: 0,UnitID,Institution Name,City,State,2020,2019,2018,2017,2016,2015,...,1992,1991,1990,1989,1988,1987,1986,1985,1984,1980
0,177834,A T Still University of Health Sciences,Kirksville,MO,3014.0,3026.0,2849.0,2766.0,2679.0,2559.0,...,557.0,557.0,542.0,532.0,514.0,506.0,517.0,515.0,523.0,504.0
1,180203,Aaniiih Nakoda College,Harlem,MT,94.0,99.0,108.0,106.0,114.0,100.0,...,135.0,115.0,94.0,91.0,196.0,0.0,0.0,0.0,0.0,0.0
2,138558,Abraham Baldwin Agricultural College,Tifton,GA,2472.0,2480.0,2660.0,2470.0,2514.0,2484.0,...,1976.0,1896.0,1799.0,1571.0,1428.0,1298.0,1377.0,1484.0,1656.0,2162.0
3,172866,Academy College,Bloomington,MN,87.0,96.0,81.0,45.0,57.0,43.0,...,57.0,82.0,83.0,89.0,73.0,102.0,73.0,0.0,0.0,0.0
4,439969,Acupuncture and Massage College,Miami,FL,171.0,206.0,165.0,151.0,154.0,149.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Now we are sure that the remaining na values are due to missing data and not due to a university not existing at the time.

We need to decide what to do with the remaining NA values.

We should set a maximum threshold for NA values per institution, and use a method to fill the rest.

In [45]:
#this gives the index of allinstituions that have na values in half of the years.
#we have just four institutions so we can take them out of our analysis
drop_rows = enroll_data.assign(tot = enroll_data.isna().sum(axis=1)).query('tot > 21').index
enroll_data.drop(index=drop_rows,inplace=True)
enroll_data.head(3)

Unnamed: 0,UnitID,Institution Name,City,State,2020,2019,2018,2017,2016,2015,...,1992,1991,1990,1989,1988,1987,1986,1985,1984,1980
0,177834,A T Still University of Health Sciences,Kirksville,MO,3014.0,3026.0,2849.0,2766.0,2679.0,2559.0,...,557.0,557.0,542.0,532.0,514.0,506.0,517.0,515.0,523.0,504.0
1,180203,Aaniiih Nakoda College,Harlem,MT,94.0,99.0,108.0,106.0,114.0,100.0,...,135.0,115.0,94.0,91.0,196.0,0.0,0.0,0.0,0.0,0.0
2,138558,Abraham Baldwin Agricultural College,Tifton,GA,2472.0,2480.0,2660.0,2470.0,2514.0,2484.0,...,1976.0,1896.0,1799.0,1571.0,1428.0,1298.0,1377.0,1484.0,1656.0,2162.0


In [46]:
enroll_data = enroll_data.fillna(method='bfill',axis=1)
#we fill each year with the value from previous year

### Reshape Data

We are not interested in institutions that never had more than 500 students enrolled across each year. So let us remove those institutions from our analysis.

In [47]:
drop_rows2 = enroll_data.assign(lim = lambda x: (x.loc[:,'2020':'1980']>500).sum(axis=1)).query('lim==0').index
len(drop_rows2)
enroll_data.drop(index=drop_rows2,inplace=True)
enroll_data.head(3)
#This shows that 301 institutions never had more than 500 students enrolled during the time period of analysis.
#We are not interested in these instituitons

Unnamed: 0,UnitID,Institution Name,City,State,2020,2019,2018,2017,2016,2015,...,1992,1991,1990,1989,1988,1987,1986,1985,1984,1980
0,177834,A T Still University of Health Sciences,Kirksville,MO,3014.0,3026.0,2849.0,2766.0,2679.0,2559.0,...,557.0,557.0,542.0,532.0,514.0,506.0,517.0,515.0,523.0,504.0
2,138558,Abraham Baldwin Agricultural College,Tifton,GA,2472.0,2480.0,2660.0,2470.0,2514.0,2484.0,...,1976.0,1896.0,1799.0,1571.0,1428.0,1298.0,1377.0,1484.0,1656.0,2162.0
5,126182,Adams State University,Alamosa,CO,1876.0,1905.0,1878.0,1864.0,2026.0,2087.0,...,1974.0,2004.0,2012.0,2051.0,2024.0,1693.0,1711.0,1547.0,1511.0,1737.0


# Data Exploration
First we will analyze rent_prices and enrollment data separately to fully understand the information present in both datasets.
Then, we will see if there's correlation or dependencies between both datasets.

## Rent Price Data

We want to know what locations are expensive, cheap, have severe fluctuations/constant prices.

### Average rent price

**1) What are the average rent prices per room category in each year?**

**2) What are the average rent prices across all categories for each area in each year?**

**3) What are the average rent prices for all areas across all categories in each year?**

Below we can see the average rent price per room type. But this result doesn't mean anything for our analysis. What is more interesting to consumers will be the average rent price per location.

In [62]:
avg_pertype = rent_prices.mean(axis=1)
avg_pertype

Year  Roomtype  
2022  4-bed         1354.731833
      3-bed         1175.837033
      2-bed          892.772575
      1-bed          717.729705
      Efficiency     659.718455
2021  4-bed         1317.847978
      3-bed         1144.194892
      2-bed          864.459106
      1-bed          694.009729
      Efficiency     634.415020
2020  4-bed         1282.885680
      3-bed         1113.073883
      2-bed          837.375190
      1-bed          672.164792
      Efficiency     609.454241
2019  4-bed          777.067194
      3-bed          679.090301
      2-bed          518.662207
      1-bed          419.347218
      Efficiency     355.262694
2018  4-bed          751.238978
      3-bed          656.674977
      2-bed          501.738826
      1-bed          405.627242
      Efficiency     343.712983
2017  4-bed          722.486470
      3-bed          631.637580
      2-bed          482.620249
      1-bed          390.266038
      Efficiency     330.717239
                       

In [467]:
rent_1bed = rent_prices.xs('1-bed',level=1,drop_level=False)
rent_1bed.head()
#rent_1bed = rent_prices.sort_index(axis=0).loc[(slice(None),slice('1-bed')),:]
#rent_1bed.head()
#rent_1bed = rent_prices.sort_index(axis=0).loc[pd.IndexSlice[:,'1-bed'],:]
#rent_1bed.head()
#rent_1bed_county = rent_prices.sort_index(axis=0).loc[(slice(None),slice('1-bed')),:]

#Different ways of slicing multiindex to separate data into roomtype

Unnamed: 0_level_0,pmsaname,"Montgomery, AL MSA","Mobile, AL MSA","Barbour County, AL","Bibb County, AL","Birmingham, AL MSA","Bullock County, AL","Butler County, AL","Anniston, AL MSA","Chambers County, AL","Cherokee County, AL",...,Puerto Rico HUD Nonmetro FMR Area,"San Juan--Bayamón, PR PMSA","San Juan--Bayamón, PR PMSA",Puerto Rico HUD Nonmetro FMR Area,"Ponce, PR MSA","San Juan--Bayamón, PR PMSA","Ponce, PR MSA","St. Croix, VI","St. John/St. Thomas, VI","St. John/St. Thomas, VI"
Unnamed: 0_level_1,cntyname,Autauga County,Baldwin County,Barbour County,Bibb County,Blount County,Bullock County,Butler County,Calhoun County,Chambers County,Cherokee County,...,Utuado Municipio,Vega Alta Municipio,Vega Baja Municipio,Vieques Municipio,Villalba Municipio,Yabucoa Municipio,Yauco Municipio,St. Croix,St. John,St. Thomas
Unnamed: 0_level_2,areaname,"Montgomery, AL MSA","Daphne-Fairhope-Foley, AL MSA","Barbour County, AL","Birmingham-Hoover, AL HUD Metro FMR Area","Birmingham-Hoover, AL HUD Metro FMR Area","Bullock County, AL","Butler County, AL","Anniston-Oxford-Jacksonville, AL MSA","Chambers County, AL","Cherokee County, AL",...,"Utuado Municipio, PR HUD Metro FMR Area","San Juan-Guaynabo, PR HUD Metro FMR Area","San Juan-Guaynabo, PR HUD Metro FMR Area",Puerto Rico HUD Nonmetro Area,"Ponce, PR HUD Metro FMR Area","San Juan-Guaynabo, PR HUD Metro FMR Area","Yauco, PR HUD Metro FMR Area","St. Croix Island, VI","St. John Island, VI","St. Thomas Island, VI"
Year,Roomtype,Unnamed: 2_level_3,Unnamed: 3_level_3,Unnamed: 4_level_3,Unnamed: 5_level_3,Unnamed: 6_level_3,Unnamed: 7_level_3,Unnamed: 8_level_3,Unnamed: 9_level_3,Unnamed: 10_level_3,Unnamed: 11_level_3,Unnamed: 12_level_3,Unnamed: 13_level_3,Unnamed: 14_level_3,Unnamed: 15_level_3,Unnamed: 16_level_3,Unnamed: 17_level_3,Unnamed: 18_level_3,Unnamed: 19_level_3,Unnamed: 20_level_3,Unnamed: 21_level_3,Unnamed: 22_level_3
2022,1-bed,764.0,777.0,536.0,820.0,820.0,602.0,619.0,565.0,659.0,536.0,...,402.0,466.0,466.0,370.0,399.0,466.0,370.0,886.0,1368.0,1001.0
2021,1-bed,766.0,723.0,492.0,871.0,871.0,549.0,556.0,549.0,620.0,481.0,...,373.0,477.0,477.0,369.0,393.0,477.0,369.0,852.0,1314.0,962.0
2020,1-bed,702.0,749.0,481.0,861.0,861.0,484.0,521.0,531.0,593.0,465.0,...,374.0,451.0,451.0,349.0,388.0,451.0,359.0,815.0,1258.0,921.0
2019,1-bed,453.0,454.0,311.0,311.0,479.0,311.0,311.0,329.0,311.0,311.0,...,263.0,439.0,439.0,263.0,327.0,439.0,327.0,618.0,792.0,792.0
2018,1-bed,441.0,442.0,302.0,302.0,467.0,302.0,302.0,321.0,302.0,302.0,...,256.0,428.0,428.0,256.0,319.0,428.0,319.0,600.0,770.0,770.0


Here, we can see the average rent price for each location. This is what is useful to consumers. This is what consumers will use to make decisions regarding where to rent>

In [64]:
avg_price = rent_prices.groupby('Year').mean()
avg_price

Unnamed: 0_level_0,"Montgomery, AL MSA","Mobile, AL MSA","Barbour County, AL","Bibb County, AL","Birmingham, AL MSA","Bullock County, AL","Butler County, AL","Anniston, AL MSA","Chambers County, AL","Cherokee County, AL",...,Puerto Rico HUD Nonmetro FMR Area,"San Juan--Bayamón, PR PMSA","San Juan--Bayamón, PR PMSA",Puerto Rico HUD Nonmetro FMR Area,"Ponce, PR MSA","San Juan--Bayamón, PR PMSA","Ponce, PR MSA","St. Croix, VI","St. John/St. Thomas, VI","St. John/St. Thomas, VI"
Unnamed: 0_level_1,Autauga County,Baldwin County,Barbour County,Bibb County,Blount County,Bullock County,Butler County,Calhoun County,Chambers County,Cherokee County,...,Utuado Municipio,Vega Alta Municipio,Vega Baja Municipio,Vieques Municipio,Villalba Municipio,Yabucoa Municipio,Yauco Municipio,St. Croix,St. John,St. Thomas
Unnamed: 0_level_2,"Montgomery, AL MSA","Daphne-Fairhope-Foley, AL MSA","Barbour County, AL","Birmingham-Hoover, AL HUD Metro FMR Area","Birmingham-Hoover, AL HUD Metro FMR Area","Bullock County, AL","Butler County, AL","Anniston-Oxford-Jacksonville, AL MSA","Chambers County, AL","Cherokee County, AL",...,"Utuado Municipio, PR HUD Metro FMR Area","San Juan-Guaynabo, PR HUD Metro FMR Area","San Juan-Guaynabo, PR HUD Metro FMR Area",Puerto Rico HUD Nonmetro Area,"Ponce, PR HUD Metro FMR Area","San Juan-Guaynabo, PR HUD Metro FMR Area","Yauco, PR HUD Metro FMR Area","St. Croix Island, VI","St. John Island, VI","St. Thomas Island, VI"
Year,Unnamed: 1_level_3,Unnamed: 2_level_3,Unnamed: 3_level_3,Unnamed: 4_level_3,Unnamed: 5_level_3,Unnamed: 6_level_3,Unnamed: 7_level_3,Unnamed: 8_level_3,Unnamed: 9_level_3,Unnamed: 10_level_3,Unnamed: 11_level_3,Unnamed: 12_level_3,Unnamed: 13_level_3,Unnamed: 14_level_3,Unnamed: 15_level_3,Unnamed: 16_level_3,Unnamed: 17_level_3,Unnamed: 18_level_3,Unnamed: 19_level_3,Unnamed: 20_level_3,Unnamed: 21_level_3
1983,605.6,582.6,394.4,408.2,612.6,394.4,394.4,450.0,394.6,394.4,...,321.2,540.0,540.0,321.2,401.8,540.0,401.8,756.8,970.8,970.8
1985,667.2,624.2,440.0,449.4,611.0,436.2,436.2,498.6,445.6,452.2,...,338.0,462.4,462.4,338.0,384.8,462.4,384.8,617.4,801.6,801.6
1986,684.8,666.8,452.4,613.4,613.4,449.4,449.4,512.8,477.4,465.4,...,393.0,519.8,519.8,393.0,504.6,519.8,392.6,635.2,824.6,824.6
1987,712.2,693.6,471.0,637.6,637.6,467.2,467.2,533.2,497.2,483.8,...,408.6,568.4,568.4,408.6,525.2,568.4,408.2,660.8,857.2,857.2
1988,754.6,738.8,502.0,728.6,728.6,496.4,496.4,568.8,527.8,512.0,...,426.6,581.2,581.2,426.6,547.0,581.2,426.0,725.6,939.6,939.6
1989,807.0,789.2,536.0,737.0,737.0,530.8,530.8,607.8,564.0,546.6,...,443.4,584.2,584.2,443.4,568.6,584.2,442.6,753.0,976.4,976.4
1990,840.4,822.8,558.0,776.0,776.0,553.0,553.0,632.8,588.2,568.8,...,445.6,647.4,647.4,445.6,571.0,647.4,445.0,772.0,1002.2,1002.2
1991,840.4,822.8,558.0,829.8,829.8,553.0,553.0,632.8,588.2,568.8,...,477.8,627.2,627.2,477.8,612.0,627.2,477.2,763.6,990.2,990.2
1992,826.6,744.0,516.8,792.0,792.0,501.6,501.6,683.6,571.6,507.0,...,474.6,622.6,622.6,474.6,607.4,622.6,474.2,752.0,974.4,974.4
1993,892.2,830.2,590.4,838.4,838.4,670.4,638.0,652.0,652.6,654.4,...,500.4,656.4,656.4,500.4,640.6,656.4,499.6,787.8,1022.0,1022.0


Here we can see the average rent price for each year across all locations. We notice that rent price significantly dipped in the year 2000 and picked up gradually. There is also a significant jump in average rent price from 2020. We will analyze this later.

In [66]:
avg_peryear = avg_price.mean(axis=1)
avg_peryear

Year
1983    553.438249
1985    587.287321
1986    618.555366
1987    641.614351
1988    682.523503
1989    706.397993
1990    725.935725
1991    736.317057
1992    719.740833
1993    770.590696
1994    774.886774
1995    787.768319
1996    815.371967
1997    846.062268
1998    867.739434
1999    884.591304
2000    330.745698
2001    344.977805
2002    366.311341
2003    379.863545
2004    392.648769
2005    402.035816
2006    412.864518
2007    426.084950
2008    441.830404
2009    452.291943
2010    434.552083
2011    431.384980
2012    464.948252
2013    474.612770
2014    485.803284
2015    494.889450
2016    499.957312
2017    511.545515
2018    531.798601
2019    549.885923
2020    902.990757
2021    930.985345
2022    960.157920
dtype: float64

## Locations

**What makes sense for our analysis is the average rent price in each location. Let's answer some more questions:**

**1. What are the top 10 most expensive areas to rent in per year? How often do the same locations show up?**<br>
**2. What are the top 10 least expensive areas to rent in per year? How often do the same lcoations show up?**<br>
**3. What are the top 10 areas that have had the highest percentage increase in rent?**<br>
**4. What are the top 10 areas that have had the highest perentage decrease in rent?**<br>
**5. What are the top 10 areas that have had the smallest percentage changes in rent?**<br>

In [449]:
rank_avg = avg_price.T
rank_avg.head()
#let's transpose our data for better visualisation

Unnamed: 0,Unnamed: 1,Year,1983,1985,1986,1987,1988,1989,1990,1991,1992,1993,...,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
"Montgomery, AL MSA",Autauga County,"Montgomery, AL MSA",605.6,667.2,684.8,712.2,754.6,807.0,840.4,840.4,826.6,892.2,...,534.0,548.4,559.4,562.6,572.2,588.8,603.8,917.4,996.4,994.2
"Mobile, AL MSA",Baldwin County,"Daphne-Fairhope-Foley, AL MSA",582.6,624.2,666.8,693.6,738.8,789.2,822.8,822.8,744.0,830.2,...,472.2,528.0,538.2,541.2,550.2,566.0,580.8,1045.2,1039.2,1125.8
"Barbour County, AL",Barbour County,"Barbour County, AL",394.4,440.0,452.4,471.0,502.0,536.0,558.0,558.0,516.8,590.4,...,357.4,365.0,369.8,369.8,373.2,383.0,394.4,661.0,668.2,724.8
"Bibb County, AL",Bibb County,"Birmingham-Hoover, AL HUD Metro FMR Area",408.2,449.4,613.4,637.6,728.6,737.0,776.0,829.8,792.0,838.4,...,369.6,377.6,382.4,382.4,386.2,396.2,408.2,1073.4,1080.4,1012.8
"Birmingham, AL MSA",Blount County,"Birmingham-Hoover, AL HUD Metro FMR Area",612.6,611.0,613.4,637.6,728.6,737.0,776.0,829.8,792.0,838.4,...,504.0,517.2,527.4,569.2,579.0,595.8,610.6,1073.4,1080.4,1012.8


The function below returns the top ten locations with highest rent prices for each year.
We observe that there are several repeated valeus across the years. 

In [472]:
def high(df):
    names = []
    years = df.columns.tolist()
    for year in years:
        values = df.nlargest(10,year).index.tolist()
        names.append(values)
    highest_ten = pd.DataFrame(dict(zip(df.columns,names)), index = list(range(1,11)))
    return highest_ten
highest_ten = high(rank_avg)
highest_ten
#this function first extracts the years into a list.
#it then finds the index of the 10 most expensive location per year
#it then creates a dataframe to better visualise the data

Unnamed: 0,1983,1985,1986,1987,1988,1989,1990,1991,1992,1993,...,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
1,"(San Jose, CA PMSA, Santa Clara County, San Jo...","(San Francisco, CA PMSA, Marin County, San Fra...","(Stamford--Norwalk, CT PMSA, Fairfield County,...","(Stamford--Norwalk, CT PMSA, Fairfield County,...","(Honolulu, HI MSA, Honolulu County, Urban Hono...","(Honolulu, HI MSA, Honolulu County, Urban Hono...","(Stamford--Norwalk, CT PMSA, Fairfield County,...","(Stamford--Norwalk, CT PMSA, Fairfield County,...","(San Francisco, CA PMSA, Marin County, San Fra...","(Honolulu, HI MSA, Honolulu County, Urban Hono...",...,"(Kalawao County, HI, Kalawao County, Kalawao C...","(Nantucket County, MA, Nantucket County, Nantu...","(Nantucket County, MA, Nantucket County, Nantu...","(San Francisco, CA PMSA, Marin County, San Fra...","(San Jose, CA PMSA, Santa Clara County, San Jo...","(San Francisco, CA PMSA, Marin County, San Fra...","(San Francisco, CA PMSA, Marin County, San Fra...","(San Francisco, CA PMSA, Marin County, San Fra...","(San Francisco, CA PMSA, Marin County, San Fra...","(San Francisco, CA PMSA, Marin County, San Fra..."
2,"(San Francisco, CA PMSA, Marin County, San Fra...","(San Francisco, CA PMSA, San Francisco County,...","(San Francisco, CA PMSA, Marin County, San Fra...","(Orange County, CA PMSA, Orange County, Santa...","(Orange County, CA PMSA, Orange County, Santa...","(Stamford--Norwalk, CT PMSA, Fairfield County,...","(Honolulu, HI MSA, Honolulu County, Urban Hono...","(San Francisco, CA PMSA, Marin County, San Fra...","(San Francisco, CA PMSA, San Francisco County,...","(San Francisco, CA PMSA, Marin County, San Fra...",...,"(Nantucket County, MA, Nantucket County, Nantu...","(Kalawao County, HI, Kalawao County, Kalawao C...","(Kalawao County, HI, Kalawao County, Kalawao C...","(San Francisco, CA PMSA, San Francisco County,...","(San Francisco, CA PMSA, Marin County, San Fra...","(San Francisco, CA PMSA, San Francisco County,...","(San Francisco, CA PMSA, San Francisco County,...","(San Francisco, CA PMSA, San Francisco County,...","(San Francisco, CA PMSA, San Francisco County,...","(San Francisco, CA PMSA, San Francisco County,..."
3,"(San Francisco, CA PMSA, San Francisco County,...","(San Francisco, CA PMSA, San Mateo County, San...","(San Francisco, CA PMSA, San Francisco County,...","(Ventura, CA PMSA, Ventura County, Oxnard-Thou...","(Stamford--Norwalk, CT PMSA, Fairfield County,...","(Orange County, CA PMSA, Orange County, Santa...","(San Francisco, CA PMSA, Marin County, San Fra...","(San Francisco, CA PMSA, San Francisco County,...","(San Francisco, CA PMSA, San Mateo County, San...","(San Francisco, CA PMSA, San Francisco County,...",...,"(Maui County, HI, Maui County, Maui County, HI...","(Maui County, HI, Maui County, Maui County, HI...","(San Jose, CA PMSA, Santa Clara County, San Jo...","(San Francisco, CA PMSA, San Mateo County, San...","(San Francisco, CA PMSA, San Francisco County,...","(San Francisco, CA PMSA, San Mateo County, San...","(San Francisco, CA PMSA, San Mateo County, San...","(San Francisco, CA PMSA, San Mateo County, San...","(San Francisco, CA PMSA, San Mateo County, San...","(San Francisco, CA PMSA, San Mateo County, San..."
4,"(San Francisco, CA PMSA, San Mateo County, San...","(Ventura, CA PMSA, Ventura County, Oxnard-Thou...","(San Francisco, CA PMSA, San Mateo County, San...","(San Francisco, CA PMSA, Marin County, San Fra...","(San Francisco, CA PMSA, Marin County, San Fra...","(San Francisco, CA PMSA, Marin County, San Fra...","(San Francisco, CA PMSA, San Francisco County,...","(San Francisco, CA PMSA, San Mateo County, San...","(Honolulu, HI MSA, Honolulu County, Urban Hono...","(San Francisco, CA PMSA, San Mateo County, San...",...,"(San Miguel County, CO, San Miguel County, San...","(San Miguel County, CO, San Miguel County, San...","(Maui County, HI, Maui County, Maui County, HI...","(San Jose, CA PMSA, Santa Clara County, San Jo...","(San Francisco, CA PMSA, San Mateo County, San...","(San Jose, CA PMSA, Santa Clara County, San Jo...","(San Jose, CA PMSA, Santa Clara County, San Jo...","(San Jose, CA PMSA, Santa Clara County, San Jo...","(San Jose, CA PMSA, Santa Clara County, San Jo...","(Santa Cruz--Watsonville, CA PMSA, Santa Cruz ..."
5,"(Stamford--Norwalk, CT PMSA, Fairfield County,...","(Stamford--Norwalk, CT PMSA, Fairfield County,...","(Orange County, CA PMSA, Orange County, Santa...","(San Francisco, CA PMSA, San Francisco County,...","(San Francisco, CA PMSA, San Francisco County,...","(San Francisco, CA PMSA, San Francisco County,...","(San Francisco, CA PMSA, San Mateo County, San...","(Honolulu, HI MSA, Honolulu County, Urban Hono...","(Stamford--Norwalk, CT PMSA, Fairfield County,...","(Nantucket County, MA, Nantucket County, Nantu...",...,"(Northwest Arctic Borough, AK, Northwest Arcti...","(Northwest Arctic Borough, AK, Northwest Arcti...","(San Francisco, CA PMSA, Marin County, San Fra...","(Nantucket County, MA, Nantucket County, Nantu...","(Stamford--Norwalk, CT PMSA, Fairfield County,...","(Stamford--Norwalk, CT PMSA, Fairfield County,...","(Stamford--Norwalk, CT PMSA, Fairfield County,...","(Santa Cruz--Watsonville, CA PMSA, Santa Cruz ...","(Santa Cruz--Watsonville, CA PMSA, Santa Cruz ...","(San Jose, CA PMSA, Santa Clara County, San Jo..."
6,"(Oakland, CA PMSA, Alameda County, Oakland-Fre...","(Oakland, CA PMSA, Alameda County, Oakland-Fre...","(Ventura, CA PMSA, Ventura County, Oxnard-Thou...","(San Francisco, CA PMSA, San Mateo County, San...","(San Francisco, CA PMSA, San Mateo County, San...","(San Francisco, CA PMSA, San Mateo County, San...","(Orange County, CA PMSA, Orange County, Santa...","(San Jose, CA PMSA, Santa Clara County, San Jo...","(Orange County, CA PMSA, Orange County, Santa...","(Kauai County, HI, Kauai County, Kauai County,...",...,"(Nassau--Suffolk, NY PMSA, Nassau County, Nass...","(Nassau--Suffolk, NY PMSA, Nassau County, Nass...","(San Francisco, CA PMSA, San Francisco County,...","(Kalawao County, HI, Kalawao County, Kalawao C...","(Nantucket County, MA, Nantucket County, Nantu...","(Nantucket County, MA, Nantucket County, Nantu...","(Oakland, CA PMSA, Alameda County, Oakland-Fre...","(Oakland, CA PMSA, Alameda County, Oakland-Fre...","(Oakland, CA PMSA, Alameda County, Oakland-Fre...","(Santa Barbara--Santa Maria--Lompoc, CA MSA, S..."
7,"(Oakland, CA PMSA, Contra Costa County, Oaklan...","(Oakland, CA PMSA, Contra Costa County, Oaklan...","(Pitkin County, CO, Pitkin County, Pitkin Coun...","(Pitkin County, CO, Pitkin County, Pitkin Coun...","(Nassau--Suffolk, NY PMSA, Nassau County, Nass...","(Westchester County, NY HUD Metro FMR Area, We...","(Santa Cruz--Watsonville, CA PMSA, Santa Cruz ...","(Santa Cruz--Watsonville, CA PMSA, Santa Cruz ...","(San Jose, CA PMSA, Santa Clara County, San Jo...","(Orange County, CA PMSA, Orange County, Santa...",...,"(Nassau--Suffolk, NY PMSA, Suffolk County, Nas...","(Nassau--Suffolk, NY PMSA, Suffolk County, Nas...","(San Francisco, CA PMSA, San Mateo County, San...","(Nassau--Suffolk, NY PMSA, Nassau County, Nass...","(Kalawao County, HI, Kalawao County, Kalawao C...","(Oakland, CA PMSA, Alameda County, Oakland-Fre...","(Oakland, CA PMSA, Contra Costa County, Oaklan...","(Oakland, CA PMSA, Contra Costa County, Oaklan...","(Oakland, CA PMSA, Contra Costa County, Oaklan...","(Orange County, CA PMSA, Orange County, Santa..."
8,"(Boston, MA--NH PMSA, Bristol County, Taunton-...","(Orange County, CA PMSA, Orange County, Santa...","(Santa Cruz--Watsonville, CA PMSA, Santa Cruz ...","(Westchester County, NY HUD Metro FMR Area, We...","(Nassau--Suffolk, NY PMSA, Suffolk County, Nas...","(Santa Cruz--Watsonville, CA PMSA, Santa Cruz ...","(Westchester County, NY HUD Metro FMR Area, We...","(Orange County, CA PMSA, Orange County, Santa...","(Nassau--Suffolk, NY PMSA, Nassau County, Nass...","(San Jose, CA PMSA, Santa Clara County, San Jo...",...,"(Denver, CO PMSA, Broomfield County, Denver-Au...","(Stamford--Norwalk, CT PMSA, Fairfield County,...","(San Miguel County, CO, San Miguel County, San...","(Nassau--Suffolk, NY PMSA, Suffolk County, Nas...","(Nassau--Suffolk, NY PMSA, Nassau County, Nass...","(Oakland, CA PMSA, Contra Costa County, Oaklan...","(Nantucket County, MA, Nantucket County, Nantu...","(Orange County, CA PMSA, Orange County, Santa...","(Santa Barbara--Santa Maria--Lompoc, CA MSA, S...","(Honolulu, HI MSA, Honolulu County, Urban Hono..."
9,"(Boston, MA--NH PMSA, Essex County, Boston-Cam...","(San Jose, CA PMSA, Santa Clara County, San Jo...","(San Jose, CA PMSA, Santa Clara County, San Jo...","(Brockton, MA PMSA, Bristol County, Easton-Ray...","(Westchester County, NY HUD Metro FMR Area, We...","(Nassau--Suffolk, NY PMSA, Nassau County, Nass...","(Nassau--Suffolk, NY PMSA, Nassau County, Nass...","(Nassau--Suffolk, NY PMSA, Nassau County, Nass...","(Nassau--Suffolk, NY PMSA, Suffolk County, Nas...","(Stamford--Norwalk, CT PMSA, Fairfield County,...",...,"(Stamford--Norwalk, CT PMSA, Fairfield County,...","(Kauai County, HI, Kauai County, Kauai County,...","(Nassau--Suffolk, NY PMSA, Nassau County, Nass...","(Stamford--Norwalk, CT PMSA, Fairfield County,...","(Nassau--Suffolk, NY PMSA, Suffolk County, Nas...","(Nassau--Suffolk, NY PMSA, Nassau County, Nass...","(Boston, MA--NH PMSA, Bristol County, Taunton-...","(Honolulu, HI MSA, Honolulu County, Urban Hono...","(Orange County, CA PMSA, Orange County, Santa...","(Nantucket County, MA, Nantucket County, Nantu..."
10,"(Boston, MA--NH PMSA, Middlesex County, Boston...","(Santa Cruz--Watsonville, CA PMSA, Santa Cruz ...","(Brockton, MA PMSA, Bristol County, Easton-Ray...","(Honolulu, HI MSA, Honolulu County, Urban Hono...","(Nantucket County, MA, Nantucket County, Nantu...","(Nassau--Suffolk, NY PMSA, Suffolk County, Nas...","(Nassau--Suffolk, NY PMSA, Suffolk County, Nas...","(Nassau--Suffolk, NY PMSA, Suffolk County, Nas...","(Nantucket County, MA, Nantucket County, Nantu...","(Nassau--Suffolk, NY PMSA, Nassau County, Nass...",...,"(Kauai County, HI, Kauai County, Kauai County,...","(Denver, CO PMSA, Broomfield County, Denver-Au...","(Nassau--Suffolk, NY PMSA, Suffolk County, Nas...","(San Miguel County, CO, San Miguel County, San...","(Oakland, CA PMSA, Alameda County, Oakland-Fre...","(Nassau--Suffolk, NY PMSA, Suffolk County, Nas...","(Boston, MA--NH PMSA, Essex County, Boston-Cam...","(Seattle--Bellevue--Everett, WA PMSA, King Cou...","(Boston, MA--NH PMSA, Essex County, Boston-Cam...","(Ventura, CA PMSA, Ventura County, Oxnard-Thou..."


We want to know the locations that are most frequently repeated. This result means that these locations are consistently among the most expensive areas to rent in the United States.

We can see that San Francisco, CA area is the most consistent expensive area to rent in, with three counties in the area being the most frequent member of top 10 most expensive over the years.

We can also see that the frequency range is very wide from 31 to 1. The areas with very high frequencies have been very expensive over the years.

It will make sense to see if there were unusual trends in a particular year that made the areas with low frequencies one of the most expensive.

In [474]:
high_freq = pd.DataFrame(highest_ten.apply(pd.value_counts).fillna(0).sum(axis=1).sort_values(ascending = False),
             columns =['Frequency'])
high_freq.index.set_names(['pmsaname','cntyname','areaname'],inplace=True)
high_freq

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Frequency
pmsaname,cntyname,areaname,Unnamed: 3_level_1
"San Francisco, CA PMSA",Marin County,"San Francisco, CA HUD Metro FMR Area",31.0
"San Francisco, CA PMSA",San Francisco County,"San Francisco, CA HUD Metro FMR Area",30.0
"San Francisco, CA PMSA",San Mateo County,"San Francisco, CA HUD Metro FMR Area",30.0
"Stamford--Norwalk, CT PMSA",Fairfield County,"Stamford-Norwalk, CT HUD Metro FMR Area",27.0
"San Jose, CA PMSA",Santa Clara County,"San Jose-Sunnyvale-Santa Clara, CA HUD Metro FMR Area",20.0
"Kalawao County, HI",Kalawao County,"Kalawao County, HI HUD Metro FMR Area",18.0
"Honolulu, HI MSA",Honolulu County,"Urban Honolulu, HI MSA",16.0
"Nantucket County, MA",Nantucket County,"Nantucket County, MA",15.0
"Denver, CO PMSA",Broomfield County,"Denver-Aurora-Lakewood, CO MSA",14.0
"Orange County, CA PMSA",Orange County,"Santa Ana-Anaheim-Irvine, CA HUD Metro FMR Area",14.0


We repeat the same analysis but to see the cheapest areas

In [475]:
def low(df):
    names = []
    years = rank_avg.columns.tolist()
    for year in years:
        values = rank_avg.nsmallest(10,year).index.tolist()
        names.append(values)
    lowest_ten = pd.DataFrame(dict(zip(df.columns,names)), index = list(range(1,11)))
    return lowest_ten
lowest_ten = low(rank_avg)
lowest_ten

Unnamed: 0,1983,1985,1986,1987,1988,1989,1990,1991,1992,1993,...,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
1,"(Covington city, VA, Covington city, Alleghany...","(Covington city, VA, Covington city, Alleghany...","(Puerto Rico HUD Nonmetro FMR Area, Guánica Mu...","(Northern Mariana Islands, Northern Mariana Is...","(Northern Mariana Islands, Northern Mariana Is...","(Northern Mariana Islands, Northern Mariana Is...","(Northern Mariana Islands, Northern Mariana Is...","(Northern Mariana Islands, Northern Mariana Is...","(Northern Mariana Islands, Northern Mariana Is...","(Northern Mariana Islands, Northern Mariana Is...",...,"(Puerto Rico HUD Nonmetro FMR Area, Adjuntas M...","(Puerto Rico HUD Nonmetro FMR Area, Adjuntas M...","(Puerto Rico HUD Nonmetro FMR Area, Adjuntas M...","(Puerto Rico HUD Nonmetro FMR Area, Adjuntas M...","(Puerto Rico HUD Nonmetro FMR Area, Adjuntas M...","(Puerto Rico HUD Nonmetro FMR Area, Adjuntas M...","(Puerto Rico HUD Nonmetro FMR Area, Adjuntas M...","(Puerto Rico HUD Nonmetro FMR Area, Aibonito M...","(Puerto Rico HUD Nonmetro FMR Area, Aibonito M...","(Puerto Rico HUD Nonmetro FMR Area, Guánica Mu..."
2,"(Puerto Rico HUD Nonmetro FMR Area, Adjuntas M...","(Clay County, KY, Clay County, Clay County, KY)","(Ponce, PR MSA, Guayanilla Municipio, Yauco, P...","(Puerto Rico HUD Nonmetro FMR Area, Guánica Mu...","(Mayagüez, PR MSA, Cabo Rojo Municipio, San Ge...","(Mayagüez, PR MSA, Cabo Rojo Municipio, San Ge...","(Mayagüez, PR MSA, Cabo Rojo Municipio, San Ge...","(Mayagüez, PR MSA, Cabo Rojo Municipio, San Ge...","(Mayagüez, PR MSA, Cabo Rojo Municipio, San Ge...","(American Samoa, American Samoa, American Samoa)",...,"(Puerto Rico HUD Nonmetro FMR Area, Aibonito M...","(Puerto Rico HUD Nonmetro FMR Area, Aibonito M...","(Puerto Rico HUD Nonmetro FMR Area, Aibonito M...","(Puerto Rico HUD Nonmetro FMR Area, Aibonito M...","(Puerto Rico HUD Nonmetro FMR Area, Aibonito M...","(Puerto Rico HUD Nonmetro FMR Area, Aibonito M...","(Puerto Rico HUD Nonmetro FMR Area, Aibonito M...","(Puerto Rico HUD Nonmetro FMR Area, Barranquit...","(Puerto Rico HUD Nonmetro FMR Area, Barranquit...","(Ponce, PR MSA, Guayanilla Municipio, Yauco, P..."
3,"(Puerto Rico HUD Nonmetro FMR Area, Aibonito M...","(Puerto Rico HUD Nonmetro FMR Area, Adjuntas M...","(Ponce, PR MSA, Peñuelas Municipio, Yauco, PR ...","(Ponce, PR MSA, Guayanilla Municipio, Yauco, P...","(Puerto Rico HUD Nonmetro FMR Area, Lajas Muni...","(Puerto Rico HUD Nonmetro FMR Area, Lajas Muni...","(Puerto Rico HUD Nonmetro FMR Area, Lajas Muni...","(Puerto Rico HUD Nonmetro FMR Area, Lajas Muni...","(Puerto Rico HUD Nonmetro FMR Area, Lajas Muni...","(Mayagüez, PR MSA, Cabo Rojo Municipio, San Ge...",...,"(Puerto Rico HUD Nonmetro FMR Area, Arroyo Mun...","(Puerto Rico HUD Nonmetro FMR Area, Arroyo Mun...","(Puerto Rico HUD Nonmetro FMR Area, Arroyo Mun...","(Puerto Rico HUD Nonmetro FMR Area, Arroyo Mun...","(Puerto Rico HUD Nonmetro FMR Area, Arroyo Mun...","(Puerto Rico HUD Nonmetro FMR Area, Arroyo Mun...","(Puerto Rico HUD Nonmetro FMR Area, Arroyo Mun...","(Puerto Rico HUD Nonmetro FMR Area, Ciales Mun...","(Puerto Rico HUD Nonmetro FMR Area, Ciales Mun...","(Ponce, PR MSA, Peñuelas Municipio, Yauco, PR ..."
4,"(Puerto Rico HUD Nonmetro FMR Area, Arroyo Mun...","(Puerto Rico HUD Nonmetro FMR Area, Aibonito M...","(Ponce, PR MSA, Yauco Municipio, Yauco, PR HUD...","(Ponce, PR MSA, Peñuelas Municipio, Yauco, PR ...","(Mayagüez, PR MSA, Sabana Grande Municipio, Sa...","(Mayagüez, PR MSA, Sabana Grande Municipio, Sa...","(Mayagüez, PR MSA, Sabana Grande Municipio, Sa...","(Mayagüez, PR MSA, Sabana Grande Municipio, Sa...","(Mayagüez, PR MSA, Sabana Grande Municipio, Sa...","(Puerto Rico HUD Nonmetro FMR Area, Lajas Muni...",...,"(Puerto Rico HUD Nonmetro FMR Area, Barranquit...","(Puerto Rico HUD Nonmetro FMR Area, Barranquit...","(Puerto Rico HUD Nonmetro FMR Area, Barranquit...","(Puerto Rico HUD Nonmetro FMR Area, Barranquit...","(Puerto Rico HUD Nonmetro FMR Area, Barranquit...","(Puerto Rico HUD Nonmetro FMR Area, Barranquit...","(Puerto Rico HUD Nonmetro FMR Area, Barranquit...","(Puerto Rico HUD Nonmetro FMR Area, Maunabo Mu...","(Puerto Rico HUD Nonmetro FMR Area, Maunabo Mu...","(Ponce, PR MSA, Yauco Municipio, Yauco, PR HUD..."
5,"(Puerto Rico HUD Nonmetro FMR Area, Barranquit...","(Puerto Rico HUD Nonmetro FMR Area, Arroyo Mun...","(Puerto Rico HUD Nonmetro FMR Area, Adjuntas M...","(Ponce, PR MSA, Yauco Municipio, Yauco, PR HUD...","(Mayagüez, PR MSA, San Germán Municipio, San G...","(Mayagüez, PR MSA, San Germán Municipio, San G...","(Mayagüez, PR MSA, San Germán Municipio, San G...","(Mayagüez, PR MSA, San Germán Municipio, San G...","(Mayagüez, PR MSA, San Germán Municipio, San G...","(Mayagüez, PR MSA, Sabana Grande Municipio, Sa...",...,"(Puerto Rico HUD Nonmetro FMR Area, Ciales Mun...","(Puerto Rico HUD Nonmetro FMR Area, Ciales Mun...","(Puerto Rico HUD Nonmetro FMR Area, Ciales Mun...","(Puerto Rico HUD Nonmetro FMR Area, Ciales Mun...","(Puerto Rico HUD Nonmetro FMR Area, Ciales Mun...","(Puerto Rico HUD Nonmetro FMR Area, Ciales Mun...","(Puerto Rico HUD Nonmetro FMR Area, Ciales Mun...","(Puerto Rico HUD Nonmetro FMR Area, Orocovis M...","(Puerto Rico HUD Nonmetro FMR Area, Orocovis M...","(Puerto Rico HUD Nonmetro FMR Area, Arroyo Mun..."
6,"(Puerto Rico HUD Nonmetro FMR Area, Ciales Mun...","(Puerto Rico HUD Nonmetro FMR Area, Barranquit...","(Puerto Rico HUD Nonmetro FMR Area, Coamo Muni...","(Puerto Rico HUD Nonmetro FMR Area, Adjuntas M...","(Puerto Rico HUD Nonmetro FMR Area, Guánica Mu...","(Puerto Rico HUD Nonmetro FMR Area, Guánica Mu...","(Puerto Rico HUD Nonmetro FMR Area, Guánica Mu...","(Puerto Rico HUD Nonmetro FMR Area, Guánica Mu...","(Puerto Rico HUD Nonmetro FMR Area, Guánica Mu...","(Mayagüez, PR MSA, San Germán Municipio, San G...",...,"(Puerto Rico HUD Nonmetro FMR Area, Coamo Muni...","(Puerto Rico HUD Nonmetro FMR Area, Coamo Muni...","(Puerto Rico HUD Nonmetro FMR Area, Coamo Muni...","(Puerto Rico HUD Nonmetro FMR Area, Coamo Muni...","(Puerto Rico HUD Nonmetro FMR Area, Coamo Muni...","(Puerto Rico HUD Nonmetro FMR Area, Coamo Muni...","(Puerto Rico HUD Nonmetro FMR Area, Coamo Muni...","(Puerto Rico HUD Nonmetro FMR Area, Adjuntas M...","(Puerto Rico HUD Nonmetro FMR Area, Utuado Mun...","(Puerto Rico HUD Nonmetro FMR Area, Guayama Mu..."
7,"(Puerto Rico HUD Nonmetro FMR Area, Coamo Muni...","(Puerto Rico HUD Nonmetro FMR Area, Ciales Mun...","(Puerto Rico HUD Nonmetro FMR Area, Culebra Mu...","(Puerto Rico HUD Nonmetro FMR Area, Coamo Muni...","(Ponce, PR MSA, Guayanilla Municipio, Yauco, P...","(Ponce, PR MSA, Guayanilla Municipio, Yauco, P...","(Ponce, PR MSA, Guayanilla Municipio, Yauco, P...","(Ponce, PR MSA, Guayanilla Municipio, Yauco, P...","(Ponce, PR MSA, Guayanilla Municipio, Yauco, P...","(Puerto Rico HUD Nonmetro FMR Area, Guánica Mu...",...,"(Puerto Rico HUD Nonmetro FMR Area, Culebra Mu...","(Puerto Rico HUD Nonmetro FMR Area, Culebra Mu...","(Puerto Rico HUD Nonmetro FMR Area, Culebra Mu...","(Puerto Rico HUD Nonmetro FMR Area, Culebra Mu...","(Puerto Rico HUD Nonmetro FMR Area, Culebra Mu...","(Puerto Rico HUD Nonmetro FMR Area, Culebra Mu...","(Puerto Rico HUD Nonmetro FMR Area, Culebra Mu...","(Puerto Rico HUD Nonmetro FMR Area, Coamo Muni...","(Puerto Rico HUD Nonmetro FMR Area, Arroyo Mun...","(Puerto Rico HUD Nonmetro FMR Area, Patillas M..."
8,"(Puerto Rico HUD Nonmetro FMR Area, Culebra Mu...","(Puerto Rico HUD Nonmetro FMR Area, Coamo Muni...","(Puerto Rico HUD Nonmetro FMR Area, Jayuya Mun...","(Puerto Rico HUD Nonmetro FMR Area, Culebra Mu...","(Ponce, PR MSA, Peñuelas Municipio, Yauco, PR ...","(Ponce, PR MSA, Peñuelas Municipio, Yauco, PR ...","(Ponce, PR MSA, Peñuelas Municipio, Yauco, PR ...","(Ponce, PR MSA, Peñuelas Municipio, Yauco, PR ...","(Ponce, PR MSA, Peñuelas Municipio, Yauco, PR ...","(Ponce, PR MSA, Guayanilla Municipio, Yauco, P...",...,"(Puerto Rico HUD Nonmetro FMR Area, Guánica Mu...","(Puerto Rico HUD Nonmetro FMR Area, Guánica Mu...","(Puerto Rico HUD Nonmetro FMR Area, Guánica Mu...","(Puerto Rico HUD Nonmetro FMR Area, Guánica Mu...","(Puerto Rico HUD Nonmetro FMR Area, Guánica Mu...","(Puerto Rico HUD Nonmetro FMR Area, Guánica Mu...","(Puerto Rico HUD Nonmetro FMR Area, Guánica Mu...","(Puerto Rico HUD Nonmetro FMR Area, Culebra Mu...","(Puerto Rico HUD Nonmetro FMR Area, Guayama Mu...","(Puerto Rico HUD Nonmetro FMR Area, Aibonito M..."
9,"(Puerto Rico HUD Nonmetro FMR Area, Guánica Mu...","(Puerto Rico HUD Nonmetro FMR Area, Culebra Mu...","(Puerto Rico HUD Nonmetro FMR Area, Las Marías...","(Puerto Rico HUD Nonmetro FMR Area, Jayuya Mun...","(Ponce, PR MSA, Yauco Municipio, Yauco, PR HUD...","(Ponce, PR MSA, Yauco Municipio, Yauco, PR HUD...","(Ponce, PR MSA, Yauco Municipio, Yauco, PR HUD...","(Ponce, PR MSA, Yauco Municipio, Yauco, PR HUD...","(Ponce, PR MSA, Yauco Municipio, Yauco, PR HUD...","(Ponce, PR MSA, Peñuelas Municipio, Yauco, PR ...",...,"(Puerto Rico HUD Nonmetro FMR Area, Guayama Mu...","(Puerto Rico HUD Nonmetro FMR Area, Guayama Mu...","(Puerto Rico HUD Nonmetro FMR Area, Guayama Mu...","(Puerto Rico HUD Nonmetro FMR Area, Guayama Mu...","(Puerto Rico HUD Nonmetro FMR Area, Guayama Mu...","(Puerto Rico HUD Nonmetro FMR Area, Guayama Mu...","(Puerto Rico HUD Nonmetro FMR Area, Guayama Mu...","(Puerto Rico HUD Nonmetro FMR Area, Jayuya Mun...","(Puerto Rico HUD Nonmetro FMR Area, Patillas M...","(Puerto Rico HUD Nonmetro FMR Area, Barranquit..."
10,"(Puerto Rico HUD Nonmetro FMR Area, Guayama Mu...","(Puerto Rico HUD Nonmetro FMR Area, Guánica Mu...","(Puerto Rico HUD Nonmetro FMR Area, Maricao Mu...","(Puerto Rico HUD Nonmetro FMR Area, Las Marías...","(Puerto Rico HUD Nonmetro FMR Area, Adjuntas M...","(Puerto Rico HUD Nonmetro FMR Area, Adjuntas M...","(Puerto Rico HUD Nonmetro FMR Area, Adjuntas M...","(Puerto Rico HUD Nonmetro FMR Area, Adjuntas M...","(Montgomery County, KY, Montgomery County, Mon...","(Ponce, PR MSA, Yauco Municipio, Yauco, PR HUD...",...,"(Puerto Rico HUD Nonmetro FMR Area, Isabela Mu...","(Puerto Rico HUD Nonmetro FMR Area, Isabela Mu...","(Puerto Rico HUD Nonmetro FMR Area, Isabela Mu...","(Puerto Rico HUD Nonmetro FMR Area, Isabela Mu...","(Puerto Rico HUD Nonmetro FMR Area, Isabela Mu...","(Puerto Rico HUD Nonmetro FMR Area, Isabela Mu...","(Puerto Rico HUD Nonmetro FMR Area, Isabela Mu...","(Puerto Rico HUD Nonmetro FMR Area, Las Marías...","(Puerto Rico HUD Nonmetro FMR Area, Adjuntas M...","(Puerto Rico HUD Nonmetro FMR Area, Ciales Mun..."


We can see that the most consistent cheapest places to rent in the USA are in Puerto Rico and some districts in LA.

In [476]:
low_freq = pd.DataFrame(lowest_ten.apply(pd.value_counts).fillna(0).sum(axis=1).sort_values(ascending = False),
             columns =['Frequency'])
low_freq.index.set_names(['pmsaname','cntyname','areaname'],inplace=True)
low_freq

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Frequency
pmsaname,cntyname,areaname,Unnamed: 3_level_1
Puerto Rico HUD Nonmetro FMR Area,Adjuntas Municipio,Puerto Rico HUD Nonmetro Area,24.0
Puerto Rico HUD Nonmetro FMR Area,Guánica Municipio,"Yauco, PR HUD Metro FMR Area",19.0
Puerto Rico HUD Nonmetro FMR Area,Culebra Municipio,Puerto Rico HUD Nonmetro Area,18.0
Puerto Rico HUD Nonmetro FMR Area,Coamo Municipio,Puerto Rico HUD Nonmetro Area,18.0
Puerto Rico HUD Nonmetro FMR Area,Aibonito Municipio,"Barranquitas-Aibonito, PR HUD Metro FMR Area",15.0
Puerto Rico HUD Nonmetro FMR Area,Barranquitas Municipio,"Barranquitas-Aibonito, PR HUD Metro FMR Area",15.0
Puerto Rico HUD Nonmetro FMR Area,Ciales Municipio,"Barranquitas-Aibonito, PR HUD Metro FMR Area",15.0
Puerto Rico HUD Nonmetro FMR Area,Arroyo Municipio,"Guayama, PR MSA",13.0
Puerto Rico HUD Nonmetro FMR Area,Guayama Municipio,"Guayama, PR MSA",12.0
"Ponce, PR MSA",Yauco Municipio,"Yauco, PR HUD Metro FMR Area",9.0


Now let us see the areas that have experienced the highest changes in rent price between 1983 to 2022.
It will be nice to analyze what factors led to the sharp growth in rent price. The highest location has grown by 2.5% in the period of analysis.

In [457]:
pd.DataFrame(rank_avg.assign(pct_change = lambda x: (x['2022'] - x['1983'])/x['1983'])\
             [['pct_change','1983','2022']].nlargest(10,'pct_change'))

Unnamed: 0,Unnamed: 1,Year,pct_change,1983,2022
"Cannon County, TN",Cannon County,"Nashville-Davidson--Murfreesboro--Franklin, TN HUD Metro FMR Area",2.544856,394.6,1398.8
"Trousdale County, TN",Trousdale County,"Nashville-Davidson--Murfreesboro--Franklin, TN HUD Metro FMR Area",2.209729,435.8,1398.8
"Heard County, GA",Heard County,"Atlanta-Sandy Springs-Roswell, GA HUD Metro FMR Area",2.060095,462.6,1415.6
"Skamania County, WA",Skamania County,"Portland-Vancouver-Hillsboro, OR-WA MSA",2.051892,555.0,1693.8
"Jasper County, GA",Jasper County,"Atlanta-Sandy Springs-Roswell, GA HUD Metro FMR Area",2.006797,470.8,1415.6
"Hampshire County, WV",Hampshire County,"Winchester, VA-WV MSA",1.952722,418.8,1236.6
"Denver, CO PMSA",Broomfield County,"Denver-Aurora-Lakewood, CO MSA",1.922276,615.0,1797.2
"Odessa--Midland, TX MSA",Midland County,"Midland, TX HUD Metro FMR Area",1.920937,546.4,1596.0
"McKenzie County, ND",McKenzie County,"McKenzie County, ND",1.880057,420.2,1210.2
"Clarke County, VA HUD Metro FMR Area",Clarke County,"Washington-Arlington-Alexandria, DC-VA-MD HUD Metro FMR Area",1.871616,687.0,1972.8


Let us see if we have areas where rent price has decreased.

In [458]:
pd.DataFrame(rank_avg.assign(pct_change = lambda x: (x['2022'] - x['1983'])/x['1983'])\
             .query('pct_change < 0')[['pct_change','1983','2022']].nsmallest(10,'pct_change'))

Unnamed: 0,Unnamed: 1,Year,pct_change,1983,2022
"Cleveland--Lorain--Elyria, OH PMSA",Ashtabula County,"Ashtabula County, OH",-0.013171,774.4,764.2
"Boston, MA--NH PMSA",Bristol County,"Taunton-Mansfield-Norton, MA HUD Metro FMR Area",-0.004987,1484.0,1476.6
"Boston, MA--NH PMSA",Worcester County,"Eastern Worcester County, MA HUD Metro FMR Area",-0.004582,1484.0,1477.2


Let us see areas where rent price has increased by 10% or below. There are very few locations where this has happened.

In [466]:
pd.DataFrame(rank_avg.assign(pct_change = lambda x: (x['2022'] - x['1983'])/x['1983'])\
             .query('pct_change <= 0.1')[['pct_change','1983','2022']])

Unnamed: 0,Unnamed: 1,Year,pct_change,1983,2022
"Lake and Peninsula Borough, AK",Lake and Peninsula Borough,"Lake and Peninsula Borough, AK",0.085091,834.4,905.4
"Boston, MA--NH PMSA",Bristol County,"Taunton-Mansfield-Norton, MA HUD Metro FMR Area",-0.004987,1484.0,1476.6
"Boston, MA--NH PMSA",Worcester County,"Eastern Worcester County, MA HUD Metro FMR Area",-0.004582,1484.0,1477.2
"Ann Arbor, MI PMSA",Lenawee County,"Lenawee County, MI",0.005131,857.6,862.0
"Las Vegas, NV--AZ MSA",Nye County,"Nye County, NV",0.060264,955.8,1013.4
"Cleveland--Lorain--Elyria, OH PMSA",Ashtabula County,"Ashtabula County, OH",-0.013171,774.4,764.2
"San Juan--Bayamón, PR PMSA",Ceiba Municipio,"Fajardo, PR HUD Metro FMR Area",0.084074,540.0,585.4
"San Juan--Bayamón, PR PMSA",Fajardo Municipio,"Fajardo, PR HUD Metro FMR Area",0.084074,540.0,585.4
"San Juan--Bayamón, PR PMSA",Luquillo Municipio,"Fajardo, PR HUD Metro FMR Area",0.084074,540.0,585.4


Here we can see the quantiles of pct_change. This shows that 50% of all locations have changed rent price from 0.25 to 0.75

In [481]:
pd.DataFrame(rank_avg.assign(pct_change = lambda x: (x['2022'] - x['1983'])/x['1983']))\
             ['pct_change'].quantile([0.25,0.75])

0.25    0.630120
0.75    0.885283
Name: pct_change, dtype: float64

### Next Steps...

1) Account for inflation
2) Do the analysis per state
3) Do the analysis per U.S. retion (midwest, south, north etc.)
4) Analyze university enrollment