# Rental Rates of Major Metro Markets
## Machine Learning Model based on data provided by Zillow.com

Zillow Observed Rent Index (ZORI): A smoothed measure of the typical observed market rate rent across a given region. ZORI is a repeat-rent index that is weighted to the rental housing stock to ensure representativeness across the entire market, not just those homes currently listed for-rent. The index is dollar-denominated by computing the mean of listed rents that fall into the 40th to 60th percentile range for all homes and apartments in a given region, which is once again weighted to reflect the rental housing stock. Details available in ZORI methodology.
https://www.zillow.com/research/methodology-zori-repeat-rent-27092/
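To make the "mean of listed rents that fall into the 40th to 60th percentile range" step concrete, here is a minimal sketch. It is a simplified illustration only, not Zillow's actual pipeline: it ignores the repeat-rent construction and the weighting to the rental housing stock, and the function name `mid_quintile_mean` and the rent values are made up for the example.

```python
import numpy as np

def mid_quintile_mean(rents):
    """Mean of the rents lying between the 40th and 60th percentiles (inclusive)."""
    rents = np.asarray(rents, dtype=float)
    lo, hi = np.percentile(rents, [40, 60])          # percentile cut-offs
    middle = rents[(rents >= lo) & (rents <= hi)]    # keep only the middle band
    return middle.mean()

# Hypothetical listed rents for one region, including two expensive outliers
listed_rents = [900, 1100, 1200, 1250, 1300, 1350, 1400, 1600, 2500, 4000]

print("middle-band mean:", mid_quintile_mean(listed_rents))
print("plain mean:      ", np.mean(listed_rents))
```

Averaging only the middle band keeps the two outliers from pulling the "typical" rent upward, which the plain mean does not.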

In [56]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [57]:
# Imports
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn import preprocessing
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.cluster import DBSCAN, KMeans
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.metrics import f1_score, classification_report, silhouette_score
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

In [58]:
df = pd.read_csv('/content/drive/MyDrive/DS-CodingDojoColab/Project2/Zip_ZORI_AllHomesPlusMultifamily_SSA.csv')
df.tail()

Unnamed: 0,RegionID,RegionName,SizeRank,MsaName,2014-01,2014-02,2014-03,2014-04,2014-05,2014-06,2014-07,2014-08,2014-09,2014-10,2014-11,2014-12,2015-01,2015-02,2015-03,2015-04,2015-05,2015-06,2015-07,2015-08,2015-09,2015-10,2015-11,2015-12,2016-01,2016-02,2016-03,2016-04,2016-05,2016-06,2016-07,2016-08,2016-09,2016-10,2016-11,2016-12,...,2017-09,2017-10,2017-11,2017-12,2018-01,2018-02,2018-03,2018-04,2018-05,2018-06,2018-07,2018-08,2018-09,2018-10,2018-11,2018-12,2019-01,2019-02,2019-03,2019-04,2019-05,2019-06,2019-07,2019-08,2019-09,2019-10,2019-11,2019-12,2020-01,2020-02,2020-03,2020-04,2020-05,2020-06,2020-07,2020-08,2020-09,2020-10,2020-11,2020-12
2258,61618,10004,8842,"New York, NY",2836.0,2863.0,2890.0,2916.0,2942.0,2968.0,2994.0,3020.0,3046.0,3072.0,3093.0,3114.0,3135.0,3154.0,3173.0,3192.0,3214.0,3236.0,3258.0,3268.0,3279.0,3290.0,3291.0,3292.0,3294.0,3289.0,3284.0,3280.0,3270.0,3260.0,3250.0,3247.0,3244.0,3241.0,3240.0,3240.0,...,3210.0,3207.0,3204.0,3202.0,3200.0,3203.0,3205.0,3208.0,3211.0,3214.0,3217.0,3218.0,3219.0,3219.0,3224.0,3229.0,3233.0,3240.0,3247.0,3254.0,3263.0,3273.0,3282.0,3290.0,3298.0,3306.0,3299.0,3293.0,3286.0,3247.0,3207.0,3168.0,3118.0,3068.0,3018.0,2965.0,2913.0,2860.0,2803.0,2747.0
2259,58622,2108,8986,"Boston, MA",2392.0,2393.0,2394.0,2395.0,2396.0,2398.0,2400.0,2402.0,2403.0,2405.0,2412.0,2418.0,2425.0,2436.0,2446.0,2457.0,2463.0,2468.0,2474.0,2478.0,2481.0,2485.0,2490.0,2495.0,2500.0,2503.0,2506.0,2509.0,2516.0,2522.0,2529.0,2539.0,2549.0,2559.0,2565.0,2572.0,...,2585.0,2582.0,2582.0,2582.0,2581.0,2586.0,2590.0,2594.0,2597.0,2599.0,2602.0,2607.0,2611.0,2616.0,2626.0,2636.0,2646.0,2657.0,2667.0,2678.0,2694.0,2710.0,2725.0,2745.0,2764.0,2783.0,2788.0,2793.0,2798.0,2787.0,2776.0,2765.0,2752.0,2739.0,2726.0,2711.0,2696.0,2681.0,2664.0,2647.0
2260,72525,33306,9047,"Miami-Fort Lauderdale, FL",1105.0,1121.0,1137.0,1153.0,1167.0,1182.0,,1209.0,1222.0,1235.0,,1250.0,1257.0,1258.0,1259.0,,1255.0,1250.0,1244.0,,,1240.0,,,1252.0,1255.0,1258.0,1260.0,1265.0,1270.0,1275.0,1281.0,1287.0,,,1300.0,...,1318.0,1323.0,1327.0,1332.0,1336.0,1342.0,1348.0,1354.0,1361.0,1368.0,1374.0,1380.0,,,1400.0,1410.0,,1432.0,1443.0,1455.0,1465.0,1474.0,1484.0,1488.0,1491.0,1495.0,1493.0,1491.0,1489.0,1491.0,,1497.0,1500.0,1502.0,1505.0,1507.0,1510.0,1513.0,1516.0,
2261,58624,2110,9469,"Boston, MA",4037.0,4030.0,4023.0,4016.0,,4006.0,4001.0,,4006.0,,,,4015.0,,4087.0,4124.0,4187.0,4250.0,4313.0,,4356.0,,4386.0,,4403.0,4417.0,4430.0,4444.0,4467.0,4490.0,4513.0,4550.0,4586.0,4622.0,,4666.0,...,,4557.0,4539.0,4522.0,4504.0,4486.0,4469.0,4452.0,4456.0,4460.0,4463.0,4478.0,4492.0,4507.0,4518.0,4529.0,4540.0,4556.0,4572.0,4587.0,4606.0,4624.0,4642.0,4636.0,4629.0,4623.0,4601.0,4580.0,4558.0,4512.0,,,4366.0,4313.0,4260.0,4204.0,4149.0,4093.0,4037.0,3980.0
2262,66128,20004,9592,"Washington, DC",,,2297.0,2308.0,,2329.0,,2349.0,2359.0,,,,2380.0,2377.0,2375.0,2372.0,2373.0,2373.0,2374.0,2375.0,2377.0,,2377.0,2374.0,,2379.0,2385.0,2391.0,2399.0,2408.0,2416.0,2424.0,2432.0,2440.0,2452.0,2465.0,...,2473.0,2465.0,2454.0,2443.0,2432.0,2423.0,2415.0,2406.0,2401.0,2397.0,2392.0,2394.0,2395.0,2396.0,2399.0,2401.0,2404.0,2405.0,2407.0,2408.0,2412.0,2417.0,2421.0,2425.0,2429.0,2434.0,2438.0,2441.0,2445.0,2444.0,2443.0,2441.0,2436.0,2431.0,2426.0,2420.0,2414.0,2408.0,2401.0,2394.0


The last 5 rows alone already show some NaN values, so let's get some basic info and find out how many values are missing.
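Besides `df.info()`, one way to quantify the gaps is to count missing values per column directly. The sketch below uses a small made-up frame named `toy` as a stand-in for `df`; the values are hypothetical and only the column naming follows the dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df: one text column plus three monthly rent columns
toy = pd.DataFrame({
    "MsaName": ["New York, NY", "Boston, MA", "Washington, DC"],
    "2014-01": [2836.0, np.nan, np.nan],
    "2014-02": [2863.0, 2393.0, np.nan],
    "2014-03": [2890.0, 2394.0, 2297.0],
})

missing_per_column = toy.isna().sum()          # count of NaN per column
missing_share = toy.isna().mean().round(2)     # fraction of rows missing per column

print(pd.DataFrame({"missing": missing_per_column, "share": missing_share}))
```

The same two calls applied to `df` would give a per-month count and share of missing rents.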

In [61]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2263 entries, 0 to 2262
Data columns (total 88 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   RegionID    2263 non-null   int64  
 1   RegionName  2263 non-null   int64  
 2   SizeRank    2263 non-null   int64  
 3   MsaName     2263 non-null   object 
 4   2014-01     1726 non-null   float64
 5   2014-02     1785 non-null   float64
 6   2014-03     1740 non-null   float64
 7   2014-04     1961 non-null   float64
 8   2014-05     2030 non-null   float64
 9   2014-06     2094 non-null   float64
 10  2014-07     2129 non-null   float64
 11  2014-08     2119 non-null   float64
 12  2014-09     2113 non-null   float64
 13  2014-10     2062 non-null   float64
 14  2014-11     2022 non-null   float64
 15  2014-12     2027 non-null   float64
 16  2015-01     2066 non-null   float64
 17  2015-02     2060 non-null   float64
 18  2015-03     2144 non-null   float64
 19  2015-04     2146 non-null  

Many months are missing rent values, so we will want to impute them. Because a given location usually has a known rent in the months before and after a gap, a reasonable assumption is that each missing value can be filled by linear interpolation between the nearest known values on either side.

https://www.geeksforgeeks.org/python-pandas-dataframe-interpolate/
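As a small illustration of row-wise linear interpolation, the sketch below fills gaps in a made-up two-row frame (`toy`, with hypothetical ZIP labels and rents). All columns are numeric here; the behavior on the full dataset is assumed to be the same once only the month columns are selected.

```python
import numpy as np
import pandas as pd

# Each row is one location, each column one month (made-up values)
toy = pd.DataFrame(
    {
        "2014-01": [1000.0, np.nan],
        "2014-02": [np.nan, 2000.0],
        "2014-03": [np.nan, np.nan],
        "2014-04": [1300.0, 2300.0],
    },
    index=["zip_a", "zip_b"],
)

# axis=1 interpolates across the months within each row
filled = toy.interpolate(method="linear", axis=1)
print(filled)
```

Gaps between two known months are filled with evenly spaced values. A gap at the start of a row (as in `zip_b`) has no earlier value to interpolate from and stays NaN under the default settings, so leading gaps may need separate handling.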



In [62]:
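# Interpolate only the monthly rent columns (position 4 onward per df.info()).
# df.loc[5:, :] selected rows, not columns, so the text column MsaName was included and raised a TypeError.
rent = df.iloc[:, 4:].interpolate(method='linear', axis=1)
rent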

In [None]:
rent.info()