Lambda School Data Science

*Unit 2, Sprint 1, Module 2*

---

# Regression 2

## Assignment

You'll continue to **predict how much it costs to rent an apartment in NYC,** using the dataset from renthop.com.

- [X] Do a train/test split. Use data from April & May 2016 to train and data from June 2016 to test.
- [X] Engineer at least two new features. (See below for explanation & ideas.)
- [X] Fit a linear regression model with at least two features.
- [X] Get the model's coefficients and intercept.
- [X] Get regression metrics RMSE, MAE, and $R^2$, for both the train and test data.
- [X] What's the best test MAE you can get? Share your score and features used with your cohort on Slack!
- [X] As always, commit your notebook to your fork of the GitHub repo.
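The split, fit, and metric steps in the checklist can be sketched as follows. This is a minimal sketch on a made-up toy table, not the RentHop data: the `created` timestamp column, the toy values, and the helper names `design_matrix` and `regression_metrics` are assumptions for illustration. It fits ordinary least squares with NumPy so the block stays self-contained; scikit-learn's `LinearRegression` with the functions in `sklearn.metrics` is the more usual route.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the listings table (values are invented for illustration).
toy = pd.DataFrame({
    'created': pd.to_datetime([
        '2016-04-03', '2016-04-18', '2016-05-02', '2016-05-20',
        '2016-05-29', '2016-06-04', '2016-06-15', '2016-06-27',
    ]),
    'bedrooms':  [0, 1, 2, 3, 1, 2, 0, 3],
    'bathrooms': [1, 1, 1, 2, 1, 2, 1, 2],
    'price':     [2100, 2700, 3400, 4900, 2800, 4100, 2000, 5000],
})

# Time-based split: April and May train the model, June tests it.
train = toy[toy['created'] < '2016-06-01']
test = toy[toy['created'] >= '2016-06-01']

features = ['bedrooms', 'bathrooms']
target = 'price'

def design_matrix(frame):
    # Prepend a column of ones so the first fitted weight is the intercept.
    X = frame[features].to_numpy(dtype=float)
    return np.column_stack([np.ones(len(X)), X])

# Ordinary least squares fit on the training rows only.
weights, *_ = np.linalg.lstsq(
    design_matrix(train), train[target].to_numpy(dtype=float), rcond=None
)
intercept, coefficients = weights[0], weights[1:]

def regression_metrics(frame):
    y_true = frame[target].to_numpy(dtype=float)
    y_pred = design_matrix(frame) @ weights
    errors = y_true - y_pred
    rmse = np.sqrt(np.mean(errors ** 2))
    mae = np.mean(np.abs(errors))
    r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {'RMSE': rmse, 'MAE': mae, 'R2': r2}

print('intercept:', intercept)
print(dict(zip(features, coefficients)))
print('train:', regression_metrics(train))
print('test: ', regression_metrics(test))
```

Comparing the train and test rows of this output is the point of the exercise: a test MAE far above the train MAE suggests the model does not generalize to the later month.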


#### [Feature Engineering](https://en.wikipedia.org/wiki/Feature_engineering)

> "Some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used." — Pedro Domingos, ["A Few Useful Things to Know about Machine Learning"](https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf)

> "Coming up with features is difficult, time-consuming, requires expert knowledge. 'Applied machine learning' is basically feature engineering." — Andrew Ng, [Machine Learning and AI via Brain simulations](https://forum.stanford.edu/events/2011/2011slides/plenary/2011plenaryNg.pdf) 

> Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work. 

#### Feature Ideas
- Does the apartment have a description?
- How long is the description?
- How many total perks does each apartment have?
- Are cats _or_ dogs allowed?
- Are cats _and_ dogs allowed?
- Total number of rooms (beds + baths)
- Ratio of beds to baths
- What's the neighborhood, based on address or latitude & longitude?

## Stretch Goals
- [X] If you want more math, skim [_An Introduction to Statistical Learning_](http://faculty.marshall.usc.edu/gareth-james/ISL/ISLR%20Seventh%20Printing.pdf), Chapter 3.1, Simple Linear Regression, & Chapter 3.2, Multiple Linear Regression
- [X] If you want more introduction, watch [Brandon Foltz, Statistics 101: Simple Linear Regression](https://www.youtube.com/watch?v=ZkjP5RJLQF4) (20 minutes, over 1 million views)
- [X] Add your own stretch goal(s)!

In [2]:
%%capture
import sys

# If you're on Colab:
if 'google.colab' in sys.modules:
    DATA_PATH = 'https://raw.githubusercontent.com/LambdaSchool/DS-Unit-2-Applied-Modeling/master/data/'
    !pip install category_encoders==2.*

# If you're working locally:
else:
    DATA_PATH = '../data/'
    
# Ignore this Numpy warning when using Plotly Express:
# FutureWarning: Method .ptp is deprecated and will be removed in a future version. Use numpy.ptp instead.
import warnings
warnings.filterwarnings(action='ignore', category=FutureWarning, module='numpy')

In [125]:
import numpy as np
import pandas as pd

# Read New York City apartment rental listing data
df = pd.read_csv(DATA_PATH+'apartments/renthop-nyc.csv')
assert df.shape == (49352, 34)

# Remove the most extreme 1% prices,
# the most extreme .1% latitudes, &
# the most extreme .1% longitudes
df = df[(df['price'] >= np.percentile(df['price'], 0.5)) & 
        (df['price'] <= np.percentile(df['price'], 99.5)) & 
        (df['latitude'] >= np.percentile(df['latitude'], 0.05)) & 
        (df['latitude'] <= np.percentile(df['latitude'], 99.95)) &
        (df['longitude'] >= np.percentile(df['longitude'], 0.05)) & 
        (df['longitude'] <= np.percentile(df['longitude'], 99.95))]
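To see what this kind of filter does, here is a small sketch on synthetic prices. The numbers and the helper name `keep_middle` are made up for illustration. It keeps the rows between two percentiles, which is the same idea as the chained conditions above, written with `Series.quantile` and `Series.between`, and then cross-checks the result against the `np.percentile` form.

```python
import numpy as np
import pandas as pd

def keep_middle(frame, column, lower_pct, upper_pct):
    """Keep rows whose `column` value lies between two percentiles (inclusive)."""
    low = frame[column].quantile(lower_pct / 100)
    high = frame[column].quantile(upper_pct / 100)
    return frame[frame[column].between(low, high)]

# Synthetic prices 1..1000 plus two absurd listings (invented values).
toy = pd.DataFrame({'price': list(range(1, 1001)) + [1_000_000, 5_000_000]})

trimmed = keep_middle(toy, 'price', 0.5, 99.5)
print(len(toy), '->', len(trimmed))

# Cross-check against the np.percentile form used in the cell above.
low, high = np.percentile(toy['price'], [0.5, 99.5])
same = trimmed.equals(toy[(toy['price'] >= low) & (toy['price'] <= high)])
print('matches np.percentile filter:', same)
```

Trimming both tails matters for linear regression because a handful of extreme prices or mislocated coordinates can dominate a squared-error fit.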

In [126]:
df = df.dropna(subset=['description', 'display_address', 'street_address'], axis=0)

In [127]:
# These rows have no data on Mapbox
df = df.drop([380, 5149, 45372])

### Engineer at least two new features

In [5]:
# Generate feature: does the apartment have a description?
# Listings whose description is one of these placeholder strings count as having none
no_description = ['        ', '<p><a  website_redacted ', ' ']
df['has_description'] = (~df['description'].isin(no_description)).astype(int)

In [6]:
# Generate feature: length of the apartment description
# (0 for listings flagged as having no real description)
df['len_description'] = df['description'].str.len().where(df['has_description'] == 1, 0).astype(int)

In [7]:
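# Generate feature: number of total perks
# Positional slice: assumes the 0/1 perk indicator columns sit at positions 10-33 of the original 34-column file
df['num_perks'] = df.iloc[:, 10:34].sum(axis=1)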

In [8]:
# Generate feature: cats or dogs allowed
df['cats_or_dogs_allowed'] = ((df['cats_allowed'] == 1) | (df['dogs_allowed'] == 1)).astype(int)

In [9]:
# Generate feature: cats and dogs allowed
df['cats_and_dogs_allowed'] = ((df['cats_allowed'] == 1) & (df['dogs_allowed'] == 1)).astype(int)

In [10]:
# Generate feature: number of bathrooms and bedrooms
df['total_rooms'] = df['bathrooms'] + df['bedrooms']

In [11]:
# Generate feature: ratio of bathrooms to bedrooms
df['ratio_bath_to_bed'] = df['bathrooms'] / df['bedrooms']
# Listings with 0 bedrooms make the division yield inf or NaN; set those ratios to 0
df.loc[df['bedrooms'] == 0, 'ratio_bath_to_bed'] = 0

In [12]:
# Generate feature: interest level number (ordinal encoding: low < medium < high)
df['interest_level_num'] = df['interest_level'].map({'low': 1, 'medium': 2, 'high': 3})

#### Generate feature: get the neighborhood based on latitude and longitude

In [1]:
import requests

In [66]:
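import os

# Read the Mapbox access token from the environment instead of hard-coding it in the notebook
# (the variable name MAPBOX_ACCESS_TOKEN is an assumption; use whatever name you export)
API_KEY = '?access_token=' + os.environ['MAPBOX_ACCESS_TOKEN']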
url_init = 'https://api.mapbox.com/geocoding/v5/'
mode = 'mapbox.places/'
types = '&types=neighborhood'

In [None]:
def mapbox_get(latitude, longitude, print_url=False):
    # Mapbox expects "longitude,latitude" order; '%2C' is a URL-encoded comma
    lon_lat = str(longitude) + '%2C' + str(latitude) + '.json'
    return get_neighborhood(lon_lat, print_url)

In [None]:
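def get_neighborhood(lon_lat, print_url):
    # Build the reverse-geocoding URL, restricted to neighborhood-level results
    url = url_init + mode + lon_lat + API_KEY + types
    if print_url:
        print(url)
    # A timeout keeps one stalled request from hanging the whole run
    response = requests.get(url, timeout=10)
    return response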
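The string concatenation above can also be done with the standard library, which takes care of the URL-encoding. A minimal sketch, assuming the same endpoint and query parameters as the cells above; the function name `build_reverse_geocode_url`, the example coordinates, and the placeholder token are hypothetical.

```python
from urllib.parse import quote, urlencode

BASE_URL = 'https://api.mapbox.com/geocoding/v5/mapbox.places/'

def build_reverse_geocode_url(latitude, longitude, access_token, types='neighborhood'):
    """Build the same reverse-geocoding URL as mapbox_get, with encoding handled by urllib."""
    # Longitude comes first; quote() with safe='' encodes the comma as %2C.
    lon_lat = quote(f'{longitude},{latitude}', safe='')
    query = urlencode({'access_token': access_token, 'types': types})
    return f'{BASE_URL}{lon_lat}.json?{query}'

# 'YOUR_TOKEN' is a placeholder, not a real credential.
url = build_reverse_geocode_url(40.7145, -73.9425, 'YOUR_TOKEN')
print(url)
```

Keeping the URL construction in one function makes the longitude-before-latitude ordering harder to get wrong when the helper is reused.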

In [None]:
r = mapbox_get(df.loc[0, 'latitude'], df.loc[0, 'longitude'], print_url=True)
r.status_code

In [79]:
import json

def jprint(obj):
    # create a formatted string of the Python JSON object
    text = json.dumps(obj, sort_keys=True, indent=4)
    print(text)

jprint(r.json())

{
    "attribution": "NOTICE: \u00a9 2019 Mapbox and its suppliers. All rights reserved. Use of this data is subject to the Mapbox Terms of Service (https://www.mapbox.com/about/maps/). This response and the information it contains may not be retained. POI(s) provided by Foursquare.",
    "features": [
        {
            "bbox": [
                -74.001481,
                40.762369,
                -73.982339,
                40.774089
            ],
            "center": [
                -73.99,
                40.77
            ],
            "context": [
                {
                    "id": "locality.12696928000137850",
                    "text": "Manhattan",
                    "wikidata": "Q11299"
                },
                {
                    "id": "postcode.2672587990338410",
                    "text": "10019"
                },
                {
                    "id": "place.15278078705964500",
                    "text": "New York",
                

In [80]:
jprint(r.json()['features'][0]['text'])

"Columbus Circle"

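Besides the neighborhood name in `features[0]['text']`, the `context` list in the response printed above carries the borough and the ZIP code. Here is a minimal sketch of pulling these out without risking an exception. The helper name `summarize_place` is hypothetical, and the sample payload is a hand-trimmed copy of the fields visible in that output, not a full Mapbox response.

```python
def summarize_place(payload):
    """Return neighborhood, locality, and postcode from a parsed geocoding response.

    Missing pieces come back as None instead of raising IndexError or KeyError.
    """
    features = payload.get('features') or []
    if not features:
        return {'neighborhood': None, 'locality': None, 'postcode': None}

    top = features[0]
    summary = {'neighborhood': top.get('text'), 'locality': None, 'postcode': None}
    for entry in top.get('context', []):
        # Context ids look like "locality.12696928000137850"; the prefix is the layer type.
        layer = entry.get('id', '').split('.')[0]
        if layer in ('locality', 'postcode'):
            summary[layer] = entry.get('text')
    return summary

# Hand-trimmed sample mirroring the fields printed above.
sample = {
    'features': [{
        'text': 'Columbus Circle',
        'context': [
            {'id': 'locality.12696928000137850', 'text': 'Manhattan'},
            {'id': 'postcode.2672587990338410', 'text': '10019'},
        ],
    }]
}

print(summarize_place(sample))
print(summarize_place({'features': []}))
```

The borough or ZIP code may be a useful coarser feature when a neighborhood has too few listings to estimate its own effect.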

In [109]:
neighborhoods = []

In [116]:
import time

In [132]:
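# Resume where the previous run stopped: rows already in `neighborhoods` are skipped
for i in df.index.tolist()[len(neighborhoods):]:
    t = time.time()
    r = mapbox_get(df.loc[i, 'latitude'], df.loc[i, 'longitude'])
    # Take the name of the top-ranked feature; raises IndexError if Mapbox returns no features
    nbh = r.json()['features'][0]['text']
    neighborhoods.append(nbh)
    elapsed = time.time() - t
    print(f'Row number: {i}, time elapsed: {elapsed} sec')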

Row number: 45373, time elapsed: 1.0498220920562744 sec
Row number: 45374, time elapsed: 0.7944531440734863 sec
Row number: 45375, time elapsed: 0.812514066696167 sec
...
Row number: 49345, time elapsed: 0.7987279891967773 sec
Row number: 49346, time elapsed: 0.8583481311798096 sec
Row number: 49347, time elapsed: 1.4146699905395508 sec
Row number: 49348, time elapsed: 0.8476550579071045 sec
Row number: 49349, time elapsed: 1.3611948490142822 sec
Row number: 49350, time elapsed: 0.8628909587860107 sec
Row number: 49351, time elapsed: 0.822019100189209 sec


In [135]:
# df.insert(9, 'neighborhood', neighborhoods)
df.tail()

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,neighborhood,...,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space
49347,1.0,2,2016-06-02 05:41:05,"30TH/3RD, MASSIVE CONV 2BR IN LUXURY FULL SERV...",E 30 St,40.7426,-73.979,3200,230 E 30 St,Gramercy,...,0,0,0,0,0,0,0,0,0,0
49348,1.0,1,2016-04-04 18:22:34,"HIGH END condo finishes, swimming pool, and ki...",Rector Pl,40.7102,-74.0163,3950,225 Rector Place,Battery Park,...,0,0,0,0,0,1,0,0,0,1
49349,1.0,1,2016-04-16 02:13:40,Large Renovated One Bedroom Apartment with Sta...,West 45th Street,40.7601,-73.99,2595,341 West 45th Street,Clinton,...,0,0,0,0,0,0,0,0,0,0
49350,1.0,0,2016-04-08 02:13:33,Stylishly sleek studio apartment with unsurpas...,Wall Street,40.7066,-74.0101,3350,37 Wall Street,Financial District,...,0,0,0,0,0,0,0,0,0,0
49351,1.0,2,2016-04-12 02:48:07,Look no further!!! This giant 2 bedroom apart...,Park Terrace East,40.8699,-73.9172,2200,30 Park Terrace East,Inwood,...,0,0,0,0,0,0,0,0,0,0


In [124]:
df[(df['latitude'] == 40.7813) & (df['longitude'] == -74.0094)]

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,...,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space
45372,1.0,0,2016-04-23 13:50:36,This studio is located at 43 w 69th street. T...,43 w 69th,40.7813,-74.0094,1700,43 w 69th,medium,...,0,0,0,0,0,0,0,0,0,0


In [131]:
df.index.tolist()[len(neighborhoods)]

45373

In [140]:
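# Export the neighborhoods list to CSV, one neighborhood per row
import csv
import os

def WriteListToCSV(csv_file, csv_columns, data_list):
    # newline='' keeps the csv module from writing blank lines between rows on Windows
    with open(csv_file, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile, dialect='excel', quoting=csv.QUOTE_NONNUMERIC)
        writer.writerow(csv_columns)  # header row
        writer.writerows(data_list)   # one row per item

csv_columns = ['neighborhood']
# Wrap each neighborhood in its own list so it is written as its own row
csv_data_list = [[neighborhood] for neighborhood in neighborhoods]

# Build the output path and create the folder if it does not exist yet
csv_dir = os.path.join(os.getcwd(), 'csv')
os.makedirs(csv_dir, exist_ok=True)
csv_file = os.path.join(csv_dir, 'neighborhoods.csv')

WriteListToCSV(csv_file, csv_columns, csv_data_list)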

### Check our work

In [130]:
df.head()

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,...,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space
0,1.5,3,2016-06-24 07:54:24,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,40.7145,-73.9425,3000,792 Metropolitan Avenue,medium,...,0,0,0,0,0,0,0,0,0,0
1,1.0,2,2016-06-12 12:19:27,,Columbus Avenue,40.7947,-73.9667,5465,808 Columbus Avenue,low,...,0,0,0,0,0,0,0,0,0,0
2,1.0,1,2016-04-17 03:26:41,"Top Top West Village location, beautiful Pre-w...",W 13 Street,40.7388,-74.0018,2850,241 W 13 Street,high,...,0,0,0,0,0,0,0,0,0,0
3,1.0,1,2016-04-18 02:22:02,Building Amenities - Garage - Garden - fitness...,East 49th Street,40.7539,-73.9677,3275,333 East 49th Street,low,...,0,0,0,0,0,0,0,0,0,0
4,1.0,4,2016-04-28 01:32:41,Beautifully renovated 3 bedroom flex 4 bedroom...,West 143rd Street,40.8241,-73.9493,3350,500 West 143rd Street,low,...,0,0,0,0,0,0,0,0,0,0


### Do train-test split

In [16]:
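# Split the data into two sets based on date:
# April & May 2016 for training, June 2016 for testing.
# Using < and >= on the same cutoff guarantees no row is dropped at the boundary.
train = df[df['created'] < '2016-06-01']
test = df[df['created'] >= '2016-06-01']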

In [17]:
train.head()

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,...,wheelchair_access,common_outdoor_space,interest_level_num,has_description,len_description,num_perks,cats_or_dogs_allowed,cats_and_dogs_allowed,total_rooms,ratio_bath_to_bed
2,1.0,1,2016-04-17 03:26:41,"Top Top West Village location, beautiful Pre-w...",W 13 Street,40.7388,-74.0018,2850,241 W 13 Street,high,...,0,0,3,1,691,3,0,0,2.0,1.0
3,1.0,1,2016-04-18 02:22:02,Building Amenities - Garage - Garden - fitness...,East 49th Street,40.7539,-73.9677,3275,333 East 49th Street,low,...,0,0,1,1,492,2,0,0,2.0,1.0
4,1.0,4,2016-04-28 01:32:41,Beautifully renovated 3 bedroom flex 4 bedroom...,West 143rd Street,40.8241,-73.9493,3350,500 West 143rd Street,low,...,0,0,1,1,479,1,0,0,5.0,0.25
5,2.0,4,2016-04-19 04:24:47,,West 18th Street,40.7429,-74.0028,7995,350 West 18th Street,medium,...,0,0,2,0,0,0,0,0,6.0,0.5
6,1.0,2,2016-04-27 03:19:56,Stunning unit with a great location and lots o...,West 107th Street,40.8012,-73.966,3600,210 West 107th Street,low,...,0,0,1,1,579,3,1,1,3.0,0.5


In [18]:
test.head()

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,...,wheelchair_access,common_outdoor_space,interest_level_num,has_description,len_description,num_perks,cats_or_dogs_allowed,cats_and_dogs_allowed,total_rooms,ratio_bath_to_bed
0,1.5,3,2016-06-24 07:54:24,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,40.7145,-73.9425,3000,792 Metropolitan Avenue,medium,...,0,0,2,1,588,0,0,0,4.5,0.5
1,1.0,2,2016-06-12 12:19:27,,Columbus Avenue,40.7947,-73.9667,5465,808 Columbus Avenue,low,...,0,0,1,0,0,5,1,1,3.0,0.5
11,1.0,1,2016-06-03 03:21:22,Check out this one bedroom apartment in a grea...,W. 173rd Street,40.8448,-73.9396,1675,644 W. 173rd Street,low,...,0,0,1,1,690,0,0,0,2.0,1.0
14,1.0,1,2016-06-01 03:11:01,Spacious 1-Bedroom to fit King-sized bed comfo...,East 56th St..,40.7584,-73.9648,3050,315 East 56th St..,low,...,0,0,1,1,569,3,0,0,2.0,1.0
24,2.0,4,2016-06-07 04:39:56,SPRAWLING 2 BEDROOM FOUND! ENJOY THE LUXURY OF...,W 18 St.,40.7391,-73.9936,7400,30 W 18 St.,medium,...,0,0,2,1,870,11,1,1,6.0,0.5
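
A date-based split like the one above can be sanity-checked by confirming that every row lands in exactly one of the two sets. The sketch below uses a small made-up frame (`toy`, with hypothetical timestamps and prices) rather than the real `df`, and assumes `created` is parsed as a datetime column.

```python
import pandas as pd

# Toy stand-in for the listings data; values are illustrative only.
toy = pd.DataFrame({
    'created': pd.to_datetime([
        '2016-04-17 03:26:41',
        '2016-05-31 23:59:59',
        '2016-06-01 00:00:00',  # boundary row: exactly midnight on June 1
        '2016-06-24 07:54:24',
    ]),
    'price': [2850, 3275, 3050, 3000],
})

cutoff = pd.Timestamp('2016-06-01')
toy_train = toy[toy['created'] < cutoff]
toy_test = toy[toy['created'] >= cutoff]

# Every row should land in exactly one of the two sets.
assert len(toy_train) + len(toy_test) == len(toy)
# All training rows should come strictly before all test rows.
assert toy_train['created'].max() < toy_test['created'].min()

print(len(toy_train), len(toy_test))
```

The boundary row is the interesting case: with a strict `>` on the test side it would fall into neither set.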


### Fit a multiple regression with at least two features

In [19]:
# Create correlation matrix
corr_matrix = df.select_dtypes(include='number').corr().abs()  # numeric columns only

# Select upper triangle of correlation matrix
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))

# Find feature columns where correlation is greater than 0.9
to_drop = [column for column in upper.columns.values if any(upper[column] > 0.9)]
print(to_drop)

['cats_or_dogs_allowed', 'cats_and_dogs_allowed', 'total_rooms']
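
To see why the upper-triangle mask flags only one column from each highly correlated pair, here is a minimal sketch on made-up data. The `toy` frame and its values are hypothetical; only the masking technique mirrors the cell above.

```python
import numpy as np
import pandas as pd

# Toy data: total_rooms is built from bedrooms + bathrooms,
# so it is strongly correlated with both of them.
toy = pd.DataFrame({
    'bedrooms':  [1, 2, 3, 4, 2],
    'bathrooms': [1.0, 1.0, 2.0, 2.0, 1.0],
    'doorman':   [0, 1, 0, 1, 0],
})
toy['total_rooms'] = toy['bedrooms'] + toy['bathrooms']

corr = toy.corr().abs()

# Keep only entries strictly above the diagonal, so each pair is seen once
# and a column is never compared with itself.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
upper = corr.where(mask)

# A column is flagged if it correlates > 0.9 with any column to its left.
toy_to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
print(toy_to_drop)
```

Because only the later column of each pair is flagged, the result depends on column order: the engineered column is dropped here while the original features stay.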


In [20]:
# 1. Import the appropriate estimator class from Scikit-Learn
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV

# 2. Instantiate this class
model = LinearRegression()

# 3. Arrange the X features matrices and y target vectors
features = ['bathrooms', 'bedrooms'] + train.columns.values[10:].tolist()
for feature in to_drop:
    features.remove(feature)
feature_print = "\n".join(features)
print(f'Linear Regression, dependent on:\n{feature_print}\n')

X_train = train[features]
X_test = test[features]

target = 'price'
y_train = train[target]
y_test = test[target]

# Cross-validation
# param_grid = {'fit_intercept': [True, False],
#               'normalize': [True, False]}

# grid = GridSearchCV(LinearRegression(), param_grid, cv=7)

# grid.fit(X_train, y_train)
# print(grid.best_params_)

# 4. Fit the model
# model = grid.best_estimator_
model.fit(X_train, y_train)
y_pred_train = model.predict(X_train)
mae = mean_absolute_error(y_train, y_pred_train)
print(f'Train Error: ${mae:.0f}')

# 5. Apply the model to new data
y_pred_test = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred_test)
print(f'Test Error: ${mae:.0f}')

Linear Regression, dependent on:
bathrooms
bedrooms
elevator
cats_allowed
hardwood_floors
dogs_allowed
doorman
dishwasher
no_fee
laundry_in_building
fitness_center
pre-war
laundry_in_unit
roof_deck
outdoor_space
dining_room
high_speed_internet
balcony
swimming_pool
new_construction
terrace
exclusive
loft
garden_patio
wheelchair_access
common_outdoor_space
interest_level_num
has_description
len_description
num_perks
ratio_bath_to_bed

Train Error: $725
Test Error: $725
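
An MAE is easier to judge next to a naive baseline. One common choice, sketched here, is to always predict the mean price of the training set. The prices are hypothetical toy values, not taken from the dataset, and `toy_y_train` / `toy_y_test` are illustrative names.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Hypothetical prices, for illustration only.
toy_y_train = np.array([2000, 3000, 4000, 3000])
toy_y_test = np.array([2500, 3500, 5000])

# Baseline: predict the training mean for every test row.
baseline = toy_y_train.mean()
baseline_pred = np.full(len(toy_y_test), baseline)

baseline_mae = mean_absolute_error(toy_y_test, baseline_pred)
print(f'Baseline Test Error: ${baseline_mae:.0f}')
```

In the notebook, the same idea would use `y_train.mean()` and `y_test`; a model is only useful if its test MAE is lower than this baseline.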


### Get the model's coefficients and intercept

In [21]:
print('Intercept:', '\t'*2, f'{model.intercept_:.6f}')
coefficients = pd.Series(model.coef_, features)
print('\nCoefficients:')
print(coefficients.to_string())

Intercept: 		 1264.899439

Coefficients:
bathrooms               1684.767483
bedrooms                 470.032431
elevator                  81.498517
cats_allowed            -153.395791
hardwood_floors         -165.096850
dogs_allowed              91.657317
doorman                  492.177040
dishwasher                61.581625
no_fee                  -176.741048
laundry_in_building     -134.842651
fitness_center           126.333281
pre-war                 -122.181700
laundry_in_unit          476.871980
roof_deck               -200.364926
outdoor_space           -145.560882
dining_room              126.274898
high_speed_internet     -330.802849
balcony                 -128.979920
swimming_pool            -29.992715
new_construction        -193.326305
terrace                  164.776986
exclusive                107.166897
loft                     177.440841
garden_patio            -109.737939
wheelchair_access         79.371920
common_outdoor_space     -37.033476
interest_level_num     
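
Each coefficient is the change in predicted price for a one-unit increase in that feature, holding the others fixed, and a prediction is the intercept plus the weighted sum of the features. The sketch below checks this on a small made-up dataset (`toy_X`, `toy_y`, and `toy_model` are hypothetical names and values).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy features and prices, for illustration only.
toy_X = pd.DataFrame({
    'bathrooms': [1.0, 1.0, 2.0, 2.0, 1.0],
    'bedrooms':  [0, 1, 2, 3, 2],
})
toy_y = pd.Series([2000, 2500, 4200, 4700, 3100])

toy_model = LinearRegression().fit(toy_X, toy_y)

# Rebuild the predictions by hand: intercept + X @ coefficients.
manual = toy_model.intercept_ + toy_X.to_numpy() @ toy_model.coef_

# The hand-built values should match model.predict().
assert np.allclose(manual, toy_model.predict(toy_X))

print(pd.Series(toy_model.coef_, toy_X.columns).to_string())
```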

### Get regression metrics RMSE, MAE, and $R^2$, for both the train and test data

In [22]:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Calculate regression metrics
mse = mean_squared_error(y_train, y_pred_train)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_train, y_pred_train)
r2 = r2_score(y_train, y_pred_train)

# Print regression metrics
print('=======Training Set=======')
print(f'Root Mean Squared Error: ${rmse:.0f}')
print(f'Mean Absolute Error: ${mae:.0f}')
print('R^2:', r2)

# Calculate regression metrics
mse = mean_squared_error(y_test, y_pred_test)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred_test)
r2 = r2_score(y_test, y_pred_test)

# Print regression metrics
print('\n=======Testing Set=======')
print(f'Root Mean Squared Error: ${rmse:.0f}')
print(f'Mean Absolute Error: ${mae:.0f}')
print('R^2:', r2)

=======Training Set=======
Root Mean Squared Error: $1117
Mean Absolute Error: $725
R^2: 0.599396392394393

=======Testing Set=======
Root Mean Squared Error: $1097
Mean Absolute Error: $725
R^2: 0.6145446116393236
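
Since the same three metrics are computed twice, they could be wrapped in a small helper. This is a sketch rather than the notebook's own code: `regression_report` is a hypothetical name, and the arrays are toy values chosen so the results are easy to verify by hand.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_report(y_true, y_pred):
    """Return RMSE, MAE, and R^2 for one set of predictions."""
    return {
        'rmse': float(np.sqrt(mean_squared_error(y_true, y_pred))),
        'mae': float(mean_absolute_error(y_true, y_pred)),
        'r2': float(r2_score(y_true, y_pred)),
    }

# Toy values, for illustration only.
toy_true = np.array([3000, 2000, 4000, 3000])
toy_pred = np.array([2800, 2400, 3600, 3200])

report = regression_report(toy_true, toy_pred)

# Cross-check R^2 against its definition: 1 - SS_res / SS_tot.
ss_res = np.sum((toy_true - toy_pred) ** 2)
ss_tot = np.sum((toy_true - toy_true.mean()) ** 2)
assert np.isclose(report['r2'], 1 - ss_res / ss_tot)

print(f"RMSE: ${report['rmse']:.0f}  MAE: ${report['mae']:.0f}  R^2: {report['r2']:.3f}")
```

RMSE is never smaller than MAE because squaring weights large errors more heavily. In the results above, the gap between the two (roughly $1,100 against $725) suggests that a few listings are predicted very poorly.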
