<a href="https://colab.research.google.com/github/Avery1493/2019-US-Student-Loan-Debt-by-Location-and-Age/blob/master/module2-regression-2/Quinn_212_LS_DS_212_assignment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Lambda School Data Science

*Unit 2, Sprint 1, Module 2*

---

# Regression 2

## Assignment

You'll continue to **predict how much it costs to rent an apartment in NYC,** using the dataset from renthop.com.

- [ ] Do train/test split. Use data from April & May 2016 to train. Use data from June 2016 to test.
- [ ] Engineer at least two new features. (See below for explanation & ideas.)
- [ ] Fit a linear regression model with at least two features.
- [ ] Get the model's coefficients and intercept.
- [ ] Get regression metrics RMSE, MAE, and $R^2$, for both the train and test data.
- [ ] What's the best test MAE you can get? Share your score and features used with your cohort on Slack!
- [ ] As always, commit your notebook to your fork of the GitHub repo.


#### [Feature Engineering](https://en.wikipedia.org/wiki/Feature_engineering)

> "Some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used." — Pedro Domingos, ["A Few Useful Things to Know about Machine Learning"](https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf)

> "Coming up with features is difficult, time-consuming, requires expert knowledge. 'Applied machine learning' is basically feature engineering." — Andrew Ng, [Machine Learning and AI via Brain simulations](https://forum.stanford.edu/events/2011/2011slides/plenary/2011plenaryNg.pdf) 

> Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work. 

#### Feature Ideas
- Does the apartment have a description?
- How long is the description?
- How many total perks does each apartment have?
- Are cats _or_ dogs allowed?
- Are cats _and_ dogs allowed?
- Total number of rooms (beds + baths)
- Ratio of beds to baths
- What's the neighborhood, based on address or latitude & longitude?

## Stretch Goals
- [ ] If you want more math, skim [_An Introduction to Statistical Learning_](http://faculty.marshall.usc.edu/gareth-james/ISL/ISLR%20Seventh%20Printing.pdf),  Chapter 3.1, Simple Linear Regression, & Chapter 3.2, Multiple Linear Regression
- [ ] If you want more introduction, watch [Brandon Foltz, Statistics 101: Simple Linear Regression](https://www.youtube.com/watch?v=ZkjP5RJLQF4)
(20 minutes, over 1 million views)
- [ ] Add your own stretch goal(s) !

In [0]:
%%capture
import sys

# If you're on Colab:
if 'google.colab' in sys.modules:
    DATA_PATH = 'https://raw.githubusercontent.com/LambdaSchool/DS-Unit-2-Applied-Modeling/master/data/'
    !pip install category_encoders==2.*

# If you're working locally:
else:
    DATA_PATH = '../data/'
    
# Ignore this Numpy warning when using Plotly Express:
# FutureWarning: Method .ptp is deprecated and will be removed in a future version. Use numpy.ptp instead.
import warnings
warnings.filterwarnings(action='ignore', category=FutureWarning, module='numpy')

In [0]:
import numpy as np
import pandas as pd

# Read New York City apartment rental listing data
df = pd.read_csv(DATA_PATH+'apartments/renthop-nyc.csv')
assert df.shape == (49352, 34)

# Remove the most extreme 1% prices,
# the most extreme .1% latitudes, &
# the most extreme .1% longitudes
df = df[(df['price'] >= np.percentile(df['price'], 0.5)) & 
        (df['price'] <= np.percentile(df['price'], 99.5)) & 
        (df['latitude'] >= np.percentile(df['latitude'], 0.05)) & 
        (df['latitude'] < np.percentile(df['latitude'], 99.95)) &
        (df['longitude'] >= np.percentile(df['longitude'], 0.05)) & 
        (df['longitude'] <= np.percentile(df['longitude'], 99.95))]

In [0]:
df.head(5)

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space
0,1.5,3,2016-06-24 07:54:24,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,40.7145,-73.9425,3000,792 Metropolitan Avenue,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,1.0,2,2016-06-12 12:19:27,,Columbus Avenue,40.7947,-73.9667,5465,808 Columbus Avenue,low,1,1,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,1.0,1,2016-04-17 03:26:41,"Top Top West Village location, beautiful Pre-w...",W 13 Street,40.7388,-74.0018,2850,241 W 13 Street,high,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,1.0,1,2016-04-18 02:22:02,Building Amenities - Garage - Garden - fitness...,East 49th Street,40.7539,-73.9677,3275,333 East 49th Street,low,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,1.0,4,2016-04-28 01:32:41,Beautifully renovated 3 bedroom flex 4 bedroom...,West 143rd Street,40.8241,-73.9493,3350,500 West 143rd Street,low,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [0]:
print(df.shape)
df.describe()

(48817, 34)


Unnamed: 0,bathrooms,bedrooms,latitude,longitude,price,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space
count,48817.0,48817.0,48817.0,48817.0,48817.0,48817.0,48817.0,48817.0,48817.0,48817.0,48817.0,48817.0,48817.0,48817.0,48817.0,48817.0,48817.0,48817.0,48817.0,48817.0,48817.0,48817.0,48817.0,48817.0,48817.0,48817.0,48817.0,48817.0,48817.0
mean,1.201794,1.537149,40.75076,-73.97276,3579.585247,0.524838,0.478276,0.478276,0.447631,0.424852,0.415081,0.367085,0.052769,0.268452,0.185653,0.175902,0.132761,0.138394,0.102833,0.087203,0.060471,0.055206,0.051908,0.046193,0.043305,0.042711,0.039331,0.027224,0.026241
std,0.470711,1.106087,0.038954,0.028883,1762.430772,0.499388,0.499533,0.499533,0.497255,0.494326,0.492741,0.482015,0.223573,0.443158,0.38883,0.380741,0.33932,0.345317,0.303744,0.282136,0.238359,0.228385,0.221844,0.209905,0.203544,0.202206,0.194382,0.162738,0.159852
min,0.0,0.0,40.5757,-74.0873,1375.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,1.0,1.0,40.7283,-73.9918,2500.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,1.0,1.0,40.7517,-73.978,3150.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,1.0,2.0,40.774,-73.955,4095.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,10.0,8.0,40.9894,-73.7001,15500.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [0]:
df.dtypes

bathrooms               float64
bedrooms                  int64
created                  object
description              object
display_address          object
latitude                float64
longitude               float64
price                     int64
street_address           object
interest_level           object
elevator                  int64
cats_allowed              int64
hardwood_floors           int64
dogs_allowed              int64
doorman                   int64
dishwasher                int64
no_fee                    int64
laundry_in_building       int64
fitness_center            int64
pre-war                   int64
laundry_in_unit           int64
roof_deck                 int64
outdoor_space             int64
dining_room               int64
high_speed_internet       int64
balcony                   int64
swimming_pool             int64
new_construction          int64
terrace                   int64
exclusive                 int64
loft                      int64
garden_p

# Engineer new features.

In [0]:
#Interest
df['interest'] = df['interest_level'].replace({'low': 1, 'medium': 2, 'high' : 3})

In [0]:
#Are cats and dogs allowed?
df.loc[(df['cats_allowed'] == 1) & (df['dogs_allowed'] == 1), 'pets'] = 1
df.loc[(df['cats_allowed'] == 0)| (df['dogs_allowed'] == 0), 'pets'] = 0
df.head()

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,interest,pets
0,1.5,3,2016-06-24 07:54:24,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,40.7145,-73.9425,3000,792 Metropolitan Avenue,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0.0
1,1.0,2,2016-06-12 12:19:27,,Columbus Avenue,40.7947,-73.9667,5465,808 Columbus Avenue,low,1,1,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.0
2,1.0,1,2016-04-17 03:26:41,"Top Top West Village location, beautiful Pre-w...",W 13 Street,40.7388,-74.0018,2850,241 W 13 Street,high,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0.0
3,1.0,1,2016-04-18 02:22:02,Building Amenities - Garage - Garden - fitness...,East 49th Street,40.7539,-73.9677,3275,333 East 49th Street,low,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.0
4,1.0,4,2016-04-28 01:32:41,Beautifully renovated 3 bedroom flex 4 bedroom...,West 143rd Street,40.8241,-73.9493,3350,500 West 143rd Street,low,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.0


# Do train/test split. Use data from April & May 2016 to train. Use data from June 2016 to test.

In [0]:
#SPLITTING DATA INTO TRAIN AND TEST
df['created'] = pd.to_datetime(df['created'])
train = df[df['created'].dt.month < 6]
test = df[df['created'].dt.month >= 6]

In [0]:
train.sample(3)

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,interest,pets
44787,1.0,1,2016-04-29 05:05:35,"**NO FEE** Gut Reno** MOTT street, W/D in unit...",Mott Street,40.7234,-73.9945,3695,250 Mott Street,low,0,1,0,1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.0
24423,0.0,2,2016-05-14 05:23:12,,5th Ave,40.7968,-73.9486,2825,1295 5th Ave,low,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.0
3759,1.0,3,2016-04-06 05:51:42,AWESOME THREE BEDROOM DEAL PRIME CHELSEA LOCAT...,West 19th Street,40.7423,-73.9995,4995,264 West 19th Street,medium,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0.0


In [0]:
test.sample(3)

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,interest,pets
19534,1.0,2,2016-06-24 05:56:26,1 MONTH FREE RENT!! WOW!! Beautiful 2 bedroom ...,W 188th St,40.8532,-73.9299,1558,552 W 188th St,high,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0.0
12106,2.0,4,2016-06-08 05:56:11,Located in the heart of Midtown East?s Turtle ...,East 47th Street,40.753,-73.9695,6500,301 East 47th Street,medium,1,0,1,0,1,1,1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0.0
17663,1.0,3,2016-06-16 08:00:31,"Gorgeous 3 bedroom apartment in BedStuy, featu...",Lexington Avenue,40.6901,-73.9317,2595,733 Lexington Avenue,low,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.0


# Fit a linear regression model with at least two features.
#Get the model's coefficients and intercept.

In [0]:
#Import estimator class / Instantiate class
from sklearn.linear_model import LinearRegression
model = LinearRegression()

In [0]:
#Arrange X features matrices
target = 'price'
features = ['bathrooms', 'bedrooms', 'latitude', 'longitude', 'pets','interest']
y_train = train[target]
y_test = test[target]
X_train = train[features]
X_test = test[features]

In [0]:
#Fit model
model.fit(X_train,y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [0]:
#Coefficients and intercept
model.coef_ , model.intercept_

(array([  1927.46071247,    453.95612268,   1442.45114218, -15604.35627837,
            75.94204556,   -446.1178721 ]), -1211935.0225305504)

In [0]:
#Predict
y_pred = model.predict(X_train)
y_pred

array([2622.22171975, 3004.12992711, 4180.1382098 , ..., 3436.99231476,
       3219.51261717, 2837.39039022])

In [0]:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

mse = mean_squared_error(y_train, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_train, y_pred)
r2 = r2_score(y_train, y_pred)
print('Root Mean Squared Error:', rmse)
print('Mean Absolute Error:', mae)
print('R^2:', r2)

Root Mean Squared Error: 1111.4709893022202
Mean Absolute Error: 712.4319990994923
R^2: 0.6021396028086832


# Get regression metrics RMSE, MAE, and  R2 , for both the train and test data.

In [0]:
#Predict on test data
y_pred = model.predict(X_test)
y_pred

array([3979.59230338, 3577.27574567, 2696.76632451, ..., 3027.21051609,
       3638.01475067, 3171.99770572])

In [0]:
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print('Root Mean Squared Error:', rmse)
print('Mean Absolute Error:', mae)
print('R^2:', r2)

Root Mean Squared Error: 1093.6383871512649
Mean Absolute Error: 717.1830783802878
R^2: 0.6151747949018352


# What's the best test MAE you can get?

In [0]:
target = 'price'
features = ['bathrooms', 'bedrooms', 'latitude', 'longitude', 'pets','interest',
            'doorman',	'dishwasher',	'no_fee',	'laundry_in_building',
            'fitness_center',	'pre-war', 'laundry_in_unit', 'roof_deck',
            'outdoor_space',	'dining_room',	'high_speed_internet', 'balcony',
            'swimming_pool',	'new_construction',	'terrace',	'exclusive',
            'loft',	'garden_patio',	'wheelchair_access',
            'common_outdoor_space', 'elevator',	'hardwood_floors']
y_train = train[target]
y_test = test[target]
X_train = train[features]
X_test = test[features]

In [0]:
#Predict on train data
model.fit(X_train, y_train)
y_pred = model.predict(X_train)

In [0]:
mse = mean_squared_error(y_train, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_train, y_pred)
r2 = r2_score(y_train, y_pred)
print('Root Mean Squared Error:', rmse)
print('Mean Absolute Error:', mae)
print('R^2:', r2)

Root Mean Squared Error: 1062.1820486161942
Mean Absolute Error: 673.2609232684173
R^2: 0.6366439770493535


In [0]:
#Predict on test data
model.fit(X_test, y_test)
y_pred = model.predict(X_test)

In [0]:
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print('Root Mean Squared Error:', rmse)
print('Mean Absolute Error:', mae)
print('R^2:', r2)

Root Mean Squared Error: 1043.1926309745054
Mean Absolute Error: 680.692514289233
R^2: 0.6498573299431055


In [0]:
# This is easier to read
print('Intercept', model.intercept_)
coefficients = pd.Series(model.coef_, features)
print(coefficients.to_string())

Intercept -1014739.7950930495
bathrooms                1765.298777
bedrooms                  494.381629
latitude                 1352.913895
longitude              -12986.909183
pets                       45.789096
interest                 -433.670672
doorman                   459.813232
dishwasher                 47.435870
no_fee                    -91.387110
laundry_in_building      -136.297029
fitness_center            132.105683
pre-war                   -71.675211
laundry_in_unit           364.569207
roof_deck                -179.591146
outdoor_space            -123.575332
dining_room               258.731027
high_speed_internet      -281.490485
balcony                    89.427952
swimming_pool              99.709171
new_construction         -172.507468
terrace                    73.570355
exclusive                  94.342972
loft                      228.903677
garden_patio              245.605298
wheelchair_access         247.805103
common_outdoor_space      -45.535894
elevator

In [0]:
df.corr()


Unnamed: 0,bathrooms,bedrooms,latitude,longitude,price,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,interest,pets
bathrooms,1.0,0.524082,0.013407,-0.020099,0.687296,0.132882,0.021475,0.096922,0.024539,0.157173,0.17223,0.129547,-0.013202,0.148334,-0.016214,0.209366,0.102113,0.14445,0.233038,0.089073,0.13699,0.113315,0.066826,0.140767,-0.001944,0.01426,0.096283,0.06694,-0.003403,-0.072246,0.024916
bedrooms,0.524082,1.0,0.00349,0.055117,0.535503,-0.024821,-0.011173,0.09642,-0.00975,-0.047562,0.152102,0.163,0.008558,0.01315,0.00142,0.15179,0.045451,0.124912,0.189415,0.061006,0.097772,0.033441,0.000907,0.099149,-0.01488,-0.107341,0.07076,0.012306,0.003234,0.040735,-0.009478
latitude,0.013407,0.00349,1.0,0.329185,-0.036286,-0.010523,-0.029808,0.018653,-0.030954,-0.043393,-0.020612,-0.026788,-0.041635,-0.108455,0.026802,-0.044339,-0.063198,-0.079919,0.017146,-0.030969,0.016877,0.02702,-0.056093,0.006466,-0.060054,-0.01565,-0.000589,-0.06833,-0.117199,-0.046203,-0.030764
longitude,-0.020099,0.055117,0.329185,1.0,-0.251004,-0.189836,-0.058475,-0.108493,-0.070329,-0.275734,-0.16922,-0.088033,-0.044562,-0.25496,0.000196,-0.130139,-0.161466,-0.098595,-0.024793,-0.125635,-0.035474,-0.075046,-0.108001,-0.049016,0.046755,-0.060018,-0.029846,-0.063635,-0.102955,0.059222,-0.070415
price,0.687296,0.535503,-0.036286,-0.251004,1.0,0.207169,0.051453,0.101503,0.060401,0.276215,0.223899,0.13224,-0.019417,0.228775,-0.029122,0.271195,0.122929,0.142146,0.242911,0.090269,0.13914,0.134513,0.071431,0.145973,-0.013251,0.0071,0.103672,0.072517,0.011517,-0.203596,0.060873
elevator,0.132882,-0.024821,-0.010523,-0.189836,0.207169,1.0,0.033347,0.270831,0.034833,0.614558,0.349832,0.227895,0.141097,0.43107,-0.097015,0.134158,0.332028,0.204343,0.200591,0.277666,0.168081,0.183664,0.184178,0.135329,0.025895,0.054918,0.084056,0.155396,0.114882,-0.008542,0.035572
cats_allowed,0.021475,-0.011173,-0.029808,-0.058475,0.051453,0.033347,1.0,-0.177633,0.937245,0.08848,-0.04788,-0.024052,0.105644,0.126886,0.0475,-0.008827,0.02759,0.066713,-0.022404,0.077759,0.019979,0.009704,0.04271,0.006152,0.031414,-0.037966,0.00669,0.039656,0.104496,-0.057777,0.938812
hardwood_floors,0.096922,0.09642,0.018653,-0.108493,0.101503,0.270831,-0.177633,1.0,-0.185663,0.205119,0.634983,0.342971,-0.144728,0.16751,0.013435,0.360716,0.278727,0.188913,0.317281,0.237935,0.178089,0.170589,0.187635,0.181005,-0.194436,0.116572,0.161751,0.124829,-0.120489,0.118992,-0.184437
dogs_allowed,0.024539,-0.00975,-0.030954,-0.070329,0.060401,0.034833,0.937245,-0.185663,1.0,0.095434,-0.043839,-0.011414,0.093035,0.131521,0.05161,0.00251,0.034104,0.067859,-0.016154,0.089425,0.024472,0.009675,0.055655,0.005219,0.032729,-0.041014,0.009229,0.048123,0.106071,-0.064214,0.99851
doorman,0.157173,-0.047562,-0.043393,-0.275734,0.276215,0.614558,0.08848,0.205119,0.095434,1.0,0.31271,0.257031,0.077216,0.604863,-0.054614,0.166397,0.388921,0.205656,0.195814,0.312036,0.160789,0.263833,0.219943,0.126936,-0.074747,0.01008,0.075956,0.168161,0.126708,-0.078467,0.096057


#Total Features

In [0]:
df['features'] = df.iloc[:,-26:34].sum(axis = 1)
df.head(3)

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,interest,pets,features
0,1.5,3,2016-06-24 07:54:24,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,40.7145,-73.9425,3000,792 Metropolitan Avenue,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0.0,0
1,1.0,2,2016-06-12 12:19:27,,Columbus Avenue,40.7947,-73.9667,5465,808 Columbus Avenue,low,1,1,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.0,5
2,1.0,1,2016-04-17 03:26:41,"Top Top West Village location, beautiful Pre-w...",W 13 Street,40.7388,-74.0018,2850,241 W 13 Street,high,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0.0,3


In [0]:
df.corr()

Unnamed: 0,bathrooms,bedrooms,latitude,longitude,price,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,interest,pets,features
bathrooms,1.0,0.524082,0.013407,-0.020099,0.687296,0.132882,0.021475,0.096922,0.024539,0.157173,0.17223,0.129547,-0.013202,0.148334,-0.016214,0.209366,0.102113,0.14445,0.233038,0.089073,0.13699,0.113315,0.066826,0.140767,-0.001944,0.01426,0.096283,0.06694,-0.003403,-0.072246,0.024916,0.238312
bedrooms,0.524082,1.0,0.00349,0.055117,0.535503,-0.024821,-0.011173,0.09642,-0.00975,-0.047562,0.152102,0.163,0.008558,0.01315,0.00142,0.15179,0.045451,0.124912,0.189415,0.061006,0.097772,0.033441,0.000907,0.099149,-0.01488,-0.107341,0.07076,0.012306,0.003234,0.040735,-0.009478,0.116616
latitude,0.013407,0.00349,1.0,0.329185,-0.036286,-0.010523,-0.029808,0.018653,-0.030954,-0.043393,-0.020612,-0.026788,-0.041635,-0.108455,0.026802,-0.044339,-0.063198,-0.079919,0.017146,-0.030969,0.016877,0.02702,-0.056093,0.006466,-0.060054,-0.01565,-0.000589,-0.06833,-0.117199,-0.046203,-0.030764,-0.068345
longitude,-0.020099,0.055117,0.329185,1.0,-0.251004,-0.189836,-0.058475,-0.108493,-0.070329,-0.275734,-0.16922,-0.088033,-0.044562,-0.25496,0.000196,-0.130139,-0.161466,-0.098595,-0.024793,-0.125635,-0.035474,-0.075046,-0.108001,-0.049016,0.046755,-0.060018,-0.029846,-0.063635,-0.102955,0.059222,-0.070415,-0.256118
price,0.687296,0.535503,-0.036286,-0.251004,1.0,0.207169,0.051453,0.101503,0.060401,0.276215,0.223899,0.13224,-0.019417,0.228775,-0.029122,0.271195,0.122929,0.142146,0.242911,0.090269,0.13914,0.134513,0.071431,0.145973,-0.013251,0.0071,0.103672,0.072517,0.011517,-0.203596,0.060873,0.305263
elevator,0.132882,-0.024821,-0.010523,-0.189836,0.207169,1.0,0.033347,0.270831,0.034833,0.614558,0.349832,0.227895,0.141097,0.43107,-0.097015,0.134158,0.332028,0.204343,0.200591,0.277666,0.168081,0.183664,0.184178,0.135329,0.025895,0.054918,0.084056,0.155396,0.114882,-0.008542,0.035572,0.597606
cats_allowed,0.021475,-0.011173,-0.029808,-0.058475,0.051453,0.033347,1.0,-0.177633,0.937245,0.08848,-0.04788,-0.024052,0.105644,0.126886,0.0475,-0.008827,0.02759,0.066713,-0.022404,0.077759,0.019979,0.009704,0.04271,0.006152,0.031414,-0.037966,0.00669,0.039656,0.104496,-0.057777,0.938812,0.317865
hardwood_floors,0.096922,0.09642,0.018653,-0.108493,0.101503,0.270831,-0.177633,1.0,-0.185663,0.205119,0.634983,0.342971,-0.144728,0.16751,0.013435,0.360716,0.278727,0.188913,0.317281,0.237935,0.178089,0.170589,0.187635,0.181005,-0.194436,0.116572,0.161751,0.124829,-0.120489,0.118992,-0.184437,0.503269
dogs_allowed,0.024539,-0.00975,-0.030954,-0.070329,0.060401,0.034833,0.937245,-0.185663,1.0,0.095434,-0.043839,-0.011414,0.093035,0.131521,0.05161,0.00251,0.034104,0.067859,-0.016154,0.089425,0.024472,0.009675,0.055655,0.005219,0.032729,-0.041014,0.009229,0.048123,0.106071,-0.064214,0.99851,0.325645
doorman,0.157173,-0.047562,-0.043393,-0.275734,0.276215,0.614558,0.08848,0.205119,0.095434,1.0,0.31271,0.257031,0.077216,0.604863,-0.054614,0.166397,0.388921,0.205656,0.195814,0.312036,0.160789,0.263833,0.219943,0.126936,-0.074747,0.01008,0.075956,0.168161,0.126708,-0.078467,0.096057,0.636769


In [0]:
target = 'price'
features = ['bathrooms', 'bedrooms', 'latitude', 'longitude',
            'interest','features']

train = df[df['created'].dt.month < 6]
test = df[df['created'].dt.month >= 6]
           
y_train = train[target]
y_test = test[target]
X_train = train[features]
X_test = test[features]

In [0]:
model.fit(X_train, y_train)
y_pred = model.predict(X_train)
mse = mean_squared_error(y_train, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_train, y_pred)
r2 = r2_score(y_train, y_pred)
print('Root Mean Squared Error:', rmse)
print('Mean Absolute Error:', mae)
print('R^2:', r2)

Root Mean Squared Error: 1098.7542182022607
Mean Absolute Error: 700.3894440286409
R^2: 0.6111916712830507


In [0]:
model.fit(X_test, y_test)
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print('Root Mean Squared Error:', rmse)
print('Mean Absolute Error:', mae)
print('R^2:', r2)

Root Mean Squared Error: 1080.756763569673
Mean Absolute Error: 706.3787055636561
R^2: 0.6241868760413505
