<a href="https://colab.research.google.com/github/EEdwardsA/DS-Unit-2-Linear-Models/blob/master/module2-regression-2/LS_DS_212_assignment_Elizabeth.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Lambda School Data Science

*Unit 2, Sprint 1, Module 2*

---

# Regression 2

## Assignment

You'll continue to **predict how much it costs to rent an apartment in NYC,** using the dataset from renthop.com.

- [ ] Do train/test split. Use data from April & May 2016 to train. Use data from June 2016 to test.
- [ ] Engineer at least two new features. (See below for explanation & ideas.)
- [ ] Fit a linear regression model with at least two features.
- [ ] Get the model's coefficients and intercept.
- [ ] Get regression metrics RMSE, MAE, and $R^2$, for both the train and test data.
- [ ] What's the best test MAE you can get? Share your score and features used with your cohort on Slack!
- [ ] As always, commit your notebook to your fork of the GitHub repo.


#### [Feature Engineering](https://en.wikipedia.org/wiki/Feature_engineering)

> "Some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used." — Pedro Domingos, ["A Few Useful Things to Know about Machine Learning"](https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf)

> "Coming up with features is difficult, time-consuming, requires expert knowledge. 'Applied machine learning' is basically feature engineering." — Andrew Ng, [Machine Learning and AI via Brain simulations](https://forum.stanford.edu/events/2011/2011slides/plenary/2011plenaryNg.pdf) 

> Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work. 

#### Feature Ideas
- Does the apartment have a description?
- How long is the description?
- How many total perks does each apartment have?
- Are cats _or_ dogs allowed?
- Are cats _and_ dogs allowed?
- Total number of rooms (beds + baths)
- Ratio of beds to baths
- What's the neighborhood, based on address or latitude & longitude?
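
A few of the ideas above can be sketched on a tiny made-up table. The column names mirror the renthop data, but the values are invented, and `perk_cols` is only a hypothetical subset of the perk columns chosen for illustration.

```python
import pandas as pd

# Invented sample rows; only the column names come from the dataset.
listings = pd.DataFrame({
    'bedrooms': [3, 2, 0],
    'bathrooms': [1.5, 1.0, 1.0],
    'cats_allowed': [0, 1, 1],
    'dogs_allowed': [0, 1, 0],
    'elevator': [0, 1, 1],
    'doorman': [0, 1, 0],
})

# Hypothetical subset of the 0/1 perk columns.
perk_cols = ['cats_allowed', 'dogs_allowed', 'elevator', 'doorman']

cats = listings['cats_allowed'] == 1
dogs = listings['dogs_allowed'] == 1

listings['perk_count'] = listings[perk_cols].sum(axis=1)    # total perks per listing
listings['cats_or_dogs'] = (cats | dogs).astype(int)        # either pet allowed
listings['cats_and_dogs'] = (cats & dogs).astype(int)       # both pets allowed
listings['total_rooms'] = listings['bedrooms'] + listings['bathrooms']

print(listings)
```

On the full data, `perk_cols` would presumably list every 0/1 amenity column rather than these four.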

## Stretch Goals
- [ ] If you want more math, skim [_An Introduction to Statistical Learning_](http://faculty.marshall.usc.edu/gareth-james/ISL/ISLR%20Seventh%20Printing.pdf),  Chapter 3.1, Simple Linear Regression, & Chapter 3.2, Multiple Linear Regression
- [ ] If you want more introduction, watch [Brandon Foltz, Statistics 101: Simple Linear Regression](https://www.youtube.com/watch?v=ZkjP5RJLQF4) (20 minutes, over 1 million views)
- [ ] Add your own stretch goal(s)!

In [None]:
%%capture
import sys

# If you're on Colab:
if 'google.colab' in sys.modules:
    DATA_PATH = 'https://raw.githubusercontent.com/LambdaSchool/DS-Unit-2-Applied-Modeling/master/data/'
    !pip install category_encoders==2.*

# If you're working locally:
else:
    DATA_PATH = '../data/'
    
# Ignore this Numpy warning when using Plotly Express:
# FutureWarning: Method .ptp is deprecated and will be removed in a future version. Use numpy.ptp instead.
import warnings
warnings.filterwarnings(action='ignore', category=FutureWarning, module='numpy')

In [None]:
import numpy as np
import pandas as pd

# Read New York City apartment rental listing data
df = pd.read_csv(DATA_PATH+'apartments/renthop-nyc.csv')
assert df.shape == (49352, 34)

# Remove the most extreme 1% prices,
# the most extreme .1% latitudes, &
# the most extreme .1% longitudes
df = df[(df['price'] >= np.percentile(df['price'], 0.5)) & 
        (df['price'] <= np.percentile(df['price'], 99.5)) & 
        (df['latitude'] >= np.percentile(df['latitude'], 0.05)) & 
        (df['latitude'] < np.percentile(df['latitude'], 99.95)) &
        (df['longitude'] >= np.percentile(df['longitude'], 0.05)) & 
        (df['longitude'] <= np.percentile(df['longitude'], 99.95))]
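
The percentile filter above repeats the same pattern for each column. One way to express it is a small helper, sketched here on made-up prices. `trim_to_percentiles` is a hypothetical name, and it keeps both bounds inclusive, which differs slightly from the strict `<` used for the upper latitude bound above.

```python
import numpy as np
import pandas as pd

def trim_to_percentiles(frame, column, lower, upper):
    """Keep rows whose `column` value lies between the given percentiles (inclusive)."""
    low_value, high_value = np.percentile(frame[column], [lower, upper])
    return frame[frame[column].between(low_value, high_value)]

# Made-up prices 1..1000, only to show the effect of the trim.
prices = pd.DataFrame({'price': list(range(1, 1001))})
trimmed = trim_to_percentiles(prices, 'price', 0.5, 99.5)

print(len(prices), '->', len(trimmed))
```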

In [None]:
#"Do train/test split. Use data from April & May 2016 to train. Use data from June 2016 to test."
# I need to make a mask

#First, look at head
df.head()

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space
0,1.5,3,2016-06-24 07:54:24,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,40.7145,-73.9425,3000,792 Metropolitan Avenue,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,1.0,2,2016-06-12 12:19:27,,Columbus Avenue,40.7947,-73.9667,5465,808 Columbus Avenue,low,1,1,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,1.0,1,2016-04-17 03:26:41,"Top Top West Village location, beautiful Pre-w...",W 13 Street,40.7388,-74.0018,2850,241 W 13 Street,high,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,1.0,1,2016-04-18 02:22:02,Building Amenities - Garage - Garden - fitness...,East 49th Street,40.7539,-73.9677,3275,333 East 49th Street,low,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,1.0,4,2016-04-28 01:32:41,Beautifully renovated 3 bedroom flex 4 bedroom...,West 143rd Street,40.8241,-73.9493,3350,500 West 143rd Street,low,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [None]:
#reassign index
df = df.set_index('created')
df

created,bathrooms,bedrooms,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space
2016-06-24 07:54:24,1.5,3,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,40.7145,-73.9425,3000,792 Metropolitan Avenue,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2016-06-12 12:19:27,1.0,2,,Columbus Avenue,40.7947,-73.9667,5465,808 Columbus Avenue,low,1,1,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2016-04-17 03:26:41,1.0,1,"Top Top West Village location, beautiful Pre-w...",W 13 Street,40.7388,-74.0018,2850,241 W 13 Street,high,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2016-04-18 02:22:02,1.0,1,Building Amenities - Garage - Garden - fitness...,East 49th Street,40.7539,-73.9677,3275,333 East 49th Street,low,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2016-04-28 01:32:41,1.0,4,Beautifully renovated 3 bedroom flex 4 bedroom...,West 143rd Street,40.8241,-73.9493,3350,500 West 143rd Street,low,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2016-06-02 05:41:05,1.0,2,"30TH/3RD, MASSIVE CONV 2BR IN LUXURY FULL SERV...",E 30 St,40.7426,-73.9790,3200,230 E 30 St,medium,1,0,1,0,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
2016-04-04 18:22:34,1.0,1,"HIGH END condo finishes, swimming pool, and ki...",Rector Pl,40.7102,-74.0163,3950,225 Rector Place,low,1,1,0,1,1,0,0,1,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1
2016-04-16 02:13:40,1.0,1,Large Renovated One Bedroom Apartment with Sta...,West 45th Street,40.7601,-73.9900,2595,341 West 45th Street,low,1,1,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2016-04-08 02:13:33,1.0,0,Stylishly sleek studio apartment with unsurpas...,Wall Street,40.7066,-74.0101,3350,37 Wall Street,low,1,1,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [None]:
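# Fill missing descriptions with a single space.
# Assigning the result back avoids relying on in-place changes to a column selection.
df['description'] = df['description'].fillna(' ')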

In [None]:
date_mask = df.index < '2016-06'
date_mask


array([False, False,  True, ...,  True,  True,  True])
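
The mask above works because `created` is stored as text and ISO-style timestamps sort lexicographically. A more explicit alternative, sketched here on a few invented rows, parses the timestamps and selects months by number. The names `sample`, `train_mask`, and `test_mask` are illustrative and do not appear in the notebook.

```python
import pandas as pd

# Invented rows; the timestamp format matches the `created` column.
sample = pd.DataFrame({
    'created': ['2016-06-24 07:54:24', '2016-06-12 12:19:27',
                '2016-04-17 03:26:41', '2016-05-30 23:59:59'],
    'price': [3000, 5465, 2850, 3275],
})

sample['created'] = pd.to_datetime(sample['created'])

# April and May go to train, June goes to test.
train_mask = sample['created'].dt.month.isin([4, 5])
test_mask = sample['created'].dt.month == 6

train, test = sample[train_mask], sample[test_mask]
print(len(train), len(test))
```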

In [None]:
y = df['price']
X = df[['bathrooms', 'bedrooms', 'latitude', 'longitude', 'elevator', 'cats_allowed',
        'hardwood_floors', 'dogs_allowed', 'doorman', 'dishwasher', 'no_fee',
        'laundry_in_building', 'fitness_center', 'pre-war', 'laundry_in_unit',
        'roof_deck', 'outdoor_space', 'dining_room', 'high_speed_internet',
        'balcony', 'swimming_pool', 'new_construction', 'terrace', 'exclusive',
        'loft', 'garden_patio', 'wheelchair_access', 'common_outdoor_space']]

X_train, y_train = X.loc[date_mask], y.loc[date_mask]
X_test, y_test = X.loc[~date_mask], y.loc[~date_mask]

In [None]:
#'Engineer at least two new features.'
df['description'].head(20)

created
2016-06-24 07:54:24    A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...
2016-06-12 12:19:27                                                     
2016-04-17 03:26:41    Top Top West Village location, beautiful Pre-w...
2016-04-18 02:22:02    Building Amenities - Garage - Garden - fitness...
2016-04-28 01:32:41    Beautifully renovated 3 bedroom flex 4 bedroom...
2016-04-19 04:24:47                                                     
2016-04-27 03:19:56    Stunning unit with a great location and lots o...
2016-04-13 06:01:42    This huge sunny ,plenty of lights 1 bed/2 bath...
2016-04-20 02:36:35                             <p><a  website_redacted 
2016-04-02 02:58:15    This is a spacious four bedroom with every bed...
2016-04-14 01:10:30    New to the market! Spacious studio located in ...
2016-06-03 03:21:22    Check out this one bedroom apartment in a grea...
2016-04-19 05:37:25    ***LOW FEE. Beautiful CHERRY OAK WOODEN FLOORS...
2016-04-09 01:22:11    Lincoln Square's pre

In [None]:
df['desc_length'] = df['description'].str.len()
df['desc_length']

created
2016-06-24 07:54:24     588
2016-06-12 12:19:27       8
2016-04-17 03:26:41     691
2016-04-18 02:22:02     492
2016-04-28 01:32:41     479
                       ... 
2016-06-02 05:41:05     787
2016-04-04 18:22:34    1125
2016-04-16 02:13:40     671
2016-04-08 02:13:33     735
2016-04-12 02:48:07     799
Name: desc_length, Length: 48817, dtype: int64

In [None]:
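# True only for descriptions that are exactly one space (the value used to fill NaN).
# Descriptions made of several blank characters are not flagged by this comparison.
df['desc_missing'] = df['description'] == ' '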

In [None]:
df['ratio'] = (df['bedrooms'] +1)/(df['bathrooms'] +1)
df['ratio'].value_counts()

1.000000    19165
1.500000    10891
0.500000     9141
2.000000     3856
1.333333     2764
1.666667     1153
2.500000      365
1.200000      239
1.600000      208
0.666667      207
0.800000      166
1.250000      135
1.142857      128
0.857143       83
3.000000       66
1.428571       38
0.750000       34
0.333333       29
1.111111       27
0.888889       24
4.000000       21
1.750000       18
2.333333       11
1.400000       11
0.400000        9
0.909091        7
2.400000        5
1.714286        4
5.000000        3
0.571429        2
3.500000        1
2.250000        1
2.800000        1
1.800000        1
0.272727        1
0.363636        1
0.200000        1
Name: ratio, dtype: int64

In [None]:
df['description'].str.contains('cozy')

created
2016-06-24 07:54:24    False
2016-06-12 12:19:27    False
2016-04-17 03:26:41    False
2016-04-18 02:22:02    False
2016-04-28 01:32:41    False
                       ...  
2016-06-02 05:41:05    False
2016-04-04 18:22:34    False
2016-04-16 02:13:40    False
2016-04-08 02:13:33    False
2016-04-12 02:48:07    False
Name: description, Length: 48817, dtype: bool

In [None]:
df['description'].str.contains('cozy').value_counts()

False    48551
True       266
Name: description, dtype: int64

In [None]:
df['cozy'] = df['description'].str.contains('cozy')
df.head()

created,bathrooms,bedrooms,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,desc_length,desc_missing,ratio,cozy
2016-06-24 07:54:24,1.5,3,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,40.7145,-73.9425,3000,792 Metropolitan Avenue,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,588,False,1.6,False
2016-06-12 12:19:27,1.0,2,,Columbus Avenue,40.7947,-73.9667,5465,808 Columbus Avenue,low,1,1,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,False,1.5,False
2016-04-17 03:26:41,1.0,1,"Top Top West Village location, beautiful Pre-w...",W 13 Street,40.7388,-74.0018,2850,241 W 13 Street,high,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,691,False,1.0,False
2016-04-18 02:22:02,1.0,1,Building Amenities - Garage - Garden - fitness...,East 49th Street,40.7539,-73.9677,3275,333 East 49th Street,low,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,492,False,1.0,False
2016-04-28 01:32:41,1.0,4,Beautifully renovated 3 bedroom flex 4 bedroom...,West 143rd Street,40.8241,-73.9493,3350,500 West 143rd Street,low,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,479,False,2.5,False


In [None]:
# Convert the boolean flag to 0/1 and assign it back to the column.
df['cozy'] = df['cozy'].astype(int)
df['cozy']

created
2016-06-24 07:54:24    0
2016-06-12 12:19:27    0
2016-04-17 03:26:41    0
2016-04-18 02:22:02    0
2016-04-28 01:32:41    0
                      ..
2016-06-02 05:41:05    0
2016-04-04 18:22:34    0
2016-04-16 02:13:40    0
2016-04-08 02:13:33    0
2016-04-12 02:48:07    0
Name: cozy, Length: 48817, dtype: int64

In [None]:
# 1 if the description contains 'luxury' (case-sensitive), else 0
df['luxury'] = df['description'].str.contains('luxury').astype(int)
df['luxury']

created
2016-06-24 07:54:24    0
2016-06-12 12:19:27    0
2016-04-17 03:26:41    0
2016-04-18 02:22:02    0
2016-04-28 01:32:41    0
                      ..
2016-06-02 05:41:05    0
2016-04-04 18:22:34    0
2016-04-16 02:13:40    0
2016-04-08 02:13:33    0
2016-04-12 02:48:07    0
Name: luxury, Length: 48817, dtype: int64

In [None]:
df['luxury'].value_counts()

0    44364
1     4453
Name: luxury, dtype: int64

In [None]:
df['exclamations'] = df['description'].str.count('!')
df['exclamations']

created
2016-06-24 07:54:24    1
2016-06-12 12:19:27    0
2016-04-17 03:26:41    2
2016-04-18 02:22:02    2
2016-04-28 01:32:41    0
                      ..
2016-06-02 05:41:05    4
2016-04-04 18:22:34    2
2016-04-16 02:13:40    0
2016-04-08 02:13:33    0
2016-04-12 02:48:07    5
Name: exclamations, Length: 48817, dtype: int64

In [None]:
# Keyword flags: 1 if the description contains the word (case-sensitive), else 0
df['spacious'] = df['description'].str.contains('spacious').astype(int)

In [None]:
df['hardwood'] = df['description'].str.contains('hardwood').astype(int)

In [None]:
df['view'] = df['description'].str.contains('view').astype(int)

In [None]:
df['park'] = df['description'].str.contains('park').astype(int)

In [None]:
df['unique'] = df['description'].str.contains('unique').astype(int)

In [None]:
df['sun-drenched'] = df['description'].str.contains('sun-drenched').astype(int)
df['sun-drenched'].value_counts()

0    48724
1       93
Name: sun-drenched, dtype: int64

In [None]:
df['breathtaking'] = df['description'].str.contains('breathtaking').astype(int)
df['breathtaking'].value_counts()

0    48104
1      713
Name: breathtaking, dtype: int64

In [None]:
df['private'] = df['description'].str.contains('private').astype(int)
df['private'].value_counts()

0    42933
1     5884
Name: private, dtype: int64

In [None]:
df['upscale'] = df['description'].str.contains('upscale').astype(int)
df['upscale'].value_counts()

0    48528
1      289
Name: upscale, dtype: int64

In [None]:
df['elegant'] = df['description'].str.contains('elegant').astype(int)
df['elegant'].value_counts()

0    47871
1      946
Name: elegant, dtype: int64

In [None]:
df.head(1)

created,bathrooms,bedrooms,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,desc_length,desc_missing,ratio,cozy,luxury,exclamations,spacious,hardwood,view,park,unique,sun-drenched,breathtaking,private,upscale,elegant
2016-06-24 07:54:24,1.5,3,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,40.7145,-73.9425,3000,792 Metropolitan Avenue,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,588,False,1.6,0,0,1,0,0,0,0,0,0,0,0,0,0


In [None]:
y = df['price']
X = df[['bathrooms', 'bedrooms', 'latitude', 'longitude', 'elevator', 'cats_allowed',
        'hardwood_floors', 'dogs_allowed', 'doorman', 'dishwasher', 'no_fee',
        'laundry_in_building', 'fitness_center', 'pre-war', 'laundry_in_unit',
        'roof_deck', 'outdoor_space', 'dining_room', 'high_speed_internet',
        'balcony', 'swimming_pool', 'new_construction', 'terrace', 'exclusive',
        'loft', 'garden_patio', 'wheelchair_access', 'common_outdoor_space',
        'cozy', 'luxury', 'spacious', 'hardwood', 'view', 'park', 'unique',
        'sun-drenched', 'breathtaking', 'private', 'upscale', 'elegant',
        'desc_missing', 'desc_length', 'ratio', 'exclamations']]

X_train, y_train = X.loc[date_mask], y.loc[date_mask]
X_test, y_test = X.loc[~date_mask], y.loc[~date_mask]

In [None]:
48817 - 47392

1425

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 48817 entries, 2016-06-24 07:54:24 to 2016-04-12 02:48:07
Data columns (total 49 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   bathrooms             48817 non-null  float64
 1   bedrooms              48817 non-null  int64  
 2   description           48817 non-null  object 
 3   display_address       48684 non-null  object 
 4   latitude              48817 non-null  float64
 5   longitude             48817 non-null  float64
 6   price                 48817 non-null  int64  
 7   street_address        48807 non-null  object 
 8   interest_level        48817 non-null  object 
 9   elevator              48817 non-null  int64  
 10  cats_allowed          48817 non-null  int64  
 11  hardwood_floors       48817 non-null  int64  
 12  dogs_allowed          48817 non-null  int64  
 13  doorman               48817 non-null  int64  
 14  dishwasher            48817 non-null  int64

In [None]:
from sklearn.linear_model import LinearRegression

In [None]:
model = LinearRegression()

In [None]:
X.dtypes

bathrooms               float64
bedrooms                  int64
latitude                float64
longitude               float64
elevator                  int64
cats_allowed              int64
hardwood_floors           int64
dogs_allowed              int64
doorman                   int64
dishwasher                int64
no_fee                    int64
laundry_in_building       int64
fitness_center            int64
pre-war                   int64
laundry_in_unit           int64
roof_deck                 int64
outdoor_space             int64
dining_room               int64
high_speed_internet       int64
balcony                   int64
swimming_pool             int64
new_construction          int64
terrace                   int64
exclusive                 int64
loft                      int64
garden_patio              int64
wheelchair_access         int64
common_outdoor_space      int64
cozy                      int64
luxury                    int64
spacious                  int64
hardwood

In [None]:
X.shape

(48817, 44)

In [None]:
y.shape

(48817,)

In [None]:
X.isnull().sum()

bathrooms               0
bedrooms                0
latitude                0
longitude               0
elevator                0
cats_allowed            0
hardwood_floors         0
dogs_allowed            0
doorman                 0
dishwasher              0
no_fee                  0
laundry_in_building     0
fitness_center          0
pre-war                 0
laundry_in_unit         0
roof_deck               0
outdoor_space           0
dining_room             0
high_speed_internet     0
balcony                 0
swimming_pool           0
new_construction        0
terrace                 0
exclusive               0
loft                    0
garden_patio            0
wheelchair_access       0
common_outdoor_space    0
cozy                    0
luxury                  0
spacious                0
hardwood                0
view                    0
park                    0
unique                  0
sun-drenched            0
breathtaking            0
private                 0
upscale     

In [None]:
model.fit(X_train,y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [None]:
# Get the model's coefficients and intercept.

print('Coefficients are', model.coef_)
print('Intercept is', model.intercept_)


Coefficients are [ 2.74520131e+03 -3.64287917e+02  1.70124648e+03 -1.37149794e+04
  1.39897352e+02 -4.64409676e+01 -1.69951676e+02  1.07684325e+02
  4.49088965e+02  5.52489741e+01 -1.52134023e+02 -2.26932086e+02
  1.03523116e+02 -5.85057328e+01  4.63945504e+02 -1.63009999e+02
 -9.95144986e+01  2.32179940e+02 -2.98002928e+02 -7.09681495e+01
  5.35222352e+01 -1.64849450e+02  1.59931569e+02  5.43249778e+01
  1.16640701e+02 -4.41103565e+01  1.41397913e+02 -1.34588297e+02
  4.25892180e+01  9.50592381e+01 -3.52407835e+01 -7.91533665e+01
  2.13104130e+01 -2.81302214e+01 -7.50374962e+01  2.82838781e+02
  1.38229829e+02  1.25951078e+02  1.97479390e+02  3.09703976e+02
  1.57604038e+02  4.17220733e-02  1.78073034e+03 -2.14279949e+01]
Intercept is -1085306.2959753016
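
The raw coefficient array is hard to read without the feature names next to it. A minimal sketch of one way to label it, using a small synthetic dataset (`X_demo`, `y_demo`, and `demo_model` are made up for this example and are not the notebook's variables):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data with a known relationship: rent = 1000 + 500*bedrooms + 800*bathrooms
X_demo = pd.DataFrame({
    'bedrooms': [0, 1, 2, 3, 1, 2],
    'bathrooms': [1, 1, 1, 2, 2, 2],
})
y_demo = 1000 + 500 * X_demo['bedrooms'] + 800 * X_demo['bathrooms']

demo_model = LinearRegression().fit(X_demo, y_demo)

# Pair each coefficient with its column name and sort for readability.
coefs = pd.Series(demo_model.coef_, index=X_demo.columns).sort_values(ascending=False)
print(coefs)
print('Intercept:', demo_model.intercept_)
```

In the notebook, the equivalent would presumably be `pd.Series(model.coef_, index=X_train.columns)`.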


In [None]:
y_train.shape

(31844,)

In [None]:
y_test.shape

(16973,)

In [None]:
X

created,bathrooms,bedrooms,latitude,longitude,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,cozy,luxury,spacious,hardwood,view,park,unique,sun-drenched,breathtaking,private,upscale,elegant,desc_missing,desc_length,ratio,exclamations
2016-06-24 07:54:24,1.5,3,40.7145,-73.9425,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,False,588,1.6,1
2016-06-12 12:19:27,1.0,2,40.7947,-73.9667,1,1,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,False,8,1.5,0
2016-04-17 03:26:41,1.0,1,40.7388,-74.0018,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,False,691,1.0,2
2016-04-18 02:22:02,1.0,1,40.7539,-73.9677,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,False,492,1.0,2
2016-04-28 01:32:41,1.0,4,40.8241,-73.9493,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,False,479,2.5,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2016-06-02 05:41:05,1.0,2,40.7426,-73.9790,1,0,1,0,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,False,787,1.5,4
2016-04-04 18:22:34,1.0,1,40.7102,-74.0163,1,1,0,1,1,0,0,1,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,1,1,0,0,0,1,0,0,False,1125,1.0,2
2016-04-16 02:13:40,1.0,1,40.7601,-73.9900,1,1,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,False,671,1.0,0
2016-04-08 02:13:33,1.0,0,40.7066,-74.0101,1,1,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,False,735,0.5,0


In [None]:
# Get regression metrics: RMSE for both the train and test data.
# Taking np.sqrt of the MSE works across scikit-learn versions;
# the `squared=False` argument was deprecated in later releases.

from sklearn.metrics import mean_squared_error
print('Training Root Mean Squared Error', np.sqrt(mean_squared_error(y_train, model.predict(X_train))))
print('Test Root Mean Squared Error', np.sqrt(mean_squared_error(y_test, model.predict(X_test))))


Training Root Mean Squared Error 1074.1107207229406
Test Root Mean Squared Error 1062.576646068045


In [None]:
# Get MAE for both the train and test data.

from sklearn.metrics import mean_absolute_error

print('Training Mean Absolute Error', mean_absolute_error(y_train, model.predict(X_train)))
print('Test Mean Absolute Error', mean_absolute_error(y_test, model.predict(X_test)))


Training Mean Absolute Error 682.0849754063249
Test Mean Absolute Error 692.8312389144976
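
The assignment also asks for $R^2$, which the cells above do not compute. A minimal sketch using scikit-learn's `r2_score` on a small synthetic dataset (`X_demo`, `y_demo`, and `demo_model` are made up here):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic, slightly noisy data around y = 2x + 1.
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = np.array([1.0, 3.5, 4.5, 7.0])

demo_model = LinearRegression().fit(X_demo, y_demo)

# r2_score takes the true values first, then the predictions.
r2 = r2_score(y_demo, demo_model.predict(X_demo))
print('R^2:', r2)

# LinearRegression.score returns the same quantity.
print('model.score:', demo_model.score(X_demo, y_demo))
```

With the notebook's variables, this would presumably be `r2_score(y_train, model.predict(X_train))` and `r2_score(y_test, model.predict(X_test))`.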
