<a href="https://colab.research.google.com/github/Daniel-Benson-Poe/DS-Unit-2-Linear-Models/blob/master/db_LS_DS_212_assignment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Lambda School Data Science

*Unit 2, Sprint 1, Module 2*

---

# Regression 2

## Assignment

You'll continue to **predict how much it costs to rent an apartment in NYC,** using the dataset from renthop.com.

- [ ] Do train/test split. Use data from April & May 2016 to train. Use data from June 2016 to test.
- [ ] Engineer at least two new features. (See below for explanation & ideas.)
- [ ] Fit a linear regression model with at least two features.
- [ ] Get the model's coefficients and intercept.
- [ ] Get regression metrics RMSE, MAE, and $R^2$, for both the train and test data.
- [ ] What's the best test MAE you can get? Share your score and features used with your cohort on Slack!
- [ ] As always, commit your notebook to your fork of the GitHub repo.


#### [Feature Engineering](https://en.wikipedia.org/wiki/Feature_engineering)

> "Some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used." — Pedro Domingos, ["A Few Useful Things to Know about Machine Learning"](https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf)

> "Coming up with features is difficult, time-consuming, requires expert knowledge. 'Applied machine learning' is basically feature engineering." — Andrew Ng, [Machine Learning and AI via Brain simulations](https://forum.stanford.edu/events/2011/2011slides/plenary/2011plenaryNg.pdf) 

> Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work. 

#### Feature Ideas
- Does the apartment have a description?
- How long is the description?
- How many total perks does each apartment have?
- Are cats _or_ dogs allowed?
- Are cats _and_ dogs allowed?
- Total number of rooms (beds + baths)
- Ratio of beds to baths
- What's the neighborhood, based on address or latitude & longitude?
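Three of the ideas above are not implemented later in this notebook: whether a listing has a description, how long it is, and the beds-to-baths ratio. Below is a minimal sketch of them. It assumes the renthop column names `description`, `bedrooms`, and `bathrooms`. The helper name `add_text_and_ratio_features` and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd

def add_text_and_ratio_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with three hypothetical engineered features."""
    out = df.copy()
    # Treat missing descriptions as empty strings, then strip whitespace
    desc = out['description'].fillna('').str.strip()
    out['has_description'] = (desc.str.len() > 0).astype(int)
    out['description_length'] = desc.str.len()
    # Beds-to-baths ratio; a listing with 0 bathrooms would divide by zero,
    # so those ratios become NaN instead of inf
    out['beds_per_bath'] = out['bedrooms'] / out['bathrooms'].replace(0, np.nan)
    return out

# Hypothetical toy listings
toy = pd.DataFrame({
    'description': ['Sunny 2BR', None, '   '],
    'bedrooms': [2, 1, 3],
    'bathrooms': [1.0, 0.0, 1.5],
})
print(add_text_and_ratio_features(toy))
```

The NaN ratios would still need to be filled or dropped before fitting a `LinearRegression`, which does not accept missing values.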

## Stretch Goals
- [ ] If you want more math, skim [_An Introduction to Statistical Learning_](http://faculty.marshall.usc.edu/gareth-james/ISL/ISLR%20Seventh%20Printing.pdf), Chapter 3.1, Simple Linear Regression, & Chapter 3.2, Multiple Linear Regression
- [ ] If you want more introduction, watch [Brandon Foltz, Statistics 101: Simple Linear Regression](https://www.youtube.com/watch?v=ZkjP5RJLQF4) (20 minutes, over 1 million views)
- [ ] Add your own stretch goal(s) !

In [0]:
%%capture
import sys

# If you're on Colab:
if 'google.colab' in sys.modules:
    DATA_PATH = 'https://raw.githubusercontent.com/LambdaSchool/DS-Unit-2-Applied-Modeling/master/data/'
    !pip install category_encoders==2.*

# If you're working locally:
else:
    DATA_PATH = '../data/'
    
# Ignore this Numpy warning when using Plotly Express:
# FutureWarning: Method .ptp is deprecated and will be removed in a future version. Use numpy.ptp instead.
import warnings
warnings.filterwarnings(action='ignore', category=FutureWarning, module='numpy')

In [0]:
import numpy as np
import pandas as pd

# Read New York City apartment rental listing data
df = pd.read_csv(DATA_PATH+'apartments/renthop-nyc.csv')
assert df.shape == (49352, 34)

# Remove the most extreme 1% prices,
# the most extreme .1% latitudes, &
# the most extreme .1% longitudes
df = df[(df['price'] >= np.percentile(df['price'], 0.5)) & 
        (df['price'] <= np.percentile(df['price'], 99.5)) & 
        (df['latitude'] >= np.percentile(df['latitude'], 0.05)) & 
        (df['latitude'] < np.percentile(df['latitude'], 99.95)) &
        (df['longitude'] >= np.percentile(df['longitude'], 0.05)) & 
        (df['longitude'] <= np.percentile(df['longitude'], 99.95))]
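The filter above keeps rows between the 0.5th and 99.5th percentiles of `price`, and between the 0.05th and 99.95th percentiles of `latitude` and `longitude`. Every percentile is computed on the unfiltered `df`. The sketch below factors the same idea into a reusable helper. `trim_percentiles` is a hypothetical name. The helper uses `Series.quantile`, which takes fractions rather than percents and uses the same linear interpolation as `np.percentile` by default. It also uses `Series.between`, which is inclusive at both ends, whereas the latitude upper bound above uses a strict `<`.

```python
import pandas as pd

def trim_percentiles(df, bounds):
    """Keep rows whose values fall inside per-column percentile bounds.

    bounds maps column name -> (low_pct, high_pct), given in percent
    (e.g. 0.5 means the 0.5th percentile). Both bounds are inclusive,
    and all percentiles are computed on the untrimmed df.
    """
    mask = pd.Series(True, index=df.index)
    for col, (low_pct, high_pct) in bounds.items():
        low = df[col].quantile(low_pct / 100)
        high = df[col].quantile(high_pct / 100)
        mask &= df[col].between(low, high)
    return df[mask].copy()

# Hypothetical toy prices 1..100; the 5th/95th percentiles are 5.95 and 95.05
toy = pd.DataFrame({'price': list(range(1, 101))})
trimmed = trim_percentiles(toy, {'price': (5, 95)})
print(len(trimmed), trimmed['price'].min(), trimmed['price'].max())
```

On the real data, the call would be `trim_percentiles(df, {'price': (0.5, 99.5), 'latitude': (0.05, 99.95), 'longitude': (0.05, 99.95)})`. Because the latitude bound is inclusive here, the result could keep slightly more rows than the original filter.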

In [4]:
# Look at the first five rows of data
df.head()

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space
0,1.5,3,2016-06-24 07:54:24,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,40.7145,-73.9425,3000,792 Metropolitan Avenue,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,1.0,2,2016-06-12 12:19:27,,Columbus Avenue,40.7947,-73.9667,5465,808 Columbus Avenue,low,1,1,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,1.0,1,2016-04-17 03:26:41,"Top Top West Village location, beautiful Pre-w...",W 13 Street,40.7388,-74.0018,2850,241 W 13 Street,high,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,1.0,1,2016-04-18 02:22:02,Building Amenities - Garage - Garden - fitness...,East 49th Street,40.7539,-73.9677,3275,333 East 49th Street,low,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,1.0,4,2016-04-28 01:32:41,Beautifully renovated 3 bedroom flex 4 bedroom...,West 143rd Street,40.8241,-73.9493,3350,500 West 143rd Street,low,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [5]:
df.shape

(48817, 34)

In [6]:
# Look at the created column values
df['created'].shape

(48817,)

In [7]:
df[df['created'] < '2016-05'].shape

(16217, 34)

In [8]:
df[df['created'] > '2016-05'].sample(15)

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space
18543,1.0,2,2016-06-14 15:54:32,Gorgeous convertible 2BD in Luxury 24HR Doorma...,East 29th Street,40.7412,-73.9772,3250,340 East 29th Street,medium,1,0,1,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
32028,1.0,0,2016-05-26 02:43:53,Several Layouts to Choose From Starting at $30...,Spruce Street,40.7111,-74.0055,3090,8 Spruce Street,low,1,0,1,0,1,1,1,0,1,0,1,1,0,0,1,0,1,0,0,0,0,0,0,0
19082,1.0,0,2016-06-18 01:27:42,"Huge studio with small study area on side, ful...",Jane Street,40.7382,-74.0023,3525,1 Jane Street,low,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
27843,1.0,1,2016-05-10 03:31:34,You just can't beat the location of this gorge...,West 118th St,40.8034,-73.9492,2350,100 West 118th St,low,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
21380,1.0,0,2016-06-24 08:26:01,Each residence enjoys generous living space th...,Park Ave,40.7482,-73.9808,3150,30 Park Ave,low,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
41452,1.5,2,2016-05-26 03:12:06,Amazing two bed one and a half bathroom in the...,W 98th St.,40.7957,-73.9705,5000,220 W 98th St.,low,1,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
25907,1.0,3,2016-06-20 19:07:47,"No Brokers Fee & 1 Month Free*Brand New, Gut R...",Linden St,40.6907,-73.9216,3850,36 Linden St,low,0,0,1,0,0,1,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0
42331,1.0,0,2016-06-05 04:18:10,Spacious alcove studio with great natural ligh...,East 72nd Street,40.7681,-73.9563,2595,355 East 72nd Street,low,1,0,1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
18625,2.0,3,2016-06-02 03:37:06,STUNNING LUXURY 24 HOUR DOORMAN BUILDING WITH ...,E 34 St.,40.7456,-73.9785,4600,166 E 34 St.,low,1,1,1,1,1,1,0,0,0,0,0,1,1,1,0,1,0,0,0,0,0,0,0,0
36006,1.0,0,2016-05-14 06:02:20,Beautiful newly renovated studio in the Upper ...,West 77th Street,40.7803,-73.976,2700,50 West 77th Street,low,1,1,0,1,1,1,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0


In [0]:
# Let's split the data into test and train sets; for train we'll use data from April & May 2016, for test we'll use data from June 2016
train = df[df['created'] < '2016-06']
test = df[df['created'] > '2016-06']
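The split above compares the `created` strings lexicographically. This works because the timestamps are ISO-formatted: `'2016-06-01 03:11:01' > '2016-06'` is true, so `> '2016-06'` behaves like "on or after June 1". The sketch below shows an alternative under the same column assumption. It parses the strings with `pd.to_datetime`, uses one explicit cutoff, and takes `.copy()` so that later column assignments act on independent frames. The helper name `time_split` and the toy rows are hypothetical.

```python
import pandas as pd

def time_split(df, date_col, cutoff):
    """Split df into rows strictly before cutoff (train) and on/after it (test)."""
    created = pd.to_datetime(df[date_col])
    cutoff = pd.Timestamp(cutoff)
    train = df[created < cutoff].copy()
    test = df[created >= cutoff].copy()
    return train, test

# Hypothetical listings straddling the June 1 cutoff
toy = pd.DataFrame({
    'created': ['2016-04-17 03:26:41', '2016-05-31 23:59:59',
                '2016-06-01 00:00:00', '2016-06-24 07:54:24'],
    'price': [2850, 3000, 3100, 3000],
})
train_toy, test_toy = time_split(toy, 'created', '2016-06-01')
print(len(train_toy), len(test_toy))
```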

In [10]:
train, test

(       bathrooms  bedrooms  ... wheelchair_access common_outdoor_space
 2            1.0         1  ...                 0                    0
 3            1.0         1  ...                 0                    0
 4            1.0         4  ...                 0                    0
 5            2.0         4  ...                 0                    0
 6            1.0         2  ...                 0                    0
 ...          ...       ...  ...               ...                  ...
 49346        1.0         1  ...                 0                    0
 49348        1.0         1  ...                 0                    1
 49349        1.0         1  ...                 0                    0
 49350        1.0         0  ...                 0                    0
 49351        1.0         2  ...                 0                    0
 
 [31844 rows x 34 columns],
        bathrooms  bedrooms  ... wheelchair_access common_outdoor_space
 0            1.5         3  ...  

In [11]:
# Check to see that the sum of our train and test rows match our main dataframe's rows
train.shape[0] + test.shape[0] == df.shape[0]

True

In [12]:
test

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space
0,1.5,3,2016-06-24 07:54:24,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,40.7145,-73.9425,3000,792 Metropolitan Avenue,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,1.0,2,2016-06-12 12:19:27,,Columbus Avenue,40.7947,-73.9667,5465,808 Columbus Avenue,low,1,1,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
11,1.0,1,2016-06-03 03:21:22,Check out this one bedroom apartment in a grea...,W. 173rd Street,40.8448,-73.9396,1675,644 W. 173rd Street,low,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
14,1.0,1,2016-06-01 03:11:01,Spacious 1-Bedroom to fit King-sized bed comfo...,East 56th St..,40.7584,-73.9648,3050,315 East 56th St..,low,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
24,2.0,4,2016-06-07 04:39:56,SPRAWLING 2 BEDROOM FOUND! ENJOY THE LUXURY OF...,W 18 St.,40.7391,-73.9936,7400,30 W 18 St.,medium,1,1,1,1,1,1,0,0,1,0,0,0,1,0,1,1,0,0,1,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
49305,1.0,2,2016-06-16 04:20:46,Spacious sunny 2 bedroom apartment/ Queen size...,W 175 Street,40.8456,-73.9361,2295,575 W 175 Street,low,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
49310,1.0,3,2016-06-21 06:25:35,"Soaring 40 stories into the Manhattan skyline,...",Second Avenue,40.7817,-73.9497,3995,1751 Second Avenue,high,1,0,1,0,1,1,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0
49320,1.0,1,2016-06-02 13:24:18,Great deal for a one bedroom located in Prime ...,West 55th Street,40.7669,-73.9917,2727,448 West 55th Street,medium,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
49332,1.0,2,2016-06-06 01:22:44,Don't miss out on this spacious and beautiful ...,West 98th Street,40.7957,-73.9705,4850,220 West 98th Street,low,0,1,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [13]:
# Now that our data has been split into train and test sets, let's do some feature engineering.
# Total rooms (bedrooms + bathrooms) seems like a good place to start.
bed_and_bath_train = train['bedrooms'] + train['bathrooms']
bed_and_bath_test = test['bedrooms'] + test['bathrooms']
bed_and_bath_test

0        4.5
1        3.0
11       2.0
14       2.0
24       6.0
        ... 
49305    3.0
49310    4.0
49320    2.0
49332    3.0
49347    3.0
Length: 16973, dtype: float64

In [14]:
# Create a "cats or dogs allowed" feature: 1 if either pet is allowed
cats_or_dogs_train = (train['cats_allowed'] | train['dogs_allowed'])
print(cats_or_dogs_train)
cats_or_dogs_test = (test['cats_allowed'] | test['dogs_allowed'])
cats_or_dogs_test

2        0
3        0
4        0
5        0
6        1
        ..
49346    1
49348    1
49349    1
49350    1
49351    0
Length: 31844, dtype: int64


0        0
1        1
11       0
14       0
24       1
        ..
49305    0
49310    0
49320    1
49332    1
49347    0
Length: 16973, dtype: int64

In [15]:
# Now create a "cats and dogs allowed" feature: 1 only if both pets are allowed
cats_and_dogs_train = (train['cats_allowed'] & train['dogs_allowed'])
print(cats_and_dogs_train)
cats_and_dogs_test = (test['cats_allowed'] & test['dogs_allowed'])
cats_and_dogs_test

2        0
3        0
4        0
5        0
6        1
        ..
49346    0
49348    1
49349    1
49350    1
49351    0
Length: 31844, dtype: int64


0        0
1        1
11       0
14       0
24       1
        ..
49305    0
49310    0
49320    1
49332    1
49347    0
Length: 16973, dtype: int64

In [16]:
# Refresh our memory on the contents of our dataframe
train.columns

Index(['bathrooms', 'bedrooms', 'created', 'description', 'display_address',
       'latitude', 'longitude', 'price', 'street_address', 'interest_level',
       'elevator', 'cats_allowed', 'hardwood_floors', 'dogs_allowed',
       'doorman', 'dishwasher', 'no_fee', 'laundry_in_building',
       'fitness_center', 'pre-war', 'laundry_in_unit', 'roof_deck',
       'outdoor_space', 'dining_room', 'high_speed_internet', 'balcony',
       'swimming_pool', 'new_construction', 'terrace', 'exclusive', 'loft',
       'garden_patio', 'wheelchair_access', 'common_outdoor_space'],
      dtype='object')

In [0]:
# Let's create a feature that adds up all amenities/perks
# First we'll make a list here
perks_list = ['elevator', 'cats_allowed', 'hardwood_floors', 'dogs_allowed',
       'doorman', 'dishwasher', 'no_fee', 'laundry_in_building',
       'fitness_center', 'pre-war', 'laundry_in_unit', 'roof_deck',
       'outdoor_space', 'dining_room', 'high_speed_internet', 'balcony',
       'swimming_pool', 'new_construction', 'terrace', 'exclusive', 'loft',
       'garden_patio', 'wheelchair_access', 'common_outdoor_space']

In [18]:
# Sum the perk columns row by row to count each listing's perks
perks_train = train[perks_list].sum(axis=1)
print(perks_train)
perks_test = test[perks_list].sum(axis=1)
perks_test

2        3
3        2
4        1
5        0
6        3
        ..
49346    5
49348    9
49349    5
49350    5
49351    1
Length: 31844, dtype: int64


0         0
1         5
11        0
14        3
24       11
         ..
49305     2
49310     8
49320     2
49332     4
49347     5
Length: 16973, dtype: int64

In [19]:
# Let's add these new features to our dataframes now
train['bed_and_bath'] = bed_and_bath_train
test['bed_and_bath'] = bed_and_bath_test
train['cats_or_dogs'] = cats_or_dogs_train
test['cats_or_dogs'] = cats_or_dogs_test
train['cats_and_dogs'] = cats_and_dogs_train
test['cats_and_dogs'] = cats_and_dogs_test
train['perks'] = perks_train
test['perks'] = perks_test

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
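This warning appears because `train` and `test` were created by boolean-indexing `df`. pandas cannot tell whether the new columns are meant to propagate back to `df`. A common fix is to take an explicit `.copy()` when splitting. The sketch below does that inside one hypothetical helper, `add_features`. It builds the same four features as above and applies them identically to any frame. It assumes the column names used in this notebook, and the perk list is shortened for the demo.

```python
import pandas as pd

# Shortened, hypothetical perk list for the demo; the notebook uses perks_list
PERKS = ['elevator', 'cats_allowed', 'dogs_allowed', 'doorman']

def add_features(df, perk_cols):
    """Return a copy of df with the four engineered features used in this notebook."""
    out = df.copy()  # explicit copy, so assignments below don't target a slice
    out['bed_and_bath'] = out['bedrooms'] + out['bathrooms']
    out['cats_or_dogs'] = out['cats_allowed'] | out['dogs_allowed']
    out['cats_and_dogs'] = out['cats_allowed'] & out['dogs_allowed']
    out['perks'] = out[perk_cols].sum(axis=1)
    return out

toy = pd.DataFrame({
    'bedrooms': [1, 2], 'bathrooms': [1.0, 1.5],
    'elevator': [0, 1], 'cats_allowed': [1, 1],
    'dogs_allowed': [0, 1], 'doorman': [0, 1],
})
featured = add_features(toy, PERKS)
print(featured)
```

With the notebook's frames, this would read `train = add_features(train, perks_list)` and `test = add_features(test, perks_list)`. It replaces the eight separate assignments.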

In [20]:
train.head(5)

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,bed_and_bath,cats_or_dogs,cats_and_dogs,perks
2,1.0,1,2016-04-17 03:26:41,"Top Top West Village location, beautiful Pre-w...",W 13 Street,40.7388,-74.0018,2850,241 W 13 Street,high,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2.0,0,0,3
3,1.0,1,2016-04-18 02:22:02,Building Amenities - Garage - Garden - fitness...,East 49th Street,40.7539,-73.9677,3275,333 East 49th Street,low,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2.0,0,0,2
4,1.0,4,2016-04-28 01:32:41,Beautifully renovated 3 bedroom flex 4 bedroom...,West 143rd Street,40.8241,-73.9493,3350,500 West 143rd Street,low,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5.0,0,0,1
5,2.0,4,2016-04-19 04:24:47,,West 18th Street,40.7429,-74.0028,7995,350 West 18th Street,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6.0,0,0,0
6,1.0,2,2016-04-27 03:19:56,Stunning unit with a great location and lots o...,West 107th Street,40.8012,-73.966,3600,210 West 107th Street,low,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3.0,1,1,3


In [21]:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [31]:
# Let's look at our new feature, bed_and_bath, and then compare it to a multi-feature linear regression
features = ['bed_and_bath']
target = 'price'
X_train = train[features]
y_train = train[target]
model.fit(X_train, y_train)
y_pred = model.predict([[3]])
estimate = y_pred[0]
coefficient = model.coef_[0]
intercept = model.intercept_
result = f'${estimate:,.0f} rent estimate for a 3 room apartment in New York.'
explanation = f'In this linear regression, each additional room adds ${coefficient:,.0f}.'
print(result, '\n', explanation, '\nIntercept:', intercept)

$3,793 rent estimate for a 3 room apartment in New York. 
 In this linear regression, each additional room adds $810. 
Intercept: 1363.4860214142823
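As a hand check on the printed estimate, the fitted line can be evaluated directly. This uses the rounded coefficient, so the last digit could differ from the model's output:

$$\hat{y} = \beta_0 + \beta_1 x \approx 1363.49 + 810 \times 3 = 3793.49 \approx \$3{,}793$$

Here $x$ is `bed_and_bath`, $\beta_0$ is `model.intercept_`, and $\beta_1$ is `model.coef_[0]`.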


In [32]:
# Let's compare that to a multi-feature linear regression with 2 bedrooms and 1 bathroom
features = ['bedrooms', 'bathrooms']
target = 'price'
X_train = train[features]
y_train = train[target]
model.fit(X_train, y_train)
y_pred = model.predict([[2, 1]])
estimate = y_pred[0]
bedroom_coefficient = model.coef_[0]
bathroom_coefficient = model.coef_[1]
intercept = model.intercept_
result = f'${estimate:,.0f} rent estimate for a 2 bedroom, 1 bathroom apartment in New York'
explanation = f'In this linear regression each additional bedroom adds ${bedroom_coefficient:,.0f} and each additional bathroom adds ${bathroom_coefficient:,.0f}'
print(result, '\n', explanation, '\nIntercept:', intercept)

$3,337 rent estimate for a 2 bedroom, 1 bathroom apartment in New York 
 In this linear regression each additional bedroom adds $389 and each additional bathroom adds $2,073 
Intercept: 485.71869002322273


In [33]:
# Now let's compare that to a 1 bedroom, 2 bathroom apartment with the same two-feature model
features = ['bedrooms', 'bathrooms']
target = 'price'
X_train = train[features]
y_train = train[target]
model.fit(X_train, y_train)
y_pred = model.predict([[1, 2]])
estimate = y_pred[0]
bedroom_coefficient = model.coef_[0]
bathroom_coefficient = model.coef_[1]
intercept = model.intercept_
result = f'${estimate:,.0f} rent estimate for a 1 bedroom, 2 bathroom apartment in New York'
explanation = f'In this linear regression each additional bedroom adds ${bedroom_coefficient:,.0f} and each additional bathroom adds ${bathroom_coefficient:,.0f}'
print(result, '\n', explanation, '\nIntercept:', intercept)

$5,020 rent estimate for a 1 bedroom, 2 bathroom apartment in New York 
 In this linear regression each additional bedroom adds $389 and each additional bathroom adds $2,073 
Intercept: 485.71869002322273
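Both estimates come from the same fitted plane, $\hat{y} = \beta_0 + \beta_{bed} \cdot bedrooms + \beta_{bath} \cdot bathrooms$. Swapping one bedroom for one bathroom therefore changes the estimate by about $2,073 − $389 ≈ $1,684. The printed values, $5,020 and $3,337, differ by $1,683 after rounding. The sketch below checks that `predict` is just `intercept_ + X @ coef_`. It uses synthetic bedroom/bathroom data generated from an assumed price rule, not the renthop data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Hypothetical (bedrooms, bathrooms) data with an assumed linear price rule plus noise
X = np.column_stack([rng.integers(0, 5, size=200), rng.integers(1, 4, size=200)])
y = 500 + 400 * X[:, 0] + 2000 * X[:, 1] + rng.normal(0, 100, size=200)

model = LinearRegression().fit(X, y)
new_apartments = np.array([[2, 1], [1, 2]])  # (bedrooms, bathrooms)
by_hand = model.intercept_ + new_apartments @ model.coef_
print(by_hand, model.predict(new_apartments))
```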


In [34]:
# Now let's look at the cats or dogs column
features = ['cats_or_dogs']
target = 'price'
X_train = train[features]
y_train = train[target]
model.fit(X_train, y_train)
y_pred = model.predict([[1]])
estimate = y_pred[0]
coefficient = model.coef_[0]
intercept = model.intercept_
result = f'${estimate:,.0f} rent estimate for an apartment in New York that allows cats or dogs.'
explanation = f'In this linear regression, allowing cats or dogs adds ${coefficient:,.0f}.'
print(result, '\n', explanation, '\nIntercept:', intercept)

$3,670 rent estimate for an apartment in New York that allows cats or dogs. 
 In this linear regression, allowing cats or dogs adds $182. 
Intercept: 3488.7881157154025


In [35]:
# Now cats and dogs
features = ['cats_and_dogs']
target = 'price'
X_train = train[features]
y_train = train[target]
model.fit(X_train, y_train)
y_pred = model.predict([[1]])
estimate = y_pred[0]
coefficient = model.coef_[0]
intercept = model.intercept_
result = f'${estimate:,.0f} rent estimate for an apartment in New York that allows both cats and dogs.'
explanation = f'In this linear regression, allowing both cats and dogs adds ${coefficient:,.0f}.'
print(result, '\n', explanation, '\nIntercept:', intercept)

$3,697 rent estimate for an apartment in New York that allows both cats and dogs. 
 In this linear regression, allowing both cats and dogs adds $218. 
Intercept: 3478.4094742203856
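For a single 0/1 feature, least squares simply reproduces group means. The intercept is the mean training price of listings where the feature is 0. The intercept plus the coefficient is the mean price where it is 1. So the $182 and $218 above are differences in average April–May prices between groups, not the causal effect of allowing pets. The sketch below checks this on synthetic data. The `cats_or_dogs` flags and prices are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
pets = rng.integers(0, 2, size=500)  # hypothetical 0/1 "cats_or_dogs" flags
price = 3500 + 180 * pets + rng.normal(0, 800, size=500)
toy = pd.DataFrame({'cats_or_dogs': pets, 'price': price})

model = LinearRegression().fit(toy[['cats_or_dogs']], toy['price'])
group_means = toy.groupby('cats_or_dogs')['price'].mean()
print(model.intercept_, group_means.loc[0])
print(model.intercept_ + model.coef_[0], group_means.loc[1])
```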


In [36]:
# Finally we'll look at number of perks included
features = ['perks']
target = 'price'
X_train = train[features]
y_train = train[target]
model.fit(X_train, y_train)
y_pred = model.predict([[2]])
estimate = y_pred[0]
coefficient = model.coef_[0]
intercept = model.intercept_
result = f'${estimate:,.0f} rent estimate for an apartment with 2 perks'
explanation = f'In this linear regression, the addition of a perk adds ${coefficient:,.0f}.'
print(result, '\n', explanation, '\nIntercept:', intercept)

$3,156 rent estimate for an apartment with 2 perks 
 In this linear regression, the addition of a perk adds $155. 
Intercept: 2846.625396510405


In [37]:
features = ['perks']
target = 'price'
X_train = train[features]
y_train = train[target]
model.fit(X_train, y_train)
y_pred = model.predict([[5]])
estimate = y_pred[0]
coefficient = model.coef_[0]
intercept = model.intercept_
result = f'${estimate:,.0f} rent estimate for an apartment with 5 perks'
explanation = f'In this linear regression, the addition of a perk adds ${coefficient:,.0f}.'
print(result, '\n', explanation, '\nIntercept:', intercept)

$3,621 rent estimate for an apartment with 5 perks 
 In this linear regression, the addition of a perk adds $155. 
Intercept: 2846.625396510405


In [38]:
features = ['perks']
target = 'price'
X_train = train[features]
y_train = train[target]
model.fit(X_train, y_train)
y_pred = model.predict([[0]])
estimate = y_pred[0]
coefficient = model.coef_[0]
intercept = model.intercept_
result = f'${estimate:,.0f} rent estimate for an apartment with 0 perks'
explanation = f'In this linear regression, the addition of a perk adds ${coefficient:,.0f}.'
print(result, '\n', explanation, '\nIntercept:', intercept)

$2,847 rent estimate for an apartment with 0 perks 
 In this linear regression, the addition of a perk adds $155. 
Intercept: 2846.625396510405
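The three cells above refit the same model only to change the query. A sketch that fits once and predicts several perk counts in one call is shown below, on synthetic data. The prices are hypothetical, generated around the intercept and slope printed above. The queries are passed as a DataFrame with the training column name. In recent scikit-learn versions, predicting on a bare list after fitting on a DataFrame emits a feature-name warning.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
# Hypothetical training data: perk counts and prices
perks = rng.integers(0, 12, size=300)
train_toy = pd.DataFrame({'perks': perks,
                          'price': 2850 + 155 * perks + rng.normal(0, 500, size=300)})

model = LinearRegression().fit(train_toy[['perks']], train_toy['price'])

# One predict call for several perk counts, using the training column name
queries = pd.DataFrame({'perks': [0, 2, 5]})
for n, estimate in zip(queries['perks'], model.predict(queries)):
    print(f'${estimate:,.0f} rent estimate for an apartment with {n} perks')
```

Generating the message from the loop variable `n` also keeps the label in sync with the query.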


Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,bed_and_bath,cats_or_dogs,cats_and_dogs,perks
2,1.0,1,2016-04-17 03:26:41,"Top Top West Village location, beautiful Pre-w...",W 13 Street,40.7388,-74.0018,2850,241 W 13 Street,high,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2.0,0,0,3
3,1.0,1,2016-04-18 02:22:02,Building Amenities - Garage - Garden - fitness...,East 49th Street,40.7539,-73.9677,3275,333 East 49th Street,low,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2.0,0,0,2
4,1.0,4,2016-04-28 01:32:41,Beautifully renovated 3 bedroom flex 4 bedroom...,West 143rd Street,40.8241,-73.9493,3350,500 West 143rd Street,low,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5.0,0,0,1
5,2.0,4,2016-04-19 04:24:47,,West 18th Street,40.7429,-74.0028,7995,350 West 18th Street,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6.0,0,0,0
6,1.0,2,2016-04-27 03:19:56,Stunning unit with a great location and lots o...,West 107th Street,40.8012,-73.9660,3600,210 West 107th Street,low,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3.0,1,1,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
49346,1.0,1,2016-04-22 15:44:11,24hr Doorman Luxury building in the heart of t...,East 10th Street,40.7296,-73.9869,4500,166 2nd avenue,medium,1,1,0,0,1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2.0,1,0,5
49348,1.0,1,2016-04-04 18:22:34,"HIGH END condo finishes, swimming pool, and ki...",Rector Pl,40.7102,-74.0163,3950,225 Rector Place,low,1,1,0,1,1,0,0,1,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,2.0,1,1,9
49349,1.0,1,2016-04-16 02:13:40,Large Renovated One Bedroom Apartment with Sta...,West 45th Street,40.7601,-73.9900,2595,341 West 45th Street,low,1,1,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2.0,1,1,5
49350,1.0,0,2016-04-08 02:13:33,Stylishly sleek studio apartment with unsurpas...,Wall Street,40.7066,-74.0101,3350,37 Wall Street,low,1,1,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1.0,1,1,5


In [0]:
# Get regression metrics RMSE, MAE, and R^2 for both the train and test data.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def squared_errors(df, feature, target, m, b):
    """Print MSE, RMSE, MAE, and R^2 for the line y_pred = m*x + b.

    m is the slope and b the intercept. With m=0 and b=mean(target),
    this scores the "always predict the mean" baseline.
    """
    # Make predictions with the given slope and intercept
    x = df[feature]
    y = df[target]
    y_pred = m*x + b

    # Print regression metrics
    mse = mean_squared_error(y, y_pred)
    rmse = np.sqrt(mse)
    mae = mean_absolute_error(y, y_pred)
    r2 = r2_score(y, y_pred)
    print('Mean Squared Error:', mse)
    print('Root Mean Squared Error:', rmse)
    print('Mean Absolute Error:', mae)
    print('R^2:', r2)
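`squared_errors` scores a hand-picked line. The assignment asks for the metrics of the fitted `LinearRegression` on both train and test. Below is a minimal sketch of that, assuming the model is fit on the training frame only. `regression_metrics` and `make_toy` are hypothetical helpers, and the synthetic frames stand in for `train` and `test`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R^2 as a dict."""
    return {
        'RMSE': np.sqrt(mean_squared_error(y_true, y_pred)),
        'MAE': mean_absolute_error(y_true, y_pred),
        'R^2': r2_score(y_true, y_pred),
    }

# Hypothetical stand-ins for the notebook's train and test frames
rng = np.random.default_rng(7)
def make_toy(n):
    rooms = rng.integers(1, 7, size=n).astype(float)
    return pd.DataFrame({'bed_and_bath': rooms,
                         'price': 1400 + 800 * rooms + rng.normal(0, 600, size=n)})
train_toy, test_toy = make_toy(400), make_toy(200)

features, target = ['bed_and_bath'], 'price'
model = LinearRegression().fit(train_toy[features], train_toy[target])  # fit on train only
for name, data in [('train', train_toy), ('test', test_toy)]:
    scores = regression_metrics(data[target], model.predict(data[features]))
    print(name, {k: round(v, 3) for k, v in scores.items()})
```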

In [46]:
# For train data
feature = 'bed_and_bath'
target = 'price'
squared_errors(train, feature, target, m=0, b=y_train.mean())

Mean Squared Error: 3105028.217891242
Root Mean Squared Error: 1762.1090255404863
Mean Absolute Error: 1201.8811133682555
R^2: 0.0


In [54]:
# Let's figure out the best test MAE we can get, starting from the mean baseline:
# with m=0 and b=y_test.mean(), every test prediction is the test-set mean, so R^2 is 0
y_test = test['price']
feature = 'bed_and_bath'
target = 'price'
squared_errors(test, feature, target, m=0, b=y_test.mean())

Mean Squared Error: 3108021.267852562
Root Mean Squared Error: 1762.9581015590138
Mean Absolute Error: 1200.8695333832386
R^2: 0.0
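Setting `b=y_test.mean()` uses information from the test set itself. A stricter baseline predicts the *training* mean for every test listing. Its test $R^2$ is then at most 0 and usually slightly negative, because any constant other than the test mean has a larger squared error. The sketch below uses synthetic prices. The distributions are hypothetical and only loosely sized like this data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

rng = np.random.default_rng(3)
y_train_toy = rng.normal(3600, 1700, size=1000)  # hypothetical April-May prices
y_test_toy = rng.normal(3650, 1700, size=500)    # hypothetical June prices

# Baseline: predict the training mean for every test listing
baseline_pred = np.full_like(y_test_toy, y_train_toy.mean())
print('Baseline test MAE:', mean_absolute_error(y_test_toy, baseline_pred))
print('Baseline test R^2:', r2_score(y_test_toy, baseline_pred))
```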


In [55]:
feature = 'bed_and_bath'
target = 'price'
squared_errors(test, feature, target, m=1, b=y_test.mean())

Mean Squared Error: 3104803.6084589674
Root Mean Squared Error: 1762.045291262108
Mean Absolute Error: 1200.893220813317
R^2: 0.0010352758608430657


In [56]:
feature = 'bed_and_bath'
target = 'price'
squared_errors(test, feature, target, m=2, b=y_test.mean())

Mean Squared Error: 3101605.063747515
Root Mean Squared Error: 1761.1374346562266
Mean Absolute Error: 1200.9219474764138
R^2: 0.0020644016086415196


In [59]:
feature = 'bed_and_bath'
target = 'price'
squared_errors(test, feature, target, m=3, b=y_test.mean())

Mean Squared Error: 3098425.633718204
Root Mean Squared Error: 1760.2345394060999
Mean Absolute Error: 1200.9644910872112
R^2: 0.0030873772433956947
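Trying m = 1, 2, 3 by hand nudges the slope. Least squares instead solves directly for the slope that minimizes MSE, using $m = \operatorname{cov}(x, y) / \operatorname{var}(x)$ and $b = \bar{y} - m\bar{x}$. Least squares minimizes squared error rather than absolute error. That is consistent with the outputs above, where MSE falls while MAE creeps up slightly. The sketch below compares the closed form to `LinearRegression` on synthetic room counts and prices, which are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
x = rng.integers(1, 7, size=300).astype(float)  # hypothetical room counts
y = 1400 + 800 * x + rng.normal(0, 600, size=300)

# Closed-form simple linear regression (same ddof in numerator and denominator)
slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()

model = LinearRegression().fit(x.reshape(-1, 1), y)
print(slope, model.coef_[0])
print(intercept, model.intercept_)
```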


In [60]:
feature = 'cats_or_dogs'
target = 'price'
squared_errors(test, feature, target, m=0, b=y_test.mean())

Mean Squared Error: 3108021.267852562
Root Mean Squared Error: 1762.9581015590138
Mean Absolute Error: 1200.8695333832386
R^2: 0.0


In [61]:
feature = 'cats_or_dogs'
target = 'price'
squared_errors(test, feature, target, m=1, b=y_test.mean())

Mean Squared Error: 3107933.6552951257
Root Mean Squared Error: 1762.9332532161068
Mean Absolute Error: 1200.972697231704
R^2: 2.818917564773038e-05


In [62]:
feature = 'cats_or_dogs'
target = 'price'
squared_errors(test, feature, target, m=2, b=y_test.mean())

Mean Squared Error: 3107847.0050896592
Root Mean Squared Error: 1762.9086774673438
Mean Absolute Error: 1201.075861080169
R^2: 5.606871635832622e-05


In [63]:
feature = 'cats_or_dogs'
target = 'price'
squared_errors(test, feature, target, m=3, b=y_test.mean())

Mean Squared Error: 3107761.317236164
Root Mean Squared Error: 1762.8843743241257
Mean Absolute Error: 1201.1790764693844
R^2: 8.363862213123241e-05


In [64]:
feature = 'perks'
target = 'price'
squared_errors(test, feature, target, m=0, b=y_test.mean())

Mean Squared Error: 3108021.267852562
Root Mean Squared Error: 1762.9581015590138
Mean Absolute Error: 1200.8695333832386
R^2: 0.0


In [65]:
feature = 'perks'
target = 'price'
squared_errors(test, feature, target, m=1, b=y_test.mean())

Mean Squared Error: 3104361.0355348024
Root Mean Squared Error: 1761.919701784052
Mean Absolute Error: 1201.3357545082126
R^2: 0.001177672867177182


In [66]:
feature = 'perks'
target = 'price'
squared_errors(test, feature, target, m=2, b=y_test.mean())

Mean Squared Error: 3100765.8085785
Root Mean Squared Error: 1760.8991477590362
Mean Absolute Error: 1201.8646578288974
R^2: 0.0023344303815124867


In [67]:
feature = 'perks'
target = 'price'
squared_errors(test, feature, target, m=3, b=y_test.mean())

Mean Squared Error: 3097235.586983654
Root Mean Squared Error: 1759.8964705299156
Mean Absolute Error: 1202.4443836409225
R^2: 0.0034702725430060255
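To chase the best test MAE with a multi-feature model, one approach is to fit each candidate feature list on the training data and score it on the test data. Below is a minimal sketch on synthetic listings. The columns mirror a few of this notebook's features, but the values and the helper `make_toy` are hypothetical. Picking features by repeatedly checking the test score can overfit to the test set. A separate validation split is a common safeguard.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(11)
def make_toy(n):
    # Hypothetical listings with a few of the notebook's columns
    df = pd.DataFrame({
        'bedrooms': rng.integers(0, 5, size=n),
        'bathrooms': rng.integers(1, 4, size=n).astype(float),
        'perks': rng.integers(0, 12, size=n),
        'cats_or_dogs': rng.integers(0, 2, size=n),
    })
    df['price'] = (500 + 400 * df['bedrooms'] + 2000 * df['bathrooms']
                   + 150 * df['perks'] + rng.normal(0, 700, size=n))
    return df
train_toy, test_toy = make_toy(800), make_toy(400)

candidates = [
    ['bedrooms', 'bathrooms'],
    ['bedrooms', 'bathrooms', 'perks'],
    ['bedrooms', 'bathrooms', 'perks', 'cats_or_dogs'],
]
results = {}
for features in candidates:
    model = LinearRegression().fit(train_toy[features], train_toy['price'])
    results[tuple(features)] = mean_absolute_error(test_toy['price'], model.predict(test_toy[features]))

# Print candidates from lowest to highest test MAE
for features, mae in sorted(results.items(), key=lambda item: item[1]):
    print(f'{mae:,.0f}  {list(features)}')
```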
