<a href="https://colab.research.google.com/github/duellal/DS-Unit-2-Linear-Models/blob/master/2_LS_DS_Regression_2_assignment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Lambda School Data Science

*Unit 2, Sprint 1, Module 2*

---

# Regression 2

## Assignment

You'll continue to **predict how much it costs to rent an apartment in NYC,** using the dataset from renthop.com.

- [ ] Do train/test split. Use data from April & May 2016 to train. Use data from June 2016 to test.
- [ ] Engineer at least two new features. (See below for explanation & ideas.)
- [ ] Fit a linear regression model with at least two features.
- [ ] Get the model's coefficients and intercept.
- [ ] Get regression metrics RMSE, MAE, and $R^2$, for both the train and test data.
- [ ] What's the best test MAE you can get? Share your score and features used with your cohort on Slack!
- [ ] As always, commit your notebook to your fork of the GitHub repo.


#### [Feature Engineering](https://en.wikipedia.org/wiki/Feature_engineering)

> "Some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used." — Pedro Domingos, ["A Few Useful Things to Know about Machine Learning"](https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf)

> "Coming up with features is difficult, time-consuming, requires expert knowledge. 'Applied machine learning' is basically feature engineering." — Andrew Ng, [Machine Learning and AI via Brain simulations](https://forum.stanford.edu/events/2011/2011slides/plenary/2011plenaryNg.pdf) 

> Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work. 

#### Feature Ideas
- Does the apartment have a description?
- How long is the description?
- How many total perks does each apartment have?
- Are cats _or_ dogs allowed?
- Are cats _and_ dogs allowed?
- Total number of rooms (beds + baths)
- Ratio of beds to baths
- What's the neighborhood, based on address or latitude & longitude?
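A minimal sketch of several of these feature ideas, written with vectorized pandas operations on a small made-up DataFrame that reuses some renthop column names. The new column names (`has_description`, `beds_per_bath`, etc.) and the two-column `perk_cols` subset are hypothetical; in the notebook you would list every 0/1 perk column. The neighborhood idea is left out because it needs an address or coordinate lookup.

```python
import numpy as np
import pandas as pd

# Made-up listings that reuse a few renthop column names
toy = pd.DataFrame({
    'bedrooms': [3, 2, 0],
    'bathrooms': [1.5, 1.0, 0.0],
    'description': ['A brand new 3 bedroom', np.nan, '   '],
    'cats_allowed': [0, 1, 1],
    'dogs_allowed': [0, 1, 0],
    'elevator': [0, 1, 1],
    'doorman': [0, 1, 0],
})

# Hypothetical subset; in the notebook, list every 0/1 perk column
perk_cols = ['elevator', 'doorman']

desc = toy['description'].fillna('')                      # treat missing text as empty
toy['has_description'] = (desc.str.strip() != '').astype(int)
toy['description_words'] = desc.str.split().str.len()
toy['total_perks'] = toy[perk_cols].sum(axis=1)
toy['cats_or_dogs'] = (toy['cats_allowed'] | toy['dogs_allowed']).astype(int)
toy['cats_and_dogs'] = (toy['cats_allowed'] & toy['dogs_allowed']).astype(int)
toy['total_rooms'] = toy['bedrooms'] + toy['bathrooms']
# Zero bathrooms -> NaN ratio instead of inf
toy['beds_per_bath'] = toy['bedrooms'] / toy['bathrooms'].replace(0, np.nan)

print(toy.drop(columns='description'))
```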

## Stretch Goals
- [ ] If you want more math, skim [_An Introduction to Statistical Learning_](http://faculty.marshall.usc.edu/gareth-james/ISL/ISLR%20Seventh%20Printing.pdf),  Chapter 3.1, Simple Linear Regression, & Chapter 3.2, Multiple Linear Regression
- [ ] If you want more introduction, watch [Brandon Foltz, Statistics 101: Simple Linear Regression](https://www.youtube.com/watch?v=ZkjP5RJLQF4) (20 minutes, over 1 million views)
- [ ] Add your own stretch goal(s)!

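In the 0/1 amenity columns (elevator through common_outdoor_space):

0 = False (the listing does not have the amenity)

1 = True (the listing has the amenity)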

In [None]:
%%capture
import sys

# If you're on Colab:
if 'google.colab' in sys.modules:
    DATA_PATH = 'https://raw.githubusercontent.com/LambdaSchool/DS-Unit-2-Applied-Modeling/master/data/'
    !pip install category_encoders==2.*

# If you're working locally:
else:
    DATA_PATH = '../data/'
    
# Ignore this Numpy warning when using Plotly Express:
# FutureWarning: Method .ptp is deprecated and will be removed in a future version. Use numpy.ptp instead.
import warnings
warnings.filterwarnings(action='ignore', category=FutureWarning, module='numpy')

In [None]:
import numpy as np
import pandas as pd

# Read New York City apartment rental listing data
df = pd.read_csv(DATA_PATH+'apartments/renthop-nyc.csv')
assert df.shape == (49352, 34)

# Remove the most extreme 1% prices,
# the most extreme .1% latitudes, &
# the most extreme .1% longitudes
df = df[(df['price'] >= np.percentile(df['price'], 0.5)) & 
        (df['price'] <= np.percentile(df['price'], 99.5)) & 
        (df['latitude'] >= np.percentile(df['latitude'], 0.05)) & 
        (df['latitude'] < np.percentile(df['latitude'], 99.95)) &
        (df['longitude'] >= np.percentile(df['longitude'], 0.05)) & 
        (df['longitude'] <= np.percentile(df['longitude'], 99.95))]
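A minimal sketch of the same percentile trimming as a reusable helper, assuming inclusive bounds on both ends (the cell above uses a strict `<` for the upper latitude bound). `trim_percentiles` is a hypothetical name. Calling it once per column is not identical to the cell above: each later call computes its percentiles on rows that earlier calls already removed, whereas the cell above computes every bound on the full dataset.

```python
import numpy as np
import pandas as pd

def trim_percentiles(df, column, lower_pct, upper_pct):
    """Keep rows whose `column` lies within the [lower_pct, upper_pct] percentiles.

    Percentiles use the 0-100 scale of np.percentile; both bounds are inclusive.
    """
    low, high = np.percentile(df[column], [lower_pct, upper_pct])
    return df[df[column].between(low, high)]

toy = pd.DataFrame({'price': np.arange(1, 1001)})  # made-up prices 1..1000
trimmed = trim_percentiles(toy, 'price', 0.5, 99.5)
print(len(trimmed), trimmed['price'].min(), trimmed['price'].max())
```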

In [None]:
df.head(2)

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space
0,1.5,3,2016-06-24 07:54:24,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,40.7145,-73.9425,3000,792 Metropolitan Avenue,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,1.0,2,2016-06-12 12:19:27,,Columbus Avenue,40.7947,-73.9667,5465,808 Columbus Avenue,low,1,1,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [None]:
print('Double checking that the only dates are between April and June.')
print()
print(df['created'].value_counts().sort_index())

Double checking that the only dates are between April and June.

2016-04-01 22:12:41    1
2016-04-01 22:56:00    1
2016-04-01 22:57:15    1
2016-04-01 23:26:07    1
2016-04-02 00:48:13    1
                      ..
2016-06-29 17:47:34    1
2016-06-29 17:56:12    1
2016-06-29 18:14:48    1
2016-06-29 18:30:41    1
2016-06-29 21:41:47    1
Name: created, Length: 48148, dtype: int64
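Listing every timestamp makes the date range hard to confirm. A more compact check is to count listings per month and print the earliest and latest timestamps. This sketch runs on made-up timestamps; in the notebook you would pass `df['created']` instead of `created_toy`.

```python
import pandas as pd

# Made-up 'created' strings; in the notebook this would be df['created']
created_toy = pd.Series(['2016-04-01 22:12:41', '2016-05-15 08:00:00',
                         '2016-06-29 21:41:47', '2016-06-12 12:19:27'])

created_ts = pd.to_datetime(created_toy)
per_month = created_ts.dt.to_period('M').value_counts().sort_index()

print(per_month)
print('Earliest:', created_ts.min(), '| Latest:', created_ts.max())
```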


## Designing new features

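- Outdoor areas: 1 if the apartment has at least one of roof deck, outdoor space, balcony, terrace, garden patio, or common outdoor space; otherwise 0
- Handicapped friendly: 1 only if the apartment has all three of elevator, laundry in unit, and wheelchair access; otherwise 0
- Number of words in each description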
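The three rules above can also be written without a row-wise `apply`, using pandas' `eq`, `any`, and `all` along `axis=1`, which is usually faster on large frames. A sketch on made-up rows that reuse the dataset's column names; it also fills missing descriptions with an empty string so they count as 0 words rather than `NaN`. The notebook's own output column names (including `'Decription Word #'`) are kept so the result lines up with the cells below.

```python
import numpy as np
import pandas as pd

outdoor_cols = ['roof_deck', 'outdoor_space', 'balcony', 'terrace',
                'garden_patio', 'common_outdoor_space']
access_cols = ['elevator', 'laundry_in_unit', 'wheelchair_access']

# Made-up rows that reuse the dataset's 0/1 column names
toy = pd.DataFrame({
    'roof_deck': [0, 1, 0], 'outdoor_space': [0, 0, 0], 'balcony': [0, 1, 0],
    'terrace': [0, 0, 0], 'garden_patio': [0, 0, 1], 'common_outdoor_space': [0, 0, 0],
    'elevator': [1, 1, 0], 'laundry_in_unit': [1, 0, 0], 'wheelchair_access': [1, 1, 0],
    'description': ['Sunny two bedroom near the park', '', np.nan],
})

# Outdoor areas: 1 if at least one outdoor column equals 1
toy['All Outdoor Areas'] = toy[outdoor_cols].eq(1).any(axis=1).astype(int)
# Handicapped friendly: 1 only if all three accessibility columns equal 1
toy['Handicapped Friendly'] = toy[access_cols].eq(1).all(axis=1).astype(int)
# Word count; a missing description counts as 0 words instead of NaN
toy['Decription Word #'] = toy['description'].fillna('').str.split().str.len()

print(toy[['All Outdoor Areas', 'Handicapped Friendly', 'Decription Word #']])
```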

In [None]:
def outdoor_areas(roof_deck, outdoor_space, balcony, terrace, garden_patio, common_outdoor_space):
  """Return 1 if the listing has at least one outdoor area, otherwise 0."""
  flags = (roof_deck, outdoor_space, balcony, terrace, garden_patio, common_outdoor_space)
  return int(any(flag == 1 for flag in flags))

In [None]:
df['All Outdoor Areas'] = df[['roof_deck', 'outdoor_space', 'balcony', 'terrace', 
                          'garden_patio', 'common_outdoor_space']].apply(lambda x: outdoor_areas(*x), axis=1)

print('Apartment has at least 1 outdoor area:\n', df['All Outdoor Areas'].value_counts())

Apartment has at least 1 outdoor area:
 0    36379
1    12438
Name: All Outdoor Areas, dtype: int64


In [None]:
def handicap_friendly(elevator, laundry_in_unit, wheelchair_access):
  if elevator == 1 and laundry_in_unit == 1 and wheelchair_access == 1:
    return 1
  else:
    return 0

In [None]:
df['Handicapped Friendly'] = df[['elevator', 'laundry_in_unit', 'wheelchair_access']].apply(lambda x: handicap_friendly(*x), axis=1)
print('Apartment is Handicapped Friendly:\n', df['Handicapped Friendly'].value_counts())

Apartment is Handicapped Friendly:
 0    48203
1      614
Name: Handicapped Friendly, dtype: int64


In [None]:
df['Decription Word #'] = df['description'].str.split().str.len()
df.head(2)

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,All Outdoor Areas,Handicapped Friendly,Decription Word #
0,1.5,3,2016-06-24 07:54:24,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,40.7145,-73.9425,3000,792 Metropolitan Avenue,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,93.0
1,1.0,2,2016-06-12 12:19:27,,Columbus Avenue,40.7947,-73.9667,5465,808 Columbus Avenue,low,1,1,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0


In [None]:
def interest_level(level):
  """Encode interest level as an ordinal number: low=0, medium=1, high=2 (None otherwise)."""
  if level == 'low':
    return 0
  if level == 'medium':
    return 1
  if level == 'high':
    return 2
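The same ordinal encoding can be sketched with `Series.map` and a dictionary, assuming the only expected labels are low, medium, and high. Labels missing from the dictionary become `NaN`, which is easy to spot with `isna()`. `INTEREST_ORDER` is a hypothetical name.

```python
import pandas as pd

# Hypothetical mapping with the same order as interest_level(): low < medium < high
INTEREST_ORDER = {'low': 0, 'medium': 1, 'high': 2}

levels = pd.Series(['medium', 'low', 'high', 'unknown'])  # made-up labels
encoded = levels.map(INTEREST_ORDER)                       # unmapped labels -> NaN
print(encoded)
print('Unmapped labels:', levels[encoded.isna()].tolist())
```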

In [None]:
# Total number of amenities per listing (each column is 0/1)
amenity_cols = ['elevator', 'dishwasher', 'laundry_in_unit', 'fitness_center', 'roof_deck',
                'outdoor_space', 'dining_room', 'high_speed_internet', 'balcony', 'swimming_pool',
                'terrace', 'garden_patio', 'common_outdoor_space', 'doorman']
df['Amenities'] = df[amenity_cols].sum(axis=1)

df['# of Rooms'] = (df['bathrooms'] + df['bedrooms'])

df['Interest Level'] = df['interest_level'].apply(interest_level)

df['Pets Allowed'] = (df['dogs_allowed'] + df['cats_allowed'])
df.head(1)

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,All Outdoor Areas,Handicapped Friendly,Decription Word #,Amenities,# of Rooms,Interest Level,Pets Allowed,year_month created
0,1.5,3,2016-06-24 07:54:24,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,40.7145,-73.9425,3000,792 Metropolitan Avenue,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,93.0,0,4.5,1,0,2016-06


In [None]:
df.corr(numeric_only=True)  # skip text/date columns (required in pandas 2.x)

Unnamed: 0,bathrooms,bedrooms,latitude,longitude,price,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,All Outdoor Areas,Handicapped Friendly,Decription Word #,Amenities,# of Rooms,Interest Level,Pets Allowed
bathrooms,1.0,0.524082,0.013407,-0.020099,0.687296,0.132882,0.021475,0.096922,0.024539,0.157173,0.17223,0.129547,-0.013202,0.148334,-0.016214,0.209366,0.102113,0.14445,0.233038,0.089073,0.13699,0.113315,0.066826,0.140767,-0.001944,0.01426,0.096283,0.06694,-0.003403,0.162202,0.080678,0.136763,0.270375,0.744468,-0.072246,0.023373
bedrooms,0.524082,1.0,0.00349,0.055117,0.535503,-0.024821,-0.011173,0.09642,-0.00975,-0.047562,0.152102,0.163,0.008558,0.01315,0.00142,0.15179,0.045451,0.124912,0.189415,0.061006,0.097772,0.033441,0.000907,0.099149,-0.01488,-0.107341,0.07076,0.012306,0.003234,0.114414,0.013825,0.099223,0.132823,0.958785,0.040735,-0.010631
latitude,0.013407,0.00349,1.0,0.329185,-0.036286,-0.010523,-0.029808,0.018653,-0.030954,-0.043393,-0.020612,-0.026788,-0.041635,-0.108455,0.026802,-0.044339,-0.063198,-0.079919,0.017146,-0.030969,0.016877,0.02702,-0.056093,0.006466,-0.060054,-0.01565,-0.000589,-0.06833,-0.117199,-0.095882,-0.041558,-0.039736,-0.057805,0.007209,-0.046203,-0.030868
longitude,-0.020099,0.055117,0.329185,1.0,-0.251004,-0.189836,-0.058475,-0.108493,-0.070329,-0.275734,-0.16922,-0.088033,-0.044562,-0.25496,0.000196,-0.130139,-0.161466,-0.098595,-0.024793,-0.125635,-0.035474,-0.075046,-0.108001,-0.049016,0.046755,-0.060018,-0.029846,-0.063635,-0.102955,-0.175481,-0.029845,-0.095826,-0.249191,0.036503,0.059222,-0.065423
price,0.687296,0.535503,-0.036286,-0.251004,1.0,0.207169,0.051453,0.101503,0.060401,0.276215,0.223899,0.13224,-0.019417,0.228775,-0.029122,0.271195,0.122929,0.142146,0.242911,0.090269,0.13914,0.134513,0.071431,0.145973,-0.013251,0.0071,0.103672,0.072517,0.011517,0.189714,0.092276,0.157691,0.345543,0.649097,-0.203596,0.056815
elevator,0.132882,-0.024821,-0.010523,-0.189836,0.207169,1.0,0.033347,0.270831,0.034833,0.614558,0.349832,0.227895,0.141097,0.43107,-0.097015,0.134158,0.332028,0.204343,0.200591,0.277666,0.168081,0.183664,0.184178,0.135329,0.025895,0.054918,0.084056,0.155396,0.114882,0.365357,0.107388,0.225435,0.645122,0.024873,-0.008542,0.034636
cats_allowed,0.021475,-0.011173,-0.029808,-0.058475,0.051453,0.033347,1.0,-0.177633,0.937245,0.08848,-0.04788,-0.024052,0.105644,0.126886,0.0475,-0.008827,0.02759,0.066713,-0.022404,0.077759,0.019979,0.009704,0.04271,0.006152,0.031414,-0.037966,0.00669,0.039656,0.104496,0.046039,0.023675,0.043902,0.05788,-0.001594,-0.057777,0.984259
hardwood_floors,0.096922,0.09642,0.018653,-0.108493,0.101503,0.270831,-0.177633,1.0,-0.185663,0.205119,0.634983,0.342971,-0.144728,0.16751,0.013435,0.360716,0.278727,0.188913,0.317281,0.237935,0.178089,0.170589,0.187635,0.181005,-0.194436,0.116572,0.161751,0.124829,-0.120489,0.302021,0.10647,0.250248,0.484844,0.107922,0.118992,-0.184558
dogs_allowed,0.024539,-0.00975,-0.030954,-0.070329,0.060401,0.034833,0.937245,-0.185663,1.0,0.095434,-0.043839,-0.011414,0.093035,0.131521,0.05161,0.00251,0.034104,0.067859,-0.016154,0.089425,0.024472,0.009675,0.055655,0.005219,0.032729,-0.041014,0.009229,0.048123,0.106071,0.053924,0.030739,0.056194,0.066691,0.000543,-0.064214,0.984113
doorman,0.157173,-0.047562,-0.043393,-0.275734,0.276215,0.614558,0.08848,0.205119,0.095434,1.0,0.31271,0.257031,0.077216,0.604863,-0.054614,0.166397,0.388921,0.205656,0.195814,0.312036,0.160789,0.263833,0.219943,0.126936,-0.074747,0.01008,0.075956,0.168161,0.126708,0.392933,0.11607,0.258141,0.68603,0.01515,-0.078467,0.093426
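The full matrix is hard to scan. A minimal sketch for ranking features by the strength of their correlation with price, shown on a small made-up DataFrame (in the notebook you would use `df`). It assumes pandas 1.5 or newer for `numeric_only` and 1.1 or newer for the `key` argument of `sort_values`.

```python
import pandas as pd

# Made-up numeric frame plus one text column
toy = pd.DataFrame({
    'price':     [2000, 2500, 3000, 4000, 5000],
    'bedrooms':  [0, 1, 1, 2, 3],
    'Amenities': [1, 0, 2, 3, 2],
    'longitude': [-73.90, -73.92, -73.95, -73.97, -73.99],
    'display_address': ['a', 'b', 'c', 'd', 'e'],  # dropped by numeric_only=True
})

price_corr = (toy.corr(numeric_only=True)['price']
              .drop('price')                              # a column always correlates 1.0 with itself
              .sort_values(key=abs, ascending=False))     # strongest first, regardless of sign
print(price_corr)
```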


## Train/Test Split
* Train - April/May 2016
* Test - June 2016

In [None]:
import datetime

In [None]:
df['year_month created'] = pd.to_datetime(df['created']).dt.to_period('M')
df.head(2)

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,All Outdoor Areas,Handicapped Friendly,Decription Word #,Amenities,# of Rooms,Interest Level,Pets Allowed,year_month created
0,1.5,3,2016-06-24 07:54:24,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,40.7145,-73.9425,3000,792 Metropolitan Avenue,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,93.0,0,4.5,1,0,2016-06
1,1.0,2,2016-06-12 12:19:27,,Columbus Avenue,40.7947,-73.9667,5465,808 Columbus Avenue,low,1,1,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,3,3.0,0,2,2016-06


In [None]:
# Train on April & May 2016; test on June 2016
train = df[df['year_month created'] < '2016-06']
test = df[df['year_month created'] >= '2016-06']
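An alternative sketch of the same split that compares full timestamps against a cutoff date instead of comparing the Period column against a string, followed by sanity checks that the two sets do not overlap in time. It runs on made-up rows; `cutoff`, `train_toy`, and `test_toy` are hypothetical names.

```python
import pandas as pd

# Made-up listings; in the notebook this would be df
toy = pd.DataFrame({
    'created': ['2016-04-03 10:00:00', '2016-05-20 09:30:00', '2016-05-31 23:59:59',
                '2016-06-01 00:00:00', '2016-06-29 21:41:47'],
    'price': [2500, 3100, 2900, 3300, 4000],
})

created_ts = pd.to_datetime(toy['created'])
cutoff = pd.Timestamp('2016-06-01')      # first moment of June 2016

train_toy = toy[created_ts < cutoff]     # April and May
test_toy = toy[created_ts >= cutoff]     # June

# Every row lands in exactly one set, and all training rows predate all test rows
assert len(train_toy) + len(test_toy) == len(toy)
assert created_ts.loc[train_toy.index].max() < created_ts.loc[test_toy.index].min()
print(len(train_toy), 'train rows,', len(test_toy), 'test rows')
```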

## Linear Regression with 2 features

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
import plotly.express as px

In [None]:
model = LinearRegression()

In [None]:
features = ['Handicapped Friendly', 'bedrooms']
target = ['price']

### Baseline

In [None]:
y_train = train[target]
y_test = test[target]

guess = y_train.mean()
print('Mean Baseline w/ 0 features:', guess)

Mean Baseline w/ 0 features: price    3575.604007
dtype: float64
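Why the mean? The mean is the constant that minimizes squared error, while the median is the constant that minimizes absolute error, so on skewed data like rents a median baseline can score a lower MAE. A small sketch on made-up prices; `mae` is a hypothetical helper.

```python
import numpy as np

# Made-up, right-skewed prices: one expensive listing pulls the mean up
prices = np.array([2000, 2200, 2500, 2600, 3000, 9000])

def mae(y_true, constant):
    """MAE of predicting the same constant for every row."""
    return np.mean(np.abs(y_true - constant))

mean_guess = prices.mean()
median_guess = np.median(prices)
print(f'mean   baseline: guess ${mean_guess:.2f}, MAE ${mae(prices, mean_guess):.2f}')
print(f'median baseline: guess ${median_guess:.2f}, MAE ${mae(prices, median_guess):.2f}')
```

The notebook keeps the mean baseline; the sketch only shows that the best constant depends on the metric.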


In [None]:
# Training and testing errors of the mean baseline (MAE, in dollars)

y_pred1 = [guess] * len(y_train)
train_mae = mean_absolute_error(y_train, y_pred1)
print(f'Training Error (NYC Apartment Prices): {train_mae:.2f} dollars')

y_pred2 = [guess] * len(y_test)
test_mae = mean_absolute_error(y_test, y_pred2)
print(f'Testing Error (NYC Apartment Prices): {test_mae:.2f} dollars')


Training Error (NYC Apartment Prices): 1201.88 dollars
Testing Error (NYC Apartment Prices): 1197.71 dollars


### Linear Regression #1

In [None]:
X_train = train[features]
X_test = test[features]

print(f'Linear Regression, dependent on: {features}')

Linear Regression, dependent on: ['Handicapped Friendly', 'bedrooms']


In [None]:
model.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
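`LinearRegression` fits ordinary least squares. As an illustration only (not necessarily how scikit-learn computes it internally), the sketch below solves the same least-squares problem with `np.linalg.lstsq`, adding a column of ones for the intercept; `fit_ols` is a hypothetical helper. On toy data generated from an exact linear rule, it recovers the intercept and coefficients.

```python
import numpy as np

def fit_ols(X, y):
    """Least squares with an intercept; returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    X1 = np.column_stack([np.ones(len(X)), X])      # column of 1s estimates the intercept
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)   # minimizes ||X1 @ beta - y||^2
    return beta[0], beta[1:]

# Toy data built from price = 2000 + 1000 * x1 + 500 * x2 with no noise
X_toy = np.array([[0, 1], [1, 2], [0, 3], [1, 0], [0, 0]])
y_toy = 2000 + 1000 * X_toy[:, 0] + 500 * X_toy[:, 1]

intercept, coef = fit_ols(X_toy, y_toy)
print(np.round(intercept, 6), np.round(coef, 6))
```

When the features are not collinear, these values should agree with `intercept_` and `coef_` from `LinearRegression` up to floating-point rounding.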

In [None]:
y_pred_train = model.predict(X_train)

In [None]:
mae_train = mean_absolute_error(y_train, y_pred_train)
print(f'Train Error (NYC Apartment Prices): {mae_train:.2f} dollars')

Train Error (NYC Apartment Prices): 968.17 dollars


In [None]:
y_pred_test = model.predict(X_test)
mae_test = mean_absolute_error(y_test, y_pred_test)
print(f'Test Error (NYC Apartment Prices): {mae_test:.2f} dollars')

Test Error (NYC Apartment Prices): 986.20 dollars


#### Model's coefficients and intercept


In [None]:
model.coef_

array([[1306.60288328,  854.01119936]])

In [None]:
model.intercept_

array([2253.71128525])
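Plugging the printed intercept and coefficients (rounded to cents) back into the linear model gives the fitted equation for Linear Regression #1, and a worked prediction for a hypothetical 2-bedroom listing that is not handicapped friendly:

$$
\widehat{\text{price}} = 2253.71 + 1306.60 \cdot \text{Handicapped Friendly} + 854.01 \cdot \text{bedrooms}
$$

$$
\widehat{\text{price}}(\text{Handicapped Friendly} = 0,\ \text{bedrooms} = 2) = 2253.71 + 1306.60 \cdot 0 + 854.01 \cdot 2 = 3961.73
$$

Read loosely, each extra bedroom adds about 854 dollars to the predicted rent when Handicapped Friendly is held fixed. These are associations in this dataset, not causal effects.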

### Linear Regression #2

In [None]:
df.columns

Index(['bathrooms', 'bedrooms', 'created', 'description', 'display_address',
       'latitude', 'longitude', 'price', 'street_address', 'interest_level',
       'elevator', 'cats_allowed', 'hardwood_floors', 'dogs_allowed',
       'doorman', 'dishwasher', 'no_fee', 'laundry_in_building',
       'fitness_center', 'pre-war', 'laundry_in_unit', 'roof_deck',
       'outdoor_space', 'dining_room', 'high_speed_internet', 'balcony',
       'swimming_pool', 'new_construction', 'terrace', 'exclusive', 'loft',
       'garden_patio', 'wheelchair_access', 'common_outdoor_space',
       'All Outdoor Areas', 'Handicapped Friendly', 'Decription Word #',
       'Amenities', '# of Rooms', 'Interest Level', 'Pets Allowed',
       'year_month created'],
      dtype='object')

In [None]:
features2 = ['Handicapped Friendly', 'Amenities', '# of Rooms', 'Pets Allowed']

In [None]:
X_train2 = train[features2]
X_test2 = test[features2]

print(f'Linear Regression, dependent on: {features2}')
print()
model.fit(X_train2, y_train)

Linear Regression, dependent on: ['Handicapped Friendly', 'Amenities', '# of Rooms', 'Pets Allowed']



LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [None]:
y_pred_train2 = model.predict(X_train2)
train2_mae = mean_absolute_error(y_train, y_pred_train2)
print(f'Train MAE: {train2_mae:.2f} dollars')

y_pred_test2 = model.predict(X_test2)
test2_mae = mean_absolute_error(y_test, y_pred_test2)
print(f'Test MAE: {test2_mae:.2f} dollars')

Train MAE: 844.79 dollars
Test MAE: 852.52 dollars


#### Model's coefficients and intercept

In [None]:
model.coef_

array([[246.42631022, 141.65761776, 756.36913277,  70.74425523]])

In [None]:
model.intercept_

array([1068.03772832])
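To read the coefficients of Linear Regression #2, the sketch below plugs the printed intercept and coefficients into the linear equation for a hypothetical listing: not handicapped friendly, 3 amenities, 2 bedrooms plus 1 bathroom (so `# of Rooms` = 3), and cats and dogs allowed (`Pets Allowed` = 2). The listing values are made up; the coefficients are copied from the output above.

```python
import numpy as np

# Coefficients and intercept as printed above for Linear Regression #2
feature_names = ['Handicapped Friendly', 'Amenities', '# of Rooms', 'Pets Allowed']
coef2 = np.array([246.42631022, 141.65761776, 756.36913277, 70.74425523])
intercept2 = 1068.03772832

# Hypothetical listing, in the same order as feature_names
listing = np.array([0, 3, 3, 2])

prediction = intercept2 + listing @ coef2   # intercept + sum of coefficient * feature
for name, c in zip(feature_names, coef2):
    print(f'{name:>20}: {c:+9.2f} per unit')
print(f'Predicted rent: ${prediction:.2f}')
```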

### Linear Regression #3

In [None]:
features3 = ['# of Rooms', 'Amenities']

In [None]:
X_train3 = train[features3]
X_test3 = test[features3]
print(f'Linear Regression, dependent on: {features3}')
print()

model.fit(X_train3, y_train)

Linear Regression, dependent on: ['# of Rooms', 'Amenities']



LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [None]:
y_pred_train3 = model.predict(X_train3)
train3_mae = mean_absolute_error(y_train, y_pred_train3)
print(f'Train MAE: {train3_mae:.2f} dollars')

y_pred_test3 = model.predict(X_test3)
test3_mae = mean_absolute_error(y_test, y_pred_test3)
print(f'Test MAE: {test3_mae:.2f} dollars')

Train MAE: 847.29 dollars
Test MAE: 855.39 dollars


#### Model's coefficients and intercept

In [None]:
model.coef_

array([[755.49056532, 146.07650175]])

In [None]:
model.intercept_

array([1127.25192472])

### Regression metrics (RMSE, MAE, and $R^2$) for both the train and test data

In [None]:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
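The metric cells below repeat the same four calls for each model. A minimal NumPy sketch of a single helper that computes all four from their definitions; `regression_metrics` is a hypothetical name, and it assumes one target whose true values are not all identical (otherwise $R^2$ divides by zero). Computing RMSE inside the same function also guarantees it is the square root of that model's own MSE.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 for a single target, computed with NumPy."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)
    ss_res = np.sum(residuals ** 2)                  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return {
        'MSE': mse,
        'RMSE': np.sqrt(mse),                        # root of *this* model's MSE
        'MAE': np.mean(np.abs(residuals)),
        'R^2': 1 - ss_res / ss_tot,
    }

metrics = regression_metrics([3000, 2000, 4000], [2500, 2500, 4000])
for name, value in metrics.items():
    print(f'{name}: {value:.4f}')
```

For non-constant targets, the values should match scikit-learn's `mean_squared_error`, `mean_absolute_error`, and `r2_score`.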

#### Train Regression Metrics

In [None]:
mae_baseline = mean_absolute_error(y_train, y_pred1)
mse_baseline = mean_squared_error(y_train, y_pred1)
rmse_baseline = np.sqrt(mse_baseline)
r2_baseline = r2_score(y_train, y_pred1)

print('Baseline Training Regression Metrics:')
print()
print('Mean Square Error:', mse_baseline)
print('Root Mean Squared Error:', rmse_baseline)
print('Mean Absolute Error:', mae_baseline)
print('R^2:', r2_baseline)

Baseline Training Regression Metrics:

Mean Square Error: 3105028.217891242
Root Mean Squared Error: 1762.11
Mean Absolute Error: 1201.8811133682555
R^2: 0.0


In [None]:
mae_train = mean_absolute_error(y_train, y_pred_train)
mse_train = mean_squared_error(y_train, y_pred_train)
rmse_train = np.sqrt(mse_train)
r2_train = r2_score(y_train, y_pred_train)

print('Train Regression #1 Metrics:')
print()
print('Mean Square Error:', mse_train)
print('Root Mean Squared Error:', rmse_train)
print('Mean Absolute Error:', mae_train)
print('R^2:', r2_train)


Train Regression #1 Metrics:

Mean Square Error: 2189814.7954692417
Root Mean Squared Error: 1479.8022825598161
Mean Absolute Error: 968.171571164732
R^2: 0.2947520467442196


In [None]:
mae_train2 = mean_absolute_error(y_train, y_pred_train2)
mse_train2 = mean_squared_error(y_train, y_pred_train2)
rmse_train2 = np.sqrt(mse_train2)
r2_train2 = r2_score(y_train, y_pred_train2)

print('Train Regression #2 Metrics:')
print()
print('Mean Square Error:', mse_train2)
print('Root Mean Squared Error:', rmse_train2)
print('Mean Absolute Error:', mae_train2)
print('R^2:', r2_train2)

Train Regression #2 Metrics:

Mean Square Error: 1640674.0069810941
Root Mean Squared Error: 1280.8879759686615
Mean Absolute Error: 844.7909531799122
R^2: 0.4716073762140086


In [None]:
mae_train3 = mean_absolute_error(y_train, y_pred_train3)
mse_train3 = mean_squared_error(y_train, y_pred_train3)
rmse_train3 = np.sqrt(mse_train3)
r2_train3 = r2_score(y_train, y_pred_train3)

print('Train Regression #3 Metrics:')
print()
print('Mean Square Error:', mse_train3)
print('Root Mean Squared Error:', rmse_train3)
print('Mean Absolute Error:', mae_train3)
print('R^2:', r2_train3)

Train Regression #3 Metrics:

Mean Square Error: 1646222.674250671
Root Mean Squared Error: 1283.0520933503328
Mean Absolute Error: 847.2919089136145
R^2: 0.46982038206123244


#### Test Regression Metrics

In [None]:
mae_base = mean_absolute_error(y_test, y_pred2)
mse_base = mean_squared_error(y_test, y_pred2)
rmse_base = np.sqrt(mse_base)
r2_base = r2_score(y_test, y_pred2)

print('Baseline Testing Regression Metrics:')
print()
print('Mean Square Error:', mse_base)
print('Root Mean Squared Error:', rmse_base)
print('Mean Absolute Error:', mae_base)
print('R^2:', r2_base)

Baseline Testing Regression Metrics:

Mean Square Error: 3108152.385651076
Root Mean Squared Error: 1763.00
Mean Absolute Error: 1197.7088871089013
R^2: -4.218690517676649e-05
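The slightly negative test $R^2$ for the baseline is expected rather than a bug. For any constant prediction $c$ scored against data with mean $\bar{y}$, expanding $(y_i - c)^2 = \big((y_i - \bar{y}) + (\bar{y} - c)\big)^2$ and using $\sum_i (y_i - \bar{y}) = 0$ (so the cross term vanishes) gives:

$$
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - c)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
= 1 - \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2 + n(\bar{y} - c)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
= -\frac{n(\bar{y} - c)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \le 0
$$

Equality holds only when $c = \bar{y}$. On the training data $c$ is the training mean, so the baseline's $R^2$ is exactly 0; on the June test data $c$ is still the April/May mean (3575.60), which differs slightly from the June mean, so $R^2$ comes out just below zero.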


In [None]:
mae_test = mean_absolute_error(y_test, y_pred_test)
mse_test = mean_squared_error(y_test, y_pred_test)
rmse_test = np.sqrt(mse_test)
r2_test = r2_score(y_test, y_pred_test)

print('Test Regression #1 Metrics:')
print()
print('Mean Square Error:', mse_test)
print('Root Mean Squared Error:', rmse_test)
print('Mean Absolute Error:', mae_test)
print('R^2:', r2_test)

Test Regression #1 Metrics:

Mean Square Error: 2199088.8671295564
Root Mean Squared Error: 1482.9325227836755
Mean Absolute Error: 986.2047646923478
R^2: 0.2924472911831192


In [None]:
mae_test2 = mean_absolute_error(y_test, y_pred_test2)
mse_test2 = mean_squared_error(y_test, y_pred_test2)
rmse_test2 = np.sqrt(mse_test2)
r2_test2 = r2_score(y_test, y_pred_test2)

print('Test Regression #2 Metrics:')
print()
print('Mean Square Error:', mse_test2)
print('Root Mean Squared Error:', rmse_test2)
print('Mean Absolute Error:', mae_test2)
print('R^2:', r2_test2)

Test Regression #2 Metrics:

Mean Square Error: 1626524.8161409174
Root Mean Squared Error: 1275.3528202583461
Mean Absolute Error: 852.5244963012196
R^2: 0.4766686981956404


In [None]:
mae_test3 = mean_absolute_error(y_test, y_pred_test3)
mse_test3 = mean_squared_error(y_test, y_pred_test3)
rmse_test3 = np.sqrt(mse_test3)
r2_test3 = r2_score(y_test, y_pred_test3)

print('Test Regression #3 Metrics:')
print()
print('Mean Square Error:', mse_test3)
print('Root Mean Squared Error:', rmse_test3)
print('Mean Absolute Error:', mae_test3)
print('R^2:', r2_test3)

Test Regression #3 Metrics:

Mean Square Error: 1634290.031521748
Root Mean Squared Error: 1278.3935354661912
Mean Absolute Error: 855.3867369971023
R^2: 0.4741702547450922


### What's the best test MAE you can get?

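The best test MAE I got was 852.52 dollars, from Linear Regression #2 with the features Handicapped Friendly, Amenities, # of Rooms, and Pets Allowed. It beats both the mean baseline (1197.71 dollars) and Linear Regression #3 (855.39 dollars), and it also has the highest test $R^2$ (0.477).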