<a href="https://colab.research.google.com/github/noreallyimfine/DS-Unit-2-Regression-Classification/blob/master/module2/Copy_of_assignment_regression_classification_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Lambda School Data Science, Unit 2: Predictive Modeling

# Regression & Classification, Module 2

## Assignment

You'll continue to **predict how much it costs to rent an apartment in NYC,** using the dataset from renthop.com.

- [x] Do train/test split. Use data from April & May 2016 to train. Use data from June 2016 to test.
- [x] Engineer at least two new features. (See below for explanation & ideas.)
- [x] Fit a linear regression model with at least two features.
- [x] Get the model's coefficients and intercept.
- [x] Get regression metrics RMSE, MAE, and $R^2$, for both the train and test data.
- [x] What's the best test MAE you can get? Share your score and features used with your cohort on Slack!
- [x] As always, commit your notebook to your fork of the GitHub repo.


#### [Feature Engineering](https://en.wikipedia.org/wiki/Feature_engineering)

> "Some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used." — Pedro Domingos, ["A Few Useful Things to Know about Machine Learning"](https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf)

> "Coming up with features is difficult, time-consuming, requires expert knowledge. 'Applied machine learning' is basically feature engineering." — Andrew Ng, [Machine Learning and AI via Brain simulations](https://forum.stanford.edu/events/2011/2011slides/plenary/2011plenaryNg.pdf) 

> Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work. 

#### Feature Ideas
- Does the apartment have a description?
- How long is the description?
- How many total perks does each apartment have?
- Are cats _or_ dogs allowed?
- Are cats _and_ dogs allowed?
- Total number of rooms (beds + baths)
- Ratio of beds to baths
- What's the neighborhood, based on address or latitude & longitude?
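
A few of the ideas above can be sketched on a toy DataFrame. This is only an illustration: the column names mirror the renthop data, but the rows are made up, and the new column names (`has_description`, `cats_or_dogs`, `cats_and_dogs`) are hypothetical choices rather than anything the assignment prescribes.

```python
import pandas as pd

# Toy listings; column names mirror the renthop data, values are made up.
toy = pd.DataFrame({
    'description': ['Sunny 2BR near park', None, '   '],
    'cats_allowed': [1, 0, 0],
    'dogs_allowed': [1, 1, 0],
    'bedrooms': [2, 1, 0],
    'bathrooms': [1.0, 1.0, 1.0],
})

# Treat missing or whitespace-only text as "no description".
text = toy['description'].fillna('').str.strip()
toy['has_description'] = text.str.len() > 0
toy['description_length'] = text.str.len()

# Cats OR dogs versus cats AND dogs.
toy['cats_or_dogs'] = (toy['cats_allowed'] == 1) | (toy['dogs_allowed'] == 1)
toy['cats_and_dogs'] = (toy['cats_allowed'] == 1) & (toy['dogs_allowed'] == 1)

# Total number of rooms (beds + baths).
toy['total_rooms'] = toy['bedrooms'] + toy['bathrooms']

print(toy[['has_description', 'description_length',
           'cats_or_dogs', 'cats_and_dogs', 'total_rooms']])
```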

## Stretch Goals
- [ ] If you want more math, skim [_An Introduction to Statistical Learning_](http://faculty.marshall.usc.edu/gareth-james/ISL/ISLR%20Seventh%20Printing.pdf),  Chapter 3.1, Simple Linear Regression, & Chapter 3.2, Multiple Linear Regression
- [ ] If you want more introduction, watch [Brandon Foltz, Statistics 101: Simple Linear Regression](https://www.youtube.com/watch?v=ZkjP5RJLQF4) (20 minutes, over 1 million views)
- [ ] Do the [Plotly Dash](https://dash.plot.ly/) Tutorial, Parts 1 & 2.
- [ ] Add your own stretch goal(s)!

In [1]:
# If you're in Colab...
import os, sys
in_colab = 'google.colab' in sys.modules

if in_colab:
    # Install required python packages:
    # pandas-profiling, version >= 2.0
    # plotly, version >= 4.0
    !pip install --upgrade pandas-profiling plotly
    
    # Pull files from Github repo
    os.chdir('/content')
    !git init .
    !git remote add origin https://github.com/LambdaSchool/DS-Unit-2-Regression-Classification.git
    !git pull origin master
    
    # Change into directory for module
    os.chdir('module2')

Requirement already up-to-date: pandas-profiling in /usr/local/lib/python3.6/dist-packages (2.3.0)
Requirement already up-to-date: plotly in /usr/local/lib/python3.6/dist-packages (4.0.0)
Reinitialized existing Git repository in /content/.git/
fatal: remote origin already exists.
From https://github.com/LambdaSchool/DS-Unit-2-Regression-Classification
 * branch            master     -> FETCH_HEAD
Already up to date.


In [0]:
# Ignore this Numpy warning when using Plotly Express:
# FutureWarning: Method .ptp is deprecated and will be removed in a future version. Use numpy.ptp instead.
import warnings
warnings.filterwarnings(action='ignore', category=FutureWarning, module='numpy')

In [0]:
import numpy as np
import pandas as pd

# Read New York City apartment rental listing data
df = pd.read_csv('../data/renthop-nyc.csv')
assert df.shape == (49352, 34)

# Remove the most extreme 1% prices,
# the most extreme .1% latitudes, &
# the most extreme .1% longitudes
df = df[(df['price'] >= np.percentile(df['price'], 0.5)) & 
        (df['price'] <= np.percentile(df['price'], 99.5)) & 
        (df['latitude'] >= np.percentile(df['latitude'], 0.05)) & 
        (df['latitude'] < np.percentile(df['latitude'], 99.95)) &
        (df['longitude'] >= np.percentile(df['longitude'], 0.05)) & 
        (df['longitude'] <= np.percentile(df['longitude'], 99.95))]

In [4]:
# Get a fresh look at the data
print(df.shape)
df.head()

(48817, 34)


Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space
0,1.5,3,2016-06-24 07:54:24,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,40.7145,-73.9425,3000,792 Metropolitan Avenue,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,1.0,2,2016-06-12 12:19:27,,Columbus Avenue,40.7947,-73.9667,5465,808 Columbus Avenue,low,1,1,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,1.0,1,2016-04-17 03:26:41,"Top Top West Village location, beautiful Pre-w...",W 13 Street,40.7388,-74.0018,2850,241 W 13 Street,high,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,1.0,1,2016-04-18 02:22:02,Building Amenities - Garage - Garden - fitness...,East 49th Street,40.7539,-73.9677,3275,333 East 49th Street,low,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,1.0,4,2016-04-28 01:32:41,Beautifully renovated 3 bedroom flex 4 bedroom...,West 143rd Street,40.8241,-73.9493,3350,500 West 143rd Street,low,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [5]:
# See what values are null
# None of the nulls are in columns that are concerning right now
df.isnull().sum()

bathrooms                  0
bedrooms                   0
created                    0
description             1425
display_address          133
latitude                   0
longitude                  0
price                      0
street_address            10
interest_level             0
elevator                   0
cats_allowed               0
hardwood_floors            0
dogs_allowed               0
doorman                    0
dishwasher                 0
no_fee                     0
laundry_in_building        0
fitness_center             0
pre-war                    0
laundry_in_unit            0
roof_deck                  0
outdoor_space              0
dining_room                0
high_speed_internet        0
balcony                    0
swimming_pool              0
new_construction           0
terrace                    0
exclusive                  0
loft                       0
garden_patio               0
wheelchair_access          0
common_outdoor_space       0
dtype: int64

In [0]:
# Eliminate some of the weird outliers, same as yesterday
mask = (
    (df.price > 400) &
    (df.price < 100000) &
    (df.latitude > 40.4) &
    (df.latitude < 40.95) &
    (df.longitude > -74.3) &
    (df.longitude < -73.6)
)

df = df[mask]

In [7]:
# Is the date a datetime object or a string?
df.created.dtype

dtype('O')

In [8]:
# Convert the dates to datetime objects
df['created'] = pd.to_datetime(df['created'], infer_datetime_format=True)

# Make sure that it worked how we wanted
df.created.dtype

dtype('<M8[ns]')

In [9]:
# Confirm which months the data include
df.created.describe()

count                   48816
unique                  48147
top       2016-05-02 03:41:36
freq                        3
first     2016-04-01 22:12:41
last      2016-06-29 21:41:47
Name: created, dtype: object

In [10]:
# ENGINEER NEW FEATURES 

# First feature: Total Amenities

# DataFrame of just amenities
amenities = df.iloc[:, 10:]

# Calculate total amenities
df['total_amenities'] = amenities.sum(axis=1)

# Check it out visually to see that it worked
df.head()

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,total_amenities
0,1.5,3,2016-06-24 07:54:24,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,40.7145,-73.9425,3000,792 Metropolitan Avenue,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,1.0,2,2016-06-12 12:19:27,,Columbus Avenue,40.7947,-73.9667,5465,808 Columbus Avenue,low,1,1,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5
2,1.0,1,2016-04-17 03:26:41,"Top Top West Village location, beautiful Pre-w...",W 13 Street,40.7388,-74.0018,2850,241 W 13 Street,high,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3
3,1.0,1,2016-04-18 02:22:02,Building Amenities - Garage - Garden - fitness...,East 49th Street,40.7539,-73.9677,3275,333 East 49th Street,low,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2
4,1.0,4,2016-04-28 01:32:41,Beautifully renovated 3 bedroom flex 4 bedroom...,West 143rd Street,40.8241,-73.9493,3350,500 West 143rd Street,low,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1


In [11]:
# Second feature: Bedroom/Bathroom Ratio

df['bed_bath_ratio'] = df['bedrooms'] / df['bathrooms']

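# Zero bathrooms gives inf (bedrooms / 0) or NaN (0 / 0); set both to 0
# Only touch the new column so NaNs in other columns are left alone
df['bed_bath_ratio'] = df['bed_bath_ratio'].replace([np.inf, np.nan], 0)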
df.bed_bath_ratio.describe()

count    48816.000000
mean         1.262424
std          0.877441
min          0.000000
25%          1.000000
50%          1.000000
75%          2.000000
max          6.000000
Name: bed_bath_ratio, dtype: float64

In [0]:
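# Add column for length of description
# Note: len() on a Series returns the number of rows, so every row gets
# the same value (48816) rather than its own description length
df['description_length'] = len(df['description'])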
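
The `description_length` column shown in the later `df.head()` output is 48816 on every row, which matches the row count rather than any text length, and the models that include it report a coefficient of 0.0 for it. A per-row length could be computed along the lines of the sketch below. The toy rows are made up, and it assumes missing descriptions are still `NaN` at this point.

```python
import numpy as np
import pandas as pd

# Toy data (made up) with one missing description.
toy = pd.DataFrame({'description': ['Cozy studio', np.nan, 'Two bed, one bath']})

# len() on the whole Series is a single number: the row count.
n_rows = len(toy['description'])

# .str.len() works element-wise; fill missing text first so it counts as 0.
toy['description_length'] = toy['description'].fillna('').str.len()

print(n_rows)
print(toy)
```

Because the resulting column varies from row to row, a linear model could actually assign it a nonzero coefficient.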

In [0]:
# New feature: True if cats or dogs (or both) are allowed
pets = df[['cats_allowed', 'dogs_allowed']]

df['pets_allowed'] = pets.sum(axis=1) > 0

In [14]:
# Total Rooms Feature
df['total_rooms'] = df['bathrooms'] + df['bedrooms']

df.total_rooms.describe()

count    48816.000000
mean         2.738938
std          1.410944
min          0.000000
25%          2.000000
50%          2.000000
75%          4.000000
max         12.000000
Name: total_rooms, dtype: float64

In [47]:
# Lux amenities feature

luxuries = ['elevator', 'doorman', 'fitness_center', 'laundry_in_unit',
            'terrace', 'garden_patio', 'swimming_pool', 'balcony',
            'roof_deck', 'dining_room']
lux = amenities[luxuries]

df['lux'] = lux.sum(axis=1)
amenities.columns

Index(['elevator', 'cats_allowed', 'hardwood_floors', 'dogs_allowed',
       'doorman', 'dishwasher', 'no_fee', 'laundry_in_building',
       'fitness_center', 'pre-war', 'laundry_in_unit', 'roof_deck',
       'outdoor_space', 'dining_room', 'high_speed_internet', 'balcony',
       'swimming_pool', 'new_construction', 'terrace', 'exclusive', 'loft',
       'garden_patio', 'wheelchair_access', 'common_outdoor_space'],
      dtype='object')

In [48]:
# See new features
df.head()

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,total_amenities,bed_bath_ratio,description_length,pets_allowed,total_rooms,lux
0,1.5,3,2016-06-24 07:54:24,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,40.7145,-73.9425,3000,792 Metropolitan Avenue,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2.0,48816,False,4.5,0
1,1.0,2,2016-06-12 12:19:27,,Columbus Avenue,40.7947,-73.9667,5465,808 Columbus Avenue,low,1,1,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,2.0,48816,True,3.0,3
2,1.0,1,2016-04-17 03:26:41,"Top Top West Village location, beautiful Pre-w...",W 13 Street,40.7388,-74.0018,2850,241 W 13 Street,high,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,1.0,48816,False,2.0,0
3,1.0,1,2016-04-18 02:22:02,Building Amenities - Garage - Garden - fitness...,East 49th Street,40.7539,-73.9677,3275,333 East 49th Street,low,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1.0,48816,False,2.0,0
4,1.0,4,2016-04-28 01:32:41,Beautifully renovated 3 bedroom flex 4 bedroom...,West 143rd Street,40.8241,-73.9493,3350,500 West 143rd Street,low,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,4.0,48816,False,5.0,0


In [49]:
# Now the data can be split by date
# Everything created before June 2016 (April & May) is the train data
# June 2016 is the test data
train = df[df['created'] < '2016-06']

# And describe says last date is May 31
train.created.describe()

count                   31844
unique                  31436
top       2016-05-14 01:11:03
freq                        3
first     2016-04-01 22:12:41
last      2016-05-31 23:10:48
Name: created, dtype: object

In [50]:

# Now test data
test = df[df['created'] >= '2016-06-01']

# All test dates are in June
test.created.describe()

count                   16972
unique                  16711
top       2016-06-25 01:30:16
freq                        3
first     2016-06-01 01:10:37
last      2016-06-29 21:41:47
Name: created, dtype: object
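
The date-based split can be sanity-checked in a couple of lines. The sketch below uses a made-up four-row frame and hypothetical names (`cutoff`, `train_toy`, `test_toy`); it only illustrates the idea of checking that the two pieces cover every row and do not overlap in time.

```python
import pandas as pd

# Toy listings (made up) spanning April through June 2016.
toy = pd.DataFrame({
    'created': pd.to_datetime(['2016-04-01 22:12:41', '2016-05-31 23:10:48',
                               '2016-06-01 01:10:37', '2016-06-29 21:41:47']),
    'price': [3000, 2850, 3275, 3350],
})

# Everything before June 1 is train; June 1 and later is test.
cutoff = pd.Timestamp('2016-06-01')
train_toy = toy[toy['created'] < cutoff]
test_toy = toy[toy['created'] >= cutoff]

# Every row lands in exactly one split, and train ends before test begins.
assert len(train_toy) + len(test_toy) == len(toy)
assert train_toy['created'].max() < test_toy['created'].min()
```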

In [19]:
# MODEL TIME

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Simple Model

# create instance of the model
model = LinearRegression()

# X matrix and y vector
features = ['bedrooms']
target = 'price'

X_train = train[features]
y_train = train[target]
X_test = test[features]
y_test = test[target]

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print('BEDROOMS MODEL')
print('---------------')
print('The coefficient of this model is:', model.coef_[0])
print('The intercept of this model is: ', model.intercept_)
print('The Mean Absolute Error of this model is: ', mean_absolute_error(y_test, y_pred))
print('The Root Mean Squared Error of this model is: ', (mean_squared_error(y_test, y_pred) ** 0.5))
print('The r^2 score of this model is: ', r2_score(y_test, y_pred))

BEDROOMS MODEL
---------------
The coefficient of this model is: 855.5060260304634
The intercept of this model is:  2268.0853918830353
The Mean Absolute Error of this model is:  988.6955251775885
The Root Mean Squared Error of this model is:  1490.9996688078954
The r^2 score of this model is:  0.2847512205241449


In [0]:
# Function to run models

def linear_regression(train, test, features, target):
  '''
  FUNCTION
  -------------
  Fit a linear regression on the train data and report metrics on the test data
  
  ARGUMENTS
  -------------
  train: a training dataset
  
  test: a test dataset
  
  features: the features to use to train the model
  
  target: the target value to predict
  
  RETURNS
  ------------
  None. Prints the test MAE, RMSE, r^2 score, and the model coefficients.
  '''

  model = LinearRegression()

  X_train = train[features]
  y_train = train[target]

  X_test = test[features]
  y_test = test[target]

  model.fit(X_train, y_train)

  y_pred = model.predict(X_test)

  mae = mean_absolute_error(y_test, y_pred)
  rmse = np.sqrt(mean_squared_error(y_test, y_pred))
  coefficients = list(model.coef_)
  r_squared = r2_score(y_test, y_pred)

  
  message = f'''{features} Model
             \n-----------------------
             \nThe Mean Absolute Error of this model is {mae}
             \nThe Root Mean Squared Error of this model is {rmse}
             \nThe r_2 score of this model is {r_squared}
             \nThe coefficients of this model are {coefficients}'''

  print(message)
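
The assignment asks for the intercept and for metrics on both the train and test data, while the function above prints test metrics only. One possible variant is sketched below. `regression_report` is a hypothetical helper, not part of the original notebook, and the data is synthetic so the expected fit is known in advance.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(train, test, features, target):
    """Fit a linear regression and return metrics for both splits (hypothetical helper)."""
    model = LinearRegression()
    model.fit(train[features], train[target])

    report = {
        'intercept': float(model.intercept_),
        'coefficients': dict(zip(features, model.coef_)),
    }
    # Score the same fitted model on each split.
    for name, split in [('train', train), ('test', test)]:
        y_true = split[target]
        y_pred = model.predict(split[features])
        report[name] = {
            'mae': mean_absolute_error(y_true, y_pred),
            'rmse': np.sqrt(mean_squared_error(y_true, y_pred)),
            'r2': r2_score(y_true, y_pred),
        }
    return report


# Synthetic data with an exact linear relationship: price = 1000 + 500 * bedrooms.
toy_train = pd.DataFrame({'bedrooms': [0, 1, 2, 3]})
toy_train['price'] = 1000 + 500 * toy_train['bedrooms']
toy_test = pd.DataFrame({'bedrooms': [4, 5]})
toy_test['price'] = 1000 + 500 * toy_test['bedrooms']

report = regression_report(toy_train, toy_test, ['bedrooms'], 'price')
print(report)
```

Returning a dictionary instead of printing makes it easier to compare many feature sets side by side, for example by collecting the reports into a DataFrame.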

In [21]:
# JUST BATHROOMS MODEL 
linear_regression(train, test, ['bathrooms'], 'price')

['bathrooms'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 890.4750258391745
             
The Root Mean Squared Error of this model is 1270.9357458600039
             
The r_2 score of this model is 0.4803041835559483
             
The coefficients of this model are [2551.697525461011]


In [22]:
# BEDROOMS AND BATHROOMS MODEL
linear_regression(train, test, ['bedrooms', 'bathrooms'], 'price')

['bedrooms', 'bathrooms'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 825.8922373537173
             
The Root Mean Squared Error of this model is 1219.7340855191514
             
The r_2 score of this model is 0.5213342514889859
             
The coefficients of this model are [389.32489590255824, 2072.6101163851895]


In [23]:
# BED/BATH RATIO AND TOTAL AMENITIES
linear_regression(train, test, ['bed_bath_ratio', 'total_amenities'], 'price')

['bed_bath_ratio', 'total_amenities'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 1066.1488264438128
             
The Root Mean Squared Error of this model is 1635.0043512703805
             
The r_2 score of this model is 0.13991800344429295
             
The coefficients of this model are [440.73814139346644, 155.5587799625268]


In [24]:
# BEDROOMS AND TOTAL AMENITIES MODEL
linear_regression(train, test, ['bedrooms', 'total_amenities'], 'price')

['bedrooms', 'total_amenities'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 934.4229206592794
             
The Root Mean Squared Error of this model is 1425.0600600331727
             
The r_2 score of this model is 0.34661619139781863
             
The coefficients of this model are [810.7866446574795, 124.82387108689248]


In [25]:
# BEDROOMS BATHROOMS AND TOTAL AMENITIES
features = ['bedrooms', 'bathrooms', 'total_amenities']
linear_regression(train, test, features, 'price')

['bedrooms', 'bathrooms', 'total_amenities'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 799.729978451697
             
The Root Mean Squared Error of this model is 1192.1023410135072
             
The r_2 score of this model is 0.5427759006760544
             
The coefficients of this model are [391.9856492605974, 1936.2768075981717, 78.16664365401857]


In [26]:
# BASIC LATITUDE AND LONGITUDE MODEL
features = ['latitude', 'longitude']
linear_regression(train, test, features, 'price')

['latitude', 'longitude'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 1139.7529622102
             
The Root Mean Squared Error of this model is 1703.1297222128528
             
The r_2 score of this model is 0.06675109799044554
             
The coefficients of this model are [2208.1897189587876, -16215.705413894815]


In [27]:
# BEDROOMS, LATITUDE, AND LONGITUDE MODEL
features = ['bedrooms', 'latitude', 'longitude']
linear_regression(train, test, features, 'price')

['bedrooms', 'latitude', 'longitude'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 898.3250217998537
             
The Root Mean Squared Error of this model is 1401.0943815523249
             
The r_2 score of this model is 0.36840771777093917
             
The coefficients of this model are [881.6137812307753, 2593.529265404981, -18257.106444914247]


In [28]:
# BEDROOMS, BATHROOMS, AND LATITUDE
features = ['bedrooms', 'bathrooms', 'latitude']
linear_regression(train, test, features, 'price')

['bedrooms', 'bathrooms', 'latitude'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 820.0924642969193
             
The Root Mean Squared Error of this model is 1217.8576935723745
             
The r_2 score of this model is 0.5228058406090675
             
The coefficients of this model are [389.1030043555469, 2075.1601408400616, -2161.7963234382933]


In [29]:
# NEW FEATURE

# What's in the data again?
df.head()

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,total_amenities,bed_bath_ratio,description_length,pets_allowed,total_rooms,lux
0,1.5,3,2016-06-24 07:54:24,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,40.7145,-73.9425,3000,792 Metropolitan Avenue,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2.0,48816,False,4.5,0
1,1.0,2,2016-06-12 12:19:27,,Columbus Avenue,40.7947,-73.9667,5465,808 Columbus Avenue,low,1,1,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,2.0,48816,True,3.0,3
2,1.0,1,2016-04-17 03:26:41,"Top Top West Village location, beautiful Pre-w...",W 13 Street,40.7388,-74.0018,2850,241 W 13 Street,high,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,1.0,48816,False,2.0,0
3,1.0,1,2016-04-18 02:22:02,Building Amenities - Garage - Garden - fitness...,East 49th Street,40.7539,-73.9677,3275,333 East 49th Street,low,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1.0,48816,False,2.0,0
4,1.0,4,2016-04-28 01:32:41,Beautifully renovated 3 bedroom flex 4 bedroom...,West 143rd Street,40.8241,-73.9493,3350,500 West 143rd Street,low,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,4.0,48816,False,5.0,0


In [30]:
# BATHROOMS AND DESCRIPTION LENGTH MODEL
features = ['bathrooms', 'description_length']
linear_regression(train, test, features, 'price')

['bathrooms', 'description_length'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 890.4750258391745
             
The Root Mean Squared Error of this model is 1270.9357458600039
             
The r_2 score of this model is 0.4803041835559483
             
The coefficients of this model are [2551.697525461011, 0.0]


In [31]:
# BEDROOMS, BATHROOMS, AND DESCRIPTION LENGTH
features = ['bedrooms', 'bathrooms', 'description_length']
linear_regression(train, test, features, 'price')

['bedrooms', 'bathrooms', 'description_length'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 825.8922373537173
             
The Root Mean Squared Error of this model is 1219.7340855191514
             
The r_2 score of this model is 0.5213342514889859
             
The coefficients of this model are [389.32489590255824, 2072.6101163851895, 0.0]


In [32]:
# BEDROOMS, BATHROOMS, AND PETS ALLOWED
features = ['bedrooms', 'bathrooms', 'pets_allowed']
linear_regression(train, test, features, 'price')

['bedrooms', 'bathrooms', 'pets_allowed'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 822.7426871441967
             
The Root Mean Squared Error of this model is 1217.431101420411
             
The r_2 score of this model is 0.5231400859508328
             
The coefficients of this model are [391.1745787082467, 2066.9752735917696, 147.0161367123075]


In [33]:
# BEDROOMS, BATHROOMS, TOTAL AMENITIES, AND PETS ALLOWED
features = ['bedrooms', 'bathrooms', 'total_amenities', 'pets_allowed']
linear_regression(train, test, features, 'price')

['bedrooms', 'bathrooms', 'total_amenities', 'pets_allowed'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 799.9393771974412
             
The Root Mean Squared Error of this model is 1192.154053950578
             
The r_2 score of this model is 0.5427362314075621
             
The coefficients of this model are [391.65747859474965, 1934.8299097813833, 79.65809259066327, -30.118744745991602]


In [34]:
# BEDROOMS, BATHROOMS, LATITUDE, TOTAL AMENITIES
features = ['bedrooms', 'bathrooms', 'latitude', 'total_amenities']
linear_regression(train, test, features, 'price')

['bedrooms', 'bathrooms', 'latitude', 'total_amenities'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 796.0726053911799
             
The Root Mean Squared Error of this model is 1191.0272650299846
             
The r_2 score of this model is 0.5436002074210817
             
The coefficients of this model are [391.76628042116823, 1940.788081505113, -1649.3736377653893, 76.69560294711141]


In [35]:
# BEDROOMS, BATHROOMS, LATITUDE, TOTAL AMENITIES, AND PETS ALLOWED
features = ['bedrooms', 'bathrooms', 'latitude', 'total_amenities', 'pets_allowed']
linear_regression(train, test, features, 'price')

['bedrooms', 'bathrooms', 'latitude', 'total_amenities', 'pets_allowed'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 796.2098036833747
             
The Root Mean Squared Error of this model is 1191.0890519714414
             
The r_2 score of this model is 0.543552852873378
             
The coefficients of this model are [391.4206366900509, 1939.2788638274542, -1653.8048694634924, 78.25983174374664, -31.668292207068582]


In [36]:
# TOTAL ROOMS AND TOTAL AMENITIES MODEL
features = ['total_rooms', 'total_amenities']
linear_regression(train, test, features, 'price')

['total_rooms', 'total_amenities'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 867.9707445753579
             
The Root Mean Squared Error of this model is 1293.2247592273623
             
The r_2 score of this model is 0.4619160316556864
             
The coefficients of this model are [767.9566227199571, 101.76435154698719]


In [37]:
# BATHROOMS AND TOTAL ROOMS
features = ['bathrooms', 'total_rooms']
linear_regression(train, test, features, 'price')

['bathrooms', 'total_rooms'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 825.8922373537173
             
The Root Mean Squared Error of this model is 1219.7340855191514
             
The r_2 score of this model is 0.5213342514889859
             
The coefficients of this model are [1683.2852204826352, 389.32489590255585]


In [38]:
# BATHROOMS, TOTAL ROOMS, TOTAL AMENITIES, PETS ALLOWED
features = ['bathrooms', 'total_rooms', 'total_amenities', 'pets_allowed']
linear_regression(train, test, features, 'price')

['bathrooms', 'total_rooms', 'total_amenities', 'pets_allowed'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 799.9393771974414
             
The Root Mean Squared Error of this model is 1192.1540539505777
             
The r_2 score of this model is 0.5427362314075621
             
The coefficients of this model are [1543.1724311866444, 391.6574785947489, 79.65809259066401, -30.1187447459915]


In [39]:
# BEDROOMS, BATHROOMS, TOTAL ROOMS, TOTAL AMENITIES
features = ['bedrooms', 'bathrooms', 'total_rooms', 'total_amenities']
linear_regression(train, test, features, 'price')

['bedrooms', 'bathrooms', 'total_rooms', 'total_amenities'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 799.730608620003
             
The Root Mean Squared Error of this model is 1192.1007664955791
             
The r_2 score of this model is 0.5427771084701192
             
The coefficients of this model are [12665939984616.465, 12665939986160.582, -12665939984224.262, 78.1759033203125]
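
The huge, offsetting coefficients on `bedrooms`, `bathrooms`, and `total_rooms` are most likely a symptom of perfect collinearity: `total_rooms` was built as `bathrooms + bedrooms`, so the three columns carry only two columns' worth of information and the individual coefficients are not uniquely determined. The sketch below uses made-up room counts to illustrate the idea; it is not a rerun of the model above.

```python
import numpy as np

# Made-up room counts; total_rooms is constructed as an exact sum.
bedrooms = np.array([1.0, 2.0, 3.0, 0.0, 2.0])
bathrooms = np.array([1.0, 1.0, 2.0, 1.0, 1.5])
total_rooms = bedrooms + bathrooms

X = np.column_stack([bedrooms, bathrooms, total_rooms])

# Three columns, but only two of them are linearly independent.
rank = np.linalg.matrix_rank(X)
print(X.shape[1], rank)

# Adding any multiple of (1, 1, -1) to the coefficients leaves predictions unchanged,
# so very different coefficient vectors can describe the same fitted model.
coef_a = np.array([400.0, 1900.0, 0.0])
coef_b = coef_a + 1e6 * np.array([1.0, 1.0, -1.0])
same_predictions = np.allclose(X @ coef_a, X @ coef_b)
print(same_predictions)
```

This would also be consistent with the MAE, RMSE, and r_2 score staying close to those of the `['bedrooms', 'bathrooms', 'total_amenities']` model even though the coefficients look very different.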


In [40]:
# BEDROOMS, BATHROOMS, TOTAL ROOMS, LUX AMENITIES MODEL
features = ['bedrooms', 'bathrooms', 'total_rooms', 'lux']
linear_regression(train, test, features, 'price')

['bedrooms', 'bathrooms', 'total_rooms', 'lux'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 782.1401429919595
             
The Root Mean Squared Error of this model is 1171.4803865054546
             
The r_2 score of this model is 0.5584579445114146
             
The coefficients of this model are [68067180574743.87, 68067180576192.98, -68067180574325.93, 191.143798828125]


In [41]:
# BEDROOMS, BATHROOMS, TOTAL ROOMS, LUX AMENITIES, TOTAL AMENITIES AND LATITUDE MODEL
features = ['bedrooms', 'bathrooms', 'total_rooms', 'lux', 'total_amenities', 'latitude']
linear_regression(train, test, features, 'price')

['bedrooms', 'bathrooms', 'total_rooms', 'lux', 'total_amenities', 'latitude'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 776.7142028637526
             
The Root Mean Squared Error of this model is 1167.9696557785787
             
The r_2 score of this model is 0.561100434357976
             
The coefficients of this model are [-340.3217768908227, 1105.0203552341231, 764.6985783433006, 255.4179814630317, -39.99266931357897, -1700.0407532788404]


In [42]:
# BEDROOMS, BATHROOMS, TOTAL ROOMS, LUX AMENITIES, TOTAL AMENITIES AND LONGITUDE MODEL
features = ['bedrooms', 'bathrooms', 'total_rooms', 'lux', 'total_amenities', 'longitude']
linear_regression(train, test, features, 'price')

['bedrooms', 'bathrooms', 'total_rooms', 'lux', 'total_amenities', 'longitude'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 716.9465757606492
             
The Root Mean Squared Error of this model is 1107.8985620114117
             
The r_2 score of this model is 0.6050864558423065
             
The coefficients of this model are [-323.8646600510237, 1098.3738593254907, 774.5091992744664, 215.28016078755113, -50.264129076065615, -13500.661641042156]


In [43]:
# BEDROOMS, BATHROOMS, TOTAL ROOMS, PETS ALLOWED, LUX AMENITIES, TOTAL AMENITIES AND LONGITUDE MODEL
features = ['bedrooms', 'bathrooms', 'total_rooms', 'lux', 'total_amenities', 'pets_allowed', 'longitude']
linear_regression(train, test, features, 'price')

['bedrooms', 'bathrooms', 'total_rooms', 'lux', 'total_amenities', 'pets_allowed', 'longitude'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 712.4297783237304
             
The Root Mean Squared Error of this model is 1102.2633780504755
             
The r_2 score of this model is 0.6090935931626714
             
The coefficients of this model are [-316.29526278603157, 1092.2514276522782, 775.9561648662478, 268.76792141995094, -85.09930233209121, 212.78151752811297, -13443.245496901842]


In [44]:
# BEDROOMS, BATHROOMS, TOTAL ROOMS, PETS ALLOWED, LUX AMENITIES, DESCRIPTION LENGTH, TOTAL AMENITIES AND LONGITUDE MODEL
features = ['bedrooms', 'bathrooms', 'total_rooms', 'lux', 'total_amenities', 'pets_allowed', 'longitude', 'description_length']
linear_regression(train, test, features, 'price')

['bedrooms', 'bathrooms', 'total_rooms', 'lux', 'total_amenities', 'pets_allowed', 'longitude', 'description_length'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 712.4297783237286
             
The Root Mean Squared Error of this model is 1102.2633780504752
             
The r_2 score of this model is 0.6090935931626715
             
The coefficients of this model are [-316.29526278603157, 1092.2514276522782, 775.9561648662478, 268.76792141995094, -85.09930233209121, 212.78151752811297, -13443.24549690184, 0.0]


In [45]:
# BATHROOMS, TOTAL ROOMS, PETS ALLOWED, LUX AMENITIES, TOTAL AMENITIES AND LONGITUDE MODEL
features = ['bathrooms', 'total_rooms', 'lux', 'total_amenities', 'pets_allowed', 'longitude']
linear_regression(train, test, features, 'price')

['bathrooms', 'total_rooms', 'lux', 'total_amenities', 'pets_allowed', 'longitude'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 712.429778323727
             
The Root Mean Squared Error of this model is 1102.2633780504761
             
The r_2 score of this model is 0.6090935931626709
             
The coefficients of this model are [1408.5466904383172, 459.66090208021416, 268.76792141994815, -85.09930233208945, 212.78151752810388, -13443.245496901849]


In [51]:
# Changed Lux Amenities before running this model again
# BEDROOMS, BATHROOMS, TOTAL ROOMS, PETS ALLOWED, LUX AMENITIES, TOTAL AMENITIES AND LONGITUDE MODEL
features = ['bedrooms', 'bathrooms', 'total_rooms', 'lux', 'total_amenities', 'pets_allowed', 'longitude']
linear_regression(train, test, features, 'price')

['bedrooms', 'bathrooms', 'total_rooms', 'lux', 'total_amenities', 'pets_allowed', 'longitude'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 712.6756075286497
             
The Root Mean Squared Error of this model is 1097.973909919083
             
The r_2 score of this model is 0.6121301051329555
             
The coefficients of this model are [-311.4214748899762, 1079.143244895947, 767.7217700059725, 298.9345914495018, -113.07197068703937, 270.262453573441, -13668.051325218496]


In [52]:
# BEDROOMS, BATHROOMS, TOTAL ROOMS, PETS ALLOWED, LUX AMENITIES, TOTAL AMENITIES AND LATITUDE MODEL
features = ['bedrooms', 'bathrooms', 'total_rooms', 'lux', 'total_amenities', 'pets_allowed', 'latitude']
linear_regression(train, test, features, 'price')

['bedrooms', 'bathrooms', 'total_rooms', 'lux', 'total_amenities', 'pets_allowed', 'latitude'] Model
             
-----------------------
             
The Mean Absolute Error of this model is 772.3282737411515
             
The Root Mean Squared Error of this model is 1159.8285621680116
             
The r_2 score of this model is 0.567197629729937
             
The coefficients of this model are [-330.3830930841767, 1088.1719458816465, 757.7888527974671, 327.47040839041233, -98.47888867097168, 274.9669783925911, -1791.544290550858]
