<a href="https://colab.research.google.com/github/Pdugovich/DS-Unit-2-Regression-Classification/blob/master/module2/assignment_regression_classification_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Lambda School Data Science, Unit 2: Predictive Modeling

# Regression & Classification, Module 2

## Assignment

You'll continue to **predict how much it costs to rent an apartment in NYC,** using the dataset from renthop.com.

- [ ] Do train/test split. Use data from April & May 2016 to train. Use data from June 2016 to test.
- [ ] Engineer at least two new features. (See below for explanation & ideas.)
- [ ] Fit a linear regression model with at least two features.
- [ ] Get the model's coefficients and intercept.
- [ ] Get regression metrics RMSE, MAE, and $R^2$, for both the train and test data.
- [ ] What's the best test MAE you can get? Share your score and features used with your cohort on Slack!
- [ ] As always, commit your notebook to your fork of the GitHub repo.


#### [Feature Engineering](https://en.wikipedia.org/wiki/Feature_engineering)

> "Some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used." — Pedro Domingos, ["A Few Useful Things to Know about Machine Learning"](https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf)

> "Coming up with features is difficult, time-consuming, requires expert knowledge. 'Applied machine learning' is basically feature engineering." — Andrew Ng, [Machine Learning and AI via Brain simulations](https://forum.stanford.edu/events/2011/2011slides/plenary/2011plenaryNg.pdf) 

> Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work. 

#### Feature Ideas
- Does the apartment have a description?
- How long is the description?
- How many total perks does each apartment have?
- Are cats _or_ dogs allowed?
- Are cats _and_ dogs allowed?
- Total number of rooms (beds + baths)
- Ratio of beds to baths
- What's the neighborhood, based on address or latitude & longitude?
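
The first two ideas in the list above could be sketched as follows. This is only an illustration: the tiny `listings` frame is made up, and treating whitespace-only text as "no description" is an assumption, not something the assignment specifies.

```python
import pandas as pd

# Hypothetical miniature listing table; the real data comes from renthop-nyc.csv
listings = pd.DataFrame({
    'description': ['Sunny studio near the park', None, '   ', 'Two bed, one bath'],
})

# Treat missing or whitespace-only text as "no description" (an assumption)
clean = listings['description'].fillna('').str.strip()
listings['has_description'] = (clean != '').astype(int)
listings['description_length'] = clean.str.len()

print(listings[['has_description', 'description_length']])
```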

## Stretch Goals
- [ ] If you want more math, skim [_An Introduction to Statistical Learning_](http://faculty.marshall.usc.edu/gareth-james/ISL/ISLR%20Seventh%20Printing.pdf),  Chapter 3.1, Simple Linear Regression, & Chapter 3.2, Multiple Linear Regression
- [ ] If you want more introduction, watch [Brandon Foltz, Statistics 101: Simple Linear Regression](https://www.youtube.com/watch?v=ZkjP5RJLQF4)
(20 minutes, over 1 million views)
- [ ] Add your own stretch goal(s)!

In [0]:
import os, sys
in_colab = 'google.colab' in sys.modules

# If you're in Colab...
if in_colab:
    # Pull files from Github repo
    os.chdir('/content')
    !git init .
    !git remote add origin https://github.com/LambdaSchool/DS-Unit-2-Regression-Classification.git
    !git pull origin master
    
    # Install required python packages
    !pip install -r requirements.txt
    
    # Change into directory for module
    os.chdir('module2')

Reinitialized existing Git repository in /content/.git/
fatal: remote origin already exists.
From https://github.com/LambdaSchool/DS-Unit-2-Regression-Classification
 * branch            master     -> FETCH_HEAD
Already up to date.


In [0]:
# Ignore this Numpy warning when using Plotly Express:
# FutureWarning: Method .ptp is deprecated and will be removed in a future version. Use numpy.ptp instead.
import warnings
warnings.filterwarnings(action='ignore', category=FutureWarning, module='numpy')

In [0]:
import numpy as np
import pandas as pd

# Read New York City apartment rental listing data
df = pd.read_csv('../data/apartments/renthop-nyc.csv')
assert df.shape == (49352, 34)

# Remove the most extreme 1% prices,
# the most extreme .1% latitudes, &
# the most extreme .1% longitudes
df = df[(df['price'] >= np.percentile(df['price'], 0.5)) & 
        (df['price'] <= np.percentile(df['price'], 99.5)) & 
        (df['latitude'] >= np.percentile(df['latitude'], 0.05)) & 
        (df['latitude'] < np.percentile(df['latitude'], 99.95)) &
        (df['longitude'] >= np.percentile(df['longitude'], 0.05)) & 
        (df['longitude'] <= np.percentile(df['longitude'], 99.95))]
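
The same kind of trimming can be written with `Series.quantile` and `Series.between`. Note that `np.percentile` takes percentages (0 to 100) while `quantile` takes fractions (0 to 1). This is a sketch on synthetic prices, not the renthop data, and the names `prices` and `trimmed` are made up here.

```python
import numpy as np
import pandas as pd

# Synthetic prices standing in for df['price']; not the real data
rng = np.random.default_rng(0)
prices = pd.Series(rng.lognormal(mean=8, sigma=0.5, size=10_000), name='price')

# Keep the middle 99%: drop the lowest 0.5% and the highest 0.5%
low, high = prices.quantile([0.005, 0.995])
trimmed = prices[prices.between(low, high)]  # bounds are inclusive by default

print(len(prices), len(trimmed))
```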

### Feature Engineering before split so I don't have to reorganize anything

In [0]:
df.isnull().sum()

bathrooms                  0
bedrooms                   0
created                    0
description             1425
display_address          133
latitude                   0
longitude                  0
price                      0
street_address            10
interest_level             0
elevator                   0
cats_allowed               0
hardwood_floors            0
dogs_allowed               0
doorman                    0
dishwasher                 0
no_fee                     0
laundry_in_building        0
fitness_center             0
pre-war                    0
laundry_in_unit            0
roof_deck                  0
outdoor_space              0
dining_room                0
high_speed_internet        0
balcony                    0
swimming_pool              0
new_construction           0
terrace                    0
exclusive                  0
loft                       0
garden_patio               0
wheelchair_access          0
common_outdoor_space       0
beds_per_bath 

In [0]:
df.head(1)

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space
0,1.5,3,2016-06-24 07:54:24,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,40.7145,-73.9425,3000,792 Metropolitan Avenue,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [0]:
df['beds_per_bath'] = df['bedrooms'] / df['bathrooms']

In [0]:
#I'm seeing some zeroes, which shouldn't be the case
df['beds_per_bath']
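
A studio has 0 bedrooms, so its ratio is 0. A listing with 0 bathrooms gives `inf` (or `NaN` when bedrooms is also 0), and scikit-learn's `LinearRegression` rejects both. One possible way to keep the ratio usable is sketched below. The `rooms` frame is hypothetical and filling with 0 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Hypothetical rows covering the awkward cases: studios and zero-bath listings
rooms = pd.DataFrame({
    'bedrooms':  [3, 0, 2, 0],
    'bathrooms': [1.5, 1.0, 0.0, 0.0],
})

ratio = rooms['bedrooms'] / rooms['bathrooms']  # 2.0, 0.0, inf, NaN

# Turn infinities into NaN, then fill with a sentinel (0 is an arbitrary choice)
rooms['beds_per_bath'] = ratio.replace([np.inf, -np.inf], np.nan).fillna(0)

print(rooms)
```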

In [0]:
df['bathrooms'].value_counts()

1.0     39152
2.0      7619
3.0       680
1.5       645
0.0       304
2.5       256
4.0        93
3.5        55
4.5         8
5.0         4
10.0        1
Name: bathrooms, dtype: int64

In [0]:
#Alright, I'm really interested in these apartments with 0 bedrooms.
df['bedrooms'].value_counts()

1    15651
2    14569
0     9317
3     7188
4     1825
5      221
6       43
8        2
7        1
Name: bedrooms, dtype: int64

In [0]:
zero_bed_apartments = df[df['bedrooms'] == 0]

In [0]:
# Ah yes, Studio apartments. That makes sense.
zero_bed_apartments.head()

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,beds_per_bath
10,1.0,0,2016-04-14 01:10:30,New to the market! Spacious studio located in ...,York Avenue,40.7769,-73.9467,1950,1661 York Avenue,low,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0
15,1.0,0,2016-04-18 02:36:00,Stunning full renovated studio unit. High cei...,East 34th Street,40.7439,-73.9743,2350,340 East 34th Street,medium,1,0,0,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0
30,1.0,0,2016-04-21 02:17:28,Enjoy the Upper West Side life-style! This ap...,250 West 88th Street,40.7897,-73.976,2750,250 West 88th Street,medium,1,1,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0.0
37,1.0,0,2016-04-17 02:16:42,Located in one of Manhattan's most desirable a...,West 58th Street,40.7649,-73.9763,1980,57 West 58th Street,high,1,1,0,1,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0
39,1.0,0,2016-06-29 04:08:35,Prime Location!! This Luxury Chelsea building ...,W 34 St.,40.753,-73.9959,2396,360 W 34 St.,medium,1,1,1,1,1,1,1,0,1,1,1,1,1,0,1,0,0,0,0,0,0,0,0,0,0.0


In [0]:
zero_bath_apartments = df[df['bathrooms'] == 0]

In [0]:
# Can't wrap my head around these guys.
zero_bath_apartments

In [0]:
zero_apartments = zero_bath_apartments[zero_bath_apartments['bedrooms'] ==0]

In [0]:
zero_apartments

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,beds_per_bath
240,0.0,0,2016-06-12 14:12:49,,46th Street,40.6503,-74.0135,2195,250 46th Street,low,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
300,0.0,0,2016-04-07 02:49:51,"Enjoy the great restaurants, clubs, boutiques ...",408 East 92nd Street,40.7805,-73.9464,2775,408 East 92nd Street,low,1,0,0,0,1,0,1,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,
481,0.0,0,2016-04-05 02:49:40,Amidst the vibrant energy of the West Village ...,10 Downing Street,40.7295,-74.0029,3695,10 Downing Street,low,1,0,0,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,
679,0.0,0,2016-04-22 02:50:22,SpaciousA StudioA Apartment in Modern New Deve...,1465 Fifth Avenue,40.8021,-73.9450,2024,1465 Fifth Avenue,low,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,
1379,0.0,0,2016-04-16 04:35:39,West Village Studio. Amenities: 24hr ...,Christopher Street,40.7328,-74.0089,2900,165 Christopher Street,low,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
1601,0.0,0,2016-04-06 02:57:35,This doorman building in the heart of Murray H...,141 East 33rd Street,40.7456,-73.9797,3350,141 East 33rd Street,low,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,
2416,0.0,0,2016-04-30 02:59:33,New Renovation! Upper Level LoftListed on the ...,666 Greenwich Street,40.7321,-74.0080,5325,666 Greenwich Street,low,1,0,0,0,1,0,1,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,
2982,0.0,0,2016-04-29 02:56:55,"Featuring high-end amenities, from a stunning ...",315 West 33rd Street,40.7518,-73.9944,3175,315 West 33rd Street,low,1,0,0,0,1,0,1,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,
3091,0.0,0,2016-04-27 02:49:10,We Are Now Offering a 1 Month OP OR 1 Month Fr...,West 14th Street,40.7310,-73.9786,2700,West 14th Street,low,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,
3353,0.0,0,2016-04-06 08:02:26,,46th Street,40.6503,-74.0135,4800,250 46th Street,low,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,


In [0]:
# My mind is blown.
zero_apartments['price'].mean()

2965.3112582781455

In [0]:
# Let's make a feature flagging apartments with no bedrooms and no bathrooms
df['zero_apartments'] =(df['bathrooms'] == 0) & (df['bedrooms']==0)

In [0]:
#Replacing true/false with 1/0
df['zero_apartments'] = df['zero_apartments'].replace({True: 1, False:0})

In [94]:
df['zero_apartments'].value_counts()

0    48666
1      151
Name: zero_apartments, dtype: int64

In [0]:
df['cats_and_dogs'] = (df['cats_allowed'] == 1) & (df['dogs_allowed'] == 1)
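
The line above stores booleans. A sketch of both pet features ("and" plus "or") as 0/1 integers, using `.astype(int)` rather than `replace`, might look like this. The `pets` frame is made up for illustration.

```python
import pandas as pd

# Hypothetical frame with every combination of the two pet flags
pets = pd.DataFrame({'cats_allowed': [1, 1, 0, 0],
                     'dogs_allowed': [1, 0, 1, 0]})

cats = pets['cats_allowed'] == 1
dogs = pets['dogs_allowed'] == 1

pets['cats_and_dogs'] = (cats & dogs).astype(int)  # both allowed
pets['cats_or_dogs'] = (cats | dogs).astype(int)   # at least one allowed

print(pets)
```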

### A feature for total amenities



In [0]:
#Feature for total amenities
amenities = ['elevator','cats_allowed','hardwood_floors','dogs_allowed',
             'doorman','dishwasher','no_fee','laundry_in_building',
             'fitness_center','pre-war','laundry_in_unit','roof_deck',
             'outdoor_space','dining_room','high_speed_internet','balcony',
             'swimming_pool','new_construction','terrace','exclusive',
             'loft','garden_patio','wheelchair_access','common_outdoor_space'] 

df['sum_amenities'] =df[amenities].sum(axis=1)



In [0]:
df['sum_amenities']

### Train/Test splitting

In [95]:
df.head()

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,beds_per_bath,sum_amenities,zero_apartments
0,1.5,3,2016-06-24 07:54:24,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,40.7145,-73.9425,3000,792 Metropolitan Avenue,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2.0,0,0
1,1.0,2,2016-06-12 12:19:27,,Columbus Avenue,40.7947,-73.9667,5465,808 Columbus Avenue,low,1,1,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2.0,5,0
2,1.0,1,2016-04-17 03:26:41,"Top Top West Village location, beautiful Pre-w...",W 13 Street,40.7388,-74.0018,2850,241 W 13 Street,high,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1.0,3,0
3,1.0,1,2016-04-18 02:22:02,Building Amenities - Garage - Garden - fitness...,East 49th Street,40.7539,-73.9677,3275,333 East 49th Street,low,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1.0,2,0
4,1.0,4,2016-04-28 01:32:41,Beautifully renovated 3 bedroom flex 4 bedroom...,West 143rd Street,40.8241,-73.9493,3350,500 West 143rd Street,low,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4.0,1,0


In [77]:
df.dtypes

bathrooms                      float64
bedrooms                         int64
created                 datetime64[ns]
description                     object
display_address                 object
latitude                       float64
longitude                      float64
price                            int64
street_address                  object
interest_level                  object
elevator                         int64
cats_allowed                     int64
hardwood_floors                  int64
dogs_allowed                     int64
doorman                          int64
dishwasher                       int64
no_fee                           int64
laundry_in_building              int64
fitness_center                   int64
pre-war                          int64
laundry_in_unit                  int64
roof_deck                        int64
outdoor_space                    int64
dining_room                      int64
high_speed_internet              int64
balcony                  

In [0]:
# What a great thing pd.to_datetime is
df['created'] = pd.to_datetime(df['created'])

In [0]:
# Creating two dataframes separated by months
train = df[df['created'].dt.month < 6]
test = df[df['created'].dt.month == 6]
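
Filtering on `dt.month` works because the data only covers April through June 2016. A sketch that spells out the date bounds instead, so it would still behave if other months or years appeared, is shown below. The `demo` frame and the `_demo` names are hypothetical.

```python
import pandas as pd

# Hypothetical listings; only the 'created' column matters here
demo = pd.DataFrame({
    'created': pd.to_datetime(['2016-04-01 10:00', '2016-05-31 23:59',
                               '2016-06-01 00:00', '2016-06-29 12:00']),
    'price': [2000, 2500, 3000, 3500],
})

train_start = pd.Timestamp('2016-04-01')
test_start = pd.Timestamp('2016-06-01')
test_end = pd.Timestamp('2016-07-01')

# Train on April and May, test on June
train_demo = demo[(demo['created'] >= train_start) & (demo['created'] < test_start)]
test_demo = demo[(demo['created'] >= test_start) & (demo['created'] < test_end)]

print(len(train_demo), len(test_demo))
```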

In [0]:
train.head()

In [79]:
test.head()

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,beds_per_bath,sum_amenities,zero_apartments
0,1.5,3,2016-06-24 07:54:24,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,40.7145,-73.9425,3000,792 Metropolitan Avenue,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2.0,0,0
1,1.0,2,2016-06-12 12:19:27,,Columbus Avenue,40.7947,-73.9667,5465,808 Columbus Avenue,low,1,1,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2.0,5,0
11,1.0,1,2016-06-03 03:21:22,Check out this one bedroom apartment in a grea...,W. 173rd Street,40.8448,-73.9396,1675,644 W. 173rd Street,low,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1.0,0,0
14,1.0,1,2016-06-01 03:11:01,Spacious 1-Bedroom to fit King-sized bed comfo...,East 56th St..,40.7584,-73.9648,3050,315 East 56th St..,low,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1.0,3,0
24,2.0,4,2016-06-07 04:39:56,SPRAWLING 2 BEDROOM FOUND! ENJOY THE LUXURY OF...,W 18 St.,40.7391,-73.9936,7400,30 W 18 St.,medium,1,1,1,1,1,1,0,0,1,0,0,0,1,0,1,1,0,0,1,0,0,0,0,0,2.0,11,0


### Fit a linear regression model with two features

In [0]:
from sklearn.metrics import mean_absolute_error
from sklearn.linear_model import LinearRegression

In [100]:
# Grabbing a baseline

# Name of the column we want to predict (the y variable)
target = 'price'

# Target values for the train and test data
y_train = train[target]
y_test = test[target]

# Use the mean of the training prices as the baseline.
# This is the guess we would make with no features at all.
print('Baseline Error to gauge progress')
baseline_guess = y_train.mean()

# Figuring out the training error

# One baseline prediction per training observation
y_pred_train = [baseline_guess] * len(y_train)
mae1 = mean_absolute_error(y_train,y_pred_train)
print('Train Error (April and May 2016): ' + str(mae1))


# The guess is the same (the train mean) for every observation
# in the test set, so build a list of that guess with one entry
# per test observation.
y_pred_test = [baseline_guess] * len(y_test)
mae2 = mean_absolute_error(y_test, y_pred_test)
print('Test Error (June 2016): ' + str(mae2))

Baseline Error to gauge progress
Train Error (April and May 2016): 1201.8811133682555
Test Error (June 2016): 1197.7088871089013
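
The same mean baseline can also be expressed with scikit-learn's `DummyRegressor`, which keeps the fit/predict pattern used for the real models. This is a sketch on made-up rents. The arrays and the `baseline` name are assumptions for illustration.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error

# Made-up rents, not the renthop data
y_train_demo = np.array([2000, 3000, 4000])
y_test_demo = np.array([2500, 3500])

# DummyRegressor ignores the features, so a column of zeros is enough
baseline = DummyRegressor(strategy='mean')
baseline.fit(np.zeros((len(y_train_demo), 1)), y_train_demo)

# Every prediction is the training mean
pred = baseline.predict(np.zeros((len(y_test_demo), 1)))
mae = mean_absolute_error(y_test_demo, pred)
print(mae)
```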


In [101]:
# I don't need to reassign these variables, but I will
target = 'price'
y_train = train[target]

features = ['sum_amenities','zero_apartments']

X_train = train[features]
X_test = test[features]

model1 = LinearRegression()

model1.fit(X_train,y_train)
y_pred_train = model1.predict(X_train)

mae_train = mean_absolute_error(y_train,y_pred_train)
print('Train Error: ' + str(mae_train))

y_pred_test = model1.predict(X_test)
mae_test = mean_absolute_error(y_test, y_pred_test)
print('Test Error: ' + str(mae_test))

Train Error: 1139.3561662777124
Test Error: 1126.9291816567852


Well, that did just a hair better than baseline: test MAE fell from about 1198 to about 1127, an improvement of roughly 71.

### Model's coefficients and intercept

In [104]:
model1.intercept_

2848.714461538271

In [105]:
# so each additional amenity is associated with about 154.63 more in rent,
# and an apartment with no beds or baths is associated with about 441.10 less
model1.coef_

array([ 154.62760735, -441.09704809])
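
`coef_` is a bare array in the same order as the feature columns, so it can be easier to read when paired with the column names. The sketch below uses synthetic data generated from a known rule so the recovered numbers can be checked. `demo_model`, `X`, and `y` are hypothetical names.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data from a known rule: price = 1000 + 150*amenities - 400*zero flag
X = pd.DataFrame({'sum_amenities':   [0, 1, 2, 3, 4, 5],
                  'zero_apartments': [0, 1, 0, 0, 1, 0]})
y = 1000 + 150 * X['sum_amenities'] - 400 * X['zero_apartments']

demo_model = LinearRegression().fit(X, y)

# Label each coefficient with the feature it belongs to
coefs = pd.Series(demo_model.coef_, index=X.columns)
print(coefs)
print('intercept', demo_model.intercept_)
```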

### Regression metrics RMSE, MAE, R^2 for both train and test data

In [0]:
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

In [110]:
#Copying code from above and adding in regression metrics

target = 'price'
y_train = train[target]
y_test = test[target]

features = ['sum_amenities','zero_apartments']

X_train = train[features]
X_test = test[features]

model1 = LinearRegression()

print('Training Calculations')
#Calculating for train data
model1.fit(X_train,y_train)
y_pred_train = model1.predict(X_train)

#MAE
mae_train = mean_absolute_error(y_train,y_pred_train)
print('Train MAE: ' + str(mae_train))
#R2
r2_train = r2_score(y_train, y_pred_train)
print('Train R2: ' + str(r2_train))
#MSE
mse_train = mean_squared_error(y_train, y_pred_train)
print('Train MSE: ' + str(mse_train))
#RMSE (square root of MSE)
rmse_train = mse_train**.5
print('Train RMSE ' + str(rmse_train))

print('\nTesting Calculations')
# Calculating for test data
y_pred_test = model1.predict(X_test)

#MAE
mae_test = mean_absolute_error(y_test,y_pred_test)
print('Test MAE: ' + str(mae_test))

#R2
r2_test = r2_score(y_test, y_pred_test)
print('Test R2: ' + str(r2_test))

#MSE
mse_test = mean_squared_error(y_test, y_pred_test)
print('Test MSE: ' + str(mse_test))

#RMSE (square root of MSE)
rmse_test = mse_test**.5
print('Test RMSE ' + str(rmse_test))

Training Calculations
Train MAE: 1139.3561662777124
Train R2: 0.09155116202362279
Train MSE: 2820759.27642716
Train RMSE 1679.5116184257731

Testing Calculations
Test MAE: 1126.9291816567852
Test R2: 0.09639548065475612
Test MSE: 2808422.0638527097
Test RMSE 1675.834736438146
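
Since the same four metrics are computed twice, they could be wrapped in a small helper. `regression_report` is a hypothetical function name, and the numbers passed to it below are made up.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_report(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 as a dict."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        'MAE': mean_absolute_error(y_true, y_pred),
        'MSE': mse,
        'RMSE': np.sqrt(mse),  # RMSE is the square root of MSE
        'R2': r2_score(y_true, y_pred),
    }

# Made-up actual and predicted rents
report = regression_report([3000, 2500, 4000], [3100, 2400, 3800])
for name, value in report.items():
    print(name, value)
```

Calling it once with the train arrays and once with the test arrays would reproduce the two blocks of output above.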


In [0]:
#selecting all numeric columns for MAE testing
df_for_mae = df.select_dtypes(include='number')

In [0]:
df_for_mae = df_for_mae.drop(columns='beds_per_bath')

In [126]:
df_for_mae.head()

Unnamed: 0,bathrooms,bedrooms,latitude,longitude,price,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,beds_per_bath,sum_amenities,zero_apartments
0,1.5,3,40.7145,-73.9425,3000,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2.0,0,0
1,1.0,2,40.7947,-73.9667,5465,1,1,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2.0,5,0
2,1.0,1,40.7388,-74.0018,2850,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1.0,3,0
3,1.0,1,40.7539,-73.9677,3275,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1.0,2,0
4,1.0,4,40.8241,-73.9493,3350,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4.0,1,0


In [0]:
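# MAE for a model fit on each single column
# Note: each model is fit and scored on the full df (train and test together),
# so these are in-sample errors, not test errors. 'price' is still among the
# columns, so the target predicts itself and its error is ~0.

mae_by_col = []
for feature in df_for_mae.columns:
  model = LinearRegression()
  features = [feature]
  target = 'price'
  X_train = df[features]
  y_train = df[target]
  model.fit(X_train,y_train)
  y_pred = model.predict(df[features])
  error = mean_absolute_error(y_train,y_pred)
  mae_by_col.append({str(feature): error})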


In [131]:
mae_by_col

[{'bathrooms': 889.7631877413652},
 {'bedrooms': 975.6496767374764},
 {'latitude': 1199.04628931195},
 {'longitude': 1144.2648232666427},
 {'price': 3.9229259918074074e-13},
 {'elevator': 1173.064845040008},
 {'cats_allowed': 1198.04915820049},
 {'hardwood_floors': 1198.7261894084704},
 {'dogs_allowed': 1197.2191761169793},
 {'doorman': 1156.7361465139638},
 {'dishwasher': 1171.6623542947125},
 {'no_fee': 1192.776853526042},
 {'laundry_in_building': 1201.329312358066},
 {'fitness_center': 1169.6353645549684},
 {'pre-war': 1200.6825561891678},
 {'laundry_in_unit': 1160.9803893884323},
 {'roof_deck': 1192.8262688082948},
 {'outdoor_space': 1187.8398683194848},
 {'dining_room': 1168.3221049083145},
 {'high_speed_internet': 1195.6752136997882},
 {'balcony': 1187.515336153092},
 {'swimming_pool': 1193.1072880315883},
 {'new_construction': 1200.249343284789},
 {'terrace': 1188.733040985156},
 {'exclusive': 1201.3631129934672},
 {'loft': 1201.8790142812356},
 {'garden_patio': 1194.47669660679
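
A variant of the single-column search that leaves the target out of the candidates and scores on held-out rows could be sketched like this. The data is synthetic and the `_demo` names are hypothetical. With the notebook's data, `train` and `test` would take their place.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in for the train/test frames; column names mirror the notebook
rng = np.random.default_rng(42)
n = 400
data = pd.DataFrame({
    'bedrooms': rng.integers(0, 5, n),
    'doorman': rng.integers(0, 2, n),
})
data['price'] = (2000 + 800 * data['bedrooms'] + 100 * data['doorman']
                 + rng.normal(0, 50, n))
train_demo, test_demo = data.iloc[:300], data.iloc[300:]

target = 'price'
# Never use the target itself as a feature
candidates = [c for c in data.columns if c != target]

test_mae = {}
for col in candidates:
    m = LinearRegression().fit(train_demo[[col]], train_demo[target])
    test_mae[col] = mean_absolute_error(test_demo[target],
                                        m.predict(test_demo[[col]]))

# Lowest test MAE first
results = pd.Series(test_mae).sort_values()
print(results)
```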

In [0]:
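# MAE for a model fit on every pair of columns
# Note: the nested loop visits each pair in both orders, so every pair is
# fit twice. As above, models are fit and scored on the full df, and any
# pair that includes 'price' has an error of ~0.

mae_by_cols = []
for feature1 in df_for_mae.columns:
  for feature2 in df_for_mae.columns:
    if feature1 != feature2:
      model = LinearRegression()
      features = [feature1,feature2]
      target = 'price'
      X_train = df[features]
      y_train = df[target]
      model.fit(X_train,y_train)
      y_pred = model.predict(df[features])
      error = mean_absolute_error(y_train,y_pred)
      mae_by_cols.append({str(feature1) + ' and ' + str(feature2): error})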

In [174]:
mae_by_cols

[{'bathrooms and bedrooms': 821.9653796977909},
 {'bathrooms and latitude': 884.2877630183418},
 {'bathrooms and longitude': 821.8723107728202},
 {'bathrooms and price': 2.9315400644031846e-14},
 {'bathrooms and elevator': 868.1499237847396},
 {'bathrooms and cats_allowed': 886.8973673113898},
 {'bathrooms and hardwood_floors': 887.5908761524163},
 {'bathrooms and dogs_allowed': 886.2544423294763},
 {'bathrooms and doorman': 852.8241123269296},
 {'bathrooms and dishwasher': 874.0246561134173},
 {'bathrooms and no_fee': 884.5024072459859},
 {'bathrooms and laundry_in_building': 889.9824471756513},
 {'bathrooms and fitness_center': 865.9176198182182},
 {'bathrooms and pre-war': 889.0423513917731},
 {'bathrooms and laundry_in_unit': 876.1498877617796},
 {'bathrooms and roof_deck': 885.1700421642943},
 {'bathrooms and outdoor_space': 886.1912281309172},
 {'bathrooms and dining_room': 884.6410179198015},
 {'bathrooms and high_speed_internet': 888.1316231473022},
 {'bathrooms and balcony': 8
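
For the two-feature search, `itertools.combinations` visits each unordered pair once, which halves the number of fits compared with a nested loop over all ordered pairs. The sketch below again uses synthetic data and hypothetical `_demo` names, excludes the target, and scores on held-out rows.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in for the notebook's data
rng = np.random.default_rng(7)
n = 400
data = pd.DataFrame({
    'bedrooms': rng.integers(0, 5, n),
    'bathrooms': rng.integers(1, 4, n),
    'doorman': rng.integers(0, 2, n),
})
data['price'] = (1500 + 700 * data['bedrooms'] + 900 * data['bathrooms']
                 + 50 * data['doorman'] + rng.normal(0, 50, n))
train_demo, test_demo = data.iloc[:300], data.iloc[300:]

target = 'price'
candidates = [c for c in data.columns if c != target]

pair_mae = {}
for pair in combinations(candidates, 2):  # each unordered pair exactly once
    cols = list(pair)
    m = LinearRegression().fit(train_demo[cols], train_demo[target])
    pair_mae[pair] = mean_absolute_error(test_demo[target],
                                         m.predict(test_demo[cols]))

best_pair = min(pair_mae, key=pair_mae.get)
print(best_pair, pair_mae[best_pair])
```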