
Lambda School Data Science, Unit 2: Predictive Modeling

# Regression & Classification, Module 2

## Assignment

You'll continue to **predict how much it costs to rent an apartment in NYC,** using the dataset from renthop.com.

- [ ] Do train/test split. Use data from April & May 2016 to train. Use data from June 2016 to test.
- [ ] Engineer at least two new features. (See below for explanation & ideas.)
- [ ] Fit a linear regression model with at least two features.
- [ ] Get the model's coefficients and intercept.
- [ ] Get regression metrics RMSE, MAE, and $R^2$, for both the train and test data.
- [ ] What's the best test MAE you can get? Share your score and features used with your cohort on Slack!
- [ ] As always, commit your notebook to your fork of the GitHub repo.


#### [Feature Engineering](https://en.wikipedia.org/wiki/Feature_engineering)

> "Some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used." — Pedro Domingos, ["A Few Useful Things to Know about Machine Learning"](https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf)

> "Coming up with features is difficult, time-consuming, requires expert knowledge. 'Applied machine learning' is basically feature engineering." — Andrew Ng, [Machine Learning and AI via Brain simulations](https://forum.stanford.edu/events/2011/2011slides/plenary/2011plenaryNg.pdf) 

> Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work. 

#### Feature Ideas
- Does the apartment have a description?
- How long is the description?
- How many total perks does each apartment have?
- Are cats _or_ dogs allowed?
- Are cats _and_ dogs allowed?
- Total number of rooms (beds + baths)
- Ratio of beds to baths
- What's the neighborhood, based on address or latitude & longitude?
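A minimal sketch of several of the ideas above, run on a small hypothetical DataFrame named `toy` so it does not overwrite the notebook's `df`. The column names match the renthop data, but the rows and the short `perk_cols` subset are made up for illustration. On the real data you would list every amenity column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy listings using a few renthop column names
toy = pd.DataFrame({
    'description': ['Sunny loft', None, 'Large 2BR near park'],
    'bedrooms': [1, 2, 0],
    'bathrooms': [1.0, 1.0, 1.0],
    'cats_allowed': [1, 0, 1],
    'dogs_allowed': [1, 0, 0],
    'elevator': [1, 0, 1],
    'doorman': [0, 0, 1],
})
perk_cols = ['cats_allowed', 'dogs_allowed', 'elevator', 'doorman']  # assumed subset

# Does the apartment have a description, and how long is it?
desc = toy['description'].fillna('').str.strip()
toy['has_description'] = (desc != '').astype(int)
toy['description_length'] = desc.str.len()

# Total perks, and the two pet-policy combinations
toy['perk_count'] = toy[perk_cols].sum(axis=1)
toy['cats_or_dogs'] = (toy['cats_allowed'] | toy['dogs_allowed']).astype(int)
toy['cats_and_dogs'] = (toy['cats_allowed'] & toy['dogs_allowed']).astype(int)

# Total rooms, and beds per bath (zero bathrooms become NaN instead of inf)
toy['rooms'] = toy['bedrooms'] + toy['bathrooms']
toy['beds_per_bath'] = toy['bedrooms'] / toy['bathrooms'].replace(0, np.nan)

print(toy.drop(columns='description'))
```

Listings with 0 bathrooms get `NaN` for `beds_per_bath`. Fill or drop those rows before fitting `LinearRegression`, which does not accept missing values.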

## Stretch Goals
- [ ] If you want more math, skim [_An Introduction to Statistical Learning_](http://faculty.marshall.usc.edu/gareth-james/ISL/ISLR%20Seventh%20Printing.pdf), Chapter 3.1, Simple Linear Regression, & Chapter 3.2, Multiple Linear Regression
- [ ] If you want more introduction, watch [Brandon Foltz, Statistics 101: Simple Linear Regression](https://www.youtube.com/watch?v=ZkjP5RJLQF4) (20 minutes, over 1 million views)
- [ ] Do the [Plotly Dash](https://dash.plot.ly/) Tutorial, Parts 1 & 2.
- [ ] Add your own stretch goal(s)!

In [1]:
# If you're in Colab...
import os, sys
in_colab = 'google.colab' in sys.modules

if in_colab:
    # Install required python packages:
    # pandas-profiling, version >= 2.0
    # plotly, version >= 4.0
    !pip install --upgrade pandas-profiling plotly
    
    # Pull files from Github repo
    os.chdir('/content')
    !git init .
    !git remote add origin https://github.com/LambdaSchool/DS-Unit-2-Regression-Classification.git
    !git pull origin master
    
    # Change into directory for module
    os.chdir('module2')


In [0]:
# Ignore this Numpy warning when using Plotly Express:
# FutureWarning: Method .ptp is deprecated and will be removed in a future version. Use numpy.ptp instead.
import warnings
warnings.filterwarnings(action='ignore', category=FutureWarning, module='numpy')

In [0]:
import numpy as np
import pandas as pd

# Read New York City apartment rental listing data
df = pd.read_csv('../data/renthop-nyc.csv')
assert df.shape == (49352, 34)

# Remove the most extreme 1% prices,
# the most extreme .1% latitudes, &
# the most extreme .1% longitudes
df = df[(df['price'] >= np.percentile(df['price'], 0.5)) & 
        (df['price'] <= np.percentile(df['price'], 99.5)) & 
        (df['latitude'] >= np.percentile(df['latitude'], 0.05)) & 
        (df['latitude'] < np.percentile(df['latitude'], 99.95)) &
        (df['longitude'] >= np.percentile(df['longitude'], 0.05)) & 
        (df['longitude'] <= np.percentile(df['longitude'], 99.95))]
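The same trimming can be written with a small helper. This sketch uses a hypothetical helper name, `percentile_mask`, and hypothetical toy data. Like the cell above, it builds every mask from the same untrimmed frame before combining them, so each percentile is computed on the full data. One difference: `Series.between` is inclusive on both ends, whereas the cell above uses a strict `<` for the upper latitude bound.

```python
import numpy as np
import pandas as pd

def percentile_mask(frame, col, lower_pct, upper_pct):
    """True where frame[col] lies within the [lower_pct, upper_pct] percentiles (inclusive)."""
    lo, hi = np.percentile(frame[col], [lower_pct, upper_pct])
    return frame[col].between(lo, hi)

# Hypothetical toy data standing in for the renthop listings
toy = pd.DataFrame({
    'price': np.arange(1, 1001),
    'longitude': np.linspace(-74.0, -73.0, 1000),
})

# Compute every mask on the untrimmed frame, then combine them once
mask = (percentile_mask(toy, 'price', 0.5, 99.5) &
        percentile_mask(toy, 'longitude', 0.05, 99.95))
trimmed = toy[mask]
print(len(toy), '->', len(trimmed))
```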

In [4]:
df.head()

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space
0,1.5,3,2016-06-24 07:54:24,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,40.7145,-73.9425,3000,792 Metropolitan Avenue,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,1.0,2,2016-06-12 12:19:27,,Columbus Avenue,40.7947,-73.9667,5465,808 Columbus Avenue,low,1,1,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,1.0,1,2016-04-17 03:26:41,"Top Top West Village location, beautiful Pre-w...",W 13 Street,40.7388,-74.0018,2850,241 W 13 Street,high,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,1.0,1,2016-04-18 02:22:02,Building Amenities - Garage - Garden - fitness...,East 49th Street,40.7539,-73.9677,3275,333 East 49th Street,low,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,1.0,4,2016-04-28 01:32:41,Beautifully renovated 3 bedroom flex 4 bedroom...,West 143rd Street,40.8241,-73.9493,3350,500 West 143rd Street,low,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0


 Do train/test split. Use data from April & May 2016 to train. Use data from June 2016 to test.
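The cells below split on month names. An alternative sketch, assuming `created` is parsed with `pd.to_datetime`, compares timestamps against explicit cutoffs. This keeps working if the data ever spans more than one year, whereas month names alone would merge June 2015 with June 2016. The toy rows are hypothetical.

```python
import pandas as pd

# Hypothetical toy listings
toy = pd.DataFrame({
    'created': ['2016-04-17 03:26:41', '2016-05-02 10:00:00',
                '2016-06-24 07:54:24', '2016-05-31 23:59:59'],
    'price': [2850, 3100, 3000, 2700],
})
toy['created'] = pd.to_datetime(toy['created'])

train_start = pd.Timestamp('2016-04-01')  # April 1, 2016
test_start = pd.Timestamp('2016-06-01')   # June 1, 2016
test_end = pd.Timestamp('2016-07-01')     # exclusive upper bound for June

# .copy() so later feature columns can be added without SettingWithCopyWarning
toy_train = toy[(toy['created'] >= train_start) & (toy['created'] < test_start)].copy()
toy_test = toy[(toy['created'] >= test_start) & (toy['created'] < test_end)].copy()
print(len(toy_train), len(toy_test))
```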

In [5]:
df['Month']=pd.to_datetime(df['created']).dt.strftime('%B')
df.head()

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,Month
0,1.5,3,2016-06-24 07:54:24,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,40.7145,-73.9425,3000,792 Metropolitan Avenue,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,June
1,1.0,2,2016-06-12 12:19:27,,Columbus Avenue,40.7947,-73.9667,5465,808 Columbus Avenue,low,1,1,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,June
2,1.0,1,2016-04-17 03:26:41,"Top Top West Village location, beautiful Pre-w...",W 13 Street,40.7388,-74.0018,2850,241 W 13 Street,high,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,April
3,1.0,1,2016-04-18 02:22:02,Building Amenities - Garage - Garden - fitness...,East 49th Street,40.7539,-73.9677,3275,333 East 49th Street,low,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,April
4,1.0,4,2016-04-28 01:32:41,Beautifully renovated 3 bedroom flex 4 bedroom...,West 143rd Street,40.8241,-73.9493,3350,500 West 143rd Street,low,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,April


In [10]:
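# Split by listing month: April & May 2016 for training, June 2016 for testing.
# .copy() makes independent DataFrames, so adding feature columns later
# does not trigger pandas' SettingWithCopyWarning.
df_train = df[df['Month'].isin(['April', 'May'])].copy()
df_test = df[df['Month'] == 'June'].copy()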

df_train.head()

Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,Month
2,1.0,1,2016-04-17 03:26:41,"Top Top West Village location, beautiful Pre-w...",W 13 Street,40.7388,-74.0018,2850,241 W 13 Street,high,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,April
3,1.0,1,2016-04-18 02:22:02,Building Amenities - Garage - Garden - fitness...,East 49th Street,40.7539,-73.9677,3275,333 East 49th Street,low,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,April
4,1.0,4,2016-04-28 01:32:41,Beautifully renovated 3 bedroom flex 4 bedroom...,West 143rd Street,40.8241,-73.9493,3350,500 West 143rd Street,low,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,April
5,2.0,4,2016-04-19 04:24:47,,West 18th Street,40.7429,-74.0028,7995,350 West 18th Street,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,April
6,1.0,2,2016-04-27 03:19:56,Stunning unit with a great location and lots o...,West 107th Street,40.8012,-73.966,3600,210 West 107th Street,low,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,April


Engineer at least two new features. (See below for explanation & ideas.)
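The neighborhood idea from the feature list can be approximated without parsing addresses by clustering coordinates. This sketch uses one possible technique, k-means via scikit-learn's `KMeans`, not a method the assignment prescribes. The coordinates and `n_clusters=2` are hypothetical. The clusters are fit on training rows only and then applied to test rows, so June data does not shape the feature.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical coordinates scattered around two made-up centers
rng = np.random.default_rng(0)
centers = np.array([[40.72, -73.99], [40.80, -73.95]])
coords = np.vstack([c + rng.normal(scale=0.005, size=(50, 2)) for c in centers])
toy_train = pd.DataFrame(coords, columns=['latitude', 'longitude'])

# Fit clusters on training coordinates only
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
toy_train['neighborhood'] = kmeans.fit_predict(toy_train[['latitude', 'longitude']])

# Assign test listings to the nearest training cluster
toy_test = pd.DataFrame({'latitude': [40.721, 40.799],
                         'longitude': [-73.991, -73.951]})
toy_test['neighborhood'] = kmeans.predict(toy_test[['latitude', 'longitude']])
print(toy_test)
```

Cluster labels are categories, not quantities. Before using them in `LinearRegression`, one-hot encode them (for example with `pd.get_dummies(toy_train['neighborhood'], prefix='hood')`) rather than passing the integer label as a number.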

In [12]:
df.columns

Index(['bathrooms', 'bedrooms', 'created', 'description', 'display_address',
       'latitude', 'longitude', 'price', 'street_address', 'interest_level',
       'elevator', 'cats_allowed', 'hardwood_floors', 'dogs_allowed',
       'doorman', 'dishwasher', 'no_fee', 'laundry_in_building',
       'fitness_center', 'pre-war', 'laundry_in_unit', 'roof_deck',
       'outdoor_space', 'dining_room', 'high_speed_internet', 'balcony',
       'swimming_pool', 'new_construction', 'terrace', 'exclusive', 'loft',
       'garden_patio', 'wheelchair_access', 'common_outdoor_space', 'Month'],
      dtype='object')

In [13]:
df_train['rooms']=df_train['bathrooms']+df_train['bedrooms']
df_test['rooms']=df_test['bathrooms']+df_test['bedrooms']
df_train.head()



Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,Month,rooms
2,1.0,1,2016-04-17 03:26:41,"Top Top West Village location, beautiful Pre-w...",W 13 Street,40.7388,-74.0018,2850,241 W 13 Street,high,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,April,2.0
3,1.0,1,2016-04-18 02:22:02,Building Amenities - Garage - Garden - fitness...,East 49th Street,40.7539,-73.9677,3275,333 East 49th Street,low,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,April,2.0
4,1.0,4,2016-04-28 01:32:41,Beautifully renovated 3 bedroom flex 4 bedroom...,West 143rd Street,40.8241,-73.9493,3350,500 West 143rd Street,low,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,April,5.0
5,2.0,4,2016-04-19 04:24:47,,West 18th Street,40.7429,-74.0028,7995,350 West 18th Street,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,April,6.0
6,1.0,2,2016-04-27 03:19:56,Stunning unit with a great location and lots o...,West 107th Street,40.8012,-73.966,3600,210 West 107th Street,low,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,April,3.0


In [15]:



# Count the amenity flags for each apartment.
# Note: 'doorman' and 'swimming_pool' are not included in this sum.
perk_cols = ['elevator', 'cats_allowed', 'hardwood_floors', 'dogs_allowed',
             'dishwasher', 'no_fee', 'laundry_in_building', 'fitness_center',
             'pre-war', 'laundry_in_unit', 'roof_deck', 'outdoor_space',
             'dining_room', 'high_speed_internet', 'balcony', 'new_construction',
             'terrace', 'exclusive', 'loft', 'garden_patio', 'wheelchair_access',
             'common_outdoor_space']
df_train['features'] = df_train[perk_cols].sum(axis=1)

df_train.head()



Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,Month,rooms,features
2,1.0,1,2016-04-17 03:26:41,"Top Top West Village location, beautiful Pre-w...",W 13 Street,40.7388,-74.0018,2850,241 W 13 Street,high,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,April,2.0,3
3,1.0,1,2016-04-18 02:22:02,Building Amenities - Garage - Garden - fitness...,East 49th Street,40.7539,-73.9677,3275,333 East 49th Street,low,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,April,2.0,2
4,1.0,4,2016-04-28 01:32:41,Beautifully renovated 3 bedroom flex 4 bedroom...,West 143rd Street,40.8241,-73.9493,3350,500 West 143rd Street,low,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,April,5.0,1
5,2.0,4,2016-04-19 04:24:47,,West 18th Street,40.7429,-74.0028,7995,350 West 18th Street,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,April,6.0,0
6,1.0,2,2016-04-27 03:19:56,Stunning unit with a great location and lots o...,West 107th Street,40.8012,-73.966,3600,210 West 107th Street,low,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,April,3.0,3


In [18]:
# Same amenity count for the test set ('doorman' and 'swimming_pool' are not included)
perk_cols = ['elevator', 'cats_allowed', 'hardwood_floors', 'dogs_allowed',
             'dishwasher', 'no_fee', 'laundry_in_building', 'fitness_center',
             'pre-war', 'laundry_in_unit', 'roof_deck', 'outdoor_space',
             'dining_room', 'high_speed_internet', 'balcony', 'new_construction',
             'terrace', 'exclusive', 'loft', 'garden_patio', 'wheelchair_access',
             'common_outdoor_space']
df_test['features'] = df_test[perk_cols].sum(axis=1)

df_test.head()



Unnamed: 0,bathrooms,bedrooms,created,description,display_address,latitude,longitude,price,street_address,interest_level,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,Month,rooms,features
0,1.5,3,2016-06-24 07:54:24,A Brand New 3 Bedroom 1.5 bath ApartmentEnjoy ...,Metropolitan Avenue,40.7145,-73.9425,3000,792 Metropolitan Avenue,medium,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,June,4.5,0
1,1.0,2,2016-06-12 12:19:27,,Columbus Avenue,40.7947,-73.9667,5465,808 Columbus Avenue,low,1,1,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,June,3.0,4
11,1.0,1,2016-06-03 03:21:22,Check out this one bedroom apartment in a grea...,W. 173rd Street,40.8448,-73.9396,1675,644 W. 173rd Street,low,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,June,2.0,0
14,1.0,1,2016-06-01 03:11:01,Spacious 1-Bedroom to fit King-sized bed comfo...,East 56th St..,40.7584,-73.9648,3050,315 East 56th St..,low,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,June,2.0,3
24,2.0,4,2016-06-07 04:39:56,SPRAWLING 2 BEDROOM FOUND! ENJOY THE LUXURY OF...,W 18 St.,40.7391,-73.9936,7400,30 W 18 St.,medium,1,1,1,1,1,1,0,0,1,0,0,0,1,0,1,1,0,0,1,0,0,0,0,0,June,6.0,10


In [62]:
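# numeric_only=True skips text columns such as 'description' and 'Month' (required on pandas 2.x).
# The output below includes features2 because this cell was run after features2 was created.
df_train.corr(numeric_only=True)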

Unnamed: 0,bathrooms,bedrooms,latitude,longitude,price,elevator,cats_allowed,hardwood_floors,dogs_allowed,doorman,dishwasher,no_fee,laundry_in_building,fitness_center,pre-war,laundry_in_unit,roof_deck,outdoor_space,dining_room,high_speed_internet,balcony,swimming_pool,new_construction,terrace,exclusive,loft,garden_patio,wheelchair_access,common_outdoor_space,rooms,features,features2
bathrooms,1.0,0.526102,0.012872,-0.019719,0.684137,0.128704,0.0221,0.095403,0.02507,0.154125,0.173795,0.129592,-0.014814,0.1479,-0.014654,0.211802,0.103708,0.135735,0.229534,0.090885,0.128851,0.113007,0.065157,0.133543,-0.000402,0.007647,0.090583,0.067371,-0.009281,0.74634,0.228665,0.25446
bedrooms,0.526102,1.0,0.00465,0.055544,0.5365,-0.030263,-0.008355,0.096108,-0.006896,-0.046827,0.15662,0.16256,0.000825,0.015655,-0.002732,0.153591,0.044011,0.118787,0.190639,0.058695,0.098536,0.03438,-0.002055,0.099822,-0.01556,-0.10975,0.073061,0.011869,-0.005031,0.958661,0.133949,0.093512
latitude,0.012872,0.00465,1.0,0.329175,-0.039129,-0.016379,-0.035711,0.019477,-0.038169,-0.042532,-0.02418,-0.018046,-0.055495,-0.107993,0.02826,-0.047701,-0.062222,-0.084935,0.016496,-0.033745,0.019896,0.028281,-0.054351,0.004899,-0.05391,-0.016551,-0.002173,-0.072748,-0.124857,0.007946,-0.075983,-0.0587
longitude,-0.019719,0.055544,0.329175,1.0,-0.250091,-0.190341,-0.064892,-0.106368,-0.077785,-0.274412,-0.162148,-0.087616,-0.058856,-0.256357,0.002466,-0.123731,-0.158661,-0.107538,-0.01746,-0.128477,-0.03596,-0.071829,-0.107124,-0.052616,0.048941,-0.058037,-0.029241,-0.064151,-0.115252,0.036873,-0.237701,-0.27416
price,0.684137,0.5365,-0.039129,-0.250091,1.0,0.204558,0.052167,0.105506,0.060905,0.272624,0.227775,0.135182,-0.020344,0.226138,-0.029749,0.279104,0.123536,0.136653,0.239696,0.092171,0.130938,0.132301,0.071246,0.142655,-0.010897,0.000185,0.092367,0.07306,0.006269,0.648791,0.284938,0.361714
elevator,0.128704,-0.030263,-0.016379,-0.190341,0.204558,1.0,0.039135,0.267743,0.038985,0.617265,0.342497,0.230979,0.14428,0.431833,-0.096178,0.128841,0.330937,0.212922,0.197731,0.276124,0.172863,0.183826,0.187847,0.14218,0.024406,0.052694,0.088001,0.158554,0.123025,0.019379,0.556349,0.732767
cats_allowed,0.0221,-0.008355,-0.035711,-0.064892,0.052167,0.039135,1.0,-0.165084,0.936082,0.098092,-0.039279,-0.022044,0.106905,0.135813,0.045873,0.000729,0.034979,0.081362,-0.020895,0.083517,0.021379,0.011551,0.056047,0.011132,0.032739,-0.036948,0.010058,0.043787,0.112179,0.000855,0.35171,0.059079
hardwood_floors,0.095403,0.096108,0.019477,-0.106368,0.105506,0.267743,-0.165084,1.0,-0.173728,0.191773,0.634526,0.347272,-0.147549,0.160174,0.011397,0.353154,0.272214,0.18387,0.316261,0.241037,0.175504,0.167263,0.185692,0.179247,-0.192835,0.116145,0.16297,0.12385,-0.125314,0.107139,0.514831,0.485081
dogs_allowed,0.02507,-0.006896,-0.038169,-0.077785,0.060905,0.038985,0.936082,-0.173728,1.0,0.104055,-0.036654,-0.010719,0.09141,0.139972,0.051973,0.011688,0.041319,0.081291,-0.017096,0.097098,0.024257,0.010426,0.069262,0.009273,0.032616,-0.041726,0.013057,0.053235,0.114579,0.002992,0.35787,0.065603
doorman,0.154125,-0.046827,-0.042532,-0.274412,0.272624,0.617265,0.098092,0.191773,0.104055,1.0,0.299848,0.260018,0.086744,0.605068,-0.054007,0.154492,0.390499,0.21372,0.188729,0.317876,0.163822,0.262984,0.221016,0.134242,-0.072733,0.006645,0.079066,0.17169,0.13654,0.014922,0.530984,0.767941
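To pick features from this wide table, it can help to look at just the `price` column, ranked by absolute correlation. A sketch on hypothetical toy data. `numeric_only=True` (available in pandas 1.5 and later) skips text columns such as `Month`, which pandas 2.x otherwise refuses to correlate.

```python
import pandas as pd

# Hypothetical toy data with one non-numeric column
toy = pd.DataFrame({
    'price': [2000, 2500, 3000, 3500, 4000],
    'bedrooms': [0, 1, 1, 2, 3],
    'no_fee': [1, 0, 1, 0, 1],
    'Month': ['April', 'April', 'May', 'May', 'May'],
})

# Correlation of every numeric column with price, strongest (in either direction) first
price_corr = (toy.corr(numeric_only=True)['price']
              .drop('price')
              .sort_values(key=abs, ascending=False))
print(price_corr)
```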


In [42]:

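# features2: count of six amenities, each correlated with price above 0.2
# in the correlation table (elevator, dishwasher, fitness_center, doorman,
# laundry_in_unit, dining_room)
df_train['features2'] = df_train[['elevator', 'dishwasher', 'fitness_center',
                                  'doorman', 'laundry_in_unit', 'dining_room']].sum(axis=1)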



In [43]:

# Same six-amenity count for the test set
df_test['features2'] = df_test[['elevator', 'dishwasher', 'fitness_center',
                                'doorman', 'laundry_in_unit', 'dining_room']].sum(axis=1)



 Fit a linear regression model with at least two features.

In [0]:
#1 Import the appropriate estimator class from Scikit-Learn
from sklearn.linear_model import LinearRegression

#2 Instantiate this class
model = LinearRegression()

#3 Arrange X features matrix & y target vector
features = ['rooms','features']
target='price'

X=df_train[features]
y=df_train[target]

#4 Fit the model
model.fit(X,y)

#5 Apply the Model

y_pred = model.predict(X)

 Get the model's coefficients and intercept.

In [20]:
# Model equation: price = m*rooms + n*features + b
print('y= mx + nz + b')
print(f'y= {model.coef_[0]}*x + {model.coef_[1]}*z +{model.intercept_}')
print(f'price={model.coef_[0]}*rooms+{model.coef_[1]}*features+{model.intercept_}')

y= mx + nz + b
y= 770.5353565041589*x + 99.29291144627857*z +1051.1104978097264
price=770.5353565041589*rooms+99.29291144627857*features+1051.1104978097264
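Indexing `coef_` by position is easy to get wrong once the feature list changes. This sketch labels each coefficient with its feature name. It uses a hypothetical toy dataset whose target is an exact linear function, so the recovered numbers are known in advance.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: price = 1000 + 500*rooms + 100*perks exactly
toy = pd.DataFrame({'rooms': [1, 2, 3, 4], 'perks': [0, 1, 0, 1]})
toy['price'] = 1000 + 500 * toy['rooms'] + 100 * toy['perks']

feature_cols = ['rooms', 'perks']
reg = LinearRegression().fit(toy[feature_cols], toy['price'])

# Pair each coefficient with the column it belongs to
coefs = pd.Series(reg.coef_, index=feature_cols)
print(coefs)
print('intercept:', reg.intercept_)
```

Read against the output above: with `features` held fixed, each additional room adds about 770 dollars to the predicted rent. With `rooms` held fixed, each additional perk adds about 99 dollars.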


 Get regression metrics RMSE, MAE, and $R^2$, for both the train and test data.
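For reference, let $y_i$ be the true prices, $\hat{y}_i$ the predictions, and $\bar{y}$ the mean true price over $n$ rows. The metrics are:

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|, \qquad \text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}, \qquad R^2 = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2}$$

The sketch below computes them by hand with NumPy on made-up numbers and checks the results against scikit-learn's functions.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true prices and predictions
y_true = np.array([3000.0, 5465.0, 2850.0, 3275.0])
y_hat = np.array([3200.0, 5000.0, 2900.0, 3000.0])

errors = y_true - y_hat
mae_manual = np.mean(np.abs(errors))
rmse_manual = np.sqrt(np.mean(errors ** 2))
r2_manual = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(mae_manual, rmse_manual, r2_manual)
```

Squaring weights large misses more heavily, so RMSE is always at least as large as MAE. A wide gap between the two (for example, roughly 1307 vs 866 on the training data) suggests that a minority of listings have large errors.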

In [34]:

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

features = ['rooms', 'features']
target = 'price'

# Make predictions on the training data with the fitted model
# (equivalent to coef_[0]*rooms + coef_[1]*features + intercept_)
X_train = df_train[features]
y_train = df_train[target]
y_pred = model.predict(X_train)

# Print regression metrics
mse = mean_squared_error(y_train, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_train, y_pred)
r2 = r2_score(y_train, y_pred)
print('Mean Squared Error:', mse)
print('Root Mean Squared Error:', rmse)
print('Mean Absolute Error:', mae)
print('R^2:', r2)

Mean Squared Error: 1708182.9438065388
Root Mean Squared Error: 1306.9747295975308
Mean Absolute Error: 866.2869475401131
R^2: 0.4498655651617107


In [32]:
features = ['rooms', 'features']
target = 'price'

# Make predictions on the test data with the same fitted model
X_test = df_test[features]
y_test = df_test[target]
y_pred = model.predict(X_test)

# Print regression metrics
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print('Mean Squared Error:', mse)
print('Root Mean Squared Error:', rmse)
print('Mean Absolute Error:', mae)
print('R^2:', r2)

Mean Squared Error: 1704645.5768457658
Root Mean Squared Error: 1305.6207630264487
Mean Absolute Error: 878.655528439358
R^2: 0.45153349030215495
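To judge whether a test MAE of about 879 is good, compare it with a baseline that ignores every feature and always predicts the mean training price. This is a minimal sketch: the helper name `baseline_mae` is hypothetical, and the toy numbers only check the arithmetic, so no baseline result for the real data is claimed here. scikit-learn's `DummyRegressor` is an alternative.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

def baseline_mae(y_train, y_test):
    """MAE of always predicting the mean of the training target."""
    guess = np.full(len(y_test), np.mean(y_train))
    return mean_absolute_error(y_test, guess)

# Toy check: train mean is 2000, so errors are 500, 500, 0
print(baseline_mae(np.array([1000, 3000]), np.array([1500, 2500, 2000])))

# In this notebook: baseline_mae(df_train['price'], df_test['price'])
```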


In [48]:
#1 Import the estimator class and metrics from Scikit-Learn
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

#2 Instantiate this class
model = LinearRegression()

#3 Arrange X features matrix & y target vector
features = ['bathrooms', 'bedrooms', 'features2']
target = 'price'
X_train, y_train = df_train[features], df_train[target]

#4 Fit the model
model.fit(X_train, y_train)

#5 Apply the model to both splits and print regression metrics
for name, data in [('Train', df_train), ('Test', df_test)]:
    y = data[target]
    y_pred = model.predict(data[features])
    mse = mean_squared_error(y, y_pred)
    print(f'---------------{name}------------------')
    print('Mean Squared Error:', mse)
    print('Root Mean Squared Error:', np.sqrt(mse))
    print('Mean Absolute Error:', mean_absolute_error(y, y_pred))
    print('R^2:', r2_score(y, y_pred))




---------------Train------------------
Mean Squared Error: 1388090.2302284767
Root Mean Squared Error: 1178.172411079328
Mean Absolute Error: 772.5654688124806
R^2: 0.5529540690708478
---------------Test------------------
Mean Squared Error: 1366961.954908933
Root Mean Squared Error: 1169.1714822509712
Mean Absolute Error: 781.2345590185723
R^2: 0.5601825608312476


In [50]:
#1 Import the estimator class and metrics from Scikit-Learn
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

#2 Instantiate this class
model = LinearRegression()

#3 Arrange X features matrix & y target vector
features = ['bathrooms', 'bedrooms']
target = 'price'
X_train, y_train = df_train[features], df_train[target]

#4 Fit the model
model.fit(X_train, y_train)

#5 Apply the model to both splits and print regression metrics
for name, data in [('Train', df_train), ('Test', df_test)]:
    y = data[target]
    y_pred = model.predict(data[features])
    mse = mean_squared_error(y, y_pred)
    print(f'---------------{name}------------------')
    print('Mean Squared Error:', mse)
    print('Root Mean Squared Error:', np.sqrt(mse))
    print('Mean Absolute Error:', mean_absolute_error(y, y_pred))
    print('R^2:', r2_score(y, y_pred))


---------------Train------------------
Mean Squared Error: 1517879.6665142523
Root Mean Squared Error: 1232.0225917223484
Mean Absolute Error: 818.5310213271714
R^2: 0.5111543084316607
---------------Test------------------
Mean Squared Error: 1487715.3104108905
Root Mean Squared Error: 1219.719357233823
Mean Absolute Error: 825.8987822403527
R^2: 0.5213303957090345


In [51]:
#1 Import the estimator class and metrics from Scikit-Learn
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

#2 Instantiate this class
model = LinearRegression()

#3 Arrange X features matrix & y target vector
features = ['bathrooms', 'bedrooms', 'features2', 'latitude', 'longitude']
target = 'price'
X_train, y_train = df_train[features], df_train[target]

#4 Fit the model
model.fit(X_train, y_train)

#5 Apply the model to both splits and print regression metrics
for name, data in [('Train', df_train), ('Test', df_test)]:
    y = data[target]
    y_pred = model.predict(data[features])
    mse = mean_squared_error(y, y_pred)
    print(f'---------------{name}------------------')
    print('Mean Squared Error:', mse)
    print('Root Mean Squared Error:', np.sqrt(mse))
    print('Mean Absolute Error:', mean_absolute_error(y, y_pred))
    print('R^2:', r2_score(y, y_pred))


---------------Train------------------
Mean Squared Error: 1253805.4721682507
Root Mean Squared Error: 1119.7345543334147
Mean Absolute Error: 711.9387984279833
R^2: 0.5962015852404188
---------------Test------------------
Mean Squared Error: 1228886.6813627714
Root Mean Squared Error: 1108.5516142078236
Mean Absolute Error: 719.4938715419461
R^2: 0.6046080205198044
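The model cells repeat the same fit-and-score steps for different feature lists. This sketch runs those steps in a loop, once per feature set, and sorts the results by test MAE. The helper name `compare_feature_sets` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

def compare_feature_sets(train, test, feature_sets, target='price'):
    """Fit one LinearRegression per feature list; return train/test MAE sorted by test MAE."""
    rows = []
    for name, cols in feature_sets.items():
        reg = LinearRegression().fit(train[cols], train[target])
        rows.append({
            'feature_set': name,
            'train_mae': mean_absolute_error(train[target], reg.predict(train[cols])),
            'test_mae': mean_absolute_error(test[target], reg.predict(test[cols])),
        })
    return pd.DataFrame(rows).sort_values('test_mae').reset_index(drop=True)

# Hypothetical toy data: price depends exactly on both a and b
rng = np.random.default_rng(1)
toy = pd.DataFrame({'a': rng.uniform(0, 5, 40), 'b': rng.uniform(0, 5, 40)})
toy['price'] = 1000 + 300 * toy['a'] + 200 * toy['b']

results = compare_feature_sets(toy.iloc[:30], toy.iloc[30:],
                               {'a only': ['a'], 'a + b': ['a', 'b']})
print(results)

# In this notebook, something like:
# compare_feature_sets(df_train, df_test, {
#     'beds + baths': ['bathrooms', 'bedrooms'],
#     'beds + baths + features2 + lat/long':
#         ['bathrooms', 'bedrooms', 'features2', 'latitude', 'longitude'],
# })
```

Among the model cells, the lowest test MAE (about 719) came from `bathrooms`, `bedrooms`, `features2`, `latitude`, and `longitude`.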
