# Multiple Linear Regression

<p>Multiple linear regression uses two or more independent variables to predict the value of a dependent variable.</p>
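In the usual notation, a multiple linear regression model with $n$ independent variables $x_1, \dots, x_n$ predicts the dependent variable $y$ as an intercept $b$ plus one coefficient $m_i$ per variable:

$$y = b + m_1x_1 + m_2x_2 + \cdots + m_nx_n$$

Fitting the model means estimating $b$ and every $m_i$ from data. With the 14 features selected below, $n = 14$.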

## Imports

In [12]:
import pandas as pd
from sklearn.model_selection import train_test_split

## Import dataset

In [13]:
# Load the StreetEasy Manhattan rental listings (read_csv already returns a DataFrame)
df = pd.read_csv("manhattan.csv")

print(df.head())

   rental_id   rent  bedrooms  bathrooms  size_sqft  min_to_subway  floor  \
0       1545   2550       0.0          1        480              9    2.0   
1       2472  11500       2.0          2       2000              4    1.0   
2       2919   4500       1.0          1        916              2   51.0   
3       2790   4795       1.0          1        975              3    8.0   
4       3946  17500       2.0          2       4800              3    4.0   

   building_age_yrs  no_fee  has_roofdeck  has_washer_dryer  has_doorman  \
0                17       1             1                 0            0   
1                96       0             0                 0            0   
2                29       0             1                 0            1   
3                31       0             0                 0            1   
4               136       0             0                 0            1   

   has_elevator  has_dishwasher  has_patio  has_gym       neighborhood  \
0     

In [14]:
# Features (independent variables) used to predict rent
x = df[['bedrooms', 'bathrooms', 'size_sqft', 'min_to_subway', 'floor',
        'building_age_yrs', 'no_fee', 'has_roofdeck', 'has_washer_dryer',
        'has_doorman', 'has_elevator', 'has_dishwasher', 'has_patio', 'has_gym']]
# Target (dependent variable)
y = df[['rent']]
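The double brackets in `df[['rent']]` matter. They return a one-column DataFrame, while `df['rent']` returns a Series, and the difference carries through to `.predict()`. A model fit on a DataFrame target returns a 2D array of predictions, and a model fit on a Series returns a 1D array. The sketch below shows the shapes on a small DataFrame with hypothetical values, not the StreetEasy data.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data standing in for the StreetEasy columns
toy = pd.DataFrame({
    "bedrooms": [0, 1, 2, 3],
    "size_sqft": [400, 650, 900, 1200],
    "rent": [2000, 2800, 3700, 4900],
})

x_toy = toy[["bedrooms", "size_sqft"]]  # DataFrame, shape (4, 2)
y_frame = toy[["rent"]]                 # double brackets -> DataFrame, shape (4, 1)
y_series = toy["rent"]                  # single brackets -> Series, shape (4,)

print(y_frame.shape, y_series.shape)    # (4, 1) (4,)

# The target's shape determines the shape of the .predict() output
pred_frame = LinearRegression().fit(x_toy, y_frame).predict(x_toy)
pred_series = LinearRegression().fit(x_toy, y_series).predict(x_toy)
print(pred_frame.shape, pred_series.shape)  # (4, 1) (4,)
```

This is why a single prediction from a model fit on `y = df[['rent']]` has to be read out with two indices.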

## Training Set vs. Test Set
As with most machine learning algorithms, we have to split our dataset into:
<ul>
    <li><strong>Training set:</strong> the data used to fit the model</li>
    <li><strong>Test set:</strong> data held out at the very start of the experiment and used only to evaluate the finished model, which gives an unbiased estimate of how it performs on unseen data</li>
</ul>

In general, putting 80% of your data in the training set and 20% of your data in the test set is a good place to start.
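Conceptually, an 80/20 split shuffles the row indices once and then cuts them at the 80% mark. The sketch below does this with NumPy on a hypothetical 10-row array. The helper name `split_80_20` is made up for illustration. It does not describe how scikit-learn implements `train_test_split` internally.

```python
import numpy as np

def split_80_20(features, target, seed=0):
    """Hypothetical helper: shuffle the rows once, then cut at 80%."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))  # shuffled row indices
    cut = int(len(features) * 0.8)          # number of training rows
    train_idx, test_idx = order[:cut], order[cut:]
    return features[train_idx], features[test_idx], target[train_idx], target[test_idx]

features = np.arange(20).reshape(10, 2)  # 10 rows, 2 synthetic features
target = np.arange(10)                   # synthetic target, one value per row

x_tr, x_te, y_tr, y_te = split_80_20(features, target)
print(x_tr.shape, x_te.shape)  # (8, 2) (2, 2)
```

Every row lands in exactly one of the two sets. The model is fit on the training rows only.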

In [19]:
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.8, test_size=0.2, random_state=6)
print(x_train.shape)
print(x_test.shape)
 
print(y_train.shape)
print(y_test.shape)

(2831, 14)
(708, 14)
(2831, 1)
(708, 1)


Here are the parameters:
<ul>
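    <li><strong>train_size:</strong> the proportion of the dataset to include in the train split (a float between 0.0 and 1.0; an integer is read as an absolute number of rows)</li>
    <li><strong>test_size:</strong> the proportion of the dataset to include in the test split (a float between 0.0 and 1.0; an integer is read as an absolute number of rows)</li>
    <li><strong>random_state:</strong> seeds the shuffling applied before the split; passing a fixed integer makes the split reproducible [optional]</li>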
</ul>
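The sketch below shows these parameters in action. It splits a synthetic array that has the same number of rows as the StreetEasy data above (2831 + 708 = 3539), with made-up feature values. In current scikit-learn versions, a float `test_size` is rounded up and a float `train_size` is rounded down. That is why 20% of 3539 rows becomes 708 test rows. Reusing the same `random_state` reproduces exactly the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 3539                                      # same row count as the data above
x_demo = np.arange(n_rows * 2).reshape(n_rows, 2)  # synthetic features
y_demo = np.arange(n_rows)                         # synthetic target

# Same seed twice -> identical splits
a_train, a_test, _, _ = train_test_split(x_demo, y_demo, train_size=0.8, test_size=0.2, random_state=6)
b_train, b_test, _, _ = train_test_split(x_demo, y_demo, train_size=0.8, test_size=0.2, random_state=6)

print(a_train.shape, a_test.shape)     # (2831, 2) (708, 2)
print(np.array_equal(a_test, b_test))  # True

# An integer test_size is read as an absolute number of rows
c_train, c_test = train_test_split(x_demo, test_size=500, random_state=6)
print(c_train.shape, c_test.shape)     # (3039, 2) (500, 2)
```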

### Import LinearRegression

In [21]:
from sklearn.linear_model import LinearRegression

### Create a LinearRegression model
Create the model, then fit it to the `x_train` and `y_train` data:

In [22]:
mlr = LinearRegression()
# finds the coefficients and the intercept value
mlr.fit(x_train, y_train) 
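After `.fit()`, the model stores the learned values as `coef_` (one coefficient per feature) and `intercept_`. As a sanity check, the sketch below fits a model to synthetic, noise-free data generated from the hypothetical equation y = 3 + 2*x1 - 1*x2 and recovers those numbers. It also shows that with a one-column 2D target, such as `y_train` here, `coef_` has shape (1, n_features).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))  # two synthetic features
y = 3 + 2 * X[:, 0] - 1 * X[:, 1]     # hypothetical noise-free target

model = LinearRegression().fit(X, y)
print(model.coef_)       # approximately [ 2. -1.]
print(model.intercept_)  # approximately 3.0

# A one-column 2D target (like y_train) gives 2D coefficients
model_2d = LinearRegression().fit(X, y.reshape(-1, 1))
print(model_2d.coef_.shape)  # (1, 2)
```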

In [26]:
# .predict() plugs the x values into the multiple linear regression equation,
# using the coefficients and intercept found by .fit(), and returns the predicted y values
y_predicted = mlr.predict(x_test)
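The equation that `.predict()` evaluates is the intercept plus the dot product of each row with the coefficients. The sketch below checks this on synthetic data with made-up values, not the StreetEasy columns. It computes the predictions by hand and compares them to the output of `.predict()`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))  # synthetic features
y = 5 + X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=30)

model = LinearRegression().fit(X, y)

# y_hat = b + m1*x1 + m2*x2 + m3*x3, written as a matrix product
manual = model.intercept_ + X @ model.coef_
print(np.allclose(manual, model.predict(X)))  # True
```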

### Test on a new apartment

In [27]:
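# Features in the same order as the columns of x.
# Sonny doesn't have an elevator, so the 11th item in the list (has_elevator) is a 0.
sonny_apartment = pd.DataFrame(
    [[1, 1, 620, 16, 1, 98, 1, 0, 1, 0, 0, 1, 1, 0]],
    columns=x.columns,  # reuse the training feature names to avoid a feature-name warning
)

predict = mlr.predict(sonny_apartment)

# y was a one-column DataFrame, so predict has shape (1, 1)
print("Predicted rent: $%.2f" % predict[0][0])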

Predicted rent: $2393.58
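A single prediction says nothing about how accurate the model is overall. One common check is the coefficient of determination (R²) on the held-out test set, which `LinearRegression.score()` returns. This is offered as a possible way to evaluate the model, not as part of the original walkthrough. The sketch below uses synthetic data. With this notebook's variables, the equivalent call would be `mlr.score(x_test, y_test)`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(200, 2))  # synthetic features
y = 1 + 4 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.2, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, test_size=0.2, random_state=6
)
model = LinearRegression().fit(X_train, y_train)

test_r2 = model.score(X_test, y_test)              # R^2 on data the model never saw
same_r2 = r2_score(y_test, model.predict(X_test))  # same metric, computed explicitly
print(round(test_r2, 3), np.isclose(test_r2, same_r2))
```

An R² close to 1 means the features explain most of the variation in the target on unseen rows. A value near 0 means the model does little better than predicting the mean.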


