# StreetEasy
This project fits a multiple linear regression model to rental data from [StreetEasy](https://streeteasy.com) to predict rental prices. The data contains a sample of 5,000 rentals in `Manhattan`, `Brooklyn`, and `Queens` that were active on StreetEasy in June 2016.

The data has the following columns:
- `rental_id`: rental ID
- `rent`: price of rent in dollars
- `bedrooms`: number of bedrooms
- `bathrooms`: number of bathrooms
- `size_sqft`: size in square feet
- `min_to_subway`: distance from subway station in minutes
- `floor`: floor number
- `building_age_yrs`: building’s age in years
- `no_fee`: does it have a broker fee? (0 for fee, 1 for no fee)
- `has_roofdeck`: does it have a roof deck? (0 for no, 1 for yes)
- `has_washer_dryer`: does it have washer/dryer in unit? (0/1)
- `has_doorman`: does it have a doorman? (0/1)
- `has_elevator`: does it have an elevator? (0/1)
- `has_dishwasher`: does it have a dishwasher? (0/1)
- `has_patio`: does it have a patio? (0/1)
- `has_gym`: does the building have a gym? (0/1)
- `neighborhood`: neighborhood name (e.g. Greenpoint)
- `borough`: borough name (e.g. Brooklyn)
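
For orientation, a multiple linear regression on these columns models rent as a weighted sum of several predictors. The generic form below is only a sketch: the coefficients $\beta_i$, the generic predictor $x_k$, and the error term $\varepsilon$ are notation introduced here, not something defined by the dataset.

$$
\text{rent} = \beta_0 + \beta_1\,\text{bedrooms} + \beta_2\,\text{bathrooms} + \beta_3\,\text{size\_sqft} + \dots + \beta_k\,x_k + \varepsilon
$$

Fitting the model means estimating each $\beta_i$ from the data, so that every coefficient describes how the predicted rent changes when its column increases by one unit and the others are held fixed.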

## Import Python Modules
First, import the preliminary modules that will be used in this project:

In [1]:
import pandas as pd

## Load Data
Now, let's load the Manhattan data into `rentals_manhattan` and look at the first few rows:

In [2]:
rentals_manhattan = pd.read_csv('https://raw.githubusercontent.com/Codecademy/datasets/master/streeteasy/manhattan.csv')
rentals_manhattan.head()

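| Unnamed: 0 | rental_id | rent | bedrooms | bathrooms | size_sqft | min_to_subway | floor | building_age_yrs | no_fee | has_roofdeck | has_washer_dryer | has_doorman | has_elevator | has_dishwasher | has_patio | has_gym | neighborhood | borough |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1545 | 2550 | 0.0 | 1 | 480 | 9 | 2.0 | 17 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | Upper East Side | Manhattan |
| 1 | 2472 | 11500 | 2.0 | 2 | 2000 | 4 | 1.0 | 96 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Greenwich Village | Manhattan |
| 2 | 2919 | 4500 | 1.0 | 1 | 916 | 2 | 51.0 | 29 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | Midtown | Manhattan |
| 3 | 2790 | 4795 | 1.0 | 1 | 975 | 3 | 8.0 | 31 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | Greenwich Village | Manhattan |
| 4 | 3946 | 17500 | 2.0 | 2 | 4800 | 3 | 4.0 | 136 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | Soho | Manhattan |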
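
Beyond `head()`, a few quick checks help confirm the column types and spot missing values before modelling. The sketch below is not part of the original notebook: it re-types a subset of the five rows shown above into a small DataFrame (named `sample` here purely for illustration) so that it runs without downloading the CSV. The same calls should work on `rentals_manhattan`.

```python
import pandas as pd

# Hand-copied subset of the five rows displayed above (illustrative only)
sample = pd.DataFrame({
    "rental_id": [1545, 2472, 2919, 2790, 3946],
    "rent": [2550, 11500, 4500, 4795, 17500],
    "bedrooms": [0.0, 2.0, 1.0, 1.0, 2.0],
    "bathrooms": [1, 2, 1, 1, 2],
    "size_sqft": [480, 2000, 916, 975, 4800],
    "has_doorman": [0, 0, 1, 1, 1],
    "neighborhood": ["Upper East Side", "Greenwich Village", "Midtown",
                     "Greenwich Village", "Soho"],
    "borough": ["Manhattan"] * 5,
})

# Data type of each column: numeric columns can feed the regression directly
print(sample.dtypes)

# Number of missing values per column
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Summary statistics for the target and one predictor
print(sample[["rent", "size_sqft"]].describe())
```

On the full dataset, `rentals_manhattan.info()` gives a similar overview in a single call.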


## Train and Test Set
Now, let's split the data into training and test sets so the model can be evaluated later on. An 80/20 ratio will be used.

In [6]:
# Function from sklearn to separate the data
from sklearn.model_selection import train_test_split

# Get the independent features to predict the rental price
features = rentals_manhattan[['bedrooms',
                            'bathrooms',
                            'size_sqft',
                            'min_to_subway',
                            'floor',
                            'building_age_yrs',
                            'no_fee',
                            'has_roofdeck',
                            'has_washer_dryer',
                            'has_doorman',
                            'has_elevator',
                            'has_dishwasher',
                            'has_patio',
                            'has_gym']]

rent_price = rentals_manhattan['rent']

# Split features and target into train (80%) and test (20%) sets
features_train, features_test, rent_price_train, rent_price_test = train_test_split(
    features, rent_price, train_size=0.8, test_size=0.2, random_state=6)
print('Shapes of the train and test data')
print(features_train.shape)
print(features_test.shape)
print(rent_price_train.shape)
print(rent_price_test.shape)

Shapes of the train and test data
(2831, 14)
(708, 14)
(2831,)
(708,)
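
The printed shapes follow from the 80/20 ratio applied to the 2831 + 708 = 3539 Manhattan rows. As a sanity check, the sketch below reproduces the sizes on placeholder data. The zero-filled arrays `X` and `y` are stand-ins introduced here, since only the row count matters, and the rounding rule (test share rounded up, train share rounded down) is my understanding of how scikit-learn resolves fractional sizes rather than something stated in the notebook.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

# Total number of rows implied by the shapes printed above
n_rows = 2831 + 708

# Placeholder data with the same number of rows and 14 feature columns
X = np.zeros((n_rows, 14))
y = np.zeros(n_rows)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, test_size=0.2, random_state=6
)

# Sizes predicted by hand: round the test share up, the train share down
expected_test = math.ceil(0.2 * n_rows)
expected_train = math.floor(0.8 * n_rows)

print(expected_train, expected_test)
print(X_train.shape, X_test.shape)
```

Both lines should agree with the shapes reported by the notebook. Fixing `random_state` makes the split reproducible, so the same rows land in the test set on every run.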
