# Advice for applying machine learning




### Evaluating a model

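It is good practice to split the data into a training set, used to fit the model, and a test set, used only to estimate how well the fitted model generalizes to unseen data.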
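A minimal sketch of a train/test holdout, assuming synthetic data (y = 3x plus Gaussian noise) instead of the house-price file. The model is fitted on 80% of the rows, and its mean squared error is then measured on the 20% it never saw during fitting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y = 3x + noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X[:, 0] + rng.normal(0, 1, size=200)

# Hold out 20% of the rows; the model never sees them while fitting
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
train_mse = mean_squared_error(y_train, model.predict(X_train))
test_mse = mean_squared_error(y_test, model.predict(X_test))  # estimate of generalization error
print(f"train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```

The training error alone tends to be optimistic; the test error is the number to trust when judging how the model will behave on new data.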


In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Regression metrics
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error, explained_variance_score
# Classification metric
from sklearn.metrics import accuracy_score
# Pipeline and preprocessing helpers
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
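The four regression metrics imported above can be compared on a tiny hand-made example (the arrays are illustrative, not taken from the dataset). MSE and MAE are errors, so lower is better; R² and explained variance are scores whose best possible value is 1.0.

```python
from sklearn.metrics import (
    explained_variance_score,
    mean_absolute_error,
    mean_squared_error,
    r2_score,
)

# Illustrative values only
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mse = mean_squared_error(y_true, y_pred)        # mean of squared residuals
mae = mean_absolute_error(y_true, y_pred)       # mean of absolute residuals
r2 = r2_score(y_true, y_pred)                   # 1 - SS_res / SS_tot
ev = explained_variance_score(y_true, y_pred)   # 1 - Var(residuals) / Var(y_true)

print(mse, mae, round(r2, 4), round(ev, 4))     # → 0.375 0.5 0.9486 0.9572
```

Explained variance ignores a constant offset in the predictions, which is why it comes out slightly higher than R² here: the residuals have a non-zero mean.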


In [2]:
df_train = pd.read_csv('train_house_prices.csv')
df_train.head(5)

| | POSTED_BY | UNDER_CONSTRUCTION | RERA | BHK_NO. | BHK_OR_RK | SQUARE_FT | READY_TO_MOVE | RESALE | ADDRESS | LONGITUDE | LATITUDE | TARGET(PRICE_IN_LACS) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Owner | 0 | 0 | 2 | BHK | 1300.236407 | 1 | 1 | Ksfc Layout,Bangalore | 12.96991 | 77.59796 | 55.0 |
| 1 | Dealer | 0 | 0 | 2 | BHK | 1275.0 | 1 | 1 | Vishweshwara Nagar,Mysore | 12.274538 | 76.644605 | 51.0 |
| 2 | Owner | 0 | 0 | 2 | BHK | 933.159722 | 1 | 1 | Jigani,Bangalore | 12.778033 | 77.632191 | 43.0 |
| 3 | Owner | 0 | 1 | 2 | BHK | 929.921143 | 1 | 1 | Sector-1 Vaishali,Ghaziabad | 28.6423 | 77.3445 | 62.5 |
| 4 | Dealer | 1 | 0 | 2 | BHK | 999.009247 | 0 | 1 | New Town,Kolkata | 22.5922 | 88.484911 | 60.5 |


In [3]:
df_train.shape, df_train.columns

((29451, 12),
 Index(['POSTED_BY', 'UNDER_CONSTRUCTION', 'RERA', 'BHK_NO.', 'BHK_OR_RK',
        'SQUARE_FT', 'READY_TO_MOVE', 'RESALE', 'ADDRESS', 'LONGITUDE',
        'LATITUDE', 'TARGET(PRICE_IN_LACS)'],
       dtype='object'))

Let's start by using only the variable `SQUARE_FT` to predict the price (`TARGET(PRICE_IN_LACS)`).



In [4]:
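features = "SQUARE_FT"               # a single column name, so X below is 1-D
target = "TARGET(PRICE_IN_LACS)"
X = df_train[features].values        # NumPy array of shape (n_samples,)
y = df_train[target]                 # pandas Series

# Note: scikit-learn estimators expect X of shape (n_samples, n_features),
# so X must be reshaped, e.g. X.reshape(-1, 1), before fitting a model.
X.shape, type(X), y.shape, type(y)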


((29451,), numpy.ndarray, (29451,), pandas.core.series.Series)
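Because `features` is a single string, `X` above is one-dimensional, while scikit-learn estimators such as `LinearRegression` expect a 2-D array of shape `(n_samples, n_features)`. The sketch below uses a hypothetical four-row stand-in for `df_train` (toy numbers chosen to lie exactly on a line, not real listings) to show the error and the two usual fixes: selecting with a list of column names, or calling `reshape(-1, 1)`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for df_train: toy values, not the real dataset
df = pd.DataFrame({
    "SQUARE_FT": [900.0, 1000.0, 1200.0, 1500.0],
    "TARGET(PRICE_IN_LACS)": [40.0, 45.0, 55.0, 70.0],
})
y = df["TARGET(PRICE_IN_LACS)"]

X_1d = df["SQUARE_FT"].values      # shape (4,): a string key selects a Series
X_2d = df[["SQUARE_FT"]].values    # shape (4, 1): a list key selects a DataFrame

# Fitting on the 1-D array is rejected by scikit-learn's input validation
rejected = False
try:
    LinearRegression().fit(X_1d, y)
except ValueError:
    rejected = True

# Either 2-D form works; both are the same array
model = LinearRegression().fit(X_2d, y)
print(rejected, model.coef_[0], model.intercept_)
```

The toy prices rise by 5 lacs per 100 square feet, so the fitted slope is 0.05 and the intercept is -5.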

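The data are already split into training and test sets (this notebook loads the training file, `train_house_prices.csv`).
However, the `train_test_split` function can split any dataset into training and test sets, and calling it twice gives a training, cross-validation and test split, which the procedure below relies on.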
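One way to obtain a 60/20/20 split is to call `train_test_split` twice: first hold out 20% of all rows as the test set, then take 25% of the remaining 80% (that is, 20% of the total) as the cross-validation set. The 1,000-row toy array is an assumption for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (an assumption): 1000 rows, one feature
X = np.arange(1000, dtype=float).reshape(-1, 1)
y = 2 * X[:, 0]

# First split: 20% of all rows become the test set
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# Second split: 25% of the remaining 80% = 20% of all rows for cross-validation
X_train, X_cv, y_train, y_cv = train_test_split(X_rest, y_rest, test_size=0.25, random_state=1)

print(len(X_train), len(X_cv), len(X_test))  # → 600 200 200
```

Fixing `random_state` makes the split reproducible, so every candidate model is compared on exactly the same rows.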



- Step 0: Define several candidate models (for example, models of increasing complexity).
- Step 1: Split the data into training, cross-validation and test sets in a 60/20/20 ratio.
- Step 2: Fit each model to the training set.
- Step 3: Compute each model's error on the cross-validation set.
- Step 4: Pick the model with the lowest cross-validation error.
- Step 5: Evaluate this model on the test set to estimate its generalization error.
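The steps above can be sketched end to end as follows. This is an illustration under assumptions, not the notebook's own code: the data are synthetic (a quadratic signal plus noise), and the candidate models of Step 0 are taken to be polynomial regressions of degree 1 to 8, built with `make_pipeline`, `PolynomialFeatures` and `StandardScaler`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (assumption): a quadratic signal plus Gaussian noise
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(500, 1))
y = 0.5 * X[:, 0] ** 2 - X[:, 0] + rng.normal(0, 0.5, size=500)

# Step 1: 60/20/20 split into training, cross-validation and test sets
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_cv, y_train, y_cv = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# Step 0: candidate models are polynomial regressions of increasing degree
models, cv_errors = {}, {}
for degree in range(1, 9):
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        StandardScaler(),
        LinearRegression(),
    )
    model.fit(X_train, y_train)                                         # Step 2
    models[degree] = model
    cv_errors[degree] = mean_squared_error(y_cv, model.predict(X_cv))   # Step 3

# Step 4: keep the degree with the lowest cross-validation error
best_degree = min(cv_errors, key=cv_errors.get)

# Step 5: the test set is used once, for the chosen model only
test_mse = mean_squared_error(y_test, models[best_degree].predict(X_test))
print(f"best degree = {best_degree}, test MSE = {test_mse:.3f}")
```

The test set plays no part in choosing the degree. If it did, the reported test error would be an optimistically biased estimate of performance on new data, which is exactly what the separate cross-validation set is there to prevent.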