## Problem Statement

![](https://datahack-prod.s3.ap-south-1.amazonaws.com/__sized__/contest_cover/jantahack_-thumbnail-1200x1200-90.jpg)

You are working with the government to transform your city into a smart city. The vision is to convert it into a digital and intelligent city to improve the efficiency of services for the citizens. One of the problems faced by the government is traffic. You are a data scientist working to manage the traffic of the city better and to provide input on infrastructure planning for the future.

The government wants to implement a robust traffic system for the city by being prepared for traffic peaks. They want to understand the traffic patterns of the four junctions of the city. Traffic patterns on holidays, as well as on various other occasions during the year, differ from normal working days. This is important to take into account for your forecasting. 

**Your task**

To predict traffic patterns in each of these four junctions for the next 4 months.

The sensors on each of these junctions were collecting data at different times, hence you will see traffic data from different time periods. To add to the complexity, some of the junctions have provided limited or sparse data requiring thoughtfulness when creating future projections. Depending upon the historical data of 20 months, the government is looking to you to deliver accurate traffic projections for the coming four months. Your algorithm will become the foundation of a larger transformation to make your city smart and intelligent. 

**Data Dictionary**


| **Variable**  | **Description** |
| --- | --- |
| ID | Unique ID |
| DateTime |  Hourly Datetime Variable |
| Junction | Junction Type |
| Vehicles | Number of Vehicles (Target) |

**Evaluation Metric**

The evaluation metric for this competition is Root Mean Squared Error (RMSE).

In [1]:
## Import necessary libraries

import numpy as np ## Numpy Library ( will use to convert data frame to array or creating array etc...).
import pandas as pd ## Pandas Library (will use to load data,create data frame...etc).
import os ## For connecting to machine to get path for reading/writing files.
from sklearn.model_selection import train_test_split ## For splitting data into train and validation.
from sklearn.preprocessing import LabelEncoder ## For label encoding(converting categorical values to label).
from sklearn.tree import DecisionTreeRegressor ## For Decision tree model.
from sklearn.metrics import mean_squared_error ## For MSE metric.
from math import sqrt ## For applying square root value.
from sklearn.ensemble import RandomForestRegressor ## For RandomForest model.
from sklearn.ensemble import AdaBoostRegressor ## For ada boost model.
from sklearn.ensemble import GradientBoostingRegressor ## For GBR model.
from sklearn.model_selection import GridSearchCV ##For Grid search(cross validation).
from xgboost.sklearn import XGBRegressor ## For XGBR model.
from sklearn.svm import SVR ## For SVR model.
from sklearn.neighbors import KNeighborsRegressor ## For KNN model.
from keras.models import Sequential ## For squential model.
from keras.layers import Dense ## For fully connnected layer.
from sklearn.linear_model import LinearRegression ## For Linear regression model.

In [2]:
## Get current working directoory.
os.getcwd()

'D:\\Python\\Pratice'

In [2]:
## Set working directory.
os.chdir('D:/DataScience/Pratice/IOT_OptimizationProblem/')
os.getcwd()

'D:\\DataScience\\Pratice\\IOT_OptimizationProblem'

In [9]:
## Read data sets.
train = pd.read_csv('train.csv',header='infer',sep=',')
test = pd.read_csv('test.csv',header='infer',sep=',')

In [10]:
## Get first record of train data.
train.head(1)

Unnamed: 0,DateTime,Junction,Vehicles,ID
0,01-11-2015 00:00,1,15,20151101001


In [11]:
## Get last record of train data.
train.tail(1)

Unnamed: 0,DateTime,Junction,Vehicles,ID
48119,30-06-2017 23:00,4,12,20170630234


In [12]:
## Get first record of test data.
test.head(1)

Unnamed: 0,DateTime,Junction,ID
0,01-07-2017 00:00,1,20170701001


In [13]:
## Get last record of test data.
test.tail(1)

Unnamed: 0,DateTime,Junction,ID
11807,31-10-2017 23:00,4,20171031234


In [70]:
## Get summary statistics of train data.
train.describe(include='all')

Unnamed: 0,DateTime,Junction,Vehicles,ID
count,48120,48120.0,48120.0,48120.0
unique,14592,,,
top,31-03-2017 13:00,,,
freq,4,,,
mean,,2.180549,22.791334,20163300000.0
std,,0.966955,20.750063,5944854.0
min,,1.0,1.0,20151100000.0
25%,,1.0,9.0,20160420000.0
50%,,2.0,15.0,20160930000.0
75%,,3.0,29.0,20170230000.0


In [71]:
## Get summary statistics of test data.
test.describe(include='all')

Unnamed: 0,DateTime,Junction,ID
count,11808,11808.0,11808.0
unique,2952,,
top,16-07-2017 00:00,,
freq,4,,
mean,,2.5,20170870000.0
std,,1.118081,112466.5
min,,1.0,20170700000.0
25%,,1.75,20170730000.0
50%,,2.5,20170830000.0
75%,,3.25,20171000000.0


In [72]:
## Get columns data types for train data.
train.dtypes

DateTime    object
Junction     int64
Vehicles     int64
ID           int64
dtype: object

In [73]:
## Get columns data types for test data.
test.dtypes

DateTime    object
Junction     int64
ID           int64
dtype: object

In [74]:
## Get column names for train data.
train.columns

Index(['DateTime', 'Junction', 'Vehicles', 'ID'], dtype='object')

In [75]:
## Get column names for test data.
test.columns

Index(['DateTime', 'Junction', 'ID'], dtype='object')

In [76]:
## Get index range for train data.
train.index

RangeIndex(start=0, stop=48120, step=1)

In [77]:
## Get index range for test data.
test.index

RangeIndex(start=0, stop=11808, step=1)

In [6]:
## Cheeck NA values for train data.
train.isna().sum()

DateTime    0
Junction    0
Vehicles    0
ID          0
dtype: int64

In [7]:
## Cheeck NA values for test data.
test.isna().sum()

DateTime    0
Junction    0
ID          0
dtype: int64

In [8]:
## This method will return number of levels,null values,unique values,data types.

def Observations(df):
    return(pd.DataFrame({'dtypes' : df.dtypes,
                         'levels' : [df[x].unique() for x in df.columns],
                         'null_values' : df.isnull().sum(),
                         'Unique Values': df.nunique()
                        }))

In [81]:
## Get columns data types,numer of levels,null values,unique values for each column of train data.
Observations(train)

Unnamed: 0,dtypes,levels,null_values,Unique Values
DateTime,object,"[01-11-2015 00:00, 01-11-2015 01:00, 01-11-201...",0,14592
Junction,int64,"[1, 2, 3, 4]",0,4
Vehicles,int64,"[15, 13, 10, 7, 9, 6, 8, 11, 12, 17, 16, 20, 1...",0,141
ID,int64,"[20151101001, 20151101011, 20151101021, 201511...",0,48120


In [82]:
## Get columns data types,numer of levels,null values,unique values for each column of test data.
Observations(test)

Unnamed: 0,dtypes,levels,null_values,Unique Values
DateTime,object,"[01-07-2017 00:00, 01-07-2017 01:00, 01-07-201...",0,2952
Junction,int64,"[1, 2, 3, 4]",0,4
ID,int64,"[20170701001, 20170701011, 20170701021, 201707...",0,11808


In [14]:
## Convert DateTime object type into datetime format for train data.
train['date_time'] = pd.to_datetime(train['DateTime'])

In [84]:
## Convert DateTime object type into datetime format for test data.
test['date_time'] = pd.to_datetime(test['DateTime'])

In [85]:
## Extract date and time from date_time for train data.
train['date'] = [d.date() for d in train['date_time']]
train['time'] = [d.time() for d in train['date_time']]

In [86]:
## Extract date and time from date_time for test data.
test['date'] = [d.date() for d in test['date_time']]
test['time'] = [d.time() for d in test['date_time']]

In [87]:
## Drop DateTime, date_time columns from train data beacuse we have extracted features from them
## so those columns are not required.
train.drop(['DateTime', 'date_time'], axis=1, inplace=True)

In [88]:
## Drop DateTime, date_time columns from test data beacuse we have extracted features from them
## so those columns are not required.
test.drop(['DateTime', 'date_time'], axis=1, inplace=True)

In [89]:
## Extract day,month,year features from date column of train data.
train['year'] = train['date'].apply(lambda x: x.year)
train['month'] = train['date'].apply(lambda x: x.month)
train['day'] = train['date'].apply(lambda x: x.day)

In [90]:
## Extract day,month,year features from date column of test data.
test['year'] = test['date'].apply(lambda x: x.year)
test['month'] = test['date'].apply(lambda x: x.month)
test['day'] = test['date'].apply(lambda x: x.day)

In [91]:
## Extract hour feature from time column of train data.
train['hour'] = train['time'].apply(lambda x: x.hour)

In [92]:
## Extract hour feature from time column of test data.
test['hour'] = test['time'].apply(lambda x: x.hour)

In [94]:
## Drop date,time column from train data(we have extracted features from those two columns so not required to keep).
train.drop(['date', 'time'], axis=1, inplace=True)

In [95]:
## Drop date,time column from test data(we have extracted features from those two columns so not required to keep).
test.drop(['date', 'time'], axis=1, inplace=True)

In [96]:
## Display first 2 records of train data.
train.head(2)

Unnamed: 0,Junction,Vehicles,ID,year,month,day,hour
0,1,15,20151101001,2015,1,11,0
1,1,13,20151101011,2015,1,11,1


In [97]:
## Display first 2 records of test data.
test.head(2)

Unnamed: 0,Junction,ID,year,month,day,hour
0,1,20170701001,2017,1,7,0
1,1,20170701011,2017,1,7,1


In [98]:
## Get columns data types,numer of levels,null values,unique values for each column of train data.
Observations(train)

Unnamed: 0,dtypes,levels,null_values,Unique Values
Junction,int64,"[1, 2, 3, 4]",0,4
Vehicles,int64,"[15, 13, 10, 7, 9, 6, 8, 11, 12, 17, 16, 20, 1...",0,141
ID,int64,"[20151101001, 20151101011, 20151101021, 201511...",0,48120
year,int64,"[2015, 2016, 2017]",0,3
month,int64,"[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]",0,12
day,int64,"[11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 2...",0,31
hour,int64,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...",0,24


In [99]:
## Get columns data types,numer of levels,null values,unique values for each column of test data.
Observations(test)

Unnamed: 0,dtypes,levels,null_values,Unique Values
Junction,int64,"[1, 2, 3, 4]",0,4
ID,int64,"[20170701001, 20170701011, 20170701021, 201707...",0,11808
year,int64,[2017],0,1
month,int64,"[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]",0,12
day,int64,"[7, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23...",0,23
hour,int64,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...",0,24


In [100]:
## Set ID column as index to train data.
train.set_index('ID',inplace=True)

In [101]:
## Set ID column as index to test data.
test.set_index('ID',inplace=True)

In [102]:
## Display first 2 records of train data after setting index.
train.head(2)

Unnamed: 0_level_0,Junction,Vehicles,year,month,day,hour
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
20151101001,1,15,2015,1,11,0
20151101011,1,13,2015,1,11,1


In [103]:
## Display first 2 records of test data after setting index.
test.head(2)

Unnamed: 0_level_0,Junction,year,month,day,hour
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
20170701001,1,2017,1,7,0
20170701011,1,2017,1,7,1


In [129]:
## Store features into train_data (for train data).
train_data = train.drop('Vehicles', axis=1)

In [130]:
## Store traget into y(for train data).
y = train['Vehicles']

In [131]:
## Split the data into train and validation.
X_train,X_test,y_train,y_test = train_test_split(train_data,y,test_size=0.2,random_state =1234)

In [132]:
## Instantiate label encoder.
le_junction = LabelEncoder()
le_year = LabelEncoder()
le_month = LabelEncoder()
le_day = LabelEncoder()
le_hour = LabelEncoder()

In [133]:
## Do labe encoding for trian data .
X_train['Junction'] = le_junction.fit_transform(X_train['Junction'])
X_train['year'] = le_year.fit_transform(X_train['year'])
X_train['month'] = le_month.fit_transform(X_train['month'])
X_train['day'] = le_day.fit_transform(X_train['day'])
X_train['hour'] = le_hour.fit_transform(X_train['hour'])

In [134]:
## Do labe encoding for validation data.
X_test['Junction'] = le_junction.transform(X_test['Junction'])
X_test['year'] = le_year.transform(X_test['year'])
X_test['month'] = le_month.transform(X_test['month'])
X_test['day'] = le_day.transform(X_test['day'])
X_test['hour'] = le_hour.transform(X_test['hour'])

In [None]:
## Do labe encoding for test data. 
test['Junction'] = le_junction.transform(test['Junction'])
test['year'] = le_year.transform(test['year'])
test['month'] = le_month.transform(test['month'])
test['day'] = le_day.transform(test['day'])
test['hour'] = le_hour.transform(test['hour'])

In [136]:
## Display first record of train data after doing label encoding.
X_train.head(1)

Unnamed: 0_level_0,Junction,year,month,day,hour
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
20161012113,2,1,11,9,11


In [137]:
## Display first record of validation data after doing label encoding.
X_test.head(1)

Unnamed: 0_level_0,Junction,year,month,day,hour
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
20170521101,0,2,4,20,10


In [138]:
## Display first record of test data after doing label encoding.
test.head(1)

Unnamed: 0_level_0,Junction,year,month,day,hour
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
20170701001,0,2,0,6,0


In [140]:
## Instantiate decision tree model and fit it.
dtr = DecisionTreeRegressor(max_depth=7,min_samples_leaf=10,min_samples_split=5,random_state=123)
dtr.fit(X_train,y_train)

DecisionTreeRegressor(ccp_alpha=0.0, criterion='mse', max_depth=7,
                      max_features=None, max_leaf_nodes=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=10, min_samples_split=5,
                      min_weight_fraction_leaf=0.0, presort='deprecated',
                      random_state=123, splitter='best')

In [141]:
## Get the predictions on train and validation data.
pred_train = dtr.predict(X_train)
pred_test = dtr.predict(X_test)

In [142]:
## Get predictions on test data.
test_pred = dtr.predict(test)

In [144]:
## Display RMSE value for train and validation data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))

Train Error: 8.15194544307838
Test Error: 8.484839367035523


In [145]:
## Prepare a dataframe with test index,test predictions data.
dataframe = pd.DataFrame({'ID' : test.index,
                          'Vehicles' : test_pred})

In [147]:
## Copy dataframe data into a CSV file.
dataframe.to_csv('DT_Predictions.csv',index=False)

In [202]:
## Instantiate random forest model.
rc = RandomForestRegressor(n_estimators= 25, max_depth= 10)## ,min_samples_leaf = 2)## ,max_features='sqrt')

In [203]:
## Fit a model.
rc.fit(X_train,y_train)

RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',
                      max_depth=10, max_features='auto', max_leaf_nodes=None,
                      max_samples=None, min_impurity_decrease=0.0,
                      min_impurity_split=None, min_samples_leaf=1,
                      min_samples_split=2, min_weight_fraction_leaf=0.0,
                      n_estimators=25, n_jobs=None, oob_score=False,
                      random_state=None, verbose=0, warm_start=False)

In [204]:
## Get the predictions on train and validation data.
pred_train = rc.predict(X_train)
pred_test = rc.predict(X_test)

In [205]:
## Display RMSE values for train and validation data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))

Train Error: 6.719699018781504
Test Error: 7.285963090446608


In [153]:
## Get predictions on test data.
test_pred = rc.predict(test)

In [156]:
## Prepare a dataframe with test index,test predictions data.
dataframe = pd.DataFrame({'ID' : test.index,
                          'Vehicles' : test_pred})

In [157]:
## Copy dataframe data into a CSV file.
dataframe.to_csv('RFPredictions.csv',index=False)

In [159]:
## Instantiate adaboost model and fit it.
Adaboost_model = AdaBoostRegressor(n_estimators=50,learning_rate=1)
%time Adaboost_model.fit(X_train, y_train)

Wall time: 399 ms


AdaBoostRegressor(base_estimator=None, learning_rate=1, loss='linear',
                  n_estimators=50, random_state=None)

In [160]:
## Get the predictions on train and validation data,
pred_train = Adaboost_model.predict(X_train)
pred_test = Adaboost_model.predict(X_test)

In [161]:
## Display RMSE value for train and validation data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))

Train Error: 11.722037502574286
Test Error: 11.949507960901505


In [162]:
## Get predictions on test data.
test_pred = Adaboost_model.predict(test)

In [163]:
## Prepare a dataframe with test index,test predictions data.
dataframe = pd.DataFrame({'ID' : test.index,
                          'Vehicles' : test_pred})

In [164]:
## Copy dataframe data into a CSV file.
dataframe.to_csv('AdaBoost.csv',index=False)

In [166]:
## Instantiate GDR model and fit it.
gbm = GradientBoostingRegressor(n_estimators=50,learning_rate=0.8,random_state=474)
%time gbm.fit(X=X_train, y=y_train)

Wall time: 610 ms


GradientBoostingRegressor(alpha=0.9, ccp_alpha=0.0, criterion='friedman_mse',
                          init=None, learning_rate=0.8, loss='ls', max_depth=3,
                          max_features=None, max_leaf_nodes=None,
                          min_impurity_decrease=0.0, min_impurity_split=None,
                          min_samples_leaf=1, min_samples_split=2,
                          min_weight_fraction_leaf=0.0, n_estimators=50,
                          n_iter_no_change=None, presort='deprecated',
                          random_state=474, subsample=1.0, tol=0.0001,
                          validation_fraction=0.1, verbose=0, warm_start=False)

In [167]:
## Get the predictions on train and validation data.
pred_train = gbm.predict(X_train)
pred_test = gbm.predict(X_test)

In [168]:
## Display RMSE value for train and validation data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))

Train Error: 7.6221840992263346
Test Error: 7.898128303436905


In [169]:
## Get predictions on test data.
test_pred = gbm.predict(test)

In [170]:
## Prepare a dataframe with test index,test predictions data.
dataframe = pd.DataFrame({'ID' : test.index,
                          'Vehicles' : test_pred})

In [171]:
## Copy dataframe into a CSV file.
dataframe.to_csv('GB.csv',index=False)

In [174]:
## Run Grid search to get best parrameters to build a XGB model.
xgb = XGBRegressor() ## Instantiate XGBClassifier model.

optimization_dict = {'max_depth': [2,3,4,5,6,7], ## Trying with different max_depth,n_estimators to find best model.
                      'n_estimators': [50,60,70,80,90,100,150,200]} 

## Build best model with Grid Search params.
model = GridSearchCV(xgb, ## XGB model
                     optimization_dict, ## Dictory with different max_depth,n_estimators.
                     verbose=1, ## For messaging purpose.
                     n_jobs=-1) ## Number of jobs to run in parallel. ''-1' means use all processors.

%time model.fit(X_train, y_train) ## Fit a model.
print(model.best_score_) ## Display best score calues.
print(model.best_params_) ## Display best parameters.

Fitting 5 folds for each of 48 candidates, totalling 240 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 12 concurrent workers.
[Parallel(n_jobs=-1)]: Done  26 tasks      | elapsed:   10.1s
[Parallel(n_jobs=-1)]: Done 176 tasks      | elapsed:   42.8s
[Parallel(n_jobs=-1)]: Done 240 out of 240 | elapsed:  1.1min finished


Wall time: 1min 9s
0.9258342993200774
{'max_depth': 7, 'n_estimators': 200}


In [206]:
## Build a XGB model with best params which we found after grid search CV (above code).
xgb_model = XGBRegressor(max_depth=10, n_estimators=200)
%time xgb_model.fit(X_train, y_train)

Wall time: 5.72 s


XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, gamma=0,
             importance_type='gain', learning_rate=0.1, max_delta_step=0,
             max_depth=10, min_child_weight=1, missing=None, n_estimators=200,
             n_jobs=1, nthread=None, objective='reg:linear', random_state=0,
             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
             silent=None, subsample=1, verbosity=1)

In [207]:
## Get the predictions on train and validation data.
pred_train = xgb_model.predict(X_train)
pred_test = xgb_model.predict(X_test)

In [208]:
## Display RMSE value for train and validation data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))

Train Error: 2.3982054773442028
Test Error: 4.283494263052786


In [209]:
## Get predictions onn test data.
test_pred = xgb_model.predict(test)

In [210]:
## Prepares a dataframe with test index,test predictions data.
dataframe = pd.DataFrame({'ID' : test.index,
                          'Vehicles' : test_pred})

In [211]:
## Copy dataframe data into a CSV file.
dataframe.to_csv('XGB.csv',index=False)

In [213]:
## Instantiate SVR model.
svr_model = SVR()
svr_model

SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='scale',
    kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)

In [214]:
## Fit a model.
svr_model.fit(X = X_train, y = y_train)

SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='scale',
    kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)

In [215]:
## Get the predictions on train and validation data.
pred_train = svr_model.predict(X_train)
pred_test = svr_model.predict(X_test)

In [216]:
## Display RMSE value for train and validation data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))

Train Error: 14.704433150467267
Test Error: 14.53589057940764


In [217]:
## Get predictions on test data.
test_pred = svr_model.predict(test)

In [218]:
## Prepare a dataframe with test index,test predictions data.
dataframe = pd.DataFrame({'ID' : test.index,
                          'Vehicles' : test_pred})

In [219]:
## Copy dataframe into a CSV file.
dataframe.to_csv('SVR.csv',index=False)

In [221]:
## Instantiate KNN model and fit it.
knn = KNeighborsRegressor(algorithm = 'brute', n_neighbors = 4,
                           metric = "euclidean")
knn.fit(X_train, y_train)

KNeighborsRegressor(algorithm='brute', leaf_size=30, metric='euclidean',
                    metric_params=None, n_jobs=None, n_neighbors=4, p=2,
                    weights='uniform')

In [222]:
## Get the predictions on train and validation data.
pred_train = knn.predict(X_train)
pred_test = knn.predict(X_test)

In [223]:
## Display RMSE value for train and validation data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))

Train Error: 7.33015053073975
Test Error: 8.867127023804045


In [224]:
## Get predictions on test data.
test_pred = knn.predict(test)

In [225]:
## Prepare a dataframe with test index,test predictions data.
dataframe = pd.DataFrame({'ID' : test.index,
                          'Vehicles' : test_pred})

In [226]:
## Copy dataframe dta into a CSV file.
dataframe.to_csv('KNN.csv',index=False)

In [228]:
## Instantiate sequential model.
model = Sequential()

## Add fully connnected layer to model.
model.add(Dense(1, input_dim=X_train.shape[1]))

## Add compiler to model.
model.compile(loss='mse', optimizer='rmsprop')

## Fit a model.
model.fit(X_train, y_train, epochs=150, batch_size=32)

Epoch 1/150
Epoch 2/150
Epoch 3/150
Epoch 4/150
Epoch 5/150
Epoch 6/150
Epoch 7/150
Epoch 8/150
Epoch 9/150
Epoch 10/150
Epoch 11/150
Epoch 12/150
Epoch 13/150
Epoch 14/150
Epoch 15/150
Epoch 16/150
Epoch 17/150
Epoch 18/150
Epoch 19/150
Epoch 20/150
Epoch 21/150
Epoch 22/150
Epoch 23/150
Epoch 24/150
Epoch 25/150
Epoch 26/150
Epoch 27/150
Epoch 28/150
Epoch 29/150
Epoch 30/150
Epoch 31/150
Epoch 32/150
Epoch 33/150
Epoch 34/150
Epoch 35/150
Epoch 36/150
Epoch 37/150
Epoch 38/150
Epoch 39/150
Epoch 40/150
Epoch 41/150
Epoch 42/150
Epoch 43/150
Epoch 44/150
Epoch 45/150
Epoch 46/150
Epoch 47/150
Epoch 48/150
Epoch 49/150
Epoch 50/150
Epoch 51/150
Epoch 52/150
Epoch 53/150
Epoch 54/150
Epoch 55/150
Epoch 56/150
Epoch 57/150
Epoch 58/150
Epoch 59/150
Epoch 60/150
Epoch 61/150
Epoch 62/150
Epoch 63/150
Epoch 64/150
Epoch 65/150
Epoch 66/150
Epoch 67/150
Epoch 68/150
Epoch 69/150
Epoch 70/150
Epoch 71/150
Epoch 72/150
Epoch 73/150
Epoch 74/150
Epoch 75/150
Epoch 76/150
Epoch 77/150
Epoch 78

<keras.callbacks.callbacks.History at 0x184a41fb3c8>

In [229]:
## Get the predictions on train and validation data.
pred_train = model.predict(X_train)
pred_test = model.predict(X_test)

In [230]:
## Display RMSE value for train and validation data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))

Train Error: 13.631776134468168
Test Error: 13.595299040985216


In [234]:
## Instantiate Linear regression model and fit it.
linreg=LinearRegression()
linear_model=linreg.fit(X_train,y_train)

In [237]:
## Get the predictions on train and validation data.
pred_train = linear_model.predict(X_train)
pred_test = linear_model.predict(X_test)

In [238]:
## Display RMSE value for train and validation data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))

Train Error: 13.63167804798419
Test Error: 13.594990621818756


In [239]:
## Get predictions on test data.
test_pred = linear_model.predict(test)

In [240]:
## Prepare a dataframe with test index,test predictions data.
dataframe = pd.DataFrame({'ID' : test.index,
                          'Vehicles' : test_pred})

In [241]:
## Copy dataframe data into a SCV file.
dataframe.to_csv('Linear.csv',index=False)