**Training and Testing**
<p>Once the data has been preprocessed, the next step is to build a model on it.</p>
<p>After building the model, we test it on data it was not trained on and measure its performance (here, the RMSE of the predicted prices).</p>
<li>We will try a simple <b>feed-forward network</b> with different settings: number of layers, number of epochs, activation function, and scaled versus unscaled data.</li>
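
The settings listed above can be enumerated systematically before any training is run. The sketch below is a minimal illustration using only the standard library. The values in `search_space` (layer counts, epoch counts, activation names) are assumptions chosen for illustration, not settings taken from this notebook, and `enumerate_configs` is a hypothetical helper.

```python
from itertools import product

# Hypothetical search space: every value here is an illustrative assumption.
search_space = {
    "hidden_layers": [1, 2, 3],
    "epochs": [50, 100],
    "activation": ["relu", "tanh"],
    "scaled": [True, False],
}

def enumerate_configs(space):
    """Return one dict per combination of parameter values."""
    names = list(space)
    return [dict(zip(names, values)) for values in product(*space.values())]

configs = enumerate_configs(search_space)
print(len(configs), "configurations")
print(configs[0])
```

Each dictionary in `configs` could then drive one training run, with the resulting RMSE recorded next to it for comparison.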

In [None]:
# importing the libraries
import pandas as pd # for reading csv

df=pd.read_csv('drive/MyDrive/Colab Notebooks/Clean_Dataset.csv', na_values=['NA','?']) # treat 'NA' and '?' as missing values
# Dropping column 'Unnamed: 0'
df=df.drop('Unnamed: 0',axis=1)
df.head(3)

Unnamed: 0,airline,flight,source_city,departure_time,stops,arrival_time,destination_city,class,duration,days_left,price
0,SpiceJet,SG-8709,Delhi,Evening,zero,Night,Mumbai,Economy,2.17,1,5953
1,SpiceJet,SG-8157,Delhi,Early_Morning,zero,Morning,Mumbai,Economy,2.33,1,5953
2,AirAsia,I5-764,Delhi,Early_Morning,zero,Early_Morning,Mumbai,Economy,2.17,1,5956


In [None]:
from sklearn.preprocessing import LabelEncoder
encoder=LabelEncoder() # maps each distinct text value to an integer code
for col in df.columns:
    if df[col].dtype=='object': # only text columns; duration, days_left, and price are already numeric
        df[col]=encoder.fit_transform(df[col]) # the encoder is refitted for each column
df.head(3)

Unnamed: 0,airline,flight,source_city,departure_time,stops,arrival_time,destination_city,class,duration,days_left,price
0,4,1408,2,2,2,5,5,1,2.17,1,5953
1,4,1387,2,1,2,4,5,1,2.33,1,5953
2,0,1213,2,1,2,1,5,1,2.17,1,5956
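
To make the encoding step concrete: `LabelEncoder` sorts the distinct values of a column and replaces each value with its position in that sorted list. The pure-Python sketch below mimics that behaviour for the time-of-day columns. `label_encode` is a hypothetical helper, not part of scikit-learn, and the list of six categories is an assumption about the dataset (only four of them appear in the rows shown above).

```python
def label_encode(values):
    """Map each distinct value to its index in the sorted list of distinct values."""
    classes = sorted(set(values))
    mapping = {label: code for code, label in enumerate(classes)}
    return [mapping[v] for v in values], mapping

# Assumed set of time-of-day categories (illustrative, not read from the CSV).
times = ["Evening", "Early_Morning", "Night", "Morning", "Afternoon", "Late_Night"]
codes, mapping = label_encode(times)
print(codes)
print(mapping)
```

Under that assumption, `Evening` maps to 2, `Early_Morning` to 1, `Morning` to 4, and `Night` to 5, which agrees with the encoded rows above. Because the loop in the cell reuses one encoder for every column, only the last column's mapping survives; keeping a mapping per column, as this helper returns, is one way to decode values later.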


In [None]:
price = df['price'] # target
features = df.drop('price',axis=1) #data features

In [None]:
# from sklearn.preprocessing import StandardScaler 

# sc = StandardScaler()
# sc.fit(features)
# features= sc.transform(features)
# features[0:3]

# from scipy.stats import zscore
# for col in features.columns:
#     features[col] = zscore(features[col])
# features[0:3]

from sklearn.preprocessing import MinMaxScaler
mmscaler=MinMaxScaler(feature_range=(0,1))
features=pd.DataFrame(mmscaler.fit_transform(features))
features.head(3)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,0.8,0.902564,0.4,0.4,1.0,1.0,1.0,1.0,0.027347,0.0
1,0.8,0.889103,0.4,0.2,1.0,0.8,1.0,1.0,0.030612,0.0
2,0.0,0.777564,0.4,0.2,1.0,0.2,1.0,1.0,0.027347,0.0


In [None]:
from sklearn.model_selection import train_test_split
#splitting the data
xtrain,xtest,ytrain,ytest=train_test_split(features,price,test_size=0.30,random_state=42)
print("Sample in test set ",xtest.shape[0])
print("Sample in train set ",xtrain.shape[0])

Sample in test set  90046
Sample in train set  210107
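
Note that the scaler in the earlier cell is fitted on all rows before the split, so the test rows influence the minimum and maximum used for scaling. A common alternative is to compute the scaling statistics on the training rows only and reuse them on the test rows. The sketch below illustrates the idea with NumPy on a tiny made-up array; `fit_min_max` and `apply_min_max` are hypothetical helpers, and the numbers are not from the dataset.

```python
import numpy as np

def fit_min_max(train):
    """Return the per-column minimum and range, computed from training rows only."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    return lo, span

def apply_min_max(data, lo, span):
    """Scale data with statistics previously obtained from fit_min_max."""
    return (data - lo) / span

# Made-up numbers for illustration only.
train = np.array([[2.0, 1.0], [4.0, 11.0], [6.0, 21.0]])
test = np.array([[3.0, 6.0], [8.0, 1.0]])

lo, span = fit_min_max(train)
train_scaled = apply_min_max(train, lo, span)
test_scaled = apply_min_max(test, lo, span)
print(train_scaled)
print(test_scaled)
```

Test values can fall outside the range 0 to 1 when they lie outside the training range, which is expected. With scikit-learn, the same effect is obtained by calling `fit` on `xtrain` and `transform` on both `xtrain` and `xtest`.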


In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(64, input_dim=xtrain.shape[1], activation='relu')) # Hidden 1
model.add(Dense(1)) # Output
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(xtrain,ytrain,verbose=2,epochs=2000)

Epoch 1/2000
6566/6566 - 10s - loss: 779342336.0000 - 10s/epoch - 2ms/step
Epoch 2/2000
6566/6566 - 9s - loss: 478039968.0000 - 9s/epoch - 1ms/step
Epoch 3/2000
6566/6566 - 9s - loss: 376816320.0000 - 9s/epoch - 1ms/step
Epoch 4/2000
6566/6566 - 10s - loss: 276333632.0000 - 10s/epoch - 2ms/step
Epoch 5/2000
6566/6566 - 9s - loss: 180574384.0000 - 9s/epoch - 1ms/step
Epoch 6/2000
6566/6566 - 9s - loss: 113782088.0000 - 9s/epoch - 1ms/step
Epoch 7/2000
6566/6566 - 9s - loss: 77764864.0000 - 9s/epoch - 1ms/step
Epoch 8/2000
6566/6566 - 9s - loss: 58105484.0000 - 9s/epoch - 1ms/step
Epoch 9/2000
6566/6566 - 9s - loss: 49274796.0000 - 9s/epoch - 1ms/step
Epoch 10/2000
6566/6566 - 9s - loss: 44946900.0000 - 9s/epoch - 1ms/step
Epoch 11/2000
6566/6566 - 9s - loss: 42324648.0000 - 9s/epoch - 1ms/step
Epoch 12/2000
6566/6566 - 9s - loss: 40613800.0000 - 9s/epoch - 1ms/step
Epoch 13/2000
6566/6566 - 9s - loss: 39427100.0000 - 9s/epoch - 1ms/step
Epoch 14/2000
6566/6566 - 9s - loss: 38608032.0000
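
The numbers in this log can be sanity-checked with a little arithmetic. The 6566 steps per epoch are consistent with 210107 training samples and a batch size of 32, which is the Keras default when `batch_size` is not passed to `fit`. Since the loss is a mean squared error, its square root gives an error in the same units as `price`. A minimal sketch:

```python
import math

train_samples = 210107   # from the train/test split output
batch_size = 32          # assumed: Keras default, since fit() is called without batch_size
steps_per_epoch = math.ceil(train_samples / batch_size)

last_logged_mse = 38608032.0   # training loss at epoch 14 in the log
train_rmse = math.sqrt(last_logged_mse)

print(steps_per_epoch)
print(round(train_rmse, 1))
```

This is the training RMSE at that point in the log, not the test RMSE computed in the next cell.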

In [None]:
from sklearn import metrics
import numpy as np

pred = model.predict(xtest)
# print("Shape: {}".format(pred.shape))
# Measure the root-mean-square error (RMSE), a common metric for regression.
# scikit-learn expects the arguments in the order (y_true, y_pred).
score = np.sqrt(metrics.mean_squared_error(ytest,pred))
print(f"Final score (RMSE): {score}")
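
For reference, RMSE is the square root of the mean squared difference between targets and predictions. The NumPy sketch below is a hand-rolled version; `rmse` is a hypothetical helper and the numbers are made up. It flattens both arrays first, because `model.predict` returns an array of shape `(n, 1)` for a single-output model while the targets have shape `(n,)`, and subtracting those directly would broadcast to an `(n, n)` matrix.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error after flattening both inputs to 1-D."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up values: targets are 1-D, predictions are a column vector.
y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([[110.0], [190.0], [330.0]])

print(rmse(y_true, y_pred))
print((y_true - y_pred).shape)  # shape produced by subtracting without flattening
```

The second print shows the broadcasting pitfall: without `ravel`, the difference has shape `(3, 3)` rather than `(3,)`.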