Ticket price Prediction

https://www.machinehack.com/course/predict-the-flight-ticket-price-hackathon/

Flight ticket prices can be something hard to guess, today we might see a price, check out the price of the same flight tomorrow, it will be a different story. We might have often heard travellers saying that flight ticket prices are so unpredictable. Huh! Here we take on the challenge! As data scientists, we are gonna prove that given the right data anything can be predicted. Here you will be provided with prices of flight tickets for various airlines between the months of March and June of 2019 and between various cities.

Size of training set: 10683 records

Size of test set: 2671 records

FEATURES:
Airline: The name of the airline.

Date_of_Journey: The date of the journey

Source: The source from which the service begins.

Destination: The destination where the service ends.

Route: The route taken by the flight to reach the destination.

Dep_Time: The time when the journey starts from the source.

Arrival_Time: Time of arrival at the destination.

Duration: Total duration of the flight.

Total_Stops: Total stops between the source and destination.

Additional_Info: Additional information about the flight

Price: The price of the ticket


In [1]:
#Importing Libraries

#data analysis and wrangling
import pandas as pd
import numpy as np
import random as rnd
import math

# visualization
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import plotly.graph_objs as go
import plotly.tools as tls
import plotly.offline as py

# machine learning
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error, roc_auc_score
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn import metrics
import time
import datetime


In [2]:
train_data=pd.read_excel('data_train.xlsx')
test_data=pd.read_excel('data_test.xlsx')

In [3]:
train_data.head()

Unnamed: 0,Airline,Date_of_Journey,Source,Destination,Route,Dep_Time,Arrival_Time,Duration,Total_Stops,Additional_Info,Price
0,IndiGo,24/03/2019,Banglore,New Delhi,BLR → DEL,22:20,01:10 22 Mar,2h 50m,non-stop,No info,3897
1,Air India,1/05/2019,Kolkata,Banglore,CCU → IXR → BBI → BLR,05:50,13:15,7h 25m,2 stops,No info,7662
2,Jet Airways,9/06/2019,Delhi,Cochin,DEL → LKO → BOM → COK,09:25,04:25 10 Jun,19h,2 stops,No info,13882
3,IndiGo,12/05/2019,Kolkata,Banglore,CCU → NAG → BLR,18:05,23:30,5h 25m,1 stop,No info,6218
4,IndiGo,01/03/2019,Banglore,New Delhi,BLR → NAG → DEL,16:50,21:35,4h 45m,1 stop,No info,13302


In [4]:
train_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10683 entries, 0 to 10682
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Airline          10683 non-null  object
 1   Date_of_Journey  10683 non-null  object
 2   Source           10683 non-null  object
 3   Destination      10683 non-null  object
 4   Route            10682 non-null  object
 5   Dep_Time         10683 non-null  object
 6   Arrival_Time     10683 non-null  object
 7   Duration         10683 non-null  object
 8   Total_Stops      10682 non-null  object
 9   Additional_Info  10683 non-null  object
 10  Price            10683 non-null  int64 
dtypes: int64(1), object(10)
memory usage: 918.2+ KB


In [5]:
test_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2671 entries, 0 to 2670
Data columns (total 10 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Airline          2671 non-null   object
 1   Date_of_Journey  2671 non-null   object
 2   Source           2671 non-null   object
 3   Destination      2671 non-null   object
 4   Route            2671 non-null   object
 5   Dep_Time         2671 non-null   object
 6   Arrival_Time     2671 non-null   object
 7   Duration         2671 non-null   object
 8   Total_Stops      2671 non-null   object
 9   Additional_Info  2671 non-null   object
dtypes: object(10)
memory usage: 208.8+ KB


In [6]:
statistics_of_data = []
for col in train_data.columns:
  statistics_of_data.append((col,
                             train_data[col].nunique(),
                             train_data[col].isnull().sum()*100/train_data.shape[0],
                             train_data[col].value_counts(normalize=True, dropna=False).values[0] * 100, 
                             train_data[col].dtype
                             ))
stats_df = pd.DataFrame(statistics_of_data, columns=['Feature', 'Uniq_val', 'missing_val', 'val_biggest_cat', 'type'])

In [7]:
stats_df.sort_values('missing_val', ascending=False)

Unnamed: 0,Feature,Uniq_val,missing_val,val_biggest_cat,type
4,Route,128,0.009361,22.240944,object
8,Total_Stops,5,0.009361,52.653749,object
0,Airline,12,0.0,36.029205,object
1,Date_of_Journey,44,0.0,4.717776,object
2,Source,5,0.0,42.469344,object
3,Destination,6,0.0,42.469344,object
5,Dep_Time,222,0.0,2.181035,object
6,Arrival_Time,1343,0.0,3.959562,object
7,Duration,368,0.0,5.148367,object
9,Additional_Info,10,0.0,78.114762,object


There are missing values in 2 columns, Additional_Info has the biggest same values (78%)

Now we need to do Exploratory data analysis to find out which feature is more important for target (i.e. Price) variable.

In [8]:
def exploreFeatures(col):
  top_n=10
  top_n = top_n if train_data[col].nunique() > top_n else train_data[col].nunique()
  print(f"{col} has {train_data[col].nunique()} unique values, 5 most occured vaues and their type: {train_data[col].dtype}.")
  print(train_data[col].value_counts(normalize=True, dropna=False).head())

In [9]:
exploreFeatures('Airline')

Airline has 12 unique values, 5 most occured vaues and their type: object.
Jet Airways          0.360292
IndiGo               0.192174
Air India            0.163999
Multiple carriers    0.111954
SpiceJet             0.076570
Name: Airline, dtype: float64


In [10]:
exploreFeatures('Source')

Source has 5 unique values, 5 most occured vaues and their type: object.
Delhi       0.424693
Kolkata     0.268745
Banglore    0.205654
Mumbai      0.065244
Chennai     0.035664
Name: Source, dtype: float64


In [11]:
exploreFeatures('Destination')

Destination has 6 unique values, 5 most occured vaues and their type: object.
Cochin       0.424693
Banglore     0.268745
Delhi        0.118412
New Delhi    0.087241
Hyderabad    0.065244
Name: Destination, dtype: float64


In [12]:
exploreFeatures('Dep_Time')

Dep_Time has 222 unique values, 5 most occured vaues and their type: object.
18:55    0.021810
17:00    0.021249
07:05    0.019189
10:00    0.019002
07:10    0.018909
Name: Dep_Time, dtype: float64


In [13]:
exploreFeatures('Arrival_Time')

Arrival_Time has 1343 unique values, 5 most occured vaues and their type: object.
19:00    0.039596
21:00    0.033698
19:15    0.031171
16:10    0.014415
12:35    0.011420
Name: Arrival_Time, dtype: float64


In [14]:
exploreFeatures('Duration')

Duration has 368 unique values, 5 most occured vaues and their type: object.
2h 50m    0.051484
1h 30m    0.036132
2h 45m    0.031545
2h 55m    0.031545
2h 35m    0.030797
Name: Duration, dtype: float64


In [15]:
exploreFeatures('Total_Stops')

Total_Stops has 5 unique values, 5 most occured vaues and their type: object.
1 stop      0.526537
non-stop    0.326781
2 stops     0.142282
3 stops     0.004212
4 stops     0.000094
Name: Total_Stops, dtype: float64


In [16]:
exploreFeatures('Additional_Info')

Additional_Info has 10 unique values, 5 most occured vaues and their type: object.
No info                         0.781148
In-flight meal not included     0.185528
No check-in baggage included    0.029954
1 Long layover                  0.001779
Change airports                 0.000655
Name: Additional_Info, dtype: float64


In [17]:
exploreFeatures('Price')

Price has 1870 unique values, 5 most occured vaues and their type: int64.
10262    0.024151
10844    0.019845
7229     0.015164
4804     0.014977
4823     0.012262
Name: Price, dtype: float64


Observations:

* 36% airlines covered with Jet Airways
* 42% flights are form Dehli
* Coachin have highest count in detination of flights i.e. 42%. Banglore is at 2nd with 26%
* 52% airlines give connected flights, while 32% are direct.


Actions:
* Cater missing values in both sets.
* delete unwanted columns.
* split wanted columns like datestamp.



In [18]:
train_data.isnull().sum()
test_data.isnull().sum()

Airline            0
Date_of_Journey    0
Source             0
Destination        0
Route              0
Dep_Time           0
Arrival_Time       0
Duration           0
Total_Stops        0
Additional_Info    0
dtype: int64

In [19]:
#Catering missing Values

train_data[train_data['Total_Stops'].isnull()]


Unnamed: 0,Airline,Date_of_Journey,Source,Destination,Route,Dep_Time,Arrival_Time,Duration,Total_Stops,Additional_Info,Price
9039,Air India,6/05/2019,Delhi,Cochin,,09:45,09:25 07 May,23h 40m,,No info,7480


In [20]:
train_data = train_data.dropna()
train_data = train_data.reset_index(drop=True)
#As there are two NaN values, better to drop them.

In [21]:
train_data.head()

Unnamed: 0,Airline,Date_of_Journey,Source,Destination,Route,Dep_Time,Arrival_Time,Duration,Total_Stops,Additional_Info,Price
0,IndiGo,24/03/2019,Banglore,New Delhi,BLR → DEL,22:20,01:10 22 Mar,2h 50m,non-stop,No info,3897
1,Air India,1/05/2019,Kolkata,Banglore,CCU → IXR → BBI → BLR,05:50,13:15,7h 25m,2 stops,No info,7662
2,Jet Airways,9/06/2019,Delhi,Cochin,DEL → LKO → BOM → COK,09:25,04:25 10 Jun,19h,2 stops,No info,13882
3,IndiGo,12/05/2019,Kolkata,Banglore,CCU → NAG → BLR,18:05,23:30,5h 25m,1 stop,No info,6218
4,IndiGo,01/03/2019,Banglore,New Delhi,BLR → NAG → DEL,16:50,21:35,4h 45m,1 stop,No info,13302


In [22]:
combine = [train_data,test_data]
print("Before", train_data.shape, test_data.shape,combine[0].shape, combine[1].shape)

Before (10682, 11) (2671, 10) (10682, 11) (2671, 10)


In [23]:
#Mapping of Total_Stops
titlemapping = {'non-stop':0, '1 stop':1,'2 stop':2, '3 stop':3,'4 stop':4}
for row in combine:
    row["Total_Stops"] = row["Total_Stops"].map(titlemapping)
    row['Total_Stops'] = row['Total_Stops'].fillna(0)
    row['Total_Stops'] = row['Total_Stops'].astype(int)
    

In [24]:
train_data['Route_1']=''
train_data['Route_2']=''
train_data['Route_3']=''
train_data['Route_4']=''
train_data['Route_5']=''
test_data['Route_1']=''
test_data['Route_2']=''
test_data['Route_3']=''
test_data['Route_4']=''
test_data['Route_5']=''

for row in combine:
    row['Route_1']=row['Route'].str.split('→ ').str[0]
    row['Route_2']=row['Route'].str.split('→ ').str[1]
    row['Route_3']=row['Route'].str.split('→ ').str[2]
    row['Route_4']=row['Route'].str.split('→ ').str[3]
    row['Route_5']=row['Route'].str.split('→ ').str[4]
    
    

In [25]:
for row in combine:
    row['Route_1'].fillna("None",inplace=True)
    row['Route_2'].fillna("None",inplace=True)
    row['Route_3'].fillna("None",inplace=True)
    row['Route_4'].fillna("None",inplace=True)
    row['Route_5'].fillna("None",inplace=True)

In [26]:
#combine['Date']=big_df['Date'].astype(int)
for row in combine:
    row['Date']=row['Date_of_Journey'].str.split('/').str[0]
    row['Month']=row['Date_of_Journey'].str.split('/').str[1]
    row['Year']=row['Date_of_Journey'].str.split('/').str[2]
    row['Date'] = row['Date'].astype(int)
    row['Month'] = row['Month'].astype(int)
    row['Year'] = row['Year'].astype(int)
    
    

In [27]:
for row in combine:
    row ['dep_Hour'] = row['Dep_Time'].str.split(':').str[0]
    row ['dep_Min'] = row['Dep_Time'].str.split(':').str[1]

In [28]:
train_data['Arrival_Time']=train_data['Arrival_Time'].str.split(' ').str[0]
test_data['Arrival_Time']=test_data['Arrival_Time'].str.split(' ').str[0]

In [29]:
for row in  combine:
    row['arr_Hour'] = row['Arrival_Time'].str.split(':').str[0]
    row['arr_Min'] = row['Arrival_Time'].str.split(':').str[1]

In [30]:
train_data[['Duration','Price']].groupby(['Duration'],as_index = False).mean().sort_values(by = 'Price',ascending = False)

Unnamed: 0,Duration,Price
42,13h 35m,22294.000000
272,37h 10m,21314.000000
294,47h 40m,20694.000000
293,47h,20064.000000
268,35h 35m,19907.000000
...,...,...
296,4h 10m,4226.000000
118,1h 15m,3944.333333
121,1h 30m,3721.484456
119,1h 20m,3286.377049


In [31]:
#We couldnt find any coorelation between Duration price, Lets delete this column too
combine = [train_data,test_data]
print("Before", train_data.shape, test_data.shape,combine[0].shape, combine[1].shape)
train_data = train_data.drop(['Duration','Dep_Time','Arrival_Time','Route','Date_of_Journey'], axis=1)
test_data = test_data.drop(['Duration','Dep_Time','Arrival_Time','Route','Date_of_Journey'], axis=1)
combine = [train_data, test_data]

print("After", train_data.shape, test_data.shape, combine[0].shape, combine[1].shape)

Before (10682, 23) (2671, 22) (10682, 23) (2671, 22)
After (10682, 18) (2671, 17) (10682, 18) (2671, 17)


In [32]:
train_data.head()

Unnamed: 0,Airline,Source,Destination,Total_Stops,Additional_Info,Price,Route_1,Route_2,Route_3,Route_4,Route_5,Date,Month,Year,dep_Hour,dep_Min,arr_Hour,arr_Min
0,IndiGo,Banglore,New Delhi,0,No info,3897,BLR,DEL,,,,24,3,2019,22,20,1,10
1,Air India,Kolkata,Banglore,0,No info,7662,CCU,IXR,BBI,BLR,,1,5,2019,5,50,13,15
2,Jet Airways,Delhi,Cochin,0,No info,13882,DEL,LKO,BOM,COK,,9,6,2019,9,25,4,25
3,IndiGo,Kolkata,Banglore,1,No info,6218,CCU,NAG,BLR,,,12,5,2019,18,5,23,30
4,IndiGo,Banglore,New Delhi,1,No info,13302,BLR,NAG,DEL,,,1,3,2019,16,50,21,35


In [33]:
#So We will Use Label Encoder for Encoding Technique as we have text in our columns.
from sklearn.preprocessing import LabelEncoder
encoder=LabelEncoder()

In [34]:
train_data["Airline"]=encoder.fit_transform(train_data['Airline'])
train_data["Source"]=encoder.fit_transform(train_data['Source'])
train_data["Destination"]=encoder.fit_transform(train_data['Destination'])
train_data["Additional_Info"]=encoder.fit_transform(train_data['Additional_Info'])
train_data["Route_1"]=encoder.fit_transform(train_data['Route_1'])
train_data["Route_2"]=encoder.fit_transform(train_data['Route_2'])
train_data["Route_3"]=encoder.fit_transform(train_data['Route_3'])
train_data["Route_4"]=encoder.fit_transform(train_data['Route_4'])
train_data["Route_5"]=encoder.fit_transform(train_data['Route_5'])

test_data["Airline"]=encoder.fit_transform(test_data['Airline'])
test_data["Source"]=encoder.fit_transform(test_data['Source'])
test_data["Destination"]=encoder.fit_transform(test_data['Destination'])
test_data["Additional_Info"]=encoder.fit_transform(test_data['Additional_Info'])
test_data["Route_1"]=encoder.fit_transform(test_data['Route_1'])
test_data["Route_2"]=encoder.fit_transform(test_data['Route_2'])
test_data["Route_3"]=encoder.fit_transform(test_data['Route_3'])
test_data["Route_4"]=encoder.fit_transform(test_data['Route_4'])
test_data["Route_5"]=encoder.fit_transform(test_data['Route_5'])

In [35]:
train_data.head()

Unnamed: 0,Airline,Source,Destination,Total_Stops,Additional_Info,Price,Route_1,Route_2,Route_3,Route_4,Route_5,Date,Month,Year,dep_Hour,dep_Min,arr_Hour,arr_Min
0,3,0,5,0,8,3897,0,13,24,12,4,24,3,2019,22,20,1,10
1,1,3,0,0,8,7662,2,25,1,3,4,1,5,2019,5,50,13,15
2,4,2,1,0,8,13882,3,32,4,5,4,9,6,2019,9,25,4,25
3,3,3,0,1,8,6218,2,34,3,12,4,12,5,2019,18,5,23,30
4,3,0,5,1,8,13302,0,34,8,12,4,1,3,2019,16,50,21,35


Look how well it looked like, all numeric values.... ahhh ,  just fantastic.

### Feature Selection

In [36]:
from sklearn.linear_model import Lasso
from sklearn.feature_selection import SelectFromModel

In [37]:
df_train=train_data
df_test = test_data

In [38]:
X_Train =df_train.drop(['Price'],axis=1)
Y_Train =df_train.Price
X_Test  = test_data

In [39]:
model=SelectFromModel(Lasso(alpha=0.005,random_state=0))

In [40]:
model.fit(X_Train,Y_Train)

SelectFromModel(estimator=Lasso(alpha=0.005, copy_X=True, fit_intercept=True,
                                max_iter=1000, normalize=False, positive=False,
                                precompute=False, random_state=0,
                                selection='cyclic', tol=0.0001,
                                warm_start=False),
                max_features=None, norm_order=1, prefit=False, threshold=None)

In [41]:
model.get_support()

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True, False,  True,  True,  True,  True])

In [43]:
selected_features=X_Train.columns[(model.get_support())]

In [44]:
selected_features

Index(['Airline', 'Source', 'Destination', 'Total_Stops', 'Additional_Info',
       'Route_1', 'Route_2', 'Route_3', 'Route_4', 'Route_5', 'Date', 'Month',
       'dep_Hour', 'dep_Min', 'arr_Hour', 'arr_Min'],
      dtype='object')

In [45]:
X_Train=X_Train.drop(['Year'],axis=1)

In [46]:
X_Test=X_Test.drop(['Year'],axis=1)

## RandomForestRegressor

In [47]:
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestRegressor
#Randomized Search CV

In [48]:
rf = RandomForestRegressor(random_state = 42)
from pprint import pprint
# Look at parameters used by our current forest
print('Parameters currently in use:\n')
pprint(rf.get_params())

Parameters currently in use:

{'bootstrap': True,
 'ccp_alpha': 0.0,
 'criterion': 'mse',
 'max_depth': None,
 'max_features': 'auto',
 'max_leaf_nodes': None,
 'max_samples': None,
 'min_impurity_decrease': 0.0,
 'min_impurity_split': None,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'n_estimators': 100,
 'n_jobs': None,
 'oob_score': False,
 'random_state': 42,
 'verbose': 0,
 'warm_start': False}


In [49]:
# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start = 200, stop = 2000, num = 10)]
# Number of features to consider at every split
max_features = ['auto', 'sqrt']
# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(10, 110, num = 11)]
max_depth.append(None)
# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10]
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 4]
# Method of selecting samples for training each tree
bootstrap = [True, False]
# Create the random grid
random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
               'bootstrap': bootstrap}
pprint(random_grid)
{'bootstrap': [True, False],
 'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, None],
 'max_features': ['auto', 'sqrt'],
 'min_samples_leaf': [1, 2, 4],
 'min_samples_split': [2, 5, 10],
 'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]}

{'bootstrap': [True, False],
 'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, None],
 'max_features': ['auto', 'sqrt'],
 'min_samples_leaf': [1, 2, 4],
 'min_samples_split': [2, 5, 10],
 'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]}


{'bootstrap': [True, False],
 'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, None],
 'max_features': ['auto', 'sqrt'],
 'min_samples_leaf': [1, 2, 4],
 'min_samples_split': [2, 5, 10],
 'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]}

In [50]:
# Use the random grid to search for best hyperparameters
# First create the base model to tune
rf = RandomForestRegressor()
# Random search of parameters, using 3 fold cross validation, 
# search across 100 different combinations, and use all available cores
rf_random = RandomizedSearchCV(estimator = rf, param_distributions = random_grid, n_iter = 100, cv = 3, verbose=2, random_state=42, n_jobs = -1)
# Fit the random search model
#l#rf_random.fit(X_train, y_train)

In [51]:
rf_random.fit(X_Train, Y_Train)

Fitting 3 folds for each of 100 candidates, totalling 300 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:  1.9min
[Parallel(n_jobs=-1)]: Done 154 tasks      | elapsed:  8.9min
[Parallel(n_jobs=-1)]: Done 300 out of 300 | elapsed: 17.8min finished


RandomizedSearchCV(cv=3, error_score=nan,
                   estimator=RandomForestRegressor(bootstrap=True,
                                                   ccp_alpha=0.0,
                                                   criterion='mse',
                                                   max_depth=None,
                                                   max_features='auto',
                                                   max_leaf_nodes=None,
                                                   max_samples=None,
                                                   min_impurity_decrease=0.0,
                                                   min_impurity_split=None,
                                                   min_samples_leaf=1,
                                                   min_samples_split=2,
                                                   min_weight_fraction_leaf=0.0,
                                                   n_estimators=100,
                              

In [52]:
Y_pred = rf_random.predict(X_Test)


In [53]:
acc_rf = round(rf_random.score(X_Train, Y_Train) * 100, 2)
acc_rf

98.33

So we got 98.33% acuuracy for this dataset, We got better accuracy due to proper training through correct hyperparameter tuning. So guyz this is very important to acknowldge that hyperparameter tuning is the key for good resulsts in DS 

In [73]:
chk =X_Train[2:3]

In [79]:
chk.head()

Unnamed: 0,Airline,Source,Destination,Total_Stops,Additional_Info,Route_1,Route_2,Route_3,Route_4,Route_5,Date,Month,dep_Hour,dep_Min,arr_Hour,arr_Min
2,4,2,1,0,8,3,32,4,5,4,9,6,9,25,4,25


In [80]:
train_data.head()

Unnamed: 0,Airline,Source,Destination,Total_Stops,Additional_Info,Price,Route_1,Route_2,Route_3,Route_4,Route_5,Date,Month,Year,dep_Hour,dep_Min,arr_Hour,arr_Min
0,3,0,5,0,8,3897,0,13,24,12,4,24,3,2019,22,20,1,10
1,1,3,0,0,8,7662,2,25,1,3,4,1,5,2019,5,50,13,15
2,4,2,1,0,8,13882,3,32,4,5,4,9,6,2019,9,25,4,25
3,3,3,0,1,8,6218,2,34,3,12,4,12,5,2019,18,5,23,30
4,3,0,5,1,8,13302,0,34,8,12,4,1,3,2019,16,50,21,35


According to the record the data row which I have taken to predict, we have price : 13882, Now lets see what our model give price.  

In [84]:
Y_pred = rf_random.predict(chk)

In [85]:
Y_pred

array([13880.53714286])

Oh thats amazing, we got 13880.5.... Guyz practice this and stay safe.. 