## EDA and Feature Engineering -- Flight Price Dataset

You can download the dataset from [here](https://www.kaggle.com/datasets/shubhambathwal/flight-price-prediction).

### Features:
1) <u>Airline</u>: The name of the airline company is stored in the airline column. It is a categorical feature having 6 different airlines. 
2) <u>Flight</u>: Flight stores information regarding the plane's flight code. It is a categorical feature. 
3) <u>Source City</u>: City from which the flight takes off. It is a categorical feature having 6 unique cities. 
4) <u>Departure Time</u>: This is a derived categorical feature created by grouping time periods into bins. It stores information about the departure time and has 6 unique time labels. 
5) <u>Stops</u>: A categorical feature with 3 distinct values that stores the number of stops between the source and destination cities. 
6) <u>Arrival Time</u>: This is a derived categorical feature created by grouping time intervals into bins. It has six distinct time labels and keeps information about the arrival time. 
7) <u>Destination City</u>: City where the flight will land. It is a categorical feature having 6 unique cities.
8) <u>Class</u>: A categorical feature that contains information on seat class; it has two distinct values: Business and Economy. 
9) <u>Duration</u>: A continuous feature that displays the overall amount of time it takes to travel between cities in hours. 
10) <u>Days Left</u>: This is a derived feature calculated by subtracting the booking date from the trip date. 
11) <u>Price</u>: The target variable; it stores the ticket price.
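
The two derived features in the list above (time-of-day labels and Days Left) can be sketched as follows. This is only an illustration on made-up values: the column names `booking_date` and `journey_date`, the bin edges, and the order of the six labels are assumptions, not something documented by the dataset.

```python
import pandas as pd

# Hypothetical booking and journey dates (illustrative values only)
trips = pd.DataFrame({
    "booking_date": ["2022-02-10", "2022-02-10"],
    "journey_date": ["2022-02-11", "2022-03-01"],
})
booking = pd.to_datetime(trips["booking_date"])
journey = pd.to_datetime(trips["journey_date"])

# Days left = trip date minus booking date, in whole days
trips["days_left"] = (journey - booking).dt.days

# Group clock hours into six labelled bins (edges and labels are assumed)
hours = pd.Series([22, 5, 9, 18])
labels = ["Late_Night", "Early_Morning", "Morning", "Afternoon", "Evening", "Night"]
time_bin = pd.cut(hours, bins=[0, 4, 8, 12, 16, 20, 24], right=False, labels=labels)

print(trips)
print(time_bin)
```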

In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
df = pd.read_excel("flight_price.xlsx")
df.head()

Unnamed: 0,Airline,Date_of_Journey,Source,Destination,Route,Dep_Time,Arrival_Time,Duration,Total_Stops,Additional_Info,Price
0,IndiGo,24/03/2019,Banglore,New Delhi,BLR → DEL,22:20,01:10 22 Mar,2h 50m,non-stop,No info,3897
1,Air India,1/05/2019,Kolkata,Banglore,CCU → IXR → BBI → BLR,05:50,13:15,7h 25m,2 stops,No info,7662
2,Jet Airways,9/06/2019,Delhi,Cochin,DEL → LKO → BOM → COK,09:25,04:25 10 Jun,19h,2 stops,No info,13882
3,IndiGo,12/05/2019,Kolkata,Banglore,CCU → NAG → BLR,18:05,23:30,5h 25m,1 stop,No info,6218
4,IndiGo,01/03/2019,Banglore,New Delhi,BLR → NAG → DEL,16:50,21:35,4h 45m,1 stop,No info,13302


In [3]:
df.shape

(10683, 11)

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10683 entries, 0 to 10682
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Airline          10683 non-null  object
 1   Date_of_Journey  10683 non-null  object
 2   Source           10683 non-null  object
 3   Destination      10683 non-null  object
 4   Route            10682 non-null  object
 5   Dep_Time         10683 non-null  object
 6   Arrival_Time     10683 non-null  object
 7   Duration         10683 non-null  object
 8   Total_Stops      10682 non-null  object
 9   Additional_Info  10683 non-null  object
 10  Price            10683 non-null  int64 
dtypes: int64(1), object(10)
memory usage: 918.2+ KB


In [5]:
df.describe()

Unnamed: 0,Price
count,10683.0
mean,9087.064121
std,4611.359167
min,1759.0
25%,5277.0
50%,8372.0
75%,12373.0
max,79512.0


In [6]:
# Check for duplicates
df[df.duplicated()]

Unnamed: 0,Airline,Date_of_Journey,Source,Destination,Route,Dep_Time,Arrival_Time,Duration,Total_Stops,Additional_Info,Price
683,Jet Airways,1/06/2019,Delhi,Cochin,DEL → NAG → BOM → COK,14:35,04:25 02 Jun,13h 50m,2 stops,No info,13376
1061,Air India,21/05/2019,Delhi,Cochin,DEL → GOI → BOM → COK,22:00,19:15 22 May,21h 15m,2 stops,No info,10231
1348,Air India,18/05/2019,Delhi,Cochin,DEL → HYD → BOM → COK,17:15,19:15 19 May,26h,2 stops,No info,12392
1418,Jet Airways,6/06/2019,Delhi,Cochin,DEL → JAI → BOM → COK,05:30,04:25 07 Jun,22h 55m,2 stops,In-flight meal not included,10368
1674,IndiGo,24/03/2019,Banglore,New Delhi,BLR → DEL,18:25,21:20,2h 55m,non-stop,No info,7303
...,...,...,...,...,...,...,...,...,...,...,...
10594,Jet Airways,27/06/2019,Delhi,Cochin,DEL → AMD → BOM → COK,23:05,12:35 28 Jun,13h 30m,2 stops,No info,12819
10616,Jet Airways,1/06/2019,Delhi,Cochin,DEL → JAI → BOM → COK,09:40,12:35 02 Jun,26h 55m,2 stops,No info,13014
10634,Jet Airways,6/06/2019,Delhi,Cochin,DEL → JAI → BOM → COK,09:40,12:35 07 Jun,26h 55m,2 stops,In-flight meal not included,11733
10672,Jet Airways,27/06/2019,Delhi,Cochin,DEL → AMD → BOM → COK,23:05,19:00 28 Jun,19h 55m,2 stops,In-flight meal not included,11150


In [7]:
# remove duplicates
df.drop_duplicates(inplace=True)
df.head()

Unnamed: 0,Airline,Date_of_Journey,Source,Destination,Route,Dep_Time,Arrival_Time,Duration,Total_Stops,Additional_Info,Price
0,IndiGo,24/03/2019,Banglore,New Delhi,BLR → DEL,22:20,01:10 22 Mar,2h 50m,non-stop,No info,3897
1,Air India,1/05/2019,Kolkata,Banglore,CCU → IXR → BBI → BLR,05:50,13:15,7h 25m,2 stops,No info,7662
2,Jet Airways,9/06/2019,Delhi,Cochin,DEL → LKO → BOM → COK,09:25,04:25 10 Jun,19h,2 stops,No info,13882
3,IndiGo,12/05/2019,Kolkata,Banglore,CCU → NAG → BLR,18:05,23:30,5h 25m,1 stop,No info,6218
4,IndiGo,01/03/2019,Banglore,New Delhi,BLR → NAG → DEL,16:50,21:35,4h 45m,1 stop,No info,13302


In [8]:
df.shape

(10463, 11)

In [9]:
df['Date_of_Journey'].str.split('/').str[0]


0        24
1         1
2         9
3        12
4        01
         ..
10678     9
10679    27
10680    27
10681    01
10682     9
Name: Date_of_Journey, Length: 10463, dtype: object

## Feature Engineering: 

In [10]:
# Split the journey date (dd/mm/yyyy) into day, month and year
df['Date'] = df['Date_of_Journey'].str.split('/').str[0]
df['Month'] = df['Date_of_Journey'].str.split('/').str[1]
df['Year'] = df['Date_of_Journey'].str.split('/').str[2]

df.head()

Unnamed: 0,Airline,Date_of_Journey,Source,Destination,Route,Dep_Time,Arrival_Time,Duration,Total_Stops,Additional_Info,Price,Date,Month,Year
0,IndiGo,24/03/2019,Banglore,New Delhi,BLR → DEL,22:20,01:10 22 Mar,2h 50m,non-stop,No info,3897,24,3,2019
1,Air India,1/05/2019,Kolkata,Banglore,CCU → IXR → BBI → BLR,05:50,13:15,7h 25m,2 stops,No info,7662,1,5,2019
2,Jet Airways,9/06/2019,Delhi,Cochin,DEL → LKO → BOM → COK,09:25,04:25 10 Jun,19h,2 stops,No info,13882,9,6,2019
3,IndiGo,12/05/2019,Kolkata,Banglore,CCU → NAG → BLR,18:05,23:30,5h 25m,1 stop,No info,6218,12,5,2019
4,IndiGo,01/03/2019,Banglore,New Delhi,BLR → NAG → DEL,16:50,21:35,4h 45m,1 stop,No info,13302,1,3,2019


In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10463 entries, 0 to 10682
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Airline          10463 non-null  object
 1   Date_of_Journey  10463 non-null  object
 2   Source           10463 non-null  object
 3   Destination      10463 non-null  object
 4   Route            10462 non-null  object
 5   Dep_Time         10463 non-null  object
 6   Arrival_Time     10463 non-null  object
 7   Duration         10463 non-null  object
 8   Total_Stops      10462 non-null  object
 9   Additional_Info  10463 non-null  object
 10  Price            10463 non-null  int64 
 11  Date             10463 non-null  object
 12  Month            10463 non-null  object
 13  Year             10463 non-null  object
dtypes: int64(1), object(13)
memory usage: 1.2+ MB


In [12]:
df['Date'] = df['Date'].astype(int)
df['Month'] = df['Month'].astype(int)
df['Year'] = df['Year'].astype(int)

df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10463 entries, 0 to 10682
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Airline          10463 non-null  object
 1   Date_of_Journey  10463 non-null  object
 2   Source           10463 non-null  object
 3   Destination      10463 non-null  object
 4   Route            10462 non-null  object
 5   Dep_Time         10463 non-null  object
 6   Arrival_Time     10463 non-null  object
 7   Duration         10463 non-null  object
 8   Total_Stops      10462 non-null  object
 9   Additional_Info  10463 non-null  object
 10  Price            10463 non-null  int64 
 11  Date             10463 non-null  int64 
 12  Month            10463 non-null  int64 
 13  Year             10463 non-null  int64 
dtypes: int64(4), object(10)
memory usage: 1.2+ MB
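
As an alternative to splitting the string and casting each piece, the same three columns can be derived by parsing the date once. A minimal sketch on three sample values copied from the table above, assuming every entry follows the day/month/year layout:

```python
import pandas as pd

# Sample values in the same dd/mm/yyyy layout as Date_of_Journey
sample = pd.DataFrame({"Date_of_Journey": ["24/03/2019", "1/05/2019", "9/06/2019"]})

# Parse once with an explicit format so day and month are never swapped
journey = pd.to_datetime(sample["Date_of_Journey"], format="%d/%m/%Y")

# The .dt accessor yields integer columns directly, so no astype(int) is needed
sample["Date"] = journey.dt.day
sample["Month"] = journey.dt.month
sample["Year"] = journey.dt.year
print(sample)
```

Parsing also raises an error on malformed dates, which string splitting would pass through silently.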


In [13]:
df.drop("Date_of_Journey",axis = 1,inplace=True)
df.head()

Unnamed: 0,Airline,Source,Destination,Route,Dep_Time,Arrival_Time,Duration,Total_Stops,Additional_Info,Price,Date,Month,Year
0,IndiGo,Banglore,New Delhi,BLR → DEL,22:20,01:10 22 Mar,2h 50m,non-stop,No info,3897,24,3,2019
1,Air India,Kolkata,Banglore,CCU → IXR → BBI → BLR,05:50,13:15,7h 25m,2 stops,No info,7662,1,5,2019
2,Jet Airways,Delhi,Cochin,DEL → LKO → BOM → COK,09:25,04:25 10 Jun,19h,2 stops,No info,13882,9,6,2019
3,IndiGo,Kolkata,Banglore,CCU → NAG → BLR,18:05,23:30,5h 25m,1 stop,No info,6218,12,5,2019
4,IndiGo,Banglore,New Delhi,BLR → NAG → DEL,16:50,21:35,4h 45m,1 stop,No info,13302,1,3,2019


In [14]:
df['Arrival_hour'] = df['Arrival_Time'].str.split(':').str[0]
df['Arrival_min'] = df['Arrival_Time'].str.split(':').str[1].str.split(" ").str[0]
df.head()

Unnamed: 0,Airline,Source,Destination,Route,Dep_Time,Arrival_Time,Duration,Total_Stops,Additional_Info,Price,Date,Month,Year,Arrival_hour,Arrival_min
0,IndiGo,Banglore,New Delhi,BLR → DEL,22:20,01:10 22 Mar,2h 50m,non-stop,No info,3897,24,3,2019,1,10
1,Air India,Kolkata,Banglore,CCU → IXR → BBI → BLR,05:50,13:15,7h 25m,2 stops,No info,7662,1,5,2019,13,15
2,Jet Airways,Delhi,Cochin,DEL → LKO → BOM → COK,09:25,04:25 10 Jun,19h,2 stops,No info,13882,9,6,2019,4,25
3,IndiGo,Kolkata,Banglore,CCU → NAG → BLR,18:05,23:30,5h 25m,1 stop,No info,6218,12,5,2019,23,30
4,IndiGo,Banglore,New Delhi,BLR → NAG → DEL,16:50,21:35,4h 45m,1 stop,No info,13302,1,3,2019,21,35


In [15]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10463 entries, 0 to 10682
Data columns (total 15 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Airline          10463 non-null  object
 1   Source           10463 non-null  object
 2   Destination      10463 non-null  object
 3   Route            10462 non-null  object
 4   Dep_Time         10463 non-null  object
 5   Arrival_Time     10463 non-null  object
 6   Duration         10463 non-null  object
 7   Total_Stops      10462 non-null  object
 8   Additional_Info  10463 non-null  object
 9   Price            10463 non-null  int64 
 10  Date             10463 non-null  int64 
 11  Month            10463 non-null  int64 
 12  Year             10463 non-null  int64 
 13  Arrival_hour     10463 non-null  object
 14  Arrival_min      10463 non-null  object
dtypes: int64(4), object(11)
memory usage: 1.3+ MB


In [16]:
df['Arrival_hour'] = df['Arrival_hour'].astype(int)
df['Arrival_min'] = df['Arrival_min'].astype(int)

In [17]:
df.drop("Arrival_Time",axis=1,inplace=True)
df.head()

Unnamed: 0,Airline,Source,Destination,Route,Dep_Time,Duration,Total_Stops,Additional_Info,Price,Date,Month,Year,Arrival_hour,Arrival_min
0,IndiGo,Banglore,New Delhi,BLR → DEL,22:20,2h 50m,non-stop,No info,3897,24,3,2019,1,10
1,Air India,Kolkata,Banglore,CCU → IXR → BBI → BLR,05:50,7h 25m,2 stops,No info,7662,1,5,2019,13,15
2,Jet Airways,Delhi,Cochin,DEL → LKO → BOM → COK,09:25,19h,2 stops,No info,13882,9,6,2019,4,25
3,IndiGo,Kolkata,Banglore,CCU → NAG → BLR,18:05,5h 25m,1 stop,No info,6218,12,5,2019,23,30
4,IndiGo,Banglore,New Delhi,BLR → NAG → DEL,16:50,4h 45m,1 stop,No info,13302,1,3,2019,21,35
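
The hour and minute can also be pulled out in a single step with a regular expression, which ignores the optional date suffix (for example `22 Mar`). A small sketch on sample values, assuming every entry starts with `HH:MM`:

```python
import pandas as pd

# Sample arrival times, with and without the trailing date
arrival = pd.Series(["01:10 22 Mar", "13:15", "04:25 10 Jun"], name="Arrival_Time")

# Named groups become column names; anything after HH:MM is ignored
parts = arrival.str.extract(r"^(?P<Arrival_hour>\d{1,2}):(?P<Arrival_min>\d{2})").astype(int)
print(parts)
```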


In [18]:
df['Dep_Time'].str.split(":").str[0]

0        22
1        05
2        09
3        18
4        16
         ..
10678    19
10679    20
10680    08
10681    11
10682    10
Name: Dep_Time, Length: 10463, dtype: object

In [19]:
df['Dept_Hour'] = df['Dep_Time'].str.split(":").str[0]
df['Dept_Min'] = df['Dep_Time'].str.split(":").str[1]
df.isna().sum()

Airline            0
Source             0
Destination        0
Route              1
Dep_Time           0
Duration           0
Total_Stops        1
Additional_Info    0
Price              0
Date               0
Month              0
Year               0
Arrival_hour       0
Arrival_min        0
Dept_Hour          0
Dept_Min           0
dtype: int64

In [20]:
df['Dept_Hour'] = df['Dept_Hour'].astype(int)
df['Dept_Min'] = df['Dept_Min'].astype(int)
df.drop("Dep_Time",axis = 1,inplace=True)
df.head()

Unnamed: 0,Airline,Source,Destination,Route,Duration,Total_Stops,Additional_Info,Price,Date,Month,Year,Arrival_hour,Arrival_min,Dept_Hour,Dept_Min
0,IndiGo,Banglore,New Delhi,BLR → DEL,2h 50m,non-stop,No info,3897,24,3,2019,1,10,22,20
1,Air India,Kolkata,Banglore,CCU → IXR → BBI → BLR,7h 25m,2 stops,No info,7662,1,5,2019,13,15,5,50
2,Jet Airways,Delhi,Cochin,DEL → LKO → BOM → COK,19h,2 stops,No info,13882,9,6,2019,4,25,9,25
3,IndiGo,Kolkata,Banglore,CCU → NAG → BLR,5h 25m,1 stop,No info,6218,12,5,2019,23,30,18,5
4,IndiGo,Banglore,New Delhi,BLR → NAG → DEL,4h 45m,1 stop,No info,13302,1,3,2019,21,35,16,50


In [22]:
df['Duration'].str.replace("h ",":")

0        2:50m
1        7:25m
2          19h
3        5:25m
4        4:45m
         ...  
10678    2:30m
10679    2:35m
10680       3h
10681    2:40m
10682    8:20m
Name: Duration, Length: 10463, dtype: object

In [23]:
df['Duration'] = df['Duration'].str.replace("h ",":")
df['Duration'] = df['Duration'].str.replace("h","")
df['Duration'] = df['Duration'].str.replace('m','')
df['Duration']

0        2:50
1        7:25
2          19
3        5:25
4        4:45
         ... 
10678    2:30
10679    2:35
10680       3
10681    2:40
10682    8:20
Name: Duration, Length: 10463, dtype: object

In [30]:
import numpy as np

### Converting the duration to minutes:

In [33]:
df['Duration'].str.split(":").str[0].astype(int)*60 

0         120
1         420
2        1140
3         300
4         240
         ... 
10678     120
10679     120
10680     180
10681     120
10682     480
Name: Duration, Length: 10463, dtype: int64

In [37]:
df['Duration'].str.split(":").str[1]

0         50
1         25
2        NaN
3         25
4         45
        ... 
10678     30
10679     35
10680    NaN
10681     40
10682     20
Name: Duration, Length: 10463, dtype: object

In [36]:
h = df['Duration'].str.split(":").str[0].astype(int)*60
m = df['Duration'].str.split(":").str[1].replace(np.nan,0).astype(int)
print(h,"\n",m)

0         120
1         420
2        1140
3         300
4         240
         ... 
10678     120
10679     120
10680     180
10681     120
10682     480
Name: Duration, Length: 10463, dtype: int64 
 0        50
1        25
2         0
3        25
4        45
         ..
10678    30
10679    35
10680     0
10681    40
10682    20
Name: Duration, Length: 10463, dtype: int64


In [39]:
df['Duration(in mins)'] = h+m
df.head(3)

Unnamed: 0,Airline,Source,Destination,Route,Duration,Total_Stops,Additional_Info,Price,Date,Month,Year,Arrival_hour,Arrival_min,Dept_Hour,Dept_Min,Duration(in mins)
0,IndiGo,Banglore,New Delhi,BLR → DEL,2:50,non-stop,No info,3897,24,3,2019,1,10,22,20,170
1,Air India,Kolkata,Banglore,CCU → IXR → BBI → BLR,7:25,2 stops,No info,7662,1,5,2019,13,15,5,50,445
2,Jet Airways,Delhi,Cochin,DEL → LKO → BOM → COK,19,2 stops,No info,13882,9,6,2019,4,25,9,25,1140


In [40]:
df.drop("Duration",axis = 1,inplace=True)
df.head(2)

Unnamed: 0,Airline,Source,Destination,Route,Total_Stops,Additional_Info,Price,Date,Month,Year,Arrival_hour,Arrival_min,Dept_Hour,Dept_Min,Duration(in mins)
0,IndiGo,Banglore,New Delhi,BLR → DEL,non-stop,No info,3897,24,3,2019,1,10,22,20,170
1,Air India,Kolkata,Banglore,CCU → IXR → BBI → BLR,2 stops,No info,7662,1,5,2019,13,15,5,50,445
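
The duration conversion above can also be written as one helper that extracts the hour and minute parts separately. This is a sketch, not the notebook's method: the function name `duration_to_minutes` is hypothetical, and the minutes-only value `5m` is an assumed edge case included to show that a missing hour part is treated as zero.

```python
import pandas as pd

def duration_to_minutes(duration: pd.Series) -> pd.Series:
    """Convert strings such as '2h 50m', '19h' or '5m' to total minutes."""
    # Extract each part on its own; a missing part becomes NaN, then 0
    hours = pd.to_numeric(duration.str.extract(r"(\d+)\s*h", expand=False))
    minutes = pd.to_numeric(duration.str.extract(r"(\d+)\s*m", expand=False))
    return (hours.fillna(0) * 60 + minutes.fillna(0)).astype(int)

raw = pd.Series(["2h 50m", "7h 25m", "19h", "5m"])
total = duration_to_minutes(raw)
print(total.tolist())
```

Extracting by unit rather than by position means a value without an hour part is not mistaken for a number of hours.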


In [41]:
df['Total_Stops'].unique()

array(['non-stop', '2 stops', '1 stop', '3 stops', nan, '4 stops'],
      dtype=object)

In [43]:
df['Total_Stops'].mode()

0    1 stop
Name: Total_Stops, dtype: object

In [45]:
df['Total_stops'] = df['Total_Stops'].map(
    {
        'non-stop':0,
        '1 stop':1,
        '2 stops' :2,
        '3 stops' : 3,
        '4 stops' : 4,
        np.nan : 1    # the missing value is imputed with the mode, '1 stop'
    }
)

df.drop("Total_Stops",axis = 1, inplace=True)
df.head()

Unnamed: 0,Airline,Source,Destination,Route,Additional_Info,Price,Date,Month,Year,Arrival_hour,Arrival_min,Dept_Hour,Dept_Min,Duration(in mins),Total_stops
0,IndiGo,Banglore,New Delhi,BLR → DEL,No info,3897,24,3,2019,1,10,22,20,170,0
1,Air India,Kolkata,Banglore,CCU → IXR → BBI → BLR,No info,7662,1,5,2019,13,15,5,50,445,2
2,Jet Airways,Delhi,Cochin,DEL → LKO → BOM → COK,No info,13882,9,6,2019,4,25,9,25,1140,2
3,IndiGo,Kolkata,Banglore,CCU → NAG → BLR,No info,6218,12,5,2019,23,30,18,5,325,1
4,IndiGo,Banglore,New Delhi,BLR → NAG → DEL,No info,13302,1,3,2019,21,35,16,50,285,1
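
The same encoding can be done in two explicit steps, filling the missing label with the mode first and mapping afterwards, so the imputed value is not hard-coded. A minimal sketch on toy values (the series below is illustrative, not taken from the dataset):

```python
import numpy as np
import pandas as pd

stops = pd.Series(["non-stop", "2 stops", "1 stop", np.nan, "1 stop"], name="Total_Stops")
stop_map = {"non-stop": 0, "1 stop": 1, "2 stops": 2, "3 stops": 3, "4 stops": 4}

# Impute the missing label with the most frequent one, then encode to integers
most_common = stops.mode()[0]
encoded_stops = stops.fillna(most_common).map(stop_map).astype(int)
print(encoded_stops.tolist())
```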


In [47]:
df.drop("Route",axis = 1,inplace=True)
df.head()

Unnamed: 0,Airline,Source,Destination,Additional_Info,Price,Date,Month,Year,Arrival_hour,Arrival_min,Dept_Hour,Dept_Min,Duration(in mins),Total_stops
0,IndiGo,Banglore,New Delhi,No info,3897,24,3,2019,1,10,22,20,170,0
1,Air India,Kolkata,Banglore,No info,7662,1,5,2019,13,15,5,50,445,2
2,Jet Airways,Delhi,Cochin,No info,13882,9,6,2019,4,25,9,25,1140,2
3,IndiGo,Kolkata,Banglore,No info,6218,12,5,2019,23,30,18,5,325,1
4,IndiGo,Banglore,New Delhi,No info,13302,1,3,2019,21,35,16,50,285,1


In [48]:
df.Source.unique()

array(['Banglore', 'Kolkata', 'Delhi', 'Chennai', 'Mumbai'], dtype=object)

In [49]:
df.Destination.unique()

array(['New Delhi', 'Banglore', 'Cochin', 'Kolkata', 'Delhi', 'Hyderabad'],
      dtype=object)
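
Before encoding, it can help to check whether a categorical column contains labels that differ only in capitalization; `Additional_Info`, for instance, produces both a `No Info` and a `No info` column in the encoded output. A sketch of one way to merge such labels, shown on toy values:

```python
import pandas as pd

info = pd.Series(
    ["No info", "No Info", "In-flight meal not included", "No info"],
    name="Additional_Info",
)

# Trim stray whitespace and fold the capitalization variant into one label
normalized = info.str.strip().replace({"No Info": "No info"})
print(info.nunique(), "->", normalized.nunique())
```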

In [50]:
from sklearn.preprocessing import OneHotEncoder

ohe = OneHotEncoder()

In [53]:
encoded = ohe.fit_transform(df[['Airline','Source','Destination','Additional_Info']]).toarray()
encoded

array([[0., 0., 0., ..., 0., 1., 0.],
       [0., 1., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 0., 1., 0.],
       ...,
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 1., 0., ..., 0., 1., 0.]], shape=(10463, 33))

In [55]:
df_encoded = pd.DataFrame(encoded,columns=ohe.get_feature_names_out())
df_encoded.head()

Unnamed: 0,Airline_Air Asia,Airline_Air India,Airline_GoAir,Airline_IndiGo,Airline_Jet Airways,Airline_Jet Airways Business,Airline_Multiple carriers,Airline_Multiple carriers Premium economy,Airline_SpiceJet,Airline_Trujet,...,Additional_Info_1 Long layover,Additional_Info_1 Short layover,Additional_Info_2 Long layover,Additional_Info_Business class,Additional_Info_Change airports,Additional_Info_In-flight meal not included,Additional_Info_No Info,Additional_Info_No check-in baggage included,Additional_Info_No info,Additional_Info_Red-eye flight
0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
1,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
2,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
3,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
4,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
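
Because `drop_duplicates` keeps the original row labels, while a DataFrame built from the encoder's array gets a fresh 0..n-1 index, it is worth checking how `pd.concat` pairs rows before joining the two. A toy sketch (frame names and values are illustrative assumptions):

```python
import pandas as pd

# Frame whose row labels have a gap, as happens after drop_duplicates()
left = pd.DataFrame({"Price": [3897, 7662, 6218]}, index=[0, 1, 3])

# Frame built from a plain array: labels are simply 0, 1, 2
right = pd.DataFrame({"Airline_IndiGo": [1.0, 0.0, 1.0]})

# concat(axis=1) matches rows by label, so mismatched labels create NaN rows
misaligned = pd.concat([right, left], axis=1)

# Giving both frames the same labels keeps one row per original record
aligned = pd.concat([right.set_axis(left.index, axis=0), left], axis=1)

print(misaligned)
print(aligned)
```

Extra rows and integer columns turning into floats are the usual symptoms of the misaligned case.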


In [56]:
df1 = df.drop(['Airline','Source','Destination','Additional_Info'],axis = 1)
# df still carries its original row labels after drop_duplicates, so give the
# encoded frame the same labels; otherwise concat misaligns rows and adds NaNs
df_encoded.index = df1.index
df_final = pd.concat([df_encoded,df1],axis = 1)
df_final.head()

Unnamed: 0,Airline_Air Asia,Airline_Air India,Airline_GoAir,Airline_IndiGo,Airline_Jet Airways,Airline_Jet Airways Business,Airline_Multiple carriers,Airline_Multiple carriers Premium economy,Airline_SpiceJet,Airline_Trujet,...,Price,Date,Month,Year,Arrival_hour,Arrival_min,Dept_Hour,Dept_Min,Duration(in mins),Total_stops
0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,3897,24,3,2019,1,10,22,20,170,0
1,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,7662,1,5,2019,13,15,5,50,445,2
2,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,13882,9,6,2019,4,25,9,25,1140,2
3,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,6218,12,5,2019,23,30,18,5,325,1
4,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,13302,1,3,2019,21,35,16,50,285,1


In [58]:
df_final.columns

Index(['Airline_Air Asia', 'Airline_Air India', 'Airline_GoAir',
       'Airline_IndiGo', 'Airline_Jet Airways', 'Airline_Jet Airways Business',
       'Airline_Multiple carriers',
       'Airline_Multiple carriers Premium economy', 'Airline_SpiceJet',
       'Airline_Trujet', 'Airline_Vistara', 'Airline_Vistara Premium economy',
       'Source_Banglore', 'Source_Chennai', 'Source_Delhi', 'Source_Kolkata',
       'Source_Mumbai', 'Destination_Banglore', 'Destination_Cochin',
       'Destination_Delhi', 'Destination_Hyderabad', 'Destination_Kolkata',
       'Destination_New Delhi', 'Additional_Info_1 Long layover',
       'Additional_Info_1 Short layover', 'Additional_Info_2 Long layover',
       'Additional_Info_Business class', 'Additional_Info_Change airports',
       'Additional_Info_In-flight meal not included',
       'Additional_Info_No Info',
       'Additional_Info_No check-in baggage included',
       'Additional_Info_No info', 'Additional_Info_Red-eye flight', 'Price',
      

In [60]:
df_final.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10463 entries, 0 to 10682
Data columns (total 43 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   Airline_Air Asia                              10463 non-null  float64
 1   Airline_Air India                             10463 non-null  float64
 2   Airline_GoAir                                 10463 non-null  float64
 3   Airline_IndiGo                                10463 non-null  float64
 4   Airline_Jet Airways                           10463 non-null  float64
 5   Airline_Jet Airways Business                  10463 non-null  float64
 6   Airline_Multiple carriers                     10463 non-null  float64
 7   Airline_Multiple carriers Premium economy     10463 non-null  float64
 8   Airline_SpiceJet                              10463 non-null  float64
 9   Airline_Trujet                                10463 non-null  floa

The dataset is now ready to be passed to an ML model for training.