
# EDA and Feature Engineering for Flight Price Prediction

The dataset contains information about flight booking options from the website Easemytrip for flight travel between India's top 6 metro cities. There are 300261 datapoints and 11 features in the cleaned dataset.

### FEATURES
The various features of the cleaned dataset are explained below:
1) Airline: The name of the airline company is stored in the airline column. It is a categorical feature having 6 different airlines.
2) Flight: Flight stores information regarding the plane's flight code. It is a categorical feature.
3) Source City: City from which the flight takes off. It is a categorical feature having 6 unique cities.
4) Departure Time: This is a derived categorical feature created by grouping time periods into bins. It stores information about the departure time and has 6 unique time labels.
5) Stops: A categorical feature with 3 distinct values that stores the number of stops between the source and destination cities.
6) Arrival Time: This is a derived categorical feature created by grouping time intervals into bins. It has six distinct time labels and keeps information about the arrival time.
7) Destination City: City where the flight will land. It is a categorical feature having 6 unique cities.
8) Class: A categorical feature that contains information on seat class; it has two distinct values: Business and Economy.
9) Duration: A continuous feature that displays the overall amount of time it takes to travel between cities in hours.
10) Days Left: This is a derived feature calculated by subtracting the booking date from the trip date.
11) Price: The target variable, which stores the ticket price.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


In [None]:

df = pd.read_excel('FlightPrice.xlsx')

In [None]:
print(df.head())

       Airline Date_of_Journey    Source Destination                  Route  \
0       IndiGo      24/03/2019  Banglore   New Delhi              BLR → DEL   
1    Air India       1/05/2019   Kolkata    Banglore  CCU → IXR → BBI → BLR   
2  Jet Airways       9/06/2019     Delhi      Cochin  DEL → LKO → BOM → COK   
3       IndiGo      12/05/2019   Kolkata    Banglore        CCU → NAG → BLR   
4       IndiGo      01/03/2019  Banglore   New Delhi        BLR → NAG → DEL   

  Dep_Time  Arrival_Time Duration Total_Stops Additional_Info  Price  
0    22:20  01:10 22 Mar   2h 50m    non-stop         No info   3897  
1    05:50         13:15   7h 25m     2 stops         No info   7662  
2    09:25  04:25 10 Jun      19h     2 stops         No info  13882  
3    18:05         23:30   5h 25m      1 stop         No info   6218  
4    16:50         21:35   4h 45m      1 stop         No info  13302  


In [None]:
df.columns

Index(['Airline', 'Date_of_Journey', 'Source', 'Destination', 'Route',
       'Dep_Time', 'Arrival_Time', 'Duration', 'Total_Stops',
       'Additional_Info', 'Price'],
      dtype='object')

In [None]:
df.isnull().sum()

Unnamed: 0,0
Airline,0
Date_of_Journey,0
Source,0
Destination,0
Route,1
Dep_Time,0
Arrival_Time,0
Duration,0
Total_Stops,1
Additional_Info,0


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10683 entries, 0 to 10682
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Airline          10683 non-null  object
 1   Date_of_Journey  10683 non-null  object
 2   Source           10683 non-null  object
 3   Destination      10683 non-null  object
 4   Route            10682 non-null  object
 5   Dep_Time         10683 non-null  object
 6   Arrival_Time     10683 non-null  object
 7   Duration         10683 non-null  object
 8   Total_Stops      10682 non-null  object
 9   Additional_Info  10683 non-null  object
 10  Price            10683 non-null  int64 
dtypes: int64(1), object(10)
memory usage: 918.2+ KB


Ten of the eleven columns have object dtype and need to be converted to numerical types.

In [None]:
df.describe()

Unnamed: 0,Price
count,10683.0
mean,9087.064121
std,4611.359167
min,1759.0
25%,5277.0
50%,8372.0
75%,12373.0
max,79512.0


In [None]:
df.head()

Unnamed: 0,Airline,Date_of_Journey,Source,Destination,Route,Dep_Time,Arrival_Time,Duration,Total_Stops,Additional_Info,Price
0,IndiGo,24/03/2019,Banglore,New Delhi,BLR → DEL,22:20,01:10 22 Mar,2h 50m,non-stop,No info,3897
1,Air India,1/05/2019,Kolkata,Banglore,CCU → IXR → BBI → BLR,05:50,13:15,7h 25m,2 stops,No info,7662
2,Jet Airways,9/06/2019,Delhi,Cochin,DEL → LKO → BOM → COK,09:25,04:25 10 Jun,19h,2 stops,No info,13882
3,IndiGo,12/05/2019,Kolkata,Banglore,CCU → NAG → BLR,18:05,23:30,5h 25m,1 stop,No info,6218
4,IndiGo,01/03/2019,Banglore,New Delhi,BLR → NAG → DEL,16:50,21:35,4h 45m,1 stop,No info,13302


## Feature Engineering Process

For feature 'Date_of_Journey'

In [None]:
## .str.split('/') splits each date string on the '/' delimiter into a list; .str[i] then picks the i-th part.
df['Day'] = df['Date_of_Journey'].str.split('/').str[0]
df['Month'] = df['Date_of_Journey'].str.split('/').str[1]
df['Year'] = df['Date_of_Journey'].str.split('/').str[2]

In [None]:
df.info()
## The new Day, Month, and Year columns are still object type, not int (numerical) type

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10683 entries, 0 to 10682
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Airline          10683 non-null  object
 1   Date_of_Journey  10683 non-null  object
 2   Source           10683 non-null  object
 3   Destination      10683 non-null  object
 4   Route            10682 non-null  object
 5   Dep_Time         10683 non-null  object
 6   Arrival_Time     10683 non-null  object
 7   Duration         10683 non-null  object
 8   Total_Stops      10682 non-null  object
 9   Additional_Info  10683 non-null  object
 10  Price            10683 non-null  int64 
 11  Day              10683 non-null  object
 12  Month            10683 non-null  object
 13  Year             10683 non-null  object
dtypes: int64(1), object(13)
memory usage: 1.1+ MB


In [None]:
df['Day'] = df['Day'].astype(int)
df['Month'] = df['Month'].astype(int)

df['Year'] = df['Year'].astype(int)
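
The split-then-cast steps above can also be done by parsing the date first. The following is a minimal sketch, assuming every date is in day/month/year order as in the sample rows; the small `sample` frame is made up for illustration and is not the notebook's `df`.

```python
import pandas as pd

# Hypothetical sample mirroring the 'Date_of_Journey' format shown above.
sample = pd.DataFrame({'Date_of_Journey': ['24/03/2019', '1/05/2019', '9/06/2019']})

# Parse day/month/year strings; an explicit format avoids day/month ambiguity.
parsed = pd.to_datetime(sample['Date_of_Journey'], format='%d/%m/%Y')

# The .dt accessor returns integer columns, so no separate astype(int) is needed.
sample['Day'] = parsed.dt.day
sample['Month'] = parsed.dt.month
sample['Year'] = parsed.dt.year

print(sample)
```

Parsing also fails loudly on a malformed date, whereas string splitting would silently produce a wrong part.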


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10683 entries, 0 to 10682
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Airline          10683 non-null  object
 1   Date_of_Journey  10683 non-null  object
 2   Source           10683 non-null  object
 3   Destination      10683 non-null  object
 4   Route            10682 non-null  object
 5   Dep_Time         10683 non-null  object
 6   Arrival_Time     10683 non-null  object
 7   Duration         10683 non-null  object
 8   Total_Stops      10682 non-null  object
 9   Additional_Info  10683 non-null  object
 10  Price            10683 non-null  int64 
 11  Day              10683 non-null  int64 
 12  Month            10683 non-null  int64 
 13  Year             10683 non-null  int64 
dtypes: int64(4), object(10)
memory usage: 1.1+ MB


In [None]:
## Now we can drop Date_of_Journey

df.drop('Date_of_Journey', axis=1, inplace=True )
print(df.head())

       Airline    Source Destination                  Route Dep_Time  \
0       IndiGo  Banglore   New Delhi              BLR → DEL    22:20   
1    Air India   Kolkata    Banglore  CCU → IXR → BBI → BLR    05:50   
2  Jet Airways     Delhi      Cochin  DEL → LKO → BOM → COK    09:25   
3       IndiGo   Kolkata    Banglore        CCU → NAG → BLR    18:05   
4       IndiGo  Banglore   New Delhi        BLR → NAG → DEL    16:50   

   Arrival_Time Duration Total_Stops Additional_Info  Price  Day  Month  Year  
0  01:10 22 Mar   2h 50m    non-stop         No info   3897   24      3  2019  
1         13:15   7h 25m     2 stops         No info   7662    1      5  2019  
2  04:25 10 Jun      19h     2 stops         No info  13882    9      6  2019  
3         23:30   5h 25m      1 stop         No info   6218   12      5  2019  
4         21:35   4h 45m      1 stop         No info  13302    1      3  2019  


For 'Arrival_Time'

In [None]:
## Keep only the time part of 'Arrival_Time' (drop a trailing date such as '22 Mar')
df['Arrival_Time'] = df['Arrival_Time'].apply(lambda x: x.split(' ')[0])

In [None]:
## .str.split(':') splits each 'HH:MM' string on ':';
## .str[0] is the hour part and .str[1] is the minute part.

df['Arrival_Hour'] = df['Arrival_Time'].str.split(':').str[0]
df['Arrival_Minute'] = df['Arrival_Time'].str.split(':').str[1]


In [None]:
df.head()

Unnamed: 0,Airline,Source,Destination,Route,Dep_Time,Arrival_Time,Duration,Total_Stops,Additional_Info,Price,Day,Month,Year,Arrival_Hour,Arrival_Minute
0,IndiGo,Banglore,New Delhi,BLR → DEL,22:20,01:10,2h 50m,non-stop,No info,3897,24,3,2019,1,10
1,Air India,Kolkata,Banglore,CCU → IXR → BBI → BLR,05:50,13:15,7h 25m,2 stops,No info,7662,1,5,2019,13,15
2,Jet Airways,Delhi,Cochin,DEL → LKO → BOM → COK,09:25,04:25,19h,2 stops,No info,13882,9,6,2019,4,25
3,IndiGo,Kolkata,Banglore,CCU → NAG → BLR,18:05,23:30,5h 25m,1 stop,No info,6218,12,5,2019,23,30
4,IndiGo,Banglore,New Delhi,BLR → NAG → DEL,16:50,21:35,4h 45m,1 stop,No info,13302,1,3,2019,21,35


In [None]:
df['Arrival_Hour'] = df['Arrival_Hour'].astype(int)
df['Arrival_Minute'] = df['Arrival_Minute'].astype(int)
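
The arrival-time steps (strip the date, split on ':', cast to int) can be combined with a regular expression. This is a sketch under the assumption that every value starts with `HH:MM`, as in the rows printed earlier; `sample` and `parts` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Hypothetical sample mirroring the raw 'Arrival_Time' values shown earlier.
sample = pd.DataFrame({'Arrival_Time': ['01:10 22 Mar', '13:15', '04:25 10 Jun']})

# Capture the leading HH:MM; any trailing date text is not part of the match.
parts = sample['Arrival_Time'].str.extract(r'^(\d{1,2}):(\d{2})')

# str.extract returns strings, so convert each captured group to a number.
sample['Arrival_Hour'] = pd.to_numeric(parts[0])
sample['Arrival_Minute'] = pd.to_numeric(parts[1])

print(sample)
```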


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10683 entries, 0 to 10682
Data columns (total 15 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Airline          10683 non-null  object
 1   Source           10683 non-null  object
 2   Destination      10683 non-null  object
 3   Route            10682 non-null  object
 4   Dep_Time         10683 non-null  object
 5   Arrival_Time     10683 non-null  object
 6   Duration         10683 non-null  object
 7   Total_Stops      10682 non-null  object
 8   Additional_Info  10683 non-null  object
 9   Price            10683 non-null  int64 
 10  Day              10683 non-null  int64 
 11  Month            10683 non-null  int64 
 12  Year             10683 non-null  int64 
 13  Arrival_Hour     10683 non-null  int64 
 14  Arrival_Minute   10683 non-null  int64 
dtypes: int64(6), object(9)
memory usage: 1.2+ MB


In [None]:
## Drop 'Arrival_Time'
df.drop('Arrival_Time', axis=1, inplace=True)
print(df.head())

       Airline    Source Destination                  Route Dep_Time Duration  \
0       IndiGo  Banglore   New Delhi              BLR → DEL    22:20   2h 50m   
1    Air India   Kolkata    Banglore  CCU → IXR → BBI → BLR    05:50   7h 25m   
2  Jet Airways     Delhi      Cochin  DEL → LKO → BOM → COK    09:25      19h   
3       IndiGo   Kolkata    Banglore        CCU → NAG → BLR    18:05   5h 25m   
4       IndiGo  Banglore   New Delhi        BLR → NAG → DEL    16:50   4h 45m   

  Total_Stops Additional_Info  Price  Day  Month  Year  Arrival_Hour  \
0    non-stop         No info   3897   24      3  2019             1   
1     2 stops         No info   7662    1      5  2019            13   
2     2 stops         No info  13882    9      6  2019             4   
3      1 stop         No info   6218   12      5  2019            23   
4      1 stop         No info  13302    1      3  2019            21   

   Arrival_Minute  
0              10  
1              15  
2              25  


For 'Dep_Time'

In [None]:
df['Dep_hour'] = df['Dep_Time'].str.split(':').str[0]
df['Dep_Min'] = df['Dep_Time'].str.split(':').str[1]

In [None]:
df['Dep_hour'] = df['Dep_hour'].astype(int)
df['Dep_Min'] = df['Dep_Min'].astype(int)

In [None]:
df.drop('Dep_Time', axis=1, inplace=True)

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10683 entries, 0 to 10682
Data columns (total 15 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Airline          10683 non-null  object
 1   Source           10683 non-null  object
 2   Destination      10683 non-null  object
 3   Route            10682 non-null  object
 4   Duration         10683 non-null  object
 5   Total_Stops      10682 non-null  object
 6   Additional_Info  10683 non-null  object
 7   Price            10683 non-null  int64 
 8   Day              10683 non-null  int64 
 9   Month            10683 non-null  int64 
 10  Year             10683 non-null  int64 
 11  Arrival_Hour     10683 non-null  int64 
 12  Arrival_Minute   10683 non-null  int64 
 13  Dep_hour         10683 non-null  int64 
 14  Dep_Min          10683 non-null  int64 
dtypes: int64(8), object(7)
memory usage: 1.2+ MB


In [None]:
df.head()

Unnamed: 0,Airline,Source,Destination,Route,Duration,Total_Stops,Additional_Info,Price,Day,Month,Year,Arrival_Hour,Arrival_Minute,Dep_hour,Dep_Min
0,IndiGo,Banglore,New Delhi,BLR → DEL,2h 50m,non-stop,No info,3897,24,3,2019,1,10,22,20
1,Air India,Kolkata,Banglore,CCU → IXR → BBI → BLR,7h 25m,2 stops,No info,7662,1,5,2019,13,15,5,50
2,Jet Airways,Delhi,Cochin,DEL → LKO → BOM → COK,19h,2 stops,No info,13882,9,6,2019,4,25,9,25
3,IndiGo,Kolkata,Banglore,CCU → NAG → BLR,5h 25m,1 stop,No info,6218,12,5,2019,23,30,18,5
4,IndiGo,Banglore,New Delhi,BLR → NAG → DEL,4h 45m,1 stop,No info,13302,1,3,2019,21,35,16,50


For 'Total_Stops'

In [None]:
df['Total_Stops'].unique()

array(['non-stop', '2 stops', '1 stop', '3 stops', nan, '4 stops'],
      dtype=object)

In [None]:
df[df['Total_Stops'].isnull()]

Unnamed: 0,Airline,Source,Destination,Route,Duration,Total_Stops,Additional_Info,Price,Day,Month,Year,Arrival_Hour,Arrival_Minute,Dep_hour,Dep_Min
9039,Air India,Delhi,Cochin,,23h 40m,,No info,7480,6,5,2019,9,25,9,45


In [None]:
## Map stop labels to integers; the single missing value is imputed as 1 stop.
df['Total_Stops'] = df['Total_Stops'].map({'non-stop': 0, '1 stop': 1, '2 stops': 2, '3 stops': 3, '4 stops': 4, np.nan: 1})
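
The mapping above handles the missing value inside the dictionary. A sketch that separates the two concerns (label encoding, then imputation) may be easier to audit. The `stops` Series and the name `stop_map` are made up for illustration, and the fill value 1 simply mirrors the choice made in the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical sample containing every label seen in df['Total_Stops'].unique().
stops = pd.Series(['non-stop', '2 stops', '1 stop', np.nan, '4 stops', '3 stops'])

stop_map = {'non-stop': 0, '1 stop': 1, '2 stops': 2, '3 stops': 3, '4 stops': 4}

# Known labels become integers; the missing entry stays NaN after map().
mapped = stops.map(stop_map)

# Fill the gap explicitly, then cast back to int.
encoded_stops = mapped.fillna(1).astype(int)

print(encoded_stops.tolist())
```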

In [None]:
df.head()

Unnamed: 0,Airline,Source,Destination,Route,Duration,Total_Stops,Additional_Info,Price,Day,Month,Year,Arrival_Hour,Arrival_Minute,Dep_hour,Dep_Min
0,IndiGo,Banglore,New Delhi,BLR → DEL,2h 50m,0,No info,3897,24,3,2019,1,10,22,20
1,Air India,Kolkata,Banglore,CCU → IXR → BBI → BLR,7h 25m,2,No info,7662,1,5,2019,13,15,5,50
2,Jet Airways,Delhi,Cochin,DEL → LKO → BOM → COK,19h,2,No info,13882,9,6,2019,4,25,9,25
3,IndiGo,Kolkata,Banglore,CCU → NAG → BLR,5h 25m,1,No info,6218,12,5,2019,23,30,18,5
4,IndiGo,Banglore,New Delhi,BLR → NAG → DEL,4h 45m,1,No info,13302,1,3,2019,21,35,16,50


For 'Route'


In [None]:
df.drop('Route', axis=1, inplace=True)

In [None]:
df.head(2)

Unnamed: 0,Airline,Source,Destination,Duration,Total_Stops,Additional_Info,Price,Day,Month,Year,Arrival_Hour,Arrival_Minute,Dep_hour,Dep_Min
0,IndiGo,Banglore,New Delhi,2h 50m,0,No info,3897,24,3,2019,1,10,22,20
1,Air India,Kolkata,Banglore,7h 25m,2,No info,7662,1,5,2019,13,15,5,50


For 'Duration'

In [None]:
df['Duration'].isnull().sum()

np.int64(0)

In [None]:
def extract_hours(duration):
    ## Use pd.isna(): a comparison such as `x == np.nan` is always False.
    if pd.isna(duration):
        return 0
    elif 'h' in duration:
        return int(duration.split('h')[0].strip())
    else:
        return 0


In [None]:

def extract_min(duration):
    ## Use pd.isna(): a comparison such as `x == np.nan` is always False.
    if pd.isna(duration):
        return 0
    elif 'h' in duration and 'm' in duration:
        return int(duration.split('h')[1].replace('m', '').strip())
    elif 'm' in duration:
        return int(duration.replace('m', '').strip())
    else:
        return 0

In [None]:
df['Duration_Hour'] = df['Duration'].apply(extract_hours)
df['Duration_Min'] = df['Duration'].apply(extract_min)
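
The two helper functions can also be expressed as vectorized string operations. Below is a minimal sketch, assuming durations only take the forms `'2h 50m'`, `'19h'`, or `'5m'`; the `duration` Series is a made-up sample, and `duration_total` is an optional extra feature that the notebook itself does not create.

```python
import pandas as pd

# Hypothetical sample covering the three shapes a duration string can take.
duration = pd.Series(['2h 50m', '19h', '5m'])

# Pull out the number before 'h' and before 'm'; a missing part becomes NaN.
hours = pd.to_numeric(duration.str.extract(r'(\d+)\s*h', expand=False))
minutes = pd.to_numeric(duration.str.extract(r'(\d+)\s*m', expand=False))

# Treat a missing part as zero, matching the helper functions above.
duration_hour = hours.fillna(0).astype(int)
duration_min = minutes.fillna(0).astype(int)

# Optional single feature: total duration in minutes.
duration_total = duration_hour * 60 + duration_min

print(duration_hour.tolist(), duration_min.tolist(), duration_total.tolist())
```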

In [None]:
df['Duration_Hour'].head(2)

Unnamed: 0,Duration_Hour
0,2
1,7


In [None]:
df['Duration_Min'].head(2)

Unnamed: 0,Duration_Min
0,50
1,25


In [None]:
df.drop('Duration', axis=1, inplace=True)

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10683 entries, 0 to 10682
Data columns (total 15 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Airline          10683 non-null  object
 1   Source           10683 non-null  object
 2   Destination      10683 non-null  object
 3   Total_Stops      10683 non-null  int64 
 4   Additional_Info  10683 non-null  object
 5   Price            10683 non-null  int64 
 6   Day              10683 non-null  int64 
 7   Month            10683 non-null  int64 
 8   Year             10683 non-null  int64 
 9   Arrival_Hour     10683 non-null  int64 
 10  Arrival_Minute   10683 non-null  int64 
 11  Dep_hour         10683 non-null  int64 
 12  Dep_Min          10683 non-null  int64 
 13  Duration_Hour    10683 non-null  int64 
 14  Duration_Min     10683 non-null  int64 
dtypes: int64(11), object(4)
memory usage: 1.2+ MB


For 'Airline', 'Destination', 'Source'

In [None]:
from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder()

In [None]:
encoded = encoder.fit_transform(df[['Airline', 'Source', 'Destination']]).toarray()

In [None]:
print(encoded)

[[0. 0. 0. ... 0. 0. 1.]
 [0. 1. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 1.]
 [0. 1. 0. ... 0. 0. 0.]]


In [None]:
encoded_df = pd.DataFrame(encoded, columns=encoder.get_feature_names_out())

In [None]:
print(encoded_df)

       Airline_Air Asia  Airline_Air India  Airline_GoAir  Airline_IndiGo  \
0                   0.0                0.0            0.0             1.0   
1                   0.0                1.0            0.0             0.0   
2                   0.0                0.0            0.0             0.0   
3                   0.0                0.0            0.0             1.0   
4                   0.0                0.0            0.0             1.0   
...                 ...                ...            ...             ...   
10678               1.0                0.0            0.0             0.0   
10679               0.0                1.0            0.0             0.0   
10680               0.0                0.0            0.0             0.0   
10681               0.0                0.0            0.0             0.0   
10682               0.0                1.0            0.0             0.0   

       Airline_Jet Airways  Airline_Jet Airways Business  \
0              

In [None]:
df = pd.concat([df, encoded_df], axis=1)
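
For quick exploration, pandas offers `pd.get_dummies`, which encodes and drops the original columns in one call. This is a swapped-in alternative shown as a sketch, not the method this notebook follows; the two-row `sample` frame is made up from the rows printed earlier.

```python
import pandas as pd

# Hypothetical two-row sample with the categorical columns encoded above.
sample = pd.DataFrame({
    'Airline': ['IndiGo', 'Air India'],
    'Source': ['Banglore', 'Kolkata'],
    'Destination': ['New Delhi', 'Banglore'],
    'Price': [3897, 7662],
})

# Encode the listed columns, drop the originals, and keep 0/1 integers.
encoded_sample = pd.get_dummies(
    sample, columns=['Airline', 'Source', 'Destination'], dtype=int
)

print(encoded_sample.columns.tolist())
```

One trade-off: a fitted `OneHotEncoder` remembers its categories and can apply the same encoding to new data, while `get_dummies` only reflects the values present in the frame it is given.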

In [None]:
print(df.head(2))

     Airline    Source Destination  Total_Stops Additional_Info  Price  Day  \
0     IndiGo  Banglore   New Delhi            0         No info   3897   24   
1  Air India   Kolkata    Banglore            2         No info   7662    1   

   Month  Year  Arrival_Hour  ...  Source_Chennai  Source_Delhi  \
0      3  2019             1  ...             0.0           0.0   
1      5  2019            13  ...             0.0           0.0   

   Source_Kolkata  Source_Mumbai  Destination_Banglore  Destination_Cochin  \
0             0.0            0.0                   0.0                 0.0   
1             1.0            0.0                   1.0                 0.0   

   Destination_Delhi  Destination_Hyderabad  Destination_Kolkata  \
0                0.0                    0.0                  0.0   
1                0.0                    0.0                  0.0   

   Destination_New Delhi  
0                    1.0  
1                    0.0  

[2 rows x 38 columns]


In [None]:
df.drop('Source', axis=1, inplace=True)

In [None]:
df.drop('Destination', axis=1, inplace=True)


In [None]:
df.drop('Airline', axis=1, inplace=True)


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10683 entries, 0 to 10682
Data columns (total 35 columns):
 #   Column                                     Non-Null Count  Dtype  
---  ------                                     --------------  -----  
 0   Total_Stops                                10683 non-null  int64  
 1   Additional_Info                            10683 non-null  object 
 2   Price                                      10683 non-null  int64  
 3   Day                                        10683 non-null  int64  
 4   Month                                      10683 non-null  int64  
 5   Year                                       10683 non-null  int64  
 6   Arrival_Hour                               10683 non-null  int64  
 7   Arrival_Minute                             10683 non-null  int64  
 8   Dep_hour                                   10683 non-null  int64  
 9   Dep_Min                                    10683 non-null  int64  
 10  Duration_Hour         

For 'Additional_Info'


In [None]:
df['Additional_Info'].unique()

array(['No info', 'In-flight meal not included',
       'No check-in baggage included', '1 Short layover', 'No Info',
       '1 Long layover', 'Change airports', 'Business class',
       'Red-eye flight', '2 Long layover'], dtype=object)
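
The output lists both 'No info' and 'No Info', which differ only in capitalisation and will be one-hot encoded as two separate columns. If they are meant to be the same label (an assumption; the notebook keeps them separate), they could be merged before encoding. A minimal sketch on a made-up `info` Series:

```python
import pandas as pd

# Hypothetical sample containing both spellings seen in the unique() output.
info = pd.Series(['No info', 'No Info', 'Red-eye flight', 'No info'])

# Fold the capitalised variant into the more common spelling.
info_clean = info.replace('No Info', 'No info')

print(info.nunique(), info_clean.nunique())
```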

In [None]:
## Use one-hot encoding for this column as well

encoded_info = encoder.fit_transform(df[['Additional_Info']]).toarray()

In [None]:
print(encoded_info)

[[0. 0. 0. ... 0. 1. 0.]
 [0. 0. 0. ... 0. 1. 0.]
 [0. 0. 0. ... 0. 1. 0.]
 ...
 [0. 0. 0. ... 0. 1. 0.]
 [0. 0. 0. ... 0. 1. 0.]
 [0. 0. 0. ... 0. 1. 0.]]


In [None]:
df_Info = pd.DataFrame(encoded_info, columns=encoder.get_feature_names_out())

In [None]:
df = pd.concat([df_Info, df], axis=1)

In [None]:
df.drop('Additional_Info', axis=1, inplace=True)

In [None]:
df.head(2)

Unnamed: 0,Additional_Info_1 Long layover,Additional_Info_1 Short layover,Additional_Info_2 Long layover,Additional_Info_Business class,Additional_Info_Change airports,Additional_Info_In-flight meal not included,Additional_Info_No Info,Additional_Info_No check-in baggage included,Additional_Info_No info,Additional_Info_Red-eye flight,...,Source_Chennai,Source_Delhi,Source_Kolkata,Source_Mumbai,Destination_Banglore,Destination_Cochin,Destination_Delhi,Destination_Hyderabad,Destination_Kolkata,Destination_New Delhi
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0


In [None]:
df.columns

Index(['Additional_Info_1 Long layover', 'Additional_Info_1 Short layover',
       'Additional_Info_2 Long layover', 'Additional_Info_Business class',
       'Additional_Info_Change airports',
       'Additional_Info_In-flight meal not included',
       'Additional_Info_No Info',
       'Additional_Info_No check-in baggage included',
       'Additional_Info_No info', 'Additional_Info_Red-eye flight',
       'Total_Stops', 'Price', 'Day', 'Month', 'Year', 'Arrival_Hour',
       'Arrival_Minute', 'Dep_hour', 'Dep_Min', 'Duration_Hour',
       'Duration_Min', 'Airline_Air Asia', 'Airline_Air India',
       'Airline_GoAir', 'Airline_IndiGo', 'Airline_Jet Airways',
       'Airline_Jet Airways Business', 'Airline_Multiple carriers',
       'Airline_Multiple carriers Premium economy', 'Airline_SpiceJet',
       'Airline_Trujet', 'Airline_Vistara', 'Airline_Vistara Premium economy',
       'Source_Banglore', 'Source_Chennai', 'Source_Delhi', 'Source_Kolkata',
       'Source_Mumbai', 'Destinati

In [None]:
df.shape

(10683, 44)

In [None]:
df.isnull().sum()

Unnamed: 0,0
Additional_Info_1 Long layover,0
Additional_Info_1 Short layover,0
Additional_Info_2 Long layover,0
Additional_Info_Business class,0
Additional_Info_Change airports,0
Additional_Info_In-flight meal not included,0
Additional_Info_No Info,0
Additional_Info_No check-in baggage included,0
Additional_Info_No info,0
Additional_Info_Red-eye flight,0


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10683 entries, 0 to 10682
Data columns (total 44 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   Additional_Info_1 Long layover                10683 non-null  float64
 1   Additional_Info_1 Short layover               10683 non-null  float64
 2   Additional_Info_2 Long layover                10683 non-null  float64
 3   Additional_Info_Business class                10683 non-null  float64
 4   Additional_Info_Change airports               10683 non-null  float64
 5   Additional_Info_In-flight meal not included   10683 non-null  float64
 6   Additional_Info_No Info                       10683 non-null  float64
 7   Additional_Info_No check-in baggage included  10683 non-null  float64
 8   Additional_Info_No info                       10683 non-null  float64
 9   Additional_Info_Red-eye flight                10683 non-null 