# Data Description
The dataset contains flight booking options from the website Easemytrip for travel between India's top 6 metro cities. The cleaned dataset has 300153 datapoints and 11 features.

## Features
* Airline: The name of the airline company is stored in the airline column. It is a categorical feature having 6 different airlines.
* Flight: Flight stores information regarding the plane's flight code. It is a categorical feature.
* Source City: City from which the flight takes off. It is a categorical feature having 6 unique cities.
* Departure Time: This is a derived categorical feature created by grouping time periods into bins. It stores information about the departure time and has 6 unique time labels.
* Stops: A categorical feature with 3 distinct values that stores the number of stops between the source and destination cities.
* Arrival Time: This is a derived categorical feature created by grouping time intervals into bins. It has six distinct time labels and keeps information about the arrival time.
* Destination City: City where the flight will land. It is a categorical feature having 6 unique cities.
* Class: A categorical feature that contains information on seat class; it has two distinct values: Business and Economy.
* Duration: A continuous feature that displays the overall amount of time it takes to travel between cities in hours.
* Days Left: This is a derived feature calculated by subtracting the booking date from the trip date.
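
The two derived features above (binned departure time and days left) can be illustrated with a minimal sketch. The raw column names (`booking_date`, `trip_date`, `departure_hour`) and the four-hour bin edges are assumptions made for illustration only; the dataset description does not state how the original bins were defined.

```python
import pandas as pd

# Hypothetical raw records; the column names and bin edges below are
# illustrative assumptions, not taken from the dataset's documentation.
raw = pd.DataFrame({
    "booking_date": pd.to_datetime(["2022-02-10", "2022-02-10", "2022-02-10"]),
    "trip_date": pd.to_datetime(["2022-02-11", "2022-02-20", "2022-03-31"]),
    "departure_hour": [5, 13, 22],
})

# Days Left: trip date minus booking date, expressed in whole days.
raw["days_left"] = (raw["trip_date"] - raw["booking_date"]).dt.days

# Departure Time: group the hour of day into six labelled bins.
# right=False makes each bin include its left edge, e.g. [4, 8).
hour_edges = [0, 4, 8, 12, 16, 20, 24]
hour_labels = ["Late_Night", "Early_Morning", "Morning",
               "Afternoon", "Evening", "Night"]
raw["departure_time"] = pd.cut(
    raw["departure_hour"], bins=hour_edges, labels=hour_labels, right=False
)

print(raw[["days_left", "departure_time"]])
```

The same `pd.cut` call would apply to arrival time, since both features share the same six labels.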

### Target Column
* Price: The target variable, which stores the ticket price.

Source of Dataset: https://www.kaggle.com/datasets/shubhambathwal/flight-price-prediction

# Import Libraries

In [2]:
import pandas as pd
import numpy as np

# Data Ingestion

In [8]:
df = pd.read_csv("data/Flight_Price.csv")
df.head()

,Unnamed: 0,airline,flight,source_city,departure_time,stops,arrival_time,destination_city,class,duration,days_left,price
0,0,SpiceJet,SG-8709,Delhi,Evening,zero,Night,Mumbai,Economy,2.17,1,5953
1,1,SpiceJet,SG-8157,Delhi,Early_Morning,zero,Morning,Mumbai,Economy,2.33,1,5953
2,2,AirAsia,I5-764,Delhi,Early_Morning,zero,Early_Morning,Mumbai,Economy,2.17,1,5956
3,3,Vistara,UK-995,Delhi,Morning,zero,Afternoon,Mumbai,Economy,2.25,1,5955
4,4,Vistara,UK-963,Delhi,Morning,zero,Morning,Mumbai,Economy,2.33,1,5955


### Basic Sanity Check

In [6]:
# Get the shape of df as (rows, columns)
df.shape

(300153, 12)
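
The shape reports 12 columns, one more than the 11 described in the data description. The extra column appears to be `Unnamed: 0`, which looks like a row index that was written out when the CSV was saved. The sketch below uses a small toy frame standing in for `df` (only the column names mirror the notebook) and shows one way to confirm that such a column merely repeats the index before dropping it.

```python
import pandas as pd

# Toy stand-in for df; only the column names mirror the notebook.
toy = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "airline": ["SpiceJet", "AirAsia", "Vistara"],
    "price": [5953, 5956, 5955],
})

# Drop the column only if it merely repeats the default row index.
if (toy["Unnamed: 0"].to_numpy() == toy.index.to_numpy()).all():
    toy = toy.drop(columns="Unnamed: 0")

print(toy.shape)
```

If the column does turn out to be a saved index, passing `index_col=0` to `pd.read_csv` is an alternative that avoids creating it in the first place.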

In [10]:
# Check if there are any null values
df.isnull().sum()

# There are no null values in the dataset

Unnamed: 0          0
airline             0
flight              0
source_city         0
departure_time      0
stops               0
arrival_time        0
destination_city    0
class               0
duration            0
days_left           0
price               0
dtype: int64

In [18]:
# Separate numerical and categorical columns
categorical_col = df.columns[df.dtypes == 'object']
print("categorical Columns: ",categorical_col)

numerical_col = df.columns[df.dtypes != 'object']
print("Numerical Columns: ",numerical_col)

# Check the unique values of each categorical column
for i in categorical_col:
    print("\nColumn Name: ", i)
    print(df[i].unique())

categorical Columns:  Index(['airline', 'flight', 'source_city', 'departure_time', 'stops',
       'arrival_time', 'destination_city', 'class'],
      dtype='object')
Numerical Columns:  Index(['Unnamed: 0', 'duration', 'days_left', 'price'], dtype='object')

Column Name:  airline
['SpiceJet' 'AirAsia' 'Vistara' 'GO_FIRST' 'Indigo' 'Air_India']

Column Name:  flight
['SG-8709' 'SG-8157' 'I5-764' ... '6E-7127' '6E-7259' 'AI-433']

Column Name:  source_city
['Delhi' 'Mumbai' 'Bangalore' 'Kolkata' 'Hyderabad' 'Chennai']

Column Name:  departure_time
['Evening' 'Early_Morning' 'Morning' 'Afternoon' 'Night' 'Late_Night']

Column Name:  stops
['zero' 'one' 'two_or_more']

Column Name:  arrival_time
['Night' 'Morning' 'Early_Morning' 'Afternoon' 'Evening' 'Late_Night']

Column Name:  destination_city
['Mumbai' 'Bangalore' 'Kolkata' 'Hyderabad' 'Chennai' 'Delhi']

Column Name:  class
['Economy' 'Business']
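
Printing every unique value works well for low-cardinality columns, but the `flight` output above is truncated because it has many distinct codes. A compact alternative, sketched here on a toy frame rather than the real `df`, is to split columns with `select_dtypes` and summarize each categorical column by its number of unique values.

```python
import pandas as pd

# Toy stand-in for df with a mix of text and numeric columns.
toy = pd.DataFrame({
    "airline": ["SpiceJet", "AirAsia", "Vistara", "Vistara"],
    "stops": ["zero", "zero", "one", "zero"],
    "duration": [2.17, 2.33, 2.25, 2.33],
    "price": [5953, 5956, 5955, 5955],
})

# Numeric columns are selected directly; everything else is treated
# as categorical for the purpose of this summary.
numerical_col = toy.select_dtypes(include="number").columns
categorical_col = toy.select_dtypes(exclude="number").columns

# Count distinct values per categorical column, largest first.
cardinality = toy[categorical_col].nunique().sort_values(ascending=False)
print(cardinality)
```

Columns with very high cardinality, as `flight` appears to be, usually need different handling from columns with a handful of labels.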
