# ✈️ Airlines Flights Dataset Analysis
## Flight Booking Data for Various Cities in India

---

### 📊 Dataset Overview

The **Flights Booking Dataset** contains data scraped from a popular travel booking website, structured to give detailed information on flights between cities in India. It is well suited to analysts working in the **Airlines and Travel domain** and can reveal pricing patterns, airline operations, and travel trends.

**Data Format:** CSV  
**Analysis Tool:** Pandas DataFrame  
**Domain:** Airlines & Travel Industry

---

## 🎯 Research Questions

This project addresses the following analytical questions:

1. **Q.1** - Which airlines appear in the dataset, and how frequently does each one appear?

2. **Q.2** - Show bar graphs of the departure time and arrival time.

3. **Q.3** - Show bar graphs of the source city and destination city.

4. **Q.4** - Does the price vary across airlines?

5. **Q.5** - Does the ticket price change with the departure time and arrival time?

6. **Q.6** - How does the price change with the source and destination cities?

7. **Q.7** - How is the price affected when tickets are bought just 1 or 2 days before departure?

8. **Q.8** - How does the ticket price differ between Economy and Business class?

9. **Q.9** - What is the average price of a Vistara Business class flight from Delhi to Hyderabad?
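
Several of these questions reduce to one or two pandas operations. The sketch below is illustrative, not the notebook's own solution: it uses the five rows from the `df.head()` output as stand-in data, answers Q.1 with `value_counts()`, Q.4 with a `groupby` mean, and Q.7 with a boolean filter on `days_left`. The helper `average_price` is hypothetical and shows the filter-then-mean pattern that Q.9 needs.

```python
import pandas as pd

# Stand-in data: the first five rows of the dataset (from df.head()),
# restricted to the columns these questions need.
sample = pd.DataFrame({
    "airline": ["SpiceJet", "SpiceJet", "AirAsia", "Vistara", "Vistara"],
    "source_city": ["Delhi"] * 5,
    "destination_city": ["Mumbai"] * 5,
    "class": ["Economy"] * 5,
    "days_left": [1, 1, 1, 1, 1],
    "price": [5953, 5953, 5956, 5955, 5955],
})

# Q.1: each airline and how many rows it has
airline_counts = sample["airline"].value_counts()

# Q.4: mean ticket price per airline
mean_price_by_airline = sample.groupby("airline")["price"].mean()

# Q.7: mean price of tickets bought 1 or 2 days before departure
last_minute_mean = sample.loc[sample["days_left"].isin([1, 2]), "price"].mean()


def average_price(df, airline, source, destination, travel_class):
    """Hypothetical helper: mean price for one airline, route, and seat class.

    Returns NaN when no rows match the filter.
    """
    mask = (
        (df["airline"] == airline)
        & (df["source_city"] == source)
        & (df["destination_city"] == destination)
        & (df["class"] == travel_class)
    )
    return df.loc[mask, "price"].mean()


print(airline_counts)
print(mean_price_by_airline)
print(last_minute_mean)
# On the full DataFrame, Q.9 would be:
# average_price(df, "Vistara", "Delhi", "Hyderabad", "Business")
print(average_price(sample, "Vistara", "Delhi", "Mumbai", "Economy"))
```

On the five-row slice only the Delhi to Mumbai Economy route exists, so the Q.9 call itself only makes sense against the full dataset.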

---

## 📋 Dataset Features

### Categorical Features:

| Feature | Description | Unique Values |
|---------|-------------|---------------|
| **Airline** | Name of the airline company | 6 different airlines |
| **Flight** | Plane's flight code | Multiple unique codes |
| **Source City** | City from which the flight takes off | 6 unique cities |
| **Departure Time** | Derived feature - Time periods grouped into bins | 6 unique time labels |
| **Stops** | Number of stops between source and destination | 3 distinct values |
| **Arrival Time** | Derived feature - Time intervals grouped into bins | 6 distinct time labels |
| **Destination City** | City where the flight will land | 6 unique cities |
| **Class** | Seat class information | 2 values (Business & Economy) |
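
Because every categorical column has only a handful of distinct values (except `flight`), storing them with pandas' `category` dtype is a common way to cut memory, and `nunique()` gives a quick check of the cardinalities stated in the table. A minimal sketch, assuming the column names shown in `df.info()`; `to_categorical` and `cardinality_mismatches` are hypothetical helpers, not part of the original notebook.

```python
import pandas as pd

# Categorical columns as named in the CSV (see the df.info() output)
CATEGORICAL_COLS = [
    "airline", "flight", "source_city", "departure_time",
    "stops", "arrival_time", "destination_city", "class",
]

# Cardinalities stated in the feature table; "flight" has many codes and is not checked
EXPECTED_UNIQUE = {
    "airline": 6, "source_city": 6, "departure_time": 6, "stops": 3,
    "arrival_time": 6, "destination_city": 6, "class": 2,
}


def to_categorical(df):
    """Return a copy with the categorical columns stored as 'category' dtype."""
    out = df.copy()
    for col in CATEGORICAL_COLS:
        if col in out.columns:
            out[col] = out[col].astype("category")
    return out


def cardinality_mismatches(df, expected=EXPECTED_UNIQUE):
    """Map each present column whose unique count differs from `expected` to (actual, expected)."""
    return {
        col: (df[col].nunique(), n)
        for col, n in expected.items()
        if col in df.columns and df[col].nunique() != n
    }


# Usage on a five-row slice, which naturally has fewer airlines and classes than the full data
sample = pd.DataFrame({
    "airline": ["SpiceJet", "SpiceJet", "AirAsia", "Vistara", "Vistara"],
    "class": ["Economy"] * 5,
})
cat_sample = to_categorical(sample)
print(cat_sample.dtypes)
print(cardinality_mismatches(cat_sample))
```

On the full DataFrame, an empty dictionary from `cardinality_mismatches(df)` would confirm the counts in the table.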

### Continuous Features:

| Feature | Description | Type |
|---------|-------------|------|
| **Duration** | Total travel time between cities (in hours) | Continuous |
| **Days Left** | Derived feature: trip date minus booking date (in days) | Continuous |
| **Price** | Ticket price | Continuous (**target variable**) |
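
The published CSV only contains the derived `days_left` column, not the underlying dates. As an illustration of how such a feature is typically computed, here is a sketch with hypothetical `booking_date` and `trip_date` columns and made-up dates; subtracting two datetime columns gives a timedelta, and `.dt.days` turns it into whole days.

```python
import pandas as pd

# Hypothetical raw columns and dates: the dataset does not ship booking or trip dates
bookings = pd.DataFrame({
    "booking_date": pd.to_datetime(["2022-02-10", "2022-02-10", "2022-02-10"]),
    "trip_date": pd.to_datetime(["2022-02-11", "2022-02-12", "2022-03-01"]),
})

# days_left = trip date minus booking date, as a whole number of days
bookings["days_left"] = (bookings["trip_date"] - bookings["booking_date"]).dt.days
print(bookings)
```

Because the result is an integer day count, it matches the `int64` dtype that `df.info()` reports for `days_left`.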

---

## 📝 Detailed Feature Descriptions

1. **Airline**: The name of the airline company is stored in the airline column. It is a categorical feature having 6 different airlines.

2. **Flight**: Flight stores information regarding the plane's flight code. It is a categorical feature.

3. **Source City**: City from which the flight takes off. It is a categorical feature having 6 unique cities.

4. **Departure Time**: This is a derived categorical feature created by grouping time periods into bins. It stores information about the departure time and has 6 unique time labels.

5. **Stops**: A categorical feature with 3 distinct values that stores the number of stops between the source and destination cities.

6. **Arrival Time**: This is a derived categorical feature created by grouping time intervals into bins. It has six distinct time labels and keeps information about the arrival time.

7. **Destination City**: City where the flight will land. It is a categorical feature having 6 unique cities.

8. **Class**: A categorical feature that contains information on seat class; it has two distinct values: Business and Economy.

9. **Duration**: A continuous feature that displays the overall amount of time it takes to travel between cities in hours.

10. **Days Left**: This is a derived feature, calculated by subtracting the booking date from the trip date.

11. **Price**: The target variable; it stores the ticket price.
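
Departure Time and Arrival Time are described as clock times grouped into bins. The dataset does not document the bin edges, so the following is only a sketch of the technique using `pandas.cut`: the four-hour edges are an assumption, and while five of the labels appear in the `df.head()` output, `"Late_Night"` is an assumed name for the sixth.

```python
import pandas as pd

# Assumed bin edges in hours (24-hour clock); the real edges are not documented
EDGES = [0, 4, 8, 12, 16, 20, 24]
# Five of these labels appear in df.head(); "Late_Night" is an assumed name for the sixth
LABELS = ["Late_Night", "Early_Morning", "Morning", "Afternoon", "Evening", "Night"]


def time_bucket(hours):
    """Map clock times given as fractional hours in [0, 24) to time-of-day labels."""
    # right=False makes every bin [lower, upper), so 4.0 falls in "Early_Morning"
    return pd.cut(pd.Series(hours), bins=EDGES, labels=LABELS, right=False)


print(time_bucket([0.5, 5.25, 9.0, 13.5, 18.0, 22.75]).tolist())
```

Using half-open bins keeps each boundary hour in exactly one label; a value of 24.0 falls outside the bins and becomes NaN.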

<div style="background: linear-gradient(135deg, #ff6b6b 0%, #ff8e53 50%, #ffa726 100%); padding: 30px 35px; border-radius: 15px; color: #ffffff; margin: 25px 0; box-shadow: 0 10px 25px rgba(255, 107, 107, 0.4);">
    <h3 style="color: #ffffff; margin: 0 0 15px 0; font-size: 1.7rem; font-weight: 700;">üë®‚Äçüíª Project Information</h3>
    <div style="line-height: 1.8;">
        <strong>Author:</strong> Sajjad Ali Shah<br>
        <strong>LinkedIn:</strong> <a href="https://www.linkedin.com/in/sajjad-ali-shah47/" target="_blank" style="color: #ffffff; text-decoration: underline;">Connect with me</a><br>
        <strong>Dataset:</strong> <a href="https://www.kaggle.com/datasets/rohitgrewal/airlines-flights-data/data" target="_blank" style="color: #ffffff; text-decoration: underline;">Airlines Flights Dataset on Kaggle</a>
    </div>
</div>

<div style="background: linear-gradient(135deg, #ff6b6b 0%, #ff8e53 50%, #ffa726 100%); padding: 30px 35px; border-radius: 15px; color: #ffffff; margin: 25px 0; box-shadow: 0 10px 25px rgba(255, 107, 107, 0.4);">
    <h3 style="color: #ffffff; margin: 0 0 15px 0; font-size: 1.7rem; font-weight: 700; text-align: center;">üìö Import Libraries</h3>
    
</div>

```python
# Import the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Statistical modelling library
import statsmodels.api as sm
import warnings
warnings.filterwarnings("ignore")
```


```python
# Load the data
df = pd.read_csv("Data/airlines_flights_data.csv")
```

```python
# Check the data shape, the column info, and the first 5 rows
print(df.shape)

print("=" * 50)

# df.info() prints its summary itself and returns None, so it is not wrapped in print()
df.info()

print("=" * 50)

df.head()
```

```text
(300153, 12)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300153 entries, 0 to 300152
Data columns (total 12 columns):
 #   Column            Non-Null Count   Dtype
---  ------            --------------   -----
 0   index             300153 non-null  int64
 1   airline           300153 non-null  object
 2   flight            300153 non-null  object
 3   source_city       300153 non-null  object
 4   departure_time    300153 non-null  object
 5   stops             300153 non-null  object
 6   arrival_time      300153 non-null  object
 7   destination_city  300153 non-null  object
 8   class             300153 non-null  object
 9   duration          300153 non-null  float64
 10  days_left         300153 non-null  int64
 11  price             300153 non-null  int64
dtypes: float64(1), int64(3), object(8)
memory usage: 27.5+ MB
```

| | index | airline | flight | source_city | departure_time | stops | arrival_time | destination_city | class | duration | days_left | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | SpiceJet | SG-8709 | Delhi | Evening | zero | Night | Mumbai | Economy | 2.17 | 1 | 5953 |
| 1 | 1 | SpiceJet | SG-8157 | Delhi | Early_Morning | zero | Morning | Mumbai | Economy | 2.33 | 1 | 5953 |
| 2 | 2 | AirAsia | I5-764 | Delhi | Early_Morning | zero | Early_Morning | Mumbai | Economy | 2.17 | 1 | 5956 |
| 3 | 3 | Vistara | UK-995 | Delhi | Morning | zero | Afternoon | Mumbai | Economy | 2.25 | 1 | 5955 |
| 4 | 4 | Vistara | UK-963 | Delhi | Morning | zero | Morning | Mumbai | Economy | 2.33 | 1 | 5955 |
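
In the first five rows, the `index` column repeats the row number. If that holds across the whole file, the column carries no information and can be dropped before analysis. A minimal sketch that checks the assumption before dropping, using a small stand-in DataFrame built from the `head()` rows:

```python
import pandas as pd

# Stand-in for the loaded DataFrame, with an "index" column that repeats the row number
df = pd.DataFrame({
    "index": [0, 1, 2, 3, 4],
    "airline": ["SpiceJet", "SpiceJet", "AirAsia", "Vistara", "Vistara"],
    "price": [5953, 5953, 5956, 5955, 5955],
})

# Drop the column only if it really is a 0..n-1 row counter
is_row_counter = df["index"].tolist() == list(range(len(df)))
if is_row_counter:
    df = df.drop(columns="index")

print(is_row_counter, list(df.columns))
```

An alternative is to pass `index_col="index"` to `pd.read_csv`, which uses the column as the row index instead of keeping it as a feature.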
