This project aims to analyze a flight booking dataset obtained from the “Ease My Trip” website to derive meaningful insights and predict flight prices using various statistical and machine learning techniques.
The dataset contains information about flight booking options for travel between India's top 6 metro cities. It includes 300,261 data points and 11 features. The features are as follows:
- Airline: The name of the airline company.
- Flight: Flight code.
- Source City: City from which the flight takes off.
- Departure Time: Time of departure, grouped into bins.
- Stops: Number of stops between the source and destination cities.
- Arrival Time: Time of arrival, grouped into bins.
- Destination City: City where the flight will land.
- Class: Seat class, either Business or Economy.
- Duration: Total travel time between cities in hours.
- Days Left: Number of days between the booking date and the journey date.
- Price: Target variable, the price of the ticket.
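Here is a minimal sketch of loading the data with pandas and checking it against the feature list above. The snake_case column names below, and the unnamed index column that gets dropped, are assumptions about the Kaggle Clean_Dataset.csv file, so compare them with the header of your download. The helper name load_flights is hypothetical.

```python
import pandas as pd

# Assumed column names in Clean_Dataset.csv (verify against your download).
EXPECTED_COLUMNS = [
    "airline", "flight", "source_city", "departure_time", "stops",
    "arrival_time", "destination_city", "class", "duration",
    "days_left", "price",
]


def load_flights(path_or_buffer) -> pd.DataFrame:
    """Read the CSV, drop a leftover index column, and check the 11 features."""
    df = pd.read_csv(path_or_buffer)
    # CSVs saved together with a pandas index usually carry an "Unnamed: 0" column.
    df = df.loc[:, ~df.columns.str.startswith("Unnamed")]
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"Missing expected columns: {sorted(missing)}")
    return df[EXPECTED_COLUMNS]
```

For example, `df = load_flights("Clean_Dataset.csv")` followed by `df.shape` and `df.info()` shows the size and dtypes of the loaded data.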
The repository contains the following files:
- flight_price_prediction.ipynb: Jupyter Notebook containing the data exploration, preprocessing, EDA, and model building steps.
- Clean_Dataset.csv: The dataset used for analysis and model building. Download it from Kaggle: https://www.kaggle.com/datasets/shubhambathwal/flight-price-prediction/data
- README.md: This readme file.
To get started with this project, you will need:
- Python 3.7 or above
- Jupyter Notebook or JupyterLab
- The following Python packages:
  - pandas
  - numpy
  - seaborn
  - matplotlib
  - plotly
  - scikit-learn
  - statsmodels
Once these are in place, follow these steps:
- Clone the repository:
  git clone https://github.com/IamNanduni/flight_price_prediction_mode-.git
  cd flight_price_prediction_mode-
- Install the required packages:
  pip install pandas numpy seaborn matplotlib plotly scikit-learn statsmodels
- Open the Jupyter Notebook:
  jupyter notebook flight_price_prediction.ipynb
- Follow the steps in the notebook to:
  - Load and explore the dataset.
  - Preprocess the data.
  - Perform exploratory data analysis (EDA).
  - Build and evaluate regression models to predict flight prices.
  - Check assumptions and improve the models.
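The preprocessing step could look something like the sketch below, which turns the categorical features into numbers that a linear model can use. It assumes the stops column uses the labels zero, one and two_or_more, and it drops the flight code as a near-unique identifier. Both are illustrative choices, not necessarily what the notebook does, and the function name preprocess is hypothetical.

```python
from typing import Tuple

import pandas as pd

# Assumed spellings of the "stops" categories in Clean_Dataset.csv.
STOPS_MAP = {"zero": 0, "one": 1, "two_or_more": 2}

CATEGORICAL = [
    "airline", "source_city", "departure_time",
    "arrival_time", "destination_city", "class",
]


def preprocess(df: pd.DataFrame) -> Tuple[pd.DataFrame, pd.Series]:
    """Return a numeric feature matrix X and the target y (price)."""
    # Flight code is close to a unique ID, so it is dropped rather than encoded.
    out = df.drop(columns=["flight"])
    # "stops" is ordinal, so map it to integers instead of one-hot encoding it.
    out["stops"] = out["stops"].map(STOPS_MAP)
    if out["stops"].isna().any():
        raise ValueError("Unexpected value in 'stops'")
    # One-hot encode; drop_first avoids perfect collinearity with the intercept.
    X = pd.get_dummies(out.drop(columns=["price"]), columns=CATEGORICAL,
                       drop_first=True, dtype=float)
    y = out["price"].astype(float)
    return X, y
```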
In this project, we performed EDA to understand the distribution of data and relationships between features. Some of the visualizations include:
- Distribution of airlines and their average pricing.
- Class distribution within each airline.
- Price trends based on departure time and arrival time.
- Analysis of business and economy class flights.
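The tables behind these plots can be computed with plain pandas. This sketch assumes the column names airline, class, departure_time, arrival_time and price from Clean_Dataset.csv; the function names are hypothetical, and the notebook may aggregate differently.

```python
import pandas as pd


def airline_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Flight count and mean price per airline, most expensive first."""
    return (df.groupby("airline")["price"]
              .agg(flights="count", mean_price="mean")
              .sort_values("mean_price", ascending=False))


def class_share_by_airline(df: pd.DataFrame) -> pd.DataFrame:
    """Share of Business vs Economy flights within each airline (rows sum to 1)."""
    return pd.crosstab(df["airline"], df["class"], normalize="index")


def mean_price_by_time(df: pd.DataFrame) -> pd.DataFrame:
    """Mean price for each departure-time bin (rows) and arrival-time bin (columns)."""
    return df.pivot_table(index="departure_time", columns="arrival_time",
                          values="price", aggfunc="mean")
```

The results can be passed straight to seaborn or plotly, for example `sns.heatmap(mean_price_by_time(df))` for the departure/arrival price grid.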
We built linear regression models with scikit-learn's LinearRegression and with statsmodels to predict flight prices. We also checked the regression assumptions, such as normality of the residuals and equal variance (homoscedasticity), to make sure the model is reliable. Additionally, we explored transforming the target variable to improve model performance.
The models were evaluated using metrics like R-squared and Root Mean Square Error (RMSE). The final model was selected based on the assumption checks and performance metrics.
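R-squared and RMSE on a held-out split can be computed with scikit-learn as in this sketch. The 80/20 split, the seed, and the name evaluate_linear_model are illustrative assumptions. When the model is trained on log(price), its predictions are exponentiated so that RMSE stays in price units and remains comparable across models.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def evaluate_linear_model(X: pd.DataFrame, y: pd.Series, log_target: bool = False,
                          test_size: float = 0.2, seed: int = 42) -> dict:
    """Hold-out R^2 and RMSE for a LinearRegression, reported on the price scale."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed)
    model = LinearRegression()
    model.fit(X_train, np.log(y_train) if log_target else y_train)
    pred = model.predict(X_test)
    if log_target:
        pred = np.exp(pred)  # back-transform so RMSE is in price units
    # RMSE = sqrt(mean((y - y_hat)^2)); the square root is taken explicitly
    # because the squared=False option was removed in newer scikit-learn.
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    return {"r2": float(r2_score(y_test, pred)), "rmse": rmse}
```

Exponentiating log-scale predictions estimates something closer to the median than the mean price, so a retransformation correction (such as Duan's smearing estimator) may be worth considering when comparing RMSE across the raw and log models.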
If you have suggestions or improvements, feel free to create a pull request or open an issue.
This project is licensed under the MIT License. See the LICENSE file for details.