
Predicting flight ticket prices using a random forest regression model based on scraped data from Kayak. A Kayak scraper is also provided.


MeshalAlamr/flight-price-prediction


flight-price-prediction

SDAIA Bootcamp project 2 - web scraping/linear regression.

This project aims to predict ticket prices for upcoming flights, helping customers choose the best time to travel and the cheapest flight to their destination. A random forest regression model forecasts flight prices from data scraped from Kayak.


Project Proposal

The project proposal can be found here.

Project MVP

The project MVP can be found here.

Scraping

The Kayak Scraper Notebook can be found here.

Here's a demo of the scraper in action (played at 2x speed):

scraper (1)

The scraped data can be found here.

image

In total, the data consists of 55,363 rows and 7 columns.

Analysis and Results

The project notebook can be found here.

Selected features are:

  • Source (4 Sources were selected for this project)
  • Destination (4 Destinations were selected for this project)
  • Total Stops
  • Average Price per Airline
  • Duration
  • Price (Target)
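As an illustration of how a feature like "Average Price per Airline" could be derived, here is a minimal sketch in plain Python. The field names (`airline`, `price`, `avg_price_airline`) and the sample rows are assumptions made for this example; they are not taken from the scraped dataset or the project notebook.

```python
from collections import defaultdict

# Hypothetical rows; field names and values are illustrative only.
flights = [
    {"airline": "A", "price": 100.0},
    {"airline": "A", "price": 300.0},
    {"airline": "B", "price": 250.0},
]

def add_average_price_per_airline(rows):
    """Return copies of the rows with an 'avg_price_airline' field added."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    # First pass: accumulate the price sum and row count per airline.
    for row in rows:
        totals[row["airline"]] += row["price"]
        counts[row["airline"]] += 1
    # Second pass: attach each airline's mean price to its rows.
    return [
        {**row, "avg_price_airline": totals[row["airline"]] / counts[row["airline"]]}
        for row in rows
    ]

enriched = add_average_price_per_airline(flights)
```

Because this feature is built from the target itself, a common precaution is to compute the averages on the training split only and then map them onto the test rows, so that test prices do not leak into the feature.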

Correlation of features:

image

Experimenting with different models:

image
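A comparison like this can be sketched with scikit-learn as shown below. This is not the code from the project notebook: the data is synthetic, the three columns only loosely mimic the selected features, and the candidate models and hyperparameters are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data (not the scraped Kayak data).
n = 500
stops = rng.integers(0, 3, size=n)                 # total stops: 0, 1, or 2
duration = rng.uniform(1, 15, size=n)              # duration in hours
avg_airline_price = rng.uniform(100, 600, size=n)  # average price per airline
X = np.column_stack([stops, duration, avg_airline_price])

# Made-up nonlinear price relationship plus noise.
y = avg_airline_price * (1 + 0.3 * stops) + 5 * duration**1.5 + rng.normal(0, 20, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Candidate models; the choice and settings are illustrative assumptions.
models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
mae_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mae_by_model[name] = mean_absolute_error(y_test, model.predict(X_test))
```

Scoring every candidate on the same held-out split keeps the comparison fair; which model wins depends on the data, so the sketch makes no claim about the ranking.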

The final selected model is the random forest regression model with:

| Metric | Score |
| --- | --- |
| MAE | 61.87 |
| MSE | 40409.87 |
| RMSE | 201.02 |

Therefore, the final model's predictions deviate from the actual ticket price by about $61.87 on average (the MAE).
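For readers unfamiliar with these metrics, the following minimal sketch shows how MAE, MSE, and RMSE are computed from a list of predictions. The sample prices are made up for the example, and the helper name `regression_errors` is hypothetical; only the final consistency check uses a number from the scores reported here.

```python
import math

def regression_errors(y_true, y_pred):
    """Compute MAE, MSE, and RMSE for paired actual and predicted values."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}

# Made-up prices: three close predictions and one large miss.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 300.0, 1000.0]
scores = regression_errors(actual, predicted)

# Consistency check on the reported scores: RMSE is the square root of MSE.
reported_rmse = math.sqrt(40409.87)  # approximately 201.02
```

Because squaring weights large errors more heavily, a single big miss pulls RMSE well above MAE, as in the sample data. The same reading applies to the reported scores: an RMSE of 201.02 against an MAE of 61.87 suggests that most predictions are close while a smaller number are far off.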

The final model can be found here.

image

Presentation

The presentation can be found here.

Mobile App

We've also developed an Android app that shows the average estimated price for a selected route and month, based on our scraped data.
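The lookup the app performs can be sketched as a group-and-average over the scraped rows. This is an illustration in Python rather than the app's actual code, and the field names, the placeholder route codes (`AAA`, `BBB`, `CCC`), and the prices are all assumptions.

```python
from collections import defaultdict
from datetime import date

# Hypothetical scraped rows; route codes and prices are placeholders.
rows = [
    {"source": "AAA", "destination": "BBB", "date": date(2022, 3, 1), "price": 900.0},
    {"source": "AAA", "destination": "BBB", "date": date(2022, 3, 15), "price": 1000.0},
    {"source": "AAA", "destination": "BBB", "date": date(2022, 4, 2), "price": 700.0},
    {"source": "CCC", "destination": "BBB", "date": date(2022, 3, 9), "price": 400.0},
]

def average_price_by_route_month(rows):
    """Map (source, destination, month) to the mean price of matching rows."""
    grouped = defaultdict(list)
    for row in rows:
        key = (row["source"], row["destination"], row["date"].month)
        grouped[key].append(row["price"])
    return {key: sum(prices) / len(prices) for key, prices in grouped.items()}

averages = average_price_by_route_month(rows)

# Look up one route and month; None signals that no data was scraped for it.
march_estimate = averages.get(("AAA", "BBB", 3))
```

Precomputing the averages once keeps each lookup in the app to a single dictionary access, and returning `None` for a missing key lets the caller distinguish "no data" from a real price.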

image image

Here's a demo of the mobile app:

flight-pred-app

Authors