# Flight Fare Optimizer

## Overview

This model solves a minimum-cost flight itinerary optimization problem using linear programming. It selects a path from a source city to a destination city by chaining together valid flights while minimizing the total fare. The model ensures logical routing with flow conservation constraints and supports multiple intermediate connections.
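One common way to write such a model is the shortest-path linear program below. It is a sketch rather than the project's exact formulation, and all of the notation is assumed for illustration: $V$ is the set of cities, $F$ the set of candidate flights, $o(f)$ and $d(f)$ the origin and destination of flight $f$, $c_f$ its fare, $s$ and $t$ the source and destination cities, and $x_f$ the decision variable that equals 1 when flight $f$ is part of the itinerary.

```latex
\begin{aligned}
\min_{x} \quad & \sum_{f \in F} c_f \, x_f \\
\text{s.t.} \quad & \sum_{f \in F:\, o(f) = v} x_f \;-\; \sum_{f \in F:\, d(f) = v} x_f = b_v \quad \forall v \in V, \\
& b_s = 1, \quad b_t = -1, \quad b_v = 0 \ \text{for } v \in V \setminus \{s, t\}, \\
& 0 \le x_f \le 1 \quad \forall f \in F.
\end{aligned}
```

The equality constraints are the flow conservation conditions: one unit of flow leaves $s$, one unit arrives at $t$, and every intermediate city has as many selected arrivals as departures. For this pure network structure the linear program has an optimal solution in which every $x_f$ is 0 or 1, so no explicit integrality constraint is needed. Any additional rules on which connections count as valid (timing, for instance) would be extra constraints not shown here.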

## Data source
A copy of the [Airlines Flights Data](https://www.kaggle.com/datasets/rohitgrewal/airlines-flights-data/data) dataset from Kaggle was saved locally in this project's folder. The dataset is a single CSV file that includes the following attributes (columns), as described by its author:

1) Airline: The name of the airline company, stored in the airline column. It is a categorical feature with 6 different airlines.
2) Flight: The plane's flight code. It is a categorical feature.
3) Source City: The city from which the flight takes off. It is a categorical feature with 6 unique cities.
4) Departure Time: A derived categorical feature created by grouping time periods into bins. It stores the departure time and has 6 unique time labels.
5) Stops: A categorical feature with 3 distinct values that stores the number of stops between the source and destination cities.
6) Arrival Time: A derived categorical feature created by grouping time intervals into bins. It stores the arrival time and has 6 distinct time labels.
7) Destination City: The city where the flight lands. It is a categorical feature with 6 unique cities.
8) Class: A categorical feature that stores the seat class. It has two distinct values: Business and Economy.
9) Duration: A continuous feature that gives the total travel time between cities, in hours.
10) Days Left: A derived feature calculated by subtracting the booking date from the trip date.
11) Price: The target variable, which stores the ticket price.
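The Days Left attribute can be illustrated with a short sketch. The raw booking and trip dates are not part of the dataset, so the dates below are hypothetical, and the function name `days_left` is chosen here only for illustration.

```python
from datetime import date


def days_left(booking_date: date, trip_date: date) -> int:
    """Number of days between booking and departure (trip date minus booking date)."""
    return (trip_date - booking_date).days


# Hypothetical dates: a ticket booked one day before the trip.
print(days_left(date(2022, 2, 10), date(2022, 2, 11)))  # 1
```

A value of 1, as in the first rows of the dataset, would then mean the ticket was booked the day before departure.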

## Exploratory Data Analysis
The analysis starts by importing the dataset. First, however, it is important to check which Python environment the notebook is using, to make sure that Python is running inside the virtual environment.

In [2]:
import sys
sys.executable

'C:\\Users\\Andres\\flight-fare-optimizer\\.venv\\Scripts\\python.exe'

The result points to the python.exe file within the '.venv > Scripts' folder inside the project's directory. If the result were something like 'AppData\Local\Programs\Python\Python3X', that would indicate that the global Python installation is being run, not the virtual environment (.venv).
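The same check can be made programmatically instead of reading the path by eye. The sketch below compares `sys.prefix` with `sys.base_prefix`, which differ when the interpreter was started from an environment created with `venv`. The helper name `in_virtual_environment` is an assumption for illustration, and the comparison does not detect other kinds of environments such as conda.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print(sys.executable)
print(in_virtual_environment())
```

If the second line prints `False` in a notebook that is expected to use `.venv`, the notebook kernel is most likely bound to a different interpreter.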

This check matters because specific libraries are used to import and visualize the data, and it is better to install them locally in the project's directory so that they do not conflict with versions installed for other projects. Now the libraries and the dataset can be imported for analysis.

In [3]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load flights dataset
df = pd.read_csv('../data/raw/airlines_flights_data.csv')

# Preview the data
print(df.shape)
df.head()

(300153, 12)


| | index | airline | flight | source_city | departure_time | stops | arrival_time | destination_city | class | duration | days_left | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | SpiceJet | SG-8709 | Delhi | Evening | zero | Night | Mumbai | Economy | 2.17 | 1 | 5953 |
| 1 | 1 | SpiceJet | SG-8157 | Delhi | Early_Morning | zero | Morning | Mumbai | Economy | 2.33 | 1 | 5953 |
| 2 | 2 | AirAsia | I5-764 | Delhi | Early_Morning | zero | Early_Morning | Mumbai | Economy | 2.17 | 1 | 5956 |
| 3 | 3 | Vistara | UK-995 | Delhi | Morning | zero | Afternoon | Mumbai | Economy | 2.25 | 1 | 5955 |
| 4 | 4 | Vistara | UK-963 | Delhi | Morning | zero | Morning | Mumbai | Economy | 2.33 | 1 | 5955 |
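As a small illustration of how these rows relate to the fare-minimization goal, the sketch below parses the five preview rows with the standard library and picks the cheapest non-stop flight between two cities. It is not the project's optimization model: the helper names `load_flights` and `cheapest_direct` are assumptions, and only the five rows shown above are used, with the unlabeled row-label column left out.

```python
import csv
import io

# The five preview rows from the dataset, without the unlabeled row-label column.
PREVIEW_CSV = """index,airline,flight,source_city,departure_time,stops,arrival_time,destination_city,class,duration,days_left,price
0,SpiceJet,SG-8709,Delhi,Evening,zero,Night,Mumbai,Economy,2.17,1,5953
1,SpiceJet,SG-8157,Delhi,Early_Morning,zero,Morning,Mumbai,Economy,2.33,1,5953
2,AirAsia,I5-764,Delhi,Early_Morning,zero,Early_Morning,Mumbai,Economy,2.17,1,5956
3,Vistara,UK-995,Delhi,Morning,zero,Afternoon,Mumbai,Economy,2.25,1,5955
4,Vistara,UK-963,Delhi,Morning,zero,Morning,Mumbai,Economy,2.33,1,5955
"""


def load_flights(text):
    """Parse CSV text into a list of dicts with numeric columns converted."""
    flights = []
    for row in csv.DictReader(io.StringIO(text)):
        row["duration"] = float(row["duration"])
        row["days_left"] = int(row["days_left"])
        row["price"] = int(row["price"])
        flights.append(row)
    return flights


def cheapest_direct(flights, source, destination):
    """Return the lowest-priced non-stop flight on a route, or None if there is none."""
    candidates = [
        f for f in flights
        if f["source_city"] == source
        and f["destination_city"] == destination
        and f["stops"] == "zero"
    ]
    # min() keeps the first row encountered when several share the lowest price.
    return min(candidates, key=lambda f: f["price"], default=None)


flights = load_flights(PREVIEW_CSV)
best = cheapest_direct(flights, "Delhi", "Mumbai")
print(best["flight"], best["price"])  # SG-8709 5953
```

Two of the preview flights share the lowest fare of 5953, so the first one in file order is reported.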
