# Determining Airline Prices
By: Christopher Kuzemka : [GitHub](https://git.generalassemb.ly)

## Problem Statement

Aviation is one of the largest industries in the global market today. Commercial aviation has made it possible for people to connect with each other in ways that would have been unimaginable a century ago. However, making such connections possible requires a great deal of thought about FAA standards and the routes that modern planes fly.

Consider a startup airline, known as "Kruze", that wants to establish itself as a top competitor against existing airlines. Part of this startup process focuses on understanding the costs that come into play when managing flights. Our job as data scientists is to help Kruze determine the minimum price the airline must charge its passengers, on a ticket class basis, in order to at least break even. To do this, we will use existing flight routes (velocity and altitude data), existing data on jet fuel pricing, and existing flight ticket prices (as the prediction target) to help us create a supervised learning model.

To start, we will approach the project as a minimal proof of concept. To that end, we will place some limits on our study and reduce the potential for scope creep by:

- conducting an idealized thermal jet propulsion cycle for feature engineering purposes (focusing on an open Brayton cycle in particular)
- analyzing flight route data across the U.S. domestically; choosing up to 3 routes of varying sizes and suggesting their reverse flight paths as data inputs as well. 
    - **Houston, TX** to **Los Angeles, CA** (IAH - LAX)
    - **New York City, NY** to **Miami, FL** (JFK - MIA)
    - **Portland, OR** to **Chicago, IL** (PDX - ORD)
- assuming air to be treated as an ideal gas
- assuming operating engine conditions to be steady state
- assuming kinetic energy and potential energy to be negligible in our system, except at the inlet and exit conditions of the jet engine itself
- assuming atmospheric temperature, pressure, and air density to be an averaged value between 0 and 15,000 meters altitude
- assuming headwind and tailwind effects to be negligible
- assuming passenger weight to be negligible
- assuming costs external to the study (including food, maintenance, and crew salaries) to be negligible
- using price data from future flights as opposed to previous flights, as previous flight pricing is not readily available


All of the assumptions listed above are set to allow us to achieve (or attempt to achieve) our goal within a limited time frame, as Kruze requires an answer from us quickly! With this in mind, we will discuss how such assumptions can contribute to error throughout our study, and remind ourselves that integrating the neglected features in future work may be very beneficial in achieving a stronger prediction. Conducting an idealized thermal engine analysis will help us understand the average power output of a given plane's engines throughout different phases of its flight. Routes chosen across a variety of times and seasons will also help us determine how such elements play a role in pricing. Finally, some plane specifications (including aircraft type, number of seats, and type/number of engines) will allow us to consider any extra technical factors in ticket pricing.

As we are working with what is considered to be a continuous variable, we will analyze common price trends using supervised regression models, such as Linear Regression, SVR, AdaBoost Regression, Gradient Boosting Regression, and KNN Regression. Logistic Regression and Naive Bayes, although often listed alongside these, are classifiers and would only apply if price were binned into discrete classes. We will ultimately use the Mean Absolute Error of our predictions to gauge how well our selected model predicts the price, and discuss what issues may be observed from the limitations of this study.
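
As a reference point for the evaluation metric, here is a minimal sketch of the Mean Absolute Error in plain Python. The function name `mean_absolute_error` and the ticket prices are hypothetical and chosen only for illustration; the actual study may rely on a library implementation instead.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted prices."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one price is required")
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical ticket prices in dollars, for illustration only.
actual_prices = [200.0, 350.0, 500.0]
predicted_prices = [220.0, 340.0, 470.0]
print(mean_absolute_error(actual_prices, predicted_prices))  # 20.0
```

Because the error is expressed in the same units as the target, an MAE of 20 here reads directly as "off by 20 dollars per ticket on average".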



## Executive Summary

## Table of Contents
[1.00 Data Loading](#1.00-Data-Loading)

[2.00 Data Cleaning and Moderate Analysis](#2.00-Data-Cleaning-and-Moderate-Analysis)

- [2.01 Quick Check](#2.01-Quick-Check)

- [2.02 Data Documentation Exploration](#2.02-Data-Documentation-Exploration)

- [2.03 Cleaning](#2.03-Cleaning)

- [2.04 Exploratory Data Analysis and Visualization](#2.04-Exploratory-Data-Analysis-and-Visualization)

[3.00 Machine Learning Modeling and Visualization](#3.00-Machine-Learning-Modeling-and-Visulalization)

- [3.01 Model Preparation](#3.01-Model-Preparation)

- [3.02 Modeling](#3.02-Modeling)

- [3.03 Model Selection](#3.03-Model-Selection)

- [3.04 Model Evaluation](#3.04-Model-Evaluation)

[4.00 Conclusions](#4.00-Conclusions)

[5.00 Sources and References](#5.00-Sources-and-References)

## Data Dictionary

## Variable Dictionary

$C_p$ = specific heat at constant pressure

$r_p$ = pressure ratio/compression ratio

$\dot{m}$ = mass flow rate

$\dot{W}$ = power

$h$ = enthalpy

$k$ = specific heat ratio ($C_p / C_v$)

$V$ = velocity

$T$ = temperature

$Q$ = heat added to the system

$F$ = force

## Idealized Jet Propulsive Data Conversion

### Understanding The Concept

**Insert picture and briefly discuss the different states of the idealized jet propulsion cycle.**

**Givens:**
$$r_p = 32.9$$

$$C_p = 1.005\frac{\text{kJ}}{\text{kg K}}$$

$$T_4 = 2000^{\circ} \text{C} + 273 = 2273\text{K} \text{ (combustion chamber temperature)}$$

$$k = 1.4$$

$$d = \text{diameter of jet engine air inlet assumed to be uniform}$$


**For Process 1-2 (Inlet through the Diffuser; Isentropic Compression)**

$$h_2 + \frac{{V_2}^2}{2} = h_1 + \frac{{V_1}^2}{2}$$

Since the air velocity leaving the diffuser is assumed to be zero ($V_2 = 0\frac{\text{m}}{\text{sec}}$), the equation is revised:

$$0 = h_2 - h_1 - \frac{{V_1}^2}{2}$$

$$0 = C_p(T_2 - T_1) - \frac{{V_1}^2}{2}$$

$$T_2 = T_1 + \frac{{V_1}^2}{2C_p}$$

And for pressure:

$$P_2 = P_1(\frac{T_2}{T_1})^{\frac{k}{k-1}}$$
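
The diffuser step can be sketched in Python as follows. The function name `diffuser_exit_state` and the cruise conditions are hypothetical, chosen only for illustration. One assumption worth stating loudly: $C_p$ is converted from kJ to J so that the $\frac{{V_1}^2}{2}$ term (in J/kg when $V_1$ is in m/s) has matching units.

```python
CP = 1005.0  # J/(kg K); the 1.005 kJ/(kg K) from the givens, converted to joules
K = 1.4      # specific heat ratio from the givens


def diffuser_exit_state(t1, p1, v1):
    """Return (T2, P2) for an ideal diffuser that brings the air to rest.

    t1 is in kelvin and v1 in m/s; p1 may be in any pressure unit,
    and P2 comes back in that same unit.
    """
    t2 = t1 + v1 ** 2 / (2.0 * CP)           # energy balance with V2 = 0
    p2 = p1 * (t2 / t1) ** (K / (K - 1.0))   # isentropic relation
    return t2, p2


# Hypothetical cruise conditions, for illustration only.
t2, p2 = diffuser_exit_state(t1=250.0, p1=35.0, v1=250.0)
print(f"T2 = {t2:.1f} K, P2 = {p2:.1f} kPa")
```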

**For Process 2-3 (Diffuser through the Compressor: Isentropic Compression)**

$$P_3 = r_pP_2$$

NOTE that $P_3 = P_4$, since heat is added at constant pressure in the ideal cycle.

$$T_3 = T_2(\frac{P_3}{P_2})^{\frac{k-1}{k}}$$
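
A minimal sketch of the compressor step, assuming the pressure ratio and specific heat ratio from the givens. The function name `compressor_exit_state` and the diffuser exit values passed to it are hypothetical.

```python
K = 1.4     # specific heat ratio from the givens
R_P = 32.9  # pressure ratio from the givens


def compressor_exit_state(t2, p2, r_p=R_P):
    """Return (T3, P3) for isentropic compression through the compressor."""
    p3 = r_p * p2
    t3 = t2 * (p3 / p2) ** ((K - 1.0) / K)
    return t3, p3


# Hypothetical diffuser exit state (kelvin, kPa), for illustration only.
t3, p3 = compressor_exit_state(t2=280.0, p2=50.0)
print(f"T3 = {t3:.1f} K, P3 = {p3:.1f} kPa")
```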

**For Process 4-5 (Combustion Chamber through the Turbine: Isentropic Expansion)**. Assuming the work between the compressor and turbine to be equal and knowing from the givens that $T_4 = 2000^{\circ} \text{C} = 2273\text{K}:$

$$h_3 - h_2 = h_4 - h_5$$

$$C_p(T_3 - T_2) = C_p(T_4 - T_5)$$

$$T_5 = T_4 - T_3 + T_2$$

Since $P_3 = P_4$:

$$P_5 = P_4(\frac{T_5}{T_4})^{\frac{k}{k-1}}$$
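
The turbine step can be sketched the same way. The function name `turbine_exit_state` and the example inputs are hypothetical; the only values taken from the text are $k$ and $T_4$.

```python
K = 1.4      # specific heat ratio from the givens
T4 = 2273.0  # combustion chamber temperature in kelvin, from the givens


def turbine_exit_state(t2, t3, p4, t4=T4):
    """Return (T5, P5), assuming turbine work equals compressor work."""
    t5 = t4 - t3 + t2                        # from Cp(T3 - T2) = Cp(T4 - T5)
    p5 = p4 * (t5 / t4) ** (K / (K - 1.0))   # isentropic expansion
    return t5, p5


# Hypothetical upstream states (kelvin, kPa), for illustration only.
t5, p5 = turbine_exit_state(t2=280.0, t3=760.0, p4=1645.0)
print(f"T5 = {t5:.1f} K, P5 = {p5:.1f} kPa")
```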

**For Process 5-6 (Nozzle Entrance to Nozzle Exit: Isentropic Expansion)**. The air expands out of the nozzle to atmospheric conditions, so $P_6 = P_1 \therefore$

$$T_6 = T_5(\frac{P_6}{P_5})^{\frac{k-1}{k}}$$

For exit velocity:

$$h_6 + \frac{{V_6}^2}{2} = h_5 + \frac{{V_5}^2}{2}$$

With $V_5 = 0$ in the ideal case, this becomes:

$$0 = C_p(T_6 - T_5) + \frac{{V_6}^2}{2}$$

$$V_6 = \sqrt{2C_p(T_5 - T_6)}$$
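
A minimal sketch of the nozzle step follows. The function name `nozzle_exit_state` and the example inputs are hypothetical. Two assumptions are made explicit: $C_p$ is expressed in J/(kg K) so the velocity comes out in m/s, and the square root is taken of $2C_p(T_5 - T_6)$, which is positive because the gas cools as it expands.

```python
import math

CP = 1005.0  # J/(kg K); the 1.005 kJ/(kg K) from the givens, converted to joules
K = 1.4      # specific heat ratio from the givens


def nozzle_exit_state(t5, p5, p6):
    """Return (T6, V6) for isentropic expansion from p5 down to p6.

    Temperatures are in kelvin; p5 and p6 only need to share a unit.
    The nozzle inlet velocity V5 is taken as zero.
    """
    t6 = t5 * (p6 / p5) ** ((K - 1.0) / K)
    v6 = math.sqrt(2.0 * CP * (t5 - t6))  # m/s
    return t6, v6


# Hypothetical nozzle inlet state and ambient pressure, for illustration only.
t6, v6 = nozzle_exit_state(t5=1793.0, p5=717.0, p6=35.0)
print(f"T6 = {t6:.1f} K, V6 = {v6:.1f} m/s")
```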

**To find propulsive power:**

$$\dot{W}_p = \dot{m}(V_{\text{exit}} - V_{\text{inlet}})V_{\text{aircraft}}$$

**To find Energy Input:**

$$ \dot{Q}_{\text{in}} = \dot{m}C_p(T_4 - T_3)$$

Where $\dot{m}$ is found using the average density of air, the inlet velocity, and the cross-sectional area of the engine inlet $(\dot{m} = \rho V_{\text{inlet}} A$, with $A = \frac{\pi d^2}{4})$. The propulsive efficiency is found by taking the ratio of the propulsive power to the energy input rate:

$$\eta = \frac{\dot{W}_p}{\dot{Q}_{\text{in}}}$$

Finally, the thrust is calculated by:

$$F = \frac{\dot{W}_p}{V_{\text{aircraft}}}$$
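
The performance quantities above can be tied together in a short sketch. The function name `engine_performance`, the dictionary keys, and all input values are hypothetical. $C_p$ is again expressed in J/(kg K) so that power and heat input are both in watts, and the inlet velocity is set equal to the aircraft speed, which follows from neglecting wind.

```python
import math

CP = 1005.0  # J/(kg K); the 1.005 kJ/(kg K) from the givens, converted to joules


def engine_performance(rho, d, v_inlet, v_exit, v_aircraft, t3, t4):
    """Return mass flow, propulsive power, heat input, efficiency and thrust.

    rho is in kg/m^3, d in metres, velocities in m/s, temperatures in kelvin.
    """
    area = math.pi * d ** 2 / 4.0                  # circular inlet of diameter d
    m_dot = rho * v_inlet * area                   # kg/s
    w_p = m_dot * (v_exit - v_inlet) * v_aircraft  # W
    q_in = m_dot * CP * (t4 - t3)                  # W
    return {
        "m_dot": m_dot,
        "w_p": w_p,
        "q_in": q_in,
        "efficiency": w_p / q_in,
        "thrust": w_p / v_aircraft,                # N
    }


# Hypothetical inputs, for illustration only.
result = engine_performance(
    rho=0.5, d=2.0, v_inlet=250.0, v_exit=1440.0,
    v_aircraft=250.0, t3=760.0, t4=2273.0,
)
for name, value in result.items():
    print(f"{name}: {value:.4g}")
```

Note that the mass flow rate cancels out of the efficiency ratio, so $\eta$ depends only on the velocities and temperatures, while thrust scales directly with $\dot{m}$.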

## Conclusions and Future Work

For future work, consider incorporating weather data, randomized passenger weight data, the dynamic change in fuel/mass ratio throughout a flight, passenger demographic data, and more routes, as well as developing the study into a UI tool rather than just a study.