# Team 3-3: Phase 2 EDA - Predicting Flight Delays to Mitigate Potential Delays and Costs

## Phase Leader Plan
| Week | Date | Phase | Owner | Deliverable Due |
|-----| ----- | ----- | ----- | ----- |
| Week 1 | Oct 27 | Phase 1 | Jason Dong | Nov 4 |
| Week 2 | Nov 3 | Phase 2 | Nick Gasser | |
| Week 3 | Nov 10 | Phase 2 | Anson Quon | |
| Week 4 | Nov 17 | Phase 2 | Gilbert Wong | Nov 24 |
| Week 5 | Nov 24 | Phase 3 | Sameer Karim | |
| Week 6 | Dec 1 | Phase 3 | Jason Dong | Dec 8 |

## Project Abstract

Airline on-time performance, defined as a flight arriving within 15 minutes of its expected arrival time, is a key performance indicator that airlines and regulators track across the industry. TranStats flight data from the US Department of Transportation (USDOT) indicates that 18% of flights were delayed by more than 15 minutes between 2015 and 2021, illustrating room for improvement. Each minute of delay can escalate operational costs, especially with the USDOT’s new ruling requiring airlines to provide automatic refunds for domestic flights delayed three hours or more.

Predicting whether a flight will be delayed inevitably produces two classes of errors that we seek to minimize. Excessive false positives, where flights are predicted to be delayed but depart on time, risk burdening airlines with unnecessary rearrangements and increased operational overhead. Excessive false negatives, where flights are predicted to be on time but are actually delayed, risk breaking the trust between airlines and their customers, leading to potential customer churn and loss of business. We consider false negatives the more costly error and seek to minimize them first. The primary metric we optimize for our classification model is therefore the F-Beta score, weighted to prioritize recall (fewer false negatives).

The goal of our classification model is to predict, two hours prior to a flight's expected departure time, whether the flight will be delayed, while optimizing our F-Beta score. Flight data from TranStats is used to identify the effects of flight delays across the network. We supplement flight information with weather data from the National Oceanic and Atmospheric Administration to capture temporal trends and weather impacts on delays. Our model does not include features for direct causes of flight delays such as mechanical issues, IT failures, or staffing shortages. Our machine learning pipeline includes checkpoints for feature extraction and feature engineering to prevent data leakage, create temporal and graph features, and appropriately balance and split our feature set prior to model training.

We chose a logistic regression model as our baseline due to its interpretability and relative ease of implementation. The baseline was built in two parts: a base logistic regression model with 21 features, and a fuller logistic regression model with the same 21 features plus 6 custom features. The base model achieved an F-Beta test score of 0.699 for the delayed flight class on 12 months of data, while the fuller model achieved an F-Beta test score of 0.714. Given this improvement in F-Beta score, our custom features (whether the previous flight was delayed, its delay time, the number of days between the previous and current flight, a cyclical time feature, and the arrival time of the previous flight) appear to be informative. Our next steps are to experiment with random forests, bagging/boosting, and multilayer perceptron neural networks, as well as to include more custom features such as weather forecasts and graph-based features.


## Data


### Description of Data

The data for the project will be sourced from three different datasets.
  - The first is flight information from the US Department of Transportation (DOT). This contains 109 features and ~31.7 million rows. It contains information related to the flights such as departure and arrival destinations, flight durations (taxi and flight times), carrier information, distance traveled, and whether the flight was delayed or diverted. This data will be limited to US states and territories for departure and arrival locations. The full dataset contains information from 2015 to 2021 and totals 2.94 GB.
  - The second dataset contains weather information from the National Oceanic and Atmospheric Administration. This contains 177 features and ~630.9 million rows. It contains various weather information such as temperature, humidity, precipitation, visibility, sunrise time, and sunset time. The location and time information can be used to join with the flight dataset to identify weather features that may impact flight delays. The full dataset contains information from 2015 to 2021 and totals 35.05 GB.
  - The final dataset contains airport information from the US DOT. This contains 10 features and ~18K rows. It contains location information for airports which can be used to merge the weather and flight datasets.

The data has been split into various time intervals for initial model development. Splits are time-based, so the model is always trained on past data and tested on future data, which mimics real-world use where predictions must be made from historical information. The rolling window for time splits covers 24-hour periods, allowing us to incrementally shift the training and test sets to simulate real-time prediction; each test set therefore contains data from a subsequent day, providing a realistic evaluation of model performance over time. Rolling windows also let the model adapt to seasonal variations and other temporal patterns that affect flight delays.

We masked previous-flight data by setting values to None when the `twoHoursPriorDiffPrevF` value was less than or equal to zero. This ensured that data from previous flights was only retained if it was relevant for predicting current flight conditions, maintaining data quality and relevance. We also filtered the dataset to focus only on flights originating and landing within the United States, as international flights often have additional complexities related to customs and air traffic control regulations. Finally, we filtered out records with missing or erroneous values, particularly those related to departure and arrival times, which are crucial for predicting delays.

### Data Dictionary
The data from each source has been narrowed down to useful features. Please see the data dictionary to see the complete list of features for each dataset, which features could be used with model development, and the plan to merge the data together. [(Data Dictionary)](https://docs.google.com/spreadsheets/d/1cxMpgoy3YIUD1OGv9_BM3s-Q6DRTpuuF_pKyUKDrXJc/edit?gid=0#gid=0).

Key data elements to support our model prediction are listed below:
| Data Element | Objective |
| ----- | ----- |
| <b>Flight Data</b> | |
| Reporting Airline and Flight Number | Identifies a flight route with airline company between airports |
| Tail Number | Identifies an aircraft, enabling the reconstruction of a plane's flight history |
| City, State, Latitude, Longitude | Filter data to US and US territories and connect with weather data |
| Origin and Destination Airport | Basis to create graph features to measure an airport's influence on flight network |
| <b> Preventing Data Leakage </b> | |
| Departure and Arrival Time | Exclude data within two hours of expected departure time |
| Weather Reading DateTime | Exclude weather data within two hours of expected departure time |
| <b> Prediction Objectives </b> | |
| Departure Delay Indicator | Primary prediction: boolean indicator if a departure was delayed 15 minutes or more |
| Departure Delay Group | Delay time grouping in 15 minute intervals |
| Carrier Delay | Indicator if delay was due to carrier |
| Weather Delay | Indicator if delay was due to weather |



### Initial EDA

As mentioned earlier, the data was explored to gain an understanding of which features could potentially be used, potential feature correlation, and data distribution.
  - The project required use of US state and territory flights (departure and arrival). The flights dataset was reviewed and only contains US state and territory departures and arrivals.
  - It was determined that each record in the dataset was duplicated. This was likely due to a departure and an arrival record for each flight. Each dataset was deduplicated before proceeding.
  - Upon reviewing the data, it was determined that the outcome variable (delayed more than 15 minutes) is imbalanced: 82% of the flights were on time and 18% were delayed. This makes sense as most flights aren't delayed. Models and evaluation metrics will need to account for this imbalance in the outcome variable.
  - There was also a right skew in the delay times. The average delay time was 9.2 minutes but the median was -2.0 minutes (early). The minimum delay time was -29 minutes and the maximum was 1,175 minutes.







<img src="https://raw.githubusercontent.com/ngasserberk/mids-w261-final_project/refs/heads/main/delay_dist.png?token=GHSAT0AAAAAACZ5NDIXRVHTSCX3KMVXXOM2ZZIBO2Q">

  - To gain an understanding if there was a uniform distribution of flights, the count of flights and percentage of delayed flights by year, month, and day of week were reviewed on a sample of the full dataset (2015-2021). 
    - In 2018 and 2019, the count of flights began to increase before drastically decreasing in 2020 and 2021, likely due to COVID-19 pandemic.
    - There was a smaller number of delayed flights in 2020 compared to other years (2021 was similar to 2015-2019).
    - There appears to be seasonality across months of the data. Similar behavior occurs throughout the week, with a lower percent of delays on Monday and Tuesday.

<img src="https://raw.githubusercontent.com/ngasserberk/mids-w261-final_project/refs/heads/main/seasonality_flight_count.png?token=GHSAT0AAAAAACZ5NDIWIWUK3SDLMXMPZBESZZIBAKQ">

<img src="https://raw.githubusercontent.com/ngasserberk/mids-w261-final_project/refs/heads/main/seasonality_delay_perc.png?token=GHSAT0AAAAAACZ5NDIXY3M4UUICCHGHK32OZZH756Q">

  - The distribution of flights by airline and the distribution of delay time by airline were reviewed on the full dataset (2015-2021).
    - WN operates the most flights. The majority of flights are operated by WN, DL, AA, OO, and UA. Following those, there are 15 other airlines with fewer flight counts.
    - There does not seem to be a distinct correlation between the number of flights for an airline and its delay time. As shown earlier, there is a large skew in the delay times for each airline. The figure was restricted to a maximum delay time of 100 minutes, whereas the maximum delay seen earlier was 1,175 minutes.
    - Airlines F9, B6, QX, and WN appear to have the widest distribution of delay times.

<img src="https://github.com/ngasserberk/mids-w261-final_project/blob/main/airline_box_delays.png?raw=true">


### Missing Data 

Missing data at the feature level was reviewed. The count and percent of non-null values were reviewed for each dataset.
  1. The flights dataset was first reduced by removing any feature with fewer than 15% non-null values. A 15% threshold is lower than would normally be acceptable, but some features, such as the delay indicators (weather, carrier, etc.), are only populated for delayed flights and are therefore only ~18% filled. The 48 features with fewer than 15% non-null values were dropped. These included flight information such as grounded time away from gate and flight deviation information.
  2. The weather dataset was reviewed and determined to be sparse for the majority of the features. However, this dataset is at the latitude and longitude level. Thus, taking a direct reduction from the full table wouldn't make sense as some coordinates won't join with the flight data. Missing values will be evaluated further after joining the data.

Our plan to address missing data is categorized into the following buckets:
- <b> Delay Data </b> - Data specific to delayed flights will be filled with a generic value for on time flights.
- <b> Weather / Temporal Data </b> - Linear or quadratic interpolation following further assessment of trends and time gaps.
- <b> Numerical Data </b> - Impute mean, median, or mode depending on the distribution of data.
- <b> Categorical Flight Data </b> - Carry last observation forward to maintain continuity of data per tail aircraft.
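
As an illustration of the last bucket, the following is a minimal sketch of carrying the last observation forward per aircraft. It uses plain Python rather than the project's Spark pipeline, and the field names (`tail_num`, `carrier`) and sample records are hypothetical.

```python
def forward_fill(records, key, group_key="tail_num"):
    """Fill missing `key` values with the last value seen for the same group.

    Assumes `records` is already sorted chronologically.
    """
    last_seen = {}
    filled = []
    for rec in records:
        rec = dict(rec)  # copy so the input is not mutated
        group = rec[group_key]
        if rec.get(key) is None:
            # No earlier value for this aircraft leaves the gap as None.
            rec[key] = last_seen.get(group)
        else:
            last_seen[group] = rec[key]
        filled.append(rec)
    return filled


# Hypothetical records, ordered by time.
flights = [
    {"tail_num": "N101", "carrier": "WN"},
    {"tail_num": "N202", "carrier": None},
    {"tail_num": "N101", "carrier": None},
    {"tail_num": "N202", "carrier": "DL"},
]
filled_flights = forward_fill(flights, key="carrier")
```

A value is only filled from earlier records of the same aircraft, so the first `N202` record stays empty while the second `N101` record inherits `WN`.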

## Data Balancing 

We performed a series of data transformations, including undersampling and some categorical feature engineering. We started by addressing the class imbalance in our dataset. Most flights are on time, with only a small percentage experiencing significant delays. This imbalance can lead to biased model predictions, where the model becomes too focused on predicting the majority class (on-time flights) and fails to correctly identify delayed flights. To counteract this, we used undersampling to reduce the number of on-time flights to match the number of delayed flights. This balanced dataset helps the model learn more effectively about the factors contributing to delays, reducing the likelihood of it simply predicting that all flights are on time.

**Balance in Model Training**: By using a 50/50 ratio, we ensure that the model receives an equal number of delayed and on-time flights during training. This balance helps the model learn to recognize the factors associated with both classes more effectively, instead of being biased toward predicting the majority class (on-time flights). If we trained on the imbalanced dataset, the model would likely predict most flights as on-time, resulting in poor recall for the delayed class.

**Improving Recall for Delays**: In the context of flight delays, false negatives (i.e., predicting a flight will be on time when it is delayed) are more critical than false positives. With a balanced dataset, the model has an improved chance of detecting actual delays, thus increasing recall for the minority class. This is important because failing to predict a delayed flight can lead to operational disruptions and poor passenger experience.

**Model Generalizability**: Although undersampling reduces the number of on-time flights in the training set, it allows the model to focus more on learning the characteristics of delayed flights. This helps create a more generalizable model that performs well for both classes, rather than simply overfitting to the majority class. It ensures that the model does not always favor the majority (on-time flights), which would result in poor predictive performance for delayed flights.
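
The undersampling step described above can be sketched as follows. This is a simplified, plain-Python illustration under assumed names (`delayed` as the label field, a fixed random seed), not the Spark code used in the project, and it would be applied to the training split only.

```python
import random


def undersample(rows, label_key="delayed", seed=42):
    """Randomly drop majority-class rows until both classes have equal counts."""
    rng = random.Random(seed)
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    minority, majority = sorted((positives, negatives), key=len)
    # Sample without replacement so no majority row is duplicated.
    sampled_majority = rng.sample(majority, len(minority))
    balanced = minority + sampled_majority
    rng.shuffle(balanced)
    return balanced


# Hypothetical training rows: 20 delayed flights and 80 on-time flights.
rows = [{"flight_id": i, "delayed": 1 if i % 5 == 0 else 0} for i in range(100)]
balanced = undersample(rows)
```

The test split should keep its natural class ratio so that reported metrics reflect the conditions the model would face in production.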

## Machine Learning Algorithms and Metrics
For this project, we are focusing on predicting **departure delays** where a delay is defined as being 15 minutes or greater past the planned departure time. The prediction will be made at least **two hours before departure** to allow sufficient time for airlines and airports to notify passengers and adjust operations. Our primary stakeholders include airlines, airports, and passengers. This is framed as a **classification problem**, where the target variable is whether a flight will be delayed or not.


### Metrics
Our model is targeted towards helping airline companies predict delays to minimize costs and increase on-time performance. Incorrect predictions can lead to the following outcomes for the business:

| <div style='width:150px'> False Positive </div> | <div style='width:290px'> False Negative </div>|
| ----- | ----- |
| Incorrectly identifying flight delays may lead to unneeded resource allocation for flights actually on-time | Delay will be missed from the model and lead to inaction on delays that could've been mitigated. |

While false positives are detrimental, we believe false negatives would be worse, as the business would never be alerted to a delay that the model missed. A predicted delay doesn't require immediate action: decision makers can be supplied with context to triage the prediction and assess the feasibility of mitigating the delay before acting on it. For these reasons, we will use the following metrics, computed for the delayed class, to evaluate our models:

**Primary Metric**

1. **F-beta score**:
   $$
   \text{F-beta Score} = (1 + \beta^2) \times \frac{\text{Precision} \times \text{Recall}}{(\beta^2 \times \text{Precision}) + \text{Recall}}
   $$
   The F-beta score is a weighted harmonic mean of precision and recall, which lets us trade off false positives against false negatives. Because we want to capture more delays at the risk of more false positives, we want our F-beta score to prioritize recall. We choose \\(\beta = 2\\) so that recall is treated as twice as important as precision.

**Sub-metrics**

2. **Recall**:
   $$
   \text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}
   $$
   Essential for broadly identifying the delays to minimize unexpected delays. Higher recall may lead to incorrectly identifying flight delays.

3. **Precision**:
   $$
   \text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}
   $$
   Indicates how many of our predicted delays are actual delays to avoid false alerts to airline companies. Higher precision may lead to missed predictions.
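
As a worked illustration of the three metrics above, here is a minimal sketch that computes them from confusion-matrix counts. The function names and the example counts are hypothetical; only the formulas come from this section.

```python
def fbeta_from_pr(precision, recall, beta=2.0):
    """F-beta from precision and recall; beta > 1 favors recall."""
    denom = beta ** 2 * precision + recall
    return (1 + beta ** 2) * precision * recall / denom if denom else 0.0


def precision_recall_fbeta(tp, fp, fn, beta=2.0):
    """Return (precision, recall, F-beta) for the positive (delayed) class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, fbeta_from_pr(precision, recall, beta)


# Hypothetical counts: 80 delays caught, 40 false alarms, 20 missed delays.
precision, recall, f2 = precision_recall_fbeta(tp=80, fp=40, fn=20)
```

With \\(\beta = 2\\), plugging in the test precision (0.531) and recall (0.759) reported for the balanced logistic regression baseline in the results section gives `round(fbeta_from_pr(0.531, 0.759), 3) == 0.699`, which matches the reported F-beta score.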


### Models
Our baseline and model iterations to predict departure delays will focus on the following algorithms:

1. **Logistic Regression**:
   - **Implementation**: Using `PySpark`'s `LogisticRegression` class.
   - **Loss Function**: Binary Cross-Entropy Loss:
     $$
     L = -\frac{1}{N} \sum_{i=1}^{N} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)]
     $$
   - **Reasoning**: A simple baseline to understand feature importance and build interpretability.

2. **Decision Tree**:
   - **Implementation**: Using `PySpark`'s `DecisionTreeClassifier`.
   - **Reasoning**: Simple yet powerful model requiring minimal feature engineering while providing high interpretability.

Further experimentation:

3. **Random Forest Classifier**:
   - **Implementation**: Using `PySpark`'s `RandomForestClassifier`.
   - **Feature Importance**: Helps identify critical factors contributing to delays.
   - **Advantage**: Good for capturing non-linear relationships and robust to overfitting with proper tuning while providing high interpretability.

4. **Gradient Boosted Trees**:
   - **Implementation**: Using `SparkXGBRegressor` from XGBoost's PySpark interface (`xgboost.spark`)
   - **Loss Function**: Logistic loss for binary classification.
   - **Advantage**: Effective for handling imbalanced data and complex relationships.
   - **Reasoning**: Provides high predictive power while maintaining efficiency.

5. **Neural Network**:
   - **Implementation**: Using `PySpark`'s `MultilayerPerceptronClassifier`
   - **Loss Function**: Binary Cross-Entropy Loss
   - **Reasoning**: High tolerance for noisy data and ability to identify non-linear patterns for accurate classification.


### Data Splits and Cross-Validation

The temporal nature of the on time performance data requires special attention in splitting data during training and testing in order to maintain the time dependency between observations and to prevent data leakage of future values being used to predict the past. We use the following strategies:

1. **Single year (2015 data)** - we use this subset for initial baselines and model testing
   - Train: first 9 months
   - Test: last 3 months

2. **Entire dataset** - we will use a rolling blocked cross-validation strategy to iterate over the entire dataset
   - Train: 15 months to ensure seasonality effects are included while training the model 
   - Test: 6 months
   - Each block will have an overlap of 1 month to ensure continuity between blocks.
   - Example data split:
   
   ![Blocked validation](https://github.com/jasondongmids/mids_w261_final_project/blob/main/ref/Blocked%20Cross-validation.png?raw=true)
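
The blocked strategy can be sketched as a small date calculation. This is an illustration only: it assumes that "an overlap of 1 month" means each block starts one month before the previous block ends, and the function names are hypothetical.

```python
from datetime import date


def add_months(d, months):
    """Return the first day of the month `months` after the month of `d`."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def blocked_splits(start, end, train_months=15, test_months=6, overlap_months=1):
    """List (train_start, train_end, test_end) blocks within [start, end).

    Intervals are half-open: train is [train_start, train_end) and
    test is [train_end, test_end), so test data always follows train data.
    """
    step = train_months + test_months - overlap_months
    splits = []
    block_start = start
    while add_months(block_start, train_months + test_months) <= end:
        train_end = add_months(block_start, train_months)
        test_end = add_months(train_end, test_months)
        splits.append((block_start, train_end, test_end))
        block_start = add_months(block_start, step)
    return splits


# Full dataset range, 2015 through 2021.
splits = blocked_splits(date(2015, 1, 1), date(2022, 1, 1))
```

Under these assumptions the 2015-2021 range yields four blocks, the first training on January 2015 through March 2016 and testing on April through September 2016.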


## Machine Learning Pipeline

Our current machine learning pipeline and checkpointing strategy is as follows:

![pipeline](https://github.com/jasondongmids/mids_w261_final_project/blob/main/ref/ML%20pipeline2.png?raw=true)

We begin our EDA and feature engineering on a subset of 3-month and 1-year flight data prior to expanding to the entire dataset. To ensure modularity, rapid prototyping, and failsafes against disruptions, we plan these checkpoints up to model training:
- <b> Data Ingestion</b> - Sync various data formats to Parquet for better utilization of distributed resources.
- <b> Data Dictionary and Feature Extraction </b> - Identify scope of data elements for model prediction and data type transformations.
- <b> Feature Engineering </b> - Complete temporal transformations, graph transformations, normalization, and derivations as a starting basis for model development and training.
- <b> Model Training </b> - Model architecture, callbacks, early stopping, and model saving will be employed to guard against training disruptions and provide a quick restart to downstream predictions and metric analyses.



## Feature Engineering

To further improve our model, we performed feature engineering to clean existing features and to generate new features that could be useful to the model.

We conducted categorical feature engineering to convert categorical variables into a format suitable for machine learning models. For features like airline carrier codes, airport codes, and flight origins and destinations, we used String Indexing. String Indexing assigns a numeric index to each category, while One-Hot Encoding creates binary vectors representing each category. These transformations are essential for enabling machine learning models to understand and leverage categorical information. For instance, identifying which airline or airport is associated with higher delays can help us enhance prediction accuracy.
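
To make the two encodings concrete, here is a minimal plain-Python sketch. It only loosely mirrors the behavior of Spark's `StringIndexer` (most frequent category gets index 0) and does not reproduce Spark's `OneHotEncoder` output format; the helper names and the sample carrier list are hypothetical.

```python
from collections import Counter


def fit_string_index(values):
    """Map each category to an integer, most frequent first (ties alphabetical)."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: index for index, value in enumerate(ordered)}


def one_hot(index, size):
    """Return a binary vector of length `size` with a 1 at position `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


# Hypothetical carrier codes for six flights.
carriers = ["WN", "DL", "WN", "AA", "WN", "DL"]
mapping = fit_string_index(carriers)
encoded = [one_hot(mapping[c], len(mapping)) for c in carriers]
```

The integer index alone implies an ordering between categories, which a linear model would treat as meaningful; the one-hot vector removes that ordering at the cost of a wider feature set.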

The main feature engineering step was joining each flight with information about the same aircraft's previous flight. This was achieved by merging on the plane’s tail number and airline, then partitioning and sorting the merged values to keep only the most recent previous flight. To avoid data leakage during training, all previous flight information was masked unless the previous flight had departed at least two hours before the current flight's planned departure time.
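
The lookup and masking logic can be sketched in plain Python as follows. The field names (`tail_num`, `sched_dep`, `actual_dep`, `dep_delay`) and sample flights are hypothetical, and the strict comparison against the two-hour cutoff is an assumption about how the boundary case is handled.

```python
from datetime import datetime, timedelta


def known_previous_flight(current, history, lead=timedelta(hours=2)):
    """Return the aircraft's most recent earlier flight, or None if masked.

    The previous flight is masked when it departed after the prediction
    cutoff (scheduled departure minus `lead`), since its outcome would
    not be known at prediction time.
    """
    cutoff = current["sched_dep"] - lead
    candidates = [
        f for f in history
        if f["tail_num"] == current["tail_num"]
        and f["actual_dep"] < current["sched_dep"]
    ]
    if not candidates:
        return None
    previous = max(candidates, key=lambda f: f["actual_dep"])
    return previous if previous["actual_dep"] < cutoff else None


# Hypothetical flights for one aircraft.
history = [
    {"tail_num": "N101", "actual_dep": datetime(2015, 3, 1, 8, 30), "dep_delay": 20},
]
current = {"tail_num": "N101", "sched_dep": datetime(2015, 3, 1, 12, 0)}
kept = known_previous_flight(current, history)

late_history = history + [
    {"tail_num": "N101", "actual_dep": datetime(2015, 3, 1, 10, 30), "dep_delay": 45},
]
masked = known_previous_flight(current, late_history)
```

In the second case the most recent previous flight left at 10:30, after the 10:00 cutoff, so its information is masked rather than replaced by an older flight.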

Time is a cyclic feature: after 23:59 the clock rolls over to 00:00 rather than 24:00. We therefore applied a sinusoidal transformation to the departure time to represent its cyclic nature.
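
A minimal sketch of such a transformation is shown below. Emitting both a sine and a cosine component is an assumption made here so that every time of day maps to a unique point; the function name is hypothetical.

```python
import math

MINUTES_PER_DAY = 24 * 60


def cyclical_time(hour, minute=0):
    """Encode a time of day as (sin, cos) coordinates on the unit circle."""
    angle = 2 * math.pi * (hour * 60 + minute) / MINUTES_PER_DAY
    return math.sin(angle), math.cos(angle)


late_night = cyclical_time(23, 59)
midnight = cyclical_time(0, 0)
# Euclidean distance between the two encodings.
gap = math.dist(late_night, midnight)
```

With this encoding, 23:59 and 00:00 are nearly identical points, whereas the raw values 2359 and 0 would appear to be as far apart as possible.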

### Weather Forecasting

To further enhance the predictive power of our model, we decided to include weather features for our future models. As weather information around the time of actual flight departure will not be available at prediction time, using actual weather data around flight departure time would constitute data leakage. Hence, we utilized Facebook's Prophet time-series model to forecast weather variables such as precipitation, visibility, wind speed, wind gust speed, wet bulb temperature, dry bulb temperature, and dew point temperature, all of which will be used as forecast features for our predictive models. Through our time-series plots, we were able to see that Prophet accurately captured the seasonalities and trends of all the variables.

### PageRank
Flight delays at the airport of origin have downstream effects on the timely departure of subsequent flights at destination airports. We decided to use the PageRank algorithm, developed by Google to rank web pages, to measure the influence of airports. In our graph network, airports are the nodes, and two airports are connected if a flight route exists between them. We weighted each connection by the number of flights on that route to obtain the PageRank. Our initial results, seen below, match our intuition: major airport hubs are the most influential. Certain airports such as JFK are notably missing, potentially because we are only focusing on domestic flight routes. We will continue to experiment with the algorithm by adding the distance of a flight route as an additional weighting mechanism to determine an airport's PageRank.


![Pagerank](https://github.com/jasondongmids/mids_w261_final_project/blob/main/ref/Pagerank_11_25.png?raw=true)
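
For reference, a flight-count-weighted PageRank can be sketched with a simple power iteration. This is a toy, single-machine illustration rather than the distributed implementation used for the figure above; the damping factor of 0.85, the iteration count, and the route counts are all assumptions.

```python
def weighted_pagerank(edges, damping=0.85, iterations=50):
    """PageRank where each edge (origin, dest) is weighted by flight count."""
    nodes = sorted({airport for route in edges for airport in route})
    n = len(nodes)
    out_weight = {node: 0.0 for node in nodes}
    for (origin, _), count in edges.items():
        out_weight[origin] += count
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by airports with no outgoing routes is spread evenly.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0)
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for (origin, dest), count in edges.items():
            new_rank[dest] += damping * rank[origin] * count / out_weight[origin]
        rank = new_rank
    return rank


# Hypothetical route counts (origin, destination) -> number of flights.
routes = {
    ("ATL", "ORD"): 50, ("ORD", "ATL"): 50,
    ("ATL", "DFW"): 30, ("DFW", "ATL"): 30,
    ("ORD", "DFW"): 10, ("DFW", "ORD"): 10,
    ("SMF", "ATL"): 5,
}
ranks = weighted_pagerank(routes)
```

In this toy network the airport with the most inbound traffic ranks highest, and the airport with no inbound routes receives only the baseline share from the damping term.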

## Results and Discussion

#### Logistic Regression Model
###### Train and Test Results for delayed flights with 2015 Flight Data for Class = 1 (Delayed Flight)
| <div style='width:150px'> Features </div> | <div style='width:290px'> F-Beta Score </div> | <div style='width:290px'> Recall </div> | <div style='width:290px'> Precision </div> | 
| ----- | ----- | ----- | ----- |
| Baseline Model <br> (Current Flight Data, 21 Features) | Train: 0.004 <br> Test: 0.012 | Train: 0.003 <br> Test: 0.010 | Train: 0.451 <br> Test: 0.425 | 
| Baseline Model w/ Balanced Data <br> (Current Flight Data, 21 Features) | Train: 0.625 <br> Test: 0.699 | Train: 0.628 <br> Test: 0.759 | Train: 0.612 <br> Test: 0.531 |
| Updated Model <br> (Current Flight Data + Feature Engineering, 27 Features) | Train: 0.667 <br> Test: 0.714 | Train: 0.658 <br> Test: 0.752 | Train: 0.707 <br> Test: 0.593 |


**Features with highest absolute weights:** destination region, hourly wind direction, arrival time previous flight, destination airport type


#### Decision Tree Model
###### Train and Test Results for delayed flights with 2015 Flight Data for Class = 1 (Delayed Flight)
| <div style='width:150px'> Features </div> | <div style='width:290px'> F-Beta Score </div> | <div style='width:290px'> Recall </div> | <div style='width:290px'> Precision </div> |
| ----- | ----- | ----- | ----- |
| Baseline Model <br> (Current Flight Data, 25 Features) | Train: 0.000 <br> Test: 0.000 | Train: 0.000 <br> Test: 0.000 | Train: 0.442 <br> Test: 0.000 |
| Baseline Model w/ Balanced Data <br> (Current Flight Data, 25 Features) | Train: 0.663 <br> Test: 0.712 | Train: 0.676 <br> Test: 0.781 | Train: 0.616 <br> Test: 0.527 |
| Updated Model <br> (Current Flight Data + Feature Engineering, 30 Features) | Train: 0.417 <br> Test: 0.419 | Train: 0.381 <br> Test: 0.386 | Train: 0.671 <br> Test: 0.639 |


**Features used by decision nodes:** date difference between previous flights, departure delay of previous flight, cyclical scheduled departure time

As noted before, we used the first nine months of the 2015 dataset for training and the last three months for testing our baseline models. For both logistic regression and the decision tree, undersampling produced a large increase in the F-beta score for label 1 (delayed flights) relative to the baseline model trained on imbalanced data. Adding our custom features (previous flight information and treating time as cyclical) further improved the logistic regression model (Test F-beta: 0.699 to 0.714), but lowered the decision tree's score (Test F-beta: 0.712 to 0.419). The results indicate that undersampling and the addition of a few basic custom features had measurable impacts on the models.

Between our models, the decision tree has a slight edge on the undersampled data (Test F-beta: 0.712 vs. 0.699); however, logistic regression performed much better on our updated model (Test F-beta: 0.714 vs. 0.419). The discrepancy may be due to the shallow depth of our decision trees and the need to also undersample the data used for the model with custom features, as decision trees are prone to predicting the majority class (on-time flights) on imbalanced data. This disparity can be seen in the recall vs. precision for our updated decision tree model (Recall: 0.386 vs. Precision: 0.639), which lowers our F-beta score because it is weighted towards recall. Our train and test F-beta scores are consistent with each other, indicating that our models are generalizing well and not overfitting the data.

The initial results are promising, and we expect further improvements as we add custom features including additional flight information, forecasted weather data, and PageRank scores in our custom data model. Beyond feature engineering, we anticipate gains in performance as we implement more powerful models such as random forests and perform hyperparameter tuning to optimize predictions.

## Conclusions and Next Steps
To enable airline companies to mitigate costs from flight delays, we have created initial baseline classification models to predict whether a flight will be delayed two hours prior to its expected departure. Initial models showed the importance of addressing the class imbalance in the data. The logistic regression model with balanced data and no feature engineering will be defined as the baseline model (F-beta on 12 months of data in 2015 was 0.625 for train and 0.699 for test). This is the metric value we will aim to improve upon.

Our hypothesis is we can further improve initial models based on the results from our advanced EDA, feature engineering, custom data model, and label balancing.

We will continue to refine our model by performing the following:
- Adding a PageRank feature. We have built a PageRank model for airports to determine the most influential airports. This feature is in progress and will be added to the model for the final deliverable.
- Binarizing categorical features with a large number of unique values to compact the feature set and potentially improve model performance.
- Creating a custom joined dataset. Initial modeling was performed on a provided, joined dataset containing flight and weather information. We are developing a custom joined dataset to address gaps in the supplied dataset, such as missing weather data and previous flight information.
- Continuing to look for other features we could engineer to improve the model. In addition, model tuning and hyperparameter tuning will be performed.
- Expanding our data timeframe. Our training and testing has been on flights from 2015. We will expand to 2015-2021, which will allow us to capture seasonality effects across multiple years.
- Expanding the decision tree into a random forest. The decision tree performed well on the one-year dataset with balanced data (Test F-beta: 0.712), and we will determine whether a random forest can further improve the predictions.
- As requested by the stakeholder we will evaluate a multilayer perceptron (MLP) neural network (NN) model on our data.

### Open Issues and Problems
- EDA on COVID's potential impact on the recent end of our data and the impact to feature engineering and data splitting.
- Further review data leakage

## Team Members

- Jason Dong 
- Nick Gasser
- Sameer Karim
- Anson Quon
- Gilbert Wong
