# Will it be delayed?

Everyone who has flown has experienced a delayed or cancelled flight. Both airlines and airports would like to improve their on-time performance and predict when a flight will be delayed or cancelled several days in advance. You are being hired to build a model that can predict if a flight will be delayed. To learn more, you must schedule a meeting with your client (me). To schedule an appointment with your client, send an event request through Google Calendar for a 15-minute meeting. Both you and your project partner must attend the meeting. Come prepared with questions to ask your client. Remember that your client is not a data scientist and you will need to explain things in a way that is easy to understand. Make sure that your communications are efficient, well thought out, and not redundant, as your client might get frustrated and "fire" you (this only applies to getting information from your client; it does not necessarily apply to asking for help with the actual project itself, and you should continuously ask questions to get help).

For this project you must go through almost all steps in the checklist. You must write responses for all items as done in the homeworks; however, sometimes the response will simply be "does not apply". Keep your progress and thoughts organized in this document and use formatting as appropriate (using markdown to add headers and sub-headers for each major part). Some changes to the checklist:

* Do not do the final part (launching the product).
* Your presentation will be done as information written in this document in a dedicated section (no slides or anything like that). It should include a high-level summary of your results (including what you learned about the data, the "accuracy" of your model, what features were important, etc.). It should be written for your client, not your professor or teammates. It should include the best summary plots/graphics/data points.
* The models and hyperparameters you should consider during short-listing and fine-tuning will be released at a later time (dependent on how far we get over the next two weeks).
* Data retrieval must be automatic as part of the code (so it can easily be re-run and grab the latest data). Do not commit any data to the repository.
* Your submission must include a pickled final model along with this notebook.
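
One way to satisfy the automatic-retrieval requirement is a small download helper that the notebook calls before loading anything. The sketch below is only an illustration: the function name `fetch_data`, the `data/flights.csv` path, and the `DATA_URL` placeholder are all hypothetical, since the actual data source is not specified here. It reuses a local copy when one exists and downloads again when `force=True`, which is how a re-run would pick up the latest data.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_data(url, dest, force=False):
    """Download `url` to `dest` unless a local copy already exists.

    Hypothetical helper: pass force=True to refresh the local copy.
    """
    dest = Path(dest)
    if dest.exists() and not force:
        return dest  # reuse the existing local copy
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response straight to disk instead of holding it in memory.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


# Example usage (DATA_URL is a placeholder, not a real endpoint):
# csv_path = fetch_data(DATA_URL, "data/flights.csv", force=True)
```

Keeping downloads under a single directory such as `data/` (an assumed name) makes it easy to exclude the files from version control with one ignore rule.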

Frame the Problem
=====================================

1. **Define the objective in business terms:** 
    We are being hired by Dr. Jeff "client" Bush, who is looking for an ML model that can detect delayed and canceled flights before they occur.
2. **How will your solution be used?** 
    Our model will be used to speed up the regulatory process, allowing penalties to be issued faster. This pressures airlines to fix their problems sooner, hopefully decreasing delayed/canceled flights in the process.
3. **What are the current solutions/workarounds (if any)?** 
    Currently, airlines are penalized by the Department of Transportation. Although this process has led to a decrease in delayed/canceled flights, it is extremely slow.
4. **How should you frame this problem?** 
    This is a classification problem since the goal is to predict whether a flight will be delayed or canceled based on historical data. The model will be trained on a labeled dataset where flights are categorized as on-time, delayed, or canceled, allowing it to learn patterns that contribute to disruptions. Since the predictions are intended to assist in regulatory enforcement rather than real-time operational decisions, the model does not need to function as an online system. However, given the large volume of flight data, scalability and batch processing may need to be considered. If real-time predictions become a future requirement, the architecture may need adjustments to handle streaming data efficiently.
5. **How should performance be measured? Is the performance measure aligned with the business objective?** 
    Performance is measured by precision: of the flights the model marks as delayed, the fraction that are actually delayed. In other words, we want to avoid false positives (flights marked as delayed that would be on time). The performance measure is aligned with the business objective, as flights that are falsely marked as delayed could lead to unnecessary ticket refunds.
6. **What would be the minimum performance needed to reach the business objective?** 
    The minimum performance required is 75% precision: as long as at least 75% of flights marked as delayed/canceled are marked correctly, we reach the business objective set by the client.
7. **What are comparable problems? Can you reuse (personal or readily available) experience or tools?** 
    We have previously predicted the status of an object by observing patterns and behaviors in our divvy-data assignment. Looking at which features correlate with being a member, or in this case with a delayed flight, will give us a good idea of how best to estimate the chances of a flight being delayed or canceled.
8. **Is human expertise available?** 
    Yes, through our client.
9. **How would you solve the problem manually?**
    Manually predicting flight delays and cancellations would involve analyzing historical flight data and identifying key factors such as weather conditions, airline performance, airport congestion, and mechanical issues. A rule-based approach could be used, where past trends inform probability assessments. For example, if an airline has a history of delays on a particular route or adverse weather is forecasted, the likelihood of delay increases. This process would require cross-referencing multiple data sources, such as FAA advisories and airline reports, to validate predictions. While feasible on a small scale, manual prediction is not scalable due to the complexity and volume of data, making machine learning a more efficient solution.
10. **List the assumptions you (or others) have made so far. Verify assumptions if possible.** 
    * Flight delays and cancellations are predictable based on historical patterns.
    * The available data is accurate and comprehensive although airlines may underreport delays.
    * Faster penalization will incentivize airlines to improve service quality.
    * Prioritizing precision over recall aligns with business objectives, though recall may also be important (detecting delays more often).
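
To make the performance measure from items 5 and 6 concrete, the sketch below scores a hand-written rule, in the spirit of the manual approach from item 9, against the 75% precision minimum. Everything in it is an assumption for illustration: the field names (`route_delay_rate`, `bad_weather_forecast`, `disrupted`), the 0.4 cutoff, and the six example flights are made up and do not come from the project data.

```python
MIN_PRECISION = 0.75  # minimum precision stated in item 6


def rule_based_prediction(flight):
    """Flag a flight (1) if the route has a high historical delay rate
    or bad weather is forecast; otherwise predict on time (0).
    The 0.4 cutoff is an arbitrary illustrative choice."""
    return int(flight["route_delay_rate"] >= 0.4 or flight["bad_weather_forecast"])


def precision(y_true, y_pred):
    """Fraction of flagged flights that were actually delayed/canceled."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    flagged = sum(y_pred)
    return true_pos / flagged if flagged else 0.0


def recall(y_true, y_pred):
    """Fraction of actually delayed/canceled flights that were flagged."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual = sum(y_true)
    return true_pos / actual if actual else 0.0


# Made-up flights; "disrupted" is 1 for delayed/canceled, 0 for on time.
flights = [
    {"route_delay_rate": 0.55, "bad_weather_forecast": False, "disrupted": 1},
    {"route_delay_rate": 0.10, "bad_weather_forecast": True, "disrupted": 1},
    {"route_delay_rate": 0.45, "bad_weather_forecast": False, "disrupted": 0},
    {"route_delay_rate": 0.20, "bad_weather_forecast": False, "disrupted": 0},
    {"route_delay_rate": 0.60, "bad_weather_forecast": True, "disrupted": 1},
    {"route_delay_rate": 0.05, "bad_weather_forecast": False, "disrupted": 1},
]

y_true = [f["disrupted"] for f in flights]
y_pred = [rule_based_prediction(f) for f in flights]

p = precision(y_true, y_pred)
r = recall(y_true, y_pred)
print(f"precision={p:.2f}, recall={r:.2f}, meets minimum: {p >= MIN_PRECISION}")
```

Reporting recall next to precision keeps the last assumption visible: a rule or model can reach the precision minimum while still missing some disrupted flights.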
