# Conclusions and next steps

### **How did the model perform in predicting F1 winners?**

The data science model I developed for predicting Formula 1 race winners performed well on the 2024 races it was tested on. Its predictions suggest it captures much of the dynamics at play in the sport. It correctly predicted the winner of the Saudi Arabian race with 59% confidence and the Australian race with 56% confidence (i.e. hard predictions), and the bets placed on both paid out. In the two races where it did not make a hard prediction, the eventual winner was one of the two drivers with the highest soft-prediction probabilities.
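
The hard/soft distinction can be made concrete with a minimal sketch. It assumes the model outputs a win probability for each driver. It also assumes a hard prediction is made only when the favourite's probability reaches a confidence threshold; otherwise the top two drivers are reported as a soft prediction. The project's actual threshold is not stated. `HARD_THRESHOLD = 0.5` is a placeholder, consistent with the 56% and 59% hard predictions above. The `classify_prediction` helper and the driver names are hypothetical.

```python
from typing import Dict, List, Tuple, Union

# ASSUMPTION: placeholder cut-off; the project's real threshold is not stated.
HARD_THRESHOLD = 0.5


def classify_prediction(
    win_probs: Dict[str, float],
    threshold: float = HARD_THRESHOLD,
    k: int = 2,
) -> Tuple[str, Union[str, List[str]]]:
    """Turn per-driver win probabilities into a hard or soft prediction.

    Returns ("hard", driver) when the most likely driver's probability meets
    the threshold, otherwise ("soft", [the k most likely drivers]).
    """
    if not win_probs:
        raise ValueError("win_probs must contain at least one driver")
    # Rank drivers from most to least likely to win.
    ranked = sorted(win_probs, key=win_probs.get, reverse=True)
    favourite = ranked[0]
    if win_probs[favourite] >= threshold:
        return "hard", favourite
    return "soft", ranked[:k]


# Hypothetical probabilities, for illustration only.
print(classify_prediction({"Driver A": 0.59, "Driver B": 0.25, "Driver C": 0.16}))
# → ('hard', 'Driver A')
print(classify_prediction({"Driver A": 0.38, "Driver B": 0.35, "Driver C": 0.27}))
# → ('soft', ['Driver A', 'Driver B'])
```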

Formula 1 is inherently unpredictable, with outcomes shaped by many variables ranging from weather to mechanical reliability. Even so, the model proved both resilient and accurate across these races. The winnings column shows that every time the model was confident enough to suggest a bet, the bet produced a positive return. This performance suggests that the predictive algorithms are strong and that the underlying data and processing pipeline are robust.

Financially, the bets placed on the model's predictions produced net winnings of £92. With £200 staked (£100 each on the Saudi Arabian and Australian races), this is a 46% return on investment (£92 / £200). The return is notable because no bets were lost, which underlines the model's effectiveness in this context. The selective betting strategy paid off: bets were placed only when the model's confidence was high. This demonstrates the model's potential as a decision-support tool in predictive sports analytics.
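
The return-on-investment arithmetic can be expressed as a small helper. This is a minimal sketch, assuming each bet is settled at decimal odds. The odds actually obtained for each race are not reported here, and the `Bet` class and `betting_summary` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Bet:
    stake: float         # amount staked, in £
    decimal_odds: float  # ASSUMPTION: odds quoted in decimal format
    won: bool            # whether the backed driver won the race


def betting_summary(bets: Iterable[Bet]) -> Tuple[float, float, float]:
    """Return (total staked, net profit, ROI) for a set of settled bets.

    A winning bet returns stake * decimal_odds, so its net profit is
    stake * (decimal_odds - 1); a losing bet forfeits its stake.
    """
    staked = 0.0
    profit = 0.0
    for bet in bets:
        staked += bet.stake
        profit += bet.stake * (bet.decimal_odds - 1) if bet.won else -bet.stake
    # ROI is net profit divided by total amount staked.
    roi = profit / staked if staked else 0.0
    return staked, profit, roi
```

Only races where the model made a hard prediction would be passed in, since no bet is placed otherwise. Applied to the totals reported above, the same formula gives 92 / 200 = 0.46, i.e. the 46% return.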

### **Next Steps**

Moving forward, we've outlined a roadmap for the project that focuses on refining and extending the model's predictive capabilities.

1. **Make the model simpler.** Complexity doesn't always translate into better performance, especially in real-time scenarios. We will streamline the model so that it is both accurate and efficient, balancing the depth of the data against the speed of prediction. The model currently has 181 columns, many of them one-hot encoded variables. The feature importances from our Random Forest SMOTE model show which features carry the most predictive power. I would therefore like to build a simpler v2 of the model around those features, which would also require less compute.

2. **Create other ML functions to predict pre-race variables.** I would also like to build functions that predict variables we won't have before a race but that significantly influence the outcome. Examples include whether a driver is likely to crash or a team is likely to have a mechanical issue. This would make the model more proactive, anticipating outcomes before the race begins.

3. **Automate the data collection pipeline.** Automation is crucial for keeping the model up to date with the latest information. Building the 2024 dataset required a lot of manual work pulling data from different sites (e.g. F1.com, Wikipedia, and formula1points.com). Automating this process would keep the data fed into the model fresh and reflective of current conditions, such as weather, track details, and driver performance.

4. **Improve the UX for a customer-facing application and integrate with AWS.** Finally, I would like to create a better user experience for the model using AWS Amplify. Amplify is a set of tools and services from Amazon Web Services for building and deploying full-stack web and mobile applications that are scalable, secure, and integrated with AWS cloud services. It would give the model a secure, scalable environment and let me apply the data science skills I learned at BrainStation in a production setting. It would also make the model's predictions easier for stakeholders to access and interact with.
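
The feature-pruning idea in the first step can be sketched as follows. This minimal sketch assumes impurity-based importances from a fitted scikit-learn `RandomForestClassifier` (its `feature_importances_` attribute). It also assumes they have been paired with column names, e.g. `dict(zip(X.columns, model.feature_importances_))`. Because many of the 181 columns are one-hot encoded, the sketch sums the importances of each one-hot group back onto its source feature before ranking. The column names, prefixes, the `select_features` helper, and the 95% default coverage are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def source_feature(column: str, one_hot_prefixes: Sequence[str]) -> str:
    """Map a one-hot column such as 'team_X' back to its source feature 'team'."""
    for prefix in one_hot_prefixes:
        if column.startswith(prefix + "_"):
            return prefix
    return column


def select_features(
    importances: Dict[str, float],
    one_hot_prefixes: Sequence[str],
    coverage: float = 0.95,  # ASSUMPTION: keep features explaining 95% of importance
) -> List[str]:
    """Return the smallest set of source features, ranked by importance,
    whose combined share of the total importance reaches `coverage`."""
    grouped: Dict[str, float] = defaultdict(float)
    for column, score in importances.items():
        grouped[source_feature(column, one_hot_prefixes)] += score

    total = sum(grouped.values())
    if total <= 0:
        return []

    kept: List[str] = []
    running = 0.0
    # Walk features from most to least important until coverage is reached.
    for name, score in sorted(grouped.items(), key=lambda kv: kv[1], reverse=True):
        kept.append(name)
        running += score
        if running / total >= coverage:
            break
    return kept


# Hypothetical importances; real values would come from the fitted model.
importances = {
    "grid_position": 0.40,
    "driver_A": 0.10,
    "driver_B": 0.15,
    "qualifying_gap": 0.20,
    "team_X": 0.05,
    "team_Y": 0.05,
    "round": 0.05,
}
print(select_features(importances, one_hot_prefixes=["driver", "team"], coverage=0.8))
# → ['grid_position', 'driver', 'qualifying_gap']
```

The retained names are source features, so the v2 model would keep every one-hot column in a retained group. The prefixes must match the encoding's naming scheme (pandas `get_dummies` joins prefix and value with `_` by default). Impurity-based importances can favour high-cardinality features, so scikit-learn's `permutation_importance` is a reasonable cross-check before dropping anything.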

In summary, these next steps are designed to polish our model into a tool that's not only scientifically rigorous but also user-friendly and directly applicable to the dynamic world of Formula 1 racing.