# 2. F1 Prediction Project - Question driven EDA

# Table of Contents
- [1. Install Libraries & Load Dataset](#1-install-libraries)
- [Question Driven EDA](#question-driven-eda)
- [2.1. Which Teams Win the Most?](#which-teams-win-the-most)
- [2.2. Impact of Time on Number of Wins for Constructors](#22--what-impact-will-time-have-on-number-of-wins-for-constructors)
- [3. Which Engines Win the Most?](#which-engines-win-the-most)
- [4. How Important is Pole Position?](#are-there-certain-races-where-its-more-important-to-be-on-pole-position)
- [5. Impact of Constructor Points on Winning Likelihood](#impact-of-constructor-points-on-winning-likelihood)
- [6. Impact of Driver Points on Winning Likelihood Within Teams](#6-impact-of-driver-points-on-winning-likelihood)
- [7. Likelihood of Winning a Home Race](#7-likelihood-of-winning-a-home-race)
- [8. Conclusion](#conclusion)


# **Introduction**

This notebook covers the exploratory phase of a data science project that aims to predict Formula 1 race winners. Through question-driven exploratory data analysis (EDA), it examines which factors are associated with race outcomes, such as whether a team's prestige relates to its likelihood of winning and whether historical success carries over into future performance.

# 1. Install Libraries

In [7]:
# Import libraries
import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
import plotly.express as px

#### Import CSV

In [2]:
eda_df = pd.read_csv('C:/Users/Alex/OneDrive/BrainStation/Data_Science_Bootcamp/Capstone_Project/capstone-Aboard89/data/data_analysis.csv')

In [3]:
eda_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11235 entries, 0 to 11234
Data columns (total 36 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   Index                             11235 non-null  int64  
 1   resultId                          11235 non-null  int64  
 2   raceId                            11235 non-null  int64  
 3   year                              11235 non-null  int64  
 4   race                              11235 non-null  object 
 5   country                           11235 non-null  object 
 6   nationality_of_circuit            11235 non-null  object 
 7   driverId                          11235 non-null  int64  
 8   number                            11235 non-null  int64  
 9   driver_name                       11235 non-null  object 
 10  F2_champion                       11235 non-null  int64  
 11  Former_F1_World_Champion          11235 non-null  int64  
 12  Nati

# Question Driven EDA

#### **2.1 : Which teams win the most?**

As part of the project, I want to identify which variables have the biggest impact on winning races. To start, I look at which Formula 1 teams have the most race victories in my dataset. I want to explore whether racing for a certain team increases the odds of winning a Grand Prix, due to factors such as: a) organisational prowess, since teams that are used to winning may have better insight into what produces a race-winning car and driver; b) the ability to attract talent, e.g. Ferrari is often seen as one of the most prestigious teams (if not the most prestigious) to work for. It will be interesting to see whether that leads to a higher percentage of race wins.

However, this approach introduces potential risks, such as class imbalance—a scenario where our dataset might be skewed towards a handful of teams, thus biasing our model. This imbalance can hinder the model's ability to accurately predict outcomes for teams with fewer victories, thereby limiting the model's overall effectiveness and applicability across the diverse spectrum of race conditions and team strategies.
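The imbalance described above can be quantified before any modelling. The sketch below is a minimal illustration on a toy stand-in for `eda_df`: the team names and values are hypothetical, and only the column names `constructor` and `race_win` are taken from this notebook. It measures two things: how rare the positive class is, and how concentrated the wins are among teams.

```python
import pandas as pd

# Toy stand-in for eda_df (hypothetical values): one row per driver entry,
# with race_win = 1 only for the winner of each race.
toy_df = pd.DataFrame({
    "raceId": [1, 1, 1, 1, 2, 2, 2, 2],
    "constructor": ["Team A", "Team B", "Team C", "Team D"] * 2,
    "race_win": [1, 0, 0, 0, 0, 1, 0, 0],
})

# Share of rows in each target class (0 = did not win, 1 = won)
class_share = toy_df["race_win"].value_counts(normalize=True)

# Share of all wins held by each constructor
win_share = (
    toy_df.loc[toy_df["race_win"] == 1, "constructor"]
    .value_counts(normalize=True)
)

print(class_share)
print(win_share)
```

Running the same two `value_counts` calls on `eda_df` would show how skewed the real target is, which can inform choices such as class weighting or resampling later on.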

In [10]:
import plotly.express as px

# eda_df is the DataFrame loaded in section 1

# Summarize wins per team
wins_per_team = eda_df[eda_df['race_win'] == 1].groupby('constructor')['race_win'].count().reset_index()
wins_per_team.columns = ['Team', 'Wins']

# Calculate total wins for percentage calculation
total_wins = wins_per_team['Wins'].sum()

# Calculate the percentage of total wins for each team
wins_per_team['Percentage'] = (wins_per_team['Wins'] / total_wins * 100).round(2)

# Sort the teams by number of wins in descending order
wins_per_team = wins_per_team.sort_values(by='Wins', ascending=False)

# Create a bar chart
fig = px.bar(wins_per_team, x='Team', y='Wins',
             title="Wins per Team",
             labels={"Wins": "Number of Wins"},
             color='Wins', # Color the bars by the number of wins
             template="plotly_dark", # Using a dark theme for better visibility
             text='Percentage') # Add percentage text to each bar

# Update layout for a more informative x-axis
fig.update_layout(xaxis={'categoryorder':'total descending'})

# Position the text on top of the bars
fig.update_traces(texttemplate='%{text}%', textposition='outside')

# Optional: Adjust y-axis to fit text
fig.update_layout(yaxis=dict(title='Number of Wins', range=[0, max(wins_per_team['Wins'])*1.2]))

# Show the plot
fig.show()




The bar chart of Formula 1 team wins reveals pivotal insights for predictive analysis: top teams like Ferrari, Mercedes, and Red Bull have historically outperformed others in our dataset. While past success might help us understand potential future performance (e.g. top teams might be able to recruit the best drivers & engineers, have more funding, etc), it will be crucial to assess the statistical significance of team success as a predictive feature. 
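One possible way to assess whether team and winning are statistically associated is a chi-square test of independence. The sketch below uses hypothetical counts in a toy DataFrame that borrows the `constructor` and `race_win` column names. It is an illustration rather than the analysis used in this project, and it assumes rows can be treated as independent, which is only approximately true because each race has exactly one winner.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Toy stand-in for eda_df (hypothetical counts): 40 entries per team
toy_df = pd.DataFrame({
    "constructor": ["Team A"] * 40 + ["Team B"] * 40 + ["Team C"] * 40,
    "race_win": [1] * 12 + [0] * 28 + [1] * 10 + [0] * 30 + [1] * 1 + [0] * 39,
})

# Contingency table: rows = constructor, columns = race_win (0/1)
table = pd.crosstab(toy_df["constructor"], toy_df["race_win"])

# Chi-square test of independence between team and winning
chi2, p_value, dof, expected = chi2_contingency(table)

print(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.4f}")
```

A small p-value would suggest that win frequency differs between teams. With many teams that have very few wins, the expected counts become small and the test is less reliable, so grouping rare teams together may be worth considering.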

#### **2.2 : What impact will time have on number of wins for constructors?**



As we delve deeper into our analysis of Formula 1 race outcomes, a natural question arises: how does the distribution of wins across teams change over time? Drivers and team personnel move between teams from season to season, and significant regulation changes periodically reset the competitive order, so dominance in the sport rises and falls.

For instance, a team that adapts well to a new set of regulations can start a new era of supremacy, a pattern currently exemplified by Red Bull's recent ascendancy since the 2022 F1 regulation changes. This temporal factor deserves further exploration, particularly how our predictive model should account for such shifts in performance.

In [11]:
import plotly.express as px

# Summarize wins per team by year
wins_per_team_yearly = eda_df[eda_df['race_win'] == 1].groupby(['constructor', 'year'])['race_win'].count().reset_index()
wins_per_team_yearly.columns = ['Team', 'Year', 'Wins']

# Sort the teams by number of wins in descending order for each year
wins_per_team_yearly = wins_per_team_yearly.sort_values(by=['Year', 'Wins'], ascending=[True, False])

# Create a bar chart with a slider for year using Plotly Express
fig = px.bar(wins_per_team_yearly, x='Team', y='Wins',
             animation_frame='Year',
             range_y=[0, wins_per_team_yearly['Wins'].max()],
             title="Wins per Team Over Years",
             labels={"Wins": "Number of Wins", "Team": "Team"},
             color='Wins', # Color the bars by the number of wins
             template="plotly_dark") # Using a dark theme for better visibility

# Update layout for a more informative visualization
fig.update_layout(xaxis={'categoryorder':'total descending'}, yaxis=dict(type='linear'))

# Show the plot
fig.show()


- **1990's** - The sequence of charts depicting F1 wins from 1995 to 1999 unfolds a narrative of shifting dominance among top racing teams. In 1995, we witness Benetton at the pinnacle of success, capturing the highest number of wins. As time marches on to 1996, the tide turns in favor of Williams, which eclipses other teams with a commanding lead in victories. The following year, 1997, sees the emergence of a more balanced competition with Williams and Ferrari vying closely for supremacy. By 1998, a new challenger ascends; McLaren steps into the limelight, matching Ferrari win for win, signaling a shift in the competitive landscape. This trend of tight competition extends into 1999, with McLaren and Ferrari neck and neck, showcasing the sport's dynamism and the ever-evolving prowess of these engineering and strategic powerhouses.

- **2000's** Between 2000 and 2009, the Formula 1 landscape underwent a significant transformation. At the turn of the millennium, Ferrari reemerged as the dominant force, with their wins peaking in the early 2000s, signifying a golden era. McLaren and Williams, former titans of the 90s, saw their influence wane, occasionally breaking through but never quite matching Ferrari's prowess. Mid-decade, a fresh challenger in Renault disrupted the hierarchy, claiming significant victories and titles. As the decade drew to a close, the once-mighty Ferrari faced a new wave of competitors, with Brawn GP and Red Bull Racing securing crucial wins. This period marked a shift in the competitive dynamics of Formula 1, from Ferrari’s singular dominance to a more varied and unpredictable contest for supremacy, laying the foundation for the next generation of F1 excellence.

- **2010's** The visuals from 2010 to 2019 in Formula 1 depict a tale of evolving competitiveness and the rise of new eras. The early 2010s continued to showcase the rivalry between established teams like Ferrari and the surging Red Bull, which clinched multiple titles under Sebastian Vettel. Mercedes then rose to prominence, heralding a new epoch of dominance with Lewis Hamilton's series of victories, illustrating the dynamic shifts in team performances and strategies. 

- **2020's** In this period, the charts show Red Bull capitalizing on the new regulations that came into effect for the 2022 season, which led them to dominate the 2022 and 2023 seasons. Their success reflects the team's ability to adapt to change, in a sport where anticipating a regulation shift can turn into a winning streak.

The charts capture the essence of F1's competitive spirit, where technological advancements and team dynamics lead to shifts in power, underscoring the relentless pursuit of excellence in this high-stakes sport.

#### **3. Which engines win the most?**

The next question I wanted to explore was, "Which engine supplier wins the most?". There is a distinction between "works" teams such as Mercedes, Ferrari, and Renault (Alpine), who design and build their own power units, and customer teams such as Aston Martin (Mercedes), McLaren (Mercedes), and Haas (Ferrari), who buy power units from these manufacturers. This poses an interesting question for our predictive models: what impact does running a particular engine have on a team's odds of winning a race? The relationship between engine supplier and team performance is something our analysis must account for, as it could be a significant factor in winning races.

In [14]:
import plotly.express as px

# Aggregate the number of wins by engine type
engine_wins = eda_df[eda_df['race_win'] == 1].groupby('engine_manufacturer')['race_win'].count().reset_index()
engine_wins.columns = ['engine_manufacturer', 'Wins']

# Sort the engine types by number of wins in descending order
engine_wins = engine_wins.sort_values(by='Wins', ascending=False)

# Create a bar chart using Plotly Express
# The labels dict is keyed by column name, so use 'engine_manufacturer' for the x-axis label
fig = px.bar(engine_wins, x='engine_manufacturer', y='Wins', title='Wins per Engine Manufacturer',
             labels={'Wins':'Number of Wins', 'engine_manufacturer':'Engine Manufacturer'}, color='Wins', template="plotly_dark")

# Update layout for a more informative visualization
fig.update_layout(xaxis={'categoryorder':'total descending'}, yaxis=dict(type='linear'))

# Show the plot
fig.show()


This chart shows how race wins are distributed across engine manufacturers over all seasons in the dataset. Mercedes stands out with the highest number of wins, indicating their engines might give teams an edge. Ferrari and Renault follow, suggesting they are also competitive, while the presence of Honda, Red Bull, BMW, Mugen-Honda, and Ford illustrates a more diverse field of engine suppliers with victories. These patterns help us judge which manufacturers could contribute to a team's success, since a superior engine can translate into an advantage on the track, although win counts alone cannot separate the engine's contribution from that of the chassis and driver. This makes engine manufacturer a candidate feature for our predictive modeling.

In [26]:
import plotly.express as px

# eda_df is the DataFrame loaded in section 1
# Summarize wins per engine manufacturer by year
engine_wins = eda_df[eda_df['race_win'] == 1].groupby(['engine_manufacturer', 'year'])['race_win'].count().reset_index()
engine_wins.columns = ['engine_manufacturer', 'Year', 'Wins']

# Sort the engine types by number of wins in descending order for each year
engine_wins = engine_wins.sort_values(by=['Year', 'Wins'], ascending=[True, False])

# Create a bar chart with a slider for year using Plotly Express
fig = px.bar(engine_wins, x='engine_manufacturer', y='Wins',
             animation_frame='Year',
             range_y=[0, engine_wins['Wins'].max()],
             title='Wins per Engine Type Over Years',
             labels={'Wins':'Number of Wins', 'engine_manufacturer':'Engine Manufacturer'},
             color='Wins',
             template="plotly_dark")

# Update layout for a more informative visualization
fig.update_layout(xaxis={'categoryorder':'total descending'}, yaxis=dict(type='linear'))

# Show the plot
fig.show()


The main insight from this chart is the apparent dominance of certain engine manufacturers, with Ferrari leading significantly in the 2000's, followed by a period of dominance for Mercedes in the 2010's. This suggests that the engine is a crucial factor in a team's success. For our data science project on predicting F1 Grand Prix winners, this indicates that engine manufacturer should be considered a key feature in our predictive models. We should, however, be cautious about overfitting to historical data, as regulations and technological developments can alter the competitive landscape over time.
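One common guard against overfitting to historical eras is to evaluate on seasons that come after the training seasons, rather than on a random split. The sketch below shows the idea on a toy DataFrame with hypothetical values. The cutoff year and the variable names `cutoff_year`, `train_df`, and `test_df` are assumptions made for illustration, not choices taken from this project.

```python
import pandas as pd

# Toy stand-in for eda_df (hypothetical values)
toy_df = pd.DataFrame({
    "year": [2019, 2019, 2020, 2020, 2021, 2021, 2022, 2022, 2023, 2023],
    "engine_manufacturer": ["Engine A", "Engine B"] * 5,
    "race_win": [1, 0, 1, 0, 1, 0, 0, 1, 0, 1],
})

# Hypothetical cutoff: seasons up to and including this year are used for training
cutoff_year = 2021

# Chronological split: the model never sees seasons later than the cutoff
train_df = toy_df[toy_df["year"] <= cutoff_year]
test_df = toy_df[toy_df["year"] > cutoff_year]

print(train_df["year"].unique(), test_df["year"].unique())
```

In this toy data the winning engine changes after the cutoff, which is the kind of shift a random split would hide and a chronological split would expose.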

#### **4. How important is pole position?**
Visualize: Compare win rates from pole positions versus other starting positions.
Test: Perform hypothesis testing or logistic regression analysis.

In [31]:
import plotly.express as px
import plotly.graph_objs as go

# Calculate win rates for grid positions 1-9 (positions 10 and above are grouped below)
win_rates = []
for position in range(1, 10):
    wins = eda_df[(eda_df['starting_grid_position'] == position) & (eda_df['race_win'] == 1)].shape[0]
    total_starts = eda_df[eda_df['starting_grid_position'] == position].shape[0]
    win_rate = wins / total_starts if total_starts > 0 else 0
    win_rates.append(win_rate)

# Calculate win rates for positions 10+
grouped_wins = eda_df[(eda_df['starting_grid_position'] >= 10) & (eda_df['race_win'] == 1)].shape[0]
grouped_total_starts = eda_df[eda_df['starting_grid_position'] >= 10].shape[0]
grouped_win_rate = grouped_wins / grouped_total_starts if grouped_total_starts > 0 else 0
win_rates.append(grouped_win_rate)

# Position labels
positions = [str(i) for i in range(1, 10)] + ['10+']

# Create a bar chart using Plotly to visualize win rates
fig = go.Figure(data=[
    go.Bar(
        x=positions,
        y=win_rates,
        text=[f'{rate:.2%}' for rate in win_rates],
        textposition='auto'
    )
])

# Update the layout for a clear visualization
fig.update_layout(
    title='Win Rates by Starting Grid Position',
    xaxis_title='Starting Grid Position',
    yaxis_title='Win Rate',
    yaxis=dict(tickformat=".2%"),
    template="plotly_dark"
)

# Show the plot
fig.show()


This chart illustrates the impact of starting grid position on win rates in Formula 1 races. It shows a stark decline in win rates as the starting position moves away from pole, with drivers starting from pole position (1st place) winning over 40% of the time. All subsequent positions have significantly lower win rates, which emphasizes the advantage of pole position. The data suggests that starting at the front of the grid is strongly associated with winning, likely due in part to the clear track ahead allowing for an uncontested race pace, and in part because the fastest car and driver combination tends to qualify on pole in the first place. Starting grid position could therefore be a substantial predictor of race outcomes in our predictive models.
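The hypothesis test mentioned at the start of this section could take the form of a 2x2 comparison of pole starts against all other starts. The sketch below uses Fisher's exact test on a toy DataFrame with hypothetical values that borrows the `starting_grid_position` and `race_win` column names. It is one reasonable option, not necessarily the test this project settles on.

```python
import pandas as pd
from scipy.stats import fisher_exact

# Toy stand-in for eda_df (hypothetical values): 10 races with 5 starters each
toy_df = pd.DataFrame({
    "starting_grid_position": [1, 2, 3, 4, 5] * 10,
    "race_win": ([1, 0, 0, 0, 0] * 5) + ([0, 1, 0, 0, 0] * 3) + ([0, 0, 1, 0, 0] * 2),
})

on_pole = toy_df["starting_grid_position"] == 1

# Counts for the 2x2 table: rows = pole / not pole, columns = win / no win
pole_wins = int(toy_df.loc[on_pole, "race_win"].sum())
pole_losses = int(on_pole.sum()) - pole_wins
other_wins = int(toy_df.loc[~on_pole, "race_win"].sum())
other_losses = int((~on_pole).sum()) - other_wins

table = [[pole_wins, pole_losses], [other_wins, other_losses]]

# Fisher's exact test: are the odds of winning different from pole?
odds_ratio, p_value = fisher_exact(table)

print(table)
print(f"odds ratio={odds_ratio:.2f}, p={p_value:.4f}")
```

An odds ratio above 1 means the odds of winning are higher from pole than from other positions. A logistic regression on grid position would extend the same idea to every position at once.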

### Are there certain races, where it's more important to be on pole position?

In [42]:
import plotly.express as px

# Identify the top 10 most common Grand Prix
top_grand_prix = eda_df['race'].value_counts().head(10).index

# Filter the dataset for races that are in the top 10 most common Grand Prix
top_grand_prix_df = eda_df[eda_df['race'].isin(top_grand_prix)]

# Calculate the percentage of pole starts converted into wins for each of the top Grand Prix.
# The denominator is the number of pole starts, not the number of driver entries,
# so the mean of the 0/1 race_win flag over pole-position rows is the win rate from pole.
pole_starts = top_grand_prix_df[top_grand_prix_df['starting_grid_position'] == 1]
pole_position_win_rates = (pole_starts.groupby('race')['race_win'].mean() * 100).reset_index()
pole_position_win_rates.columns = ['Grand Prix', 'Win Rate from Pole Position']

# Sort the win rates in descending order
pole_position_win_rates = round(pole_position_win_rates.sort_values(by='Win Rate from Pole Position', ascending=False),2)

# Create a bar chart using Plotly to visualize the win rates from pole position for each Grand Prix
fig = px.bar(pole_position_win_rates, x='Grand Prix', y='Win Rate from Pole Position',
             title="Percentage of Wins from Pole Position for Top 10 Grand Prix",
             labels={'Win Rate from Pole Position': 'Win Rate (%)'},
             text='Win Rate from Pole Position')

# Update the layout for a clear visualization
fig.update_layout(
    xaxis_title='Grand Prix',
    yaxis_title='Percentage of Wins from Pole Position',
    yaxis=dict(tickformat=".2f"),
    template="plotly_dark"
)

# Show the plot
fig.show()


This chart presents a clear visualization of how pole position influences race outcomes across different Grand Prix. It's evident that the Spanish, Italian and Japanese Grand Prix see a higher percentage of wins from the pole position, suggesting that the track layout or other factors in these locations might give the leading starter a more pronounced advantage. Conversely, races like the British and Belgian Grand Prix show a lower reliance on pole position for a win, indicating that these tracks may allow for more overtaking or that strategy and car performance play a more significant role. This analysis is invaluable as it directs our modeling efforts towards considering the unique characteristics of each circuit and their impact on race strategy and outcomes.

#### **5. Impact of constructor points on winning likelihood**
Visualize: Scatter plot of constructor points vs. win rates.
Test: Correlation analysis or regression modeling.

In [39]:
import plotly.express as px
import pandas as pd

# Assuming 'eda_df' is the DataFrame containing our data

# First, we need to calculate the win rate and average points per constructor
constructor_summary = eda_df.groupby('constructor').agg(
    Total_Wins=('race_win', 'sum'),
    Total_Races=('raceId', 'count'),
    Total_Points=('points', 'sum')
).reset_index()

constructor_summary['Win_Rate'] = constructor_summary['Total_Wins'] / constructor_summary['Total_Races']
constructor_summary['Avg_Points'] = constructor_summary['Total_Points'] / constructor_summary['Total_Races']

# Now, let's create the scatter plot with Plotly Express
fig = px.scatter(constructor_summary, x='Avg_Points', y='Win_Rate', color='constructor',
                 title='Impact of Constructor Points on Winning Likelihood',
                 labels={'Win_Rate': 'Win Rate', 'Avg_Points': 'Average Points'},
                 hover_data=['constructor'])

# Show the plot
fig.show()


This scatter plot reveals the relationship between the average points scored by constructors and their win rates. Each dot represents a different constructor, and its position on the plot indicates the average points they have achieved and their corresponding win rate. There isn't a clear positive correlation visible in this data; some constructors with fewer average points still achieve higher win rates, and vice versa. This could suggest that while average points are an indicator of a team's competitiveness, they don't necessarily predict race wins, which could be influenced by a variety of factors like race conditions, driver skill, and strategy. The spread of data points emphasizes the complexity of predicting F1 race winners and indicates that our model should include more nuanced or diverse data to better capture the variables at play, although this chart could be useful for helping us with feature engineering (e.g. creating a variable of cumulative points in the season, to assess chances of winning).
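The correlation analysis mentioned at the start of this section can be sketched with a rank correlation, which puts a number on what the scatter plot shows by eye. The example below builds a toy `constructor_summary` with hypothetical values, reusing the `Avg_Points` and `Win_Rate` column names from the cell above. Spearman's rho is used here as an assumption, since it does not require the relationship to be linear.

```python
import pandas as pd
from scipy.stats import spearmanr

# Toy stand-in for constructor_summary (hypothetical values)
constructor_summary = pd.DataFrame({
    "constructor": ["Team A", "Team B", "Team C", "Team D", "Team E"],
    "Avg_Points": [12.0, 8.5, 4.0, 1.5, 0.2],
    "Win_Rate": [0.30, 0.18, 0.05, 0.00, 0.00],
})

# Spearman rank correlation between average points and win rate
rho, p_value = spearmanr(
    constructor_summary["Avg_Points"], constructor_summary["Win_Rate"]
)

print(f"Spearman rho={rho:.3f}, p={p_value:.4f}")
```

Applied to the real `constructor_summary`, a rho close to zero would support the reading above, while a clearly positive rho would suggest the scatter plot is masking a monotonic relationship.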

#### **6. Impact of driver points on winning likelihood**
Visualize: Plot driver points against win rates for drivers within the same team.
Test: Use paired comparisons or time-series analysis if data is over multiple seasons.

In [4]:
import plotly.express as px

# Assuming eda_df is your dataframe with the relevant columns.
# Calculating win rates and average points per driver within each team
driver_stats = eda_df.groupby(['driverId', 'constructor'])[['race_win', 'points']].agg({'race_win': 'mean', 'points': 'mean'}).reset_index()
driver_stats.columns = ['Driver ID', 'Team', 'Win Rate', 'Average Points']

# Normalizing win rate to be a percentage
driver_stats['Win Rate'] = driver_stats['Win Rate'] * 100

# Rounding to two decimal points
driver_stats['Win Rate'] = driver_stats['Win Rate'].round(2)

# Now we can create the scatter plot
fig = px.scatter(driver_stats, x='Average Points', y='Win Rate', color='Team', hover_data=['Driver ID'])

# Adding titles and labels
fig.update_layout(
    title='Impact of Driver Points on Winning Likelihood Within Teams',
    xaxis_title='Average Driver Points',
    yaxis_title='Win Rate (%)',
)

# Show the plot
fig.show()


This scatter plot charts the average points earned by drivers within each team against their win rates, providing insight into individual driver performance and its impact on the team's success. The spread of points indicates varying levels of correlation between a driver's average points and their likelihood of winning races. Some teams show a cluster of drivers with higher average points correlating to higher win rates, suggesting a strong team performance. Conversely, other teams display a wider spread, with some drivers achieving high win rates without a correspondingly high average point score, pointing towards individual driver skill and strategy playing a significant role in race outcomes. This visualization underscores the importance of considering driver-specific data, alongside team performance, to enhance the accuracy of our race winner predictions.

#### **7. Likelihood of winning a home race**
Visualize: Compare home wins to non-home wins for drivers.
Test: Analyze whether being a home race is a significant predictor of winning.

In [8]:
import plotly.express as px
import pandas as pd

# Assuming 'eda_df' is the DataFrame containing our data

# Calculate wins and total races for each driver by home or non-home race
driver_stats = eda_df.groupby(['driverId', 'driver_name', 'home_race']).agg(
    Total_Wins=('race_win', 'sum'),
    Total_Races=('race_win', 'count')
).reset_index()

# Pivot this data to have one row per driver with columns for home wins, non-home wins,
# home races, and non-home races
driver_pivot = driver_stats.pivot(index=['driverId', 'driver_name'], columns='home_race', values=['Total_Wins', 'Total_Races']).reset_index()

# Flatten the columns after pivoting
driver_pivot.columns = ['_'.join(map(str, col)).strip() if isinstance(col, tuple) else col for col in driver_pivot.columns.values]

# Replace NaN with 0 for drivers who have never participated in a home/non-home race
driver_pivot.fillna(0, inplace=True)

# Calculate win rate as (Total_Wins / Total_Races) * 100 for home and non-home
driver_pivot['Home_Win_Rate'] = (driver_pivot['Total_Wins_1'] / driver_pivot['Total_Races_1'] * 100).round(2)
driver_pivot['Non_Home_Win_Rate'] = (driver_pivot['Total_Wins_0'] / driver_pivot['Total_Races_0'] * 100).round(2)

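# Replace NaN with 0 for win rate calculations where no races occurred
# (assign the result back; calling fillna(inplace=True) on a selected column is unreliable in recent pandas)
driver_pivot['Home_Win_Rate'] = driver_pivot['Home_Win_Rate'].fillna(0)
driver_pivot['Non_Home_Win_Rate'] = driver_pivot['Non_Home_Win_Rate'].fillna(0)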

# Calculate total participation for sorting
driver_pivot['Total_Participation'] = driver_pivot['Total_Races_1'] + driver_pivot['Total_Races_0']

# Sort by total participation and select the top 10
top_drivers = driver_pivot.sort_values(by='Total_Participation', ascending=False).head(10)

# Create a grouped bar chart for home vs non-home win rates
fig = px.bar(top_drivers, x='driver_name_', y=['Home_Win_Rate', 'Non_Home_Win_Rate'],
             title='Home Win Rate vs Non-Home Win Rate for Top 10 Drivers',
             labels={'value': 'Win Rate (%)', 'variable': 'Race Type'},
             barmode='group')

# Update x-axis to show driver names without overlapping
fig.update_xaxes(tickangle=-45)

# Show the plot
fig.show()


This chart compares the home and non-home win rates for the 10 drivers with the most race entries in the dataset. The blue bars represent each driver's win rate for races held in their home country, while the red bars show their win rate for races held outside their home country. This analysis can indicate whether racing on home turf provides any advantage, which could be a useful factor in predicting race outcomes. Because each driver has only a small number of home races, these differences should be read with caution. For example, it looks like Lewis Hamilton performs very well at his home Grand Prix, whilst Sergio Perez has not performed as well at his home race (possibly due to the pressure of performing in front of his home fans).
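Since a driver typically has only one home race per season, it may help to check whether a home win rate is distinguishable from that driver's usual win rate. The sketch below applies an exact binomial test to a single hypothetical driver. The data values are invented for illustration, and only the column names `home_race` and `race_win` come from this notebook.

```python
import pandas as pd
from scipy.stats import binomtest

# Toy stand-in for one driver's rows in eda_df (hypothetical values):
# 3 wins from 10 home races, 20 wins from 100 non-home races
driver_df = pd.DataFrame({
    "home_race": [1] * 10 + [0] * 100,
    "race_win": [1] * 3 + [0] * 7 + [1] * 20 + [0] * 80,
})

home = driver_df[driver_df["home_race"] == 1]
away = driver_df[driver_df["home_race"] == 0]

home_wins = int(home["race_win"].sum())
home_starts = len(home)
away_win_rate = float(away["race_win"].mean())

# Is the home record consistent with the driver's non-home win rate?
result = binomtest(home_wins, n=home_starts, p=away_win_rate)

print(f"home: {home_wins}/{home_starts}, away rate: {away_win_rate:.2f}, p={result.pvalue:.3f}")
```

In this toy case the home win rate (30%) looks higher than the non-home rate (20%), yet the test does not find the difference significant at the 5% level, which illustrates why per-driver home advantage is hard to establish from a handful of races.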

# 8. Conclusion

In our exploratory data analysis, we examined the variables that most strongly relate to Formula 1 race outcomes. We observed that a team's historical success, measured by its number of wins, is associated with the likelihood of winning, though wins are concentrated among a handful of teams, which creates a class imbalance our model must handle. For example, the dominance of top teams like Ferrari, Mercedes, and Red Bull could be indicative of their organizational strength, capacity to attract talent, and financial resources.

We also considered the impact of time on team victories, noting how regulatory changes and team dynamics have shifted the landscape of F1 racing. Red Bull's ascension with the 2022 regulation changes underlines the importance of a team's ability to adapt and innovate. Additionally, the importance of engine manufacturers was highlighted, with the performance of teams often tethered to the prowess of their engine suppliers.

Moreover, the analysis revealed the critical role of pole position, with a stark decline in win rates as starting positions move down the grid. This underlines the strategic advantage of leading the pack and having a clear track ahead. However, the effect of pole position on race outcomes is not uniform across all circuits, suggesting the need for our predictive models to account for circuit-specific characteristics.

The EDA also shed light on the relationship between constructor and driver points with win rates, challenging the assumption that higher average points necessarily predict victories. This complexity reminds us that our predictive model must embrace a multifaceted approach, considering a range of variables including team strategy, driver skill, and the particularities of each race.

Lastly, the comparison of home versus non-home win rates for drivers suggested a 'home advantage' for some, which could influence our predictive modeling. The EDA's insights arm us with a deeper understanding of the sport's dynamics, which we can leverage to fine-tune our dataset & features (e.g. build features around constructor and driver points throughout the season), striving for a model that not only forecasts race winners with enhanced accuracy but also reflects the vibrant, ever-changing nature of Formula 1 racing.
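The season-points feature suggested above could be built as the points a constructor has accumulated before each race. The sketch below is a minimal illustration on a toy DataFrame with hypothetical values. It assumes `raceId` increases chronologically within a season, which would need to be checked against the real data, and the column name `points_before_race` is invented for this example.

```python
import pandas as pd

# Toy stand-in for eda_df (hypothetical values): two cars per team per race
toy_df = pd.DataFrame({
    "year": [2022] * 8 + [2023] * 4,
    "raceId": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "constructor": ["Team A", "Team A", "Team B", "Team B"] * 3,
    "points": [25, 15, 18, 12, 18, 10, 25, 15, 25, 18, 15, 12],
})

# Points per constructor per race (both cars combined), in race order
race_points = (
    toy_df.groupby(["year", "raceId", "constructor"], as_index=False)["points"]
    .sum()
    .sort_values(["year", "raceId"])
)

# Season-to-date points BEFORE each race: the running total minus the current race,
# so the feature never includes the result it is meant to help predict
running_total = race_points.groupby(["year", "constructor"])["points"].cumsum()
race_points["points_before_race"] = running_total - race_points["points"]

print(race_points)
```

Excluding the current race from the running total matters because including it would leak the outcome into the feature. The same pattern would work for drivers by grouping on `driverId` instead of `constructor`.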