# 🏎️ F1 Insights Dashboard: Exploring Race Dynamics and Driver Performance

Welcome to the Formula 1 Insights Dashboard project! This notebook explores race dynamics, driver performance, constructor trends, and historical patterns in Formula 1 racing using publicly available data.

**Goals:**
- Analyze career points, wins, and seasonal performance.
- Visualize constructor and driver dominance.
- Explore geographical and team-level distributions.

---

### 📦 Importing Required Libraries

We begin by importing the necessary Python libraries for data manipulation and visualization:

- **pandas**: For working with tabular data and performing data wrangling operations.
- **matplotlib.pyplot**: For creating static, basic visualizations.
- **seaborn**: Built on top of Matplotlib, it provides a high-level interface for attractive and informative statistical graphics.
- **plotly.express**: For creating interactive and dynamic visualizations, ideal for dashboards.

These libraries form the backbone of our analysis and visual storytelling.

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

### 📥 Loading Raw Data

We load the core datasets required for our analysis from the `data/raw/` directory. These CSV files were manually downloaded from Kaggle due to API limitations.

The datasets include:

- `driver_standings.csv`: Cumulative points, wins, and championship rank of each driver after each race.
- `drivers.csv`: Metadata about F1 drivers (names, nationalities, etc.).
- `constructors.csv`: Constructor (team) information.
- `races.csv`: Race-level details including the year, round, and name.
- `results.csv`: Detailed race results including driver and constructor performances.

Each dataset plays a key role in shaping insights for different perspectives (driver, constructor, season).

In [27]:
driver_standings = pd.read_csv("../data/raw/driver_standings.csv")
drivers = pd.read_csv("../data/raw/drivers.csv")
constructors = pd.read_csv("../data/raw/constructors.csv")
races = pd.read_csv("../data/raw/races.csv")
results = pd.read_csv("../data/raw/results.csv")
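
As a hedged sketch, the five reads above could be wrapped in a small helper that fails early when a file is missing. The helper name `load_raw_tables` and its `data_dir` parameter are illustrative assumptions, not part of the original notebook.

```python
from pathlib import Path

import pandas as pd

# Table names taken from the list above; the helper itself is hypothetical.
TABLES = ["driver_standings", "drivers", "constructors", "races", "results"]


def load_raw_tables(data_dir):
    """Read each expected CSV in data_dir into a dict of DataFrames."""
    data_dir = Path(data_dir)
    # Collect every missing file first so the error lists them all at once.
    missing = [name for name in TABLES if not (data_dir / f"{name}.csv").exists()]
    if missing:
        raise FileNotFoundError(f"Missing CSV files in {data_dir}: {missing}")
    return {name: pd.read_csv(data_dir / f"{name}.csv") for name in TABLES}
```

With this helper, `tables = load_raw_tables("../data/raw")` would give access to each table as, for example, `tables["races"]`.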

### 🔗 Merging Datasets for Enriched Context

To perform meaningful analysis, we combine multiple datasets into a unified DataFrame called `merged_df`. This enriches the raw `driver_standings` data with contextual information from other tables:

- 🔄 **Step 1**: Merge with `results.csv` to get `constructorId` for each driver-race pair.
- 👤 **Step 2**: Merge with `drivers.csv` to attach driver surnames and nationalities.
- 🏎️ **Step 3**: Merge with `constructors.csv` to map constructor/team names.
- 🏁 **Step 4**: Merge with `races.csv` to extract race names and years.

We then select only the relevant columns for our analysis and rename them for clarity:
- `name` → `race_name`
- `surname` → `driver`
- `constructorRef` → `team`

In [28]:
# Merge driver_standings with results to get constructorId
merged_df = pd.merge(driver_standings, results[['raceId', 'driverId', 'constructorId']], on=['raceId', 'driverId'], how='left')

# Merge with drivers for driver name and nationality
merged_df = pd.merge(merged_df, drivers[['driverId', 'surname', 'nationality']], on='driverId', how='left')

# Merge with constructors for team name
merged_df = pd.merge(merged_df, constructors[['constructorId', 'constructorRef']], on='constructorId', how='left')

# Merge with races for year and race name
merged_df = pd.merge(merged_df, races[['raceId', 'year', 'name']], on='raceId', how='left')

# Final column selection and renaming
merged_df = merged_df[['year', 'name', 'position', 'points', 'wins', 'surname', 'nationality', 'constructorRef']]
merged_df.rename(columns={
    'name': 'race_name',
    'surname': 'driver',
    'constructorRef': 'team'
}, inplace=True)
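
Left merges can silently add rows (when the right table repeats a key) or leave gaps (when a key has no match). The sketch below uses made-up toy tables to show how the `validate` and `indicator` options of `pd.merge` surface both cases; the toy values are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-ins for driver_standings and results; values are made up.
standings = pd.DataFrame({
    "raceId": [1, 1, 2],
    "driverId": [10, 11, 10],
    "points": [25.0, 18.0, 43.0],
})
results = pd.DataFrame({
    "raceId": [1, 1],
    "driverId": [10, 11],
    "constructorId": [100, 101],
})

merged = pd.merge(
    standings,
    results,
    on=["raceId", "driverId"],
    how="left",
    validate="many_to_one",  # raises MergeError if results repeats a key
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

# Rows with no matching result keep NaN in constructorId.
unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged), len(unmatched))
```

If the real `results` table ever holds more than one row for the same `raceId` and `driverId` pair, `validate="many_to_one"` would raise instead of duplicating standings rows.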

### 🧹 Data Cleaning: Ensuring Numerical Consistency

Before proceeding with analysis, it's important to ensure that the `points` column is correctly treated as numeric data:

- ✅ Convert the `points` column to numeric using `pd.to_numeric`, coercing any invalid entries into `NaN`.
- 🧼 Drop rows where `points` could not be converted, as those are not useful for numerical analysis.

In [29]:
# Ensure 'points' is float
merged_df['points'] = pd.to_numeric(merged_df['points'], errors='coerce')
merged_df.dropna(subset=['points'], inplace=True)
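
Before dropping rows, it can be useful to count how many values failed to convert. A minimal sketch on a toy Series follows; the `\N` placeholder for missing values is an assumption about how the raw files encode gaps.

```python
import pandas as pd

# Toy column mixing valid numbers with two unparseable entries.
raw = pd.Series(["25", "18.5", "\\N", "abc"])

# errors="coerce" turns anything that is not a number into NaN.
points = pd.to_numeric(raw, errors="coerce")

# Count the failures first, then drop them.
n_bad = int(points.isna().sum())
clean = points.dropna()
print(n_bad, clean.tolist())
```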

### 🏆 Career Points Analysis: Top 10 Drivers

In this section, we identify the top 10 drivers based on **total career points**, considering only **season-ending standings**:

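- We first aggregate the final points per driver per season by grouping and taking the maximum `points`. Standings are cumulative within a season, so the maximum is the season-ending total. Because `driver` holds only the surname, drivers who share a surname are grouped together.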
- Then, we compute total career points per driver.
- Finally, we visualize the top 10 drivers using a bar plot with Plotly.

This gives us an overall view of driver consistency and performance over the years.

In [35]:
# Deduplicate: Get only the final standing per driver per season
final_standings = merged_df.groupby(['year', 'driver']).agg({
    'points': 'max',
    'team': 'first'
}).reset_index()

# Total career points
career_points = final_standings.groupby('driver')['points'].sum().reset_index()
career_points = career_points.sort_values(by='points', ascending=False).head(10)

# Plot (plotly.express is already imported as px above)

fig = px.bar(
    career_points,
    x='driver',
    y='points',
    title='Top 10 Drivers by Career Points (Season-End Only)',
    labels={'points': 'Career Points', 'driver': 'Driver'},
    template='plotly_dark',
    color='driver'
)
fig.show()
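
Two details of this aggregation are easy to check on toy data: taking the per-season maximum of cumulative points recovers the season total, and grouping by surname merges drivers who share one. The sketch below uses made-up values and a hypothetical `driverId` column to contrast the two keys.

```python
import pandas as pd

# Made-up cumulative standings: points grow round by round within a season.
toy = pd.DataFrame({
    "year":     [2000, 2000, 2000, 2000, 2001, 2001],
    "round":    [1, 2, 1, 2, 1, 2],
    "driverId": [1, 1, 2, 2, 1, 1],
    "driver":   ["Smith", "Smith", "Smith", "Smith", "Smith", "Smith"],
    "points":   [10.0, 16.0, 6.0, 9.0, 8.0, 18.0],
})

# Keyed on surname: the two Smiths collapse into one driver.
by_surname = (
    toy.groupby(["year", "driver"])["points"].max()
    .groupby(level="driver").sum()
)

# Keyed on driverId: each driver keeps a separate career total.
by_id = (
    toy.groupby(["year", "driverId"])["points"].max()
    .groupby(level="driverId").sum()
)
print(by_surname.to_dict(), by_id.to_dict())
```

Keeping `driverId` in `merged_df` and grouping on it would avoid the collision, at the cost of mapping ids back to names for plotting.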

### 📈 Season-wise Performance of Top 10 Drivers

Now that we've identified the top 10 drivers by career points, we take a closer look at how their **performance varied season by season**.

- We filter the dataset to include only the top 10 drivers.
- For each year, we plot their end-of-season point totals.
- The line plot shows how consistently each driver has performed and highlights dominant seasons.

This helps us visualize career trajectories, rivalries, and peak performance years.

In [36]:
top_driver_names = career_points['driver'].tolist()

# Filter only top 10 drivers
seasonal_df = final_standings[final_standings['driver'].isin(top_driver_names)]

# Line plot
fig = px.line(
    seasonal_df,
    x='year',
    y='points',
    color='driver',
    title='Season-wise Performance of Top 10 Drivers',
    labels={'points': 'Points', 'year': 'Year'},
    markers=True,
    template='plotly_dark'
)
fig.show()

### 🏎️ Top 10 Constructors by Total Points

In this section, we approximate constructor performance by summing the **season-end points of each team's drivers** over the seasons.

- We group the data by constructor (`team`) and sum driver points. Each driver-season is attributed to the first team recorded for that driver in that year.
- The bar chart displays the **top 10 teams** based on cumulative points scored.

This visualization highlights the most dominant and successful teams in Formula 1 history.

In [37]:
top_teams = final_standings.groupby('team')['points'].sum().reset_index().sort_values(by='points', ascending=False).head(10)

fig = px.bar(
    top_teams,
    x='team',
    y='points',
    title='Top 10 Constructors by Total Points',
    labels={'points': 'Total Points'},
    template='plotly_dark',
    color='team'
)
fig.show()
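
An alternative is to sum per-race points by constructor directly from the results table. The sketch below uses made-up toy tables and assumes `results.csv` carries a per-race `points` column; totals computed this way may not match official constructor standings in every era.

```python
import pandas as pd

# Made-up race results; assumes results.csv has a per-race 'points' column.
toy_results = pd.DataFrame({
    "raceId":        [1, 1, 1, 2, 2, 2],
    "driverId":      [10, 11, 12, 10, 11, 12],
    "constructorId": [100, 100, 101, 100, 100, 101],
    "points":        [25.0, 18.0, 15.0, 18.0, 25.0, 15.0],
})
toy_constructors = pd.DataFrame({
    "constructorId": [100, 101],
    "constructorRef": ["alpha", "beta"],
})

# Attach the team label, then sum every race entry's points per team.
team_points = (
    toy_results.merge(toy_constructors, on="constructorId", how="left")
    .groupby("constructorRef")["points"].sum()
    .sort_values(ascending=False)
)
print(team_points.to_dict())
```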

### 🏁 Yearly Champion Drivers (Wins Count)

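This section counts **race wins per driver** for each season.

- The `position` column in `merged_df` comes from `driver_standings`, so it is the championship rank after each race, not the race result.
- We therefore use the cumulative `wins` column and take its maximum per driver per season, which is the driver's win count for that year.
- The resulting bar chart shows **which drivers dominated specific seasons** based on win count.

This helps us understand which drivers were consistently outperforming others across different years.

In [38]:
# 'wins' in driver_standings is cumulative within a season, so its maximum is the season total
winners_per_season = merged_df.groupby(['year', 'driver'])['wins'].max().reset_index()
# Keep only drivers who won at least one race that year
winners_per_season = winners_per_season[winners_per_season['wins'] > 0]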

fig = px.bar(
    winners_per_season,
    x='year',
    y='wins',
    color='driver',
    title='Yearly Champion Driver (Wins Count)',
    template='plotly_dark'
)
fig.show()
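
In a standings table, counting rows where `position == 1` and reading the cumulative `wins` column measure different things. The sketch below uses made-up values and assumes, as in the Kaggle standings file, that `position` is the championship rank after each round and `wins` is the running win count.

```python
import pandas as pd

# Made-up standings after each of three rounds in one season.
toy = pd.DataFrame({
    "year":     [2000] * 6,
    "round":    [1, 2, 3, 1, 2, 3],
    "driver":   ["A", "A", "A", "B", "B", "B"],
    "position": [1, 1, 2, 2, 2, 1],   # championship rank after the round
    "wins":     [1, 1, 1, 0, 1, 2],   # cumulative race wins so far
})

# Number of rounds after which each driver led the championship.
rounds_led = toy[toy["position"] == 1].groupby(["year", "driver"]).size()

# Number of races each driver won in the season.
season_wins = toy.groupby(["year", "driver"])["wins"].max()
print(rounds_led.to_dict(), season_wins.to_dict())
```

Here driver A leads the table after two rounds but wins only one race, while driver B wins two races and leads after one round.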

### 📦 Points Distribution per Team

This box plot visualizes the **spread of points earned by drivers within each team** across all seasons.

- The box shows the **interquartile range (IQR)** of driver points for each team.
- Outliers and variability highlight how **balanced or dependent** teams are on specific drivers.
- Teams with consistent performance will show **narrower spreads**, while those with fluctuations or one-star drivers will show **wider distributions**.

This helps analyze **intra-team consistency** and overall team depth.

In [39]:
fig = px.box(
    final_standings,
    x='team',
    y='points',
    title='Points Distribution per Team',
    template='plotly_dark'
)
fig.show()
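
The spread that each box represents can also be computed numerically. A minimal sketch on made-up data follows; pandas uses linear interpolation for quantiles here, so the numbers may differ slightly from the quartiles Plotly draws.

```python
import pandas as pd

# Made-up season-end points for two teams.
toy = pd.DataFrame({
    "team":   ["alpha"] * 4 + ["beta"] * 4,
    "points": [10.0, 20.0, 30.0, 40.0, 0.0, 0.0, 50.0, 100.0],
})

# First and third quartile per team, one column per quantile.
q = toy.groupby("team")["points"].quantile([0.25, 0.75]).unstack()

# Interquartile range: a wider value means a less even points spread.
iqr = (q[0.75] - q[0.25]).rename("iqr")
print(iqr.to_dict())
```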

### 🌳 Driver Contribution to Team Points

This treemap breaks down the **total points contributed by each driver to their respective teams** across seasons.

- The size of each rectangle represents the **total points**.
- The hierarchy `team → driver` shows **how much each driver contributed within a team**.
- It's a great way to **visualize team dependencies**—highlighting if a team relied heavily on a single driver or had more balanced contributions.

This offers a quick overview of **driver impact within teams**.

In [40]:
fig = px.treemap(
    final_standings,
    path=['team', 'driver'],
    values='points',
    title='Driver Contribution to Team Points',
    template='plotly_dark'
)
fig.show()
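
The dependency that the treemap shows visually can be expressed as each driver's share of the team total. The sketch below uses made-up values; the `share` column name is an illustrative assumption.

```python
import pandas as pd

# Made-up driver totals for two teams.
toy = pd.DataFrame({
    "team":   ["alpha", "alpha", "beta", "beta"],
    "driver": ["A", "B", "C", "D"],
    "points": [90.0, 10.0, 50.0, 50.0],
})

# Total points per driver within each team.
totals = toy.groupby(["team", "driver"], as_index=False)["points"].sum()

# Divide by the team total, broadcast back to each row with transform.
totals["share"] = totals["points"] / totals.groupby("team")["points"].transform("sum")

# The largest share per team indicates how much it leans on one driver.
top_share = totals.groupby("team")["share"].max()
print(top_share.to_dict())
```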

### 🏆 Career Points & Season-Wise Performance of Top Drivers

To analyze long-term performance, the cell below recomputes the top 10 drivers by career points from `final_standings` and stores their per-season rows in `driver_seasonal`.

These top drivers are used in subsequent analyses such as season-wise performance trends, nationality distribution, and team representation. This sets the foundation for deeper insights into F1 legends and consistent performers.

In [None]:
career_points = final_standings.groupby('driver')['points'].sum().reset_index()
career_points = career_points.sort_values(by='points', ascending=False).head(10)

top_driver_names = career_points['driver'].tolist()
driver_seasonal = final_standings[final_standings['driver'].isin(top_driver_names)]

### 🏁 Race Winners: Constructors and Drivers

This section focuses on race-winning performances. 

- We start by filtering the results to only include **race winners** (i.e., position = 1).
- Then, we merge in **constructor**, **driver**, and **race** information to build a comprehensive dataset of winners.
- We analyze:
  - 🛠️ **Top Constructors** based on total number of wins.
  - 👨‍✈️ **Driver Wins by Year** to see which drivers dominated each season.

This helps uncover which teams and drivers were consistently strong throughout F1 history.

In [77]:
results = pd.read_csv("../data/raw/results.csv")
constructors = pd.read_csv("../data/raw/constructors.csv")

# Convert position to numeric
results['position'] = pd.to_numeric(results['position'], errors='coerce')

# Filter for wins
wins = results[results['position'] == 1.0]

# Merge with constructors
wins = pd.merge(wins, constructors[['constructorId', 'constructorRef']], on='constructorId', how='left')

# Count wins per constructor
constructor_wins = wins.groupby('constructorRef').size().reset_index(name='wins')
constructor_wins.rename(columns={'constructorRef': 'team'}, inplace=True)
constructor_wins = constructor_wins.sort_values(by='wins', ascending=False).head(10)

drivers = pd.read_csv("../data/raw/drivers.csv")
races = pd.read_csv("../data/raw/races.csv")

# Merge for year and surname
wins = pd.merge(wins, drivers[['driverId', 'surname']], on='driverId', how='left')
wins = pd.merge(wins, races[['raceId', 'year']], on='raceId', how='left')

# Group wins per driver per year
driver_wins = wins.groupby(['year', 'surname']).size().reset_index(name='wins')
driver_wins.rename(columns={'surname': 'driver'}, inplace=True)

### 🌍 Nationality & Team Distribution of Top Drivers

Using the previously identified top 10 drivers, we now analyze two categorical distributions:

- 🏳️ **Driver Nationalities**: Which countries produced the most top-tier F1 talent?
- 🏎️ **Team Representation**: Which teams did these top drivers race for, and how many of them drove for each?

This offers a broader perspective on global talent distribution and the teams that played a key role in driver success.

In [76]:
top_rows = merged_df[merged_df['driver'].isin(top_driver_names)]

# Count each top driver once per nationality (not once per standings row)
nationality_share = top_rows.drop_duplicates(subset=['driver', 'nationality'])
nationality_share = nationality_share.groupby('nationality').size().reset_index(name='count')
nationality_share = nationality_share.sort_values(by='count', ascending=False)

# Count how many top drivers appeared for each team
team_distribution = top_rows.drop_duplicates(subset=['driver', 'team'])
team_distribution = team_distribution.groupby('team').size().reset_index(name='count')
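
Calling `.size()` on row-level data counts standings rows, so drivers with long careers weigh more than drivers with short ones. The sketch below uses made-up data to contrast row counts with unique-driver counts.

```python
import pandas as pd

# Made-up row-level data: driver A appears three times.
toy = pd.DataFrame({
    "driver":      ["A", "A", "A", "B", "C"],
    "nationality": ["British", "British", "British", "German", "German"],
    "team":        ["alpha", "alpha", "beta", "beta", "beta"],
})

# Rows per nationality: inflated by how often a driver appears.
rows_per_nationality = toy.groupby("nationality").size()

# Distinct drivers per nationality and per team.
drivers_per_nationality = toy.groupby("nationality")["driver"].nunique()
drivers_per_team = toy.groupby("team")["driver"].nunique()
print(rows_per_nationality.to_dict(), drivers_per_nationality.to_dict())
```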

### 📊 Final Interactive Dashboard: Consolidated Race & Driver Insights

To conclude the exploratory analysis, we present a unified interactive dashboard using Plotly’s `make_subplots`. This visual summary combines six key perspectives:

1. **Top 10 Drivers by Career Points** – Bar chart
2. **Season-wise Points Trend for Top Drivers** – Line chart
3. **Top 10 Constructors by Total Wins** – Bar chart
4. **Wins Over the Years (Top Drivers)** – Line chart
5. **Driver Nationalities (Top 10)** – Pie chart
6. **Team Representation Among Top 10 Drivers** – Pie chart

This comprehensive layout offers quick, interactive insights into both performance and diversity metrics across F1 history.

In [75]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go

# Step 1: Create subplot layout

fig = make_subplots(
    rows=3, cols=2,
    subplot_titles=(
        "Top 10 Drivers by Career Points",
        "Season-wise Points (Top Drivers)",
        "Top Constructors by Wins",
        "Wins Over Years (Top Drivers)",
        "Driver Nationalities (Top 10)",
        "Team Representation (Top 10 Drivers)"
    ),
    vertical_spacing=0.12,
    horizontal_spacing=0.08,
    specs=[
        [{"type": "xy"}, {"type": "xy"}],
        [{"type": "xy"}, {"type": "xy"}],
        [{"type": "domain"}, {"type": "domain"}]  # Pie charts go here
    ]
)


# 1. Career Points - Top 10 Drivers
fig.add_trace(
    go.Bar(x=career_points['driver'], y=career_points['points'], name="Career Points"),
    row=1, col=1
)

# 2. Driver Season-wise Points Line Plot (one trace per driver so lines do not cross-connect)
for driver_name, driver_rows in driver_seasonal.groupby('driver'):
    fig.add_trace(
        go.Scatter(x=driver_rows['year'], y=driver_rows['points'], mode='lines+markers', name=driver_name),
        row=1, col=2
    )

# 3. Constructor Wins
fig.add_trace(
    go.Bar(x=constructor_wins['team'], y=constructor_wins['wins'], name="Constructor Wins", marker_color='darkorange'),
    row=2, col=1
)

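# 4. Driver Wins Over Years (top drivers only, one trace per driver)
top_driver_wins = driver_wins[driver_wins['driver'].isin(top_driver_names)]
for driver_name, driver_rows in top_driver_wins.groupby('driver'):
    fig.add_trace(
        go.Scatter(x=driver_rows['year'], y=driver_rows['wins'], mode='lines+markers', name=driver_name),
        row=2, col=2
    )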

# 5. Driver Nationalities Pie Chart
fig.add_trace(
    go.Pie(labels=nationality_share['nationality'], values=nationality_share['count'], name="Nationalities"),
    row=3, col=1
)

# 6. Team Distribution Pie Chart
fig.add_trace(
    go.Pie(labels=team_distribution['team'], values=team_distribution['count'], name="Team Distribution"),
    row=3, col=2
)

# Final Layout
fig.update_layout(
    height=1000,
    title_text="🏎️ Formula 1 Insights Dashboard",
    showlegend=False,
    template='plotly_dark'
)

fig.show()

## ✅ Summary & Next Steps

This notebook explored key dynamics in Formula 1 racing using historical datasets:

- Merged and cleaned race, driver, constructor, and result data
- Analyzed top-performing drivers and teams
- Visualized patterns over time and across nationalities
- Built an interactive multi-panel dashboard for streamlined storytelling

---

### 📍 Next Steps:
- Build a Streamlit dashboard for interactive public sharing
- Add functionality to compare drivers head-to-head
- Explore circuit-level race performance using maps
- (Optional) Implement an ML-based podium prediction model

Stay tuned as we shift gears into building the **F1 Insights Dashboard App** 🚀