<a href="https://colab.research.google.com/github/davidgoins236/data_viz_final_project/blob/main/data_viz_final_project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Title: Visual History of Formula 1 Racing: An Exploratory Data Analysis

Author: David Goins

Goal: Understand how Formula 1 has evolved over time using historical data, with a secondary objective of evaluating which visualizations communicate these trends most effectively.

In [3]:
!git clone https://github.com/davidgoins236/data_viz_final_project
%cd data_viz_final_project

Cloning into 'data_viz_final_project'...
remote: Enumerating objects: 22, done.
remote: Counting objects: 100% (22/22), done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 22 (delta 3), reused 0 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (22/22), 6.24 MiB | 4.25 MiB/s, done.
Resolving deltas: 100% (3/3), done.
/content/data_viz_final_project


In [14]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go


# Functions:
- For this project I leveraged some custom functions in scenarios where I wanted to explore

# Data Load, Wrangling & Cleaning

In [None]:
circuits = pd.read_csv('circuits.csv')
constructor_results = pd.read_csv('constructor_results.csv')
constructor_standings = pd.read_csv('constructor_standings.csv')
constructors = pd.read_csv('constructors.csv')
driver_standings = pd.read_csv('driver_standings.csv')
drivers = pd.read_csv('drivers.csv')
lap_times = pd.read_csv('lap_times.csv')
pit_stops = pd.read_csv('pit_stops.csv')
quali = pd.read_csv('qualifying.csv')
races = pd.read_csv('races.csv')
results = pd.read_csv('results.csv')
seasons = pd.read_csv('seasons.csv')
sprint = pd.read_csv('sprint_results.csv')
status = pd.read_csv('status.csv')
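
The repeated `read_csv` calls above could also be written as a loop that fails early when a file is missing. This is a minimal sketch, assuming the CSV files sit in one directory and that a dictionary keyed by file stem is acceptable; `load_tables` is a hypothetical helper and not part of the original notebook.

```python
from pathlib import Path

import pandas as pd


def load_tables(data_dir, names):
    """Read <name>.csv for each name in `names` and return {name: DataFrame}.

    Hypothetical helper: raises FileNotFoundError before reading anything
    if one or more of the expected files is missing.
    """
    data_dir = Path(data_dir)
    missing = [n for n in names if not (data_dir / f"{n}.csv").exists()]
    if missing:
        raise FileNotFoundError(f"Missing CSV files: {missing}")
    return {n: pd.read_csv(data_dir / f"{n}.csv") for n in names}
```

For example, `tables = load_tables('.', ['races', 'results', 'status'])` would make the races table available as `tables['races']`. The notebook's individual variables remain the simpler choice when only a handful of tables are used.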

In [32]:
status

Unnamed: 0,statusId,status
0,1,Finished
1,2,Disqualified
2,3,Accident
3,4,Collision
4,5,Engine
...,...,...
134,137,Damage
135,138,Debris
136,139,Illness
137,140,Undertray


# Evolution of the Sport Over Time
 - Number of Races per Season
 - Number of Teams per Season
 - Circuit Debuts per Season
 - Fastest Lap Times per Track Through Time
 - Pit Stop Speed over Time

## Races per Season:

This time series plot shows how the number of races per season has grown over the history of the sport. Key events are flagged with vertical lines to provide context for changes in trend. For example, the 2020 season was an obvious deviation from trend due to COVID-19. Overall, the plot shows rapid expansion from 1950 to 1980, followed by a period of stability that resulted from the consolidation of governance in the 1980s. Expansion resumed in the 2000s as F1 actively moved into new markets in Asia and the Middle East (e.g., the first Chinese Grand Prix in 2004 and the first Singapore Grand Prix in 2008).

In [61]:
#note: for this project I use plt_df as a temporary dataframe that is specific to the cell it is run in
plt_df = races.groupby(['year']).agg(race_count = ('raceId', 'count')).reset_index()
#markers expects a boolean in plotly express
fig = px.line(plt_df, x = 'year', y = 'race_count', title = 'Number of Races per Season', markers=True,
              height=500, width=1000, template='seaborn')
#clean up the axis labels
fig.update_layout(xaxis_title = 'Season', yaxis_title = 'Number of Races')

#flag some relevant events from Wikipedia: https://en.wikipedia.org/wiki/Formula_One

key_events = {
    1950: "First World Championship Season",
    1971: "Formation of FOCA",
    1981: "Concorde Agreement: Governance",
    2000: "Effort to Expand to New Markets",
    2020: "Races Canceled Due to COVID-19"
}
for year, event in key_events.items():
    fig.add_vline(
        x=year,
        annotation_text = event,
        opacity=1,
        annotation_textangle=45,
        line_width=2,
        line_dash="dash",
        line_color="grey",
    )

fig.show()
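
To complement the visual reading of the plot, the same per-season counts can be checked numerically for deviations such as the 2020 drop. The sketch below applies the cell's aggregation to a small synthetic table, not the real `races` data, and adds a year-over-year difference; `races_demo`, `season_counts`, and `drops` are illustrative names.

```python
import pandas as pd

# Synthetic stand-in for the `races` table; values are illustrative only, not real F1 data.
races_demo = pd.DataFrame({
    "raceId": range(1, 11),
    "year": [2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2020, 2020],
})

# Same aggregation as the plotting cell: one row per season with its race count.
season_counts = (
    races_demo.groupby("year")
    .agg(race_count=("raceId", "count"))
    .reset_index()
)

# Year-over-year change; the first season has no predecessor, so its value is NaN.
season_counts["yoy_change"] = season_counts["race_count"].diff()

# Seasons whose race count fell relative to the previous season.
drops = season_counts[season_counts["yoy_change"] < 0]
print(drops)
```

Applied to the real `plt_df`, a filter like this could suggest candidate years for the `key_events` annotations, although which events to label remains a judgment call.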

# Constructors & Drivers: Historical Performance
- Wins by Constructor Over Time
- Most Successful Drivers by Wins/Podiums/Poles vs Teammate

# Qualifying vs Finish Dynamics
-  Grid Position vs Finishing Position: To what extent does the qualifying result determine the race result?
- Track Dynamics: Which tracks are most determined by qualifying?
- Driver Dynamics: Which drivers are most capable of improving their results on race day?
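
One possible starting point for these questions is sketched below on synthetic data. It assumes the results table exposes a starting position and a final classification per entry; the column names `grid`, `positionOrder`, and `driver` are assumptions here and should be checked against the actual `results.csv` before use.

```python
import pandas as pd

# Synthetic stand-in for `results`; column names are assumed, values are illustrative only.
results_demo = pd.DataFrame({
    "driver": ["A", "B", "C", "A", "B", "C"],
    "grid": [1, 2, 3, 2, 3, 1],
    "positionOrder": [1, 3, 2, 1, 3, 2],
})

# Positive values mean the driver finished ahead of where they started.
results_demo["positions_gained"] = results_demo["grid"] - results_demo["positionOrder"]

# Pearson correlation between starting and finishing position across all entries.
grid_finish_corr = results_demo["grid"].corr(results_demo["positionOrder"])

# Average positions gained per driver, best improvers first.
gained_by_driver = (
    results_demo.groupby("driver")["positions_gained"]
    .mean()
    .sort_values(ascending=False)
)

print(grid_finish_corr)
print(gained_by_driver)
```

Grouping the same correlation by circuit instead of computing it across all entries would be one way to approach the track-level question. A rank-based measure may suit ordinal positions better than Pearson; that choice is left open here.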

# Summary Findings & Executive Summary
 - Key insights
 - Alternative Visualization Techniques