load final dataset used in 3.2

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from scipy.stats import ttest_ind, mannwhitneyu

In [None]:
df = pd.read_csv("/Users/frankdong/Documents/Analytics Local/williams-racing-strategies/processed_data/laptimes_std.csv")

In [None]:
print(df.head())
print(df.info())


<br>
EDA - let the data speak for itself through visualisations. <br>
Get to know the data both visually and statistically, lay the groundwork for analysis and hypothesis testing.<br>
1. Start with core, rough visualisations. <br>
Histograms, boxplots, violin plots, scatterplots, barplots.  <br>
2. Save all interesting "candidate" plots by exporting them as PNG for review.<br>
Plots may reveal surprising outliers, odd clusters, or clear trends<br>
3. Annotate and document observations in Markdown<br>
Write brief markdown notes next to each PNG. Note patterns, anomalies, large/small group sizes, etc.<br>
4. Identify outliers, check sample sizes, assess normality. <br>
Key for hypothesis testing - boxplots and histograms help spot outliers and skew.<br>
Check sample sizes with .groupby() or .value_counts() - are all groups (teams, circuits, years) large enough for statistical tests?<br>
5. Iterate and refine - select the most informative charts for polish, annotation, and inclusion in Tableau<br>
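The sample-size check in step 4 can be sketched as follows. This is a minimal illustration rather than part of the original analysis: `flag_small_groups` and the `min_n = 30` default are hypothetical (the threshold echoes the n > 30 rule of thumb used later), and the toy frame only stands in for `df`.

```python
import pandas as pd

def flag_small_groups(frame: pd.DataFrame, column: str, min_n: int = 30) -> pd.DataFrame:
    """Count rows per group and flag groups below a minimum sample size."""
    counts = frame[column].value_counts().rename("n").to_frame()
    counts["large_enough"] = counts["n"] >= min_n
    return counts

# Toy stand-in for df (made-up group sizes); the notebook would pass df itself.
toy = pd.DataFrame({"rookie_or_experienced": ["experienced"] * 40 + ["rookie"] * 12})
summary = flag_small_groups(toy, "rookie_or_experienced")
print(summary)
```

The same helper could be pointed at other grouping columns such as `gp_name` or `gp_year`.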


--- Steps 1, 2, 3. Rough Viz, save interesting plots, annotate observations ---

In [None]:
print(df)

"laptime consistency index" KPI now stored as laptime_std_ms. 

In [None]:
print(df["rookie_or_experienced"].value_counts())
print("\n")
# N(experienced) = 89. N(rookie) = 50. n > 30, which is good. 

-------------------- 1: Boxplot of lap time consistency by experience level ------------------

set up new figure and size - taller boxplot helps compare heights 

In [None]:
plt.figure(figsize=(9, 9))

create boxplot

In [None]:
consistency_by_experience = sns.boxplot(
    x = "rookie_or_experienced", 
    y = "laptime_std_ms", 
    data = df, 
    hue = "rookie_or_experienced", 
    palette = "Set2", # unbiased colour set for boxplot visualisation
    order = ["experienced", "rookie"] # order the boxes this way
)

set title, xlabel and ylabel

In [None]:
consistency_by_experience.set_title("Lap Time Consistency (Standard Deviation) by Experience Level")
consistency_by_experience.set_xlabel("Experience Level")
consistency_by_experience.set_ylabel("Standard Deviation (ms)")

In [None]:
plt.grid(linewidth = 0.25)
plt.show()


<br>
This boxplot breaks down Williams' lap time consistency: the standard deviation of all lap times<br>
recorded in a race session by an experienced or rookie driver. <br>
The box captures the middle 50% of lap time standard deviations (the interquartile range or IQR), <br>
while the median line indicates the typical lap time consistency for each experience level. <br>
Experienced drivers are slightly more consistent than rookies, with <br>
a median lap time standard deviation of 3600 ms (3.6 s), compared to 4050 ms (4.05 s) for rookies.<br>
However, experienced drivers show a wider spread of results, in both the box and the whiskers.<br>
Their box is larger than the rookies': 50% of values fall between 2.37 and 5.92 s, compared to <br>
2.71 to 5.24 s for rookies. <br>
The "experienced" boxplot also has a wider whisker range, from 0.25 to 11.00 s, and two outliers between 12 and 14 s.<br>
These outliers need to be excluded before a final t-test.<br>
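The outlier rule behind the whiskers can be worked through with the quartiles quoted above. A minimal sketch, assuming seaborn's default whisker rule of 1.5 x IQR; `tukey_fences` is a hypothetical helper, and the quartile values are read off the plot rather than recomputed from `df`.

```python
import numpy as np

def tukey_fences(values, k: float = 1.5):
    """Return (lower_fence, upper_fence) using the k * IQR rule."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles quoted in the text for experienced drivers, in ms.
q1_ms, q3_ms = 2370.0, 5920.0
iqr_ms = q3_ms - q1_ms                  # width of the box
upper_fence_ms = q3_ms + 1.5 * iqr_ms   # points above this are drawn as outliers
print(f"IQR = {iqr_ms:.0f} ms, upper fence = {upper_fence_ms:.0f} ms")
```

With these figures the upper fence sits at about 11.2 s, which is consistent with a whisker ending at 11.00 s and the two points between 12 and 14 s being drawn as outliers.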


------------------- 2: Histplots for normality --------------------

plot a grid of histograms, one per experience level (experienced, rookie)

In [None]:
grid = sns.FacetGrid(
    df, 
    col = "rookie_or_experienced", 
    col_order = ["experienced", "rookie"],
    sharex = True, sharey = True, 
    height = 4, aspect = 1
)
grid.map(
    sns.histplot, 
    "laptime_std_ms", 
    kde=True, 
    stat="count", 
    bins=15, 
    color="royalblue"
)

annotate counts to each plot on the grid

In [None]:
for ax, experience in zip(grid.axes.flat, ["experienced", "rookie"]):
    n = df[df["rookie_or_experienced"] == experience].shape[0]
    ax.text(0.95, 0.95, f"n = {n}", ha="right", va="top", transform=ax.transAxes,
            fontsize=12, bbox=dict(boxstyle="round", alpha=0.2))
    ax.set_xlabel("Standard Deviation (ms)")
    ax.set_ylabel("Count")
    ax.set_title(f"{experience.capitalize()} Drivers")

In [None]:
plt.suptitle("Williams' Laptime Consistency Distributions by Driver Experience Level", y=1.08, fontsize=16)
plt.tight_layout()
plt.show()


<br>
The rookie drivers' histogram resembles a normal distribution curve, <br>
making it reasonably suitable for a t-test. <br>
However, due to the two large outliers, the experienced drivers' histogram features a long right tail, <br>
and is heavily right-skewed (positively skewed). We will now label and remove these outliers before carrying out tests.<br>
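The visual read of the histograms can be backed by two numbers: sample skewness (positive values indicate a long right tail) and a Shapiro-Wilk p-value. A minimal sketch, assuming a synthetic right-skewed sample in place of `laptime_std_ms`; `describe_shape` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np
from scipy.stats import shapiro, skew

def describe_shape(values):
    """Return sample skewness and the Shapiro-Wilk p-value for a 1-D sample."""
    values = np.asarray(values, dtype=float)
    return {
        "skewness": float(skew(values)),
        "shapiro_p": float(shapiro(values).pvalue),
    }

# Synthetic right-skewed sample (NOT real lap data), sized like the experienced group.
rng = np.random.default_rng(0)
synthetic = rng.lognormal(mean=8.0, sigma=0.6, size=89)
result = describe_shape(synthetic)
print(result)
```

In the notebook, the same function could be applied to each experience group's `laptime_std_ms` values before and after outlier removal.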


---------------- 3: Experienced Driver Histogram, no outliers ----------------

1. copy the full dataset (the outliers are removed from the experienced group in step 4)

In [None]:
df_no_outliers = df.copy()

2. identify the two largest outliers in lap time standard deviation per race/driver

In [None]:
largest_two_outliers = df_no_outliers[df_no_outliers["rookie_or_experienced"] == "experienced"].nlargest(2, "laptime_std_ms")

3. (optional) display the details of these outliers for reporting

In [None]:
print("Removed outliers (for annotation):")
print(largest_two_outliers[["gp_year", "gp_name", "driver_name", "laptime_std_ms"]])

4. drop the two largest experienced-driver outliers from the copied DataFrame

In [None]:
df_no_outliers = df_no_outliers.drop(largest_two_outliers.index)

5. replot the histogram with KDE for the cleaned data

In [None]:
plt.figure(figsize=(12, 8)) # set up a new figure

In [None]:
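no_outliers_histplot = sns.histplot(
    # keep experienced drivers only, so the data matches the plot title
    df_no_outliers.loc[df_no_outliers["rookie_or_experienced"] == "experienced", "laptime_std_ms"], 
    kde=True, 
    bins=15, 
    color="royalblue"
)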

In [None]:
no_outliers_histplot.set_title("Williams' Lap Time Consistency Distribution (Experienced Drivers) - Without Top 2 Outliers")
no_outliers_histplot.set_xlabel("Standard Deviation (ms)")
no_outliers_histplot.set_ylabel("Count")

In [None]:
plt.grid(linewidth=0.25)
plt.show()

In [None]:
print("\n")
print(df_no_outliers) # a dataframe, storing no outliers, is now available here.


<br>
Compared to the previous histplot, this one is slightly less right-skewed, <br>
with the exclusion of the two largest outliers identified by the boxplot. <br>
However, ~10 values between 8,000 and 11,000 ms still produce a noticeable right tail. <br>
Remember, most parametric tests, e.g. the t-test for means, assume the data has no major outliers, <br>
and is roughly normal. <br>
A large outlier can inflate the calculated standard error, making it harder to achieve<br>
statistical significance.<br>
However, as we meet n > 30 for both samples, and the distributions are roughly normal, let's carry out<br>
both a parametric and a non-parametric test: Welch's t-test and the Mann-Whitney U test.<br>


------------------ Step 4. Perform hypothesis testing ------------------

Count the samples involved: 

In [None]:
sample_counts = df_no_outliers["rookie_or_experienced"].value_counts()
print("\nSample sizes: ", sample_counts)


<br>
Sample sizes:  rookie_or_experienced<br>
experienced    87<br>
rookie         50<br>
Name: count, dtype: int64<br>
For both samples, n > 30. <br>
Rookie is roughly normal in shape, but experienced is slightly right-skewed. <br>
Perform a standard Welch's t-test followed by a Mann-Whitney U test. <br>
Make observations, note differences, compare results at the 5% significance level (95% confidence).<br>



<br>
Hypothesis Recap: <br>
Rookie drivers had higher lap time variance than their teammates during the 2015-2019 seasons.<br>
Groups: <br>
- Experienced: n = 87<br>
- Rookie: n = 50<br>
- Two different populations - experienced drivers and rookies.<br>
- Testing if rookies have greater standard deviation, not just different. Direction matters.<br>
Two-sample, independent, one-tailed Welch's t-test. <br>
H0 (Null): μ_Rookie ≤ μ_Experienced<br>
H1 (Alt): μ_Rookie > μ_Experienced<br>
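For reference, the statistic behind this test can be written out by hand. A minimal sketch, assuming synthetic samples (not the notebook's data) and a hypothetical helper `welch_t`; it only checks the hand computation against `scipy.stats.ttest_ind` with `equal_var=False`.

```python
import numpy as np
from scipy.stats import ttest_ind

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for mean(a) - mean(b)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    # Per-sample variance of the mean; the two groups are NOT pooled.
    va, vb = a.var(ddof=1) / a.size, b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    return t, dof

# Synthetic samples with the same sizes as the two groups (values are made up).
rng = np.random.default_rng(1)
rookie_like = rng.normal(4000, 1500, size=50)
experienced_like = rng.normal(4000, 2000, size=87)

t_manual, dof = welch_t(rookie_like, experienced_like)
t_scipy, _ = ttest_ind(rookie_like, experienced_like, equal_var=False)
print(f"manual t = {t_manual:.4f}, scipy t = {t_scipy:.4f}, dof = {dof:.1f}")
```

Because the first argument is the rookie sample, a positive t points in the direction of H1, and the one-tailed p-value is the upper-tail area of the t distribution with `dof` degrees of freedom.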


1. extract relevant data for the test

In [None]:
experienced_data = df_no_outliers[df_no_outliers["rookie_or_experienced"] == "experienced"]["laptime_std_ms"]
rookie_data = df_no_outliers[df_no_outliers["rookie_or_experienced"] == "rookie"]["laptime_std_ms"]

2. run the Welch's one-tailed t-test

In [None]:
t_stat, p_value = ttest_ind(
    rookie_data, 
    experienced_data, 
    equal_var = False,
    alternative = "greater" # defines alternative hypothesis
)

In [None]:
print("\nWelch's t-test for rookies vs. experienced drivers consistency 95% confidence level.\n")
print(f"t-statistic: {t_stat:.3f}")
print(f"One-tailed p-value: {p_value:.4f}")

In [None]:
alpha = 0.05  # significance level (95% confidence)
if p_value < alpha:
    print("\nReject the null hypothesis (H0) in favour of H1: Rookies have significantly higher lap time variance than experienced drivers during 2015-2019")
else: # p_value >= alpha
    print("\nFail to reject the null (H0): No significant evidence that rookies have greater lap time variance compared to experienced drivers between 2015-2019.")


<br>
Welch's t-test for rookies vs. experienced drivers consistency 95% confidence level.<br>
t-statistic: -0.214<br>
One-tailed p-value: 0.5844<br>
Fail to reject the null (H0): No significant evidence that rookies have greater lap time variance compared to experienced drivers between 2015-2019.<br>
We fail to reject H0 here, but let's check the non-parametric test before we jump to conclusions.<br>


3. Run the Mann-Whitney U Test

In [None]:
m_stat, p_value_2 = mannwhitneyu(rookie_data, experienced_data, alternative="greater")

In [None]:
print("\nMann-Whitney U Test for rookies vs. experienced drivers consistency at 95% confidence level.")
print(f"Mann-Whitney U statistic: {m_stat:.3f}")
print(f"One-tailed p-value: {p_value_2:.4f}")

In [None]:
if p_value_2 < alpha:
    print("\nReject the null hypothesis (H0) in favour of H1: Rookies have significantly higher lap time variance than experienced drivers during 2015-2019")
else: # p_value_2 >= alpha
    print("\nFail to reject the null (H0): No significant evidence that rookies have greater lap time variance compared to experienced drivers between 2015-2019.")


<br>
Mann-Whitney U Test for rookies vs. experienced drivers consistency at 95% confidence level.<br>
Mann-Whitney U statistic: 2256.000<br>
One-tailed p-value: 0.3595<br>
Fail to reject the null (H0): No significant evidence that rookies have greater lap time variance compared to experienced drivers between 2015-2019.<br>



<br>
Both p-values are well above the 0.05 significance threshold (95% confidence level).<br>
This means there's insufficient evidence to reject the null hypothesis that rookies do not have greater lap time variance.<br>
The data does not support the claim that rookies were significantly less consistent than their experienced teammates<br>
over those seasons. <br>
Interventions aimed solely at rookies for consistency improvement might require re-evaluation, or further factors <br>
should be carefully investigated. <br>
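An effect size gives a sense of how small the difference is, beyond the p-values. A minimal sketch using only the figures reported above (U = 2256, n = 50 rookies, n = 87 experienced); it assumes the reported U refers to the first sample passed to `mannwhitneyu` (the rookie data), which is how recent SciPy versions report it.

```python
# Values reported above: U for the rookie sample and the two sample sizes.
u_rookie = 2256.0
n_rookie, n_experienced = 50, 87

# Share of (rookie, experienced) pairs in which the rookie value is larger (ties count half).
prob_superiority = u_rookie / (n_rookie * n_experienced)

# Rank-biserial correlation: 0 means no tendency in either direction.
rank_biserial = 2 * prob_superiority - 1

print(f"P(rookie > experienced) ~ {prob_superiority:.3f}")
print(f"rank-biserial r ~ {rank_biserial:.3f}")
```

A probability of roughly 0.52 (rank-biserial of roughly 0.04) is very close to the no-difference value of 0.5, which is in line with failing to reject H0.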


Step 5. Explore other visualisations: 

1. driver consistency trajectory over a season


<br>
plot lap time variance per GP over the course of a season for each driver, to: <br>
-> see if rookies improve race-to-race - a proxy for learning/adaptation<br>
-> see if experienced drivers stay consistent, or degrade due to age, car, team issues. <br>
-> enable individual case studies, such as: <br>
    2015: massa and bottas - 2nd season together<br>
    2016: massa and bottas - 3rd and final season for two experienced drivers<br>
    2017: lance stroll - rookie season. felipe massa - final season.<br>
    2018: sirotkin - rookie F1 season.<br>
    2019: george russell - rookie season. kubica - return to F1 (experienced)<br>


example - george russell, 2019 rookie season

In [None]:
driver_name = "George Russell"
season = 2019

filter data

In [None]:
df_driver_season = df[
    (df["driver_name"] == driver_name) & 
    (df["gp_year"] == season)
]

sort by gp_round for proper chronological order

In [None]:
df_driver_season = df_driver_season.sort_values(by="gp_round")

create combined column for better x-axis labels (optional)

In [None]:
df_driver_season["gp_round_and_name"] = df_driver_season["gp_round"].astype(str) + ": " + df_driver_season["gp_name"]

plot

In [None]:
plt.figure(figsize=(12, 6))

In [None]:
russell_2019_consistency = sns.lineplot(
    data=df_driver_season,
    x="gp_round_and_name",  # using the combined column
    y="laptime_std_ms",
    marker="o",
    linewidth=2,
    color="steelblue"
)

In [None]:
russell_2019_consistency.set_title(f"{driver_name}'s Lap Time Consistency Over {season}")
russell_2019_consistency.set_xlabel("Grand Prix - Round and Name")
russell_2019_consistency.set_ylabel("Lap Time Std Dev (ms)")
russell_2019_consistency.tick_params(axis="x", rotation = 45)

In [None]:
plt.grid(linewidth=0.25)
plt.tight_layout()
plt.savefig("plots/plots3/4-russell-2019-consistency.png") # save fig
plt.show()


<br>
Based on this snapshot of 10 GPs from Russell's 2019 rookie season (approximately half the season),<br>
the data shows promising signs of adaptation and improved consistency. <br>
Russell demonstrates a general trend toward better lap-to-lap consistency throughout the year,<br>
with his lap time standard deviation improving from ~7.2 s early in the season to around 2 s <br>
in several mid-season races.<br>
Notably, his most consistent performances (~0.6 s std dev) occur at Monaco, Hungary, and Singapore.<br>
While these are technical circuits that traditionally reward precision, it's important to note<br>
that they are also typically processional races with fewer overtaking opportunities and more<br>
stable race conditions - factors that naturally contribute to lower lap time variability<br>
regardless of driver skill development.<br>
The higher variability seen in Japan (~3.4 s) and Brazil (~6.75 s) should be interpreted cautiously.<br>
With significant data gaps - particularly the rounds missing between Singapore (Round 15) and <br>
Brazil (Round 20), namely Russia, Mexico, and USA - we cannot definitively assess<br>
whether this represents a decline in consistency or is influenced by external factors such as<br>
incidents, weather conditions, or strategic decisions that fall outside our current analysis scope.<br>
Overall, this partial season snapshot suggests Russell showed encouraging signs of adaptation<br>
as a rookie, though a complete dataset would be needed to draw more definitive conclusions<br>
about his consistency trajectory.<br>
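One way to put a number on a "general trend" is a simple linear fit of standard deviation against round number. A minimal sketch, assuming made-up values (they are NOT Russell's figures) and a hypothetical helper `consistency_trend`; with only ~10 points and missing rounds, any slope from the real data should be read as descriptive only.

```python
import numpy as np
from scipy.stats import linregress

def consistency_trend(rounds, std_ms):
    """Fit std_ms = intercept + slope * round; return slope (ms per round) and its p-value."""
    fit = linregress(np.asarray(rounds, dtype=float), np.asarray(std_ms, dtype=float))
    return fit.slope, fit.pvalue

# Hypothetical illustration data only.
demo_rounds = [1, 2, 3, 4, 5, 6]
demo_std_ms = [6000, 5600, 4900, 4600, 3900, 3500]
slope, p = consistency_trend(demo_rounds, demo_std_ms)
print(f"slope = {slope:.1f} ms per round, p = {p:.4f}")
```

A negative slope means the standard deviation falls (consistency improves) as the season progresses; in the notebook the inputs would be `df_driver_season["gp_round"]` and `df_driver_season["laptime_std_ms"]`.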


repeating this for other drivers, <br>
2. 2018 - visualising the performance of two rookies driving with each other - stroll and sirotkin<br>
combine both lines on a single plot

In [None]:
driver_names = ["Lance Stroll", "Sergey Sirotkin"]
season = 2018

In [None]:
df_driver_season = df[
    (df["driver_name"].isin(driver_names)) & 
    (df["gp_year"] == season)
]

sort races by gp_round

In [None]:
df_driver_season = df_driver_season.sort_values(by="gp_round")

In [None]:
df_driver_season["gp_round_and_name"] = df_driver_season["gp_round"].astype(str) + ": " + df_driver_season["gp_name"]

plot

In [None]:
plt.figure(figsize=(12, 6))

In [None]:
rookies_2018_consistency = sns.lineplot(
    data=df_driver_season,
    x="gp_round_and_name",  
    y="laptime_std_ms",
    hue = "driver_name",
    palette = "Set2",
    marker="o",
    linewidth=2,
)

In [None]:
rookies_2018_consistency.set_title(f"2018 Rookies' Lap Time Consistency Comparison")
rookies_2018_consistency.set_xlabel("Grand Prix - Round and Name")
rookies_2018_consistency.set_ylabel("Lap Time Std Dev (ms)")
rookies_2018_consistency.tick_params(axis="x", rotation = 45)

In [None]:
plt.legend(title="Driver Name") # add legend
plt.grid(linewidth=0.25) # add gridlines
plt.tight_layout()
plt.savefig("plots/plots3/5-rookies-2018-consistency.png", bbox_inches = "tight") # save fig
plt.show()

3. 2015 - 2nd year for two veteran driver combo

In [None]:
driver_names = ["Valtteri Bottas", "Felipe Massa"]
season = 2015

In [None]:
df_driver_season = df[
    (df["driver_name"].isin(driver_names)) & 
    (df["gp_year"] == season)
]

sort races by gp_round

In [None]:
df_driver_season = df_driver_season.sort_values(by="gp_round")

In [None]:
df_driver_season["gp_round_and_name"] = df_driver_season["gp_round"].astype(str) + ": " + df_driver_season["gp_name"]

plot

In [None]:
plt.figure(figsize=(12, 6))

In [None]:
veterans_2015_consistency = sns.lineplot(
    data=df_driver_season,
    x="gp_round_and_name",  
    y="laptime_std_ms",
    hue = "driver_name",
    palette = "Set2",
    marker="o",
    linewidth=2,
)

In [None]:
veterans_2015_consistency.set_title(f"2015 Veterans' Lap Time Consistency Comparison")
veterans_2015_consistency.set_xlabel("Grand Prix - Round and Name")
veterans_2015_consistency.set_ylabel("Lap Time Std Dev (ms)")
veterans_2015_consistency.tick_params(axis="x", rotation = 45)

In [None]:
plt.legend(title="Driver Name") # add legend
plt.grid(linewidth=0.25) # add gridlines
plt.tight_layout()
plt.savefig("plots/plots3/6-veterans-2015-consistency.png", bbox_inches = "tight") # save fig
plt.show()

3. 2016 - 3rd and final year for two veteran driver combo

In [None]:
driver_names = ["Valtteri Bottas", "Felipe Massa"]
season = 2016

In [None]:
df_driver_season = df[
    (df["driver_name"].isin(driver_names)) & 
    (df["gp_year"] == season)
]

sort races by gp_round

In [None]:
df_driver_season = df_driver_season.sort_values(by="gp_round")

In [None]:
df_driver_season["gp_round_and_name"] = df_driver_season["gp_round"].astype(str) + ": " + df_driver_season["gp_name"]

plot

In [None]:
plt.figure(figsize=(12, 6))

In [None]:
veterans_2016_consistency = sns.lineplot(
    data=df_driver_season,
    x="gp_round_and_name",  
    y="laptime_std_ms",
    hue = "driver_name",
    palette = "Set2",
    marker="o",
    linewidth=2,
)

In [None]:
veterans_2016_consistency.set_title(f"2016 Veterans' Lap Time Consistency Comparison")
veterans_2016_consistency.set_xlabel("Grand Prix - Round and Name")
veterans_2016_consistency.set_ylabel("Lap Time Std Dev (ms)")
veterans_2016_consistency.tick_params(axis="x", rotation = 45)

In [None]:
plt.legend(title="Driver Name") # add legend
plt.grid(linewidth=0.25) # add gridlines
plt.tight_layout()
plt.savefig("plots/plots3/7-veterans-2016-consistency.png", bbox_inches = "tight") # save fig
plt.show()

4. Combined 2015 vs 2016 - Same veterans, same tracks, different years

In [None]:
driver_names = ["Valtteri Bottas", "Felipe Massa"]
seasons = [2015, 2016]

Define the common GP order

In [None]:
gp_order = ['Spanish', 'Monaco', 'Austrian', 'British', 'Hungarian', 
           'Belgian', 'Italian', 'Singapore', 'Japanese', 'Brazilian']

Filter data for both seasons

In [None]:
df_combined = df[
    (df["driver_name"].isin(driver_names)) & 
    (df["gp_year"].isin(seasons))
].copy()  # copy so the column assignments below do not trigger SettingWithCopyWarning

Extract GP type from gp_name (e.g., "Spanish Grand Prix" -> "Spanish")

In [None]:
df_combined['gp_type'] = df_combined['gp_name'].str.replace(' Grand Prix', '')

Filter only the common GPs and create ordering

In [None]:
df_combined = df_combined[df_combined['gp_type'].isin(gp_order)]

Create a categorical column for proper ordering

In [None]:
df_combined['gp_type'] = pd.Categorical(df_combined['gp_type'], categories=gp_order, ordered=True)

Sort by GP order

In [None]:
df_combined = df_combined.sort_values(by='gp_type')

Create separate columns for styling

In [None]:
df_combined['driver_year'] = df_combined['driver_name'] + ' (' + df_combined['gp_year'].astype(str) + ')'
df_combined['year_str'] = df_combined['gp_year'].astype(str)

Plot

In [None]:
plt.figure(figsize=(14, 7))

In [None]:
veterans_comparison = sns.lineplot(
    data=df_combined,
    x="gp_type",
    y="laptime_std_ms",
    hue="driver_name",  # Color by driver
    style="year_str",   # Line style by year
    palette="Set1",
    markers=True,
    linewidth=2.5,
    markersize=6,
    dashes={"2015": (2, 2), "2016": ""},  # Dotted for 2015, solid for 2016
    alpha=0.9  # We'll adjust this manually below
)

Manually adjust alpha for each line

In [None]:
for line in veterans_comparison.get_lines():
    # Check if it's a 2015 line (dotted lines will have dashes)
    if line.get_linestyle() == '--':
        line.set_alpha(0.6)  # Reduced alpha for 2015
    else:
        line.set_alpha(1.0)  # Strong alpha for 2016

In [None]:
veterans_comparison.set_title("Veterans' Consistency: 2015 vs 2016 Comparison\n(Bottas & Massa across same 10 circuits)")
veterans_comparison.set_xlabel("Grand Prix")
veterans_comparison.set_ylabel("Lap Time Std Dev (ms)")
veterans_comparison.tick_params(axis="x", rotation=45)

Customize legend

In [None]:
handles, labels = veterans_comparison.get_legend_handles_labels()
plt.legend(handles, labels, title="Driver Name & Year", bbox_to_anchor=(1.05, 1), loc='upper left')

In [None]:
plt.grid(linewidth=0.25, alpha=0.7)
plt.tight_layout()
plt.savefig("plots/plots3/8-veterans-2015-vs-2016-consistency.png", bbox_inches="tight")
plt.show()

5. 2019 - a rookie and a veteran - russell vs kubica

In [None]:
driver_names = ["George Russell", "Robert Kubica"]
season = 2019

In [None]:
df_driver_season = df[
    (df["driver_name"].isin(driver_names)) & 
    (df["gp_year"] == season)
]

sort races by gp_round

In [None]:
df_driver_season = df_driver_season.sort_values(by="gp_round")

In [None]:
df_driver_season["gp_round_and_name"] = df_driver_season["gp_round"].astype(str) + ": " + df_driver_season["gp_name"]

plot

In [None]:
plt.figure(figsize=(12, 6))

In [None]:
vs_2019_consistency = sns.lineplot(
    data=df_driver_season,
    x="gp_round_and_name",  
    y="laptime_std_ms",
    hue = "driver_name",
    palette = "Set2",
    marker="o",
    linewidth=2,
)

In [None]:
vs_2019_consistency.set_title(f"Russell vs Kubica - 2019 Lap Time Consistency Comparison")
vs_2019_consistency.set_xlabel("Grand Prix - Round and Name")
vs_2019_consistency.set_ylabel("Lap Time Std Dev (ms)")
vs_2019_consistency.tick_params(axis="x", rotation = 45)

In [None]:
plt.legend(title="Driver Name") # add legend
plt.grid(linewidth=0.25) # add gridlines
plt.tight_layout()
plt.savefig("plots/plots3/9-vs-2019-consistency.png", bbox_inches = "tight") # save fig
plt.show()

------------------------------------------------------------------------------------------
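The per-season line plots above repeat the same filter, sort, label, and plot steps. A sketch of how they could be folded into reusable helpers follows; `prepare_season_frame` and `plot_season_consistency` are hypothetical names, the column names are taken from the notebook, and the toy frame is made up purely to exercise the data-prep step.

```python
import pandas as pd

def prepare_season_frame(frame: pd.DataFrame, driver_names, season: int) -> pd.DataFrame:
    """Filter to the given drivers and season, sort by round, and add an x-axis label column."""
    mask = frame["driver_name"].isin(driver_names) & (frame["gp_year"] == season)
    out = frame.loc[mask].sort_values(by="gp_round").copy()
    out["gp_round_and_name"] = out["gp_round"].astype(str) + ": " + out["gp_name"]
    return out

def plot_season_consistency(frame, driver_names, season, title, out_path=None):
    """Draw one line per driver; save the figure only if out_path is given."""
    # Imported here so prepare_season_frame stays usable without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    data = prepare_season_frame(frame, driver_names, season)
    fig, ax = plt.subplots(figsize=(12, 6))
    sns.lineplot(data=data, x="gp_round_and_name", y="laptime_std_ms",
                 hue="driver_name", palette="Set2", marker="o", linewidth=2, ax=ax)
    ax.set_title(title)
    ax.set_xlabel("Grand Prix - Round and Name")
    ax.set_ylabel("Lap Time Std Dev (ms)")
    ax.tick_params(axis="x", rotation=45)
    ax.legend(title="Driver Name")
    ax.grid(linewidth=0.25)
    fig.tight_layout()
    if out_path is not None:
        fig.savefig(out_path, bbox_inches="tight")
    return ax

# Tiny made-up frame with the notebook's column names, to show the prep step.
toy = pd.DataFrame({
    "driver_name": ["Driver A", "Driver B", "Driver A", "Driver C"],
    "gp_year": [2018, 2018, 2018, 2017],
    "gp_round": [6, 5, 5, 5],
    "gp_name": ["Monaco Grand Prix", "Spanish Grand Prix",
                "Spanish Grand Prix", "Spanish Grand Prix"],
    "laptime_std_ms": [900.0, 4100.0, 3800.0, 5000.0],
})
prepared = prepare_season_frame(toy, ["Driver A", "Driver B"], 2018)
print(prepared[["gp_round_and_name", "driver_name"]])
```

With the real data, a call such as `plot_season_consistency(df, ["George Russell", "Robert Kubica"], 2019, "Russell vs Kubica - 2019 Lap Time Consistency Comparison")` would stand in for one of the repeated cell groups.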

6. Driver performance stripplots

In [None]:
plt.figure(figsize=(12,6))

In [None]:
consistency_stripplot = sns.stripplot(
    data=df, 
    x='rookie_or_experienced', 
    y='laptime_std_ms', 
    hue='driver_name', 
    jitter=True, 
    dodge=True,
    palette='tab10'
)

In [None]:
consistency_stripplot.set_title("Lap Time Consistency by Driver and Experience Level")
consistency_stripplot.set_xlabel("Experience Level")
consistency_stripplot.set_ylabel("Lap Time Std Dev (ms)")

In [None]:
plt.legend(title='Driver', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.savefig("plots/plots3/10-variance-stripplots.png")
plt.show()


<br>
The large gap between Kubica and the other experienced drivers serves to identify the time period<br>
of his return, and indicates different consistency patterns compared to the Massa/Bottas era. <br>
Another notable observation: as we limited our circuit selection, we have a limited number of entries for Kubica, Sirotkin, and Russell. <br>
Drivers like Stroll, Massa, and Bottas drove for more than one season, hence the larger number of data entries. <br>
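The uneven number of entries per driver can be tabulated directly. A minimal sketch, assuming a made-up toy frame with the notebook's column names; `per_driver_summary` is a hypothetical helper.

```python
import pandas as pd

def per_driver_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Rows per driver plus the median of laptime_std_ms, sorted by row count."""
    return (
        frame.groupby("driver_name")["laptime_std_ms"]
        .agg(n="count", median_ms="median")
        .sort_values("n", ascending=False)
    )

# Made-up values; the notebook would pass df itself.
toy = pd.DataFrame({
    "driver_name": ["Driver A"] * 3 + ["Driver B"],
    "laptime_std_ms": [3000.0, 4000.0, 5000.0, 9000.0],
})
summary = per_driver_summary(toy)
print(summary)
```

Reading `n` alongside the stripplot makes it clearer which drivers' point clouds rest on only a handful of races.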


7. race-by-race boxplot of std dev -> gp_name as an x-axis<br>
let's understand: do certain races create more lap time variability?

In [None]:
plt.figure(figsize=(14,6))

In [None]:
gp_boxplot = sns.boxplot(data=df, x='gp_name', y='laptime_std_ms')

In [None]:
gp_boxplot.set_title("Lap Time Std Dev by Race (All Drivers)")
gp_boxplot.set_ylabel("Std Dev (ms)")
gp_boxplot.set_xlabel("Grand Prix")

In [None]:
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

8. violin plot with distribution detail

In [None]:
plt.figure(figsize=(9, 6))

In [None]:
violin = sns.violinplot(
    data=df, 
    x='rookie_or_experienced', 
    y='laptime_std_ms', 
    hue = 'rookie_or_experienced',
    palette='Set3', 
    inner='quartile'
)

In [None]:
violin.set_title("Lap Time Consistency Distribution by Experience Level")
violin.set_xlabel("Experience Level")
violin.set_ylabel("Lap Time Std Dev (ms)")

In [None]:
plt.tight_layout()
plt.savefig("plots/plots3/11-experience-violinplot.png")
plt.show()

9. pairplot - to explore correlations

In [None]:
sns.pairplot(
    data = df, 
    hue='rookie_or_experienced', 
    vars=['laptime_std_ms', 'gp_round', 'gp_year']
)  
plt.savefig("plots/plots3/12-pairplot.png")
plt.show()
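
The pairplot can be complemented by a numeric correlation matrix. A minimal sketch, assuming Spearman rank correlation (a choice made here because the distributions are skewed, not something the original analysis specifies); `rank_correlations` is a hypothetical helper and the toy values are made up.

```python
import pandas as pd

def rank_correlations(frame: pd.DataFrame, columns) -> pd.DataFrame:
    """Spearman rank correlation matrix for the given numeric columns."""
    return frame[list(columns)].corr(method="spearman")

# Made-up values with the notebook's column names.
toy = pd.DataFrame({
    "laptime_std_ms": [5000.0, 4200.0, 3900.0, 3100.0, 2500.0],
    "gp_round": [1, 2, 3, 4, 5],
    "gp_year": [2018, 2018, 2019, 2019, 2019],
})
corr = rank_correlations(toy, ["laptime_std_ms", "gp_round", "gp_year"])
print(corr)
```

In the notebook this would be called as `rank_correlations(df, ["laptime_std_ms", "gp_round", "gp_year"])`, optionally once per experience group.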