# **Racing Data Analysis Project**

Our primary goal is to determine which cars dominate on specific tracks and to identify the most versatile vehicles.

For a 50-lap race, we use the following points system:

1. 1st place: 10 points + 3 bonus points = 13 points
2. 2nd place: 9 points + 2 bonus points = 11 points
3. 3rd place: 8 points + 1 bonus point = 9 points
4. 4th place: 7 points
5. 5th place: 6 points
6. 6th place: 5 points
7. 7th place: 4 points
8. 8th place: 3 points
9. 9th place: 2 points
10. 10th place: 1 point

**Race Length Accounting**

Determine the base length of the race (e.g. 50 laps).<br>
If the race is longer, for example 64 laps, the points are multiplied by a factor (64/50).<br>
If the race is shorter, for example 42 laps, the points are multiplied by a factor (42/50).<br>
So, if a win is worth 13 points in a 50-lap race, in a 64-lap race it will be worth 13 * (64/50) = 16.64 points.
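
As a quick check of the scaling rule, the arithmetic above can be sketched in plain Python. The names `BASE_POINTS`, `BASE_LAPS`, and `scaled_points` are illustrative and not part of the notebook. The table holds the 50-lap values from the list above, and positions outside the top ten are assumed to score zero.

```python
# Points for a base-length (50-lap) race, bonus points already included.
BASE_POINTS = {1: 13, 2: 11, 3: 9, 4: 7, 5: 6, 6: 5, 7: 4, 8: 3, 9: 2, 10: 1}
BASE_LAPS = 50


def scaled_points(pos, laps, base_laps=BASE_LAPS):
    """Return the points for finishing position `pos`, scaled by race length."""
    # Assumption: positions outside the top ten score nothing.
    return BASE_POINTS.get(pos, 0) * laps / base_laps


# Winner's points for a shorter, a base-length, and a longer race.
for laps in (42, 50, 64):
    print(laps, round(scaled_points(1, laps), 2))
```

For a 64-lap race this reproduces the 16.64 points worked out above.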

## **1. Load Libraries**

In [44]:
import pandas as pd

## **2. Load Data**

In [45]:
df = pd.read_parquet(".\\cleaned_data\\race_data.parquet")
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7319 entries, 0 to 7318
Data columns (total 17 columns):
 #   Column                Non-Null Count  Dtype          
---  ------                --------------  -----          
 0   Season                7319 non-null   int64          
 1   Meeting               7319 non-null   object         
 2   Event name            7319 non-null   object         
 3   Pos                   7319 non-null   int64          
 4   Car #                 7319 non-null   int64          
 5   Class                 7319 non-null   category       
 6   Special Class         7319 non-null   bool           
 7   Drivers               7319 non-null   object         
 8   Team                  7319 non-null   object         
 9   Car                   7319 non-null   object         
 10  Best lap set          7319 non-null   bool           
 11  Time                  7319 non-null   object         
 12  Time timedelta        7319 non-null   timedelta64[ns]
 13  Lap

In [46]:
categories_order = df['Class'].cat.categories

print(categories_order)

Index(['Invitational', 'Am Cup', 'Bronze Cup', 'Pro-Am Cup', 'Silver Cup',
       'Gold Cup', 'Pro Cup'],
      dtype='object')


In [47]:
is_ordered = df['Class'].cat.ordered

print("Is the 'Class' column ordered?" , is_ordered)

Is the 'Class' column ordered? True


In [48]:
df[(df['Season'] == 2021) & (df['Meeting'] == 'Monza') & (df['Pos'] == 1)]['Event name'].value_counts()

Event name
Main Race                     1
Main Race after 1.30 hour     1
Main Race after 2.30 hours    1
Name: count, dtype: int64

In [49]:
df[(df['Season'] == 2021) & (df['Meeting'] == 'Monza') & (df['Event name'] == 'Main Race') & (df['Pos'] == 1)]['Laps'].value_counts()

Laps
79    1
Name: count, dtype: int64

## **3. Add Points Column**

In [50]:
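# Points per finishing position for a base-length (50-lap) race.
# Note: the top three values (15, 12, 10) differ from the 13/11/9 listed in the
# introduction; the Points values shown in the outputs were computed with the values defined here.
points_system = {1: 15, 2: 12, 3: 10, 4: 7, 5: 6, 6: 5, 7: 4, 8: 3, 9: 2, 10: 1}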

# Function to assign points based on position
def assign_points(pos, laps, base_laps=50):
    base_points = points_system.get(pos, 0)
    lap_factor = laps / base_laps
    return base_points * lap_factor

# Adding a 'Points' column to the DataFrame
df['Points'] = df.apply(lambda row: assign_points(row['Pos'], row['Laps']), axis=1)
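
The row-wise `apply` above can likely be replaced by column operations, which tend to be faster on larger frames. The following is a minimal sketch under that assumption, run on a small hypothetical DataFrame (`toy`) rather than the real data. `BASE_LAPS` is an illustrative name, and the points table is copied from the cell above.

```python
import pandas as pd

points_system = {1: 15, 2: 12, 3: 10, 4: 7, 5: 6, 6: 5, 7: 4, 8: 3, 9: 2, 10: 1}
BASE_LAPS = 50

# Toy rows standing in for the real data (hypothetical values).
toy = pd.DataFrame({"Pos": [1, 2, 11], "Laps": [95, 95, 94]})

# Map each position to its base points; positions missing from the
# dictionary become NaN and are replaced with 0.
base = toy["Pos"].map(points_system).fillna(0)

# Scale by race length relative to the 50-lap base.
toy["Points"] = base * (toy["Laps"] / BASE_LAPS)
print(toy)
```

The result should match what `assign_points` returns row by row, up to floating-point rounding.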

In [51]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7319 entries, 0 to 7318
Data columns (total 18 columns):
 #   Column                Non-Null Count  Dtype          
---  ------                --------------  -----          
 0   Season                7319 non-null   int64          
 1   Meeting               7319 non-null   object         
 2   Event name            7319 non-null   object         
 3   Pos                   7319 non-null   int64          
 4   Car #                 7319 non-null   int64          
 5   Class                 7319 non-null   category       
 6   Special Class         7319 non-null   bool           
 7   Drivers               7319 non-null   object         
 8   Team                  7319 non-null   object         
 9   Car                   7319 non-null   object         
 10  Best lap set          7319 non-null   bool           
 11  Time                  7319 non-null   object         
 12  Time timedelta        7319 non-null   timedelta64[ns]
 13  Lap

In [61]:
filtered_data = df[(df['Meeting'] == 'Circuit Paul Ricard 1000Km') & (df['Season'] == 2023)]
total_points_by_car = filtered_data.groupby('Car')['Points'].sum().sort_values(ascending=False).reset_index()
total_points_by_car

| | Car | Points |
|---|---|---|
| 0 | Mercedes-AMG GT3 | 339.86 |
| 1 | BMW M4 GT3 | 250.98 |
| 2 | Audi R8 LMS evo II GT3 | 184.36 |
| 3 | Ferrari 296 GT3 | 129.62 |
| 4 | Lamborghini Huracan GT3 EVO 2 | 24.6 |
| 5 | Aston Martin Vantage GT3 | 0.0 |
| 6 | Ferrari 488 GT3 | 0.0 |
| 7 | McLaren 720S GT3 EVO | 0.0 |
| 8 | Porsche 911 GT3 R (992) | 0.0 |
| 9 | Porsche 911 GT3-R (991.II) | 0.0 |
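
The single-meeting aggregation above could be extended to every track at once. The sketch below uses a small hypothetical DataFrame (`toy`) with placeholder car names. Counting the meetings at which a car scored is only one possible, assumed way to measure versatility, not a definition taken from this project.

```python
import pandas as pd

# Toy data with hypothetical values; the real df has the same three columns.
toy = pd.DataFrame({
    "Meeting": ["Monza", "Monza", "Monza", "Misano", "Misano"],
    "Car": ["Car A", "Car B", "Car A", "Car B", "Car A"],
    "Points": [15.0, 12.0, 10.0, 15.0, 12.0],
})

# Total points per car at each meeting (meetings as rows, cars as columns).
by_track = toy.pivot_table(index="Meeting", columns="Car", values="Points",
                           aggfunc="sum", fill_value=0)

# Highest-scoring car at each meeting.
top_car = by_track.idxmax(axis=1)

# Assumed versatility measure: number of meetings at which a car scored.
meetings_scored = (by_track > 0).sum(axis=0)

print(by_track)
print(top_car)
print(meetings_scored)
```

On the real data, filtering by `Season` first (as in the cell above) would keep results from different years separate.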


In [57]:
df[df['Season'] == 2023]['Meeting'].value_counts()

Meeting
CrowdStrike 24 Hours of Spa    1659
Circuit Paul Ricard 1000Km      323
Monza                           201
Nürburgring                     158
Barcelona                       103
Misano                           74
Valencia                         69
Hockenheim                       68
Brands Hatch                     56
Zandvoort                        54
Name: count, dtype: int64

In [59]:
df.head()

| | Season | Meeting | Event name | Pos | Car # | Class | Special Class | Drivers | Team | Car | Best lap set | Time | Time timedelta | Laps | Gap | Gap Timedelta | Dropped off the Race | Points |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021 | Barcelona | Main Race | 1 | 88 | Pro Cup | False | Raffaele Marciello, Felipe Fraga, Jules Gounon | AKKA ASP | Mercedes-AMG GT3 | True | 1:47.211 | 0 days 00:01:47.211000 | 95 | 0.0 | 0 days 00:00:00 | False | 28.5 |
| 1 | 2021 | Barcelona | Main Race | 2 | 54 | Pro Cup | False | Klaus Bachler, Christian Engelhart, Matteo Cai... | Dinamic Motorsport | Porsche 911 GT3-R (991.II) | True | 1:47.148 | 0 days 00:01:47.148000 | 95 | 2.174 | 0 days 00:00:02.174000 | False | 22.8 |
| 2 | 2021 | Barcelona | Main Race | 3 | 32 | Pro Cup | False | Dries Vanthoor, Robin Frijns, Charles Weerts | Team WRT | Audi R8 LMS GT3 | True | 1:47.612 | 0 days 00:01:47.612000 | 95 | 4.036 | 0 days 00:00:04.036000 | False | 19.0 |
| 3 | 2021 | Barcelona | Main Race | 4 | 63 | Pro Cup | False | Mirko Bortolotti, Marco Mapelli, Andrea Caldar... | Orange 1 FFF Racing Team | Lamborghini Huracan GT3 Evo | True | 1:47.027 | 0 days 00:01:47.027000 | 95 | 9.511 | 0 days 00:00:09.511000 | False | 13.3 |
| 4 | 2021 | Barcelona | Main Race | 5 | 4 | Pro Cup | False | Maro Engel, Luca Stolz, Nico Bastian | HRT | Mercedes-AMG GT3 | True | 1:47.588 | 0 days 00:01:47.588000 | 95 | 9.984 | 0 days 00:00:09.984000 | False | 11.4 |
