# F1 data analysis using pandas

## Prologue
In this notebook, F1 data is analyzed using the pandas library. The data is loaded from [f1-telem-web-app](https://github.com/AumPauskar/f1-telem-webapp). A Docker image may be downloaded from [here](https://hub.docker.com/repository/docker/aumpauskar/f1-flaskapp/general), provided you are running on a Linux (Debian-based) machine.

## About the data

This data was collected from the 2024 Monaco Grand Prix, a historic race that has been run since 1929. In Monaco/Monte Carlo, the race result has historically been decided largely by **qualifying results** rather than by the race itself.

**What is qualifying?**\
Qualifying is a session that determines the starting order of cars in the race. The driver who sets the fastest lap time starts at the front of the grid, and the driver who sets the slowest lap time starts at the back. Since Monaco is a street circuit, overtaking is very difficult, so the starting position is very important.

**Format of qualifying**\
The qualifying session is divided into three parts: Q1, Q2, and Q3. The slowest 5 cars are eliminated in Q1, and the slowest 5 of the remaining cars are eliminated in Q2. The remaining 10 cars compete in Q3 to determine the starting order of the top 10. Lap times may not be comparable between Q1, Q2, and Q3 because the track gets faster as rubber is laid down, and because of slight changes in weather, track temperature, wind speed, etc. These factors may not be huge, but they can be enough to make the difference between "pole position" and elimination in Q1.
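
The elimination format can be sketched in pandas as follows. This is only an illustration: the driver labels `D01` to `D20`, the lap times, and the helper `split_session` are hypothetical and not taken from the dataset, and the same times are reused for Q2 purely to keep the example short.

```python
import pandas as pd


def split_session(lap_times: pd.Series, n_eliminated: int = 5):
    """Return (advancing, eliminated) drivers, ranked by best lap time."""
    ranked = lap_times.sort_values()
    advancing = ranked.iloc[:-n_eliminated]
    eliminated = ranked.iloc[-n_eliminated:]
    return advancing, eliminated


# Hypothetical best lap times in seconds for 20 placeholder drivers (not real data).
q1_times = pd.Series(
    [71.0 + 0.1 * i for i in range(20)],
    index=[f"D{i + 1:02d}" for i in range(20)],
)

# Q1: the slowest 5 of 20 are eliminated, 15 advance to Q2.
q2_field, out_in_q1 = split_session(q1_times)

# Q2: the slowest 5 of the remaining 15 are eliminated, 10 advance to Q3.
# In a real session drivers set new laps in Q2; the Q1 times are reused here for brevity.
q3_field, out_in_q2 = split_session(q1_times[q2_field.index])

print(len(q2_field), len(q3_field))
```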

## Cleaning the data

In qualifying, drivers do "fast laps" and "slow laps". Fast laps are the ones we are actually interested in; slow laps are the ones where the driver is warming up the tires, cooling down the engine, charging the battery, etc. We need to filter out the slow laps. Using the **MultiViewer for F1** app, we can check which telemetry data belongs to a fast lap and which to a slow lap.

In Q1 the fastest driver was **George Russell** and the slowest driver was **Zhou Guanyu**. 

**George Russell** started his fastest Q1 lap at the **00:18:30** time marker (to 00:19:42) and set a time of **1:11.492**. **Zhou Guanyu** started his at the **00:20:00** time marker (to 00:21:13) and set a time of **1:13.247**. George Russell finished P1 and Zhou Guanyu P20 in the first part of qualifying.
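
A minimal sketch of how a fast lap could be cut out of the telemetry by time window is shown below. It assumes that the time markers are on the same clock as the `SessionTime` column, which this notebook does not confirm; the sample rows and `Speed` values are hypothetical, and only the two markers come from the text above.

```python
import pandas as pd

# Hypothetical telemetry samples; SessionTime uses the same string format as the CSV files.
telemetry = pd.DataFrame(
    {
        "SessionTime": [
            "0 days 00:18:10.000000",
            "0 days 00:18:30.000000",
            "0 days 00:19:00.000000",
            "0 days 00:19:42.000000",
            "0 days 00:20:05.000000",
        ],
        "Speed": [95, 160, 240, 236, 90],
    }
)

# Parse the strings so they can be compared as durations.
telemetry["SessionTime"] = pd.to_timedelta(telemetry["SessionTime"])

# Time markers quoted for George Russell's fast lap.
lap_start = pd.Timedelta("00:18:30")
lap_end = pd.Timedelta("00:19:42")

# Keep only the samples inside the fast-lap window (both ends inclusive).
fast_lap = telemetry[telemetry["SessionTime"].between(lap_start, lap_end)]
print(fast_lap)
```

The same filter, with different markers, would apply to any other driver's lap.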

Here is an example of **George Russell's** cleaned data:

In [7]:
import pandas as pd

# Read the CSV data
rus_q1_data = pd.read_csv("data/cleaned/rus_q1.csv")

# Display the data
rus_q1_data.head()

Unnamed: 0,Date,SessionTime,DriverAhead,DistanceToDriverAhead,Time,RPM,Speed,nGear,Throttle,Brake,DRS,Source,Distance,RelativeDistance,Status,X,Y,Z
0,06:45.3,0 days 00:19:40.650000,,1.4683,0 days 00:06:35.883000,11291,236,6,100,False,8,car,15592.72389,0.170809,OnTrack,-7023.965766,-6509.160716,479.901221
1,06:45.6,0 days 00:19:40.905000,,1.467946,0 days 00:06:36.138000,11424,238,6,100,False,8,pos,15609.7176,0.170996,OnTrack,-6940.0,-6696.0,480.0
2,06:45.7,0 days 00:19:41.010000,,1.467593,0 days 00:06:36.243000,11557,241,6,100,False,8,car,15616.82389,0.171073,OnTrack,-6917.380063,-6761.528161,479.872658
3,06:45.9,0 days 00:19:41.210000,,1.466886,0 days 00:06:36.443000,11656,246,6,100,False,8,car,15630.49056,0.171223,OnTrack,-6887.102054,-6865.841812,479.211293
4,06:46.0,0 days 00:19:41.325000,,1.466533,0 days 00:06:36.558000,11671,245,6,100,False,8,pos,15638.31305,0.171309,OnTrack,-6869.0,-6933.0,479.0
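
The `Unnamed: 0` column in this output is what pandas shows when the first header field of a CSV file is empty, which usually means a saved index. As a sketch, assuming that is the case for these files, the column can be read back as the index with `index_col=0`, and the `SessionTime` and `Time` strings can be parsed with `pd.to_timedelta`. The block below uses the first two rows of the output above, inlined through `io.StringIO` so that it runs without the data files.

```python
import io

import pandas as pd

# First two rows of the output above, written as they would appear in the CSV file
# (assumption: the file's first header field is empty, which pandas labels "Unnamed: 0").
csv_text = """,Date,SessionTime,DriverAhead,DistanceToDriverAhead,Time,RPM,Speed,nGear,Throttle,Brake,DRS,Source,Distance,RelativeDistance,Status,X,Y,Z
0,06:45.3,0 days 00:19:40.650000,,1.4683,0 days 00:06:35.883000,11291,236,6,100,False,8,car,15592.72389,0.170809,OnTrack,-7023.965766,-6509.160716,479.901221
1,06:45.6,0 days 00:19:40.905000,,1.467946,0 days 00:06:36.138000,11424,238,6,100,False,8,pos,15609.7176,0.170996,OnTrack,-6940.0,-6696.0,480.0
"""

# Use the first column as the index instead of keeping it as "Unnamed: 0".
telemetry = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Convert the duration strings into timedelta values so they can be compared and subtracted.
for column in ["SessionTime", "Time"]:
    telemetry[column] = pd.to_timedelta(telemetry[column])

print(telemetry.dtypes)
```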


Here is an example of **Zhou Guanyu's** cleaned data:

In [6]:
# Read the CSV data
zho_q1_data = pd.read_csv("data/cleaned/zho_q1.csv")

# Display the data
zho_q1_data.head()

Unnamed: 0,Date,SessionTime,DriverAhead,DistanceToDriverAhead,Time,RPM,Speed,nGear,Throttle,Brake,DRS,Source,Distance,RelativeDistance,Status,X,Y,Z
0,06:46.7,0 days 00:19:42.050000,3,426.581389,0 days 00:04:52.829000,11105,241,6,100,False,8,car,10270.33167,0.287439,OnTrack,-5120,-4221,717
1,06:46.8,0 days 00:19:42.085000,3,423.539074,0 days 00:04:52.864000,11185,242,6,100,False,8,pos,10272.70868,0.287505,OnTrack,-5092,-4218,720
2,06:47.0,0 days 00:19:42.305000,3,420.496759,0 days 00:04:53.084000,11266,244,6,100,False,8,pos,10287.72972,0.287926,OnTrack,-4873,-4198,740
3,06:47.1,0 days 00:19:42.369000,3,417.454444,0 days 00:04:53.148000,11347,246,6,100,False,8,car,10292.13,0.288049,OnTrack,-4812,-4190,745
4,06:47.3,0 days 00:19:42.609000,3,410.254444,0 days 00:04:53.388000,11473,250,6,100,False,8,car,10308.79667,0.288515,OnTrack,-4625,-4158,764


From this we can analyze the mean, median, mode, and standard deviation of the lap times of the drivers in Q1.
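
As a minimal sketch of how these statistics can be computed with pandas, the block below applies them to the five `Speed` samples shown in George Russell's output above. Using `Speed` rather than lap times is an assumption made for illustration, since only the telemetry rows are available here.

```python
import pandas as pd

# The five Speed values from the head() output of George Russell's cleaned data.
speed = pd.Series([236, 238, 241, 246, 245], name="Speed")

summary = {
    "mean": speed.mean(),
    "median": speed.median(),
    # mode() returns every most-frequent value; here all values are distinct.
    "mode": speed.mode().tolist(),
    # std() is the sample standard deviation (ddof=1) by default.
    "std": speed.std(),
}
print(summary)
```

On a full lap, the same calls could be made on a column of the loaded DataFrame, for example `rus_q1_data["Speed"]`.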