<a href="https://colab.research.google.com/github/ZandomeneghiChiara/F1_Project/blob/main/F1_Project_Prog.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Formula 1, also known as F1 or Formula One, represents the pinnacle of single-seater auto racing, governed by the Fédération Internationale de l'Automobile (FIA) and owned by the Formula One Group. Since its inaugural season in 1950, the FIA Formula One World Championship has been a premier global motorsport competition. The term "formula" refers to the set of stringent rules and regulations to which all participants' cars must adhere. Each Formula One season comprises a series of races called Grands Prix, held on purpose-built circuits and public road courses around the world.

1. Identification Information
- resultId: Unique identifier for the race result entry.
- raceId: Unique identifier for the race.
- driverId: Unique identifier for the driver.
- constructorId: Unique identifier for the constructor/team.
2. Driver and Constructor Information
- number: The race number of the driver for that event.
3. Starting and Finishing Positions
- grid: Starting position of the driver on the grid.
- position: Finishing position of the driver in the race.
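- positionText: Textual representation of the finishing position: a number for classified finishers, or a letter code such as "R" (retired) otherwise.
- positionOrder: Final rank used to order the results; unlike position, it is defined for every entry, including drivers who were not classified.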
4. Performance Metrics
- points: The points awarded to the driver for this race.
- laps: Number of laps completed by the driver.
- time: Total race time for the winner (e.g., "1:34:50.616"); for other classified drivers, the gap to the winner (e.g., "+5.478").
- milliseconds: Total race time in milliseconds.
- fastestLap: Lap number on which the driver set their fastest lap.
- rank: Rank of the fastest lap within the race.
- fastestLapTime: Time of the driver’s fastest lap.
- fastestLapSpeed: Average speed during the driver’s fastest lap.
5. Race Status
- statusId: Unique identifier indicating the race status (e.g., finished, retired, disqualified).

Each of these categories gives a different perspective on the race results:
- *Identification Information*: Helps in uniquely identifying and linking specific results to races, drivers, and constructors.
- *Driver and Constructor Information*: Gives the car number the driver used at the event; the driver and team themselves are identified through driverId and constructorId.
- *Starting and Finishing Positions*: Offers insight into the driver's starting position and their performance in terms of final position.
- *Performance Metrics*: Detailed performance data, including points scored, lap times, and speeds, which are crucial for performance analysis.
- *Race Status*: Indicates the outcome or status of the driver in the race, whether they finished, retired, or were disqualified.
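
As a minimal sketch of how these categories could be used in code, the mapping below groups the column names by category so that one group can be selected at a time. The names `COLUMN_GROUPS` and `select_group` are hypothetical helpers, not part of the dataset or of pandas; the single row is copied from the first record shown later in the notebook.

```python
import pandas as pd

# Hypothetical grouping that mirrors the five categories described above.
COLUMN_GROUPS = {
    "identification": ["resultId", "raceId", "driverId", "constructorId"],
    "driver_constructor": ["number"],
    "positions": ["grid", "position", "positionText", "positionOrder"],
    "performance": ["points", "laps", "time", "milliseconds", "fastestLap",
                    "rank", "fastestLapTime", "fastestLapSpeed"],
    "status": ["statusId"],
}


def select_group(frame, group):
    """Return only the columns that belong to the named category."""
    return frame[COLUMN_GROUPS[group]]


# First record of results.csv, as displayed by df.head().
first_row = pd.DataFrame([{
    "resultId": 1, "raceId": 18, "driverId": 1, "constructorId": 1,
    "number": "22", "grid": 1, "position": "1", "positionText": "1",
    "positionOrder": 1, "points": 10.0, "laps": 58, "time": "1:34:50.616",
    "milliseconds": "5690616", "fastestLap": "39", "rank": "2",
    "fastestLapTime": "1:27.452", "fastestLapSpeed": "218.300", "statusId": 1,
}])

print(select_group(first_row, "positions"))
```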

In [91]:
# Import libraries
import pandas as pd
import numpy as np

# Silence pandas' SettingWithCopyWarning for chained assignments
pd.options.mode.chained_assignment = None

# 1 - Explore the dataset

In [92]:
df = pd.read_csv('results.csv')
df

,resultId,raceId,driverId,constructorId,number,grid,position,positionText,positionOrder,points,laps,time,milliseconds,fastestLap,rank,fastestLapTime,fastestLapSpeed,statusId
0,1,18,1,1,22,1,1,1,1,10.0,58,1:34:50.616,5690616,39,2,1:27.452,218.300,1
1,2,18,2,2,3,5,2,2,2,8.0,58,+5.478,5696094,41,3,1:27.739,217.586,1
2,3,18,3,3,7,7,3,3,3,6.0,58,+8.163,5698779,41,5,1:28.090,216.719,1
3,4,18,4,4,5,11,4,4,4,5.0,58,+17.181,5707797,58,7,1:28.603,215.464,1
4,5,18,5,1,23,3,5,5,5,4.0,58,+18.014,5708630,43,1,1:27.418,218.385,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
26075,26081,1110,817,213,3,19,16,16,16,0.0,44,+1:43.071,5053521,25,15,1:50.994,227.169,1
26076,26082,1110,858,3,2,18,17,17,17,0.0,44,+1:44.476,5054926,37,9,1:50.486,228.213,1
26077,26083,1110,807,210,27,0,18,18,18,0.0,44,+1:50.450,5060900,26,4,1:49.907,229.415,1
26078,26084,1110,832,6,55,4,\N,R,19,0.0,23,\N,\N,9,19,1:53.138,222.864,130


In [93]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26080 entries, 0 to 26079
Data columns (total 18 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   resultId         26080 non-null  int64  
 1   raceId           26080 non-null  int64  
 2   driverId         26080 non-null  int64  
 3   constructorId    26080 non-null  int64  
 4   number           26080 non-null  object 
 5   grid             26080 non-null  int64  
 6   position         26080 non-null  object 
 7   positionText     26080 non-null  object 
 8   positionOrder    26080 non-null  int64  
 9   points           26080 non-null  float64
 10  laps             26080 non-null  int64  
 11  time             26080 non-null  object 
 12  milliseconds     26080 non-null  object 
 13  fastestLap       26080 non-null  object 
 14  rank             26080 non-null  object 
 15  fastestLapTime   26080 non-null  object 
 16  fastestLapSpeed  26080 non-null  object 
 17  statusId    

In [94]:
df.head()

,resultId,raceId,driverId,constructorId,number,grid,position,positionText,positionOrder,points,laps,time,milliseconds,fastestLap,rank,fastestLapTime,fastestLapSpeed,statusId
0,1,18,1,1,22,1,1,1,1,10.0,58,1:34:50.616,5690616,39,2,1:27.452,218.3,1
1,2,18,2,2,3,5,2,2,2,8.0,58,+5.478,5696094,41,3,1:27.739,217.586,1
2,3,18,3,3,7,7,3,3,3,6.0,58,+8.163,5698779,41,5,1:28.090,216.719,1
3,4,18,4,4,5,11,4,4,4,5.0,58,+17.181,5707797,58,7,1:28.603,215.464,1
4,5,18,5,1,23,3,5,5,5,4.0,58,+18.014,5708630,43,1,1:27.418,218.385,1


In [95]:
df.tail()

,resultId,raceId,driverId,constructorId,number,grid,position,positionText,positionOrder,points,laps,time,milliseconds,fastestLap,rank,fastestLapTime,fastestLapSpeed,statusId
26075,26081,1110,817,213,3,19,16,16,16,0.0,44,+1:43.071,5053521,25,15,1:50.994,227.169,1
26076,26082,1110,858,3,2,18,17,17,17,0.0,44,+1:44.476,5054926,37,9,1:50.486,228.213,1
26077,26083,1110,807,210,27,0,18,18,18,0.0,44,+1:50.450,5060900,26,4,1:49.907,229.415,1
26078,26084,1110,832,6,55,4,\N,R,19,0.0,23,\N,\N,9,19,1:53.138,222.864,130
26079,26085,1110,857,1,81,5,\N,R,20,0.0,0,\N,\N,\N,0,\N,\N,130


In [96]:
df.describe()

,resultId,raceId,driverId,constructorId,grid,positionOrder,points,laps,statusId
count,26080.0,26080.0,26080.0,26080.0,26080.0,26080.0,26080.0,26080.0,26080.0
mean,13041.372661,536.695667,266.277569,49.059663,11.167561,12.854141,1.906635,46.076687,17.476074
std,7530.008377,303.034639,272.581622,60.221056,7.232797,7.700068,4.219715,29.726058,26.129965
min,1.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,1.0
25%,6520.75,294.75,57.0,6.0,5.0,6.0,0.0,22.0,1.0
50%,13040.5,519.0,163.0,25.0,11.0,12.0,0.0,53.0,10.0
75%,19560.25,791.0,364.0,58.25,17.0,18.0,2.0,66.0,14.0
max,26085.0,1110.0,858.0,214.0,34.0,39.0,50.0,200.0,141.0


In [97]:
df.shape

(26080, 18)
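
The `df.info()` output reports every column as fully non-null, yet the rows printed by `df.tail()` contain the literal string `\N`. That placeholder is why columns such as `milliseconds` and `fastestLapSpeed` are read as `object` and are left out of `df.describe()`. A minimal sketch of one way to handle this, assuming `\N` always marks a missing value, is to pass `na_values` to `pd.read_csv`. The example uses three rows copied from the outputs above (with the columns trimmed) rather than the full file:

```python
import io

import pandas as pd

# Three rows copied from the outputs above; only a few columns are kept.
sample_csv = io.StringIO(
    "resultId,position,positionText,milliseconds,fastestLapSpeed\n"
    "1,1,1,5690616,218.300\n"
    "26084,\\N,R,\\N,222.864\n"
    "26085,\\N,R,\\N,\\N\n"
)

# na_values tells pandas to treat the literal string \N as missing.
sample = pd.read_csv(sample_csv, na_values=["\\N"])

print(sample.dtypes)        # numeric columns are no longer object
print(sample.isna().sum())  # missing values are now visible
```

Applied to the real file, the same idea would read `pd.read_csv('results.csv', na_values=['\\N'])`; the outputs shown in this notebook were produced without that option.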

# 2 - Cleaning up the dataset

In [98]:
# List of features to analyze
features = ['resultId', 'raceId', 'driverId', 'constructorId', 'number', 'grid',
    'position', 'positionText', 'positionOrder', 'points', 'laps', 'time',
    'milliseconds', 'fastestLap', 'rank', 'fastestLapTime', 'fastestLapSpeed',
    'statusId' ]
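
The cells that follow run `value_counts()` on one column at a time. As a sketch of an equivalent, more compact approach, a list like `features` can drive the same inspection in a loop. The helper name `value_count_report` and the tiny sample frame are illustrative assumptions, not part of the original notebook:

```python
import pandas as pd


def value_count_report(frame, columns, top=5):
    """Map each column name to its `top` most frequent values."""
    return {
        col: frame[col].value_counts(dropna=False).head(top)
        for col in columns
    }


# Small stand-in for df, using grid and statusId values from df.head().
sample = pd.DataFrame({
    "grid": [1, 5, 7, 11, 3],
    "statusId": [1, 1, 1, 1, 1],
})

report = value_count_report(sample, ["grid", "statusId"])
for name, counts in report.items():
    print(f"\n{name} counts:")
    print(counts)
```

With the real data, the call would be `value_count_report(df, features)`.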

In [99]:
resultId_counts = df['resultId'].value_counts()
print("\nresultId counts:")
print(resultId_counts)


resultId counts:
resultId
1        1
17384    1
17394    1
17393    1
17392    1
        ..
8691     1
8690     1
8689     1
8688     1
26085    1
Name: count, Length: 26080, dtype: int64


In [100]:
raceId_counts = df['raceId'].value_counts()
print("\nraceId counts:")
print(raceId_counts)


raceId counts:
raceId
800    55
809    47
360    39
359    39
361    39
       ..
837    14
470    14
660    13
827    13
765    10
Name: count, Length: 1091, dtype: int64


In [101]:
driverId_counts = df['driverId'].value_counts()
print("\ndriverId counts:")
print(driverId_counts)


driverId counts:
driverId
4      370
8      352
22     326
1      322
18     309
      ... 
616      1
617      1
618      1
621      1
604      1
Name: count, Length: 857, dtype: int64


In [102]:
constructorId_counts = df['constructorId'].value_counts()
print("\nconstructorId counts:")
print(constructorId_counts)


constructorId counts:
constructorId
6      2371
1      1855
3      1609
25      881
32      871
       ... 
96        1
93        1
153       1
84        1
123       1
Name: count, Length: 210, dtype: int64


In [103]:
number_counts = df['number'].value_counts()
print("\nnumber counts:")
print(number_counts)


number counts:
number
6      994
8      993
4      985
16     971
3      971
      ... 
123      1
120      1
126      1
110      1
95       1
Name: count, Length: 130, dtype: int64


In [104]:
grid_counts = df['grid'].value_counts()
print("\ngrid counts:")
print(grid_counts)


grid counts:
grid
0     1616
1     1102
7     1101
4     1098
11    1098
9     1098
5     1098
3     1096
10    1096
8     1095
12    1093
2     1092
6     1091
13    1091
14    1086
15    1079
16    1066
17    1055
18    1018
19    1001
20     957
21     697
22     656
23     453
24     429
25     301
26     248
27      46
28      30
29      25
30      19
31      18
32      17
33      13
34       1
Name: count, dtype: int64


In [105]:
position_counts = df['position'].value_counts()
print("\nposition counts:")
print(position_counts)


position counts:
position
\N    10873
3      1101
4      1101
2      1099
5      1097
1      1094
6      1090
7      1070
8      1042
9      1004
10      944
11      867
12      766
13      679
14      571
15      495
16      404
17      310
18      205
19      128
20       67
21       34
22       19
23        8
24        3
25        1
26        1
27        1
28        1
29        1
30        1
31        1
32        1
33        1
Name: count, dtype: int64


In [106]:
positionText_counts = df['positionText'].value_counts()
print("\npositionText counts:")
print(positionText_counts)


positionText counts:
positionText
R     8827
F     1368
3     1101
4     1101
2     1099
5     1097
1     1094
6     1090
7     1070
8     1042
9     1004
10     944
11     868
12     766
13     679
14     572
15     495
16     404
W      330
17     310
18     205
N      190
D      147
19     128
20      67
21      34
22      19
E        9
23       8
24       3
29       1
32       1
31       1
30       1
25       1
28       1
27       1
26       1
33       1
Name: count, dtype: int64
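
Comparing the two outputs above, `position` holds `\N` in 10873 rows, while the letter codes in `positionText` (R, F, W, N, D, E) add up to 10871. The difference comes from positions 11 and 14, which each appear once more in `positionText` than in `position`. The sketch below shows one way to flag such rows; it assumes that a purely numeric `positionText` means the driver was classified, and the column names `position_num` and `classified` are introduced only for illustration.

```python
import pandas as pd

# Rows copied from df.head() and df.tail(), trimmed to the relevant columns.
sample = pd.DataFrame({
    "resultId": [1, 2, 26084, 26085],
    "position": ["1", "2", "\\N", "\\N"],
    "positionText": ["1", "2", "R", "R"],
})

# Numeric finishing position; the \N placeholder becomes NaN.
sample["position_num"] = pd.to_numeric(sample["position"], errors="coerce")

# True when positionText is a plain number (assumed to mean "classified").
sample["classified"] = sample["positionText"].str.isdigit()

# Rows where positionText is numeric but position is missing.
mismatch = sample[sample["classified"] & sample["position_num"].isna()]
print(mismatch)
```

On this sample `mismatch` is empty; on the full dataset the counts above suggest it would contain two rows.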


In [107]:
positionOrder_counts = df['positionOrder'].value_counts()
print("\npositionOrder counts:")
print(positionOrder_counts)


positionOrder counts:
positionOrder
3     1101
4     1101
2     1100
11    1099
5     1098
6     1098
7     1098
8     1098
9     1097
10    1096
12    1096
1     1094
13    1092
14    1088
15    1086
16    1073
17    1066
18    1051
19    1028
20    1013
21     745
22     716
23     505
24     478
25     386
26     349
27     270
28     221
29     180
30     156
31     117
32      79
33      65
34      46
35      29
36      18
37      17
38      17
39      13
Name: count, dtype: int64


In [108]:
points_counts = df['points'].value_counts()
print("\npoints counts:")
print(points_counts)


points counts:
points
0.00     18250
2.00      1091
4.00      1079
6.00      1060
1.00      1034
3.00       823
10.00      580
9.00       443
8.00       441
15.00      261
12.00      261
18.00      251
25.00      239
5.00       134
26.00       30
19.00       18
1.50        17
7.00        13
13.00        9
16.00        9
11.00        7
0.50         6
4.50         4
1.33         3
6.50         2
8.50         2
24.00        1
12.50        1
20.00        1
4.14         1
30.00        1
36.00        1
50.00        1
3.50         1
3.14         1
6.14         1
8.14         1
2.50         1
7.50         1
Name: count, dtype: int64


In [109]:
laps_counts = df['laps'].value_counts()
print("\nlaps counts:")
print(laps_counts)


laps counts:
laps
0      2509
70      953
53      905
52      788
56      751
       ... 
192       1
181       1
119       1
120       1
127       1
Name: count, Length: 172, dtype: int64


In [110]:
time_counts = df['time'].value_counts()
print("\ntime counts:")
print(time_counts)


time counts:
time
\N           18829
+8:22.19         5
+0.7             4
+46.2            4
+5.7             4
             ...  
+1:28.787        1
+1:21.319        1
+57.979          1
+49.036          1
+1:50.450        1
Name: count, Length: 7000, dtype: int64


In [111]:
milliseconds_counts = df['milliseconds'].value_counts()
print("\nmilliseconds counts:")
print(milliseconds_counts)


milliseconds counts:
milliseconds
\N          18830
14259460        5
10928200        3
14429440        2
8659600         2
            ...  
5887201         1
5844399         1
5823906         1
4576494         1
5060900         1
Name: count, Length: 7213, dtype: int64
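
The `time` and `milliseconds` columns encode the same information in different forms: the winner's `milliseconds` value matches the winner's `time`, and each "+" gap equals the difference to the winner's `milliseconds`. A short sketch using the first three rows of race 18 from `df.head()` illustrates this; it assumes the `milliseconds` column has already been converted to integers, and `gap_ms` and `race_time` are names chosen here.

```python
import pandas as pd

# First three results of raceId 18, copied from df.head().
rows = pd.DataFrame({
    "resultId": [1, 2, 3],
    "raceId": [18, 18, 18],
    "time": ["1:34:50.616", "+5.478", "+8.163"],
    "milliseconds": [5690616, 5696094, 5698779],
})

# The smallest total time in each race belongs to the winner.
winner_ms = rows.groupby("raceId")["milliseconds"].transform("min")

# Gap to the winner in milliseconds; compare with the "+" strings in time.
rows["gap_ms"] = rows["milliseconds"] - winner_ms

# Total race time as a Timedelta, for readable display.
rows["race_time"] = pd.to_timedelta(rows["milliseconds"], unit="ms")

print(rows[["time", "gap_ms", "race_time"]])
```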


In [112]:
fastestLap_counts = df['fastestLap'].value_counts()
print("\nfastestLap counts:")
print(fastestLap_counts)


fastestLap counts:
fastestLap
\N    18465
50      284
52      268
53      266
51      249
      ...  
77       12
78        6
73        5
80        3
85        2
Name: count, Length: 80, dtype: int64


In [113]:
rank_counts = df['rank'].value_counts()
print("\nrank counts:")
print(rank_counts)


rank counts:
rank
\N    18249
2       377
5       377
1       377
3       377
4       377
6       377
13      376
10      376
11      376
9       376
12      376
7       376
14      375
8       375
15      374
16      373
17      367
18      358
19      322
20      269
0       216
21      122
22       91
23       43
24       28
Name: count, dtype: int64


In [114]:
fastestLapTime_counts = df['fastestLapTime'].value_counts()
print("\nfastestLapTime counts:")
print(fastestLapTime_counts)


fastestLapTime counts:
fastestLapTime
\N          18465
1:18.262        4
1:43.026        4
1:18.904        4
1:17.495        4
            ...  
1:31.167        1
1:30.573        1
1:30.108        1
1:30.279        1
1:53.138        1
Name: count, Length: 6970, dtype: int64


In [115]:
fastestLapSpeed_counts = df['fastestLapSpeed'].value_counts()
print("\nfastestLapSpeed counts:")
print(fastestLapSpeed_counts)


fastestLapSpeed counts:
fastestLapSpeed
\N         18465
208.575        3
222.592        3
207.249        3
200.642        3
           ...  
196.062        1
193.786        1
193.956        1
194.698        1
222.864        1
Name: count, Length: 7145, dtype: int64


In [116]:
statusId_counts = df['statusId'].value_counts()
print("\nstatusId counts:")
print(statusId_counts)


statusId counts:
statusId
1      7246
11     3894
5      2016
12     1598
3      1048
       ... 
59        1
58        1
102       1
56        1
92        1
Name: count, Length: 137, dtype: int64


The goal of my analysis is a **Fastest Lap Analysis**:

- Analyze fastest lap times (fastestLapTime) and speeds (fastestLapSpeed).
- Identify which drivers or teams consistently achieve fastest laps.
- Compare fastest laps across races or within a single race to understand performance dynamics.
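
A minimal sketch of how these steps might start is shown below. It is not the analysis carried out later in the notebook: the helper `lap_time_to_seconds` and the derived column names are assumptions made for illustration, and the rows are copied from `df.head()` and `df.tail()`. The sketch converts the string columns to numbers, counts how often each driver holds rank 1 (the fastest lap of a race), and takes the best lap per constructor within each race.

```python
import pandas as pd


def lap_time_to_seconds(text):
    """Convert an 'M:SS.mmm' lap time string to seconds; NaN if missing."""
    if pd.isna(text) or text == "\\N":
        return float("nan")
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


# Five rows from df.head() plus the last row of df.tail().
sample = pd.DataFrame({
    "raceId": [18, 18, 18, 18, 18, 1110],
    "driverId": [1, 2, 3, 4, 5, 857],
    "constructorId": [1, 2, 3, 4, 1, 1],
    "rank": ["2", "3", "5", "7", "1", "0"],
    "fastestLapTime": ["1:27.452", "1:27.739", "1:28.090",
                       "1:28.603", "1:27.418", "\\N"],
    "fastestLapSpeed": ["218.300", "217.586", "216.719",
                        "215.464", "218.385", "\\N"],
})

# Convert the object columns to numeric values; \N becomes NaN.
sample["rank_num"] = pd.to_numeric(sample["rank"], errors="coerce")
sample["lap_seconds"] = sample["fastestLapTime"].map(lap_time_to_seconds)
sample["lap_speed"] = pd.to_numeric(sample["fastestLapSpeed"], errors="coerce")

# Number of races in which each driver set the fastest lap (rank 1).
fastest_by_driver = (
    sample.loc[sample["rank_num"] == 1].groupby("driverId").size()
)

# Best lap time per constructor within each race.
best_per_team = sample.groupby(["raceId", "constructorId"])["lap_seconds"].min()

print(fastest_by_driver)
print(best_per_team)
```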


# 3 - Outliers