# 2022 Data Collection

## Introduction
The **2022 Formula 1 season** opened the ground-effect era under new technical regulations. It was dominated by **Max Verstappen**, who sealed his **second World Championship** early as Red Bull mastered the new rules. **Ferrari** emerged as the early-season challenger with a fast but unreliable car, then faded through strategic errors and reliability problems, while Mercedes struggled before recovering late in the year. The season ran over **22 races**, kept the sprint format for three weekends, and produced regulation-driven performance swings across the grid as Red Bull pulled away decisively. That makes 2022 a season of **technical upheaval**, **strategic missteps**, and **regulation-driven dominance**.


## Importing the necessary libraries

In [1]:
import pandas as pd
import fastf1

In [2]:
fastf1.Cache.enable_cache("../cache")
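FastF1's `Cache.enable_cache` expects the cache directory to exist already (in the versions I have used it raises an error otherwise), so a small guard that creates `../cache` first avoids a failure on a fresh checkout. A minimal sketch; `ensure_cache_dir` is a hypothetical helper, not part of FastF1:

```python
from pathlib import Path


def ensure_cache_dir(path: str = "../cache") -> Path:
    """Create the cache directory (and any missing parents) if needed.

    Hypothetical helper: the returned path is meant to be passed on to
    fastf1.Cache.enable_cache().
    """
    cache_dir = Path(path)
    cache_dir.mkdir(parents=True, exist_ok=True)  # no-op if it already exists
    return cache_dir


# Usage inside the notebook (requires fastf1):
# fastf1.Cache.enable_cache(str(ensure_cache_dir("../cache")))
```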

## Getting the Races of 2022

In [3]:
schedule = fastf1.get_event_schedule(2022)
listOfRaces = schedule["Country"].tolist()
listOfRaces

['Spain',
 'Bahrain',
 'Bahrain',
 'Saudi Arabia',
 'Australia',
 'Italy',
 'United States',
 'Spain',
 'Monaco',
 'Azerbaijan',
 'Canada',
 'Great Britain',
 'Austria',
 'France',
 'Hungary',
 'Belgium',
 'Netherlands',
 'Italy',
 'Singapore',
 'Japan',
 'United States',
 'Mexico',
 'Brazil',
 'Abu Dhabi']
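The list above has 24 entries for a 22-race season, and several countries repeat: the first `Spain` and `Bahrain` entries appear to be the pre-season tests, while Italy (Emilia Romagna and Monza) and the United States (Miami and Austin) each hosted two Grands Prix. Because the loop below looks sessions up by these names, a repeated name resolves to the same event every time it is used (the loading log further down shows the first `Spain` entry loading the Spanish Grand Prix). A quick check with `collections.Counter`; `repeated_names` is an illustrative helper:

```python
from collections import Counter

# Country list as printed by schedule["Country"].tolist() above
list_of_races = [
    "Spain", "Bahrain", "Bahrain", "Saudi Arabia", "Australia", "Italy",
    "United States", "Spain", "Monaco", "Azerbaijan", "Canada",
    "Great Britain", "Austria", "France", "Hungary", "Belgium",
    "Netherlands", "Italy", "Singapore", "Japan", "United States",
    "Mexico", "Brazil", "Abu Dhabi",
]


def repeated_names(names):
    """Return {name: count} for every name that occurs more than once."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


print(len(list_of_races))             # → 24
print(repeated_names(list_of_races))  # → {'Spain': 2, 'Bahrain': 2, 'Italy': 2, 'United States': 2}
```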

## Iterating over every Race and concatenating the Race Data into a DataFrame
Building on my first attempt in ["Bahrain_test.ipynb"](https://github.com/Chracker24/MTS-IE/blob/main/02_Notebooks/Data_Collection/Formula1/2020/Bahrain_test.ipynb), I use a loop to collect each race's lap data and trim it down to the columns and data points I consider necessary for the Intelligence Engine.
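The per-race trimming described above can be pulled out into a function and checked on a tiny hand-made frame before looping over a whole season. A sketch, assuming `laps` is a DataFrame with the same columns as FastF1's `session.laps` and timedelta-typed time columns; `trim_race_laps`, `COLS` and `TIME_COLS` are illustrative names, not the notebook's:

```python
import pandas as pd

# Columns kept for the Intelligence Engine (same list as the notebook's `cols`)
COLS = ["Driver", "LapTime", "Stint", "Sector1Time", "Sector2Time",
        "Sector3Time", "Compound", "Team", "Deleted"]
TIME_COLS = ["LapTime", "Sector1Time", "Sector2Time", "Sector3Time"]


def trim_race_laps(laps: pd.DataFrame, season: int, race: str) -> pd.DataFrame:
    """Keep the chosen columns, drop laps without a LapTime, convert the
    timedelta columns to float seconds and remove deleted laps."""
    race_data = laps[COLS].dropna(subset=["LapTime"]).copy()
    race_data["Season"] = season
    race_data["Race"] = race
    for col in TIME_COLS:
        race_data[col] = race_data[col].dt.total_seconds()  # NaT becomes NaN
    # `== False` keeps only laps explicitly marked as not deleted
    return race_data[race_data["Deleted"] == False].copy()  # noqa: E712
```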

In [7]:
cols = [
    "Driver",
    "LapTime",
    "Stint",
    "Sector1Time",
    "Sector2Time",
    "Sector3Time",
    "Compound",
    "Team",
    "Deleted",
]

In [8]:
Season_Data2022 = []

In [10]:
for race in listOfRaces:
    # Load only lap timing for the race session (no telemetry or weather).
    # The session is looked up by country name, so a repeated name loads the same race again.
    session = fastf1.get_session(2022, race, "R")
    session.load(laps=True, telemetry=False, weather=False)
    laps = session.laps

    # Keep the chosen columns and drop laps without a recorded LapTime
    Race_Data = laps[cols].dropna(subset=["LapTime"]).copy()
    Race_Data["Season"] = 2022
    Race_Data["Race"] = race

    # Convert the timedelta columns to float seconds
    time_cols = ["LapTime", "Sector1Time", "Sector2Time", "Sector3Time"]
    Race_Data[time_cols] = Race_Data[time_cols].apply(lambda x: x.dt.total_seconds())

    # Keep only laps explicitly marked as not deleted
    Race_Data = Race_Data[Race_Data["Deleted"] == False].copy()

    Season_Data2022.append(Race_Data)

core           INFO 	Loading data for Spanish Grand Prix - Race [v3.7.0]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for lap_count
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '11', '63', '55', '44', '77', '31', '4', '14', '22', '5', '3', '10', '47', '18', '6', '20', '23', '24', '16']
core           INFO 	Loading data for Bahrain Grand Prix - Race [v3.7.0]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for s
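One way to avoid the name collisions is to drive the loop from the schedule itself and request sessions by round number, which `fastf1.get_session` also accepts. A sketch, assuming the schedule DataFrame has FastF1's `RoundNumber` and `EventName` columns and that testing events carry `RoundNumber` 0 (FastF1's `get_event_schedule` also offers an `include_testing=False` option); `races_to_load` is a hypothetical helper:

```python
import pandas as pd


def races_to_load(schedule: pd.DataFrame) -> list[tuple[int, str]]:
    """Return (round number, event name) pairs for championship rounds only.

    Assumes testing events have RoundNumber 0. Each race keeps a unique
    round number, so countries that host two races no longer collide.
    """
    races = schedule[schedule["RoundNumber"] > 0]
    return [(int(rnd), str(name))
            for rnd, name in zip(races["RoundNumber"], races["EventName"])]


# In the notebook (requires fastf1):
# schedule = fastf1.get_event_schedule(2022)
# for round_number, event_name in races_to_load(schedule):
#     session = fastf1.get_session(2022, round_number, "R")
#     ...
#     Race_Data["Race"] = event_name
```

Labelling rows with the event name (or storing the round number next to `Country`) keeps the two Italian and the two American races apart in the combined frame.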

In [11]:
season_2022 = pd.concat(Season_Data2022, ignore_index=True)

In [12]:
season_2022

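|       | Driver | LapTime | Stint | Sector1Time | Sector2Time | Sector3Time | Compound | Team            | Deleted | Season | Race      |
|-------|--------|---------|-------|-------------|-------------|-------------|----------|-----------------|---------|--------|-----------|
| 0     | VER    | 89.739  | 1.0   | NaN         | 32.819      | 29.417      | SOFT     | Red Bull Racing | False   | 2022   | Spain     |
| 1     | VER    | 87.509  | 1.0   | 24.345      | 33.238      | 29.926      | SOFT     | Red Bull Racing | False   | 2022   | Spain     |
| 2     | VER    | 87.574  | 1.0   | 24.494      | 33.214      | 29.866      | SOFT     | Red Bull Racing | False   | 2022   | Spain     |
| 3     | VER    | 87.601  | 1.0   | 24.462      | 33.042      | 30.097      | SOFT     | Red Bull Racing | False   | 2022   | Spain     |
| 4     | VER    | 87.937  | 1.0   | 24.486      | 33.222      | 30.229      | SOFT     | Red Bull Racing | False   | 2022   | Spain     |
| ...   | ...    | ...     | ...   | ...         | ...         | ...         | ...      | ...             | ...     | ...    | ...       |
| 33112 | ALO    | 91.819  | 2.0   | 18.373      | 38.859      | 34.587      | HARD     | Alpine          | False   | 2022   | Abu Dhabi |
| 33113 | ALO    | 90.579  | 2.0   | 18.351      | 38.485      | 33.743      | HARD     | Alpine          | False   | 2022   | Abu Dhabi |
| 33114 | ALO    | 91.065  | 2.0   | 18.337      | 39.215      | 33.513      | HARD     | Alpine          | False   | 2022   | Abu Dhabi |
| 33115 | ALO    | 90.688  | 2.0   | 18.295      | 39.038      | 33.355      | HARD     | Alpine          | False   | 2022   | Abu Dhabi |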
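Because the same race can be loaded twice under the same country label, a cheap sanity check on the concatenated frame is to count exact duplicate rows and laps per `Race` label: a race loaded twice shows up as a block of duplicates and a lap count roughly double its neighbours. A sketch with hypothetical helper names; treat the duplicate count as a heuristic, since two genuinely different laps could in principle share every kept value:

```python
import pandas as pd


def duplicate_lap_count(season: pd.DataFrame) -> int:
    """Number of rows that exactly repeat an earlier row."""
    return int(season.duplicated().sum())


def laps_per_race(season: pd.DataFrame) -> pd.Series:
    """Lap count for each Race label, largest first."""
    return season["Race"].value_counts()


# Usage in the notebook:
# duplicate_lap_count(season_2022)
# laps_per_race(season_2022).head()
```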
