# 2021 Data Collection

## Introduction
The **2021 Formula 1 season** was a fiercely contested, 22-race championship defined by a season-long duel between **Max Verstappen** and **Lewis Hamilton**. It culminated in one of the most controversial finales in motorsport history at Abu Dhabi, where Verstappen secured his **first World Championship** on the final lap. Red Bull and Mercedes were closely matched throughout, trading victories and momentum. The year also stood out for **Ocon**'s first win (Hungary), **Ricciardo**'s win at Monza, and **Norris** nearly taking a maiden victory in Russia, along with dramatic incidents at Silverstone, Monza, and Jeddah and a highly competitive midfield. Together these made 2021 a **rare blend of *pure rivalry*, *high drama*, and *modern F1 controversy***.


## Importing the necessary libraries

In [7]:
import pandas as pd
import fastf1
import logging

In [8]:
fastf1.Cache.enable_cache("../cache")
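
As far as I know, `enable_cache` expects the cache directory to exist already, so on a fresh checkout it may help to create it first. The sketch below is a minimal illustration using only `pathlib`; the helper name `ensure_cache_dir` is hypothetical, and the demonstration runs in a temporary directory rather than in `../cache`.

```python
import tempfile
from pathlib import Path


def ensure_cache_dir(path: str) -> Path:
    """Create the cache directory (and any missing parents) and return it."""
    cache_dir = Path(path)
    cache_dir.mkdir(parents=True, exist_ok=True)
    return cache_dir


# Demonstration in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    cache_dir = ensure_cache_dir(str(Path(tmp) / "cache"))
    created = cache_dir.is_dir()

print(created)  # True

# In the notebook this would be used as (assumption, not run here):
# fastf1.Cache.enable_cache(str(ensure_cache_dir("../cache")))
```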

## Getting the Races of 2021

In [9]:
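schedule = fastf1.get_event_schedule(2021)
# Note: by default the schedule also lists pre-season testing, and some
# countries hosted two rounds, so country names can repeat in this list.
listOfRaces = schedule["Country"].tolist()
listOfRaces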

['Bahrain',
 'Bahrain',
 'Italy',
 'Portugal',
 'Spain',
 'Monaco',
 'Azerbaijan',
 'France',
 'Austria',
 'Austria',
 'Great Britain',
 'Hungary',
 'Belgium',
 'Netherlands',
 'Italy',
 'Russia',
 'Turkey',
 'United States',
 'Mexico',
 'Brazil',
 'Qatar',
 'Saudi Arabia',
 'Abu Dhabi']
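
The list above has 23 entries for a 22-race season, and three countries appear twice. The sketch below counts the repeats using the printed list, then builds a round-number lookup. It assumes the first `'Bahrain'` entry is the pre-season test and that the remaining entries are rounds 1 to 22 in calendar order; both are assumptions about the schedule, not something the output itself states.

```python
from collections import Counter

# The "Country" column exactly as printed above.
countries = [
    "Bahrain", "Bahrain", "Italy", "Portugal", "Spain", "Monaco",
    "Azerbaijan", "France", "Austria", "Austria", "Great Britain",
    "Hungary", "Belgium", "Netherlands", "Italy", "Russia", "Turkey",
    "United States", "Mexico", "Brazil", "Qatar", "Saudi Arabia",
    "Abu Dhabi",
]

# Countries that occur more than once, in order of first appearance.
counts = Counter(countries)
repeated = [country for country, n in counts.items() if n > 1]
print(repeated)  # ['Bahrain', 'Italy', 'Austria']

# Assumption: entry 0 is pre-season testing; the rest are rounds 1..22.
rounds = dict(enumerate(countries[1:], start=1))
print(len(rounds))  # 22
print(rounds[2], rounds[14])  # Italy Italy
```

Because a repeated country name is ambiguous, passing the integer round number to `fastf1.get_session(2021, round_number, "R")` is one way to make sure each of the 22 races is requested exactly once.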

## Iterating over every race and concatenating the race data into a DataFrame
Building on my first attempt in ["Bahrain_test.ipynb"](https://github.com/Chracker24/MTS-IE/blob/main/02_Notebooks/Data_Collection/Formula1/2020/Bahrain_test.ipynb), I use a loop to collect the data for each race and trim it down to the columns and data points that I consider necessary for the Intelligence Engine.

In [10]:
cols = [
    "Driver",
    "LapTime",
    "LapNumber",
    "Stint",
    "Sector1Time",
    "Sector2Time",
    "Sector3Time",
    "Compound",
    "Team",
    "Deleted",
]

In [11]:
Season_Data2021 = []

In [12]:
# Only show FastF1 log messages at ERROR level or above.
logging.getLogger("fastf1").setLevel(logging.ERROR)

time_cols = ["LapTime", "Sector1Time", "Sector2Time", "Sector3Time"]

for race in listOfRaces:
    # Caution: a name that repeats in listOfRaces resolves to the same event again.
    session = fastf1.get_session(2021, race, "R")
    session.load()

    # Keep only the selected columns and drop laps without a recorded lap time.
    Race_Data = session.laps[cols].dropna(subset=["LapTime"]).copy()
    Race_Data["Season"] = 2021
    Race_Data["Race"] = race

    # Convert the timedelta columns to seconds (float).
    Race_Data[time_cols] = Race_Data[time_cols].apply(lambda x: x.dt.total_seconds())

    # Discard laps whose times were deleted.
    Race_Data = Race_Data[Race_Data["Deleted"] == False].copy()

    Season_Data2021.append(Race_Data)
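
To make the per-race cleaning steps easier to check without downloading a session, the sketch below applies the same three operations (drop laps without a lap time, convert timedeltas to seconds, filter deleted laps) to a small hand-made DataFrame. The function name `clean_laps`, the toy values, and the race label `"Toyland"` are illustrative assumptions, not part of the notebook or of FastF1.

```python
import pandas as pd

TIME_COLS = ["LapTime", "Sector1Time", "Sector2Time", "Sector3Time"]


def clean_laps(laps: pd.DataFrame, season: int, race: str) -> pd.DataFrame:
    """Apply the same cleaning steps as the loop above to one race."""
    out = laps.dropna(subset=["LapTime"]).copy()
    # Timedelta columns become plain float seconds.
    out[TIME_COLS] = out[TIME_COLS].apply(lambda col: col.dt.total_seconds())
    # Keep only laps that were not deleted.
    out = out[out["Deleted"] == False].copy()
    out["Season"] = season
    out["Race"] = race
    return out


nan = float("nan")
# Made-up laps: one without a lap time, one deleted, two valid.
toy = pd.DataFrame({
    "Driver": ["AAA", "AAA", "BBB", "BBB"],
    "LapTime": pd.to_timedelta([93.5, nan, 95.0, 96.25], unit="s"),
    "Sector1Time": pd.to_timedelta([nan, 30.0, 30.0, 31.0], unit="s"),
    "Sector2Time": pd.to_timedelta([34.0, 35.0, 35.0, 35.0], unit="s"),
    "Sector3Time": pd.to_timedelta([30.0, 30.0, 30.0, 30.25], unit="s"),
    "Deleted": [False, False, True, False],
})

cleaned = clean_laps(toy, season=2021, race="Toyland")
print(cleaned["LapTime"].tolist())  # [93.5, 96.25]
```

Wrapping the steps in a function like this is a design choice, not a requirement; it mainly lets the cleaning logic be tested on small inputs before running the full season.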

In [13]:
season_2021 = pd.concat(Season_Data2021, ignore_index=True)

In [14]:
season_2021 = season_2021.sort_values("LapNumber").reset_index(drop=True)
season_2021

| | Driver | LapTime | LapNumber | Stint | Sector1Time | Sector2Time | Sector3Time | Compound | Team | Deleted | Season | Race |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NOR | 93.959 | 1.0 | 1.0 | NaN | 34.093 | 30.013 | SOFT | McLaren | False | 2021 | Spain |
| 1 | STR | 111.426 | 1.0 | 1.0 | NaN | 31.965 | 47.070 | MEDIUM | Aston Martin | False | 2021 | Italy |
| 2 | RAI | 106.760 | 1.0 | 1.0 | NaN | 35.989 | 26.549 | INTERMEDIATE | Alfa Romeo Racing | False | 2021 | Turkey |
| 3 | LEC | 219.679 | 1.0 | 1.0 | NaN | 89.989 | 56.823 | UNKNOWN | Ferrari | False | 2021 | Belgium |
| 4 | MSC | 125.671 | 1.0 | 1.0 | NaN | 49.423 | 26.114 | SOFT | Haas F1 Team | False | 2021 | Azerbaijan |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 23893 | HAM | 78.003 | 78.0 | 3.0 | 20.193 | 36.406 | 21.404 | SOFT | Mercedes | False | 2021 | Monaco |
| 23894 | VET | 77.071 | 78.0 | 2.0 | 20.335 | 35.978 | 20.758 | HARD | Aston Martin | False | 2021 | Monaco |
| 23895 | NOR | 76.101 | 78.0 | 2.0 | 19.692 | 35.738 | 20.671 | HARD | McLaren | False | 2021 | Monaco |
| 23896 | PER | 75.307 | 78.0 | 2.0 | 19.850 | 35.110 | 20.347 | HARD | Red Bull Racing | False | 2021 | Monaco |


## Saving the CSV file to the Data/Archive directory

In [15]:
from pathlib import Path

PROJECT_ROOT = Path.cwd().parents[2]
archive_dir = PROJECT_ROOT / "Data" / "Archive"

archive_dir.mkdir(parents=True, exist_ok=True)

season_2021.to_csv(
    archive_dir / "season_2021.csv",
    index=False
)
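
`Path.cwd().parents[2]` depends on where the notebook is launched from, so the resolved root changes if the notebook moves or is run from another directory. One alternative is to walk upward until a marker is found. The sketch below assumes the project is a Git checkout and uses `.git` as the marker; the function name `find_project_root` and the marker choice are illustrative assumptions. The demonstration builds a throwaway directory tree instead of touching the real project.

```python
import tempfile
from pathlib import Path


def find_project_root(start: Path, marker: str = ".git") -> Path:
    """Return the closest ancestor of `start` (or `start` itself) containing `marker`."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No directory containing {marker!r} above {start}")


# Demonstration on a temporary tree that mimics a nested notebook folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / ".git").mkdir()
    nested = root / "02_Notebooks" / "Data_Collection" / "Formula1"
    nested.mkdir(parents=True)
    found = find_project_root(nested)

print(found == root)  # True
```

In the notebook, `PROJECT_ROOT = find_project_root(Path.cwd())` would then replace the fixed `parents[2]` index, under the same assumption about the marker.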
