# 2020 Data Collection
## Introduction
#### The ***2020* Formula 1 season** was a COVID-disrupted, 17-race championship defined by empty grandstands, double-headers, and extreme Mercedes dominance, with **Lewis Hamilton winning his 7th World Championship** to equal Schumacher after crushing the field in the W11, one of the fastest cars in F1 history; Mercedes won 13 of 17 races, while **Max Verstappen** was the only real challenger, dragging Red Bull to occasional wins, and the season also stood out for unexpected first-time winners like **Gasly** (Monza), **Pérez** (Sakhir), and **Ocon** (Sakhir), plus chaos at Ferrari who collapsed to midfield, making 2020 a **strange mix of *historical dominance* and *rare unpredictability***.

## Importing the necessary libraries

In [1]:
import pandas as pd
import fastf1
import logging
from pathlib import Path

In [2]:
fastf1.Cache.enable_cache("../../cache")

## Getting the Races of 2020

In [3]:
schedule = fastf1.get_event_schedule(2020)
listOfRaces = schedule["Country"].tolist()
listOfRaces

['Spain',
 'Spain',
 'Austria',
 'Austria',
 'Hungary',
 'Great Britain',
 'Great Britain',
 'Spain',
 'Belgium',
 'Italy',
 'Italy',
 'Russia',
 'Germany',
 'Portugal',
 'Italy',
 'Turkey',
 'Bahrain',
 'Bahrain',
 'Abu Dhabi']

## Iterating into every Race and concatenating every Race Data into a DataFrame
Through my first try in ["Bahrain_test.ipynb"](https://github.com/Chracker24/MTS-IE/blob/main/02_Notebooks/Data_Collection/Formula1/2020/Bahrain_test.ipynb), I am going to use a loop to collect data and trim it down to the column and data points that I deem necessary for the Intelligence Engine.

In [4]:
cols = [
    "Driver",
    "LapTime",
    "LapNumber",
    "Stint",
    "Sector1Time",
    "Sector2Time",
    "Sector3Time",
    "Compound",
    "Team",
    "Deleted",
]

In [5]:
Season_Data2020 = []

In [6]:
logging.getLogger("fastf1").setLevel(logging.ERROR)
for i in range(len(listOfRaces)):
    session = fastf1.get_session(2020, listOfRaces[i], "R")
    session.load()
    laps = session.laps
    Race_Data = laps[cols].copy()
    Race_Data = Race_Data.dropna(subset=["LapTime"]).copy()
    Race_Data["Season"] = 2020
    Race_Data["Race"] = listOfRaces[i]
    Race_Data[["LapTime","Sector1Time","Sector2Time","Sector3Time"]] = Race_Data[["LapTime","Sector1Time","Sector2Time","Sector3Time"]].apply(lambda x : x.dt.total_seconds())
    Race_Data = Race_Data[Race_Data["Deleted"]==False].copy()

    Season_Data2020.append(Race_Data)

In [7]:
season_2020 = pd.concat(Season_Data2020, ignore_index=True)

In [8]:
season_2020 = season_2020.sort_values("LapNumber").reset_index(drop=True)
season_2020

Unnamed: 0,Driver,LapTime,LapNumber,Stint,Sector1Time,Sector2Time,Sector3Time,Compound,Team,Deleted,Season,Race
0,HAM,94.010,1.0,1.0,,37.157,24.870,MEDIUM,Mercedes,False,2020,Great Britain
1,LAT,105.776,1.0,1.0,,41.922,27.344,MEDIUM,Williams,False,2020,Great Britain
2,VET,128.656,1.0,1.0,,44.097,31.519,WET,Ferrari,False,2020,Turkey
3,RAI,104.681,1.0,1.0,,41.160,27.496,MEDIUM,Alfa Romeo Racing,False,2020,Great Britain
4,KVY,80.360,1.0,1.0,,34.261,23.512,MEDIUM,AlphaTauri,False,2020,Austria
...,...,...,...,...,...,...,...,...,...,...,...,...
19191,HAM,76.627,70.0,4.0,27.970,26.907,21.750,SOFT,Mercedes,False,2020,Hungary
19192,VER,80.222,70.0,3.0,28.674,28.600,22.948,HARD,Red Bull Racing,False,2020,Hungary
19193,BOT,80.276,70.0,4.0,28.554,28.629,23.093,HARD,Mercedes,False,2020,Hungary
19194,ALB,80.549,70.0,3.0,28.906,28.728,22.915,HARD,Red Bull Racing,False,2020,Hungary


## Saving the Dataframe as a CSV (Comma Separated Value) file in Data/Archive

In [None]:

PROJECT_ROOT = Path.cwd().parents[4]
archive_dir = PROJECT_ROOT / "Data" / "Archive"

archive_dir.mkdir(parents=True, exist_ok=True)

season_2020.to_csv(
    archive_dir / "season_2020.csv",
    index=False
)
