# Exploratory Data Analysis on F1 using SQL
Queries covered in this notebook:
1. Finding locations that hosted the maximum number of races
2. Finding circuits that hosted opening races
3. Number of races hosted by every country
4. All-time constructor points and leaders
5. Number of races organized per season
6. Comparison of all-time top 10 teams (based on points and races won)
7. Number of races won by top 10 drivers
8. Finding the fastest laps on each circuit
9. Number of constructors from each country
10. Finding the fastest qualification times on every circuit for pole position
11. Drivers who have won at least one race

Importing Libraries

In [1]:
import pandas as pd
import sqlite3
import plotly.express as px
import plotly.graph_objects as go
import os 

Establishing a Connection to the SQLite3 Database

In [2]:
conn = sqlite3.connect("F1.db")

Reading CSV Files

In [6]:
# Forward slashes avoid escape-sequence problems in string literals
# (for example, "\r" in a backslash path to races.csv is read as a carriage return).
data_1 = pd.read_csv("Dataset/circuits.csv")
data_2 = pd.read_csv("Dataset/constructor_results.csv")
data_3 = pd.read_csv("Dataset/constructor_standings.csv")
data_4 = pd.read_csv("Dataset/constructors.csv")
data_5 = pd.read_csv("Dataset/driver_standings.csv")
data_6 = pd.read_csv("Dataset/drivers.csv")
data_7 = pd.read_csv("Dataset/lap_times.csv")
data_8 = pd.read_csv("Dataset/pit_stops.csv")
data_9 = pd.read_csv("Dataset/qualifying.csv")
data_10 = pd.read_csv("Dataset/races.csv")
data_11 = pd.read_csv("Dataset/seasons.csv")
data_12 = pd.read_csv("Dataset/status.csv")
data_13 = pd.read_csv("Dataset/results.csv")
data_14 = pd.read_csv("Dataset/sprint_results.csv")
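
A note on file paths: in a regular Python string literal a backslash starts an escape sequence, so a Windows-style relative path to `races.csv` or `results.csv` does not contain the characters it appears to. The minimal sketch below illustrates this with plain strings only (the variable names are illustrative and no files are read). Raw strings and forward slashes are two common ways to sidestep the problem.

```python
# "\r" in a normal string literal is a carriage return, not a backslash plus "r".
broken = "Dataset\races.csv"

# A raw string keeps the backslash as a literal character.
raw = r"Dataset\races.csv"

# Forward slashes are accepted by Python's file APIs on Windows as well.
portable = "Dataset/races.csv"

has_carriage_return = "\r" in broken
print(repr(broken))
print(has_carriage_return, len(broken), len(raw))
```

The same applies to any path segment that begins with a letter Python treats as an escape, such as `\r`, `\t` or `\n`.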

Inserting Data Into SQL Tables 

In [7]:
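# index=False keeps the DataFrame index out of the tables;
# if_exists="replace" lets the cell be re-run without a "table already exists" error.
data_1.to_sql("circuits", conn, if_exists="replace", index=False)
data_2.to_sql("constructor_results", conn, if_exists="replace", index=False)
data_3.to_sql("constructor_standings", conn, if_exists="replace", index=False)
data_4.to_sql("constructors", conn, if_exists="replace", index=False)
data_5.to_sql("driver_standings", conn, if_exists="replace", index=False)
data_6.to_sql("drivers", conn, if_exists="replace", index=False)
data_7.to_sql("lap_times", conn, if_exists="replace", index=False)
data_8.to_sql("pit_stops", conn, if_exists="replace", index=False)
data_9.to_sql("qualifying", conn, if_exists="replace", index=False)
data_10.to_sql("races", conn, if_exists="replace", index=False)
data_11.to_sql("seasons", conn, if_exists="replace", index=False)
data_12.to_sql("status", conn, if_exists="replace", index=False)
data_13.to_sql("results", conn, if_exists="replace", index=False)
data_14.to_sql("sprint_results", conn, if_exists="replace", index=False)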

180
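
Before running any queries, it can be worth checking that every table was created and populated. The sketch below is one possible way to do that. It builds a small in-memory database with made-up rows as a stand-in for `F1.db`, and `table_row_counts` is a hypothetical helper that is not part of the original notebook. In the notebook itself the helper could be called with `conn` instead of `demo`.

```python
import sqlite3

# Stand-in for F1.db (made-up rows) so the sketch runs on its own.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE circuits (circuitId INTEGER, name TEXT)")
demo.execute("CREATE TABLE races (raceId INTEGER, circuitId INTEGER)")
demo.executemany("INSERT INTO circuits VALUES (?, ?)", [(1, "A"), (2, "B")])
demo.executemany("INSERT INTO races VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])


def table_row_counts(connection):
    """Return {table_name: row_count} for every table in the database."""
    # sqlite_master is SQLite's built-in catalogue of schema objects.
    names = [
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    return {
        name: connection.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


counts = table_row_counts(demo)
print(counts)
```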

# 1. Finding locations that hosted the maximum number of races:
This query finds the most frequently used venues in Formula 1 history by counting the races held in each circuit's city. The result highlights the locations that have been on the calendar most often.

In [8]:
circuit_maximum = pd.read_sql('''
    SELECT c.location AS "City", COUNT(r.circuitid) AS "Races Hosted"
    FROM circuits c, races r 
    WHERE r.circuitid = c.circuitid 
    GROUP BY "City" 
    ORDER BY "Races Hosted" DESC;
''', conn)
circuit_maximum 

fig1 = px.bar(circuit_maximum, x="City", y="Races Hosted", color="Races Hosted", title="Races hosted by cities")
fig1.show()

In [9]:
fig1.write_image("plot1.png")
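
The query above joins `circuits` and `races` by listing both tables in `FROM` and matching them in `WHERE`. An explicit `JOIN ... ON` expresses the same thing and keeps the join condition separate from any filters. The sketch below compares the two forms on a small in-memory database. The rows are made up for illustration and only the columns needed for this query are modelled.

```python
import sqlite3

# Toy stand-in tables (made-up rows) so the sketch runs without F1.db.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE circuits (circuitId INTEGER, location TEXT);
    CREATE TABLE races (raceId INTEGER, circuitId INTEGER);
    INSERT INTO circuits VALUES (1, 'Monza'), (2, 'Monte-Carlo'), (3, 'Spa');
    INSERT INTO races VALUES (10, 1), (11, 1), (12, 1), (13, 2), (14, 2), (15, 3);
""")

# Comma-separated tables with the join condition in WHERE.
implicit_join = demo.execute("""
    SELECT c.location AS "City", COUNT(r.circuitId) AS "Races Hosted"
    FROM circuits c, races r
    WHERE r.circuitId = c.circuitId
    GROUP BY c.location
    ORDER BY "Races Hosted" DESC;
""").fetchall()

# The same query written with an explicit JOIN ... ON.
explicit_join = demo.execute("""
    SELECT c.location AS "City", COUNT(r.circuitId) AS "Races Hosted"
    FROM circuits c
    JOIN races r ON r.circuitId = c.circuitId
    GROUP BY c.location
    ORDER BY "Races Hosted" DESC;
""").fetchall()

print(implicit_join == explicit_join)
```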

# 2. Finding The Circuits That Have Hosted The Opening Races:
This query identifies the circuits that have hosted the first race of a season (round 1) and counts how often each has done so. The result shows which tracks have most often opened the championship.

In [10]:
opening_circuit = pd.read_sql(''' 
        SELECT r.name AS "Grand Prix Name", c.name AS "Circuit Name", c.location AS "City", c.country AS "Country", COUNT(*) AS "Opening Races Hosted"
        FROM circuits c, races r 
        WHERE r.circuitid = c.circuitid AND r.round = 1
        GROUP BY "City" 
        ORDER BY "Opening Races Hosted" DESC;
''', conn)
opening_circuit

| | Grand Prix Name | Circuit Name | City | Country | Opening Races Hosted |
|---|---|---|---|---|---|
| 0 | Australian Grand Prix | Albert Park Grand Prix Circuit | Melbourne | Australia | 22 |
| 1 | Argentine Grand Prix | Autódromo Juan y Oscar Gálvez | Buenos Aires | Argentina | 15 |
| 2 | South African Grand Prix | Kyalami | Midrand | South Africa | 8 |
| 3 | Brazilian Grand Prix | Autódromo Internacional Nelson Piquet | Rio de Janeiro | Brazil | 7 |
| 4 | Bahrain Grand Prix | Bahrain International Circuit | Sakhir | Bahrain | 5 |
| 5 | Monaco Grand Prix | Circuit de Monaco | Monte-Carlo | Monaco | 5 |
| 6 | Brazilian Grand Prix | Autódromo José Carlos Pace | São Paulo | Brazil | 3 |
| 7 | United States Grand Prix | Phoenix street circuit | Phoenix | USA | 2 |
| 8 | Swiss Grand Prix | Circuit Bremgarten | Bern | Switzerland | 2 |
| 9 | Dutch Grand Prix | Circuit Park Zandvoort | Zandvoort | Netherlands | 1 |


In [11]:
fig2 = px.bar(opening_circuit, x="City", y="Opening Races Hosted", title="Cities That Hosted Opening Races", color="Opening Races Hosted")
fig2.show()
fig2.write_image("plot2.png")
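
One caveat about the opening-races query: it groups by city only, while also selecting the Grand Prix name and circuit name without aggregating them. SQLite accepts such "bare" columns, but it fills them from an arbitrary row of each group, so a city that opened seasons under different event names or at different circuits is collapsed into a single row. The sketch below shows the effect on made-up data (all names are hypothetical) and one possible alternative, grouping by every selected column.

```python
import sqlite3

# Toy data: "City X" has two circuits, each with a differently named event.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE circuits (circuitId INTEGER, name TEXT, location TEXT, country TEXT);
    CREATE TABLE races (raceId INTEGER, round INTEGER, circuitId INTEGER, name TEXT);
    INSERT INTO circuits VALUES
        (1, 'Circuit A', 'City X', 'Country Z'),
        (2, 'Circuit B', 'City X', 'Country Z'),
        (3, 'Circuit C', 'City Y', 'Country Z');
    INSERT INTO races VALUES
        (1, 1, 1, 'Alpha Grand Prix'),
        (2, 1, 1, 'Alpha Grand Prix'),
        (3, 1, 2, 'Beta Grand Prix'),
        (4, 1, 3, 'Gamma Grand Prix'),
        (5, 2, 3, 'Gamma Grand Prix');
""")

# Grouping by city only: r.name is a bare column, taken from an arbitrary row.
by_city = demo.execute("""
    SELECT r.name, c.location, COUNT(*) AS n
    FROM circuits c JOIN races r ON r.circuitId = c.circuitId
    WHERE r.round = 1
    GROUP BY c.location
    ORDER BY n DESC;
""").fetchall()

# Grouping by every selected column keeps distinct events and circuits apart.
by_event = demo.execute("""
    SELECT r.name, c.name, c.location, c.country, COUNT(*) AS n
    FROM circuits c JOIN races r ON r.circuitId = c.circuitId
    WHERE r.round = 1
    GROUP BY r.name, c.name, c.location, c.country
    ORDER BY n DESC, r.name;
""").fetchall()

print(len(by_city), len(by_event))
```

Which grouping is appropriate depends on the question being asked: per-city totals, as in the notebook, or per-circuit and per-event totals, as in the second query.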

# 3. Number Of Races Hosted By Every Country 
This query counts the races hosted by each country, giving a picture of the sport's international reach and the range of locations that have appeared on the calendar.

In [12]:
count_races = pd.read_sql('''
        SELECT c.country AS "Country", COUNT(*) AS "Races Hosted" 
        FROM circuits c, races r 
        WHERE r.circuitid = c.circuitid
        GROUP BY "Country"
        ORDER BY "Races Hosted" DESC;
''', conn)
count_races

| | Country | Races Hosted |
|---|---|---|
| 0 | Italy | 105 |
| 1 | Germany | 79 |
| 2 | UK | 78 |
| 3 | USA | 75 |
| 4 | Monaco | 69 |
| 5 | Belgium | 68 |
| 6 | France | 63 |
| 7 | Spain | 60 |
| 8 | Canada | 52 |
| 9 | Brazil | 50 |


In [13]:
fig3 = px.bar(count_races, x="Country", y="Races Hosted", color="Races Hosted", title="Countries That Have Hosted Races")
fig3.show()
fig3.write_image("plot3.png")

# 4. All Time Constructor Points And Leaders 
This query sums the points each constructor has scored over the years and lists the 20 highest-scoring teams, highlighting those with sustained success in Formula 1.

In [14]:
constructor_all_time = pd.read_sql('''
        SELECT c.name AS "Constructor", c.nationality AS "Nationality", SUM(r.points) AS "Total Points"
        FROM constructor_results r, constructors c
        WHERE c.constructorId = r.constructorId
        GROUP BY "Constructor"
        ORDER BY "Total Points" DESC
        LIMIT 20;
''', conn)
constructor_all_time

| | Constructor | Nationality | Total Points |
|---|---|---|---|
| 0 | Ferrari | Italian | 9505.0 |
| 1 | Mercedes | German | 7060.5 |
| 2 | Red Bull | Austrian | 6891.0 |
| 3 | McLaren | British | 6191.5 |
| 4 | Williams | British | 3609.0 |
| 5 | Renault | French | 1777.0 |
| 6 | Force India | Indian | 1098.0 |
| 7 | Team Lotus | British | 918.0 |
| 8 | Benetton | Italian | 861.5 |
| 9 | Lotus F1 | British | 706.0 |


In [15]:
fig4 = px.bar(constructor_all_time, x="Constructor", y="Total Points", color="Total Points", title="Top 20 Constructors - All-time Points Scored")
fig4.show()
fig4.write_image("plot4.png")

# 5. Finding Number Of Races Organized Per Season
This query counts the races held in each season, which makes it possible to see how the length of the calendar has changed over the history of the sport.

In [16]:
race_per_season = pd.read_sql('''
        SELECT strftime('%Y', "date") AS "Year", COUNT(*) AS "Races held"
        FROM races
        GROUP BY "Year"
        ORDER BY "Year";
''', conn)
race_per_season

| | Year | Races held |
|---|---|---|
| 0 | 1950 | 7 |
| 1 | 1951 | 8 |
| 2 | 1952 | 8 |
| 3 | 1953 | 9 |
| 4 | 1954 | 9 |
| ... | ... | ... |
| 69 | 2019 | 21 |
| 70 | 2020 | 17 |
| 71 | 2021 | 22 |
| 72 | 2022 | 22 |


In [17]:
fig5 = px.line(race_per_season, x="Year", y="Races held", title="Races Held Every Season")
fig5.show()
fig5.write_image("plot5.png")
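
Two details of the per-season query are easy to miss. In SQL, single quotes delimit string literals such as the `strftime` format, while double quotes denote identifiers (SQLite tolerates a double-quoted string only as a fallback). Also, `strftime` returns text, so the year comes back as a string. The sketch below uses a made-up three-row `races` table to show one way of casting the year to an integer, which may be preferable when the value is used on a numeric plot axis.

```python
import sqlite3

# Toy stand-in for the races table (made-up rows, ISO-formatted dates).
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE races (raceId INTEGER, date TEXT);
    INSERT INTO races VALUES
        (1, '1950-05-13'), (2, '1950-05-21'), (3, '1951-05-27');
""")

# Without a cast, strftime yields the year as text.
year_as_text = demo.execute(
    "SELECT strftime('%Y', date) FROM races WHERE raceId = 1"
).fetchone()[0]

# With CAST, the grouped year is an integer.
races_per_year = demo.execute("""
    SELECT CAST(strftime('%Y', date) AS INTEGER) AS "Year",
           COUNT(*) AS "Races held"
    FROM races
    GROUP BY "Year"
    ORDER BY "Year";
""").fetchall()

print(type(year_as_text).__name__, races_per_year)
```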