# F1 Data Analysis Project

---
### Basic Concepts
1. **Charts and Graphs**:
   - Bar Chart
   - Line Chart
   - Pie Chart
   - Scatter Plot
   - Histogram
   - Box Plot

2. **Tables**:
   - Simple Tables
   - Pivot Tables

3. **Maps**:
   - Geographic Maps
   - Heat Maps
---
### Intermediate Concepts
4. **Time Series Analysis**:
   - Line Charts
   - Area Charts
   - Candlestick Charts

5. **Distribution Plots**:
   - Histograms
   - Box Plots
   - Violin Plots

6. **Comparative Analysis**:
   - Grouped Bar Charts
   - Stacked Bar Charts
   - Side-by-Side Box Plots

7. **Correlation and Relationships**:
   - Scatter Plots
   - Bubble Charts
   - Pair Plots
---
### Advanced Concepts
8. **Multivariate Analysis**:
   - Heatmaps
   - Parallel Coordinates
   - Radar Charts

9. **Geospatial Analysis**:
   - Choropleth Maps
   - Dot Density Maps
   - Flow Maps

10. **Interactive Visualizations**:
    - Interactive Dashboards
    - Drill-Down Charts
    - Linked Visualizations

11. **Network Graphs**:
    - Node-Link Diagrams
    - Matrix Plots

12. **3D Visualizations**:
    - 3D Scatter Plots
    - 3D Surface Plots

13. **Animated Visualizations**:
    - Animated Line Charts
    - Animated Scatter Plots
---
### Design Principles
14. **Color Theory**:
    - Color Palettes
    - Color Blindness Considerations

15. **Layout and Composition**:
    - Grid Layouts
    - White Space

16. **Typography**:
    - Font Choices
    - Text Hierarchy

17. **Data Storytelling**:
    - Narrative Flow
    - Annotations
---
### Tools and Libraries
18. **Software and Tools**:
    - Tableau
    - Power BI
    - Excel

19. **Programming Libraries**:
    - Matplotlib
    - Seaborn
    - Plotly
    - D3.js


In [1]:
import pandas as pd
import requests
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import matplotlib
from sklearn.model_selection import train_test_split
import warnings
warnings.simplefilter("ignore")
pd.set_option('display.max_columns', None)
print("Pandas version:", pd.__version__)
print("Seaborn version:", sns.__version__)
print("Matplotlib version:", matplotlib.__version__)
print("NumPy version:", np.__version__)

Pandas version: 2.2.3
Seaborn version: 0.13.2
Matplotlib version: 3.9.3
NumPy version: 2.2.0


### Loading Libraries

1. **Importing Libraries:**

  -   **Pandas** (`import pandas as pd`):  
  Pandas works like a programmable spreadsheet: it loads, manipulates, and analyzes large tabular datasets quickly.

  -   **Requests** (`import requests`):  
  Requests sends HTTP requests. It is used later in the notebook to download the CSV files from Google Drive.

  -   **Seaborn** (`import seaborn as sns`):  
  Seaborn is a charting library built on top of Matplotlib. It makes good-looking statistical visualizations easier to create.

  -   **Matplotlib** (`import matplotlib.pyplot as plt` and `import matplotlib`):  
  Matplotlib is the core library for creating plots and charts in Python. Think of it as the drawing tools that help us visualize data.

  -   **NumPy** (`import numpy as np`):  
  NumPy provides fast arrays and numerical routines. It is well suited to large groups of numbers, especially in math-heavy tasks.

  -   **scikit-learn** (`from sklearn.model_selection import train_test_split`):  
  scikit-learn is a popular machine-learning library. The imported `train_test_split` function splits data into training and testing sets, which is a key step in building predictive models.

2. **Setting Up Warnings:**

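   ```python
   import warnings
   warnings.simplefilter("ignore")
   ```

   This suppresses all warning messages for the rest of the session. The output stays tidy, but useful deprecation notices are hidden as well.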

3. **Printing Version Information:**

   ```python
   print("Pandas version:", pd.__version__)
   print("Seaborn version:", sns.__version__)
   print("Matplotlib version:", matplotlib.__version__)
   print("NumPy version:", np.__version__)
   ```

These lines print the version of each library in use. Recording the versions makes the setup reproducible, since behavior can differ between releases.
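
The blanket filter in step 2 silences every warning for the whole session. A narrower alternative, offered as a sketch rather than as what this notebook does, is to suppress one category inside a limited scope with the standard-library `warnings.catch_warnings()` context manager. The function `noisy_step` is a hypothetical stand-in for any call that emits a warning.

```python
import warnings


def noisy_step():
    # Hypothetical stand-in for a library call that emits a FutureWarning.
    warnings.warn("this behaviour will change", FutureWarning)
    return 42


# Suppress FutureWarning only inside this block; the previous filters
# are restored automatically when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FutureWarning)
    quiet_result = noisy_step()

# record=True collects emitted warnings in a list, and "always" makes sure
# none are filtered out, so the warning can be inspected afterwards.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    loud_result = noisy_step()

print(quiet_result, loud_result, len(caught))
```

Scoping the filter this way keeps the output clean around a known noisy call while leaving other warnings visible.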


## Import Libraries
The next cell starts by importing the `os` library, which is used here to delete the temporary file once it has been read.

### Function to Read CSV from Google Drive
- **Function Definition**: `def read_csv_from_drive(url):` defines a function that takes a Google Drive URL as input.
- **Extract File ID**: The function extracts the file ID from the URL to create a direct download link.
- **Download File**: It uses the `requests` library to download the file from Google Drive.
- **Save File Temporarily**: The downloaded file is saved temporarily on the local system.
- **Read CSV**: The file is read into a DataFrame using `pandas`.
- **Clean Up**: The temporary file is deleted after reading.
- **Error Handling**: If any error occurs, it prints an error message and returns `None`.
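
The URL handling and the CSV parsing can be tried without any network access. The sketch below separates the two steps and parses the CSV bytes in memory with `io.BytesIO`, which would avoid the temporary file; this is an alternative arrangement, not the notebook's implementation. The helper names `drive_download_url` and `csv_bytes_to_df`, the file ID `ABC123`, and the sample bytes are all made up for illustration.

```python
import io

import pandas as pd


def drive_download_url(share_url):
    """Turn a Google Drive sharing link into a direct-download link."""
    # The file ID sits between '/d/' and the next '/' in a sharing link.
    file_id = share_url.split("/d/")[1].split("/")[0]
    return f"https://drive.google.com/uc?id={file_id}"


def csv_bytes_to_df(content):
    """Parse raw CSV bytes into a DataFrame without writing to disk."""
    return pd.read_csv(io.BytesIO(content), on_bad_lines="skip")


# Hypothetical sharing link and payload, used only to show the behaviour.
example_url = "https://drive.google.com/file/d/ABC123/view?usp=sharing"
direct_url = drive_download_url(example_url)

sample_bytes = b"circuitId,country\n1,Australia\n2,Malaysia\n"
sample_df = csv_bytes_to_df(sample_bytes)

print(direct_url)
print(sample_df.shape)
```

In the real function, `response.content` from `requests.get` would take the place of `sample_bytes`.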

### Google Drive URLs
A dictionary named `urls` contains the URLs of various CSV files stored on Google Drive.

### Load DataFrames
The code iterates over the `urls` dictionary, calling the `read_csv_from_drive` function for each URL and storing the resulting DataFrames in a new dictionary called `dataframes`.

### Print DataFrame Summary
For each DataFrame, it prints whether the DataFrame was loaded successfully, its shape (number of rows and columns), and its column names.

### Ensure All DataFrames Are Loaded
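The code assigns each DataFrame from the `dataframes` dictionary to its own variable for easier access later. A table that failed to load is assigned `None`, so later code should check for that before using it.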
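
Because `read_csv_from_drive` returns `None` on failure, a variable assigned this way may hold `None`. One way to make that explicit, sketched here with a small stand-in dictionary rather than the real downloads, is to collect the names of the tables that failed. The `missing_tables` helper and the contents of `demo_frames` are hypothetical.

```python
import pandas as pd


def missing_tables(frames):
    """Return the names whose value is None, i.e. tables that failed to load."""
    return [name for name, df in frames.items() if df is None]


# Stand-in for the notebook's `dataframes` dictionary: one table loaded,
# one failed (invented data).
demo_frames = {
    "status_df": pd.DataFrame({"statusId": [1], "status": ["Finished"]}),
    "seasons_df": None,
}

failed = missing_tables(demo_frames)
print(failed)
```

An empty list would mean every table is available; a non-empty one names the tables to re-download before continuing.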

### Example Usage

- **Unique Location-Country Pairs**: If the `circuits_df` DataFrame is loaded successfully, it prints unique pairs of locations and countries.
- **Circuit Count Per Country**: It also prints the count of circuits per country.
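
Both operations can be tried on a toy frame before running them on the real data. The rows below are invented for illustration and only reuse the `circuits_df` column names; the calls mirror the ones in the notebook's code cell.

```python
import pandas as pd

# Invented rows that reuse the circuits_df column names.
toy_circuits = pd.DataFrame(
    {
        "circuitId": [1, 2, 3, 4],
        "location": ["City A", "City B", "City B", "City C"],
        "country": ["X", "X", "X", "Y"],
    }
)

# Unique (location, country) pairs: the repeated City B / X row is dropped.
pairs = toy_circuits[["location", "country"]].drop_duplicates()

# Number of distinct circuits per country.
counts = (
    toy_circuits.groupby("country")["circuitId"]
    .nunique()
    .reset_index(name="circuit_count")
)

print(pairs)
print(counts)
```

Note that the two results answer different questions: `pairs` collapses circuits that share a location, while `counts` still counts each `circuitId` separately.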



In [2]:
import os

# Function to read CSV file from Google Drive
def read_csv_from_drive(url):
    try:
        # Extract file ID and create download URL
        file_id = url.split('/d/')[1].split('/')[0]
        download_url = f"https://drive.google.com/uc?id={file_id}"
        print(f"Downloading from: {download_url}")
        
        # Download the file
        response = requests.get(download_url, timeout=60)  # avoid hanging forever on a stalled connection
        response.raise_for_status()
        
        # Save to temporary file
        temp_file = 'temp_csv.csv'
        with open(temp_file, 'wb') as file:
            file.write(response.content)
        
        # Read CSV into a DataFrame
        df = pd.read_csv(temp_file, on_bad_lines='skip')
        os.remove(temp_file)  # Clean up temporary file
        return df
    except Exception as e:
        print(f"Error reading file from {url}: {e}")
        return None

# Google Drive URLs
urls = {
    'circuits_df': 'https://drive.google.com/file/d/1-nmbX9yd1FWx41QouC4tM4NTtXrRF1nz/view?usp=sharing',
    'constructor_results_df': 'https://drive.google.com/file/d/1mVRIG28qZr-z5LJci-sT3MGKgdDv3zdR/view?usp=sharing',
    'constructor_standings_df': 'https://drive.google.com/file/d/1EbDJF5MDXOR_5igh1btuAVZJQPEYNfo0/view?usp=drive_link',
    'lap_times_df': 'https://drive.google.com/file/d/1-UalbzTCdNOMaIvNcejKi7jcfB_9ryQZ/view?usp=drive_link',
    'pit_stops_df': 'https://drive.google.com/file/d/1IGEHa6mbyjBlMUi84nKQf9Rufz0Gw0CV/view?usp=drive_link',
    'qualifying_df': 'https://drive.google.com/file/d/1oJwNLCSgjnh5wyO2qibaMME6hxL-OuC2/view?usp=drive_link',
    'results_df': 'https://drive.google.com/file/d/11vyh0O1blCuzweha8_5o60La0TJqg_rj/view?usp=drive_link',
    'seasons_df': 'https://drive.google.com/file/d/1rtkiMn7g07ZvB8U88jFlL9P4kqw_Ud2w/view?usp=drive_link',
    'sprint_results_df': 'https://drive.google.com/file/d/1nIZYslPrGQrFnwWboNa9ktIC7Uzcj2dR/view?usp=drive_link',
    'status_df': 'https://drive.google.com/file/d/1L8-FZlC8OAl6QWEEGMvr8XrLUyZepkZk/view?usp=drive_link',
    'drivers_df': 'https://drive.google.com/file/d/1fR1Y7Y1qWXZpcbpexVynH5ZewF-pBP0k/view?usp=drive_link',
    'races_df': 'https://drive.google.com/file/d/1-IKn_OpmhhJFPLmKV-KNnVCYEFzS4vjH/view?usp=drive_link',
    'constructors_df': 'https://drive.google.com/file/d/1umEG3vYsUi1-ilft5LRPSVXXb-J6DSUH/view?usp=drive_link',
    'driver_standings_df': 'https://drive.google.com/file/d/1hq2zpHLsjqmUhE51guQoazAPGVQsCwbD/view?usp=drive_link'
}

# Load DataFrames
dataframes = {name: read_csv_from_drive(url) for name, url in urls.items()}

# Print DataFrame summary
for name, df in dataframes.items():
    if df is not None:
        print(f"✅ {name}: Loaded successfully.")
        print(f"Shape: {df.shape}")
        print(f"Columns: {list(df.columns)}\n")
    else:
        print(f"❌ {name}: Failed to load.\n")

# Ensure all DataFrames are loaded
circuits_df = dataframes.get('circuits_df')
constructor_results_df = dataframes.get('constructor_results_df')
constructor_standings_df = dataframes.get('constructor_standings_df')
lap_times_df = dataframes.get('lap_times_df')
pit_stops_df = dataframes.get('pit_stops_df')
qualifying_df = dataframes.get('qualifying_df')
results_df = dataframes.get('results_df')
seasons_df = dataframes.get('seasons_df')
sprint_results_df = dataframes.get('sprint_results_df')
status_df = dataframes.get('status_df')
drivers_df = dataframes.get('drivers_df')
races_df = dataframes.get('races_df')
constructors_df = dataframes.get('constructors_df')
driver_standings_df = dataframes.get('driver_standings_df')

# Example usage of loaded DataFrames
if circuits_df is not None:
    unique_location_country_pairs = circuits_df[['location', 'country']].drop_duplicates()
    print("Unique location-country pairs:")
    print(unique_location_country_pairs)
    
    country_circuit_count = circuits_df.groupby('country')['circuitId'].nunique().reset_index(name='circuit_count')
    print("Circuit count per country:")
    print(country_circuit_count)
else:
    print("Failed to load circuits_df.")

# Add similar checks and usage for other DataFrames as needed


Downloading from: https://drive.google.com/uc?id=1-nmbX9yd1FWx41QouC4tM4NTtXrRF1nz
Downloading from: https://drive.google.com/uc?id=1mVRIG28qZr-z5LJci-sT3MGKgdDv3zdR
Downloading from: https://drive.google.com/uc?id=1EbDJF5MDXOR_5igh1btuAVZJQPEYNfo0
Downloading from: https://drive.google.com/uc?id=1-UalbzTCdNOMaIvNcejKi7jcfB_9ryQZ
Downloading from: https://drive.google.com/uc?id=1IGEHa6mbyjBlMUi84nKQf9Rufz0Gw0CV
Downloading from: https://drive.google.com/uc?id=1oJwNLCSgjnh5wyO2qibaMME6hxL-OuC2
Downloading from: https://drive.google.com/uc?id=11vyh0O1blCuzweha8_5o60La0TJqg_rj
Downloading from: https://drive.google.com/uc?id=1rtkiMn7g07ZvB8U88jFlL9P4kqw_Ud2w
Downloading from: https://drive.google.com/uc?id=1nIZYslPrGQrFnwWboNa9ktIC7Uzcj2dR
Downloading from: https://drive.google.com/uc?id=1L8-FZlC8OAl6QWEEGMvr8XrLUyZepkZk
Downloading from: https://drive.google.com/uc?id=1fR1Y7Y1qWXZpcbpexVynH5ZewF-pBP0k
Downloading from: https://drive.google.com/uc?id=1-IKn_OpmhhJFPLmKV-KNnVCYEFzS4vjH
Down

In [3]:
# Print only the successfully loaded tables
print("Successfully Loaded Tables:")
for table_name, df in dataframes.items():
    if df is not None:
        print(table_name)


Successfully Loaded Tables:
circuits_df
constructor_results_df
constructor_standings_df
lap_times_df
pit_stops_df
qualifying_df
results_df
seasons_df
sprint_results_df
status_df
drivers_df
races_df
constructors_df
driver_standings_df


#### List of Tables and Columns

| Table                     | Columns                                                                                                                                                                        |
|---------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **circuits_df**           | `circuitId`, `circuitRef`, `name`, `location`, `country`, `lat`, `lng`, `alt`, `url`                                                                                           |
| **constructor_results_df** | `constructorResultsId`, `raceId`, `constructorId`, `points`, `status`                                                                                                          |
| **constructor_standings_df** | `constructorStandingsId`, `raceId`, `constructorId`, `points`, `position`, `positionText`, `wins`                                                                        |
| **lap_times_df**          | `raceId`, `driverId`, `lap`, `position`, `time`, `milliseconds`                                                                                                                |
| **pit_stops_df**          | `raceId`, `driverId`, `stop`, `lap`, `time`, `duration`, `milliseconds`                                                                                                        |
| **qualifying_df**         | `qualifyId`, `raceId`, `driverId`, `constructorId`, `number`, `position`, `q1`, `q2`, `q3`                                                                                     |
| **results_df**            | `resultId`, `raceId`, `driverId`, `constructorId`, `number`, `grid`, `position`, `positionText`, `positionOrder`, `points`, `laps`, `time`, `milliseconds`, `fastestLap`, `rank`, `fastestLapTime`, `fastestLapSpeed`, `statusId` |
| **seasons_df**            | `year`, `url`                                                                                                                                                                  |
| **sprint_results_df**     | `resultId`, `raceId`, `driverId`, `constructorId`, `number`, `grid`, `position`, `positionText`, `positionOrder`, `points`, `laps`, `time`, `milliseconds`, `fastestLap`, `fastestLapTime`, `statusId` |
| **status_df**             | `statusId`, `status`                                                                                                                                                           |
| **drivers_df**            | `driverId`, `driverRef`, `number`, `code`, `forename`, `surname`, `dob`, `nationality`, `url`                                                                                  |
| **races_df**              | `raceId`, `year`, `round`, `circuitId`, `name`, `date`, `time`, `url`, `fp1_date`, `fp1_time`, `fp2_date`, `fp2_time`, `fp3_date`, `fp3_time`, `quali_date`, `quali_time`, `sprint_date`, `sprint_time` |
| **constructors_df**       | `constructorId`, `constructorRef`, `name`, `nationality`, `url`                                                                                                                |
| **driver_standings_df**   | `driverStandingsId`, `raceId`, `driverId`, `points`, `position`, `positionText`, `wins`                                                                                         |
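
The ID columns that recur across this table (`raceId`, `driverId`, `constructorId`, `statusId`, `circuitId`) are what link the tables together. As a sketch of how such a link might be used, the block below joins a toy results table to a toy drivers table on `driverId`. The rows are invented; only the column names come from the table above.

```python
import pandas as pd

# Invented rows; column names follow results_df and drivers_df.
toy_results = pd.DataFrame(
    {
        "resultId": [1, 2, 3],
        "raceId": [10, 10, 11],
        "driverId": [100, 101, 100],
        "points": [25.0, 18.0, 25.0],
    }
)
toy_drivers = pd.DataFrame(
    {
        "driverId": [100, 101],
        "surname": ["Alpha", "Beta"],
    }
)

# Left join keeps every result row and attaches the driver's surname.
# validate="many_to_one" raises if driverId is not unique in toy_drivers.
joined = toy_results.merge(
    toy_drivers, on="driverId", how="left", validate="many_to_one"
)

# Total points per driver across the toy races.
points_per_driver = joined.groupby("surname")["points"].sum()
print(points_per_driver)
```

The `validate` argument is optional, but it guards against accidental row duplication when a key that should be unique is not.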