# Analysis of Formula 1 Drivers — 2008 Season

## Objective
Analyze **points, wins, and podiums** of drivers and constructors in the 2008 Formula One World Championship.

---

## Dataset
The dataset `formula1_data.csv` contains the following columns:

- **Driver** → Driver’s name  
- **Team** → Constructor’s name  
- **Race** → Grand Prix city  
- **Country** → Grand Prix country  
- **Position** → Finishing position (0–8)  
  - `0` = driver outside the top 8 (no points)  
  - `1–8` = valid position with scoring

---

## Scoring System (2008)
Points awarded to the top 8 drivers:

| Position | Points |
|----------|--------|
| 1st      | 10     |
| 2nd      | 8      |
| 3rd      | 6      |
| 4th      | 5      |
| 5th      | 4      |
| 6th      | 3      |
| 7th      | 2      |
| 8th      | 1      |
| 9th+     | 0      |
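
As a quick arithmetic check of the table above, the following minimal sketch maps a few finishing positions to points. The names `points_table` and `sample_positions` are illustrative assumptions, and the sample positions are made up rather than taken from the dataset.

```python
# Illustrative sketch: the 2008 points table as a plain dictionary.
points_table = {1: 10, 2: 8, 3: 6, 4: 5, 5: 4, 6: 3, 7: 2, 8: 1}

# Hypothetical finishing positions for one driver (0 = outside the top 8).
sample_positions = [1, 0, 8, 4]

# Positions missing from the table (0, 9, 10, ...) score nothing.
sample_points = [points_table.get(p, 0) for p in sample_positions]
total = sum(sample_points)

print(sample_points)  # [10, 0, 1, 5]
print(total)          # 16
```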

---

## Required Features

1. **Individual performance**  
   - Function that takes a driver’s name as input  
   - Returns a list with:  
     `[total_points, wins, podiums]`

2. **Drivers’ standings**  
   - Create a dictionary `{driver: total_points}`  
   - Sort by score (descending)  
   - Save to file `Drivers_Standings_2008.txt`

3. **Constructors’ standings**  
   - Calculate total points for each team  
   - Sum of points obtained by the team’s drivers  
   - Return a dictionary `{team: total_points}`

---

## Expected Output
- Individual analysis of drivers (points, wins, podiums)  
- File `Drivers_Standings_2008.txt` with final standings  
- Aggregated constructors’ standings  
- Summaries and preview tables in console



# Cell 0 — Robust CSV Setup

This cell locates the dataset `formula1_data.csv`:

- If the file is already available (locally or in Colab), it will be used directly.  
- If the file is not found, it will be automatically downloaded from a public URL.  

**Expected output:** full path of the CSV file in use.
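
The "use it if present" half of this logic boils down to returning the first candidate path that is an existing file. The sketch below illustrates that idea with a temporary directory. The helper name `first_existing` and the file names are assumptions made for the example, and no download is attempted.

```python
import os
import tempfile


def first_existing(candidates):
    """Return the absolute path of the first candidate that is a file, else None."""
    for path in candidates:
        if path and os.path.isfile(path):
            return os.path.abspath(path)
    return None


with tempfile.TemporaryDirectory() as tmp:
    # Create a tiny stand-in CSV so one candidate really exists.
    real = os.path.join(tmp, "formula1_data.csv")
    with open(real, "w", encoding="utf-8") as f:
        f.write("Driver,Team,Race,Country,Position\n")

    missing = os.path.join(tmp, "missing.csv")

    # Unset (None) and missing candidates are skipped; the real file wins.
    found = first_existing([None, missing, real])
    found_is_real = found is not None and os.path.samefile(found, real)

    # With no usable candidate the caller would fall back to a download.
    nothing = first_existing([missing])

print(found_is_real)  # True
print(nothing)        # None
```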


In [1]:
# If the CSV path is not specified, look for the file and, if missing, try to download it
CSV_PATH = None

import os
import urllib.request

CSV_URL = "https://proai-datasets.s3.eu-west-3.amazonaws.com/formula1_data.csv"
CSV_NAME = "formula1_data.csv"


def find_csv(name: str = CSV_NAME,
             env_var_name: str = "F1_CSV_PATH",
             extra_paths: list[str] | None = None) -> str:
    """
    Find the CSV file and return the absolute path if it exists.

    Search order:
      1) Environment variable (if set).
      2) Any extra paths provided by the user.
      3) Standard paths: notebook folder, /mnt/data, /content, /content/drive/MyDrive.
    """
    # 1) Environment variable (highest priority)
    p = os.environ.get(env_var_name)
    if p and os.path.isfile(p):
        return os.path.abspath(p)

    # 2) Extra paths (if any) + 3) typical Colab/notebook locations
    paths_to_check = list(extra_paths or [])
    paths_to_check.extend([
        name,
        f"/mnt/data/{name}",
        f"/content/{name}",
        f"/content/drive/MyDrive/{name}",
    ])

    for path in paths_to_check:
        if os.path.isfile(path):
            return os.path.abspath(path)

    raise FileNotFoundError("CSV not found in standard or additional paths.")


def download_csv_if_missing(dest: str = CSV_NAME, url: str = CSV_URL) -> str:
    """
    Download the CSV from the given URL only if it is missing locally.
    Returns the absolute path. Raises on failure or empty file.
    """
    if os.path.isfile(dest):
        return os.path.abspath(dest)

    print(f"CSV not found. Attempting download from:\n   {url}")
    try:
        urllib.request.urlretrieve(url, dest)
        if os.path.isfile(dest) and os.path.getsize(dest) > 0:
            path = os.path.abspath(dest)
            print(f"Downloaded to: {path}")
            return path
        raise RuntimeError("File missing or empty after download.")
    except Exception as e:
        raise RuntimeError(f"Error while downloading the CSV: {e}") from e


def init_csv(optional_path: str | None = None, name: str = CSV_NAME) -> str:
    """
    Return the absolute path of the CSV following three steps:
      1) Use the manual path if provided and it exists.
      2) Otherwise, automatically search with find_csv().
      3) If not found, download the file with download_csv_if_missing().
    """
    if optional_path:
        if os.path.isfile(optional_path):
            path = os.path.abspath(optional_path)
            print(f"CSV (manual path): {path}")
            return path
        raise FileNotFoundError(f"Provided path is invalid: {optional_path}")

    try:
        path = find_csv(name=name)
        print(f"CSV found: {path}")
        return path
    except FileNotFoundError:
        path = download_csv_if_missing(dest=name)
        print(f"CSV available after download: {path}")
        return path


# Execution
RESOLVED_CSV_PATH = init_csv(CSV_PATH, name=CSV_NAME)
print("CSV resolved to:", RESOLVED_CSV_PATH)

CSV not found. Attempting download from:
   https://proai-datasets.s3.eu-west-3.amazonaws.com/formula1_data.csv
Downloaded to: /content/formula1_data.csv
CSV available after download: /content/formula1_data.csv
CSV resolved to: /content/formula1_data.csv


# Cell 1 — Project Constants

In this section, the main project constants are defined:

- Path to the CSV file (obtained from Cell 0).  
- Public fallback URL to download the CSV (already defined in Cell 0 as `CSV_URL`).  
- Output file name for the drivers’ standings (`Drivers_Standings_2008.txt`).  
- Scoring system of the 2008 season: 10–8–6–5–4–3–2–1.  
- Expected columns in the CSV, useful to validate the dataset structure.


In [3]:
import os

# If Cell 0 already resolved the path, use it; otherwise, fall back to the local filename
CSV_FILE_PATH = globals().get("RESOLVED_CSV_PATH", "formula1_data.csv")

# Always use an absolute path to avoid working-directory ambiguity
CSV_FILE_PATH = os.path.abspath(CSV_FILE_PATH)

# Output file for drivers' standings
DRIVERS_STANDINGS_FILE = "Drivers_Standings_2008.txt"

# 2008 points system: 10–8–6–5–4–3–2–1 for positions 1..8
POINTS_SYSTEM_2008 = {
    1: 10, 2: 8, 3: 6, 4: 5,
    5: 4, 6: 3, 7: 2, 8: 1
}

# Expected CSV columns (used to validate dataset structure)
EXPECTED_CSV_COLUMNS = ("Driver", "Team", "Race", "Country", "Position")

# Cell 2 — Utilities

In this section, several helper functions are defined to keep the data clean and reliable:

- **Text cleaning**: removes accents, unifies apostrophes and hyphens, normalizes spaces, and converts everything to lowercase.  
- **Safe integer conversion**: converts a value to integer without breaking the workflow, using a fallback value if necessary.  
- **Points from position**: maps the finishing position to the corresponding points according to the 2008 scoring system.  

These utilities ensure consistent data and reduce the risk of errors in later steps.
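
To make the text-cleaning idea concrete, here is a compact sketch of the same steps folded into one function. The name `fold` and the sample strings are assumptions for illustration only, and only one apostrophe variant is handled here.

```python
import unicodedata


def fold(text: str) -> str:
    """Unify the curly apostrophe, strip accents, lowercase, collapse whitespace."""
    text = text.replace("\u2019", "'")
    # NFKD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


print(fold("  Kimi   RÄIKKÖNEN "))  # kimi raikkonen
print(fold("José  O\u2019Neil"))    # jose o'neil
```

Two spellings of the same name compare equal after folding, which is what makes later name lookups tolerant of case, accents, and spacing.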


In [4]:
import unicodedata

# If True, when a value cannot be converted to int a warning will be shown
WARNINGS_ENABLED = False

# Table for str.translate():
# Converts various apostrophes to "'" and various dashes to "-"
APOSTROPHE_DASH_MAP = {
    # Apostrophes / quotes
    ord("’"): "'",
    ord("‘"): "'",
    ord("´"): "'",
    ord("`"): "'",
    ord("ʼ"): "'",
    ord("‛"): "'",
    # Dashes
    ord("–"): "-",
    ord("—"): "-",
    ord("‒"): "-",
    ord("−"): "-",
}

def normalize_punctuation(text: str) -> str:
    """
    Normalize common punctuation marks (apostrophes/quotes and dashes).

    Args:
        text (str): Input text to normalize. If None, returns an empty string.

    Returns:
        str: Text with apostrophes converted to "'" and dashes converted to "-".
    """
    if text is None:
        return ""
    return str(text).translate(APOSTROPHE_DASH_MAP)

def remove_accents(text: str) -> str:
    """
    Remove accents while keeping base letters.

    Example:
        'àéîõü' -> 'aeiou'

    Args:
        text (str): Input text. If None, returns an empty string.

    Returns:
        str: Text without diacritical marks.
    """
    if text is None:
        return ""
    norm = unicodedata.normalize('NFKD', str(text))
    return "".join(c for c in norm if not unicodedata.combining(c))

def compress_spaces(text: str) -> str:
    """
    Reduce multiple spaces to one and trim leading/trailing spaces.

    Example:
        '  Lewis   Hamilton  ' -> 'Lewis Hamilton'

    Args:
        text (str): Input text. If None, returns an empty string.

    Returns:
        str: Text with normalized spaces.
    """
    if text is None:
        return ""
    parts = str(text).strip().split()
    return " ".join(parts)

def normalize_text(text: str) -> str:
    """
    Apply robust normalization, useful for comparisons (driver/team/place names).

    Steps:
        1) Normalize punctuation (apostrophes, dashes)
        2) Remove accents
        3) Convert to lowercase
        4) Compress spaces

    Args:
        text (str): Input text to normalize.

    Returns:
        str: Normalized text (ignoring case/accents/spaces).
    """
    s = normalize_punctuation(text)
    s = remove_accents(s)
    s = s.lower()
    s = compress_spaces(s)
    return s

def safe_int(value, default: int = 0, strict: bool = False, field_name: str = "Position") -> int:
    """
    Convert a value to int in tolerant or strict mode.

    - strict=False (default): if conversion fails, return 'default'.
    - strict=True: if conversion fails, raise ValueError.

    Args:
        value (Any): Value to convert to int.
        default (int): Value to use if conversion fails (tolerant mode).
        strict (bool): If True, fail with exception; otherwise use 'default'.
        field_name (str): Field name for messages/warnings.

    Returns:
        int: Converted integer or 'default' (in tolerant mode).

    Raises:
        ValueError: If 'strict' is True and conversion is not possible.
    """
    try:
        return int(str(value).strip())
    except Exception:
        if strict:
            raise ValueError(f"Non-integer value in field '{field_name}': {value!r}")
        if WARNINGS_ENABLED:
            print(f"[WARNING] '{field_name}' not valid ({value!r}). Using fallback: {default}")
        return default

def points_from_position(position) -> int:
    """
    Return the points associated with a finishing position according to POINTS_SYSTEM_2008.

    Uses 'safe_int' in tolerant mode (strict=False) to avoid breaking the workflow.

    Args:
        position (Any): Finishing position (1–8 gives points; other values → 0).

    Returns:
        int: Points corresponding to the position (0 if outside top 8 or invalid).
    """
    p = safe_int(position, default=0, strict=False, field_name="Position")
    return POINTS_SYSTEM_2008.get(p, 0)

# Cell 3 — CSV Reading + Summary

In this section, the function `read_csv_f1` is defined. Its tasks are:

- Check that the file has the correct headers.  
- Clean text fields by trimming and collapsing whitespace.  
- Convert the `Position` column into a safe integer.  

At the end, several summaries are printed:

- Number of unique drivers.  
- Number of unique teams.  
- Number of unique Grand Prix races.  

Finally, a preview of the first 5 rows of the dataset is displayed for visual inspection.
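
The header check can be tried in isolation on an in-memory CSV. The sketch below uses a hypothetical sample that lacks the `Country` column; the variable names are assumptions for the example.

```python
import csv
import io

expected = ("Driver", "Team", "Race", "Country", "Position")

# Hypothetical CSV text that lacks the 'Country' column.
sample = "Driver,Team,Race,Position\nHamilton,McLaren,Melbourne,1\n"

reader = csv.DictReader(io.StringIO(sample))
# fieldnames is None for an empty file, so fall back to an empty list.
found = reader.fieldnames or []
missing = [col for col in expected if col not in found]

print(missing)  # ['Country']
```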


In [5]:
import csv
import os

def read_csv_f1(file_path: str, strict: bool = False) -> list[dict]:
    """
    Read the F1 season CSV and return a list of cleaned rows.

    The function:
      - checks that the file exists;
      - checks that all expected columns are present (EXPECTED_CSV_COLUMNS);
      - cleans text fields (trims excess spaces);
      - converts 'Position' into int with 'safe_int' (tolerant or strict).

    Args:
        file_path (str): Path to the CSV (e.g., CSV_FILE_PATH).
        strict (bool): If True, error on invalid 'Position'; if False, use 0 as fallback.

    Returns:
        list[dict]: A list of rows (dicts) with keys:
            - "Driver" (str)
            - "Team" (str)
            - "Race" (str)
            - "Country" (str)
            - "Position" (int)

    Raises:
        FileNotFoundError: If the file does not exist.
        ValueError: If required columns are missing compared to EXPECTED_CSV_COLUMNS.
    """
    if not os.path.isfile(file_path):
        raise FileNotFoundError(
            f"File '{file_path}' not found. "
            "Make sure the CSV is in the correct folder."
        )

    rows: list[dict] = []
    # Robust reading: utf-8-sig handles potential BOM
    with open(file_path, "r", encoding="utf-8-sig", newline="") as f:
        reader = csv.DictReader(f)

        # Verify that all required columns are present
        # (fieldnames is None when the file is empty)
        fieldnames = reader.fieldnames or []
        missing = [c for c in EXPECTED_CSV_COLUMNS if c not in fieldnames]
        if missing:
            raise ValueError(
                f"Missing columns in CSV: {missing}. "
                f"Headers found: {fieldnames}"
            )

        # Clean each row and convert 'Position'
        for raw in reader:
            row = {
                "Driver": compress_spaces(raw.get("Driver", "")),
                "Team": compress_spaces(raw.get("Team", "")),
                "Race": compress_spaces(raw.get("Race", "")),
                "Country": compress_spaces(raw.get("Country", "")),
                # Convert to int; if strict=False, use 0 when invalid
                "Position": safe_int(
                    raw.get("Position", 0),
                    default=0,
                    strict=strict,
                    field_name="Position"
                ),
            }
            rows.append(row)
    return rows


# Try loading the CSV and print an essential summary
try:
    rows = read_csv_f1(CSV_FILE_PATH, strict=False)
    print(f"   CSV loaded: {CSV_FILE_PATH}")
    print(f"   Rows read (race results): {len(rows)}")

    # Sets to count unique elements
    drivers = set(r["Driver"] for r in rows)
    teams = set(r["Team"] for r in rows)
    grands_prix = set((r["Race"], r["Country"]) for r in rows)

    print(f"   Unique drivers: {len(drivers)}")
    print(f"   Unique teams:   {len(teams)}")
    print(f"   Unique GPs:     {len(grands_prix)}")
except Exception as e:
    print("Error while loading dataset:", e)
    raise

   CSV loaded: /content/formula1_data.csv
   Rows read (race results): 180
   Unique drivers: 10
   Unique teams:   6
   Unique GPs:     18


# Cell 3 — Dataset Preview (first 5 rows)

In this part, the first 5 cleaned rows are displayed, with whitespace-trimmed names and the `Position` column converted to integer, for a quick visual check.

What is displayed:

- Driver  
- Team  
- Race  
- Country  
- Position  

This step verifies that the headers are correct and that the `Position` values are numeric, confirming that the CSV was read and cleaned correctly.


In [6]:
# PREVIEW: first 5 rows of the dataset
print("\n Preview of the first 5 rows:")
for i, r in enumerate(rows[:5], start=1):
    print(f"{i:>2}. Driver={r['Driver']}, Team={r['Team']}, Race={r['Race']}, Country={r['Country']}, Position={r['Position']}")


 Preview of the first 5 rows:
 1. Driver=Hamilton, Team=McLaren, Race=Melbourne, Country=Australia, Position=1
 2. Driver=Massa, Team=Ferrari, Race=Melbourne, Country=Australia, Position=0
 3. Driver=Raikkonen, Team=Ferrari, Race=Melbourne, Country=Australia, Position=8
 4. Driver=Kubica, Team=BMW, Race=Melbourne, Country=Australia, Position=0
 5. Driver=Alonso, Team=Renault, Race=Melbourne, Country=Australia, Position=4


# Cell 4 — Driver Performance

In this section, the function `driver_performance` is defined. It calculates the performance of a single driver in the 2008 season.

**Input:** the dataset rows plus the driver’s full name or last name.  
**Output:** list with three values in the order  
`[total_points, wins, podiums]`.

**Driver lookup logic:**
- Exact match on full name, after normalization (spaces, accents, case).  
- If not found, try with the last name only.  
- If the last name is unique, return the data.  
- If the last name matches multiple drivers, raise an error asking for a more specific input.  

This function is useful to analyze individual performances robustly, even when drivers share the same last name.
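
The lookup order described above (exact match first, then a unique last name, otherwise an error) can be sketched on a small hypothetical roster. The function `resolve_driver` and the names in `roster` are assumptions for illustration; this simplified version folds case and spacing only, not accents.

```python
def resolve_driver(query: str, known_names: list[str]) -> str:
    """Resolve a query to one known name: exact match first, then unique last name."""
    q = " ".join(query.lower().split())
    if not q:
        raise ValueError("Empty driver name.")

    # Step 1: exact match on the whole (lowercased, space-collapsed) name
    for name in known_names:
        if " ".join(name.lower().split()) == q:
            return name

    # Step 2: fall back to the last word of the query
    last = q.split()[-1]
    hits = sorted({n for n in known_names if n.split() and n.lower().split()[-1] == last})
    if len(hits) == 1:
        return hits[0]
    if hits:
        raise ValueError(f"Ambiguous last name: {', '.join(hits)}")
    raise ValueError(f"Driver not found: {query!r}")


roster = ["Lewis Hamilton", "Anna Rossi", "Luca Rossi"]  # hypothetical roster

print(resolve_driver("lewis   HAMILTON", roster))  # Lewis Hamilton
print(resolve_driver("Hamilton", roster))          # Lewis Hamilton

try:
    resolve_driver("Rossi", roster)
    ambiguous_message = ""
except ValueError as err:
    ambiguous_message = str(err)
print(ambiguous_message)  # Ambiguous last name: Anna Rossi, Luca Rossi
```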


In [8]:
def _extract_last_name(name: str) -> str:
    """
    Extract the last name from a full name and make it comparable
    (remove accents and extra spaces, unify punctuation, lowercase).

    Uses the normalization pipeline (punctuation, accents, lowercase, spaces)
    to make comparisons reliable.

    Args:
        name (str): Full or partial driver name (e.g., "Lewis Hamilton").

    Returns:
        str: Normalized last name (e.g., "hamilton").
             If the name has a single word, returns that normalized.
             If the string is empty, returns "".
    """
    norm = normalize_text(name)
    parts = norm.split()
    return parts[-1] if parts else norm


def driver_performance(rows: list[dict], driver_name: str) -> list[int]:
    """
    Compute a driver's performance in the 2008 season.

    First searches by full name (after normalization). If not found,
    tries with the last name; if the last name is ambiguous, asks for a more specific input.

    Args:
        rows (list[dict]): CSV rows read with 'read_csv_f1'.
        driver_name (str): Full name or last name (e.g., "Lewis Hamilton" or "Hamilton").

    Returns:
        list[int]: A list of 3 integers in this order:
            [total_points, wins_count, podiums_count]

    Raises:
        ValueError: If 'driver_name' is empty, not in the dataset,
                    or the last name matches multiple drivers.
    """
    if not isinstance(driver_name, str) or not driver_name.strip():
        raise ValueError("Please provide a non-empty driver name.")

    target_norm = normalize_text(driver_name)
    target_last = _extract_last_name(driver_name)

    def accumulate_on(predicate) -> tuple[int, list[int]]:
        """
        Sum the driver's stats over rows satisfying 'predicate'.

        Returns:
            tuple[int, list[int]]: (occurrences, [points, wins, podiums])
        """
        total_points = 0
        wins = 0
        podiums = 0
        occ = 0

        for r in rows:
            if predicate(r):
                occ += 1
                pos = r["Position"]

                # Sum points according to the 2008 scoring system
                total_points += points_from_position(pos)

                # Count wins (P1) and podiums (P1–P3)
                if pos == 1:
                    wins += 1
                    podiums += 1
                elif pos in (2, 3):
                    podiums += 1

        return occ, [total_points, wins, podiums]

    # Exact match on full normalized name
    occ, result = accumulate_on(lambda r: normalize_text(r["Driver"]) == target_norm)
    if occ > 0:
        return result

    # Fallback: try with the last name (normalized)
    matched_drivers = sorted({r["Driver"] for r in rows if _extract_last_name(r["Driver"]) == target_last})
    if len(matched_drivers) == 1:
        occ, result = accumulate_on(lambda r: _extract_last_name(r["Driver"]) == target_last)
        if occ > 0:
            return result
    elif len(matched_drivers) > 1:
        raise ValueError(f"Ambiguous last name '{driver_name}'. Possible drivers: {', '.join(matched_drivers)}")

    raise ValueError(f"Driver '{driver_name}' not found in the dataset.")

# Cell 5 — Drivers’ Standings + TXT Export

In this section, functions are defined to build and save the final drivers’ standings.

## Functions
- `create_drivers_standings` → returns a dictionary of the form  
  `{ "First Last": points }`.
- `save_drivers_standings_txt` → saves the standings to a text file named  
  `Drivers_Standings_2008.txt`.

## Standings sorting
- Drivers are ordered by total points, from highest to lowest.  
- In case of a tie, sorting falls back to alphabetical order by name.
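
A two-part sort key is one common way to get this ordering: negate the points for descending order, then compare names. The sketch below uses four totals from this notebook's own standings, where Kubica and Raikkonen are tied on 75; lowercasing stands in for the fuller name normalization.

```python
standings = {"Raikkonen": 75, "Massa": 97, "Kubica": 75, "Hamilton": 98}

# Sort by points descending, then by lowercased name ascending to break ties.
ordered = sorted(standings.items(), key=lambda kv: (-kv[1], kv[0].lower()))

print(ordered)
# [('Hamilton', 98), ('Massa', 97), ('Kubica', 75), ('Raikkonen', 75)]
```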


In [9]:
def create_drivers_standings(rows: list[dict]) -> dict[str, int]:
    """
    Build the points standings for each driver using the full name as key.

    For each race row, sums the points computed by 'points_from_position(Position)'.
    The driver name is taken as written in the CSV ("Driver" field).

    Args:
        rows (list[dict]): Dataset rows (output of 'read_csv_f1').

    Returns:
        dict[str, int]: Dictionary with:
            - key   = driver's full name (str, exactly as in the CSV)
            - value = total points (int) summed over all races

    Note:
        No sorting is applied here (a dict is returned).
    """
    standings: dict[str, int] = {}
    for r in rows:
        full_name = r["Driver"].strip()
        standings[full_name] = standings.get(full_name, 0) + points_from_position(r["Position"])
    return standings


def save_drivers_standings_txt(
    drivers_standings: dict[str, int],
    output_path: str = DRIVERS_STANDINGS_FILE,
    strict: bool = False
) -> tuple[str, list[tuple[str, int]]]:
    """
    Save the drivers' standings to a TXT file with a header and one line per driver.

    Output ordering: descending by points, then ascending by normalized name.
    The file is always overwritten and ends with a trailing newline.

    Args:
        drivers_standings (dict[str, int]): {driver_name: total_points}.
        output_path (str): Path of the text file to create/overwrite.
        strict (bool): If True, re-raise I/O errors (OSError).

    Returns:
        tuple[str, list[tuple[str, int]]]:
            - saved file path (str)
            - sorted standings as a list of tuples (driver_name, points)

    Raises:
        OSError: Only if 'strict=True' and a write error occurs.
    """
    sorted_standings = sorted(
        drivers_standings.items(),
        key=lambda kv: (-kv[1], normalize_text(kv[0]))
    )

    try:
        lines = ["Drivers Standings 2008 Formula 1"]
        for full_name, pts in sorted_standings:
            lines.append(f"{full_name}: {pts}")
        with open(output_path, "w", encoding="utf-8") as f:
            f.write("\n".join(lines) + "\n")
    except Exception as e:
        print(f"Error while saving drivers' standings to '{output_path}':", e)
        if strict:
            raise
    else:
        # Verify the file was written correctly.
        try:
            ok = os.path.isfile(output_path) and os.path.getsize(output_path) > 0
            if not ok:
                print("[WARNING] The file may not have been written correctly (zero size).")
            else:
                print(f"File saved: {output_path}")
        except Exception as ver:
            print("[WARNING] Could not verify the file that was just written:", ver)

    return output_path, sorted_standings

# Cell 6 — Constructors’ Standings

In this section, the function `constructors_standings_from_drivers` is defined. It computes the teams’ (constructors) standings from the already generated drivers’ standings.

## How it works
- Reuses the drivers’ standings as the calculation base.  
- Sums each driver’s points to their team (driver → team mapping).  
- Returns a dictionary of the form:  
  `{ "Team Name": total_points }`.

## Robust driver → team mapping
- The helper `map_driver_to_team` associates each driver with the **first** team encountered (consistent with the 2008 championship).  
- If the same driver appears with different teams in the data, a warning is printed and the **first** team found is kept.  

This ensures team points stay aligned with the drivers’ standings and detects potential inconsistencies without interrupting execution.
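
The "first team wins" rule and the per-team aggregation can be illustrated on made-up data. All driver names, team names, and point values below are hypothetical, and `dict.setdefault` is used here as one possible way to keep the first team seen.

```python
# Hypothetical rows: "Driver A" appears with two different teams.
rows_demo = [
    {"Driver": "Driver A", "Team": "Team X"},
    {"Driver": "Driver B", "Team": "Team X"},
    {"Driver": "Driver A", "Team": "Team Y"},  # conflicting entry
    {"Driver": "Driver C", "Team": "Team Y"},
]
points_demo = {"Driver A": 10, "Driver B": 8, "Driver C": 6}

driver_team = {}
conflicts = []
for r in rows_demo:
    # setdefault stores the team only the first time a driver is seen.
    first = driver_team.setdefault(r["Driver"], r["Team"])
    if first != r["Team"]:
        conflicts.append((r["Driver"], first, r["Team"]))

# Add each driver's points to the team kept in the mapping.
team_points = {}
for driver, pts in points_demo.items():
    team = driver_team.get(driver, "(Unknown Team)")
    team_points[team] = team_points.get(team, 0) + pts

print(team_points)  # {'Team X': 18, 'Team Y': 6}
print(conflicts)    # [('Driver A', 'Team X', 'Team Y')]
```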


In [10]:
def map_driver_to_team(rows: list[dict]) -> dict[str, str]:
    """
    Build the driver → team mapping for the 2008 season.

    In 2008, each driver has a single team. If the same driver appears multiple times
    in the file, we take the first team encountered and keep it for all their races
    (we don't change it afterwards).

    Args:
        rows (list[dict]): Dataset rows (output of 'read_csv_f1').

    Returns:
        dict[str, str]: {"First Last": "Team"}.
    """
    driver_team: dict[str, str] = {}
    # Associate each driver with the first team encountered
    for r in rows:
        d = r["Driver"].strip()
        t = r["Team"].strip()
        # If the driver already exists, don't update; warn if the team differs
        if d in driver_team:
            if driver_team[d] != t:
                print(f"[WARNING] Driver with multiple teams in dataset: {d} -> {driver_team[d]} / {t} (keeping the first)")
            continue
        driver_team[d] = t
    return driver_team


def constructors_standings_from_drivers(rows: list[dict], drivers_standings: dict[str, int]) -> dict[str, int]:
    """
    Compute the constructors' standings by summing each driver's points to their team.

    How it works:
      - Derive the driver → team map from the dataset using 'map_driver_to_team(rows)'.
      - For each driver in 'drivers_standings', add their points to the corresponding team.
      - If a driver is not in the mapping (rare/anomalous), use the team "(Unknown Team)".

    Args:
        rows (list[dict]): Dataset rows (used to create the driver → team map).
        drivers_standings (dict[str, int]): Dictionary with:
            - key   = driver's full name (as in the CSV)
            - value = driver's total points (int)

    Returns:
        dict[str, int]: Constructors dictionary with:
            - key   = team name (str)
            - value = total points (int) aggregated from that team's drivers

    Note:
        No sorting is applied here (a dict is returned).
        In case of inconsistencies (same driver with different teams in the file),
        'map_driver_to_team' keeps the first team and prints a warning.
    """
    # Get the driver → team dictionary from the dataset
    driver_team = map_driver_to_team(rows)

    # Sum each driver's points to their team
    team_points: dict[str, int] = {}
    for driver, points in drivers_standings.items():
        # If the mapping is missing (anomalous case), use a fallback team
        team = driver_team.get(driver, "(Unknown Team)")
        team_points[team] = team_points.get(team, 0) + points
    return team_points


# COMPATIBILITY ALIAS (avoids NameError for existing code)
def constructors_standings(rows: list[dict]) -> dict[str, int]:
    """
    Convenience alias: compute constructors' standings starting only from dataset rows.

    Internally:
      1) builds the drivers' standings with 'create_drivers_standings(rows)',
      2) aggregates points per team with 'constructors_standings_from_drivers(rows, drivers_standings)'.

    Args:
        rows (list[dict]): Dataset rows (output of 'read_csv_f1').

    Returns:
        dict[str, int]: Dictionary {team: total_points} (unsorted).
    """
    drivers_standings = create_drivers_standings(rows)
    return constructors_standings_from_drivers(rows, drivers_standings)

# Cell 7 — Driver Performance Test

This section verifies the correct behavior of the `driver_performance` function.

**Manual check**  
- **Variable input:** test several name variants, e.g., `Lewis HAMILTON`, `léwis hamilton`, `Hamilton`.  
- **Expected output:** a list of the form `[total_points, wins, podiums]`.  

This test ensures that name normalization works correctly, producing the same result even with different input formats.


In [12]:
# Test name with double spaces and uppercase to check normalization
test_name = "Lewis  HAMILTON"
try:
    result = driver_performance(rows, test_name)
    print(f"Performance of '{test_name}' -> [points, wins, podiums] = {result}")
except Exception as e:
    print("Error:", e)

Performance of 'Lewis  HAMILTON' -> [points, wins, podiums] = [98, 5, 10]


# Cell 8 — Drivers’ Standings (preview)

In this section, the drivers’ standings are generated and displayed.

## Steps
- Create the standings with `create_drivers_standings`.
- Save the standings to a text file with `save_drivers_standings_txt`.
- Display the top 10 positions in the console.

## Goal
Verify the correct ordering of drivers and the consistency of the calculated points with the 2008 scoring system.


In [13]:
# Compute total points for each driver
drivers_dict = create_drivers_standings(rows)
# Save standings to TXT and also get the sorted version
path, sorted_standings = save_drivers_standings_txt(drivers_dict, DRIVERS_STANDINGS_FILE)

# Show a small preview of the top 10 positions
print("Top 10 positions:")
for i, (name, pts) in enumerate(sorted_standings[:10], start=1):
    print(f"{i:>2}. {name}: {pts}")

File saved: Drivers_Standings_2008.txt
Top 10 positions:
 1. Hamilton: 98
 2. Massa: 97
 3. Kubica: 75
 4. Raikkonen: 75
 5. Alonso: 61
 6. Heidfeld: 60
 7. Kovalainen: 53
 8. Vettel: 35
 9. Trulli: 31
10. Glock: 25


# Cell 9 — Constructors’ Standings (preview)

In this section, the constructors’ standings are computed and displayed.

## Steps
- Use the function `constructors_standings_from_drivers`.  
- Reuse the already calculated drivers’ data.  
- Print the top 10 teams in the console.

## Goal
Verify that the constructors’ aggregated points are consistent with the results obtained by individual drivers.


In [15]:
# Compute team points from the drivers' standings
team_dict = constructors_standings_from_drivers(rows, drivers_dict)
# Sort by total points (descending), then by normalized team name (ascending)
sorted_teams = sorted(team_dict.items(), key=lambda kv: (-kv[1], normalize_text(kv[0])))

# Print a small preview of the top 10 teams (teams with 0 points are not explicitly filtered out)
print("Top 10 Constructors:")
for i, (team, pts) in enumerate(sorted_teams[:10], start=1):
    print(f"{i:>2}. {team}: {pts}")

Top 10 Constructors:
 1. Ferrari: 172
 2. McLaren: 151
 3. BMW: 135
 4. Renault: 61
 5. Toyota: 56
 6. Toro Rosso: 35


# TEST Cell — Simple Asserts

In this section, a few automated checks are executed to confirm the pipeline is working correctly.

## Checks performed
- **Hamilton’s performance:** verify expected points, wins, and podiums.  
- **Drivers’ standings:** verify that the winner is Hamilton with **98 points**.  
- **Constructors’ standings:** verify that the top team is **Ferrari**.  
- **Total consistency:** the sum of drivers’ points must match the sum of constructors’ points.  
- **Output file:** `Drivers_Standings_2008.txt` must exist and be correctly written.

## Goal
Run quick tests to ensure that all the main features are implemented correctly.
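
The total-consistency idea can be checked by hand with the figures printed in the two standings previews earlier in this notebook. The sketch below copies those totals into literal dictionaries (the variable names are assumptions) and compares the sums.

```python
# Totals copied from the drivers' and constructors' previews in this notebook.
drivers_preview = {
    "Hamilton": 98, "Massa": 97, "Kubica": 75, "Raikkonen": 75, "Alonso": 61,
    "Heidfeld": 60, "Kovalainen": 53, "Vettel": 35, "Trulli": 31, "Glock": 25,
}
teams_preview = {
    "Ferrari": 172, "McLaren": 151, "BMW": 135,
    "Renault": 61, "Toyota": 56, "Toro Rosso": 35,
}

driver_total = sum(drivers_preview.values())
team_total = sum(teams_preview.values())

# Every point scored by a driver is credited to exactly one team.
print(driver_total, team_total)  # 610 610
```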


In [16]:
# Toggle: set to False to disable tests
RUN_TESTS = True

if RUN_TESTS:
    # Verify driver performance using last name and full name
    expected_ham = [98, 5, 10]
    assert driver_performance(rows, "Hamilton") == expected_ham, "Hamilton test (last name) failed"
    assert driver_performance(rows, "Lewis Hamilton") == expected_ham, "Hamilton test (full name) failed"

    # Build drivers' standings and check winner and score
    drivers_stand = create_drivers_standings(rows)
    top_name, top_points = sorted(drivers_stand.items(), key=lambda kv: (-kv[1], normalize_text(kv[0])))[0]
    assert normalize_text(top_name) == "hamilton", "Top of drivers' standings must be Hamilton"
    assert top_points == 98, "Expected Hamilton score: 98"

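    # Derive constructors' standings from drivers' standings
    teams_stand = constructors_standings_from_drivers(rows, drivers_stand)

    # Total consistency: drivers' and constructors' point totals must match
    assert sum(drivers_stand.values()) == sum(teams_stand.values()), "Drivers' and constructors' totals differ"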

    # Check that the top constructor is Ferrari
    top_team, team_points = sorted(teams_stand.items(), key=lambda kv: (-kv[1], normalize_text(kv[0])))[0]
    assert normalize_text(top_team) == "ferrari", "Expected top constructor: Ferrari"

    # Verify that the TXT file exists and has the correct header
    try:
        with open(DRIVERS_STANDINGS_FILE, "r", encoding="utf-8") as f:
            lines = [line.rstrip("\n") for line in f]
        assert lines and lines[0].startswith("Drivers Standings 2008 Formula 1"), "Unexpected TXT header or empty file"
    except FileNotFoundError:
        raise AssertionError("TXT file not found: run the cell that saves the drivers' standings (Cell 8).")

    # Check the saved file
    assert os.path.getsize(DRIVERS_STANDINGS_FILE) > 0, "TXT file unexpectedly empty"
    with open(DRIVERS_STANDINGS_FILE, "rb") as f:
        # Go to the last byte
        f.seek(-1, os.SEEK_END)
        assert f.read(1) == b"\n", "Missing trailing newline in TXT"

    print("Quick tests passed.")
else:
    print("Tests disabled (RUN_TESTS=False).")

Quick tests passed.


# EXTRA TEST Cell — Additional (optional) validations

In this section, additional tests are executed to further validate code correctness.

## Checks performed
- Helper `_expect_value_error(...)`: verifies that a function raises `ValueError` when expected.  
- **Test 1:** `driver_performance(rows, '')` → must raise `ValueError`.  
- **Test 2:** `driver_performance(rows, 'Nonexistent Driver')` → must raise `ValueError`.  
- **Test 3:** the sum of drivers’ points must match the sum of constructors’ points.  
- **Test 4:** for every row with `Position == 0`, the points must be `0`.  
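
The consistency check in Test 3 can be illustrated on toy data. This is a sketch under stated assumptions, not the notebook's implementation: the rows use placeholder names, and team totals are derived through a driver-to-team mapping that assumes each driver races for a single team.

```python
# Scoring table from the 2008 rules described earlier in this notebook
POINTS = {1: 10, 2: 8, 3: 6, 4: 5, 5: 4, 6: 3, 7: 2, 8: 1}

# Toy rows with placeholder names (not taken from the real dataset)
toy_rows = [
    {"Driver": "Driver A", "Team": "Team X", "Position": 1},
    {"Driver": "Driver B", "Team": "Team X", "Position": 3},
    {"Driver": "Driver C", "Team": "Team Y", "Position": 2},
    {"Driver": "Driver A", "Team": "Team X", "Position": 0},
    {"Driver": "Driver C", "Team": "Team Y", "Position": 8},
]

# Accumulate each driver's total; Position 0 (outside the top 8) scores 0
toy_drivers = {}
for row in toy_rows:
    pts = POINTS.get(row["Position"], 0)
    toy_drivers[row["Driver"]] = toy_drivers.get(row["Driver"], 0) + pts

# Map each driver to a team (assumes one team per driver)
driver_team = {row["Driver"]: row["Team"] for row in toy_rows}

# Derive team totals from the drivers' totals
toy_teams = {}
for driver, pts in toy_drivers.items():
    team = driver_team[driver]
    toy_teams[team] = toy_teams.get(team, 0) + pts

# The two grand totals must agree
assert sum(toy_drivers.values()) == sum(toy_teams.values())
print(toy_drivers)  # {'Driver A': 10, 'Driver B': 6, 'Driver C': 9}
print(toy_teams)    # {'Team X': 16, 'Team Y': 9}
```

If a driver were missing from the mapping, the lookup would fail, which is the kind of inconsistency this test is meant to surface.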

## Goal
Confirm that the system correctly handles errors, inconsistencies, and edge cases.  
If all tests pass, it prints: **“Extra tests passed.”**


In [17]:
def _expect_value_error(function, *args, **kwargs) -> bool:
    """
    Execute the function with the given arguments and verify that it raises ValueError.

    Returns:
        bool: True if a ValueError was raised.

    Raises:
        AssertionError: If no error is raised or if an error different from ValueError is raised.
    """
    try:
        function(*args, **kwargs)
    except ValueError:
        return True
    except Exception as e:
        raise AssertionError(f"Expected ValueError, but got {type(e).__name__}: {e}") from e
    else:
        raise AssertionError("Expected ValueError, but no exception was raised.")


# Must raise ValueError with empty name
assert _expect_value_error(driver_performance, rows, ""), \
    "driver_performance('') should raise ValueError"

# Must raise ValueError with nonexistent driver
assert _expect_value_error(driver_performance, rows, "Nonexistent Driver"), \
    "driver_performance('Nonexistent Driver') should raise ValueError"

# The sum of drivers' points must match the sum of constructors' points
drivers_dict__test = create_drivers_standings(rows)
teams_dict__test = constructors_standings_from_drivers(rows, drivers_dict__test)
assert sum(drivers_dict__test.values()) == sum(teams_dict__test.values()), \
    "Sum of drivers' and constructors' points does NOT match: possible inconsistency!"

# Every row with Position == 0 must yield 0 points
for r in rows:
    if r["Position"] == 0:
        assert points_from_position(r["Position"]) == 0, \
            f"Wrong score for Position=0 in row: {r}"

print("Extra tests passed.")

Extra tests passed.


# FINAL Cell — Demo

In this section, a complete demonstration of the project is executed.

## Demo contents
- Display the performance of an example driver.  
- Print the top 10 positions of the drivers’ standings (or all drivers, if fewer than 10).  
- Print the top 10 positions of the constructors’ standings (or all teams, if fewer than 10).  
- Save or update the file `Drivers_Standings_2008.txt`.  
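
The demo sorts the constructors' standings by points in descending order and breaks ties by normalized name. Below is a minimal sketch of that sort key. It uses `str.casefold` as a stand-in for the notebook's `normalize_text` (an assumption, since the real helper may normalize differently), the function name `sort_standings` is hypothetical, and the sample values are four drivers' totals from this notebook's demo output.

```python
def sort_standings(standings: dict) -> list:
    """Sort (name, points) pairs by points descending, then by name."""
    # str.casefold stands in for the notebook's normalize_text (assumption)
    return sorted(standings.items(), key=lambda kv: (-kv[1], kv[0].casefold()))


# Two drivers are tied on 75 points, so the name decides their order
sample = {"Raikkonen": 75, "Massa": 97, "Kubica": 75, "Hamilton": 98}
ranked = sort_standings(sample)

for pos, (name, pts) in enumerate(ranked, start=1):
    print(pos, name, pts)
```

Negating the points gives a descending sort on score while keeping the name comparison ascending, so tied entries come out alphabetically.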

## Goal
Summarize the entire project into a clear and readable output.


In [18]:
# Change here if you want a different driver
DEMO_DRIVER = "Lewis Hamilton"

def _print_table(rows_to_print, headers=None, min_width=0, title: str = None, title_char: str = "=") -> None:
    """
    Print a left-aligned text table.

    Args:
        rows_to_print (iterable): Rows to print, e.g., [(val1, val2, ...), ...].
        headers (iterable | None): Column headers (optional).
        min_width (int): Minimum width for each column.
        title (str | None): Title above the table (optional).
        title_char (str): Character used to underline the title (default "=").
    """
    # Print an optional title above the table
    if title:
        print(title)
        print(title_char * len(title))

    # Prepare data and compute minimum column widths
    if headers:
        rows_with_headers = [headers] + list(rows_to_print)
    else:
        rows_with_headers = list(rows_to_print)

    col_count = max((len(r) for r in rows_with_headers), default=0)
    widths = [min_width] * col_count
    for r in rows_with_headers:
        for i, cell in enumerate(r):
            widths[i] = max(widths[i], len(str(cell)))

    # Format a row with aligned columns
    def fmt_row(row):
        return " | ".join(str(cell).ljust(widths[i]) for i, cell in enumerate(row))

    # Print header and separator
    if headers:
        print(fmt_row(headers))
        print("-+-".join("-" * w for w in widths))

    # Print data rows
    for r in rows_to_print:
        print(fmt_row(r))

# Selected driver's performance
if DEMO_DRIVER.strip():
    try:
        points, wins, podiums = driver_performance(rows, DEMO_DRIVER)
        print()
        _print_table(
            [(points, wins, podiums)],
            headers=("Total points", "Wins", "Podiums"),
            min_width=6,
            title=f"Performance of '{DEMO_DRIVER}'"
        )
    except ValueError as e:
        print(f"\nError: {e}")
else:
    print("\n(Driver performance: disabled because DEMO_DRIVER is empty)")

# Compute and save drivers' standings with a top-10 preview
drivers_dict = create_drivers_standings(rows)
path, sorted_standings = save_drivers_standings_txt(drivers_dict, DRIVERS_STANDINGS_FILE)

# Prepare top 10 drivers for tabular print
top10_drivers = [(i+1, name, pts) for i, (name, pts) in enumerate(sorted_standings[:10])]
print()
_print_table(
    top10_drivers,
    headers=("Pos", "Driver", "Points"),
    min_width=3,
    title="Top 10 Drivers 2008"
)

# Compute constructors' standings and prepare top 10
team_dict = constructors_standings_from_drivers(rows, drivers_dict)
sorted_teams = sorted(team_dict.items(), key=lambda kv: (-kv[1], normalize_text(kv[0])))
top10_teams = [(i+1, team, pts) for i, (team, pts) in enumerate(sorted_teams[:10])]

print()
_print_table(
    top10_teams,
    headers=("Pos", "Team", "Points"),
    min_width=3,
    title="Top 10 Constructors 2008"
)


Performance of 'Lewis Hamilton'
Total points | Wins   | Podiums
-------------+--------+--------
98           | 5      | 10     
File saved: Drivers_Standings_2008.txt

Top 10 Drivers 2008
Pos | Driver     | Points
----+------------+-------
1   | Hamilton   | 98    
2   | Massa      | 97    
3   | Kubica     | 75    
4   | Raikkonen  | 75    
5   | Alonso     | 61    
6   | Heidfeld   | 60    
7   | Kovalainen | 53    
8   | Vettel     | 35    
9   | Trulli     | 31    
10  | Glock      | 25    

Top 10 Constructors 2008
Pos | Team       | Points
----+------------+-------
1   | Ferrari    | 172   
2   | McLaren    | 151   
3   | BMW        | 135   
4   | Renault    | 61    
5   | Toyota     | 56    
6   | Toro Rosso | 35    


In [19]:
print("\n=== Conclusions ===")
print("- Drivers' champion: Lewis Hamilton with 98 points (according to the tests).")
print("- Top constructor: Ferrari (according to the tests).")
print("- The workflow relies entirely on the Standard Library (csv, os, urllib).")


=== Conclusions ===
- Drivers' champion: Lewis Hamilton with 98 points (according to the tests).
- Top constructor: Ferrari (according to the tests).
- The workflow relies entirely on the Standard Library (csv, os, urllib).


# Conclusions

The analysis of the 2008 Formula 1 season shows that Lewis Hamilton won the drivers’ championship with 98 points, built on 5 wins and 10 podiums, finishing one point ahead of Massa (97).  
At the team level, Ferrari ranked first with 172 points, the combined total of its drivers.

This project shows that, starting from a simple CSV file and relying solely on the Python Standard Library, it is possible to build a reliable analysis pipeline, compute standings, and validate the results with automated checks.
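
As a compact recap, the whole pipeline can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the notebook's implementation: the function name `standings` and the sample rows are assumptions (placeholder names, not real results), while the column names and the scoring table follow the dataset description and 2008 rules given at the start.

```python
import csv
import io

# 2008 scoring table: only the top 8 finishers score
POINTS = {1: 10, 2: 8, 3: 6, 4: 5, 5: 4, 6: 3, 7: 2, 8: 1}

# Illustrative sample rows with placeholder names (not the real dataset)
SAMPLE_CSV = """Driver,Team,Race,Country,Position
Driver A,Team X,City 1,Country 1,1
Driver B,Team Y,City 1,Country 1,2
Driver A,Team X,City 2,Country 2,0
Driver B,Team Y,City 2,Country 2,3
"""


def standings(csv_text: str, key: str) -> dict:
    """Total the points per value of `key` ("Driver" or "Team"), best first."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        pts = POINTS.get(int(row["Position"]), 0)  # Position 0 scores nothing
        totals[row[key]] = totals.get(row[key], 0) + pts
    # Sort by points descending, then by name for a stable tie-break
    return dict(sorted(totals.items(), key=lambda kv: (-kv[1], kv[0])))


drivers = standings(SAMPLE_CSV, "Driver")
teams = standings(SAMPLE_CSV, "Team")

print(drivers)  # {'Driver B': 14, 'Driver A': 10}
print(teams)    # {'Team Y': 14, 'Team X': 10}
```

Reading from an in-memory string keeps the sketch self-contained; replacing `io.StringIO(csv_text)` with an open file handle would apply the same logic to a CSV on disk.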
