# Trampoline Competition Scheduler

## Breakdown of Data Importing and Processing
### 1. Data Importing

Using pandas, we can import data from Excel files, including TrampOnline's .xls exports, convert it into a .csv file, and view it. Reading legacy .xls files with pandas requires the `xlrd` package to be installed. This section deals with the import process.

In [1]:
# Import required packages.

# Import pandas for data analysis.
import pandas as pd

In [None]:
# Read the TrampOnline .xls Excel file.
df = pd.read_excel("data/TrampOnline_Sample_Competitors_Not_Encoded.xls", sheet_name = "Sheet1", header = 0)

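# Define a mapping of common corrupted sequences (UTF-8 text misread as Windows-1252)
# to the correct Unicode characters. Invisible bytes are written as escape sequences
# so that they cannot be lost when the notebook is edited. Only multi-character
# sequences are listed, so correctly encoded accented letters are never changed.
corrupted_map = {
    'Ã\x81': 'Á',   # C3 81: the second byte is an invisible control character.
    'Ã¤': 'ä',
    'Ã„': 'Ä',
    'Ã¶': 'ö',
    'Ã–': 'Ö',
    'Ã¼': 'ü',
    'Ãœ': 'Ü',
    'ÃŸ': 'ß',
    'Ã¡': 'á',
    'Ã©': 'é',
    'Ã¨': 'è',
    'Ã¢': 'â',
    'Ã´': 'ô',
    'Ãª': 'ê',
    'Ã\xad': 'í',   # C3 AD: the second byte is a soft hyphen.
    'Ã³': 'ó',
    'Ã“': 'Ó',
    'Ã±': 'ñ',
    'Ã\xa0': 'à',   # C3 A0: the second byte is a non-breaking space.
    'â€œ': '“',
    'â€\x9d': '”',  # E2 80 9D: the last byte is an invisible control character.
    'â€˜': '‘',
    'â€™': '’',
    'â€“': '–',
    'â€”': '—',
}

# Apply longer sequences first so that a shorter key can never consume part of a longer one.
replacements = sorted(corrupted_map.items(), key = lambda item: len(item[0]), reverse = True)

# Define a function to fix corrupted sequences in each string.
def fix_mojibake(text):
    if isinstance(text, str):
        for bad, good in replacements:
            text = text.replace(bad, good)
    return text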

# Apply fix to all string columns in the data frame.
for col in df.select_dtypes(include = ["object"]).columns:
    df[col] = df[col].apply(fix_mojibake)

# Save the cleaned data to a UTF-8 .csv file.
df.to_csv("data/created_files/01_TrampOnline_Sample_Competitors.csv", index = False, encoding = "utf-8-sig")
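A hand-maintained map needs an entry for every corrupted character. A common alternative, sketched here under the assumption that the corruption came from UTF-8 text being decoded as Windows-1252 (cp1252), is to reverse that step directly: re-encode the string as cp1252 and decode the bytes as UTF-8. `repair_mojibake` is a hypothetical helper, not part of the notebook. Text that was never corrupted fails the round trip and is returned unchanged. Sequences containing bytes that cp1252 does not define (such as the `0x81` in a corrupted `Á`) also fail, so a lookup table such as `corrupted_map` is still needed for those.

```python
def repair_mojibake(text):
    """Undo one round of UTF-8 text mis-decoded as Windows-1252 (cp1252).

    Non-strings, and strings that do not survive the round trip (for example
    text that was never corrupted), are returned unchanged.
    """
    if not isinstance(text, str):
        return text
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

print(repair_mojibake("SÃ¶ren"))   # Sören
print(repair_mojibake("Sören"))    # Sören (already clean, left unchanged)
print(repair_mojibake("Itâ€™s"))   # It’s
```

Rare clean strings that happen to form valid UTF-8 after re-encoding would be altered, so it is worth spot-checking the output before replacing the map entirely.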

In [None]:
# Read the .csv file with all the entries.
df = pd.read_csv("data/created_files/01_TrampOnline_Sample_Competitors.csv", keep_default_na = True, delimiter = ",", skipinitialspace = True, encoding = "utf-8-sig")

### 2. Data Cleaning

Before going further, the data should be checked to make sure it has not been corrupted. This section inspects the data, standardises the column types, and then runs some sanity checks.

In [4]:
# Find the shape of the data in the format (number of rows, number of columns).
df.shape

(117, 11)

In [5]:
# Display the first five results, starting at index zero.
df.head(5)

|   | ID | Name | Club | ClassName | StartOrder | Discipline | Team | Team_Category | Guest | flight | photo_consent |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10134 | Derbiled Áed | DCU | Novice Women | 20 | TRI | A | | 0 | 1.0 | 1 |
| 1 | 10304 | Anwar Fateh | MU | Novice Men | 12 | TRI | | | 0 | 1.0 | 1 |
| 2 | 10403 | Karpos Pankratios | QUB | Intermediate Men | 1 | TRI | | | 0 | 1.0 | 1 |
| 3 | 10999 | Sören Shiva | TCD | Intervanced Men | 45 | TRI | A | | 0 | | 1 |
| 4 | 11568 | Yolotzin Thibaut | UCC | Intermediate Women | 65 | TRI | A | | 0 | | 1 |


In [6]:
# Display the last five results.
df.tail(5)

|   | ID | Name | Club | ClassName | StartOrder | Discipline | Team | Team_Category | Guest | flight | photo_consent |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 112 | 13896 | Alexandra Hamilton | UCD | DMT Level 3 | 6 | DMT | | | 0 | 1.0 | 1 |
| 113 | 13900 | Francis Know & Freddie Kaufman | Hull | Synchro Level 1 | 3 | TRS | | | 0 | 3.0 | 1 |
| 114 | 13905 | Maya Solis & Jayden Conway | Glasgow | Synchro Level 1 | 5 | TRS | | | 0 | 2.0 | 1 |
| 115 | 13941 | Tanisha Riggs & Eliza Jefferson (UL) | UCC | Synchro Level 3 | 22 | TRS | | | 0 | 2.0 | 1 |
| 116 | 14393 | Sniffmoo Piebean & Kamal Cruz (UCC) | QUB | Synchro Level 1 | 98 | TRS | | | 0 | 1.0 | 1 |


In [7]:
# Check the data type of each column.
df.dtypes

ID                 int64
Name              object
Club              object
ClassName         object
StartOrder         int64
Discipline        object
Team              object
Team_Category    float64
Guest              int64
flight           float64
photo_consent      int64
dtype: object

Based on the above data types, the following modifications should be made.

- **ID:** Perfectly fine as an integer.

- **Name:** Should be converted into a string.

- **Club:** Should be converted into a category.

- **ClassName:** Should be converted into a category.

- **StartOrder:** Perfectly fine as an integer.

- **Discipline:** Should be converted into a category.

- **Team:** Should be converted into a string.

- **Team_Category:** Should be converted into a category.

- **Guest:** Should be converted into a Boolean.

- **flight:** Perfectly fine as a float.

- **photo_consent:** Should be converted into a Boolean.
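The conversions listed above can also be written as one dictionary passed to `DataFrame.astype`, which keeps the target types in one place. A minimal sketch using the first two rows from the `head` output (the `StartOrder` and `flight` columns are omitted for brevity; the name `column_types` is illustrative):

```python
import pandas as pd

# Target dtypes taken from the list above; columns not listed keep their current type.
column_types = {
    "Name": "string",
    "Club": "category",
    "ClassName": "category",
    "Discipline": "category",
    "Team": "string",
    "Team_Category": "category",
    "Guest": "boolean",
    "photo_consent": "boolean",
}

# The first two rows from the head() output, with the types pandas infers from the .csv.
sample = pd.DataFrame({
    "ID": [10134, 10304],
    "Name": ["Derbiled Áed", "Anwar Fateh"],
    "Club": ["DCU", "MU"],
    "ClassName": ["Novice Women", "Novice Men"],
    "Discipline": ["TRI", "TRI"],
    "Team": ["A", None],
    "Team_Category": [float("nan"), float("nan")],
    "Guest": [0, 0],
    "photo_consent": [1, 1],
})

# A single astype call applies every conversion at once.
sample = sample.astype(column_types)
print(sample.dtypes)
```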

In [8]:
# Change all data fields to the appropriate type.
df["Name"] = df["Name"].astype("string")
df["Club"] = df["Club"].astype("category")
df["ClassName"] = df["ClassName"].astype("category")
df["Discipline"] = df["Discipline"].astype("category")
df["Team"] = df["Team"].astype("string")
df["Team_Category"] = df["Team_Category"].astype("category")
df["Guest"] = df["Guest"].astype("boolean")
df["photo_consent"] = df["photo_consent"].astype("boolean")

#### Sanity Checks

This section checks for duplicate rows and counts the number of unique values in each column.

In [9]:
# Check for duplicate rows.
# If the result is "Empty DataFrame", then there are no duplicates.
print(df[df.duplicated()])

Empty DataFrame
Columns: [ID, Name, Club, ClassName, StartOrder, Discipline, Team, Team_Category, Guest, flight, photo_consent]
Index: []


In [10]:
# Check for the number of unique values in each column. In big competitions, the number of trampoline, DMT, and tumbling competitor categories
# should equal the number of levels plus one to account for anyone not competing.
df.nunique()

ID               117
Name              80
Club              21
ClassName         25
StartOrder        62
Discipline         4
Team               2
Team_Category      0
Guest              2
flight             3
photo_consent      1
dtype: int64
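The checks above are read by eye from the printed output. A minimal sketch that turns them into code, so a problem is reported explicitly; `find_problems` and the small frames are hypothetical. It also checks that `ID` is unique, which the `nunique` output above supports (117 unique IDs for 117 rows).

```python
import pandas as pd

def find_problems(frame):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    # Fully duplicated rows.
    if frame.duplicated().any():
        problems.append("duplicate rows")
    # Every entry should carry its own ID.
    if not frame["ID"].is_unique:
        problems.append("repeated IDs")
    # Every entry needs a discipline and a class to be scheduled.
    if frame[["Discipline", "ClassName"]].isna().any().any():
        problems.append("missing discipline or class")
    return problems

# Hypothetical entries for illustration.
ok = pd.DataFrame({
    "ID": [1, 2],
    "Name": ["Competitor A", "Competitor B"],
    "ClassName": ["Novice Men", "Novice Men"],
    "Discipline": ["TRI", "TRI"],
})
bad = pd.concat([ok, ok.iloc[[0]]], ignore_index=True)   # First row repeated.
missing = ok.copy()
missing.loc[1, "ClassName"] = None                        # Class left blank.

print(find_problems(ok))       # []
print(find_problems(bad))      # ['duplicate rows', 'repeated IDs']
print(find_problems(missing))  # ['missing discipline or class']
```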

In [None]:
# Save the updated data frame back to a .csv file.
df.to_csv("data/created_files/02_Updated_Sample_TrampOnline_Data.csv", index = False, encoding = "utf-8-sig")

In [None]:
# Read the .csv file with all the entries.
df = pd.read_csv("data/created_files/02_Updated_Sample_TrampOnline_Data.csv", keep_default_na = True, delimiter = ",", skipinitialspace = True, encoding = "utf-8-sig")
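A .csv file stores only text, so the category, string, and Boolean types set in the type-standardisation step are not kept when the file is read back in; pandas infers the types again from the text. If those types matter downstream, one option is to pass them to `read_csv` through its `dtype` parameter. A minimal sketch with a hypothetical two-row file (held in memory with `io.StringIO`):

```python
import io

import pandas as pd

# Subset of the target types from the type-standardisation step.
column_types = {"Name": "string", "Club": "category", "Guest": "boolean"}

# Hypothetical stand-in for the saved .csv; Boolean columns are written as True/False text.
csv_text = "ID,Name,Club,Guest\n1,Competitor A,DCU,False\n2,Competitor B,MU,True\n"

# dtype applies the types while parsing, so no separate astype step is needed.
reloaded = pd.read_csv(io.StringIO(csv_text), dtype=column_types)
print(reloaded.dtypes)
```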

## Competitor Scheduling

This section organises the competitors into their groups.

There are seven trampoline levels to consider: Novice, Intermediate, Intervanced, Advanced, Elite, Elite-Pro, and Disability (any category). Competitors should first be separated by level and then by category (Men or Women). The only exception to the category split is *"Disability (any category)"*, which is not usually divided into Men and Women.

Once this is done, check the number of competitors in each category and decide how many flights are appropriate. A general rule of thumb is 12 competitors per flight, but this is not a hard rule; a flight may hold more or fewer.
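The count-and-decide step can be sketched with `value_counts`. This is a minimal sketch, assuming the same thresholds that `split_into_flights` uses later in the notebook (a 12-competitor target, with groups of up to 15 kept as one flight); `suggest_flights`, its `single_flight_limit` parameter, and the class list are hypothetical.

```python
import pandas as pd

def suggest_flights(class_names, max_flight_size=12, single_flight_limit=15):
    """Count competitors per class and suggest how many flights each class needs.

    Classes with up to `single_flight_limit` competitors stay in one flight;
    larger classes get ceil(count / max_flight_size) flights.
    """
    summary = pd.Series(class_names).value_counts().rename("competitors").to_frame()
    summary["flights"] = [
        1 if n <= single_flight_limit else -(-n // max_flight_size)  # -(-a // b) is ceiling division.
        for n in summary["competitors"]
    ]
    return summary

# Hypothetical class list: 14 Novice Men, 25 Novice Women and 5 Elite Men.
classes = ["Novice Men"] * 14 + ["Novice Women"] * 25 + ["Elite Men"] * 5
print(suggest_flights(classes))
```

With 25 Novice Women the sketch suggests three flights, since 25 / 12 rounds up to 3; in the notebook it could be called on a discipline's `ClassName` column once that subset exists.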

### Individual Trampoline (TRI), Double-Mini Trampoline (DMT), Tumbling (TUM), and Synchronised Trampoline (TRS) Level Ordering

The first step is to organise competitors by level. The levels should be put into a specific order (generally from lowest to highest), and then competitors should be split by category (only in TRI).
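Sorting by level relies on pandas' ordered categoricals, which sort by the order of their categories rather than alphabetically. A minimal sketch on a few hypothetical class names, using a regular expression whose " Men"/" Women" suffix is optional so that "Disability (any category)" keeps its full name as the level:

```python
import pandas as pd

level_order = ["Novice", "Intermediate", "Intervanced", "Advanced",
               "Elite", "Elite-Pro", "Disability (any category)"]

# Hypothetical class names, deliberately out of order.
classes = pd.Series(["Elite Women", "Novice Men", "Disability (any category)",
                     "Advanced Men", "Novice Women"])

# Two capture groups: the level, then an optional Men/Women category.
parts = classes.str.extract(r"^(.*?)(?:\s+(Men|Women))?$")
parts.columns = ["level", "category"]

# An ordered categorical sorts by level_order, not alphabetically.
parts["level"] = pd.Categorical(parts["level"], categories=level_order, ordered=True)
sorted_parts = parts.sort_values(["level", "category"])
print(sorted_parts.to_string())
```

Alphabetical sorting would put "Advanced" first and "Novice" last; the ordered categorical keeps the competition order from `level_order` instead.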

In [None]:
# Define the order of individual trampoline (TRI) levels.
trampoline_level_order = ["Novice", "Intermediate", "Intervanced", "Advanced", "Elite", "Elite-Pro", "Disability (any category)"]

# Define the order of double-mini-trampoline (DMT) levels.
dmt_level_order = ["DMT Level 1", "DMT Level 2", "DMT Level 3", "DMT Level 4", "DMT Level 5", "DMT Level 6"]

# Define the order of tumbling (TUM) levels.
tum_level_order = ["Tumbling Level 1", "Tumbling Level 2", "Tumbling Level 3", "Tumbling Level 4"]

# Define the order of synchronised trampoline (TRS) levels.
trs_level_order = ["Lower Synchro", "Higher Synchro", "Synchro Level 1", "Synchro Level 2", "Synchro Level 3"]

# ---------------------- TRI PROCESSING ---------------------- #

# Filter for individual trampoline (TRI) competitors.
tri_df = df[df["Discipline"] == "TRI"].copy()

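# Split "ClassName" into level and category. The " Men"/" Women" suffix is optional,
# so classes without it (e.g. "Disability (any category)") keep their full name as the level.
tri_df[["tri_level", "tri_category"]] = tri_df["ClassName"].str.extract(r"^(.*?)(?:\s+(Men|Women))?$")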

# Convert level into an ordered categorical.
tri_df["tri_level"] = pd.Categorical(tri_df["tri_level"], categories = trampoline_level_order, ordered = True)

# Sort by level and category.
tri_df = tri_df.sort_values(by = ["tri_level", "tri_category", "Name", "Club"]).reset_index(drop = True)

# Remove the "tri_level" and "tri_category" columns.
tri_df = tri_df.drop(columns = ["tri_level", "tri_category"])

# ---------------------- DMT PROCESSING ---------------------- #

# Filter for double-mini trampoline (DMT) competitors.
dmt_df = df[df["Discipline"] == "DMT"].copy()

# Extract DMT level directly from ClassName.
dmt_df["dmt_level"] = pd.Categorical(dmt_df["ClassName"], categories = dmt_level_order, ordered = True)

# Sort by DMT level, then name and club.
dmt_df = dmt_df.sort_values(by = ["dmt_level", "Name", "Club"]).reset_index(drop = True)

# Remove the "dmt_level" column.
dmt_df = dmt_df.drop(columns = ["dmt_level"])

# ---------------------- TUM PROCESSING ---------------------- #

# Filter for tumbling (TUM) competitors.
tum_df = df[df["Discipline"] == "TUM"].copy()

# Extract TUM level directly from ClassName.
tum_df["tum_level"] = pd.Categorical(tum_df["ClassName"], categories = tum_level_order, ordered = True)

# Sort by TUM level, then name and club.
tum_df = tum_df.sort_values(by = ["tum_level", "Name", "Club"]).reset_index(drop = True)

# Remove the "tum_level" column.
tum_df = tum_df.drop(columns = ["tum_level"])

# ---------------------- TRS PROCESSING ---------------------- #

# Filter for synchronised trampoline (TRS) competitors.
trs_df = df[df["Discipline"] == "TRS"].copy()

# Extract TRS level directly from ClassName.
trs_df["trs_level"] = pd.Categorical(trs_df["ClassName"], categories = trs_level_order, ordered = True)

# Sort by TRS level, then name and club.
trs_df = trs_df.sort_values(by = ["trs_level", "Name", "Club"]).reset_index(drop = True)

# Remove the "trs_level" column.
trs_df = trs_df.drop(columns = ["trs_level"])

# ---------------------- MERGE AND OUTPUT ---------------------- #

# Get all other competitors, in case there are any not accounted for (not TRI, DMT, TUM, or TRS).
other_df = df[~df["Discipline"].isin(["TRI", "DMT", "TUM", "TRS"])]

# Combine all the disciplines in the defined order.
df = pd.concat([tri_df, dmt_df, tum_df, trs_df, other_df], ignore_index = True)

# Drop exact duplicates.
df = df.drop_duplicates()

# Convert the key text columns to plain strings and strip surrounding whitespace.
for col in ["Name", "Club", "ClassName"]:
    df[col] = df[col].astype(str).str.strip()

# Save the updated data frame back to a .csv file.
df.to_csv("data/created_files/03_Organised_Sample_TrampOnline_Data.csv", index = False, encoding = "utf-8-sig")

# Print the sorted dataset.
print(df)

        ID                                    Name    Club        ClassName  \
0    13522                           Aidan Navarro     UCD       Novice Men   
1    13001                         Antwerp Flazgod     DCU       Novice Men   
2    12999                         Antwerp Flazgod  Exeter       Novice Men   
3    10304                             Anwar Fateh      MU       Novice Men   
4    13058                        Clarence Acevedo  Durham       Novice Men   
..     ...                                     ...     ...              ...   
112  13379              Sören Shiva & Sandra Burke     TCD  Synchro Level 2   
113  13385                   Akma 2 & Echo Longray     TCD  Synchro Level 3   
114  13392  Bjoern Pierrick & Shoshanna Assol (UL)     UCC  Synchro Level 3   
115  13381              Kaitlyn Hart & Kieran Hart     UCD  Synchro Level 3   
116  13941    Tanisha Riggs & Eliza Jefferson (UL)     UCC  Synchro Level 3   

     StartOrder Discipline Team  Team_Category  Gue

In [None]:
# Read the .csv file with all the entries.
df = pd.read_csv("data/created_files/03_Organised_Sample_TrampOnline_Data.csv", keep_default_na = True, delimiter = ",", skipinitialspace = True, encoding = "utf-8-sig")

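The code below transforms a discipline's data frame into a block-formatted table laid out for Excel: one block of columns per class, placed side by side, with that class's flights stacked vertically inside each block.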
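The core layout step is padding every class block to the same height and placing the blocks side by side with `pd.concat(..., axis=1)`, separated by a blank column. A minimal sketch with two hypothetical blocks:

```python
import pandas as pd

# Two hypothetical class blocks of different heights.
block_a = pd.DataFrame({"#": [1, 2, 3], "Name": ["A", "B", "C"], "Club": ["UCD", "TCD", "DCU"]})
block_b = pd.DataFrame({"#": [1], "Name": ["D"], "Club": ["UCC"]})

height = max(len(block_a), len(block_b))

# Pad each block with empty (NaN) rows up to the common height.
padded = [block.reindex(range(height)) for block in (block_a, block_b)]

# A one-column spacer leaves an empty Excel column between the blocks.
spacer = pd.DataFrame({"": [""] * height})

sheet = pd.concat([padded[0], spacer, padded[1]], axis=1)
print(sheet.to_string())
```

Padding with `reindex` fills the new rows with NaN, which is why the writing code treats NaN cells as blanks.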

In [15]:
def format_discipline_sheet(df_subset, add_plus=False):
    # Return early if the data frame is empty.
    if df_subset.empty:
        return pd.DataFrame(), [], []

    # Preserve the original class order and initialise outputs.
    class_order = df_subset["ClassName"].drop_duplicates()
    group_blocks = []          # List of data frames for each group block.
    level_titles = []          # List of class names (used for headers).
    group_sizes = []           # List of total competitor count per class.

    # Process each class one by one.
    for class_name in class_order:
        # Convert "Women" to "Ladies" for display.
        class_display = class_name.replace("Women", "Ladies")
        if add_plus:
            class_display += " +"  # Optionally add a plus.
        level_titles.append(class_display)

        # Filter the subset for the current class.
        group = df_subset[df_subset["ClassName"] == class_name].reset_index(drop = True)

        # Add a row number column.
        group.insert(0, "#", range(1, len(group) + 1))

        # Keep only the desired display columns.
        group = group[["#", "Name", "Club"]]

        # Split into smaller flights (e.g. of max 12 people).
        flights = split_into_flights(group)
        group_sizes.append(sum(len(f) for f in flights))

        # === Construct full block for one class === #

        # Add leading blank row, then column headers, then spacer.
        level_blank = pd.DataFrame([["", "", ""]], columns = group.columns)
        col_headers = pd.DataFrame([["#", "Name", "Club"]], columns = group.columns)
        spacer = pd.DataFrame([["", "", ""]], columns = group.columns)
        full_block = pd.concat([level_blank, col_headers, spacer], ignore_index = True)

        # Append each flight: empty row + header + content.
        for i, flight_df in enumerate(flights, start = 1):
            empty_row = pd.DataFrame([["", "", ""]], columns = group.columns)
            flight_header = pd.DataFrame([[f"Flight {i}", "", ""]], columns = group.columns)
            full_block = pd.concat([full_block, empty_row, flight_header, flight_df], ignore_index = True)

        # Store this block.
        group_blocks.append(full_block)

    # Ensure all blocks are the same height by padding with blank rows.
    max_height = max(len(g) for g in group_blocks)
    for i, block in enumerate(group_blocks):
        group_blocks[i] = block.reindex(range(max_height)).reset_index(drop = True)

    # Add a blank spacer column between each group block.
    spaced_blocks = []
    for i, block in enumerate(group_blocks):
        spaced_blocks.append(block)
        if i < len(group_blocks) - 1:
            spacer_col = pd.DataFrame([""] * max_height, columns = [""])
            spaced_blocks.append(spacer_col)

    # Concatenate all blocks side-by-side into a single data frame.
    final_df = pd.concat(spaced_blocks, axis = 1)
    return final_df, level_titles, group_sizes

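def split_into_flights(df, max_flight_size = 12):
    total = len(df)
    # The 12-per-flight target is a guideline: groups of up to 15 are kept as one flight.
    if total <= 15:
        return [df.copy()]  # If small enough, return as single flight.

    # Determine how many flights are needed (the ceiling of total / max_flight_size).
    n_flights = (total - 1) // max_flight_size + 1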

    # Determine base size of each flight, and how many get an extra row.
    base = total // n_flights
    extras = total % n_flights

    flights = []
    start = 0
    for i in range(n_flights):
        size = base + (1 if i < extras else 0)
        flights.append(df.iloc[start:start+size].copy())
        start += size

    return flights
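The arithmetic in `split_into_flights` can be checked without a data frame. `flight_sizes` is a hypothetical helper that mirrors the function's logic on a plain count; its `single_flight_limit` parameter stands in for the hard-coded 15.

```python
def flight_sizes(total, max_flight_size=12, single_flight_limit=15):
    """Return the flight sizes split_into_flights would produce for `total` competitors."""
    if total <= single_flight_limit:
        return [total]
    # Ceiling of total / max_flight_size.
    n_flights = (total - 1) // max_flight_size + 1
    base, extras = divmod(total, n_flights)
    # The first `extras` flights take one extra competitor, so sizes differ by at most one.
    return [base + (1 if i < extras else 0) for i in range(n_flights)]

for total in (15, 16, 25, 37):
    print(total, flight_sizes(total))
# 15 [15]
# 16 [8, 8]
# 25 [9, 8, 8]
# 37 [10, 9, 9, 9]
```

Spreading the remainder over the first flights keeps them balanced: 25 competitors become 9, 8, and 8 rather than 12, 12, and 1.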

The code below performs the write operations on the Excel workbook itself, including the merged headers, flight boxes, and column widths.

In [16]:
def write_formatted(sheet_name, df_subset, writer, add_plus=False):
    workbook = writer.book

    # Generate the formatted block and get the metadata.
    formatted_df, levels, group_sizes = format_discipline_sheet(df_subset, add_plus)
    if formatted_df.empty:
        return

    # Standardise sheet names.
    stripped_sheet_name = sheet_name.strip()

    # Preview output in console (for debugging).
    if stripped_sheet_name in {"TRA Flights", "DMT Flights", "TUM Flights", "SYN Flights"}:
        print(f"\n==== Preview of formatted_df for {stripped_sheet_name} ====")
        print(formatted_df.head(10).to_string())
        print("==========================================\n")

    # Determine if flight formatting (boxes) should be applied.
    apply_flight_formatting = stripped_sheet_name in {"TRA Flights", "DMT Flights", "TUM Flights", "SYN Flights"}
    print(f"Processing sheet: '{sheet_name}' -> apply_flight_formatting = {apply_flight_formatting}")

    # Start position in Excel worksheet.
    startrow = 3
    startcol = 1

    # Create worksheet and register it.
    worksheet = writer.book.add_worksheet(sheet_name)
    writer.sheets[sheet_name] = worksheet

    # === Define Excel cell formats === #
    level_header_format = workbook.add_format({
        "bold": True, "align": "center", "valign": "vcenter",
        "bg_color": "#D9E1F2", "border": 2
    })

    column_header_format = workbook.add_format({
        "bold": True, "align": "center", "valign": "vcenter", "border": 2
    })

    flight_header_format = workbook.add_format({
        "bold": True, "align": "center", "valign": "vcenter",
        "bg_color": "#D9E1F2", "border": 2
    })

    bright_blue = workbook.add_format({"bg_color": "#D9E1F2"})
    title_format = workbook.add_format({
        "bold": True, "font_size": 16, "align": "center",
        "valign": "vcenter", "bg_color": "#D9E1F2", "border": 2
    })

    title_map = {
        "TRA Flights": "TRAMPOLINE INDIVIDUAL COMPETITORS",
        "DMT Flights": "DOUBLE-MINI TRAMPOLINE COMPETITORS",
        "TUM Flights": "TUMBLING COMPETITORS",
        "SYN Flights": "SYNCHRONISED TRAMPOLINE COMPETITORS"
    }

    # Write all rows except flight headers (which will be merged/formatted later).
    for r in range(formatted_df.shape[0]):
        first_cell = formatted_df.iat[r, 0]
        abs_row = startrow + r

        # Skip flight header rows.
        if isinstance(first_cell, str) and first_cell.strip().startswith("Flight"):
            continue

        for c in range(formatted_df.shape[1]):
            abs_col = startcol + c
            val = formatted_df.iat[r, c]
            worksheet.write(abs_row, abs_col, val if pd.notna(val) else "")

    # === Write title at the top of the sheet === #
    worksheet = writer.sheets[sheet_name]
    sheet_title = title_map.get(stripped_sheet_name, "COMPETITORS")
    total_cols = startcol + formatted_df.shape[1]

    for row in range(2):
        for col in range(startcol, total_cols):
            worksheet.write(row, col, "", bright_blue)

    worksheet.merge_range(0, startcol, 1, total_cols - 1, sheet_title, title_format)

    # === Format each competition level block === #
    col = 0
    level_index = 0
    while level_index < len(levels):
        if col + 2 >= formatted_df.shape[1]:
            break

        abs_col_start = startcol + col
        abs_col_end = abs_col_start + 2

        # Write merged level name header.
        level_row = startrow
        worksheet.merge_range(level_row, abs_col_start, level_row, abs_col_end,
                              levels[level_index], level_header_format)

        # Write column headers ("#", "Name", "Club").
        col_header_row = level_row + 1
        for c in range(abs_col_start, abs_col_end + 1):
            val = formatted_df.iloc[1, col + (c - abs_col_start)]
            worksheet.write(col_header_row, c, val, column_header_format)

        # Walk through rows to find and format flights.
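        row_ptr = startrow + 3
        # formatted_df rows are offset by startrow on the worksheet, so the row limit
        # must be offset too; otherwise the last startrow rows are never formatted.
        max_row = startrow + len(formatted_df)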

        while row_ptr < max_row:
            # Check for flight header across 3 columns.
            flight_cell_text = None
            for offset in range(3):
                check_val = formatted_df.iat[row_ptr - startrow, col + offset]
                if isinstance(check_val, str) and check_val.strip().startswith("Flight"):
                    flight_cell_text = check_val.strip()
                    break

            if apply_flight_formatting and flight_cell_text:
                print(f"Flight header detected at row {row_ptr}: {flight_cell_text}")

                # Write flight header (merged across 3 columns).
                worksheet.merge_range(row_ptr, abs_col_start, row_ptr, abs_col_end,
                                      flight_cell_text, flight_header_format)

                comp_row = row_ptr + 1

                # Write competitors in the flight, adding borders. Stop at the blank row
                # before the next flight, at padding (NaN), or at the next flight header.
                while comp_row < max_row:
                    first_val = formatted_df.iat[comp_row - startrow, col]
                    first_text = "" if pd.isna(first_val) else str(first_val).strip()
                    if first_text == "" or first_text.startswith("Flight"):
                        break

                    for c in range(abs_col_start, abs_col_end + 1):
                        rel_r = comp_row - startrow
                        rel_c = c - startcol
                        val = formatted_df.iat[rel_r, rel_c]

                        # Create and apply borders around the box.
                        border_format = workbook.add_format()
                        if comp_row == row_ptr + 1:
                            border_format.set_top(2)

                        # Determine if this is the last competitor in the flight.
                        is_last_row_in_flight = False
                        if comp_row + 1 >= max_row:
                            is_last_row_in_flight = True
                        else:
                            next_row_vals = [
                                formatted_df.iat[comp_row + 1 - startrow, col + offset]
                                for offset in range(3)
                            ]
                            if all(pd.isna(v) or str(v).strip() == "" or str(v).strip().startswith("Flight")
                                   for v in next_row_vals):
                                is_last_row_in_flight = True

                        if is_last_row_in_flight:
                            border_format.set_bottom(2)
                        if c == abs_col_start:
                            border_format.set_left(2)
                        if c == abs_col_end:
                            border_format.set_right(2)

                        worksheet.write(comp_row, c, val if pd.notna(val) else "", border_format)

                    comp_row += 1

                if comp_row == row_ptr + 1:
                    comp_row += 1  # Skip empty flight header.

                row_ptr = comp_row
            else:
                row_ptr += 1

        level_index += 1
        col += 4  # Move to next level block (3 columns + 1 spacer).

    # === Adjust column widths === #
    for i in range(formatted_df.shape[1]):
        excel_col = startcol + i
        col_data = formatted_df.iloc[:, i].astype(str)

        if col_data.str.strip().eq("").all():
            worksheet.set_column(excel_col, excel_col, 1)  # Blank columns.
        elif formatted_df.columns[i] == "#":
            worksheet.set_column(excel_col, excel_col, 3.5)  # Narrow for number column.
        else:
            max_len = col_data.map(len).max()
            adjusted_width = min(max(5, max_len + 1), 30)
            worksheet.set_column(excel_col, excel_col, adjusted_width)

    worksheet.set_column(0, 0, 1)  # Add padding on far left.

The code below writes the data into the Excel file.

In [None]:
# === Write all four sheets into the final Excel file === #
with pd.ExcelWriter("data/created_files/04_Grouped_Competitors_by_Discipline.xlsx", engine = "xlsxwriter") as writer:
    write_formatted("TRA Flights", tri_df, writer, add_plus = True)
    write_formatted("DMT Flights", dmt_df, writer, add_plus = False)
    write_formatted("TUM Flights", tum_df, writer, add_plus = False)
    write_formatted("SYN Flights", trs_df, writer, add_plus = False)

print("Excel file written successfully.")


==== Preview of formatted_df for TRA Flights ====
          #              Name    Club           #             Name      Club           #               Name  Club           #               Name       Club           #               Name  Club           #                Name        Club           #            Name         Club           #                Name              Club           #             Name              Club           #             Name  Club           #               Name  Club           #             Name      Club
0                                                                                                                                                                                                                                                                                                                                                                                                                                                                              

In [None]:
# === Create a mapping from competitor (Name + ClassName) to their assigned flight number === #
def extract_flight_map(discipline_df):
    flight_map = []  # List to store dictionaries of competitor and flight details.

    # Get the order of unique classes in the discipline.
    class_order = discipline_df["ClassName"].drop_duplicates()

    for class_name in class_order:

        # Extract all competitors for this class.
        group = discipline_df[discipline_df["ClassName"] == class_name].reset_index(drop = True)

        # Insert a column with sequential numbers starting from 1 (used as a display index).
        group.insert(0, "#", range(1, len(group) + 1))

        # Keep only the necessary columns.
        group = group[["#", "Name", "Club"]]

        # Split this class group into appropriate flights.
        flights = split_into_flights(group)

        # For each flight, assign a flight number and store it alongside each competitor.
        for flight_num, flight_df in enumerate(flights, start=1):
            for name in flight_df["Name"]:
                flight_map.append({
                    "Name": name.strip(),        # Remove any surrounding whitespace.
                    "ClassName": class_name,     # Original class name used for matching.
                    "Flight": flight_num         # Assigned flight number.
                })

    # Return the complete flight mapping as a data frame.
    return pd.DataFrame(flight_map)

# === Build a combined flight map across all disciplines === #
flight_df_list = []

# For each discipline, extract its flight map and add it to the list.
flight_df_list.append(extract_flight_map(tri_df))   # Trampoline (TRA).
flight_df_list.append(extract_flight_map(dmt_df))  # Double Mini (DMT).
flight_df_list.append(extract_flight_map(tum_df))  # Tumbling (TUM).
flight_df_list.append(extract_flight_map(trs_df))  # Synchro (SYN).

# Combine all flight maps into a single data frame.
combined_flights = pd.concat(flight_df_list, ignore_index = True)

# === Prepare both data frames (original and combined) for merging by standardising the name and class strings === #
df["Name"] = df["Name"].astype(str).str.strip()
df["ClassName"] = df["ClassName"].astype(str).str.strip()

combined_flights["Name"] = combined_flights["Name"].astype(str).str.strip()
combined_flights["ClassName"] = combined_flights["ClassName"].astype(str).str.strip()

# Ensure the expected "Flight" column exists in the combined flights table.
if "Flight" not in combined_flights.columns:
    raise ValueError("combined_flights must contain a 'Flight' column")

# Rename "Flight" to "flight_Assigned" to avoid a collision with the existing "flight" column.
combined_flights.rename(columns = {"Flight": "flight_Assigned"}, inplace = True)

# === Merge the original data with the assigned flight numbers === #
# This matches rows based on "Name" and "ClassName".
df = df.merge(combined_flights, on = ["Name", "ClassName"], how = "left")

# If the "flight" column doesn't already exist in the original data, create an empty one.
if "flight" not in df.columns:
    df["flight"] = None

# Populate the "flight" column with the new assigned values (if any were found during merge).
df["flight"] = df["flight_Assigned"].fillna(df["flight"])

# Remove the temporary helper column used during merging.
df.drop(columns = ["flight_Assigned"], inplace = True)

# Identify duplicates with the same ID (excluding the first occurrence).
dupes_ID = df[df.duplicated(subset = ["ID", "Name", "Club", "ClassName", "Discipline"], keep = "first")]

# Print the duplicates.
print("Duplicate rows with the same ID to be removed:")
print(dupes_ID)

# Remove duplicate rows.
df = df.drop_duplicates(subset = ["ID","Name", "Club", "ClassName", "Discipline"], keep = "first")

# Identify duplicates (excluding the first occurrence).
dupes_misc = df[df.duplicated(subset = ["Name", "Club", "ClassName", "Discipline"], keep = "first")]
print("Duplicate rows with the same details (other than ID) to be considered for removal:")
print(dupes_misc)

# Save the updated competitor data to a new .csv file.
output_path = "data/created_files/05_Organised_Sample_TrampOnline_Data_with_Flights.csv"
df.to_csv(output_path, index = False, encoding = "utf-8-sig")

print(f"Flight numbers successfully added to {output_path}")

Duplicate rows with the same ID to be removed:
      ID             Name    Club   ClassName  StartOrder Discipline Team  \
2  13001  Antwerp Flazgod     DCU  Novice Men           3        TRI    A   
4  12999  Antwerp Flazgod  Exeter  Novice Men           2        TRI  NaN   

   Team_Category  Guest  flight  photo_consent  
2            NaN  False       1           True  
4            NaN  False       1           True  
Duplicate rows with the same details (other than ID) to be considered for removal:
Empty DataFrame
Columns: [ID, Name, Club, ClassName, StartOrder, Discipline, Team, Team_Category, Guest, flight, photo_consent]
Index: []
Flight numbers successfully added to data/created_files/Organised_Sample_TrampOnline_Data_with_Flights.csv
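The duplicates reported above come from the merge key rather than the data: the two Antwerp Flazgod entries (IDs 13001 and 12999) share a name and class, so each one matches both rows of the flight map. One way to avoid this, sketched here under the assumption that `ID` identifies each entry uniquely (117 unique IDs for 117 rows above), is to record `ID` in `extract_flight_map` and merge on it, letting pandas' `validate` option raise a `MergeError` if the merge is not one-to-one. The frames below are illustrative.

```python
import pandas as pd

# The two entries from the sample data that share a name and class.
entries = pd.DataFrame({
    "ID": [13001, 12999],
    "Name": ["Antwerp Flazgod", "Antwerp Flazgod"],
    "Club": ["DCU", "Exeter"],
    "ClassName": ["Novice Men", "Novice Men"],
})

# A flight map keyed on Name + ClassName, as built by extract_flight_map.
name_map = pd.DataFrame({
    "Name": ["Antwerp Flazgod", "Antwerp Flazgod"],
    "ClassName": ["Novice Men", "Novice Men"],
    "flight_Assigned": [1, 1],
})
by_name = entries.merge(name_map, on=["Name", "ClassName"], how="left")
print(len(by_name))  # 4: every entry matches both map rows.

# Hypothetical flight map keyed on ID instead.
id_map = pd.DataFrame({"ID": [13001, 12999], "flight_Assigned": [1, 1]})
by_id = entries.merge(id_map, on="ID", how="left", validate="one_to_one")
print(len(by_id))    # 2: one row per entry.
```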
