# Trampoline Competition Scheduler

## Breakdown of Data Importing and Processing
### 1. Data Importing

Using pandas, we can read data from any Excel file, including TrampOnline's .xls exports, convert it into a .csv file, and view it. This section deals with the import process. Note that pandas needs the `xlrd` package installed to read legacy .xls files.

In [1]:
# Import required packages.

# Import pandas for data analysis.
import pandas as pd

In [2]:
# Read the TrampOnline .xls Excel file.
df = pd.read_excel("data/TrampOnline_Sample_Competitors.xls", sheet_name = "Sheet1", header = 0)

# Convert the .xls file into a .csv file.
df.to_csv("data/TrampOnline_Sample_Competitors.csv", index = False)

In [3]:
# Read the .csv file with all the entries.
df = pd.read_csv("data/TrampOnline_Sample_Competitors.csv", keep_default_na = True, delimiter = ",", skipinitialspace = True, encoding = "utf-8-sig")

### 2. Data Cleaning

The data should be checked to make sure it has not been corrupted during the export or conversion. This section inspects the data, standardises the column types, and then runs some sanity checks.
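One simple corruption check is to confirm that every expected column survived the export. The sketch below assumes the columns seen in the TrampOnline sample are the ones every export should contain; `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical list of columns the scheduler relies on (taken from the sample export).
EXPECTED_COLUMNS = ("ID", "Name", "Club", "ClassName", "StartOrder", "Discipline",
                    "Team", "Team_Category", "Guest", "flight", "photo_consent")

def missing_columns(frame: pd.DataFrame, expected=EXPECTED_COLUMNS) -> list:
    """Return the expected columns that are absent from the data frame, in order."""
    return [column for column in expected if column not in frame.columns]

# Example: an export that has lost its "Club" and "flight" columns.
sample = pd.DataFrame(columns=[c for c in EXPECTED_COLUMNS if c not in ("Club", "flight")])
print(missing_columns(sample))  # ['Club', 'flight']
```

An empty list means every expected column is present, so the rest of the notebook can safely index those columns by name.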

In [4]:
# Find the shape of the data in the format (number of rows, number of columns).
df.shape

(25, 11)

In [5]:
# Display the first five rows (the index starts at zero).
df.head(5)

|   | ID | Name | Club | ClassName | StartOrder | Discipline | Team | Team_Category | Guest | flight | photo_consent |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10134 | Derbiled Áed | DCU | Novice Women | 20 | TRI | | | 0 | 1.0 | 1 |
| 1 | 10304 | Anwar Fateh | MU | Novice Men | 12 | TRI | | | 0 | 1.0 | 1 |
| 2 | 10403 | Karpos Pankratios | QUB | Intermediate Men | 1 | TRI | | | 0 | 1.0 | 1 |
| 3 | 10999 | Sören Shiva | TCD | Intervanced Men | 45 | TRI | A | | 0 | | 1 |
| 4 | 11568 | Yolotzin Thibaut | UCC | Intermediate Women | 65 | TRI | | | 0 | | 1 |


In [6]:
# Display the last five rows.
df.tail(5)

|    | ID | Name | Club | ClassName | StartOrder | Discipline | Team | Team_Category | Guest | flight | photo_consent |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | 13045 | Shoshanna Assol | UL | Tumbling Level 2 | 99 | TUM | | | 0 | | 1 |
| 21 | 13048 | Akma 2 | TCD | Tumbling Level 4 | 14 | TUM | | | 1 | 2.0 | 1 |
| 22 | 13052 | Echo Longray | TCD | Tumbling Level 3 | 11 | TUM | | | 0 | 2.0 | 1 |
| 23 | 13055 | Antwerp Flazgod | Exeter | Tumbling Level 1 | 5 | TUM | | | 0 | 2.0 | 1 |
| 24 | 13056 | Lirana Willowshade | UCC | Intervanced Women | 37 | TRI | | | 1 | | 1 |


In [7]:
# Check the data type of each column.
df.dtypes

ID                 int64
Name              object
Club              object
ClassName         object
StartOrder         int64
Discipline        object
Team              object
Team_Category    float64
Guest              int64
flight           float64
photo_consent      int64
dtype: object

Based on the above data types, the following modifications should be made.

- **ID:** Perfectly fine as an integer.

- **Name:** Should be converted into a string.

- **Club:** Should be converted into a category.

- **ClassName:** Should be converted into a category.

- **StartOrder:** Perfectly fine as an integer.

- **Discipline:** Should be converted into a category.

- **Team:** Should be converted into a string.

- **Team_Category:** Should be converted into a category.

- **Guest:** Should be converted into a Boolean.

- **flight:** Perfectly fine as a float.

- **photo_consent:** Should be converted into a Boolean.
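The effect of these conversions can be checked on a tiny hand-made frame. This sketch uses made-up rows rather than the real export. It shows that 0/1 integers become the nullable `boolean` type, and that a column such as `flight` is stored as `float64` only because its blanks are read as `NaN`; the nullable `Int64` type is an alternative if whole-number flight numbers are preferred.

```python
import pandas as pd

# Made-up rows mimicking the export: flight is blank for one competitor.
frame = pd.DataFrame({
    "Club": ["DCU", "MU", "DCU"],
    "Guest": [0, 1, 0],
    "flight": [1.0, None, 2.0],
})

frame["Club"] = frame["Club"].astype("category")    # repeated labels -> category
frame["Guest"] = frame["Guest"].astype("boolean")   # 0/1 -> False/True (nullable)
frame["flight"] = frame["flight"].astype("Int64")   # whole numbers that keep <NA>

print(frame.dtypes)
print(frame["Club"].cat.categories.tolist())  # ['DCU', 'MU']
```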

In [8]:
# Change all data fields to the appropriate type.
df["Name"] = df["Name"].astype("string")
df["Club"] = df["Club"].astype("category")
df["ClassName"] = df["ClassName"].astype("category")
df["Discipline"] = df["Discipline"].astype("category")
df["Team"] = df["Team"].astype("string")
df["Team_Category"] = df["Team_Category"].astype("category")
df["Guest"] = df["Guest"].astype("boolean")
df["photo_consent"] = df["photo_consent"].astype("boolean")

#### Sanity Checks

This section checks for duplicate rows and counts the number of unique values in each column.

In [9]:
# Check for duplicate rows.
# If the result is "Empty DataFrame", then there are no fully identical rows.
print(df[df.duplicated()])

Empty DataFrame
Columns: [ID, Name, Club, ClassName, StartOrder, Discipline, Team, Team_Category, Guest, flight, photo_consent]
Index: []
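Because every entry has its own `ID`, `df.duplicated()` only catches rows that are identical in every column. A stricter sketch, assuming that the same competitor entered twice in one class would show up as a repeated `Name`, `Club`, and `ClassName`, checks duplicates on that subset instead. The rows below are placeholders, not real entries; `keep=False` marks every copy rather than only the second one.

```python
import pandas as pd

# Placeholder entries: the same competitor registered twice under different IDs.
entries = pd.DataFrame({
    "ID": [1, 2, 3],
    "Name": ["Competitor A", "Competitor B", "Competitor A"],
    "Club": ["Club X", "Club Y", "Club X"],
    "ClassName": ["Novice Men", "Novice Men", "Novice Men"],
})

# Identical-row check: finds nothing, because the IDs differ.
print(entries.duplicated().any())  # False

# Subset check: flags both copies of the repeated entry.
repeated = entries[entries.duplicated(subset=["Name", "Club", "ClassName"], keep=False)]
print(repeated["ID"].tolist())  # [1, 3]
```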


In [10]:
# Check for the number of unique values in each column. In big competitions, the number of trampoline, DMT, and tumbling competitor categories
# should equal the number of levels plus one to account for anyone not competing.
df.nunique()

ID               25
Name             14
Club              8
ClassName        22
StartOrder       21
Discipline        3
Team              1
Team_Category     0
Guest             2
flight            2
photo_consent     1
dtype: int64

In [11]:
# Save the updated data frame back to CSV.
df.to_csv("data/Updated_Sample_TrampOnline_Data.csv", index = False)
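CSV stores only text, so the types set during cleaning are not saved with the file. The sketch below uses an in-memory buffer instead of the real file and shows that a `category` column comes back as plain text after a round trip, so the conversions need to be re-applied (or passed through `dtype`) whenever the updated CSV is read back.

```python
import io

import pandas as pd

frame = pd.DataFrame({"Club": ["DCU", "UCC"], "Guest": [False, True]})
frame["Club"] = frame["Club"].astype("category")

# Write to an in-memory CSV and read it straight back.
buffer = io.StringIO()
frame.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored.dtypes)  # Club is no longer category; "True"/"False" text is parsed as bool

# Passing dtype restores the category type on read.
buffer.seek(0)
restored_typed = pd.read_csv(buffer, dtype={"Club": "category"})
print(restored_typed.dtypes)
```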

## Competitor Scheduling

This section organises the competitors into their groups.

There are seven trampoline levels to consider: Novice, Intermediate, Intervanced, Advanced, Elite, Elite-Pro, and Disability (any category). Competitors should be separated by level, and then again by category (Men or Women). The only exception to the category rule is *"Disability (any category)"*, which is not usually split into Men and Women categories.

Once this is done, count the number of competitors in each category to decide how many flights are appropriate. A general rule of thumb is 12 competitors per flight, but this is not a hard rule; flights may be larger or smaller.
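The rule of thumb can be turned into a quick calculation. This sketch assumes a target of 12 competitors per flight (`TARGET_FLIGHT_SIZE` and `flight_sizes` are hypothetical names): it rounds the number of flights up, then spreads competitors as evenly as possible, so no flight is more than one competitor larger than another.

```python
import math

TARGET_FLIGHT_SIZE = 12  # rule of thumb, not a hard limit

def flight_sizes(num_competitors: int, target: int = TARGET_FLIGHT_SIZE) -> list[int]:
    """Split competitors into the fewest flights of at most `target`, as evenly as possible."""
    if num_competitors <= 0:
        return []
    num_flights = math.ceil(num_competitors / target)
    base, extra = divmod(num_competitors, num_flights)
    # The first `extra` flights take one additional competitor.
    return [base + 1 if i < extra else base for i in range(num_flights)]

print(flight_sizes(25))  # [9, 8, 8]
print(flight_sizes(12))  # [12]
print(flight_sizes(13))  # [7, 6]
```

Applied per class, the competitor count could come from something like `tri_df.groupby("ClassName", observed=True).size()`.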

### Individual Trampoline (TRI), Double-Mini Trampoline (DMT), Tumbling (TUM), and Synchronised Trampoline (TRS) Level Ordering

The first step is to organise competitors by level. The levels should be put into a specific order (generally from lowest to highest), and then competitors should be split by category (only in TRI).
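The ordering step relies on an ordered categorical: sorting uses each value's position in the `categories` list, not alphabetical order. The sketch below uses a few made-up class names. Its regular expression makes the `Men`/`Women` suffix optional, which is an assumption made so that `"Disability (any category)"` (which has no suffix) still gets a level instead of `NaN`.

```python
import pandas as pd

level_order = ["Novice", "Intermediate", "Intervanced", "Advanced", "Elite", "Elite-Pro",
               "Disability (any category)"]

classes = pd.Series(["Elite Women", "Disability (any category)", "Novice Men",
                     "Advanced Women", "Novice Women"])

# The level is everything before an optional trailing " Men" or " Women".
parts = classes.str.extract(r"^(.*?)(?:\s+(Men|Women))?$")
parts.columns = ["level", "category"]
parts["level"] = pd.Categorical(parts["level"], categories=level_order, ordered=True)

# Sort by level position first, then alphabetically by category (Men before Women).
ordered = parts.assign(ClassName=classes).sort_values(["level", "category"])
print(ordered["ClassName"].tolist())
```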

In [12]:
# Define the order of individual trampoline (TRI) levels.
trampoline_level_order = ["Novice", "Intermediate", "Intervanced", "Advanced", "Elite", "Elite-Pro", "Disability (any category)"]

# Define the order of double-mini-trampoline (DMT) levels.
dmt_level_order = ["DMT Level 1", "DMT Level 2", "DMT Level 3", "DMT Level 4", "DMT Level 5", "DMT Level 6"]

# Define the order of tumbling (TUM) levels.
tum_level_order = ["Tumbling Level 1", "Tumbling Level 2", "Tumbling Level 3", "Tumbling Level 4"]

# Define the order of synchronised trampoline (TRS) levels.
trs_level_order = ["Lower Synchro", "Higher Synchro", "Synchro Level 1", "Synchro Level 2", "Synchro Level 3"]

# ---------------------- TRI PROCESSING ---------------------- #

# Filter for individual trampoline (TRI) competitors.
tri_df = df[df["Discipline"] == "TRI"].copy()

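# Split "ClassName" into level and category. The " Men"/" Women" suffix is optional so that
# "Disability (any category)" still gets a level (its category is left empty).
tri_df[["tri_level", "tri_category"]] = tri_df["ClassName"].str.extract(r"^(.*?)(?:\s+(Men|Women))?$")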

# Convert level into an ordered categorical.
tri_df["tri_level"] = pd.Categorical(tri_df["tri_level"], categories = trampoline_level_order, ordered = True)

# Sort by level and category.
tri_df = tri_df.sort_values(by = ["tri_level", "tri_category", "Name", "Club"]).reset_index(drop = True)

# Remove the "tri_level" and "tri_category" columns.
tri_df = tri_df.drop(columns = ["tri_level", "tri_category"])

# ---------------------- DMT PROCESSING ---------------------- #

# Filter for double-mini trampoline (DMT) competitors.
dmt_df = df[df["Discipline"] == "DMT"].copy()

# Extract DMT level directly from ClassName.
dmt_df["dmt_level"] = pd.Categorical(dmt_df["ClassName"], categories = dmt_level_order, ordered = True)

# Sort by DMT level, then name and club.
dmt_df = dmt_df.sort_values(by = ["dmt_level", "Name", "Club"]).reset_index(drop = True)

# Remove the "dmt_level" column.
dmt_df = dmt_df.drop(columns = ["dmt_level"])

# ---------------------- TUM PROCESSING ---------------------- #

# Filter for tumbling (TUM) competitors.
tum_df = df[df["Discipline"] == "TUM"].copy()

# Extract TUM level directly from ClassName.
tum_df["tum_level"] = pd.Categorical(tum_df["ClassName"], categories = tum_level_order, ordered = True)

# Sort by TUM level, then name and club.
tum_df = tum_df.sort_values(by = ["tum_level", "Name", "Club"]).reset_index(drop = True)

# Remove the "tum_level" column.
tum_df = tum_df.drop(columns = ["tum_level"])

# ---------------------- TRS PROCESSING ---------------------- #

# Filter for synchronised trampoline (TRS) competitors.
trs_df = df[df["Discipline"] == "TRS"].copy()

# Extract TRS level directly from ClassName.
trs_df["trs_level"] = pd.Categorical(trs_df["ClassName"], categories = trs_level_order, ordered = True)

# Sort by TRS level, then name and club.
trs_df = trs_df.sort_values(by = ["trs_level", "Name", "Club"]).reset_index(drop = True)

# Remove the "trs_level" column.
trs_df = trs_df.drop(columns = ["trs_level"])

# ---------------------- MERGE AND OUTPUT ---------------------- #

# Get all other competitors, in case there are any not accounted for (not TRI, DMT, TUM, or TRS).
other_df = df[~df["Discipline"].isin(["TRI", "DMT", "TUM", "TRS"])]

# Combine all the disciplines in the defined order.
df = pd.concat([tri_df, dmt_df, tum_df, trs_df, other_df], ignore_index = True)

# Print the sorted dataset.
print(df)

       ID                Name    Club           ClassName  StartOrder  \
0   13001     Antwerp Flazgod     DCU          Novice Men           3   
1   12999     Antwerp Flazgod  Exeter          Novice Men           2   
2   10304         Anwar Fateh      MU          Novice Men          12   
3   10134        Derbiled Áed     DCU        Novice Women          20   
4   10403   Karpos Pankratios     QUB    Intermediate Men           1   
5   11568    Yolotzin Thibaut     UCC  Intermediate Women          65   
6   10999         Sören Shiva     TCD     Intervanced Men          45   
7   11900      Brianne Hagano      UL   Intervanced Women          76   
8   13056  Lirana Willowshade     UCC   Intervanced Women          37   
9   11783         Aäron Óscar     UCD        Advanced Men          42   
10  12483   Désirée Godofredo     TCD      Advanced Women          10   
11  12702     Bjoern Pierrick     UCC           Elite Men          23   
12  12881     Shoshanna Assol      UL         Elite
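One risk in the sorting code is that `pd.Categorical` quietly turns any value missing from `categories` into `NaN`, so a misspelt or new class name would be sorted to the end without warning. A sketch of a check, assuming the level lists are meant to be complete, is to list the class names that are not in them. The shortened DMT list and the misspelt entry below are made up for illustration.

```python
import pandas as pd

# Shortened version of a level list; the real lists are defined in the sorting cell.
dmt_levels = ["DMT Level 1", "DMT Level 2", "DMT Level 3"]

competitors = pd.DataFrame({"ClassName": ["DMT Level 1", "DMT level 2", "DMT Level 3"]})

levels = pd.Categorical(competitors["ClassName"], categories=dmt_levels, ordered=True)
print(pd.isna(levels).tolist())  # [False, True, False]: the misspelt name became NaN

unknown = sorted(set(competitors["ClassName"]) - set(dmt_levels))
print(unknown)  # ['DMT level 2']
```

For TRI, the equivalent is to look at the rows where `tri_level` is `NaN` before that column is dropped.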

The code below exports the sorted competitors to an Excel (.xlsx) workbook, with one sheet per discipline. Hold off on adjusting this code until there is working code to sort all TRI, DMT, TUM, and TRS competitors.

Each discipline should go into its own sheet:

- **TRI:** a sheet called "TRA Flights".

- **DMT:** a sheet called "DMT Flights".

- **TUM:** a sheet called "TUM Flights".

- **TRS:** a sheet called "TRS Flights".

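If the level names appear below the `#`, `Name`, and `Club` headings, the cause is that `to_excel` writes the data frame's own column labels as the first row by default, above the level-name and header rows that `format_discipline_sheet` already builds. Passing `header=False` to `to_excel` prevents this.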
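The column labels that pandas writes by default are what push the level names down a row. The sketch below shows the difference without creating a workbook by writing a small made-up block, laid out like the output of `format_discipline_sheet`, as CSV text; `to_excel` accepts the same `header` and `index` arguments.

```python
import pandas as pd

# A made-up block: level-name row, header row, then one competitor.
block = pd.DataFrame(
    [["Novice Men", "", ""], ["#", "Name", "Club"], [1, "Competitor A", "DCU"]],
    columns=["#", "Name", "Club"],
)

with_labels = block.to_csv(index=False)                  # default header=True
without_labels = block.to_csv(index=False, header=False)

print(with_labels.splitlines()[0])     # '#,Name,Club': column labels sit above the level
print(without_labels.splitlines()[0])  # 'Novice Men,,': the level name is the first row
```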

In [25]:
def format_discipline_sheet(df_subset):
    if df_subset.empty:
        return pd.DataFrame()

    class_order = df_subset["ClassName"].drop_duplicates()
    groups = []

    for class_name in class_order:
        group = df_subset[df_subset["ClassName"] == class_name].reset_index(drop=True)

        # Add number column
        group.insert(0, "#", range(1, len(group) + 1))

        # Select only necessary columns
        group = group[["#", "Name", "Club"]]

        # Create the first row with the level name in the first column (the other two cells are blank)
        header_level = pd.DataFrame([[class_name, "", ""]], columns=group.columns)

        # Create the second row with standard column headers
        header_columns = pd.DataFrame([["#", "Name", "Club"]], columns=group.columns)

        # Combine header rows and data
        group_with_headers = pd.concat([header_level, header_columns, group], ignore_index=True)

        groups.append(group_with_headers)

    if not groups:
        return pd.DataFrame()

    # Pad each group to the same height
    max_len = max(len(g) for g in groups)
    groups = [g.reindex(range(max_len)) for g in groups]

    # Add spacers between groups
    with_spacers = []
    for group in groups:
        with_spacers.append(group)
        with_spacers.append(pd.DataFrame(columns=[""]))  # Spacer

    return pd.concat(with_spacers[:-1], axis=1)

# ---------------------- FORMAT EACH DISCIPLINE ---------------------- #

# TRA = TRI (Trampoline)
tra_sheet = format_discipline_sheet(tri_df)

# DMT
dmt_sheet = format_discipline_sheet(dmt_df)

# TUM
tum_sheet = format_discipline_sheet(tum_df)

# SYN = TRS (Synchro)
syn_sheet = format_discipline_sheet(trs_df)

# ---------------------- WRITE TO EXCEL ---------------------- #

with pd.ExcelWriter("data/Grouped_Competitors_by_Discipline.xlsx", engine="xlsxwriter") as writer:
    # header=False: each sheet already contains its level-name and "#, Name, Club" rows,
    # so the data frame's own column labels must not be written above them.
    if not tra_sheet.empty:
        tra_sheet.to_excel(writer, sheet_name="TRA Flights", index=False, header=False)
    if not dmt_sheet.empty:
        dmt_sheet.to_excel(writer, sheet_name="DMT Flights", index=False, header=False)
    if not tum_sheet.empty:
        tum_sheet.to_excel(writer, sheet_name="TUM Flights", index=False, header=False)
    if not syn_sheet.empty:
        syn_sheet.to_excel(writer, sheet_name="TRS Flights", index=False, header=False)


print("Excel file 'Grouped_Competitors_by_Discipline.xlsx' successfully created.")

Excel file 'Grouped_Competitors_by_Discipline.xlsx' successfully created.
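The side-by-side layout in `format_discipline_sheet` depends on two steps: `reindex` pads shorter groups with blank rows so every block has the same height, and `pd.concat(..., axis=1)` places the blocks next to each other with an empty spacer column between them. A reduced sketch with two made-up groups:

```python
import pandas as pd

group_a = pd.DataFrame({"#": [1, 2, 3], "Name": ["A1", "A2", "A3"]})
group_b = pd.DataFrame({"#": [1], "Name": ["B1"]})

# Pad every group to the height of the tallest one (missing rows become NaN).
height = max(len(group_a), len(group_b))
padded = [g.reindex(range(height)) for g in (group_a, group_b)]

# An empty single-column frame acts as a spacer; concat fills it with NaN.
spacer = pd.DataFrame(columns=[""])
sheet = pd.concat([padded[0], spacer, padded[1]], axis=1)

print(sheet.shape)  # (3, 5)
```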


#### Exporting to a Stylised Excel (.xlsx) File

The following code takes the grouped competitor data and writes it to an Excel spreadsheet suitable for human use, styling the level headers to make them more presentable.

In [14]:
# Function to highlight level-name cells with a light blue background and bold text.
def highlight_headers(val):
    """Return CSS that highlights a non-empty cell. Applied only to the level-name row."""
    if isinstance(val, str) and val.strip() != "":
        return "background-color: lightblue; font-weight: bold"
    return ""

# Map each sheet name to the data frame built by format_discipline_sheet() above.
sheets = {
    "TRA Flights": tra_sheet,
    "DMT Flights": dmt_sheet,
    "TUM Flights": tum_sheet,
    "TRS Flights": syn_sheet,
}

with pd.ExcelWriter("data/Grouped_by_Level_and_Category.xlsx", engine = "xlsxwriter") as writer:
    for sheet_name, sheet in sheets.items():
        if sheet.empty:
            continue

        # Styler needs unique column labels, but every group repeats "#", "Name", and "Club".
        sheet = sheet.copy()
        sheet.columns = [f"col{i}" for i in range(sheet.shape[1])]

        # Style only row 0, which holds the level names.
        # Styler.map needs pandas 2.1+; use Styler.applymap on older versions.
        styled = sheet.style.map(highlight_headers, subset = pd.IndexSlice[0, :])

        # header = False stops pandas writing the placeholder column labels above the level names.
        styled.to_excel(writer, sheet_name = sheet_name, index = False, header = False)