# Trampoline Competition Scheduler

## Breakdown of Data Importing and Processing
### 1. Data Importing

Using pandas, we can import data from Excel files, including TrampOnline's .xls exports, convert it to a .csv file, and inspect it. This section covers the import process.

In [1]:
# Import required packages.

# Import pandas for data analysis.
import pandas as pd

In [2]:
# Read the TrampOnline .xls Excel file (pandas needs the optional xlrd package to read .xls files).
df = pd.read_excel("data/TrampOnline_Sample_Competitors.xls", sheet_name = "Sheet1", header = 0)

# Convert the .xls file into a .csv file.
df.to_csv("data/TrampOnline_Sample_Competitors.csv", index = False)

In [3]:
# Read the .csv file with all the entries.
df = pd.read_csv("data/TrampOnline_Sample_Competitors.csv", keep_default_na = True, delimiter = ",", skipinitialspace = True, encoding = "utf-8-sig")
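The `skipinitialspace` and `keep_default_na` arguments above matter when an export has spaces after the commas or empty fields. A minimal sketch, using a small hypothetical CSV string with names taken from the sample data, of what they change:

```python
import io

import pandas as pd

# Hypothetical CSV text: a space after each comma and an empty "Team" field in the first row.
raw = "ID, Name, Team\n10304, Anwar Fateh,\n12483, Désirée Godofredo, A\n"

# Default parsing keeps the leading spaces in headers and values.
loose = pd.read_csv(io.StringIO(raw))

# skipinitialspace strips the space after each delimiter; keep_default_na turns empty fields into NaN.
tidy = pd.read_csv(io.StringIO(raw), skipinitialspace=True, keep_default_na=True)

print(loose.columns.tolist())  # ['ID', ' Name', ' Team']
print(tidy.columns.tolist())   # ['ID', 'Name', 'Team']
print(tidy["Team"].isna().tolist())  # [True, False]
```

Without `skipinitialspace`, code such as `df["Name"]` would fail because the column is actually called `" Name"`.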

### 2. Data Cleaning

The data should be checked to make sure it has not been corrupted during import. This section performs some sanity checks on the data and then standardises the column types.

In [4]:
# Find the shape of the data in the format (number of rows, number of columns).
df.shape

(12, 11)

In [5]:
# Display the first five rows, starting at index zero.
df.head(5)

|   | ID | Name | Club | ClassName | StartOrder | Discipline | Team | Team_Category | Guest | flight | photo_consent |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10134 | Derbiled Áed | DCU | Novice Women | 20 | TRI |  |  | 0 | 1.0 | 1 |
| 1 | 10304 | Anwar Fateh | MU | Novice Men | 12 | TRI |  |  | 0 | 1.0 | 1 |
| 2 | 10403 | Karpos Pankratios | QUB | Intermediate Men | 1 | TRI |  |  | 0 | 1.0 | 1 |
| 3 | 10999 | Sören Shiva | TCD | Intervanced Men | 45 | TRI | A |  | 0 |  | 1 |
| 4 | 11568 | Yolotzin Thibaut | UCC | Intermediate Women | 65 | TRI |  |  | 0 |  | 1 |


In [6]:
# Display the last five rows.
df.tail(5)

|   | ID | Name | Club | ClassName | StartOrder | Discipline | Team | Team_Category | Guest | flight | photo_consent |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | 12483 | Désirée Godofredo | TCD | Advanced Women | 10 | TRI | A |  | 0 | 1.0 | 1 |
| 8 | 12702 | Bjoern Pierrick | UCC | Elite Men | 23 | TRI |  |  | 0 |  | 1 |
| 9 | 12881 | Shoshanna Assol | UL | Elite Women | 32 | TRI |  |  | 0 | 1.0 | 1 |
| 10 | 12969 | Akma 2 | TCD | Elite-Pro Men | 69 | TRI | A |  | 1 | 1.0 | 1 |
| 11 | 12977 | Echo Longray | TCD | Elite-Pro Women | 99 | TRI | A |  | 0 | 1.0 | 1 |


In [7]:
# Check the data type of each column.
df.dtypes

ID                 int64
Name              object
Club              object
ClassName         object
StartOrder         int64
Discipline        object
Team              object
Team_Category    float64
Guest              int64
flight           float64
photo_consent      int64
dtype: object

Based on the above data types, the following modifications should be made.

- **ID:** Perfectly fine as an integer.

- **Name:** Should be converted into a string.

- **Club:** Should be converted into a category.

- **ClassName:** Should be converted into a category.

- **StartOrder:** Perfectly fine as an integer.

- **Discipline:** Should be converted into a category.

- **Team:** Should be converted into a string.

- **Team_Category:** Should be converted into a category. It is read as a float only because every value is empty in this sample.

- **Guest:** Should be converted into a Boolean.

- **flight:** Fine as a float. It is read as a float rather than an integer because some values are missing.

- **photo_consent:** Should be converted into a Boolean.

In [None]:
# Change all data fields to the appropriate type.
df["Name"] = df["Name"].astype("string")
df["Club"] = df["Club"].astype("category")
df["ClassName"] = df["ClassName"].astype("category")
df["Discipline"] = df["Discipline"].astype("category")
df["Team"] = df["Team"].astype("string")
df["Team_Category"] = df["Team_Category"].astype("category")
df["Guest"] = df["Guest"].astype("boolean")
df["photo_consent"] = df["photo_consent"].astype("boolean")
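The same conversions can also be written as a single column-to-type mapping passed to `DataFrame.astype`, which keeps the list of target types in one place. A minimal sketch, assuming a small stand-in frame built from two of the sample rows shown above:

```python
import pandas as pd

# A two-row stand-in built from the sample rows shown above.
df = pd.DataFrame({
    "ID": [10134, 10304],
    "Name": ["Derbiled Áed", "Anwar Fateh"],
    "Club": ["DCU", "MU"],
    "ClassName": ["Novice Women", "Novice Men"],
    "Discipline": ["TRI", "TRI"],
    "Team": [None, None],
    "Team_Category": [float("nan"), float("nan")],
    "Guest": [0, 0],
    "photo_consent": [1, 1],
})

# One mapping of column name to target dtype, matching the list of modifications above.
target_types = {
    "Name": "string",
    "Club": "category",
    "ClassName": "category",
    "Discipline": "category",
    "Team": "string",
    "Team_Category": "category",
    "Guest": "boolean",
    "photo_consent": "boolean",
}

# astype accepts a dict and converts only the listed columns.
df = df.astype(target_types)
print(df.dtypes)
```

Keeping the mapping in a variable also means it can be reused when the data is read back from CSV later.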

#### Sanity Checks

This section checks for duplicate rows and the number of unique values in each column.

In [11]:
# Check for duplicate rows.
# If the result is an empty DataFrame, then there are no duplicates.
print(df[df.duplicated()])

Empty DataFrame
Columns: [ID, Name, Club, ClassName, StartOrder, Discipline, Team, Team_Category, Guest, flight, photo_consent]
Index: []
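`df.duplicated()` only flags rows that are identical in every column, so a competitor entered twice with, for example, a different start order would not be caught. A minimal sketch, assuming `ID` should be unique per competitor and using hypothetical entries, of checking a key column instead:

```python
import pandas as pd

# Hypothetical entries: the same competitor ID appears twice with different start orders,
# so the rows are not identical and df.duplicated() alone would not flag them.
entries = pd.DataFrame({
    "ID": [10134, 10304, 10134],
    "Name": ["Derbiled Áed", "Anwar Fateh", "Derbiled Áed"],
    "StartOrder": [20, 12, 21],
})

# Whole-row check, as in the cell above.
whole_row_dupes = entries[entries.duplicated()]

# Key-column check; keep=False marks every occurrence, not just the repeats.
id_dupes = entries[entries.duplicated(subset=["ID"], keep=False)]

print(len(whole_row_dupes))  # 0
print(id_dupes)
```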


In [12]:
# Check the number of unique values in each column. In larger competitions with trampoline, DMT, and tumbling entries, each discipline's level column
# should have one more unique value than there are levels, to account for competitors not entered in that discipline.
df.nunique()

ID               12
Name             12
Club              7
ClassName        12
StartOrder       12
Discipline        1
Team              1
Team_Category     0
Guest             2
flight            1
photo_consent     1
dtype: int64
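Counting unique values only compares totals, so a misspelt level (for example "Intervaced") would still be counted as a valid value. A minimal sketch, assuming a hypothetical column of trampoline levels, of comparing the actual values against the expected list used later in this notebook:

```python
import pandas as pd

# The six trampoline levels, plus "Not competing".
trampoline_level_order = ["Novice", "Intermediate", "Intervanced", "Advanced", "Elite", "Elite-Pro", "Not competing"]

# Hypothetical per-competitor levels, including one misspelling.
levels = pd.Series(["Novice", "Elite", "Not competing", "Intervaced", "Advanced"])

observed = set(levels.dropna().unique())
unknown = observed - set(trampoline_level_order)   # Values that are not valid levels.
missing = set(trampoline_level_order) - observed   # Levels with no entrants.

print(sorted(unknown))  # ['Intervaced']
print(sorted(missing))  # ['Elite-Pro', 'Intermediate', 'Intervanced']
```

Unknown values point to data errors, whereas missing levels may simply mean nobody entered them in a small competition.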

In [13]:
# Save the updated data frame back to CSV.
df.to_csv("Updated_Sample_TrampOnline_Data.csv", index = False)
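CSV stores plain text, so the `category` and `boolean` types set above are not kept in the saved file; reading it back gives pandas' default types. A minimal sketch, using a small in-memory frame rather than the real file, of re-applying the types through `read_csv`'s `dtype` parameter:

```python
import io

import pandas as pd

# Small stand-in frame with the converted types.
df = pd.DataFrame({
    "Club": pd.Series(["DCU", "MU", "DCU"], dtype="category"),
    "Guest": pd.Series([0, 1, 0]).astype("boolean"),
})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Reading back without dtype: "Club" is no longer a category.
buffer.seek(0)
plain = pd.read_csv(buffer)

# Reading back with dtype restores the intended types.
buffer.seek(0)
typed = pd.read_csv(buffer, dtype={"Club": "category", "Guest": "boolean"})

print(plain.dtypes)
print(typed.dtypes)
```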

## Competitor Scheduling

This section organises the competitors into their groups.

There are six trampoline levels to consider: Novice, Intermediate, Intervanced, Advanced, Elite, and Elite-Pro. Competitors should first be separated by level, and then by category (men or women).

Once this is done, check how many competitors are in each group and decide how many flights are appropriate. A general rule of thumb is 12 competitors per flight, but this isn't a hard rule; a flight may have more or fewer.
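The flight decision above can be sketched as: take the number of flights as the group size divided by 12, rounded up, then share the competitors as evenly as possible between them. A minimal sketch; `split_into_flights` and `FLIGHT_SIZE` are hypothetical names, and the even split is one possible choice rather than a rule from the source:

```python
import math

import numpy as np
import pandas as pd

FLIGHT_SIZE = 12  # Rule-of-thumb target, not a hard limit.


def split_into_flights(group, target=FLIGHT_SIZE):
    """Hypothetical helper: split one level/category group into roughly equal flights."""
    n_flights = max(1, math.ceil(len(group) / target))
    # np.array_split keeps flight sizes within one competitor of each other.
    positions = np.array_split(np.arange(len(group)), n_flights)
    return [group.iloc[p] for p in positions]


# Hypothetical group of 26 competitors.
group = pd.DataFrame({"StartOrder": range(1, 27)})
flights = split_into_flights(group)
print([len(f) for f in flights])  # [9, 9, 8]
```

Cutting strictly at 12 would give flights of 12, 12, and 2, leaving one very small flight; the even split avoids that.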

### Trampoline Level Ordering

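The first step is to organise competitors by level. The levels are put into a fixed order, and competitors are then split by category. The code below assumes the data frame has a `tra_competitor` column (the trampoline level) and an `is_female` column; neither appears in the sample file shown earlier, where level and category are combined in `ClassName`.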

In [None]:
# Define the order of trampoline levels.
trampoline_level_order = ["Novice", "Intermediate", "Intervanced", "Advanced", "Elite", "Elite-Pro", "Not competing"]

# Set "tra_competitor" as an ordered categorical variable.
df["tra_competitor"] = pd.Categorical(df["tra_competitor"], categories = trampoline_level_order, ordered = True)

# Sort by "tra_competitor" and "is_female".
df = df.sort_values(["tra_competitor", "is_female"])
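If the data only has `ClassName` and `Name`, as in the sample file, the columns used here could be derived from them. A minimal sketch, assuming `ClassName` is always "<level> Men" or "<level> Women" and that the first word of `Name` is the first name; these derivations are assumptions, not part of the original workflow:

```python
import pandas as pd

trampoline_level_order = ["Novice", "Intermediate", "Intervanced", "Advanced", "Elite", "Elite-Pro", "Not competing"]

# Rows taken from the sample export shown earlier.
df = pd.DataFrame({
    "Name": ["Shoshanna Assol", "Anwar Fateh", "Derbiled Áed", "Bjoern Pierrick"],
    "ClassName": ["Elite Women", "Novice Men", "Novice Women", "Elite Men"],
})

# Assumption: the last word of ClassName is the category, everything before it is the level.
class_parts = df["ClassName"].str.rsplit(" ", n=1, expand=True)
df["tra_competitor"] = pd.Categorical(class_parts[0], categories=trampoline_level_order, ordered=True)
df["is_female"] = class_parts[1].eq("Women")

# Assumption: the first word of Name is the first name, the rest is the surname.
name_parts = df["Name"].str.split(" ", n=1, expand=True)
df["first_name"] = name_parts[0]
df["surname"] = name_parts[1]

# Same sort as above: level order first, then men (False) before women (True).
df = df.sort_values(["tra_competitor", "is_female"])
print(df[["first_name", "surname", "tra_competitor", "is_female"]])
```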

In [None]:
# Group by both level and category.
groups = [
    group.reset_index(drop = True)[["first_name", "surname", "tra_competitor", "is_female"]] # reset_index(drop = True) renumbers each group's rows from 0 so the groups line up when placed side by side.
    for _, group in df.groupby(["tra_competitor", "is_female"]) # The underscore is a Python convention for an unused variable; here it discards the group key.
]

# Pad all groups to the same row count.
max_len = max(len(g) for g in groups) # Find the largest number of rows in any group.
groups = [g.reindex(range(max_len)) for g in groups] # Give every group the same number of rows, filling the extra rows with NaN.

# Add a blank spacer column between each group.
with_spacers = [] # The groups will be stored here, each followed by an empty spacer column.
for group in groups:
    with_spacers.append(group) # Add the group to the list.
    with_spacers.append(pd.DataFrame(columns = [""])) # Add a blank spacer column after the group.

# Remove the last spacer.
with_spacers = with_spacers[:-1] # Slice up to, but not including, the last element.

# Concatenate all the data frames into one side-by-side.
competitor_df = pd.concat(with_spacers, axis = 1) # "with_spacers" is the list containing all the data frames, and axis = 1 specifies horizontal concatenation.

# Export to a .csv file.
competitor_df.to_csv("Grouped_by_Level_and_Category.csv", index = False)
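`pd.concat(..., axis = 1)` aligns rows by index label, which is why each group's index is reset before concatenation. A minimal sketch with two hypothetical groups that still carry their row labels from the original frame:

```python
import pandas as pd

# Two hypothetical groups that keep their row labels from the original frame.
men = pd.DataFrame({"Name": ["Anwar Fateh", "Bjoern Pierrick"]}, index=[1, 8])
women = pd.DataFrame({"Name": ["Derbiled Áed"]}, index=[0])

# Without resetting, concat(axis=1) aligns on the old labels (0, 1, 8) and staggers the rows.
staggered = pd.concat([men, women], axis=1)

# After reset_index(drop=True), both groups start at row 0 and line up.
aligned = pd.concat([men.reset_index(drop=True), women.reset_index(drop=True)], axis=1)

print(staggered.shape)  # (3, 2)
print(aligned.shape)    # (2, 2)
```

The aligned result also shows that the default outer join fills the shorter group with NaN.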

#### Exporting to a Stylised Excel (.xlsx) File

The following code takes the created data frame and turns it into an Excel spreadsheet that is easier for people to read. It also styles the group headers so they stand out.

In [None]:
# Calculate number of groups (unique combinations of level and category).
num_groups = len(groups)

# Build header labels to match competitor data frame.
header_labels = []
for (level, is_female), _ in df.groupby(["tra_competitor", "is_female"]): # Group by level and category.
    label = f"{level} {'Ladies +' if is_female else 'Men +'}" # Title the header with the level, followed by category.
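    header_labels.extend([label, "", "", "", ""])  # The label goes in the group's first column; its other three columns and the spacer after it stay blank.

# Trim to the number of columns, which drops the trailing spacer entry (the last group has no spacer).
header_labels = header_labels[:competitor_df.shape[1]]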

# Give every column a unique name first, so pandas can style and export the frame and the header row is not set on duplicate column labels.
competitor_df.columns = [f"col{i}" for i in range(competitor_df.shape[1])]

# Insert header row at the top.
competitor_df.loc[-1] = header_labels # Inserts a row at index -1, so that it is above all of the competitor data.
competitor_df = competitor_df.sort_index() # Sorts the rows, ensuring that the new header row is at the top.
competitor_df = competitor_df.reset_index(drop = True)  # Resets the index numbers, so index -1 becomes 0 and so on.

# Function to highlight the background of the header row in a light blue tone and make the text bold.
def highlight_headers(val):
    """Function that highlights the heading of a group with a light blue background and makes the text bold.
    Assumes that the content is a string, and has "Ladies +", or "Men +" in it."""
    if isinstance(val, str) and ("Ladies +" in val or "Men +" in val):
        return "background-color: lightblue; font-weight: bold"
    return ""

# Apply the function to each cell of the header row only (row label 0, all columns).
# Styler.map replaced Styler.applymap in pandas 2.1; on older versions, call applymap instead.
styled = competitor_df.style.map(highlight_headers, subset = pd.IndexSlice[0, :])

# Export the styled data frame to an Excel (.xlsx) file (requires an Excel writer engine such as openpyxl).
styled.to_excel("Grouped_by_Level_and_Category.xlsx", index = False, header = False)
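The header row above hard-codes four blank entries per group, which assumes exactly four displayed columns plus one spacer; if the selected columns change, the labels drift out of line. A minimal sketch of deriving the blanks from each group's width instead; `build_header_labels` is a hypothetical helper, and `keys` would be the `(level, is_female)` tuples collected while grouping:

```python
import pandas as pd


def build_header_labels(keys, groups, spacer_width=1):
    """Hypothetical helper: one label per group in its first column, blanks elsewhere.

    keys:   (level, is_female) tuples, in the same order as groups.
    groups: the per-group data frames that are concatenated side by side.
    """
    labels = []
    for (level, is_female), group in zip(keys, groups):
        labels.append(f"{level} {'Ladies +' if is_female else 'Men +'}")
        # Blank cells for the group's remaining columns plus the spacer after it.
        labels.extend([""] * (group.shape[1] - 1 + spacer_width))
    # The last group has no spacer after it, so drop its trailing blanks.
    return labels[: len(labels) - spacer_width]


cols = ["first_name", "surname", "tra_competitor", "is_female"]
groups = [pd.DataFrame(columns=cols), pd.DataFrame(columns=cols)]
keys = [("Novice", False), ("Novice", True)]

labels = build_header_labels(keys, groups)
print(len(labels))            # 9 (4 columns + 1 spacer + 4 columns)
print(labels[0], labels[5])   # Novice Men + Novice Ladies +
```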