# Trampoline Competition Scheduler

## Information
### Competitors
- First name of competitor
    - Stored as a string.
- Surname of competitor
    - Stored as a string.
- Preferred category of competitor
    - Stored as a boolean.
        - True: Female
        - False: Male
- Trampoline (TRA) competitor
    - Competing?
        - True or False.
    - Stored as an enumeration?
        - 1: Novice
        - 2: Intermediate
        - 3: Intervanced
        - 4: Advanced
        - 5: Elite
        - 6: Elite-Pro
- Double-mini trampoline (DMT) competitor
    - Competing?
        - True or False.
    - Stored as an int.
        - 1: Level 1
        - 2: Level 2
        - 3: Level 3
        - 4: Level 4
        - 5: Level 5
        - 6: Level 6
- Tumbling (TUM) competitor
    - Competing?
        - True or False.
    - Stored as an int.
        - 1: Level 1
        - 2: Level 2
        - 3: Level 3
        - 4: Level 4
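
The fields above could be modelled as a small record type. The following is a minimal sketch, not the notebook's actual storage: the names `TrampolineLevel` and `Competitor` are hypothetical, and using `None` for "not competing" is an assumption that replaces the separate True/False flag.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class TrampolineLevel(IntEnum):
    """Trampoline (TRA) levels, numbered as in the list above."""
    NOVICE = 1
    INTERMEDIATE = 2
    INTERVANCED = 3
    ADVANCED = 4
    ELITE = 5
    ELITE_PRO = 6


@dataclass
class Competitor:
    """Hypothetical record for one competitor."""
    first_name: str
    surname: str
    is_female: bool  # True: Female, False: Male
    tra_level: Optional[TrampolineLevel] = None  # Assumption: None means not competing in TRA.
    dmt_level: Optional[int] = None  # DMT level as an int, None means not competing in DMT.
    tum_level: Optional[int] = None  # TUM level as an int, None means not competing in TUM.


# Example entry: a male Novice who also competes at DMT Level 1 and TUM Level 1.
bilbo = Competitor("Bilbo", "Staple", is_female=False,
                   tra_level=TrampolineLevel.NOVICE, dmt_level=1, tum_level=1)
print(bilbo.tra_level.name, int(bilbo.tra_level))
```

An `IntEnum` keeps the level names readable while still allowing levels to be compared and sorted by their numbers.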


### Judges
- Level of trampoline (TRA) judging qualification
- Chairperson


## Breakdown of Data Importing and Processing
### 1. Data Importing

Via Pandas, we can import data from a .csv file and view it. This section deals with the import process.

In [1]:
# Import required packages.

# Import pandas for data analysis.
import pandas as pd

In [2]:
# Read the .csv file with all the entries.
df = pd.read_csv("Sample_Competitor_Data.csv", keep_default_na = True, delimiter = ",", skipinitialspace = True, encoding = "utf-8-sig")

### 2. Data Cleaning

The data should be checked to make sure it has not been corrupted during entry or export. This section performs some sanity checks on the data and then standardises the column types.

In [3]:
# Find the shape of the data in the format (number of rows, number of columns).
df.shape

(14, 6)

In [4]:
# Display the first five results, starting at index zero.
df.head(5)

Unnamed: 0,first_name,surname,preferred_category,tra_competitor,dmt_competitor,tum_competitor
0,Bilbo,Staple,Male,Novice,1,1
1,Nilbo,Lexus,Male,Novice,4,2
2,Tristy,Thomdaughter,Female,Intermediate,0,3
3,Tristan,Turntable,Male,Not competing,2,4
4,Izzy,Dogg,Female,Intermediate,3,5


In [5]:
# Display the last five results.
df.tail(5)

Unnamed: 0,first_name,surname,preferred_category,tra_competitor,dmt_competitor,tum_competitor
9,Matitan,Marthritis,Male,Elite-Pro,7,5
10,Miko,Piko,Male,Elite-Pro,4,4
11,Darnius,Wallet,Male,Elite-Pro,1,3
12,Ark,Balboa,Male,Elite,0,2
13,Lavender,Rose,Female,Advanced,0,1


#### Category to "is_female"

For simplicity and efficiency, a Boolean data type is preferred over a categorical data type here. Trampoline competitions use two categories, "Ladies" and "Men", which map directly onto True and False values. A new "is_female" column is derived from the "preferred_category" column, which is then dropped.

In [6]:
# Create a new column called "is_female" based on the "preferred_category" column.
df["is_female"] = df["preferred_category"] == "Female"

# Drop the preferred category column.
df = df.drop(columns = ["preferred_category"])

In [7]:
# Check the data type of each column in the data frame.
df.dtypes

first_name        object
surname           object
tra_competitor    object
dmt_competitor     int64
tum_competitor     int64
is_female           bool
dtype: object

In [8]:
# Change all data fields to either categorical (category) or continuous (float64) types.
df["first_name"] = df["first_name"].astype("category")
df["surname"] = df["surname"].astype("category")
df["tra_competitor"] = df["tra_competitor"].astype("category")
df["dmt_competitor"] = df["dmt_competitor"].astype("float64")
df["tum_competitor"] = df["tum_competitor"].astype("float64")

# "is_female" is already of type Boolean, therefore does not need to be changed.

#### Sanity Checks

This section checks for duplicate rows and counts the unique values in each column.

In [9]:
# Check for duplicate rows.
# If the result is "Empty DataFrame", then there are no duplicates.
print(df[df.duplicated()])

Empty DataFrame
Columns: [first_name, surname, tra_competitor, dmt_competitor, tum_competitor, is_female]
Index: []
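
Note that `df.duplicated()` only flags rows that match in every column, so a competitor entered twice with different levels would pass the check above. The sketch below shows one way to catch that case by comparing names only. The data is made up for illustration (the second "Bilbo Staple" row is not in the sample file), and whether two competitors may legitimately share a name is left as an open assumption.

```python
import pandas as pd

# Made-up entries: the same competitor appears twice with different DMT levels.
entries = pd.DataFrame({
    "first_name": ["Bilbo", "Bilbo", "Izzy"],
    "surname": ["Staple", "Staple", "Dogg"],
    "dmt_competitor": [1, 2, 3],
})

# Rows identical in every column: none here, because the DMT levels differ.
full_duplicates = entries[entries.duplicated()]

# Rows sharing a first name and surname; keep = False marks every copy, not just the repeats.
name_duplicates = entries[entries.duplicated(subset = ["first_name", "surname"], keep = False)]

print(len(full_duplicates), len(name_duplicates))
```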


In [10]:
# Check for the number of unique values in each column. In big competitions, the number of trampoline, DMT, and tumbling competitor categories
# should equal the number of levels plus one to account for anyone not competing.
df.nunique()

first_name        14
surname           14
tra_competitor     7
dmt_competitor     8
tum_competitor     7
is_female          2
dtype: int64
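
The count of 8 for "dmt_competitor" is one more than six levels plus "not competing", which suggests that at least one entry lies outside the expected range. The following is a minimal sketch of a range check. The helper `out_of_range` is hypothetical, the three rows are copied from the `df.tail(5)` output above, and the valid range of 0 (not competing) to 6 is an assumption based on the Information section.

```python
import pandas as pd


def out_of_range(frame, column, max_level):
    """Return the rows whose value in `column` lies outside 0..max_level (inclusive).

    Missing values are also returned, since they are not inside the range.
    """
    return frame[~frame[column].between(0, max_level)]


# Three rows taken from the sample output above.
sample = pd.DataFrame({
    "first_name": ["Matitan", "Miko", "Darnius"],
    "surname": ["Marthritis", "Piko", "Wallet"],
    "dmt_competitor": [7, 4, 1],
})

# Assumption: DMT levels run from 1 to 6, with 0 meaning "not competing".
bad_rows = out_of_range(sample, "dmt_competitor", max_level = 6)
print(bad_rows)
```

Rows returned by the check can then be corrected at the source before any scheduling takes place.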

In [11]:
# Save the updated data frame back to CSV.
df.to_csv("Updated_Sample_Data.csv", index = False)

## Competitor Scheduling

This section organises the competitors into their groups.

There are six trampoline categories to consider: Novice, Intermediate, Intervanced, Advanced, Elite, and Elite-Pro. Competitors should be separated based on these levels, and then again by their category.

Once this is done, the number of competitors in each category should be checked to decide how many flights are appropriate. A general rule of thumb is 12 competitors per flight, although this is not a hard rule: a flight may hold more or fewer.
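
As a rough illustration of the rule of thumb, the sketch below splits a category into flights of near-equal size. The function `plan_flights` is hypothetical, and treating 12 as an upper limit is an assumption made for the example only, since the real limit is flexible.

```python
import math


def plan_flights(num_competitors, target_per_flight = 12):
    """Return a list of flight sizes, keeping each flight at or below the target.

    Assumption: the target is treated as a maximum, and competitors are
    spread as evenly as possible across the flights.
    """
    if num_competitors <= 0:
        return []
    # Smallest number of flights that keeps every flight within the target.
    num_flights = math.ceil(num_competitors / target_per_flight)
    # Share competitors evenly; the first "extra" flights take one more each.
    base, extra = divmod(num_competitors, num_flights)
    return [base + 1] * extra + [base] * (num_flights - extra)


print(plan_flights(25))  # [9, 8, 8]
```

Spreading competitors evenly avoids one full flight followed by a very small one, for example 12 and 1 for a category of 13.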

### Trampoline Level Ordering

The first step is to organise competitors by level. The levels should be put into a specific order, and then competitors should be split by category.

In [12]:
# Define the order of trampoline levels.
trampoline_level_order = ["Novice", "Intermediate", "Intervanced", "Advanced", "Elite", "Elite-Pro", "Not competing"]

# Set "tra_competitor" as an ordered categorical variable.
df["tra_competitor"] = pd.Categorical(df["tra_competitor"], categories = trampoline_level_order, ordered = True)

# Sort by "tra_competitor" and "is_female".
df = df.sort_values(["tra_competitor", "is_female"])

In [13]:
# Group by both level and category.
groups = [
    group.reset_index(drop = True)[["first_name", "surname", "tra_competitor", "is_female"]] # reset_index(drop = True) renumbers each group's rows from zero; the column list keeps only the fields to display.
    for _, group in df.groupby(["tra_competitor", "is_female"]) # The underscore is the Python convention for an unused variable, here the group key.
]

# Pad all groups to the same row count.
max_len = max(len(g) for g in groups) # Find the maximum number of rows in any given group.
groups = [g.reindex(range(max_len)) for g in groups] # Pad each group with empty rows so that all groups have the same number of rows.

# Add a blank spacer column between each group.
with_spacers = [] # Each group will be stored here, followed by an empty spacer column.
for group in groups:
    with_spacers.append(group) # Add the group to the list.
    with_spacers.append(pd.DataFrame(columns = [""])) # Add a blank space to the end of the group.

# Remove the last spacer.
with_spacers = with_spacers[:-1] # Keep every element except the last one.

# Concatenate all the data frames into one side-by-side.
competitor_df = pd.concat(with_spacers, axis = 1) # "with_spacers" is the list containing all the data frames, and axis = 1 specifies horizontal concatenation.

# Export to a .csv file.
competitor_df.to_csv("Grouped_by_Level_and_Category.csv", index = False)


#### Exporting to a Stylised Excel (.xlsx) File

The following code will take the created data frame and turn it into an Excel spreadsheet suitable for human use. It will also stylise the headers to make them more presentable.

In [14]:
# Calculate number of groups (unique combinations of level and category).
num_groups = len(groups)

# Build header labels to match competitor data frame.
header_labels = []
for (level, is_female), _ in df.groupby(["tra_competitor", "is_female"]): # Group by level and category.
    label = f"{level} {'Ladies +' if is_female else 'Men +'}" # Title the header with the level, followed by category.
    header_labels.extend([label, "", "", "", ""])  # The label plus four blanks cover the group's four columns and its spacer column.

# Remove the final spacer.
header_labels = header_labels[:competitor_df.shape[1]]

# Insert header row at the top.
competitor_df.loc[-1] = header_labels # Inserts a row at index -1, so that it is above all of the competitor data.
competitor_df = competitor_df.sort_index() # Sorts the rows, ensuring that the new header row is at the top.
competitor_df = competitor_df.reset_index(drop = True)  # Resets the index numbers, so index -1 becomes 0 and so on.

# Insert unique column headers for each column so Pandas is happy to export to Excel.
competitor_df.columns = [f"col{i}" for i in range(competitor_df.shape[1])]

# Function to highlight the background of the header row in a light blue tone and make the text bold.
def highlight_headers(val):
    """Function that highlights the heading of a group with a light blue background and makes the text bold.
    Assumes that the content is a string, and has "Ladies +", or "Men +" in it."""
    if isinstance(val, str) and ("Ladies +" in val or "Men +" in val):
        return "background-color: lightblue; font-weight: bold"
    return ""

# Apply the function cell by cell, restricted to the header row.
# Note: from pandas 2.1 onwards, Styler.applymap is deprecated in favour of Styler.map.
styled = competitor_df.style.applymap(highlight_headers, subset = pd.IndexSlice[0, :]) # The [0, :] targets the header row at index 0.

# Export the data frame to an Excel (.xlsx) file.
styled.to_excel("Grouped_by_Level_and_Category.xlsx", index = False, header = False)

