# Data setup

In this notebook, we run all of the functions necessary for setting up the data that is later displayed on the website.


In [1]:
from helper_functions import setup
import numpy as np
import helper_functions as hf
# hf.SPORTS_EVENTS["volleyball"].subteams


## Sanitize player data

First, we simply load the responses and anonymize them.

In [2]:
df = setup.sanitize_and_anonymize_data(overwrite=True, verbose=False, anonymize=False)

print(f"{len(df)} entries, of which {np.sum(~df.is_postdoc)} are PhDs and {np.sum(df.is_postdoc)} are postdocs")

# Conflicting sports
# np.sum(df["volleyball"] & df["basketball"])
# np.sum(df["football"] & df["tennis"])
# df[df["capture_the_flag"] & df["spikeball"]]["nickname"].tolist()
endings = [email.split("@")[1] if email != "???" else email for email in df["email"]]
print(len([end for end in endings if end == "???"]), "unknown email-addresses.")
# df[df["nickname"] == "Magnificent Barracuda"]
# df["nickname"].to_csv("animal_names.csv", index=False)


90 entries, of which 70 are PhDs and 20 are postdocs
27 unknown email-addresses.


In [3]:
# from helper_functions.setup.openai_image_download import generate_all_images, save_resized_animal_images
# animals = [animal.lower() for animal in df["nickname"]]
# # The following operation uses up openai credits.
# generate_all_images([])
# save_resized_animal_images(150)


## Team generation

Since we now have all the player data, including the sports each player is available for, we can generate the teams based on this information.

We want to keep them balanced with regard to all sports; this is handled in the `create_teams` routine.

In [4]:
teams = setup.create_teams()
teams[0].player_df.head(3)


| | nickname | institute | is_postdoc | avail_monday | avail_tuesday | avail_thursday | avail_friday | wants_basketball | basketball | wants_running_sprints | ... | spikeball | wants_beer_pong | beer_pong | wants_fooseball | fooseball | wants_ping_pong | ping_pong | num_sports | num_sports_not_avail | late_entry |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3 | Magnificent Barracuda | MPE | False | True | True | True | True | False | False | True | ... | True | True | True | True | True | True | True | 9 | 0 | False |
| 21 | Animated Yak | MPE | False | True | True | True | True | False | False | False | ... | True | True | True | True | True | True | True | 8 | 0 | False |
| 60 | Failing Muskrat | IPP | False | True | True | True | True | True | True | False | ... | False | False | False | False | False | True | True | 6 | 0 | False |
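
How `create_teams` balances the teams is not shown in this notebook. Purely as an illustration, a greedy assignment could look like the sketch below. The function `assign_balanced_teams`, its arguments, and the toy data are hypothetical and not part of `helper_functions`; the sketch assumes one boolean column per sport, as in the table above.

```python
import pandas as pd


def assign_balanced_teams(players: pd.DataFrame, sports: list[str], team_keys: list[str]) -> dict[str, list[str]]:
    """Greedily assign players to teams so that team sizes and per-sport head counts stay even."""
    teams = {key: [] for key in team_keys}
    counts = {key: {sport: 0 for sport in sports} for key in team_keys}
    num_sports = players[sports].sum(axis=1)
    # Place the most versatile players first, since they affect the most sports.
    order = sorted(players.index, key=lambda idx: -num_sports[idx])
    for idx in order:
        row = players.loc[idx]
        played = [sport for sport in sports if row[sport]]
        # Prefer the smallest team; break ties by the head count in this player's sports.
        best = min(team_keys, key=lambda k: (len(teams[k]), sum(counts[k][s] for s in played)))
        teams[best].append(row["nickname"])
        for sport in played:
            counts[best][sport] += 1
    return teams


# Toy data (hypothetical): six players, three of whom play volleyball.
toy_players = pd.DataFrame(
    {
        "nickname": [f"P{i}" for i in range(1, 7)],
        "volleyball": [True, True, True, False, False, False],
        "ping_pong": [True] * 6,
    }
)
toy_teams = assign_balanced_teams(toy_players, ["volleyball", "ping_pong"], ["A", "B", "C"])
print(toy_teams)
```

A greedy pass like this does not guarantee an optimal balance, but it is easy to reason about and to rerun when late entries arrive.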


## Subteam generation

Now we're getting to the spicy stuff!

We can generate the subteams for each main team, but there are a few caveats:

- Some of the sports happen simultaneously. The `SportEvent` class keeps track of this, and the weights assigned while subteams are drawn take it into account.
- Some players have chosen only one sport. To make sure they can attend it, we also adjust their weights during the draw.
- Some of the sports do not have traditional subteam generation, which we need to account for.
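
The weighting described in the caveats above can be pictured as a weighted draw without replacement. The following is only a sketch under assumptions: `draw_subteam`, the boost and penalty factors, and the player names are hypothetical, and the actual logic in `generate_all_subteams` may differ.

```python
import numpy as np


def draw_subteam(candidates, num_sports, busy, size, rng, single_sport_boost=10.0, busy_penalty=0.1):
    """Draw up to `size` distinct players, favouring single-sport players and avoiding busy ones."""
    weights = np.ones(len(candidates))
    for i, player in enumerate(candidates):
        if num_sports[player] == 1:
            # Players with only one sport should almost always get to play it.
            weights[i] *= single_sport_boost
        if player in busy:
            # Players already booked in a concurrent sport are drawn less often.
            weights[i] *= busy_penalty
    probabilities = weights / weights.sum()
    chosen = rng.choice(len(candidates), size=min(size, len(candidates)), replace=False, p=probabilities)
    return [candidates[i] for i in chosen]


rng = np.random.default_rng(seed=0)
picked = draw_subteam(
    candidates=["P1", "P2", "P3", "P4"],
    num_sports={"P1": 1, "P2": 3, "P3": 2, "P4": 4},
    busy={"P4"},
    size=3,
    rng=rng,
    busy_penalty=0.0,  # a penalty of zero excludes busy players entirely
)
print(picked)
```

Whoever is not drawn into a regular subteam would then form the reserve subteam for that sport.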

### Generate the subteams for each sport

For running/sprints, everyone competes on their own, and we have different events.\
As a first iteration, we simply put each player in their own subteam.\
Conveniently, all reserve players are also taking part in the other concurrent events.

The same applies to chess.

In [5]:
from helper_functions.setup import generate_all_subteams 

for team in teams:
    all_subteams = generate_all_subteams(team)
    team.add_subteam_keys(all_subteams)
    team.create_backup()


----------------------------------------
 volleyball basketball
WARN: The following players are still set as reserve for both sports: {'Excited Rabbit'}
----------------------------------------
 football tennis
WARN: No solution found for Magnificent Barracuda, they are currently double-booked.
----------------------------------------
 capture_the_flag spikeball
capture_the_flag: Switched out Kindly Clownfish with Nutty Sheep from A: 1 to A: R
capture_the_flag: Switched out Animated Yak with Failing Muskrat from A: 1 to A: R
capture_the_flag: Switched out Nice Albatross with Frightening Avocet from A: 1 to A: R
WARN: No solution found for Magnificent Barracuda, they are currently double-booked.
----------------------------------------
 volleyball basketball
WARN: The following players are still set as reserve for both sports: {'Trivial Uguisu'}
----------------------------------------
 football tennis
----------------------------------------
 capture_the_flag spikeball
capture_the_flag

In [6]:
df = hf.turn_series_list_to_dataframe([subteam.as_series for subteam in hf.ALL_SUBTEAMS if subteam.is_reserve])
df["reserve_num"] = df["players"].apply(len)
df = df.sort_values("reserve_num")
all_players = [player for player_list in df["players"] for player in player_list]
names, counts = np.unique(all_players, return_counts=True)
print({player: count for player, count in zip(names, counts) if count > 1})
df[["sport", "team_key", "players", "reserve_num"]]


{'Alarmed Bird': 2, 'Animated Yak': 2, 'Dishonest Fangtooth': 2, 'Excited Rabbit': 2, 'Failing Muskrat': 2, 'Gargantuan Okapi': 2, 'Ideal Vaquita': 2, 'Ill-fated Meerkat': 3, 'Neglected Harrier': 3, 'Nice Albatross': 2, 'Trivial Uguisu': 2, 'Unwritten Saiga': 3}


| | sport | team_key | players | reserve_num |
| --- | --- | --- | --- | --- |
| 0 | basketball | A | [Excited Rabbit] | 1 |
| 1 | running_sprints | A | [Ill-fated Meerkat] | 1 |
| 3 | football | A | [Sneaky Quokka] | 1 |
| 25 | spikeball | C | [Unwritten Saiga] | 1 |
| 22 | football | C | [Neglected Harrier] | 1 |
| 9 | ping_pong | A | [Limp Monkey] | 1 |
| 10 | basketball | B | [Trivial Uguisu] | 1 |
| 19 | ping_pong | B | [Ignorant Lemur] | 1 |
| 18 | fooseball | B | [Dishonest Fangtooth] | 1 |
| 13 | football | B | [Extra-small Coyote] | 1 |


## Match scheduling

Now that all subteams have been determined, we can start scheduling the matches between them.

This process poses several challenges:

- We need each subteam to play only against subteams from the other main teams, and to do so in a balanced manner.
- We need to consider that the same subteam can't play on two pitches at the same time.
- We need to look out for players who are double-booked and try to find workarounds.
- **Ping pong** is a big headache:
  - We want each player to have one match against one player from each of the other teams, but those two opponents should not also face each other, to avoid grouping effects.
  - We want to schedule matches every day.
  - We want these matches to not overlap with any of the player's other activities.
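
To make the ping pong constraint concrete, here is one way such rotated matchups could be constructed for three teams with equally many players. This is a sketch under assumptions, not the implementation of `determine_rotated_matchups_for_sport`; the function `rotated_matchups` and the `(team, index)` player representation are hypothetical. Shifting the index by one on two of the three pairings ensures that a player's two opponents never meet each other, provided there are at least three players per team.

```python
def rotated_matchups(num_players: int, teams=("A", "B", "C")):
    """Pair players of three teams so that nobody's two opponents face each other."""
    a, b, c = teams
    matchups = []
    for i in range(num_players):
        j = (i + 1) % num_players
        matchups.append(((a, i), (b, i)))  # A_i plays B_i
        matchups.append(((b, i), (c, j)))  # B_i plays C_(i+1)
        matchups.append(((c, i), (a, j)))  # C_i plays A_(i+1)
    return matchups


example = rotated_matchups(4)

# Collect each player's opponents to verify the two constraints.
opponents = {}
for first, second in example:
    opponents.setdefault(first, set()).add(second)
    opponents.setdefault(second, set()).add(first)

for player, opps in opponents.items():
    first_opp, second_opp = tuple(opps)
    # One opponent from each of the other two teams ...
    assert {first_opp[0], second_opp[0], player[0]} == set("ABC")
    # ... and those two opponents never play each other.
    assert second_opp not in opponents[first_opp]
```

With only two players per team, the shifted indices wrap around onto each other and the no-triangle property no longer holds, so small teams would need separate handling.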


In [29]:
from itertools import combinations
from math import ceil

import pandas as pd

from helper_functions.classes.match import Match
from helper_functions.classes.sport_event import SportEvent
from helper_functions.setup.match_scheduling import (
    determine_matchups_for_sport,
    determine_rotated_matchups_for_sport,
    write_match_backup,
)


def schedule_matches(sport_event: SportEvent) -> list[Match]:
    """Assign a start time and a pitch to every matchup of the given sport."""
    # matchups = determine_matchups_for_sport(hf.ALL_SUBTEAMS, sport_event.sanitized_name)
    matchups = determine_rotated_matchups_for_sport(hf.ALL_SUBTEAMS, sport_event.sanitized_name)
    # Interleave the matchups so that the same subteam is less likely to have two matches at the same time
    matchups = matchups[::4] + matchups[1::4] + matchups[2::4] + matchups[3::4]

    num_pitches = sport_event.num_pitches
    # Cycle through the pitches ("1", "2", ...) with one entry per matchup, so that
    # a final, partially filled time slot is not silently dropped by zip() below
    courts = [str(i % num_pitches + 1) for i in range(len(matchups))]
    # One time slot per round of `num_pitches` simultaneous matches
    date_range = pd.date_range(
        start=sport_event.start,
        periods=ceil(len(matchups) / num_pitches),
        freq=sport_event.match_duration,
    )
    # Repeat each time slot once per pitch
    dates = [date for date in date_range for _ in range(num_pitches)]
    return [
        Match(sport_event.sanitized_name, start, sport_event.match_duration, matchup[0], matchup[1], location)
        for start, matchup, location in zip(dates, matchups, courts)
    ]


ALL_MATCHES = [match for sport in hf.SPORTS_EVENTS.values() for match in schedule_matches(sport)]

# Report all conflicting match-ups (verbose=True prints each collision)
for first, second in combinations(ALL_MATCHES, 2):
    first.has_hard_collision(second, verbose=True)

match_df = hf.turn_series_list_to_dataframe([m.as_series for m in ALL_MATCHES])
write_match_backup(ALL_MATCHES)
match_df[match_df["sport"] == "beer_pong"]


basketball, running_sprints: {'Scientific Angelfish'} have conflicting schedules.
basketball, ping_pong: {'Scientific Angelfish'} have conflicting schedules.
running_sprints, volleyball: {'Magnificent Barracuda'} have conflicting schedules.
running_sprints, ping_pong: {'Magnificent Barracuda'} have conflicting schedules.
running_sprints, ping_pong: {'Motherly Woodpecker'} have conflicting schedules.
running_sprints, ping_pong: {'Red Eel', 'Magnificent Barracuda'} have conflicting schedules.
running_sprints, ping_pong: {'Trivial Uguisu'} have conflicting schedules.
volleyball, ping_pong: {'Magnificent Barracuda'} have conflicting schedules.
volleyball, ping_pong: {'Radiant Booby'} have conflicting schedules.
volleyball, ping_pong: {'Pushy Bulldog'} have conflicting schedules.
volleyball, ping_pong: {'Animated Yak'} have conflicting schedules.
volleyball, volleyball: {'Tame Mink', 'Mammoth Barnacle', 'Reflecting Pug', 'Exemplary Cassowary'} have conflicting schedules.
volleyball, ping_po

| | sport | team_a | team_b | location | day | time | result | winner | start | duration | team_a_key | team_b_key |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 53 | beer_pong | A: 3 | B: 2 | 1 | Friday | 17:30 | | | 2024-05-03 17:30:00 | 1200 | A: 3 | B: 2 |
| 54 | beer_pong | B: 1 | C: 4 | 2 | Friday | 17:30 | | | 2024-05-03 17:30:00 | 1200 | B: 1 | C: 4 |
| 55 | beer_pong | C: 2 | A: 2 | 1 | Friday | 17:50 | | | 2024-05-03 17:50:00 | 1200 | C: 2 | A: 2 |
| 56 | beer_pong | B: 2 | C: 1 | 2 | Friday | 17:50 | | | 2024-05-03 17:50:00 | 1200 | B: 2 | C: 1 |
| 57 | beer_pong | C: 4 | A: 3 | 1 | Friday | 18:10 | | | 2024-05-03 18:10:00 | 1200 | C: 4 | A: 3 |
| 58 | beer_pong | A: 4 | B: 3 | 2 | Friday | 18:10 | | | 2024-05-03 18:10:00 | 1200 | A: 4 | B: 3 |
| 59 | beer_pong | C: 1 | A: 4 | 1 | Friday | 18:30 | | | 2024-05-03 18:30:00 | 1200 | C: 1 | A: 4 |
| 60 | beer_pong | A: 1 | B: 4 | 2 | Friday | 18:30 | | | 2024-05-03 18:30:00 | 1200 | A: 1 | B: 4 |
| 61 | beer_pong | B: 3 | C: 3 | 1 | Friday | 18:50 | | | 2024-05-03 18:50:00 | 1200 | B: 3 | C: 3 |
| 62 | beer_pong | A: 2 | B: 1 | 2 | Friday | 18:50 | | | 2024-05-03 18:50:00 | 1200 | A: 2 | B: 1 |
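
The collisions reported above come from `has_hard_collision`, whose implementation is not shown here. Conceptually, two matches collide when their time windows overlap and they share at least one player. The sketch below illustrates that idea with a hypothetical `ScheduledMatch` class and `conflicting_players` helper; neither is part of `helper_functions`, and the real check may apply additional rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ScheduledMatch:
    """Minimal stand-in for a match: a time window plus everyone involved."""

    sport: str
    start: datetime
    duration: timedelta
    players: frozenset

    @property
    def end(self) -> datetime:
        return self.start + self.duration


def conflicting_players(first: ScheduledMatch, second: ScheduledMatch) -> frozenset:
    """Return the players booked in both matches if the time windows overlap."""
    # Half-open intervals: back-to-back matches do not count as overlapping.
    overlaps = first.start < second.end and second.start < first.end
    return first.players & second.players if overlaps else frozenset()


slot = datetime(2024, 4, 29, 17, 30)
match_a = ScheduledMatch("basketball", slot, timedelta(minutes=45), frozenset({"P1", "P2"}))
match_b = ScheduledMatch("ping_pong", slot + timedelta(minutes=30), timedelta(minutes=15), frozenset({"P2", "P3"}))
match_c = ScheduledMatch("ping_pong", slot + timedelta(minutes=45), timedelta(minutes=15), frozenset({"P2", "P3"}))

print(conflicting_players(match_a, match_b))  # P2 is double-booked
print(conflicting_players(match_a, match_c))  # back-to-back, so no conflict
```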


In [41]:

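# Round-trip check: rebuilding the matches from the dataframe should reproduce it
matches = [Match.from_dataframe_entry(m, hf.ALL_SUBTEAMS) for _, m in match_df.iterrows()]
match_df2 = hf.turn_series_list_to_dataframe([m.as_series for m in matches])
# all(match_df2 == match_df) would only iterate over the column labels and always be truthy;
# DataFrame.equals compares the contents and treats NaNs in the same position as equal
assert match_df2.equals(match_df), "Round trip through Match.from_dataframe_entry changed the data"
match_df.drop(columns=["location"])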


| | sport | team_a | team_b | day | time | result | winner | start | duration | team_a_key | team_b_key |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | basketball | A: 1 | B: 1 | Monday | 17:30 | | | 2024-04-29 17:30:00 | 2700 | A: 1 | B: 1 |
| 1 | basketball | B: 1 | C: 1 | Monday | 18:15 | | | 2024-04-29 18:15:00 | 2700 | B: 1 | C: 1 |
| 2 | basketball | C: 1 | A: 1 | Monday | 19:00 | | | 2024-04-29 19:00:00 | 2700 | C: 1 | A: 1 |
| 3 | running_sprints | A: Magnificent Barracuda | B: Motherly Woodpecker | Monday | 17:30 | | | 2024-04-29 17:30:00 | 1800 | A: 2 | B: 1 |
| 4 | running_sprints | B: Medium Kiwi | C: Thankful Kakapo | Monday | 18:00 | | | 2024-04-29 18:00:00 | 1800 | B: 3 | C: 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 120 | ping_pong | B: Quixotic Zebu | C: Honorable Pangolin | Monday | 22:45 | | | 2024-04-29 22:45:00 | 900 | B: 5 | C: 11 |
| 121 | ping_pong | C: Surprised Tetra | A: Horrible Lobster | Monday | 23:00 | | | 2024-04-29 23:00:00 | 900 | C: 3 | A: 12 |
| 122 | ping_pong | A: Unlucky Hare | B: Damp Fossa | Monday | 23:00 | | | 2024-04-29 23:00:00 | 900 | A: 3 | B: 12 |
| 123 | ping_pong | B: Glass Wildebeest | C: Thankful Kakapo | Monday | 23:15 | | | 2024-04-29 23:15:00 | 900 | B: 4 | C: 7 |


In [25]:
hf.turn_series_list_to_dataframe(
            [team.as_series for team in hf.ALL_SUBTEAMS]
        )


| | sport | team_key | sub_key | full_key | players |
| --- | --- | --- | --- | --- | --- |
| 0 | basketball | A | 1 | A: 1 | [Failing Muskrat, Ill-fated Meerkat, Awesome W... |
| 1 | basketball | A | R | A: R | [Excited Rabbit] |
| 2 | running_sprints | A | 2 | A: 2 | [Magnificent Barracuda] |
| 3 | running_sprints | A | 1 | A: 1 | [Nutty Sheep] |
| 4 | running_sprints | A | R | A: R | [Ill-fated Meerkat] |
| ... | ... | ... | ... | ... | ... |
| 150 | ping_pong | C | 5 | C: 5 | [Jumpy Catfish] |
| 151 | ping_pong | C | 11 | C: 11 | [Honorable Pangolin] |
| 152 | ping_pong | C | 3 | C: 3 | [Surprised Tetra] |
| 153 | ping_pong | C | 12 | C: 12 | [Noted Hippopotamus] |
