# Data setup

In this notebook, we run all the functions needed to set up the data that is later displayed on the website.


In [1]:
from helper_functions import setup
from helper_functions import SPORTS_EVENTS
import numpy as np


## Sanitize player data

First, we simply load the responses and anonymize them.
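To illustrate what such an anonymization step can look like, here is a minimal sketch in plain Python. It is not the implementation behind `setup.sanitize_and_anonymize_data`. The `anonymize` function, the `name` and `email` keys, the word lists, and the example responses are all hypothetical and exist only for this illustration.

```python
import random

# Hypothetical word lists; nicknames are built as "<adjective> <animal>"
ADJECTIVES = ["Pushy", "Thankful", "Exemplary", "Animated"]
ANIMALS = ["Bulldog", "Kakapo", "Cassowary", "Yak"]


def anonymize(responses, identifying_keys=("name", "email"), seed=0):
    """Return copies of the responses without identifying fields, each with a unique nickname."""
    rng = random.Random(seed)
    # Build every adjective/animal combination once, so nicknames cannot repeat
    nicknames = [f"{adj} {animal}" for adj in ADJECTIVES for animal in ANIMALS]
    if len(responses) > len(nicknames):
        raise ValueError("not enough nickname combinations for all responses")
    rng.shuffle(nicknames)

    anonymized = []
    for response, nickname in zip(responses, nicknames):
        # Drop the identifying fields and keep everything else
        cleaned = {k: v for k, v in response.items() if k not in identifying_keys}
        cleaned["nickname"] = nickname
        anonymized.append(cleaned)
    return anonymized


# Made-up example responses (assumed structure)
responses = [
    {"name": "Person One", "email": "one@example.org", "institute": "MPE", "is_postdoc": False},
    {"name": "Person Two", "email": "two@example.org", "institute": "MPA", "is_postdoc": True},
]
anonymized = anonymize(responses)
```

Fixing the seed keeps the nickname assignment reproducible between runs, which matters if the anonymized data is regenerated after the teams have been published.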

In [9]:
df = setup.sanitize_and_anonymize_data(overwrite=False, verbose=False)

print(f"{len(df)} entries, of which {np.sum(~df.is_postdoc)} are PhDs and {np.sum(df.is_postdoc)} are postdocs")
df.head(3)
# Conflicting sports
# np.sum(df["volleyball"] & df["basketball"])
# np.sum(df["football"] & df["tennis"])
# df[df["capture_the_flag"] & df["spikeball"]]["nickname"].tolist()


90 entries, of which 70 are PhDs and 20 are postdocs


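| Unnamed: 0 | nickname | institute | is_postdoc | avail_monday | avail_tuesday | avail_thursday | avail_friday | wants_basketball | basketball | wants_running_sprints | ... | spikeball | wants_beer_pong | beer_pong | wants_fooseball | fooseball | wants_ping_pong | ping_pong | num_sports | num_sports_not_avail | late_entry |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Pushy Bulldog | MPE | False | True | True | True | True | False | False | True | ... | True | False | False | False | False | True | True | 5 | 0 | False |
| 1 | Thankful Kakapo | MPE | False | True | True | True | True | False | False | True | ... | True | True | True | True | True | True | True | 7 | 0 | False |
| 2 | Exemplary Cassowary | MPA | False | True | True | True | True | False | False | False | ... | True | True | True | False | False | False | False | 4 | 0 | False |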


## Team generation

Now that we have all the player data, including the sports each player is available for, we can generate the teams from this information.

We want to keep the teams balanced with regard to all sports; this is handled in the `create_teams` routine.
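One way to think about balancing teams across sports is a greedy assignment. The sketch below is an assumption about how such a routine could work and is not the algorithm inside `create_teams`. The function `create_balanced_teams` and the example players are hypothetical.

```python
from collections import Counter


def create_balanced_teams(players, n_teams):
    """Greedy sketch. `players` maps a nickname to the set of sports that player takes part in."""
    teams = [{"players": [], "sport_counts": Counter()} for _ in range(n_teams)]

    # Place the most versatile players first; ties are broken alphabetically
    ordered = sorted(players.items(), key=lambda item: (-len(item[1]), item[0]))

    for nickname, sports in ordered:
        def cost(team):
            # Prefer the smallest team, then the team with the fewest players
            # already signed up for this player's sports
            coverage = sum(team["sport_counts"][sport] for sport in sports)
            return (len(team["players"]), coverage)

        target = min(teams, key=cost)
        target["players"].append(nickname)
        target["sport_counts"].update(sports)
    return teams


# Made-up example: three players do basketball and chess, three do chess only
example_players = {
    "A": {"basketball", "chess"},
    "B": {"basketball", "chess"},
    "C": {"basketball", "chess"},
    "D": {"chess"},
    "E": {"chess"},
    "F": {"chess"},
}
balanced = create_balanced_teams(example_players, n_teams=3)
```

In this example each of the three teams ends up with two players, one of whom plays basketball. A greedy pass like this is simple but not guaranteed to be optimal for larger, more uneven sign-ups.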

In [2]:
teams = setup.create_teams(overwrite=False)
teams[0].player_df.head(3)


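| Unnamed: 0 | nickname | institute | is_postdoc | avail_monday | avail_tuesday | avail_thursday | avail_friday | wants_basketball | basketball | wants_running_sprints | ... | spikeball | wants_beer_pong | beer_pong | wants_fooseball | fooseball | wants_ping_pong | ping_pong | num_sports | num_sports_not_avail | late_entry |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Magnificent Barracuda | MPE | False | True | True | True | True | False | False | True | ... | True | True | True | True | True | True | True | 9 | 0 | False |
| 1 | Animated Yak | MPE | False | True | True | True | True | False | False | False | ... | True | True | True | True | True | True | True | 8 | 0 | False |
| 2 | Failing Muskrat | IPP | False | True | True | True | True | True | True | False | ... | False | False | False | False | False | True | True | 6 | 0 | False |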


## Subteam generation

Now we're getting to the spicy stuff!

We can generate the subteams for each main team, but there are a few caveats:

- Some sports take place simultaneously. The SportEvent class keeps track of these conflicts, and the weights assigned while subteams are drawn take them into account.
- Some players have chosen only one sport. To make sure they can attend it, we also adjust their weights during the draw.
- Some of the sports do not have traditional subteam generation, which we need to account for.
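The weighting idea from the caveats above can be sketched as a weighted draw without replacement. This is an illustration under assumptions, not the logic used by `setup.generate_subteams_for_sport`. The functions `player_weight` and `draw_subteam`, as well as the boost and penalty values, are hypothetical.

```python
import random


def player_weight(num_sports, busy_elsewhere, single_sport_boost=10.0, busy_penalty=0.1):
    """Assumed weighting: favor single-sport players, disfavor players in a concurrent event."""
    weight = 1.0
    if num_sports == 1:
        weight *= single_sport_boost
    if busy_elsewhere:
        weight *= busy_penalty
    return weight


def draw_subteam(candidates, weights, size, rng):
    """Draw up to `size` distinct players; all weights are assumed to be positive."""
    pool = dict(zip(candidates, weights))
    drawn = []
    while pool and len(drawn) < size:
        names = list(pool)
        # Pick one player with probability proportional to the remaining weights
        pick = rng.choices(names, weights=[pool[name] for name in names], k=1)[0]
        drawn.append(pick)
        del pool[pick]  # remove the pick so nobody is drawn twice
    return drawn


# Made-up candidates: (nickname, number of chosen sports, busy in a concurrent event)
candidates = [("P1", 1, False), ("P2", 5, True), ("P3", 3, False), ("P4", 4, False)]
names = [name for name, _, _ in candidates]
weights = [player_weight(n, busy) for _, n, busy in candidates]
subteam = draw_subteam(names, weights, size=2, rng=random.Random(42))
```

Keeping the penalty above zero means a player in a concurrent event can still be drawn if too few other candidates are available.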

### Generate the subteams for each sport

For running/sprints, everyone competes on their own, and there are several different events.\
As a first iteration, we simply put every player in their own subteam.\
Conveniently, all reserve players are also taking part in the other concurrent events.

The same applies to chess.
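Putting every player in their own subteam can be sketched as follows. The `Subteam` dataclass here is a stand-in that mirrors the fields visible in the printed output (`sport`, `main_team_letter`, `sub_key`, `players`); the real class in `helper_functions` may differ. The helper `individual_subteams` is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Subteam:
    """Stand-in with the fields shown in the notebook output (assumed, not the real class)."""
    sport: str
    main_team_letter: str
    sub_key: str
    players: list = field(default_factory=list)


def individual_subteams(sport, main_team_letter, players):
    """Create one single-player subteam per player, with sub keys '0', '1', ..."""
    return [
        Subteam(sport=sport, main_team_letter=main_team_letter, sub_key=str(i), players=[player])
        for i, player in enumerate(players)
    ]


# Illustrative call with nicknames that appear elsewhere in this notebook
solo_subteams = individual_subteams("running_sprints", "A", ["Nutty Sheep", "Animated Yak"])
```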

In [3]:
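all_subteams = {}
for sport in SPORTS_EVENTS.values():
    print("-" * 40, f"\n{sport.name}")
    # Maps each main team's letter to its subteams for this sport
    subteam_sport_dict = {}
    for team in teams:
        subteams = setup.generate_subteams_for_sport(team, sport)
        print(subteams)
        # Store the result (previously the dict was never filled and stayed empty)
        for subteam in subteams:
            subteam_sport_dict.setdefault(subteam.main_team_letter, []).append(subteam)
    all_subteams[sport.sanitized_name] = subteam_sport_dict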

# # Now we can update the players for each team:
# for team in teams:
#     player_df = team.player_df
#     for sport, subteams in all_subteams.items():
#         relevant


---------------------------------------- 
Basketball
[Subteam(sport='basketball', main_team_letter='A', sub_key='0', players=['Awesome Wolverine', 'Overjoyed Tapir', 'Clumsy Lizard', 'Ill-fated Meerkat', 'Failing Muskrat']), Subteam(sport='basketball', main_team_letter='A', sub_key='R', players=['Excited Rabbit'])]
[Subteam(sport='basketball', main_team_letter='B', sub_key='0', players=['Ignorant Lemur', 'Frail Skunk', 'Revolving Coelacanth', 'Gargantuan Okapi', 'Ideal Vaquita']), Subteam(sport='basketball', main_team_letter='B', sub_key='R', players=['Trivial Uguisu'])]
[Subteam(sport='basketball', main_team_letter='C', sub_key='0', players=['Real Mouse', 'Blank Tiffany', 'Damaged Fly', 'Alarmed Bird', 'Scientific Angelfish']), Subteam(sport='basketball', main_team_letter='C', sub_key='R', players=[])]
---------------------------------------- 
Running/Sprints
[Subteam(sport='running_sprints', main_team_letter='A', sub_key='0', players=['Nutty Sheep']), Subteam(sport='running_sprints',