# Creating a League

> If you wish to make an apple pie from scratch, you must first invent the universe.

-Carl Sagan

To evaluate our modelling approaches and get a feel for what kind of data needs to be pulled from the real world, let's begin by first simulating a league of our own.

Start by importing the libraries that will be necessary.

In [1]:
import sys
print(sys.version)

from faker import Faker
import numpy as np
import scipy as sp
import pandas as pd
import datetime

3.8.5 (default, Sep  4 2020, 02:22:02) 
[Clang 10.0.0 ]


Outline the teams in the league and how many games each team will play in the season. Each team plays every other team four times.

In [2]:
team_names = [
    "Hat-Trick Heroes",
    "No Woman, No Krejci",
    "Flying Elbows",
    "Fresh Prince of Briere",
    "The Lucky Pucks",
    "PCUK",
    "Blades of Steel",
    "Benchwarmers"
]

games_in_a_season = 4 * (len(team_names) - 1)
print(f"Games played in a season by each team: {games_in_a_season}")

Games played in a season by each team: 28
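
As a cross-check on the count above, the sketch below enumerates every pairing explicitly and tallies games per team. `meetings_per_pair` is a name introduced here for the factor of 4; the original cell does not define it.

```python
from collections import Counter
from itertools import combinations

team_names = [
    "Hat-Trick Heroes",
    "No Woman, No Krejci",
    "Flying Elbows",
    "Fresh Prince of Briere",
    "The Lucky Pucks",
    "PCUK",
    "Blades of Steel",
    "Benchwarmers",
]
meetings_per_pair = 4  # assumed name for the constant 4 used above

# Every unordered pair of teams meets `meetings_per_pair` times.
matchups = list(combinations(team_names, 2)) * meetings_per_pair

# Each matchup counts as one game for both teams involved.
games_per_team = Counter()
for team_1, team_2 in matchups:
    games_per_team[team_1] += 1
    games_per_team[team_2] += 1

print(f"Total games in the league: {len(matchups)}")
print(f"Distinct per-team game counts: {set(games_per_team.values())}")
```

With 8 teams there are 28 distinct pairings, so the league plays 112 games in total and every team appears in 28 of them.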


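Generate a schedule for each of five seasons, starting with 2015-2016. Every game is assigned a random date between October 1 and May 31 on which neither team is already playing.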

In [3]:
for season in range(5):
    y = 2015 + season
    start_date = datetime.date(y, 10, 1)
    end_date = datetime.date(y+1, 5, 31)
    season_dates = pd.date_range(start_date, end_date, freq="D")

    # Per-team record of dates already taken, used to avoid double-booking.
    schedules = {team_name: {"dates": [], "opponent": []} for team_name in team_names}
    final_schedule = {"date": [], "team_1": [], "team_2": []}

    for i, team_name in enumerate(team_names[:-1]):
        # Pair only with later teams so each matchup is scheduled from one side,
        # four times per opponent.
        remaining_to_schedule = team_names[i+1:] * 4
        np.random.shuffle(remaining_to_schedule)
        for opponent in remaining_to_schedule:
            # Redraw until the date is free for both teams.
            date = np.random.choice(season_dates)
            while date in schedules[team_name]["dates"] or date in schedules[opponent]["dates"]:
                date = np.random.choice(season_dates)

            schedules[team_name]["dates"].append(date)
            schedules[team_name]["opponent"].append(opponent)

            schedules[opponent]["dates"].append(date)
            schedules[opponent]["opponent"].append(team_name)

            final_schedule["date"].append(date)
            final_schedule["team_1"].append(team_name)
            final_schedule["team_2"].append(opponent)
    final_schedule_df = pd.DataFrame(final_schedule)
    final_schedule_df.to_csv(f"../data/schedules/season_{season}_{y}-{y+1}.csv", index=False)
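
Because dates are drawn at random and redrawn on conflict, it can be worth confirming that a saved schedule is consistent. The following is a minimal sketch, not part of the original notebook: `check_schedule` is a hypothetical helper that assumes the `date`, `team_1`, and `team_2` columns written above, and the three-game schedule is toy data.

```python
import pandas as pd


def check_schedule(schedule_df, expected_games):
    """Return a list of problems found in a schedule (hypothetical helper)."""
    # Stack both team columns so each row is one (date, team) appearance.
    appearances = pd.concat(
        [
            schedule_df[["date", "team_1"]].rename(columns={"team_1": "team"}),
            schedule_df[["date", "team_2"]].rename(columns={"team_2": "team"}),
        ],
        ignore_index=True,
    )

    problems = []

    # A team that appears twice on the same date has been double-booked.
    clashes = appearances[appearances.duplicated(["date", "team"], keep=False)]
    for team, date in clashes[["team", "date"]].drop_duplicates().itertuples(index=False):
        problems.append(f"{team} plays more than once on {date}")

    # Every team should play the expected number of games.
    for team, n_games in appearances["team"].value_counts().items():
        if n_games != expected_games:
            problems.append(f"{team} plays {n_games} games, expected {expected_games}")

    return problems


# Toy schedule: team "A" is booked twice on the same day.
toy_schedule = pd.DataFrame(
    {
        "date": pd.to_datetime(["2015-10-01", "2015-10-01", "2015-10-02"]),
        "team_1": ["A", "A", "B"],
        "team_2": ["B", "C", "C"],
    }
)
problems = check_schedule(toy_schedule, expected_games=2)
print(problems)
```

For a real season, the same function could be applied to a schedule file read back with `pd.read_csv(..., parse_dates=["date"])`, passing `games_in_a_season` as `expected_games`.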

In [5]:
final_schedule_df.sort_values(by="date", inplace=True, ignore_index=True)
print(
    f"First game of the season on {final_schedule_df.loc[0]['date']} "
    f"between {final_schedule_df.loc[0]['team_1']} and "
    f"{final_schedule_df.loc[0]['team_2']}"
)

First game of the season on 2019-10-02 00:00:00 between Fresh Prince of Briere and Benchwarmers


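Generate the players. First, outline the positions, the stats tracked, and the number of players per team. `B` marks a bench slot, and each entry in `stats_distribution_scaling` is a pair of scales for a stat's per-game mean and standard deviation.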

In [6]:
player_positions = ["C", "RW", "LW", "D", "G"]
positions_per_team = ["C"] * 2 + ["RW"] * 2 + ["LW"] * 2 +  ["D"] * 4 + ["G"] * 2 + ["B"] * 3
skater_stats_tracked = ["Goals", "Assists", "Hits", "Blocks"]
goalie_stats_tracked = ["Game_Starts", "Wins", "Saves"]
stats_distribution_scaling = {
    "Goals": (4, 2),
    "Assists": (5, 2),
    "Hits": (10, 5),
    "Blocks": (10, 5),
    "Game_Starts": (1, 0.5),
    "Wins": (1, 0.5),
    "Saves": (70, 15)
}

Use this to compute the number of players in the entire league, with roughly 10% more players than roster slots so that some free agents remain after the draft.

In [7]:
number_of_positions_per_team = len(positions_per_team)
number_of_players_in_the_league = int(len(team_names) * 1.1 * number_of_positions_per_team) + 1

In [8]:
fake = Faker()

In [9]:
league_players = {"name": [], "position": []}
for stat in skater_stats_tracked + goalie_stats_tracked:
    league_players[f"{stat}_mean"] = []
    league_players[f"{stat}_std"] = []

Generate the players in proportion to the positions available. If a player fills a bench slot, their position is chosen at random. Cycling through the roster slots this way guarantees that there are at least enough players to cover every slot on every team's roster.

Once a player is produced, generate the average stats to expect from them on a nightly basis. These will be used to simulate individual games throughout the season.

In [10]:
for i in range(number_of_players_in_the_league):
    player_name = fake.name()
    league_players["name"].append(player_name)
    
    n = len(positions_per_team)
    pos_idx = i % n
    pos = positions_per_team[pos_idx]
    if pos == "B":
        pos = np.random.choice(player_positions)
    league_players["position"].append(pos)
    
    print(f"Generated player {player_name} - position {pos}")
    for stat in stats_distribution_scaling:
        stat_mean = 0
        stat_std = 0
        
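        # Goalies only get goalie stats and skaters only get skater stats;
        # everything else stays at zero.
        if (pos == "G") == (stat in goalie_stats_tracked):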
            m, s = stats_distribution_scaling[stat]
            stat_mean = m * np.random.random()
            stat_std = s * np.random.random()
            print(f"\t{stat}: {stat_mean:0.4f} (+/- {stat_std:0.4f})")
            
            
        league_players[f"{stat}_mean"].append(stat_mean)
        league_players[f"{stat}_std"].append(stat_std)        

Generated player Anthony Morton - position C
	Goals: 1.6969 (+/- 1.0020)
	Assists: 3.5378 (+/- 1.2995)
	Hits: 5.8572 (+/- 0.8116)
	Blocks: 0.5520 (+/- 2.0598)
Generated player Dr. Jennifer Mcmahon MD - position C
	Goals: 2.9016 (+/- 1.3534)
	Assists: 2.6320 (+/- 0.0607)
	Hits: 8.5561 (+/- 1.0019)
	Blocks: 3.5166 (+/- 4.1980)
Generated player Andrea Beasley - position RW
	Goals: 1.8686 (+/- 0.8041)
	Assists: 3.7896 (+/- 0.1066)
	Hits: 6.1711 (+/- 3.6718)
	Blocks: 8.0691 (+/- 0.0155)
Generated player Danny Harris - position RW
	Goals: 0.4706 (+/- 1.5677)
	Assists: 4.8558 (+/- 1.1372)
	Hits: 9.2897 (+/- 2.9577)
	Blocks: 2.1388 (+/- 1.1495)
Generated player Martha Peterson - position LW
	Goals: 0.5225 (+/- 1.1198)
	Assists: 0.5976 (+/- 1.0578)
	Hits: 3.6838 (+/- 2.5148)
	Blocks: 6.1228 (+/- 4.7656)
Generated player Loretta Wolfe - position LW
	Goals: 3.6220 (+/- 1.1712)
	Assists: 2.5358 (+/- 0.7803)
	Hits: 9.3753 (+/- 0.0468)
	Blocks: 3.2984 (+/- 1.3764)
Generated player Antonio Hardy - po

Generated player Donna Hall - position G
	Game_Starts: 0.5278 (+/- 0.4126)
	Wins: 0.5191 (+/- 0.3896)
	Saves: 40.2620 (+/- 12.9293)
Generated player Antonio Tanner - position RW
	Goals: 1.1214 (+/- 1.8866)
	Assists: 1.2648 (+/- 1.0812)
	Hits: 7.6357 (+/- 2.9563)
	Blocks: 7.2497 (+/- 0.0547)
Generated player Carmen Smith - position G
	Game_Starts: 0.2999 (+/- 0.2368)
	Wins: 0.0469 (+/- 0.1061)
	Saves: 69.9717 (+/- 5.7612)
Generated player David Montgomery - position C
	Goals: 3.7609 (+/- 1.0412)
	Assists: 2.8254 (+/- 1.6426)
	Hits: 9.2549 (+/- 0.8304)
	Blocks: 4.2908 (+/- 0.5277)
Generated player Kathy Allen - position C
	Goals: 3.7146 (+/- 1.5281)
	Assists: 0.2130 (+/- 0.2510)
	Hits: 7.5283 (+/- 3.0063)
	Blocks: 5.0770 (+/- 4.1708)
Generated player Mark Brown - position RW
	Goals: 1.5106 (+/- 0.6800)
	Assists: 2.6408 (+/- 1.7351)
	Hits: 4.1119 (+/- 2.3277)
	Blocks: 4.6584 (+/- 3.5723)
Generated player Danielle Baker - position RW
	Goals: 3.0969 (+/- 1.0624)
	Assists: 2.8190 (+/- 1.5000
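
The coverage claim can be checked by counting, as a rough sketch: cycle through the roster slots the same way the loop above does and tally only the fixed-position slots, since bench players can only add to those totals. `number_of_teams`, `guaranteed`, and `required` are names introduced here.

```python
from collections import Counter

positions_per_team = ["C"] * 2 + ["RW"] * 2 + ["LW"] * 2 + ["D"] * 4 + ["G"] * 2 + ["B"] * 3
number_of_teams = 8
# Same formula as the notebook: about 10% more players than roster slots.
number_of_players = int(number_of_teams * 1.1 * len(positions_per_team)) + 1

# Positions assigned by cycling through the roster slots, one player at a time.
guaranteed = Counter(
    positions_per_team[i % len(positions_per_team)]
    for i in range(number_of_players)
)
# "B" players get a random position, so they are not counted as guaranteed.
guaranteed.pop("B", None)

# Fixed-position slots each team has to fill.
required = Counter(pos for pos in positions_per_team if pos != "B")

for position, per_team in required.items():
    needed = per_team * number_of_teams
    print(f"{position}: need {needed}, guaranteed at least {guaranteed[position]}")
```

Every position ends up with a small surplus over what the eight rosters need, before any randomly assigned bench players are counted.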

In [11]:
league_players_df = pd.DataFrame(league_players)
league_players_df["team"] = "FA"

In [12]:
league_players_df

| | name | position | Goals_mean | Goals_std | Assists_mean | Assists_std | Hits_mean | Hits_std | Blocks_mean | Blocks_std | Game_Starts_mean | Game_Starts_std | Wins_mean | Wins_std | Saves_mean | Saves_std | team |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Anthony Morton | C | 1.696933 | 1.002038 | 3.537770 | 1.299483 | 5.857164 | 0.811611 | 0.552004 | 2.059795 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | FA |
| 1 | Dr. Jennifer Mcmahon MD | C | 2.901576 | 1.353379 | 2.632040 | 0.060701 | 8.556089 | 1.001882 | 3.516572 | 4.197982 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | FA |
| 2 | Andrea Beasley | RW | 1.868589 | 0.804065 | 3.789637 | 0.106639 | 6.171108 | 3.671790 | 8.069108 | 0.015540 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | FA |
| 3 | Danny Harris | RW | 0.470625 | 1.567666 | 4.855820 | 1.137198 | 9.289750 | 2.957675 | 2.138841 | 1.149527 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | FA |
| 4 | Martha Peterson | LW | 0.522476 | 1.119785 | 0.597609 | 1.057772 | 3.683814 | 2.514836 | 6.122802 | 4.765589 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | FA |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 128 | Patricia Hoffman | D | 3.127716 | 1.883762 | 0.418962 | 0.584071 | 2.537578 | 3.738607 | 8.653453 | 1.195755 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | FA |
| 129 | Alan Wright | D | 1.766235 | 0.155482 | 1.247945 | 0.685769 | 1.716087 | 0.177707 | 0.577812 | 4.153256 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | FA |
| 130 | Brandon Martinez | G | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.774188 | 0.248221 | 0.490285 | 0.292157 | 0.285805 | 0.332366 | FA |
| 131 | Kyle Nguyen | G | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.988963 | 0.358524 | 0.654174 | 0.328952 | 51.481248 | 2.070831 | FA |
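
One way these per-player means and standard deviations might feed a game simulation is sketched below. This is an assumption, not the notebook's method: the source does not say which distribution is used, so the sketch draws from a normal distribution, clips at zero, and rounds to a whole count. `simulate_stat_line` is a hypothetical helper, and the player values are the skater columns of the first row of the table.

```python
import numpy as np

skater_stats_tracked = ["Goals", "Assists", "Hits", "Blocks"]


def simulate_stat_line(player, stats, rng):
    """Draw one game's worth of stats for a player (hypothetical helper)."""
    line = {}
    for stat in stats:
        # Assumption: a normal draw around the player's nightly average,
        # clipped at zero and rounded to a whole count.
        draw = rng.normal(player[f"{stat}_mean"], player[f"{stat}_std"])
        line[stat] = int(max(0, round(draw)))
    return line


# Skater columns for Anthony Morton, taken from the table.
player = {
    "Goals_mean": 1.696933, "Goals_std": 1.002038,
    "Assists_mean": 3.537770, "Assists_std": 1.299483,
    "Hits_mean": 5.857164, "Hits_std": 0.811611,
    "Blocks_mean": 0.552004, "Blocks_std": 2.059795,
}

rng = np.random.default_rng(seed=0)  # seeded only so the sketch is repeatable
stat_line = simulate_stat_line(player, skater_stats_tracked, rng)
print(stat_line)
```

A count distribution such as the Poisson would be another reasonable choice for stats like goals; the normal draw is used here only because the table stores a mean and a standard deviation.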


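Draft the players. For each roster slot, every team selects one random free agent of the matching position; bench slots can be filled by a free agent of any position. The pick order is reshuffled every round.

In [14]: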
team_names_shuffled = team_names.copy()
np.random.shuffle(team_names_shuffled)
pick_number = 1
# One draft round per roster slot: every team picks once per round.
for pos in positions_per_team:
    print(f"drafting {pos}")
    available_players = league_players_df.loc[league_players_df["team"] == "FA"]
    # Bench slots ("B") can be filled by a free agent of any position.
    if pos != "B":
        available_players = available_players.loc[available_players["position"] == pos]

    # Pick one distinct free agent per team, then randomize the pick order.
    drafted_players = np.random.choice(available_players.index, size=len(team_names), replace=False)
    np.random.shuffle(team_names_shuffled)
    for player, team in zip(drafted_players, team_names_shuffled):
        print(f"With pick number {pick_number} the {team} select {league_players_df.loc[player]['name']}")
        league_players_df.loc[player, "team"] = team
        pick_number += 1


drafting C
With pick number 1 the Benchwarmers select Kathy Allen
With pick number 2 the Blades of Steel select Mary Gutierrez
With pick number 3 the Flying Elbows select Dr. Jennifer Mcmahon MD
With pick number 4 the The Lucky Pucks select Ethan Jones
With pick number 5 the Fresh Prince of Briere select Robin Ford
With pick number 6 the PCUK select Dr. Cheryl Burgess
With pick number 7 the No Woman, No Krejci select Dean Bell
With pick number 8 the Hat-Trick Heroes select Dustin Medina
drafting C
With pick number 9 the Hat-Trick Heroes select Lisa Long
With pick number 10 the No Woman, No Krejci select David Montgomery
With pick number 11 the Benchwarmers select Tara Robinson MD
With pick number 12 the The Lucky Pucks select Christopher Young
With pick number 13 the Blades of Steel select Pamela Stewart
With pick number 14 the Fresh Prince of Briere select Timothy Brown
With pick number 15 the Flying Elbows select Heidi Richardson
With pick number 16 the PCUK select Lauren Gomez
draft

In [15]:
league_players_df.to_csv("../data/league_players.csv", index=False)
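
After the draft, a quick tally of players per team and position can confirm that every roster was filled. This is a minimal sketch, assuming only the `position` and `team` columns used above; `roster_counts` is a hypothetical helper and the five-player frame is toy data, not the generated league.

```python
import pandas as pd


def roster_counts(players_df):
    """Players per team and position; free agents stay under the "FA" label."""
    return (
        players_df.groupby(["team", "position"])
        .size()
        .unstack(fill_value=0)
    )


# Toy data: two small rosters and one undrafted player.
toy_players = pd.DataFrame(
    {
        "name": ["Player 1", "Player 2", "Player 3", "Player 4", "Player 5"],
        "position": ["C", "G", "C", "G", "D"],
        "team": ["PCUK", "PCUK", "Benchwarmers", "Benchwarmers", "FA"],
    }
)

counts = roster_counts(toy_players)
print(counts)

# Row totals give the roster size of each team (and the number of free agents).
roster_sizes = counts.sum(axis=1)
print(roster_sizes)
```

Applied to the full `league_players_df`, each team's row total should equal the number of roster slots, with the remaining players listed under `FA`.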