Now that we have a dataset, we can start to do some actual analysis. I'm going to be attempting to replicate the methodology of this paper:

Sapienza, Anna and Goyal, Palash and Ferrara, Emilio. Deep Neural Networks for Optimal Team Composition. Frontiers in Big Data, vol 2. Jun 2019. https://arxiv.org/abs/1805.03285 

While roller derby and esports games like League of Legends are obviously very different, in many ways they can be treated similarly: each League match and each individual jam of a derby bout consists of teams of 5 players with different defined roles, each attempting to achieve an objective while slowing the opposing team's attempt to achieve theirs.

A derby bout (game) consists of a series of many individual jams. Each team fields a defensive line of four "blockers" and an offensive line of one "jammer". The jammer scores points by passing through the "pack" of blockers: one initial non-scoring pass through the pack is required, and then one point is earned for each of the opposing team's blockers that the jammer passes on subsequent laps. Each jam can run for a set amount of time, but the jammer that is the first to complete the non-scoring pass (the "lead jammer") can choose to end the jam early. In addition, the jammer can hand off their jammer status to one special blocker on each team, called a "pivot", by passing the special helmet cover that the jammer wears. This is the general gist of the sport; in many ways, it's similar to the playground game "Red Rover", but on wheels.

Let's see if we can build a teammate recommender system similar to the one in the paper. First, we'll need to make the play graphs for the autoencoder to encode.

In this analysis, I'm going to make some assumptions.
-First, that the fundamental unit of derby is not the bout, but the jam. Each jam is unique and may have starting conditions determined by the preceding jam, but for the purposes of this analysis, the only influence jam 1 may have on a later jam like jam 20 is player stamina. (N.B.: players can sometimes still be in the penalty box from previous jams, so this is not strictly correct, but it's probably close enough for what we'd like to test here.) This means that I will update a player's "rating" each jam rather than each bout.

-Second, that the "figure of merit" for a jammer's performance is the total number of points they score in a jam, while the "figure of merit" for a blocker line's performance is the difference between their jammer's score and the opposing jammer's score. A good blocker line is able to slow the opposing jammer substantially while also letting their own through.

-Third, that the rules can be treated as largely constant. The rules of roller derby change often, as the sport is still relatively new. For instance, at one point, jammers scored an additional point for passing the opposing team's jammer as well as the blockers. Without this assumption, I'm not sure we'd have enough stats to do meaningful analysis. This is perhaps the biggest assumption that goes into this project: the game is constantly evolving, and at high levels, play can change considerably as a result of rule changes!
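
To make the second assumption concrete, here's a minimal sketch of the two figures of merit. The function names and the example scores are made up for illustration; they aren't part of the scraper or of the paper.

```python
def jammer_merit(own_points):
    """Jammer figure of merit: total points the jammer scores in the jam."""
    return own_points


def blocker_line_merit(own_points, opposing_points):
    """Blocker-line figure of merit: own jammer's score minus the opposing jammer's."""
    return own_points - opposing_points


# Hypothetical jam: our jammer scores 8, the opposing jammer scores 3.
print(jammer_merit(8))
print(blocker_line_merit(8, 3))
```

A blocker line that holds the opposing jammer scoreless still gets a merit of zero if its own jammer doesn't score either, which matches the idea that a good line both slows the opponent and lets its own jammer through.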

In [None]:
import requests
import statistics
import pandas as pd
import numpy as np
import trueskill
from bs4 import BeautifulSoup
from itertools import product
from urllib.request import urlopen
import networkx as nx
from networkx.drawing.nx_agraph import to_agraph 
import matplotlib.pyplot as plt
import pylab
import random
from random import sample

import nbimporter
import Webscraper as wsc
import os.path
from os import path

In [None]:
def getstats(teamID,teamName):
    # First, get the lineups for each jam KDD has stats available for.
    AllLineups = wsc.GetAllLineups(teamID, teamName)

    # Also, get expanding average of score differentials for each jam. We'll use a player's
    # average score differential after a given jam as a proxy for their skill ranking as measured
    # after playing that jam.
    AllAvgs = wsc.ExpandingAverages(teamID, teamName)
    badjams,badblockers = wsc.GetBadJamsAndBlockers(teamID, teamName,20)

    return AllLineups,AllAvgs,badjams,badblockers
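
As a rough illustration of the "expanding average" idea used as a skill proxy, here's a small standalone sketch. I'm assuming that `wsc.ExpandingAverages` computes something along these lines per player; the numbers below are invented, not scraped data.

```python
from itertools import accumulate

# Hypothetical per-jam score differentials for one blocker (illustrative only).
score_diffs = [4, -2, 3, 0]

# Expanding average: entry k is the mean of jams 0..k.
running_totals = list(accumulate(score_diffs))
expanding_avg = [total / (k + 1) for k, total in enumerate(running_totals)]

# Change in the average attributable to each jam after the first.
per_jam_change = [later - earlier
                  for earlier, later in zip(expanding_avg, expanding_avg[1:])]

print(expanding_avg)
print(per_jam_change)
```

The jam-to-jam change in this average is the quantity the graph-building code uses as an edge weight.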

Let's only look at blockers for now, since they interact most closely with each other. Matching jammers to blocker lines is a different question than composing the lines themselves, since interplay is different.

Next, let's build the short and long term play networks described in the paper. We'll also need to prune them to remove isolated nodes and edges corresponding to fewer than two co-play jams. As an aside, there's a pretty clear typo in the paper: the long-term play network weights should drop off with time since last co-play, not increase (i.e., there should be a negative sign in the exponent).
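
Here's a minimal sketch of the decay with the negative sign in the exponent, which is the form the code in this notebook uses. The helper name and the sample values are hypothetical.

```python
import math


def decayed_contribution(nominal_weight, jams_since_coplay):
    """Scale a jam's contribution by exp(-jams since the pair last co-played)."""
    return nominal_weight * math.exp(-jams_since_coplay)


# The same nominal weight counts for less the longer a pair has gone without co-playing.
for gap in (0, 1, 3):
    print(gap, round(decayed_contribution(2.0, gap), 4))
```

With a positive exponent, pairs that haven't skated together in a long time would dominate the long-term network, which is the opposite of what we want.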

In [None]:
def GetGraphs(teamID,teamName):
    
    AllLineups,AllAvgs,badjams,badblockers = getstats(teamID,teamName)
    blockerlines = AllLineups[['B1', 'B2', 'B3', 'B4']]
    #print(blockerlines)

    STjams=[]
    for jamnum in range(len((blockerlines.index))):

        if (jamnum in badjams): continue
        G = nx.complete_graph(4, nx.DiGraph())
        blockers = blockerlines.iloc[jamnum].to_list()
        mapping = dict(zip(G, blockers))
        G = nx.relabel_nodes(G, mapping)

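        for edge in G.edges():
            # Edge weight: change in the source blocker's expanding average caused by this jam.
            # NOTE: for jamnum == 0, iloc[jamnum-1] is iloc[-1], i.e. the last row.
            weight = AllAvgs.iloc[jamnum][edge[0]]-AllAvgs.iloc[jamnum-1][edge[0]]
            G[edge[0]][edge[1]]['weight'] = weight
        # Append once per jam, after all twelve edges have been weighted.
        STjams.append(G)

    # Sum the per-jam weights edge by edge to get the short-term graph.
    STGraph = nx.DiGraph()
    for jam in STjams:
        for edge in jam.edges():
            jamweight = jam.get_edge_data(*edge)['weight']
            if STGraph.has_edge(*edge):
                STGraph[edge[0]][edge[1]]['weight'] += jamweight
            else:
                # First co-play for this pair: start from this jam's weight.
                STGraph.add_edge(*edge, weight=jamweight)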

    #Now get LTGraph.            
    #Get nodes and edges from the STGraph, remove weights
    LTGraph = STGraph.to_directed()

    for edge in LTGraph.edges():
        LTGraph[edge[0]][edge[1]]['weight'] = 0
        LTGraph[edge[0]][edge[1]]['jamssince'] = 0
        LTGraph[edge[0]][edge[1]]['totalcoplays'] = 0


    #Add a new edge feature: "jams since last co-play" that updates each jam, and use it to get the weights    

    for jamnum in range(len((blockerlines.index))):
        #get all edges in jam
        G = nx.complete_graph(4, nx.DiGraph())
        blockers = blockerlines.iloc[jamnum].to_list()
        mapping = dict(zip(G, blockers))
        G = nx.relabel_nodes(G, mapping)

        #get all possible combos
        for edge in LTGraph.edges():
            #zero if they play together in this jam, increment otherwise
            if edge in G.edges(): LTGraph[edge[0]][edge[1]]['jamssince'] = 0
            else: LTGraph[edge[0]][edge[1]]['jamssince'] += 1

        #get total number of co-play jams    
        for edge in G.edges():    
            if edge in LTGraph.edges(): LTGraph[edge[0]][edge[1]]['totalcoplays'] += 1
        
        if (jamnum in badjams): continue
        
        # Get all blockers in the jam, get all possible teammates
        for node in G:
            edges = LTGraph.out_edges(node)
            for edge in edges:
                # weight them by exp(-time) since last co-play: influence persists across jams but drops off with time
                # NOTE: for jamnum == 0, iloc[jamnum-1] is iloc[-1], i.e. the last row.
                nomweight = AllAvgs.iloc[jamnum][edge[0]]-AllAvgs.iloc[jamnum-1][edge[0]]
                modifier = np.exp(-LTGraph[edge[0]][edge[1]]['jamssince'])
                LTGraph[edge[0]][edge[1]]['weight'] += nomweight*modifier
    
    return STGraph,LTGraph

In [None]:
def PruneGraphs(STGraph,LTGraph):
   
    #print(len(STGraph))
    edges_to_prune=[]
    nodes_to_prune=[]
    
    #drop all edges with fewer than two co-plays    
    for edge in LTGraph.edges():
        thisedge = LTGraph.get_edge_data(*edge)
        #print(thisedge)
        if LTGraph[edge[0]][edge[1]]['totalcoplays'] < 2: 
            edges_to_prune.append(edge)
    
    for edge in edges_to_prune:
        STGraph.remove_edge(*edge)
        LTGraph.remove_edge(*edge)

    # Keep only the largest strongly connected component of the long-term graph.
    largestLTGraph = max(nx.strongly_connected_components(LTGraph), key=len)
    
    #print(STGraph)
    for node in LTGraph: 
        if node not in largestLTGraph: nodes_to_prune.append(node)
            #print(node)
            
    for node in nodes_to_prune:
        #print(node)
        STGraph.remove_node(node)
        LTGraph.remove_node(node)

    return(STGraph,LTGraph)
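
To see what the pruning does, here's a toy example with four made-up skaters. It's a sketch of the same two steps (drop rare edges, keep the largest strongly connected component), not a drop-in replacement for `PruneGraphs`.

```python
import networkx as nx

# Toy co-play graph with hypothetical skaters; 'totalcoplays' mirrors the edge
# attribute set in GetGraphs.
G = nx.DiGraph()
for a, b, n in [("A", "B", 3), ("B", "C", 2), ("C", "A", 4), ("A", "D", 1)]:
    G.add_edge(a, b, totalcoplays=n)
    G.add_edge(b, a, totalcoplays=n)

# Drop edges seen in fewer than two jams. Collect first, then remove, so the
# graph is not modified while it is being iterated.
rare = [(u, v) for u, v, n in G.edges(data="totalcoplays") if n < 2]
G.remove_edges_from(rare)

# Keep only the largest strongly connected component; this also removes the
# now-isolated node D.
largest = max(nx.strongly_connected_components(G), key=len)
pruned = G.subgraph(largest).copy()

print(sorted(pruned.nodes()))
```

Skater D only shared one jam with A, so both A-D edges are dropped and D falls out of the graph entirely.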

In [None]:
def GetAndWritePrunedGraphs(teamID,teamName):
    STGraph, LTGraph = GetGraphs(teamID,teamName)
    
    try:
        STpruned, LTpruned = PruneGraphs(STGraph, LTGraph)

        # Node labels stay as skater names here; they are converted to integers after merging.
        nx.write_weighted_edgelist(STpruned, "Data/STGraphs/"+teamID+"STGraph.edgelist", delimiter=",,")
        nx.write_weighted_edgelist(LTpruned, "Data/LTGraphs/"+teamID+"LTGraph.edgelist", delimiter=",,")

    except Exception as err:
        # e.g. max() raises ValueError when the graph has no components left
        print("not enough data to get LCC!", err)

    return

In [None]:
#make sure this method works for one team
GetAndWritePrunedGraphs(str(3637),'Killamazoo')

In [None]:
#Now make all STGraphs and LTGraphs

IDs, names = wsc.getAllTeamsAndNames()

In [None]:
print(IDs)
print(names)

In [None]:
for ID,name in zip(IDs,names):
    if path.exists("Data/STGraphs/"+str(ID)+"STGraph.edgelist"): continue
    print(ID,name)
    GetAndWritePrunedGraphs(str(ID),name)


Let's merge these edgelists manually using shell commands (i.e., not as a part of this notebook) to make graphs called 'AllTeamsFullSTGraph.edgelist' and 'AllTeamsFullLTGraph.edgelist'. After getting these, let's relabel the nodes from names to indices, so NetworkX can handle them more easily.

In [None]:
# Note: without create_using=nx.DiGraph(), these edgelists are read back as undirected graphs.
STGraphNames = nx.read_weighted_edgelist("Data/AllTeamsFullSTGraph.edgelist", delimiter=",,")
LTGraphNames = nx.read_weighted_edgelist("Data/AllTeamsFullLTGraph.edgelist", delimiter=",,")
STGraph = nx.convert_node_labels_to_integers(STGraphNames)
LTGraph = nx.convert_node_labels_to_integers(LTGraphNames)

nx.write_weighted_edgelist(STGraph, "Data/AllTeamsFullSTGraph.edgelist")
nx.write_weighted_edgelist(LTGraph, "Data/AllTeamsFullLTGraph.edgelist")
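
For anyone who'd rather do the merge in Python than in the shell, here's a rough sketch of the idea. The skater names, weights, and the `merge_and_index` helper are all hypothetical; only the ",,"-delimited line format comes from the cells above.

```python
# Hypothetical per-team edgelist lines in the ",,"-delimited format written above.
team_a = ["Ann,,Bea,,0.5", "Bea,,Ann,,0.25"]
team_b = ["Cat,,Dee,,-1.0"]


def merge_and_index(edgelists):
    """Concatenate edgelists and map each skater name to one integer index."""
    index = {}
    merged = []
    for lines in edgelists:
        for line in lines:
            source, target, weight = line.split(",,")
            for name in (source, target):
                # Assign the next free integer the first time a name is seen.
                index.setdefault(name, len(index))
            merged.append((index[source], index[target], float(weight)))
    return merged, index


edges, name_to_index = merge_and_index([team_a, team_b])
print(edges)
print(name_to_index)
```

One thing this makes explicit is the name-to-index mapping itself. If the short and long term graphs are relabeled independently, the same skater may not end up with the same integer in both, so reusing a single mapping is one way to keep them aligned.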

Let's randomly split the full graph into train, val, and test edge sets. Note that these won't necessarily be connected: the val and test graphs may contain orphan nodes. However, this is OK for what we're doing: we don't want the autoencoder to only consider observed links, so unobserved links don't have any negative effect on the recommender's performance.
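
The split itself boils down to shuffling the edges once and slicing. Here's a small sketch on a toy edge list; the 80/10/10 fractions roughly match the counts used below, and the `seed` parameter is my own addition for reproducibility.

```python
import random


def split_edges(edges, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle edges once and slice them into disjoint train/val/test lists."""
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical edge list standing in for graph.edges().
toy_edges = [(i, i + 1) for i in range(50)]
train, val, test = split_edges(toy_edges)
print(len(train), len(val), len(test))
```

Slicing a single shuffled list guarantees the three sets are disjoint without any membership checks.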

In [None]:
def MakeTrainValTest():
    
    STtrainset = []
    STtestset = []
    STvalset = []
    
    STGraphFullNorm = nx.read_weighted_edgelist("Data/AllTeamsFullSTGraph.edgelist")
    LTGraphFullNorm = nx.read_weighted_edgelist("Data/AllTeamsFullLTGraph.edgelist")

    print(len(LTGraphFullNorm.edges()))
    #25126 total, so 20100 train, 2513 val, 2513 test
    
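    # random.sample needs a sequence, so materialize the edge view first;
    # sets make the membership checks below fast
    allLT = list(LTGraphFullNorm.edges())
    testlistLT = random.sample(allLT,2513)
    testsetLT = set(testlistLT)
    trainvalLT = [x for x in allLT if x not in testsetLT]
    vallistLT = random.sample(trainvalLT,2513)
    valsetLT = set(vallistLT)
    trainlistLT = [x for x in trainvalLT if x not in valsetLT]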
    
    traingraphLT = LTGraphFullNorm.edge_subgraph(trainlistLT)
    valgraphLT = LTGraphFullNorm.edge_subgraph(vallistLT)
    testgraphLT = LTGraphFullNorm.edge_subgraph(testlistLT)    
    nx.write_weighted_edgelist(traingraphLT, "Data/AllTeamsLTGraphTrain.edgelist")
    nx.write_weighted_edgelist(valgraphLT, "Data/AllTeamsLTGraphVal.edgelist")
    nx.write_weighted_edgelist(testgraphLT, "Data/AllTeamsLTGraphTest.edgelist")
    
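    # random.sample needs a sequence, so materialize the edge view first;
    # sets make the membership checks below fast
    allST = list(STGraphFullNorm.edges())
    testlistST = random.sample(allST,2513)
    testsetST = set(testlistST)
    trainvalST = [x for x in allST if x not in testsetST]
    vallistST = random.sample(trainvalST,2513)
    valsetST = set(vallistST)
    trainlistST = [x for x in trainvalST if x not in valsetST]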
    print(len(trainlistST),len(vallistST),len(testlistST))
    traingraphST = STGraphFullNorm.edge_subgraph(trainlistST)
    valgraphST = STGraphFullNorm.edge_subgraph(vallistST)
    testgraphST = STGraphFullNorm.edge_subgraph(testlistST)    
    nx.write_weighted_edgelist(traingraphST, "Data/AllTeamsSTGraphTrain.edgelist")
    nx.write_weighted_edgelist(valgraphST, "Data/AllTeamsSTGraphVal.edgelist")
    nx.write_weighted_edgelist(testgraphST, "Data/AllTeamsSTGraphTest.edgelist")

Let's make some methods for preprocessing the data. Ultimately, min-max scaling seems to perform the best. In both short and long term networks, the distribution of weights is roughly Gaussian, so the data can be safely scaled.
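
Here's a quick sketch of the two scalings on plain lists, with statistics taken from the training weights only. The weights are invented for illustration.

```python
import statistics


def min_max_scale(values, lo, hi):
    """Map lo -> 0 and hi -> 1; values outside [lo, hi] land outside [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values, mean, stdev):
    """Shift to zero mean and scale to unit standard deviation."""
    return [(v - mean) / stdev for v in values]


# Hypothetical edge weights; statistics come from the training split only.
train_weights = [-2.0, 0.0, 1.0, 3.0]
test_weights = [-1.0, 4.0]

lo, hi = min(train_weights), max(train_weights)
mean, stdev = statistics.mean(train_weights), statistics.stdev(train_weights)

print(min_max_scale(train_weights, lo, hi))
print(min_max_scale(test_weights, lo, hi))
print(standardize(train_weights, mean, stdev))
```

Because the min and max come from the training set, val and test weights can fall slightly outside [0, 1] after scaling.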

In [None]:
def GetTrainStats():
    STGraphTrainUnNorm = nx.read_weighted_edgelist("Data/AllTeamsSTGraphTrain.edgelist")
    LTGraphTrainUnNorm = nx.read_weighted_edgelist("Data/AllTeamsLTGraphTrain.edgelist")
    
    STweights = []
    LTweights = []
    STweightsNew = []
    LTweightsNew = [] 

    for node1, node2, data in STGraphTrainUnNorm.edges(data=True):
        STweights.append(data['weight'])
    
    for node1, node2, data in LTGraphTrainUnNorm.edges(data=True):
        LTweights.append(data['weight'])
        
    STsig = statistics.stdev(STweights)
    STmean = statistics.mean(STweights)
    STmin = min(STweights)
    STmax = max(STweights)
    LTsig = statistics.stdev(LTweights)
    LTmean = statistics.mean(LTweights)
    LTmin = min(LTweights)
    LTmax = max(LTweights)    
    
    return STmean,STsig,STmin,STmax,LTmean,LTsig,LTmin,LTmax

In [None]:
def StandardizeGraph(graph,trainmean,trainstdev):
    # Work on a copy so the caller's unscaled graph is left untouched.
    graph = graph.copy()

    #Both ST and LT graph weight dists are Gaussian, so we can safely standardize inputs without 
    #dramatically altering the structure of the data.
    for node1, node2, data in graph.edges(data=True):
        data['weight'] = (data['weight'] - trainmean)/trainstdev

    return graph

In [None]:
def NormalizeGraph(graph,trainmin,trainmax):
    # Work on a copy: the same unscaled graphs are passed to StandardizeGraph later,
    # so they must not be modified in place here.
    graph = graph.copy()

    #Normalize so weights are between zero and one
    for node1, node2, data in graph.edges(data=True):
        data['weight'] = (data['weight'] - trainmin)/(trainmax-trainmin)

    return graph

In [None]:
MakeTrainValTest()

In [None]:
STmean,STsig,STmin,STmax,LTmean,LTsig,LTmin,LTmax = GetTrainStats()

In [None]:
STGraphTrainUnNorm = nx.read_weighted_edgelist("Data/AllTeamsSTGraphTrain.edgelist")
LTGraphTrainUnNorm = nx.read_weighted_edgelist("Data/AllTeamsLTGraphTrain.edgelist")
STGraphTestUnNorm = nx.read_weighted_edgelist("Data/AllTeamsSTGraphTest.edgelist")
LTGraphTestUnNorm = nx.read_weighted_edgelist("Data/AllTeamsLTGraphTest.edgelist")
STGraphValUnNorm = nx.read_weighted_edgelist("Data/AllTeamsSTGraphVal.edgelist")
LTGraphValUnNorm = nx.read_weighted_edgelist("Data/AllTeamsLTGraphVal.edgelist")

FullSTGraphUnNorm = nx.read_weighted_edgelist("Data/AllTeamsFullSTGraph.edgelist")
FullLTGraphUnNorm = nx.read_weighted_edgelist("Data/AllTeamsFullLTGraph.edgelist")


#normed but not Standardized
STGraphTrainScaled = NormalizeGraph(STGraphTrainUnNorm, STmin, STmax)
LTGraphTrainScaled = NormalizeGraph(LTGraphTrainUnNorm, LTmin, LTmax)
STGraphValScaled = NormalizeGraph(STGraphValUnNorm, STmin, STmax)
LTGraphValScaled = NormalizeGraph(LTGraphValUnNorm, LTmin, LTmax)
STGraphTestScaled = NormalizeGraph(STGraphTestUnNorm, STmin, STmax)
LTGraphTestScaled = NormalizeGraph(LTGraphTestUnNorm, LTmin, LTmax)


FullSTGraphScaled = NormalizeGraph(FullSTGraphUnNorm, STmin, STmax)
FullLTGraphScaled = NormalizeGraph(FullLTGraphUnNorm, LTmin, LTmax)

nx.write_weighted_edgelist(STGraphTrainScaled, "Data/AllTeamsSTGraphTrainNormalized.edgelist")
nx.write_weighted_edgelist(LTGraphTrainScaled, "Data/AllTeamsLTGraphTrainNormalized.edgelist")
nx.write_weighted_edgelist(STGraphValScaled, "Data/AllTeamsSTGraphValNormalized.edgelist")
nx.write_weighted_edgelist(LTGraphValScaled, "Data/AllTeamsLTGraphValNormalized.edgelist")
nx.write_weighted_edgelist(STGraphTestScaled, "Data/AllTeamsSTGraphTestNormalized.edgelist")
nx.write_weighted_edgelist(LTGraphTestScaled, "Data/AllTeamsLTGraphTestNormalized.edgelist")

nx.write_weighted_edgelist(FullSTGraphScaled, "Data/AllTeamsFullSTGraphNormalized.edgelist")
nx.write_weighted_edgelist(FullLTGraphScaled, "Data/AllTeamsFullLTGraphNormalized.edgelist")

#Standardized but not normed
STGraphTrainNorm = StandardizeGraph(STGraphTrainUnNorm, STmean, STsig)
LTGraphTrainNorm = StandardizeGraph(LTGraphTrainUnNorm, LTmean, LTsig)
STGraphValNorm = StandardizeGraph(STGraphValUnNorm, STmean, STsig)
LTGraphValNorm = StandardizeGraph(LTGraphValUnNorm, LTmean, LTsig)
STGraphTestNorm = StandardizeGraph(STGraphTestUnNorm, STmean, STsig)
LTGraphTestNorm = StandardizeGraph(LTGraphTestUnNorm, LTmean, LTsig)

FullSTGraphNorm = StandardizeGraph(FullSTGraphUnNorm, STmean, STsig)
FullLTGraphNorm = StandardizeGraph(FullLTGraphUnNorm, LTmean, LTsig)

nx.write_weighted_edgelist(STGraphTrainNorm, "Data/AllTeamsSTGraphTrainStandardized.edgelist")
nx.write_weighted_edgelist(LTGraphTrainNorm, "Data/AllTeamsLTGraphTrainStandardized.edgelist")
nx.write_weighted_edgelist(STGraphValNorm, "Data/AllTeamsSTGraphValStandardized.edgelist")
nx.write_weighted_edgelist(LTGraphValNorm, "Data/AllTeamsLTGraphValStandardized.edgelist")
nx.write_weighted_edgelist(STGraphTestNorm, "Data/AllTeamsSTGraphTestStandardized.edgelist")
nx.write_weighted_edgelist(LTGraphTestNorm, "Data/AllTeamsLTGraphTestStandardized.edgelist")

nx.write_weighted_edgelist(FullSTGraphNorm, "Data/AllTeamsFullSTGraphStandardized.edgelist")
nx.write_weighted_edgelist(FullLTGraphNorm, "Data/AllTeamsFullLTGraphStandardized.edgelist")

Okay, looks good! These graphs are ready to be fed into a machine learning algorithm. On to the next notebook!