Now that we have a dataset, we can start on the actual analysis. I'm going to try to replicate the methodology of this paper:

Sapienza, Anna and Goyal, Palash and Ferrara, Emilio. Deep Neural Networks for Optimal Team Composition. Frontiers in Big Data, vol 2. Jun 2019. https://arxiv.org/abs/1805.03285 

Roller derby and esports games like League of Legends are obviously very different, but in many ways they can be treated similarly: each League match, like each individual jam of a derby bout, involves a team of 5 players with different defined roles attempting to achieve an objective while slowing the opposing team's attempt to achieve theirs.

A derby bout (game) consists of a series of individual jams. In each jam, each team fields a defensive line of four "blockers" and an offensive line of one "jammer". The jammer scores points by passing through the "pack" of blockers: one initial non-scoring pass through the pack is required, and then one point is earned for each of the opposing team's blockers that the jammer passes on subsequent laps. Each jam can run for a set amount of time, but the first jammer to complete the non-scoring pass (the "lead jammer") can choose to end the jam early. In addition, a jammer can hand off their jammer status to their team's "pivot", one special blocker on each team, by passing along the special helmet cover that the jammer wears. That is the general gist of the sport; in many ways, it's similar to the playground game "Red Rover", but on wheels.

Naturally, when the blockers try to stop the jammer, things can get scrappy! Penalties are given when a player shoves another in an illegal manner, when a blocker strays too far from the pack, when a player goes out of bounds, when blockers make an illegal formation (such as linking arms), and so on. It's general "derby wisdom" that certain penalties are typical "new-skater" penalties, and that the distribution of penalties changes with skill. We can test this!


Let's pick a team. I'll use the Kalamazoo Derby Darlins, the team I've been announcing for over the past few years.

In this analysis, I'm going to make some assumptions.
-First, that the fundamental unit of derby is not the bout, but the jam. Each jam is unique and may have starting conditions determined by the preceding jam, but for the purposes of this analysis, the only influence jam 1 may have on a later jam like jam 20 is player stamina. (N.B.: players can sometimes still be in the penalty box from a previous jam, so this is not strictly correct, but it's probably correct enough for what we'd like to test here.) This means that I will update a player's "rating" each jam rather than each bout.

-Second, that the "figure of merit" to determine the performance of a jammer is the total number of points they score in a jam, but that the "figure of merit" to determine the performance of a blocker line is the difference between their jammer's score and the opposing jammer's score. A good blocker line is able to slow the opposing jammer substantially while also letting their own through.

-Third: the rules of roller derby change often, as the sport is still relatively new. For instance, at one point jammers scored an additional point for passing the opposing team's jammer as well as the blockers. I'm assuming that we can largely treat the rules as constant; otherwise, I'm not sure we'd have enough stats.
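
As a rough illustration of the second assumption, the two figures of merit for a single jam can be sketched as below. The function name and the example scores are hypothetical, not taken from the scraped data.

```python
def jam_figures_of_merit(own_jammer_points, opposing_jammer_points):
    """Return (jammer merit, blocker-line merit) for one jam.

    Hypothetical helper: the jammer is judged on the points they score,
    the blocker line on the score differential for the jam.
    """
    jammer_merit = own_jammer_points
    blocker_line_merit = own_jammer_points - opposing_jammer_points
    return jammer_merit, blocker_line_merit


# Made-up example: our jammer scores 8 points, the opposing jammer scores 3.
print(jam_figures_of_merit(8, 3))
```

A blocker line can therefore earn a negative figure of merit in a jam where its own jammer scores, if the opposing jammer scores more.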
    

In [6]:
import requests
from time import sleep
import pandas as pd
import numpy as np
import trueskill
from bs4 import BeautifulSoup
from itertools import product
from urllib.request import urlopen
import networkx as nx
from networkx.drawing.nx_agraph import to_agraph 
import matplotlib.pyplot as plt
import pylab

import nbimporter
import Webscraper as wsc


teamID=str(3637)
teamName='Killamazoo'

In [7]:
def getstats(teamID,teamName):
    # First, get the lineups for each jam the team has stats available for.
    AllLineups = wsc.GetAllLineups(teamID, teamName)

    # Also, get the expanding average of score differentials for each jam. We'll use a player's
    # average score differential after a given jam as a proxy for their skill ranking as measured
    # after playing that jam.
    AllAvgs = wsc.ExpandingAverages(teamID, teamName)
    badjams,badblockers = wsc.GetBadJamsAndBlockers(teamID, teamName,20)

    return AllLineups,AllAvgs,badjams,badblockers
#print(badjams)
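
For readers who haven't met the term, here is a minimal sketch of an expanding average in plain Python, assuming a simple list of per-jam score differentials for one player. The helper name and the numbers are made up, and `wsc.ExpandingAverages` may well compute it differently.

```python
from itertools import accumulate


def expanding_average(differentials):
    """Running mean after each jam: element i averages differentials[0..i]."""
    running_totals = accumulate(differentials)
    return [total / count for count, total in enumerate(running_totals, start=1)]


# Made-up score differentials for one player over four jams.
diffs = [4, -2, 7, 3]
averages = expanding_average(diffs)
print(averages)  # [4.0, 1.0, 3.0, 3.0]
```

The jam-to-jam change in this running mean is what the graph code later uses as a per-jam "skill gain".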

Let's only look at blockers for now, since they interact most closely with each other. Matching jammers to blocker lines is a different question from composing the lines themselves, since the interplay between a jammer and a line differs from the interplay among blockers.

In [8]:
#print(AllLineups, AllAvgs)

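Next, let's build the short-term play network described in the paper. Each jam becomes a complete directed graph on its four blockers, with each edge weighted by the change in the source blocker's expanding average over that jam; the per-jam graphs are then summed into a single network.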
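
Before the real thing, here is a toy version of that construction in plain Python, with a dictionary of edges standing in for the networkx graph. The blocker names, the gains, and both helper functions are hypothetical.

```python
from itertools import permutations


def jam_edges(blockers, skill_gain):
    """Directed edges between every ordered pair of blockers in one jam.

    Each edge (u, v) carries u's skill gain for the jam, mirroring a
    complete directed graph on the four blockers.
    """
    return {(u, v): skill_gain[u] for u, v in permutations(blockers, 2)}


def sum_jams(jams):
    """Add up edge weights over a list of per-jam edge dictionaries."""
    total = {}
    for edges in jams:
        for pair, weight in edges.items():
            total[pair] = total.get(pair, 0.0) + weight
    return total


# Two made-up jams; blockers A, B and C appear in both.
jam1 = jam_edges(["A", "B", "C", "D"], {"A": 0.5, "B": -0.25, "C": 0.0, "D": 1.0})
jam2 = jam_edges(["A", "B", "C", "E"], {"A": 0.25, "B": 0.5, "C": -1.0, "E": 0.0})
network = sum_jams([jam1, jam2])

print(len(jam1))            # 12 directed edges per jam
print(network[("A", "B")])  # 0.75
```

Pairs that only share one jam (such as A and D) keep the weight from that single jam, while pairs that never share a jam (D and E) get no edge at all.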

In [9]:
def GetGraphs(teamID,teamName):
    
    AllLineups,AllAvgs,badjams,badblockers = getstats(teamID,teamName)
    blockerlines = AllLineups[['B1', 'B2', 'B3', 'B4']]
    #print(blockerlines)

    STjams=[]
    for jamnum in range(len((blockerlines.index))):

        if (jamnum in badjams): continue
        G = nx.complete_graph(4, nx.DiGraph())
        blockers = blockerlines.iloc[jamnum].to_list()
        mapping = dict(zip(G, blockers))
        G = nx.relabel_nodes(G, mapping)

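        for edge in G.edges():
            # Edge weight: change in the source blocker's expanding average over this jam.
            # Jam 0 has no previous row (iloc[-1] would wrap around to the last jam),
            # so the prior average is assumed to be 0 there.
            prev = AllAvgs.iloc[jamnum-1][edge[0]] if jamnum > 0 else 0
            weight = AllAvgs.iloc[jamnum][edge[0]] - prev
            #print(weight)
            G[edge[0]][edge[1]]['weight'] = weight
        # Append each jam's graph once, after all of its edges have been weighted.
        STjams.append(G)

    STGraph = nx.DiGraph()
    for jam in STjams:
        for edge in jam.edges():
            if STGraph.has_edge(*edge):
                weightsum = jam.get_edge_data(*edge)['weight'] + STGraph.get_edge_data(*edge)['weight'] 
                STGraph[edge[0]][edge[1]]['weight'] = weightsum
            else: 
                # First co-play for this pair: start from this jam's weight rather than 0,
                # so the first jam's contribution is not dropped.
                STGraph.add_edge(*edge[:2], weight=jam.get_edge_data(*edge)['weight'])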

    #Now get LTGraph.            
    #Get nodes and edges from the STGraph, remove weights
    LTGraph = STGraph.to_directed()

    for edge in LTGraph.edges():
        LTGraph[edge[0]][edge[1]]['weight'] = 0
        LTGraph[edge[0]][edge[1]]['jamssince'] = 0
        LTGraph[edge[0]][edge[1]]['totalcoplays'] = 0


    #Add a new edge feature: "jams since last co-play" that updates each jam, and use it to get the weights    

    for jamnum in range(len((blockerlines.index))):
        #get all edges in jam
        G = nx.complete_graph(4, nx.DiGraph())
        blockers = blockerlines.iloc[jamnum].to_list()
        mapping = dict(zip(G, blockers))
        G = nx.relabel_nodes(G, mapping)

        #get all possible combos
        for edge in LTGraph.edges():
            #zero if they play together in this jam, increment otherwise
            if edge in G.edges(): LTGraph[edge[0]][edge[1]]['jamssince'] = 0
            else: LTGraph[edge[0]][edge[1]]['jamssince'] += 1

        #get total number of co-play jams    
        for edge in G.edges():    
            if edge in LTGraph.edges(): LTGraph[edge[0]][edge[1]]['totalcoplays'] += 1
        
        if (jamnum in badjams): continue
        
        # Get all blockers in the jam, get all possible teammates
        for node in G:
            edges = LTGraph.out_edges(node)
            for edge in edges:
                # Weight by exp(-jams since last co-play): influence persists across jams but drops off with time.
                # Jam 0 has no previous row (iloc[-1] would wrap around), so assume a prior average of 0.
                prev = AllAvgs.iloc[jamnum-1][edge[0]] if jamnum > 0 else 0
                nomweight = AllAvgs.iloc[jamnum][edge[0]] - prev
                #print(LTGraph[edge[0]][edge[1]]['jamssince'])
                modifier = np.exp(-LTGraph[edge[0]][edge[1]]['jamssince'])
                LTGraph[edge[0]][edge[1]]['weight'] += nomweight*modifier
    
    return STGraph,LTGraph
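
To get a feel for how quickly the long-term weighting above forgets, here is a small sketch of the exp(-jams since last co-play) modifier on its own. The helper name and the nominal weight of 1.0 are just for illustration.

```python
import math


def decayed_weight(nominal_weight, jams_since_coplay):
    """Scale a skill gain by exp(-n), where n is the number of jams since the pair last co-played."""
    return nominal_weight * math.exp(-jams_since_coplay)


# How much of a nominal gain of 1.0 is credited to a pair after 0, 1, 2 and 3 jams apart.
for jams_since in range(4):
    print(jams_since, round(decayed_weight(1.0, jams_since), 3))
```

With this choice the influence falls to roughly 37% after one jam apart and to about 5% after three, so the long-term graph is still dominated by fairly recent co-plays.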

In [10]:
def PruneGraphs(STGraph,LTGraph):
   
    #print(len(STGraph))
    edges_to_prune=[]
    nodes_to_prune=[]
    
    #drop all edges with fewer than two co-plays    
    for edge in LTGraph.edges():
        thisedge = LTGraph.get_edge_data(*edge)
        #print(thisedge)
        if LTGraph[edge[0]][edge[1]]['totalcoplays'] < 2: 
            edges_to_prune.append(edge)
    
    for edge in edges_to_prune:
        STGraph.remove_edge(*edge)
        LTGraph.remove_edge(*edge)

    #get Largest Connected Component
    #if(nx.strongly_connected_components(STGraph) == []): 
    #    largestSTGraph = []
    #    largestLTGraph = [] 
    
   # else:
    largestSTGraph = max(nx.strongly_connected_components(STGraph), key=len)
    largestLTGraph = max(nx.strongly_connected_components(LTGraph), key=len)
    
    #print(STGraph)
    for node in LTGraph: 
        if node not in largestLTGraph: nodes_to_prune.append(node)
            #print(node)
            
    for node in nodes_to_prune:
        #print(node)
        STGraph.remove_node(node)
        LTGraph.remove_node(node)

    '''    
    #Finally, normalize the graphs so edge weights sum to 1.0
    LTtotweight = 0
    STtotweight = 0
    
    for edge in LTGraph.edges():
        LTtotweight += LTGraph[edge[0]][edge[1]]['weight'] 
        
    for edge in STGraph.edges():    
        STtotweight += STGraph[edge[0]][edge[1]]['weight'] 
        
    for edge in LTGraph.edges():
        LTGraph[edge[0]][edge[1]]['weight'] = LTGraph[edge[0]][edge[1]]['weight'] / LTtotweight  
        
    for edge in STGraph.edges():    
        STGraph[edge[0]][edge[1]]['weight'] = STGraph[edge[0]][edge[1]]['weight'] / STtotweight  
    
        
    #print(TGraph)
    '''
    return(STGraph,LTGraph)

In [None]:
#STGraph, LTGraph = GetGraphs(teamID,teamName)

In [None]:
#STpruned, LTpruned = PruneGraphs(STGraph, LTGraph)
#print(nx.is_strongly_connected(STpruned))
#nx.drawing.nx_pylab.draw_circular(STpruned)

In [11]:
def GetAndWritePrunedGraphs(teamID,teamName):
    STGraph, LTGraph = GetGraphs(teamID,teamName)
    
    try:
        STpruned, LTpruned = PruneGraphs(STGraph, LTGraph)

        #ST_relabel = nx.convert_node_labels_to_integers(STpruned)
        #LT_relabel = nx.convert_node_labels_to_integers(LTpruned)

        nx.write_weighted_edgelist(STpruned, "Data/STGraphs/"+teamID+"STGraph.edgelist", delimiter=",,")
        nx.write_weighted_edgelist(LTpruned, "Data/LTGraphs/"+teamID+"LTGraph.edgelist", delimiter=",,")
    
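    # max() over an empty set of components raises ValueError when nothing survives pruning.
    except ValueError: print("not enough data to get LCC!")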
    
    return

In [None]:
#GetAndWritePrunedGraphs(str(3637),'Killamazoo')

In [12]:
#Now make all STGraphs and LTGraphs

IDs, names = wsc.getAllTeamsAndNames()



  "In this series of notebooks, I will attempt to do some introductory exploration of various roller derby statistics. We will use the publicly available stats on the FlatTrackStats website. First, I will build a table scraper tool using the BeautifulSoup4 package to parse the stats tables on the website. If not already installed, you will need pandas and BeautifulSoup4 in order to run this notebook. "

In [14]:
print(IDs)
print(names)


['20988', '8003', '3636', '3422', '9248', '3420', '13529', '7876', '3433', '3399', '3424', '3418', '15073', '4737', '12607', '26170', '18437', '3404', '14228', '3464', '3397', '17350', '8143', '7521', '8059', '3402', '16730', '11127', '3419', '3437', '7928', '4740', '11444', '3414', '4742', '8142', '16731', '8141', '8087', '3463', '3642', '4744', '8140', '3395', '7870', '10187', '8052', '13834', '3427', '7511', '7813', '7696', '7608', '25344', '3432', '13840', '20989', '3400', '14613', '3444', '48273', '3625', '3431', '3471', '3644', '29611', '3392', '8044', '3435', '3430', '28777', '9085', '3406', '3457', '25351', '12621', '8138', '3465', '8731', '3396', '8095', '11351', '8086', '3466', '9086', '3411', '3640', '21447', '8137', '3626', '3426', '8047', '4036', '3413', '3412', '7825', '3643', '8073', '16733', '3456', '3447', '14233', '8127', '25640', '3421', '3646', '7244', '3627', '5916', '32345', '3647', '5917', '3467', '4292', '3637', '3407', '15020', '3470', '3639', '5918', '7745', '

In [None]:
for ID,name in zip(IDs,names):
    print(ID,name)
    GetAndWritePrunedGraphs(str(ID),name)


20988  A'Salt Creek
8003  Acadiana
3636  NEO
3422  Alamo City