In [None]:
---
title: "Determining optimal shot selection strategy in Super Netball's Power 5 period via numerical simulation"
format:
    html:
        code-fold: true
jupyter: python3
---

In [17]:
#Load the necessary libraries to run analysis
import pandas as pd
import numpy as np
import scipy.stats as stats
import random
import matplotlib.pyplot as plt

# Abstract

TODO: insert abstract...

# Introduction

Netball is a court-based team sport played predominantly among Commonwealth nations, and has one of the highest participation rates for team sports in Australia [@AustralianSportsCommission2020]. As in many court-based team sports, the goal of netball is to score more than the opposition. Netball is, however, unique in that goals may only be scored by two players on each team from within the 'shooting circle' (i.e. a half circle around the goal with a 4.9m radius) at their end of the court [@INFrules]. Traditionally, each successful shot from within this circle is worth one goal [@INFrules]. In the 2020 season, Australia's national elite-level league (i.e. Suncorp Super Netball) introduced the 'Super Shot' [@NetballAusSuperShotIntro]. Under this rule, successful shots from the 'inner' (i.e. 0m-3.0m) and 'outer' (i.e. 3.0m-4.9m) circles are worth one and two goals, respectively, within the final five minutes of each quarter (i.e. the Power 5 period) [@NetballAusSuperShotIntro]. The rule has remained in place in every season since its introduction in 2020.

Our analysis prior to the 2020 season [@Fox2020] suggested that the added value of the Super Shot (i.e. two goals) aligned well with the elevated risk of shooting from long range, and that teams may have been able to maximise their scoring by taking a high proportion of Super Shots. These findings were, however, based on shooting statistics from a past season where the Super Shot rule was not in effect. Further investigation of netball competitions where a 'two-goal rule' was in place (i.e. international Fast5) revealed a much higher risk of missing long-range shots [@Fox2020]. We hypothesised that the elevated risk of missing long-range shots with a 'two-goal rule' in place stems from situational factors, whereby defensive strategies were likely altered to place a heavier emphasis on defending long-range shots [@Fox2020]. Data from the early years of the Super Shot provide an opportunity to re-evaluate the risk:reward value of taking Super Shots with more valid shooting statistics. Further, these data can provide a better foundation for simulating Super Shot periods as a means to identify optimal shooting strategies. In the present study, we first re-visited our analysis of whether the 2:1 goal weighting is appropriate based on the relative risk of missing a shot from the outer versus inner circle during the Power 5 period using data from the 2020-2022 seasons. Second, we ran simulations of the Power 5 period using shooting statistics from the 2020-2022 seasons for each team individually in an attempt to identify optimal team-specific shooting strategies for the proportion of Super Shots to take. Third, we ran simulations of teams competing against one another during a Power 5 period to determine how varying the proportion of Super Shots could impact scoring margin.

# Methods

## Participants

Participants for this study included all players across the eight teams from the 2020-2022 seasons of the Australian national netball league (i.e. Suncorp Super Netball). Our study used publicly available, pre-existing data held on the Suncorp Super Netball match centre. An exemption from ethics review was granted by the Deakin University Human Research Ethics Committee (***TODO: add details***).

## Data Collection

We used the ***{SuperNetballR}*** (***TODO: citation***) package to extract match data from all regular season games during the 2020-2022 Super Netball seasons via the Champion Data (official provider of competition statistics) match centre. Within the match centre data, all shots are labelled with identifiers that place them in the inner or outer circle, along with whether they were made or missed. Combining these labels with the timestamp of each event within a quarter, we extracted team-specific shooting statistics from each Power 5 period across matches for: (i) the total number of shots taken; (ii) the number of shots taken from the inner and outer circle; and (iii) the number of made and missed shots from the inner and outer circle.
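The extraction step described above can be sketched as follows. This is a minimal illustration on made-up rows rather than the study's pipeline: the `periodSeconds` column, the 15-minute quarter length used to locate the Power 5 window, and the team labels are assumptions, while the `scoreName` labels and the `squadId`/`shotCount` column names mirror those used in the analysis code.

```python
import pandas as pd

# Hypothetical score-flow rows (made-up values, not study data)
# Assumption: 'periodSeconds' holds seconds elapsed within the quarter
scoreFlow = pd.DataFrame({
    'squadId': ['teamA', 'teamA', 'teamA', 'teamB', 'teamB', 'teamB'],
    'periodSeconds': [120, 650, 700, 615, 810, 890],
    'scoreName': ['goal', '2pt Goal', 'miss', '2pt Miss', 'goal', '2pt Goal'],
})

# Assumption: 15-minute quarters, so the Power 5 period starts at 10 minutes
power5Start = 10 * 60

# Keep only shots taken during the Power 5 period
power5Shots = scoreFlow[scoreFlow['periodSeconds'] >= power5Start]

# Count shots per team and shot outcome
shotCounts = (
    power5Shots.groupby(['squadId', 'scoreName'])
    .size()
    .rename('shotCount')
    .reset_index()
)

# Total Power 5 shots per team
totalShots = shotCounts.groupby('squadId')['shotCount'].sum()
```

The resulting `shotCounts` table has one row per team and shot outcome, which is the shape the later analysis code expects.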

## Data Analysis

Our study required estimating the probability of making versus missing shots from the inner versus outer circle across the different teams. We achieved this by defining, for each circle zone, a beta distribution with the probability density function:

$$f(x,a,b) = \frac{\Gamma(a+b)x^{a-1}(1-x)^{b-1}}{\Gamma(a)\Gamma(b)} $$

where $a$ and $b$ represent the number of missed and made shots within a circle zone, respectively; $x \in [0, 1]$ is the probability of the outcome counted by $a$ (here, a missed shot); and $\Gamma$ is the gamma function [@Virtanen2020; @NIST2020]. Swapping $a$ and $b$ gives the corresponding distribution for the probability of a made shot. Probability density functions were created for made versus missed shots in the inner and outer circles for each team, as well as all teams combined, to be used in subsequent analyses.
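As a quick check of the density above, the following sketch evaluates the formula directly and compares it with `scipy.stats.beta`. The shot counts are hypothetical and chosen only for illustration; the log-gamma form is used because $\Gamma$ overflows for counts of this size.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

# Hypothetical shot counts for one circle zone (not taken from the study data)
missed, made = 40, 160

# Beta distribution over the probability of missing (a = missed, b = made)
betaMiss = stats.beta(missed, made)

def betaPdf(x, a, b):
    """Evaluate the beta density from the formula, in log space to avoid overflow."""
    logNorm = gammaln(a + b) - gammaln(a) - gammaln(b)
    return np.exp(logNorm + (a - 1) * np.log(x) + (b - 1) * np.log(1 - x))

# Compare the manual formula with scipy over a grid of miss probabilities
x = np.linspace(0.05, 0.40, 8)
manualDensity = betaPdf(x, missed, made)
scipyDensity = betaMiss.pdf(x)

# The distribution mean is a / (a + b), i.e. the observed miss rate
meanMiss = betaMiss.mean()
```

With these counts the mean of the distribution equals the observed miss rate, $a / (a + b) = 0.2$, and the spread narrows as more shots are observed.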

In [4]:
#Load in the team shooting data files
#Forward slashes keep the relative paths portable across operating systems
scoreFlowData = pd.read_csv('../Data/scoreFlowSuper.csv')
teamSuperCounts = pd.read_csv('../Data/teamSuperCounts.csv')
oppSuperCounts = pd.read_csv('../Data/oppSuperCounts.csv')

To examine the relative value of the 2:1 point ratio, we replicated the approach from our previous work [@Fox2020], but this time with data from the 2020-2022 seasons. Specifically, we compared the average relative odds (± 95% confidence intervals [CI]) of missing from the outer versus inner circle during the Power 5 period. This was achieved by dividing randomly sampled values (*n* = 100,000) from the probability density functions of the outer by those from the inner circle at each sample iteration [@Fox2020]. This analysis was run using shooting statistics from the entire league as well as individual teams to give overall and team-specific risk:reward values for attempting Super Shots. We also applied this analysis to opposition shooting statistics against each team, providing a risk:reward value for attempting Super Shots against an opponent. Theoretically, the relative odds of missing from the outer to inner circle should match the ratio of points awarded (i.e. 2:1) for the Super Shot to represent 'good value.'
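The sampling step can be illustrated with a small self-contained sketch. The miss and goal counts below are hypothetical, and `np.random.default_rng` is used here for convenience; the interval is taken as the 2.5th and 97.5th percentiles of the sampled ratios, which is one common way to summarise a central 95% interval.

```python
import numpy as np

rng = np.random.default_rng(12345)
nTrials = 100_000

# Hypothetical league-wide counts (not taken from the study data)
innerMiss, innerGoal = 100, 900
outerMiss, outerGoal = 200, 300

# Sample miss probabilities for each circle zone
valsInner = rng.beta(innerMiss, innerGoal, size=nTrials)
valsOuter = rng.beta(outerMiss, outerGoal, size=nTrials)

# Relative odds of missing from the outer versus inner circle, per sample
sampleRatios = valsOuter / valsInner

# Summarise with the mean and a central 95% interval
meanRatio = sampleRatios.mean()
lower95, upper95 = np.percentile(sampleRatios, [2.5, 97.5])
```

Under this reading, a ratio near 2 would suggest the two-goal reward roughly offsets the added risk, whereas a ratio well above 2 (as in these made-up counts) would suggest the Super Shot is poor value.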

In [24]:
####### Run analysis in .py script and just present here...

#Set a check in place whether to run this analysis or load data in from previous run
runRelOdds = True ####change to false to load in existing data

#Set number of trials to run
nRelOddsTrials = 100000

#Run analysis
if runRelOdds:
    
    #Set a dictionary to store data in
    relOddsData = {'squadId': [], 'mean': [], 'lower95': [], 'upper95': []}
    relOddsDefData = {'squadId': [], 'mean': [], 'lower95': [], 'upper95': []}
    
    #Run through first iteration using all teams data
    
    #Set seed
    np.random.seed(12345)
    
    #Group by shot type to create sums
    shotSums = teamSuperCounts.groupby('scoreName').sum()
    
    #Create the beta distributions for inner and outercircle
    betaInner = stats.beta(shotSums['shotCount']['miss'], shotSums['shotCount']['goal'])
    betaOuter = stats.beta(shotSums['shotCount']['2pt Miss'], shotSums['shotCount']['2pt Goal'])
    
    #Sample values from beta distributions
    valsInner = np.random.beta(shotSums['shotCount']['miss'], shotSums['shotCount']['goal'],
                              size = nRelOddsTrials)
    valsOuter = np.random.beta(shotSums['shotCount']['2pt Miss'], shotSums['shotCount']['2pt Goal'],
                              size = nRelOddsTrials)
    
    #Determine relative odds of missing from outer to inner circle
    sampleRatios = valsOuter / valsInner
    
    #Create values for empirical cumulative distribution function
    cdfSplit_x = np.sort(sampleRatios)
    cdfSplit_y = np.arange(1, nRelOddsTrials+1) / nRelOddsTrials
    
    #Calculate the 95% interval from the empirical cumulative distribution function
    #Find the first index where the CDF y-values reach 0.025/0.975 (the central 95%),
    #and grab that index of the x-values. searchsorted avoids exact float matching
    lower95ind = np.searchsorted(cdfSplit_y, 0.025)
    upper95ind = np.searchsorted(cdfSplit_y, 0.975)
    ci95_lower = cdfSplit_x[lower95ind]
    ci95_upper = cdfSplit_x[upper95ind]
    
    #Store values in dictionary
    relOddsData['squadId'].append('all')
    relOddsData['mean'].append(sampleRatios.mean())
    relOddsData['lower95'].append(ci95_lower)
    relOddsData['upper95'].append(ci95_upper)
    
    #Repeat analysis looping through teams
    squad = 'Fever'
    shotSums = teamSuperCounts.groupby(['squadId','scoreName']).sum()
    
    shotSums['shotCount']['Fever']['goal']

    

783

***TODO: consider same simulation approach but with 'tendency' as a parameter - a more random indication of whether the shot will be taken as a standard or Super (no to all out tendency being 0%, 25%, 50%, 75% and 100% likelihoods) - this becomes a categorical parameter over a continuous parameter of shot proportion. Normalise the data from these simulations to no Super Shots, and investigate the goal benefit (or detriment) of changing Super Shot tendency relative to this.***

Next we ran a series of simulations (*n* = 1,000 each) of the Power 5 period for each team, altering the proportion of Super Shots taken from 0% to 100% at 10% increments. We used the previously calculated probabilities of making versus missing shots from the inner and outer circles during the Power 5 period for each team to estimate the number of goals the team may score during this period. Both the total number of shots and the proportion of these that were Super Shots varied across simulations. We calculated the mean and standard deviation of the number of shots a team would expect during a Power 5 period from overall season statistics, and the total number of shots for a team in a given simulation was randomly sampled from a truncated normal distribution defined by this mean and standard deviation and bounded by the lower and upper 95% CI limits. The number of standard versus Super Shots taken within the simulation was then determined by the Super Shot proportion (i.e. from 0% to 100%) being examined. The success (i.e. make vs. miss) of each individual standard or Super Shot was then determined by generating a random value between 0 and 1, alongside a value sampled from the team's probability density function of making a shot from the relevant location (i.e. inner or outer circle). If the value sampled from the probability density function was greater than the random value, the shot was considered successful; otherwise, it was considered unsuccessful. After all individual shots were simulated, the total team score was summed given the value of the made standard and Super Shots. We used linear regression models separately for each team to understand the effect of the total number of shots and the proportion of these that were Super Shots (i.e. independent variables) on total score (i.e. dependent variable).
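The single-team simulation described above might be sketched as follows. All inputs are hypothetical, the 95% limits are assumed to be the mean ± 1.96 standard deviations, and the regression is fitted by ordinary least squares with `np.linalg.lstsq` purely for illustration; none of this should be read as the exact implementation used in the study.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)

# Hypothetical inputs for one team (not taken from the study data)
shotsMean, shotsSd = 10.0, 2.0     # shots per Power 5 period
innerGoal, innerMiss = 900, 100    # made/missed counts, inner circle
outerGoal, outerMiss = 300, 200    # made/missed counts, outer circle

# Assumption: the 95% limits are the mean +/- 1.96 standard deviations
shotDist = stats.truncnorm(-1.96, 1.96, loc=shotsMean, scale=shotsSd)

def simulatePower5(superProp, nSims=1000):
    """Simulate nSims Power 5 periods at a fixed Super Shot proportion."""
    nShots = np.rint(shotDist.rvs(nSims, random_state=rng)).astype(int)
    scores = np.zeros(nSims, dtype=int)
    for i, n in enumerate(nShots):
        nSuper = int(round(n * superProp))
        nStandard = n - nSuper
        # A shot is made when the sampled make probability exceeds a uniform draw
        madeStandard = np.sum(rng.beta(innerGoal, innerMiss, nStandard) > rng.random(nStandard))
        madeSuper = np.sum(rng.beta(outerGoal, outerMiss, nSuper) > rng.random(nSuper))
        # Standard shots are worth one goal, Super Shots two
        scores[i] = madeStandard + 2 * madeSuper
    return nShots, scores

# Sweep the Super Shot proportion from 0% to 100% at 10% increments
superProps = np.linspace(0, 1, 11)
rows = []
for prop in superProps:
    nShots, scores = simulatePower5(prop)
    rows.append(np.column_stack([nShots, np.full(nShots.size, prop), scores]))
simData = np.vstack(rows)

# Ordinary least squares: score ~ intercept + total shots + Super Shot proportion
design = np.column_stack([np.ones(len(simData)), simData[:, 0], simData[:, 1]])
coefs, *_ = np.linalg.lstsq(design, simData[:, 2], rcond=None)
```

Here `coefs[1]` and `coefs[2]` estimate the change in score per additional shot and per unit change in Super Shot proportion, respectively, for this made-up team.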

***TODO: take similar approach to above and include a 'tendency' variable - figure out a way to test the effectiveness of tendency against other approaches***

A similar approach was taken in generating 'competitive' simulations of Power 5 periods. A series of simulations (*n* = 1,000) of Power 5 periods were run between all combinations of teams. We once again used the probabilities of making versus missing shots from the inner and outer circles during the Power 5 period for each team to estimate scoring. To isolate the effect of Super Shot proportion, we matched the number of shots each team took in the simulations. We used the same truncated normal distribution mentioned in the previous section to select the matched number of shots each team would receive in the simulation. Each series of 1,000 simulations was repeatedly run between all combinations of teams while altering the proportion of Super versus standard shots. For brevity in these simulations, the proportion of Super Shots taken by each team was altered from 0% to 100% at 25% increments, with every possible combination between teams simulated. Shot success was determined in the same manner as previously outlined (i.e. random number generator vs. value sampled from the team's shot success probability distribution). After the shots from both teams were simulated, each team's score was summed given the value of made standard and Super Shots, and the subsequent margin was determined. We used a single linear regression model to understand the effect of the relative proportion of Super Shots taken between teams (i.e. independent variable) on margin (i.e. dependent variable).
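A minimal sketch of one such head-to-head pairing is given below. The two teams and their shot counts are hypothetical, the number of simulations per combination is reduced to keep the example quick, and the 'relative proportion' is assumed to be the difference between the two teams' Super Shot proportions, which is only one possible reading of that term.

```python
import itertools
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)

# Hypothetical (made, missed) counts per team and circle zone (not study data)
teams = {
    'teamA': {'inner': (900, 100), 'outer': (300, 200)},
    'teamB': {'inner': (850, 150), 'outer': (250, 250)},
}

# Hypothetical shot volume; limits assumed to be mean +/- 1.96 standard deviations
shotDist = stats.truncnorm(-1.96, 1.96, loc=10.0, scale=2.0)

def simulateScore(team, nShots, superProp):
    """Score for one team given a shot count and Super Shot proportion."""
    nSuper = int(round(nShots * superProp))
    nStandard = nShots - nSuper
    score = 0
    for zone, n, value in (('inner', nStandard, 1), ('outer', nSuper, 2)):
        made, missed = teams[team][zone]
        # A shot is made when the sampled make probability exceeds a uniform draw
        score += value * int(np.sum(rng.beta(made, missed, n) > rng.random(n)))
    return score

superProps = [0.0, 0.25, 0.5, 0.75, 1.0]
nSims = 200  # reduced from 1,000 to keep this sketch quick

relProps, margins = [], []
for propA, propB in itertools.product(superProps, repeat=2):
    # Both teams receive the same (matched) number of shots in each simulation
    matchedShots = np.rint(shotDist.rvs(nSims, random_state=rng)).astype(int)
    for nShots in matchedShots:
        scoreA = simulateScore('teamA', nShots, propA)
        scoreB = simulateScore('teamB', nShots, propB)
        relProps.append(propA - propB)
        margins.append(scoreA - scoreB)

# Single linear regression of margin on relative Super Shot proportion
slope, intercept = np.polyfit(relProps, margins, 1)
```

In this sketch a positive `slope` would indicate that taking a higher share of Super Shots than the opponent tends to widen the margin in favour of `teamA`, given the made-up shooting statistics.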