#### Modelling the sequential FPL team selection process as a Belief-State Markov Decision Process
- For the $i$-th gameweek, we define the following terms:
    - $M_i$ is the set of matches in gameweek $i$.
    - $P_i$ is the set of players available for selection in gameweek $i$.
    - $A_i$ is the set of actions available in gameweek $i$, where $a \in A_i$ is a subset of $P_i$ and observes all team selection constraints.
    - $p_i \in P_i$ is associated with its FPL-designated position $pos(p_i)$ and price $pr(p_i)$.
    - $\tau_p \in \tau$ is a system of distributions representing player $p$'s performance/influence on matchplay.
    - $O_i$ is the set of match observations in gameweek $i$.
    - $o \in O_i$ includes both the results of the matches and the performance of the players in the selected team, e.g. goals, assists, clean sheets, yellow cards, red cards, bonus points. The probability of each $o \in O_i$ depends on the players' characteristics ($\tau$): for example, a team with strong attackers is more likely to score goals. We therefore write this probability as $Pr(o | \tau)$.
    - $R(o, a_{prev}, a_{curr})$ is the reward function, which returns the points scored by the selected team $a_{curr}$, given the match observations $o$. The previous team $a_{prev}$ is also provided so that the agent can be penalized for poor player transfers or for transfers beyond the allowed number.
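
A minimal sketch of how $R(o, a_{prev}, a_{curr})$ might look in code. The function name, the `points_by_player` mapping (standing in for the observation $o$), the single free transfer, and the 4-point deduction per extra transfer are assumptions made for illustration; captaincy, bench order and chips are ignored.

```python
from typing import Dict, FrozenSet

# Assumption: each transfer beyond the free allowance costs 4 points.
TRANSFER_PENALTY = 4


def reward(
    points_by_player: Dict[str, int],
    a_prev: FrozenSet[str],
    a_curr: FrozenSet[str],
    free_transfers: int = 1,
) -> int:
    """Points scored by a_curr, minus a penalty for excess transfers."""
    # Players with no recorded points (e.g. did not play) contribute zero.
    team_points = sum(points_by_player.get(p, 0) for p in a_curr)
    # A transfer is any player in the current team who was not in the previous one.
    transfers_made = len(a_curr - a_prev)
    excess = max(0, transfers_made - free_transfers)
    return team_points - TRANSFER_PENALTY * excess


# Hypothetical three-player "teams" to keep the example short.
observed_points = {"A": 6, "B": 2, "C": 10, "D": 1}
previous_team = frozenset({"A", "B", "C"})
one_transfer = reward(observed_points, previous_team, frozenset({"A", "C", "D"}))
two_transfers = reward(observed_points, previous_team, frozenset({"A", "D", "E"}))
```

Here `one_transfer` uses only the free transfer, while `two_transfers` incurs one penalty.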

Markov Decision Process (MDP):
- A state $S_i$ would encapsulate
    - $M_{i, \dots, 38}$ - the sets of fixtures for gameweek $i$ and all upcoming gameweeks
    - $P_i$ - set of players available for selection
    - $o \in O_{i - 1}$ - the outcome of the previous gameweek
    - $\tau$ - the system of distributions representing players' abilities
- The action set $A_i$ is the set of teams selectable in gameweek $i$; an action is one such team $a \in A_i$
- $R$ is the corresponding reward function

Belief model ($\tau$):
- Represent uncertainty over players' abilities and generate samples $\tau$ from the distribution $Pr(\tau | b)$.
- Three distributions are used to model the players' abilities:
    - $\rho_p$ - a three-state categorical distribution representing the player's probability of starting a match, coming on as a substitute, or not playing at all, i.e. (start, sub, unused).
    - $\omega_p$ - a Bernoulli distribution (a Binomial over a single trial) representing the probability of a player scoring a goal given he was playing at the time
    - $\psi_p$ - a Bernoulli distribution representing the probability of a player providing an assist given he was playing at the time
- Define prior distributions over the parameters of the above distributions and update them using the match observations $o$ to obtain new posterior distributions.
- Use simple closed-form equations e.g. Beta and Dirichlet conjugate priors to update the priors.
- Sample from these conjugate distributions to generate $\tau_p$.
- Define hyperparameters uniformly across all players, i.e. $$\omega_p \sim \text{Beta}(1, 1), \quad \psi_p \sim \text{Beta}(1, 1), \quad \rho_p \sim \text{Dirichlet}\left(\tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}\right)$$
- Potential to use performance data from previous seasons to define priors.
- Define 4 global multinomial distributions $S_{pos}$, one for each position, describing how many minutes a player in position $pos$ is likely to play in a match, given that he starts the match.
- Player absence via injury, suspension or any other reason is modelled by setting the probabilities of starting and of coming on as a substitute to zero, i.e. $Pr(\rho_p = start) = Pr(\rho_p = sub) = 0$.
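
The closed-form updates mentioned above can be sketched with NumPy. This is an illustrative sketch rather than the notebook's implementation: the helper names and the observation counts are hypothetical, and the prior hyperparameters are the uniform ones stated in the list.

```python
import numpy as np

rng = np.random.default_rng(0)

# Prior hyperparameters taken from the list above.
beta_prior = np.array([1.0, 1.0])               # Beta(1, 1) for omega_p
dirichlet_prior = np.array([0.25, 0.25, 0.25])  # Dirichlet(1/4, 1/4, 1/4) for rho_p


def update_beta(params, successes, failures):
    """Beta-Bernoulli conjugate update: add the observed counts to (alpha, beta)."""
    return params + np.array([successes, failures], dtype=float)


def update_dirichlet(params, counts):
    """Dirichlet-categorical conjugate update: add the observed category counts."""
    return params + np.asarray(counts, dtype=float)


# Hypothetical observations: the player scored 3 of the 10 goals his team
# scored while he was on the pitch, and his appearances were
# (start, sub, unused) = (8, 1, 1).
omega_posterior = update_beta(beta_prior, successes=3, failures=7)
rho_posterior = update_dirichlet(dirichlet_prior, counts=[8, 1, 1])

# One sample of the player's abilities tau_p from the posterior.
omega_sample = rng.beta(omega_posterior[0], omega_posterior[1])
rho_sample = rng.dirichlet(rho_posterior)
```

Because the priors are conjugate, no MCMC is needed for this step: the posterior has the same family as the prior, with the counts added to its parameters.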

#### Formulating the belief-state MDP
- The belief state at gameweek $i$, $b_i$, is an instantiation of our belief model, updated with the match observations $O_{i - 1}$.
- We obtain the posterior over player characteristics by updating the belief state in response to an observation $o \in O_i$ via Bayes' rule: $$Pr(\tau | b_{i + 1}) \propto Pr(o | \tau)Pr(\tau | b_i)$$
- The agent can perform optimally by maximizing the value of the Bellman equation: $$V(b_i) = \max_{a \in A_i} Q(b_i, a)$$
- The Q-function is defined as: $$Q(b_i, a) = \int_{\tau} Pr(\tau | b_i) \int_{o \in O_i} Pr(o | \tau) \left[r_i + \gamma V(b_{i + 1}) \right] \, do \, d\tau$$
- Where:
    - $\gamma \in [0, 1)$ is the discount factor for future rewards
    - $r_i = R(o, a_{prev}, a)$ is the reward function
    - $V(b_{i + 1})$ is the value of the next belief state
- Solving the Bellman equation exactly is intractable due to the size of the outcome space $|O_i|$, the size of the action space $|A_i|$, and the need to consider up to 38 gameweeks in order to calculate Q-values exactly.
- We can work around this by sampling from $O_i$ and simulating match outcomes to approximate the Q-function.
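
One way to write this sampling approximation, assuming $N$ independent samples (the sample count $N$ and the superscript $(n)$ are notation introduced here for illustration): draw $\tau^{(n)} \sim Pr(\tau | b_i)$, then $o^{(n)} \sim Pr(o | \tau^{(n)})$, and replace the double integral by a sample average, $$Q(b_i, a) \approx \frac{1}{N} \sum_{n = 1}^{N} \left[ R(o^{(n)}, a_{prev}, a) + \gamma V(b_{i + 1}^{(n)}) \right]$$ where $b_{i + 1}^{(n)}$ is the belief state obtained by updating $b_i$ with the sampled observation $o^{(n)}$ via Bayes' rule. The term $V(b_{i + 1}^{(n)})$ still requires looking ahead to later gameweeks, so in practice it would itself have to be approximated, for example by limiting the lookahead depth.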


#### Sampling Outcomes
- We describe a model for sampling outcomes for gameweek $i$ from $Pr(O_i | \tau)$. Combined with the belief model described above, this gives a joint distribution over player abilities and match outcomes, so that uncertainty in player abilities is treated in a Bayesian manner: $$Pr(O_i | \tau)Pr(\tau | b_i)$$
- The sampling procedure below is described for a single match from the perspective of the home team; it applies in the same way to the away team and to every other match in the gameweek:
    - Define $P_H$ and $P_A$ as the sets of players available to the home and away teams respectively.
    - Sample $\tau_p$ for each player $p \in P_H$  from the belief model $Pr(\tau_p | b_i)$
    - Randomly select eleven players from $P_H$ in proportion to their probability of starting the match i.e. $Pr(\rho_p = start)$
        - These players constitute the starting lineup $L_H$
    - The minute at which each player in $L_H$ leaves the pitch is sampled from the $S_{pos}$ distribution for that player's position.
    - Each player in $P_H$ who is not in $L_H$ is assigned to the set of substitutes $U_H$.
        - At the start of each minute of the match, we check whether any player in $L_H$ is scheduled to be substituted.
        - If so, we randomly select a replacement from $U_H$ in proportion to each substitute's probability of coming on, i.e. $Pr(\rho_p = sub)$.
        - The replacement is added to $L_H$ and removed from $U_H$. We further assume that a player involved in a substitution is not substituted again in the same match.
    - If a goal is scored according to the underlying team-based model, it is allocated to player $p$ with probability $Pr(\omega_p = 1)$, while an assist is allocated to player $p$ with probability $Pr(\psi_p = 1)$.
    - The resulting point estimates may then be used in combination with the MDP reward function $R$ to approximate the immediate reward from performing any action, as well as to guide the exploration of high-quality regions of the action space.
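
The lineup-selection and goal-allocation steps above can be sketched as follows. This is a simplified illustration, not the code in `gameweek_simulator`: the function names and the toy squad are hypothetical, substitutions and minutes are omitted, and normalizing the scoring probabilities over the players currently on the pitch is an assumption about how "allocated with probability $Pr(\omega_p = 1)$" is turned into a single draw.

```python
import numpy as np


def sample_lineup(start_probs, rng, size=11):
    """Pick `size` distinct starters in proportion to Pr(rho_p = start)."""
    players = list(start_probs)
    weights = np.array([start_probs[p] for p in players], dtype=float)
    # Sampling without replacement; players with weight 0 are never chosen.
    chosen = rng.choice(len(players), size=size, replace=False, p=weights / weights.sum())
    lineup = [players[i] for i in chosen]
    substitutes = [p for p in players if p not in lineup]
    return lineup, substitutes


def allocate_goal(on_pitch, score_probs, rng):
    """Allocate one goal among the players on the pitch in proportion to omega_p."""
    weights = np.array([score_probs[p] for p in on_pitch], dtype=float)
    return on_pitch[rng.choice(len(on_pitch), p=weights / weights.sum())]


rng = np.random.default_rng(42)

# Hypothetical squad: 14 fit players plus one absent player whose
# starting probability has been set to zero.
start_probs = {f"p{k}": 0.8 for k in range(14)}
start_probs["injured"] = 0.0
lineup, substitutes = sample_lineup(start_probs, rng)

# Hypothetical sampled scoring abilities, identical for every player.
score_probs = {p: 0.1 for p in start_probs}
scorer = allocate_goal(lineup, score_probs, rng)
```

Note that `sample_lineup` needs at least eleven players with a non-zero starting probability; otherwise NumPy raises a `ValueError`.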


In [None]:
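import os

import pandas as pd

import gameweek_simulator

GAMEWEEKS = 38

# Load the full season's fixtures and the per-position minutes distributions
fixtures_df = pd.read_csv(filepath_or_buffer="../data/2023-24/fixtures.csv")
position_minutes_file_path = os.path.join(gameweek_simulator.DATA_FOLDER, gameweek_simulator.POSITION_MINUTES_FILE)
position_minutes_df = pd.read_csv(filepath_or_buffer=position_minutes_file_path)

for gw_count in range(1, GAMEWEEKS + 1):
    # Filter into a separate variable so the full fixture list is not overwritten
    # (reassigning fixtures_df here would leave later gameweeks with no fixtures)
    gw_fixtures_df = fixtures_df[fixtures_df["GW"] == gw_count]
    player_points_df = gameweek_simulator.simulate_gameweek(
        season_start_year="2023",
        gameweek=str(gw_count),
        fixtures_df=gw_fixtures_df,
        position_minutes_df=position_minutes_df,
    ).sort_values(by="points")
    # Note: a bare expression inside a loop is not rendered by the notebook;
    # wrap it in print() or display() to see the rows
    player_points_df.head()
    break  # Only the first gameweek is simulated for now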



Optimization terminated successfully    (Exit mode 0)
            Current function value: 1098.770123460026
            Iterations: 50
            Function evaluations: 2229
            Gradient evaluations: 50


Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [start_sub_unused_dirichlet_dist, score_beta, assist_beta]

Sampling 4 chains for 500 tune and 5_000 draw iterations (2_000 + 20_000 draws total) took 32 seconds.
There were 15847 divergences after tuning. Increase `target_accept` or reparameterize.

The same initialization header and sampling configuration repeat for each of the following runs; only the run time and the warnings differ. Where reported, the divergence warning has the same wording as above, and the other two warnings read:
- rhat warning: The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
- ESS warning: The effective sample size per chain is smaller than 100 for some parameters.  A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details

- Run 2: 9 seconds.
- Run 3: 8 seconds.
- Run 4: 34 seconds. 15158 divergences after tuning.
- Run 5: 39 seconds. 15123 divergences after tuning.
- Run 6: 60 seconds. 15206 divergences after tuning.
- Run 7: 4 seconds.
- Run 8: 7 seconds.
- Run 9: 27 seconds. 16898 divergences after tuning. ESS warning.
- Run 10: 7 seconds.
- Run 11: 8 seconds.
- Run 12: 36 seconds. 16423 divergences after tuning.
- Run 13: 63 seconds. 14678 divergences after tuning. rhat warning. ESS warning.
- Run 14: 6 seconds.
- Run 15: 11 seconds.
- Run 16: 24 seconds. 15849 divergences after tuning.
- Run 17: 42 seconds. 15194 divergences after tuning. rhat warning. ESS warning.
- Run 18: 5 seconds.
- Run 19: 8 seconds.
- Run 20: 44 seconds. 15492 divergences after tuning. rhat warning. ESS warning.
- Run 21: 9 seconds.
- Run 22: 47 seconds. 14688 divergences after tuning.
- Run 23: 24 seconds. 15873 divergences after tuning.
- Run 24: 8 seconds.
- Run 25: 38 seconds. 14497 divergences after tuning.
- Run 26: 30 seconds. 17408 divergences after tuning.
- Run 27: 10 seconds.
- Run 28: 12 seconds.
- Run 29: 8 seconds.
- Run 30: 33 seconds. 16566 divergences after tuning.
- Run 31: 5 seconds.
- Run 32: 25 seconds. 14051 divergences after tuning.

Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [start_sub_unused_dirichlet_dist, score_beta, assist_beta]


Output()