# 3. Probabilistic Ranking

## 3.1. Introduction to Ranking

* **Goal**

>*What is the probability that player 1 defeats player 2?*


* **Generative Model for Game Outcomes**

>* Players have **skills** $w_1, w_2$ $\rightarrow$ compute the skill difference

>$$s = w_1 - w_2$$

>* **Add noise**

>$$t = s + n \;\;\;\text{where}\;\;\; n \sim \mathcal{N}(0,1)$$

>* **Compute game outcome**

>$$y=\text{sign}(t)= \begin{cases} +1 & \rightarrow \text{Player 1 wins}\\ -1 & \rightarrow \text{Player 2 wins}\end{cases}$$

>* **Probability that player 1 wins**

>$$p(t|w_1,w_2) = \mathcal{N}(t;w_1-w_2,1)$$

>$$p(y=1|w_1,w_2)=p(t>0|w_1,w_2)=\Phi(w_1-w_2)$$

>$$\Phi(x)=\int^x_{-\infty} \mathcal{N}(z;0,1)dz = \int^\infty_0 \mathcal{N}(z;x,1)dz$$

>* **Likelihood**

>$$p(y|w_1,w_2) = \Phi(y(w_1-w_2))$$
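
The likelihood above can be evaluated with nothing more than the error function. The following is a minimal sketch; the helper names `std_normal_cdf` and `win_likelihood` are illustrative and not part of the notes.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Phi(x): CDF of the standard normal, written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def win_likelihood(y: int, w1: float, w2: float) -> float:
    """p(y | w1, w2) = Phi(y * (w1 - w2)) for an outcome y in {+1, -1}."""
    return std_normal_cdf(y * (w1 - w2))


# Player 1 is one skill unit stronger than player 2.
p_win = win_likelihood(+1, 1.0, 0.0)
p_lose = win_likelihood(-1, 1.0, 0.0)
print(p_win, p_lose)
```

The two outcome probabilities sum to one because $\Phi(-x) = 1 - \Phi(x)$.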

* **TrueSkill: a probabilistic skill rating system**

>* **Prior**

>$$p(w_i) = \mathcal{N}(w_i|\mu_i,\sigma^2_i)$$

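>* **Likelihood**
>  * $p(s|w_1,w_2) = \delta(s-(w_1-w_2))$: delta function
>  * $p(t|s) = \mathcal{N}(t;s,1)$: Gaussian noise
>  * $p(y|t) = \delta(y-\text{sign}(t))$: step function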

>\begin{align}
p(y|w_1,w_2) &= \iint p(y|t)p(t|s)p(s|w_1,w_2)dsdt \\
&= \Phi (y(w_1-w_2))
\end{align}



>* **Posterior:** no longer Gaussian / does not factorise / looks like a high-dim ball

>\begin{align}
p(w_1,w_2|y) &= \frac{p(w_1)p(w_2)p(y|w_1,w_2)}{\iint p(w_1)p(w_2)p(y|w_1,w_2)dw_1 dw_2} \\
&= \frac{\mathcal{N}(w_1;\mu_1,\sigma_1^2)\mathcal{N}(w_2;\mu_2,\sigma_2^2) \Phi(y(w_1-w_2))}{\iint \mathcal{N}(w_1;\mu_1,\sigma_1^2)\mathcal{N}(w_2;\mu_2,\sigma_2^2)\Phi(y(w_1-w_2)) dw_1dw_2}
\end{align}

>* **Normalising constant:** closed form

>$$p(y) = \Phi \left( \frac{y(\mu_1-\mu_2)}{\sqrt{1+\sigma^2_1+\sigma^2_2}} \right) \;\;\;\rightarrow\;\;\; \text{smoother version of the likelihood}$$
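
As a sanity check on the closed-form normalising constant, one can simulate the generative model (skills from the prior, unit-variance noise, sign of the performance difference) and count outcomes. This is a rough sketch under the assumptions of this section; the function names and the example prior parameters are made up for illustration.

```python
import math
import random


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def evidence_closed_form(y, mu1, var1, mu2, var2):
    """p(y) = Phi(y (mu1 - mu2) / sqrt(1 + var1 + var2))."""
    return std_normal_cdf(y * (mu1 - mu2) / math.sqrt(1.0 + var1 + var2))


def evidence_monte_carlo(y, mu1, var1, mu2, var2, n_samples=100_000, seed=0):
    """Estimate p(y) by sampling w1, w2, n and counting matching outcomes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        w1 = rng.gauss(mu1, math.sqrt(var1))  # skill of player 1 from its prior
        w2 = rng.gauss(mu2, math.sqrt(var2))  # skill of player 2 from its prior
        t = (w1 - w2) + rng.gauss(0.0, 1.0)   # noisy performance difference
        outcome = 1 if t > 0 else -1
        hits += outcome == y
    return hits / n_samples


exact = evidence_closed_form(+1, 1.0, 0.5, 0.0, 2.0)
approx = evidence_monte_carlo(+1, 1.0, 0.5, 0.0, 2.0)
print(exact, approx)
```

The two numbers should agree up to Monte Carlo noise, which shrinks as `n_samples` grows.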

## 3.2. Gibbs Sampling

* **Q. How do we integrate wrt an intractable posterior?**
* **The original integral**

>$$\mathbb{E}_{p(\mathbf{x})}[\phi(\mathbf{x})] = \bar{\phi} = \int \phi(\mathbf{x}) p(\mathbf{x}) \text{d}\mathbf{x} \;\;\; , \;\;\; \mathbf{x} \in \mathbb{R}^D$$

* **Numerical integration on a grid** (practical only up to $D \leq 4$)

>$$\int{\phi(\mathbf{x})p(\mathbf{x})\text{d}\mathbf{x}} \approx \sum^T_{\tau=1} \phi(\mathbf{x}^{(\tau)}) p(\mathbf{x}^{(\tau)}) \Delta \mathbf{x}$$

* **Monte Carlo**

>$$\mathbb{E}_{p(\mathbf{x})} [\phi(\mathbf{x})] \approx \hat{\phi} = \frac{1}{T} \sum^T_{\tau=1} \phi(\mathbf{x}^{(\tau)}) \;\;\;,\;\;\; \mathbf{x}^{(\tau)} \sim p(\mathbf{x})$$

>* $\hat{\phi}$: unbiased estimate with

>$$\mathbb{V}[\hat{\phi}] = \frac{\mathbb{V}[\phi]}{T} \;\;\;,\;\;\; \mathbb{V}[\phi] = \int \left( \phi(\mathbf{x})-\bar{\phi} \right)^2 p(\mathbf{x}) \text{d} \mathbf{x}$$

>* **NOTE:** the variance is independent of the dimension of $\mathbf{x}$
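
The $\mathbb{V}[\phi]/T$ behaviour can be checked empirically. The sketch below assumes $p(\mathbf{x}) = \mathcal{N}(\mathbf{0}, I)$ and the test function $\phi(\mathbf{x}) = x_1^2$ (both chosen only for illustration), so that $\bar{\phi} = 1$ and $\mathbb{V}[\phi] = 2$ regardless of $D$. It repeats the estimator many times and measures the spread of the estimates.

```python
import random
import statistics


def phi(x):
    """Illustrative test function: depends only on the first coordinate."""
    return x[0] ** 2


def mc_estimate(dim, n_samples, rng):
    """Average phi over n_samples independent draws x ~ N(0, I_dim)."""
    total = 0.0
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        total += phi(x)
    return total / n_samples


def estimator_variance(dim, n_samples=200, n_repeats=300, seed=0):
    """Empirical variance of the Monte Carlo estimate over repeated runs."""
    rng = random.Random(seed)
    estimates = [mc_estimate(dim, n_samples, rng) for _ in range(n_repeats)]
    return statistics.variance(estimates)


T = 200
var_d1 = estimator_variance(dim=1, n_samples=T)
var_d10 = estimator_variance(dim=10, n_samples=T)
print(var_d1, var_d10, 2.0 / T)  # both should be close to V[phi] / T
```

Because this particular $\phi$ has the same variance in every dimension, both empirical variances should be near $2/T$; a different $\phi$ could of course have a $\mathbb{V}[\phi]$ that itself changes with $D$.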

* **Markov Chain Monte Carlo $q(x'|x)$**

>$$\mathbf{x} \rightarrow \mathbf{x}' \rightarrow \mathbf{x}'' \rightarrow \mathbf{x}''' \rightarrow \cdots$$

>* This will eventually generate dependent samples from $p(\mathbf{x})$

* **Gibbs Sampling - Definition**

>$$x'_i \sim p(x_i|x_1,...,x_{i-1},x_{i+1},...,x_D)$$

>* For $x_i$, sample a new value from the conditional distribution of $x_i$ given all other variables

* **Gibbs Sampling - Advantages**

>* Parameter free algorithm, applicable if we know how to sample from the conditional distributions
>* It can be shown that this will eventually generate dependent samples from $p(\mathbf{x})$
>* Sampling from a joint distribution $\rightarrow$ sampling from a sequence of univariate conditional distributions

* **Gibbs Sampling - Disadvantages** 

>* Correlation between consecutive samples 
>  * $\rightarrow$ Samples are thinned (e.g. every 10th or 100th)
>* Initial convergence
>  * $\rightarrow$ Initial samples are discarded (**what is convergence?**)
>* Dependence on starting point
>  * $\rightarrow$ Run several Gibbs samplers & compare
>* Challenging to judge the ***effective correlation length*** of a Gibbs sampler
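
To make the definition concrete, here is a small sketch of a Gibbs sampler for a toy target that is not from the notes: a zero-mean bivariate Gaussian with unit variances and correlation $\rho$, whose conditionals are $x_1 | x_2 \sim \mathcal{N}(\rho x_2, 1-\rho^2)$ and symmetrically for $x_2$. The `burn_in` and `thin` arguments correspond to discarding initial samples and thinning, as listed above.

```python
import math
import random


def gibbs_bivariate_gaussian(rho, n_samples, burn_in=100, thin=1, seed=0):
    """Gibbs sampling for a standard bivariate Gaussian with correlation rho."""
    rng = random.Random(seed)
    cond_std = math.sqrt(1.0 - rho ** 2)
    x1, x2 = 0.0, 0.0  # arbitrary starting point
    samples = []
    for it in range(burn_in + n_samples * thin):
        x1 = rng.gauss(rho * x2, cond_std)  # x1 ~ p(x1 | x2)
        x2 = rng.gauss(rho * x1, cond_std)  # x2 ~ p(x2 | x1)
        if it >= burn_in and (it - burn_in) % thin == 0:
            samples.append((x1, x2))
    return samples


samples = gibbs_bivariate_gaussian(rho=0.8, n_samples=20_000, thin=5)
# E[x1 * x2] equals rho for this target, so this should be close to 0.8.
corr_estimate = sum(a * b for a, b in samples) / len(samples)
print(corr_estimate)
```

The closer $|\rho|$ is to 1, the smaller each conditional step becomes, which is one way to see why consecutive Gibbs samples can be strongly correlated.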

## 3.3. Gibbs Sampling in TrueSkill

* **Notation**

>* Game ID: $g=1,...,G$
>* Player ID: $I_g$ and $J_g$
>* Outcome: $y_g=+1$ if $I_g$ wins & $y_g=-1$ if $J_g$ wins

* **Algorithm** (repeat 2&3)

>* **Step 1:** Initialise $\mathbf{w}$, e.g. from the prior $p(\mathbf{w})$

>* **Step 2:** Sample the performance differences from their conditional posteriors

>$$p(t_g|w_{I_g},w_{J_g},y_g) \propto \delta (y_g-\text{sign}(t_g))\mathcal{N}(t_g;w_{I_g}-w_{J_g},1)$$

>* **Step 3:** Jointly sample the skills from the conditional posterior

>$$p(\mathbf{w}|\mathbf{t},\mathbf{y}) = p(\mathbf{w}|\mathbf{t})\propto p(\mathbf{w}) \prod^G_{g=1} p(t_g|w_{I_g},w_{J_g})$$

>$$p(\mathbf{w}|\mathbf{t}) = \mathcal{N}(\mathbf{w};\mu,\Sigma) \;\;\;,\;\;\;
p(\mathbf{w}) = \mathcal{N}(\mathbf{w};\mu_0,\Sigma_0) \;\;\;,\;\;\;
p(t_g|w_{I_g},w_{J_g}) \propto \mathcal{N}(\mathbf{w};\mu_g, \Sigma_g)$$
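
Step 2 amounts to drawing from a unit-variance Gaussian truncated to the half-line selected by the outcome. One possible way to do this (an assumption of this sketch, not prescribed by the notes) is inverse-CDF sampling with the standard library's `statistics.NormalDist`; the helper name `sample_performance` is hypothetical.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for cdf and inverse cdf


def sample_performance(w_i, w_j, y, rng):
    """Draw t_g from N(w_i - w_j, 1) restricted to sign(t_g) == y."""
    # Work with z = y * t_g, which is N(y * (w_i - w_j), 1) truncated to z > 0.
    mean_z = y * (w_i - w_j)
    lower = _STD_NORMAL.cdf(-mean_z)           # probability mass with z <= 0
    u = lower + (1.0 - lower) * rng.random()   # uniform on [lower, 1)
    u = min(max(u, 1e-12), 1.0 - 1e-12)        # keep inv_cdf inside (0, 1)
    z = mean_z + _STD_NORMAL.inv_cdf(u)
    z = max(z, 1e-12)                          # guard against rounding at the boundary
    return y * z


rng = random.Random(0)
draws_win = [sample_performance(0.0, 0.0, +1, rng) for _ in range(20_000)]
draws_loss = [sample_performance(0.5, 0.0, -1, rng) for _ in range(1_000)]
mean_win = sum(draws_win) / len(draws_win)
print(mean_win)  # for equal skills this is the mean of a half-normal
```

Inverse-CDF sampling loses accuracy when the allowed region lies far in the tail (a large upset); a rejection or tail-specific sampler may be preferable there.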



* **Gaussian Identities**

>\begin{align}
p(t_g|w_{I_g},w_{J_g}) &\propto \exp \left( -\frac{1}{2}(w_{I_g}-w_{J_g}-t_g)^2 \right) \\
&\propto \exp \left( -\frac{1}{2} \left( \begin{matrix} w_{I_g}-\mu_1 \\ w_{J_g}-\mu_2 \end{matrix} \right)^T \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} \left( \begin{matrix} w_{I_g}-\mu_1 \\ w_{J_g}-\mu_2 \end{matrix} \right) \right) 
\end{align}

>* Since $\mu_1 - \mu_2 = t_g$

>$$\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} \left( \begin{matrix} \mu_1 \\ \mu_2 \end{matrix} \right) = \left( \begin{matrix} t_g \\ -t_g \end{matrix} \right)$$

>* Product of Gaussians

>$$\mathcal{N}(\mathbf{w};\mu_a,\Sigma_a) \mathcal{N}(\mathbf{w};\mu_b, \Sigma_b) = z_c \mathcal{N}(\mathbf{w};\mu_c,\Sigma_c)$$

>$$\Sigma_c^{-1}=\Sigma_a^{-1}+\Sigma_b^{-1} \;\;\;,\;\;\; \mu_c = \Sigma_c(\Sigma_a^{-1}\mu_a + \Sigma_b^{-1}\mu_b)$$
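
The product rule translates directly into a few lines of NumPy. This is a minimal sketch that assumes both covariances are invertible; the function name `gaussian_product` is illustrative.

```python
import numpy as np


def gaussian_product(mu_a, Sigma_a, mu_b, Sigma_b):
    """Mean and covariance of the Gaussian proportional to N(mu_a, Sigma_a) N(mu_b, Sigma_b)."""
    prec_a = np.linalg.inv(Sigma_a)
    prec_b = np.linalg.inv(Sigma_b)
    Sigma_c = np.linalg.inv(prec_a + prec_b)          # precisions add
    mu_c = Sigma_c @ (prec_a @ mu_a + prec_b @ mu_b)  # precision-weighted means add
    return mu_c, Sigma_c


mu_c, Sigma_c = gaussian_product(
    np.array([0.0, 0.0]), np.eye(2),
    np.array([2.0, -2.0]), np.eye(2),
)
print(mu_c)     # halfway between the two means for equal covariances
print(Sigma_c)  # half the covariance of each factor
```

In TrueSkill the per-game precision $\Sigma_g^{-1}$ is singular, so in practice one accumulates precisions and precision-weighted means directly rather than inverting $\Sigma_g$.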

* **Conditional Posterior**

>$$\Sigma^{-1}=\Sigma^{-1}_0+\sum^G_{g=1} \Sigma^{-1}_g \;\;\;,\;\;\; \mu=\Sigma \left( \Sigma^{-1}_0 \mu_0 + \sum^G_{g=1} \Sigma^{-1}_g \mu_g \right)$$

>$$\tilde{\Sigma}^{-1} = \sum^G_{g=1} \Sigma^{-1}_g \;\;\;,\;\;\; \tilde{\mu} = \sum^G_{g=1} \Sigma^{-1}_g \mu_g$$

>* Each game precision $\Sigma_g^{-1}$ contains only 4 non-zero entries

* **Combined Precision & Mean**

>\begin{align}
[\tilde{\Sigma}^{-1}]_{ii} &= \sum^G_{g=1} \left( \delta(i-I_g)+\delta(i-J_g) \right) \\
[\tilde{\Sigma}^{-1}]_{i \neq j} &= -\sum^G_{g=1} \left( \delta(i-I_g)\delta(j-J_g) + \delta(i-J_g)\delta(j-I_g) \right) \\
\tilde{\mu}_i &= \sum^G_{g=1} t_g \left( \delta(i-I_g)-\delta(i-J_g) \right)
\end{align}
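
Step 3 of the sampler can be sketched as follows: accumulate the game precisions and the precision-weighted means, add the prior, and draw $\mathbf{w}$ jointly. Since $\Sigma_g^{-1}\mu_g = (t_g, -t_g)^T$, the first player's entry gains $+t_g$ and the second player's entry gains $-t_g$. The function name, the zero-based player indices, and the toy data are assumptions of this example.

```python
import numpy as np


def skill_conditional(t, first, second, mu0, Sigma0):
    """Mean and covariance of p(w | t) for games g with players first[g], second[g]."""
    n_players = len(mu0)
    precision_tilde = np.zeros((n_players, n_players))
    nat_mean_tilde = np.zeros(n_players)
    for t_g, i, j in zip(t, first, second):
        # Each game contributes the 2x2 block [[1, -1], [-1, 1]] ...
        precision_tilde[i, i] += 1.0
        precision_tilde[j, j] += 1.0
        precision_tilde[i, j] -= 1.0
        precision_tilde[j, i] -= 1.0
        # ... and the precision-weighted mean (t_g, -t_g).
        nat_mean_tilde[i] += t_g
        nat_mean_tilde[j] -= t_g
    prior_precision = np.linalg.inv(Sigma0)
    Sigma = np.linalg.inv(prior_precision + precision_tilde)
    mu = Sigma @ (prior_precision @ mu0 + nat_mean_tilde)
    return mu, Sigma


# Toy data: 3 players, game 0 is player 0 vs 1, game 1 is player 1 vs 2.
mu, Sigma = skill_conditional(
    t=[1.0, 0.5], first=[0, 1], second=[1, 2],
    mu0=np.zeros(3), Sigma0=np.eye(3),
)
rng = np.random.default_rng(0)
w_sample = rng.multivariate_normal(mu, Sigma)  # one joint draw of the skills
print(mu, w_sample)
```

For many players a Cholesky factorisation of the precision matrix would likely be preferable to explicit inverses; the inverses here keep the sketch close to the formulas.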

## 3.4. Factor Graphs and Message Passing

* **Factor Graph:** a type of **probabilistic graphical model**

>* Nodes: factors & variables
>* Edges: Dependency of factors on variables

><img src="images/image3_01.png" width=500>

>* Sums of products $\rightarrow$ Products of sums
>* Complexity: $\mathcal{O}(K^5) \rightarrow \mathcal{O}(K^2)$

* **The Sum-Product Algorithm** - three update equations

>* **Update 1:** Marginals are the product of all incoming messages from neighbouring factors

>$$p(t) = \prod_{f \in F_t} m_{f \rightarrow t} (t)$$

>* **Update 2:** Messages from factors sum out all variables except the receiving one

>$$m_{f \rightarrow t_1} (t_1) = \sum_{t_2} \sum_{t_3} \cdots \sum_{t_n} f(t_1,t_2,...,t_n) \prod_{i \neq 1} m_{t_i \rightarrow f} (t_i)$$

>* **Update 3:** Messages from variables are the product of all incoming messages except the message from the receiving factor

>$$m_{t \rightarrow f}(t) = \prod_{f_j \in F_t \;,\; f_j \neq f}  m_{f_j \rightarrow t} (t) = \frac{p(t)}{m_{f \rightarrow t}(t)}$$
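
The three updates can be tried on a tiny discrete chain. The sketch below uses a made-up model $p(a,b,c) \propto f_1(a) f_2(a,b) f_3(b,c)$ with $K=3$ states per variable (the factor tables are arbitrary) and compares the marginal of $c$ obtained by message passing with brute-force summation.

```python
from itertools import product

K = 3  # number of states per variable (illustrative)

# Hypothetical factor tables for p(a, b, c) proportional to f1(a) f2(a, b) f3(b, c).
f1 = [0.5, 0.3, 0.2]
f2 = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
f3 = [[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]]


def normalise(values):
    total = sum(values)
    return [v / total for v in values]


def marginal_c_sum_product():
    # Update 2 at the leaf factor f1: there is nothing to sum out.
    m_f1_to_a = f1
    # Update 3: a passes on the product of its other incoming messages.
    m_a_to_f2 = m_f1_to_a
    # Update 2: f2 sums out a.
    m_f2_to_b = [sum(f2[a][b] * m_a_to_f2[a] for a in range(K)) for b in range(K)]
    # Update 3 again, then Update 2 with f3 summing out b.
    m_b_to_f3 = m_f2_to_b
    m_f3_to_c = [sum(f3[b][c] * m_b_to_f3[b] for b in range(K)) for c in range(K)]
    # Update 1: c has a single neighbouring factor, so its marginal is that message.
    return normalise(m_f3_to_c)


def marginal_c_brute_force():
    totals = [0.0] * K
    for a, b, c in product(range(K), repeat=3):
        totals[c] += f1[a] * f2[a][b] * f3[b][c]
    return normalise(totals)


by_messages = marginal_c_sum_product()
by_enumeration = marginal_c_brute_force()
print(by_messages, by_enumeration)
```

Message passing touches each factor table once, whereas enumeration visits all $K^3$ joint states; the gap widens quickly as variables are added.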

## 3.5. Message Passing in TrueSkill

* **Algorithm**

><img src="images/image3_02.png" width=300>

>1. Update **skill marginals**
>2. Compute **skill to game messages**
>3. Compute **game to performance messages**
>4. Approximate **performance marginals**
>5. Compute **performance to game messages**
>6. Compute **game to skill messages**

* **Step 0: Initialise incoming skill messages**, $m^{\tau=0}_{h_g \rightarrow w_i} (w_i)$

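>\begin{align}
r^{\tau=0}_{h_g \rightarrow w_i} &= 0 \\ 
\lambda^{\tau=0}_{h_g \rightarrow w_i} &= 0 
\end{align}

>* Notation: each Gaussian message has mean $\mu$ and variance $v$, or equivalently precision $r = v^{-1}$ and natural mean $\lambda = r\mu$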

* **Step 1: Marginal skills**, $q^\tau(w_i)$

>\begin{align}
r^{\tau}_{i} &= r_0 + \sum_g r^{\tau}_{h_g \rightarrow w_i} \\ 
\lambda^\tau_i &= \lambda_0 + \sum_g \lambda^{\tau}_{h_g \rightarrow w_i} \end{align} 

* **Step 2: Skill to game messages**, $m^\tau_{w_i \rightarrow h_g}(w_i)$

>\begin{align}
r^{\tau}_{w_i \rightarrow h_g} &= r^\tau_i - r^{\tau}_{h_g \rightarrow w_i} \\ 
\lambda^\tau_{w_i \rightarrow h_g} &= \lambda^\tau_i - \lambda^{\tau}_{h_g \rightarrow w_i} 
\end{align}

* **Step 3: Game to performance messages**, $m^\tau_{h_g \rightarrow t_g}(t_g)$

>\begin{align}
v^\tau_{h_g \rightarrow t_g} &= 1 + v^\tau_{w_{I_g}\rightarrow h_g} + v^\tau_{w_{J_g}\rightarrow h_g} \\ 
\mu^\tau_{h_g \rightarrow t_g} &= \mu^\tau_{w_{I_g} \rightarrow h_g} - \mu^\tau_{w_{J_g} \rightarrow h_g}
\end{align}

* **Step 4: Marginal performances**, $q^{\tau+1}(t_g)$

>\begin{align}
p(t_g) &\propto \mathcal{N} (t_g; \mu^\tau_{h_g \rightarrow t_g}, v^\tau_{h_g \rightarrow t_g}) \, \mathbb{I} (y_g-\text{sign}(t_g)) \\
&\approx \mathcal{N}(t_g; \tilde{\mu}^{\tau+1}_g,\tilde{v}^{\tau+1}_g) = q^{\tau+1}(t_g)
\end{align}

>* Parameters of $q$: found by ***moment matching***

>\begin{align}
\tilde{v}^{\tau+1}_g &= v^\tau_{h_g \rightarrow t_g} \left( 1-\Lambda \left( \frac{\mu^\tau_{h_g \rightarrow t_g}}{\sigma^\tau_{h_g \rightarrow t_g}} \right) \right) \\
\tilde{\mu}^{\tau+1}_g &= \mu^\tau_{h_g \rightarrow t_g} + \sigma^\tau_{h_g \rightarrow t_g} \Psi \left( \frac{\mu^\tau_{h_g \rightarrow t_g}}{\sigma^\tau_{h_g \rightarrow t_g}} \right)
\end{align}

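>* $\Psi(x) = \mathcal{N}(x;0,1)/\Phi(x)$ and $\Lambda(x) = \Psi(x)(\Psi(x)+x)$, with $\sigma^\tau_{h_g \rightarrow t_g} = \sqrt{v^\tau_{h_g \rightarrow t_g}}$; these expressions are for $y_g=+1$, i.e. $I_g$ is the winner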
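
The moment-matching formulas are short enough to write out directly. The following sketch assumes the truncation is to $t_g > 0$ (the first player won) and uses `statistics.NormalDist` for the standard normal density and CDF; the function names are illustrative.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def psi(x):
    """Psi(x) = N(x; 0, 1) / Phi(x)."""
    return _STD_NORMAL.pdf(x) / _STD_NORMAL.cdf(x)


def lam(x):
    """Lambda(x) = Psi(x) * (Psi(x) + x)."""
    p = psi(x)
    return p * (p + x)


def truncated_moments(mu, v):
    """Mean and variance of N(t; mu, v) truncated to t > 0."""
    sigma = math.sqrt(v)
    z = mu / sigma
    mean_q = mu + sigma * psi(z)
    var_q = v * (1.0 - lam(z))
    return mean_q, var_q


# Standard normal truncated to the positive half-line (a half-normal).
mean_q, var_q = truncated_moments(0.0, 1.0)
print(mean_q, var_q)
```

For `mu = 0, v = 1` the result reduces to the half-normal moments $\sqrt{2/\pi}$ and $1 - 2/\pi$. For strongly negative `mu / sigma` the ratio in `psi` becomes numerically delicate and would need a more careful implementation.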

* **Step 5: Performance to game message**, $m^{\tau+1}_{t_g \rightarrow h_g} (t_g)$

>\begin{align}
r^{\tau+1}_{t_g \rightarrow h_g} &= \tilde{r}^{\tau+1}_g - r^\tau_{h_g \rightarrow t_g} \\
\lambda^{\tau+1}_{t_g \rightarrow h_g} &= \tilde{\lambda}^{\tau+1}_g - \lambda^\tau_{h_g \rightarrow t_g}
\end{align}

* **Step 6: Game to skill message**, $m^{\tau+1}_{h_g \rightarrow w_{I_g}}(w_{I_g})$ and $m^{\tau+1}_{h_g \rightarrow w_{J_g}}(w_{J_g})$

>* **Player 1 (winner)**

>\begin{align}
v^{\tau+1}_{h_g \rightarrow w_{I_g}} &= 1 + v^{\tau+1}_{t_g \rightarrow h_g} + v^\tau_{w_{J_g} \rightarrow h_g} \\
\mu^{\tau+1}_{h_g \rightarrow w_{I_g}} &= \mu^\tau_{w_{J_g} \rightarrow h_g} + \mu^{\tau+1}_{t_g \rightarrow h_g}
\end{align}

>* **Player 2 (loser)**

>\begin{align}
v^{\tau+1}_{h_g \rightarrow w_{J_g}} &= 1 + v^{\tau+1}_{t_g \rightarrow h_g} + v^\tau_{w_{I_g} \rightarrow h_g} \\
\mu^{\tau+1}_{h_g \rightarrow w_{J_g}} &= \mu^\tau_{w_{I_g} \rightarrow h_g} - \mu^{\tau+1}_{t_g \rightarrow h_g}
\end{align}
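
Putting Steps 0 to 6 together gives the following rough sketch of the message passing loop. It assumes that each game is given as a `(winner, loser)` pair of zero-based player indices, that all players share the same Gaussian prior, and that a fixed number of iterations is run; the function and argument names are hypothetical. The loser's message mean uses $w_{J_g} = w_{I_g} - t_g - n$, so the performance mean enters with a minus sign.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def psi(x):
    return _STD_NORMAL.pdf(x) / _STD_NORMAL.cdf(x)


def lam(x):
    p = psi(x)
    return p * (p + x)


def trueskill_message_passing(games, n_players, prior_mean=0.0, prior_var=1.0, n_iters=20):
    """Approximate skill marginals; games is a list of (winner, loser) index pairs."""
    r0, nat0 = 1.0 / prior_var, prior_mean / prior_var
    # Step 0: game -> skill messages (precision, natural mean); slot 0 = winner, 1 = loser.
    r_hw = [[0.0, 0.0] for _ in games]
    nat_hw = [[0.0, 0.0] for _ in games]

    def skill_marginals():
        # Step 1: prior plus all incoming game -> skill messages.
        r, nat = [r0] * n_players, [nat0] * n_players
        for g, (i, j) in enumerate(games):
            r[i] += r_hw[g][0]; nat[i] += nat_hw[g][0]
            r[j] += r_hw[g][1]; nat[j] += nat_hw[g][1]
        return r, nat

    for _ in range(n_iters):
        r, nat = skill_marginals()
        for g, (i, j) in enumerate(games):
            # Step 2: skill -> game messages (divide out this game's own message).
            r_i, nat_i = r[i] - r_hw[g][0], nat[i] - nat_hw[g][0]
            r_j, nat_j = r[j] - r_hw[g][1], nat[j] - nat_hw[g][1]
            v_i, m_i, v_j, m_j = 1.0 / r_i, nat_i / r_i, 1.0 / r_j, nat_j / r_j
            # Step 3: game -> performance message.
            v_ht, m_ht = 1.0 + v_i + v_j, m_i - m_j
            # Step 4: moment-matched performance marginal (winner listed first).
            s = math.sqrt(v_ht)
            v_q, m_q = v_ht * (1.0 - lam(m_ht / s)), m_ht + s * psi(m_ht / s)
            # Step 5: performance -> game message.
            r_th, nat_th = 1.0 / v_q - 1.0 / v_ht, m_q / v_q - m_ht / v_ht
            v_th, m_th = 1.0 / r_th, nat_th / r_th
            # Step 6: game -> skill messages for winner and loser.
            v_wi, m_wi = 1.0 + v_th + v_j, m_j + m_th
            v_wj, m_wj = 1.0 + v_th + v_i, m_i - m_th
            r_hw[g] = [1.0 / v_wi, 1.0 / v_wj]
            nat_hw[g] = [m_wi / v_wi, m_wj / v_wj]

    r, nat = skill_marginals()
    return [n / p for n, p in zip(nat, r)], [1.0 / p for p in r]


# Toy example: player 0 beats player 1, and player 1 beats player 2.
means, variances = trueskill_message_passing([(0, 1), (1, 2)], n_players=3)
print(means, variances)
```

In this toy example the ordering of the posterior means follows the game results, and every variance drops below the prior variance. A practical implementation would monitor convergence instead of using a fixed `n_iters`.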