# Problem 1. [100 points] Ability of Sports Teams

Consider $n$ teams competing against each other in a sports tournament (e.g., soccer, football, or basketball). Some teams are stronger than others for reasons such as players' skills, coaching and staff support, amount of practice, etc. We want to model the ability of each of the $n$ teams as $a_x \in [0,1]\;\forall x = 1,...,n$.

**The purpose of this problem is to learn the ability vector $a\in[0,1]^{n}$ from historical game outcome data.** Once determined, we may use $a\in[0,1]^{n}$ to predict the outcome of a future sports game.

## (a) [50 points] Maximum Likelihood Estimate

When teams $x$ and $y$ play each other, the probability that team $x$ wins is equal to $\mathbb{P}\left(a_x - a_y + v \geq 0\right)$ where the noise $v\sim\mathcal{N}\left(0,\sigma^{2}\right)$, the normal distribution with mean zero and variance $\sigma^2$. The noise random variable $v$ models things like players' injuries, weather during the game, players' mental stress etc.
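The win model above can be illustrated by simulation. The following is a minimal sketch using only the Python standard library; the function name `simulate_win_rate`, the ability values, and the trial count are hypothetical choices made for illustration, not part of the problem.

```python
import random

def simulate_win_rate(a_x, a_y, sigma, trials=100_000, seed=0):
    """Estimate P(a_x - a_y + v >= 0) by sampling v ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    wins = sum(a_x - a_y + rng.gauss(0.0, sigma) >= 0 for _ in range(trials))
    return wins / trials

# Hypothetical abilities, chosen only for illustration.
equal_rate = simulate_win_rate(0.5, 0.5, sigma=0.25)     # evenly matched teams
stronger_rate = simulate_win_rate(0.8, 0.3, sigma=0.25)  # team x clearly stronger
print(equal_rate, stronger_rate)
```

With equal abilities the estimated win rate should be close to one half, and it should increase as the ability gap $a_x - a_y$ grows relative to $\sigma$.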

Suppose we are given historical game outcome data $\left(x^{(i)},y^{(i)},z^{(i)}\right)$ for $i=1,...,m$ games, meaning that game $i$ was played between teams $x^{(i)}$ and $y^{(i)}$, and the result was 
$$z^{(i)} = \begin{cases}
+1 & \text{if team $x^{(i)}$ won},\\
-1 & \text{if team $y^{(i)}$ won}.
\end{cases}$$
We assume that there were no ties. We collect this historical data in a matrix $H\in\mathbb{R}^{m\times n}$ in which each row denotes a game and each column denotes a team, i.e.,
$$H_{ij} = \begin{cases}
+z^{(i)} & \text{if $j=x^{(i)}$},\\
-z^{(i)} & \text{if $j=y^{(i)}$},\\
0 & \text{otherwise}.
\end{cases}$$
Assuming the outcomes of past games were statistically independent, formulate the computation of the maximum likelihood estimate of the ability vector $a^{*}_{\text{MLE}}$ as an optimization problem in which the matrix $H$ appears as a known parameter. **Explain all mathematical derivations, including why the derived problem is convex**.
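The definition of $H$ above can be checked on a small example. This is a sketch under assumed data: the three-team schedule `toy_games` and the helper name `build_H` are hypothetical and only illustrate the indexing convention (1-based team indices, $z\in\{+1,-1\}$).

```python
def build_H(games, n):
    """games: list of (x, y, z) with 1-based team indices and z in {+1, -1}."""
    H = [[0] * n for _ in games]
    for i, (x, y, z) in enumerate(games):
        H[i][x - 1] = z    # +z in the column of team x
        H[i][y - 1] = -z   # -z in the column of team y
    return H

# Hypothetical 3-team schedule: team 1 beats 2, team 3 beats 1, team 2 beats 3.
toy_games = [(1, 2, +1), (1, 3, -1), (2, 3, +1)]
H_toy = build_H(toy_games, n=3)
for row in H_toy:
    print(row)
```

Each row ends up with exactly one $+1$ and one $-1$, so every row sums to zero.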

**Hints:** You need to derive an optimization problem over the decision variable $a\in[0,1]^{n}$. If $v\sim\mathcal{N}\left(0,\sigma^{2}\right)$ then $\frac{v}{\sigma} \sim\mathcal{N}(0,1)$, the standard normal distribution. The cumulative distribution function of the standard normal distribution is discussed in Lec. 10, p. 14.
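The standardization in the hint can be verified numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the threshold `t` is an arbitrary value chosen for illustration.

```python
from statistics import NormalDist

sigma = 0.25                              # example noise level
std_normal = NormalDist()                 # N(0, 1)
scaled = NormalDist(mu=0.0, sigma=sigma)  # N(0, sigma^2)

t = 0.1  # arbitrary threshold, chosen for illustration
lhs = scaled.cdf(t)              # P(v <= t) for v ~ N(0, sigma^2)
rhs = std_normal.cdf(t / sigma)  # Phi(t / sigma)
print(lhs, rhs)
```

The two values agree up to floating-point rounding, which is the identity $\mathbb{P}(v\leq t)=\Phi(t/\sigma)$ used in the derivation.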

1.
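From the problem statement, we can write $\mathbb{P}(x \text{ wins})$ in terms of the **standard normal CDF** $\Phi$:
\begin{align}
\mathbb{P}(x \text{ wins}) & = \mathbb{P}(a_x - a_y + v \geq 0) \\
& = \mathbb{P}\left(\frac{a_x - a_y}{\sigma} \geq -\frac{v}{\sigma}\right) \\
& = \mathbb{P}\left(-\frac{v}{\sigma} \leq \frac{a_x - a_y}{\sigma}\right) \\
& = \Phi\left(\frac{a_x - a_y}{\sigma}\right),
\end{align}
since $-\frac{v}{\sigma} \sim\mathcal{N}(0,1)$ (the negative of a standard normal variable is again standard normal). Because there are no ties, $\mathbb{P}(y \text{ wins}) = 1 - \Phi\left(\frac{a_x - a_y}{\sigma}\right) = \Phi\left(\frac{a_y - a_x}{\sigma}\right)$, so in both cases the probability that a given team wins is $\Phi$ evaluated at that team's ability minus its opponent's, divided by $\sigma$.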
2.
From the definition of $H$, each row has exactly two nonzero entries, so $H$ is a sparse matrix. Substituting $z^{(i)}=\pm 1$ shows that, for every match $i$:
$$H_{ij} = \begin{cases}
1 & \text{if $j=\text{winner}$},\\
-1 & \text{if $j=\text{loser}$},\\
0 & \text{otherwise}.
\end{cases}$$
3.
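Consequently, for any decision variable $a \in [0, 1]^{n}$, the $i$-th entry of $\frac{1}{\sigma} H a$ is the scaled ability gap between the winner and the loser of game $i$, where $a_{i, \text{winner}}$ and $a_{i, \text{loser}}$ denote the abilities of the winner and the loser of game $i$:
\begin{align}
\frac{1}{\sigma} H a =
\begin{bmatrix}
\frac{1}{\sigma} (a_{1, \text{winner}} - a_{1, \text{loser}}) \\
\vdots \\
\frac{1}{\sigma} (a_{m, \text{winner}} - a_{m, \text{loser}})
\end{bmatrix}
\end{align}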
4.
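Applying the standard normal CDF $\Phi$ **elementwise** to this vector gives, by step 1, the probability of each **observed** outcome, i.e., the probability that the team that actually won game $i$ wins it. The $m$ outcomes are independent (but not identically distributed, since each game involves a different pair of teams):
\begin{align}
\Phi\left(\frac{1}{\sigma} H a\right) & =
\begin{bmatrix}
\Phi\left(\frac{1}{\sigma} (a_{1, \text{winner}} - a_{1, \text{loser}})\right) \\
\vdots \\
\Phi\left(\frac{1}{\sigma} (a_{m, \text{winner}} - a_{m, \text{loser}})\right)
\end{bmatrix}
= \begin{bmatrix}
\mathbb{P}(\text{winner of game 1 wins}) \\
\vdots \\
\mathbb{P}(\text{winner of game $m$ wins})
\end{bmatrix}
\end{align}
These entries depend on $a$ and are in general strictly less than $1$; the data tell us which events occurred, not that they had probability $1$.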
5. To choose the ability vector $a$ that best explains the observations encoded in $H$, we make the observed outcomes as probable as possible. Because the games are independent, this means maximizing the **joint probability** of the $m$ observed outcomes (the likelihood) $\underline{\Phi}$, which is the product of the per-game probabilities (the derivation is similar to Lec. 17, p. 4).
6. By independence, the likelihood and the log-likelihood are:
\begin{align}
\underline{\Phi}(a) & = \prod_{i=1}^{m} \Phi\left(\frac{1}{\sigma}(a_{i, \text{winner}} - a_{i, \text{loser}})\right) = \prod_{i=1}^{m} \Phi\left(\frac{1}{\sigma}(Ha)_i\right) \\
\Rightarrow \log \underline{\Phi}(a) & = \sum_{i=1}^{m} \log \Phi\left(\frac{1}{\sigma}(a_{i, \text{winner}} - a_{i, \text{loser}})\right) = \sum_{i=1}^{m} \log \Phi\left(\frac{1}{\sigma}(Ha)_i\right)
\end{align}
7. The maximum likelihood estimation (**MLE**) problem is therefore $$a^{*}_{\text{MLE}} \in \underset{a \in [0, 1]^{n}}{\arg\max} \sum_{i=1}^{m} \log \Phi\left(\frac{1}{\sigma} (H a)_i\right)$$ where $(Ha)_i$ denotes the $i$-th entry of $Ha$.
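8. From the textbook, p. 107, the CDF of a standard normal variable is **log-concave**, so $\log\Phi$ is concave. Each term $\log\Phi\left(\frac{1}{\sigma}(Ha)_i\right)$ is a concave function composed with an affine function of $a$ and is therefore concave in $a$; a sum of concave functions is concave (see also Lec. 17, p. 4). The constraint $a \in [0,1]^{n}$, i.e., $0 \leq a_j \leq 1$ for all $j$, is an intersection of halfspaces (a polyhedron) and hence a convex set. Maximizing a concave objective over a convex set is a **convex optimization problem**.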
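The objective derived in the steps above can be evaluated directly, and its concavity can be spot-checked at a midpoint. This is a minimal sketch in plain Python, not a proof: the helper `log_likelihood`, the matrix `H_toy`, and the points `a1`, `a2` are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def log_likelihood(H, a, sigma):
    """Sum over games of log Phi((H a)_i / sigma)."""
    total = 0.0
    for row in H:
        margin = sum(h * a_j for h, a_j in zip(row, a))  # (H a)_i
        total += math.log(PHI(margin / sigma))
    return total

# Hypothetical 3-team data: each row has +1 for the winner, -1 for the loser.
H_toy = [[1, -1, 0], [-1, 0, 1], [0, 1, -1]]
a1, a2 = [0.9, 0.2, 0.5], [0.1, 0.6, 0.4]  # two arbitrary points in [0, 1]^3
mid = [(u + v) / 2 for u, v in zip(a1, a2)]

f1, f2, f_mid = (log_likelihood(H_toy, a, 0.25) for a in (a1, a2, mid))
print(f_mid >= (f1 + f2) / 2)  # midpoint inequality that concavity implies
```

A single midpoint test cannot establish concavity, but a violation here would indicate an error in the derivation or in the code.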

## (b) [50 points] Numerical Solution

Fix $\sigma = 0.25$, and $n=10$ teams playing $m=45$ matches in a tournament where each team plays every other team once. For row index $i=1,...,m$, the tuples $\left(x^{(i)},y^{(i)},z^{(i)}\right)$ are given by the following array of 45 rows and 3 columns:
$$\texttt{[1 2 1;
1 3 1;
1 4 1;
1 5 1;
1 6 1;
1 7 1;
1 8 1;
1 9 1;
1 10 1;
2 3 -1;
2 4 -1;
2 5 -1;
2 6 -1;
2 7 -1;
2 8 -1;
2 9 -1;
2 10 -1;
3 4 1;
3 5 -1;
3 6 -1;
3 7 1;
3 8 1;
3 9 1;
3 10 1;
4 5 -1;
4 6 -1;
4 7 1;
4 8 1;
4 9 -1;
4 10 -1;
5 6 1;
5 7 1;
5 8 1;
5 9 -1;
5 10 1;
6 7 1;
6 8 1;
6 9 -1;
6 10 -1;
7 8 1;
7 9 1;
7 10 -1;
8 9 -1;
8 10 -1;
9 10 1]}$$
Use the above data to write code that first constructs the matrix $H\in\mathbb{R}^{m\times n}$, and then computes $a^{*}_{\text{MLE}}$ in the same code via cvx/cvxpy/Convex.jl. **Please submit your numerically computed $a^{*}_{\text{MLE}}$ as well as the code.**
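As a rough reference point before solving, the number of wins per team can be tallied directly from the array above. This is only a sanity-check sketch in plain Python; the names `games` and `wins` are illustrative. A win count is not the MLE, although one would loosely expect teams with more wins to receive larger estimated abilities.

```python
# (x, y, z) rows copied from the array above; z = +1 means team x won.
games = [
    (1, 2, 1), (1, 3, 1), (1, 4, 1), (1, 5, 1), (1, 6, 1),
    (1, 7, 1), (1, 8, 1), (1, 9, 1), (1, 10, 1),
    (2, 3, -1), (2, 4, -1), (2, 5, -1), (2, 6, -1),
    (2, 7, -1), (2, 8, -1), (2, 9, -1), (2, 10, -1),
    (3, 4, 1), (3, 5, -1), (3, 6, -1), (3, 7, 1),
    (3, 8, 1), (3, 9, 1), (3, 10, 1),
    (4, 5, -1), (4, 6, -1), (4, 7, 1), (4, 8, 1), (4, 9, -1), (4, 10, -1),
    (5, 6, 1), (5, 7, 1), (5, 8, 1), (5, 9, -1), (5, 10, 1),
    (6, 7, 1), (6, 8, 1), (6, 9, -1), (6, 10, -1),
    (7, 8, 1), (7, 9, 1), (7, 10, -1),
    (8, 9, -1), (8, 10, -1),
    (9, 10, 1),
]

n = 10
wins = [0] * n
for x, y, z in games:
    winner = x if z == 1 else y  # z = -1 means team y won
    wins[winner - 1] += 1        # team indices are 1-based

print(wins)
```

In this data team 1 wins every game it plays and team 2 loses every game it plays, which suggests that the box constraint $a\in[0,1]^{n}$ is what keeps their estimated abilities bounded.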

In [21]:
import numpy as np
import cvxpy as cp

Z = np.array([
[1, 2 , 1],
[1, 3 , 1],
[1, 4 , 1],
[1, 5 , 1],
[1, 6 , 1],
[1, 7 , 1],
[1, 8 , 1],
[1, 9 , 1],
[1, 10,  1],
[2, 3 , -1],
[2, 4 , -1],
[2, 5 , -1],
[2, 6 , -1],
[2, 7 , -1],
[2, 8 , -1],
[2, 9 , -1],
[2, 10,  -1],
[3, 4 , 1],
[3, 5 , -1],
[3, 6 , -1],
[3, 7 , 1],
[3, 8 , 1],
[3, 9 , 1],
[3, 10,  1],
[4, 5 , -1],
[4, 6 , -1],
[4, 7 , 1],
[4, 8 , 1],
[4, 9 , -1],
[4, 10,  -1],
[5, 6 , 1],
[5, 7 , 1],
[5, 8 , 1],
[5, 9 , -1],
[5, 10,  1],
[6, 7 , 1],
[6, 8 , 1],
[6, 9 , -1],
[6, 10,  -1],
[7, 8 , 1],
[7, 9 , 1],
[7, 10,  -1],
[8, 9 , -1],
[8, 10,  -1],
[9, 10,  1],
])

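sigma = 0.25
m, n = Z.shape[0], 10  # 45 games, 10 teams

# Build H row by row: +z in the column of team x, -z in the column of team y.
# Team indices in Z are 1-based, hence the "- 1".
H = np.zeros((m, n))
for i, (x_i, y_i, z_i) in enumerate(Z):  # loop index i, so m is not overwritten
    H[i, x_i - 1] = z_i
    H[i, y_i - 1] = -z_i

# Sanity check: every row holds one +1 and one -1, so each row sums to zero.
assert not np.any(np.sum(H, axis=1))

print(H)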

##############################

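# Decision variable: ability vector a in [0, 1]^n.
a = cp.Variable(n)
constraints = [a >= 0, a <= 1]

# Log-likelihood: sum_i log Phi((H a)_i / sigma).
# cp.log_normcdf is applied elementwise (CVXPY documents it as an approximation).
objective = cp.Maximize(cp.sum(cp.log_normcdf((H @ a) / sigma)))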

prob = cp.Problem(objective, constraints)

result = prob.solve()
print(result)
print(a.value)

[[ 1. -1.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 1.  0. -1.  0.  0.  0.  0.  0.  0.  0.]
 [ 1.  0.  0. -1.  0.  0.  0.  0.  0.  0.]
 [ 1.  0.  0.  0. -1.  0.  0.  0.  0.  0.]
 [ 1.  0.  0.  0.  0. -1.  0.  0.  0.  0.]
 [ 1.  0.  0.  0.  0.  0. -1.  0.  0.  0.]
 [ 1.  0.  0.  0.  0.  0.  0. -1.  0.  0.]
 [ 1.  0.  0.  0.  0.  0.  0.  0. -1.  0.]
 [ 1.  0.  0.  0.  0.  0.  0.  0.  0. -1.]
 [ 0. -1.  1.  0.  0.  0.  0.  0.  0.  0.]
 [ 0. -1.  0.  1.  0.  0.  0.  0.  0.  0.]
 [ 0. -1.  0.  0.  1.  0.  0.  0.  0.  0.]
 [ 0. -1.  0.  0.  0.  1.  0.  0.  0.  0.]
 [ 0. -1.  0.  0.  0.  0.  1.  0.  0.  0.]
 [ 0. -1.  0.  0.  0.  0.  0.  1.  0.  0.]
 [ 0. -1.  0.  0.  0.  0.  0.  0.  1.  0.]
 [ 0. -1.  0.  0.  0.  0.  0.  0.  0.  1.]
 [ 0.  0.  1. -1.  0.  0.  0.  0.  0.  0.]
 [ 0.  0. -1.  0.  1.  0.  0.  0.  0.  0.]
 [ 0.  0. -1.  0.  0.  1.  0.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.  0. -1.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.  0.  0. -1.  0.  0.]
 [ 0.  0.  1.  0.  0.  0.  0.  0. -1.  0.]
 [ 0.  0.  