# Colley's Method, a bit of Déjà Vu

Colley's Method modifies the winning percentage formula, $r_i = \frac{w_i}{t_i}$, where the rating $r_i$ of team $i$ is its number of wins $w_i$ divided by its number of games played $t_i$, by using Laplace's Rule of Succession. The reasoning is that the usual method of rating teams has a few flaws:
- At the start of the season, $r_i = \frac{0}{0}$, which is undefined.
- All wins and losses are treated equally: a "weaker" team beating a "stronger" one (or vice versa) counts the same as a win between two evenly matched teams.
- A winless team has rating 0.


### Laplace's Rule of Succession

The Colley matrix is derived from Laplace's Rule of Succession. It states the following: if $X_1, X_2, \dots, X_{n+1}$ are random variables with $X_i \in \{0, 1\}$ that are conditionally independent given an unknown success probability $p$, where $p$ is uniformly distributed on $[0, 1]$, then

$$\mathbb{P}(X_{n+1} = 1 \mid X_1 + \cdots + X_n = s) = \frac{s + 1}{n + 2},$$

where $s$ is the number of successes in the first $n$ trials.

If we let $s = w_i$ and $n = t_i$, we have the basis of Colley's Method:

$$ r_i = \frac{1 + w_i}{2 + t_i}$$

This causes all teams to start with $r_i = \frac{1}{2}$, and a winless team now has a positive rating, $\frac{1}{2 + t_i}$, instead of 0. Moreover, this means the ratings are centered at 0.5, so teams move above or below 0.5 as the season progresses.
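To make the fix concrete, here is a minimal sketch comparing the plain winning percentage with the rule-of-succession rating. The function names `raw_rating` and `laplace_rating` are hypothetical, introduced only for illustration; `Fraction` keeps the values exact.

```python
from fractions import Fraction

def raw_rating(wins: int, games: int):
    """Plain winning percentage w_i / t_i; returns None when it is 0/0."""
    if games == 0:
        return None  # undefined at the start of the season
    return Fraction(wins, games)

def laplace_rating(wins: int, games: int) -> Fraction:
    """Rule-of-succession rating (1 + w_i) / (2 + t_i)."""
    return Fraction(1 + wins, 2 + games)

# Start of the season: no games played yet.
print(raw_rating(0, 0), laplace_rating(0, 0))  # None 1/2

# A winless team after 3 games.
print(raw_rating(0, 3), laplace_rating(0, 3))  # 0 1/5
```

The winless team drops to 0 under the plain formula but keeps a small positive rating under the adjusted one.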

## Cool, what now?

Colley's matrix is very similar to the Massey matrix. The main difference is that we add 2 to each entry of the main diagonal, which holds the total number of games played by each team.


$$
\begin{pmatrix}
t_1 + 2 & -n_{12} & \cdots & -n_{1n} \\
-n_{21} & t_2 + 2 & \cdots & -n_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
-n_{n1} & -n_{n2} & \cdots & t_n + 2
\end{pmatrix}
\begin{pmatrix}
r_1 \\
r_2 \\
\vdots \\
r_n
\end{pmatrix}
=
\begin{pmatrix}
b_1 \\
b_2 \\
\vdots \\
b_n
\end{pmatrix}
$$

$t_i$ := number of games played by team $i$ <br />
$n_{ij}$ := number of times team $i$ played against team $j$ <br />
$b_i$ := $1 + \frac{1}{2}(w_i - l_i)$, where $w_i$ = number of wins and $l_i$ = number of losses of team $i$ <br />
$r_i$ := rating of team $i$
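These definitions translate directly into code. The sketch below assembles the Colley matrix and $\mathbf{b}$ from a list of game results and solves the system. The `(winner, loser)` input format and the function name `colley_system` are assumptions made for illustration, and ties are not handled.

```python
import numpy as np

def colley_system(n_teams, games):
    """Build the Colley matrix C and right-hand side b.

    games: list of (winner, loser) index pairs (hypothetical input
    format; ties are not handled in this sketch).
    """
    C = 2.0 * np.eye(n_teams)  # the "+2" on the main diagonal
    b = np.ones(n_teams)       # the "1 +" in each b_i
    for w, l in games:
        C[w, w] += 1           # t_i counts every game a team plays
        C[l, l] += 1
        C[w, l] -= 1           # -n_ij counts meetings between i and j
        C[l, w] -= 1
        b[w] += 0.5            # b_i = 1 + (w_i - l_i) / 2
        b[l] -= 0.5
    return C, b

# Three teams: 0 beats 1, 1 beats 2, 0 beats 2.
C, b = colley_system(3, [(0, 1), (1, 2), (0, 2)])
r = np.linalg.solve(C, b)
print(r)  # approximately [0.7, 0.5, 0.3]
```

Each game touches exactly four matrix entries and two entries of $\mathbf{b}$, so repeated meetings between the same pair of teams accumulate into $n_{ij}$ automatically.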


Each $b_i$ is derived from the Rule of Succession. To derive the system, we start by rewriting $w_i$. Assuming no ties, so that $t_i = w_i + l_i$, note that

$$ \begin{align}
w_i &= \frac{w_i-l_i}{2} + \frac{w_i+l_i}{2}\\ 
   &= \frac{w_i-l_i}{2} + \frac{t_i}{2}\\
   &= \frac{w_i-l_i}{2} + \sum_{j=1}^{t_i} \frac{1}{2}
\end{align}$$

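Each $\frac{1}{2}$ in that sum is the starting rating of one of team $i$'s opponents. If we replace it with that opponent's actual rating, the sum becomes the cumulative rating of team $i$'s opponents, $$ \sum_{j \in O_i} r_j, $$ where $O_i$ is the list of teams that team $i$ has played (with repeats).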
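The remaining steps, sketched the way Colley's derivation is usually completed: the "effective" win count $w_i^{\text{eff}}$ (a label introduced here) replaces $w_i$ in the rule-of-succession rating, and the result is rearranged. Because $O_i$ lists each opponent $j$ exactly $n_{ij}$ times, $\sum_{j \in O_i} r_j = \sum_{j \ne i} n_{ij} r_j$.

$$
\begin{aligned}
w_i^{\text{eff}} &= \frac{w_i - l_i}{2} + \sum_{j \in O_i} r_j \\
r_i &= \frac{1 + w_i^{\text{eff}}}{2 + t_i} \\
(2 + t_i)\, r_i - \sum_{j \in O_i} r_j &= 1 + \frac{w_i - l_i}{2} = b_i \\
(t_i + 2)\, r_i - \sum_{j \ne i} n_{ij}\, r_j &= b_i
\end{aligned}
$$

Writing the last line for every team $i$ gives row $i$ of the Colley system: $t_i + 2$ on the diagonal, $-n_{ij}$ off the diagonal, and $b_i$ on the right-hand side.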

```python
# Load yo packages bruh
import numpy as np
import pandas as pd

# Open the test data file and read the candidate names from the header line.
with open('test_data.txt') as f:
    candidates = f.readline().split()

# This is the data that we will use (the header line is skipped).
num_data = np.loadtxt('test_data.txt', skiprows=1)

# The data frame makes calculations easier.
matrix = pd.DataFrame(num_data)

print(candidates)
print(num_data)
```

```
['A', 'B', 'C', 'D', 'E']
[[5. 4. 3. 2. 1.]
 [5. 1. 4. 3. 2.]
 [2. 5. 4. 1. 3.]
 [5. 1. 3. 2. 4.]
 [2. 4. 3. 5. 1.]
 [1. 2. 4. 5. 3.]]
```


## Let's get to work

Let's use our data set from before, where each voter scores the five candidates from 5 (most preferred) to 1 (least preferred):

|        |  A  |  B  |  C  |  D  |  E  |
|--------|-----|-----|-----|-----|-----|
|Voter 1 |  5  |  4  |  3  |  2  |  1  |
|Voter 2 |  5  |  1  |  4  |  3  |  2  |
|Voter 3 |  2  |  5  |  4  |  1  |  3  |
|Voter 4 |  5  |  1  |  3  |  2  |  4  |
|Voter 5 |  2  |  4  |  3  |  5  |  1  |
|Voter 6 |  1  |  2  |  4  |  5  |  3  |

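Treating each ballot as a round-robin, where the higher-scored candidate of each pair wins that head-to-head "game", the wins and losses for each candidate are: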

|         |   A  |  B   |   C  |  D   |   E  |
|---------|------|------|------|------|------|
| Wins    |  14  |  11  |  15  |  12  |   8  |
| Losses  |  10  |  13  |   9  |  12  |  16  |

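Since every voter compares each pair of candidates once, each candidate plays $t_i = 6 \times 4 = 24$ games, every pair meets $n_{ij} = 6$ times, and each $b_i = 1 + \frac{1}{2}(w_i - l_i)$ comes straight from the table above (e.g. $b_A = 1 + \frac{1}{2}(14 - 10) = 3$). Our Colley system is the following 5-by-5 system: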
$$
\begin{pmatrix}
26 & -6 & -6 & -6 & -6 \\
-6 & 26 & -6 & -6 & -6 \\
-6 & -6 & 26 & -6 & -6 \\
-6 & -6 & -6 & 26 & -6 \\
-6 & -6 & -6 & -6 & 26
\end{pmatrix}
\begin{pmatrix}
r_1 \\
r_2 \\
r_3 \\
r_4 \\
r_5
\end{pmatrix}
=
\begin{pmatrix}
3 \\
0 \\
4 \\
1 \\
-3
\end{pmatrix}
$$

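Notice that the extra 2 we add to the main diagonal makes the matrix invertible: each diagonal entry $t_i + 2$ exceeds the sum $t_i$ of the absolute values of the off-diagonal entries in its row, so this symmetric matrix is strictly diagonally dominant (and, with its positive diagonal, positive definite). Thus, unlike with Massey, we can solve the system of linear equations without modifying the matrix.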
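As a sketch, this claim can be checked numerically for our example. The matrix equals $32I - 6J$, where $J$ is the all-ones matrix, so its eigenvalues should be $32 - 30 = 2$ (eigenvector: all ones) and $32$ (four times), all positive.

```python
import numpy as np

# Rebuild the example Colley matrix: 6 voters, 5 candidates,
# so t_i = 6 * 4 = 24 and n_ij = 6 for every pair.
voters, num_cand = 6, 5
colley = -voters * np.ones((num_cand, num_cand))
np.fill_diagonal(colley, voters * (num_cand - 1) + 2)

# eigvalsh is for symmetric matrices; it returns eigenvalues in ascending order.
eigenvalues = np.linalg.eigvalsh(colley)
print(eigenvalues)

# Strict diagonal dominance: |diagonal| > sum of |off-diagonal| entries in each row.
diag = np.abs(np.diag(colley))
off_diag = np.abs(colley).sum(axis=1) - diag
print(np.all(diag > off_diag))

# A symmetric positive definite matrix has a Cholesky factorization colley = L @ L.T.
L = np.linalg.cholesky(colley)
```

`np.linalg.cholesky` raises `LinAlgError` for a matrix that is not positive definite, so reaching the last line is itself a check.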


```python
# Find the shape of the data: one row per voter, one column per candidate.
voters, num_cand = np.shape(num_data)

# Every pair of candidates meets once per voter, so n_ij = voters off the diagonal.
colley = -(voters * np.ones([num_cand, num_cand]))

# Allocate the win/loss counters.
wins = np.zeros(num_cand)
losses = np.zeros(num_cand)

# Main diagonal: t_i + 2, where each candidate plays (num_cand - 1) games per voter.
for i in range(len(colley)):
    colley[i, i] = (voters * (num_cand - 1)) + 2

# Brute force the head-to-head wins and losses:
# for each voter, compare every pair of candidates (j, k) with j < k.
for i in range(voters):
    for j in range(num_cand):
        for k in range(j + 1, num_cand):
            points = num_data[i, j] - num_data[i, k]
            if points > 0:
                wins[j] += 1
                losses[k] += 1
            elif points < 0:
                wins[k] += 1
                losses[j] += 1
```
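The triple loop above can also be written with NumPy broadcasting. This is an alternative sketch, not the original notebook's code: it hard-codes the ballot table from above and, like the loop, assumes a higher score means the voter prefers that candidate and that equal scores count as neither a win nor a loss.

```python
import numpy as np

# Ballot table from above: rows are voters, columns are candidates A-E.
num_data = np.array([
    [5, 4, 3, 2, 1],
    [5, 1, 4, 3, 2],
    [2, 5, 4, 1, 3],
    [5, 1, 3, 2, 4],
    [2, 4, 3, 5, 1],
    [1, 2, 4, 5, 3],
], dtype=float)

# beats[v, j, k] is True when voter v scores candidate j above candidate k.
beats = num_data[:, :, None] > num_data[:, None, :]

wins = beats.sum(axis=(0, 2))    # times each candidate j beat someone
losses = beats.sum(axis=(0, 1))  # times each candidate k was beaten
b_vec = 1 + 0.5 * (wins - losses)

print(wins, losses, b_vec)
```

This should reproduce the wins/losses table and the vector $\mathbf{b}$; the trade-off is a temporary voters × candidates × candidates boolean array in place of Python-level loops.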


```python
# Calculate the values of vector b: b_i = 1 + (w_i - l_i) / 2.
b_vec = np.zeros(num_cand)

for i in range(num_cand):
    b_vec[i] = 1 + 0.5 * (wins[i] - losses[i])

print(b_vec)
```

```
[ 3.  0.  4.  1. -3.]
```


## Results

The code above builds the Colley matrix and $\mathbf{b}$; solving the system of linear equations (code below) gives the following ratings, shown alongside the Massey ratings from before:

|         |    A   |    B    |    C   |    D   |    E    |
|---------|--------|---------|--------|--------|---------|
| Colley  |  0.56  |   0.47  |  0.59  |   0.5  |   0.37  |
| Massey  |  0.33  |  -0.17  |  0.5   |    0   |  -0.67  |
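For this data set the system can even be solved by hand, which gives a useful check on these numbers. The matrix is $32I - 6J$ ($J$ is the all-ones matrix) and each column sums to $26 - 24 = 2$, so adding all five equations gives $2\sum_i r_i = \sum_i b_i = 5$, i.e. the ratings average exactly $\frac{1}{2}$. Substituting back, $32 r_i - 6 \cdot 2.5 = b_i$, so $r_i = (b_i + 15)/32$. A sketch comparing this closed form with `np.linalg.solve`:

```python
import numpy as np

b_vec = np.array([3.0, 0.0, 4.0, 1.0, -3.0])   # b from the example
colley = 32 * np.eye(5) - 6 * np.ones((5, 5))  # 26 on the diagonal, -6 elsewhere

ratings = np.linalg.solve(colley, b_vec)
closed_form = (b_vec + 15) / 32

print(closed_form)  # exact values: 0.5625, 0.46875, 0.59375, 0.5, 0.375
print(np.allclose(ratings, closed_form), ratings.mean())
```

E's exact rating is $12/32 = 0.375$, a rounding tie, so the 0.37 in the table reflects how the floating-point solution was rounded.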


## All Methods Compared


|            |      A     |      B     |     C     |      D     |      E     |
|------------|------------|------------|-----------|------------|------------|
|Colley      |  $2^{nd}$  |  $4^{th}$  |  $1^{st}$ |  $3^{rd}$  |  $5^{th}$  |
|Massey      |  $2^{nd}$  |  $4^{th}$  |  $1^{st}$ |  $3^{rd}$  |  $5^{th}$  |
|Plurality   |  $1^{st}$  |  $3^{rd}$  |  $4^{th}$ |  $2^{nd}$  |  $4^{th}$  |
|Borda Count |  $2^{nd}$  |  $4^{th}$  |  $1^{st}$ |  $3^{rd}$  |  $5^{th}$  |
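The Borda and plurality rows can be reproduced from the ballots. This sketch hard-codes the ballot table from above, treats each score as the candidate's Borda points, counts a voter's top score as a first-place vote, and uses "competition" ranking, so tied candidates share the better rank (as C and E do under plurality). The helper `competition_rank` is hypothetical.

```python
import numpy as np

# Ballot table from above (rows: voters, columns: candidates A-E).
num_data = np.array([
    [5, 4, 3, 2, 1],
    [5, 1, 4, 3, 2],
    [2, 5, 4, 1, 3],
    [5, 1, 3, 2, 4],
    [2, 4, 3, 5, 1],
    [1, 2, 4, 5, 3],
])
candidates = ['A', 'B', 'C', 'D', 'E']

def competition_rank(scores):
    """Rank 1 = highest score; a candidate's rank is 1 + number of strictly higher scores."""
    scores = np.asarray(scores)
    return 1 + (scores[None, :] > scores[:, None]).sum(axis=1)

# Borda: total points per candidate.
borda = num_data.sum(axis=0)
# Plurality: number of voters giving each candidate their top score.
plurality = (num_data == num_data.max(axis=1, keepdims=True)).sum(axis=0)

print(dict(zip(candidates, competition_rank(borda).tolist())))
print(dict(zip(candidates, competition_rank(plurality).tolist())))
```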


```python
# Solve the system of linear equations.
ratings = np.linalg.solve(colley, b_vec)

# Print the results.
for i in range(len(candidates)):
    print('{}: '.format(candidates[i]), str(round(ratings[i], 2)))
```

```
A:  0.56
B:  0.47
C:  0.59
D:  0.5
E:  0.37
```
