# Introduction to Data Science, CS 5963 / Math 3900
## Lecture 16: Rating/Ranking

We introduce the least squares method for rating, also known as HodgeRank.

In [1]:
import numpy as np

### Least Squares Ranking

We consider the following hypothetical sports problem. There are four teams: Utah (U), BYU (Y), Colorado (C), and Idaho (I). Five games are played with the following results:

U vs Y: 20 - 10 

Y vs C: 7 - 10

U vs C: 10 - 10

U vs I: 10 - 7

I vs Y: 7 - 7

We first construct the pairwise comparisons $y_{i,j}$. For a game listed as team $i$ vs team $j$, define
$$
y_{i,j} = \frac{\text{points team $j$ scored} - \text{points team $i$ scored}}{\text{total points in game}}.
$$

In [2]:
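# each row is one game: (points of first-listed team, points of second-listed team)
scores = np.array([(20, 10), (7, 10), (10, 10), (10, 7), (7, 7)])
print(scores)

# normalized score difference: (second team - first team) / total points
y = (scores[:,1] - scores[:,0]) / (scores[:,0] + scores[:,1])
print(y)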

[[20 10]
 [ 7 10]
 [10 10]
 [10  7]
 [ 7  7]]
[-0.33333333  0.17647059  0.         -0.17647059  0.        ]
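
As a check on the first entry, the formula can be worked by hand for the first game (U vs Y, 20 - 10), where team $i$ is Utah and team $j$ is BYU:
$$
y_{U,Y} = \frac{10 - 20}{20 + 10} = -\frac{1}{3} \approx -0.333.
$$
With this sign convention, a negative value means the first-listed team won, and a tie gives zero.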


We also number the teams:

In [3]:
teams = ['Utah (U)','BYU (Y)','Colorado (C)','Idaho (I)']
for i,t in enumerate(teams):
    print(str(i) + ': ' + t)


0: Utah (U)
1: BYU (Y)
2: Colorado (C)
3: Idaho (I)


Next, construct the arc-vertex incidence matrix $B$, which has one row per game $k$ and one column per team $j$:
$$
B_{k,j} = \begin{cases}
1 & j = \textrm{head}(k) \\
-1 & j = \textrm{tail}(k) \\
0 & \textrm{otherwise}. 
\end{cases}
$$
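This matrix records which teams played each other. In the code below, the first-listed team of game $k$ is the tail and the second-listed team is the head, so for a vector of team ratings $\phi$, the entry $(B\phi)_k$ is the rating of the head minus the rating of the tail.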

In [4]:
B = np.zeros((5, 4))

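# one row per game: +1 for the second-listed team (head), -1 for the first (tail)
B[0,1] = 1; B[0,0] = -1   # U vs Y
B[1,2] = 1; B[1,1] = -1   # Y vs C
B[2,2] = 1; B[2,0] = -1   # U vs C
B[3,3] = 1; B[3,0] = -1   # U vs I
B[4,1] = 1; B[4,3] = -1   # I vs Y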
print(B)

# B and y together are enough to print the game results (head vs. tail: y)
for i,sc in enumerate(y):
    head = np.where(B[i,:]==1)[0][0]
    tail = np.where(B[i,:]==-1)[0][0]
    print(teams[head] + ' vs. ' + teams[tail] + ': ' + str(sc))


[[-1.  1.  0.  0.]
 [ 0. -1.  1.  0.]
 [-1.  0.  1.  0.]
 [-1.  0.  0.  1.]
 [ 0.  1.  0. -1.]]
BYU (Y) vs. Utah (U): -0.333333333333
Colorado (C) vs. BYU (Y): 0.176470588235
Colorado (C) vs. Utah (U): 0.0
Idaho (I) vs. Utah (U): -0.176470588235
BYU (Y) vs. Idaho (I): 0.0
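
Typing each entry of $B$ by hand becomes error-prone as the number of games grows. A minimal sketch of building the same matrix from a list of games follows, assuming each game is stored as a (tail, head) pair of team indices. The names `games` and `incidence_matrix` are illustrative and not part of the original notebook.

```python
import numpy as np

# Each game as (tail, head) team indices, in the order the games are listed:
# U vs Y, Y vs C, U vs C, U vs I, I vs Y   (0=U, 1=Y, 2=C, 3=I)
games = [(0, 1), (1, 2), (0, 2), (0, 3), (3, 1)]

def incidence_matrix(games, n_teams):
    """Return the arc-vertex incidence matrix: -1 at the tail, +1 at the head."""
    B = np.zeros((len(games), n_teams))
    for k, (tail, head) in enumerate(games):
        B[k, tail] = -1
        B[k, head] = 1
    return B

B = incidence_matrix(games, 4)
print(B)
```

Every row has exactly one $+1$ and one $-1$, so each row sums to zero.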


We now use the *lstsq* function in the np.linalg module to find the least squares rating $\phi$, i.e., to solve
$$
\min_{\phi} \ \| B \phi - y \|^2. 
$$

In [5]:
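# rcond=None uses the machine-precision cutoff and avoids the FutureWarning in some NumPy 1.x releases
sol = np.linalg.lstsq(B, y, rcond=None)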
phi = sol[0]
print(phi)

[ 0.12745098 -0.12745098  0.08823529 -0.08823529]


In [6]:
for i,t in enumerate(teams):
    print(t + ': rating = ' + str(phi[i]))

Utah (U): rating = 0.127450980392
BYU (Y): rating = -0.127450980392
Colorado (C): rating = 0.0882352941176
Idaho (I): rating = -0.0882352941176
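
Because every row of $B$ sums to zero, adding the same constant to all ratings leaves $B\phi$ unchanged, so the minimizer is not unique. Among the minimizers, *lstsq* returns the one with the smallest norm, and here its entries sum to zero. The sketch below checks this on the data above, along with the normal equations $B^T B \phi = B^T y$. It rebuilds `B` and `y` so that it runs on its own.

```python
import numpy as np

scores = np.array([(20, 10), (7, 10), (10, 10), (10, 7), (7, 7)])
y = (scores[:, 1] - scores[:, 0]) / (scores[:, 0] + scores[:, 1])

B = np.array([[-1, 1, 0, 0],
              [0, -1, 1, 0],
              [-1, 0, 1, 0],
              [-1, 0, 0, 1],
              [0, 1, 0, -1]], dtype=float)

phi, residuals, rank, sing_vals = np.linalg.lstsq(B, y, rcond=None)

# B has 4 columns, but its rank is lower because B @ ones = 0
print('rank of B:', rank)

# The minimum-norm solution has ratings that sum to (numerically) zero
print('sum of ratings:', phi.sum())

# Normal equations: B^T B phi = B^T y
print(np.allclose(B.T @ B @ phi, B.T @ y))

# Shifting all ratings by a constant does not change the fit
print(np.allclose(B @ (phi + 1.0), B @ phi))
```

Only differences between ratings are meaningful, so the ranking is unaffected by which minimizer is chosen.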


You can sort the ratings to generate a ranking:

In [7]:
np.sort(phi)

array([-0.12745098, -0.08823529,  0.08823529,  0.12745098])
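
Note that `np.sort` returns only the sorted values and drops the team labels. One way to get a ranking that keeps the names is `np.argsort`. This is a minimal sketch, with the rating values copied (rounded) from the output above rather than recomputed; the names `order` and `ranking` are illustrative.

```python
import numpy as np

teams = ['Utah (U)', 'BYU (Y)', 'Colorado (C)', 'Idaho (I)']
# Ratings copied (rounded) from the lstsq output above
phi = np.array([0.12745098, -0.12745098, 0.08823529, -0.08823529])

# argsort gives indices in ascending order of rating; reverse for best-to-worst
order = np.argsort(phi)[::-1]
ranking = [teams[i] for i in order]

for place, i in enumerate(order, start=1):
    print(str(place) + '. ' + teams[i] + ': rating = ' + str(phi[i]))
```

On this data the order from best to worst is Utah, Colorado, Idaho, BYU.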