# Extending the SVD with More Factors
More complicated matrices cannot be predicted completely by a single set of factors, as we have done so far. In that case, we introduce a second set of factors to refine the predictions. To do that, we subtract the predicted scores from the actual scores, which gives the residual scores. Then we find a second set of HoleDifficulty2 and PlayerAbility2 numbers that best predict the residual scores.
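
The residual step can be sketched in a few lines of NumPy. This is a minimal illustration, not the procedure used to produce the numbers in this section: it assumes the "best" set of factors is the leading rank-1 term of the SVD, it uses the golf scores introduced later in this section, and the helper name `best_rank_one` is hypothetical.

```python
import numpy as np

# Scores for Phil, Tiger, and Vijay on holes 1-9 (the data used in this section).
scores = np.array([
    [4, 4, 5], [4, 5, 5], [3, 3, 2],
    [4, 5, 4], [4, 4, 4], [3, 5, 4],
    [4, 4, 3], [2, 4, 4], [5, 5, 5]], dtype=float)

def best_rank_one(matrix):
    """Hypothetical helper: leading rank-1 term of the SVD of `matrix`."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return s[0] * np.outer(U[:, 0], Vt[0])

predicted1 = best_rank_one(scores)     # first set of factors
residual = scores - predicted1         # what the first set failed to predict
predicted2 = best_rank_one(residual)   # second set, fitted to the residual

# Frobenius-norm error with one set of factors, then with two.
error1 = np.linalg.norm(scores - predicted1)
error2 = np.linalg.norm(scores - (predicted1 + predicted2))
print("error with one set: {:.4f}, with two sets: {:.4f}".format(error1, error2))
```

Adding the second set of factors should lower the remaining error, which is the sense in which it refines the prediction.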

Rather than guessing HoleDifficulty and PlayerAbility factors and subtracting predicted scores, you can use powerful algorithms that calculate SVD factorizations for you. Let's look at the actual scores from the first 9 holes of the 2007 Players Championship as played by Phil, Tiger, and Vijay.

In [1]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')
import psycopg2

con = psycopg2.connect("dbname=test1")

In [4]:
col = ["Holes","Par","Phil","Tiger","Vijay"]
score = np.array([
    [1,4,4,4,5],
    [2,5,4,5,5],
    [3,3,3,3,2],
    [4,4,4,5,4],
    [5,4,4,4,4],
    [6,4,3,5,4],
    [7,4,4,4,3],
    [8,3,2,4,4],
    [9,5,5,5,5]])

df = pd.DataFrame(score, columns=col)
print(df.to_string(index=False))

Holes  Par  Phil  Tiger  Vijay
    1    4     4      4      5
    2    5     4      5      5
    3    3     3      3      2
    4    4     4      5      4
    5    4     4      4      4
    6    4     3      5      4
    7    4     4      4      3
    8    3     2      4      4
    9    5     5      5      5


In [5]:
scores = df[df.columns[2:]]
print("Players scores")
print("==============")
print(scores.to_string(index=False))

Players scores
==============
Phil  Tiger  Vijay
   4      4      5
   4      5      5
   3      3      2
   4      5      4
   4      4      4
   3      5      4
   4      4      3
   2      4      4
   5      5      5


The 1-D SVD factorization of the scores is shown below. To make this example easier to understand, I have incorporated the ScaleFactor into the PlayerAbility and HoleDifficulty vectors so we can ignore the ScaleFactor for this example.
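
One way to fold the ScaleFactor into the two vectors is sketched here. The even split, a factor of `sqrt(s[0])` on each side, is an assumption made for illustration; the text does not say how the ScaleFactor was divided, and any split whose product equals the ScaleFactor gives the same predicted scores. The names `hole_difficulty` and `player_ability` are likewise illustrative.

```python
import numpy as np

# Scores for Phil, Tiger, and Vijay on holes 1-9 (the data used in this section).
scores = np.array([
    [4, 4, 5], [4, 5, 5], [3, 3, 2],
    [4, 5, 4], [4, 4, 4], [3, 5, 4],
    [4, 4, 3], [2, 4, 4], [5, 5, 5]], dtype=float)

U, s, Vt = np.linalg.svd(scores, full_matrices=False)

# Assumption: split the leading singular value (the ScaleFactor) evenly
# between the two vectors.
root = np.sqrt(s[0])
hole_difficulty = U[:, 0] * root
player_ability = Vt[0] * root

# The SVD fixes each pair of singular vectors only up to a shared sign,
# so flip both together if they came out negative.
if hole_difficulty.sum() < 0:
    hole_difficulty, player_ability = -hole_difficulty, -player_ability

# 1-D prediction: outer product of the two scaled vectors.
predicted = np.outer(hole_difficulty, player_ability)
print(np.round(predicted, 2))
```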

In [15]:
# DataFrame.as_matrix was removed in pandas 1.0; .values works in old and new versions
scoresmat = df[["Phil","Tiger","Vijay"]].values
scoresmat

array([[4, 4, 5],
       [4, 5, 5],
       [3, 3, 2],
       [4, 5, 4],
       [4, 4, 4],
       [3, 5, 4],
       [4, 4, 3],
       [2, 4, 4],
       [5, 5, 5]])

In [34]:
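# Note: NumPy returns the right singular vectors already transposed,
# so V holds one factor per row (it is V^T in the usual notation).
U, s, V = np.linalg.svd(scoresmat, full_matrices=False)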

In [35]:
print(U)

[[-0.35485277  0.08912626  0.63510622]
 [-0.38423571  0.18894413  0.10273509]
 [-0.21806175 -0.39604843 -0.28094945]
 [-0.35676267 -0.07556623 -0.32999583]
 [-0.32737973 -0.17538409  0.2023753 ]
 [-0.33177371  0.33260802 -0.48022986]
 [-0.29990669 -0.43989445 -0.23035562]
 [-0.27740183  0.6409644  -0.09809276]
 [-0.40922466 -0.21923012  0.25296912]]


In [36]:
print(s)

[ 21.11673273   2.0140035    1.423864  ]


In [33]:
print(V)

[[-0.52768502 -0.62047161 -0.58014093]
 [-0.82206436  0.20103353  0.53272479]
 [ 0.21391282 -0.75802408  0.61614998]]
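
As a sanity check, the three pieces can be multiplied back together to recover the score matrix. The sketch below is self-contained, so it rebuilds the matrix rather than reusing `scoresmat`; the `share` calculation is an added illustration of how much each factor contributes, not something computed in the original cells.

```python
import numpy as np

# Scores for Phil, Tiger, and Vijay on holes 1-9 (the data used in this section).
scores = np.array([
    [4, 4, 5], [4, 5, 5], [3, 3, 2],
    [4, 5, 4], [4, 4, 4], [3, 5, 4],
    [4, 4, 3], [2, 4, 4], [5, 5, 5]], dtype=float)

U, s, V = np.linalg.svd(scores, full_matrices=False)

# V is already transposed (one factor per row), so no extra .T is needed.
rebuilt = U.dot(np.diag(s)).dot(V)
print(np.allclose(rebuilt, scores))

# Share of the total sum of squared scores carried by each factor.
share = s ** 2 / np.sum(s ** 2)
print(np.round(share, 4))
```

With the singular values printed above, the first factor carries about 98.7% of the sum of squared scores, which is why a single HoleDifficulty and PlayerAbility pair already predicts these scores well.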
