### Codio Activity 19.4: Implementing Funk SVD


**Expected Time = 60 minutes**

**Total Points = 40**


This activity focuses on using gradient descent to provide recommendations with collaborative filtering. The purpose here is to give a high-level introduction to the implementation of Funk SVD. You will use the earlier ratings together with given user and item factor matrices to update the user factors. In the next activity, you will implement the algorithm using `Surprise`.

### Index


- [Problem 1](#-Problem-1)
- [Problem 2](#-Problem-2)
- [Problem 3](#-Problem-3)
- [Problem 4](#-Problem-4)

In [1]:
import pandas as pd
import numpy as np

#### The Data

Below, we load the user reviews along with the matrices $Q$ (item factors) and $P$ (user factors), whose values were randomly generated by a process similar to the one in the last activity.

In [2]:
reviews = pd.read_csv('data/user_rated.csv', index_col=0).iloc[:, :-2]
Q = pd.read_csv('data/Q.csv', index_col=0)
P = pd.read_csv('data/P.csv', index_col=0)
Q = Q[['F1', 'F2']]
P = P[['F1', 'F2']]

In [3]:
reviews

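|  | Michael Jackson | Clint Black | Dropdead | Anti-Cimex | Cardi B |
|---|---|---|---|---|---|
| Alfred | 3.0 | 4.0 | NaN | 4.0 | 4.0 |
| Mandy | NaN | 9.0 | NaN | 3.0 | 8.0 |
| Lenny | 2.0 | 5.0 | 8.0 | 9.0 | NaN |
| Joan | 3.0 | NaN | 9.0 | 4.0 | 9.0 |
| Tino | 1.0 | 1.0 | NaN | 9.0 | 5.0 |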


In [4]:
Q.T #item factors

|  | Michael Jackson | Clint Black | Dropdead | Anti-Cimex | Cardi B |
|---|---|---|---|---|---|
| F1 | -0.510093 | 0.181804 | -7.554766 | -0.520113 | -0.458392 |
| F2 | -0.480414 | -3.22799 | -0.348831 | -0.533289 | -1.413967 |


In [5]:
P #user factors

|  | F1 | F2 |
|---|---|---|
| Alfred | -4.427436 | -1.58782 |
| Mandy | -9.01971 | -3.437908 |
| Lenny | -1.015713 | -0.936057 |
| Joan | -0.932923 | -5.595791 |
| Tino | -2.538133 | -0.043783 |


[Back to top](#-Index)

### Problem 1

**10 Points**

#### Making Predictions

To predict a rating, multiply a user's row of $P$ by an item's column of $Q^T$; this is the dot product of the user's factors with the item's factors. Perform this operation for all users and items and assign a DataFrame of predicted values to `pred_df` below.

HINT: Use matrix multiplication rather than a nested loop. In Python, matrix multiplication is done with the `@` operator.
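To make the row-by-column idea concrete, here is a minimal sketch for a single user and item. It copies Mandy's and Clint Black's factors from the tables above; the names `p_mandy`, `q_clint`, `P_small`, and `Q_small` are illustrative only and are not part of the activity.

```python
import pandas as pd

# Factors copied from the displayed P and Q values (rounded to six decimals).
p_mandy = pd.Series({"F1": -9.01971, "F2": -3.437908})
q_clint = pd.Series({"F1": 0.181804, "F2": -3.22799})

# A single prediction is the dot product of the two factor vectors.
single_pred = p_mandy @ q_clint

# The same number appears as one entry of the matrix product P @ Q.T.
P_small = pd.DataFrame({"F1": [-9.01971], "F2": [-3.437908]}, index=["Mandy"])
Q_small = pd.DataFrame({"F1": [0.181804], "F2": [-3.22799]}, index=["Clint Black"])
pred_small = P_small @ Q_small.T

print(single_pred, pred_small.loc["Mandy", "Clint Black"])
```

Both printed values should be close to 9.46, since the matrix product simply repeats this dot product for every user and item pair.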

In [6]:
Q

|  | F1 | F2 |
|---|---|---|
| Michael Jackson | -0.510093 | -0.480414 |
| Clint Black | 0.181804 | -3.22799 |
| Dropdead | -7.554766 | -0.348831 |
| Anti-Cimex | -0.520113 | -0.533289 |
| Cardi B | -0.458392 | -1.413967 |


In [7]:
### GRADED
pred_df = ''

    
# YOUR CODE HERE
#raise NotImplementedError()
pred_df = P @ Q.T

### ANSWER CHECK
pred_df

|  | Michael Jackson | Clint Black | Dropdead | Anti-Cimex | Cardi B |
|---|---|---|---|---|---|
| Alfred | 3.021214 | 4.320545 | 34.002121 | 3.149535 | 4.274625 |
| Mandy | 6.252507 | 9.457719 | 69.341043 | 6.524669 | 8.995648 |
| Lenny | 0.967803 | 2.836922 | 8.0 | 1.027474 | 1.789148 |
| Joan | 3.164175 | 17.89355 | 9.0 | 3.469398 | 8.339908 |
| Tino | 1.315717 | -0.32011 | 19.19027 | 1.343466 | 1.225366 |


### Problem 2

**10 Points**

#### Measuring Error

Use your prediction of Mandy's rating for Clint Black to compute the squared error. Assign this value to `ans2` below.

In [8]:
reviews[reviews.index == 'Mandy']['Clint Black']
pred_df[pred_df.index == 'Mandy']['Clint Black']

Mandy    9.457719
Name: Clint Black, dtype: float64

In [9]:
### GRADED
ans2 = ''

    
# YOUR CODE HERE
#raise NotImplementedError()
# Label-based lookup avoids positional [0] indexing on a label-indexed Series
ans2 = (reviews.loc['Mandy', 'Clint Black'] - pred_df.loc['Mandy', 'Clint Black']) ** 2

### ANSWER CHECK
print(ans2)

0.20950654368339033


### Problem 3

**10 Points**

#### Error for all Mandy Predictions

Now, compute the squared error for each item that Mandy actually rated: Clint Black, Anti-Cimex, and Cardi B. Assign these as a numpy array to `ans3`.
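One way to avoid typing the item names by hand is to mask out the missing ratings. The sketch below assumes Mandy's ratings and predictions as printed earlier in the activity (rounded to six decimals); `mandy_ratings`, `mandy_preds`, and `rated` are illustrative names.

```python
import numpy as np
import pandas as pd

items = ["Michael Jackson", "Clint Black", "Dropdead", "Anti-Cimex", "Cardi B"]

# Mandy's row of the reviews table and of the predictions, copied from above.
mandy_ratings = pd.Series([np.nan, 9.0, np.nan, 3.0, 8.0], index=items)
mandy_preds = pd.Series([6.252507, 9.457719, 69.341043, 6.524669, 8.995648], index=items)

# Boolean mask that is True only where Mandy supplied a rating.
rated = mandy_ratings.notna()

# Squared error on the rated items only, returned as a numpy array.
squared_errors = ((mandy_ratings[rated] - mandy_preds[rated]) ** 2).to_numpy()
print(squared_errors)
```

Because the mask is derived from the data, the same lines work for any user without listing their rated items explicitly.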

In [10]:
reviews.loc['Mandy', ['Clint Black', 'Anti-Cimex', 'Cardi B']].sum()

20.0

In [11]:
( (reviews.iloc[1].dropna() - pred_df.iloc[1].loc[reviews.iloc[1].notnull()]) **2 ).values

array([ 0.20950654, 12.42328982,  0.99131421])

In [12]:
### GRADED
ans3 = ''

    
# YOUR CODE HERE
#raise NotImplementedError()
# Label-based selection of Mandy's rated items, then element-wise squared error
rated_items = ['Clint Black', 'Anti-Cimex', 'Cardi B']
ans3 = ((reviews.loc['Mandy', rated_items] - pred_df.loc['Mandy', rated_items]) ** 2).to_numpy()
### ANSWER CHECK
print(ans3)
ans3

[ 0.20950654 12.42328982  0.99131421]


array([ 0.20950654, 12.42328982,  0.99131421])

### Problem 4

**10 Points**

#### Updating the Values

Now, perform the update for matrix $P$ based on the rule below, where $R_a$ is the set of items that user $a$ has rated:

$$P_{a,b} := P_{a,b} - \alpha \sum_{j \in R_a} e_{a,j}Q_{j,b}$$

You will do this for the first factor of Mandy.  This means:

$$P_{1, 0} = -9.019710 - \alpha(e_{1, 1}Q_{1, 0} + e_{1, 3}Q_{3, 0} + e_{1, 4}Q_{4, 0})$$

Use $\alpha = 0.1$, and assign this new value as a float to `P_new`.

Here $\hat{r}_{i,j}$ is the predicted rating, equal to $row_{i}(P) \cdot col_{j}(Q^T)$. The squared error of a prediction, $e_{i,j}^2$, is $(\hat{r}_{i,j} - r_{i,j})^2$.

In [13]:
P

|  | F1 | F2 |
|---|---|---|
| Alfred | -4.427436 | -1.58782 |
| Mandy | -9.01971 | -3.437908 |
| Lenny | -1.015713 | -0.936057 |
| Joan | -0.932923 | -5.595791 |
| Tino | -2.538133 | -0.043783 |


In [14]:
Q

|  | F1 | F2 |
|---|---|---|
| Michael Jackson | -0.510093 | -0.480414 |
| Clint Black | 0.181804 | -3.22799 |
| Dropdead | -7.554766 | -0.348831 |
| Anti-Cimex | -0.520113 | -0.533289 |
| Cardi B | -0.458392 | -1.413967 |


In [15]:
pred_df

|  | Michael Jackson | Clint Black | Dropdead | Anti-Cimex | Cardi B |
|---|---|---|---|---|---|
| Alfred | 3.021214 | 4.320545 | 34.002121 | 3.149535 | 4.274625 |
| Mandy | 6.252507 | 9.457719 | 69.341043 | 6.524669 | 8.995648 |
| Lenny | 0.967803 | 2.836922 | 8.0 | 1.027474 | 1.789148 |
| Joan | 3.164175 | 17.89355 | 9.0 | 3.469398 | 8.339908 |
| Tino | 1.315717 | -0.32011 | 19.19027 | 1.343466 | 1.225366 |


In [16]:
reviews

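|  | Michael Jackson | Clint Black | Dropdead | Anti-Cimex | Cardi B |
|---|---|---|---|---|---|
| Alfred | 3.0 | 4.0 | NaN | 4.0 | 4.0 |
| Mandy | NaN | 9.0 | NaN | 3.0 | 8.0 |
| Lenny | 2.0 | 5.0 | 8.0 | 9.0 | NaN |
| Joan | 3.0 | NaN | 9.0 | 4.0 | 9.0 |
| Tino | 1.0 | 1.0 | NaN | 9.0 | 5.0 |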


In [17]:
reviews.iloc[1,1]

9.0

In [18]:
Q.iloc[4,0]

-0.4583915634968533

In [19]:
### GRADED
P_new = ''

    
# YOUR CODE HERE
#raise NotImplementedError()
# Squared errors for the three items Mandy rated (columns 1, 3, and 4)
e11 = (reviews.iloc[1,1] - pred_df.iloc[1,1]) ** 2
e13 = (reviews.iloc[1,3] - pred_df.iloc[1,3]) ** 2
e14 = (reviews.iloc[1,4] - pred_df.iloc[1,4]) ** 2
alpha = 0.1
P_new = -9.019710 - alpha * ( e11 * Q.iloc[1,0] + e13 * Q.iloc[3,0] + e14 * Q.iloc[4,0] )

### ANSWER CHECK
print(P_new)

-8.331926013496945


As an extra exercise, consider how to modularize this update so that it can be applied to every value of $P$. The update for $Q$ follows the same pattern as the one for $P$, so also consider working through the full update process for both matrices and modularizing it.
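As a starting point for that exercise, the sketch below wraps the Problem 4 calculation in a function. The helper name `update_user_factor` and its signature are hypothetical, and the data are retyped from the tables in this activity (rounded to six decimals). It mirrors the worked answer above by plugging the squared errors into the error term; the usual Funk SVD gradient step uses the signed residual $\hat{r}_{a,j} - r_{a,j}$ instead, so treat that choice as an assumption to revisit.

```python
import numpy as np
import pandas as pd

users = ["Alfred", "Mandy", "Lenny", "Joan", "Tino"]
items = ["Michael Jackson", "Clint Black", "Dropdead", "Anti-Cimex", "Cardi B"]
nan = np.nan

# Data retyped from the displayed tables (values rounded to six decimals).
reviews = pd.DataFrame(
    [[3, 4, nan, 4, 4], [nan, 9, nan, 3, 8], [2, 5, 8, 9, nan],
     [3, nan, 9, 4, 9], [1, 1, nan, 9, 5]],
    index=users, columns=items, dtype=float)
P = pd.DataFrame(
    [[-4.427436, -1.58782], [-9.01971, -3.437908], [-1.015713, -0.936057],
     [-0.932923, -5.595791], [-2.538133, -0.043783]],
    index=users, columns=["F1", "F2"])
Q = pd.DataFrame(
    [[-0.510093, -0.480414], [0.181804, -3.22799], [-7.554766, -0.348831],
     [-0.520113, -0.533289], [-0.458392, -1.413967]],
    index=items, columns=["F1", "F2"])


def update_user_factor(reviews, P, Q, user, factor, alpha=0.1):
    """Hypothetical helper: one update of P.loc[user, factor], as in Problem 4."""
    preds = Q @ P.loc[user]  # predicted rating for every item
    rated_items = reviews.columns[reviews.loc[user].notna().to_numpy()]
    # Assumption: squared errors stand in for e, matching the worked answer.
    sq_err = (reviews.loc[user, rated_items] - preds[rated_items]) ** 2
    step = (sq_err * Q.loc[rated_items, factor]).sum()
    return P.loc[user, factor] - alpha * step


p_new = update_user_factor(reviews, P, Q, "Mandy", "F1")
print(p_new)
```

For Mandy's first factor this should land close to the -8.3319 printed above, with small differences from rounding. Looping the helper over `P.index` and `P.columns` gives a full pass over $P$, and an analogous function for $Q$ would sum over the users who rated each item.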