## Collaborative Filtering

**Cost Function**

Notation
- $r(i,j)$ = 1 if user j has rated movie i (0 if they have not yet rated it)
- $y^{(i,j)}$ = rating given by user j on movie i (if defined)
- $w^{(j)}$, $b^{(j)}$ = parameters for user j
- $x^{(i)}$ = feature vector for movie i
- $m^{(j)}$ = number of movies rated by user j

For user j and movie i, predicted rating =  $w^{(j)}$ ⋅ $x^{(i)}$ + $b^{(j)}$

**Cost Function** 
$$\frac{1}{2} \sum_{i:r(i,j) = 1} (w^{(j)} ⋅ x^{(i)} + b^{(j)} - y^{(i,j)})^2 + \frac{λ}{2} \sum_{k=1}^n (w_k^{(j)})^2 $$
- this is the cost for a single user j; sum it over all users to learn the parameters of every user
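
As a minimal sketch of the per-user cost, assuming the squared-error form with L2 regularization on $w^{(j)}$, the NumPy snippet below evaluates it on a tiny made-up data set. The function name `user_cost` and all values are hypothetical and chosen only for illustration.

```python
import numpy as np

def user_cost(w, b, X, y, r, lambda_):
    """Regularized squared-error cost for a single user j.

    w: (n,) user parameters, b: scalar bias
    X: (n_m, n) movie features, y: (n_m,) ratings, r: (n_m,) 1 where rated
    """
    err = (X @ w + b - y) * r            # multiplying by r zeroes out unrated movies
    return 0.5 * np.sum(err ** 2) + 0.5 * lambda_ * np.sum(w ** 2)

# Hypothetical toy data: 3 movies, 2 features, a user who rated movies 0 and 2
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
w = np.array([2.0, 1.0])
b = 0.5
y = np.array([3.0, 0.0, 3.0])
r = np.array([1.0, 0.0, 1.0])

cost = user_cost(w, b, X, y, r, lambda_=1.0)
print(cost)  # 0.25 from the squared errors + 2.5 from regularization = 2.75
```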

**Collaborative Filtering**
$$\frac{1}{2} \sum_{(i,j):r(i,j) = 1} (w^{(j)} ⋅ x^{(i)} + b^{(j)} - y^{(i,j)})^2 + \frac{λ}{2} \sum_{j=1}^{n_u} \sum_{k=1}^n (w_k^{(j)})^2 + \frac{λ}{2} \sum_{i=1}^{n_m} \sum_{k=1}^n (x_k^{(i)})^2$$

- cost function now a function of w, b, and x rather than just w and b
- through gradient descent, update all values
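
To make "update all values" concrete, here is a hedged NumPy sketch of plain gradient descent on X, W, and b together. It assumes the squared-error cost with L2 regularization on W and X; the gradients are derived by hand, and the function name `cofi_cost_and_grads`, the random data, and the step size are illustrative assumptions, not part of the notes.

```python
import numpy as np

def cofi_cost_and_grads(X, W, b, Y, R, lambda_):
    """Cost and hand-derived gradients for collaborative filtering."""
    E = (X @ W.T + b - Y) * R                   # errors, zero where r(i,j) = 0
    J = 0.5 * np.sum(E ** 2) + 0.5 * lambda_ * (np.sum(W ** 2) + np.sum(X ** 2))
    dX = E @ W + lambda_ * X                    # gradient w.r.t. movie features
    dW = E.T @ X + lambda_ * W                  # gradient w.r.t. user parameters
    db = E.sum(axis=0, keepdims=True)           # gradient w.r.t. user biases
    return J, dX, dW, db

# Hypothetical random problem: 5 movies, 4 users, 3 features
rng = np.random.default_rng(0)
num_movies, num_users, num_features = 5, 4, 3
Y = rng.integers(1, 6, size=(num_movies, num_users)).astype(float)
R = (rng.random((num_movies, num_users)) < 0.7).astype(float)
X = rng.normal(size=(num_movies, num_features))
W = rng.normal(size=(num_users, num_features))
b = np.zeros((1, num_users))

alpha, lambda_ = 0.01, 1.0
costs = []
for _ in range(200):
    J, dX, dW, db = cofi_cost_and_grads(X, W, b, Y, R, lambda_)
    costs.append(J)
    # all three sets of parameters are updated in the same step
    X -= alpha * dX
    W -= alpha * dW
    b -= alpha * db

print(f"cost went from {costs[0]:.2f} to {costs[-1]:.2f}")
```

Unlike linear regression with fixed features, the features X are themselves learned here, which is why they appear in the update alongside W and b.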


**Binary Labels**
- rather than predicting y through a linear function, use the logistic function; the cost becomes
 $$ \sum_{(i,j):r(i,j) = 1} L(f_{(w,b,x)}(x), y^{(i,j)}) $$
 where 
$$L(f_{(w,b,x)}(x), y^{(i,j)}) = -y^{(i,j)}\log(f_{(w,b,x)}(x)) - (1-y^{(i,j)})\log(1-f_{(w,b,x)}(x))$$
 $$ f_{(w,b,x)}(x) = g(w^{(j)} ⋅ x^{(i)} + b^{(j)}), \quad g(z) = \frac{1}{1 + e^{-z}} $$
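
A minimal NumPy sketch of this binary-label cost is shown below. The function names and the toy matrices are hypothetical; the clipping of the predictions is an added safeguard against `log(0)` and is not part of the formula itself.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cofi_cost(X, W, b, Y, R):
    """Sum of the logistic loss over all (movie, user) pairs with r(i,j) = 1."""
    F = sigmoid(X @ W.T + b)                    # predicted probability for every pair
    F = np.clip(F, 1e-12, 1 - 1e-12)            # assumption: clip to avoid log(0)
    L = -Y * np.log(F) - (1 - Y) * np.log(1 - F)
    return np.sum(L * R)                        # only labeled pairs contribute

# Hypothetical toy data: 2 movies, 3 users, 2 features, all parameters at zero
X = np.zeros((2, 2))
W = np.zeros((3, 2))
b = np.zeros((1, 3))
Y = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
R = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 0.0]])

cost = binary_cofi_cost(X, W, b, Y, R)
```

With every parameter at zero the model predicts 0.5 everywhere, so each of the three labeled pairs contributes log(2) to the cost.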

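**Mean Normalization**
- subtract each movie's mean rating $μ_i$ (computed over the users who rated it) from its row of the rating matrix, so that every row has zero mean
- for a new user who hasn't provided any ratings, predictions then default to the movie's average rating instead of 0
- add the mean back when predicting the rating of movie i for user j:
 $$ w^{(j)} ⋅ x^{(i)} + b^{(j)} + μ_{i} $$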
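
The following NumPy sketch shows one way mean normalization could be computed. The helper name `normalize_ratings` and the toy data are hypothetical, and it assumes the new user's parameters end up at zero (which is where regularization tends to push parameters that no rating constrains).

```python
import numpy as np

def normalize_ratings(Y, R):
    """Subtract each movie's mean rating, using only the entries that were rated."""
    counts = np.maximum(R.sum(axis=1, keepdims=True), 1)   # avoid dividing by 0 for unrated movies
    Ymean = (Y * R).sum(axis=1, keepdims=True) / counts    # shape (num_movies, 1)
    Ynorm = (Y - Ymean) * R                                # unrated entries stay at 0
    return Ynorm, Ymean

# Hypothetical ratings: 2 movies, 3 users; the last user has rated nothing
Y = np.array([[5.0, 4.0, 0.0],
              [0.0, 2.0, 0.0]])
R = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 0.0]])
Ynorm, Ymean = normalize_ratings(Y, R)

# Assumed: the new user's parameters are all zero
X = np.array([[0.3, -0.1],
              [0.2, 0.4]])
w_new, b_new = np.zeros(2), 0.0
pred_new = X @ w_new + b_new + Ymean[:, 0]
print(pred_new)  # the movie means: [4.5 2. ]
```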



**Gradient Descent Implementation of Collaborative Filtering**

In [1]:
import tensorflow as tf

w = tf.Variable(3.0) # tf variables are the parameters we want to optimize
x = 1.0
y = 1.0 # target value
alpha = 0.01

iterations = 30
for iter in range(iterations):

    # use TensorFlow's GradientTape to record the steps used to compute the cost J, to enable automatic differentiation
    with tf.GradientTape() as tape:
        fwb = w*x
        costJ = (fwb - y)**2
    
    # Use the gradient tape to calculate the gradients of the cost with respect to the parameter w
    [dJdw] = tape.gradient(costJ, [w])

    # Run one step of gradient descent by updating the value of w to reduce the cost
    w.assign_add(-alpha * dJdw)
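
The gradient that the tape returns can be cross-checked by hand. For $J = (wx - y)^2$ the derivative is $\frac{dJ}{dw} = 2(wx - y)x$, so the same loop can be sketched in plain Python without TensorFlow (same starting values as above; this is an illustrative check, not a replacement for automatic differentiation):

```python
w, x, y, alpha = 3.0, 1.0, 1.0, 0.01

iterations = 30
for _ in range(iterations):
    dJdw = 2.0 * (w * x - y) * x    # hand-derived gradient of (w*x - y)**2
    w -= alpha * dJdw               # same update as w.assign_add(-alpha * dJdw)

print(w)  # moves from 3.0 toward the target value 1.0
```

With x = 1 and y = 1, each step shrinks the distance between w and 1 by a factor of 0.98.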

**Adam Optimizer Implementation of Collaborative Filtering**

In [None]:
import tensorflow as tf
import keras

# instance of optimizer
optimizer = keras.optimizers.Adam(learning_rate = 1e-1)

iterations = 200
for iter in range(iterations):
    # use TensorFlow's GradientTape to record the operations used to compute the cost
    with tf.GradientTape() as tape:

        # compute cost (forward pass is included in cost)
        # note: `lambda` is a reserved word in Python, so the parameter is named lambda_
        cost_value = cofi_cost_func_v(X, W, b, Ynorm, R, lambda_)

    # use the gradient tape to automatically retrieve the gradients of the loss
    # with respect to the trainable variables (outside the `with` block, once recording is done)
    grads = tape.gradient(cost_value, [X, W, b])

    # run one step of the Adam optimizer, updating the variables to reduce the loss
    optimizer.apply_gradients(zip(grads, [X, W, b]))
        

## Practice

| General Notation       | Description                                                             | Python (if any) |
|:-----------------------|:------------------------------------------------------------------------|:----------------|
| $r(i,j)$               | scalar; = 1 if user j rated movie i, = 0 otherwise                      |                 |
| $y(i,j)$               | scalar; rating given by user j on movie i (defined only if $r(i,j) = 1$) |                 |
| $\mathbf{w}^{(j)}$     | vector; parameters for user j                                           |                 |
| $b^{(j)}$              | scalar; parameter for user j                                            |                 |
| $\mathbf{x}^{(i)}$     | vector; feature ratings for movie i                                     |                 |
| $n_u$                 | number of users                                                         | `num_users`     |
| $n_m$                 | number of movies                                                        | `num_movies`    |
| $n$                   | number of features                                                      | `num_features`  |
| $\mathbf{X}$           | matrix of vectors $\mathbf{x}^{(i)}$                                    | `X`             |
| $\mathbf{W}$           | matrix of vectors $\mathbf{w}^{(j)}$                                    | `W`             |
| $\mathbf{b}$           | vector of bias parameters $b^{(j)}$                                     | `b`             |
| $\mathbf{R}$           | matrix of elements $r(i,j)$                                             | `R`             |


**Goal of Collaborative Filtering** - generate 2 vectors of the same size
- For **each user**: a parameter vector that embodies the movie tastes of a user
- For **each movie**: a feature vector that embodies the description of the movie

In [None]:
import numpy as np

def cofi_cost_func(X, W, b, Y, R, lambda_):
    """
    Returns the cost for collaborative filtering
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users))           : vector of user parameters
      Y (ndarray (num_movies,num_users))   : matrix of user ratings of movies
      R (ndarray (num_movies,num_users))   : matrix, where R(i, j) = 1 if the i-th movie was rated by the j-th user
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    """
    nm, nu = Y.shape
    J = 0

    for j in range(nu):
        w = W[j, :]        # parameters of user j
        b_j = b[0, j]
        for i in range(nm):
            x = X[i, :]    # features of movie i
            y = Y[i, j]
            r = R[i, j]    # 1 if rated, so unrated pairs contribute 0
            J += 1/2 * (r * (np.dot(w, x) + b_j - y)**2)

    # add regularization over all user parameters and movie features
    J += lambda_/2 * (np.sum(np.square(W)) + np.sum(np.square(X)))

    return J
        


**Vectorized Implementation**

In [None]:
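def cofi_cost_func_v(X, W, b, Y, R, lambda_):
    # prediction error for every (movie, user) pair; multiplying by R zeroes out unrated pairs
    err = (tf.linalg.matmul(X, tf.transpose(W)) + b - Y)*R
    # squared-error term plus regularization on X and W
    J = 0.5 * tf.reduce_sum(err**2) + (lambda_/2) * (tf.reduce_sum(X**2) + tf.reduce_sum(W**2))
    return J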
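
A quick way to gain confidence in a vectorized cost is to compare it with the double loop on random data. The sketch below does this in NumPy rather than TensorFlow; the function names `cost_loop` and `cost_vectorized` and the random inputs are assumptions for illustration.

```python
import numpy as np

def cost_loop(X, W, b, Y, R, lambda_):
    """Reference implementation: explicit loops over users and movies."""
    nm, nu = Y.shape
    J = 0.0
    for j in range(nu):
        for i in range(nm):
            J += 0.5 * R[i, j] * (np.dot(W[j], X[i]) + b[0, j] - Y[i, j]) ** 2
    return J + 0.5 * lambda_ * (np.sum(W ** 2) + np.sum(X ** 2))

def cost_vectorized(X, W, b, Y, R, lambda_):
    """Same cost: X @ W.T has shape (num_movies, num_users), and b broadcasts across rows."""
    err = (X @ W.T + b - Y) * R
    return 0.5 * np.sum(err ** 2) + 0.5 * lambda_ * (np.sum(X ** 2) + np.sum(W ** 2))

# Hypothetical random inputs: 6 movies, 4 users, 3 features
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))
W = rng.normal(size=(4, 3))
b = rng.normal(size=(1, 4))
Y = rng.integers(1, 6, size=(6, 4)).astype(float)
R = (rng.random((6, 4)) < 0.5).astype(float)

J_loop = cost_loop(X, W, b, Y, R, 1.5)
J_vec = cost_vectorized(X, W, b, Y, R, 1.5)
print(np.isclose(J_loop, J_vec))
```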

**Training Algorithm**

In [None]:
#  Useful Values
num_movies, num_users = Y.shape
num_features = 100

# Set Initial Parameters (W, X), use tf.Variable to track these variables
tf.random.set_seed(1234) # for consistent results
W = tf.Variable(tf.random.normal((num_users,  num_features),dtype=tf.float64),  name='W')
X = tf.Variable(tf.random.normal((num_movies, num_features),dtype=tf.float64),  name='X')
b = tf.Variable(tf.random.normal((1,          num_users),   dtype=tf.float64),  name='b')

# Instantiate an optimizer.
optimizer = keras.optimizers.Adam(learning_rate=1e-1)

iterations = 200
lambda_ = 1

for iter in range(iterations):
    # Use TensorFlow’s GradientTape
    # to record the operations used to compute the cost 
    with tf.GradientTape() as tape:

        # Compute the cost (forward pass included in cost)
        cost_value = cofi_cost_func_v(X, W, b, Ynorm, R, lambda_)

    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss
    grads = tape.gradient( cost_value, [X,W,b] )

    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients( zip(grads, [X,W,b]) )

    # Log periodically.
    if iter % 20 == 0:
        print(f"Training loss at iteration {iter}: {cost_value:0.1f}")

**Recommendations**

In [None]:
# Make a prediction using trained weights and biases
p = np.matmul(X.numpy(), np.transpose(W.numpy())) + b.numpy()

#restore the mean
pm = p + Ymean

my_predictions = pm[:,0]

# sort predictions
ix = tf.argsort(my_predictions, direction='DESCENDING')

for i in range(17):
    j = ix[i]
    if j not in my_rated:
        print(f'Predicting rating {my_predictions[j]:0.2f} for movie {movieList[j]}')

print('\n\nOriginal vs Predicted ratings:\n')
for i in range(len(my_ratings)):
    if my_ratings[i] > 0:
        print(f'Original {my_ratings[i]}, Predicted {my_predictions[i]:0.2f} for {movieList[i]}')

______________________________________________________

## Content Based Filtering

**What's the difference?**
- **Collaborative Filtering**: recommends items based on the ratings of users who gave ratings similar to yours
- **Content-Based Filtering**: recommends items by matching features of the user with features of the item -> requires features of both users and items

**Examples**
- User Features $x_u^{(j)}$
    - Age
    - Gender
    - Country
    - Movies watched

- Movie Features $x_m^{(i)}$
    - Year
    - Genre(s)
    - Reviews
    - Average ratings


### Deep Learning for Content-Based Filtering

- a user network with a few layers outputs a vector ($v_u$) that describes the user; a movie network does the same to output $v_m$
- the inputs of the two networks can be different sizes, but their outputs must be the same size
- the predicted rating combines the two by computing $v_u$ ⋅ $v_m$
- both networks are trained together with a single cost function, using gradient descent or another optimization algorithm
- **cost function**: J = $ \sum_{(i,j):r(i,j) = 1} (v_u^{(j)} ⋅ v_m^{(i)}  - y^{(i,j)})^2 $

**User Network**
- takes as input: features of user
- output: $v_u$

**Movie Network**
- takes as input: features of the movie
- output: $v_m$
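
To illustrate how the two networks fit together, here is a hedged NumPy sketch of the forward pass and cost only (no training). The one-hidden-layer `mlp` helper, the layer sizes, and the random inputs are all assumptions made for the example.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    """One hidden ReLU layer followed by a linear output layer."""
    h = np.maximum(0.0, x @ W1 + b1)
    return h @ W2 + b2

rng = np.random.default_rng(0)
# Hypothetical sizes: the two inputs differ in size, the two outputs do not
num_user_features, num_item_features, hidden, out_dim = 4, 6, 8, 3

user_params = (rng.normal(size=(num_user_features, hidden)), np.zeros(hidden),
               rng.normal(size=(hidden, out_dim)), np.zeros(out_dim))
item_params = (rng.normal(size=(num_item_features, hidden)), np.zeros(hidden),
               rng.normal(size=(hidden, out_dim)), np.zeros(out_dim))

# A batch of 5 (user, movie) pairs that have a rating, i.e. r(i,j) = 1
x_u = rng.normal(size=(5, num_user_features))
x_m = rng.normal(size=(5, num_item_features))
y = rng.integers(1, 6, size=5).astype(float)

v_u = mlp(x_u, *user_params)             # user network output, shape (5, 3)
v_m = mlp(x_m, *item_params)             # movie network output, shape (5, 3)
pred = np.sum(v_u * v_m, axis=1)         # row-wise dot product v_u . v_m
J = np.sum((pred - y) ** 2)              # cost over the rated pairs
```

Because J depends on the parameters of both networks through the dot product, a single optimizer can update both at once.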



### Recommending from a Large Catalogue
- many large-scale recommender systems are implemented in 2 steps
1. retrieval step
    - generate a large list of plausible item candidates
    - combine the retrieved items into a single list, removing duplicates and items already watched/purchased
2. ranking step
    - take the retrieved list and rank the items using the learned model
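
The two steps can be sketched as follows. This is a toy illustration under stated assumptions: the candidate lists, item names, and vectors are made up, and ranking uses a plain dot product between a user vector `v_u` and precomputed item vectors in place of a full learned model.

```python
import numpy as np

def retrieve(candidate_lists, already_seen):
    """Merge candidate lists, dropping duplicates and items already seen (order preserved)."""
    seen = set(already_seen)
    merged = []
    for items in candidate_lists:
        for item in items:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged

def rank(candidates, v_u, item_vectors):
    """Score each candidate with v_u . v_m and sort from highest to lowest score."""
    scores = {item: float(np.dot(v_u, item_vectors[item])) for item in candidates}
    return sorted(candidates, key=lambda item: scores[item], reverse=True)

# Hypothetical item vectors and user vector
item_vectors = {
    "A": np.array([1.0, 0.0]),
    "B": np.array([0.0, 1.0]),
    "C": np.array([0.5, 0.5]),
    "D": np.array([0.9, 0.1]),
}
v_u = np.array([1.0, 0.2])

# Hypothetical candidate lists from different retrieval heuristics
candidate_lists = [["A", "C", "D"], ["D", "B"], ["C"]]
watched = ["A"]

candidates = retrieve(candidate_lists, watched)
ranked = rank(candidates, v_u, item_vectors)
print(candidates)  # ['C', 'D', 'B']
print(ranked)      # ['D', 'C', 'B']
```

Retrieval is cheap and deliberately broad, so the more expensive scoring only has to run on the short merged list rather than the whole catalogue.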

In [None]:
import tensorflow as tf

# user and item networks: inputs may differ in size, but both end in a 32-unit layer
user_NN = tf.keras.models.Sequential(
    [tf.keras.layers.Dense(256, activation = 'relu'),
     tf.keras.layers.Dense(128, activation = 'relu'),
     tf.keras.layers.Dense(32)
     ])

item_NN = tf.keras.models.Sequential(
    [tf.keras.layers.Dense(256, activation = 'relu'),
     tf.keras.layers.Dense(128, activation = 'relu'),
     tf.keras.layers.Dense(32)
     ])

# L2-normalize each output vector; wrapped in a Lambda layer so it also works on symbolic Keras tensors
l2_norm = tf.keras.layers.Lambda(lambda v: tf.linalg.l2_normalize(v, axis = 1))

# create user input and point back to base network
# (num_user_features and num_item_features are assumed to be defined earlier)
input_user = tf.keras.layers.Input(shape=(num_user_features,))
vu = user_NN(input_user)
vu = l2_norm(vu)

# create item input and point back to base network
input_item = tf.keras.layers.Input(shape=(num_item_features,))
vm = item_NN(input_item)
vm = l2_norm(vm)

# measure similarity of two vector outputs (the Dot layer's argument is `axes`)
output = tf.keras.layers.Dot(axes=1)([vu, vm])

# specify inputs and outputs of the model
model = tf.keras.Model([input_user, input_item], output)

# specify cost function
cost_fn = tf.keras.losses.MeanSquaredError()


