## Cost Function

Notation
- $r(i,j)$ = 1 if user j has rated movie i (0 if they have not yet rated it)
- $y^{(i,j)}$ = rating given by user j to movie i (defined only when $r(i,j) = 1$)
- $w^{(j)}$, $b^{(j)}$ = parameters for user j
- $x^{(i)}$ = feature vector for movie i
- $m^{(j)}$ = number of movies rated by user j
- $n$ = number of features, $n_u$ = number of users, $n_m$ = number of movies

For user j and movie i, predicted rating = $w^{(j)} \cdot x^{(i)} + b^{(j)}$
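As a minimal sketch of the prediction rule, the snippet below computes the dot product plus bias in plain Python. The function name `predict_rating` and the numeric values are hypothetical, chosen only for illustration.

```python
def predict_rating(w, x, b):
    """Predicted rating w . x + b for one user and one movie."""
    return sum(w_k * x_k for w_k, x_k in zip(w, x)) + b

# Hypothetical values: each movie has two features
w_j = [5.0, 0.0]   # parameters w for user j
b_j = 0.0          # bias b for user j
x_i = [0.9, 0.1]   # feature vector for movie i

rating = predict_rating(w_j, x_i, b_j)
print(rating)
```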

**Cost Function** 
$$\frac{1}{2} \sum_{i:r(i,j) = 1} (w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)})^2 + \frac{\lambda}{2} \sum_{k=1}^n (w_k^{(j)})^2 $$
- this is the cost for a single user j; to learn the parameters of all users, sum it over $j = 1, \dots, n_u$
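The regularized squared-error cost for one user can be sketched with explicit loops that mirror the summations. This is an illustrative plain-Python version, not a vectorized implementation; `user_cost` and the data are hypothetical.

```python
def user_cost(w, b, X, y, rated, lam):
    """Regularized squared-error cost for a single user.

    X: list of movie feature vectors
    y: this user's ratings, indexed by movie
    rated: r(i,j) flags for this user (1 = rated)
    lam: regularization strength lambda
    """
    squared_error = 0.0
    for x_i, y_i, r_i in zip(X, y, rated):
        if r_i == 1:  # only movies the user has rated contribute
            prediction = sum(w_k * x_k for w_k, x_k in zip(w, x_i)) + b
            squared_error += (prediction - y_i) ** 2
    regularization = sum(w_k ** 2 for w_k in w)  # b is not regularized
    return 0.5 * squared_error + 0.5 * lam * regularization

# Hypothetical data: three movies, two features each
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [5.0, 0.0, 0.0]   # the third entry is a placeholder, not a rating
rated = [1, 1, 0]

cost = user_cost([4.0, 0.0], 0.0, X, y, rated, lam=1.0)
print(cost)
```

Summing `user_cost` over every user gives the cost for learning all users' parameters when the movie features are known.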

**Collaborative Filtering**
$$\frac{1}{2} \sum_{(i,j):r(i,j) = 1} (w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)})^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^n (w_k^{(j)})^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^n (x_k^{(i)})^2$$

- the cost is now a function of w, b, and x rather than just w and b
- gradient descent updates all three sets of values (w, b, and x) on every step
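A rough sketch of the collaborative filtering cost, where the movie features are learned alongside the user parameters, might look like the following. The helper names `dot` and `cofi_cost` and the small matrices are assumptions made for illustration; a practical version would be vectorized.

```python
def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * c for a, c in zip(u, v))

def cofi_cost(X, W, b, Y, R, lam):
    """Collaborative filtering cost over all rated (movie, user) pairs.

    X[i]: features of movie i     W[j], b[j]: parameters of user j
    Y[i][j]: rating               R[i][j]: 1 if user j rated movie i
    """
    error = 0.0
    for i, x_i in enumerate(X):
        for j, w_j in enumerate(W):
            if R[i][j] == 1:
                error += (dot(w_j, x_i) + b[j] - Y[i][j]) ** 2
    reg_w = sum(w_k ** 2 for w_j in W for w_k in w_j)  # user parameters
    reg_x = sum(x_k ** 2 for x_i in X for x_k in x_i)  # movie features
    return 0.5 * error + 0.5 * lam * (reg_w + reg_x)

# Hypothetical data: two movies, two users, two features
X = [[1.0, 0.0], [0.0, 1.0]]
W = [[2.0, 0.0], [0.0, 3.0]]
b = [0.0, 1.0]
Y = [[4.0, 0.0], [0.0, 5.0]]
R = [[1, 0], [1, 1]]

total_cost = cofi_cost(X, W, b, Y, R, lam=2.0)
print(total_cost)
```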


**Binary Labels**
- rather than predicting y through a linear function, use the logistic function
 $$ \sum_{(i,j):r(i,j) = 1} L(f_{(w,b,x)}(x), y^{(i,j)}) $$
 where 
$$L(f_{(w,b,x)}(x), y^{(i,j)}) = -y^{(i,j)}\log(f_{(w,b,x)}(x)) - (1-y^{(i,j)})\log(1-f_{(w,b,x)}(x))$$
 $$ f_{(w,b,x)}(x) = g(w^{(j)} \cdot x^{(i)} + b^{(j)}), \quad g(z) = \frac{1}{1 + e^{-z}} $$
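For binary labels, the loss can be sketched by passing the linear score through the sigmoid and applying the log loss to each rated pair. The function names and data below are hypothetical, and regularization is left out to keep the example short.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_loss(f, y):
    """Log loss for one prediction f in (0, 1) and one label y in {0, 1}."""
    return -y * math.log(f) - (1 - y) * math.log(1 - f)

def binary_cofi_cost(X, W, b, Y, R):
    """Sum of log losses over all (movie, user) pairs with r(i,j) = 1."""
    total = 0.0
    for i, x_i in enumerate(X):
        for j, w_j in enumerate(W):
            if R[i][j] == 1:
                z = sum(w_k * x_k for w_k, x_k in zip(w_j, x_i)) + b[j]
                total += binary_loss(sigmoid(z), Y[i][j])
    return total

# Hypothetical data: one movie with all-zero features, two users.
# Every score is 0, so every prediction is 0.5.
X = [[0.0, 0.0]]
W = [[1.0, 1.0], [2.0, 2.0]]
b = [0.0, 0.0]
Y = [[1, 0]]
R = [[1, 1]]

binary_cost = binary_cofi_cost(X, W, b, Y, R)
print(binary_cost)
```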

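**Mean Normalization**
- when there is a new user who hasn't provided any ratings, subtract each movie's mean rating $\mu_i$ from its row of the rating matrix so that every row has zero mean, train on the normalized ratings, and add the mean back when predicting:
 $$ w^{(j)} \cdot x^{(i)} + b^{(j)} + \mu_{i} $$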
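The normalization step can be sketched as below: compute each movie's mean over rated entries only, subtract it from that movie's row, and add it back at prediction time. The function name and data are hypothetical, and the sketch assumes the new user's parameters are zero, so their predictions fall back to the per-movie means.

```python
def mean_normalize(Y, R):
    """Subtract each movie's mean rating (over rated entries) from its row."""
    means, Y_norm = [], []
    for y_row, r_row in zip(Y, R):
        rated = [y for y, r in zip(y_row, r_row) if r == 1]
        mu = sum(rated) / len(rated) if rated else 0.0
        means.append(mu)
        # unrated entries stay at 0 and are masked out by R during training
        Y_norm.append([(y - mu) if r == 1 else 0.0
                       for y, r in zip(y_row, r_row)])
    return Y_norm, means

# Hypothetical data: two movies, three users; the third user is new
Y = [[5.0, 4.0, 0.0],
     [1.0, 0.0, 0.0]]
R = [[1, 1, 0],
     [1, 0, 0]]

Y_norm, mu = mean_normalize(Y, R)

# Assumption: the new user's w and b are zero, so w . x + b = 0
new_user_predictions = [0.0 + mu_i for mu_i in mu]
print(Y_norm, mu, new_user_predictions)
```

Without the added mean, the same new user would be predicted to rate every movie 0.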



**Gradient Descent Implementation of Collaborative Filtering**

In [1]:
import tensorflow as tf

w = tf.Variable(3.0) # tf variables are the parameters we want to optimize
x = 1.0
y = 1.0 # target value
alpha = 0.01

iterations = 30
for _ in range(iterations):

    # use TensorFlow's GradientTape to record the steps used to compute the cost J, to enable automatic differentiation
    with tf.GradientTape() as tape:
        fwb = w*x
        costJ = (fwb - y)**2
    
    # Use the gradient tape to calculate the gradients of the cost with respect to the parameter w
    [dJdw] = tape.gradient(costJ, [w])

    # Run one step of gradient descent by updating the value of w to reduce the cost
    w.assign_add(-alpha * dJdw)
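For this one-parameter example the gradient can also be written by hand, which gives a way to sanity-check what the tape computes. The sketch below repeats the same loop in plain Python using the analytic derivative $dJ/dw = 2(wx - y)x$; it is a cross-check, not a replacement for automatic differentiation.

```python
w = 3.0       # same starting value as the TensorFlow variable
x = 1.0
y = 1.0       # target value
alpha = 0.01

iterations = 30
for _ in range(iterations):
    # analytic gradient of J = (w*x - y)**2 with respect to w
    dJdw = 2 * (w * x - y) * x
    # one step of gradient descent
    w -= alpha * dJdw

print(w)
```

With these settings each step shrinks the gap between w and the target by a factor of 0.98, so after 30 iterations w is still roughly 2.09 and more iterations (or a larger learning rate) would be needed to get close to 1.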

**Adam Optimizer Implementation of Collaborative Filtering**

In [None]:
import tensorflow as tf
import keras

# instance of optimizer
optimizer = keras.optimizers.Adam(learning_rate=1e-1)

# X, W, and b must be tf.Variable objects so the tape can track them;
# coifCostFuncV, Ynorm, R, num_users, num_movies, and lambda_ are defined elsewhere
iterations = 200
for _ in range(iterations):
    # use TensorFlow's GradientTape to record the operations used to compute the cost
    with tf.GradientTape() as tape:

        # compute cost (forward pass is included in cost)
        # lambda_ has a trailing underscore because lambda is a reserved word in Python
        cost_value = coifCostFuncV(X, W, b, Ynorm, R, num_users, num_movies, lambda_)

    # use the gradient tape to automatically retrieve the gradients of the trainable variables with respect to the loss
    grads = tape.gradient(cost_value, [X, W, b])

    # run one optimizer step by updating the variables to reduce the loss
    optimizer.apply_gradients(zip(grads, [X, W, b]))