# Recommender Systems
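- $n_u$ users, indexed by $j$
- $n_m$ items (movies), indexed by $i$
- $r(i,j) = 1$ if user $j$ has rated item $i$; $y^{(i,j)}$ is that rating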
## Cost Function:
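- prediction of user $j$'s rating of item $i$: $(\theta^{(j)})^T x^{(i)}$
- squared-error cost over the rated pairs: $J = \frac{1}{2} \sum_{(i,j):r(i,j)=1} ((\theta^{(j)})^T x^{(i)} - y^{(i,j)})^2$
- with regularization of $\theta$ and $x$:
- $J = \frac{1}{2} \sum_{(i,j):r(i,j)=1} ((\theta^{(j)})^T x^{(i)} - y^{(i,j)})^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} (\theta_k^{(j)})^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} (x_k^{(i)})^2$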
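A minimal plain-Python sketch of the regularized cost, assuming the ratings are stored as lists of lists `Y[i][j]` (item `i`, user `j`) with a matching indicator `R[i][j]`. The helper names `dot` and `cofi_cost` and the toy numbers are hypothetical; a real implementation would normally vectorize this with NumPy or TensorFlow.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def cofi_cost(X, Theta, Y, R, lam):
    """Regularized cost J(x, theta).

    X:     n_m lists of n item features, X[i] = x^(i)
    Theta: n_u lists of n user parameters, Theta[j] = theta^(j)
    Y, R:  n_m x n_u ratings and rated-indicator matrices
    """
    # squared error, summed only over pairs (i, j) with r(i, j) = 1
    err = sum(
        (dot(Theta[j], X[i]) - Y[i][j]) ** 2
        for i in range(len(X))
        for j in range(len(Theta))
        if R[i][j] == 1
    )
    reg_theta = sum(t ** 2 for theta in Theta for t in theta)
    reg_x = sum(v ** 2 for x in X for v in x)
    return 0.5 * err + 0.5 * lam * reg_theta + 0.5 * lam * reg_x


# toy data: 2 items, 2 users, n = 1 feature; user 1 has not rated item 0
X = [[1.0], [2.0]]
Theta = [[2.0], [1.0]]
Y = [[2.0, 0.0], [3.0, 2.0]]
R = [[1, 0], [1, 1]]
print(cofi_cost(X, Theta, Y, R, lam=0.0))  # prints 0.5
```

Only the pair (item 1, user 0) is mispredicted (4 instead of 3), so the unregularized cost is $\frac{1}{2} \cdot 1^2 = 0.5$; the unrated entry `Y[0][1]` is ignored because `R[0][1] = 0`.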
## Collaborative Filtering
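- neither the item features x nor the user parameters theta are given; both are learned from the ratings
- initially we set theta to small random values, learn x from theta, then theta from x, and repeat; alternatively, minimize the combined cost below over x and theta at the same time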
### Cost Function:
- i: movies
- j: users
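- Cost function to learn the user parameters $\theta^{(j)}$ (the per-user w, b), given x:
- $J(\theta^{(1)},...,\theta^{(n_u)}) = \frac{1}{2} \sum_{j=1}^{n_u} \sum_{i:r(i,j)=1} ((\theta^{(j)})^T x^{(i)} - y^{(i,j)})^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} (\theta_k^{(j)})^2$
- Cost function to learn the item features x, given theta:
- $J(x^{(1)},...,x^{(n_m)}) = \frac{1}{2} \sum_{i=1}^{n_m} \sum_{j:r(i,j)=1} ((\theta^{(j)})^T x^{(i)} - y^{(i,j)})^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} (x_k^{(i)})^2$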
- Putting it together:
- $J(x^{(1)},...,x^{(n_m)},\theta^{(1)},...,\theta^{(n_u)}) = \frac{1}{2} \sum_{(i,j):r(i,j)=1} ((\theta^{(j)})^T x^{(i)} - y^{(i,j)})^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} (\theta_k^{(j)})^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} (x_k^{(i)})^2$
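A minimal sketch of minimizing the combined cost with batch gradient descent, updating $x$ and $\theta$ simultaneously. The gradients follow from the cost: $\frac{\partial J}{\partial x_k^{(i)}} = \sum_{j:r(i,j)=1} ((\theta^{(j)})^T x^{(i)} - y^{(i,j)})\,\theta_k^{(j)} + \lambda x_k^{(i)}$ and symmetrically for $\theta_k^{(j)}$. The function name, learning rate, iteration count and toy data are assumptions chosen for illustration.

```python
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def cofi_gradient_descent(Y, R, n, lam=0.1, alpha=0.05, iters=2000, seed=0):
    """Learn item features X and user parameters Theta by batch gradient
    descent on the combined regularized cost J(x, theta)."""
    rng = random.Random(seed)
    n_m, n_u = len(Y), len(Y[0])
    # small random initialization breaks the symmetry between features
    X = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n_m)]
    Theta = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n_u)]
    for _ in range(iters):
        # each gradient starts from its regularization term, lambda * parameter
        gX = [[lam * v for v in x] for x in X]
        gT = [[lam * t for t in theta] for theta in Theta]
        for i in range(n_m):
            for j in range(n_u):
                if R[i][j] == 1:
                    e = dot(Theta[j], X[i]) - Y[i][j]  # prediction error
                    for k in range(n):
                        gX[i][k] += e * Theta[j][k]
                        gT[j][k] += e * X[i][k]
        # simultaneous update of both parameter sets
        X = [[v - alpha * g for v, g in zip(x, gx)] for x, gx in zip(X, gX)]
        Theta = [[t - alpha * g for t, g in zip(th, gt)] for th, gt in zip(Theta, gT)]
    return X, Theta


# toy ratings; Y[0][2] is unrated (R[0][2] = 0), so its placeholder 0 is ignored
Y = [[2, 1, 0], [4, 2, 4], [2, 1, 2]]
R = [[1, 1, 0], [1, 1, 1], [1, 1, 1]]
X, Theta = cofi_gradient_descent(Y, R, n=2, lam=0.01)
predicted = [[dot(Theta[j], X[i]) for j in range(3)] for i in range(3)]
```

After training, `predicted[0][2]` is the model's guess for the missing rating, filled in from the patterns in the other users' and items' ratings.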

## Binary Classification for Recommender Systems:
- $y^{(i,j)} = 1$ if user $j$ liked (engaged with) item $i$, $0$ if not; $r(i,j) = 0$ if the user has not seen the item yet
- predicted probability that $y^{(i,j)} = 1$: $f = g((\theta^{(j)})^T x^{(i)})$, where $g(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid
- cost function:
- $J(x^{(1)},...,x^{(n_m)},\theta^{(1)},...,\theta^{(n_u)}) = \sum_{(i,j):r(i,j)=1} Loss\left(g((\theta^{(j)})^T x^{(i)}), y^{(i,j)}\right) + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} (\theta_k^{(j)})^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} (x_k^{(i)})^2$
- Loss function for binary classification (binary cross-entropy):
- $Loss(f, y) = -y \log(f) - (1-y) \log(1-f)$
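A minimal plain-Python sketch of the binary version: the sigmoid turns $(\theta^{(j)})^T x^{(i)}$ into a probability, and the binary cross-entropy is summed over rated pairs plus the two L2 terms. Function names are hypothetical, and this naive form would hit `log(0)` for very large $|z|$; a numerically stable formulation would be preferable in practice.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def binary_loss(f, y):
    """Binary cross-entropy for one prediction f in (0, 1) and label y in {0, 1}."""
    return -y * math.log(f) - (1 - y) * math.log(1 - f)


def binary_cofi_cost(X, Theta, Y, R, lam):
    """Sum of losses over rated pairs plus (lam / 2) * (sum theta^2 + sum x^2)."""
    total = 0.0
    for i, x in enumerate(X):
        for j, theta in enumerate(Theta):
            if R[i][j] == 1:
                f = sigmoid(sum(t * v for t, v in zip(theta, x)))  # P(y = 1)
                total += binary_loss(f, Y[i][j])
    reg = sum(t * t for th in Theta for t in th) + sum(v * v for x in X for v in x)
    return total + 0.5 * lam * reg


# with theta . x = 0 the model predicts 0.5, so the loss is log(2)
print(binary_cofi_cost([[0.0]], [[1.0]], [[1]], [[1]], lam=0.0))
```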

### Mean Normalisation:
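- subtract from each item's ratings that item's mean rating $\mu_i$ (averaged over the users with $r(i,j)=1$): $y^{(i,j)} := y^{(i,j)} - \mu_i$
- predict with $(\theta^{(j)})^T x^{(i)} + \mu_i$
- benefit: for a new user with no ratings, regularization drives $\theta^{(j)}$ towards 0, so the prediction falls back to the item's average rating $\mu_i$ instead of 0; it can also make the optimization run slightly faster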
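In a recommender, the quantity usually mean-normalized is the rating matrix, one mean per item. A minimal sketch, assuming the same `Y`/`R` list-of-lists layout (item `i`, user `j`); the function names and data are hypothetical, and an item with no ratings is simply given mean 0 here.

```python
def mean_normalize(Y, R):
    """Subtract each item's mean rating (over users who rated it) from its ratings."""
    mu = []
    Y_norm = []
    for y_row, r_row in zip(Y, R):
        rated = [y for y, r in zip(y_row, r_row) if r == 1]
        m = sum(rated) / len(rated) if rated else 0.0  # no ratings: mean 0
        mu.append(m)
        # only rated entries are meaningful; unrated ones are left at 0
        Y_norm.append([(y - m) if r == 1 else 0.0 for y, r in zip(y_row, r_row)])
    return Y_norm, mu


def predict(theta, x, mu_i):
    """Prediction after mean normalization: theta^T x + mu_i."""
    return sum(t * v for t, v in zip(theta, x)) + mu_i


Y = [[5, 4, 0], [1, 3, 0]]  # the third user is new and has rated nothing
R = [[1, 1, 0], [1, 1, 0]]
Y_norm, mu = mean_normalize(Y, R)
print(mu)  # prints [4.5, 2.0]
# a new user's theta is pushed to 0 by regularization, so the prediction is mu_i
print(predict([0.0, 0.0], [0.3, -1.2], mu[0]))  # prints 4.5
```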


## TensorFlow Implementation:
### Custom Training Loop:
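- example: find the optimal value of w, assuming b = 0
- we use `tf.GradientTape()` to compute the gradient of the cost function with respect to w
- GradientTape records the operations executed inside its `with` block (trainable `tf.Variable`s are watched automatically), and `tape.gradient(cost, [w])` returns the gradient of the cost with respect to each variable in the list
- then `optimizer.apply_gradients()` uses these gradients to update w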
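```python
import tensorflow as tf

w = tf.Variable(0.0)  # parameter to learn (b assumed to be 0)
optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)
for _ in range(1000):
    with tf.GradientTape() as tape:
        cost = tf.square(w - 5)  # toy cost, minimized at w = 5
    grads = tape.gradient(cost, [w])  # d(cost)/dw
    optimizer.apply_gradients(zip(grads, [w]))  # one Adam step on w
```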
## Finding related items:
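- we can use cosine similarity between the feature vectors $A$ and $B$ of two items
- $similarity(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}$
- alternatively, use the squared distance between the learned feature vectors, $\|x^{(k)} - x^{(i)}\|^2$: the smaller the distance, the more related the items
- a distance can be turned into a similarity score, e.g. $similarity = \frac{1}{1 + distance}$ (the plain $\frac{1}{distance}$ is undefined for identical items)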
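A minimal sketch of both measures, plus a lookup of the items closest to a given item by squared distance. The feature vectors are toy values standing in for learned $x^{(i)}$, and the names `most_related` and `distance_similarity` (with its $\frac{1}{1 + d}$ form) are hypothetical choices.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distance_similarity(a, b):
    """Assumed mapping from distance to similarity; equals 1 for identical items."""
    return 1.0 / (1.0 + squared_distance(a, b))


def most_related(k, X, top=2):
    """Indices of the `top` items whose feature vectors are closest to item k."""
    others = [i for i in range(len(X)) if i != k]
    return sorted(others, key=lambda i: squared_distance(X[k], X[i]))[:top]


X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]  # toy item features
print(most_related(0, X))  # prints [1, 3]
```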


# Collaborative Filtering vs Content-Based Filtering:
### Content-Based Filtering:
- we have features of the users and of the items
- we use these features to measure how well a user and an item match
- we recommend to each user the items that match them best
### Collaborative Filtering:
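- we have the ratings of the users for the items
- from these ratings we learn the item features x and user parameters theta, so items rated alike by the same users end up with similar features
- we recommend to a user the items with the highest predicted ratings, which draws on the ratings of users with similar tastes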

## Deep Learning for Content-Based Filtering:
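- we use two neural networks, a user network and a movie network, to predict how well a user and a movie match
- Xu = features of users (input to the user network, which outputs a vector $v_u^{(j)}$)
- Xm = features of movies (input to the movie network, which outputs a vector $v_m^{(i)}$ of the same length)
- the prediction $w^{(j)} \cdot x^{(i)} + b^{(j)}$ is replaced by $v_u^{(j)} \cdot v_m^{(i)}$: the vector computed from Xu plays the role of $w$, and the vector computed from Xm plays the role of $x$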
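A minimal forward-pass sketch of the two-network ("two-tower") idea in plain Python: each tower is a stack of dense layers, and the prediction is the dot product of the two output vectors. The layer sizes and weights are hypothetical, and normalizing both vectors to unit length is an optional step assumed here; in practice the towers would be built and trained with a framework such as TensorFlow/Keras.

```python
import math


def relu(v):
    return [max(0.0, x) for x in v]


def dense(W, b, v):
    """Fully connected layer W @ v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def tower(layers, features):
    """Apply (W, b) layers: ReLU on hidden layers, linear output, then normalize."""
    h = features
    for idx, (W, b) in enumerate(layers):
        h = dense(W, b, h)
        if idx < len(layers) - 1:
            h = relu(h)
    return l2_normalize(h)


# hypothetical towers: 3 user features and 2 movie features, both mapped to 2-d vectors
user_layers = [
    ([[1.0, 0.0, 0.5], [0.0, 1.0, 0.0]], [0.0, 0.0]),
    ([[0.6, 0.0], [0.8, 0.0]], [0.0, 0.0]),
]
movie_layers = [([[2.0, 0.0], [0.0, 1.0]], [0.0, 0.0])]

v_u = tower(user_layers, [1.0, 0.0, 2.0])  # computed from Xu, plays the role of w
v_m = tower(movie_layers, [1.0, 0.0])      # computed from Xm, plays the role of x
prediction = sum(a * b for a, b in zip(v_u, v_m))
```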
### Cost Function:
- $J = \frac{1}{2} \sum_{(i,j):r(i,j)=1} (v_u^{(j)} \cdot v_m^{(i)} - y^{(i,j)})^2 + \text{(regularization of the network weights)}$
- the parameters of both networks are learned together by minimizing this cost
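A minimal sketch of evaluating this cost, assuming the tower outputs have already been computed and, as a simplification, that all network weights are passed in as one flat list `weights` for an L2 penalty. The function name and data are hypothetical.

```python
def content_cost(V_u, V_m, Y, R, weights, lam):
    """1/2 * sum over rated (i, j) of (v_u^(j) . v_m^(i) - y^(i,j))^2
    plus (lam / 2) * sum of squared network weights."""
    err = 0.0
    for i, v_m in enumerate(V_m):
        for j, v_u in enumerate(V_u):
            if R[i][j] == 1:
                pred = sum(a * b for a, b in zip(v_u, v_m))
                err += (pred - Y[i][j]) ** 2
    return 0.5 * err + 0.5 * lam * sum(w * w for w in weights)


V_u = [[1.0, 0.0]]              # one user vector
V_m = [[1.0, 0.0], [0.0, 1.0]]  # two movie vectors
Y = [[1.0], [1.0]]
R = [[1], [1]]
print(content_cost(V_u, V_m, Y, R, weights=[1.0, 2.0], lam=0.0))  # prints 0.5
```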
### Recommendation:
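Two steps:
- Retrieval: quickly generate a large list of plausible item candidates, e.g. from movie features (movies similar to the ones the user recently watched)
- Ranking: score each candidate with the learned model, using the user's features and the candidate's features (e.g. $v_u \cdot v_m$), and show the top-ranked items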
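A minimal sketch of the two steps, assuming the movie vectors `V_m` were precomputed by the movie network and `v_u` comes from the user network. The retrieval heuristic here (the closest movies to each watched movie, excluding movies already watched) is only one possible choice, and all names and numbers are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve(watched, V_m, per_item=2):
    """Retrieval: for each watched movie, collect the `per_item` closest
    unwatched movies (by squared distance between movie vectors)."""
    candidates = set()
    for k in watched:
        others = [i for i in range(len(V_m)) if i not in watched]
        others.sort(key=lambda i: sum((a - b) ** 2 for a, b in zip(V_m[k], V_m[i])))
        candidates.update(others[:per_item])
    return candidates


def rank(v_u, candidates, V_m, top=3):
    """Ranking: order candidates by the model's prediction v_u . v_m."""
    return sorted(candidates, key=lambda i: dot(v_u, V_m[i]), reverse=True)[:top]


V_m = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.6, 0.8], [-1.0, 0.0]]  # toy movie vectors
v_u = [0.0, 1.0]                                                      # toy user vector
candidates = retrieve([0], V_m)       # {1, 3}
print(rank(v_u, candidates, V_m))     # prints [3, 1]
```

Retrieval is kept cheap so it can scan many items; the more expensive model score is then computed only for the retrieved candidates.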