#### Collaborative Filtering - Recommendations
* 1. Memory-based
* 2. Model-based

Data Set
* needs set of items
* set of users - that react to the items
* explicit - like likes/dislikes
* implicit - viewing item, added to wish list, time spent
* seen in the form of a matrix 
* rows = ratings given by a user
* columns = contain ratings received by an item
* most cells will be empty - as users only rate a few items (sparse matrix)
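
A minimal sketch of such a matrix, assuming a handful of made-up (user, item, rating) triples. The user names, item names and values are hypothetical; NaN marks an unrated cell.

```python
import numpy as np

# Hypothetical (user, item, rating) triples; names and values are made up.
ratings = [
    ("alice", "movie_a", 5.0),
    ("alice", "movie_b", 3.0),
    ("bob", "movie_a", 4.0),
    ("carol", "movie_c", 2.0),
]

users = sorted({u for u, _, _ in ratings})
items = sorted({i for _, i, _ in ratings})
user_idx = {u: k for k, u in enumerate(users)}
item_idx = {i: k for k, i in enumerate(items)}

# Rows = users, columns = items; NaN marks "not rated".
utility = np.full((len(users), len(items)), np.nan)
for u, i, r in ratings:
    utility[user_idx[u], item_idx[i]] = r

# Fraction of empty cells, i.e. how sparse the matrix is.
sparsity = np.isnan(utility).mean()
```

Even in this tiny example more than half of the cells are empty; real data sets are typically far sparser.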

Steps Involved
* determine which users are similar to each other 
* determine rating for a given item based on ratings of similar users
* measure accuracy of the predicted ratings via an error metric - RMSE (root mean squared error) or MAE (mean absolute error)
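
Both error metrics can be sketched in a few lines of numpy. The function names and the sample ratings are illustrative assumptions, not taken from any particular library.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error: penalizes large misses more heavily."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mae(actual, predicted):
    """Mean absolute error: average size of the miss."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

# Made-up known ratings and the ratings a recommender predicted for them.
actual = [4.0, 3.0, 5.0, 1.0]
predicted = [3.5, 3.0, 4.0, 2.0]

print(rmse(actual, predicted))  # 0.75
print(mae(actual, predicted))   # 0.625
```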

#### Memory Based
Finding Similar Users
* find users similar to U (User) who rated the item (I)
* calculate rating (R) based on ratings from previous step
* plotting a user's rating on a certain item - calculate distances between users
* format is basically a list of users (outer) with a list of ratings (inner) for movies
* euclidean distance
* or consider the angle from the origin (cosine similarity)
* low angle - higher similarity/smaller distance
* high angle - lower similarity/larger distance
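* angle ranges from 0 to 180 degrees - its cosine maps this to the range 1 (same direction) to -1 (opposite direction)
* cosine distance = 1 - cosine similarity - higher value for higher angle (lower similarity)
* scipy: scipy.spatial.distance.cosine(a, b) returns the cosine distance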
* can normalize/adjust data to remove the individual user preferences - measuring the average of ratings and subtracting that from all ratings
* 'centred cosine' - cosine of the angle between the adjusted (mean-subtracted) vectors, useful when there are a lot of missing values
* could impute missing values with average rating by each user - but must also adjust current ratings - to avoid bias/inaccuracies 
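
A minimal numpy sketch of cosine vs centred cosine, assuming two made-up users with the same taste but different rating scales. The helper names are hypothetical. After centring, a missing rating filled with 0 is equivalent to imputing that user's average. `scipy.spatial.distance.cosine(a, b)` computes the same distance as `1 - cosine_similarity(a, b)` here.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1 = same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def centered(ratings):
    """Subtract the user's mean over rated items; unrated (NaN) cells become 0."""
    r = np.asarray(ratings, dtype=float)
    return np.nan_to_num(r - np.nanmean(r), nan=0.0)

# Hypothetical users: same taste, but the second one rates everything lower.
generous = [4.0, 5.0, 3.0, np.nan]
harsh = [2.0, 3.0, 1.0, np.nan]

# Raw cosine on the three items both users rated (slice skips the NaN).
raw = cosine_similarity(generous[:3], harsh[:3])

# Centred cosine removes each user's personal rating level first.
adjusted = cosine_similarity(centered(generous), centered(harsh))
cosine_distance = 1.0 - adjusted
```

The raw similarity is high but below 1, while the centred version treats the two users as identical in taste.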

Calculating Ratings
* after finding similar users
* predict ratings as average rating of top n (5-10) users that are most similar
* could also do weighted average - with more weight to users with most similarity 
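
The weighted-average variant can be sketched as follows. This assumes similarities are positive and that every listed neighbour has rated the item; the function name and numbers are illustrative only.

```python
import numpy as np

def predict_rating(similarities, ratings, n=2):
    """Similarity-weighted average of the ratings from the n most similar users.

    similarities[k] is the similarity of neighbour k to the target user and
    ratings[k] is that neighbour's rating of the item.
    """
    similarities = np.asarray(similarities, dtype=float)
    ratings = np.asarray(ratings, dtype=float)
    top = np.argsort(similarities)[::-1][:n]  # indices of the n most similar
    weights = similarities[top]
    return float(weights @ ratings[top] / weights.sum())

# Made-up neighbours: similarity to the target user and their rating of the item.
sims = [0.9, 0.1, 0.6]
item_ratings = [5.0, 1.0, 3.0]

weighted = predict_rating(sims, item_ratings, n=2)  # (0.9*5 + 0.6*3) / 1.5
plain = float(np.mean([5.0, 3.0]))                  # unweighted top-2 average
```

The weighted prediction (about 4.2) sits closer to the most similar user's rating than the plain average (4.0).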

User-Based vs Item-Based Collaborative Filtering
* similar mathematically - just different conceptually
* User Based
    * rating for an item is found by picking the N most similar users who have rated the item, then calculating the rating from these N ratings
* Item Based
    * for an item I that user U hasn't rated - find the set of similar items, comparing items by their rating vectors (the ratings each item received from users)
    * rating is found by picking the N items most similar to I that user U has rated, then calculating the rating from these N ratings
* item based is usually more stable and faster than user-based
* library - Surprise
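
Independently of any library, the item-based variant can be sketched in plain numpy. This is an illustration under assumptions: similarity is raw cosine over the users who rated both items, the matrix values are made up, and the helper names are hypothetical.

```python
import numpy as np

def cosine_on_overlap(x, y):
    """Cosine similarity using only positions where both vectors have a value."""
    both = ~np.isnan(x) & ~np.isnan(y)
    if not both.any():
        return 0.0
    a, b = x[both], y[both]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def predict_item_based(R, user, item, n=2):
    """Predict R[user, item] from the n items most similar to `item`
    that `user` has already rated (assumes at least one such item)."""
    rated = [j for j in range(R.shape[1])
             if j != item and not np.isnan(R[user, j])]
    sims = np.array([cosine_on_overlap(R[:, item], R[:, j]) for j in rated])
    top = np.argsort(sims)[::-1][:n]
    weights = sims[top]
    user_ratings = R[user, [rated[k] for k in top]]
    return float(weights @ user_ratings / weights.sum())

# Rows = users, columns = items; NaN = not rated (made-up values).
R = np.array([
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 2.0],
    [np.nan, 4.0, 1.0],
])

nearest_only = predict_item_based(R, user=2, item=0, n=1)
top_two = predict_item_based(R, user=2, item=0, n=2)
```

The user-based version is the same computation applied to the transposed matrix, which is why the two approaches are mathematically similar.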

#### Model Based 
* reduce or compress large and sparse user-item matrix 
* using matrix factorization or autoencoders
* Matrix factorization - Dimensionality Reduction
    * break large matrix to smaller ones
    * e.g. like factoring 12 into 4 x 3 or 2 x 6
    * reduce large sparse matrix into two matrices 
    * get similar item vectors (i,j)
    * gives latent factors about users/items
    * too many latent factors and recommendations become too specific/overfitted
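
A small sketch of the idea using numpy's SVD, assuming a made-up matrix with no missing values (a real utility matrix has gaps, so plain SVD would not apply to it directly). The number of latent factors `k` and the variable names are illustrative choices.

```python
import numpy as np

# Hypothetical, fully observed 4 users x 3 items matrix (made-up values).
R = np.array([
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 1.0],
    [1.0, 1.0, 5.0],
    [2.0, 1.0, 4.0],
])

k = 2  # number of latent factors to keep
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Two smaller matrices: one row of k latent factors per user and per item.
user_factors = U[:, :k] * s[:k]   # shape (4, k)
item_factors = Vt[:k, :].T        # shape (3, k)

# Multiplying them back gives a rank-k approximation of R.
R_approx = user_factors @ item_factors.T
max_error = np.abs(R - R_approx).max()
```

Raising `k` shrinks the reconstruction error on this matrix, which is the same mechanism that lets too many latent factors overfit.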

Algorithms for Matrix Factorization
* SVD (singular value decomposition), PCA (principal component analysis)
* Autoencoders (neural networks)

Using python
* numpy and scikit-surprise
* load data with Dataset.load_builtin()
* can specify user_based, min_support (in sim_options)
* can also use recommender.algo
* can use GridSearchCV (from surprise.model_selection) in combination with surprise
* using KNNWithMeans - which is similar to centred cosine similarity
* can also use SVD


#### From Lecture
* recommender based on historical viewing of two users
* utility matrix - for every user (row) - column (items) that they have used or interacted with (e.g. columns of movies)
* cells of ratings, clicks, etc. can be combined as weighted average
* usually very sparse - with lots of missing/unknowns
* if no missing - job is done - just sort and recommend top
* collaborative approach essentially filling missing - things that they will 'enjoy' - and then compare ratings vs predicted ratings - to optimize
* other measure more important - is whether users actually engage with the recommendations - can only be observed with A/B testing 
* can have billions of rows (users) x thousands of columns (movies)
* can have one utility matrix for each country, distribution of movies/users are different

Content Based Recommender
* no collaboration between content and users
* only looks at features of the content - what content is similar 
* can recommend even without lots of users
* can capture unique tastes of users - no interaction b/w users
* easier to explain why recommendations are the way they are
* can recommend new items - no priors from user
* Disadvantage
    * initial set up is expensive
    * feature acquisition for all catalogue
    * low diversity: hardly recommend an item outside of the user's preference
    * Cold start - no information about new users to base recommendations on

#### Collaborative - Memory Based

* Memory based collaborative 
    * similarities between users and items predicts new rating for an item
    * taking weighted average of ratings from the similar group
    * memorizes utility matrix, no modeling

* User-to-User
    * how similar is User 1 to everyone else
    * content consumed by U1 - compared to  other Users 
    * calculate similarity (cosine, euclidean distance) between vectors (e.g. users ratings from a bunch of movies)
    * rating for a movie U1 hasn't consumed - weighted average of the other users' ratings, weighted by similarity to U1
    * based on user similarity - aggregated scores from items
    * combined item-to-item 
    * non-missing values selected using the mask: ~user1.isna()

* Item-to-Item
    * similar to content-based
    * but also takes into account user interaction with items
    * invented at amazon
    * taking the vector for a given movie - running downward across users (a column of the utility matrix)
    * only using entries where we actually have a value
    * different from content-based because it takes into consideration user ratings or inputs
    * rating for a movie (not seen) - weighted average of the user's ratings for similar movies
    * based on item similarity - with aggregated scores from users
    * usually combined with user-to-user

* can get recommendation from u-to-u and i-to-i and pick common recommended
* content-based and collaborative are often used together - especially for users that just started
* viral content - not as good with collaborative - because no matter what 'cluster' a user is in - it should be recommended 
* MAE/RMSE of predicted scores might not always be accurate 
* will need to use A/B test paired with stats based hypothesis testing
* easier with users - but if not available - must use MAE/RMSE 

Diversity 
* rather than just taking top case similarity
* mix some less similar suggestions into the top 10
* exploration instead of exploitation
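
One hypothetical way to sketch this mix: keep most of the top-ranked items and sample the rest from further down the ranking. The function name, the split of 8 + 2 and the item names are assumptions for illustration only.

```python
import random

def diversify(ranked_items, k=10, n_explore=2, seed=None):
    """Return k items: the top (k - n_explore) by similarity plus
    n_explore sampled at random from the rest of the ranking."""
    rng = random.Random(seed)
    n_exploit = k - n_explore
    head = ranked_items[:n_exploit]   # exploitation: best matches
    tail = ranked_items[n_exploit:]   # candidates for exploration
    explore = rng.sample(tail, min(n_explore, len(tail)))
    return head + explore

# Made-up ranking, most similar first.
ranked = [f"item_{i}" for i in range(30)]
recs = diversify(ranked, k=10, n_explore=2, seed=0)
```

The `n_explore` parameter sets the balance: 0 is pure exploitation, larger values trade similarity for diversity.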

Novelty
* new or unknown items
* need balance here 
* popular items won't be surprising, but surprises might not be good

Responsiveness
* how fast the system changes as new user/item interactions arrive
* recommendations can get stale 
* but more updates take more resources
* persistence: how long to keep an item in the recommendations - items that are not being clicked can be removed

Non-algorithmic
* stimulating demand
* not always based on algorithms
* e.g. Stranger Things or new movies coming out 

Types of Data
* not always direct data (ratings, thumbs up)
* consumers don't usually provide explicit data
* collected from your behaviour - implicitly - mouse clicks, purchases, time spent doing something

#### Collaborative - Latent Factors
* actually models the data and provides approximation
* still uses utility matrix
* but uses both - u-to-u and i-to-i similarity
* similar to PCA or SVD
* reducing the dimensions
* Movie ratings -> matrix factorization --> 1. user attributes 2. movie attributes
* multiplying user attributes x movie attributes gives back (an approximation of) the ratings matrix
* interpolate - missing in utility matrix imputed by multiplying, dot product U and M - getting a complete utility matrix 
* randomly initialize U and M -> then compare to target utility matrix (assuming all values filled in) -> gradient descent -> to get closer to target matrix
* get error - MSE - loss function, minimize by changing the latent variables
* missing values ignored in the cost-function calculation
* once optimized with values you do have - can then fill in missing values 
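
The procedure above can be sketched with plain gradient descent in numpy. This is a minimal illustration, not a tuned implementation: the learning rate, epoch count, number of latent factors and the ratings are all assumed values, and no regularization is applied.

```python
import numpy as np

def factorize(R, k=2, lr=0.01, epochs=5000, seed=0):
    """Fit U (users x k) and M (movies x k) so that U @ M.T matches
    the observed cells of R; NaN cells are ignored in the loss."""
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(R)                    # True where a rating exists
    target = np.nan_to_num(R, nan=0.0)     # placeholder 0s are masked out below
    U = rng.normal(scale=0.1, size=(R.shape[0], k))
    M = rng.normal(scale=0.1, size=(R.shape[1], k))
    for _ in range(epochs):
        error = (U @ M.T - target) * mask  # zero wherever the rating is missing
        grad_U = error @ M                 # gradient of 0.5 * sum(error**2) w.r.t. U
        grad_M = error.T @ U               # ... and w.r.t. M
        U -= lr * grad_U
        M -= lr * grad_M
    return U, M

# Made-up utility matrix: rows = users, columns = movies, NaN = missing.
R = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, np.nan, 1.0, 1.0],
    [1.0, 1.0, 5.0, np.nan],
    [np.nan, 1.0, 4.0, 5.0],
])

U, M = factorize(R)
completed = U @ M.T                        # every cell now has a prediction
observed = ~np.isnan(R)
train_mse = float(np.mean((completed[observed] - R[observed]) ** 2))
```

The cells that were NaN in `R` are filled by the dot product of the learned user and movie vectors, which is the interpolation step described in the notes.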

SVD (singular value decomp) example
* similar to PCA
* M - movies in rows with latent variables columns
* U - users in rows with latent variables in columns
* same latent variables in U and M