# Demo of [svdRec](https://github.com/ZWMiller/svdRec), a Python3 module for Recommenders

The `ratings.csv` file we'll use has 20M rows and can take a while to process, so we'll work with only its first 100,000 lines. The following cell downloads the dataset and creates the smaller file using shell commands:

In [None]:
! mkdir -p ~/.surprise_data/ml-20m
! wget http://files.grouplens.org/datasets/movielens/ml-20m.zip -nc -P ~/.surprise_data/ml-20m
! unzip -n ~/.surprise_data/ml-20m/ml-20m.zip -d ~/.surprise_data/ml-20m
! head -n 100000 ~/.surprise_data/ml-20m/ml-20m/ratings.csv > ~/.surprise_data/ml-20m/ml-20m/ratings_small.csv

In [None]:
! pip install svdRec -U

Next we load the ratings into a sparse matrix and factorize it with SVD. After that, we'll load the `movies.csv` file into a DataFrame; it acts as a dictionary to translate between movie IDs and movie titles.

In [None]:
from svdRec import svdRec
import pandas as pd
from pathlib import Path

svd = svdRec.svdRec()
svd.load_csv_sparse(Path.home()/'.surprise_data/ml-20m/ml-20m/ratings_small.csv', delimiter=',', skiprows=1) # update this path
svd.SVD(num_dim=100) # we create 100 latent features

In [None]:
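# load movie info and convert it to a {movieId: title} dictionary
# header=0 replaces the file's own header row with our column names,
# so the header is not read as a data row and movieId stays numeric
movies = pd.read_csv(Path.home()/'.surprise_data/ml-20m/ml-20m/movies.csv', header=0, names=['movieId', 'Title', 'genres']) # update this path
movie_dict = dict(zip(movies['movieId'], movies['Title']))
svd.load_item_encoder(movie_dict)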

In [None]:
MOVIE_ID = 3114 # Toy Story 2
for item in svd.get_similar_items(MOVIE_ID,show_similarity=True):
    print(item)
    print(svd.get_item_name(item[0]),'\n')

As expected, Toy Story 2 is most similar to itself. The similarities come from the hidden dimensions we derived via *Collaborative Filtering*, that is, from using human ratings to infer movie themes. The other recommendations also make sense.
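
To make the idea of "hidden dimensions" concrete, here is a minimal sketch of item similarity in a latent space, using plain NumPy on a made-up ratings matrix. It is not taken from `svdRec`, whose internals may differ; the function names `item_factors` and `similar_items` and the toy data are assumptions for illustration only.

```python
import numpy as np

# Toy user x item ratings matrix (0 = not rated); values are made up.
ratings = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 2],
    [0, 1, 5, 4, 0],
    [1, 0, 4, 5, 3],
    [5, 3, 0, 0, 1],
], dtype=float)

def item_factors(matrix, num_dim):
    """Return one latent vector per item from a truncated SVD (hypothetical helper)."""
    # full_matrices=False gives U (users x k), s (k,), Vt (k x items)
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep the num_dim largest singular directions, scaled by their singular values.
    return (np.diag(s[:num_dim]) @ vt[:num_dim]).T

def similar_items(factors, item_idx, top_n=3):
    """Rank items by cosine similarity to item_idx in the latent space."""
    norms = np.linalg.norm(factors, axis=1)
    norms[norms == 0] = 1.0  # avoid dividing by zero for items with no signal
    unit = factors / norms[:, None]
    sims = unit @ unit[item_idx]
    order = np.argsort(-sims)[:top_n]
    return [(int(i), float(sims[i])) for i in order]

factors = item_factors(ratings, num_dim=2)
top = similar_items(factors, item_idx=0)
print(top)
```

The first entry returned is the query item itself with a cosine similarity of 1, which mirrors why Toy Story 2 tops its own list above.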

In [None]:
# find the top recommendations for user 25

USERID = 25
user_ratings = svd.mat.toarray()[USERID] # densify once instead of on every loop iteration
for item in svd.recommends_for_user(USERID, num_recom=5, show_similarity=False):
    print("ID: ", item)
    print("Actual Rating: ", user_ratings[item])
    print("Title: ", svd.get_item_name(item), '\n')

Note that an "Actual Rating" of 0.0 means the user has not rated that movie. That is what we want here: a recommendation should be a movie the user hasn't rated yet. This also looks good!
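
One common way to produce such recommendations is to score every item with a low-rank reconstruction of the ratings matrix and then exclude items the user has already rated. The sketch below assumes that approach; it is not necessarily how `svdRec.recommends_for_user` works, and `recommend_unrated` and the toy data are hypothetical.

```python
import numpy as np

# Toy user x item ratings matrix (0 = not rated); values are made up.
ratings = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 2],
    [0, 1, 5, 4, 0],
    [1, 0, 4, 5, 3],
    [5, 3, 0, 0, 1],
], dtype=float)

def recommend_unrated(matrix, user_idx, num_dim=2, num_recom=2):
    """Recommend unrated items for one user from a rank-num_dim reconstruction."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Predicted scores for this user: their latent vector times the item factors.
    scores = (u[user_idx, :num_dim] * s[:num_dim]) @ vt[:num_dim]
    # Mask out items the user has already rated so they are never recommended.
    scores[matrix[user_idx] > 0] = -np.inf
    order = np.argsort(-scores)[:num_recom]
    return [int(i) for i in order if np.isfinite(scores[i])]

recs = recommend_unrated(ratings, user_idx=0)
print(recs)
```

Every returned index points at a zero in that user's row, which is the same property the "Actual Rating: 0.0" lines confirm above.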

In [None]:
# find the user most similar to user 3
# and generate a list of movie IDs for user 3 to watch

user_to_rec = 3
print(f"Items for User {user_to_rec} to check out based on similar user:\n", svd.recs_from_closest_user(user_to_rec))
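
The nearest-user idea can be sketched as follows, assuming cosine similarity on raw rating rows and a rating threshold for what counts as "liked". Both choices, along with the name `recs_from_nearest_user` and the toy data, are assumptions for illustration; `svdRec.recs_from_closest_user` may measure similarity differently (for example, in the latent space).

```python
import numpy as np

# Toy user x item ratings matrix (0 = not rated); values are made up.
ratings = np.array([
    [5, 4, 0, 0, 0],
    [5, 5, 0, 4, 0],
    [0, 0, 5, 4, 5],
    [1, 0, 4, 5, 0],
], dtype=float)

def recs_from_nearest_user(matrix, user_idx, min_rating=4.0):
    """Find the most similar other user and return items they liked that user_idx has not rated."""
    norms = np.linalg.norm(matrix, axis=1)
    norms[norms == 0] = 1.0  # avoid dividing by zero for users with no ratings
    unit = matrix / norms[:, None]
    sims = unit @ unit[user_idx]
    sims[user_idx] = -np.inf  # never match a user with themselves
    neighbor = int(np.argmax(sims))
    # Items the neighbor rated highly that the target user has not rated.
    mask = (matrix[neighbor] >= min_rating) & (matrix[user_idx] == 0)
    return neighbor, [int(i) for i in np.flatnonzero(mask)]

neighbor, recs = recs_from_nearest_user(ratings, user_idx=0)
print(neighbor, recs)
```

In this toy data, user 0 shares high ratings on the first two items with user 1, so user 1 is the nearest neighbor and their other highly rated item becomes the suggestion.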

## Discussion:

- How could we recommend solely using Content-Based methods?
- Are there issues with this approach?
- What would a hybrid setup look like?
- What are the pros and cons of a hybrid approach over just Collaborative Filtering?