## Demo of svdRec, a Python3 module for Recommenders

Download the MovieLens 20M dataset [here](http://files.grouplens.org/datasets/movielens/ml-20m.zip).

**Before loading the data, I highly recommend trimming the `ratings.csv` file. It is 20M rows long and can take a long time to process. In bash you can do something like:**

`head -n 100000 ratings.csv > ratings_small.csv`
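
If bash isn't available, the same trimming can be done from Python. This is a minimal sketch using only the standard library; `head_csv` is a hypothetical helper of mine (not part of svdRec), and the paths in the example call are assumptions about where you unzipped the archive. Like the bash command, it keeps the header line as one of the copied lines.

```python
from itertools import islice


def head_csv(src_path, dst_path, n_lines=100_000):
    """Copy the first n_lines lines (header included) of src_path to dst_path."""
    # newline="" keeps the original line endings untouched on both sides.
    with open(src_path, "r", encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        # islice stops after n_lines, so the 20M-row file is never read in full.
        dst.writelines(islice(src, n_lines))


# Example call (assumed paths):
# head_csv("data/ml-20m/ratings.csv", "data/ml-20m/ratings_small.csv")
```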

We'll also load the movies.csv file into a DataFrame. It acts as a lookup table for translating movie IDs into movie titles.

In [1]:
from svdRec import svdRec
import pandas as pd 

svd = svdRec.svdRec()
svd.load_csv_sparse('data/ml-20m/ratings_small.csv', delimiter=',', skiprows=1)
svd.SVD(num_dim=100)

Note: load_csv_sparse expects a csv in the format of: rowID, colID, Value, ...
Created matrix of shape:  (702, 128594)
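
The shape reported above, together with the way later cells index `svd.mat` by raw user and movie ID, suggests that the IDs are used directly as matrix indices. Here is a rough sketch of that idea on toy data, using `scipy.sparse` and `scipy.sparse.linalg.svds`. It is my own illustration under that assumption, not svdRec's actual implementation, and the toy triples are made up.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Toy (rowID, colID, value) triples standing in for (userId, movieId, rating).
rows = np.array([0, 0, 1, 1, 2, 2, 3])
cols = np.array([0, 1, 0, 2, 1, 3, 3])
vals = np.array([5.0, 3.0, 4.0, 1.0, 2.0, 5.0, 4.0])

# Assumption: raw IDs double as indices, so the shape is (max row ID + 1, max col ID + 1).
mat = csr_matrix((vals, (rows, cols)), shape=(rows.max() + 1, cols.max() + 1))

# Truncated SVD: keep only k singular triplets (k must be smaller than min(mat.shape)).
k = 2
U, s, Vt = svds(mat, k=k)

print(mat.shape)                    # (4, 4)
print(U.shape, s.shape, Vt.shape)   # (4, 2) (2,) (2, 4)
```

With IDs as indices, any ID that never appears in the file simply becomes an empty row or column, which a sparse format stores at no extra cost.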


In [2]:
# movies.csv is comma-separated; name the columns explicitly.
movies = pd.read_csv('data/ml-20m/movies.csv', sep=',', names=['movieId', 'Title', 'genres'])
# Build a {movieId: title} lookup for the recommender.
movie_dict = dict(zip(movies['movieId'], movies['Title']))
svd.load_item_encoder(movie_dict)

In [3]:
MOVIE_ID = 3114 # Toy Story 2
for item in svd.get_similar_items(MOVIE_ID,show_similarity=True):
    print(item)
    print(svd.get_item_name(item[0]),'\n')

(3114, 0.12452773823527524)
Toy Story 2 (1999) 

(1, 0.096984857294616089)
Toy Story (1995) 

(2355, 0.043104443630875205)
Bug's Life, A (1998) 

(2987, 0.041949127538017023)
Who Framed Roger Rabbit? (1988) 

(6377, 0.040522854363774369)
Finding Nemo (2003) 
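
To give a feel for how "similar items" can fall out of an SVD, here is a small sketch that ranks items by cosine similarity between their latent vectors. Cosine similarity is my choice for illustration: the scores printed above are clearly on a different scale (Toy Story 2 scores about 0.12 against itself, not 1.0), so svdRec evidently computes its similarity differently. `similar_items` and the toy `item_factors` array are hypothetical.

```python
import numpy as np


def similar_items(item_factors, item_id, top_n=5):
    """Return (item index, cosine similarity) pairs, most similar first."""
    # Scale every item vector to unit length; all-zero rows are left as zeros.
    norms = np.linalg.norm(item_factors, axis=1, keepdims=True)
    unit = item_factors / np.where(norms == 0, 1.0, norms)
    # Dot products of unit vectors are cosine similarities.
    scores = unit @ unit[item_id]
    best = np.argsort(-scores)[:top_n]
    return [(int(i), float(scores[i])) for i in best]


# Hypothetical latent vectors, one row per item (e.g. rows of Vt.T from an SVD).
item_factors = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
])

for pair in similar_items(item_factors, item_id=0, top_n=3):
    print(pair)
```

As in the output above, the query item itself comes back first, since nothing is more similar to an item than the item itself.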



In [6]:
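USERID = 25
# Densify once up front; calling toarray() inside the loop rebuilds the full matrix on every iteration.
ratings_dense = svd.mat.toarray()
for item in svd.recommends_for_user(USERID, num_recom=5, show_similarity=False):
    print("ID: ", item)
    # A value of 0.0 means the user has no stored rating for this item.
    print("Actual Rating: ", ratings_dense[USERID][item])
    print("Title: ",svd.get_item_name(item),'\n')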

ID:  356
Actual Rating:  5.0
Title:  Forrest Gump (1994) 

ID:  1961
Actual Rating:  0.0
Title:  Rain Man (1988) 

ID:  1270
Actual Rating:  0.0
Title:  Back to the Future (1985) 

ID:  1097
Actual Rating:  0.0
Title:  E.T. the Extra-Terrestrial (1982) 

ID:  1307
Actual Rating:  0.0
Title:  When Harry Met Sally... (1989) 
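
For intuition, one common way to get per-user recommendations out of an SVD is to rebuild the user's row from the top few singular vectors and rank items by the reconstructed score. The sketch below does that with plain NumPy on a made-up dense matrix. `recommend_for_user` and its parameters are hypothetical and not svdRec's API. Note that Forrest Gump shows up above with an actual rating of 5.0, so svdRec does not appear to filter out items the user has already rated; the `skip_rated` flag here shows how such a filter could look.

```python
import numpy as np


def recommend_for_user(ratings, user_id, num_dim=2, num_recom=2, skip_rated=False):
    """Rank items for one user by their score in a rank-num_dim reconstruction."""
    # np.linalg.svd returns singular values in descending order.
    U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
    # Rebuild only this user's row from the num_dim largest singular triplets.
    approx_row = (U[user_id, :num_dim] * s[:num_dim]) @ Vt[:num_dim]
    if skip_rated:
        # Push items the user already rated to the bottom of the ranking.
        approx_row = np.where(ratings[user_id] > 0, -np.inf, approx_row)
    return [int(i) for i in np.argsort(-approx_row)[:num_recom]]


# Made-up ratings: rows are users, columns are items, 0.0 means "not rated".
ratings = np.array([
    [5.0, 4.0, 0.0, 0.0],
    [5.0, 5.0, 4.0, 0.0],
    [0.0, 0.0, 4.0, 5.0],
    [4.0, 5.0, 5.0, 0.0],
])

print(recommend_for_user(ratings, user_id=0))
print(recommend_for_user(ratings, user_id=0, skip_rated=True))
```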



In [9]:
user_to_rec = 3
print("Items for User %s to check out based on similar user:\n"% user_to_rec, svd.recs_from_closest_user(user_to_rec,num_users=2))

User #3's most similar user is User #[282, 488] 
Items for User 3 to check out based on similar user:
 [0, 1, 5, 4104, 9, 2057, 6153, 6154, 15, 16, 18, 20, 24, 28, 31, 33, 2081, 35, 38, 46, 49, 57, 2107, 61, 4160, 4161, 2114, 6217, 86, 2141, 94, 2151, 30824, 109, 110, 2158, 116, 6265, 2173, 4224, 4231, 6280, 2187, 140, 139, 4239, 149, 4245, 4247, 152, 6293, 6296, 156, 160, 164, 8359, 8360, 171, 172, 8372, 2230, 184, 6332, 8382, 193, 197, 4298, 207, 4305, 4307, 2268, 6364, 6366, 2272, 4320, 6372, 230, 231, 4328, 4326, 234, 6376, 240, 241, 2290, 2288, 4339, 2293, 246, 4342, 4343, 6385, 6391, 2299, 2301, 255, 2305, 259, 261, 2312, 4360, 6409, 2317, 4366, 271, 2320, 4367, 4368, 2323, 276, 4369, 6415, 2328, 6428, 2334, 287, 2336, 289, 4385, 291, 294, 295, 299, 2352, 306, 307, 2354, 4405, 315, 317, 318, 328, 2383, 336, 338, 2389, 343, 2392, 347, 348, 349, 2395, 4445, 4446, 2401, 354, 355, 356, 2405, 6497, 6502, 360, 361, 363, 366, 369, 373, 376, 2426, 379, 6533, 4488, 6536, 6538, 6549, 2454,