## Demo of svdRec, a Python 3 module for recommender systems

Download the MovieLens 20M dataset [here](http://files.grouplens.org/datasets/movielens/ml-20m.zip).

**Before loading the data, I highly recommend truncating `ratings.csv`. It has about 20 million rows and can take a long time to process. In bash you can do something like:**

`head -n 100000 ratings.csv > ratings_small.csv`

Run this inside the unzipped `ml-20m` folder; the code below reads the result from `data/ml-20m/ratings_small.csv`.
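
If bash isn't handy (on Windows, for example), the same truncation can be done in Python. This is a minimal sketch; `head_lines` is a hypothetical helper, not part of svdRec. Like `head`, it keeps the header row as the first of the n lines, so the file still needs `skiprows=1` when loaded.

```python
from itertools import islice


def head_lines(src_path, dst_path, n=100_000):
    """Copy the first n lines of src_path to dst_path, like `head -n` (hypothetical helper)."""
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        # islice stops after n lines, so the rest of the large file is never read
        dst.writelines(islice(src, n))


# Example call (paths assume the dataset was unzipped into data/ml-20m):
# head_lines("data/ml-20m/ratings.csv", "data/ml-20m/ratings_small.csv")
```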

Next, we load the truncated ratings into a sparse matrix and compute its SVD with 100 dimensions. We'll also load `movies.csv` into a pandas DataFrame, which acts as a lookup table for translating movie IDs into titles.

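```python
import svdRec
import pandas as pd

svd = svdRec.svdRec()
# Build a sparse matrix from the truncated ratings file, skipping the CSV header row
svd.load_csv_sparse('data/ml-20m/ratings_small.csv', delimiter=',', skiprows=1)
# Compute the SVD, keeping 100 dimensions
svd.SVD(num_dim=100)
```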

```
Note: load_csv_sparse expects a csv in the format of: rowID, colID, Value, ...
Created matrix of shape:  (702, 128594)
```
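
The demo doesn't show how `SVD(num_dim=100)` works internally, so the following is an assumption: a common approach is a truncated SVD that keeps only the `num_dim` largest singular values, approximating the ratings matrix as `U @ diag(s) @ Vt`. Below is a minimal sketch using SciPy's `svds` on a toy matrix; `truncated_svd` is a hypothetical helper, and unrated cells are simply treated as zeros, as they are in a sparse matrix.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds


def truncated_svd(ratings, k):
    """Rank-k SVD of a sparse user x item matrix, so that ratings ~ U @ diag(s) @ Vt.

    Hypothetical helper: missing ratings are plain zeros (no mean-centering).
    """
    # svds computes only the k largest singular triplets; it requires 0 < k < min(shape)
    U, s, Vt = svds(ratings.astype(np.float64), k=k)
    # svds does not promise an ordering, so sort from largest to smallest singular value
    order = np.argsort(s)[::-1]
    return U[:, order], s[order], Vt[order, :]


# Toy data: 4 users x 5 movies, 0 = not rated
ratings = csr_matrix(np.array([
    [5.0, 4.0, 0.0, 1.0, 0.0],
    [4.0, 5.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 5.0, 4.0, 0.0],
    [1.0, 0.0, 4.0, 5.0, 3.0],
]))
U, s, Vt = truncated_svd(ratings, k=2)
approx = U @ np.diag(s) @ Vt  # dense rank-2 reconstruction, shape (4, 5)
```

On the real data, `k` would play the role of `num_dim`; it must stay below the smaller matrix dimension (702 users here).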


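```python
# movies.csv has its own header row (movieId,title,genres); header=0 replaces it with these names
movies = pd.read_csv('data/ml-20m/movies.csv', header=0, names=['movieId', 'Title', 'genres'])
# Map each movieId to its title and register the mapping with the recommender
movie_dict = dict(zip(movies['movieId'], movies['Title']))
svd.load_item_encoder(movie_dict)
```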

```python
MOVIE_ID = 3114  # Toy Story 2
# With show_similarity=True, each result is a (movieId, similarity) tuple
for item in svd.get_similar_items(MOVIE_ID, show_similarity=True):
    print(item)
    print(svd.get_item_name(item[0]), '\n')
```

```
(3114, 0.12452773823527534)
Toy Story 2 (1999)

(1, 0.096984857294616075)
Toy Story (1995)

(2355, 0.043104443630874914)
Bug's Life, A (1998)

(2987, 0.041949127538016995)
Who Framed Roger Rabbit? (1988)

(6377, 0.040522854363774119)
Finding Nemo (2003)
```
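
A common way to find similar items after a truncated SVD is cosine similarity between item vectors in the latent space. svdRec evidently scores differently, since Toy Story 2 matches itself with about 0.125 rather than 1.0, so treat this as a swapped-in technique, not the module's method. In this sketch, `similar_items` is hypothetical, `Vt` and `s` are assumed to be the item factors and singular values of a truncated SVD, and scaling the item vectors by `s` is also an assumption.

```python
import numpy as np


def similar_items(Vt, s, item, n=5):
    """Top-n (item, cosine similarity) pairs for `item`, including the item itself."""
    # Column j of Vt, scaled by the singular values, is item j's position in latent space
    items = (Vt * s[:, None]).T  # shape (num_items, k)
    norms = np.linalg.norm(items, axis=1)
    norms[norms == 0] = 1.0  # avoid dividing by zero for items with an all-zero vector
    unit = items / norms[:, None]
    sims = unit @ unit[item]  # cosine similarity of every item to the query item
    top = np.argsort(sims)[::-1][:n]
    return [(int(i), float(sims[i])) for i in top]


# Toy factors: 2 latent dimensions, 4 items
s = np.array([2.0, 1.0])
Vt = np.array([[1.0, 0.9, 0.0, 0.1],
               [0.0, 0.1, 1.0, 0.9]])
top = similar_items(Vt, s, item=0, n=3)  # item 0 itself first, then item 1, then item 3
```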



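```python
USERID = 25
# Convert once to CSR so single entries can be read without densifying the whole matrix
ratings = svd.mat.tocsr()
for item in svd.recommends_for_user(USERID, num_recom=5, show_similarity=False):
    print("ID: ", item)
    # 0.0 means the user has not rated this movie (MovieLens ratings start at 0.5)
    print("Actual Rating: ", ratings[USERID, item])
    print("Title: ", svd.get_item_name(item), '\n')
```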

```
ID:  356
Actual Rating:  5.0
Title:  Forrest Gump (1994)

ID:  1961
Actual Rating:  0.0
Title:  Rain Man (1988)

ID:  1270
Actual Rating:  0.0
Title:  Back to the Future (1985)

ID:  1097
Actual Rating:  0.0
Title:  E.T. the Extra-Terrestrial (1982)

ID:  1307
Actual Rating:  0.0
Title:  When Harry Met Sally... (1989)
```
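
In the output above, Forrest Gump already has a 5.0 rating from user 25, while the 0.0 entries mark movies the user hasn't rated (MovieLens ratings start at 0.5), so `recommends_for_user` apparently doesn't exclude movies the user has already seen. Below is a minimal sketch of ranking only unrated movies, assuming factors `U`, `s`, `Vt` from a truncated SVD of the user x item matrix; `recommend_unrated` is a hypothetical helper, not part of svdRec.

```python
import numpy as np
from scipy.sparse import csr_matrix


def recommend_unrated(ratings, U, s, Vt, user, n=5):
    """Top-n items `user` has not rated, ranked by the reconstruction U @ diag(s) @ Vt."""
    ratings = csr_matrix(ratings)
    # Predicted score for every item: one row of U @ diag(s) @ Vt
    scores = (U[user] * s) @ Vt
    # Column indices of the items this user has a stored rating for (CSR row slice)
    rated = ratings.indices[ratings.indptr[user]:ratings.indptr[user + 1]]
    scores[rated] = -np.inf  # never recommend something already rated
    top = np.argsort(scores)[::-1][:n]
    return [int(i) for i in top if np.isfinite(scores[i])]


# Toy example: 2 users x 4 items, a single latent dimension
ratings = csr_matrix(np.array([[5.0, 0.0, 4.0, 0.0],
                               [0.0, 3.0, 0.0, 0.0]]))
U = np.array([[1.0], [0.5]])
s = np.array([1.0])
Vt = np.array([[0.9, 0.1, 0.8, 0.5]])
recs = recommend_unrated(ratings, U, s, Vt, user=0, n=2)  # → [3, 1]; items 0 and 2 are skipped
```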



```python
user_to_rec = 3
# recs_from_closest_user prints the closest user it found and returns a list of movie IDs
print("Items for User %s to check out based on similar user:\n" % user_to_rec,
      svd.recs_from_closest_user(user_to_rec))
```

User #3's most similar user is User #282 
Items for User 3 to check out based on similar user:
 [5, 9, 15, 24, 28, 31, 33, 35, 46, 49, 94, 109, 110, 140, 149, 164, 171, 184, 193, 197, 207, 230, 231, 234, 246, 259, 287, 289, 295, 299, 306, 307, 315, 317, 328, 336, 343, 348, 355, 356, 379, 433, 434, 453, 456, 467, 470, 479, 480, 491, 499, 506, 519, 526, 538, 540, 550, 579, 588, 589, 591, 592, 596, 607, 647, 664, 713, 719, 732, 735, 777, 779, 783, 785, 834, 857, 903, 911, 912, 922, 923, 952, 967, 1035, 1041, 1046, 1078, 1079, 1088, 1089, 1093, 1096, 1119, 1126, 1128, 1134, 1135, 1172, 1174, 1175, 1185, 1188, 1192, 1195, 1197, 1199, 1200, 1205, 1207, 1209, 1213, 1214, 1215, 1218, 1220, 1221, 1222, 1224, 1229, 1233, 1239, 1240, 1244, 1245, 1246, 1255, 1257, 1258, 1259, 1260, 1262, 1269, 1274, 1275, 1280, 1287, 1290, 1297, 1319, 1345, 1347, 1349, 1353, 1369, 1386, 1392, 1393, 1465, 1484, 1499, 1512, 1526, 1543, 1572, 1579, 1586, 1616, 1640, 1672, 1679, 1689, 1692, 1700, 1703, 1720, 1728, 173