# Recommender Systems with Surprise
- **Created by Andrés Segura Tinoco**
- **Created on May 23, 2019**

## Experiment description
- Model built from a plain text file
- Algorithm: KNNBasic (user-based by default, MSD similarity, k = 50)
- Model evaluated with 5-fold cross-validation
- Model error estimated with the RMSE and MAE metrics
- Type of filtering: collaborative

In [1]:
# Load Python standard libraries
import os
import io

In [2]:
# Load Surprise libraries
from surprise import KNNBasic
from surprise import Reader
from surprise import Dataset
from surprise.model_selection import cross_validate

## 1. Loading data

In [3]:
# Path to dataset file
file_path = os.path.expanduser('../data/u.data')

In [4]:
# Read the data into a Surprise dataset
reader = Reader(line_format = 'user item rating timestamp', sep = '\t', rating_scale = (1, 5))
data = Dataset.load_from_file(file_path, reader = reader)
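The `Reader` above tells Surprise that each line holds four tab-separated fields: user, item, rating and timestamp. As a rough illustration (not Surprise's own parser), the sketch below parses lines in that layout into tuples. Raw user and item ids stay as strings, which is why the predictions later in this notebook pass ids such as `'13'` rather than `13`. The function name `parse_ratings` and the sample lines are illustrative assumptions.

```python
def parse_ratings(lines, sep='\t'):
    """Parse 'user item rating timestamp' lines into (user, item, rating, timestamp) tuples.

    Raw ids and timestamps are kept as strings; ratings are converted to floats.
    """
    ratings = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, item, rating, timestamp = line.split(sep)
        ratings.append((user, item, float(rating), timestamp))
    return ratings


# Illustrative lines in the u.data layout
sample = ['196\t242\t3\t881250949\n', '186\t302\t3\t891717742\n']
print(parse_ratings(sample))
```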

## 2. Train the model and measure its error

In [5]:
# Basic k-NN collaborative filtering: each estimate uses at most k = 50 neighbors
kk = 50
algo = KNNBasic(k = kk, verbose = True)
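KNNBasic predicts a rating as a similarity-weighted average of neighbors' ratings. Per the Surprise documentation, in the default user-based form the estimate is r̂_ui = Σ sim(u, v) · r_vi / Σ sim(u, v), taken over the k most similar users v who rated item i, and the MSD similarity reported in the cross-validation output is 1 / (msd(u, v) + 1). The pure-Python sketch below illustrates that computation on a small made-up ratings dictionary. It is not Surprise's implementation; `msd_sim` and `knn_basic_predict` are hypothetical names, and returning `None` when fewer than `min_k` neighbors contribute stands in for Surprise's "prediction impossible" case.

```python
import heapq


def msd_sim(ratings_u, ratings_v):
    """MSD similarity 1 / (msd + 1) over the items both users rated; 0 if none."""
    common = ratings_u.keys() & ratings_v.keys()
    if not common:
        return 0.0  # no co-rated items: the pair contributes nothing
    msd = sum((ratings_u[i] - ratings_v[i]) ** 2 for i in common) / len(common)
    return 1.0 / (msd + 1.0)


def knn_basic_predict(ratings, user, item, k=50, min_k=1):
    """Similarity-weighted mean of the ratings given to item by the k most similar users."""
    # Candidate neighbors: every other user who rated the item
    candidates = [
        (msd_sim(ratings[user], r_v), r_v[item])
        for v, r_v in ratings.items()
        if v != user and item in r_v
    ]
    # Keep the k candidates with the highest similarity
    k_neighbors = heapq.nlargest(k, candidates, key=lambda t: t[0])
    sum_sim = sum_ratings = 0.0
    actual_k = 0
    for sim, r in k_neighbors:
        if sim > 0:  # only positive similarities contribute
            sum_sim += sim
            sum_ratings += sim * r
            actual_k += 1
    if actual_k < min_k:
        return None, actual_k  # not enough neighbors: prediction impossible
    return sum_ratings / sum_sim, actual_k


# Made-up ratings: user -> {item: rating}
ratings = {
    'alice': {'m1': 5, 'm2': 3, 'm3': 4},
    'bob': {'m1': 5, 'm2': 3},
    'carol': {'m1': 1, 'm2': 5, 'm3': 2},
}
est, actual_k = knn_basic_predict(ratings, 'bob', 'm3')
print(round(est, 3), actual_k)  # 3.833 2
```

Bob agrees perfectly with Alice on the co-rated items (similarity 1) and disagrees strongly with Carol (similarity 1/11), so the estimate lands close to Alice's rating of 4.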

In [6]:
# Run 5-fold cross-validation and print results
cv = cross_validate(algo, data, measures = ['RMSE', 'MAE'], cv = 5, verbose = True)

Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Evaluating RMSE, MAE of algorithm KNNBasic on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9797  0.9879  0.9828  0.9768  0.9810  0.9816  0.0037  
MAE (testset)     0.7737  0.7785  0.7783  0.7707  0.7777  0.7758  0.0031  
Fit time          0.68    0.67    0.67    0.85    0.82    0.74    0.08    
Test time         4.59    5.29    4.98    5.43    5.68    5.19    0.38    
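`cross_validate` splits the ratings into 5 folds, fits on four and tests on the remaining one, rotating through all five; the table reports RMSE and MAE per fold plus their mean and standard deviation. The sketch below shows a generic shuffled k-fold split and the two error metrics in plain Python. It is similar in spirit to Surprise's `KFold` but is not its code, and the function names are illustrative. Because RMSE squares errors before averaging, it is always at least as large as MAE and penalizes large misses more, consistent with the RMSE row (about 0.98) sitting above the MAE row (about 0.78).

```python
import math
import random


def k_fold_splits(ratings, n_splits=5, seed=0):
    """Shuffle the ratings and yield one (trainset, testset) pair per fold."""
    ratings = list(ratings)
    random.Random(seed).shuffle(ratings)
    n = len(ratings)
    # The first n % n_splits folds receive one extra rating
    fold_sizes = [n // n_splits + (1 if f < n % n_splits else 0) for f in range(n_splits)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        yield ratings[:start] + ratings[stop:], ratings[start:stop]
        start = stop


def rmse(true, est):
    """Root mean squared error."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(true, est)) / len(true))


def mae(true, est):
    """Mean absolute error."""
    return sum(abs(t - e) for t, e in zip(true, est)) / len(true)


print([len(test) for _, test in k_fold_splits(range(12))])  # [3, 3, 2, 2, 2]
print(round(rmse([4, 3, 5], [3.5, 3, 4]), 4), mae([4, 3, 5], [3.5, 3, 4]))  # 0.6455 0.5
```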


## 3. Make predictions

In [7]:
# Predict a rating without supplying the true rating (r_ui)
p1 = algo.predict(uid = '13', iid = '181', verbose = True)

user: 13         item: 181        r_ui = None   est = 4.30   {'actual_k': 50, 'was_impossible': False}


In [8]:
# Predict a rating and supply the true rating r_ui for comparison
p2 = algo.predict(uid = '196', iid = '302', r_ui = 4, verbose = True)

user: 196        item: 302        r_ui = 4.00   est = 4.21   {'actual_k': 50, 'was_impossible': False}
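In both outputs, `was_impossible: False` means the neighborhood estimate was used directly. The sketch below mimics, in plain Python, the bookkeeping that Surprise's `predict` performs around an estimate: if the estimator cannot produce a value, the prediction falls back to a default (the training set's global mean, for KNNBasic), and by default the result is clipped to the rating scale. The helper `finalize_prediction` is hypothetical, `ValueError` stands in for Surprise's `PredictionImpossible` exception, and 3.5 is an illustrative global mean.

```python
def finalize_prediction(estimate_fn, global_mean, rating_scale=(1, 5), clip=True):
    """Run estimate_fn(); fall back to global_mean if it fails, then clip to rating_scale."""
    details = {}
    try:
        est = estimate_fn()
        details['was_impossible'] = False
    except ValueError as err:  # stand-in for surprise's PredictionImpossible
        est = global_mean
        details['was_impossible'] = True
        details['reason'] = str(err)
    if clip:
        low, high = rating_scale
        est = max(low, min(high, est))  # keep the estimate inside the rating scale
    return est, details


def unknown_user():
    raise ValueError('user is unknown')


print(finalize_prediction(lambda: 5.7, global_mean=3.5))  # (5, {'was_impossible': False})
print(finalize_prediction(unknown_user, global_mean=3.5))
```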


## 4. Get the k nearest neighbors of an item

In [9]:
# Read the u.item file of the MovieLens 100K dataset and return two mappings:
#    raw id -> movie name and movie name -> raw id
def read_item_names(file_path):
    rid_to_name = {}
    name_to_rid = {}
    
    with io.open(file_path, 'r', encoding = 'ISO-8859-1') as f:
        for line in f:
            line = line.split('|')
            rid_to_name[line[0]] = line[1]
            name_to_rid[line[1]] = line[0]
    
    return rid_to_name, name_to_rid

In [10]:
# Read the mappings raw id <-> movie name
item_filepath = '../data/u.item'
rid_to_name, name_to_rid = read_item_names(item_filepath)

In [11]:
# Retrieve the inner id of the movie Toy Story
toy_story_raw_id = name_to_rid['Toy Story (1995)']
toy_story_inner_id = algo.trainset.to_inner_iid(toy_story_raw_id)
print('Toy Story (1995):', toy_story_inner_id)

Toy Story (1995): 283
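Surprise works internally with inner ids: integers starting at 0 that index its similarity matrices, assigned as ratings are read into a trainset. Because the notebook never calls `fit` directly, the trainset attached to `algo` is the training split from the last fit performed during cross-validation, so the inner id of Toy Story (283 here) depends on that split, and `to_inner_iid` raises a `ValueError` for a raw id that is not part of it. The sketch below shows first-appearance id assignment in plain Python; it mirrors the idea behind Surprise's mappings but is not its code, and `build_id_maps` and `invert` are hypothetical names.

```python
def build_id_maps(raw_ratings):
    """Map raw user/item ids to contiguous inner ids (0, 1, 2, ...) in order of first appearance."""
    raw2inner_users, raw2inner_items = {}, {}
    for user, item, _rating in raw_ratings:
        if user not in raw2inner_users:
            raw2inner_users[user] = len(raw2inner_users)
        if item not in raw2inner_items:
            raw2inner_items[item] = len(raw2inner_items)
    return raw2inner_users, raw2inner_items


def invert(mapping):
    """Reverse an id mapping, e.g. inner id -> raw id (the role of to_raw_iid)."""
    return {inner: raw for raw, inner in mapping.items()}


# Illustrative (user, item, rating) triples with string raw ids
triples = [('196', '242', 3.0), ('186', '302', 3.0), ('196', '302', 4.0)]
users, items = build_id_maps(triples)
print(users)  # {'196': 0, '186': 1}
print(items)  # {'242': 0, '302': 1}
print(invert(items)[1])  # 302
```

User and item ids are numbered independently, so the same inner id (for example 0) can denote both a user and an item.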


In [12]:
# Retrieve the inner ids of the 10 nearest neighbors of Toy Story
# (KNNBasic is user-based by default, so get_neighbors treats this id as a user inner id)
toy_story_neighbors = algo.get_neighbors(toy_story_inner_id, k = 10)
toy_story_neighbors

[142, 166, 214, 254, 417, 458, 567, 597, 636, 641]

In [13]:
# The 10 nearest neighbors of Toy Story are:
for inner_id in toy_story_neighbors:
    raw_id = algo.trainset.to_raw_iid(inner_id)
    movie = rid_to_name[raw_id]
    print(raw_id, '-', movie)

346 - Jackie Brown (1997)
416 - Old Yeller (1957)
90 - So I Married an Axe Murderer (1993)
298 - Face/Off (1997)
8 - Babe (1995)
193 - Right Stuff, The (1983)
485 - My Fair Lady (1964)
223 - Sling Blade (1996)
630 - Great Race, The (1965)
99 - Snow White and the Seven Dwarfs (1937)
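`KNNBasic` is user-based by default (`sim_options = {'user_based': True}`), so `algo.get_neighbors` interprets its argument as a user inner id and returns user inner ids; passing those to `to_raw_iid` attaches movie titles to what are actually users. To obtain item neighbors in Surprise, one option is to fit a separate item-based model, e.g. `KNNBasic(k = kk, sim_options = {'user_based': False})` followed by `fit(algo.trainset)`, and call `get_neighbors` on it. The plain-Python sketch below illustrates the item-based idea on made-up data: it computes MSD similarity between items' rating vectors (columns instead of rows) and returns the k most similar items. `item_neighbors` is a hypothetical helper, not a Surprise function.

```python
def msd_sim(a, b):
    """MSD similarity 1 / (msd + 1) over keys present in both dicts; 0 if none are shared."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    msd = sum((a[x] - b[x]) ** 2 for x in common) / len(common)
    return 1.0 / (msd + 1.0)


def item_neighbors(user_ratings, item, k=10):
    """Return the k items whose rating vectors (user -> rating) are most similar to item's."""
    # Transpose user -> {item: rating} into item -> {user: rating}
    by_item = {}
    for user, items in user_ratings.items():
        for it, r in items.items():
            by_item.setdefault(it, {})[user] = r
    scores = [(msd_sim(by_item[item], by_item[other]), other)
              for other in by_item if other != item]
    scores.sort(key=lambda t: t[0], reverse=True)  # stable sort: ties keep first-seen order
    return [other for _, other in scores[:k]]


# Made-up ratings: user -> {item: rating}
ratings = {
    'u1': {'toy': 5, 'babe': 5, 'faceoff': 1},
    'u2': {'toy': 4, 'babe': 4, 'faceoff': 2},
    'u3': {'toy': 1, 'babe': 2, 'faceoff': 5},
}
print(item_neighbors(ratings, 'toy', k=2))  # ['babe', 'faceoff']
```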

