# Recommendation engine
Mathematics and Methods in Machine Learning and Neural Networks<br>
Jori Nordlund, Simo Ojala and Esa Ryömä<br>
Helsinki Metropolia University of Applied Sciences<br>
12.03.2020

## Assignment

Load the public domain Anime dataset either from the original location (https://www.kaggle.com/CooperUnion/anime-recommendations-database/version/1) or from the Methods/Data/Anime folder in the course’s Oma workspace.
In either case, get acquainted with the data description at the website.
This assignment is exploratory in nature. Your task is to explore the applicability of scikit-surprise in building a recommendation engine for the Anime dataset.
The questions of interest include:
1. What kind of preprocessing is necessary for the ratings dataset?
2. How do the recommendation algorithms (e.g. KNN and SVD) perform with a data set of this magnitude? Do you encounter hardware limitations? If yes, how can you circumvent some of the limitations to be able to carry on with the experiment?
3. Can you combine the information in the two files in a meaningful way to have the recommender display the titles of the recommended movies?


## Imports

In [1]:
import pandas as pd
import numpy as np
import time
from collections import defaultdict, OrderedDict
from surprise import Reader
from surprise import KNNBasic
from surprise import Dataset

## Data

In [2]:
#df = pd.read_csv(r'http://users.metropolia.fi/~simooj/rating_sample.csv', sep=',')
df = pd.read_csv(r'http://users.metropolia.fi/~simooj/rating_lite.csv', sep=',')
df_animu = pd.read_csv(r'http://users.metropolia.fi/~simooj/anime.csv', sep=',')

df.head()



| | user_id | anime_id | rating |
|---|---|---|---|
| 0 | 1 | 20 | -1 |
| 1 | 1 | 24 | -1 |
| 2 | 1 | 79 | -1 |
| 3 | 1 | 226 | -1 |
| 4 | 1 | 241 | -1 |


In [3]:
df_animu.head()

| | anime_id | name | genre | type | episodes | rating | members |
|---|---|---|---|---|---|---|---|
| 0 | 32281 | Kimi no Na wa. | Drama, Romance, School, Supernatural | Movie | 1 | 9.37 | 200630 |
| 1 | 5114 | Fullmetal Alchemist: Brotherhood | Action, Adventure, Drama, Fantasy, Magic, Mili... | TV | 64 | 9.26 | 793665 |
| 2 | 28977 | Gintama° | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.25 | 114262 |
| 3 | 9253 | Steins;Gate | Sci-Fi, Thriller | TV | 24 | 9.17 | 673572 |
| 4 | 9969 | Gintama&#039; | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.16 | 151266 |


The dataset contains ratings of -1. A rating of -1 means that the user has watched the anime but has not rated it, so it carries no rating information for the recommender.
We removed all -1 ratings with the following line:

In [4]:
df = df.drop(df[df.rating == -1].index)
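The same filtering can also be written as a boolean mask, which skips the index lookup and makes it easy to report how many rows were dropped. A minimal sketch on a small, hypothetical DataFrame with the same columns as the ratings file (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical toy ratings with the same columns as rating.csv
ratings = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "anime_id": [20, 24, 79, 20, 24],
    "rating": [-1, 8, -1, 10, 7],
})

# -1 means "watched but not rated", so keep only explicit ratings 1..10
explicit = ratings[ratings["rating"] != -1].reset_index(drop=True)

dropped = len(ratings) - len(explicit)
print(f"dropped {dropped} unrated rows, kept {len(explicit)}")
```

Resetting the index is optional; it only makes the remaining rows numbered 0..n-1 again.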


In [5]:
df.tail()

| | user_id | anime_id | rating |
|---|---|---|---|
| 101109 | 1037 | 25649 | 10 |
| 101110 | 1037 | 25681 | 10 |
| 101111 | 1037 | 27899 | 7 |
| 101112 | 1037 | 28223 | 9 |
| 101113 | 1037 | 30240 | 10 |


In [6]:
df.shape

(81047, 3)

In [7]:
df.describe()

| | user_id | anime_id | rating |
|---|---|---|---|
| count | 81047.0 | 81047.0 | 81047.0 |
| mean | 537.515084 | 10846.818241 | 7.860106 |
| std | 289.439584 | 9083.065924 | 1.535608 |
| min | 1.0 | 1.0 | 1.0 |
| 25% | 294.0 | 2447.5 | 7.0 |
| 50% | 541.0 | 9731.0 | 8.0 |
| 75% | 783.0 | 16742.0 | 9.0 |
| max | 1037.0 | 34240.0 | 10.0 |


## Algorithm

In [8]:
# Construct reader
reader = Reader(rating_scale=(1, 10))

# Generate surprise Dataset
data = Dataset.load_from_df(df[['user_id', 'anime_id', 'rating']], reader)

In [9]:
%%time
# Set all data as training set
trainset = data.build_full_trainset()

# Build and train an algorithm.

sim_options = {
    'user_based': True,  # compute similarities between users
    'min_support': 7     # similarity is 0 for users sharing fewer than 7 rated anime
}

algo = KNNBasic(sim_options=sim_options)
algo.fit(trainset)

Computing the msd similarity matrix...
Done computing similarity matrix.
Wall time: 438 ms


<surprise.prediction_algorithms.knns.KNNBasic at 0x1eeb7a87a88>
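The log shows that `KNNBasic` used its default MSD (mean squared difference) similarity. A minimal pure-Python sketch of that similarity for two users, following the formula given in the Surprise documentation, `1 / (msd + 1)` over the commonly rated items, with `min_support` forcing the similarity to 0 for pairs that share too few items. The function name and the dict-based inputs are our own illustration, not Surprise's API; Surprise computes the whole matrix at once in compiled code.

```python
def msd_similarity(ratings_u, ratings_v, min_support=7):
    """Mean squared difference similarity between two users.

    ratings_u, ratings_v: dicts mapping anime_id -> rating (hypothetical format).
    Returns 1 / (msd + 1), where msd is the mean squared rating difference
    over the commonly rated items, or 0 when the users share fewer than
    min_support items.
    """
    common = ratings_u.keys() & ratings_v.keys()
    if len(common) < min_support:
        return 0.0
    msd = sum((ratings_u[i] - ratings_v[i]) ** 2 for i in common) / len(common)
    return 1.0 / (msd + 1.0)


# Hypothetical users: anime_id -> rating
u = {1: 8, 2: 9, 3: 7, 4: 10, 5: 6, 6: 8, 7: 9}
v = {**u, 8: 5}                       # same ratings on u's 7 anime, plus one more
w = {1: 7, 2: 9}                      # only 2 anime in common with u
x = {i: r - 1 for i, r in u.items()}  # every rating one point lower than u's

print(msd_similarity(u, v))  # 1.0: identical ratings on 7 common anime
print(msd_similarity(u, x))  # 0.5: every common rating differs by 1
print(msd_similarity(u, w))  # 0.0: below min_support
```

The `+ 1` in the denominator keeps the similarity finite when two users agree perfectly.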

## Predictions (recommendations)

In [10]:
# Predict the rating that user 1 would give to anime 11617
user_id = 1
anime_id = 11617

pred = algo.predict(user_id, anime_id, verbose=True)

user: 1          item: 11617      r_ui = None   est = 10.00   {'actual_k': 1, 'was_impossible': False}
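The estimate of exactly 10.00 comes with `actual_k = 1`, meaning only one neighbour contributed. As described in the Surprise documentation, `KNNBasic` predicts a similarity-weighted average of the ratings that the `k` most similar users gave to the item, so with a single neighbour the estimate is simply that neighbour's rating. A hedged pure-Python sketch of the aggregation step (the input format and function name are hypothetical; `k=40` and `min_k=1` match Surprise's defaults):

```python
import heapq


def knn_basic_estimate(neighbour_ratings, k=40, min_k=1):
    """Sketch of a user-based KNNBasic estimate for one (user, item) pair.

    neighbour_ratings: list of (similarity_to_user, rating_of_item) pairs,
    one per other user who rated the item.
    Returns (estimate, actual_k), or (None, actual_k) when fewer than min_k
    neighbours with positive similarity are available.
    """
    top_k = heapq.nlargest(k, neighbour_ratings, key=lambda t: t[0])
    sum_sim = sum_ratings = 0.0
    actual_k = 0
    for sim, rating in top_k:
        if sim > 0:  # neighbours with zero similarity carry no weight
            sum_sim += sim
            sum_ratings += sim * rating
            actual_k += 1
    if actual_k < min_k:
        return None, actual_k
    return sum_ratings / sum_sim, actual_k


# One similar neighbour rated the anime 10; the others have similarity 0
print(knn_basic_estimate([(0.5, 10), (0.0, 3), (0.0, 5)]))  # (10.0, 1)
# Two neighbours: (1.0 * 8 + 0.5 * 5) / (1.0 + 0.5)
print(knn_basic_estimate([(1.0, 8), (0.5, 5)]))  # (7.0, 2)
```

Where the sketch returns `None`, Surprise marks the prediction as impossible and falls back to a default estimate.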


In [11]:
%%time
testset = trainset.build_anti_testset()
predictions = algo.test(testset)

Wall time: 1min 57s
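The two minutes are spent mostly because the anti-testset pairs every user with every anime that user has not rated, so its size is roughly `n_users * n_items - n_ratings`, and the user-user similarity matrix built during `fit` grows with the square of the number of users. A small stdlib sketch of both estimates, plus a lazy per-user generator as one way to avoid holding all pairs in memory at once. The helper names are hypothetical, the 8-byte float size is an assumption, and the user/item counts in the first example are illustrative only:

```python
def anti_testset_size(n_users, n_items, n_ratings):
    """Number of (user, item) pairs in an anti-testset:
    every user paired with every item they have not rated."""
    return n_users * n_items - n_ratings


def similarity_matrix_bytes(n, bytes_per_value=8):
    """Memory for a dense n x n similarity matrix, assuming 8-byte floats."""
    return n * n * bytes_per_value


def iter_anti_testset(user_items, all_items, fill):
    """Lazily yield (user, item, fill) for unrated pairs, one user at a time,
    instead of materialising the whole list in memory.

    user_items: dict user -> set of items that user has rated.
    """
    for user, rated in user_items.items():
        for item in all_items:
            if item not in rated:
                yield user, item, fill


# Hypothetical sizes, for illustration only
print(anti_testset_size(n_users=1000, n_items=5000, n_ratings=80_000))  # 4920000
# At most 1,037 users in the reduced file: about 8.6 MB of similarities
print(similarity_matrix_bytes(1037))  # 8602952
# Ten times more users means a hundred times more memory
print(similarity_matrix_bytes(10_370) / similarity_matrix_bytes(1037))  # 100.0
```

The quadratic growth of the similarity matrix is one reason the full ratings file is much harder to handle than the reduced one.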


In [12]:
# This function is adapted from the Surprise documentation at
# http://surprise.readthedocs.io/en/stable/FAQ.html#how-to-get-the-top-n-recommendations-for-each-user

def get_top_n(predictions, n=3): # n = 3 so we get top 3 recommendations for each user
    '''Return the top-N recommendations for each user from a set of predictions.

    Args:
        predictions(list of Prediction objects): The list of predictions, as
            returned by the test method of an algorithm.
        n(int): The number of recommendations to output for each user. Default
            is 3.

    Returns:
    A dict where keys are user (raw) ids and values are lists of tuples:
        [(raw item id, rating estimation), ...] of size n.
    '''

    # First map the predictions to each user.
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))

    # Then sort the predictions for each user and retrieve the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n

top_n = get_top_n(predictions)

# Map anime_id -> title once, instead of filtering df_animu for every lookup
id_to_name = dict(zip(df_animu['anime_id'], df_animu['name']))

# counts[title] = number of users who got this title in their top 3
counts = defaultdict(int)

# Print the recommended titles for each user
for uid, user_ratings in top_n.items():
    recName = []
    for animu, _ in user_ratings:
        # Fall back to the raw id if the anime is missing from anime.csv
        name = id_to_name.get(animu, str(animu))
        recName.append(name)
        counts[name] += 1

    print(uid, recName) # Prints the top 3 recommendations for each user

1 ['Kuroko no Basket', 'Naruto', 'Shaman King']
2 ['Highschool of the Dead', 'High School DxD', 'Sword Art Online']
3 ['Muumindani no Suisei', 'Oyakodon: Oppai Tokumori Bonyuu Tsuyudaku de', 'Ninja Senshi Tobikage']
5 ['Muumindani no Suisei', 'Yoroiden Samurai Troopers', 'Mahou Shoujo Pretty Sammy (1996)']
7 ['Muumindani no Suisei', 'Yoroiden Samurai Troopers', 'Mahou Shoujo Pretty Sammy (1996)']
8 ['There She Is!!', 'New Prince of Tennis Specials', 'Kinnikuman II Sei']
9 ['Highschool of the Dead', 'High School DxD', 'Sword Art Online']
10 ['Highschool of the Dead', 'High School DxD', 'High School DxD New']
11 ['Yoroiden Samurai Troopers', 'Mahou Shoujo Pretty Sammy (1996)', 'Queen&#039;s Blade OVA Specials']
12 ['Oyakodon: Oppai Tokumori Bonyuu Tsuyudaku de', 'Kono Danshi, Ningyo Hiroimashita.', 'Gintama: Shinyaku Benizakura-hen']
14 ['Muumindani no Suisei', 'Yoroiden Samurai Troopers', 'Mahou Shoujo Pretty Sammy (1996)']
15 ['Highschool of the Dead', 'High School DxD', 'Sword Art Onl

165 ['Muumindani no Suisei', 'Yoroiden Samurai Troopers', 'Mahou Shoujo Pretty Sammy (1996)']
166 ['Oyakodon: Oppai Tokumori Bonyuu Tsuyudaku de', 'Ninja Senshi Tobikage', 'Uchuu Kyoudai: Number Zero']
167 ['Wangan Midnight', 'Gokusen', 'Shangri-La']
168 ['Miss Monochrome: The Animation 3', 'Bakugan Battle Brawlers: New Vestroia', 'Pokemon Black and White 2: Introduction Movie']
169 ['Muumindani no Suisei', 'Uchuu Kyoudai: Number Zero', 'Gakkou no Kaidan: Kubinashi Rider!! Shi no Noroi']
170 ['Muumindani no Suisei', 'Uchuu Kyoudai: Number Zero', 'Gakkou no Kaidan: Kubinashi Rider!! Shi no Noroi']
171 ['Muumindani no Suisei', 'Oyakodon: Oppai Tokumori Bonyuu Tsuyudaku de', 'Uchuu Kyoudai: Number Zero']
172 ['Highschool of the Dead', 'High School DxD', 'Sword Art Online']
173 ['Muumindani no Suisei', 'Oyakodon: Oppai Tokumori Bonyuu Tsuyudaku de', 'Uchuu Kyoudai: Number Zero']
174 ['Fullmetal Alchemist: Brotherhood', 'Hoshi wo Ou Kodomo', 'God Eater']
175 ['Arcana Famiglia: Capriccio - s

327 ['School Rumble Ichi Gakki Hoshuu', 'Muumindani no Suisei', 'Bakemono no Ko']
329 ['Tsubasa: Shunraiki', 'New Prince of Tennis Specials', 'Haibane Renmei']
330 ['Saint Seiya: Meiou Hades Meikai-hen', 'Saint Seiya: Meiou Hades Elysion-hen', 'Azumanga Web Daioh']
331 ['Muumindani no Suisei', 'Hanasakeru Seishounen', 'Nerima Daikon Brothers']
332 ['Muumindani no Suisei', 'Yoroiden Samurai Troopers', 'Mahou Shoujo Pretty Sammy (1996)']
333 ['High School DxD', 'Sword Art Online', 'High School DxD New']
334 ['New Prince of Tennis', 'Hagure Yuusha no Aesthetica: Hajirai Ippai', 'Rewrite']
335 ['Angel Densetsu', 'Mikakunin de Shinkoukei: Mite. Are ga Watashitachi no Tomatteiru Ryokan yo.', 'Rewrite']
336 ['Muumindani no Suisei', 'Oyakodon: Oppai Tokumori Bonyuu Tsuyudaku de', 'Uchuu Kyoudai: Number Zero']
337 ['Miss Monochrome: The Animation 3', 'Bakugan Battle Brawlers: New Vestroia', 'Interstella5555: The 5tory of The 5ecret 5tar 5ystem']
338 ['Highschool of the Dead', 'High School DxD',

494 ['Muumindani no Suisei', 'Oyakodon: Oppai Tokumori Bonyuu Tsuyudaku de', 'Ninja Senshi Tobikage']
495 ['Muumindani no Suisei', 'Yoroiden Samurai Troopers', 'Mahou Shoujo Pretty Sammy (1996)']
496 ['Muumindani no Suisei', 'Yoroiden Samurai Troopers', 'Mahou Shoujo Pretty Sammy (1996)']
497 ['Muumindani no Suisei', 'Yoroiden Samurai Troopers', 'Mahou Shoujo Pretty Sammy (1996)']
499 ['Highschool of the Dead', 'High School DxD', 'Sword Art Online']
500 ['Muumindani no Suisei', 'Yoroiden Samurai Troopers', 'Mahou Shoujo Pretty Sammy (1996)']
501 ['Muumindani no Suisei', 'Oyakodon: Oppai Tokumori Bonyuu Tsuyudaku de', 'Uchuu Kyoudai: Number Zero']
502 ['Bakugan Battle Brawlers: New Vestroia', 'Ane to Boin', 'Uchuu Kyoudai: Number Zero']
503 ['Muumindani no Suisei', 'Oyakodon: Oppai Tokumori Bonyuu Tsuyudaku de', 'Uchuu Kyoudai: Number Zero']
504 ['Highschool of the Dead', 'High School DxD', 'High School DxD New']
505 ['Bakugan Battle Brawlers: New Vestroia', 'Uchuu Kyoudai: Number Zero'

660 ['Muumindani no Suisei', 'Yoroiden Samurai Troopers', 'Mahou Shoujo Pretty Sammy (1996)']
661 ['Muumindani no Suisei', 'Yoroiden Samurai Troopers', 'Mahou Shoujo Pretty Sammy (1996)']
662 ['Muumindani no Suisei', 'Yoroiden Samurai Troopers', 'Mahou Shoujo Pretty Sammy (1996)']
663 ['Muumindani no Suisei', 'Oyakodon: Oppai Tokumori Bonyuu Tsuyudaku de', 'Ninja Senshi Tobikage']
664 ['Bokusatsu Tenshi Dokuro-chan 2', 'Death Note Rewrite', 'Mikakunin de Shinkoukei: Mite. Are ga Watashitachi no Tomatteiru Ryokan yo.']
665 ['Oyakodon: Oppai Tokumori Bonyuu Tsuyudaku de', 'Gintama: Nanigoto mo Saiyo ga Kanjin nano de Tasho Senobisuru Kurai ga Choudoyoi', 'New Prince of Tennis Specials']
666 ['Muumindani no Suisei', 'Yoroiden Samurai Troopers', 'Mahou Shoujo Pretty Sammy (1996)']
667 ['Photokano', 'Kuroshitsuji: Book of Murder', 'Oyakodon: Oppai Tokumori Bonyuu Tsuyudaku de']
668 ['Oyakodon: Oppai Tokumori Bonyuu Tsuyudaku de', 'Gintama: Shinyaku Benizakura-hen', 'Uchuu Kyoudai: Number Ze

822 ['High School DxD', 'Sword Art Online', 'High School DxD New']
823 ['Muumindani no Suisei', 'Oyakodon: Oppai Tokumori Bonyuu Tsuyudaku de', 'Uchuu Kyoudai: Number Zero']
824 ['Highschool of the Dead', 'High School DxD', 'Sword Art Online']
825 ['Muumindani no Suisei', 'Yoroiden Samurai Troopers', 'Mahou Shoujo Pretty Sammy (1996)']
826 ['Muumindani no Suisei', 'Oyakodon: Oppai Tokumori Bonyuu Tsuyudaku de', 'Uchuu Kyoudai: Number Zero']
827 ['Highschool of the Dead', 'High School DxD', 'Sword Art Online']
828 ['Highschool of the Dead', 'High School DxD', 'High School DxD New']
829 ['Muumindani no Suisei', 'Oyakodon: Oppai Tokumori Bonyuu Tsuyudaku de', 'Ninja Senshi Tobikage']
830 ['Log Horizon Recap', 'Ane to Boin', 'Doukyuusei (Movie)']
831 ['Highschool of the Dead', 'High School DxD', 'Sword Art Online']
832 ['Digimon Adventure', 'Gintama Movie: Kanketsu-hen - Yorozuya yo Eien Nare', 'Clannad']
833 ['Muumindani no Suisei', 'Oyakodon: Oppai Tokumori Bonyuu Tsuyudaku de', 'Ninja S

991 ['Uchuu Kyoudai: Number Zero', 'Gakkou no Kaidan: Kubinashi Rider!! Shi no Noroi', 'Bishoujo Senshi Sailor Moon R: Make Up! Sailor Senshi']
992 ['Muumindani no Suisei', 'Oyakodon: Oppai Tokumori Bonyuu Tsuyudaku de', 'Wangan Midnight']
993 ['Oyakodon: Oppai Tokumori Bonyuu Tsuyudaku de', 'Ninja Senshi Tobikage', 'Uchuu Kyoudai: Number Zero']
994 ['Oyakodon: Oppai Tokumori Bonyuu Tsuyudaku de', 'Ninja Senshi Tobikage', 'Uchuu Kyoudai: Number Zero']
995 ['Muumindani no Suisei', 'Yoroiden Samurai Troopers', 'Mahou Shoujo Pretty Sammy (1996)']
996 ['Muumindani no Suisei', 'Oyakodon: Oppai Tokumori Bonyuu Tsuyudaku de', 'Uchuu Kyoudai: Number Zero']
997 ['Muumindani no Suisei', 'Oyakodon: Oppai Tokumori Bonyuu Tsuyudaku de', 'Ninja Senshi Tobikage']
998 ['Highschool of the Dead', 'High School DxD', 'Sword Art Online']
999 ['Muumindani no Suisei', 'Bakugan Battle Brawlers: New Vestroia', 'Oyakodon: Oppai Tokumori Bonyuu Tsuyudaku de']
1000 ['Muumindani no Suisei', 'Oyakodon: Oppai Tokumo
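`get_top_n` first collects every prediction for every user and only then sorts. When predictions are produced user by user (as an anti-testset is), a streaming variant can keep at most `n` candidates per user with `heapq.nlargest`. A hedged sketch; the function name and the `(user, item, estimate)` tuple format are our own assumptions, and the input must be grouped by user for `itertools.groupby` to work:

```python
import heapq
from itertools import groupby


def top_n_streaming(predictions, n=3):
    """Keep only the n best (item, estimate) pairs per user.

    predictions: iterable of (user, item, estimate) tuples, grouped by user.
    heapq.nlargest keeps at most n candidates while consuming each group,
    so the full list of predictions never has to be stored.
    """
    top_n = {}
    for user, group in groupby(predictions, key=lambda p: p[0]):
        best = heapq.nlargest(n, group, key=lambda p: p[2])
        top_n[user] = [(item, est) for _, item, est in best]
    return top_n


# Hypothetical predictions, already grouped by user
preds = [
    (1, "a", 7.5), (1, "b", 9.0), (1, "c", 8.2), (1, "d", 6.0),
    (2, "a", 8.8), (2, "c", 9.1),
]
print(top_n_streaming(preds))
# {1: [('b', 9.0), ('c', 8.2), ('a', 7.5)], 2: [('c', 9.1), ('a', 8.8)]}
```

With Surprise, the input could for example be a generator that calls `algo.predict(uid, iid).est` for each unrated pair, so that the full list of `Prediction` objects is never built.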

In [13]:
# Look up the title of anime_id 2313, the most frequently recommended anime above
animu = df_animu.loc[df_animu['anime_id'] == 2313, 'name']
print(animu.values[0])


Muumindani no Suisei
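Instead of looking up titles one id at a time, the two files can also be combined with a single join on `anime_id`. A minimal pandas sketch: the `top_n` dict mimics the structure returned by `get_top_n`, the ids and titles are taken from the outputs above, and the estimated ratings are hypothetical:

```python
import pandas as pd

# Hypothetical top-N output in the shape returned by get_top_n():
# {user_id: [(anime_id, estimated_rating), ...]}
top_n = {
    1: [(2313, 10.0), (5114, 9.5)],
    2: [(9253, 9.1)],
}

anime = pd.DataFrame({
    "anime_id": [2313, 5114, 9253],
    "name": ["Muumindani no Suisei", "Fullmetal Alchemist: Brotherhood", "Steins;Gate"],
})

# Flatten the dict into one row per (user, recommended anime)
recs = pd.DataFrame(
    [(uid, iid, est) for uid, items in top_n.items() for iid, est in items],
    columns=["user_id", "anime_id", "est"],
)

# A left join keeps a recommendation even if its id is missing from anime.csv
recs = recs.merge(anime, on="anime_id", how="left")
print(recs[["user_id", "name", "est"]])
```

On the joined frame, `recs["name"].value_counts()` gives the same per-title counts as the `counts` dictionary.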


## Most frequently recommended titles

In [14]:
# https://stackoverflow.com/questions/613183/how-do-i-sort-a-dictionary-by-value
# This link was helpful when trying to arrange the dictionary based on the values
sorted_x = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
sorted_dict = OrderedDict(sorted_x)

### Printing each title and the number of users who received it in their top 3, in descending order

In [26]:
for i in sorted_dict:
    print("%-80s : %3d" % (i,sorted_dict.get(i)))

Muumindani no Suisei                                                             : 476
Oyakodon: Oppai Tokumori Bonyuu Tsuyudaku de                                     : 316
Uchuu Kyoudai: Number Zero                                                       : 205
Ninja Senshi Tobikage                                                            : 189
High School DxD                                                                  : 165
Highschool of the Dead                                                           : 161
Sword Art Online                                                                 : 151
Yoroiden Samurai Troopers                                                        : 151
Mahou Shoujo Pretty Sammy (1996)                                                 : 151
Bakugan Battle Brawlers: New Vestroia                                            :  88
Gokusen                                                                          :  72
Gakkou no Kaidan: Kubinashi Rider!! Shi no 
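The counting and sorting above (a dictionary with an if/else, then `sorted` into an `OrderedDict`) can also be done with `collections.Counter`, whose `most_common` returns the titles in descending order of count. A short sketch on toy recommendation lists (the lists are made up; only the titles come from the output above):

```python
from collections import Counter

# Hypothetical top-3 recommendation lists for four users (titles only)
rec_lists = [
    ["Muumindani no Suisei", "Sword Art Online", "High School DxD"],
    ["Muumindani no Suisei", "Yoroiden Samurai Troopers", "Mahou Shoujo Pretty Sammy (1996)"],
    ["Highschool of the Dead", "High School DxD", "Sword Art Online"],
    ["Muumindani no Suisei", "Gokusen", "Wangan Midnight"],
]

# Count how many users received each title in their top 3
counts = Counter(title for recs in rec_lists for title in recs)

for title, n in counts.most_common(3):
    print("%-40s : %3d" % (title, n))
```

Titles with equal counts are listed in the order they were first encountered.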

## Answers

1. We removed all ratings with the value -1, because -1 means that the user has watched the anime but has not rated it.
2. The full ratings dataset was too large to process on our hardware, so we reduced it to approximately 1/70 of its original size
   (the reduced file contains user ids 1 to 1037). Even with the reduced data, predicting ratings for every unrated (user, anime) pair with `KNNBasic` took about two minutes.
3. We combined the two datasets so that the recommendations for each user show the anime titles: each recommended `anime_id` was used to look up the corresponding name in the `df_animu` dataframe.
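The notebook does not show how `rating_lite.csv` was produced. Since its user ids run from 1 to 1037, one plausible approach is to keep all ratings of the lowest user ids. A hedged sketch of such a user-based subset (the helper name and toy data are hypothetical). Keeping whole users, rather than sampling random rows, preserves each retained user's full rating history, which matters for a neighbourhood method with `min_support=7`: random row sampling would thin out every history and push many user pairs below the support threshold.

```python
import pandas as pd


def subset_by_users(ratings, max_users):
    """Keep every rating of the `max_users` lowest user ids (hypothetical helper)."""
    keep = sorted(ratings["user_id"].unique())[:max_users]
    return ratings[ratings["user_id"].isin(keep)]


# Toy ratings (hypothetical values); in practice this would be the full rating.csv
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "anime_id": [20, 24, 20, 24, 79, 226],
    "rating": [8, 7, 10, 9, 6, 8],
})

small = subset_by_users(ratings, max_users=2)
print(small.shape)  # (3, 3)
```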