# Recommendation system for anime

👉 This study examines the **"Anime Recommendations Database"** dataset, available on Kaggle: [Anime Recommendations Database](https://www.kaggle.com/datasets/CooperUnion/anime-recommendations-database).

The data contains user preference data from 73,516 users on 12,294 anime. Each user can add anime to their completed list and give it a rating; this dataset is a compilation of those ratings.

**The features belong to two datasets, anime.csv and rating.csv:**

---**anime.csv**

**anime_id** - myanimelist.net's unique id identifying an anime.

**name** - full name of anime.

**genre** - comma separated list of genres for this anime.

**type** - movie, TV, OVA, etc.

**episodes** - how many episodes in this show. (1 if movie).

**rating** - average rating out of 10 for this anime.

**members** - number of community members that are in this anime's "group".

---**rating.csv**

**user_id** - non-identifiable, randomly generated user id.

**anime_id** - the anime that this user has rated.

**rating** - rating out of 10 this user has assigned (-1 if the user watched it but didn't assign a rating).
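Because `-1` marks "watched but not rated", it may be worth treating it as missing before averaging. The sketch below is only an illustration on a tiny made-up frame (the rows are hypothetical, not taken from rating.csv):

```python
import numpy as np
import pandas as pd

# Hypothetical sample in the same layout as rating.csv.
sample = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "anime_id": [20, 24, 20, 20],
    "rating": [-1, 8, 6, -1],
})

# Treat the -1 placeholder as "no rating" so it does not drag averages down.
sample["rating"] = sample["rating"].replace(-1, np.nan)

# mean() skips NaN by default, so only real ratings are averaged.
mean_by_anime = sample.groupby("anime_id")["rating"].mean()
print(mean_by_anime)
```

Whether to drop or keep these rows depends on the goal: they still say that a user watched a title, which can matter for popularity counts.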

## Popularity based recommendation

In [1]:
import pandas as pd
import numpy as np

In [None]:
ratings=pd.read_csv('anime.csv')  # anime metadata (despite the variable name `ratings`)
ratings.head()

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266


In [None]:
ratings['members'].unique  # note: without parentheses this shows the bound method; call .unique() to get the values

<bound method Series.unique of 0        200630
1        793665
2        114262
3        673572
4        151266
          ...  
12289       211
12290       183
12291       219
12292       175
12293       142
Name: members, Length: 12294, dtype: int64>

In [None]:
movies=pd.read_csv('rating.csv')  # user ratings (despite the variable name `movies`)
movies.head()

Unnamed: 0,user_id,anime_id,rating
0,1,20,-1
1,1,24,-1
2,1,79,-1
3,1,226,-1
4,1,241,-1


In [None]:
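# pd.merge without `on` joins on every shared column, here both anime_id and rating,
# so only rows where the user's rating equals the anime's average rating are kept.
ratings=pd.merge(movies,ratings)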
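The effect of the default join keys can be seen on two miniature frames. This is a sketch with made-up rows; the names `anime`, `user_ratings` and the suffixes are assumptions chosen for illustration:

```python
import pandas as pd

# Hypothetical miniature versions of the two files.
anime = pd.DataFrame({
    "anime_id": [1, 2],
    "name": ["A", "B"],
    "rating": [7.0, 8.0],      # average rating of the anime
})
user_ratings = pd.DataFrame({
    "user_id": [10, 11, 12],
    "anime_id": [1, 1, 2],
    "rating": [7, 9, 8],       # rating given by one user
})

# Default: joins on every shared column, here anime_id AND rating.
default_merge = pd.merge(user_ratings, anime)

# Explicit key: joins on anime_id only and keeps both rating columns.
keyed_merge = pd.merge(user_ratings, anime, on="anime_id",
                       suffixes=("_user", "_anime"))

print(len(default_merge), len(keyed_merge))
print(keyed_merge.columns.tolist())
```

With the explicit key every user rating is kept, and the user's rating and the anime's average rating end up in separate columns.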




In [None]:
ratings.head()

Unnamed: 0,user_id,anime_id,rating,name,genre,type,episodes,members
0,17,3287,2,Tenkuu Danzai Skelter+Heaven,"Mecha, Sci-Fi",OVA,1,7680
1,3479,3287,2,Tenkuu Danzai Skelter+Heaven,"Mecha, Sci-Fi",OVA,1,7680
2,9584,3287,2,Tenkuu Danzai Skelter+Heaven,"Mecha, Sci-Fi",OVA,1,7680
3,9867,3287,2,Tenkuu Danzai Skelter+Heaven,"Mecha, Sci-Fi",OVA,1,7680
4,10557,3287,2,Tenkuu Danzai Skelter+Heaven,"Mecha, Sci-Fi",OVA,1,7680


In [None]:
ratings.name.nunique(), ratings.user_id.nunique()

(83, 5762)

In [None]:
movie_grouped=ratings.groupby('name').agg({'rating':['size','sum','mean']})

In [None]:
# Recommend popular anime that received good ratings from many people, so both size (number of ratings) and mean should be high.

movie_grouped 

name,rating (size),rating (sum),rating (mean)
001,1,4,4.0
Accelerando: Datenshi-tachi no Sasayaki,36,252,7.0
Alps no Shoujo Heidi (1979),14,98,7.0
Aoi Blink,12,84,7.0
Aoi Sekai no Chuushin de,158,948,6.0
...,...,...,...
Uba,16,96,6.0
Vicious,8,48,6.0
Wellber no Monogatari: Sisters of Wellber Zwei,34,238,7.0
Xiao Yeyou,1,4,4.0


In [None]:
popular_movies=movie_grouped.sort_values(('rating','mean'),ascending=False)

In [None]:
popular_movies.head()

name,rating (size),rating (sum),rating (mean)
Kamisama Hajimemashita OVA,251,2008,8.0
Maria-sama ga Miteru 3rd,138,1104,8.0
Mobile Suit Gundam Thunderbolt,97,776,8.0
City Hunter,130,1040,8.0
Gekkan Shoujo Nozaki-kun Specials,501,4008,8.0


In [None]:
grouped_sum=movie_grouped['rating']['sum'].sum()
grouped_sum

53667

In [None]:
# Popularity of each title: its share (in percent) of the sum of all ratings.

popular_movies['percentage']=popular_movies['rating','sum']/grouped_sum*100

In [None]:
popular_movies.head()

name,rating (size),rating (sum),rating (mean),percentage
Kamisama Hajimemashita OVA,251,2008,8.0,3.741592
Maria-sama ga Miteru 3rd,138,1104,8.0,2.05713
Mobile Suit Gundam Thunderbolt,97,776,8.0,1.445954
City Hunter,130,1040,8.0,1.937876
Gekkan Shoujo Nozaki-kun Specials,501,4008,8.0,7.468277


In [None]:
# Sort by share so that the titles with the most and highest ratings come first in the suggestions.
popular_movies.sort_values(('percentage'),ascending=False)

name,rating (size),rating (sum),rating (mean),percentage
Little Witch Academia,1208,9664,8.0,18.007342
Sword Art Online: Extra Edition,1145,8015,7.0,14.934690
Gekkan Shoujo Nozaki-kun Specials,501,4008,8.0,7.468277
Detective Conan Movie 02: The Fourteenth Target,330,2640,8.0,4.919224
Cossette no Shouzou,371,2597,7.0,4.839100
...,...,...,...,...
Shirouto Kurabu,1,5,5.0,0.009317
Ichigeki Sacchuu!! Hoihoi-san: Legacy,1,5,5.0,0.009317
Xiao Yeyou,1,4,4.0,0.007453
001,1,4,4.0,0.007453


In [None]:
popular_movies=popular_movies.sort_values(('percentage'),ascending=False)

In [None]:
popular_movies['Rank']=popular_movies['percentage'].rank(ascending=False,method='min')

In [None]:
popular_movies.head(20)

name,rating (size),rating (sum),rating (mean),percentage,Rank
Little Witch Academia,1208,9664,8.0,18.007342,1.0
Sword Art Online: Extra Edition,1145,8015,7.0,14.93469,2.0
Gekkan Shoujo Nozaki-kun Specials,501,4008,8.0,7.468277,3.0
Detective Conan Movie 02: The Fourteenth Target,330,2640,8.0,4.919224,4.0
Cossette no Shouzou,371,2597,7.0,4.8391,5.0
Samurai Flamenco,323,2261,7.0,4.213017,6.0
Kamisama Hajimemashita OVA,251,2008,8.0,3.741592,7.0
Recorder to Randoseru Re♪,262,1834,7.0,3.41737,8.0
Schwarzesmarken,221,1547,7.0,2.882591,9.0
Comet Lucifer,234,1404,6.0,2.616133,10.0
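The popularity steps above (group, share of the rating sum, rank) can be condensed into a few lines. The following is a sketch on made-up ratings, not a rerun of the notebook; the frame `toy` and its values are hypothetical:

```python
import pandas as pd

# Hypothetical ratings for three titles.
toy = pd.DataFrame({
    "name": ["A", "A", "A", "B", "B", "C"],
    "rating": [8, 8, 8, 7, 7, 4],
})

stats = toy.groupby("name")["rating"].agg(["size", "sum", "mean"])

# Share of the total rating sum, in percent.
stats["percentage"] = stats["sum"] / stats["sum"].sum() * 100

# Rank 1 is the title with the largest share.
stats["Rank"] = stats["percentage"].rank(ascending=False, method="min")

print(stats.sort_values("percentage", ascending=False))
```

Because the share is built from the rating sum, a title rises in the ranking both when more users rate it and when they rate it higher.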


In [None]:
ratings

Unnamed: 0,user_id,anime_id,rating,name,genre,type,episodes,members
0,17,3287,2,Tenkuu Danzai Skelter+Heaven,"Mecha, Sci-Fi",OVA,1,7680
1,3479,3287,2,Tenkuu Danzai Skelter+Heaven,"Mecha, Sci-Fi",OVA,1,7680
2,9584,3287,2,Tenkuu Danzai Skelter+Heaven,"Mecha, Sci-Fi",OVA,1,7680
3,9867,3287,2,Tenkuu Danzai Skelter+Heaven,"Mecha, Sci-Fi",OVA,1,7680
4,10557,3287,2,Tenkuu Danzai Skelter+Heaven,"Mecha, Sci-Fi",OVA,1,7680
...,...,...,...,...,...,...,...,...
7397,53465,31699,6,Good Morning!!! Doronjo,"Comedy, Parody",TV,225,135
7398,53492,31177,4,Yakyuubu Aruaru,Sports,TV,3,149
7399,53935,2691,6,Gensei Shugoshin P-hyoro Ikka,"Adventure, Fantasy",Movie,1,150
7400,65836,24275,6,Medamayaki no Kimi Itsu Tsubusu?,"Comedy, Seinen",Special,4,712


## Finding distance of KNN

In [None]:
movieProperties=ratings.groupby('anime_id').agg({'rating': ['size', 'mean']})
movieProperties.head()

anime_id,rating (size),rating (mean)
124,125,7.0
429,134,8.0
514,371,7.0
750,49,7.0
780,330,8.0


In [None]:
movieNumRatings=pd.DataFrame(movieProperties['rating']['size'])
movieNormalizedNumRatings=movieNumRatings.apply(lambda x: (x-np.min(x))/(np.max(x)-np.min(x)))
movieNormalizedNumRatings.head()

anime_id,size
124,0.102734
429,0.110191
514,0.306545
750,0.039768
780,0.272577
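The normalization used here is min-max scaling, (x - min) / (max - min). As a small check in plain Python, the sketch below rescales a few of the counts shown above; it assumes that 1 and 1208 are the smallest and largest number of ratings, which is consistent with the normalized values in the output:

```python
def min_max(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Counts taken from the tables above; 1 and 1208 are assumed to be
# the smallest and largest number of ratings.
counts = [1, 125, 371, 1208]
scaled = min_max(counts)
print([round(s, 6) for s in scaled])  # [0.0, 0.102734, 0.306545, 1.0]
```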


In [None]:
movieNumRatings2=pd.DataFrame(movieProperties['rating']['mean'])
movieNumRatings2.head()

anime_id,mean
124,7.0
429,8.0
514,7.0
750,7.0
780,8.0


## Comparing anime by genre categories to measure their similarity

In [None]:
# Reuse the merged data frame (instead of re-reading anime.csv) so that the changes made so far are not lost.
df=ratings
df.head()

Unnamed: 0,user_id,anime_id,rating,name,genre,type,episodes,members
0,17,3287,2,Tenkuu Danzai Skelter+Heaven,"Mecha, Sci-Fi",OVA,1,7680
1,3479,3287,2,Tenkuu Danzai Skelter+Heaven,"Mecha, Sci-Fi",OVA,1,7680
2,9584,3287,2,Tenkuu Danzai Skelter+Heaven,"Mecha, Sci-Fi",OVA,1,7680
3,9867,3287,2,Tenkuu Danzai Skelter+Heaven,"Mecha, Sci-Fi",OVA,1,7680
4,10557,3287,2,Tenkuu Danzai Skelter+Heaven,"Mecha, Sci-Fi",OVA,1,7680


In [None]:
df = df.join(df.pop('genre').str.get_dummies(',')) # one-hot encode only "genre"; splitting on ',' keeps the leading space of later genres (e.g. ' Comedy')
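The choice of separator decides whether a genre ends up in one column or two. A minimal sketch on two made-up genre strings (the Series `genre` is hypothetical):

```python
import pandas as pd

# Hypothetical genre strings in the same "comma + space" layout.
genre = pd.Series(["Mecha, Sci-Fi", "Sci-Fi, Thriller"])

# Splitting on "," keeps the leading space, so "Sci-Fi" and " Sci-Fi"
# become two different columns.
by_comma = genre.str.get_dummies(",")

# Splitting on ", " yields one column per genre.
by_comma_space = genre.str.get_dummies(", ")

print(by_comma.columns.tolist())
print(by_comma_space.columns.tolist())
```

Splitting on `", "` would avoid the duplicate columns in the first place, assuming every genre list uses exactly a comma followed by one space.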



In [None]:
df

Unnamed: 0,user_id,anime_id,rating,name,type,episodes,members,Adventure,Comedy,Demons,...,Hentai,Historical,Kids,Mecha,Music,Mystery,Parody,Psychological,School,Sports
0,17,3287,2,Tenkuu Danzai Skelter+Heaven,OVA,1,7680,0,0,0,...,0,0,0,1,0,0,0,0,0,0
1,3479,3287,2,Tenkuu Danzai Skelter+Heaven,OVA,1,7680,0,0,0,...,0,0,0,1,0,0,0,0,0,0
2,9584,3287,2,Tenkuu Danzai Skelter+Heaven,OVA,1,7680,0,0,0,...,0,0,0,1,0,0,0,0,0,0
3,9867,3287,2,Tenkuu Danzai Skelter+Heaven,OVA,1,7680,0,0,0,...,0,0,0,1,0,0,0,0,0,0
4,10557,3287,2,Tenkuu Danzai Skelter+Heaven,OVA,1,7680,0,0,0,...,0,0,0,1,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7397,53465,31699,6,Good Morning!!! Doronjo,TV,225,135,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7398,53492,31177,4,Yakyuubu Aruaru,TV,3,149,0,0,0,...,0,0,0,0,0,0,0,0,0,1
7399,53935,2691,6,Gensei Shugoshin P-hyoro Ikka,Movie,1,150,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7400,65836,24275,6,Medamayaki no Kimi Itsu Tsubusu?,Special,4,712,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [None]:
df.shape

(7402, 60)

In [None]:
df.info()   # some genre columns are duplicated (with and without a leading space).

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7402 entries, 0 to 7401
Data columns (total 60 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   user_id         7402 non-null   int64 
 1   anime_id        7402 non-null   int64 
 2   rating          7402 non-null   int64 
 3   name            7402 non-null   object
 4   type            7402 non-null   object
 5   episodes        7402 non-null   object
 6   members         7402 non-null   int64 
 7    Adventure      7402 non-null   int64 
 8    Comedy         7402 non-null   int64 
 9    Demons         7402 non-null   int64 
 10   Drama          7402 non-null   int64 
 11   Ecchi          7402 non-null   int64 
 12   Fantasy        7402 non-null   int64 
 13   Game           7402 non-null   int64 
 14   Hentai         7402 non-null   int64 
 15   Historical     7402 non-null   int64 
 16   Horror         7402 non-null   int64 
 17   Josei          7402 non-null   int64 
 18   Kids   

In [None]:
df.head()

Unnamed: 0,user_id,anime_id,rating,name,type,episodes,members,Adventure,Comedy,Demons,...,Hentai,Historical,Kids,Mecha,Music,Mystery,Parody,Psychological,School,Sports
0,17,3287,2,Tenkuu Danzai Skelter+Heaven,OVA,1,7680,0,0,0,...,0,0,0,1,0,0,0,0,0,0
1,3479,3287,2,Tenkuu Danzai Skelter+Heaven,OVA,1,7680,0,0,0,...,0,0,0,1,0,0,0,0,0,0
2,9584,3287,2,Tenkuu Danzai Skelter+Heaven,OVA,1,7680,0,0,0,...,0,0,0,1,0,0,0,0,0,0
3,9867,3287,2,Tenkuu Danzai Skelter+Heaven,OVA,1,7680,0,0,0,...,0,0,0,1,0,0,0,0,0,0
4,10557,3287,2,Tenkuu Danzai Skelter+Heaven,OVA,1,7680,0,0,0,...,0,0,0,1,0,0,0,0,0,0


In [None]:
df.columns = df.columns.str.replace(' ', '') # clear the spaces in column names.

In [None]:
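# Find column names that occur more than once
columns = df.columns
duplicated_columns = set([x for x in columns if columns.tolist().count(x) > 1])

# Sum the columns that share a name into one, then drop the repeated columns.
# Note: `column in col` is a substring test, and df[cols] can select the same
# duplicate more than once, so summed flags may exceed 1.
for column in duplicated_columns:
    cols = [col for col in df.columns if column in col]
    df[cols[0]] = df[cols].sum(axis=1)
    df = df.loc[:,~df.columns.duplicated()]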

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[cols[0]] = df[cols].sum(axis=1)
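An alternative way to collapse columns that share a name is to group the transposed frame by column label. This is a sketch on a made-up frame, not the approach used in this notebook; `flags` is hypothetical:

```python
import pandas as pd

# Hypothetical frame where "Comedy" appears twice after renaming.
flags = pd.DataFrame(
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    columns=["Comedy", "Drama", "Comedy"],
)

# Group the transposed frame by column name and take the maximum,
# so a genre is 1 if any of its duplicate columns is 1.
merged = flags.T.groupby(level=0).max().T

print(merged)
```

Taking the maximum keeps every flag at 0 or 1 and matches names exactly, so one genre name that is contained in another is not mixed in.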


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7402 entries, 0 to 7401
Data columns (total 45 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   user_id        7402 non-null   int64 
 1   anime_id       7402 non-null   int64 
 2   rating         7402 non-null   int64 
 3   name           7402 non-null   object
 4   type           7402 non-null   object
 5   episodes       7402 non-null   object
 6   members        7402 non-null   int64 
 7   Adventure      7402 non-null   int64 
 8   Comedy         7402 non-null   int64 
 9   Demons         7402 non-null   int64 
 10  Drama          7402 non-null   int64 
 11  Ecchi          7402 non-null   int64 
 12  Fantasy        7402 non-null   int64 
 13  Game           7402 non-null   int64 
 14  Hentai         7402 non-null   int64 
 15  Historical     7402 non-null   int64 
 16  Horror         7402 non-null   int64 
 17  Josei          7402 non-null   int64 
 18  Kids           7402 non-null

In [None]:
df

Unnamed: 0,user_id,anime_id,rating,name,type,episodes,members,Adventure,Comedy,Demons,...,Shounen,SliceofLife,Space,Sports,SuperPower,Supernatural,Yaoi,Yuri,Action,Harem
0,17,3287,2,Tenkuu Danzai Skelter+Heaven,OVA,1,7680,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,3479,3287,2,Tenkuu Danzai Skelter+Heaven,OVA,1,7680,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,9584,3287,2,Tenkuu Danzai Skelter+Heaven,OVA,1,7680,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,9867,3287,2,Tenkuu Danzai Skelter+Heaven,OVA,1,7680,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,10557,3287,2,Tenkuu Danzai Skelter+Heaven,OVA,1,7680,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7397,53465,31699,6,Good Morning!!! Doronjo,TV,225,135,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7398,53492,31177,4,Yakyuubu Aruaru,TV,3,149,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7399,53935,2691,6,Gensei Shugoshin P-hyoro Ikka,Movie,1,150,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7400,65836,24275,6,Medamayaki no Kimi Itsu Tsubusu?,Special,4,712,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [None]:
# Build a dictionary keyed by anime id that holds the name, a genre vector,
# the normalized number of ratings and the mean rating.
# The cosine distance between these vectors is then used to find the K nearest neighbors.

movieDict = {}
for index, row in df.iterrows():
    movieID = int(row['anime_id'])
    name = row['name']
    # Note: position 6 is 'members', so the vector starts with the member count;
    # row[7:] would keep the genre flags only.
    genres = row[6:50]
    genres = list(map(int,genres))
    movieDict[movieID] = (name, genres,movieNormalizedNumRatings.loc[movieID].get('size'),movieProperties.loc[movieID].rating.get('mean'))



In [None]:
movieDict

{3287: ('Tenkuu Danzai Skelter+Heaven',
  [7680,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   1,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0],
  0.03231151615575808,
  2.0),
 827: ('Uba',
  [2437,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0],
  0.012427506213753107,
  6.0),
 31318: ('Comet Lucifer',
  [66659,
   1,
   0,
   0,
   0,
   0,
   2,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   1,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   0,
   1,
   0],
  0.19304059652029826,
  6.0),
 20021: ('Sword Art Online: Extra Edition',
  [121722,
   1,
   0,
   0,
   0,
   0,
   2,
   1,
   0,
   0,
  

In [None]:
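from scipy import spatial
def ComputeDistance (a,b):
    # a and b are movieDict entries: (name, genre vector, normalized rating count, mean rating)
    genresA=a[1]
    genresB=b[1]
    genreDistance=spatial.distance.cosine(genresA,genresB)  # cosine distance between the genre vectors
    popularityA=a[2]
    popularityB=b[2]
    popularityDistance=abs(popularityA-popularityB)  # gap between the normalized rating counts
    return genreDistance+popularityDistance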
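The cosine distance is 1 minus the cosine of the angle between two vectors. The sketch below restates that formula in plain Python (the helper `cosine_distance` and the example vectors are illustrative) and shows how a single large entry, such as a member count placed in front of the genre flags, can dominate the result:

```python
import math

def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v) for two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Two hypothetical genre vectors with no genre in common.
genres_a = [0, 1]
genres_b = [1, 0]
print(cosine_distance(genres_a, genres_b))   # orthogonal vectors

# The same vectors with a large count placed in front.
with_count_a = [7680] + genres_a
with_count_b = [2437] + genres_b
print(cosine_distance(with_count_a, with_count_b))
```

In the second case the distance is close to zero although the genres do not overlap, which suggests keeping only the 0/1 genre flags in the vector passed to the cosine distance.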

In [None]:
ComputeDistance(movieDict[18661],movieDict[27633])

  dist = 1.0 - uv / np.sqrt(uu * vv)


0.2071251035625518

In [None]:
import operator
def getNeighbors(movieID,K):
    # Distance from the chosen anime to every other anime in movieDict
    distances=[]
    for movie in movieDict:
        if(movie!=movieID):
            dist=ComputeDistance(movieDict[movieID],movieDict[movie])
            distances.append((movie,dist))
    distances.sort(key=operator.itemgetter(1))  # closest first
    neighbors=[]
    for x in range(K):
        neighbors.append(distances[x][0])
    return neighbors
K=30
avgRating=0
neighbors=getNeighbors(27633,K)
for neighbor in neighbors:
    avgRating+= movieDict[neighbor][3]  # running sum of the neighbors' mean ratings
    print(movieDict[neighbor][0]+" "+str(movieDict[neighbor][3]))

Cossette no Shouzou 7.0
Detective Conan Movie 02: The Fourteenth Target 8.0
Samurai Flamenco 7.0
Recorder to Randoseru Re♪ 7.0
Kamisama Hajimemashita OVA 8.0
Comet Lucifer 6.0
Aoi Sekai no Chuushin de 6.0
Shin Angyo Onshi 7.0
Super Seisyun Brothers 7.0
Maria-sama ga Miteru 3rd 8.0
Sex Pistols 7.0
Kaleido Star: Legend of Phoenix - Layla Hamilton Monogatari 8.0
City Hunter 8.0
Fushigi Yuugi: Eikouden 7.0
Rio: Rainbow Gate! 6.0
Sekai de Ichiban Tsuyoku Naritai! 6.0
Mobile Suit Gundam Thunderbolt 8.0
Mobile Suit Gundam MS IGLOO: The Hidden One Year War 7.0
Cheer Danshi!! 7.0
Fushigi no Kuni no Miyuki-chan 6.0
The Everlasting Guilty Crown 8.0
Giant Robo the Animation: Chikyuu ga Seishi Suru Hi 8.0
Buzzer Beater 2nd Season 7.0
Overman King Gainer 7.0
Binchou-tan 7.0
Hetalia: The Beautiful World Specials 8.0
Kishin Taisen Gigantic Formula 7.0
Tenkuu Danzai Skelter+Heaven 2.0
Brothers Conflict OVA 7.0
Accelerando: Datenshi-tachi no Sasayaki 7.0
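The neighbor search can also be written with `heapq.nsmallest`, which avoids sorting the full list when only K items are needed. This is a sketch under assumptions: `nearest`, `toy_items` and `popularity_gap` are hypothetical names, and the distance here is only the popularity gap rather than the combined distance used above:

```python
import heapq

def nearest(items, target_id, k, distance):
    """Return the ids of the k items closest to items[target_id]."""
    candidates = (
        (distance(items[target_id], item), item_id)
        for item_id, item in items.items()
        if item_id != target_id
    )
    return [item_id for _, item_id in heapq.nsmallest(k, candidates)]

# Hypothetical items: id -> (name, popularity score in [0, 1]).
toy_items = {
    1: ("A", 0.10),
    2: ("B", 0.15),
    3: ("C", 0.90),
    4: ("D", 0.30),
}

def popularity_gap(a, b):
    # Absolute difference between the two popularity scores.
    return abs(a[1] - b[1])

print(nearest(toy_items, 1, 2, popularity_gap))  # [2, 4]
```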


In [2]:
# The neighbors are listed from most to least similar (smallest distance first), each with its mean rating.
# Based on these, the model can select the higher-rated neighbors and suggest them to the user.