# <span style="color:orange"> **Anime 1: Anime recommendation**

## <span style="color:#87CEEB"> **1.- Problem**

    The task is to build a tool that recommends what to watch to users, based on their rating history.

    Given this problem, similarity must be computed from the users' own ratings, so the global ratings stored in the anime dataframe are not needed.

## <span style="color:#87CEEB"> **2.- Computational aspects**

### <span style="color:#87CEEB"> 2.1.- Libraries

In [1]:
# Data ingestion
import numpy as np
import pandas as pd
import scipy.stats as stats
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Visualization
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
plt.style.use("seaborn-v0_8-whitegrid")

# Other
import warnings
warnings.filterwarnings("ignore")

## <span style="color:#87CEEB"> **3.- Basic exploration**

In [2]:
# Load the datasets
anime_df = pd.read_csv('anime.csv', delimiter=';')
rating_df = pd.read_csv('rating.csv')

In [3]:
# Quick look at anime_df
display(anime_df.head())
print(anime_df.shape, '\n')
anime_df.info()

,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,937,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,926,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,925,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,917,673572
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,916,151266


(12294, 7) 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12294 entries, 0 to 12293
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   anime_id  12294 non-null  int64 
 1   name      12294 non-null  object
 2   genre     12247 non-null  object
 3   type      12269 non-null  object
 4   episodes  12294 non-null  object
 5   rating    12064 non-null  object
 6   members   12294 non-null  int64 
dtypes: int64(2), object(5)
memory usage: 672.5+ KB


In [4]:
# Quick look at rating_df
display(rating_df.head())
print(rating_df.shape, '\n')
rating_df.info()

,user_id,anime_id,rating
0,1,20,-1
1,1,24,-1
2,1,79,-1
3,1,226,-1
4,1,241,-1


(7813737, 3) 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7813737 entries, 0 to 7813736
Data columns (total 3 columns):
 #   Column    Dtype
---  ------    -----
 0   user_id   int64
 1   anime_id  int64
 2   rating    int64
dtypes: int64(3)
memory usage: 178.8 MB


In [5]:
# Basic inspection for text discrepancies
anime_df['name'].sort_values(ascending=True)

7749                                         &quot;0&quot;
8059     &quot;Aesop&quot; no Ohanashi yori: Ushi to Ka...
3156     &quot;Bungaku Shoujo&quot; Kyou no Oyatsu: Hat...
1436                    &quot;Bungaku Shoujo&quot; Memoire
1199                      &quot;Bungaku Shoujo&quot; Movie
                               ...                        
215                                           xxxHOLiC Rou
341                                      xxxHOLiC Shunmuki
8185                                               Üks Uks
10944                                              ēlDLIVE
7993                                                     ◯
Name: name, Length: 12294, dtype: object

In [6]:
# Replace the HTML entities '&quot;' and '&amp;' with '"' and '&' in the 'name' column
anime_df['name'] = anime_df['name'].str.replace('&quot;', '"').str.replace('&amp;', '&')

anime_df['name'].sort_values(ascending=True)

7749                                                   "0"
8059     "Aesop" no Ohanashi yori: Ushi to Kaeru, Yokub...
3156             "Bungaku Shoujo" Kyou no Oyatsu: Hatsukoi
1436                              "Bungaku Shoujo" Memoire
1199                                "Bungaku Shoujo" Movie
                               ...                        
215                                           xxxHOLiC Rou
341                                      xxxHOLiC Shunmuki
8185                                               Üks Uks
10944                                              ēlDLIVE
7993                                                     ◯
Name: name, Length: 12294, dtype: object
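
The two chained `str.replace` calls handle `&quot;` and `&amp;`, but the first preview of `anime_df` also shows a numeric entity (`Gintama&#039;`). A minimal sketch of a more general cleanup, using the standard-library `html.unescape` on a toy Series (the sample names are illustrative, not the full column):

```python
import html

import pandas as pd

# Toy names containing the kinds of entities seen in the outputs above
names = pd.Series(['&quot;0&quot;', 'Gintama&#039;', 'Voice &amp; Fiction'])

# html.unescape decodes named and numeric HTML entities in one pass
clean = names.map(html.unescape)
print(clean.tolist())
```

On the real data this would presumably be `anime_df['name'].map(html.unescape)`; note that it would change some titles that the rest of the notebook uses as labels.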

##### <span style="color:orange"> Descriptive statistics for anime_df

In [7]:
anime_df.describe().round().T

,count,mean,std,min,25%,50%,75%,max
anime_id,12294.0,14058.0,11455.0,1.0,3484.0,10260.0,24794.0,34527.0
members,12294.0,18071.0,54821.0,5.0,225.0,1550.0,9437.0,1013917.0


In [8]:
anime_df.describe(include=object).round().T

,count,unique,top,freq
name,12294,12292,Shi Wan Ge Leng Xiaohua,2
genre,12247,3271,Hentai,823
type,12269,6,TV,3787
episodes,12294,187,1,5677
rating,12064,598,600,141


##### <span style="color:orange"> Descriptive statistics for rating_df

In [9]:
rating_df.describe().round().T

,count,mean,std,min,25%,50%,75%,max
user_id,7813737.0,36728.0,20998.0,1.0,18974.0,36791.0,54757.0,73516.0
anime_id,7813737.0,8909.0,8884.0,1.0,1240.0,6213.0,14093.0,34519.0
rating,7813737.0,6.0,4.0,-1.0,6.0,7.0,9.0,10.0


##### <span style="color:orange"> Renaming the rating variables

In [10]:
# Rename the 'rating' column in anime_df to 'rating_global'
anime_df = anime_df.rename(columns={'rating': 'rating_global'})

# Rename the 'rating' column in rating_df to 'rating_user'
rating_df = rating_df.rename(columns={'rating': 'rating_user'})

In [11]:
# Merge the dataframes on the 'anime_id' column
rated_anime = anime_df.merge(rating_df, on='anime_id', how='inner')  # 'inner' keeps only rows whose anime_id appears in both dataframes
rated_anime

,anime_id,name,genre,type,episodes,rating_global,members,user_id,rating_user
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,937,200630,99,5
1,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,937,200630,152,10
2,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,937,200630,244,10
3,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,937,200630,271,10
4,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,937,200630,278,-1
...,...,...,...,...,...,...,...,...,...
7813722,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,Hentai,OVA,1,498,175,39532,-1
7813723,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,Hentai,OVA,1,498,175,48766,-1
7813724,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,Hentai,OVA,1,498,175,60365,4
7813725,26081,Yasuji no Pornorama: Yacchimae!!,Hentai,Movie,1,546,142,27364,-1
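
The merged table is slightly shorter than `rating_df` (the index ends at 7813725 here, against 7813737 rows in `rating_df`), which suggests a few ratings point to an `anime_id` that is missing from `anime_df`. A minimal sketch of how such rows could be located with a left merge and `indicator=True`; the toy frames below are made up for illustration and assume `anime_id` is unique on the anime side:

```python
import pandas as pd

# Toy stand-ins for anime_df and rating_df
anime_toy = pd.DataFrame({'anime_id': [1, 2], 'name': ['A', 'B']})
rating_toy = pd.DataFrame({
    'user_id': [10, 10, 11],
    'anime_id': [1, 3, 2],
    'rating_user': [8, 7, -1],
})

# A left merge from the ratings side keeps every rating;
# indicator=True adds a `_merge` column saying where each row was found
check = rating_toy.merge(anime_toy, on='anime_id', how='left', indicator=True)

# Ratings whose anime_id has no entry in the anime table
unmatched = check[check['_merge'] == 'left_only']
print(unmatched[['user_id', 'anime_id', 'rating_user']])
```

These are the rows that an inner merge drops silently.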


In [12]:
# Keep only the columns of interest
rated_anime = rated_anime[['user_id', 'anime_id', 'name', 'rating_user']]
rated_anime

,user_id,anime_id,name,rating_user
0,99,32281,Kimi no Na wa.,5
1,152,32281,Kimi no Na wa.,10
2,244,32281,Kimi no Na wa.,10
3,271,32281,Kimi no Na wa.,10
4,278,32281,Kimi no Na wa.,-1
...,...,...,...,...
7813722,39532,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,-1
7813723,48766,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,-1
7813724,60365,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,4
7813725,27364,26081,Yasuji no Pornorama: Yacchimae!!,-1


In [13]:
rated_anime.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7813727 entries, 0 to 7813726
Data columns (total 4 columns):
 #   Column       Dtype 
---  ------       ----- 
 0   user_id      int64 
 1   anime_id     int64 
 2   name         object
 3   rating_user  int64 
dtypes: int64(3), object(1)
memory usage: 298.1+ MB


##### <span style="color:orange"> Null values per column

In [14]:
rated_anime.isna().sum()

user_id        0
anime_id       0
name           0
rating_user    0
dtype: int64

##### <span style="color:orange"> Number of observations and mean rating per anime

In [15]:
pivot_rated_anime= rated_anime.pivot_table(index='name', values=['rating_user'], aggfunc=[len, np.mean])
pivot_rated_anime

name,len(rating_user),mean(rating_user)
"""0""",26,2.769231
"""Aesop"" no Ohanashi yori: Ushi to Kaeru, Yokubatta Inu",2,0.000000
"""Bungaku Shoujo"" Kyou no Oyatsu: Hatsukoi",782,5.774936
"""Bungaku Shoujo"" Memoire",809,6.155748
"""Bungaku Shoujo"" Movie",1535,6.457980
...,...,...
xxxHOLiC Kei,3413,6.720774
xxxHOLiC Movie: Manatsu no Yoru no Yume,2365,6.313742
xxxHOLiC Rou,1513,6.403173
xxxHOLiC Shunmuki,1974,6.238602


In [16]:
# Rename the columns (Cantidad = number of ratings, Media_rating = mean rating)
pivot_rated_anime.columns = ['Cantidad', 'Media_rating']

# Sort by mean rating
pivot_rated_anime.sort_values(by='Media_rating', ascending=False)

name,Cantidad,Media_rating
Choegang Top Plate,1,10.0
STAR BEAT!: Hoshi no Kodou,1,10.0
Shiroi Zou,1,10.0
Warui no wo Taose!! Salaryman Man,1,10.0
"Yakushiji Ryouko no Kaiki Jikenbo: Hamachou, Voice & Fiction",1,9.0
...,...,...
Kyoufu no Kyou-chan,1,-1.0
Hana no Zundamaru,2,-1.0
Hana to Shounen,1,-1.0
Pro Yakyuu wo 10-bai Tanoshiku Miru Houhou Part 2,1,-1.0
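
The `-1.0` means at the bottom of this table come from the `-1` values in `rating_user`, which are averaged together with real scores. Assuming `-1` is a placeholder for "watched but not scored" (the notebook does not state this explicitly), one option is to mask it before aggregating. A minimal sketch on toy data:

```python
import numpy as np
import pandas as pd

# Toy ratings: anime B only has placeholder values
toy = pd.DataFrame({
    'name': ['A', 'A', 'A', 'B', 'B'],
    'rating_user': [10, 8, -1, -1, -1],
})

# Treat -1 as missing so it does not drag the mean down (assumption, see above)
scores = toy.assign(rating_user=toy['rating_user'].replace(-1, np.nan))

# Named aggregation yields flat column names directly
summary = scores.groupby('name').agg(
    Cantidad=('rating_user', 'count'),     # number of real scores, NaN excluded
    Media_rating=('rating_user', 'mean'),  # mean over real scores only
)
print(summary)
```

With this variant an anime that only has placeholder values gets a count of 0 and a missing mean instead of `-1.0`.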


##### <span style="color:orange"> Top 10 best-rated anime

In [17]:
# Keep anime with more than 1000 observations and sort them by mean user rating
pivot_rated_anime[pivot_rated_anime['Cantidad'] > 1000].sort_values(by='Media_rating', ascending=False).head(10)

name,Cantidad,Media_rating
Kimi no Na wa.,2199,8.297863
Steins;Gate,19283,8.126796
Fullmetal Alchemist: Brotherhood,24574,8.028933
Gintama°,1386,7.95671
Hunter x Hunter (2011),8575,7.924082
Clannad: After Story,17854,7.835275
Monster,4594,7.809099
Gintama,4974,7.775231
Code Geass: Hangyaku no Lelouch R2,24242,7.765943
Shigatsu wa Kimi no Uso,9448,7.740262


## <span style="color:#87CEEB"> **4.- Recommender**

In [18]:
rated_anime.head()

,user_id,anime_id,name,rating_user
0,99,32281,Kimi no Na wa.,5
1,152,32281,Kimi no Na wa.,10
2,244,32281,Kimi no Na wa.,10
3,271,32281,Kimi no Na wa.,10
4,278,32281,Kimi no Na wa.,-1


In [19]:
ratings_pivot = rated_anime.pivot_table(index='user_id', columns='name', values='rating_user', aggfunc=np.mean)
ratings_pivot.head()

user_id,"""0""","""Aesop"" no Ohanashi yori: Ushi to Kaeru, Yokubatta Inu","""Bungaku Shoujo"" Kyou no Oyatsu: Hatsukoi","""Bungaku Shoujo"" Memoire","""Bungaku Shoujo"" Movie","""Eiji""",.hack//G.U. Returner,.hack//G.U. Trilogy,.hack//G.U. Trilogy: Parody Mode,.hack//Gift,...,makemagic,"on-chan, Yume Power Daibouken!",s.CRY.ed,vivi,xxxHOLiC,xxxHOLiC Kei,xxxHOLiC Movie: Manatsu no Yoru no Yume,xxxHOLiC Rou,xxxHOLiC Shunmuki,◯
1,,,,,,,,,,,...,,,,,,,,,,
2,,,,,,,,,,,...,,,,,,,,,,
3,,,,,,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
5,,,,,,,,,,,...,,,,,2.0,,,,,


In [20]:
ratings_pivot.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 73515 entries, 1 to 73516
Columns: 11196 entries, "0" to ◯
dtypes: float64(11196)
memory usage: 6.1 GB
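
The dense pivot takes about 6.1 GB even though most cells are empty. A possible alternative, sketched here on toy data, is to build the sparse user-by-anime matrix straight from the long table using categorical codes as row and column positions. It assumes each (user, anime) pair appears only once, because `csr_matrix` sums duplicate entries whereas the pivot above averages them:

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Toy long-format ratings (one row per user/anime pair)
toy = pd.DataFrame({
    'user_id': [1, 1, 2, 3],
    'name': ['A', 'B', 'A', 'C'],
    'rating_user': [10, 8, 6, 9],
})

# Categorical codes give compact 0-based row and column positions
users = toy['user_id'].astype('category')
items = toy['name'].astype('category')

sparse_toy = csr_matrix(
    (
        toy['rating_user'].to_numpy(dtype=float),
        (users.cat.codes.to_numpy(), items.cat.codes.to_numpy()),
    ),
    shape=(users.cat.categories.size, items.cat.categories.size),
)
print(sparse_toy.shape, sparse_toy.nnz)
```

`users.cat.categories` and `items.cat.categories` keep the labels needed to map matrix positions back to user ids and titles.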


In [21]:
ratings_pivot.fillna(0, inplace=True)
ratings_pivot.head()

user_id,"""0""","""Aesop"" no Ohanashi yori: Ushi to Kaeru, Yokubatta Inu","""Bungaku Shoujo"" Kyou no Oyatsu: Hatsukoi","""Bungaku Shoujo"" Memoire","""Bungaku Shoujo"" Movie","""Eiji""",.hack//G.U. Returner,.hack//G.U. Trilogy,.hack//G.U. Trilogy: Parody Mode,.hack//Gift,...,makemagic,"on-chan, Yume Power Daibouken!",s.CRY.ed,vivi,xxxHOLiC,xxxHOLiC Kei,xxxHOLiC Movie: Manatsu no Yoru no Yume,xxxHOLiC Rou,xxxHOLiC Shunmuki,◯
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0


##### <span style="color:orange"> Collaborative recommender

In [22]:
sparse_ratings = csr_matrix(ratings_pivot.values)
item_similarity = cosine_similarity(sparse_ratings.T)
item_similarity_df = pd.DataFrame(item_similarity, index=ratings_pivot.columns, columns=ratings_pivot.columns)
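
Transposing the matrix means each anime is represented by its column of user ratings, and two anime are compared through the cosine of the angle between those columns: the dot product divided by the product of the two norms. A minimal NumPy sketch of that computation on a toy matrix (the values are invented; the zero-norm case, an anime nobody rated, is not handled here):

```python
import numpy as np

# Rows are users, columns are anime; 0 stands for "no rating", as after fillna(0)
ratings = np.array([
    [10.0, 8.0, 0.0],
    [6.0, 0.0, 0.0],
    [0.0, 0.0, 9.0],
])

# One Euclidean norm per anime column
norms = np.linalg.norm(ratings, axis=0)

# Dot products between every pair of columns, scaled by the norms
item_sim = (ratings.T @ ratings) / np.outer(norms, norms)
print(np.round(item_sim, 3))
```

Anime rated by the same users with similar scores get values close to 1, while anime with no raters in common get 0, which is why filling missing ratings with 0 has a direct effect on the result.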

In [23]:
def top_animes(name):
    print(f'Animes similares a {name}:')
    # Sort once by similarity to `name`; the first entry is assumed to be the anime itself, so skip it
    similar = item_similarity_df[name].sort_values(ascending=False).iloc[1:11]
    for count, (anime, sim) in enumerate(similar.items(), start=1):
        print(f'{count}: {anime}.')
        #print(f'{count}: {anime} - Similitud: {round(sim, 4)}.')
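
`top_animes` prints its result and skips the first ranked entry on the assumption that it is the queried anime itself. A hedged variant, sketched below with a hypothetical helper name `similar_animes` and a toy similarity table, returns the ranking as a Series, removes the queried title by label, and fails clearly for unknown titles:

```python
import pandas as pd


def similar_animes(similarity_df, name, n=10):
    """Return the n titles most similar to `name` as a Series (hypothetical helper)."""
    if name not in similarity_df.columns:
        raise KeyError(f'{name!r} is not in the similarity matrix')
    # Drop the anime itself by label instead of relying on it being ranked first
    return similarity_df[name].drop(labels=name).nlargest(n)


# Toy symmetric similarity table standing in for item_similarity_df
toy_sim = pd.DataFrame(
    [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.3],
     [0.1, 0.3, 1.0]],
    index=['A', 'B', 'C'],
    columns=['A', 'B', 'C'],
)
print(similar_animes(toy_sim, 'A', n=2))
```

Returning a Series keeps the similarity values available for later filtering or display, which the printing version discards.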

In [24]:
top_animes('Blood-C')

Animes similares a Blood-C:
1: Blood-C: The Last Dark.
2: Deadman Wonderland.
3: Blood+.
4: Another.
5: Shiki.
6: Highschool of the Dead.
7: Black★Rock Shooter (TV).
8: Ao no Exorcist.
9: Guilty Crown.
10: C: The Money of Soul and Possibility Control.


In [25]:
top_animes('Hajime no Ippo')

Animes similares a Hajime no Ippo:
1: Hajime no Ippo: New Challenger.
2: Hajime no Ippo: Champion Road.
3: Hajime no Ippo: Mashiba vs. Kimura.
4: Hajime no Ippo: Rising.
5: Hajime no Ippo: Boxer no Kobushi.
6: Great Teacher Onizuka.
7: Shijou Saikyou no Deshi Kenichi.
8: Major S1.
9: Major S2.
10: One Outs.


In [26]:
top_animes('Z/X: Ignition')

Animes similares a Z/X: Ignition:
1: Mahou Sensou.
2: Buddy Complex.
3: Nobunaga the Fool.
4: Nobunagun.
5: Witch Craft Works.
6: Toaru Hikuushi e no Koiuta.
7: Wizard Barristers: Benmashi Cecil.
8: Seikoku no Dragonar.
9: Shirogane no Ishi: Argevollen.
10: BlazBlue: Alter Memory.


In [27]:
top_animes('Rainbow: Nisha Rokubou no Shichinin')

Animes similares a Rainbow: Nisha Rokubou no Shichinin:
1: Bakuman..
2: Gyakkyou Burai Kaiji: Hakairoku-hen.
3: Monster.
4: Gyakkyou Burai Kaiji: Ultimate Survivor.
5: Kiseijuu: Sei no Kakuritsu.
6: One Outs.
7: Great Teacher Onizuka.
8: Steins;Gate.
9: Bakuman. 2nd Season.
10: Hajime no Ippo.


In [28]:
top_animes('Mogura no Motoro')

Animes similares a Mogura no Motoro:
1: Omedetou Jesus-sama.
2: Tetsu no Ko Kanahiru.
3: Kikansha Yaemon: D51 no Daibouken.
4: Hitotsu no Hana.
5: Tetsujin 28-gou Gao!.
6: Madonna (Movie).
7: Hitoribotchi.
8: Biriken Nandemo Shoukai.
9: Biriken.
10: Shiokari Touge.


In [29]:
top_animes('Bleach')

Animes similares a Bleach:
1: Bleach Movie 4: Jigoku-hen.
2: Naruto.
3: Fairy Tail.
4: Bleach Movie 3: Fade to Black - Kimi no Na wo Yobu.
5: Ao no Exorcist.
6: Sword Art Online.
7: Fullmetal Alchemist: Brotherhood.
8: Bleach Movie 2: The DiamondDust Rebellion - Mou Hitotsu no Hyourinmaru.
9: Bleach Movie 1: Memories of Nobody.
10: Shingeki no Kyojin.
