# Data Understanding

## Load Data

In [9]:
! pip install kaggle

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [10]:
! mkdir -p ~/.kaggle


In [11]:
! cp kaggle.json ~/.kaggle/

In [12]:
! chmod 600 ~/.kaggle/kaggle.json

In [13]:
! kaggle datasets download hernan4444/anime-recommendation-database-2020

anime-recommendation-database-2020.zip: Skipping, found more recently modified local copy (use --force to force download)


In [None]:
! unzip anime-recommendation-database-2020.zip -d anime-recommendation-database

In [15]:
# read data
import pandas as pd

# the only files to be used are anime_with_synopsis.csv and rating_complete.csv
anime_synopsis_df = pd.read_csv('/content/anime-recommendation-database/anime_with_synopsis.csv')

# Data Preparation

In [16]:
anime_synopsis_df.head()

Unnamed: 0,MAL_ID,Name,Score,Genres,sypnopsis
0,1,Cowboy Bebop,8.78,"Action, Adventure, Comedy, Drama, Sci-Fi, Space","In the year 2071, humanity has colonized sever..."
1,5,Cowboy Bebop: Tengoku no Tobira,8.39,"Action, Drama, Mystery, Sci-Fi, Space","other day, another bounty—such is the life of ..."
2,6,Trigun,8.24,"Action, Sci-Fi, Adventure, Comedy, Drama, Shounen","Vash the Stampede is the man with a $$60,000,0..."
3,7,Witch Hunter Robin,7.27,"Action, Mystery, Police, Supernatural, Drama, ...",ches are individuals with special powers like ...
4,8,Bouken Ou Beet,6.98,"Adventure, Fantasy, Shounen, Supernatural",It is the dark century and the people are suff...


## Removing Missing Values

Check for missing values.

In [17]:
anime_synopsis_df.isna().sum()

MAL_ID       0
Name         0
Score        0
Genres       0
sypnopsis    8
dtype: int64

Drop the rows with missing values.

In [18]:
anime_synopsis_df = anime_synopsis_df.dropna()

In [19]:
anime_synopsis_df.isna().sum()

MAL_ID       0
Name         0
Score        0
Genres       0
sypnopsis    0
dtype: int64

# Modeling

## Feature Engineering with TF-IDF

In [20]:
from sklearn.feature_extraction.text import TfidfVectorizer

# Initialize TfidfVectorizer
tf = TfidfVectorizer(stop_words='english', min_df=5, max_df=0.9)

# Compute the IDF weights from the synopses
tf.fit(anime_synopsis_df['sypnopsis'])

TfidfVectorizer(max_df=0.9, min_df=5, stop_words='english')

In [21]:
# Fit on the synopses, then transform them into a TF-IDF matrix
tfidf_matrix = tf.fit_transform(anime_synopsis_df['sypnopsis'])

# Check the shape of the TF-IDF matrix
tfidf_matrix.shape

(16206, 11792)
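The effect of `min_df` and `max_df` is easier to see on a tiny corpus. The sketch below uses a hypothetical four-sentence corpus (not the anime data) and a lower `min_df=2` so that some terms survive; the names `docs`, `toy_tf`, and `toy_matrix` are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for the synopsis column
docs = [
    "a vampire attacks a high school student",
    "a student joins a hero academy",
    "a hero fights a vampire",
    "a bee explores the meadow",
]

# min_df=2 keeps only terms that appear in at least 2 documents;
# max_df=0.9 drops terms that appear in more than 90% of documents
toy_tf = TfidfVectorizer(stop_words="english", min_df=2, max_df=0.9)
toy_matrix = toy_tf.fit_transform(docs)

# Vocabulary that survived the filters, and the (documents, terms) shape
print(sorted(toy_tf.vocabulary_))
print(toy_matrix.shape)
```

Only terms shared by at least two sentences remain, so the matrix has one row per document and one column per surviving term, just as `(16206, 11792)` above means 16206 synopses and 11792 terms.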

## Cosine Similarity

In [22]:
from sklearn.metrics.pairwise import cosine_similarity
 
# Compute the cosine similarity between all rows of the TF-IDF matrix
cosine_sim = cosine_similarity(tfidf_matrix) 
cosine_sim

array([[1.        , 0.25512308, 0.02576066, ..., 0.        , 0.01650876,
        0.07160921],
       [0.25512308, 1.        , 0.05486192, ..., 0.        , 0.        ,
        0.01856863],
       [0.02576066, 0.05486192, 1.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 1.        , 0.        ,
        0.        ],
       [0.01650876, 0.        , 0.        , ..., 0.        , 1.        ,
        0.03013486],
       [0.07160921, 0.01856863, 0.        , ..., 0.        , 0.03013486,
        1.        ]])

In [23]:
# Build a DataFrame from cosine_sim with anime names as both row and column labels
cosine_sim_df = pd.DataFrame(cosine_sim, index=anime_synopsis_df['Name'], columns=anime_synopsis_df['Name'])
print('Shape:', cosine_sim_df.shape)

# Inspect a random 5x5 sample of the similarity matrix
cosine_sim_df.sample(5, axis=1).sample(5, axis=0)

Shape: (16206, 16206)


Name,Karugamo Oyako no Koutsuu Anzen,24H,Donyatsu,Arashi no Yoru ni: Himitsu no Tomodachi,Tekken Chinmi
Akita Kenritsu Iburi Gakkou Chuutou-bu,0.0,0.0,0.0,0.011909,0.0
One Piece: Mezase! Kaizoku Yakyuu Ou,0.0,0.003202,0.022085,0.0,0.015044
Cardfight!! Vanguard: Asia Circuit-hen,0.0,0.001288,0.008883,0.0,0.027911
Ishikeri,0.0,0.0,0.0,0.0,0.012342
Shichinin no Nana,0.0,0.005442,0.006795,0.0,0.004629


## Getting Recommendations

In [49]:
def content_recommendations(nama_anime, similarity_data=cosine_sim_df, items=anime_synopsis_df[['Name', 'Score', 'Genres', 'sypnopsis']], k=10):

    # Convert the similarity column to NumPy and use argpartition so that the
    # last k+1 positions hold the k+1 largest similarities in sorted order.
    # k+1 positions are needed because the queried anime itself is among them.
    # range(start, stop, step)
    index = similarity_data.loc[:, nama_anime].to_numpy().argpartition(
        range(-1, -(k + 2), -1))

    # Take the names with the highest similarity, best match first
    closest = similarity_data.columns[index[-1:-(k + 2):-1]]

    # Drop the queried anime so that it does not appear in its own recommendations
    closest = closest.drop(nama_anime, errors='ignore')

    return pd.DataFrame(closest).merge(items).head(k)
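The top-k selection is easier to follow on a small array. `argpartition` only guarantees sorted order at the positions listed in `kth`, so to get the query plus its `k` nearest neighbours, `k + 1` trailing positions have to be listed. The sketch below uses hypothetical scores and hypothetical names (`scores`, `self_pos`, `top`); it is an illustration, not the notebook's data.

```python
import numpy as np

# Hypothetical similarity scores of one title against six titles;
# position 2 is the title itself (similarity 1.0)
scores = np.array([0.10, 0.40, 1.00, 0.05, 0.30, 0.20])
self_pos = 2
k = 3

# Ask for the last k+1 positions to be in their final sorted place
kth = np.arange(-1, -(k + 2), -1)          # [-1, -2, -3, -4]
order = scores.argpartition(kth)

# Read those positions back to front: indices of the k+1 largest scores, best first
top = order[-1:-(k + 2):-1]

# Remove the query itself and keep k neighbours
top = top[top != self_pos][:k]
print(top)
```

Listing fewer positions in `kth` would leave the last few neighbours in arbitrary order, so the tail of the recommendation list would not necessarily hold the most similar remaining titles.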

In [50]:
anime_synopsis_df[anime_synopsis_df.Name.eq('Bakemonogatari')]

Unnamed: 0,MAL_ID,Name,Score,Genres,sypnopsis
3434,5081,Bakemonogatari,8.36,"Romance, Supernatural, Mystery, Vampire","Koyomi Araragi, a third-year high school stude..."


In [51]:
content_recommendations('Bakemonogatari')

Unnamed: 0,Name,Score,Genres,sypnopsis
0,Nekomonogatari: Kuro,7.98,"Comedy, Supernatural, Romance, Ecchi","fter surviving a vampire attack, Koyomi Ararag..."
1,Owarimonogatari 2nd Season,8.93,"Mystery, Comedy, Supernatural, Vampire",Following an encounter with oddity specialist ...
2,Nisemonogatari,8.18,"Mystery, Comedy, Supernatural, Ecchi","Surviving a vampire attack, meeting several gi..."
3,Monogatari Series: Second Season,8.78,"Mystery, Comedy, Supernatural, Romance, Vampire","pparitions, oddities, and gods continue to man..."
4,Kizumonogatari I: Tekketsu-hen,8.4,Vampire,During Koyomi Araragi's second year at Naoetsu...
5,Zoku Owarimonogatari,8.5,"Mystery, Comedy, Supernatural, Vampire","Graduation day is finally here, marking the en..."
6,Owarimonogatari 2nd Season Recaps,7.83,"Mystery, Comedy, Supernatural, Vampire",Two recaps aired before the first episode of t...
7,Kizumonogatari III: Reiketsu-hen,8.82,"Action, Mystery, Supernatural, Vampire",fter helping revive the legendary vampire Kiss...
8,Owarimonogatari,8.46,"Mystery, Comedy, Supernatural, Vampire",peculiar transfer student named Ougi Oshino ha...
9,Kizumonogatari II: Nekketsu-hen,8.61,"Action, Mystery, Supernatural, Vampire","No longer truly human, Koyomi Araragi decides ..."


# Evaluation

I will measure precision for the recommendations of three different anime, then compute the average of the three values.
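Precision is taken here as the number of relevant recommendations divided by the number of recommendations shown. A minimal sketch of that calculation follows; the helper `precision_at_k` and the list `flags` are hypothetical names introduced only for illustration, and the relevance judgements still have to be made by hand.

```python
def precision_at_k(relevant_flags):
    """Return the share of recommended items that were judged relevant."""
    return sum(relevant_flags) / len(relevant_flags)

# Hypothetical judgements for 10 recommendations: True means relevant
flags = [True] * 8 + [False] * 2
print(precision_at_k(flags))
```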

In [56]:
# create a list to hold the precision values
precision = []

## Anime 1

In [60]:
anime_synopsis_df[anime_synopsis_df.Name.eq('Bakemonogatari')]

Unnamed: 0,MAL_ID,Name,Score,Genres,sypnopsis
3434,5081,Bakemonogatari,8.36,"Romance, Supernatural, Mystery, Vampire","Koyomi Araragi, a third-year high school stude..."


In [53]:
content_recommendations('Bakemonogatari')

Unnamed: 0,Name,Score,Genres,sypnopsis
0,Nekomonogatari: Kuro,7.98,"Comedy, Supernatural, Romance, Ecchi","fter surviving a vampire attack, Koyomi Ararag..."
1,Owarimonogatari 2nd Season,8.93,"Mystery, Comedy, Supernatural, Vampire",Following an encounter with oddity specialist ...
2,Nisemonogatari,8.18,"Mystery, Comedy, Supernatural, Ecchi","Surviving a vampire attack, meeting several gi..."
3,Monogatari Series: Second Season,8.78,"Mystery, Comedy, Supernatural, Romance, Vampire","pparitions, oddities, and gods continue to man..."
4,Kizumonogatari I: Tekketsu-hen,8.4,Vampire,During Koyomi Araragi's second year at Naoetsu...
5,Zoku Owarimonogatari,8.5,"Mystery, Comedy, Supernatural, Vampire","Graduation day is finally here, marking the en..."
6,Owarimonogatari 2nd Season Recaps,7.83,"Mystery, Comedy, Supernatural, Vampire",Two recaps aired before the first episode of t...
7,Kizumonogatari III: Reiketsu-hen,8.82,"Action, Mystery, Supernatural, Vampire",fter helping revive the legendary vampire Kiss...
8,Owarimonogatari,8.46,"Mystery, Comedy, Supernatural, Vampire",peculiar transfer student named Ougi Oshino ha...
9,Kizumonogatari II: Nekketsu-hen,8.61,"Action, Mystery, Supernatural, Vampire","No longer truly human, Koyomi Araragi decides ..."


All of the recommendations shown belong to the same series as "Bakemonogatari".

In [57]:
p1 = 10/10
precision.append(p1)

## Anime 2

In [61]:
anime_synopsis_df[anime_synopsis_df.Name.eq('Boku no Hero Academia')]

Unnamed: 0,MAL_ID,Name,Score,Genres,sypnopsis
10097,31964,Boku no Hero Academia,8.11,"Action, Comedy, School, Shounen, Super Power","The appearance of ""quirks,"" newly discovered s..."


In [54]:
content_recommendations('Boku no Hero Academia')

Unnamed: 0,Name,Score,Genres,sypnopsis
0,Boku no Hero Academia 2nd Season,8.33,"Action, Comedy, Super Power, School, Shounen","UA Academy, not even a violent attack can disr..."
1,Boku no Hero Academia: Sukue! Kyuujo Kunren!,7.32,"Action, Comedy, School, Shounen, Super Power",UA High School must regain the public's confid...
2,Boku no Hero Academia 3rd Season,8.25,"Action, Comedy, Super Power, School, Shounen",s summer arrives for the students at UA Academ...
3,Boku no Hero Academia the Movie 1: Futari no Hero,7.7,"Action, Shounen, Super Power",U.A. High School's students of Class 1-A have ...
4,Boku no Hero Academia: Training of the Dead,7.32,"Action, Comedy, Super Power, School, Shounen",OVA bundled with the 14th manga volume. Story ...
5,Boku no Hero Academia 4th Season,8.06,"Action, Comedy, Super Power, School, Shounen",fter successfully passing his Provisional Hero...
6,Boku no Hero Academia the Movie 2: Heroes:Rising,8.08,"Action, Super Power, Shounen","Izuku ""Deku'' Midoriya and his fellow students..."
7,Yuusha ni Narenakatta Ore wa Shibushibu Shuush...,6.9,"Comedy, Romance, Ecchi, Fantasy",Dreaming of becoming a hero and vanquishing th...
8,Taegeugsonyeon Huin Dogsuli,Unknown,"Action, Adventure, Super Power, Shounen",Korean super hero movie.
9,Hero Bank,6.05,"Game, Kids","In Big Money City, players participate in ""Her..."


Of the 10 recommendations, 2 are not suitable because their genres differ: "Yuusha ni Narenakatta Ore wa Shibushibu Shuush..." and "Hero Bank".
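The relevance judgement here was made by reading the genre lists. One hypothetical way to make that comparison reproducible is a Jaccard overlap between the `Genres` strings. The helper `genre_overlap` below is an assumption for illustration and is not part of the original evaluation; it does not define a relevance cut-off.

```python
def genre_overlap(genres_a, genres_b):
    """Jaccard overlap between two comma-separated genre strings."""
    a = {g.strip() for g in genres_a.split(",")}
    b = {g.strip() for g in genres_b.split(",")}
    return len(a & b) / len(a | b)

# Genre strings copied from the tables above
query = "Action, Comedy, School, Shounen, Super Power"
same_series = "Action, Comedy, Super Power, School, Shounen"
hero_bank = "Game, Kids"

print(genre_overlap(query, same_series))
print(genre_overlap(query, hero_bank))
```

A title from the same series shares every genre with the query, while "Hero Bank" shares none, which agrees with the manual judgement for that title.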

In [58]:
p2 = 8/10
precision.append(p2)

## Anime 3

In [62]:
anime_synopsis_df[anime_synopsis_df.Name.eq('Nisekoi')]

Unnamed: 0,MAL_ID,Name,Score,Genres,sypnopsis
7049,18897,Nisekoi,7.65,"Harem, Comedy, Romance, School, Shounen","aku Ichijou, a first-year student at Bonyari H..."


In [59]:
content_recommendations('Nisekoi')

Unnamed: 0,Name,Score,Genres,sypnopsis
0,Aishite Knight,6.72,"Comedy, Romance, Shoujo",Yaeko Mitamura is an 18 year old girl working ...
1,Mitsubachi Maya no Bouken,6.44,"Adventure, Comedy, Kids","aya, a newborn honeybee, brims with curiosity ..."
2,Tight-rope,7.07,"Romance, Shounen Ai",Tightrope tells the tale of the reluctant heir...
3,Yotsunoha,6.36,"Romance, School",Based on the game by Haikuo-Soft. A story abou...
4,Chu Feng: Yi Dian Zhi Zi,Unknown,"Action, Sci-Fi, Romance, Fantasy, Mecha",Ordinary high school student Haoxuan Sun was t...
5,Tegamibachi: Hikari to Ao no Gensou Yawa,7.47,"Adventure, Slice of Life, Supernatural, Fantas...",The story takes place in the land of Amber Gro...
6,Tegamibachi Reverse,7.78,"Adventure, Supernatural, Fantasy, Shounen",fter Niche carries the wounded and stunned Lag...
7,Golgo 13: Queen Bee,6.59,"Action, Adventure, Military, Drama, Seinen",aster assassin Golgo 13 is hired by the adviso...
8,My Life,6.33,"Slice of Life, Drama","high school girl, while pondering her future, ..."
9,Baby☆Love,6.27,"Comedy, Romance",Seara loved Shuhei ever since she was a little...


Of the 10 recommendations, only 3 are suitable: "Aishite Knight", "Yotsunoha", and "Baby☆Love". The others are not suitable because their genres differ.

In [63]:
p3 = 3/10
precision.append(p3)

## Conclusion

In [64]:
# compute the average of the precision values

# accumulator for the sum of the precision values
jumlah = 0

for i in precision:
  jumlah += i

# divide by the number of collected values instead of a hard-coded 3
rata2 = jumlah/len(precision)
print("Average precision:", rata2)

Average precision: 0.7000000000000001


Averaged over the three queries, precision is 0.70, which is fairly high. However, the model is built from synopsis text only, so its recommendations depend heavily on the words that appear in each synopsis. Cases like the third anime, where most recommendations come from unrelated genres, are therefore likely to occur often.