# **LOAD DATA**

Import the files module from google.colab and call files.upload() so the user can upload a local file (here, kaggle.json) into the Google Colab environment.

In [91]:
from google.colab import files
files.upload()  # Upload kaggle.json here

Saving kaggle.json to kaggle (1).json


{'kaggle (1).json': b'{"username":"yuliantoaryaseta","key":"<redacted>"}'}

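Create the directory for the Kaggle API configuration, copy the API credentials into it, download the anime-recommendations-database dataset from Kaggle, and extract it for analysis. Note that Colab saved this upload as kaggle (1).json, so the cp command below picks up the kaggle.json left by an earlier upload; the skipped download and the unzip overwrite prompts in the output likewise come from a previous run.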

In [92]:
# Set up the Kaggle credentials
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

# Download the dataset from Kaggle
!kaggle datasets download -d cooperunion/anime-recommendations-database

# Extract the zip file
!unzip anime-recommendations-database.zip


Dataset URL: https://www.kaggle.com/datasets/cooperunion/anime-recommendations-database
License(s): CC0-1.0
anime-recommendations-database.zip: Skipping, found more recently modified local copy (use --force to force download)
Archive:  anime-recommendations-database.zip
replace anime.csv? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
  inflating: anime.csv               
replace rating.csv? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
  inflating: rating.csv              
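
Because Colab may rename a repeated upload (as in "kaggle (1).json" above), one option is to write the uploaded bytes straight to the path the Kaggle CLI expects. The sketch below is a minimal illustration, not part of the original notebook; save_kaggle_credentials is a hypothetical helper, and it assumes the dict it receives maps file names to bytes, which is what files.upload() returns.

```python
import stat
from pathlib import Path


def save_kaggle_credentials(uploaded: dict, kaggle_dir: Path) -> Path:
    """Write the first uploaded .json payload to <kaggle_dir>/kaggle.json."""
    # Accept whatever name Colab gave the upload, e.g. "kaggle (1).json".
    json_names = [name for name in uploaded if name.lower().endswith(".json")]
    if not json_names:
        raise ValueError("no .json file found in the upload")

    kaggle_dir.mkdir(parents=True, exist_ok=True)
    target = kaggle_dir / "kaggle.json"
    target.write_bytes(uploaded[json_names[0]])
    # Owner read/write only, equivalent to chmod 600.
    target.chmod(stat.S_IRUSR | stat.S_IWUSR)
    return target
```

In the notebook this could be called as save_kaggle_credentials(files.upload(), Path.home() / ".kaggle"), replacing the mkdir, cp, and chmod lines.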


# **IMPORT LIBRARY**

Import the libraries needed to build the recommender system.

In [93]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# **DATA UNDERSTANDING**

Show a preview of the anime data.

In [94]:
anime = pd.read_csv('/content/anime.csv')
anime.head()

,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266


Show the number of columns, the number of rows, and the data types of the anime data. The dataset has 12,294 entries in 7 columns: anime ID, title, genre, type, episode count, rating, and number of members. Three columns have missing values: genre (62), type (25), and rating (230).

In [95]:
anime.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12294 entries, 0 to 12293
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   anime_id  12294 non-null  int64  
 1   name      12294 non-null  object 
 2   genre     12232 non-null  object 
 3   type      12269 non-null  object 
 4   episodes  12294 non-null  object 
 5   rating    12064 non-null  float64
 6   members   12294 non-null  int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 672.5+ KB
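
The missing-value counts quoted above are the row total minus each column's non-null count. They can also be read off directly with isna().sum(). The following is a small sketch on a made-up frame (the toy data and the summary layout are illustrative assumptions, not taken from anime.csv):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the anime table; values are invented for illustration.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "genre": ["Action", None, "Drama", "Comedy"],
    "rating": [8.1, np.nan, np.nan, 6.5],
})

# Count missing entries per column and express them as a share of all rows.
missing = toy.isna().sum()
summary = pd.DataFrame({"missing": missing, "share": missing / len(toy)})
print(summary)
```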


Count the entries in the anime data by unique anime_id.

In [96]:
print('Number of anime entries: ', len(anime.anime_id.unique()))

Number of anime entries:  12294


Check the name column for NaN values and count its unique values. There are 12,292 unique titles among 12,294 rows, so two rows repeat an existing title.

In [97]:
# Check whether the 'name' column contains NaN values
if anime['name'].isnull().any():
    print("The 'name' column contains NaN values")
else:
    print("The 'name' column has no NaN values")

# Show the number of unique titles and the titles themselves
print('Number of unique titles: ', len(anime['name'].dropna().unique()))
print('Anime titles: ', anime['name'].dropna().unique())

The 'name' column has no NaN values
Number of unique titles:  12292
Anime titles:  ['Kimi no Na wa.' 'Fullmetal Alchemist: Brotherhood' 'Gintama°' ...
 'Violence Gekiga David no Hoshi'
 'Violence Gekiga Shin David no Hoshi: Inma Densetsu'
 'Yasuji no Pornorama: Yacchimae!!']


Show the rows whose name is duplicated. The repeated rows will be dropped later because they make title lookups ambiguous.

In [98]:
# Check whether the 'name' column contains duplicates
duplicate_names = anime['name'].duplicated().sum()
if duplicate_names > 0:
    print(f"Found {duplicate_names} duplicates in the 'name' column")
    print("Duplicated rows:")
    print(anime[anime['name'].duplicated(keep=False)])
else:
    print("No duplicates in the 'name' column")

Found 2 duplicates in the 'name' column
Duplicated rows:
       anime_id                     name  \
10140     22399         Saru Kani Gassen   
10141     30059         Saru Kani Gassen   
10193     33193  Shi Wan Ge Leng Xiaohua   
10194     33195  Shi Wan Ge Leng Xiaohua   

                                            genre   type episodes  rating  \
10140                                        Kids    OVA        1    5.23   
10141                                       Drama  Movie        1    4.75   
10193                              Comedy, Parody    ONA       12    6.67   
10194  Action, Adventure, Comedy, Fantasy, Parody  Movie        1    7.07   

       members  
10140       62  
10141       76  
10193      114  
10194      110  
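
The output reports 2 duplicates but lists 4 rows because duplicated() flags only the repeats by default, while keep=False flags every row that shares a title. The sketch below reproduces this on the four rows shown above (only three columns are copied, for brevity) and shows which rows drop_duplicates(subset='name') would keep:

```python
import pandas as pd

# The four rows from the output above, reduced to three columns.
toy = pd.DataFrame({
    "anime_id": [22399, 30059, 33193, 33195],
    "name": [
        "Saru Kani Gassen",
        "Saru Kani Gassen",
        "Shi Wan Ge Leng Xiaohua",
        "Shi Wan Ge Leng Xiaohua",
    ],
    "type": ["OVA", "Movie", "ONA", "Movie"],
})

# Default keep='first': only the second occurrence of each title is flagged.
n_repeats = int(toy["name"].duplicated().sum())
# keep=False: every row involved in a duplicate group is flagged.
n_involved = int(toy["name"].duplicated(keep=False).sum())
# drop_duplicates keeps the first occurrence of each title.
kept = toy.drop_duplicates(subset="name")

print(n_repeats, n_involved)
print(kept)
```

Note that the paired rows have different anime_id, type, and genre values, so they are distinct works that happen to share a title; dropping by name discards one of each pair.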


Check the genre column for NaN values and list its unique values. The column contains NaN values and 3,264 distinct genre combinations.

In [99]:
# Check whether the 'genre' column contains NaN values
if anime['genre'].isnull().any():
    print("The 'genre' column contains NaN values")
else:
    print("The 'genre' column has no NaN values")

# Show the number of unique genre values and the values themselves (NaN excluded)
print('Number of unique genre values: ', len(anime['genre'].dropna().unique()))
print('Anime genres: ', anime['genre'].dropna().unique())


The 'genre' column contains NaN values
Number of unique genre values:  3264
Anime genres:  ['Drama, Romance, School, Supernatural'
 'Action, Adventure, Drama, Fantasy, Magic, Military, Shounen'
 'Action, Comedy, Historical, Parody, Samurai, Sci-Fi, Shounen' ...
 'Hentai, Sports' 'Drama, Romance, School, Yuri' 'Hentai, Slice of Life']


List the values in the type column and check for NaN. The column contains NaN values and six distinct types.

In [100]:
# Check whether the 'type' column contains NaN values
if anime['type'].isnull().any():
    print("The 'type' column contains NaN values")
else:
    print("The 'type' column has no NaN values")

# Show the number of unique types and the types themselves (NaN excluded)
print('Number of unique types: ', len(anime['type'].dropna().unique()))
print('Anime types: ', anime['type'].dropna().unique())


The 'type' column contains NaN values
Number of unique types:  6
Anime types:  ['Movie' 'TV' 'OVA' 'Special' 'Music' 'ONA']


List the values in the episodes column and check for NaN. There are no NaN values, but the column uses the string 'Unknown' as a placeholder, which is why pandas reads it as object rather than a numeric type. It has 187 distinct values.

In [101]:
# Check whether the 'episodes' column contains NaN values
if anime['episodes'].isnull().any():
    print("The 'episodes' column contains NaN values")
else:
    print("The 'episodes' column has no NaN values")

# Show the number of unique episode values and the values themselves (NaN excluded)
print('Number of unique episode values: ', len(anime['episodes'].dropna().unique()))
print('Episode values: ', anime['episodes'].dropna().unique())


The 'episodes' column has no NaN values
Number of unique episode values:  187
Episode values:  ['1' '64' '51' '24' '10' '148' '110' '13' '201' '25' '22' '75' '4' '26'
 '12' '27' '43' '74' '37' '2' '11' '99' 'Unknown' '39' '101' '47' '50'
 '62' '33' '112' '23' '3' '94' '6' '8' '14' '7' '40' '15' '203' '77' '291'
 '120' '102' '96' '38' '79' '175' '103' '70' '153' '45' '5' '21' '63' '52'
 '28' '145' '36' '69' '60' '178' '114' '35' '61' '34' '109' '20' '9' '49'
 '366' '97' '48' '78' '358' '155' '104' '113' '54' '167' '161' '42' '142'
 '31' '373' '220' '46' '195' '17' '1787' '73' '147' '127' '16' '19' '98'
 '150' '76' '53' '124' '29' '115' '224' '44' '58' '93' '154' '92' '67'
 '172' '86' '30' '276' '59' '72' '330' '41' '105' '128' '137' '56' '55'
 '65' '243' '193' '18' '191' '180' '91' '192' '66' '182' '32' '164' '100'
 '296' '694' '95' '68' '117' '151' '130' '87' '170' '119' '84' '108' '156'
 '140' '331' '305' '300' '510' '200' '88' '1471' '526' '143' '726' '136'
 '1818' '237' '1428' '365' '16

List the unique values in the rating column and check for NaN. The column contains NaN values and 598 distinct ratings. In anime.csv this column is the average score on a 10-point scale; the -1 marker for "watched but not rated" belongs to the per-user ratings in rating.csv, which this notebook does not load.

In [102]:
# Check whether the 'rating' column contains NaN values
if anime['rating'].isnull().any():
    print("The 'rating' column contains NaN values")
else:
    print("The 'rating' column has no NaN values")

# Show the number of unique rating values and the values themselves (NaN excluded)
print('Number of unique rating values: ', len(anime['rating'].dropna().unique()))
print('Anime rating values: ', anime['rating'].dropna().unique())


The 'rating' column contains NaN values
Number of unique rating values:  598
Anime rating values:  [ 9.37  9.26  9.25  9.17  9.16  9.15  9.13  9.11  9.1   9.06  9.05  9.04
  8.98  8.93  8.92  8.88  8.84  8.83  8.82  8.81  8.8   8.78  8.77  8.76
  8.75  8.74  8.73  8.72  8.71  8.69  8.68  8.67  8.66  8.65  8.64  8.62
  8.61  8.6   8.59  8.58  8.57  8.56  8.55  8.54  8.53  8.52  8.51  8.5
  8.49  8.48  8.47  8.46  8.45  8.44  8.43  8.42  8.41  8.4   8.39  8.38
  8.37  8.36  8.35  8.34  8.33  8.32  8.31  8.3   8.29  8.28  8.27  8.26
  8.25  8.24  8.23  8.22  8.21  8.2   8.19  8.18  8.17  8.16  8.15  8.14
  8.13  8.12  8.11  8.1   8.09  8.08  8.07  8.06  8.05  8.04  8.03  8.02
  8.01  8.    7.99  7.98  7.97  7.96  7.95  7.94  7.93  7.92  7.91  7.9
  7.89  7.88  7.87  7.86  7.85  7.84  7.83  7.82  7.81  7.8   7.79  7.78
  7.77  7.76  7.75  7.74  7.73  7.72  7.71  7.7   7.69  7.68  7.67  7.66
  7.65  7.64  7.63  7.62  7.61  7.6   7.59  7.58  7.57  7.56  7.55  7.54
  7.53  7.52  7.51  7.5   7.49

List the values in the members column and check for NaN. The column has no NaN values and 6,706 distinct values.

In [103]:
# Check whether the 'members' column contains NaN values
if anime['members'].isnull().any():
    print("The 'members' column contains NaN values")
else:
    print("The 'members' column has no NaN values")

# Show the number of unique members values and the values themselves (NaN excluded)
print('Number of unique members values: ', len(anime['members'].dropna().unique()))
print('Anime members values: ', anime['members'].dropna().unique())


The 'members' column has no NaN values
Number of unique members values:  6706
Anime members values:  [200630 793665 114262 ...  27411  57355    652]


# **DATA PREPARATION**

Based on the data understanding step, the cleaning steps are:

1.   Drop rows with a duplicated name
2.   Drop rows with NaN in genre
3.   Drop rows with NaN in type
4.   Drop rows with Unknown in episodes
5.   Drop rows with NaN in rating

Drop rows with a duplicated name (the first occurrence of each title is kept).

In [104]:
# 1. Drop duplicates based on the 'name' column
anime_cleaned = anime.drop_duplicates(subset='name')


Drop rows with NaN in genre, type, or rating.

In [105]:
# 2. Drop rows with NaN in the 'genre', 'type', or 'rating' column
anime_cleaned = anime_cleaned.dropna(subset=['genre', 'type', 'rating'])


Drop rows whose episodes value is Unknown.

In [106]:
# 3. Drop rows with 'Unknown' in the 'episodes' column
anime_cleaned = anime_cleaned[anime_cleaned['episodes'].str.lower() != 'unknown']
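
The filter above removes the 'Unknown' rows but leaves episodes as a string column. If a numeric column is wanted later, one alternative (an assumption on my part, not what the notebook does) is to coerce with pd.to_numeric, which turns any non-numeric placeholder into NaN so it can be dropped in the same way as the other missing values. A minimal sketch on invented data:

```python
import pandas as pd

# Toy frame; the titles and counts are made up for illustration.
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "episodes": ["12", "Unknown", "1"],
})

# errors="coerce" converts unparseable strings such as 'Unknown' to NaN.
episodes_num = pd.to_numeric(toy["episodes"], errors="coerce")

# Drop the rows that could not be parsed, then store the counts as integers.
cleaned = toy.assign(episodes=episodes_num).dropna(subset=["episodes"])
cleaned = cleaned.astype({"episodes": "int64"})
print(cleaned)
```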


Reset the DataFrame index after cleaning.

In [107]:
# 4. Reset the index after cleaning
anime_cleaned.reset_index(drop=True, inplace=True)


Check the cleaned data. After cleaning, 11,828 anime entries remain.

In [108]:
# Check the final result
print("Rows after cleaning:", len(anime_cleaned))
print(anime_cleaned.head())

Rows after cleaning: 11828
   anime_id                              name  \
0     32281                    Kimi no Na wa.   
1      5114  Fullmetal Alchemist: Brotherhood   
2     28977                          Gintama°   
3      9253                       Steins;Gate   
4      9969                     Gintama&#039;   

                                               genre   type episodes  rating  \
0               Drama, Romance, School, Supernatural  Movie        1    9.37   
1  Action, Adventure, Drama, Fantasy, Magic, Mili...     TV       64    9.26   
2  Action, Comedy, Historical, Parody, Samurai, S...     TV       51    9.25   
3                                   Sci-Fi, Thriller     TV       24    9.17   
4  Action, Comedy, Historical, Parody, Samurai, S...     TV       51    9.16   

   members  
0   200630  
1   793665  
2   114262  
3   673572  
4   151266  


# **CONTENT BASED FILTERING**

Convert the genre text into a numeric representation with TfidfVectorizer, removing English stop words. The vectorizer's default tokenizer splits on punctuation and whitespace, so multi-word genres are broken into separate tokens: Sci-Fi becomes sci and fi, Martial Arts becomes martial and arts, and Slice of Life becomes slice and life (of is dropped as a stop word). The result is 46 tokens, listed in the output below, that serve as the dimensions of the TF-IDF vectors.

In [109]:
# Initialize the TF-IDF vectorizer, removing common English stop words
tfidf = TfidfVectorizer(stop_words="english")

# Convert the 'genre' column into a TF-IDF representation
# Each anime becomes a vector over the unique tokens
tfidf_matrix = tfidf.fit_transform(anime_cleaned['genre'])

# Show the unique tokens extracted from the 'genre' column
# These tokens are the dimensions of the vectors
tfidf.get_feature_names_out()


array(['action', 'adventure', 'ai', 'arts', 'cars', 'comedy', 'dementia',
       'demons', 'drama', 'ecchi', 'fantasy', 'fi', 'game', 'harem',
       'hentai', 'historical', 'horror', 'josei', 'kids', 'life', 'magic',
       'martial', 'mecha', 'military', 'music', 'mystery', 'parody',
       'police', 'power', 'psychological', 'romance', 'samurai', 'school',
       'sci', 'seinen', 'shoujo', 'shounen', 'slice', 'space', 'sports',
       'super', 'supernatural', 'thriller', 'vampire', 'yaoi', 'yuri'],
      dtype=object)
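
The feature list above contains fragments such as 'sci', 'fi', 'slice', and 'life' because the default tokenizer splits multi-word genres. If whole genres are preferred as features, a comma-based splitter is one possibility. The sketch below is an illustration only; split_genres is a hypothetical helper and is not used anywhere in the original notebook.

```python
def split_genres(genre_text: str) -> list[str]:
    """Split a comma-separated genre string into whole, lowercased genres."""
    # Strip surrounding whitespace and skip empty pieces from stray commas.
    return [g.strip().lower() for g in genre_text.split(",") if g.strip()]


print(split_genres("Sci-Fi, Slice of Life, Martial Arts"))
```

A function like this could be supplied through the tokenizer argument of TfidfVectorizer; the resulting vocabulary, and therefore the similarity scores, would differ from the ones shown in this notebook.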

Check the shape of the resulting TF-IDF matrix: 11,828 rows (anime) by 46 columns (genre tokens).

In [110]:
# Check the shape of the TF-IDF matrix
tfidf_matrix.shape

(11828, 46)
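
To make the numbers in this matrix less opaque, here is a small hand computation of TF-IDF in plain Python. It follows what I understand to be TfidfVectorizer's default settings (raw term counts, smoothed idf = ln((1 + n) / (1 + df)) + 1, then L2 normalization of each row); the three toy documents and the function name tfidf_rows are made up for illustration.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """Return (vocabulary, rows) with smoothed idf and L2-normalized rows."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}

    rows = []
    for doc in docs:
        counts = Counter(doc)
        raw = [counts[t] * idf[t] for t in vocab]
        # Guard against an all-zero row before normalizing.
        norm = math.sqrt(sum(x * x for x in raw)) or 1.0
        rows.append([x / norm for x in raw])
    return vocab, rows


docs = [["action", "comedy"], ["action", "drama"], ["drama"]]
vocab, rows = tfidf_rows(docs)
for doc, row in zip(docs, rows):
    print(doc, [round(x, 3) for x in row])
```

In the first toy document, comedy appears in fewer documents than action, so it receives the larger weight. The same effect is why rarer genre tokens carry more weight in the real matrix.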

Convert the TF-IDF matrix from sparse form to a dense NumPy array so it is easier to inspect or process further.

In [111]:
# Convert the TF-IDF matrix from a sparse matrix to a dense NumPy array
# so that it is easier to inspect or process further
tfidf_array = tfidf_matrix.toarray()

Build a DataFrame from the TF-IDF matrix with anime titles as rows and genre tokens as columns, then show a random sample of 21 tokens and 10 anime to give a readable view of the TF-IDF representation.

In [112]:
# Build a DataFrame to inspect the TF-IDF matrix
# Columns are the genre tokens
# Rows are the anime titles

pd.DataFrame(
    tfidf_matrix.todense(),
    columns=tfidf.get_feature_names_out(),
    index=anime_cleaned.name
).sample(21, axis=1).sample(10, axis=0)

name,power,arts,ai,drama,supernatural,mecha,yaoi,game,yuri,shounen,...,ecchi,samurai,school,police,historical,mystery,military,life,josei,shoujo
Pandra The Animation: Shiroki Yokubou Kuro no Kibou,0.0,0.0,0.0,0.0,0.478172,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Ginga Tetsudou 999 (ONA),0.0,0.0,0.0,0.395564,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Toaru Kagaku no Railgun S: Daiji na Koto wa Zenbu Sentou ni Osowatta,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Hello Kitty: Ringo no Mori to Parallel Town,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Nisou no Kuzu,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Why Re-Mix 2002,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Hakaima Sadamitsu,0.0,0.0,0.0,0.0,0.0,0.451217,0.0,0.0,0.0,0.0,...,0.0,0.0,0.42227,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Crying Freeman,0.0,0.45377,0.0,0.26272,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.480482,0.0,0.0,0.0,0.0,0.0,0.0
Durarara!!x2 Ketsu: Dufufufu!!,0.0,0.0,0.0,0.0,0.581352,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.702038,0.0,0.0,0.0,0.0
Holy Knight,0.0,0.0,0.0,0.0,0.355223,0.0,0.0,0.0,0.0,0.0,...,0.402141,0.0,0.339003,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Compute the similarity between every pair of anime with cosine similarity on the TF-IDF vectors. In the resulting matrix, values close to 1 mean two anime have very similar genres, and 0 means they share no genre token.

In [113]:
# Compute pairwise similarity between all anime from the genre TF-IDF vectors
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

# Show the cosine similarity matrix (values between 0 and 1; higher means more similar)
cosine_sim

array([[1.        , 0.14669079, 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.14669079, 1.        , 0.17854758, ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.17854758, 1.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 1.        , 1.        ,
        1.        ],
       [0.        , 0.        , 0.        , ..., 1.        , 1.        ,
        1.        ],
       [0.        , 0.        , 0.        , ..., 1.        , 1.        ,
        1.        ]])
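
Cosine similarity is the dot product of two vectors divided by the product of their lengths. The plain-Python sketch below (the function name cosine and the toy vectors are illustrative) shows the three cases visible in the matrix above: identical directions give 1, vectors with no shared non-zero dimension give 0, and partial overlap lands in between.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # A zero vector has no direction; treat its similarity as 0.
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


print(cosine([1, 0], [2, 0]))  # same direction
print(cosine([1, 0], [0, 1]))  # no shared dimension
print(cosine([1, 1], [1, 0]))  # partial overlap
```

The block of 1.0 values in the bottom-right corner of the output presumably comes from consecutive titles whose genre weights are identical, since identical vectors always score 1.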

Build a DataFrame from the cosine similarity matrix with anime titles as both row and column labels, which makes it easy to look up the similarity between two titles. A random sample of 5 columns and 10 rows is then shown.

In [114]:
# Build a DataFrame from cosine_sim with anime titles as row and column labels
cosine_sim_df = pd.DataFrame(cosine_sim, index=anime_cleaned['name'], columns=anime_cleaned['name'])
print('Shape:', cosine_sim_df.shape)

# Inspect the similarity matrix for a sample of anime titles
cosine_sim_df.sample(5, axis=1).sample(10, axis=0)

Shape: (11828, 11828)


name,Love 2 Quad,Pokemon 3D Adventure 2: Pikachu no Kaitei Daibouken,Doraemon: Treasure of the Shinugumi Mountain,Gokudou Sakaba Denden: Gokudou Daisensou Gaiden,Shuukaku no Yoru
Bamboo Bears,0.0,0.377824,0.329611,1.0,0.0
Pro Golfer Saru: Kouga Hikyou! Kage no Ninpou Golfer Sanjou!,0.0,0.345144,0.301101,0.0,0.0
Kyoufu Densetsu Kaiki! Frankenstein,0.0,0.0,0.0,0.0,0.0
Detective Conan Bonus File: Fantasista Flower,0.0,0.08713,0.243178,0.23061,0.0
The Baby Birds of Norman McLaren,0.0,0.0,0.0,0.0,0.0
Himitsu no Akko-chan 2,0.0,0.0,0.0,0.0,0.0
Aya Hito Shiki to Iu na no Ishi Hata,0.0,0.0,0.0,0.0,0.0
xxxHOLiC Kei,0.0,0.091685,0.079985,0.242665,0.0
JK to Inkou Kyoushi 4,1.0,0.0,0.0,0.0,1.0
Initial D Final Stage,0.0,0.0,0.0,0.0,0.0
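
A dense 11828 x 11828 matrix is large. The back-of-the-envelope estimate below assumes 8-byte float64 values, which is the default dtype I would expect from TfidfVectorizer and cosine_similarity; the variable names are illustrative.

```python
# Rough memory footprint of the dense similarity matrix.
n_titles = 11828          # rows and columns, from the shape printed above
bytes_per_value = 8       # assumption: float64 entries

dense_bytes = n_titles * n_titles * bytes_per_value
dense_gib = dense_bytes / 2**30

print(f"{dense_bytes:,} bytes, about {dense_gib:.2f} GiB")
```

This is roughly 1 GiB for the matrix alone, and wrapping it in a DataFrame may add to that, which is worth keeping in mind on a memory-limited Colab runtime.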


# **EVALUATION**

The anime_recommendations function recommends anime for a given title by looking up the most similar titles by cosine similarity. It returns up to k of the most similar anime, excludes the requested title itself, and includes each recommendation's genre.

In [115]:
def anime_recommendations(anime_title, similarity_data=cosine_sim_df, items=anime_cleaned[['name', 'genre']], k=50):
    # Get the positions of the most similar anime (highest cosine similarity last)
    index = similarity_data.loc[:, anime_title].to_numpy().argpartition(
        range(-1, -k, -1))

    # Get the titles of the most similar anime, from most to least similar
    most_similar = similarity_data.columns[index[-1:-(k+2):-1]]

    # Remove the requested title from the recommendations
    most_similar = most_similar.drop(anime_title, errors='ignore')

    # Merge with the original data to attach the genre information
    return pd.DataFrame(most_similar).merge(items).head(k)
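
The argpartition call above places the largest similarity values in sorted order at the end of the index array, and the negative slice then reads them back from most to least similar. For readers who find that indexing hard to follow, the sketch below shows an alternative formulation with Series.nlargest. It is not the author's implementation; top_k_similar and the 4 x 4 toy matrix are made up for illustration.

```python
import pandas as pd


def top_k_similar(similarity: pd.DataFrame, title: str, k: int = 5) -> pd.Series:
    """Return the k highest similarity scores for `title`, excluding itself."""
    scores = similarity[title].drop(labels=title, errors="ignore")
    return scores.nlargest(k)


# Toy symmetric similarity matrix with invented values.
titles = ["A", "B", "C", "D"]
sim = pd.DataFrame(
    [
        [1.0, 0.9, 0.1, 0.5],
        [0.9, 1.0, 0.2, 0.4],
        [0.1, 0.2, 1.0, 0.3],
        [0.5, 0.4, 0.3, 1.0],
    ],
    index=titles,
    columns=titles,
)

top = top_k_similar(sim, "A", k=2)
print(top)
```

Unlike the original function, this version also returns the scores, which can help when judging how close a recommendation really is.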

Show the top recommendations for an anime title (k defaults to 50; the output below shows ten rows).

In [116]:
# Get recommendations for anime similar to Naruto
anime_recommendations('Naruto')

,name,genre
0,Naruto Shippuuden: Sunny Side Battle,"Action, Comedy, Martial Arts, Shounen, Super P..."
1,Naruto Soyokazeden Movie: Naruto to Mashin to ...,"Action, Comedy, Martial Arts, Shounen, Super P..."
2,Naruto: Shippuuden Movie 4 - The Lost Tower,"Action, Comedy, Martial Arts, Shounen, Super P..."
3,Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsu...,"Action, Comedy, Martial Arts, Shounen, Super P..."
4,Boruto: Naruto the Movie,"Action, Comedy, Martial Arts, Shounen, Super P..."
5,Naruto x UT,"Action, Comedy, Martial Arts, Shounen, Super P..."
6,Boruto: Naruto the Movie - Naruto ga Hokage ni...,"Action, Comedy, Martial Arts, Shounen, Super P..."
7,Kyutai Panic Adventure!,"Action, Martial Arts, Shounen, Super Power"
8,Naruto: Shippuuden Movie 6 - Road to Ninja,"Action, Adventure, Martial Arts, Shounen, Supe..."
9,Rekka no Honoo,"Action, Adventure, Martial Arts, Shounen, Supe..."


The get_recommendations_by_genre function recommends anime for a requested genre. It first filters the anime whose genre text contains the requested string, takes the first match as the reference title, and calls anime_recommendations on that title. If no anime matches, it returns a message string instead. Otherwise the result is a styled table of recommended titles and their genres.

In [117]:
def get_recommendations_by_genre(genre, similarity_data=cosine_sim_df, items=anime_cleaned[['name', 'genre']], k=50):
    # Filter anime whose genre text contains the requested genre
    filtered_items = items[items['genre'].str.contains(genre, case=False)]

    # If no anime has that genre, return a message instead
    if filtered_items.empty:
        return "No anime found with that genre."

    # Take the first matching anime as the reference for recommendations
    anime_title = filtered_items['name'].iloc[0]

    # Get recommendations based on the reference anime
    recommendations = anime_recommendations(anime_title, similarity_data, items, k)

    # Present the recommendations as a table
    recommendations = pd.DataFrame(recommendations)
    recommendations = recommendations.rename(columns={'name': 'Anime Recommendations'})
    recommendations = recommendations[['Anime Recommendations', 'genre']]

    # Table styling
    recommendations = recommendations.style.set_properties(**{'text-align': 'left'})
    recommendations = recommendations.set_table_styles([dict(selector='th', props=[('text-align', 'left')])])

    return recommendations
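
The filter in this function uses str.contains, which matches substrings (and treats the input as a regular expression by default), so a request for "Shounen" also matches "Shounen Ai". If exact genre matching is wanted, one option is to split the genre text and compare whole entries. The sketch below is illustrative only; has_genre and the three toy rows are assumptions, not part of the notebook.

```python
import pandas as pd


def has_genre(genre_text: str, wanted: str) -> bool:
    """True if `wanted` is one of the comma-separated genres (case-insensitive)."""
    genres = [g.strip().lower() for g in genre_text.split(",")]
    return wanted.strip().lower() in genres


# Toy data with invented titles.
items = pd.DataFrame({
    "name": ["A", "B", "C"],
    "genre": ["Action, Sci-Fi", "Shounen Ai, Drama", "Shounen, Action"],
})

# Substring match: also picks up "Shounen Ai".
substring_hits = items[items["genre"].str.contains("Shounen", case=False, regex=False)]
# Exact match on whole genre entries.
exact_hits = items[items["genre"].apply(lambda g: has_genre(g, "Shounen"))]

print(substring_hits["name"].tolist())
print(exact_hits["name"].tolist())
```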

Show the top recommendations for a genre (k defaults to 50; the output below shows ten rows).

In [118]:
# Get anime recommendations for the "Action" genre
recommendations = get_recommendations_by_genre("Action")

# Display the recommendations
display(recommendations)

,Anime Recommendations,genre
0,Fullmetal Alchemist,"Action, Adventure, Comedy, Drama, Fantasy, Magic, Military, Shounen"
1,Fullmetal Alchemist: The Sacred Star of Milos,"Action, Adventure, Comedy, Drama, Fantasy, Magic, Military, Shounen"
2,Fullmetal Alchemist: Brotherhood Specials,"Adventure, Drama, Fantasy, Magic, Military, Shounen"
3,Tales of Vesperia: The First Strike,"Action, Adventure, Fantasy, Magic, Military"
4,Tide-Line Blue,"Action, Adventure, Drama, Military, Shounen"
5,Fullmetal Alchemist: Reflections,"Adventure, Comedy, Drama, Fantasy, Military, Shounen"
6,Meoteoldosawa Ttomae,"Action, Adventure, Fantasy, Magic, Shounen"
7,Log Horizon Recap,"Action, Adventure, Fantasy, Magic, Shounen"
8,Dragon Quest: Dai no Daibouken Tachiagare!! Aban no Shito,"Action, Adventure, Fantasy, Magic, Shounen"
9,Magi: Sinbad no Bouken (TV),"Action, Adventure, Fantasy, Magic, Shounen"
