# **Recommender System**

A recommender system that suggests films using two techniques: Content-Based Filtering and Collaborative Filtering.


# **Content-Based Filtering**


## **Loading Data**

#### Converting the dataset into a DataFrame

In [1]:
# Create a DataFrame from the credits dataset
import pandas as pd
import numpy as np
df = pd.read_csv('tmdb_5000_credits.csv')
df.head()

Unnamed: 0,movie_id,title,cast,crew
0,19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,"[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [2]:
df1 = pd.read_csv('tmdb_5000_movies.csv')
df1.head()

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.sonypictures.com/movies/spectre/,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""...",2015-10-26,880674609,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466
3,250000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",http://www.thedarkknightrises.com/,49026,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.31295,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-07-16,1084939099,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106
4,260000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://movies.disney.com/john-carter,49529,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...",en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-03-07,284139100,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124


## **Data Understanding**

#### Inspecting the 'overview' feature of the movies dataset (df1)

In [3]:
df1['overview'].head(5)

0    In the 22nd century, a paraplegic Marine is di...
1    Captain Barbossa, long believed to be dead, ha...
2    A cryptic message from Bond’s past sends him o...
3    Following the death of District Attorney Harve...
4    John Carter is a war-weary, former military ca...
Name: overview, dtype: object

## **Data Preparation**

#### TF-IDF Vectorizer

In [15]:
from sklearn.feature_extraction.text import TfidfVectorizer
import pickle

# Initialize TfidfVectorizer and drop English stop words such as 'the' and 'a'
tf = TfidfVectorizer(stop_words='english')

# Replace NaN overviews with empty strings
df1['overview'] = df1['overview'].fillna('')

# Build the TF-IDF matrix by fitting on and transforming the overviews
tf_matrix = tf.fit_transform(df1['overview'])

# Print the shape of the TF-IDF matrix (rows = films, columns = vocabulary terms)
print(tf_matrix.shape)

# Save the TF-IDF matrix to a pickle file, closing the file handle afterwards
with open("tfidf_matrix.pkl", "wb") as f:
    pickle.dump(tf_matrix, f)
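To make the weighting concrete, here is a minimal plain-Python sketch of TF-IDF on a toy corpus. It assumes the smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalization of each row, which is how scikit-learn documents the `TfidfVectorizer` defaults. The three tokenized "overviews" and the helper name `tfidf` are made up for illustration and do not come from the dataset.

```python
import math
from collections import Counter

# Hypothetical overviews, already lower-cased with stop words removed
docs = [
    ["marine", "alien", "moon"],
    ["pirate", "ship", "ocean"],
    ["alien", "ship", "crew"],
]

def tfidf(docs):
    """Return (vocabulary, rows) using smoothed IDF and L2-normalized rows."""
    n = len(docs)
    vocab = sorted({term for doc in docs for term in doc})
    # Document frequency: in how many documents each term occurs
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log((1 + n) / (1 + df[term])) + 1 for term in vocab}
    rows = []
    for doc in docs:
        tf = Counter(doc)
        row = [tf[term] * idf[term] for term in vocab]
        # Guard against an all-zero row (e.g. an empty overview)
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return vocab, rows

vocab, rows = tfidf(docs)
for term, weight in zip(vocab, rows[0]):
    print(f"{term:8s} {weight:.3f}")
```

In the first row, "moon" receives a higher weight than "alien" because "alien" also occurs in another document, which is the behaviour that makes rare words more informative for matching overviews.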

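#### Building a reverse map from film titles to indices

In [5]:
# Build the reverse map from film title to row index
indeks = pd.Series(df1.index, index=df1['title'])
# Keep the first row for any repeated title so each lookup returns a single index
indeks = indeks[~indeks.index.duplicated(keep='first')]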
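A title-to-index lookup only behaves predictably when each title resolves to exactly one row. The sketch below shows the same idea in plain Python, keeping the first row for any repeated title. The title list and the name `title_to_index` are hypothetical and only serve the illustration.

```python
# Hypothetical titles; "Batman" appears twice to mimic two films sharing a name
titles = ["Avatar", "Batman", "Spectre", "Batman"]

title_to_index = {}
for i, title in enumerate(titles):
    # setdefault keeps the first row seen for a repeated title
    title_to_index.setdefault(title, i)

print(title_to_index["Batman"])  # first occurrence wins
```

Keeping the first occurrence is one possible policy; disambiguating by release year would be another, at the cost of a more complex key.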

## **Modeling**

#### Cosine Similarity

In [6]:
from sklearn.metrics.pairwise import cosine_similarity

# Compute pairwise cosine similarity on the TF-IDF matrix
cosine_sim = cosine_similarity(tf_matrix, tf_matrix)
cosine_sim

array([[1.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 1.        , 0.        , ..., 0.02160533, 0.        ,
        0.        ],
       [0.        , 0.        , 1.        , ..., 0.01488159, 0.        ,
        0.        ],
       ...,
       [0.        , 0.02160533, 0.01488159, ..., 1.        , 0.01609091,
        0.00701914],
       [0.        , 0.        , 0.        , ..., 0.01609091, 1.        ,
        0.01171696],
       [0.        , 0.        , 0.        , ..., 0.00701914, 0.01171696,
        1.        ]])
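Each entry of this matrix is the cosine of the angle between two TF-IDF vectors, `cos(u, v) = (u · v) / (‖u‖ ‖v‖)`, which is why the matrix is symmetric and the diagonal is 1. A minimal plain-Python sketch of the formula follows. The vectors are made up, and returning 0 for an all-zero vector is a convention chosen here for the illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Convention assumed here: a zero vector is similar to nothing
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

a = [1.0, 0.0, 1.0]
b = [1.0, 1.0, 0.0]
empty = [0.0, 0.0, 0.0]

print(round(cosine(a, a), 6))  # identical vectors
print(round(cosine(a, b), 6))  # one shared term out of two each
print(cosine(a, empty))        # zero vector
```

When the rows are already L2-normalized, as in the TF-IDF sketch of unit-length rows, both norms equal 1 and the cosine reduces to a plain dot product.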

#### Creating a function to get recommendations

In [7]:
# Takes a film title as input and returns the most similar films
def film_recommendations(title, cosine_sim=cosine_sim):
    # Look up the row index of the film matching the title
    idx = indeks[title]

    # Pair every film's index with its similarity score to the given film
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort the films by similarity score, highest first
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Keep the 10 most similar films, skipping the first entry (the film itself)
    sim_scores = sim_scores[1:11]

    # Extract the film indices
    indeks_film = [i[0] for i in sim_scores]

    # Return the titles of the top 10 most similar films
    return df1['title'].iloc[indeks_film]
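Slicing off the first sorted entry assumes the query film always ranks first. That can fail when another film has an identical score, for instance two films with the same overview text. A minimal sketch of a variant that excludes the query film by index instead is shown below. The helper name `top_k_similar` and the score row are hypothetical.

```python
def top_k_similar(sim_row, idx, k=10):
    """Indices of the k highest scores in sim_row, skipping position idx."""
    candidates = [(i, score) for i, score in enumerate(sim_row) if i != idx]
    # Stable sort: ties keep their original row order
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in candidates[:k]]

# Hypothetical similarity row for film 3; film 1 has an identical overview
sim_row = [0.1, 1.0, 0.4, 1.0, 0.0]

print(top_k_similar(sim_row, idx=3, k=2))
```

With this row, a positional slice of the sorted list would return the query film itself in place of its duplicate, whereas filtering by index never does.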

#### Getting recommendations

In [8]:
# Get films similar to Spider-Man 3
film_recommendations('Spider-Man 3')

159                   Spider-Man
30                  Spider-Man 2
1534               Arachnophobia
20        The Amazing Spider-Man
38      The Amazing Spider-Man 2
1318                   The Thing
4664                     Bronson
3610           Not Easily Broken
4456       Raising Victor Vargas
4276                   Def-Con 4
Name: title, dtype: object

In [9]:
# Get films similar to Titanic
film_recommendations('Titanic')

1269                                  Raise the Titanic
2143                                         Ghost Ship
2287                         I Can Do Bad All By Myself
770                                       Event Horizon
4287                                            Niagara
3212                                           The Rose
2902                                           Triangle
4228                        The Ballad of Jack and Rose
171     Master and Commander: The Far Side of the World
104                                            Poseidon
Name: title, dtype: object

In [10]:
# Get films similar to The Dark Knight Rises
film_recommendations('The Dark Knight Rises')

65                              The Dark Knight
299                              Batman Forever
428                              Batman Returns
1359                                     Batman
3854    Batman: The Dark Knight Returns, Part 2
119                               Batman Begins
2507                                  Slow Burn
9            Batman v Superman: Dawn of Justice
1181                                        JFK
210                              Batman & Robin
Name: title, dtype: object