### Hybrid Filtering
**`Hybrid Filtering (HF)`** adalah metode *recommender* yang mengkombinasikan antara Collaborative Filtering (CF) dan Content Based Filtering (CBF), dimana pengerjaan dilakukan secara berurutan yang dimulai dengan pengerjaan CBF terlebih dahulu, lalu dilanjutkan dengan pengerjaan CF pada kasus ini.

Secara ringkas, urutan pengerjaan metode ini adalah
1. Menggunakan pendekatan **`Content Based Filtering`** untuk mendapatkan nilai/rekomendasi yang diberikan.
2. Menggunakan hasil **`rekomendasi awal`** dari pendekatan CBF untuk dilanjutkan sebagai **inputan** pada metode **`Collaborative Filtering`**.
3. Pada pemanggilan metode CF, inputan yang dimasukkan adalah inputan hasil dari **`rekomendasi awal`** yang sudah didapatkan tadi.

### Load Library and Datasets

In [53]:
import pandas as pd
from math import sqrt
import numpy as np
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings("ignore")

In [54]:
film = pd.read_csv("C:\\Users\\andimu064127\\Tugas\\Use Case - Marketing Analysis\\Dataset\\movies.csv",encoding="Latin1")
rate = pd.read_csv("C:\\Users\\andimu064127\\Tugas\\Use Case - Marketing Analysis\\Dataset\\ratings.csv")

### Data Preprocessing 

In [55]:
#Mengganti format judul dengan menghilangkan tahun dan whitespace pada data
film['title'] = film['title'].str.replace('(\(\d\d\d\d\))', '')
film['title'] = film['title'].apply(lambda x: x.strip())

In [56]:
film.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story,Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji,Adventure|Children|Fantasy
2,3,Grumpier Old Men,Comedy|Romance
3,4,Waiting to Exhale,Comedy|Drama|Romance
4,5,Father of the Bride Part II,Comedy


In [57]:
rate.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,12882,1,4.0,1147195252
1,12882,32,3.5,1147195307
2,12882,47,5.0,1147195343
3,12882,50,5.0,1147185499
4,12882,110,4.5,1147195239


In [58]:
#Menghapus kolom timestamp
rate = rate.drop(columns=['timestamp'])

### Menyimpan Inputan dari Content Based Filtering

In [59]:
def rekomen_CB(penonton):
    new_film = film.copy()
    
    for index, row in film.iterrows():
        for genre in row['genres']:
            new_film.at[index, genre] = 1    
    new_film = new_film.fillna(0)
    
    rating_grouping = rate.groupby('userId')
    penonton_film = rating_grouping.get_group(penonton).head(5)
    
    
    user_encode = new_film[new_film['movieId'].isin(penonton_film['movieId'].tolist())]
    user_encode = user_encode.reset_index(drop=True)
    user_encode = user_encode.drop(columns=['movieId', 'title', 'genres'])
    
    user_decode = user_encode.transpose().dot(penonton_film['rating'].reset_index(drop=True))
    
   
    new_film.set_index(new_film['movieId'], inplace=True)
    new_film = new_film.drop('movieId', 1).drop('title', 1).drop('genres', 1)
   
    rekomen = ((new_film*user_decode).sum(axis=1))/(user_decode.sum())
    rekomen = rekomen.sort_values(ascending=False)
    rekomen_film = film.loc[film['movieId'].isin(rekomen.head(10).keys())]
    
    return rekomen_film

In [60]:
user = int(input('Masukan kode Id User: '))
rekomen_CB(user)

Masukan kode Id User: 12882


Unnamed: 0,movieId,title,genres
122,198,Strange Days,Action|Crime|Drama|Mystery|Sci-Fi|Thriller
336,594,Snow White and the Seven Dwarfs,Animation|Children|Drama|Fantasy|Musical
894,1909,"X-Files: Fight the Future, The",Action|Crime|Mystery|Sci-Fi|Thriller
2167,8972,National Treasure,Action|Adventure|Drama|Mystery|Thriller
2174,26614,"Bourne Identity, The",Action|Adventure|Drama|Mystery|Thriller
2186,27904,"Scanner Darkly, A",Animation|Drama|Mystery|Sci-Fi|Thriller
2200,32031,Robots,Adventure|Animation|Children|Comedy|Fantasy|Sc...
2378,60684,Watchmen,Action|Drama|Mystery|Sci-Fi|Thriller|IMAX
2435,79132,Inception,Action|Crime|Drama|Mystery|Sci-Fi|Thriller|IMAX
2453,85414,Source Code,Action|Drama|Mystery|Sci-Fi|Thriller


In [61]:
data_CB_to_CF = rekomen_CB(user)

### Memasukkan Data Inputan dari CB ke dalam Metode CF

In [62]:
def rekomen(list_film):
    list_film.reset_index(drop=True, inplace=True)
    rating = rate[rate['movieId'].isin(data_CB_to_CF['movieId'].tolist())].groupby('movieId').rating.mean().reset_index(drop=True)
    list_film['rating'] = rating
    
    
    list_movie = list_film
    set_user = rate[rate['movieId'].isin(list_movie['movieId'].tolist())]
    set_user_grouping = set_user.groupby(['userId'])
    
    set_user_grouping = sorted(set_user_grouping,  key=lambda x: len(x[1]), reverse=True)

    pearson_dict = {}
    for name, group in set_user_grouping:
    
        group = group.sort_values(by='movieId')
        list_movie = list_movie.sort_values(by='movieId')
     
        n_value = len(group)

        df_temp = list_movie[list_movie['movieId'].isin(group['movieId'].tolist())]
        rating_list = df_temp['rating'].tolist()
        grup_list = group['rating'].tolist()

        _xx = sum([i**2 for i in rating_list]) - pow(sum(rating_list),2)/float(n_value)
        _yy = sum([i**2 for i in grup_list]) - pow(sum(grup_list),2)/float(n_value)
        _xy = sum( i*j for i, j in zip(rating_list, grup_list)) - sum(rating_list)*sum(grup_list)/float(n_value)
        
        if _xx != 0 and _yy != 0:
            pearson_dict[name] = _xy/sqrt(_xx*_yy)
        else:
            pearson_dict[name] = 0
            
    df_pearson = pd.DataFrame.from_dict(pearson_dict, orient='index')
    df_pearson.columns = ['similarityIndex']
    df_pearson['userId'] = df_pearson.index
    df_pearson.index = range(len(df_pearson))
    

    top_50=df_pearson.sort_values(by='similarityIndex', ascending=False)[0:50]
    top_50_new = top_50.merge(rate, left_on='userId', right_on='userId', how='inner')
    top_50_new['bobot_rating'] = top_50_new['similarityIndex']*top_50_new['rating']
    
    bobot_df = top_50_new.groupby('movieId').sum()[['similarityIndex','bobot_rating']]
    bobot_df.columns = ['similarity_index_total','bobot_rating_total']
    
    bobot_df['final_recommendation_score'] = bobot_df['bobot_rating_total']/bobot_df['similarity_index_total']
    bobot_df['movieId'] = bobot_df.index
    
    rekomendasi_film = bobot_df.sort_values(by=['final_recommendation_score'], ascending=False)
    rekomendasi_film.drop(columns=['similarity_index_total', 'bobot_rating_total'], inplace=True)
    
    movie_rekomendasi = film.loc[film['movieId'].isin(rekomendasi_film.head(10)['movieId'].tolist())]
    return movie_rekomendasi

### Kesimpulan

In [63]:
rekomen(data_CB_to_CF)

Unnamed: 0,movieId,title,genres
288,501,Naked,Drama
308,535,Short Cuts,Drama
475,965,"39 Steps, The",Drama|Mystery|Thriller
618,1232,Stalker,Drama|Mystery|Sci-Fi
627,1243,Rosencrantz and Guildenstern Are Dead,Comedy|Drama
1111,2363,Godzilla (Gojira),Drama|Horror|Sci-Fi
1283,2759,Dick,Comedy
1713,4047,Gettysburg,Drama|War
1721,4102,Eddie Murphy Raw,Comedy|Documentary
1751,4291,Nine to Five (a.k.a. 9 to 5),Comedy|Crime


Diatas adalah hasil dari Hybrid Filtering terhadap user dengan id 12882 yang menunjukkan beberapa film yang direkomendasikan, mulai dari Naked hingga film Nine to Five (a.k.a 9 to 5)