# Film Bagus Content-Based Recommendation


## **Problem 2 - Film Bagus 🎥**

A dataset of movies and the ratings given by viewers is provided.
- __*movies.csv*__ : [download](./datasets/movies.csv)
- __*ratings.csv*__ : [download](./datasets/ratings.csv)

1. Using this dataset, create a notebook file (_.ipynb_) containing a __content-based filtering recommendation system__ based on _movie genre_. Then give __5 movie title recommendations__ to the following user:

    - Joko really likes animation and action movies, especially _**Superman vs. The Elite (2012)**_.

2. Using this dataset, create a notebook file (_.ipynb_) containing a __collaborative filtering recommendation system__, then give __5 movie title recommendations__ to the following user:

    - Widodo really likes comedy-drama movies, one of which is _**Being Flynn (2012)**_.

In [2]:
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer

In [3]:
movie = pd.read_csv('movies.csv')
display(movie.head())

,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


### Cleaning Data

In [4]:
movie.isnull().sum()

movieId    0
title      0
genres     0
dtype: int64

In [5]:
movie.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10329 entries, 0 to 10328
Data columns (total 3 columns):
movieId    10329 non-null int64
title      10329 non-null object
genres     10329 non-null object
dtypes: int64(1), object(2)
memory usage: 242.2+ KB


In [6]:
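# Without parentheses this returns the bound method, whose repr shows the whole Series;
# call movie['genres'].unique() to list only the distinct genre strings.
movie['genres'].unique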

<bound method Series.unique of 0        Adventure|Animation|Children|Comedy|Fantasy
1                         Adventure|Children|Fantasy
2                                     Comedy|Romance
3                               Comedy|Drama|Romance
4                                             Comedy
                            ...                     
10324                      Animation|Children|Comedy
10325                                         Comedy
10326                                         Comedy
10327                                          Drama
10328                             (no genres listed)
Name: genres, Length: 10329, dtype: object>

In [7]:
movie[movie['genres']=='(no genres listed)']

,movieId,title,genres
10172,126929,Li'l Quinquin ( ),(no genres listed)
10260,135460,Pablo (2012),(no genres listed)
10280,138863,The Big Broadcast of 1936 (1935),(no genres listed)
10301,141305,Round Trip to Heaven (1992),(no genres listed)
10303,141472,The 50 Year Argument (2014),(no genres listed)
10317,143709,The Take (2009),(no genres listed)
10328,149532,Marco Polo: One Hundred Eyes (2015),(no genres listed)


In [8]:
# Mark the placeholder as missing, then drop those rows
movie['genres'] = movie['genres'].replace('(no genres listed)', np.nan)
movie = movie.dropna()
movie.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10322 entries, 0 to 10327
Data columns (total 3 columns):
movieId    10322 non-null int64
title      10322 non-null object
genres     10322 non-null object
dtypes: int64(1), object(2)
memory usage: 322.6+ KB


The genres column contains the value '(no genres listed)', so we drop the rows that have that value.
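One thing worth keeping in mind after dropping rows: `dropna()` keeps the original index labels, so labels and row positions stop agreeing from the first removed row onward. The sketch below uses a tiny hand-made frame to illustrate this (the titles come from the outputs above; the names `toy`, `cleaned`, and `realigned` are hypothetical and not part of the notebook). It is only an illustration of the pitfall, not the notebook's own code.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for movies.csv; `toy` is a hypothetical name used only here.
toy = pd.DataFrame({
    "title": ["Toy Story (1995)", "Pablo (2012)", "Jumanji (1995)"],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "(no genres listed)",
        "Adventure|Children|Fantasy",
    ],
})

# Same cleaning idea as above: mark the placeholder as missing, then drop it.
toy["genres"] = toy["genres"].replace("(no genres listed)", np.nan)
cleaned = toy.dropna()

# Jumanji keeps index label 2, but it now sits at row position 1.
label = cleaned[cleaned["title"] == "Jumanji (1995)"].index[0]
position = cleaned.index.get_loc(label)

# Resetting the index makes labels and positions agree again.
realigned = cleaned.reset_index(drop=True)
new_label = realigned[realigned["title"] == "Jumanji (1995)"].index[0]

print(label, position, new_label)
```

This matters later because a similarity matrix is addressed by row position, while `.index[0]` returns a label. In this dataset the dropped rows start at label 10172, so lookups below that label happen to coincide with their positions.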

## Content-Based Filtering

In [9]:
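# Split each genre string on '|' so every genre becomes one feature column
cv = CountVectorizer(tokenizer=lambda x: x.split('|'))
ca = cv.fit_transform(movie['genres'])
ca = ca.toarray()
# get_feature_names_out() is available in scikit-learn >= 1.0
print(cv.get_feature_names_out().tolist())
ca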

['action', 'adventure', 'animation', 'children', 'comedy', 'crime', 'documentary', 'drama', 'fantasy', 'film-noir', 'horror', 'imax', 'musical', 'mystery', 'romance', 'sci-fi', 'thriller', 'war', 'western']


array([[0, 1, 1, ..., 0, 0, 0],
       [0, 1, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int64)
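To make the matrix above easier to read, here is a minimal pure-Python sketch of the same idea: build a sorted vocabulary of lowercase genre tokens, then turn each genre string into a row of counts. The helper names `build_vocabulary` and `encode` are hypothetical; the genre strings are copied from the first three rows of the movie table. It approximates what the vectorizer does for this input and is not a substitute for it.

```python
def build_vocabulary(genre_strings):
    """Return the sorted list of distinct lowercase genre tokens."""
    tokens = set()
    for s in genre_strings:
        tokens.update(s.lower().split("|"))
    return sorted(tokens)


def encode(genre_string, vocabulary):
    """Count how often each vocabulary token occurs in one genre string."""
    parts = genre_string.lower().split("|")
    return [parts.count(tok) for tok in vocabulary]


genres = [
    "Adventure|Animation|Children|Comedy|Fantasy",  # Toy Story (1995)
    "Adventure|Children|Fantasy",                   # Jumanji (1995)
    "Comedy|Romance",                               # Grumpier Old Men (1995)
]

vocab = build_vocabulary(genres)
matrix = [encode(g, vocab) for g in genres]

print(vocab)
for row in matrix:
    print(row)
```

With only three films the vocabulary has six columns instead of nineteen, but each row has the same meaning: a 1 wherever the film carries that genre.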

In [10]:
Score = cosine_similarity(ca)
Score

array([[1.        , 0.77459667, 0.31622777, ..., 0.4472136 , 0.4472136 ,
        0.        ],
       [0.77459667, 1.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.31622777, 0.        , 1.        , ..., 0.70710678, 0.70710678,
        0.        ],
       ...,
       [0.4472136 , 0.        , 0.70710678, ..., 1.        , 1.        ,
        0.        ],
       [0.4472136 , 0.        , 0.70710678, ..., 1.        , 1.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        1.        ]])
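The first entries of this matrix can be checked by hand. Cosine similarity is the dot product of two genre vectors divided by the product of their lengths, so for 0/1 vectors it equals the number of shared genres divided by the square root of the product of the two genre counts. The sketch below recomputes the values for the first three films; the function name `cosine` and the six-column vectors are assumptions made for the illustration (columns that are zero for all three films are left out, which does not change the result).

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Columns: adventure, animation, children, comedy, fantasy, romance
toy_story = [1, 1, 1, 1, 1, 0]   # Toy Story (1995)
jumanji = [1, 0, 1, 0, 1, 0]     # Jumanji (1995)
grumpier = [0, 0, 0, 1, 0, 1]    # Grumpier Old Men (1995)

print(round(cosine(toy_story, jumanji), 8))   # 3 / sqrt(5 * 3) -> 0.77459667
print(round(cosine(toy_story, grumpier), 8))  # 1 / sqrt(5 * 2) -> 0.31622777
print(cosine(jumanji, grumpier))              # no shared genre -> 0.0
```

These agree with `Score[0][1]`, `Score[0][2]`, and `Score[1][2]` in the output above.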

### Recommendations for Joko

Joko really likes animation and action movies, especially _Superman vs. The Elite (2012)_

In [21]:
joko = movie[movie['title']=='Superman vs. The Elite (2012)'].index[0]
joko

9370

In [22]:
similarJoko= list(enumerate(Score[joko]))
similarJoko = sorted(similarJoko, key=lambda x: x[1], reverse=True)
similarJoko[:5]

[(6260, 0.9999999999999998),
 (8637, 0.9999999999999998),
 (9370, 0.9999999999999998),
 (9570, 0.9999999999999998),
 (10167, 0.9999999999999998)]
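Note that the top five above include position 9370, which is Joko's own film: every film is maximally similar to itself, and the scores shown as 0.9999999999999998 are a perfect match up to floating-point rounding. A possible way to rank while skipping the query film is sketched below. The function name `top_k_similar` and the sample row are hypothetical, and the scores are made up for the illustration rather than taken from `Score`.

```python
def top_k_similar(scores, query, k=5):
    """Return (position, score) pairs for the k highest scores, skipping the query row."""
    ranked = sorted(
        ((i, s) for i, s in enumerate(scores) if i != query),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]


# Made-up similarity row for a query film at position 2.
row = [0.5, 1.0, 1.0, 0.0, 1.0, 0.8]
result = top_k_similar(row, query=2, k=3)
print(result)  # [(1, 1.0), (4, 1.0), (5, 0.8)]
```

Because Python's sort is stable, films with equal scores keep their original order, so ties are broken by row position rather than by any notion of quality.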

In [24]:
# Take the row positions of the 10 most similar films
index_ = [i for i, _ in similarJoko[:10]]

# Put Joko's favorite film first and keep the other films in ranked order
index_ = [joko] + [i for i in index_ if i != joko]
dfJoko = movie.iloc[index_].reset_index(drop=True)
dfJoko

,movieId,title,genres
0,94974,Superman vs. The Elite (2012),Action|Animation
1,26913,Street Fighter II: The Animated Movie (Sutorît...,Action|Animation
2,79274,Batman: Under the Red Hood (2010),Action|Animation
3,99813,"Batman: The Dark Knight Returns, Part 2 (2013)",Action|Animation
4,124867,Justice League: Throne of Atlantis (2015),Action|Animation
5,138104,Justice League: Gods and Monsters (2015),Action|Animation
6,4850,Spriggan (Supurigan) (1998),Action|Animation|Sci-Fi
7,27441,Blood: The Last Vampire (2000),Action|Animation|Horror
8,60979,Batman: Gotham Knight (2008),Action|Animation|Crime
9,70533,Evangelion: 1.0 You Are (Not) Alone (Evangerio...,Action|Animation|Sci-Fi


__Joko's Favorite Film__

In [25]:
dfJoko.iloc[[0]]

,movieId,title,genres
0,94974,Superman vs. The Elite (2012),Action|Animation


__5 Recommended Films for Joko__

In [27]:
dfJoko.iloc[1:6]

,movieId,title,genres
1,26913,Street Fighter II: The Animated Movie (Sutorît...,Action|Animation
2,79274,Batman: Under the Red Hood (2010),Action|Animation
3,99813,"Batman: The Dark Knight Returns, Part 2 (2013)",Action|Animation
4,124867,Justice League: Throne of Atlantis (2015),Action|Animation
5,138104,Justice League: Gods and Monsters (2015),Action|Animation
