# Item-Based Recommendation System
---

## Business Problem
---
An online movie streaming platform wants to build a recommendation system using collaborative filtering.

The company has already tried content-based recommenders and now wants its recommendations to reflect the opinions of the user community.

When a user likes a movie, the system should recommend other movies that show a similar rating pattern.
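To make "similar rating pattern" concrete, the sketch below computes the Pearson correlation between movie columns on a tiny made-up rating table. The titles and ratings are invented for illustration and are not taken from MovieLens.

```python
import pandas as pd

# Toy ratings (made up for illustration): rows are users, columns are movies
toy = pd.DataFrame(
    {
        "Movie A": [5.0, 4.0, 1.0, 2.0],
        "Movie B": [4.5, 4.0, 1.5, 2.0],
        "Movie C": [1.0, 2.0, 5.0, 4.0],
    },
    index=["u1", "u2", "u3", "u4"],
)

# Pearson correlation between every pair of movie columns
similarity = toy.corr()

# Users who like Movie A also tend to like Movie B and to dislike Movie C
print(similarity.loc["Movie A"].round(2))
```

Movies with a high positive correlation to the one a user liked are the natural candidates for an item-based recommendation.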

## Dataset Story
---
The dataset is provided by MovieLens.

It contains movies and the ratings given to those movies.

The dataset contains about 20,000,000 ratings for about 27,000 movies.

## Variables
---
### movie.csv
- movieId : Unique movie ID.
- title : Movie title.
- genres : Pipe-separated list of the movie's genres.

### rating.csv
- userId : Unique user ID.
- movieId : Unique movie ID.
- rating : Rating the user gave the movie.
- timestamp : Date and time of the rating.

In [1]:
import pandas as pd
pd.set_option("display.max_columns", 20)

In [2]:
movie = pd.read_csv("movie_lens_dataset/movie.csv")
rating = pd.read_csv("movie_lens_dataset/rating.csv")
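
Because rating.csv holds roughly 20 million rows, it may help to load only the needed columns with compact dtypes. The following is a sketch under assumptions: the three-row CSV text is a hypothetical stand-in for the real file, and the chosen dtypes are one reasonable option rather than a requirement.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for rating.csv
csv_text = """userId,movieId,rating,timestamp
1,2,3.5,2005-04-02 23:53:47
1,29,3.5,2005-04-02 23:31:16
2,3,4.0,2000-11-21 15:36:54
"""

rating_small = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["userId", "movieId", "rating"],  # skip timestamp when it is not needed
    dtype={"userId": "int32", "movieId": "int32", "rating": "float32"},
)
print(rating_small.dtypes)
```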

In [3]:
# Left join keeps every movie; movies without any rating get NaN in userId, rating and timestamp
df = movie.merge(rating, how="left", on="movieId")
df.head()

,movieId,title,genres,userId,rating,timestamp
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,3.0,4.0,1999-12-11 13:36:47
1,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,6.0,5.0,1997-03-13 17:50:52
2,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,8.0,4.0,1996-06-05 13:37:51
3,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,10.0,4.0,1999-11-25 02:44:47
4,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,11.0,4.5,2009-01-02 01:13:41
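
The `userId` values above appear as floats (3.0, 6.0, ...), which is consistent with the left join producing NaN for movies that have no ratings. A minimal sketch on hypothetical miniature tables shows how `indicator=True` can reveal such rows:

```python
import pandas as pd

# Hypothetical miniature versions of movie.csv and rating.csv
movie_toy = pd.DataFrame({"movieId": [1, 2, 3], "title": ["A", "B", "C"]})
rating_toy = pd.DataFrame(
    {"userId": [10, 11, 10], "movieId": [1, 1, 2], "rating": [4.0, 5.0, 3.0]}
)

# indicator=True adds a _merge column marking rows as "both" or "left_only"
merged = movie_toy.merge(rating_toy, how="left", on="movieId", indicator=True)

# Movies that exist in the movie table but were never rated
unrated = merged.loc[merged["_merge"] == "left_only", "title"].tolist()
print(unrated)
print(merged["userId"].dtype)  # float, because the unrated movie has a NaN userId
```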


## Creating the User-Movie DataFrame
---

In [4]:
df.shape

(20000797, 6)

In [5]:
# Total number of unique movies
df.title.nunique()

27262

In [6]:
# Rating counts per movie vary widely; the top five each have about 60,000-67,000 ratings
df["title"].value_counts().head()

Pulp Fiction (1994)                 67310
Forrest Gump (1994)                 66172
Shawshank Redemption, The (1994)    63366
Silence of the Lambs, The (1991)    63299
Jurassic Park (1993)                59715
Name: title, dtype: int64

In [7]:
# Exclude movies with 10,000 or fewer ratings from the analysis
comment_counts = df["title"].value_counts()


In [8]:
# Filter the Series directly so the code does not depend on the column name value_counts produces
rare_movies = comment_counts[comment_counts <= 10000].index

In [9]:
common_movies = df[~df["title"].isin(rare_movies)]

In [10]:
common_movies.shape

(9050403, 6)

In [11]:
common_movies["title"].nunique()

462
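
The filtering step can be tried on a small example. This is a sketch, assuming a hypothetical `min_ratings` threshold of 2 in place of the 10000 used in the notebook's code, with made-up titles and ratings:

```python
import pandas as pd

# Made-up long-format ratings: one row per (movie, rating)
toy = pd.DataFrame(
    {"title": ["A", "A", "A", "B", "C", "C"], "rating": [4, 5, 3, 2, 4, 4]}
)
min_ratings = 2  # hypothetical threshold; the notebook's code uses 10000

# Count ratings per title and keep only titles above the threshold
counts = toy["title"].value_counts()
popular_titles = counts[counts > min_ratings].index
common = toy[toy["title"].isin(popular_titles)]

print(common["title"].unique())
```

Selecting the popular titles directly gives the same rows as negating membership in the rare titles, as the notebook does.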

In [12]:
user_movie_df = common_movies.pivot_table(index = ["userId"], columns=["title"], values="rating")

In [13]:
user_movie_df

userId,10 Things I Hate About You (1999),12 Angry Men (1957),2001: A Space Odyssey (1968),28 Days Later (2002),300 (2007),A.I. Artificial Intelligence (2001),"Abyss, The (1989)",Ace Ventura: Pet Detective (1994),Ace Ventura: When Nature Calls (1995),Addams Family Values (1993),...,Wild Wild West (1999),William Shakespeare's Romeo + Juliet (1996),Willy Wonka & the Chocolate Factory (1971),Witness (1985),"Wizard of Oz, The (1939)","X-Files: Fight the Future, The (1998)",X-Men (2000),X2: X-Men United (2003),You've Got Mail (1998),Young Frankenstein (1974)
1.0,,,3.5,3.5,,,,,,,...,,,,,3.5,,,4.0,,4.0
2.0,,,5.0,,,,,,,,...,,,,,,,,,,
3.0,,,5.0,,,,3.0,,,,...,,,5.0,4.0,4.0,5.0,,,,5.0
4.0,,,,,,,,,3.0,,...,,,,,,,,,,
5.0,,,,,,,,,,,...,,,2.0,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
138489.0,,4.5,,,,,,,,,...,,,,,,,,,,
138490.0,,,,,,,,,,,...,,,,,,,,,,
138491.0,,,,,,,,,,,...,,,,,,,,,,
138492.0,,,,,,,,,,,...,,,3.5,,,,,,,
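
The reshaping done by `pivot_table` can be seen on a small made-up table. Note that `pivot_table` aggregates duplicate (user, title) pairs with the mean by default, and leaves NaN where a user has not rated a movie:

```python
import pandas as pd

# Made-up long-format ratings; user 3 rated movie "B" twice
long_toy = pd.DataFrame(
    {
        "userId": [1, 1, 2, 3, 3],
        "title": ["A", "B", "A", "B", "B"],
        "rating": [4.0, 3.0, 5.0, 2.0, 4.0],
    }
)

# Rows become users, columns become titles, cells hold the (mean) rating
wide = long_toy.pivot_table(index="userId", columns="title", values="rating")
print(wide)

# Share of empty cells, i.e. how sparse the user-movie matrix is
sparsity = wide.isna().to_numpy().mean()
print(round(sparsity, 2))
```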


## Making Item-Based Movie Recommendations
---

In [14]:
movie_name = "Matrix, The (1999)"

In [15]:
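# Ratings column of the selected movie (kept in its own variable so movie_name stays a title)
movie_ratings = user_movie_df[movie_name]

In [16]:
# Correlate every movie's ratings with the selected movie and show the strongest matches
user_movie_df.corrwith(movie_ratings).sort_values(ascending=False).head()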

title
Matrix, The (1999)                   1.000000
Matrix Reloaded, The (2003)          0.516906
Matrix Revolutions, The (2003)       0.449588
Blade (1998)                         0.334493
Terminator 2: Judgment Day (1991)    0.333882
dtype: float64
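
`corrwith` computes each correlation only over the users who rated both movies, so a score based on very few shared raters can look deceptively strong. The sketch below, on made-up data, counts the shared raters per movie and applies a hypothetical `min_overlap` cutoff; neither the cutoff nor the variable names come from the notebook.

```python
import numpy as np
import pandas as pd

# Made-up user-movie matrix; NaN means the user did not rate the movie
toy = pd.DataFrame(
    {
        "Target": [5.0, 4.0, 1.0, 2.0, np.nan],
        "Dense": [4.0, 5.0, 2.0, 1.0, 3.0],
        "Sparse": [5.0, np.nan, np.nan, 2.0, 4.0],
    }
)

target = toy["Target"]
corr = toy.corrwith(target)

# Number of users who rated both the target movie and each other movie
rated = toy.notna()
overlap = rated.loc[rated["Target"]].sum()

min_overlap = 3  # hypothetical cutoff for a trustworthy correlation
reliable = corr[overlap >= min_overlap].drop("Target").sort_values(ascending=False)

print(overlap)
print(reliable)
```

Here "Sparse" correlates perfectly with "Target", but only two users rated both, so the cutoff removes it.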

## Preparing the Working Script
---

In [20]:
def create_user_movie_df():
    """Build the user x movie rating matrix from the MovieLens CSV files."""
    import pandas as pd
    movie = pd.read_csv("movie_lens_dataset/movie.csv")
    rating = pd.read_csv("movie_lens_dataset/rating.csv")
    df = movie.merge(rating, how="left", on="movieId")
    # Drop movies with 10,000 or fewer ratings
    comment_counts = df["title"].value_counts()
    rare_movies = comment_counts[comment_counts <= 10000].index
    common_movies = df[~df["title"].isin(rare_movies)]
    # Rows are users, columns are movie titles, cells are ratings (NaN if unrated)
    user_movie_df = common_movies.pivot_table(index=["userId"], columns=["title"], values="rating")
    return user_movie_df

In [21]:
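def item_based_recommender(movie_name, user_movie_df):
    """Return the 10 highest rating correlations with movie_name (the movie itself ranks first)."""
    movie_ratings = user_movie_df[movie_name]
    return user_movie_df.corrwith(movie_ratings).sort_values(ascending=False).head(10)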

In [22]:
df2 = create_user_movie_df()

In [24]:
item_based = item_based_recommender("12 Angry Men (1957)", df2)
item_based

title
12 Angry Men (1957)                            1.000000
To Kill a Mockingbird (1962)                   0.412703
Rear Window (1954)                             0.375445
Bridge on the River Kwai, The (1957)           0.335728
Cool Hand Luke (1967)                          0.328843
Great Escape, The (1963)                       0.320499
One Flew Over the Cuckoo's Nest (1975)         0.320060
Seven Samurai (Shichinin no samurai) (1954)    0.317647
North by Northwest (1959)                      0.312872
Casablanca (1942)                              0.310295
dtype: float64
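
The first row of the result is always the queried movie itself, with a correlation of 1.0. One possible variant, sketched here with a hypothetical function name `recommend_similar` and a hypothetical `top_n` parameter, drops that row before returning the list:

```python
import pandas as pd


def recommend_similar(movie_name, user_movie_df, top_n=10):
    """Hypothetical variant of item_based_recommender that omits the queried movie."""
    movie_ratings = user_movie_df[movie_name]
    scores = user_movie_df.corrwith(movie_ratings)
    # Remove the trivial self-correlation before ranking
    return scores.drop(labels=movie_name).sort_values(ascending=False).head(top_n)


# Made-up user-movie matrix for a quick check
toy = pd.DataFrame(
    {
        "A": [5.0, 4.0, 1.0, 2.0],
        "B": [4.0, 5.0, 2.0, 1.0],
        "C": [1.0, 2.0, 5.0, 4.0],
    }
)
result = recommend_similar("A", toy, top_n=2)
print(result)
```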