# User-Based Recommendation System

## Business Problem
---

An online movie streaming platform wants to improve the recommendation system it built previously.

Having already tried content-based and item-based recommendation systems, the company now wants **more personalization** for its users.

So far, recommendations have been based on films with similar rating patterns. The company wants to personalize these general recommendations further by using the similarity between users.

## Dataset Story
---

The dataset is provided by MovieLens.

It contains movies and the ratings given to them.

The dataset contains roughly 20,000,000 ratings for roughly 27,000 movies.

## Variables
---
### movie.csv
- movieId : Unique movie identifier. (UniqueID)
- title : Movie title
- genres : Pipe-separated genres of the movie

### rating.csv
- userId : Unique user identifier. (UniqueID)
- movieId : Unique movie identifier. (UniqueID)
- rating : Rating the user gave to the movie
- timestamp : Date and time of the rating

## Preparing the Dataset

In [1]:
import pandas as pd
pd.set_option("display.max_columns", 20)

In [2]:
movie = pd.read_csv("movie_lens_dataset/movie.csv")
rating = pd.read_csv("movie_lens_dataset/rating.csv")

In [3]:
movie.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [4]:
rating.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,2,3.5,2005-04-02 23:53:47
1,1,29,3.5,2005-04-02 23:31:16
2,1,32,3.5,2005-04-02 23:33:39
3,1,47,3.5,2005-04-02 23:32:07
4,1,50,3.5,2005-04-02 23:29:40


In [5]:
df = movie.merge(rating, how="left", on="movieId")
df.head()

Unnamed: 0,movieId,title,genres,userId,rating,timestamp
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,3.0,4.0,1999-12-11 13:36:47
1,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,6.0,5.0,1997-03-13 17:50:52
2,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,8.0,4.0,1996-06-05 13:37:51
3,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,10.0,4.0,1999-11-25 02:44:47
4,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,11.0,4.5,2009-01-02 01:13:41
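
The `userId` values in the merged frame appear as floats (`3.0`, `6.0`, ...). A likely explanation is that the left merge keeps movies nobody has rated, and the resulting missing values force the column to a float dtype. The sketch below illustrates this on made-up data; `toy_movie`, `toy_rating` and their contents are hypothetical and not part of the dataset.

```python
import pandas as pd

# Hypothetical miniature versions of movie.csv and rating.csv.
toy_movie = pd.DataFrame({"movieId": [1, 2], "title": ["Film A", "Film B"]})
toy_rating = pd.DataFrame(
    {"userId": [7, 8], "movieId": [1, 1], "rating": [4.0, 5.0]}
)

# how="left" keeps Film B even though nobody rated it.
toy_df = toy_movie.merge(toy_rating, how="left", on="movieId")

print(toy_df)
# The unmatched Film B row holds NaN, so the integer userId column becomes float.
print(toy_df["userId"].dtype)
```

If integer identifiers are preferred, an inner merge (or dropping the unrated rows before pivoting) would avoid the conversion, at the cost of losing movies without ratings.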


In [6]:
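# Number of ratings per title, kept as a Series so the code does not depend on
# how value_counts() names its result (this changed in pandas 2.0).
comment_counts = df["title"].value_counts()

In [7]:
# Titles with 10,000 ratings or fewer are treated as rare.
rare_movies = comment_counts[comment_counts <= 10000].index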

In [8]:
common_movies = df[~df["title"].isin(rare_movies)]

In [9]:
user_movie_df = common_movies.pivot_table(index = ["userId"], columns=["title"], values = "rating")

In [10]:
user_movie_df.head()

title,10 Things I Hate About You (1999),12 Angry Men (1957),2001: A Space Odyssey (1968),28 Days Later (2002),300 (2007),A.I. Artificial Intelligence (2001),"Abyss, The (1989)",Ace Ventura: Pet Detective (1994),Ace Ventura: When Nature Calls (1995),Addams Family Values (1993),...,Wild Wild West (1999),William Shakespeare's Romeo + Juliet (1996),Willy Wonka & the Chocolate Factory (1971),Witness (1985),"Wizard of Oz, The (1939)","X-Files: Fight the Future, The (1998)",X-Men (2000),X2: X-Men United (2003),You've Got Mail (1998),Young Frankenstein (1974)
userId
1.0,,,3.5,3.5,,,,,,,...,,,,,3.5,,,4.0,,4.0
2.0,,,5.0,,,,,,,,...,,,,,,,,,,
3.0,,,5.0,,,,3.0,,,,...,,,5.0,4.0,4.0,5.0,,,,5.0
4.0,,,,,,,,,3.0,,...,,,,,,,,,,
5.0,,,,,,,,,,,...,,,2.0,,,,,,,
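
The filtering and pivoting steps above can be hard to follow on the full dataset. The following is a minimal sketch on made-up data that mirrors them: count ratings per title, keep the frequently rated titles, and pivot to a user-by-title matrix. The frame `toy`, the threshold `min_ratings` and all values are assumptions for illustration only.

```python
import pandas as pd

# Toy ratings in long format; names and values are invented for illustration.
toy = pd.DataFrame({
    "userId": [1, 1, 2, 3, 3],
    "title": ["Film A", "Film B", "Film A", "Film B", "Film C"],
    "rating": [4.0, 3.0, 5.0, 2.0, 4.5],
})

# Keep only titles with more than `min_ratings` ratings
# (the notebook uses 10000 on the real data).
min_ratings = 1
counts = toy["title"].value_counts()
common = toy[toy["title"].isin(counts[counts > min_ratings].index)]

# One row per user, one column per title; films a user has not rated become NaN.
toy_user_movie = common.pivot_table(index="userId", columns="title", values="rating")
print(toy_user_movie)
```

Film C has a single rating, so it is dropped before the pivot, which is the same effect the 10,000-rating cut-off has on rarely rated titles.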


In [11]:
random_user = int(pd.Series(user_movie_df.index).sample(1, random_state=45).iloc[0])

In [12]:
random_user

121739

In [13]:
random_user_df = user_movie_df[user_movie_df.index == random_user]
random_user_df

title,10 Things I Hate About You (1999),12 Angry Men (1957),2001: A Space Odyssey (1968),28 Days Later (2002),300 (2007),A.I. Artificial Intelligence (2001),"Abyss, The (1989)",Ace Ventura: Pet Detective (1994),Ace Ventura: When Nature Calls (1995),Addams Family Values (1993),...,Wild Wild West (1999),William Shakespeare's Romeo + Juliet (1996),Willy Wonka & the Chocolate Factory (1971),Witness (1985),"Wizard of Oz, The (1939)","X-Files: Fight the Future, The (1998)",X-Men (2000),X2: X-Men United (2003),You've Got Mail (1998),Young Frankenstein (1974)
userId
121739.0,,,1.5,,,,4.0,2.0,,,...,,,,,4.0,,2.0,,,


In [14]:
# Titles the sampled user has rated
movies_watched = random_user_df.columns[random_user_df.notna().any()].tolist()


In [15]:
user_movie_df.loc[user_movie_df.index == random_user, user_movie_df.columns == 'Wallace & Gromit: A Close Shave (1995)']

title,Wallace & Gromit: A Close Shave (1995)
userId
121739.0,5.0


In [16]:
movies_watched_df = user_movie_df[movies_watched]

In [17]:
user_movie_count = movies_watched_df.notnull().sum(axis=1)

In [18]:
user_movie_count.head()

userId
1.0    52
2.0    15
3.0    54
4.0    17
5.0    39
dtype: int64

In [29]:
user_movie_count = user_movie_count.reset_index()

In [30]:
user_movie_count.columns = ["userId", "movie_count"]

In [43]:
# Require overlap on more than 60% of the films the sampled user has rated
perc = len(movies_watched) * 60 / 100

In [44]:
user_movie_count[user_movie_count["movie_count"]>perc].sort_values("movie_count", ascending = False)

Unnamed: 0,userId,movie_count
96276,96859.0,191
92054,92616.0,191
71536,71975.0,191
29395,29575.0,191
69368,69793.0,191
...,...,...
129218,129997.0,115
94886,95466.0,115
54238,54575.0,115
54666,55008.0,115


In [45]:
user_same_movies = user_movie_count[user_movie_count["movie_count"]>perc]["userId"]

In [46]:
user_same_movies

53            54.0
57            58.0
90            91.0
115          116.0
146          147.0
            ...   
137552    138387.0
137562    138397.0
137569    138404.0
137575    138411.0
137638    138474.0
Name: userId, Length: 5741, dtype: float64
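
The candidate selection above (count how many of the sampled user's films each other user has rated, then keep those above 60%) can be sketched on a small matrix. All names and values below are hypothetical. Unlike the notebook, this sketch drops the target user from the candidates, which is an assumption made here for clarity.

```python
import numpy as np
import pandas as pd

# Toy user-by-title matrix; NaN means "not rated".
toy = pd.DataFrame(
    {
        "Film A": [4.0, 5.0, np.nan, 3.0],
        "Film B": [3.0, np.nan, np.nan, 4.0],
        "Film C": [5.0, 4.0, 2.0, np.nan],
    },
    index=pd.Index([1, 2, 3, 4], name="userId"),
)
target = 1

# Films the target user has rated.
watched = toy.columns[toy.loc[target].notna()].tolist()

# For every user, how many of those films they have also rated.
overlap = toy[watched].notna().sum(axis=1)

# Keep users who share more than 60% of the target's films.
threshold = len(watched) * 60 / 100
mask = (overlap > threshold) & (overlap.index != target)
similar_candidates = overlap[mask].index.tolist()
print(similar_candidates)
```

Here the target has rated three films, so the threshold is 1.8 and users need at least two films in common to qualify.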

In [47]:
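# The sampled user already passes the filter (they have rated every film in movies_watched), so the concat adds their row a second time.
final_df = pd.concat([movies_watched_df[movies_watched_df.index.isin(user_same_movies)], random_user_df[movies_watched]])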
final_df.head()

title,2001: A Space Odyssey (1968),"Abyss, The (1989)",Ace Ventura: Pet Detective (1994),Air Force One (1997),Airplane! (1980),Aladdin (1992),Alien (1979),Alien: Resurrection (1997),Aliens (1986),Antz (1998),...,Twelve Monkeys (a.k.a. 12 Monkeys) (1995),Twister (1996),Wallace & Gromit: A Close Shave (1995),Wallace & Gromit: The Wrong Trousers (1993),Waterworld (1995),When Harry Met Sally... (1989),While You Were Sleeping (1995),Who Framed Roger Rabbit? (1988),"Wizard of Oz, The (1939)",X-Men (2000)
userId
54.0,2.0,4.0,1.0,4.0,5.0,,5.0,3.0,5.0,,...,5.0,4.0,,3.0,,4.0,4.0,5.0,,4.0
58.0,5.0,3.0,,4.5,4.0,,5.0,,4.0,,...,5.0,4.0,5.0,5.0,1.0,,,5.0,,3.0
91.0,2.5,3.0,2.5,3.0,3.0,,4.0,2.5,4.0,3.5,...,4.0,3.0,4.0,4.0,1.5,4.0,2.5,4.0,3.5,4.0
116.0,,3.0,3.5,2.0,2.0,3.0,,2.0,,3.0,...,4.0,1.0,,3.5,2.0,,,3.0,2.0,4.0
147.0,3.5,3.0,4.0,,4.0,2.5,4.0,3.0,4.0,,...,3.5,3.0,,4.5,,,,3.0,,3.0


In [48]:
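# Pairwise Pearson correlation between users; drop_duplicates() removes repeated correlation values, keeping one orientation of each symmetric pair.
corr_df = final_df.T.corr().unstack().sort_values().drop_duplicates()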

In [49]:
corr_df = pd.DataFrame(corr_df, columns = ["corr"])

In [50]:
corr_df.index.names = ["user_id_1", "user_id_2"]

In [51]:
corr_df = corr_df.reset_index()

In [52]:
corr_df.head()

Unnamed: 0,user_id_1,user_id_2,corr
0,80064.0,67346.0,-0.596493
1,34954.0,89242.0,-0.587927
2,117033.0,9362.0,-0.580623
3,95254.0,117033.0,-0.566383
4,23026.0,116287.0,-0.558577
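
Because `drop_duplicates()` keeps only one orientation of each user pair, a later filter on `user_id_1` may miss pairs in which the sampled user ended up in `user_id_2`. One possible alternative, shown here as a sketch on made-up data rather than as the notebook's method, is to correlate every user directly with the target user via `DataFrame.corrwith`. The frame `ratings` and the id `target` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy user-by-title matrix (rows are users, as in final_df).
ratings = pd.DataFrame(
    {
        "Film A": [5.0, 4.0, 1.0],
        "Film B": [3.0, 3.5, 3.0],
        "Film C": [1.0, 2.0, 5.0],
        "Film D": [4.0, np.nan, 2.0],
    },
    index=pd.Index([10, 20, 30], name="userId"),
)
target = 10

# After transposing, each column is a user. corrwith() correlates every column
# with the target's ratings, using only the films both users have rated.
user_corr = (
    ratings.T.corrwith(ratings.loc[target])
    .drop(target)                      # remove the target's self-correlation
    .sort_values(ascending=False)
)
print(user_corr)
```

This computes one correlation per user instead of the full user-by-user matrix, so it also avoids most of the cost of `final_df.T.corr()` when only one target user is of interest.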


In [58]:
# Users whose correlation with the sampled user is at least 0.40
top_users = corr_df[(corr_df["user_id_1"] == random_user) & (corr_df["corr"] >= 0.40)][["user_id_2", "corr"]].reset_index(drop=True)

In [60]:
top_users

Unnamed: 0,user_id_2,corr
0,42268.0,0.400564
1,29226.0,0.401986
2,11333.0,0.402167
3,102596.0,0.403186
4,79253.0,0.404128
5,117286.0,0.410032
6,94278.0,0.413656
7,100928.0,0.417729
8,122651.0,0.41854
9,122651.0,0.41854


In [65]:
top_users.rename(columns = {"user_id_2": "userId"}, inplace=True)

In [66]:
top_users_ratings = top_users.merge(rating[["userId", "movieId", "rating"]], how="inner", on="userId")

In [68]:
top_users_ratings

Unnamed: 0,userId,corr,movieId,rating
0,42268.0,0.400564,1,4.0
1,42268.0,0.400564,2,2.5
2,42268.0,0.400564,5,3.0
3,42268.0,0.400564,10,3.0
4,42268.0,0.400564,11,4.0
...,...,...,...,...
8611,52637.0,0.451806,36517,4.0
8612,52637.0,0.451806,37741,4.0
8613,52637.0,0.451806,44191,4.5
8614,52637.0,0.451806,44195,4.0


In [69]:
top_users_ratings["weighted_rating"] = top_users_ratings["corr"] * top_users_ratings["rating"]

In [71]:
recommendation_df = top_users_ratings.groupby("movieId").agg({"weighted_rating": "mean"})

In [72]:
recommendation_df.reset_index(inplace = True)

In [73]:
recommendation_df.head()

Unnamed: 0,movieId,weighted_rating
0,1,1.744523
1,2,1.316767
2,3,1.354024
3,4,0.804334
4,5,1.178402


In [74]:
recommendation_df.sort_values("weighted_rating", ascending = False)

Unnamed: 0,movieId,weighted_rating
1645,3836,2.259030
987,2237,2.259030
2493,27611,2.205772
249,531,2.205772
2799,79357,2.205772
...,...,...
844,1972,0.201084
1352,3041,0.201084
1188,2643,0.201084
1997,5248,0.201084
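
The scores above are the mean of `corr * rating`, so they are scaled down by the correlations and no longer sit on the original rating scale. The sketch below, on invented numbers, compares that score with a normalized weighted average (sum of `corr * rating` divided by sum of `corr`). The normalized variant is an alternative offered here as an assumption and is not used in the notebook.

```python
import pandas as pd

# Toy ratings from two similar users (corr 0.8 and 0.4) for two movies.
toy = pd.DataFrame({
    "movieId": [1, 1, 2, 2],
    "corr": [0.8, 0.4, 0.8, 0.4],
    "rating": [5.0, 2.0, 3.0, 3.0],
})
toy["weighted_rating"] = toy["corr"] * toy["rating"]

# Notebook-style score: plain mean of corr * rating per movie.
mean_score = toy.groupby("movieId")["weighted_rating"].mean()

# Alternative: divide by the total correlation so the score stays on the rating scale.
sums = toy.groupby("movieId")[["weighted_rating", "corr"]].sum()
normalized = sums["weighted_rating"] / sums["corr"]

print(pd.DataFrame({"mean_score": mean_score, "normalized": normalized}))
```

Both scores rank these two toy movies the same way, but a fixed cut-off such as 2.1 is easier to interpret against the normalized value, since it can be read directly as a rating.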


In [75]:
movies_to_be_recommend = recommendation_df[recommendation_df["weighted_rating"]>2.1].sort_values("weighted_rating", ascending = False)

In [77]:
# Movies we can recommend to our sample user

In [79]:
movies_to_be_recommend.merge(movie[["movieId", "title"]])

Unnamed: 0,movieId,weighted_rating,title
0,3836,2.259030,Kelly's Heroes (1970)
1,2237,2.259030,Without Limits (1998)
2,24,2.205772,Powder (1995)
3,1357,2.205772,Shine (1996)
4,84152,2.205772,Limitless (2011)
...,...,...,...
70,1209,2.104322,Once Upon a Time in the West (C'era una volta ...
71,1299,2.104322,"Killing Fields, The (1984)"
72,1303,2.104322,"Man Who Would Be King, The (1975)"
73,3196,2.104322,Stalag 17 (1953)
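
The code shown does not appear to remove films the sampled user has already rated, so the list may contain titles they have seen. A minimal sketch of such a filter follows, using made-up frames. `recs`, `toy_movie` and `already_watched` are hypothetical stand-ins for `movies_to_be_recommend`, `movie` and `movies_watched`.

```python
import pandas as pd

# Toy recommendation scores and a toy movie lookup table.
recs = pd.DataFrame({"movieId": [1, 2, 3], "weighted_rating": [2.3, 2.2, 2.15]})
toy_movie = pd.DataFrame(
    {"movieId": [1, 2, 3], "title": ["Film A", "Film B", "Film C"]}
)
already_watched = ["Film B"]  # stands in for the notebook's movies_watched list

# Attach titles, then drop anything the user has already rated.
named = recs.merge(toy_movie[["movieId", "title"]], on="movieId")
unseen = named[~named["title"].isin(already_watched)].reset_index(drop=True)
print(unseen)
```

Note that `movies_watched` in the notebook only covers frequently rated titles, so a stricter filter would need the user's rows from `rating` directly.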
