# Kave R&D-Focused Artificial Intelligence Training: Exam Question
In this assignment, we first expect you to find the similarities between movies using the MovieLens dataset, and then to use those similarities to recommend movies to users. Finally, we expect you to turn the recommendation function into an application with Streamlit and share the code with us.

# Important Note: For acceptance, how much effort you put into solving the question matters more than whether you solve it. We know that working with highly motivated young people is a different experience; we value you and look forward to your submission.



# Question Content

#### 1. Writing an algorithm that recommends movies using the MovieLens dataset
#### 2. Writing a function that recommends similar movies when given a movie name
#### 3. Turning the solution into a Streamlit application that a user can use

# 1. Writing an algorithm that recommends movies using the MovieLens dataset

In this section, we expect you to develop an algorithm that finds the similarities between the movies in the attached dataset and recommends movies to users based on those similarities.

Resources you may find helpful for this section:
- [How To Build Your First Recommender System Using Python & MovieLens Dataset](https://analyticsindiamag.com/how-to-build-your-first-recommender-system-using-python-movielens-dataset/)
- [Build Recommender Systems with Movielens Dataset in Python](https://www.codespeedy.com/build-recommender-systems-with-movielens-dataset-in-python/)
- [Collaborative Filtering for Movie Recommendations](https://www.kaggle.com/code/faressayah/collaborative-filtering-for-movie-recommendations)
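
One common way to measure the similarity described above is the Pearson correlation between two movies' ratings, computed only over the users who rated both. The sketch below illustrates the idea in plain Python; the function name `pearson_similarity` and the toy ratings are assumptions made for illustration and are not part of the dataset or of the expected solution.

```python
from math import sqrt


def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation over users who rated both movies.

    ratings_a and ratings_b map userId -> rating (hypothetical toy structures).
    Returns None when fewer than two users overlap or a variance is zero.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return None
    xs = [ratings_a[u] for u in common]
    ys = [ratings_b[u] for u in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


# Made-up ratings: userId -> rating
toy_story = {1: 5.0, 2: 4.0, 3: 1.0, 4: 3.0}
finding_nemo = {1: 4.5, 2: 4.0, 3: 2.0}
heat = {1: 1.0, 2: 2.0, 3: 5.0, 5: 4.0}

sim_nemo = pearson_similarity(toy_story, finding_nemo)
sim_heat = pearson_similarity(toy_story, heat)
print(sim_nemo, sim_heat)
```

A value near 1 means the same users tend to rate both movies similarly, while a value near -1 means their tastes for the two movies diverge.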

In [7]:
import pandas as pd
import numpy as np
import os

In [8]:
os.listdir('datasets')

['ratings.csv', 'README.txt', 'movies.csv']

In [9]:
movies = pd.read_csv('datasets/movies.csv')
movies.head(10)

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy
5,6,Heat (1995),Action|Crime|Thriller
6,7,Sabrina (1995),Comedy|Romance
7,8,Tom and Huck (1995),Adventure|Children
8,9,Sudden Death (1995),Action
9,10,GoldenEye (1995),Action|Adventure|Thriller


In [10]:
ratings = pd.read_csv('datasets/ratings.csv')
ratings.head(10)

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931
5,1,70,3.0,964982400
6,1,101,5.0,964980868
7,1,110,4.0,964982176
8,1,151,5.0,964984041
9,1,157,5.0,964984100


### Let's combine the datasets.

In [11]:
# Attach each movie's title and genres to the user ratings via movieId.
df = pd.merge(ratings, movies, how='left', on='movieId')

df.head(10)

Unnamed: 0,userId,movieId,rating,timestamp,title,genres
0,1,1,4.0,964982703,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,1,3,4.0,964981247,Grumpier Old Men (1995),Comedy|Romance
2,1,6,4.0,964982224,Heat (1995),Action|Crime|Thriller
3,1,47,5.0,964983815,Seven (a.k.a. Se7en) (1995),Mystery|Thriller
4,1,50,5.0,964982931,"Usual Suspects, The (1995)",Crime|Mystery|Thriller
5,1,70,3.0,964982400,From Dusk Till Dawn (1996),Action|Comedy|Horror|Thriller
6,1,101,5.0,964980868,Bottle Rocket (1996),Adventure|Comedy|Crime|Romance
7,1,110,4.0,964982176,Braveheart (1995),Action|Drama|War
8,1,151,5.0,964984041,Rob Roy (1995),Action|Drama|Romance|War
9,1,157,5.0,964984100,Canadian Bacon (1995),Comedy|War
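
The left merge keeps every rating row even when its `movieId` has no match in the movies table, in which case `title` and `genres` become NaN. The following sketch, on made-up toy frames (`toy_ratings` and `toy_movies` are hypothetical names), shows one possible way to check for that and to guard against duplicate `movieId` values with the `validate` argument of `pd.merge`.

```python
import pandas as pd

# Toy data: movieId 99 deliberately has no entry in toy_movies.
toy_ratings = pd.DataFrame(
    {"userId": [1, 1, 2], "movieId": [1, 3, 99], "rating": [4.0, 4.0, 2.5]}
)
toy_movies = pd.DataFrame(
    {"movieId": [1, 3], "title": ["Toy Story (1995)", "Grumpier Old Men (1995)"]}
)

# validate="many_to_one" raises if movieId is not unique in toy_movies,
# which would otherwise silently duplicate rating rows.
merged = pd.merge(
    toy_ratings, toy_movies, how="left", on="movieId", validate="many_to_one"
)

# Ratings whose movieId had no match end up with a missing title.
missing_titles = int(merged["title"].isna().sum())
print(merged)
print("ratings without a matching movie:", missing_titles)
```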


# Feature Engineering

<h4>Average Rating</h4>

In [12]:
average_ratings = pd.DataFrame(df.groupby('title')['rating'].mean())

average_ratings.head(10)

Unnamed: 0_level_0,rating
title,Unnamed: 1_level_1
'71 (2014),4.0
'Hellboy': The Seeds of Creation (2004),4.0
'Round Midnight (1986),3.5
'Salem's Lot (2004),5.0
'Til There Was You (1997),4.0
'Tis the Season for Love (2015),1.5
"'burbs, The (1989)",3.176471
'night Mother (1986),3.0
(500) Days of Summer (2009),3.666667
*batteries not included (1987),3.285714


<h4>Total Number of Ratings</h4>

In [13]:
# groupby(...).count() already returns a Series indexed by title, so it can be assigned directly.
average_ratings['total ratings'] = df.groupby('title')['rating'].count()

average_ratings.head(10)

Unnamed: 0_level_0,rating,total ratings
title,Unnamed: 1_level_1,Unnamed: 2_level_1
'71 (2014),4.0,1
'Hellboy': The Seeds of Creation (2004),4.0,1
'Round Midnight (1986),3.5,2
'Salem's Lot (2004),5.0,1
'Til There Was You (1997),4.0,2
'Tis the Season for Love (2015),1.5,1
"'burbs, The (1989)",3.176471,17
'night Mother (1986),3.0,1
(500) Days of Summer (2009),3.666667,42
*batteries not included (1987),3.285714,7
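
As an alternative sketch, both features can be computed in a single `groupby` pass. The toy frame and the name `movie_stats` below are assumptions for illustration; the column names are chosen to match the ones used in this notebook.

```python
import pandas as pd

# Toy ratings: movie "A (2000)" has three ratings, "B (2001)" has one.
toy_df = pd.DataFrame(
    {
        "title": ["A (2000)", "A (2000)", "A (2000)", "B (2001)"],
        "rating": [4.0, 3.0, 5.0, 2.0],
    }
)

# Compute mean and count together, then rename to the notebook's column names.
movie_stats = (
    toy_df.groupby("title")["rating"]
    .agg(["mean", "count"])
    .rename(columns={"mean": "rating", "count": "total ratings"})
)
print(movie_stats)
```

Keeping the count next to the mean matters because an average based on one or two ratings, like several rows in the output above, is not a reliable signal.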


<h4>Calculating the Correlation</h4>

In [14]:
movie_user = df.pivot_table(index='userId', columns='title', values='rating')

movie_user.head(10)

title,'71 (2014),'Hellboy': The Seeds of Creation (2004),'Round Midnight (1986),'Salem's Lot (2004),'Til There Was You (1997),'Tis the Season for Love (2015),"'burbs, The (1989)",'night Mother (1986),(500) Days of Summer (2009),*batteries not included (1987),...,Zulu (2013),[REC] (2007),[REC]² (2009),[REC]³ 3 Génesis (2012),anohana: The Flower We Saw That Day - The Movie (2013),eXistenZ (1999),xXx (2002),xXx: State of the Union (2005),¡Three Amigos! (1986),À nous la liberté (Freedom for Us) (1931)
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,,,,,,,,,,,...,,,,,,,,,4.0,
2,,,,,,,,,,,...,,,,,,,,,,
3,,,,,,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
5,,,,,,,,,,,...,,,,,,,,,,
6,,,,,,,,,,,...,,,,,,,,,,
7,,,,,,,,,,,...,,,,,,,,,,
8,,,,,,,,,,,...,,,,,,,,,,
9,,,,,,,,,,,...,,,,,,,1.0,,,
10,,,,,,,,,,,...,,,,,,,,,,
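
The user-by-movie matrix is mostly NaN because each user rates only a small fraction of the movies. The sketch below reproduces the same `pivot_table` call on a made-up toy frame and measures the share of missing cells; `toy_matrix` and `sparsity` are illustrative names.

```python
import pandas as pd

toy_df = pd.DataFrame(
    {
        "userId": [1, 1, 2, 3],
        "title": ["A (2000)", "B (2001)", "A (2000)", "B (2001)"],
        "rating": [4.0, 5.0, 3.0, 2.0],
    }
)

# One row per user, one column per title; unrated pairs become NaN.
# pivot_table averages duplicates, so a user who rated a title twice gets the mean.
toy_matrix = toy_df.pivot_table(index="userId", columns="title", values="rating")

# Fraction of user/movie cells without a rating.
sparsity = toy_matrix.isna().to_numpy().mean()
print(toy_matrix)
print(f"share of missing cells: {sparsity:.2f}")
```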


In [15]:
correlations_test = movie_user.corrwith(movie_user['Toy Story (1995)'])

correlations_test.head(10)


title
'71 (2014)                                      NaN
'Hellboy': The Seeds of Creation (2004)         NaN
'Round Midnight (1986)                          NaN
'Salem's Lot (2004)                             NaN
'Til There Was You (1997)                       NaN
'Tis the Season for Love (2015)                 NaN
'burbs, The (1989)                         0.240563
'night Mother (1986)                            NaN
(500) Days of Summer (2009)                0.353833
*batteries not included (1987)            -0.427425
dtype: float64
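
The NaN entries appear where too few users rated both movies for a correlation to be defined, and a correlation based on only two or three shared users can be misleadingly high. One possible safeguard, sketched here on made-up data, is to count the co-raters per movie and keep only correlations above a minimum overlap. The threshold `min_overlap` and all names in the block are assumptions, not part of the original solution.

```python
import numpy as np
import pandas as pd

# Toy user-by-movie matrix (rows: users, columns: movies).
toy_matrix = pd.DataFrame(
    {
        "Target": [5.0, 4.0, 1.0, 3.0, np.nan],
        "Similar": [4.5, 4.0, 2.0, np.nan, 3.0],
        "Sparse": [np.nan, np.nan, np.nan, 3.0, 5.0],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="userId"),
)

target = toy_matrix["Target"]
correlations = toy_matrix.corrwith(target)

# Number of users who rated both the target movie and each other movie.
rated = toy_matrix.notna().astype(int)
overlap = rated.mul(target.notna().astype(int), axis=0).sum()

min_overlap = 3  # hypothetical threshold; tune it for the real dataset
reliable = correlations[overlap >= min_overlap].dropna()
print(overlap)
print(reliable)
```

This filter is complementary to a threshold on the total number of ratings: a movie can have many ratings overall and still share very few raters with the target movie.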

In [17]:
recommendation_test = pd.DataFrame(correlations_test, columns=['correlation'])
recommendation_test.dropna(inplace=True)
recommendation_test = recommendation_test.join(average_ratings['total ratings'])

recommendation_test.head()

Unnamed: 0_level_0,correlation,total ratings
title,Unnamed: 1_level_1,Unnamed: 2_level_1
"'burbs, The (1989)",0.240563,17
(500) Days of Summer (2009),0.353833,42
*batteries not included (1987),-0.427425,7
10 Cent Pistol (2015),1.0,2
10 Cloverfield Lane (2016),-0.285732,14


<h4>Testing the Recommendation System</h4>

In [18]:
recc_test = recommendation_test[recommendation_test['total ratings']>100].sort_values('correlation', ascending=False).reset_index()
recc_test = recc_test.merge(movies, on='title', how='left')
recc_test.head(10)

Unnamed: 0,title,correlation,total ratings,movieId,genres
0,Toy Story (1995),1.0,215,1,Adventure|Animation|Children|Comedy|Fantasy
1,"Incredibles, The (2004)",0.643301,125,8961,Action|Adventure|Animation|Children|Comedy
2,Finding Nemo (2003),0.618701,141,6377,Adventure|Animation|Children|Comedy
3,Aladdin (1992),0.611892,183,588,Adventure|Animation|Children|Comedy|Musical
4,"Monsters, Inc. (2001)",0.490231,132,4886,Adventure|Animation|Children|Comedy|Fantasy
5,Mrs. Doubtfire (1993),0.446261,144,500,Comedy|Drama
6,"Amelie (Fabuleux destin d'Amélie Poulain, Le) ...",0.438237,120,4973,Comedy|Romance
7,American Pie (1999),0.420117,103,2706,Comedy|Romance
8,Die Hard: With a Vengeance (1995),0.410939,144,165,Action|Crime|Thriller
9,E.T. the Extra-Terrestrial (1982),0.409216,122,1097,Children|Drama|Sci-Fi


## 2. Writing a function that recommends similar movies when given a movie name

From here on, after applying various preprocessing and feature engineering steps to the dataset, we expect you to build a function like the one below.

In [19]:
def film_oner(movie_id):
    # Look up the title for this movieId; int() also accepts an id passed as a string.
    match = df.loc[df['movieId'] == int(movie_id), 'title']
    if match.empty:
        raise ValueError(f"Unknown movieId: {movie_id}")
    movie_name = match.iloc[0]

    # Pearson correlation between the chosen movie's ratings and every other movie's ratings.
    correlations = movie_user.corrwith(movie_user[movie_name])

    recommendation = correlations.to_frame('Correlation')
    recommendation.dropna(inplace=True)
    recommendation = recommendation.join(average_ratings['total ratings'])

    # Keep only movies with more than 100 ratings and drop the query movie itself,
    # so the top five never depend on the query being the first row.
    recc = recommendation[recommendation['total ratings'] > 100].drop(index=movie_name, errors='ignore')
    recc = recc.sort_values('Correlation', ascending=False).reset_index()

    return recc['title'].head(5).tolist()

In [20]:
my_movies = film_oner(1)
for title in my_movies:
    print(title)


Incredibles, The (2004)
Finding Nemo (2003)
Aladdin (1992)
Monsters, Inc. (2001)
Mrs. Doubtfire (1993)


In [22]:
def isimle_film_oner(movie_name):
    # Pearson correlation between the chosen movie's ratings and every other movie's ratings.
    correlations = movie_user.corrwith(movie_user[movie_name])

    recommendation = correlations.to_frame('Correlation')
    recommendation.dropna(inplace=True)
    recommendation = recommendation.join(average_ratings['total ratings'])

    # Keep only movies with more than 100 ratings and drop the query movie itself,
    # so the top five never depend on the query being the first row.
    recc = recommendation[recommendation['total ratings'] > 100].drop(index=movie_name, errors='ignore')
    recc = recc.sort_values('Correlation', ascending=False).reset_index()

    return recc['title'].head(5).tolist()

In [23]:
my_movie_name = "Forrest Gump (1994)"
my_movies = isimle_film_oner(my_movie_name)
for title in my_movies:
    print(title)

Good Will Hunting (1997)
Aladdin (1992)
American History X (1998)
Truman Show, The (1998)
Braveheart (1995)
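
The name-based function needs the exact title string, including the release year, so a free-text query such as "forrest gump" would raise a `KeyError`. A minimal sketch of a lookup step that could run before the recommendation call is shown below; `find_titles` is a hypothetical helper and the title list is a toy sample.

```python
def find_titles(query, titles, limit=5):
    """Return up to `limit` titles that contain `query`, ignoring case."""
    needle = query.strip().lower()
    if not needle:
        return []
    return [title for title in titles if needle in title.lower()][:limit]


# Toy sample; in the notebook the candidates could come from the movie_user columns.
toy_titles = ["Forrest Gump (1994)", "Toy Story (1995)", "Toy Story 2 (1999)"]
matches = find_titles("toy story", toy_titles)
print(matches)
```

The caller can then pass one of the returned exact titles on to the recommendation function.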


## 3. Turning the solution into a Streamlit application that a user can use

In this part, we expect you to run the function you built in an interface similar to the attached one.

![alt text](streamlit-example.png "Example")

Resources you may find useful:
- [How to Collect user inputs with Streamlit](https://www.youtube.com/watch?v=RHzjE-WBaSk)
- [8 Best Streamlit Machine Learning Web App Examples in 2022](https://omdena.com/blog/streamlit-web-app-examples/)

In [30]:
import streamlit as st

In [31]:
st.write("You can either select the name of a movie you like from the list")
st.write("or type the id of a movie you like.")

<h4>If you want to select the name of the movie you like from a selectbox</h4>

In [32]:
# df has one row per rating, so pass each title only once to the selectbox.
option = st.selectbox("What's the name of the movie you like", sorted(df['title'].unique()))

In [33]:
st.write(isimle_film_oner(option))

<h4>If you want to write the id of the movie you like</h4>

In [34]:
st.text_input("What's the id of the movie you like", key="movie_id")

''

In [35]:
# The "movie_id" key only exists in st.session_state once the widget has run
# inside a Streamlit script, so read it defensively and skip empty or non-numeric input.
movie_id = st.session_state.get("movie_id", "").strip()
if movie_id.isdecimal():
    st.write(film_oner(int(movie_id)))
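
A text box always returns a string, which is empty until the user types something, so the id should be validated before it reaches the recommendation function. The helper below is a minimal sketch of that validation in plain Python; `parse_movie_id` is a hypothetical name, and in the app it would be applied to the value returned by `st.text_input`.

```python
def parse_movie_id(raw_text):
    """Convert text-box input to a positive int movieId, or None if invalid."""
    try:
        movie_id = int(raw_text.strip())
    except ValueError:
        return None
    return movie_id if movie_id > 0 else None


# Typical inputs: a padded number, an empty box, and free text.
for raw in [" 1 ", "", "toy story"]:
    print(repr(raw), "->", parse_movie_id(raw))
```

Returning None instead of raising lets the app show a hint to the user rather than a traceback.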