# Recommender System for the MovieLens Dataset

**Importing the Python modules and loading the ratings**

In [17]:
import numpy as np 
import pandas as pd
data = pd.read_csv('Unsere_Ratings.csv')
data.head(10)



,userId,movieId,rating
0,2,1287,5.0
1,3,1882,2.0
2,3,2105,3.0
3,3,27773,5.0
4,3,34150,3.5
5,3,45517,4.0
6,3,45722,4.0
7,3,46578,4.0
8,3,48774,4.0
9,3,50872,4.0


**Loading the movie dataset**

In [18]:
movies_titles_genre = pd.read_csv('Unsere_Datei_movieID.csv')
movies_titles_genre.head(10)



,movieId,Title,Genre,Director
0,112852,Guardians of the Galaxy,Action,James Gunn
1,94864,Prometheus,Adventure,Ridley Scott
2,166534,Split,Horror,M. Night Shyamalan
3,155923,Sing,Animation,Christophe Lourdelet
4,135536,Suicide Squad,Action,David Ayer
5,166918,The Great Wall,Action,Yimou Zhang
6,164909,La La Land,Comedy,Damien Chazelle
7,171881,Mindhorn,Comedy,Sean Foley
8,169900,The Lost City of Z,Action,James Gray
9,65567,Passengers,Adventure,Morten Tyldum


**The movie dataset is merged into the ratings on `movieId` (left join, so every rating is kept)**

In [19]:
data = data.merge(movies_titles_genre, on='movieId', how='left')
data.head(10)



,userId,movieId,rating,Title,Genre,Director
0,2,1287,5.0,Ben-Hur,Action,Timur Bekmambetov
1,3,1882,2.0,Godzilla,Action,Gareth Edwards
2,3,2105,3.0,Tron,Action,Joseph Kosinski
3,3,27773,5.0,Old Boy,Action,Spike Lee
4,3,34150,3.5,Fantastic Four,Action,Josh Trank
5,3,45517,4.0,Cars,Animation,John Lasseter
6,3,45722,4.0,Pirates of the Caribbean: Dead Man's Chest,Action,Gore Verbinski
7,3,46578,4.0,Little Miss Sunshine,Comedy,Jonathan Dayton
8,3,48774,4.0,Children of Men,Drama,Alfonso Cuarón
9,3,50872,4.0,Ratatouille,Animation,Brad Bird
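
A left merge keeps every rating even when its `movieId` has no entry in the movie table; such rows simply get NaN in the added columns. The following is a minimal sketch on made-up toy data (the frames `ratings` and `movies` and their values are illustrative assumptions, not the notebook's files) showing how `indicator=True` can be used to spot those unmatched ratings.

```python
import pandas as pd

# Toy stand-ins for the ratings and movie tables (all values are made up).
ratings = pd.DataFrame({
    "userId": [1, 1, 2],
    "movieId": [10, 20, 30],
    "rating": [4.0, 3.5, 5.0],
})
movies = pd.DataFrame({
    "movieId": [10, 20],
    "Title": ["Movie A", "Movie B"],
})

# how="left" keeps every rating; indicator=True adds a "_merge" column
# that marks rows without a matching movie as "left_only".
merged = ratings.merge(movies, on="movieId", how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]

print(len(merged))                    # same number of rows as `ratings`
print(unmatched["movieId"].tolist())  # movieIds that have no title
```

Ratings without a title drop out of the later `groupby('Title')` and pivot steps, so a check like this can explain a row count that shrinks unexpectedly.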


**For each movie the average rating is calculated (the average itself is not used further on, but the resulting DataFrame is reused to hold the rating counts)**

In [20]:
Average_ratings = pd.DataFrame(data.groupby('Title')['rating'].mean())
Average_ratings.head(10)



Title,rating
(500) Days of Summer,3.772921
10 Cloverfield Lane,3.811644
12 Years a Slave,3.885965
13 Hours,4.181818
2012,2.744635
20th Century Women,3.375
21 Jump Street,3.667279
22 Jump Street,3.546012
3 Idiots,3.925926
300,3.490053


**For each movie the total number of ratings is counted.**

In [21]:
# Assign the per-title count Series directly; it aligns on the 'Title' index
Average_ratings['Total Ratings'] = data.groupby('Title')['rating'].count()
Average_ratings.head(10)



Title,rating,Total Ratings
(500) Days of Summer,3.772921,469
10 Cloverfield Lane,3.811644,146
12 Years a Slave,3.885965,228
13 Hours,4.181818,33
2012,2.744635,233
20th Century Women,3.375,8
21 Jump Street,3.667279,272
22 Jump Street,3.546012,163
3 Idiots,3.925926,54
300,3.490053,754
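
The mean and the count can also be computed in a single `groupby` pass. This is a small sketch on made-up data, not the notebook's own cell; the name `stats` is a hypothetical stand-in for `Average_ratings`.

```python
import pandas as pd

# Toy ratings (values are made up for illustration).
data = pd.DataFrame({
    "Title": ["Movie A", "Movie A", "Movie B"],
    "rating": [4.0, 2.0, 5.0],
})

# One pass over the groups yields both statistics as columns.
stats = data.groupby("Title")["rating"].agg(["mean", "count"])

# Rename to the column names used in this notebook.
stats = stats.rename(columns={"mean": "rating", "count": "Total Ratings"})
print(stats)
```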


**A pivot table with one row per user and one column per movie is created. Each cell holds the rating that user gave that movie; NaN means no rating.**

In [22]:
movie_user = data.pivot_table(index='userId', columns='Title', values='rating')
movie_user.head(10)



userId,(500) Days of Summer,10 Cloverfield Lane,12 Years a Slave,13 Hours,2012,20th Century Women,21 Jump Street,22 Jump Street,3 Idiots,300,...,Wreck-It Ralph,X-Men Origins: Wolverine,X-Men: Apocalypse,X-Men: Days of Future Past,Youth,Zero Dark Thirty,Zodiac,Zombieland,Zoolander 2,Zootopia
2,,,,,,,,,,,...,,,,,,,,,,
3,,,,,2.5,,,3.0,,4.0,...,,4.0,3.5,4.0,,,,4.0,,
4,,,,,,,4.0,4.0,,,...,3.5,,,,,,,,,4.5
8,,,,,,,,,,,...,,,,,,,,,,
9,,,,,,,,,,,...,,,,,,,,,,
12,3.0,,,,,,,,,,...,,,,,,,,,,
13,,,,,2.0,,,,,4.0,...,,3.0,,,,,,4.0,,
14,4.0,,,,,,,,,,...,,,,,,,,,,
16,,,,,,,,,4.0,,...,,,,,,,,,,
17,,,,,,,,,,2.5,...,,,,,,,,,,
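
To make the shape of this table concrete, here is a minimal sketch on made-up ratings (three users, two hypothetical titles). It also measures how sparse the table is. Note that `pivot_table` aggregates with the mean by default, so if a user had rated the same title twice, the cell would hold the average of both ratings.

```python
import numpy as np
import pandas as pd

# Toy long-format ratings (values are made up for illustration).
data = pd.DataFrame({
    "userId": [1, 1, 2, 3],
    "Title": ["Movie A", "Movie B", "Movie A", "Movie B"],
    "rating": [4.0, 3.0, 5.0, 2.0],
})

# One row per user, one column per title; missing combinations become NaN.
movie_user = data.pivot_table(index="userId", columns="Title", values="rating")
print(movie_user)

# Fraction of empty cells in the user-by-title table.
sparsity = movie_user.isna().to_numpy().mean()
print(f"share of missing ratings: {sparsity:.2f}")
```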


**The input box where the user enters the movie they want recommendations for.**

In [23]:
movie_name = input("Enter your movie name: ") #input your movie name



Enter your movie name: Iron Man


**Calculation of the Pearson correlation between the selected movie's column and every column of the pivot table. Each correlation uses only the users who rated both movies.**

In [24]:
correlations = movie_user.corrwith(movie_user[movie_name])
correlations.head()



  c = cov(x, y, rowvar)
  c *= np.true_divide(1, fact)


Title
(500) Days of Summer    0.263133
10 Cloverfield Lane     0.332604
12 Years a Slave        0.254918
13 Hours                0.229918
2012                    0.289931
dtype: float64
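
The sketch below, on a made-up four-user table, illustrates what `corrwith` does with missing ratings: for each pair of movies only the users who rated both contribute. It compares the result for one column against a manual `np.corrcoef` on the overlapping rows. The titles and values are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Toy user-by-title table; NaN means the user did not rate the movie.
movie_user = pd.DataFrame(
    {
        "Movie A": [5.0, 4.0, 1.0, np.nan],
        "Movie B": [4.0, 5.0, 2.0, 3.0],
        "Movie C": [np.nan, 2.0, 5.0, 4.0],
    },
    index=pd.Index([1, 2, 3, 4], name="userId"),
)

# Pearson correlation of every column with "Movie A".
correlations = movie_user.corrwith(movie_user["Movie A"])
print(correlations)

# Manual check for "Movie B": keep only users who rated both movies.
both = movie_user[["Movie A", "Movie B"]].dropna()
manual = np.corrcoef(both["Movie A"], both["Movie B"])[0, 1]
print(np.isclose(correlations["Movie B"], manual))
```

In this toy table "Movie A" and "Movie C" share only two raters, and two points always correlate perfectly (here negatively). Tiny overlaps like this produce extreme correlations, which is one reason to filter out rarely rated movies before ranking.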

**Movies without a defined correlation (NaN) are removed, and the total number of ratings per movie is joined onto the correlation table**

In [25]:
recommendation = pd.DataFrame(correlations, columns=['Correlation'])
recommendation.dropna(inplace=True)
recommendation = recommendation.join(Average_ratings['Total Ratings'])
recommendation.head()

Title,Correlation,Total Ratings
(500) Days of Summer,0.263133,469
10 Cloverfield Lane,0.332604,146
12 Years a Slave,0.254918,228
13 Hours,0.229918,33
2012,0.289931,233


**All movies with 100 or fewer ratings are removed, and the correlation table is sorted in descending order of correlation.**

In [26]:
recc = recommendation[recommendation['Total Ratings']>100].sort_values('Correlation', ascending=False).reset_index()



**Final result: the movie dataset is merged into the correlation table on `Title`**

In [27]:
recc = recc.merge(movies_titles_genre, on='Title', how='left')
recc.head(10)



,Title,Correlation,Total Ratings,movieId,Genre,Director
0,Iron Man,1.0,1059,59315,Action,Jon Favreau
1,Iron Man 2,0.705148,486,77561,Action,Jon Favreau
2,Captain America: The Winter Soldier,0.700581,369,110102,Action,Anthony Russo
3,Captain America: The First Avenger,0.632302,335,88140,Action,Joe Johnston
4,Ant-Man,0.628074,276,122900,Action,Peyton Reed
5,Captain America: Civil War,0.626837,266,122920,Action,Anthony Russo
6,Transformers: Dark of the Moon,0.589099,125,87520,Action,Michael Bay
7,Thor,0.584232,398,86332,Action,Kenneth Branagh
8,Avengers: Age of Ultron,0.563166,322,122892,Action,Joss Whedon
9,Transformers,0.548154,385,53996,Action,Michael Bay
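
The steps from correlation to ranked table can be bundled into one function. The following is a sketch under stated assumptions, not the notebook's own code: the function name `recommend` and its parameters `min_ratings` and `top_n` are hypothetical, and the data is a made-up three-user example.

```python
import pandas as pd


def recommend(movie_user, rating_counts, title, min_ratings=100, top_n=10):
    """Rank titles by Pearson correlation with `title` (hypothetical helper).

    movie_user: user-by-title rating table, NaN for missing ratings.
    rating_counts: Series of rating counts indexed by title.
    """
    if title not in movie_user.columns:
        raise KeyError(f"{title!r} is not a column of the user-by-title table")

    # Correlate every title with the selected one and drop undefined values.
    correlations = movie_user.corrwith(movie_user[title])
    result = correlations.dropna().to_frame("Correlation")

    # Attach the rating counts and keep only sufficiently rated titles.
    result = result.join(rating_counts.rename("Total Ratings"))
    result = result[result["Total Ratings"] > min_ratings]

    return result.sort_values("Correlation", ascending=False).head(top_n)


# Toy data: three users who each rated three made-up titles.
data = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Title": ["A", "B", "C"] * 3,
    "rating": [5.0, 4.0, 1.0, 4.0, 5.0, 2.0, 1.0, 2.0, 5.0],
})
movie_user = data.pivot_table(index="userId", columns="Title", values="rating")
counts = data.groupby("Title")["rating"].count()

top = recommend(movie_user, counts, "A", min_ratings=2, top_n=2)
print(top)
```

As in the table above, the queried title itself ranks first with a correlation of 1.0, so a caller may want to drop that row before showing the list.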


**The whole pipeline in a loop, so that several movies can be queried in a row. Enter 'exit' to quit.**

In [None]:
# Keep asking for titles until the user types "exit"
while True:
    movie_name = input("Enter your movie name: ")
    if movie_name == "exit":
        break
    # Only proceed if the title is a column of the pivot table
    if movie_name not in movie_user.columns:
        print("Entered movie not found in database. Please try again.")
        continue

    # Correlate every movie's ratings with the selected movie's ratings
    correlations = movie_user.corrwith(movie_user[movie_name])

    # Drop movies without a defined correlation and attach the rating counts
    recommendation = pd.DataFrame(correlations, columns=['Correlation'])
    recommendation.dropna(inplace=True)
    recommendation = recommendation.join(Average_ratings['Total Ratings'])

    # Keep movies with more than 100 ratings, most correlated first
    recc = recommendation[recommendation['Total Ratings'] > 100].sort_values('Correlation', ascending=False).reset_index()

    # Add movieId, genre and director, then show the top 10
    recc = recc.merge(movies_titles_genre, on='Title', how='left')
    print(recc.head(10))

Enter your movie name: 2012


  c = cov(x, y, rowvar)
  c *= np.true_divide(1, fact)


                            Title  Correlation  Total Ratings  movieId  \
0                            2012     1.000000            233    72378   
1                   Pete's Dragon     0.717430            156     1030   
2  Transformers: Dark of the Moon     0.647157            125    87520   
3                     Ghost Rider     0.633538            130    51077   
4                           Split     0.605497            126   166534   
5                Now You See Me 2     0.597798            122   159093   
6                   Pitch Perfect     0.595450            133    96588   
7                         Shooter     0.591815            187    51935   
8                    Transformers     0.590692            385    53996   
9                  Jurassic World     0.574336            248   117529   

       Genre             Director  
0     Action      Roland Emmerich  
1  Adventure         David Lowery  
2     Action          Michael Bay  
3     Action  Mark Steven Johnson  
4    