# What is a Recommendation System?
A recommender (or recommendation) system is a subclass of information filtering systems that seeks to predict the rating or preference a user would give to an item.

They are primarily used in applications where a person or entity interacts with a product or service. To improve the user's experience, we personalize the product to their needs, which requires looking at their past interactions with it.

*In one line* -> **Specialized content for everyone.**

*For further info, [Wiki](https://en.wikipedia.org/wiki/Recommender_system#:~:text=A%20recommender%20system%2C%20or%20a,would%20give%20to%20an%20item.)*

Here we will cover the following:

* Popularity-based recommender system
* Cosine similarity (code from scratch)


# Popularity-based recommender system
As the name suggests, it recommends items based on what is currently trending or popular across the site. This is particularly useful when you have no past data about a user to base recommendations on. It is not tailored to any particular audience group or movie.

# Import packages and dataset

In [1]:
import numpy as np
import pandas as pd

We will recommend movies based on the ratings they have received. The dataset is large, so we will use only the movies and ratings data.

In [2]:
#Import movie ratings
data_ratings = pd.read_csv('../input/movie-lens-small-latest-dataset/ratings.csv')
data_ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


In [3]:
#Import movie data
data_movies = pd.read_csv('../input/movie-lens-small-latest-dataset/movies.csv')
data_movies.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [6]:
#Merge both the datasets
movie_ratings = pd.merge(data_movies, data_ratings, on = 'movieId')
print(movie_ratings.shape)
movie_ratings.head()

(100836, 6)


Unnamed: 0,movieId,title,genres,userId,rating,timestamp
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,1,4.0,964982703
1,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,5,4.0,847434962
2,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,7,4.5,1106635946
3,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,15,2.5,1510577970
4,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,17,4.5,1305696483
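
`pd.merge` performs an inner join by default, so any movie without ratings is dropped from the result. The sketch below illustrates this on two small hypothetical frames (`movies` and `ratings` are made-up stand-ins, not the real data) and uses the `validate` argument to check that each `movieId` appears only once on the movies side.

```python
import pandas as pd

# Hypothetical toy frames standing in for data_movies and data_ratings.
movies = pd.DataFrame({"movieId": [1, 2, 3], "title": ["A", "B", "C"]})
ratings = pd.DataFrame({
    "movieId": [1, 1, 2],
    "userId": [10, 11, 10],
    "rating": [4.0, 3.0, 5.0],
})

# Inner join (the default): movie 3 has no ratings, so it is dropped.
# validate="one_to_many" raises an error if movieId repeats in `movies`.
merged = pd.merge(movies, ratings, on="movieId", validate="one_to_many")
print(merged.shape)
```

With the real data, the merged row count equals the number of ratings only if every rated `movieId` also appears in the movies file.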


# Recommend Popular Movies

This dataset needs little preprocessing and has no NaN values, so we can proceed directly to recommending popular movies based on ratings.
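
The absence of NaN values can be checked rather than assumed. A minimal sketch, using a small hypothetical frame (`toy_ratings`, with one missing rating added on purpose) instead of the real merged data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for movie_ratings; the values are made up for illustration.
toy_ratings = pd.DataFrame({
    "title": ["A (2000)", "A (2000)", "B (2001)"],
    "userId": [1, 2, 1],
    "rating": [4.0, 3.5, np.nan],  # one missing rating added on purpose
})

# Count missing values per column; all zeros would mean no NaN values.
missing_per_column = toy_ratings.isna().sum()
print(missing_per_column)

# True if any cell in the frame is missing.
has_missing = bool(toy_ratings.isna().any().any())
print(has_missing)
```

On the real data, `movie_ratings.isna().sum()` should show zero for every column if the statement above holds.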

**Things to do:**

* Group the rows by movie title and compute each title's mean rating
* Sort the movies by mean rating, from highest to lowest
* Recommend the top n movies

In [9]:
#Groupby all movie titles together and find their mean ratings
movie_ratings.groupby('title')['rating'].mean().head()

title
'71 (2014)                                 4.0
'Hellboy': The Seeds of Creation (2004)    4.0
'Round Midnight (1986)                     3.5
'Salem's Lot (2004)                        5.0
'Til There Was You (1997)                  4.0
Name: rating, dtype: float64

In [12]:
#Sort movies based on ratings from highest to lowest
movie_ratings.groupby('title')['rating'].mean().sort_values(ascending = False)

title
Karlson Returns (1970)                           5.0
Winter in Prostokvashino (1984)                  5.0
My Love (2006)                                   5.0
Sorority House Massacre II (1990)                5.0
Winnie the Pooh and the Day of Concern (1972)    5.0
                                                ... 
The Beast of Hollow Mountain (1956)              0.5
Follow Me, Boys! (1966)                          0.5
The Butterfly Effect 3: Revelations (2009)       0.5
The Emoji Movie (2017)                           0.5
Rust and Bone (De rouille et d'os) (2012)        0.5
Name: rating, Length: 9719, dtype: float64

In [13]:
#Recommend top n popular movies
n = 10

movie_ratings.groupby('title')['rating'].mean().sort_values(ascending = False).head(n)

title
Karlson Returns (1970)                           5.0
Winter in Prostokvashino (1984)                  5.0
My Love (2006)                                   5.0
Sorority House Massacre II (1990)                5.0
Winnie the Pooh and the Day of Concern (1972)    5.0
Sorority House Massacre (1986)                   5.0
Bill Hicks: Revelations (1993)                   5.0
My Man Godfrey (1957)                            5.0
Hellbenders (2012)                               5.0
In the blue sea, in the white foam. (1984)       5.0
Name: rating, dtype: float64

**How many users have rated a given movie?**

In [14]:
movie_ratings['title'].value_counts()
#Alternative that gives the same counts:
#movie_ratings.groupby('title')['rating'].count().sort_values(ascending = False)

Forrest Gump (1994)                                                  329
Shawshank Redemption, The (1994)                                     317
Pulp Fiction (1994)                                                  307
Silence of the Lambs, The (1991)                                     279
Matrix, The (1999)                                                   278
                                                                    ... 
Kizumonogatari Part 1: Tekketsu (2016)                                 1
Lola Montès (1955)                                                     1
The Escort (2015)                                                      1
Night Guards (2016)                                                    1
Power of Nightmares, The: The Rise of the Politics of Fear (2004)      1
Name: title, Length: 9719, dtype: int64

**What is each movie's mean rating, and how many people rated it?**

In [19]:
#First create a DataFrame of mean ratings, indexed by title
data = pd.DataFrame(movie_ratings.groupby('title')['rating'].mean())
#Add the number of ratings per title (the Series aligns on the title index)
data['rating_counts'] = movie_ratings['title'].value_counts()
#data['rating_counts'] = movie_ratings.groupby('title')['rating'].count() #equivalent alternative
data.head()

title,rating,rating_counts
'71 (2014),4.0,1
'Hellboy': The Seeds of Creation (2004),4.0,1
'Round Midnight (1986),3.5,2
'Salem's Lot (2004),5.0,1
'Til There Was You (1997),4.0,2
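
The table shows why the mean alone is a weak popularity signal: a title with a single 5.0 rating outranks a title with hundreds of slightly lower ratings. One common refinement, sketched here on hypothetical toy data, is to require a minimum number of ratings before ranking. The threshold name `min_ratings` and its value are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy ratings; on the real data, use the movie_ratings frame instead.
toy = pd.DataFrame({
    "title": ["A", "A", "A", "B", "C", "C"],
    "rating": [4.0, 4.5, 5.0, 5.0, 3.0, 4.0],
})

# Mean rating and number of ratings per title in one pass.
stats = toy.groupby("title")["rating"].agg(rating="mean", rating_counts="count")

# Assumed threshold: ignore titles with fewer than min_ratings ratings.
min_ratings = 2
popular = (
    stats[stats["rating_counts"] >= min_ratings]
    .sort_values("rating", ascending=False)
)
print(popular)
```

Here title `B` has the highest mean but only one rating, so it is excluded, and `A` ranks first. A suitable threshold for the real dataset would need to be chosen by inspecting the distribution of `rating_counts`.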


# Calculating Cosine Similarity

Cosine similarity is a measure of similarity between two non-zero vectors: it is the cosine of the angle between them. Here we will write the code for cosine similarity from scratch.

*For more info -> [Wiki](https://en.wikipedia.org/wiki/Cosine_similarity#:~:text=Cosine%20similarity%20is%20a%20measure,to%20both%20have%20length%201.)*
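
For reference, the standard definition for two vectors $x$ and $y$ of length $n$ is given below. The symbols are chosen here to match the `x` and `y` arguments of the `cosine_similarity` function in the code.

$$
\cos\theta = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}
= \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}
$$

The numerator is the dot product and each factor in the denominator is the Euclidean norm of one vector, which is why neither vector may be all zeros.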

**Things to do:**

* Import math package
* Create a function for the vector norm (the square root of the sum of squares) and a cosine similarity function

In [21]:
#load only the function we need
from math import sqrt

#Create 2 functions that follow the formula: the vector norm and the cosine similarity

def square_rooted(x):
    return round(sqrt(sum([a*a for a in x])),3)

def cosine_similarity(x,y):
    numerator = sum(a*b for a,b in zip(x,y))
    denominator = square_rooted(x) * square_rooted(y)
    return round(numerator/ float(denominator),3)

print(cosine_similarity([3,45,7,2],[2,54,13,15]))

0.972


**A score of 0.972 is close to 1, so the two vectors point in nearly the same direction.**
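
Since NumPy is already imported in this notebook, the same calculation can also be sketched with `np.dot` and `np.linalg.norm`. The function name `cosine_similarity_np` is hypothetical, and the zero-vector check is an addition that the from-scratch version does not have.

```python
import numpy as np

def cosine_similarity_np(x, y):
    """Cosine similarity of two equal-length, non-zero vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    norm_product = np.linalg.norm(x) * np.linalg.norm(y)
    # The measure is undefined when either vector has zero length.
    if norm_product == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(x, y) / norm_product)

score = cosine_similarity_np([3, 45, 7, 2], [2, 54, 13, 15])
print(round(score, 3))
```

This version skips the intermediate rounding of the norms, so it rounds only once at the end. For the example vectors it agrees with the from-scratch result to three decimal places.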