
## Recommender System
In this example, we will develop a simple recommendation system with pandas. It suggests the items most similar to an item you choose, where two movies count as similar when users' ratings of them are strongly correlated. This is not a truly robust recommendation system; it merely shows which items are most similar to your item choice.
 

In [None]:
# NumPy and pandas are all we need for the computation; no scikit-learn necessary
import numpy as np
import pandas as pd

In [None]:
df = pd.read_excel('movieData.xlsx')

In [None]:
df.head()

In [None]:
# Get the movie titles
movie_titles = pd.read_excel("MovieTitles.xlsx")
movie_titles.head()

In [None]:
# Here is how we "join" two tables in pandas (an inner join on the shared itemID column)
df = pd.merge(df,movie_titles,on='itemID')
df.head()
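Here is a small sketch of what `pd.merge` does in the cell above. The toy tables are made up and only borrow the notebook's column names (`UserID`, `itemID`, `Rating`, `Title`). By default `pd.merge` performs an inner join, so any rating whose `itemID` has no matching title is silently dropped. Passing `how="left"` would keep such a rating and leave its title missing.

```python
import pandas as pd

# Hypothetical toy tables that mimic the notebook's layout (values are made up)
ratings_toy = pd.DataFrame({
    "UserID": [1, 1, 2, 3],
    "itemID": [10, 20, 10, 30],
    "Rating": [5, 3, 4, 2],
})
titles_toy = pd.DataFrame({
    "itemID": [10, 20],
    "Title": ["Movie A", "Movie B"],
})

# Default inner join: the rating for itemID 30 has no title, so it is dropped
inner = pd.merge(ratings_toy, titles_toy, on="itemID")

# Left join: every rating is kept; the unmatched one gets Title = NaN
left = pd.merge(ratings_toy, titles_toy, on="itemID", how="left")

print(len(inner))  # 3
print(len(left))   # 4
```

When the titles table covers every `itemID`, the two joins give the same rows. Otherwise, comparing their lengths is a quick way to detect ratings that were lost.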

In [None]:
#Time for some visualization
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('white')
%matplotlib inline

In [None]:
# Average rating for each movie title
ratings = pd.DataFrame(df.groupby('Title')['Rating'].mean())
ratings.head()

In [None]:
# One more practice of the "groupby" method
ratings['Number of Ratings'] = df.groupby('Title')['Rating'].count()
ratings.head()
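The two `groupby` cells above can also be written as one aggregation. This sketch runs on hypothetical toy data and uses pandas named aggregation, where each output column is given as a (column, function) tuple passed to `agg`. The result has the same two columns as `ratings`.

```python
import pandas as pd

# Hypothetical merged data with the notebook's column names
df_toy = pd.DataFrame({
    "Title": ["A", "A", "A", "B", "B", "C"],
    "Rating": [5, 4, 3, 2, 4, 5],
})

# One pass over the groups computes the mean and the count together;
# dict unpacking is needed because "Number of Ratings" contains spaces
stats = df_toy.groupby("Title").agg(
    **{
        "Rating": ("Rating", "mean"),
        "Number of Ratings": ("Rating", "count"),
    }
)
print(stats)
```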

In [None]:
#Some histograms 
ratings['Number of Ratings'].hist(bins=15)

In [None]:
#If it is too small, you can customize the figure size
plt.figure(figsize=(15,6))
ratings['Number of Ratings'].hist(bins=15)

In [None]:
# Let's make a histogram of the average Rating now:
plt.figure(figsize=(15,6))
ratings['Rating'].hist(bins=15)

In [None]:
# Now a scatter plot of average rating against number of ratings:
plt.figure(figsize=(10,10))
sns.scatterplot(x='Rating',y='Number of Ratings',data=ratings)

For the recommender system to work, we need a product-customer matrix (remember the wine data?).
We will create a matrix with user IDs as the rows and movie titles as the columns.
Each cell then holds the rating that user gave to that movie. There will be many NaN values, because most people have not seen most of the movies.
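To make the shape concrete, here is a toy version of the pivot on made-up data whose column names mirror the notebook's. Every (user, title) pair that never appears in the long table becomes NaN. `pivot_table` averages duplicate pairs, since its default `aggfunc` is the mean. The last line measures how sparse the matrix is.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, movie) pair
long_toy = pd.DataFrame({
    "UserID": [1, 1, 2, 3, 3],
    "Title": ["A", "B", "A", "B", "C"],
    "Rating": [5, 3, 4, 2, 5],
})

# Rows = users, columns = titles, cells = rating; unseen pairs become NaN
mat = long_toy.pivot_table(index="UserID", columns="Title", values="Rating")
print(mat)

# Fraction of empty cells: 4 of the 9 here
sparsity = mat.isna().to_numpy().mean()
print(sparsity)
```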

In [None]:
moviemat = df.pivot_table(index='UserID',columns='Title',values='Rating')
moviemat.head()

In [None]:
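# Let's take a look at the ratings for two movies.
# A cell only displays its last expression, so use display() to show both.
display(df[df.Title == 'Titanic (1997)'])
display(df[df.Title == 'Forrest Gump (1994)'])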

In [None]:
# Each column of moviemat is a pandas Series of ratings indexed by UserID
rating1 = moviemat['Titanic (1997)']
rating2 = moviemat['Forrest Gump (1994)']
print(rating1.head())
type(rating1)

In [None]:
# Step 1: Use corrwith() to correlate every movie's column in moviemat with the chosen movie's ratings (Pearson by default)
similar_to_Titanic = moviemat.corrwith(rating1)
similar_to_Forrest = moviemat.corrwith(rating2)
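`corrwith` correlates each column of `moviemat` with the given Series, aligning rows on the `UserID` index. With the default `method='pearson'`, the similarity between the chosen movie $i$ and another movie $j$ is

$$
r_{ij} = \frac{\sum_{u \in U_{ij}} (r_{ui} - \bar{r}_i)(r_{uj} - \bar{r}_j)}{\sqrt{\sum_{u \in U_{ij}} (r_{ui} - \bar{r}_i)^2}\,\sqrt{\sum_{u \in U_{ij}} (r_{uj} - \bar{r}_j)^2}}
$$

where $U_{ij}$ is the set of users who rated both movies, and the means $\bar{r}_i$, $\bar{r}_j$ are taken over $U_{ij}$. The sketch below checks this on a hypothetical, made-up matrix. It recomputes one value with `np.corrcoef` using only the overlapping users.

```python
import numpy as np
import pandas as pd

# Hypothetical user x movie matrix (index = UserID, NaN = not rated)
mat = pd.DataFrame(
    {
        "Target": [5.0, 4.0, 1.0, 2.0],
        "Similar": [4.0, 5.0, 2.0, np.nan],  # user 4 did not rate it
        "Opposite": [1.0, 2.0, 5.0, 4.0],    # exactly 6 - Target
    },
    index=[1, 2, 3, 4],
)

# Correlate every column with the Target column, aligned on UserID
sims = mat.corrwith(mat["Target"])
print(sims)

# Recompute "Similar" by hand using only the users who rated both movies
both = mat[["Target", "Similar"]].dropna()
manual = np.corrcoef(both["Target"], both["Similar"])[0, 1]
print(manual)
```

The chosen movie always shows up with a correlation of 1.0 against itself, which is why the title you pick appears at the top of its own list.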

In [None]:
print(similar_to_Titanic)
type(similar_to_Titanic)

In [None]:
# Step 2: Remove the NaN correlations (movies with too few raters in common with the chosen movie, or with constant ratings)
corr_Titanic = pd.DataFrame(similar_to_Titanic,columns=['Correlation'])
corr_Titanic.dropna(inplace=True)
print(corr_Titanic.head(20))

corr_Forrest = pd.DataFrame(similar_to_Forrest,columns=['Correlation'])
corr_Forrest.dropna(inplace=True)
print(corr_Forrest.head(20))
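Where do the NaN correlations come from? Pearson correlation is undefined in three cases: two movies share no raters, they share only one, or one movie's ratings are constant over the shared raters. pandas returns NaN in each case, and `dropna` removes exactly those rows. In this sketch the matrix is hypothetical, and the NumPy `RuntimeWarning`s that the undefined cases trigger are silenced for readability.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical user x movie matrix (index = UserID, NaN = not rated)
mat = pd.DataFrame(
    {
        "Target": [5.0, 4.0, 1.0, np.nan],
        "Good": [4.0, 5.0, 1.0, np.nan],             # three common raters, varying ratings
        "NoOverlap": [np.nan, np.nan, np.nan, 3.0],  # no user rated both
        "OneCommon": [2.0, np.nan, np.nan, 4.0],     # only user 1 rated both
        "Constant": [3.0, 3.0, 3.0, np.nan],         # zero variance on the common raters
    },
    index=[1, 2, 3, 4],
)

with warnings.catch_warnings():
    # Undefined correlations trigger NumPy RuntimeWarnings; hide them for this demo
    warnings.simplefilter("ignore", RuntimeWarning)
    corr = mat.drop(columns="Target").corrwith(mat["Target"])

print(corr)
print(corr.dropna())  # only "Good" survives
```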

In [None]:
# Step 3: Check the results
corr_Titanic.sort_values('Correlation',ascending=False).head(10)
# Why might movies with a perfect 1.0 correlation top this list? Check how many people rated them.
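One likely answer, sketched on made-up data: when only two users rated both movies, any pair whose ratings move in the same direction gets a Pearson correlation of exactly 1.0. The notebook's next step filters on each movie's total `Number of Ratings` as a proxy for reliability. This sketch also counts the users who rated *both* movies. That is an alternative filter shown for illustration, not what the notebook does.

```python
import numpy as np
import pandas as pd

# Hypothetical matrix: "Obscure" was rated by only two users (who also rated Target)
mat = pd.DataFrame(
    {
        "Target": [5.0, 4.0, 1.0, 2.0, 3.0],
        "Popular": [4.0, 5.0, 2.0, 1.0, 3.0],
        "Obscure": [5.0, 3.0, np.nan, np.nan, np.nan],
    },
    index=[1, 2, 3, 4, 5],  # UserID
)

candidates = mat.drop(columns="Target")
sims = candidates.corrwith(mat["Target"])

# Count, per candidate, the users who rated both it and Target
overlap = candidates[mat["Target"].notna()].notna().sum()

print(pd.DataFrame({"Correlation": sims, "Common raters": overlap}))
```

"Obscure" beats "Popular" (1.0 vs 0.8) even though the 1.0 rests on just two people. This is the pattern a minimum-ratings threshold is meant to suppress.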

In [None]:
# Attach each movie's total number of ratings so that rarely rated movies can be filtered out
corr_Titanic = corr_Titanic.join(ratings['Number of Ratings'])
corr_Titanic.head()

In [None]:
# Keep only movies with more than 100 ratings, then sort by correlation
corr_Titanic[corr_Titanic['Number of Ratings']>100].sort_values('Correlation',ascending=False).head(20)

In [None]:
# Repeat the join and filter for Forrest Gump; display() shows the intermediate frame too
corr_Forrest = corr_Forrest.join(ratings['Number of Ratings'])
display(corr_Forrest.head())
corr_Forrest[corr_Forrest['Number of Ratings']>100].sort_values('Correlation',ascending=False).head(20)
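The three steps can be bundled into one reusable function. This is a minimal sketch and not part of the original notebook. `similar_movies` and its `min_ratings` and `n` parameters are hypothetical names. Unlike the cells above, it drops the chosen movie itself, which always correlates perfectly with itself. It assumes `moviemat` and `ratings` are built as in this notebook; the usage at the bottom runs on made-up data.

```python
import pandas as pd


def similar_movies(moviemat, ratings, title, min_ratings=100, n=10):
    """Hypothetical helper that bundles Steps 1-3 of this notebook.

    moviemat: user x title DataFrame of ratings (NaN = not rated)
    ratings:  DataFrame indexed by title with a 'Number of Ratings' column
    """
    # Step 1: correlate every title's ratings with the chosen title's ratings
    corr = moviemat.corrwith(moviemat[title])
    # Step 2: drop titles whose correlation is undefined (NaN)
    corr = corr.dropna().to_frame("Correlation")
    # Step 3: keep titles with enough ratings, most similar first
    corr = corr.join(ratings["Number of Ratings"])
    corr = corr[corr["Number of Ratings"] > min_ratings]
    # Design choice: drop the chosen title, which correlates 1.0 with itself
    corr = corr.drop(index=title, errors="ignore")
    return corr.sort_values("Correlation", ascending=False).head(n)


# Toy usage with made-up data
moviemat_toy = pd.DataFrame(
    {"A": [5.0, 4.0, 1.0, 2.0], "B": [4.0, 5.0, 2.0, 1.0], "C": [1.0, 2.0, 5.0, 4.0]},
    index=[1, 2, 3, 4],
)
ratings_toy = pd.DataFrame({"Number of Ratings": moviemat_toy.notna().sum()})
result = similar_movies(moviemat_toy, ratings_toy, "A", min_ratings=2, n=5)
print(result)
```

On the notebook's own data, a call like `similar_movies(moviemat, ratings, 'Titanic (1997)')` should reproduce the filtered Titanic list above, minus Titanic itself.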