# Recommendation Systems
## Agenda 
- What are Recommendation Systems
- Importance
- Applications
- Types
- Collaborative Filtering
  - Memory-Based
    - User-Based
    - Item-Based
  - Model-Based
    - Matrix Factorization
    - Singular Value Decomposition
- Content-Based Filtering
- Hybrid Filtering

### What are Recommendation Systems

Recommendation systems are software tools and algorithms that suggest products, services, or information to users. The suggestions are tailored to each user's preferences, which are inferred from data about their past behavior. The underlying function of a recommendation system can be represented as:

$$ f: \text{User} \times \text{Item} \rightarrow \text{Rating} $$

- **f**: the predictive function that estimates the utility (or rating) of an item for a particular user.
- **User**: the set of all users in the system. Each user is a potential recipient of recommendations.
- **Item**: the set of all items available for recommendation, such as products, services, media content, or any other entities the system recommends.
- **Rating**: the predicted rating or preference score that a user would give to an item, which is used to generate personalized recommendations.
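To make the signature of $f$ concrete, here is a minimal sketch, assuming a tiny hypothetical set of known ratings. The fallback to the item's mean rating, and then to the global mean, is only an illustrative baseline standing in for the real predictive models discussed later.

```python
from statistics import mean

# Hypothetical known ratings: (user, item) -> rating.
known_ratings = {
    ("alice", "item_a"): 5.0,
    ("alice", "item_b"): 3.0,
    ("bob", "item_a"): 4.0,
    ("bob", "item_c"): 2.0,
}

def f(user: str, item: str) -> float:
    """Map a (user, item) pair to a predicted rating.

    Returns the stored rating if it is known; otherwise falls back to the
    item's mean rating, and finally to the global mean rating.
    """
    if (user, item) in known_ratings:
        return known_ratings[(user, item)]
    item_ratings = [r for (u, i), r in known_ratings.items() if i == item]
    if item_ratings:
        return float(mean(item_ratings))
    return float(mean(known_ratings.values()))

print(f("alice", "item_a"))  # 5.0 (known rating)
print(f("alice", "item_c"))  # 2.0 (mean rating of item_c)
print(f("carol", "item_d"))  # 3.5 (global mean)
```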

![Recommendation system](http://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/ML/Lesson_07/recommendation_system_edited.png)

## Types of Recommendation Systems
There are three main types of recommendation systems, each using different techniques to generate personalized recommendations:

- Collaborative filtering
- Content-based filtering
- Hybrid filtering

### __1.3.1 Collaborative Filtering__Collaborative filtering algorithms recommend items by analyzing user preferences collected from numerous users. They predict future behavior by identifying patterns in historical data, such as which movies users have enjoyed, allowing the system to suggest items with a high likelihood of user agreement.For example, if two users have similar tastes in movies, the system might recommend a new movie to one user that the other has favorably rated.## Mathematical Concept:To quantify the similarity between user preferences, collaborative filtering uses cosine similarity, where user ratings are vectors in a multidimensional space:$$\text{similarity} = \cos(\theta) = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\|\|\mathbf{B}\|}
$$Where:- $\mathbf{A}$ and $\mathbf{B}$ are vectors representing user ratings.- $\theta$ is the angle between these vectors, which indicates the degree of similarity in user preferences.A smaller angle (or higher cosine similarity) indicates more closely aligned preferences, suggesting that users will likely enjoy similar items. This approach enables more precise recommendations based on shared user interests

There are two main types of collaborative filtering techniques:

- Memory-based collaborative filtering
- Model-based collaborative filtering

### __Memory-Based Collaborative Filtering__Memory-based collaborative filtering is a fundamental approach within recommendation systems that generates predictions based on the entire database of user-item interactions. This method leverages historical data from user ratings to recommend new items or predict user ratings. It operates on the assumption that those who agreed in the past will agree again in the future.There are two primary strategies within memory-based collaborative filtering:- User-based collaborative filtering- Item-based collaborative filtering

### __User-Based Collaborative Filtering__User-based collaborative filtering is a recommendation technique that suggests items or content to a target user based on the preferences and behaviors of similar users. It operates under the assumption that users who have interacted with similar items in the past will continue to have similar preferences in the future.

#### Measuring similarity between two users

The similarity between two users, $u$ and $v$, can be quantified with cosine similarity, a common choice in user-based collaborative filtering:

$$\text{similarity}(u, v) = \cos(\theta) = \frac{\sum_{i \in I} r_{ui} \cdot r_{vi}}{\sqrt{\sum_{i \in I} r_{ui}^2} \cdot \sqrt{\sum_{i \in I} r_{vi}^2}}$$

#### Variables:

- $r_{ui}$ and $r_{vi}$: ratings given by users $u$ and $v$ to item $i$.
- $I$: the set of items that both users have rated.
- $\theta$: the angle between the rating vectors of $u$ and $v$; a smaller angle indicates higher similarity.

This formula computes the cosine of the angle between the two users' rating vectors, restricted to the items they have both rated.
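The formula can be transcribed term by term. A minimal sketch, assuming each user's ratings are stored as a hypothetical dict from item id to rating; the intersection of the two key sets plays the role of $I$, and `user_cosine` is an illustrative name.

```python
from math import sqrt

def user_cosine(ratings_u: dict, ratings_v: dict) -> float:
    """similarity(u, v) with all sums taken over I, the items both users rated."""
    common = ratings_u.keys() & ratings_v.keys()  # the set I
    if not common:
        return 0.0  # no co-rated items: treat as no evidence of similarity
    num = sum(ratings_u[i] * ratings_v[i] for i in common)
    den = (sqrt(sum(ratings_u[i] ** 2 for i in common))
           * sqrt(sum(ratings_v[i] ** 2 for i in common)))
    return num / den

# Hypothetical ratings keyed by item id.
u = {"i1": 5, "i2": 4, "i4": 1}
v = {"i1": 4, "i2": 5, "i3": 4, "i4": 1}
print(round(user_cosine(u, v), 3))  # 0.976: I = {i1, i2, i4}, so 41 / 42
```

If unrated items were instead filled with 0 and the full vectors compared, `v`'s extra rating of `i3` would enlarge $\|v\|$ and lower the score to about 0.831; restricting the sums to $I$ avoids that.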

In [1]:
# Implement user-based collaborative filtering

In [2]:
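import pandas as pd                                     # tabular data handling
import numpy as np                                      # numerical arrays
import seaborn as sns                                   # statistical plotting
import matplotlib.pyplot as plt                         # plotting backend
from sklearn.metrics.pairwise import cosine_similarity  # pairwise cosine similarity
import operator                                         # standard-library operator functions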

In [3]:
# Load the anime catalogue and the user-anime ratings
animes = pd.read_csv('anime.csv')
ratings = pd.read_csv('rating.csv')

In [4]:
animes.head()

|   | anime_id | name | genre | type | episodes | rating | members |
|---|---|---|---|---|---|---|---|
| 0 | 32281 | Kimi no Na wa. | Drama, Romance, School, Supernatural | Movie | 1 | 9.37 | 200630 |
| 1 | 5114 | Fullmetal Alchemist: Brotherhood | Action, Adventure, Drama, Fantasy, Magic, Mili... | TV | 64 | 9.26 | 793665 |
| 2 | 28977 | Gintama° | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.25 | 114262 |
| 3 | 9253 | Steins;Gate | Sci-Fi, Thriller | TV | 24 | 9.17 | 673572 |
| 4 | 9969 | Gintama&#039; | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.16 | 151266 |


In [5]:
animes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12294 entries, 0 to 12293
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   anime_id  12294 non-null  int64  
 1   name      12294 non-null  object 
 2   genre     12232 non-null  object 
 3   type      12269 non-null  object 
 4   episodes  12294 non-null  object 
 5   rating    12064 non-null  float64
 6   members   12294 non-null  int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 672.5+ KB


In [6]:
animes.isnull().sum()

anime_id      0
name          0
genre        62
type         25
episodes      0
rating      230
members       0
dtype: int64

In [7]:
animes.episodes.unique()

array(['1', '64', '51', '24', '10', '148', '110', '13', '201', '25', '22',
       '75', '4', '26', '12', '27', '43', '74', '37', '2', '11', '99',
       'Unknown', '39', '101', '47', '50', '62', '33', '112', '23', '3',
       '94', '6', '8', '14', '7', '40', '15', '203', '77', '291', '120',
       '102', '96', '38', '79', '175', '103', '70', '153', '45', '5',
       '21', '63', '52', '28', '145', '36', '69', '60', '178', '114',
       '35', '61', '34', '109', '20', '9', '49', '366', '97', '48', '78',
       '358', '155', '104', '113', '54', '167', '161', '42', '142', '31',
       '373', '220', '46', '195', '17', '1787', '73', '147', '127', '16',
       '19', '98', '150', '76', '53', '124', '29', '115', '224', '44',
       '58', '93', '154', '92', '67', '172', '86', '30', '276', '59',
       '72', '330', '41', '105', '128', '137', '56', '55', '65', '243',
       '193', '18', '191', '180', '91', '192', '66', '182', '32', '164',
       '100', '296', '694', '95', '68', '117', '151', '130',

In [9]:
# Handle missing values
animes['genre'] = animes['genre'].fillna('Unknown')                  # placeholder category
animes['type'] = animes['type'].fillna('Unknown')                    # placeholder category
animes['rating'] = animes['rating'].fillna(animes['rating'].mean())  # impute with the mean rating

# Replace 'Unknown' in episodes with 0 (a placeholder, not a real episode count) and convert to int64
animes['episodes'] = animes['episodes'].replace('Unknown', 0).astype('int64')

In [10]:
animes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12294 entries, 0 to 12293
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   anime_id  12294 non-null  int64  
 1   name      12294 non-null  object 
 2   genre     12294 non-null  object 
 3   type      12294 non-null  object 
 4   episodes  12294 non-null  int64  
 5   rating    12294 non-null  float64
 6   members   12294 non-null  int64  
dtypes: float64(1), int64(3), object(3)
memory usage: 672.5+ KB


In [11]:
ratings.head()

|   | user_id | anime_id | rating |
|---|---|---|---|
| 0 | 1 | 20 | -1 |
| 1 | 1 | 24 | -1 |
| 2 | 1 | 79 | -1 |
| 3 | 1 | 226 | -1 |
| 4 | 1 | 241 | -1 |


In [12]:
# Keep only explicit ratings; rows with rating -1 carry no rating score
ratings = ratings[ratings.rating != -1]
ratings.head()

|     | user_id | anime_id | rating |
|---|---|---|---|
| 47  | 1 | 8074 | 10 |
| 81  | 1 | 11617 | 10 |
| 83  | 1 | 11757 | 10 |
| 101 | 1 | 15451 | 10 |
| 153 | 2 | 11771 | 10 |


In [13]:
ratings.shape

(6337241, 3)
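With roughly 6.3 million explicit ratings, pivoting into a dense user × anime matrix can need a lot of memory. One possible alternative, assumed here rather than taken from this notebook, is a SciPy sparse matrix that stores only the observed ratings; scikit-learn's `cosine_similarity` accepts sparse input. The small `ratings_sample` frame is a hypothetical stand-in for the filtered `ratings`, and SciPy is an extra dependency not imported above.

```python
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in for the filtered ratings frame (user_id, anime_id, rating).
ratings_sample = pd.DataFrame({
    "user_id":  [1, 1, 2, 2, 3, 3],
    "anime_id": [8074, 11617, 8074, 15451, 11617, 15451],
    "rating":   [10, 9, 8, 7, 6, 10],
})

# Map raw ids to contiguous row/column positions.
user_codes, user_index = pd.factorize(ratings_sample["user_id"])
item_codes, item_index = pd.factorize(ratings_sample["anime_id"])

# Only observed (user, anime) pairs are stored; unrated cells are implicit zeros.
user_item = csr_matrix(
    (ratings_sample["rating"].to_numpy(dtype=float), (user_codes, item_codes)),
    shape=(len(user_index), len(item_index)),
)

# Similarity of the first user to every user, without materialising a dense matrix.
sims_first_user = cosine_similarity(user_item[0], user_item).ravel()
print(user_item.shape, user_item.nnz)
print(sims_first_user)
```

Computing similarities for one target user at a time, as above, also avoids building the full user × user matrix, which grows quadratically with the number of users.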