# Collaborative Filtering
This notebook walks through the steps to build a collaborative filtering recommendation system from the anime and ratings datasets.

In [2]:
# Import libraries
import pandas as pd
import numpy as np

# I will need to take square roots later, so import sqrt from the math library
from math import sqrt

## Preprocessing the Data
### Anime Dataset
Let's read the anime dataset into a pandas dataframe.

In [7]:
anime_df = pd.read_csv("datasets/cleaned_anime.csv")
anime_df.head()

|   | anime_id | name | genre | type | episodes | rating | members |
|---|---|---|---|---|---|---|---|
| 0 | 5114 | Fullmetal Alchemist: Brotherhood | Action, Adventure, Drama, Fantasy, Magic, Mili... | TV | 64.0 | 9.26 | 793665 |
| 1 | 28977 | Gintama° | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51.0 | 9.25 | 114262 |
| 2 | 9253 | Steins;Gate | Sci-Fi, Thriller | TV | 24.0 | 9.17 | 673572 |
| 3 | 9969 | Gintama&#039; | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51.0 | 9.16 | 151266 |
| 4 | 32935 | Haikyuu!!: Karasuno Koukou VS Shiratorizawa Ga... | Comedy, Drama, School, Shounen, Sports | TV | 10.0 | 9.15 | 93351 |


Collaborative filtering needs very little of the anime metadata, because the recommendations are driven by the users who have watched an anime rather than by its content. This contrasts with content-based filtering, which uses the anime's own information to predict what a user would like. We can therefore drop the columns we don't need from the anime data, which makes the process clearer and saves memory.

In [8]:
# Remove information we don't need
anime_df = anime_df.loc[:, ["anime_id", "name", "rating"]]
anime_df.head()

|   | anime_id | name | rating |
|---|---|---|---|
| 0 | 5114 | Fullmetal Alchemist: Brotherhood | 9.26 |
| 1 | 28977 | Gintama° | 9.25 |
| 2 | 9253 | Steins;Gate | 9.17 |
| 3 | 9969 | Gintama&#039; | 9.16 |
| 4 | 32935 | Haikyuu!!: Karasuno Koukou VS Shiratorizawa Ga... | 9.15 |
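
To make the idea concrete, here is a minimal sketch of why user ratings alone are enough for collaborative filtering. It is not necessarily the method this notebook ends up using: the toy ratings, the `user_item` matrix, and the `pearson` helper are all hypothetical names made up for illustration, and Pearson correlation is just one common choice of similarity measure.

```python
import pandas as pd
from math import sqrt

# Hypothetical toy ratings (illustration only, not taken from the real dataset)
toy_ratings = pd.DataFrame({
    "user_id":  [1, 1, 1, 2, 2, 2, 3, 3],
    "anime_id": [20, 24, 79, 20, 24, 79, 20, 24],
    "rating":   [8, 6, 9, 7, 5, 10, 3, 9],
})

# Rows are users, columns are anime, cells are ratings (NaN where unrated)
user_item = toy_ratings.pivot(index="user_id", columns="anime_id", values="rating")

def pearson(a, b):
    """Pearson correlation between two users over the anime both have rated."""
    common = a.notna() & b.notna()
    x, y = a[common], b[common]
    if len(x) < 2:
        return 0.0  # not enough shared ratings to compare
    dx, dy = x - x.mean(), y - y.mean()
    denom = sqrt((dx ** 2).sum() * (dy ** 2).sum())
    return float((dx * dy).sum() / denom) if denom else 0.0

# Users 1 and 2 rate similarly; users 1 and 3 disagree
sim_1_2 = pearson(user_item.loc[1], user_item.loc[2])
sim_1_3 = pearson(user_item.loc[1], user_item.loc[3])
print(sim_1_2, sim_1_3)
```

Notice that no genre, type, or episode information is used anywhere: the similarity depends only on `user_id`, `anime_id`, and `rating`.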


### Ratings Dataset
Read the ratings data into a dataframe.

In [9]:
rating_df = pd.read_csv("datasets/cleaned_rating.csv")
rating_df.head()

|   | user_id | anime_id | rating |
|---|---|---|---|
| 0 | 1 | 20 |  |
| 1 | 1 | 24 |  |
| 2 | 1 | 79 |  |
| 3 | 1 | 226 |  |
| 4 | 1 | 241 |  |
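
The `rating` column is blank for the first rows shown above. Assuming those blanks are missing values (NaN), which the preview alone does not confirm, a quick check like the sketch below shows how many ratings are missing and keeps only the rows that carry a score. The `toy_rating_df` frame is a hypothetical stand-in so the example runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for rating_df, with NaN where a user gave no score
toy_rating_df = pd.DataFrame({
    "user_id":  [1, 1, 1, 2, 2],
    "anime_id": [20, 24, 79, 20, 226],
    "rating":   [np.nan, np.nan, 8.0, 7.0, np.nan],
})

# Count how many rows have no rating
n_missing = int(toy_rating_df["rating"].isna().sum())

# Keep only rows with an actual rating
rated_df = toy_rating_df.dropna(subset=["rating"]).reset_index(drop=True)

# Number of usable ratings left per user
ratings_per_user = rated_df.groupby("user_id")["rating"].count()
print(n_missing, len(rated_df))
```

Whether to drop unrated rows or keep them as "watched but not scored" is a design choice; this sketch only shows the dropping option.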
