<a href="https://colab.research.google.com/github/Praveengovianalytics/50DaysofRecomSystem/blob/main/Day_7_Collabrative_Filtering_recommendation_systems.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a id='CBRS'></a>
# Collaborative Filtering Recommender System


Let us understand this with an example. Suppose person A likes three movies, Interstellar, Inception and Predestination, and person B likes Inception, Predestination and The Prestige. Their interests overlap strongly, so we can say with some certainty that A is likely to enjoy The Prestige and B is likely to enjoy Interstellar. The collaborative filtering algorithm uses “User Behavior” to recommend items. It is one of the most commonly used algorithms in the industry because it does not depend on any additional information beyond that behavior. There are different types of collaborative filtering techniques, and we shall look at them in detail below.
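The A and B example above can be sketched in a few lines of Python. This is only an illustration: the `likes` dictionary and the `jaccard` helper are hypothetical names, and Jaccard overlap is one assumed way to quantify "almost similar interests", not necessarily the measure used later in this notebook.

```python
# Movies each person likes, taken from the example in the text.
likes = {
    "A": {"Interstellar", "Inception", "Predestination"},
    "B": {"Inception", "Predestination", "The Prestige"},
}

def jaccard(a, b):
    """Share of movies liked by both people among all movies liked by either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Two shared movies out of four distinct ones.
similarity = jaccard(likes["A"], likes["B"])

# Recommend to each person what the other likes and they have not seen yet.
recommend_to_a = likes["B"] - likes["A"]
recommend_to_b = likes["A"] - likes["B"]

print(similarity, recommend_to_a, recommend_to_b)
```

With only two people the recommendation is just a set difference; the algorithms below generalize this idea to many users and to numeric ratings.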


### User-User collaborative filtering

This algorithm first finds the similarity score between users. Based on this similarity score, it then picks out the most similar users and recommends products which these similar users have liked or bought previously.


In terms of our movie example from earlier, this algorithm finds the similarity between each pair of users based on the ratings they have previously given to different movies. The predicted rating of an item i for a user u is then computed as the weighted sum of the ratings that other users gave to item i, where each weight is the similarity between that user and u.
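A minimal sketch of this prediction step follows, under stated assumptions: the toy `ratings` dictionary is made up for illustration, `user_cosine` and `predict_user_user` are hypothetical helper names, cosine similarity treats unrated movies as 0, and the weighted sum is divided by the total weight so the result stays on the rating scale (a common normalization, not something the text above prescribes).

```python
from math import sqrt

# Hypothetical ratings: user -> {movie: rating on a 1-5 scale}.
ratings = {
    "A": {"Interstellar": 5, "Inception": 4, "Predestination": 1},
    "B": {"Interstellar": 4, "Inception": 5, "Predestination": 1, "The Prestige": 3},
    "C": {"Interstellar": 1, "Inception": 2, "Predestination": 4, "The Prestige": 5},
}

def user_cosine(r_u, r_v):
    """Cosine similarity between two users' ratings; unrated movies count as 0."""
    dot = sum(r_u[m] * r_v[m] for m in set(r_u) & set(r_v))
    norm_u = sqrt(sum(x * x for x in r_u.values()))
    norm_v = sqrt(sum(x * x for x in r_v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict_user_user(ratings, user, movie):
    """Similarity-weighted average of other users' ratings for `movie`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or movie not in other_ratings:
            continue  # only neighbors who actually rated the movie contribute
        sim = user_cosine(ratings[user], other_ratings)
        num += sim * other_ratings[movie]
        den += abs(sim)
    return num / den if den else None  # None: no neighbor rated this movie

prediction = predict_user_user(ratings, "A", "The Prestige")
print(round(prediction, 2))
```

Here user A is much closer to B than to C, so the prediction (about 3.6) lands nearer to B's rating of 3 than to C's rating of 5.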


### Item-Item collaborative filtering

In this algorithm, we compute the similarity between each pair of items.

So in our case we find the similarity between each pair of movies and, based on that, recommend movies similar to the ones the user has liked in the past. This algorithm works similarly to user-user collaborative filtering with just one change: instead of taking the weighted sum of ratings of “user-neighbors”, we take the weighted sum of ratings of “item-neighbors”.
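The item-item variant can be sketched the same way. As before, this is an illustration under assumptions: the toy `ratings` data is invented, `ratings_by_movie`, `item_cosine` and `predict_item_item` are hypothetical names, and cosine similarity with a normalized weighted sum is one reasonable choice among several.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical ratings: user -> {movie: rating on a 1-5 scale}.
ratings = {
    "A": {"Interstellar": 5, "Inception": 4, "Predestination": 1},
    "B": {"Interstellar": 4, "Inception": 5, "Predestination": 1, "The Prestige": 3},
    "C": {"Interstellar": 1, "Inception": 2, "Predestination": 4, "The Prestige": 5},
}

def ratings_by_movie(ratings):
    """Invert the data to movie -> {user: rating} so movies can be compared."""
    by_movie = defaultdict(dict)
    for user, user_ratings in ratings.items():
        for movie, rating in user_ratings.items():
            by_movie[movie][user] = rating
    return dict(by_movie)

def item_cosine(r_i, r_j):
    """Cosine similarity between two movies' rating vectors; missing users count as 0."""
    dot = sum(r_i[u] * r_j[u] for u in set(r_i) & set(r_j))
    norm_i = sqrt(sum(x * x for x in r_i.values()))
    norm_j = sqrt(sum(x * x for x in r_j.values()))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0

def predict_item_item(ratings, user, movie):
    """Weight the user's own past ratings by how similar each rated movie is to `movie`."""
    by_movie = ratings_by_movie(ratings)
    if movie not in by_movie:
        return None  # nobody has rated this movie, so it has no neighbors
    num = den = 0.0
    for other_movie, own_rating in ratings[user].items():
        if other_movie == movie:
            continue
        sim = item_cosine(by_movie[movie], by_movie[other_movie])
        num += sim * own_rating
        den += abs(sim)
    return num / den if den else None

prediction = predict_item_item(ratings, "A", "The Prestige")
print(round(prediction, 2))
```

The only structural difference from the user-user sketch is which axis the similarity runs over: the neighbors are the movies user A has already rated, and the weights come from comparing those movies with The Prestige.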

# Import necessary data

In [None]:
# Download datasets
!wget https://datasets.towardsai.net/combined_data_4.txt
!wget https://raw.githubusercontent.com/towardsai/tutorials/master/recommendation_system_tutorial/movie_titles.csv
!wget https://raw.githubusercontent.com/towardsai/tutorials/master/recommendation_system_tutorial/new_features.csv

--2020-11-17 16:10:08--  https://datasets.towardsai.net/combined_data_4.txt
Resolving datasets.towardsai.net (datasets.towardsai.net)... 104.28.12.7, 104.28.13.7, 172.67.128.100, ...
Connecting to datasets.towardsai.net (datasets.towardsai.net)|104.28.12.7|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://towardsai.net/pcombined_data_4.txt [following]
--2020-11-17 16:10:09--  https://towardsai.net/pcombined_data_4.txt
Resolving towardsai.net (towardsai.net)... 172.67.128.100, 104.28.12.7, 104.28.13.7, ...
Connecting to towardsai.net (towardsai.net)|172.67.128.100|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-11-17 16:10:11 ERROR 404: Not Found.

--2020-11-17 16:10:11--  https://raw.githubusercontent.com/towardsai/tutorials/master/recommendation_system_tutorial/movie_titles.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting t

In [None]:
!pip install scikit-surprise

Collecting scikit-surprise
  Downloading https://files.pythonhosted.org/packages/97/37/5d334adaf5ddd65da99fc65f6507e0e4599d092ba048f4302fe8775619e8/scikit-surprise-1.1.1.tar.gz (11.8MB)
     |████████████████████████████████| 11.8MB 3.7MB/s 
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (setup.py) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.1-cp36-cp36m-linux_x86_64.whl size=1670922 sha256=1471c4cfb238725f01e0baaf4e293b33f85316ccb29b283f1463396ea171b9fa
  Stored in directory: /root/.cache/pip/wheels/78/9c/3d/41b419c9d2aff5b6e2b4c0fc8d25c538202834058f9ed110d0
Successfully built scikit-surprise
Installing collected packages: scikit-surprise
Successfully installed scikit-surprise-1.1.1


In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
# Common utilities
from datetime import datetime
import os
import random

# Data processing packages 
import pandas as pd
import numpy as np

# Data visualization packages
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt

# Sparse matrices and metrics
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics import mean_squared_error

# Recommendation & ML packages
import xgboost as xgb
from surprise import Reader, Dataset
from surprise import BaselineOnly
from surprise import KNNBaseline
from surprise import SVD
from surprise import SVDpp
from surprise.model_selection import GridSearchCV

In [None]:
def load_data():
    """Flatten the raw Netflix rating files into netflix_rating.csv and load it."""
    rating_files = ['combined_data_4.txt']
    # Context managers close both files even if reading or parsing fails.
    with open("netflix_rating.csv", mode="w") as netflix_csv_file:
        for file in rating_files:
            with open(file) as f:
                movie_id = None
                for line in f:
                    line = line.strip()
                    if line.endswith(":"):
                        # A line ending in ":" starts the ratings block of one movie.
                        movie_id = line.replace(":", "")
                    elif line:
                        # Every other non-empty line is a rating row for the current
                        # movie; prefix it with the movie id.
                        netflix_csv_file.write(f"{movie_id},{line}\n")

    # The generated CSV has no header row, so column names are supplied here.
    df = pd.read_csv('netflix_rating.csv', sep=",", names=["movie_id", "customer_id", "rating", "date"])
    return df

In [None]:
# netflix_rating.csv has no header row, so the column names are passed explicitly.
netflix_rating_df = pd.read_csv('netflix_rating.csv', sep=",", names = ["movie_id","customer_id", "rating", "date"])
netflix_rating_df.head()

Unnamed: 0,movie_id,customer_id,rating,date


In [None]:
!wc -l *.csv

 17770 movie_titles.csv
     0 netflix_rating.csv
    74 new_features.csv
 17844 total


In [None]:
!wget https://datasets.towardsai.net/combined_data_4.txt

--2020-11-17 16:16:58--  https://datasets.towardsai.net/combined_data_4.txt
Resolving datasets.towardsai.net (datasets.towardsai.net)... 104.28.13.7, 104.28.12.7, 172.67.128.100, ...
Connecting to datasets.towardsai.net (datasets.towardsai.net)|104.28.13.7|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://towardsai.net/pcombined_data_4.txt [following]
--2020-11-17 16:16:59--  https://towardsai.net/pcombined_data_4.txt
Resolving towardsai.net (towardsai.net)... 104.28.12.7, 104.28.13.7, 172.67.128.100, ...
Connecting to towardsai.net (towardsai.net)|104.28.12.7|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-11-17 16:17:00 ERROR 404: Not Found.

