## Recommendation System Using SVD Collaborative Filtering

Collaborative Filtering:-

This system matches people with similar interests and provides recommendations based on this matching. Unlike their content-based counterparts, collaborative filters do not require item metadata.
It is basically of two types:-



*   User-Based Collaborative Filtering:- These systems recommend products to a user that similar users have liked. To measure the similarity between two users, we can use either Pearson correlation or cosine similarity.
*   Item-Based Collaborative Filtering:- Instead of measuring the similarity between users, item-based CF recommends items based on their similarity to the items that the target user has rated. Likewise, the similarity can be computed with Pearson correlation or cosine similarity. The major difference is that, with item-based collaborative filtering, we fill in the blanks of the user-item matrix vertically, as opposed to the horizontal manner of user-based CF.


Singular Value Decomposition (SVD):- One way to handle the scalability and sparsity issues of the CF approaches above is to use a latent factor model that captures the similarity between users and items. Essentially, we turn the recommendation problem into an optimization problem: how well can we predict the rating a given user would give an item? One common metric for this is Root Mean Square Error (RMSE). The lower the RMSE, the better the performance.
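As a small illustration of the two similarity measures mentioned for user-based and item-based filtering, here is a minimal sketch in plain Python. The toy rating matrix and the function names are assumptions made for this example; a real system would compare only the items that both users have rated, which this dense toy matrix sidesteps.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for an all-zero vector; use 0
    return dot / (norm_a * norm_b)

def pearson_correlation(a, b):
    """Pearson correlation, computed as cosine similarity of mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)

# Hypothetical toy matrix: rows are users, columns are items.
ratings = [
    [5, 3, 4, 4],  # user A
    [3, 1, 2, 3],  # user B
    [4, 3, 4, 3],  # user C
]

# User-based CF compares rows (users) with each other.
user_sim = pearson_correlation(ratings[0], ratings[1])

# Item-based CF compares columns (items) with each other.
item_columns = [list(col) for col in zip(*ratings)]
item_sim = cosine_similarity(item_columns[0], item_columns[1])

print(f"user A vs user B (Pearson): {user_sim:.3f}")
print(f"item 0 vs item 1 (cosine):  {item_sim:.3f}")
```

Comparing rows versus comparing columns of the same matrix is the horizontal versus vertical distinction described in the list above.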

In [1]:
!pip install scikit-surprise

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting scikit-surprise
  Downloading scikit-surprise-1.1.3.tar.gz (771 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (setup.py) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.3-cp39-cp39-linux_x86_64.whl size=3193690 sha256=a47d0ec15b8d0d8444ba4422b3f8e462eb6bdf65d18a0bfe3cf9e6cd5d4e01f2
  Stored in directory: /root/.cache/pip/wheels/c6/3a/46/9b17b3512bdf283c6cb84f59929cdd5199d4e754d596d22784
Successfully built scikit-surprise
Installing collected packages: scikit-surprise
Successfully installed scikit-surprise-1.1.3


In [2]:
# Import the necessary libraries
import numpy as np
import pandas as pd
from surprise import Reader, Dataset, SVD
from surprise.model_selection import cross_validate

# Suppress warnings
import warnings
%matplotlib inline
warnings.filterwarnings('ignore')

# Mount Google Drive to access the data
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [6]:
df = pd.read_csv('/content/drive/MyDrive/Ecommerce Behavior Project/data_preprocessed.csv')

In [None]:
# Build the table that will be fed to the model:
# count how many times each user purchased each product
matrix = (
    df[df['event_type'] == 'purchase']
    .groupby(['user_id', 'product_id'], as_index=False)
    .agg(no_of_purchases=('event_type', 'count'))
    .sort_values(by='no_of_purchases', ascending=False)
)


In [None]:
matrix

| | user_id | product_id | no_of_purchases |
|---|---|---|---|
| 74608 | 401020145 | 5842139 | 8 |
| 74590 | 401020145 | 5751383 | 8 |
| 191631 | 488080335 | 5751422 | 7 |
| 253493 | 525207529 | 5700052 | 7 |
| 56133 | 374211459 | 5700046 | 7 |
| ... | ... | ... | ... |
| 186274 | 484176343 | 5809341 | 1 |
| 186273 | 484176343 | 5798750 | 1 |
| 186272 | 484176343 | 5773361 | 1 |
| 186271 | 484176343 | 5759913 | 1 |


In [None]:
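# Reader() defaults to rating_scale=(1, 5), but the counts above reach 8; predictions are clipped to that scale
reader = Reader()
# From here on, `df` is the Surprise Dataset, not the pandas DataFrame
df = Dataset.load_from_df(matrix[['user_id', 'product_id', 'no_of_purchases']], reader)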


In [None]:
# 10-fold cross-validation of SVD, reporting RMSE and MAE per fold
model = SVD()
cross_validate(model, df, measures=['rmse', 'mae'], cv=10)

{'test_rmse': array([0.17174204, 0.17978975, 0.17314306, 0.17054866, 0.1774804 ,
        0.17526279, 0.17860892, 0.18163963, 0.17716846, 0.1783324 ]),
 'test_mae': array([0.07135926, 0.0721283 , 0.07087772, 0.07087345, 0.07184023,
        0.07112518, 0.0715546 , 0.07158231, 0.07172135, 0.07226375]),
 'fit_time': (25.363463163375854,
  17.538280487060547,
  17.52564764022827,
  17.63918399810791,
  17.216487407684326,
  17.132529735565186,
  18.577789306640625,
  19.08642888069153,
  18.992698669433594,
  20.240205764770508),
 'test_time': (0.9571421146392822,
  0.6101076602935791,
  0.941192626953125,
  0.9743354320526123,
  1.0478434562683105,
  0.955009937286377,
  0.6132416725158691,
  1.330143928527832,
  0.9906461238861084,
  0.5961718559265137)}

We get a mean Root Mean Square Error of about 0.18 (0.176 averaged over the ten folds), which is good enough for our case. Let us now train on the full dataset and arrive at predictions.

Let us remember:-

Lower RMSE values indicate better performance.
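To make the two metrics concrete, here is a minimal sketch of how RMSE and MAE are computed, followed by the averaging of the per-fold RMSE values from the cross-validation output above. The helper names `rmse` and `mae` and the toy lists are assumptions made for this example; they are not part of Surprise.

```python
import math

def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(actual, predicted):
    """Mean Absolute Error: mean of the absolute differences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)

# Hypothetical toy example: true purchase counts vs. model estimates.
true_counts = [1, 2, 1, 8]
estimates = [1.0, 1.5, 1.2, 5.0]
print(f"RMSE = {rmse(true_counts, estimates):.3f}")
print(f"MAE  = {mae(true_counts, estimates):.3f}")

# Per-fold RMSE values copied from the cross_validate output above.
fold_rmse = [0.17174204, 0.17978975, 0.17314306, 0.17054866, 0.1774804,
             0.17526279, 0.17860892, 0.18163963, 0.17716846, 0.1783324]
mean_rmse = sum(fold_rmse) / len(fold_rmse)
print(f"mean RMSE over {len(fold_rmse)} folds = {mean_rmse:.3f}")
```

Because RMSE squares each error before averaging, a single large miss (such as the count of 8 in the toy data) weighs more heavily in RMSE than in MAE.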

In [None]:
# Train the SVD model on the full dataset
trainset = df.build_full_trainset()
model.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7fa076534100>

Let us pick the user with ID 401020145 and check the purchase counts he/she has recorded.

In [None]:

matrix[matrix['user_id'] == 401020145] 

| | user_id | product_id | no_of_purchases |
|---|---|---|---|
| 74608 | 401020145 | 5842139 | 8 |
| 74590 | 401020145 | 5751383 | 8 |
| 74591 | 401020145 | 5751422 | 4 |
| 74579 | 401020145 | 5622078 | 2 |
| 74588 | 401020145 | 5724608 | 2 |
| 74592 | 401020145 | 5754852 | 2 |
| 74593 | 401020145 | 5754853 | 2 |
| 74594 | 401020145 | 5773607 | 2 |
| 74599 | 401020145 | 5786834 | 2 |
| 74605 | 401020145 | 5809910 | 2 |


In [None]:
# Predict the purchase count for one (user, product) pair; the third argument is the true value r_ui
model.predict(484176343, 5751422, 2)

Prediction(uid=484176343, iid=5751422, r_ui=2, est=1.034990832214889, details={'was_impossible': False})

Here the SVD collaborative filter predicts a no_of_purchases of about 1.03 for the product with ID 5751422 and the user with ID 484176343. For a user who has not yet purchased a product, such an estimate means the algorithm expects him/her to purchase it around 1 time.

The predicted rating is based on the patterns in the ratings data observed for other users and products. Specifically, the model has learned a set of latent factors that capture different aspects of user preferences and product characteristics. Based on the ratings patterns it has observed, the model has estimated the importance of each factor for predicting a rating for a given user and product. The predicted rating is then calculated by combining the user's preferences with the product's characteristics as represented by these latent factors.
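The idea of learning latent factors and combining them into a prediction can be sketched in plain Python as follows. This is a simplified illustration of biased matrix factorization trained with stochastic gradient descent, not Surprise's actual implementation; the toy ratings, the function names, and the hyperparameter values are all assumptions chosen for the example.

```python
import math
import random

def train_mf(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Learn a global mean, user/item biases and latent factors by SGD."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    bu = {u: 0.0 for u in users}
    bi = {i: 0.0 for i in items}
    pu = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    qi = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(n_epochs):
        for u, i, r in ratings:
            dot = sum(p * q for p, q in zip(pu[u], qi[i]))
            err = r - (mu + bu[u] + bi[i] + dot)
            # Move each parameter against the gradient of the regularized squared error.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                p, q = pu[u][f], qi[i][f]
                pu[u][f] += lr * (err * q - reg * p)
                qi[i][f] += lr * (err * p - reg * q)
    return mu, bu, bi, pu, qi

def predict(model, u, i, low=1.0, high=5.0):
    """Combine mean, biases and factor dot product, then clip to the rating scale."""
    mu, bu, bi, pu, qi = model
    est = mu + bu.get(u, 0.0) + bi.get(i, 0.0)
    if u in pu and i in qi:
        est += sum(p * q for p, q in zip(pu[u], qi[i]))
    return min(high, max(low, est))

def rmse_on(ratings, predict_fn):
    """RMSE of predict_fn(u, i) against the known ratings."""
    squared = [(r - predict_fn(u, i)) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical toy data: (user, item, rating) triples.
toy_ratings = [("u1", "a", 5), ("u1", "b", 1), ("u2", "a", 4),
               ("u2", "b", 1), ("u3", "a", 5)]
toy_model = train_mf(toy_ratings)

global_mean = toy_model[0]
baseline_rmse = rmse_on(toy_ratings, lambda u, i: global_mean)
trained_rmse = rmse_on(toy_ratings, lambda u, i: predict(toy_model, u, i))
print(f"baseline RMSE = {baseline_rmse:.3f}, trained RMSE = {trained_rmse:.3f}")
print(f"estimate for (u3, b) = {predict(toy_model, 'u3', 'b'):.2f}")
```

User `u3` never rated item `b`, so the estimate for that pair comes entirely from what was learned from the other ratings, which is the behavior the paragraph above describes.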

It's important to note that the predicted rating is not necessarily an exact prediction of how this user will rate the product. It's an estimate based on patterns in the data, and there may be other factors that the model is not capturing. However, collaborative filtering algorithms like SVD have been shown to be effective at making accurate predictions in many cases, and can be useful for providing personalized recommendations to users.
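To turn single predictions into personalized recommendations, one common approach is to score every product a user has not yet purchased and keep the highest estimates. The sketch below assumes a generic scoring function; `top_n_for_user`, `predict_fn`, and the toy scores are hypothetical names and data introduced for this example.

```python
def top_n_for_user(predict_fn, user_id, candidate_items, already_purchased, n=5):
    """Return the n unseen items with the highest predicted score for a user."""
    scored = [
        (item, predict_fn(user_id, item))
        for item in candidate_items
        if item not in already_purchased
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

# Toy stand-in for a trained model. With the fitted Surprise model from this
# notebook, the scoring function could instead be:
#     predict_fn = lambda u, i: model.predict(u, i).est
toy_scores = {("u1", "a"): 1.2, ("u1", "b"): 3.4, ("u1", "c"): 2.0, ("u1", "d"): 4.9}

def toy_predict(user_id, item_id):
    return toy_scores[(user_id, item_id)]

recommendations = top_n_for_user(
    toy_predict, "u1", ["a", "b", "c", "d"], already_purchased={"d"}, n=2
)
print(recommendations)
```

Item `d` has the highest score but is skipped because the user already purchased it, so the two recommendations are `b` and `c`.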




