# Real-time Recommendation System for E-commerce


*In this notebook, we will create a real-time recommendation system for an e-commerce platform. The system will use collaborative filtering to provide product recommendations based on a user's recent browsing or purchase history.*

## 1. Data Collection and Preprocessing

In [1]:
import pandas as pd

# Load dataset with specified dtype to avoid DtypeWarning
data = pd.read_csv('ratings_Electronics.csv', dtype={'column_name': str}, low_memory=False)

In [2]:
# Preview Dataset
data.head()

Unnamed: 0,AKM1MP6P0OYPR,0132793040,5.0,1365811200
0,A2CX7LUOHB2NDG,321732944,5.0,1341100800
1,A2NWSAGRHCP8N5,439886341,1.0,1367193600
2,A2WNBOD3WNDNKT,439886341,3.0,1374451200
3,A1GI0U4ZRJA8WN,439886341,1.0,1334707200
4,A1QGNMC6O1VW39,511189877,5.0,1397433600


In [3]:
# Add headers manually
data.columns = ['userId', 'itemId', 'rating', 'timestamp']

In [4]:
# Ensure userId and itemId are treated as strings
data['userId'] = data['userId'].astype(str)
data['itemId'] = data['itemId'].astype(str)

In [5]:
# Save preprocessed data
data.to_csv('preprocessed_data.csv', index=False)

In [6]:
# Show a preview of the dataset
print(data.head())

           userId      itemId  rating   timestamp
0  A2CX7LUOHB2NDG  0321732944     5.0  1341100800
1  A2NWSAGRHCP8N5  0439886341     1.0  1367193600
2  A2WNBOD3WNDNKT  0439886341     3.0  1374451200
3  A1GI0U4ZRJA8WN  0439886341     1.0  1334707200
4  A1QGNMC6O1VW39  0511189877     5.0  1397433600


## 2. Building the Recommendation Model

In [10]:
# Train the model using the surprise library.
from surprise import Dataset, Reader, SVD
from surprise.model_selection import train_test_split, cross_validate
import joblib

In [12]:
# Load preprocessed data with specified dtype to avoid DtypeWarning
data = pd.read_csv('preprocessed_data.csv', dtype={'timestamp': str})

In [13]:
# Sample 10% of the data for quick iterations
data_sample = data.sample(frac=0.01, random_state=42)  # Reduced sample size to 1% to avoid MemoryError

In [14]:
# Load data into Surprise's format
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(data_sample[['userId', 'itemId', 'rating']], reader)

In [15]:
# Split the data into training and test sets
trainset, testset = train_test_split(data, test_size=0.2)

In [16]:
# Train the model using SVD algorithm
algo = SVD()
algo.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x24b35e47dc0>

In [17]:
# Evaluate the model
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.3731  1.3750  1.3441  1.3628  1.3777  1.3665  0.0123  
MAE (testset)     1.0894  1.0865  1.0688  1.0852  1.0931  1.0846  0.0083  
Fit time          2.35    2.24    2.16    2.13    2.32    2.24    0.08    
Test time         0.83    0.21    0.22    0.22    0.25    0.35    0.24    


{'test_rmse': array([1.37306724, 1.37500978, 1.34414426, 1.36283959, 1.37768155]),
 'test_mae': array([1.08944339, 1.08647884, 1.06883478, 1.08516367, 1.09310232]),
 'fit_time': (2.346205472946167,
  2.2405850887298584,
  2.161440372467041,
  2.134302854537964,
  2.3206117153167725),
 'test_time': (0.833310604095459,
  0.20900344848632812,
  0.21633267402648926,
  0.22399663925170898,
  0.24569296836853027)}

In [18]:
# Save the trained model
joblib.dump(algo, 'svd_model.pkl')

['svd_model.pkl']