Live website: https://ebuss-recommender.onrender.com/

Repo: https://github.com/TheNamesRai/sentiment_product_recommendation/tree/main



## 1  Setup and data loading

Import standard libraries, define the data path, and load the dataset.

In [1]:
from google.colab import files
uploaded = files.upload()
DATA_PATH = 'product_review_dataset.csv'


Saving product_review_dataset.csv to product_review_dataset.csv


In [12]:

import pandas as pd
import numpy as np
import json
import re
from datetime import datetime


In [35]:
# Loading the dataset
df = pd.read_csv(DATA_PATH)

# Display basic information
display(df.head())
df.info()
df.shape

Unnamed: 0,id,brand,categories,manufacturer,name,reviews_date,reviews_didPurchase,reviews_doRecommend,reviews_rating,reviews_text,reviews_title,reviews_userCity,reviews_userProvince,reviews_username,user_sentiment
0,AV13O1A8GV-KLJ3akUyj,Universal Music,"Movies, Music & Books,Music,R&b,Movies & TV,Mo...",Universal Music Group / Cash Money,Pink Friday: Roman Reloaded Re-Up (w/dvd),2012-11-30T06:21:45.000Z,,,5,i love this album. it's very good. more to the...,Just Awesome,Los Angeles,,joshua,Positive
1,AV14LG0R-jtxr-f38QfS,Lundberg,"Food,Packaged Foods,Snacks,Crackers,Snacks, Co...",Lundberg,Lundberg Organic Cinnamon Toast Rice Cakes,2017-07-09T00:00:00.000Z,True,,5,Good flavor. This review was collected as part...,Good,,,dorothy w,Positive
2,AV14LG0R-jtxr-f38QfS,Lundberg,"Food,Packaged Foods,Snacks,Crackers,Snacks, Co...",Lundberg,Lundberg Organic Cinnamon Toast Rice Cakes,2017-07-09T00:00:00.000Z,True,,5,Good flavor.,Good,,,dorothy w,Positive
3,AV16khLE-jtxr-f38VFn,K-Y,"Personal Care,Medicine Cabinet,Lubricant/Sperm...",K-Y,K-Y Love Sensuality Pleasure Gel,2016-01-06T00:00:00.000Z,False,False,1,I read through the reviews on here before look...,Disappointed,,,rebecca,Negative
4,AV16khLE-jtxr-f38VFn,K-Y,"Personal Care,Medicine Cabinet,Lubricant/Sperm...",K-Y,K-Y Love Sensuality Pleasure Gel,2016-12-21T00:00:00.000Z,False,False,1,My husband bought this gel for us. The gel cau...,Irritation,,,walker557,Negative


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30000 entries, 0 to 29999
Data columns (total 15 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   id                    30000 non-null  object
 1   brand                 30000 non-null  object
 2   categories            30000 non-null  object
 3   manufacturer          29859 non-null  object
 4   name                  30000 non-null  object
 5   reviews_date          29954 non-null  object
 6   reviews_didPurchase   15932 non-null  object
 7   reviews_doRecommend   27430 non-null  object
 8   reviews_rating        30000 non-null  int64 
 9   reviews_text          30000 non-null  object
 10  reviews_title         29810 non-null  object
 11  reviews_userCity      1929 non-null   object
 12  reviews_userProvince  170 non-null    object
 13  reviews_username      29937 non-null  object
 14  user_sentiment        29999 non-null  object
dtypes: int64(1), object(14)
memory usage

(30000, 15)

## 2  Data cleaning

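Drop exact duplicate rows, parse review dates, and coerce ratings and boolean fields to numeric.  Missing values are kept rather than dropped: unparseable dates become `NaT` and unknown booleans become `NaN`.  The sparsely populated `reviews_userCity` and `reviews_userProvince` columns are removed.  After cleaning, we confirm the shape of the dataframe.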
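
To illustrate the boolean coercion in isolation, here is a minimal plain-Python sketch of the same mapping that the cleaning cell applies with pandas. The helper name `to_flag` and the sample values are hypothetical; only the mapping table mirrors the notebook.

```python
import math

# Recognized spellings and the numeric value each one maps to.
FLAG_MAP = {"TRUE": 1.0, "FALSE": 0.0, "1": 1.0, "0": 0.0}

def to_flag(value):
    """Return 1.0/0.0 for recognized flags, NaN for anything else (hypothetical helper)."""
    key = str(value).strip().upper()
    return FLAG_MAP.get(key, math.nan)

raw = [True, "false", " 1 ", None, "maybe"]   # made-up inputs
flags = [to_flag(v) for v in raw]
print(flags)
```

Keeping unknowns as NaN, rather than forcing them to 0, lets later steps tell "did not purchase" apart from "not recorded".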

In [36]:

print("Original shape:", df.shape)

print("Columns:", list(df.columns))

# Drop exact duplicate rows (full-row duplicates)
before = len(df)
df = df.drop_duplicates().copy()
print("Dropped exact dup rows:", before - len(df))

# Parse review date; keep NaT (don’t drop)
df['reviews_date'] = pd.to_datetime(df['reviews_date'], errors='coerce', utc=True)
print("NaT date count:", df['reviews_date'].isna().sum())

# Ratings: numeric coercion
df['reviews_rating'] = pd.to_numeric(df['reviews_rating'], errors='coerce')
print("NaN ratings:", df['reviews_rating'].isna().sum())

# Booleans to 0/1 (leave unknowns as NaN)
for col in ['reviews_didPurchase', 'reviews_doRecommend']:
    if col in df.columns:
        df[col] = (
            df[col]
            .astype(str).str.strip().str.upper()
            .map({'TRUE':1, 'FALSE':0, '1':1, '0':0})
            .astype('float')
        )

# Drop irrelevant columns
cols_to_drop = [c for c in ['reviews_userCity','reviews_userProvince'] if c in df.columns]
df = df.drop(columns=cols_to_drop)
print("Dropped columns:", cols_to_drop)

# Reset index
df.reset_index(drop=True, inplace=True)

print("Unique products (by id):", df['id'].nunique() if 'id' in df.columns else 'N/A')
print("Unique users (by reviews_username):", df['reviews_username'].nunique() if 'reviews_username' in df.columns else 'N/A')


# Basic shape after cleaning
df.shape


Original shape: (30000, 15)
Columns: ['id', 'brand', 'categories', 'manufacturer', 'name', 'reviews_date', 'reviews_didPurchase', 'reviews_doRecommend', 'reviews_rating', 'reviews_text', 'reviews_title', 'reviews_userCity', 'reviews_userProvince', 'reviews_username', 'user_sentiment']
Dropped exact dup rows: 0
NaT date count: 745
NaN ratings: 0
Dropped columns: ['reviews_userCity', 'reviews_userProvince']
Unique products (by id): 271
Unique users (by reviews_username): 24914


(30000, 13)

## 3  Text preprocessing

1. Lowercasing
2. Removing HTML tags and URLs
3. Replacing punctuation with spaces
4. Removing non‑alphanumeric characters
5. Removing stopwords
6. Lemmatizing words

Apply these steps to the concatenated `reviews_title` and `reviews_text`, prefix tokens that follow a negation word with `NOT_`, and store the result in a new column called `clean`.
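
The negation-window idea used by `clean_and_tag` can be sketched without NLTK. This is a simplified illustration, not the notebook's function: the stopword list is a tiny made-up set, lemmatization and the short-token filter are omitted, and the names `tag_negations`, `STOPWORDS`, `NEGATIONS`, and `WINDOW` are assumptions.

```python
import re

# Tiny illustrative stopword list; the notebook uses NLTK's English list instead.
STOPWORDS = {"the", "is", "a", "this", "it", "was", "and", "i"}
NEGATIONS = {"not", "no", "never"}
WINDOW = 3  # how many following token positions a negation can affect

def tag_negations(text):
    """Lowercase, strip punctuation, drop stopwords, prefix negated tokens with NOT_."""
    tokens = re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split()
    out, remaining = [], 0
    for tok in tokens:
        if tok in NEGATIONS:
            out.append(tok)
            remaining = WINDOW        # open a fresh negation window
            continue
        if tok in STOPWORDS:
            if remaining > 0:
                remaining -= 1        # dropped stopwords still use up the window
            continue
        out.append(f"NOT_{tok}" if remaining > 0 else tok)
        if remaining > 0:
            remaining -= 1
    return " ".join(out)

print(tag_negations("This is not a good product, the smell was awful."))
# prints: not NOT_good NOT_product smell awful
```

Tagging lets a linear model learn separate weights for `good` and `NOT_good`, which plain bag-of-words features cannot distinguish.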

In [16]:
import re, string, unicodedata, nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download('stopwords', quiet=True)
nltk.download('wordnet', quiet=True)

True

In [37]:
STOP = set(stopwords.words('english'))
STOP -= {'no', 'not', 'nor'}  # keep negation words

LEMM = WordNetLemmatizer()
PUNCT = str.maketrans({ch: " " for ch in string.punctuation})
URL_RE  = re.compile(r'https?://\S+|www\.\S+', re.I)
HTML_RE = re.compile(r'<[^>]+>')

NEG_TRIGGERS = {"not", "no", "never", "n't"}
NEG_WINDOW = 3

In [38]:
def _basic_normalize(s: str) -> str:
    s = unicodedata.normalize("NFKC", s).lower()
    s = URL_RE.sub(" ", s)
    s = HTML_RE.sub(" ", s)
    s = s.translate(PUNCT)
    s = re.sub(r'[^a-z0-9 ]+', ' ', s)
    return re.sub(r'\s+', ' ', s).strip()

def clean_and_tag(text: str) -> str:
    if not isinstance(text, str): return ""
    text = _basic_normalize(text)
    toks = text.split()
    out, neg = [], 0
    for w in toks:
        if w in STOP and w not in NEG_TRIGGERS:
            if neg > 0: neg -= 1
            continue
        if w in NEG_TRIGGERS:
            out.append(w); neg = NEG_WINDOW; continue
        ww = LEMM.lemmatize(w)
        out.append(f"NOT_{ww}" if neg > 0 else ww)
        if neg > 0: neg -= 1
    out = [t for t in out if len(t) > 1 or t.isdigit()]
    return " ".join(out)

def title_plus_text(row):
    return f"{row.get('reviews_title','') or ''} {row.get('reviews_text','') or ''}".strip()

In [39]:
# Apply the preprocessing to review text

df['text_all'] = df.apply(title_plus_text, axis=1)
df['clean'] = df['text_all'].apply(clean_and_tag)
df[['reviews_text','clean']].head(3)

Unnamed: 0,reviews_text,clean
0,i love this album. it's very good. more to the...,awesome love album good hip hop side current p...
1,Good flavor. This review was collected as part...,good good flavor review collected part promotion
2,Good flavor.,good good flavor


## 4  Split data into training and testing sets

Train on the earliest 80% of reviews (by review date) and test on the most recent 20%.  Reviews with a missing date are assigned to the training set.
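
The split rule can be sketched on toy data with the standard library. The review IDs and dates are made up, and the cutoff uses a nearest-rank percentile as a simplification, whereas the pandas `quantile` call in the notebook interpolates.

```python
from datetime import date

# Toy reviews: (review_id, review_date or None when the date is missing).
reviews = [
    ("r1", date(2014, 1, 5)), ("r2", date(2015, 3, 9)), ("r3", None),
    ("r4", date(2016, 7, 1)), ("r5", date(2017, 2, 14)), ("r6", date(2017, 11, 30)),
]

# Cutoff = 80th percentile of the known dates (nearest-rank, integer arithmetic).
known = sorted(d for _, d in reviews if d is not None)
rank = (len(known) * 80 + 99) // 100          # ceil(0.8 * n)
cutoff = known[rank - 1]

# Missing dates and dates up to the cutoff train; strictly later dates test.
train = [rid for rid, d in reviews if d is None or d <= cutoff]
test = [rid for rid, d in reviews if d is not None and d > cutoff]
print(cutoff, train, test)
```

Splitting by time rather than at random keeps the evaluation closer to deployment, where the model only ever sees reviews written before the ones it scores.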

In [40]:
# time-based split (rows with NaT dates go to TRAIN)
df_valid = df.dropna(subset=['reviews_date']).copy()
split_ts = df_valid['reviews_date'].quantile(0.80)

train_mask = df['reviews_date'].isna() | (df['reviews_date'] <= split_ts)
test_mask  = df['reviews_date'].notna() & (df['reviews_date'] >  split_ts)

train_df = df.loc[train_mask].copy()
test_df  = df.loc[test_mask].copy()

print("train/test:", train_df.shape, test_df.shape)

train/test: (24149, 15) (5851, 15)


## 5  Feature extraction and sentiment model training

We vectorize the cleaned reviews with **TF‑IDF** over unigrams and bigrams.  Four classification models are trained on the TF‑IDF features:

1. **Logistic Regression** (with class weights)
2. **Multinomial Naive Bayes**
3. **Random Forest**
4. **XGBoost**

We compute accuracy, F1 score, and ROC‑AUC on the test set and select the model with the highest ROC‑AUC, using F1 to break ties.  Finally, we save the chosen model and vectorizer for deployment.
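
For intuition about the TF‑IDF features, the sketch below hand-computes unigram and bigram weights for three toy documents. It assumes the smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, and the L2 normalization that scikit-learn documents as `TfidfVectorizer` defaults; the `min_df`, `max_df`, and `max_features` filters used in the notebook are left out, and all names here are illustrative.

```python
import math
from collections import Counter

docs = ["good flavor", "not NOT_good flavor", "love album"]   # toy cleaned reviews

def ngrams(doc, n_max=2):
    """Unigrams plus bigrams (joined by a space) for one whitespace-tokenized doc."""
    toks = doc.lower().split()     # lowercasing turns NOT_good into not_good
    grams = list(toks)
    for n in range(2, n_max + 1):
        grams += [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    return grams

counts = [Counter(ngrams(d)) for d in docs]
n_docs = len(docs)
doc_freq = Counter(term for c in counts for term in c)   # documents containing each term

def tfidf(count):
    """Raw term count times smoothed IDF, then L2-normalize the document vector."""
    w = {t: tf * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
         for t, tf in count.items()}
    norm = math.sqrt(sum(v * v for v in w.values()))
    return {t: v / norm for t, v in w.items()}

vectors = [tfidf(c) for c in counts]
print(sorted(vectors[0].items(), key=lambda kv: -kv[1]))
```

In the first document, `flavor` appears in two of the three documents and therefore receives a lower weight than `good`, which appears in only one.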

In [41]:
import os, pickle, re, string, unicodedata, numpy as np, pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score, classification_report
from xgboost import XGBClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier

In [42]:
def make_label(frame: pd.DataFrame) -> pd.Series:
    if "user_sentiment" in frame.columns:
        s = frame["user_sentiment"].astype(str).str.strip().str.lower().map({"positive":1, "negative":0})
        if s.notna().mean() > 0.8:
            return s.fillna((frame["reviews_rating"]>=4).astype(int)).astype(int)
    return (frame["reviews_rating"]>=4).astype(int)

y_train = make_label(train_df)
y_test  = make_label(test_df)
print("pos-rate train/test:", round(y_train.mean(),3), round(y_test.mean(),3))

pos-rate train/test: 0.89 0.88


In [43]:
# TF-IDF (1–2 grams)
tfidf = TfidfVectorizer(ngram_range=(1,2), min_df=2, max_df=0.95, max_features=200_000)
Xtr = tfidf.fit_transform(train_df["clean"])
Xte = tfidf.transform(test_df["clean"])

# Four classic models
candidates = [
    ("LogReg", LogisticRegression(
        C=3.0, class_weight="balanced", max_iter=800,
        solver="saga", n_jobs=-1, random_state=42
    )),
    ("MultinomialNB", MultinomialNB()),
    ("RandomForest", RandomForestClassifier(
        n_estimators=400, n_jobs=-1, random_state=42
    )),
    ("XGB", XGBClassifier(
        n_estimators=400, max_depth=6, learning_rate=0.1,
        subsample=0.9, colsample_bytree=0.9,
        eval_metric="logloss", tree_method="hist", n_jobs=4, random_state=42
    )),
]

scores = []
for name, clf in candidates:
    clf.fit(Xtr, y_train)
    p = clf.predict_proba(Xte)[:,1] if hasattr(clf,"predict_proba") else None
    if p is None:
        raw = clf.decision_function(Xte)
        p = (raw - raw.min())/(raw.max() - raw.min() + 1e-9)
    yhat = (p >= 0.5).astype(int)
    acc = accuracy_score(y_test, yhat)
    f1  = f1_score(y_test, yhat)
    auc = roc_auc_score(y_test, p)
    print(f"{name}: Acc={acc:.3f}  F1={f1:.3f}  AUC={auc:.3f}")
    print(classification_report(y_test, yhat, digits=3))
    scores.append((auc, f1, name, clf))

# Pick by AUC then F1
scores.sort(reverse=True)
best_auc, best_f1, best_name, best_model = scores[0]
print("\nSelected →", best_name, f"(AUC={best_auc:.3f}, F1={best_f1:.3f})")

LogReg: Acc=0.887  F1=0.934  AUC=0.904
              precision    recall  f1-score   support

           0      0.523     0.671     0.588       705
           1      0.953     0.916     0.934      5146

    accuracy                          0.887      5851
   macro avg      0.738     0.794     0.761      5851
weighted avg      0.901     0.887     0.893      5851

MultinomialNB: Acc=0.880  F1=0.936  AUC=0.819
              precision    recall  f1-score   support

           0      0.000     0.000     0.000       705
           1      0.880     1.000     0.936      5146

    accuracy                          0.880      5851
   macro avg      0.440     0.500     0.468      5851
weighted avg      0.774     0.880     0.823      5851





RandomForest: Acc=0.886  F1=0.939  AUC=0.854
              precision    recall  f1-score   support

           0      0.776     0.074     0.135       705
           1      0.887     0.997     0.939      5146

    accuracy                          0.886      5851
   macro avg      0.832     0.535     0.537      5851
weighted avg      0.874     0.886     0.842      5851

XGB: Acc=0.906  F1=0.948  AUC=0.900
              precision    recall  f1-score   support

           0      0.710     0.374     0.490       705
           1      0.920     0.979     0.948      5146

    accuracy                          0.906      5851
   macro avg      0.815     0.677     0.719      5851
weighted avg      0.894     0.906     0.893      5851


Selected → LogReg (AUC=0.904, F1=0.934)
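
With roughly 88% positive labels, accuracy and positive-class F1 can look strong for a model that never predicts the negative class. The sketch below reproduces the MultinomialNB row from the class counts in the reports above (705 negative, 5146 positive), assuming a classifier that always predicts "positive".

```python
# Test-set class counts taken from the classification reports above.
n_neg, n_pos = 705, 5146

# Confusion counts for a classifier that always predicts "positive".
tp, fp, fn, tn = n_pos, n_neg, 0, 0

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
# Macro recall averages per-class recall: 1.0 for positives, 0.0 for negatives.
macro_recall = (recall + tn / (tn + fp)) / 2

print(f"acc={accuracy:.3f}  f1={f1:.3f}  macro_recall={macro_recall:.3f}")
# prints: acc=0.880  f1=0.936  macro_recall=0.500
```

This is one reason a threshold-free ranking metric such as ROC‑AUC, or the per-class rows of the report, is more informative here than accuracy alone.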


In [44]:
os.makedirs("deployment", exist_ok=True)

In [45]:
with open("deployment/sentiment.pkl", "wb") as f:
    pickle.dump({"vectorizer": tfidf, "model": best_model}, f)

print("\nSaved → deployment/sentiment.pkl")


Saved → deployment/sentiment.pkl


## 6  Build user‑ and item‑based recommenders

We construct a user–item matrix from the training data, using ratings as explicit feedback.  We then implement user‑based and item‑based k‑nearest neighbors collaborative filtering.  For each approach, we compute **Hit Rate** and **NDCG** at 10 on the test set, then select the better performer.  Finally, we precompute the Top‑20 recommended items for each user and save them to disk for deployment.
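
The item-based scoring rule can be sketched on a toy rating table: each unseen item is scored by summing its cosine similarity to the items the user has already rated. This is an illustrative dense version with made-up users and items; the notebook works on a sparse matrix and caps each item's neighbor list.

```python
import math
from collections import defaultdict

# Toy explicit ratings: user -> {item: rating}. All names are made up.
ratings = {
    "ann": {"A": 5, "B": 4},
    "bob": {"A": 4, "B": 5, "C": 2},
    "cat": {"C": 5, "D": 4},
}

# Item vectors: item -> {user: rating}
item_vecs = defaultdict(dict)
for user, row in ratings.items():
    for item, r in row.items():
        item_vecs[item][user] = r

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(user, topk=2):
    """Score unseen items by summing similarity to each item the user rated."""
    seen = ratings[user]
    scores = defaultdict(float)
    for it in seen:
        for other, vec in item_vecs.items():
            if other in seen:
                continue
            sim = cosine(item_vecs[it], vec)
            if sim > 0:                       # keep only items with some overlap
                scores[other] += sim
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [i for i, _ in ranked][:topk]

print(recommend("ann"), recommend("cat"))
```

Here `ann` is recommended only `C`, because `D` shares no raters with anything she has rated.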

In [46]:
import numpy as np, pandas as pd, json, os
from scipy.sparse import csr_matrix
from sklearn.preprocessing import normalize
from collections import defaultdict

In [49]:
# 1) Ratings slice and strict sanitation for IDs/usernames
rs = df[['reviews_username','id','reviews_rating','reviews_date']].copy()
rs['reviews_rating'] = pd.to_numeric(rs['reviews_rating'], errors='coerce')

# must have rating, user, and id
rs = rs.dropna(subset=['reviews_rating','reviews_username','id']).copy()

# normalize to strings and strip
rs['reviews_username'] = rs['reviews_username'].astype(str).str.strip()
rs['id']              = rs['id'].astype(str).str.strip()

# remove empty / literal 'nan'
bad_user = rs['reviews_username'].str.len().eq(0) | rs['reviews_username'].str.lower().eq('nan')
bad_item = rs['id'].str.len().eq(0) | rs['id'].str.lower().eq('nan')
pre = len(rs)
rs = rs[~(bad_user | bad_item)].copy()
print(f"[sanitize] dropped {pre - len(rs)} rows with empty/nan user or id")

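# 2) Align with time split from Step 4
# (the masks are indexed like df; restrict them to the rows still present in rs)
rs_train = rs.loc[train_mask.reindex(rs.index, fill_value=False)].copy()
rs_test  = rs.loc[test_mask.reindex(rs.index, fill_value=False)].copy()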
print("[split] train/test rows:", len(rs_train), len(rs_test))

# 3) Build index maps only from TRAIN (serving users/items)
users = rs_train['reviews_username'].unique().tolist()
items = rs_train['id'].unique().tolist()
uid = {u:i for i,u in enumerate(users)}
iid = {p:j for j,p in enumerate(items)}
inv_iid = {j:p for p,j in iid.items()}

print("[index] users:", len(uid), "items:", len(iid))

# 4) Map to integer indices with a safety mask
row_idx = rs_train['reviews_username'].map(uid)
col_idx = rs_train['id'].map(iid)
mask = row_idx.notna() & col_idx.notna()
dropped = int((~mask).sum())
if dropped:
    print(f"[index] dropping {dropped} train rows with unmapped indices")

rows = row_idx[mask].astype(int).to_numpy()
cols = col_idx[mask].astype(int).to_numpy()
vals = rs_train.loc[mask, 'reviews_rating'].to_numpy()

if len(rows) == 0 or len(users) == 0 or len(items) == 0:
    raise RuntimeError("Training matrix empty after filtering — check id/username integrity.")

# 5) Build sparse (user x item) matrix
# note: csr_matrix sums duplicate (user, item) entries, so repeat reviews of one product add their ratings
R = csr_matrix((vals, (rows, cols)), shape=(len(uid), len(iid)))
print("[matrix] shape:", R.shape, "nnz:", R.nnz)

# 6) User-kNN (cosine on normalized rows)
U = normalize(R, norm='l2', axis=1)
def rec_user_knn(u_idx, topk=20, k=50):
    if u_idx is None: return []
    sims = U[u_idx].dot(U.T).toarray().ravel()
    sims[u_idx] = 0.0
    nn = sims.argsort()[::-1][:k]
    scores = sims[nn] @ R[nn].toarray()
    seen = set(R[u_idx].indices.tolist())
    cand = np.argsort(scores)[::-1]
    return [inv_iid[i] for i in cand if i not in seen][:topk]

# 7) Item-kNN (cosine on normalized items)
I = normalize(R.T.tocsr(), norm='l2', axis=1)
sim_mat = I @ I.T
sim_lil = sim_mat.tolil()
item_knn = defaultdict(list)
TOP_NEI = 200
for j in range(I.shape[0]):
    idxs, vals = sim_lil.rows[j], sim_lil.data[j]
    neigh = sorted([(i,v) for i,v in zip(idxs, vals) if i!=j],
                   key=lambda x:x[1], reverse=True)[:TOP_NEI]
    item_knn[j] = neigh

def rec_item_knn(u_idx, topk=20):
    if u_idx is None: return []
    seen = set(R[u_idx].indices.tolist())
    if not seen: return []
    scores = defaultdict(float)
    for it in seen:
        for nb, sim in item_knn.get(it, []):
            if nb in seen: continue
            scores[nb] += sim
    cand = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [inv_iid[i] for i,_ in cand][:topk]

# 8) Eval on users present in TRAIN only
def eval_rs(rec_fn, K=10):
    hits=0.0; nd=0.0; cnt=0
    test_seen = rs_test[rs_test['reviews_username'].isin(users)]
    gt = test_seen.groupby('reviews_username')['id'].apply(set)
    for u in gt.index:
        u_idx = uid.get(u)
        if u_idx is None: continue
        recs = rec_fn(u_idx, topk=K)
        truth = gt[u]
        if truth & set(recs): hits += 1
        rel = [1 if r in truth else 0 for r in recs]
        dcg  = sum(r/np.log2(i+2) for i,r in enumerate(rel))
        ideal = sorted(rel, reverse=True)
        idcg = sum(ideal[i]/np.log2(i+2) for i in range(len(ideal))) or 1.0
        nd += dcg/idcg
        cnt += 1
    return (hits/cnt if cnt else 0.0), (nd/cnt if cnt else 0.0)

hr_u, nd_u = eval_rs(rec_user_knn, 10)
hr_i, nd_i = eval_rs(rec_item_knn, 10)
print(f"User-kNN: HR@10={hr_u:.3f}  NDCG@10={nd_u:.3f}")
print(f"Item-kNN: HR@10={hr_i:.3f}  NDCG@10={nd_i:.3f}")

best_rs = 'item' if (hr_i, nd_i) >= (hr_u, nd_u) else 'user'
print("selected recommender →", best_rs)

def top20_for_user(u):
    u_idx = uid.get(u)
    if u_idx is None: return []
    return rec_item_knn(u_idx, 20) if best_rs=='item' else rec_user_knn(u_idx, 20)

user_top20 = {u: top20_for_user(u) for u in users}
with open('deployment/user_top20.json','w') as f:
    json.dump(user_top20, f)
with open('deployment/recommender_meta.json','w') as f:
    json.dump({'best_rs':best_rs,
               'hr10': float(hr_i if best_rs=='item' else hr_u),
               'ndcg10': float(nd_i if best_rs=='item' else nd_u)}, f)

print("saved → deployment/user_top20.json, deployment/recommender_meta.json")
print("example:", next(iter(user_top20.items())) if user_top20 else "no users")

[sanitize] dropped 0 rows with empty/nan user or id
[split] train/test rows: 24091 5846
[index] users: 20539 items: 230
[matrix] shape: (20539, 230) nnz: 22416
User-kNN: HR@10=0.039  NDCG@10=0.022
Item-kNN: HR@10=0.139  NDCG@10=0.055
selected recommender → item
saved → deployment/user_top20.json, deployment/recommender_meta.json
example: ('joshua', ['AVpf5FF71cnluZ0-tHAV', 'AVpfUJu_ilAPnD_xZdDr', 'AVpf385g1cnluZ0-s0_t', 'AVpfNWbPilAPnD_xXPR7', 'AVpfozgyilAPnD_xfe0r', 'AVpf0eb2LJeJML43EVSt', 'AVpfDA6wilAPnD_xTxdg', 'AVpe8gsILJeJML43y6Ed', 'AVpf5olc1cnluZ0-tPrO', 'AVpf2tw1ilAPnD_xjflC', 'AVpfPaoqLJeJML435Xk9', 'AVpfazX31cnluZ0-kbdl', 'AVpfAgSp1cnluZ0-b2-K', 'AVpf0thK1cnluZ0-r8vR', 'AVpf5Z1zLJeJML43FpB-', 'AVpfRTh1ilAPnD_xYic2', 'AVpfM_ytilAPnD_xXIJb', 'AVpe5c23LJeJML43xybi', 'AVpfJP1C1cnluZ0-e3Xy', 'AVpf3VOfilAPnD_xjpun'])
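
For reference, here is a minimal per-user sketch of Hit Rate and NDCG at K in their common textbook form, where the ideal DCG assumes every relevant item (up to K) is ranked first. The function name and sample IDs are hypothetical. Note that `eval_rs` above builds its ideal ordering from the relevance flags of the retrieved list, so its NDCG values need not match this variant.

```python
import math

def hit_rate_and_ndcg(recs, truth, k=10):
    """Return (hit, ndcg) for one user given a ranked list and a set of relevant items."""
    top = recs[:k]
    rel = [1 if item in truth else 0 for item in top]
    hit = 1.0 if any(rel) else 0.0
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(rel))
    # Ideal DCG: every relevant item (up to k) placed at the top of the list.
    idcg = sum(1 / math.log2(i + 2) for i in range(min(len(truth), k)))
    return hit, (dcg / idcg if idcg else 0.0)

# One relevant item, found at rank 3: a hit, but discounted by its position.
hit, ndcg = hit_rate_and_ndcg(["p3", "p7", "p1"], truth={"p1"}, k=3)
print(hit, round(ndcg, 3))
# prints: 1.0 0.5
```

Averaging both values over all evaluated users gives HR@K and NDCG@K.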


## 7  Sentiment‑based refinement of the Top‑20 lists

Combine the collaborative filtering results with the sentiment classifier.  Score each product using only the training reviews (to avoid leaking test information): average the classifier's predicted positive probability over the product's reviews, then take the Wilson lower bound of that proportion so that products with few reviews are ranked more conservatively.  Each user’s Top‑20 list is re‑ranked by this score and the top five are kept.  We save the final Top‑5 recommendations as JSON for deployment.
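
To see why a lower confidence bound is a reasonable ranking score, the sketch below compares two hypothetical products with the same 90% positive rate but different review counts. The formula mirrors the `wilson_lb` helper in the cell below; the function name and the counts are made up for illustration.

```python
import math

def wilson_lower_bound(positives, n, z=1.96):
    """Lower end of the Wilson score interval for a proportion positives / n."""
    if n <= 0:
        return 0.0
    phat = positives / n
    denom = 1 + z * z / n
    centre = phat + z * z / (2 * n)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return (centre - margin) / denom

small = wilson_lower_bound(9, 10)      # 90% positive, 10 reviews
large = wilson_lower_bound(90, 100)    # 90% positive, 100 reviews
print(round(small, 3), round(large, 3))
```

The product with 100 reviews gets the higher bound, so it ranks above the product with only 10 reviews even though their raw positive rates are identical.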

In [50]:
# load trained sentiment artifacts
with open('deployment/sentiment.pkl','rb') as f:
    art = pickle.load(f)
vec, clf = art['vectorizer'], art['model']

# load Top-20 from Step 6
with open('deployment/user_top20.json','r') as f:
    user_top20 = json.load(f)

# score TRAIN-period reviews only (no leakage)
train_only = df.loc[train_mask].copy()
probs = clf.predict_proba(vec.transform(train_only['clean']))[:,1]

tmp = train_only[['id']].copy()
tmp['id'] = tmp['id'].astype(str)
tmp['p'] = probs

# aggregate positivity per item + Wilson lower bound
agg = tmp.groupby('id')['p'].agg(['count','mean']).reset_index()
agg['pos'] = agg['mean'] * agg['count']

def wilson_lb(pos, n, z=1.96):
    if n <= 0: return 0.0
    phat = pos/n
    denom = 1 + z*z/n
    centre = phat + z*z/(2*n)
    margin = z*np.sqrt((phat*(1-phat) + z*z/(4*n))/n)
    return (centre - margin) / denom

agg['wilson'] = agg.apply(lambda r: wilson_lb(r['pos'], r['count']), axis=1)
item_sent = dict(zip(agg['id'], agg['wilson']))

# refine each user's Top-20 → Top-5
user_top5 = {}
for u, lst in user_top20.items():
    ranked = sorted(lst, key=lambda pid: item_sent.get(str(pid), 0.0), reverse=True)
    user_top5[u] = ranked[:5]

with open('deployment/user_top5.json','w') as f:
    json.dump(user_top5, f)
print("saved → deployment/user_top5.json")
print("example:", next(iter(user_top5.items())))

saved → deployment/user_top5.json
example: ('joshua', ['AVpf2tw1ilAPnD_xjflC', 'AVpfJP1C1cnluZ0-e3Xy', 'AVpfRTh1ilAPnD_xYic2', 'AVpf3VOfilAPnD_xjpun', 'AVpfPaoqLJeJML435Xk9'])


## 8  Save product name and ID mappings

For ease of display in the web app, we record a mapping between product names and their unique identifiers.  This allows us to show human‑readable names in the recommendations.
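
A minimal sketch of how a serving layer might resolve recommended IDs to names. The in-memory dicts stand in for `mappings.json` and `user_top5.json`; the user `demo_user`, the unknown ID, and the helper `names_for` are hypothetical, and the web app's actual lookup code is not shown in this notebook.

```python
# Hypothetical stand-ins for deployment/mappings.json and deployment/user_top5.json.
mappings = {
    "id_to_name": {
        "AV14LG0R-jtxr-f38QfS": "Lundberg Organic Cinnamon Toast Rice Cakes",
    }
}
user_top5 = {"demo_user": ["AV14LG0R-jtxr-f38QfS", "UNKNOWN-ID"]}

def names_for(user):
    """Return display names for a user's recommendations, falling back to the raw ID."""
    id_to_name = mappings["id_to_name"]
    return [id_to_name.get(pid, pid) for pid in user_top5.get(user, [])]

print(names_for("demo_user"))
print(names_for("someone_else"))   # unknown users get an empty list
```

Falling back to the raw ID keeps the page renderable if a product is missing from the mapping.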

In [51]:
id2name = df.drop_duplicates('id').set_index('id')['name'].astype(str).to_dict()
with open('deployment/mappings.json','w') as f:
    json.dump({'id_to_name': id2name}, f)
print("saved → deployment/mappings.json")

# quick artifact ping
for p in ["deployment/sentiment.pkl","deployment/user_top20.json",
          "deployment/user_top5.json","deployment/mappings.json","deployment/recommender_meta.json"]:
    print(p, "OK" if (os.path.exists(p) and os.path.getsize(p)>0) else "MISSING")

saved → deployment/mappings.json
deployment/sentiment.pkl OK
deployment/user_top20.json OK
deployment/user_top5.json OK
deployment/mappings.json OK
deployment/recommender_meta.json OK

