## **PROBLEM STATEMENT**

**BUSINESS PROBLEM:**

To increase sales and enhance the customer experience, Amazon's product recommendation system suggests the products that users are most likely to purchase. However, a biased recommendation system gives some users or products unfair exposure, which can cost revenue and damage the brand's reputation.

**WHY IT'S IMPORTANT:**

1. Customer Trust: Any bias in recommendations erodes trust, especially if users feel they are being unfairly targeted or ignored.
2. Vendor Fairness: Because of algorithmic bias, some products may not get an equal opportunity to be discovered on Amazon.
3. Legal & Ethical Risks: Unfair algorithms may violate anti-discrimination laws and ethical standards.

**DATA COLLECTION:**

The data was collected from Kaggle (https://www.kaggle.com/datasets/saurav9786/amazon-product-reviews/data).

**OBJECTIVE:**

To build a recommendation system whose recommendations are both diverse and relevant to each user. I use a re-ranking approach for this.


In [None]:
%%capture
# Downgrade numpy to a compatible version (a cell magic must be the first line of the cell)
!pip install numpy==1.24.3

In [None]:
%%capture
!pip install scikit-surprise

In [None]:
# Import all required libraries
import pandas as pds
import numpy as npy
from surprise import Reader, Dataset, SVD
from surprise.model_selection import train_test_split
from collections import Counter
from collections import defaultdict

In [None]:
# Load the ratings CSV. The file has no header row, so pandas uses the
# first record as column names (these are renamed in a later cell).
Amazon = pds.read_csv('/content/ratings_Electronics_(1)[2].csv')
print(Amazon.head())

    AKM1MP6P0OYPR  0132793040  5.0    1365811200
0  A2CX7LUOHB2NDG  0321732944  5.0  1.341101e+09
1  A2NWSAGRHCP8N5  0439886341  1.0  1.367194e+09
2  A2WNBOD3WNDNKT  0439886341  3.0  1.374451e+09
3  A1GI0U4ZRJA8WN  0439886341  1.0  1.334707e+09
4  A1QGNMC6O1VW39  0511189877  5.0  1.397434e+09


In [None]:
print(Amazon.columns)


Index(['AKM1MP6P0OYPR', '0132793040', '5.0', '1365811200'], dtype='object')


In [None]:
Amazon = Amazon.rename(columns={
    'AKM1MP6P0OYPR': 'userId',
    '0132793040': 'productId',
    '5.0': 'rating',
    '1365811200': 'Time_Stamp'
})
# I renamed the columns
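The rename above works, but because the file has no header row, the first rating record becomes the column labels and is dropped from the data. A minimal sketch of an alternative, assuming the CSV is header-less with the four columns in this order; the inline sample is hypothetical and only stands in for the real file:

```python
import io
import pandas as pds

# Hypothetical three-row sample standing in for the header-less ratings file
sample_csv = io.StringIO(
    "AKM1MP6P0OYPR,0132793040,5.0,1365811200\n"
    "A2CX7LUOHB2NDG,0321732944,5.0,1341100800\n"
    "A2NWSAGRHCP8N5,0439886341,1.0,1367193600\n"
)

# header=None tells pandas there is no header row; names supplies the labels.
# Reading productId as str preserves leading zeros in the IDs.
columns = ["userId", "productId", "rating", "Time_Stamp"]
sample = pds.read_csv(sample_csv, header=None, names=columns,
                      dtype={"productId": str})

print(sample.shape)             # all three records are kept as data
print(sample.columns.tolist())
```

With this approach the separate rename step would no longer be needed.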

In [None]:
# Simple collaborative filtering
reader = Reader(rating_scale=(1, 5))
Amazon_data = Dataset.load_from_df(Amazon[['userId', 'productId', 'rating']], reader)

In [None]:
# Train/test split
trainset, testset = train_test_split(Amazon_data, test_size=0.2)
algo = SVD()
algo.fit(trainset)
predictions = algo.test(testset)


# Fairness Analysis

In [None]:
# First, count how often each product is recommended (predicted rating >= 4)
product_recommend_counts = Counter([pred.iid for pred in predictions if pred.est >= 4])
print("Top 10 recommended products:", product_recommend_counts.most_common(10))

# Coverage: % of all products recommended at least once
coverage = len(product_recommend_counts) / Amazon['productId'].nunique()
print(f"Coverage: {coverage:.2%}")


Top 10 recommended products: [('B0019EHU8G', 2480), ('B003ELYQGG', 2340), ('B003ES5ZUU', 2081), ('B0002L5R78', 1881), ('B000LRMS66', 1772), ('B003LR7ME6', 1741), ('B002MAPRYU', 1305), ('B002WE6D44', 1246), ('B0012S4APK', 1128), ('B0001FTVEK', 1091)]
Coverage: 53.33%


These are the IDs of the 10 products most often recommended to users.

This concentration points to the following fairness issues:

1. Unfair Exposure: Already-popular products receive even more visibility, while less popular products that may be just as good remain unnoticed.

2. Business Risk: New products have a lower chance of being discovered, which hurts small merchants and reduces catalog diversity.

3. User Experience: Users are not shown a wide variety of products, which limits discovery.
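One way to put a number on the exposure concentration described above is the share of all recommendation slots taken by the most recommended items. The following is a minimal sketch; the helper name `top_k_share` and the toy counts are hypothetical and not part of the original analysis:

```python
from collections import Counter

def top_k_share(counts, k=10):
    """Fraction of all recommendation slots taken by the k most recommended items."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(c for _, c in counts.most_common(k))
    return top / total

# Hypothetical toy counts: two items dominate, three are rarely recommended
toy_counts = Counter({"A": 50, "B": 30, "C": 10, "D": 5, "E": 5})
print(f"Top-2 share: {top_k_share(toy_counts, k=2):.2%}")
```

On the notebook's data this could be called as `top_k_share(product_recommend_counts, k=10)`; a value close to 1 would indicate that a handful of products absorb most of the exposure.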

In [None]:
# Find the top-10 most popular products
top_popular = [p for p, _ in product_recommend_counts.most_common(10)]

# For each user, collect their recommended products
user_recs = defaultdict(list)
for pred in predictions:
    if pred.est >= 4:
        user_recs[pred.uid].append((pred.iid, pred.est))

# Re-rank: limit each user's top-N recommendations to include at least some non-popular items
def rerank(recs, top_popular, n=5):
    diverse = [item for item in recs if item[0] not in top_popular]
    popular = [item for item in recs if item[0] in top_popular]
    final = (diverse[:2] + popular[:(n-2)])[:n]
    return final

# Apply re-ranking
reranked_recs = {uid: rerank(sorted(recs, key=lambda x: -x[1]), top_popular) for uid, recs in user_recs.items()}

# Calculate new coverage
all_reranked = [pid for recs in reranked_recs.values() for pid, _ in recs]
coverage_reranked = len(set(all_reranked)) / Amazon['productId'].nunique()
print(f"Coverage after re-ranking: {coverage_reranked:.2%}")


Coverage after re-ranking: 51.40%


After this simple re-ranking, coverage dropped slightly, from 53.33% to 51.40%. This is expected: each user's list is now truncated to at most five items drawn from the same candidates, so the set of recommended products can only shrink.

The next step is to refine the re-ranking logic to improve coverage and fairness.
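Each variant that follows recomputes coverage with the same three lines. A small helper could remove that repetition; this is a sketch, and the name `catalog_coverage` and the toy input are hypothetical:

```python
def catalog_coverage(rec_lists, catalog_size):
    """Share of the catalog that appears in at least one recommendation list."""
    recommended = {pid for recs in rec_lists.values() for pid, _ in recs}
    return len(recommended) / catalog_size

# Hypothetical toy input: user -> list of (productId, predicted rating)
toy_recs = {
    "u1": [("A", 4.8), ("B", 4.5)],
    "u2": [("B", 4.9), ("C", 4.1)],
}
print(f"Coverage: {catalog_coverage(toy_recs, catalog_size=6):.2%}")
```

In the notebook it would be called as `catalog_coverage(reranked_recs, Amazon['productId'].nunique())`.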

1. 3 non-popular products and 2 popular products

In [None]:
def rerank(recs, top_popular, n=5):
    diverse = [item for item in recs if item[0] not in top_popular]
    popular = [item for item in recs if item[0] in top_popular]
    # Recommend 3 non-popular and 2 popular (adjust numbers as needed)
    final = (diverse[:3] + popular[:(n-3)])[:n]
    return final

# Apply re-ranking again
reranked_recs = {uid: rerank(sorted(recs, key=lambda x: -x[1]), top_popular) for uid, recs in user_recs.items()}

# Calculate new coverage
all_reranked = [pid for recs in reranked_recs.values() for pid, _ in recs]
coverage_reranked = len(set(all_reranked)) / Amazon['productId'].nunique()
print(f"Coverage after re-ranking (3 non-popular, 2 popular): {coverage_reranked:.2%}")


Coverage after re-ranking (3 non-popular, 2 popular): 52.34%


This increases the presence of non-popular products in each user's top 5. However, every user always receives at most 3 non-popular and 2 popular products, regardless of how relevant those products are to that user, so diversity is gained at the expense of relevance.
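A side effect of the fixed split is that a user with few candidates of one kind ends up with a shorter list: a user whose candidates are all popular receives only 2 items under the 3/2 rule. The sketch below is a variation, not the notebook's method; it assumes a hypothetical helper `rerank_with_quota` that reserves slots for non-popular items and then backfills the remaining slots with the best leftover candidates:

```python
def rerank_with_quota(recs, popular_ids, n=5, n_diverse=3):
    """Take up to n_diverse non-popular items first, then fill up to n by score."""
    popular_ids = set(popular_ids)
    n_diverse = min(n_diverse, n)
    ranked = sorted(recs, key=lambda x: -x[1])
    diverse = [r for r in ranked if r[0] not in popular_ids][:n_diverse]
    chosen = {pid for pid, _ in diverse}
    # Backfill with the highest-scored remaining items, popular or not
    rest = [r for r in ranked if r[0] not in chosen][: n - len(diverse)]
    return diverse + rest

# Hypothetical user whose candidates are all popular items
only_popular = [("P1", 4.9), ("P2", 4.8), ("P3", 4.7), ("P4", 4.6)]
filled = rerank_with_quota(only_popular, {"P1", "P2", "P3", "P4"})
print(len(filled))

# Hypothetical user with a mix: one slot reserved for a non-popular item
mixed = [("P1", 4.9), ("D1", 4.2), ("D2", 4.1)]
print(rerank_with_quota(mixed, {"P1"}, n=2, n_diverse=1))
```

The quota still guarantees exposure for non-popular items when they exist, while the backfill keeps list length stable across users.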

2. Increased number of recommendations (n=10)

In [None]:
def rerank(recs, top_popular, n=10):
    diverse = [item for item in recs if item[0] not in top_popular]
    popular = [item for item in recs if item[0] in top_popular]
    # For n=10, maybe 6 non-popular, 4 popular
    final = (diverse[:6] + popular[:(n-6)])[:n]
    return final

# Apply new re-ranking
reranked_recs = {uid: rerank(sorted(recs, key=lambda x: -x[1]), top_popular, n=10) for uid, recs in user_recs.items()}

# Calculate coverage
all_reranked = [pid for recs in reranked_recs.values() for pid, _ in recs]
coverage_reranked = len(set(all_reranked)) / Amazon['productId'].nunique()
print(f"Coverage after re-ranking (n=10): {coverage_reranked:.2%}")


Coverage after re-ranking (n=10): 53.02%


Coverage improves because more unique items are suggested per user. However, the 6/4 split is still fixed, so a user may receive too many or too few popular or non-popular items relative to their preferences.

3. Full Diversity Tuning

In [None]:
def rerank(recs, top_popular, n=5):
    diverse = [item for item in recs if item[0] not in top_popular]
    final = diverse[:n]
    return final

# Apply full diversity
reranked_recs = {uid: rerank(sorted(recs, key=lambda x: -x[1]), top_popular) for uid, recs in user_recs.items()}

all_reranked = [pid for recs in reranked_recs.values() for pid, _ in recs]
coverage_reranked = len(set(all_reranked)) / Amazon['productId'].nunique()
print(f"Coverage after re-ranking (all non-popular): {coverage_reranked:.2%}")


Coverage after re-ranking (all non-popular): 52.90%


In this approach, the popular items are excluded entirely, so only non-popular items appear in the recommendations and a wide variety of distinct items is observed (52.90% coverage). However, satisfaction and relevance may be lower because users may receive lower-rated or less relevant items.
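The possible loss of relevance can be checked by comparing the average predicted rating of the lists before and after re-ranking. This is a minimal sketch; the helper name `mean_predicted_rating` and the toy lists are hypothetical:

```python
def mean_predicted_rating(rec_lists):
    """Average predicted rating over every recommended (item, score) pair."""
    scores = [score for recs in rec_lists.values() for _, score in recs]
    return sum(scores) / len(scores) if scores else float("nan")

# Hypothetical lists before and after removing the popular item "P"
before = {"u1": [("P", 5.0), ("A", 4.0)]}
after = {"u1": [("A", 4.0), ("B", 4.0)]}

print(f"Mean predicted rating before: {mean_predicted_rating(before):.2f}")
print(f"Mean predicted rating after:  {mean_predicted_rating(after):.2f}")
```

Because every candidate in the notebook already has a predicted rating of at least 4, the average can never fall below 4; the comparison shows how much of the headroom above that floor is given up.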

# xQuAD Re-ranking

In this method, each slot in the top-N recommendation list is filled with the item that maximizes the objective function:

score = λ × predicted relevance + (1 − λ) × diversity/novelty

λ (lambda) controls the trade-off:

λ = 1: the model considers only relevance (like regular SVD, with no regard for diversity).

λ = 0: the model considers only diversity (like the full-diversity re-ranking).

0 < λ < 1: a flexible compromise between relevance and diversity.
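The effect of λ can be illustrated with a small worked example. This sketch uses the popularity penalty −log(count + 1) that the notebook's `xquad_rerank` cell applies as its diversity term; the function name `combined_score` and the two items are hypothetical:

```python
import math

def combined_score(relevance, rec_count, lamb):
    """lamb * relevance + (1 - lamb) * diversity, with diversity = -log(count + 1)."""
    diversity = -math.log(rec_count + 1)
    return lamb * relevance + (1 - lamb) * diversity

# Hypothetical items: (predicted rating, number of times recommended)
popular = (4.8, 2000)   # higher rating, heavily recommended
niche = (4.3, 3)        # lower rating, rarely recommended

for lamb in (1.0, 0.7, 0.0):
    s_pop = combined_score(*popular, lamb)
    s_niche = combined_score(*niche, lamb)
    winner = "popular" if s_pop > s_niche else "niche"
    print(f"lambda={lamb}: popular={s_pop:.2f}, niche={s_niche:.2f} -> {winner}")
```

With these numbers the popular item wins only at λ = 1; at λ = 0.7 the popularity penalty already outweighs its half-point rating advantage, so the niche item is selected.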

In [None]:
def xquad_rerank(user_recs, product_counts, n=5, lamb=0.7):
    """Greedily pick up to n (item, score) pairs, trading relevance against popularity."""
    rec_list = []
    candidate_items = list(user_recs)  # copy so the caller's list is not modified
    while len(rec_list) < n and candidate_items:
        best_score = -float('inf')
        best_item = None
        for item, score in candidate_items:
            # Popularity penalty: frequently recommended items get a more negative value
            diversity = -npy.log(product_counts.get(item, 1) + 1)
            xquad_score = lamb * score + (1 - lamb) * diversity
            if xquad_score > best_score:
                best_score = xquad_score
                best_item = (item, score)
        # Move the best remaining candidate into the final list
        rec_list.append(best_item)
        candidate_items.remove(best_item)
    return rec_list


In [None]:
# Collect each user's predicted (item, score)
user_recs = defaultdict(list)
for pred in predictions:
    if pred.est >= 4:
        user_recs[pred.uid].append((pred.iid, pred.est))

# Apply xQuAD re-ranking for each user
lamb = 0.7
n = 5        # Top-N per user

xquad_reranked = {
    uid: xquad_rerank(sorted(recs, key=lambda x: -x[1]), product_recommend_counts, n=n, lamb=lamb)
    for uid, recs in user_recs.items()
}

In [None]:
all_xquad = [pid for recs in xquad_reranked.values() for pid, _ in recs]
coverage_xquad = len(set(all_xquad)) / Amazon['productId'].nunique()
print(f"Coverage after xQuAD re-ranking: {coverage_xquad:.2%}")


Coverage after xQuAD re-ranking: 53.30%


Here, users who have many good non-popular candidates receive broader lists, while for other users the lists keep their relevance intact. The balance shifts smoothly with λ rather than following a fixed pattern.
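To choose λ, one could sweep several values and compare how many distinct items are recommended against the average predicted rating. The sketch below runs on hypothetical toy data; `rerank_by_lambda` and `sweep` are illustrative names. Because the score used in this notebook does not depend on the items already selected, sorting by the combined score gives the same list as the greedy loop:

```python
import math
from collections import Counter

def rerank_by_lambda(recs, counts, n=5, lamb=0.7):
    """Sort candidates by lamb * score + (1 - lamb) * (-log(count + 1)) and keep n."""
    def combined(pair):
        item, score = pair
        return lamb * score + (1 - lamb) * -math.log(counts.get(item, 1) + 1)
    return sorted(recs, key=combined, reverse=True)[:n]

def sweep(user_recs, counts, lambdas, n=1):
    """Return (lambda, distinct items recommended, mean predicted rating) per lambda."""
    rows = []
    for lamb in lambdas:
        picked = [pair for recs in user_recs.values()
                  for pair in rerank_by_lambda(recs, counts, n=n, lamb=lamb)]
        distinct = len({item for item, _ in picked})
        mean_score = (sum(s for _, s in picked) / len(picked)) if picked else float("nan")
        rows.append((lamb, distinct, round(mean_score, 2)))
    return rows

# Hypothetical candidates and recommendation counts; "P" is the popular item
toy_user_recs = {
    "u1": [("P", 4.9), ("A", 4.5), ("B", 4.2)],
    "u2": [("P", 4.8), ("C", 4.4), ("D", 4.1)],
}
toy_counts = Counter({"P": 1000, "A": 5, "B": 2, "C": 4, "D": 1})

rows = sweep(toy_user_recs, toy_counts, lambdas=(1.0, 0.8, 0.0), n=1)
for lamb, distinct, mean_score in rows:
    print(f"lambda={lamb}: distinct items={distinct}, mean predicted rating={mean_score}")
```

On the toy data, lowering λ increases the number of distinct items while the mean predicted rating falls, which is the trade-off that λ is meant to expose.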

# Conclusion:

In these experiments, we tried a variety of simple re-ranking strategies to improve fairness, such as fixing constant ratios between popular and non-popular items and producing entirely diverse lists. While these tactics ensured that less popular products received at least some exposure, they did not take individual user preferences into account and did not let us adjust the trade-off between relevance and diversity.

Subsequently, we adopted xQuAD re-ranking, a more formal method that selects recommendations for each user by balancing relevance and diversity through the parameter lambda. This allowed us to tune the system toward the best trade-off between fairness (product discoverability) and recommendation quality for our business needs.

## References:

Santos, R.L.T., Macdonald, C. & Ounis, I., 2010. Exploiting query reformulations for web search result diversification. Proceedings of the 19th international conference on World Wide Web, pp.881–890.

Ekstrand, M.D., Tian, M., Azpiazu, I.M., Ekstrand, J.D., Anuyah, O., McNeill, D. & Pera, M.S., 2018. All the cool kids, how do they fit in? Popularity and demographic biases in recommender evaluation and effectiveness. Proceedings of the 1st Conference on Fairness, Accountability and Transparency, pp.172–186.