# Problem Statement

Consumers often face several products with similar average ratings, yet the average rating alone can be misleading. I wanted to buy a whiteboard film from Amazon but did not know which one to choose: some options have better reviews but a small sample size, while others rate higher but their ratings seem to vary more. This notebook answers the question:
> **Which product should be chosen when ratings differ in both level and consistency?**

>**Dataset**

In [None]:
import pandas as pd
import numpy as np

data = {
    "brand": ["HAMIGAR", "AFMAT", "WARASEE", "JUMOBOARD", "NORTIX", "DUMOS"],
    "reviews": [2914, 2077, 714, 444, 486, 253],
    "width_ft": [6.5, 4, 4, 4, 4, 4],
    "height_ft": [2, 3, 3, 3, 3, 3],
    "price_usd": [27.00, 36.49, 29.00, 60.00, 35.99, 62.00],
    "p_5star": [0.71, 0.71, 0.73, 0.70, 0.71, 0.78],
    "p_4star": [0.14, 0.15, 0.10, 0.15, 0.13, 0.12],
    "p_3star": [0.06, 0.06, 0.08, 0.09, 0.06, 0.05],
    "p_2star": [0.03, 0.03, 0.03, 0.01, 0.03, 0.00],
    "p_1star": [0.06, 0.05, 0.06, 0.05, 0.07, 0.05]
}

df = pd.DataFrame(data)


## Objective 1: CALCULATE THE MEAN STAR RATING FOR EACH PRODUCT

Multiplying each star level by the share of reviews at that level and summing the results gives the expected (mean) star rating for each product. However, this does not account for the number of reviews. A product with a high mean rating but only a few reviews may be less reliable than one with a slightly lower mean rating but many more reviews.
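
As a worked check of this formula, let $p_k$ be the share of $k$-star reviews. HAMIGAR's mean rating is then:

$$
\bar{s} = \sum_{k=1}^{5} k\,p_k
= 5(0.71) + 4(0.14) + 3(0.06) + 2(0.03) + 1(0.06)
= 3.55 + 0.56 + 0.18 + 0.06 + 0.06 = 4.41
$$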

In [3]:
# Compute mean star rating
df["mean_star"] = (
    5*df["p_5star"] +
    4*df["p_4star"] +
    3*df["p_3star"] +
    2*df["p_2star"] +
    1*df["p_1star"]
)

# Show result
df[["brand", "mean_star"]]


|   | brand | mean_star |
|---|---|---|
| 0 | HAMIGAR | 4.41 |
| 1 | AFMAT | 4.44 |
| 2 | WARASEE | 4.41 |
| 3 | JUMOBOARD | 4.44 |
| 4 | NORTIX | 4.38 |
| 5 | DUMOS | 4.58 |


>**Look at the dataset to get familiar with it**

In [8]:
df #display the dataframe

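|   | brand | reviews | width_ft | height_ft | price_usd | p_5star | p_4star | p_3star | p_2star | p_1star | mean_star |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HAMIGAR | 2914 | 6.5 | 2 | 27.0 | 0.71 | 0.14 | 0.06 | 0.03 | 0.06 | 4.41 |
| 1 | AFMAT | 2077 | 4.0 | 3 | 36.49 | 0.71 | 0.15 | 0.06 | 0.03 | 0.05 | 4.44 |
| 2 | WARASEE | 714 | 4.0 | 3 | 29.0 | 0.73 | 0.1 | 0.08 | 0.03 | 0.06 | 4.41 |
| 3 | JUMOBOARD | 444 | 4.0 | 3 | 60.0 | 0.7 | 0.15 | 0.09 | 0.01 | 0.05 | 4.44 |
| 4 | NORTIX | 486 | 4.0 | 3 | 35.99 | 0.71 | 0.13 | 0.06 | 0.03 | 0.07 | 4.38 |
| 5 | DUMOS | 253 | 4.0 | 3 | 62.0 | 0.78 | 0.12 | 0.05 | 0.0 | 0.05 | 4.58 |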


## Objective 2: CALCULATE THE COEFFICIENT OF VARIATION (CV = STANDARD DEVIATION / MEAN) TO MEASURE SATISFACTION CONSISTENCY

In [10]:
# Calculate standard deviation for each product
df['std_dev'] = np.sqrt(
    ( (5 - df['mean_star'])**2 * df['p_5star'] ) +
    ( (4 - df['mean_star'])**2 * df['p_4star'] ) +
    ( (3 - df['mean_star'])**2 * df['p_3star'] ) +
    ( (2 - df['mean_star'])**2 * df['p_2star'] ) +
    ( (1 - df['mean_star'])**2 * df['p_1star'] )
) 

In [None]:
df

|   | brand | reviews | width_ft | height_ft | price_usd | p_5star | p_4star | p_3star | p_2star | p_1star | mean_star | std_dev |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HAMIGAR | 2914 | 6.5 | 2 | 27.0 | 0.71 | 0.14 | 0.06 | 0.03 | 0.06 | 4.41 | 1.123343 |
| 1 | AFMAT | 2077 | 4.0 | 3 | 36.49 | 0.71 | 0.15 | 0.06 | 0.03 | 0.05 | 4.44 | 1.070701 |
| 2 | WARASEE | 714 | 4.0 | 3 | 29.0 | 0.73 | 0.1 | 0.08 | 0.03 | 0.06 | 4.41 | 1.141008 |
| 3 | JUMOBOARD | 444 | 4.0 | 3 | 60.0 | 0.7 | 0.15 | 0.09 | 0.01 | 0.05 | 4.44 | 1.042305 |
| 4 | NORTIX | 486 | 4.0 | 3 | 35.99 | 0.71 | 0.13 | 0.06 | 0.03 | 0.07 | 4.38 | 1.17286 |
| 5 | DUMOS | 253 | 4.0 | 3 | 62.0 | 0.78 | 0.12 | 0.05 | 0.0 | 0.05 | 4.58 | 0.971391 |


In [12]:
df["CV"] = df["std_dev"] / df["mean_star"]
df

|   | brand | reviews | width_ft | height_ft | price_usd | p_5star | p_4star | p_3star | p_2star | p_1star | mean_star | std_dev | CV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HAMIGAR | 2914 | 6.5 | 2 | 27.0 | 0.71 | 0.14 | 0.06 | 0.03 | 0.06 | 4.41 | 1.123343 | 0.254726 |
| 1 | AFMAT | 2077 | 4.0 | 3 | 36.49 | 0.71 | 0.15 | 0.06 | 0.03 | 0.05 | 4.44 | 1.070701 | 0.241149 |
| 2 | WARASEE | 714 | 4.0 | 3 | 29.0 | 0.73 | 0.1 | 0.08 | 0.03 | 0.06 | 4.41 | 1.141008 | 0.258732 |
| 3 | JUMOBOARD | 444 | 4.0 | 3 | 60.0 | 0.7 | 0.15 | 0.09 | 0.01 | 0.05 | 4.44 | 1.042305 | 0.234753 |
| 4 | NORTIX | 486 | 4.0 | 3 | 35.99 | 0.71 | 0.13 | 0.06 | 0.03 | 0.07 | 4.38 | 1.17286 | 0.267776 |
| 5 | DUMOS | 253 | 4.0 | 3 | 62.0 | 0.78 | 0.12 | 0.05 | 0.0 | 0.05 | 4.58 | 0.971391 | 0.212094 |


Results:

Lower CV values mean that ratings cluster more tightly around the mean, so satisfaction is more consistent. Higher CV values mean that ratings are more spread out. DUMOS has the lowest CV (0.212) and NORTIX the highest (0.268). CV describes the spread of the rating distribution, not how much data supports it. It ignores the number of reviews, so DUMOS's low CV rests on only 253 reviews. We should therefore weigh the mean star rating, the CV and the review count together when deciding which product to buy.
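
The sketch below separates consistency from sample size. In plain Python, it recomputes the standard deviation $\sigma = \sqrt{\sum_k (k - \bar{s})^2 p_k}$ and the CV from a star distribution. It also adds the standard error of the mean, $\sigma / \sqrt{n}$, which is not used elsewhere in this notebook and assumes that reviews are independent draws. The helper name `rating_stats` is hypothetical.

```python
import math


def rating_stats(dist, n_reviews):
    """Return (mean, std, cv, std_error) for a star-rating distribution.

    dist maps star level (1-5) to the share of reviews at that level.
    std_error assumes reviews are independent draws (an assumption).
    """
    mean = sum(star * p for star, p in dist.items())
    var = sum((star - mean) ** 2 * p for star, p in dist.items())
    std = math.sqrt(var)
    cv = std / mean
    std_error = std / math.sqrt(n_reviews)
    return mean, std, cv, std_error


# Shares and review counts taken from the dataset above
dumos = {5: 0.78, 4: 0.12, 3: 0.05, 2: 0.00, 1: 0.05}
hamigar = {5: 0.71, 4: 0.14, 3: 0.06, 2: 0.03, 1: 0.06}

for label, dist, n_rev in [("DUMOS", dumos, 253), ("HAMIGAR", hamigar, 2914)]:
    m, s, c, se = rating_stats(dist, n_rev)
    print(f"{label}: mean={m:.2f} std={s:.3f} CV={c:.3f} SE={se:.3f}")
```

DUMOS has the lower CV. However, under this assumption its mean is estimated roughly three times less precisely than HAMIGAR's (SE of about 0.061 versus about 0.021), because it has far fewer reviews.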

## Objective 3: CALCULATE VALUE FOR MONEY FOR EACH PRODUCT

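Value for money = Mean Star Rating / Price (stars per US dollar; higher is better). This ratio ignores size: HAMIGAR measures 6.5 × 2 ft (13 sq ft), while the other films measure 4 × 3 ft (12 sq ft).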

In [13]:
df["value_for_money"] = df["mean_star"] / df["price_usd"]
df

|   | brand | reviews | width_ft | height_ft | price_usd | p_5star | p_4star | p_3star | p_2star | p_1star | mean_star | std_dev | CV | value_for_money |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HAMIGAR | 2914 | 6.5 | 2 | 27.0 | 0.71 | 0.14 | 0.06 | 0.03 | 0.06 | 4.41 | 1.123343 | 0.254726 | 0.163333 |
| 1 | AFMAT | 2077 | 4.0 | 3 | 36.49 | 0.71 | 0.15 | 0.06 | 0.03 | 0.05 | 4.44 | 1.070701 | 0.241149 | 0.121677 |
| 2 | WARASEE | 714 | 4.0 | 3 | 29.0 | 0.73 | 0.1 | 0.08 | 0.03 | 0.06 | 4.41 | 1.141008 | 0.258732 | 0.152069 |
| 3 | JUMOBOARD | 444 | 4.0 | 3 | 60.0 | 0.7 | 0.15 | 0.09 | 0.01 | 0.05 | 4.44 | 1.042305 | 0.234753 | 0.074 |
| 4 | NORTIX | 486 | 4.0 | 3 | 35.99 | 0.71 | 0.13 | 0.06 | 0.03 | 0.07 | 4.38 | 1.17286 | 0.267776 | 0.1217 |
| 5 | DUMOS | 253 | 4.0 | 3 | 62.0 | 0.78 | 0.12 | 0.05 | 0.0 | 0.05 | 4.58 | 0.971391 | 0.212094 | 0.073871 |


## Objective 4: Risk indicator based on reviews and price
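Risk = (CV × Price) / Mean Star Rating. Higher values are riskier: more rating dispersion per star, scaled by the amount of money at stake.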

In [14]:
risk_indicator = (df["CV"] * df["price_usd"]) / df["mean_star"]
df["risk_indicator"] = risk_indicator
df

|   | brand | reviews | width_ft | height_ft | price_usd | p_5star | p_4star | p_3star | p_2star | p_1star | mean_star | std_dev | CV | value_for_money | risk_indicator |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HAMIGAR | 2914 | 6.5 | 2 | 27.0 | 0.71 | 0.14 | 0.06 | 0.03 | 0.06 | 4.41 | 1.123343 | 0.254726 | 0.163333 | 1.559549 |
| 1 | AFMAT | 2077 | 4.0 | 3 | 36.49 | 0.71 | 0.15 | 0.06 | 0.03 | 0.05 | 4.44 | 1.070701 | 0.241149 | 0.121677 | 1.981874 |
| 2 | WARASEE | 714 | 4.0 | 3 | 29.0 | 0.73 | 0.1 | 0.08 | 0.03 | 0.06 | 4.41 | 1.141008 | 0.258732 | 0.152069 | 1.701413 |
| 3 | JUMOBOARD | 444 | 4.0 | 3 | 60.0 | 0.7 | 0.15 | 0.09 | 0.01 | 0.05 | 4.44 | 1.042305 | 0.234753 | 0.074 | 3.172343 |
| 4 | NORTIX | 486 | 4.0 | 3 | 35.99 | 0.71 | 0.13 | 0.06 | 0.03 | 0.07 | 4.38 | 1.17286 | 0.267776 | 0.1217 | 2.200289 |
| 5 | DUMOS | 253 | 4.0 | 3 | 62.0 | 0.78 | 0.12 | 0.05 | 0.0 | 0.05 | 4.58 | 0.971391 | 0.212094 | 0.073871 | 2.871142 |


## Objective 5: Risk indicator based on low-rating reviews

Low-rating risk = (share of 1-, 2- and 3-star reviews) / Mean Star Rating

In [16]:
low_rating_risk = (df['p_1star'] + df['p_2star'] + df['p_3star']) / df['mean_star']
df['low_rating_risk'] = low_rating_risk
df

|   | brand | reviews | width_ft | height_ft | price_usd | p_5star | p_4star | p_3star | p_2star | p_1star | mean_star | std_dev | CV | value_for_money | risk_indicator | low_rating_risk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HAMIGAR | 2914 | 6.5 | 2 | 27.0 | 0.71 | 0.14 | 0.06 | 0.03 | 0.06 | 4.41 | 1.123343 | 0.254726 | 0.163333 | 1.559549 | 0.034014 |
| 1 | AFMAT | 2077 | 4.0 | 3 | 36.49 | 0.71 | 0.15 | 0.06 | 0.03 | 0.05 | 4.44 | 1.070701 | 0.241149 | 0.121677 | 1.981874 | 0.031532 |
| 2 | WARASEE | 714 | 4.0 | 3 | 29.0 | 0.73 | 0.1 | 0.08 | 0.03 | 0.06 | 4.41 | 1.141008 | 0.258732 | 0.152069 | 1.701413 | 0.038549 |
| 3 | JUMOBOARD | 444 | 4.0 | 3 | 60.0 | 0.7 | 0.15 | 0.09 | 0.01 | 0.05 | 4.44 | 1.042305 | 0.234753 | 0.074 | 3.172343 | 0.033784 |
| 4 | NORTIX | 486 | 4.0 | 3 | 35.99 | 0.71 | 0.13 | 0.06 | 0.03 | 0.07 | 4.38 | 1.17286 | 0.267776 | 0.1217 | 2.200289 | 0.03653 |
| 5 | DUMOS | 253 | 4.0 | 3 | 62.0 | 0.78 | 0.12 | 0.05 | 0.0 | 0.05 | 4.58 | 0.971391 | 0.212094 | 0.073871 | 2.871142 | 0.021834 |


**To prevent very large samples from dominating, we use the natural logarithm of the review count as the review weight (a benefit criterion).**
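
To see how much the logarithm compresses the gap, compare the largest and smallest review counts in the dataset. This plain-Python check uses `math.log`, which is the natural logarithm, the same as `np.log`.

```python
import math

# Largest and smallest review counts in the dataset (HAMIGAR and DUMOS)
most, fewest = 2914, 253

raw_ratio = most / fewest                      # gap on the raw scale
log_ratio = math.log(most) / math.log(fewest)  # gap after log scaling

print(f"raw ratio: {raw_ratio:.2f}x")  # about 11.5x
print(f"log ratio: {log_ratio:.2f}x")  # about 1.44x
```

After scaling, the most-reviewed product gets about 1.44 times the review weight of the least-reviewed one, instead of about 11.5 times.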

In [17]:
df["review_weight"] = np.log(df["reviews"])
df


|   | brand | reviews | width_ft | height_ft | price_usd | p_5star | p_4star | p_3star | p_2star | p_1star | mean_star | std_dev | CV | value_for_money | risk_indicator | low_rating_risk | review_weight |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HAMIGAR | 2914 | 6.5 | 2 | 27.0 | 0.71 | 0.14 | 0.06 | 0.03 | 0.06 | 4.41 | 1.123343 | 0.254726 | 0.163333 | 1.559549 | 0.034014 | 7.977282 |
| 1 | AFMAT | 2077 | 4.0 | 3 | 36.49 | 0.71 | 0.15 | 0.06 | 0.03 | 0.05 | 4.44 | 1.070701 | 0.241149 | 0.121677 | 1.981874 | 0.031532 | 7.63868 |
| 2 | WARASEE | 714 | 4.0 | 3 | 29.0 | 0.73 | 0.1 | 0.08 | 0.03 | 0.06 | 4.41 | 1.141008 | 0.258732 | 0.152069 | 1.701413 | 0.038549 | 6.570883 |
| 3 | JUMOBOARD | 444 | 4.0 | 3 | 60.0 | 0.7 | 0.15 | 0.09 | 0.01 | 0.05 | 4.44 | 1.042305 | 0.234753 | 0.074 | 3.172343 | 0.033784 | 6.095825 |
| 4 | NORTIX | 486 | 4.0 | 3 | 35.99 | 0.71 | 0.13 | 0.06 | 0.03 | 0.07 | 4.38 | 1.17286 | 0.267776 | 0.1217 | 2.200289 | 0.03653 | 6.186209 |
| 5 | DUMOS | 253 | 4.0 | 3 | 62.0 | 0.78 | 0.12 | 0.05 | 0.0 | 0.05 | 4.58 | 0.971391 | 0.212094 | 0.073871 | 2.871142 | 0.021834 | 5.533389 |


In [18]:
criteria = ["CV", "value_for_money", "risk_indicator", "low_rating_risk", "review_weight"]
benefit = ["value_for_money", "review_weight"]
cost = ["CV", "risk_indicator", "low_rating_risk"]

X = df.set_index("brand")[criteria].copy()
X


| brand | CV | value_for_money | risk_indicator | low_rating_risk | review_weight |
|---|---|---|---|---|---|
| HAMIGAR | 0.254726 | 0.163333 | 1.559549 | 0.034014 | 7.977282 |
| AFMAT | 0.241149 | 0.121677 | 1.981874 | 0.031532 | 7.63868 |
| WARASEE | 0.258732 | 0.152069 | 1.701413 | 0.038549 | 6.570883 |
| JUMOBOARD | 0.234753 | 0.074 | 3.172343 | 0.033784 | 6.095825 |
| NORTIX | 0.267776 | 0.1217 | 2.200289 | 0.03653 | 6.186209 |
| DUMOS | 0.212094 | 0.073871 | 2.871142 | 0.021834 | 5.533389 |


> **We have now built two benefit indicators (value for money, review weight) and three cost indicators (CV, risk indicator, low-rating risk; the last is based on 1-, 2- and 3-star reviews). The next step is to combine them into a single composite indicator. Both subjective and objective weighting methods are available:**


| Method                  | Weights        | Nature                    |
| ----------------------- | -------------- | ------------------------- |
| **AHP**                 | Subjective     | Expert judgment           |
| **Entropy weighting**   | Objective      | Data dispersion           |
| **CRITIC**              | Objective      | Variability + correlation |
| **TOPSIS**              | Semi-objective | Distance to ideal (ranks alternatives using weights from another method; entropy weights here) |
| **PCA-based weighting** | Objective      | Variance explained        |
| **CV-based scoring**    | Objective      | Statistical dispersion    |

## Objective 6: Develop an AHP-based composite risk indicator

**Step 1 — Build the AHP pairwise comparison matrix (criteria vs criteria)**

In [19]:
A = np.array([
    [1,   1/2, 1,   1,   2],   # CV
    [2,   1,   2,   2,   3],   # value_for_money
    [1,   1/2, 1,   2,   2],   # risk_indicator
    [1,   1/2, 1/2, 1,   2],   # low_rating_risk
    [1/2, 1/3, 1/2, 1/2, 1]    # review_weight
], dtype=float)

A_df = pd.DataFrame(A, index=criteria, columns=criteria)
A_df


|   | CV | value_for_money | risk_indicator | low_rating_risk | review_weight |
|---|---|---|---|---|---|
| CV | 1.0 | 0.5 | 1.0 | 1.0 | 2.0 |
| value_for_money | 2.0 | 1.0 | 2.0 | 2.0 | 3.0 |
| risk_indicator | 1.0 | 0.5 | 1.0 | 2.0 | 2.0 |
| low_rating_risk | 1.0 | 0.5 | 0.5 | 1.0 | 2.0 |
| review_weight | 0.5 | 0.333333 | 0.5 | 0.5 | 1.0 |
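
A valid AHP pairwise matrix is reciprocal: the diagonal is 1, and if criterion i is judged a times as important as j, then j must be 1/a times as important as i. This plain-Python sketch checks that property for the matrix above. The names `is_reciprocal` and `pairwise` are illustrative and chosen so that the NumPy array `A` is not overwritten.

```python
def is_reciprocal(matrix, tol=1e-9):
    """Check a[i][i] == 1 and a[j][i] == 1 / a[i][j] for every pair i < j."""
    n = len(matrix)
    for i in range(n):
        if abs(matrix[i][i] - 1) > tol:
            return False
        for j in range(i + 1, n):
            if abs(matrix[j][i] * matrix[i][j] - 1) > tol:
                return False
    return True


# Same judgements as the matrix A above
pairwise = [
    [1,   1/2, 1,   1,   2],   # CV
    [2,   1,   2,   2,   3],   # value_for_money
    [1,   1/2, 1,   2,   2],   # risk_indicator
    [1,   1/2, 1/2, 1,   2],   # low_rating_risk
    [1/2, 1/3, 1/2, 1/2, 1],   # review_weight
]

print(is_reciprocal(pairwise))  # True
```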


**Step 2 — Compute criteria weights (eigenvector method)**

In [20]:
eigvals, eigvecs = np.linalg.eig(A)
max_idx = np.argmax(eigvals.real)

w = eigvecs[:, max_idx].real
w = w / w.sum()

weights = pd.Series(w, index=criteria, name="weight")
weights


CV                 0.182317
value_for_money    0.345528
risk_indicator     0.214120
low_rating_risk    0.161195
review_weight      0.096840
Name: weight, dtype: float64
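
`np.linalg.eig` returns every eigenpair, but AHP only needs the principal one. For a positive matrix like this one, power iteration converges to that same principal eigenvector: it repeatedly multiplies a vector by the matrix and renormalises it. The dependency-free sketch below is an alternative to the cell above, not a copy of it. The function name `principal_weights` and the fixed iteration count are illustrative choices.

```python
def principal_weights(matrix, iterations=100):
    """Approximate the principal eigenvector of a positive matrix by power iteration.

    Returns (weights normalised to sum to 1, estimate of lambda_max).
    """
    n = len(matrix)
    w = [1.0 / n] * n  # start from equal weights
    for _ in range(iterations):
        product = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        w = [x / total for x in product]  # renormalise so the weights sum to 1
    # At convergence A·w = lambda_max·w, so each ratio (A·w)_i / w_i equals lambda_max
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return w, lambda_max


# Same pairwise judgements as the matrix A above (rows/columns in `criteria` order)
pairwise = [
    [1,   1/2, 1,   1,   2],
    [2,   1,   2,   2,   3],
    [1,   1/2, 1,   2,   2],
    [1,   1/2, 1/2, 1,   2],
    [1/2, 1/3, 1/2, 1/2, 1],
]

w_power, lam = principal_weights(pairwise)
print([round(x, 4) for x in w_power])
print(round(lam, 4))
```

The printed weights and $\lambda_{\max}$ should agree with the `np.linalg.eig` results to about four decimal places.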

**Step 3 — Consistency check (CI, CR)**

In [21]:
n = A.shape[0]
lambda_max = eigvals.real[max_idx]

CI = (lambda_max - n) / (n - 1)
RI = 1.12
CR = CI / RI

lambda_max, CI, CR


(np.float64(5.0685037410245295),
 np.float64(0.017125935256132374),
 np.float64(0.015291013621546761))
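
Worked through by hand, with Saaty's random index $RI = 1.12$ for $n = 5$ criteria (the value used above):

$$
CI = \frac{\lambda_{\max} - n}{n - 1} = \frac{5.0685 - 5}{4} \approx 0.0171,
\qquad
CR = \frac{CI}{RI} = \frac{0.0171}{1.12} \approx 0.0153
$$

Since $CR \approx 0.015$ is well below the conventional threshold of 0.10, the pairwise judgements are acceptably consistent.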

**Step 4 — Build the decision matrix**

In [None]:
X

| brand | CV | value_for_money | risk_indicator | low_rating_risk | review_weight |
|---|---|---|---|---|---|
| HAMIGAR | 0.254726 | 0.163333 | 1.559549 | 0.034014 | 7.977282 |
| AFMAT | 0.241149 | 0.121677 | 1.981874 | 0.031532 | 7.63868 |
| WARASEE | 0.258732 | 0.152069 | 1.701413 | 0.038549 | 6.570883 |
| JUMOBOARD | 0.234753 | 0.074 | 3.172343 | 0.033784 | 6.095825 |
| NORTIX | 0.267776 | 0.1217 | 2.200289 | 0.03653 | 6.186209 |
| DUMOS | 0.212094 | 0.073871 | 2.871142 | 0.021834 | 5.533389 |


**Step 5 — Normalize the decision matrix (AHP performance scores)**

In [23]:
N = X.copy()

# Benefit criteria normalization
for c in benefit:
    N[c] = N[c] / N[c].sum()

# Cost criteria normalization (invert then normalize)
for c in cost:
    inv = 1 / N[c]
    N[c] = inv / inv.sum()

N


| brand | CV | value_for_money | risk_indicator | low_rating_risk | review_weight |
|---|---|---|---|---|---|
| HAMIGAR | 0.159282 | 0.231137 | 0.224865 | 0.154875 | 0.199421 |
| AFMAT | 0.16825 | 0.172189 | 0.176948 | 0.167066 | 0.190956 |
| WARASEE | 0.156816 | 0.215197 | 0.206116 | 0.136654 | 0.164263 |
| JUMOBOARD | 0.172834 | 0.104719 | 0.110546 | 0.155929 | 0.152387 |
| NORTIX | 0.151519 | 0.172221 | 0.159383 | 0.144208 | 0.154646 |
| DUMOS | 0.191299 | 0.104537 | 0.122142 | 0.241268 | 0.138327 |


**Step 6 — Create the weighted normalized table (full AHP table)**

In [24]:
W = N.mul(weights, axis=1)
W


| brand | CV | value_for_money | risk_indicator | low_rating_risk | review_weight |
|---|---|---|---|---|---|
| HAMIGAR | 0.02904 | 0.079864 | 0.048148 | 0.024965 | 0.019312 |
| AFMAT | 0.030675 | 0.059496 | 0.037888 | 0.02693 | 0.018492 |
| WARASEE | 0.02859 | 0.074357 | 0.044134 | 0.022028 | 0.015907 |
| JUMOBOARD | 0.031511 | 0.036183 | 0.02367 | 0.025135 | 0.014757 |
| NORTIX | 0.027625 | 0.059507 | 0.034127 | 0.023245 | 0.014976 |
| DUMOS | 0.034877 | 0.03612 | 0.026153 | 0.038891 | 0.013396 |


**Step 7 — Compute final AHP score and rank alternatives**

In [28]:
scores = W.sum(axis=1).sort_values(ascending=False)
scores


brand
HAMIGAR      0.201329
WARASEE      0.185016
AFMAT        0.173481
NORTIX       0.159481
DUMOS        0.149437
JUMOBOARD    0.131256
dtype: float64

In [29]:
final_ahp = pd.DataFrame({
    "AHP_score": scores,
    "Rank": range(1, len(scores)+1)
})

final_ahp


| brand | AHP_score | Rank |
|---|---|---|
| HAMIGAR | 0.201329 | 1 |
| WARASEE | 0.185016 | 2 |
| AFMAT | 0.173481 | 3 |
| NORTIX | 0.159481 | 4 |
| DUMOS | 0.149437 | 5 |
| JUMOBOARD | 0.131256 | 6 |


In [30]:
report = pd.concat(
    {
        "Raw": X,
        "Normalized": N,
        "Weighted": W
    },
    axis=1
)

report["AHP_score"] = W.sum(axis=1)
report["Rank"] = report["AHP_score"].rank(ascending=False, method="min").astype(int)
report.sort_values("Rank")


| brand | Raw: CV | Raw: value_for_money | Raw: risk_indicator | Raw: low_rating_risk | Raw: review_weight | Normalized: CV | Normalized: value_for_money | Normalized: risk_indicator | Normalized: low_rating_risk | Normalized: review_weight | Weighted: CV | Weighted: value_for_money | Weighted: risk_indicator | Weighted: low_rating_risk | Weighted: review_weight | AHP_score | Rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HAMIGAR | 0.254726 | 0.163333 | 1.559549 | 0.034014 | 7.977282 | 0.159282 | 0.231137 | 0.224865 | 0.154875 | 0.199421 | 0.02904 | 0.079864 | 0.048148 | 0.024965 | 0.019312 | 0.201329 | 1 |
| WARASEE | 0.258732 | 0.152069 | 1.701413 | 0.038549 | 6.570883 | 0.156816 | 0.215197 | 0.206116 | 0.136654 | 0.164263 | 0.02859 | 0.074357 | 0.044134 | 0.022028 | 0.015907 | 0.185016 | 2 |
| AFMAT | 0.241149 | 0.121677 | 1.981874 | 0.031532 | 7.63868 | 0.16825 | 0.172189 | 0.176948 | 0.167066 | 0.190956 | 0.030675 | 0.059496 | 0.037888 | 0.02693 | 0.018492 | 0.173481 | 3 |
| NORTIX | 0.267776 | 0.1217 | 2.200289 | 0.03653 | 6.186209 | 0.151519 | 0.172221 | 0.159383 | 0.144208 | 0.154646 | 0.027625 | 0.059507 | 0.034127 | 0.023245 | 0.014976 | 0.159481 | 4 |
| DUMOS | 0.212094 | 0.073871 | 2.871142 | 0.021834 | 5.533389 | 0.191299 | 0.104537 | 0.122142 | 0.241268 | 0.138327 | 0.034877 | 0.03612 | 0.026153 | 0.038891 | 0.013396 | 0.149437 | 5 |
| JUMOBOARD | 0.234753 | 0.074 | 3.172343 | 0.033784 | 6.095825 | 0.172834 | 0.104719 | 0.110546 | 0.155929 | 0.152387 | 0.031511 | 0.036183 | 0.02367 | 0.025135 | 0.014757 | 0.131256 | 6 |


## Objective 7: Develop an entropy-weighted composite risk indicator

Step 1 — Build the decision matrix

In [31]:
criteria = [
    "CV",
    "value_for_money",
    "risk_indicator",
    "low_rating_risk",
    "review_weight"
]

X = df.set_index("brand")[criteria].copy()
X

| brand | CV | value_for_money | risk_indicator | low_rating_risk | review_weight |
|---|---|---|---|---|---|
| HAMIGAR | 0.254726 | 0.163333 | 1.559549 | 0.034014 | 7.977282 |
| AFMAT | 0.241149 | 0.121677 | 1.981874 | 0.031532 | 7.63868 |
| WARASEE | 0.258732 | 0.152069 | 1.701413 | 0.038549 | 6.570883 |
| JUMOBOARD | 0.234753 | 0.074 | 3.172343 | 0.033784 | 6.095825 |
| NORTIX | 0.267776 | 0.1217 | 2.200289 | 0.03653 | 6.186209 |
| DUMOS | 0.212094 | 0.073871 | 2.871142 | 0.021834 | 5.533389 |


Step 2: Normalize the decision matrix

In [32]:
P = X / X.sum(axis=0)
P


| brand | CV | value_for_money | risk_indicator | low_rating_risk | review_weight |
|---|---|---|---|---|---|
| HAMIGAR | 0.173374 | 0.231137 | 0.115637 | 0.173325 | 0.199421 |
| AFMAT | 0.164133 | 0.172189 | 0.146951 | 0.160677 | 0.190956 |
| WARASEE | 0.1761 | 0.215197 | 0.126156 | 0.196435 | 0.164263 |
| JUMOBOARD | 0.15978 | 0.104719 | 0.235222 | 0.172154 | 0.152387 |
| NORTIX | 0.182256 | 0.172221 | 0.163146 | 0.186147 | 0.154646 |
| DUMOS | 0.144357 | 0.104537 | 0.212888 | 0.111261 | 0.138327 |


Step 3: Compute entropy for each criterion

In [33]:
n = P.shape[0]
k = 1 / np.log(n)

entropy = -k * (P * np.log(P + 1e-12)).sum(axis=0)
entropy


CV                 0.998420
value_for_money    0.975213
risk_indicator     0.981155
low_rating_risk    0.991973
review_weight      0.995331
dtype: float64

Step 4: Degree of diversification (information utility)

In [None]:
diversification = 1 - entropy
diversification

CV                 0.001580
value_for_money    0.024787
risk_indicator     0.018845
low_rating_risk    0.008027
review_weight      0.004669
dtype: float64

Step 5: Compute the entropy weights (the key result of this objective)

In [35]:
entropy_weights = diversification / diversification.sum()

entropy_weights = pd.Series(entropy_weights, index=criteria, name="entropy_weight")
entropy_weights


CV                 0.027284
value_for_money    0.428043
risk_indicator     0.325425
low_rating_risk    0.138613
review_weight      0.080635
Name: entropy_weight, dtype: float64
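
Steps 2 to 5 can be condensed into one dependency-free function. This sketch takes the raw decision-matrix columns (values copied from `X`, rounded to six decimals). It should reproduce the entropy weights above to within rounding. `entropy_weights_from_columns` is a hypothetical helper name. Where the notebook adds 1e-12 inside the logarithm, the sketch instead skips zero proportions, using the convention 0 · ln 0 = 0.

```python
import math


def entropy_weights_from_columns(columns):
    """Entropy weights for {criterion: [positive value per alternative]}."""
    diversification = {}
    for name, values in columns.items():
        n = len(values)
        total = sum(values)
        p = [v / total for v in values]  # column proportions p_ij
        e = -sum(pi * math.log(pi) for pi in p if pi > 0) / math.log(n)
        diversification[name] = 1 - e    # information utility d_j
    d_sum = sum(diversification.values())
    return {name: d / d_sum for name, d in diversification.items()}


# Raw criteria values from X (order: HAMIGAR, AFMAT, WARASEE, JUMOBOARD, NORTIX, DUMOS)
decision_columns = {
    "CV": [0.254726, 0.241149, 0.258732, 0.234753, 0.267776, 0.212094],
    "value_for_money": [0.163333, 0.121677, 0.152069, 0.074, 0.1217, 0.073871],
    "risk_indicator": [1.559549, 1.981874, 1.701413, 3.172343, 2.200289, 2.871142],
    "low_rating_risk": [0.034014, 0.031532, 0.038549, 0.033784, 0.03653, 0.021834],
    "review_weight": [7.977282, 7.63868, 6.570883, 6.095825, 6.186209, 5.533389],
}

ew = entropy_weights_from_columns(decision_columns)
for name, weight in ew.items():
    print(f"{name:16s} {weight:.4f}")
```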

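Step 6: Linear ratio normalization for scoring (benefit criteria: x / max; cost criteria: min / x), so the best value in each column becomes 1.0

In [38]: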
Z = X.copy()

# Benefit criteria
for c in ["value_for_money", "review_weight"]:
    Z[c] = Z[c] / Z[c].max()

# Cost criteria
for c in ["CV", "risk_indicator", "low_rating_risk"]:
    Z[c] = Z[c].min() / Z[c]

Z


| brand | CV | value_for_money | risk_indicator | low_rating_risk | review_weight |
|---|---|---|---|---|---|
| HAMIGAR | 0.832635 | 1.0 | 1.0 | 0.641921 | 1.0 |
| AFMAT | 0.879515 | 0.744962 | 0.786906 | 0.692452 | 0.957554 |
| WARASEE | 0.819744 | 0.931034 | 0.91662 | 0.566401 | 0.823699 |
| JUMOBOARD | 0.903476 | 0.453061 | 0.491608 | 0.646288 | 0.764148 |
| NORTIX | 0.792057 | 0.745105 | 0.708793 | 0.597707 | 0.775478 |
| DUMOS | 1.0 | 0.452271 | 0.543181 | 1.0 | 0.693643 |


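Step 7: Entropy score (weighted sum of the normalized values, using the Step 5 weights re-entered to six decimals)

In [39]: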
entropy_weights = pd.Series(
    {
        "CV": 0.027284,
        "value_for_money": 0.428043,
        "risk_indicator": 0.325425,
        "low_rating_risk": 0.138613,
        "review_weight": 0.080635
    }
)

entropy_score = (Z * entropy_weights).sum(axis=1)
entropy_score.sort_values(ascending=False)


brand
HAMIGAR      0.945799
WARASEE      0.864109
AFMAT        0.772147
NORTIX       0.716587
DUMOS        0.592185
JUMOBOARD    0.529763
dtype: float64

## Objective 8: Score the products with the TOPSIS method

Step 1 — Build the decision matrix

In [40]:
import numpy as np
import pandas as pd

criteria = ["CV","value_for_money","risk_indicator","low_rating_risk","review_weight"]
cost = ["CV","risk_indicator","low_rating_risk"]
benefit = ["value_for_money","review_weight"]

X = df.set_index("brand")[criteria].astype(float)
X


| brand | CV | value_for_money | risk_indicator | low_rating_risk | review_weight |
|---|---|---|---|---|---|
| HAMIGAR | 0.254726 | 0.163333 | 1.559549 | 0.034014 | 7.977282 |
| AFMAT | 0.241149 | 0.121677 | 1.981874 | 0.031532 | 7.63868 |
| WARASEE | 0.258732 | 0.152069 | 1.701413 | 0.038549 | 6.570883 |
| JUMOBOARD | 0.234753 | 0.074 | 3.172343 | 0.033784 | 6.095825 |
| NORTIX | 0.267776 | 0.1217 | 2.200289 | 0.03653 | 6.186209 |
| DUMOS | 0.212094 | 0.073871 | 2.871142 | 0.021834 | 5.533389 |


Step 2: Vector normalization of the decision matrix

In [41]:
denom = np.sqrt((X**2).sum(axis=0))
R = X / denom
R


| brand | CV | value_for_money | risk_indicator | low_rating_risk | review_weight |
|---|---|---|---|---|---|
| HAMIGAR | 0.4235 | 0.543383 | 0.273971 | 0.419007 | 0.484398 |
| AFMAT | 0.400926 | 0.4048 | 0.348162 | 0.388431 | 0.463837 |
| WARASEE | 0.430159 | 0.505908 | 0.298892 | 0.474875 | 0.398998 |
| JUMOBOARD | 0.390293 | 0.246186 | 0.557295 | 0.416176 | 0.370152 |
| NORTIX | 0.445196 | 0.404877 | 0.386532 | 0.450002 | 0.37564 |
| DUMOS | 0.352621 | 0.245756 | 0.504382 | 0.26897 | 0.336 |


Step 3 — Apply the entropy weights from Objective 7 to get the weighted matrix V


In [42]:
w = pd.Series({
    "CV": 0.027284,
    "value_for_money": 0.428043,
    "risk_indicator": 0.325425,
    "low_rating_risk": 0.138613,
    "review_weight": 0.080635
})

V = R.mul(w, axis=1)
V


| brand | CV | value_for_money | risk_indicator | low_rating_risk | review_weight |
|---|---|---|---|---|---|
| HAMIGAR | 0.011555 | 0.232591 | 0.089157 | 0.05808 | 0.039059 |
| AFMAT | 0.010939 | 0.173272 | 0.113301 | 0.053842 | 0.037402 |
| WARASEE | 0.011736 | 0.21655 | 0.097267 | 0.065824 | 0.032173 |
| JUMOBOARD | 0.010649 | 0.105378 | 0.181358 | 0.057687 | 0.029847 |
| NORTIX | 0.012147 | 0.173305 | 0.125787 | 0.062376 | 0.03029 |
| DUMOS | 0.009621 | 0.105194 | 0.164139 | 0.037283 | 0.027093 |


Step 4 — Determine the ideal best $A^+$ and the ideal worst $A^-$

In [43]:
A_plus = pd.Series(index=criteria, dtype=float)
A_minus = pd.Series(index=criteria, dtype=float)

for c in criteria:
    if c in benefit:
        A_plus[c] = V[c].max()
        A_minus[c] = V[c].min()
    else:  # cost
        A_plus[c] = V[c].min()
        A_minus[c] = V[c].max()

A_plus, A_minus


(CV                 0.009621
 value_for_money    0.232591
 risk_indicator     0.089157
 low_rating_risk    0.037283
 review_weight      0.039059
 dtype: float64,
 CV                 0.012147
 value_for_money    0.105194
 risk_indicator     0.181358
 low_rating_risk    0.065824
 review_weight      0.027093
 dtype: float64)

Step 5 — Compute the distances to $A^+$ and $A^-$

In [44]:
D_plus = np.sqrt(((V - A_plus)**2).sum(axis=1))
D_minus = np.sqrt(((V - A_minus)**2).sum(axis=1))
D_plus, D_minus

(brand
 HAMIGAR      0.020887
 AFMAT        0.066185
 WARASEE      0.034490
 JUMOBOARD    0.158702
 NORTIX       0.074630
 DUMOS        0.148309
 dtype: float64,
 brand
 HAMIGAR      0.157907
 AFMAT        0.097558
 WARASEE      0.139633
 JUMOBOARD    0.008721
 NORTIX       0.088030
 DUMOS        0.033429
 dtype: float64)

Step 6 — Closeness coefficient $C_i$ and ranking

In [45]:
C = D_minus / (D_plus + D_minus)

topsis_result = pd.DataFrame({
    "D_plus": D_plus,
    "D_minus": D_minus,
    "Closeness_C": C
}).sort_values("Closeness_C", ascending=False)

topsis_result["Rank"] = range(1, len(topsis_result)+1)
topsis_result


| brand | D_plus | D_minus | Closeness_C | Rank |
|---|---|---|---|---|
| HAMIGAR | 0.020887 | 0.157907 | 0.883179 | 1 |
| WARASEE | 0.03449 | 0.139633 | 0.801921 | 2 |
| AFMAT | 0.066185 | 0.097558 | 0.595801 | 3 |
| NORTIX | 0.07463 | 0.08803 | 0.541191 | 4 |
| DUMOS | 0.148309 | 0.033429 | 0.18394 | 5 |
| JUMOBOARD | 0.158702 | 0.008721 | 0.052092 | 6 |
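
Written out, with $v_{ij}$ the weighted normalized value of product $i$ on criterion $j$, Steps 5 and 6 compute:

$$
D_i^{+} = \sqrt{\sum_j \left(v_{ij} - A_j^{+}\right)^2},\qquad
D_i^{-} = \sqrt{\sum_j \left(v_{ij} - A_j^{-}\right)^2},\qquad
C_i = \frac{D_i^{-}}{D_i^{+} + D_i^{-}}
$$

For HAMIGAR, $C = 0.157907 / (0.020887 + 0.157907) = 0.157907 / 0.178794 \approx 0.8832$. It therefore sits much closer to the ideal best than to the ideal worst.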


## Objective 9: PCA-based composite risk indicator

Step 1 — Prepare the PCA matrix

In [46]:
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import pandas as pd
import numpy as np

criteria = [
    "CV",
    "value_for_money",
    "risk_indicator",
    "low_rating_risk",
    "review_weight"
]

cost = ["CV", "risk_indicator", "low_rating_risk"]
benefit = ["value_for_money", "review_weight"]

X = df.set_index("brand")[criteria].copy()

# Invert cost criteria
for c in cost:
    X[c] = 1 / X[c]

X


| brand | CV | value_for_money | risk_indicator | low_rating_risk | review_weight |
|---|---|---|---|---|---|
| HAMIGAR | 3.925781 | 0.163333 | 0.641211 | 29.4 | 7.977282 |
| AFMAT | 4.146817 | 0.121677 | 0.504573 | 31.714286 | 7.63868 |
| WARASEE | 3.865002 | 0.152069 | 0.587747 | 25.941176 | 6.570883 |
| JUMOBOARD | 4.259789 | 0.074 | 0.315224 | 29.6 | 6.095825 |
| NORTIX | 3.734462 | 0.1217 | 0.454486 | 27.375 | 6.186209 |
| DUMOS | 4.714889 | 0.073871 | 0.348293 | 45.8 | 5.533389 |


Step 2 — Standardize the data

In [47]:
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)


Step 3 — Run PCA

In [48]:
pca = PCA()
PC = pca.fit_transform(X_scaled)

explained_var = pca.explained_variance_ratio_
explained_var


array([7.55373274e-01, 1.73772197e-01, 6.09590481e-02, 9.86575841e-03,
       2.97223781e-05])

Step 4: Decide how many principal components to retain (the first two together explain about 92.9% of the variance)

In [49]:
np.cumsum(explained_var)


array([0.75537327, 0.92914547, 0.99010452, 0.99997028, 1.        ])

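Step 5 — Examine the PCA loadings. Because the cost criteria were inverted in Step 1, the rows labelled CV, risk_indicator and low_rating_risk refer to their reciprocals. Every column is therefore oriented so that higher is better.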

In [50]:
loadings = pd.DataFrame(
    pca.components_.T,
    index=criteria,
    columns=[f"PC{i+1}" for i in range(len(criteria))]
)

loadings


|   | PC1 | PC2 | PC3 | PC4 | PC5 |
|---|---|---|---|---|---|
| CV | -0.45032 | 0.484401 | 0.121239 | 0.722367 | 0.161418 |
| value_for_money | 0.492233 | 0.193675 | -0.412618 | 0.081527 | 0.737087 |
| risk_indicator | 0.471898 | 0.354974 | -0.387884 | 0.267641 | -0.655149 |
| low_rating_risk | -0.410384 | 0.597249 | -0.345935 | -0.595899 | -0.010616 |
| review_weight | 0.404743 | 0.495109 | 0.738189 | -0.211691 | 0.036266 |
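
Principal components are only defined up to sign. Flipping a component's loadings and scores gives an equally valid PCA, and the sign a solver returns is an implementation detail. Here every input column points in the "higher is better" direction. One heuristic for orienting components, an assumption not used in the original analysis, is therefore to flip any component whose loadings sum to a negative number before building the composite score. A minimal plain-Python sketch follows; `component_signs` and `apply_signs` are hypothetical helpers.

```python
def component_signs(loadings):
    """Return +1.0 or -1.0 per component so that its loadings sum to >= 0.

    loadings: rows = criteria, columns = components.
    """
    n_comp = len(loadings[0])
    return [1.0 if sum(row[k] for row in loadings) >= 0 else -1.0
            for k in range(n_comp)]


def apply_signs(rows, signs):
    """Multiply column k of rows (loadings or PC scores) by signs[k]."""
    return [[v * s for v, s in zip(row, signs)] for row in rows]


# PC1 and PC2 loadings from the table above (rows: CV, value_for_money,
# risk_indicator, low_rating_risk, review_weight)
pc12_loadings = [
    [-0.45032, 0.484401],
    [0.492233, 0.193675],
    [0.471898, 0.354974],
    [-0.410384, 0.597249],
    [0.404743, 0.495109],
]

print(component_signs(pc12_loadings))  # [1.0, 1.0]: both sums are positive
```

Both sums are already positive, so the ranking below is unaffected. The check matters mainly if the analysis is rerun on new data or with a different solver. Note also that PC1 loads negatively on the inverted CV and low-rating-risk columns. A high PC1 score therefore rewards value and review volume partly at the expense of consistency.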


In [51]:
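# Keep the first K PCs (K = 2 explains about 93% of the variance)
K = 2

PC_scores = PC[:, :K]
# Weight each retained PC by its explained-variance ratio
# (named pc_weights so the AHP `weights` Series is not overwritten)
pc_weights = explained_var[:K]

pca_score = PC_scores @ pc_weights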

pca_result = pd.DataFrame(
    {
        "PCA_score": pca_score
    },
    index=X.index
).sort_values("PCA_score", ascending=False)

pca_result


| brand | PCA_score |
|---|---|
| HAMIGAR | 1.931608 |
| WARASEE | 1.130419 |
| AFMAT | 0.555588 |
| NORTIX | 0.184873 |
| JUMOBOARD | -1.398066 |
| DUMOS | -2.404422 |


In [52]:
pca_result["PCA_score_norm"] = (
    pca_result["PCA_score"] - pca_result["PCA_score"].min()
) / (
    pca_result["PCA_score"].max() - pca_result["PCA_score"].min()
)

pca_result


| brand | PCA_score | PCA_score_norm |
|---|---|---|
| HAMIGAR | 1.931608 | 1.0 |
| WARASEE | 1.130419 | 0.815225 |
| AFMAT | 0.555588 | 0.682655 |
| NORTIX | 0.184873 | 0.597158 |
| JUMOBOARD | -1.398066 | 0.232092 |
| DUMOS | -2.404422 | 0.0 |


## Objective 10: CV-based composite risk indicator
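
The score computed below is the benefit term divided by the risk term. Write $m$ for the mean star rating, $P$ for price, $N$ for review count and $L = p_1 + p_2 + p_3$ for the low-rating share. Then value_for_money $= m/P$, review_weight $= \ln N$, risk_indicator $= CV \cdot P / m$ and low_rating_risk $= L/m$. Substituting these definitions shows what the score rewards and penalises:

$$
S = \frac{(m/P)\,\ln N}{CV \cdot (CV\,P/m) \cdot (L/m)}
  = \frac{m^{3}\,\ln N}{CV^{2}\,P^{2}\,L}
$$

Price and CV therefore both enter squared; CV appears once directly and once inside risk_indicator. For HAMIGAR, $4.41^3 \times 7.977282 / (0.254726^2 \times 27^2 \times 0.15) \approx 96.43$, which matches its CV_score in the table at the end of this section.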

In [53]:
# CV-based scoring

df_cv = df.set_index("brand").copy()

# Benefit term
df_cv["benefit"] = (
    df_cv["value_for_money"] *
    df_cv["review_weight"]
)

# Risk term
df_cv["risk"] = (
    df_cv["CV"] *
    df_cv["risk_indicator"] *
    df_cv["low_rating_risk"]
)

# CV-based score
df_cv["CV_score"] = df_cv["benefit"] / df_cv["risk"]

# Normalize scores (optional but recommended)
df_cv["CV_score_norm"] = (
    df_cv["CV_score"] - df_cv["CV_score"].min()
) / (
    df_cv["CV_score"].max() - df_cv["CV_score"].min()
)

# Ranking
cv_ranking = df_cv[["CV_score", "CV_score_norm"]].sort_values(
    "CV_score", ascending=False
)

cv_ranking


| brand | CV_score | CV_score_norm |
|---|---|---|
| HAMIGAR | 96.428223 | 1.0 |
| AFMAT | 61.676713 | 0.557299 |
| WARASEE | 58.88357 | 0.521717 |
| NORTIX | 34.979957 | 0.217209 |
| DUMOS | 30.743076 | 0.163235 |
| JUMOBOARD | 17.929319 | 0.0 |
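
The five methods can be summarised by each product's average rank. The ranks below are copied from the result tables in this notebook. Averaging them is a simple Borda-style aggregation added here for illustration; it is not a method used in the original analysis.

```python
# Ranks copied from the five result tables in this notebook (1 = best)
ranks = {
    "HAMIGAR":   {"AHP": 1, "Entropy": 1, "TOPSIS": 1, "PCA": 1, "CV": 1},
    "WARASEE":   {"AHP": 2, "Entropy": 2, "TOPSIS": 2, "PCA": 2, "CV": 3},
    "AFMAT":     {"AHP": 3, "Entropy": 3, "TOPSIS": 3, "PCA": 3, "CV": 2},
    "NORTIX":    {"AHP": 4, "Entropy": 4, "TOPSIS": 4, "PCA": 4, "CV": 4},
    "DUMOS":     {"AHP": 5, "Entropy": 5, "TOPSIS": 5, "PCA": 6, "CV": 5},
    "JUMOBOARD": {"AHP": 6, "Entropy": 6, "TOPSIS": 6, "PCA": 5, "CV": 6},
}

# Average rank per product across the five methods (lower is better)
mean_rank = {brand: sum(r.values()) / len(r) for brand, r in ranks.items()}
consensus = sorted(mean_rank, key=mean_rank.get)

for brand in consensus:
    print(f"{brand:10s} {mean_rank[brand]:.1f}")
```

HAMIGAR ranks first under every method. Otherwise the order only changes in two places: WARASEE and AFMAT swap under CV-based scoring, and DUMOS and JUMOBOARD swap under PCA.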
