In [1]:
import pandas as pd
import numpy as np

In [2]:
import pandas as pd

# Correct way to read CSV file
df = pd.read_csv("adidas_data_15-09-2023.csv", encoding="ISO-8859-1")

df


Unnamed: 0.1,Unnamed: 0,Title,Price,ColoursAvailable,ReviewTopic,Reviews,UserID,Date,VerifiedPurchaser,IncentivizedReview
0,0,Samba OG Shoes,100,6,Nice quality shoes,They match practically with any outfit that I...,abdubs35,"September 13, 2023",False,False
1,1,Samba OG Shoes,100,6,Nice shoes,"Very nice shoes , just not as green as the pic...",THEMAN,"September 13, 2023",False,True
2,2,Samba OG Shoes,100,6,Buy the shoes!,"Bang on trend, comfy and cool. Would recommend...",Huggsy,"September 12, 2023",True,True
3,3,Samba OG Shoes,100,6,ALL GOODS SOLID! WORTH THE PRICE!,THAT WAS DOPE! medyo mahaba lng ng very little...,TOYOTABOY,"September 12, 2023",False,False
4,4,Samba OG Shoes,100,6,Never out of style,Love how light on your feet they are and comfo...,SangeBo,"September 12, 2023",False,False
...,...,...,...,...,...,...,...,...,...,...
1101,127,Adifom Stan Smith Mule Shoes,70,4,I love these they are so comfortable and stylish.,They are good to wear if you are a nurse and w...,,,False,False
1102,128,Adifom Stan Smith Mule Shoes,70,4,Very Very Comfortable love the color,Goes well with what I had on had me feeling my...,,,False,False
1103,129,Adifom Stan Smith Mule Shoes,70,4,You feel like your walking on air.,"Just love my new adidas gear, never fails me t...",,,False,False
1104,130,Adifom Stan Smith Mule Shoes,70,4,I absolutely love them,"Perfect color! Nice, comfortable shoe. I wasnâ...",,,False,False


In [3]:
df["Title"].value_counts()

Title
NMD_R1 Shoes                    370
Samba OG Shoes                  243
Samba Classic                   163
Nizza Platform Shoes            153
Adifom Stan Smith Mule Shoes     90
Campus 00s Shoes                 64
NMD_S1 Shoes                     19
Samba ADV Shoes                   3
Gazelle Indoor Shoes              1
Name: count, dtype: int64

In [4]:
df["Title"].nunique()

9

In [5]:
from textblob import TextBlob

# Function to classify sentiment
def get_sentiment(text):
    if not isinstance(text, str) or text.strip() == "":
        return "Neutral"
    polarity = TextBlob(text).sentiment.polarity
    if polarity > 0.1:
        return "Positive"
    elif polarity < 0.1:
        return "Negative"
    else:
        return "Neutral"

# Apply to the verified_reviews column
df['Sentiment'] = df['Reviews'].apply(get_sentiment)

# Count sentiment labels
print(df['Sentiment'].value_counts())



Sentiment
Positive    889
Negative    211
Neutral       6
Name: count, dtype: int64


In [6]:
import re
from sklearn.feature_extraction import text

stop_words = text.ENGLISH_STOP_WORDS

# Text cleaning function
def clean_text(text_input):
    text_input = str(text_input).lower()
    text_input = re.sub(r"[^a-z\s]", "", text_input)
    words = text_input.split()
    words = [word for word in words if word not in stop_words]
    return ' '.join(words)

# Apply to Review column
df['Cleaned_Review'] = df['Reviews'].apply(clean_text)

# Preview cleaned text
print(df[['Reviews', 'Cleaned_Review']].head())


                                             Reviews  \
0  They match practically with  any outfit that I...   
1  Very nice shoes , just not as green as the pic...   
2  Bang on trend, comfy and cool. Would recommend...   
3  THAT WAS DOPE! medyo mahaba lng ng very little...   
4  Love how light on your feet they are and comfo...   

                                      Cleaned_Review  
0                      match practically outfit wear  
1                      nice shoes just green picture  
2                    bang trend comfy cool recommend  
3  dope medyo mahaba lng ng little lang naman kas...  
4  love light feet comfortable ive samba wearer y...  


In [7]:
from sklearn.feature_extraction.text import TfidfVectorizer

# Filter positive reviews
positive_reviews = df[df['Sentiment'] == 'Positive']['Cleaned_Review']

# Apply TF-IDF
vectorizer_pos = TfidfVectorizer(max_features=20)
tfidf_matrix_pos = vectorizer_pos.fit_transform(positive_reviews)

# Get top terms
tfidf_df_pos = pd.DataFrame(tfidf_matrix_pos.toarray(), columns=vectorizer_pos.get_feature_names_out())
average_tfidf_pos = tfidf_df_pos.mean().sort_values(ascending=False).reset_index()
average_tfidf_pos.columns = ['Term', 'TFIDF_Score']

print("Top Positive Keywords:")
print(average_tfidf_pos)


Top Positive Keywords:
           Term  TFIDF_Score
0   comfortable     0.166916
1          love     0.136554
2         shoes     0.124242
3         great     0.122281
4          shoe     0.088949
5          good     0.078476
6          size     0.066150
7          wear     0.064002
8         color     0.063253
9          like     0.063248
10          fit     0.061671
11      perfect     0.061627
12        comfy     0.055212
13      stylish     0.051238
14        super     0.050036
15         nice     0.049999
16         look     0.049605
17      quality     0.043959
18       really     0.039762
19         pair     0.038493


### 🔍 Top Positive Keywords Analysis

Using TF-IDF on positive reviews, we identified the top 20 keywords mentioned most frequently and strongly. These terms give insight into what customers are praising about the product.

#### 👟 Comfort & Fit
- **comfortable**, **comfy**, **fit**, **wear**, **size**, **perfect**  
  → Customers appreciate that the shoes are comfortable to wear, fit well, and are true to size.

#### 💬 Overall Satisfaction
- **love**, **great**, **good**, **nice**, **really**, **super**, **like**  
  → General expressions of high satisfaction, reflecting a positive overall experience.

#### 🎨 Style & Appearance
- **stylish**, **look**, **color**  
  → Highlights appreciation for the product’s design, style, and visual appeal.

#### 🧵 Product Quality
- **quality**  
  → Indicates that customers find the shoes to be well-made and durable.

#### 👞 Product Mentions
- **shoe**, **shoes**, **pair**  
  → Reinforces that reviewers are speaking directly about the product itself, often with praise.

---

**🧠 Summary:**  
Positive reviews frequently highlight aspects like **comfort**, **fit**, **style**, and **overall quality**, showing that these are the key drivers of customer satisfaction.


In [8]:
# Filter negative reviews
negative_reviews = df[df['Sentiment'] == 'Negative']['Cleaned_Review']

# Apply TF-IDF
vectorizer_neg = TfidfVectorizer(max_features=20)
tfidf_matrix_neg = vectorizer_neg.fit_transform(negative_reviews)

# Get top terms
tfidf_df_neg = pd.DataFrame(tfidf_matrix_neg.toarray(), columns=vectorizer_neg.get_feature_names_out())
average_tfidf_neg = tfidf_df_neg.mean().sort_values(ascending=False).reset_index()
average_tfidf_neg.columns = ['Term', 'TFIDF_Score']

print("Top Negative Keywords:")
print(average_tfidf_neg)


Top Negative Keywords:
           Term  TFIDF_Score
0          size     0.131642
1          shoe     0.116578
2         shoes     0.103346
3          wear     0.094960
4   comfortable     0.074568
5          like     0.074476
6     recommend     0.065583
7          feet     0.061488
8           big     0.057032
9          pair     0.051667
10      wearing     0.051636
11        color     0.047243
12       adidas     0.046923
13       little     0.045697
14         love     0.045057
15       narrow     0.041912
16         hard     0.041639
17       return     0.038361
18          fit     0.031691
19         foot     0.027372


### 🧨 Top Negative Keywords Analysis

Using TF-IDF on negative reviews, we identified the top 20 keywords most frequently and strongly associated with customer complaints. Here's what the feedback suggests:

#### 👟 Sizing & Fit Issues
- **size**, **big**, **narrow**, **fit**, **foot**, **feet**, **little**  
  → Indicates shoes are often **too large, too narrow, or inconsistent in sizing**, leading to poor fit.

#### 😣 Comfort Problems
- **comfortable**, **wear**, **wearing**, **hard**  
  → Highlights issues with **comfort during use**, suggesting shoes may be painful or stiff.

#### ❌ Dissatisfaction & Return Intent
- **return**, **recommend**, **like**, **love**  
  → Used in a negative context (e.g., "would not recommend", "don’t love"), suggesting **frustration or regret** about the purchase.

#### 🎨 Appearance or Brand Mentions
- **color**, **adidas**, **pair**, **shoe**, **shoes**  
  → Mentions of the product itself likely refer to **design flaws, brand mismatch, or visual dissatisfaction**.

---

**🧠 Summary:**  
Negative reviews commonly cite **fit problems, discomfort, and dissatisfaction**, often leading to returns or negative brand impressions.


In [9]:
# Count sentiment by model
product_sentiment = df.groupby(['Title', 'Sentiment']).size().unstack(fill_value=0)
product_sentiment['Total'] = product_sentiment.sum(axis=1)
product_sentiment['% Positive'] = round((product_sentiment['Positive'] / product_sentiment['Total']) * 100, 2)
product_sentiment['% Negative'] = round((product_sentiment['Negative'] / product_sentiment['Total']) * 100, 2)

print("Product-level Sentiment Summary:")
print(product_sentiment.sort_values('% Positive', ascending=False))


Product-level Sentiment Summary:
Sentiment                     Negative  Neutral  Positive  Total  % Positive  \
Title                                                                          
Samba OG Shoes                      35        1       207    243       85.19   
Campus 00s Shoes                    10        0        54     64       84.38   
Nizza Platform Shoes                25        1       127    153       83.01   
Samba Classic                       33        0       130    163       79.75   
NMD_R1 Shoes                        80        2       288    370       77.84   
Adifom Stan Smith Mule Shoes        21        1        68     90       75.56   
NMD_S1 Shoes                         5        0        14     19       73.68   
Samba ADV Shoes                      2        0         1      3       33.33   
Gazelle Indoor Shoes                 0        1         0      1        0.00   

Sentiment                     % Negative  
Title                                     


### 📊 Product-Level Sentiment Analysis

Based on the sentiment distribution per product, we can identify which products are well-received and which ones are commonly criticized.

---

### ✅ Products with Strong Positive Sentiment

These models have a high percentage of positive reviews and low negative sentiment, indicating overall customer satisfaction.

| **Product**                 | **% Positive** | **Highlights from Positive Keywords**             |
|----------------------------|----------------|---------------------------------------------------|
| Samba OG Shoes             | 85.19%         | Comfortable, stylish, perfect fit                |
| Campus 00s Shoes           | 84.38%         | Good quality, comfy, nice design                 |
| Nizza Platform Shoes       | 83.01%         | Great look, stylish, comfy                       |
| Samba Classic              | 79.75%         | Comfortable fit, classic look                   |
| NMD_R1 Shoes               | 77.84%         | Lightweight, stylish, comfortable               |

➡️ These products are frequently praised for their **comfort**, **fit**, and **style**.

---

### 🚨 Products with Notable Negative Feedback

These products have a higher percentage of negative reviews, indicating issues that may need attention.

| **Product**                    | **% Negative** | **Common Complaints (Negative Keywords)**         |
|-------------------------------|----------------|---------------------------------------------------|
| Samba ADV Shoes               | 66.67%         | Poor fit, too narrow                             |
| NMD_S1 Shoes                  | 26.32%         | Hard to wear, sizing issues                      |
| Adifom Stan Smith Mule Shoes | 23.33%         | Uncomfortable, return issues                     |
| NMD_R1 Shoes                  | 21.62%         | Fit problems, stiff design                       |
| Samba Classic                | 20.25%         | Tight fit, discomfort                            |

➡️ Negative reviews commonly mention **fit issues**, **lack of comfort**, and **return dissatisfaction**.

---

### 🧠 Summary:

- **Highly rated products** are favored for their **comfort, good fit, and stylish design**.
- **Low-rated products** often face criticism due to **sizing problems**, **discomfort**, or **product mismatch**.
- These insights can guide **product improvements**, **marketing messages**, or even **personalized recommendations** (e.g., "runs narrow — consider sizing up").


In [10]:
# ✅ Total positive and negative sentiments overall
total_positive = product_sentiment['Positive'].sum()
total_negative = product_sentiment['Negative'].sum()

# ✅ Get top 5 most positively-reviewed models
top_positive_models = product_sentiment.sort_values(by='Positive', ascending=False).head(2)
top_positive_contribution = (top_positive_models['Positive'].sum() / total_positive) * 100

# ✅ Get top 5 most negatively-reviewed models
top_negative_models = product_sentiment.sort_values(by='Negative', ascending=False).head(2)
top_negative_contribution = (top_negative_models['Negative'].sum() / total_negative) * 100

# ✅ Print results
print("🔹 Top 2 Variants Contribution to Total Positive Sentiment: {:.2f}%".format(top_positive_contribution))
print("🔸 Top 2 Variants Contribution to Total Negative Sentiment: {:.2f}%".format(top_negative_contribution))


🔹 Top 2 Variants Contribution to Total Positive Sentiment: 55.68%
🔸 Top 2 Variants Contribution to Total Negative Sentiment: 54.50%


In [11]:
def get_pros_cons_by_product(group_column, top_n=10):
    from sklearn.feature_extraction.text import TfidfVectorizer

    unique_products = df[group_column].dropna().unique()

    for product in unique_products:
        print(f"\n Product: {product}")
        sub_df = df[df[group_column] == product]

        # POSITIVE
        pos_reviews = sub_df[sub_df['Sentiment'] == 'Positive']['Cleaned_Review']
        if not pos_reviews.empty:
            tfidf_pos = TfidfVectorizer(max_features=top_n)
            pos_matrix = tfidf_pos.fit_transform(pos_reviews)
            pos_scores = pos_matrix.mean(axis=0).A1
            pos_terms = tfidf_pos.get_feature_names_out()
            top_pos = sorted(zip(pos_terms, pos_scores), key=lambda x: -x[1])
            print("👍 Pros:", [term for term, score in top_pos])
        else:
            print("👍 Pros: No positive reviews")

        # NEGATIVE
        neg_reviews = sub_df[sub_df['Sentiment'] == 'Negative']['Cleaned_Review']
        if not neg_reviews.empty:
            tfidf_neg = TfidfVectorizer(max_features=top_n)
            neg_matrix = tfidf_neg.fit_transform(neg_reviews)
            neg_scores = neg_matrix.mean(axis=0).A1
            neg_terms = tfidf_neg.get_feature_names_out()
            top_neg = sorted(zip(neg_terms, neg_scores), key=lambda x: -x[1])
            print("👎 Cons:", [term for term, score in top_neg])
        else:
            print("👎 Cons: No negative reviews")

# Run the function
get_pros_cons_by_product('Title', top_n=10)



 Product: Samba OG Shoes
👍 Pros: ['comfortable', 'great', 'love', 'shoes', 'size', 'shoe', 'comfy', 'wear', 'fit', 'good']
👎 Cons: ['size', 'shoe', 'little', 'comfortable', 'love', 'shoes', 'style', 'look', 'goes', 'amp']

 Product: Campus 00s Shoes
👍 Pros: ['comfortable', 'great', 'shoes', 'good', 'love', 'price', 'shoe', 'recommend', 'quality', 'really']
👎 Cons: ['big', 'shoes', 'sneakers', 'size', 'shoe', 'look', 'point', 'stopped', 'wide', 'son']

 Product: Gazelle Indoor Shoes
👍 Pros: No positive reviews
👎 Cons: No negative reviews

 Product: Samba ADV Shoes
👍 Pros: ['like', 'gives', 'irritate', 'isnt', 'leather', 'long', 'look', 'love', 'overall', 'really']
👎 Cons: ['bought', 'staple', 'lot', 'offers', 'shoe', 'style', 'terms', 'thinking', 'variety', 'wearing']

 Product: NMD_R1 Shoes
👍 Pros: ['comfortable', 'love', 'shoes', 'great', 'color', 'like', 'shoe', 'super', 'fit', 'good']
👎 Cons: ['shoes', 'shoe', 'hard', 'size', 'feet', 'wear', 'adidas', 'big', 'like', 'hours']

 Prod

### 📝 Product-wise Pros, Cons, and Insights (Narrative Summary)

---

#### **Samba OG Shoes**
- **Pros:** Customers frequently mention the shoes are *comfortable*, *great*, and *fit well*. Terms like *comfy* and *love* suggest high satisfaction with wearability and style.
- **Cons:** Some complaints center around *size issues*, and a few users mention the shoes feel a *little uncomfortable* or didn't *look as expected*.
- **Insight:** While customers largely love the design and comfort, there may be occasional fit inconsistencies or expectations mismatch.

---

#### 📱 **Campus 00s Shoes**
- **Pros:** Highly appreciated for being *comfortable*, *great value for money*, and *good quality*. Words like *recommend* and *really* reinforce satisfaction.
- **Cons:** Complaints revolve around *size being too big* and issues like *wide fit* or *design not meeting expectations*.
- **Insight:** The shoe is valued for quality and comfort, but size-related concerns suggest a potential need for fit guidance.

---

#### 📱 **Gazelle Indoor Shoes**
- **Pros:** No positive reviews available.
- **Cons:** No negative reviews available.
- **Insight:** The product may have limited data, so further reviews are needed to draw conclusions.

---

#### 📱 **Samba ADV Shoes**
- **Pros:** Words like *look*, *love*, and *overall* suggest customers liked the *aesthetic* and *leather material*.
- **Cons:** Some mentioned issues with *wearing comfort*, *style expectations*, and general *lack of versatility*.
- **Insight:** Customers like the appearance but may be disappointed by comfort or styling in real use.

---

#### 📱 **NMD_R1 Shoes**
- **Pros:** Strongly praised for *comfort*, *color*, and *style*. Words like *super* and *fit* suggest overall satisfaction.
- **Cons:** Complaints relate to *hardness*, *sizing*, and *discomfort during extended wear*.
- **Insight:** While stylish and comfortable for many, a few users experience issues during prolonged use — especially with sizing and sole stiffness.

---

#### 📱 **Samba Classic**
- **Pros:** Appreciated for being *good*, *classic*, and *comfortable*. Keywords like *nice* and *quality* suggest solid construction.
- **Cons:** Criticism includes *fit*, *wear comfort*, and mixed feedback on *recommendations*.
- **Insight:** A well-regarded classic, but some users find it less comfortable or hard to wear over time.

---

#### 📱 **NMD_S1 Shoes**
- **Pros:** Customers highlight *comfort*, *color options*, and *style*. Terms like *feel* and *recommend* show positive sentiment.
- **Cons:** Negatives include *pain*, *foot discomfort*, and *lack of support*.
- **Insight:** While visually appealing and trendy, the shoe may not suit users seeking high support or prolonged wear.

---

#### 📱 **Nizza Platform Shoes**
- **Pros:** Loved for being *cute*, *stylish*, and *perfect for outfits*. Words like *super* and *look* indicate satisfaction with aesthetics.
- **Cons:** Common issues include *size*, *fit*, and some users mentioning they had to *return* the product.
- **Insight:** Highly fashionable and well-loved for design, but may run small — making size guidance important.

---

#### 📱 **Adifom Stan Smith Mule Shoes**
- **Pros:** Noted for *comfort*, *ease of wear*, and being *perfect for casual use*. Keywords like *super* and *fit* indicate positive experiences.
- **Cons:** Issues include *walking discomfort*, *size*, and terms like *need* and *feet* hinting at practical limitations.
- **Insight:** Great for casual wear, but may not suit long walks or users with specific foot support needs.
