### Prompts for review analysis
- Overall Sentiment: Identify the most common positive and negative themes.
- Product-Specific Sentiment: Determine the sentiment distribution for each PRODUCT_ROLL_UP_NAME category. Highlight the categories with the most negative or positive sentiment.
- Pricing Group Sentiment: Compare customer sentiment for products grouped by pricing group, and identify the pricing groups that receive the best and worst feedback.
- Value Perception: Analyze reviews to understand whether customers feel the products are worth their price across different categories.
- Compare sentiment for categories such as Core, Marketing and Seasonal. Highlight differences in customer satisfaction between Core and Seasonal.
- Identify feedback related to marketing strategies from 'Marketing' reviews. Suggest improvements.
- Identify mentions of specific hair issues in reviews and how customers perceive the effectiveness of Olaplex products in addressing them.
- Analyze reviews for each PRODUCT_ROLL_UP_NAME category and highlight the features customers frequently mention.
- Identify the top-performing and least-performing categories in terms of customer sentiment. Suggest areas of improvement.
- Analyze reviews with low ratings (e.g., 1 or 2 stars) to identify common complaints (done).
- Based on negative reviews, identify actionable areas for product improvement and suggest enhancements.
- Identify products or categories frequently praised in reviews and suggest strategies for upselling.
- Extract mentions of competitors in reviews and analyze how customers compare Olaplex with similar brands.
- Analyze reviews for mentions of brand loyalty and the reasons customers stick with or switch from Olaplex.
- Group reviews with similar themes and summarize the common topics discussed by customers.
- Identify patterns in how customers use Olaplex products (e.g., frequency, combinations with other products) based on reviews.
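Several of the prompts above (for example the per-category sentiment distribution) could be answered directly in pandas before any LLM call. The sketch below assumes, as the filtering cells in this notebook do, that `Theme` and `Sentiment` hold comma-separated labels aligned by position. The helper name `explode_theme_sentiment` and the toy rows are hypothetical and only illustrate the idea.

```python
import pandas as pd

def explode_theme_sentiment(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (review, theme, sentiment) pair."""
    rows = []
    for idx, row in df.iterrows():
        # Skip reviews that were never labelled
        if pd.isna(row["Theme"]) or pd.isna(row["Sentiment"]):
            continue
        themes = [t.strip().lower() for t in str(row["Theme"]).split(",")]
        sentiments = [s.strip().lower() for s in str(row["Sentiment"]).split(",")]
        # zip stops at the shorter list, so labels without a partner are dropped
        for theme, sentiment in zip(themes, sentiments):
            rows.append({
                "review_index": idx,
                "PRODUCT_ROLL_UP_NAME": row["PRODUCT_ROLL_UP_NAME"],
                "theme": theme,
                "sentiment": sentiment,
            })
    return pd.DataFrame(
        rows, columns=["review_index", "PRODUCT_ROLL_UP_NAME", "theme", "sentiment"]
    )

# Toy data (hypothetical), shaped like the columns of the review file
toy = pd.DataFrame({
    "PRODUCT_ROLL_UP_NAME": ["Shampoo", "Shampoo", "Oil", "Oil"],
    "Theme": ["Product Effectiveness, Price", "Product Effectiveness", "Price", None],
    "Sentiment": ["Positive, Negative", "Negative", "Positive", None],
})

long_df = explode_theme_sentiment(toy)

# Share of each sentiment label within every product category
distribution = pd.crosstab(
    long_df["PRODUCT_ROLL_UP_NAME"], long_df["sentiment"], normalize="index"
)
print(distribution.round(2))
```

On the real data, the same `crosstab` call could be grouped by `PRODUCT_PRICING_GROUP_NAME` or `PRODUCT_GROUP_NAME` instead, which would cover the pricing-group and Core/Marketing/Seasonal comparisons.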


In [1]:
import warnings
warnings.filterwarnings("ignore")
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
df=pd.read_csv('Combined_customer_reviews.csv')
df.head()

Unnamed: 0,PRODUCT_ID,REVIEW_LASTMODIFICATIONTIME,REVIEW_RATING,REVIEW_REVIEWTEXT,PRODUCT_PRICING_GROUP_NAME,PRODUCT_GROUP_NAME,PRODUCT_ROLL_UP_NAME,Is_English,Theme,Sentiment
0,172639621,00:49.3,5,"Sorry, I can't assist with that.",,,,1,,
1,172639621,15:28.2,5,"Sorry, I can't assist with that.",,,,1,,
2,172639621,30:38.7,5,Excellent shampoo leaves hair super soft and y...,,,,1,,
3,172639621,00:49.3,5,"Sorry, I can't assist with that.",,,,1,,
4,172639621,04:38.4,5,"Lovely, highly concentrated shampoo. It has a ...",,,,1,,


In [3]:
df.columns

Index(['PRODUCT_ID', 'REVIEW_LASTMODIFICATIONTIME', 'REVIEW_RATING',
       'REVIEW_REVIEWTEXT', 'PRODUCT_PRICING_GROUP_NAME', 'PRODUCT_GROUP_NAME',
       'PRODUCT_ROLL_UP_NAME', 'Is_English', 'Theme', 'Sentiment'],
      dtype='object')

In [14]:
import pandas as pd

def extract_product_effectiveness_positive(row):
    # Handle missing values safely
    theme_str = str(row['Theme']).lower() if pd.notna(row['Theme']) else ''
    sentiment_str = str(row['Sentiment']).lower() if pd.notna(row['Sentiment']) else ''

    themes = [t.strip() for t in theme_str.split(',')]
    sentiments = [s.strip() for s in sentiment_str.split(',')]

    if 'product effectiveness' in themes:
        indices = [i for i, t in enumerate(themes) if t == 'product effectiveness']
        
        # Ensure the index exists in sentiments before checking
        if any(i < len(sentiments) and sentiments[i] == 'positive' for i in indices):
            return True
    return False

# 🔹 **Filter DataFrame for positive sentiment about product effectiveness (includes all ratings 1-5)**
filtered_PP_df = df[df.apply(extract_product_effectiveness_positive, axis=1)]

# 🔹 **Keep only relevant columns (copy so the assignments below do not touch a view of df)**
filtered_PP_df = filtered_PP_df[['REVIEW_REVIEWTEXT','REVIEW_RATING','Theme','Sentiment']].copy()
filtered_PP_df['Theme'] = 'product effectiveness'  # Keep only relevant theme
filtered_PP_df['Sentiment'] = 'positive'  # Keep only relevant sentiment

# Display the first 10 filtered reviews
filtered_PP_df.head(10)

Unnamed: 0,REVIEW_REVIEWTEXT,REVIEW_RATING,Theme,Sentiment
832,I love the results make hair feel so good and ...,4,product effectiveness,positive
833,One of a long line of hair saving product. You...,5,product effectiveness,positive
834,I'm not sure what it is about this product but...,5,product effectiveness,positive
838,Excellent for repairing hair or keeping it hea...,5,product effectiveness,positive
840,"So far, so good! I have been using these produ...",5,product effectiveness,positive
842,Love this product use it with No 0\nIt's brill...,5,product effectiveness,positive
845,Perfect hair treatment! Leaves your hair smoot...,5,product effectiveness,positive
851,I really loved this product! My first time usi...,5,product effectiveness,positive
853,This product completely changed my hair from s...,5,product effectiveness,positive
856,I'm currently transitioning to natural and aft...,5,product effectiveness,positive


In [15]:
len(filtered_PP_df)

19281
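The filter function above is repeated later in the notebook with only the sentiment label changed. A single parameterized predicate could replace those copies. This is a minimal sketch under the same assumption about the data (comma-separated, position-aligned `Theme` and `Sentiment` strings). The name `has_theme_with_sentiment` and the toy rows are hypothetical.

```python
import pandas as pd

def has_theme_with_sentiment(row, theme, sentiment):
    """True if `theme` appears in the row paired with `sentiment` at the same position."""
    if pd.isna(row["Theme"]) or pd.isna(row["Sentiment"]):
        return False
    themes = [t.strip().lower() for t in str(row["Theme"]).split(",")]
    sentiments = [s.strip().lower() for s in str(row["Sentiment"]).split(",")]
    # zip pairs labels by position and ignores any unmatched tail
    return any(t == theme and s == sentiment for t, s in zip(themes, sentiments))

# Toy rows (hypothetical) covering the aligned, reordered, and missing cases
toy = pd.DataFrame({
    "Theme": ["Product Effectiveness, Price", "Price, Product Effectiveness",
              None, "Product Effectiveness"],
    "Sentiment": ["Positive, Negative", "Positive, Negative", None, "Positive, Negative"],
})

mask_pos = toy.apply(has_theme_with_sentiment, axis=1,
                     args=("product effectiveness", "positive"))
mask_neg = toy.apply(has_theme_with_sentiment, axis=1,
                     args=("product effectiveness", "negative"))
print(mask_pos.tolist(), mask_neg.tolist())
```

With this helper, the positive and negative subsets would differ only in the `args` tuple, which removes the risk of a function name and its body disagreeing.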

In [16]:
import os
import requests

# Read the Gemini API key from the environment instead of hardcoding it in the notebook
GEMINI_API_KEY = os.environ["GEMINI_API_KEY"]

# API Endpoint
url = f"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp:generateContent?key={GEMINI_API_KEY}"

# Request Headers
headers = {"Content-Type": "application/json"}

# 🔹 **Filter reviews for "Product Effectiveness" with a positive sentiment (All Ratings: 1-5)**
positive_reviews = filtered_PP_df["REVIEW_REVIEWTEXT"].tolist()

# Function to generate prompt for analyzing **positive insights**
def generate_positive_insights_prompt(positive_reviews):
    review_texts = "\n".join(f"- {review}" for review in positive_reviews)
    
    prompt = f"""
You are a business analyst evaluating customer feedback on **Product Effectiveness**.  
These reviews have a **positive sentiment** across all ratings (1 to 5), highlighting what customers appreciate about the product.  

### **Objective:**  
Summarize key positive insights into **structured, meaningful takeaways** that can be used for business strategy and marketing.  

### **Instructions:**  
1. Identify the **top 5 most common reasons** customers praise the product.  
2. Highlight **recurring patterns of satisfaction** (e.g., improved hair texture, noticeable results, long-lasting effects).  
3. Ensure the output is **concise, professional, and useful for business decisions**.  

### **Positive Reviews for Product Effectiveness (All Ratings 1-5):**  
{review_texts}

### **Expected Output Format:**  
**Final Consolidated Insights Summary:**  
- Start with a general statement about overall customer satisfaction.  
- List the **key strengths** of the product in a structured way.  
- Summarize **recurring praise patterns** concisely.  
- Conclude with an overall statement on customer sentiment and brand perception.  
"""
    return prompt

# Process reviews in batches of 200
batch_size = 200
all_insights = []

for i in range(0, len(positive_reviews), batch_size):
    batch_reviews = positive_reviews[i:i+batch_size]
    prompt = generate_positive_insights_prompt(batch_reviews)
    
    data = {"contents": [{"parts": [{"text": prompt}]}]}
    response = requests.post(url, headers=headers, json=data)
    
    extracted_text = response.json().get("candidates", [{}])[0].get("content", {}).get("parts", [{}])[0].get("text", "No insights generated.")
    all_insights.append(extracted_text)

# 🔹 **Summarize all batch insights into a structured final report**
def summarize_insights(insights_list):
    summary_prompt = f"""
You are an AI business analyst. Your task is to generate a **concise, structured summary** of customer praise and satisfaction trends.  
The input consists of multiple batch analyses, and your job is to combine them into a **single, well-organized summary**.  

### **Instructions:**  
1. **Do NOT repeat insights from different batches.** Instead, identify recurring praise themes and summarize them.  
2. Group positive feedback into **main categories** (e.g., Long-lasting Results, Texture Improvement, etc.).  
3. Ensure the output is **concise, professional, and suitable for business presentation**.  
4. Use a structured format similar to the example below.

### **Batch Insights:**  
{''.join(insights_list)}

### **Expected Output Format:**  
**Final Consolidated Insights Summary:**  
- General statement about overall customer satisfaction.  
- Key product strengths, structured into categories (e.g., Noticeable Results, Long-Lasting Effect, etc.).  
- Common positive patterns.  
- A concluding statement summarizing customer sentiment and product reputation.  
"""
    
    data = {"contents": [{"parts": [{"text": summary_prompt}]}]}
    response = requests.post(url, headers=headers, json=data)
    
    return response.json().get("candidates", [{}])[0].get("content", {}).get("parts", [{}])[0].get("text", "No summary generated.")

# Generate and print final summarized insights
final_summary = summarize_insights(all_insights)
print("\nFinal Consolidated Insights Summary:\n", final_summary)



Final Consolidated Insights Summary:

**Final Consolidated Insights Summary:**

Overall, customers express overwhelmingly high satisfaction with Olaplex products, consistently highlighting their transformative effects on hair health, appearance, and manageability.

**Key Strengths:**

1.  **Damage Repair & Strengthening:** The most prominent and consistently cited benefit is the ability of Olaplex products to repair and strengthen hair damaged by bleaching, coloring, heat styling, and chemical treatments. This results in reduced breakage, split ends, and overall improved hair health.
2.  **Improved Hair Texture & Softness:** Customers frequently report that Olaplex makes their hair significantly softer, smoother, silkier, and more manageable. They consistently note a visible and tangible improvement in hair texture.
3.  **Frizz Reduction & Manageability:** Olaplex effectively tames frizz, controls flyaways, and makes hair easier to style and manage. This includes enhanced curl definition for those with curly hair.
4.  **Enhanced Shine & Hydration:** Users consistently praise Olaplex for adding a noticeable and healthy shine to their hair while also deeply hydrating it, revitalizing dull or lifeless hair.
5.  **Effective Toning (Specific to Blonde Enhancer Toning Shampoo):** The Olaplex No. 4P Blonde Enhancer Toning Shampoo is especially praised for effectively neutralizing yellow and brassy tones in blonde, silver, and gray hair, resulting in brighter, cooler tones and maintained salon-quality color.

**Recurring Praise Patterns:**

*   **Noticeable and Rapid Results:** Customers frequently report seeing and feeling a positive difference in their hair after just one or a few uses of the products.
*   **Transformation of Damaged Hair:** The product is often described as a "life-saver" and "miracle worker" for severely damaged hair, reversing years of damage and restoring hair to a healthier state.
*   **Long-Lasting Effects:** The positive effects of Olaplex extend beyond initial application, with users experiencing continued improvements in hair health and appearance over time.
*   **Versatility Across Hair Types:** Olaplex is considered effective for a wide range of hair types and conditions, including fine, thick, curly, straight, color-treated, damaged, dry, and aging hair.
*   **Enhanced Self-Confidence:** Due to the product's transformative nature, customers often report increased confidence in their hair and overall appearance.

**Overall Customer Sentiment and Brand Perception:**

Olaplex is perceived as a premium and highly effective brand, delivering on its promises of hair repair, enhanced texture, and improved overall hair health. Customers view Olaplex as a worthwhile investment, describing it as a "game changer" and a "holy grail" product. There is a strong sense of brand loyalty and a willingness to recommend Olaplex products to others.
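The cell that produced this summary reads the model text with a chain of `.get(...)` calls followed by `[0]`. That chain raises `IndexError` when the response contains a `candidates` key with an empty list, and it silently keeps only the first part. A more defensive parser might look like the sketch below. The response shape (`candidates[0].content.parts[*].text`) is taken from the code in this notebook, and the helper name `extract_text` is hypothetical.

```python
def extract_text(payload, default="No insights generated."):
    """Pull the generated text out of a generateContent-style response dict."""
    candidates = payload.get("candidates") or []
    if not candidates:
        # Error payloads or blocked prompts may carry no candidates at all
        return default
    content = candidates[0].get("content") or {}
    parts = content.get("parts") or []
    # Join every text part instead of reading only the first one
    texts = [p["text"] for p in parts if isinstance(p, dict) and "text" in p]
    return "".join(texts) if texts else default

# Hypothetical payloads for illustration
ok = {"candidates": [{"content": {"parts": [{"text": "Key "}, {"text": "strengths"}]}}]}
empty = {"candidates": []}
error = {"error": {"code": 429, "message": "quota exceeded"}}

print(extract_text(ok))
print(extract_text(empty))
print(extract_text(error))
```

Returning the default for failed batches keeps the loop running, but it also means placeholder strings can end up in the final summary prompt, so it may be worth counting or logging how often the default is returned.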


In [18]:
def extract_product_effectiveness_negative(row):
    # Handle missing values safely
    theme_str = str(row['Theme']).lower() if pd.notna(row['Theme']) else ''
    sentiment_str = str(row['Sentiment']).lower() if pd.notna(row['Sentiment']) else ''

    themes = [t.strip() for t in theme_str.split(',')]
    sentiments = [s.strip() for s in sentiment_str.split(',')]

    if 'product effectiveness' in themes:
        indices = [i for i, t in enumerate(themes) if t == 'product effectiveness']
        
        # Ensure the index exists in sentiments before checking
        if any(i < len(sentiments) and sentiments[i] == 'negative' for i in indices):
            return True
    return False

# 🔹 **Filter DataFrame for negative sentiment about product effectiveness (includes all ratings 1-5)**
filtered_NP_df = df[df.apply(extract_product_effectiveness_negative, axis=1)]

# 🔹 **Keep only relevant columns (copy so the assignments below do not touch a view of df)**
filtered_NP_df = filtered_NP_df[['REVIEW_REVIEWTEXT','REVIEW_RATING','Theme','Sentiment']].copy()
filtered_NP_df['Theme'] = 'product effectiveness'  # Keep only relevant theme
filtered_NP_df['Sentiment'] = 'negative'  # Keep only relevant sentiment

# Display the first 10 filtered reviews
filtered_NP_df.head(10)

Unnamed: 0,REVIEW_REVIEWTEXT,REVIEW_RATING,Theme,Sentiment
831,I used this following 01 but didn’t see much d...,3,product effectiveness,negative
836,I am not a fan at all. It makes my hair stiff ...,3,product effectiveness,negative
837,"I'm sorry, but from the beginning it was a sig...",1,product effectiveness,negative
839,Not sure if it’s just me but the product didn’...,2,product effectiveness,negative
843,I purchased this product and began using it tw...,1,product effectiveness,negative
852,"Eh, I don't know if it helps or not. While it ...",1,product effectiveness,negative
859,The Hair Perfector didn't live up to my expect...,2,product effectiveness,negative
860,doesn’t seem to work as well as i thought it w...,1,product effectiveness,negative
864,2 weeks now so not noticing any difference fro...,3,product effectiveness,negative
887,It's not working as well as it did initially. ...,1,product effectiveness,negative


In [20]:
len(filtered_NP_df)

3328
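The two review counts shown so far (19,281 positive and 3,328 negative) determine how many API requests the batch loops send at the batch size of 200 used in this notebook. The sketch below works that out and shows a small batching generator. The name `iter_batches` is hypothetical.

```python
from math import ceil

def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

batch_size = 200
review_counts = {"positive": 19281, "negative": 3328}  # counts shown in the outputs above

for label, n in review_counts.items():
    print(f"{label}: {n} reviews -> {ceil(n / batch_size)} batches")

# The generator on a small stand-in list: the last batch is simply shorter
sizes = [len(batch) for batch in iter_batches(list(range(450)), batch_size)]
print(sizes)
```

Each batch yields one intermediate summary, and the final summarization prompt concatenates all of them, so the positive run passes 97 summaries into a single request. If that prompt grows too large for the model, summarizing in two stages would be one option.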

In [21]:
import os
import requests

# Read the Gemini API key from the environment instead of hardcoding it in the notebook
GEMINI_API_KEY = os.environ["GEMINI_API_KEY"]

# API Endpoint
url = f"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp:generateContent?key={GEMINI_API_KEY}"

# Request Headers
headers = {"Content-Type": "application/json"}

# 🔹 **Collect "Product Effectiveness" reviews with a negative sentiment (no rating filter is applied)**
negative_reviews = filtered_NP_df["REVIEW_REVIEWTEXT"].tolist()

# Function to generate prompt for analyzing **negative insights**
def generate_negative_insights_prompt(negative_reviews):
    review_texts = "\n".join(f"- {review}" for review in negative_reviews)
    
    prompt = f"""
You are a business analyst evaluating customer concerns regarding **Product Effectiveness**.  
These reviews have a **negative sentiment**, indicating dissatisfaction with the product.  

### **Objective:**  
Summarize key negative insights into **structured, meaningful takeaways** that can guide product improvements and business decisions.  

### **Instructions:**  
1. Identify the **top 5 most common complaints** customers have about the product.  
2. Highlight **recurring dissatisfaction patterns** (e.g., no visible results, side effects, poor texture).  
3. Ensure the output is **concise, professional, and actionable** for business teams.  

### **Negative Reviews for Product Effectiveness (Ratings 1-3):**  
{review_texts}

### **Expected Output Format:**  
**Final Consolidated Insights Summary:**  
- Start with a general statement about overall customer dissatisfaction trends.  
- List the **key problem areas** in a structured way.  
- Summarize **recurring dissatisfaction patterns** concisely.  
 
"""
    return prompt

# Process reviews in batches of 200
batch_size = 200
all_insights = []

for i in range(0, len(negative_reviews), batch_size):
    batch_reviews = negative_reviews[i:i+batch_size]
    prompt = generate_negative_insights_prompt(batch_reviews)
    
    data = {"contents": [{"parts": [{"text": prompt}]}]}
    response = requests.post(url, headers=headers, json=data)
    
    extracted_text = response.json().get("candidates", [{}])[0].get("content", {}).get("parts", [{}])[0].get("text", "No insights generated.")
    all_insights.append(extracted_text)

# 🔹 **Summarize all batch insights into a structured final report**
def summarize_negative_insights(insights_list):
    summary_prompt = f"""
You are an AI business analyst. Your task is to generate a **concise, structured summary** of customer dissatisfaction trends.  
The input consists of multiple batch analyses, and your job is to combine them into a **single, well-organized summary**.  

### **Instructions:**  
1. **Do NOT repeat complaints from different batches.** Instead, identify recurring pain points and summarize them.  
2. Group negative feedback into **main categories** (e.g., Lack of Effectiveness, Unpleasant Texture, Side Effects, etc.).  
3. Ensure the output is **concise, professional, and actionable** for product development teams.  
4. Use a structured format similar to the example below.

### **Batch Insights:**  
{''.join(insights_list)}

### **Expected Output Format:**  
**Final Consolidated Insights Summary:**  
- General statement about overall customer dissatisfaction.  
- Key product weaknesses, structured into categories (e.g., Ineffectiveness, Negative Reactions, etc.).  
- Common negative patterns.  
- A concluding statement summarizing customer sentiment and product perception.  
"""
    
    data = {"contents": [{"parts": [{"text": summary_prompt}]}]}
    response = requests.post(url, headers=headers, json=data)
    
    return response.json().get("candidates", [{}])[0].get("content", {}).get("parts", [{}])[0].get("text", "No summary generated.")

# Generate and print final summarized insights
final_summary = summarize_negative_insights(all_insights)
print("\nFinal Consolidated Insights Summary:\n", final_summary)



Final Consolidated Insights Summary:

**Final Consolidated Insights Summary:**

Overall, customer reviews demonstrate substantial dissatisfaction with Olaplex products. Recurring complaints center on ineffectiveness, adverse effects on hair and scalp health, and issues with product usability, texture, and value for money. A significant number of users feel that the products fail to live up to their claims and high price point, leading to widespread disappointment and eroded brand trust.

**Key Problem Areas:**

1.  **Lack of Visible Improvement/Ineffectiveness:** The most frequent complaint is the absence of noticeable positive changes in hair health, texture, shine, strength, frizz control, or overall appearance, despite consistent use. Many users report no difference compared to cheaper alternatives.
2.  **Hair Damage/Dryness/Brittleness:** A large number of users experience a *deterioration* in hair quality. Common issues include increased dryness, brittleness, straw-like texture, coarseness, tangling, split ends, and increased frizz.
3.  **Hair Loss and Breakage:** A concerning number of reviews mention increased hair shedding, hair loss, thinning, and breakage, which users often directly attribute to using the products.
4.  **Scalp Irritation/Adverse Reactions:** Many customers report experiencing scalp irritation, itchiness, redness, flakiness (dandruff), scabs, bumps, sores, chemical burns, breakouts, allergic reactions, and a burning sensation after using Olaplex products.
5.  **Unfavorable Hair Texture and Manageability:** Users report negative changes in hair texture, including hair feeling greasy, oily, heavy, weighed down, sticky, gummy, waxy, coated, stiff, or "dirty." Some experience difficulty combing through hair or styling it. Curl patterns may be negatively affected.
6.  **Poor Value for Money:** A common sentiment is that the products are overpriced relative to their perceived benefits, especially given the small bottle sizes and the need for frequent reapplication. Many feel the results do not justify the high cost and that cheaper alternatives are more effective.
7.  **Application and Packaging Issues:** Several complaints relate to product usability, including:
    *   Difficult or messy application
    *   Poor spray nozzle design (jet stream instead of mist, aerosol malfunction).
    *   Insufficient product for complete saturation, especially for long or thick hair.
    *   Small or difficult-to-read instructions.
    *   Packaging leaks or malfunctions.
8.  **Color Alteration Issues (Specifically with Purple Shampoo and Blonde Oils):** Some users report unwanted color changes, such as purple, blue, gray, or muddied tones in blonde or highlighted hair. Oils can stain blonde hair yellow or orange. Some report toner stripping and brassiness after purple shampoo use.

**Recurring Dissatisfaction Patterns:**

*   **No Visible Results/Lack of Improvement:** Products consistently fail to deliver on promises of repair, strengthening, smoothing, hydration, frizz control, and enhanced hair appearance.
*   **Adverse Effects Outweighing Benefits:** Even when initial positive results are observed, they are often overshadowed by negative side effects like dryness, breakage, hair loss, and scalp irritation.
*   **Paradoxical Effects:** Products marketed for repair are perceived as causing damage, dryness, and breakage. Products designed for oil absorption may make hair look oilier.
*   **Texture Issues:** Products often lead to undesirable changes in hair texture, such as dryness, coarseness, stiffness, greasiness, or a sticky/gummy feel.
*   **High Expectations, Underwhelming Performance:** The product's high price point, marketing hype, and positive stylist recommendations create high expectations that are not met, leading to amplified disappointment.
*   **Formula Inconsistency:** Original formulas work, but new formulas are disliked
*   **Value Dissatisfaction:** High price points with a lack of noticeable improvement and the need for large amounts of product leads to the feeling of wasted money.

**Concluding Statement:**

Customers express significant frustration that Olaplex products frequently fail to deliver on advertised benefits and, in many cases, worsen hair condition or cause adverse reactions. This, coupled with the high price point and issues with product usability, has resulted in widespread dissatisfaction and a perception of poor value, damaging brand reputation and eroding customer trust.
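The batch loop behind this summary sends one request per batch and does not handle failed requests, so a transient error would either raise or leave a placeholder string in the results. One possible mitigation is a small retry wrapper with exponential backoff. This is only a sketch: `call_with_retry` is a hypothetical helper, and the assumption is that the caller passes a function that raises on failure (for example one that calls `requests.post(...)` and then `response.raise_for_status()`).

```python
import time

def call_with_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds, waiting base_delay * 2**attempt between tries."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return send()
        except Exception as exc:  # in real use, catch narrower exception types
            last_error = exc
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error

# Stand-in for an API call that fails twice before succeeding (no network needed)
state = {"calls": 0}

def flaky_send():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("temporary failure")
    return {"candidates": [{"content": {"parts": [{"text": "ok"}]}}]}

waits = []
result = call_with_retry(flaky_send, sleep=waits.append)  # record waits instead of sleeping
print(state["calls"], waits)
```

Passing `sleep` as a parameter keeps the helper testable without real delays. In the notebook it would be left at its default.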

In [4]:
# Function to extract 'product effectiveness' with negative sentiment
def extract_product_effectiveness(row):
    # Handle missing values safely
    theme_str = str(row['Theme']).lower() if pd.notna(row['Theme']) else ''
    sentiment_str = str(row['Sentiment']).lower() if pd.notna(row['Sentiment']) else ''

    themes = [t.strip() for t in theme_str.split(',')]
    sentiments = [s.strip() for s in sentiment_str.split(',')]

    if 'product effectiveness' in themes:
        indices = [i for i, t in enumerate(themes) if t == 'product effectiveness']
        
        # Ensure the index exists in sentiments before checking
        if any(i < len(sentiments) and sentiments[i] == 'negative' for i in indices):
            return True
    return False

# Filter DataFrame for rating 1 and negative product effectiveness
filtered_df = df[(df['REVIEW_RATING'] == 1) & df.apply(extract_product_effectiveness, axis=1)]

# Keep only relevant columns (copy so the assignments below do not touch a view of df)
filtered_df = filtered_df[['REVIEW_REVIEWTEXT', 'REVIEW_RATING','PRODUCT_ROLL_UP_NAME','Theme', 'Sentiment']].copy()
filtered_df['Theme'] = 'product effectiveness'  # Keep only relevant theme
filtered_df['Sentiment'] = 'negative'  # Keep only relevant sentiment

# Display the number of filtered reviews
#print(filtered_df)
print(len(filtered_df))

1653
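The filtered frame keeps `PRODUCT_ROLL_UP_NAME`, so the 1-star complaints could also be broken down by product category before they are summarized. The sketch below uses a toy frame with hypothetical category names. On the real data the same two lines would be applied to `filtered_df`.

```python
import pandas as pd

# Toy stand-in for filtered_df (hypothetical rows and category names)
toy_filtered = pd.DataFrame({
    "REVIEW_REVIEWTEXT": ["no change", "made hair dry", "did nothing", "hair fell out"],
    "REVIEW_RATING": [1, 1, 1, 1],
    "PRODUCT_ROLL_UP_NAME": ["Category A", "Category B", "Category A", None],
})

# The sample rows shown by df.head() have empty roll-up names, so label missing values explicitly
counts = toy_filtered["PRODUCT_ROLL_UP_NAME"].fillna("(missing)").value_counts()
summary = pd.DataFrame({"reviews": counts, "share": (counts / counts.sum()).round(2)})
print(summary)
```

A breakdown like this would show whether the complaints concentrate in a few categories, which helps when reading the consolidated summary that follows.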


In [6]:
import os
import requests

# Read the Gemini API key from the environment instead of hardcoding it in the notebook
GEMINI_API_KEY = os.environ["GEMINI_API_KEY"]

# API Endpoint
url = f"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp:generateContent?key={GEMINI_API_KEY}"

# Request Headers
headers = {"Content-Type": "application/json"}

# 🔹 **Collect review texts (filtered_df already holds only rating-1 reviews with negative product-effectiveness sentiment)**
negative_reviews = filtered_df["REVIEW_REVIEWTEXT"].tolist()

# Function to generate prompt for analyzing negative reviews
def generate_negative_insights_prompt(negative_reviews):
    review_texts = "\n".join(f"- {review}" for review in negative_reviews)
    
    prompt = f"""
You are a business analyst reviewing customer complaints specifically about **Product Effectiveness**.  
These reviews have a **negative sentiment** and a **rating of 1**, indicating strong dissatisfaction.  

### **Objective:**  
Summarize customer complaints into clear, structured insights.  
The summary should be **concise and business-friendly**, capturing the main dissatisfaction themes.  

### **Instructions:**  
1. Identify the **top 5 most common reasons** customers are dissatisfied.  
2. Highlight **recurring patterns of dissatisfaction** (e.g., lack of results, hair damage).  
3. Ensure the output is **easy to understand and presentable to stakeholders**.  

### **Negative Reviews for Product Effectiveness (Rating: 1):**  
{review_texts}

### **Expected Output Format:**  
**Final Consolidated Insights Summary:**  
- Start with a general statement about overall customer dissatisfaction.  
- List the **key issues** customers face, in a structured way.  
- Summarize **recurring dissatisfaction patterns** concisely.  
- Conclude with an overall statement about customer sentiment toward the product.  
"""
    return prompt

# Process reviews in batches of 200
batch_size = 200
all_insights = []

for i in range(0, len(negative_reviews), batch_size):
    batch_reviews = negative_reviews[i:i+batch_size]
    prompt = generate_negative_insights_prompt(batch_reviews)
    
    data = {"contents": [{"parts": [{"text": prompt}]}]}
    response = requests.post(url, headers=headers, json=data)
    
    extracted_text = response.json().get("candidates", [{}])[0].get("content", {}).get("parts", [{}])[0].get("text", "No insights generated.")
    all_insights.append(extracted_text)

# 🔹 **Summarize all batch insights into a structured final report**
def summarize_insights(insights_list):
    summary_prompt = f"""
You are an AI business analyst. Your task is to generate a **concise, structured summary** of customer complaints.  
The input consists of multiple batch analyses, and your job is to combine them into a **single, well-organized summary**.  

### **Instructions:**  
1. **Do NOT repeat insights from different batches.** Instead, identify recurring issues and summarize them.  
2. Group customer complaints into **main problem categories** (e.g., hair damage, ineffectiveness, etc.).  
3. Ensure the output is **concise, professional, and suitable for business presentation**.  
4. Use a structured format similar to the example below.

### **Batch Insights:**  
{''.join(insights_list)}

### **Expected Output Format:**  
**Final Consolidated Insights Summary:**  
- General statement about customer dissatisfaction.  
- Key issues, structured into categories (e.g., Hair Damage, No Improvement, etc.).  
- Common dissatisfaction patterns.  
- A concluding statement summarizing customer sentiment.  
"""
    
    data = {"contents": [{"parts": [{"text": summary_prompt}]}]}
    response = requests.post(url, headers=headers, json=data)
    
    return response.json().get("candidates", [{}])[0].get("content", {}).get("parts", [{}])[0].get("text", "No summary generated.")

# Generate and print final summarized insights
final_summary = summarize_insights(all_insights)
print("\nFinal Consolidated Insights Summary:\n", final_summary)



Final Consolidated Insights Summary:

**Final Consolidated Insights Summary:**

Customers express widespread and significant dissatisfaction with Olaplex products, frequently reporting that they not only fail to deliver promised benefits like hair repair and strengthening, but actively cause damage and adverse effects. A prevailing sentiment is that the products are overpriced, ineffective, and, in many cases, detrimental to hair and scalp health.

**Key Issues:**

1.  **Hair Damage/Dryness/Brittleness:** The most dominant complaint centers around increased hair damage, with customers reporting dryness, brittleness, breakage, split ends, a straw-like texture, coarseness, and overall worsening of hair condition.
2.  **Hair Loss/Shedding:** A significant number of users experienced increased hair loss, shedding, and thinning, with some reporting alarming levels of hair fall and even bald spots.
3.  **Lack of Noticeable Improvement:** Many customers report a complete lack of visible improvement in hair health, texture, shine, manageability, or frizz control, despite using the products as directed and for extended periods.
4.  **Scalp Irritation/Allergic Reactions:** Scalp irritation, including itching, redness, flakiness, dandruff, bumps, blisters, burns, and allergic reactions, is a recurring concern among users.
5.  **Oily/Greasy Hair:** A notable segment of customers, even those with dry hair, report that Olaplex products leave their hair feeling excessively oily, greasy, heavy, weighed down, or with a sticky residue.
6.  **Ineffective Toner/Color Changes:** Specific to purple shampoo and other color-altering products, issues include hair turning purple, grey, or blue, brassiness not being toned, or desired toner being stripped.
7.  **Product Design/Application Issues:** Many find No. 0 difficult to apply, citing its watery consistency, ineffective spray nozzle, and the large amount of product needed per application, which leads to quick depletion and a high perceived cost per use. Dry shampoo users report that the product fails to clean their hair, leaving it greasy and flat or producing a visible white residue. Aerosol cans also reportedly stopped working after minimal use.
8.  **Lash Serum Ineffectiveness/Adverse Reactions:** Some customers report a lack of lash growth, lashes falling out, eye irritation, and no noticeable difference after months of use.
9.  **Ineffective Volume Products:** Volume products fail to create the volumizing effect customers hoped for.

**Recurring Dissatisfaction Patterns:**

*   **Paradoxical Damage:** A core, recurring theme is that products marketed for hair repair and strengthening instead cause significant damage, dryness, breakage, and hair loss.
*   **Unmet Expectations:** The products often fail to meet the high expectations set by online hype, stylist recommendations, and brand marketing, resulting in deep disappointment.
*   **High Cost vs. Low Value:** The high price point of Olaplex products amplifies dissatisfaction when customers experience negative or non-existent results, leading to feelings of being misled and overcharged.
*   **Inconsistency Across Hair Types:** Negative experiences are reported across a wide range of hair types (fine, thick, curly, straight, color-treated, virgin), indicating the issues are not limited to specific hair profiles.
*   **Initial Positive Results Followed by Decline:** Some customers report initial positive results that are later overshadowed by negative outcomes, such as hair thinning or damage after prolonged use.

**Overall Sentiment:**

Customer sentiment is overwhelmingly negative, characterized by disappointment, frustration, anger, and regret. Customers perceive the products as overpriced, ineffective, and, in many cases, actively harmful to their hair and scalp. Trust in the brand's claims is eroded by these experiences, leading to a reluctance to repurchase and a desire to warn others about the potential risks.


# Further analysis can be done for the scenarios below

- Get insights for the top 5 themes of reviews at each rating from 1 to 5
- Get insights for each roll-up name separately, with different combinations of ratings, themes and sentiments
- Get insights for different pricing groups and product groups, with combinations of ratings and themes
- Combine similar themes and get generalised insights
- Create plots for themes and sentiments with % values
- Create a report with improvements, strategies and a summary
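
As a starting point for the theme and sentiment percentages, here is a minimal sketch. It assumes that `Theme` and `Sentiment` hold comma-separated values that line up by position, as the filtering cell in this notebook implies. The helper names `explode_theme_sentiment` and `sentiment_share_by_theme` and the toy data are hypothetical, used only for illustration.

```python
import pandas as pd

def explode_theme_sentiment(df, theme_col="Theme", sentiment_col="Sentiment"):
    """Return one row per (review, theme, sentiment) pair."""
    rows = []
    for idx, row in df.iterrows():
        # Skip reviews where either column is missing
        if pd.isna(row[theme_col]) or pd.isna(row[sentiment_col]):
            continue
        themes = [t.strip().lower() for t in str(row[theme_col]).split(",")]
        sentiments = [s.strip().lower() for s in str(row[sentiment_col]).split(",")]
        # zip stops at the shorter list, so unmatched trailing entries are dropped
        for theme, sentiment in zip(themes, sentiments):
            rows.append({"review_index": idx, "theme": theme, "sentiment": sentiment})
    return pd.DataFrame(rows, columns=["review_index", "theme", "sentiment"])

def sentiment_share_by_theme(long_df):
    """Percentage of each sentiment within every theme (each row sums to 100)."""
    share = pd.crosstab(long_df["theme"], long_df["sentiment"], normalize="index")
    return share.mul(100).round(1)

# Hypothetical toy data standing in for the real review DataFrame
sample = pd.DataFrame({
    "Theme": ["product effectiveness, price", "product effectiveness", None, "price"],
    "Sentiment": ["negative, negative", "positive", "negative", "positive"],
})

long_sample = explode_theme_sentiment(sample)
share = sentiment_share_by_theme(long_sample)
print(share)
```

If matplotlib is installed, the resulting table could be plotted with `share.plot(kind="barh", stacked=True)`. The same long-format frame can also be grouped by rating to rank the most frequent themes per rating.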

In [None]:
# Function to flag reviews where 'product effectiveness' has negative sentiment
def extract_product_effectiveness(row):
    # Handle missing values safely
    theme_str = str(row['Theme']).lower() if pd.notna(row['Theme']) else ''
    sentiment_str = str(row['Sentiment']).lower() if pd.notna(row['Sentiment']) else ''

    themes = [t.strip() for t in theme_str.split(',')]
    sentiments = [s.strip() for s in sentiment_str.split(',')]

    if 'product effectiveness' in themes:
        indices = [i for i, t in enumerate(themes) if t == 'product effectiveness']
        
        # Ensure the index exists in sentiments before checking
        if any(i < len(sentiments) and sentiments[i] == 'negative' for i in indices):
            return True
    return False

# Filter DataFrame for rating 1 and negative product effectiveness
filtered_df = df[(df['REVIEW_RATING'] == 1) & df.apply(extract_product_effectiveness, axis=1)]

# Keep only relevant columns
# .copy() makes the assignments below act on an independent frame, not a slice of df
filtered_df = filtered_df[['REVIEW_REVIEWTEXT', 'REVIEW_RATING', 'PRODUCT_ROLL_UP_NAME', 'Theme', 'Sentiment']].copy()
filtered_df['Theme'] = 'product effectiveness'  # Keep only relevant theme
filtered_df['Sentiment'] = 'negative'  # Keep only relevant sentiment

# Display the filtered reviews
#print(filtered_df)
print(len(filtered_df))
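
To run the same filter for other theme, sentiment and rating combinations, the check could be parameterised. The sketch below is one possible generalisation, not the notebook's own method. The function name `has_theme_sentiment` and the toy DataFrame are hypothetical, and it again assumes that `Theme` and `Sentiment` are comma-separated and aligned by position.

```python
import pandas as pd

def has_theme_sentiment(row, theme, sentiment,
                        theme_col="Theme", sentiment_col="Sentiment"):
    """True if `theme` appears in the row paired with `sentiment` at the same position."""
    if pd.isna(row[theme_col]) or pd.isna(row[sentiment_col]):
        return False
    themes = [t.strip().lower() for t in str(row[theme_col]).split(",")]
    sentiments = [s.strip().lower() for s in str(row[sentiment_col]).split(",")]
    # zip pairs entries by position and ignores any unmatched trailing entries
    return any(t == theme and s == sentiment for t, s in zip(themes, sentiments))

# Hypothetical toy data standing in for the real review DataFrame
toy = pd.DataFrame({
    "REVIEW_RATING": [1, 1, 5, 1],
    "Theme": ["product effectiveness, price",
              "price, product effectiveness",
              "product effectiveness",
              None],
    "Sentiment": ["negative, positive", "negative, positive", "negative", "negative"],
})

mask = toy.apply(has_theme_sentiment, axis=1, args=("product effectiveness", "negative"))
selected = toy[(toy["REVIEW_RATING"] == 1) & mask]
print(len(selected))
```

Passing the theme and sentiment as arguments keeps one helper for every combination instead of one function per theme.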