# Customer review topic understanding using Snowflake Cortex
Understanding customer feedback is critical for businesses, but analyzing large volumes of unstructured text can be challenging. In this notebook, you'll use Cortex AISQL to systematically extract insights from unstructured customer feedback.

### Context
*Tasty Bytes* is a global e-commerce company that sells a variety of merchandise. It collects customer reviews to understand how customers feel about its products.

In this notebook, we will use several AISQL functions to answer a series of questions about customer reviews.


## Import sample data

In the next SQL cell, we populate the sample data used in this and other templates.

In [None]:
USE ROLE SNOWFLAKE_LEARNING_ROLE;

-- use the existing database, schema and warehouse
USE DATABASE SNOWFLAKE_LEARNING_DB;
USE WAREHOUSE SNOWFLAKE_LEARNING_WH;

SET schema_name = CONCAT(current_user(), '_SUMMARIZE_UNSTRUCTURED_CUSTOMER_REVIEWS');
USE SCHEMA IDENTIFIER($schema_name);


-- TO ADD TABLE LOADS

  
-- setup completion note
SELECT 'Setup is complete' AS note;


## Overview of the `product_reviews` table

In [None]:
SELECT * FROM product_reviews LIMIT 15;

## Warm-up: does review sentiment correlate with ratings?
We can use Snowflake's [SENTIMENT](https://docs.snowflake.com/en/sql-reference/functions/sentiment-snowflake-cortex) function to score the sentiment of every review. As a sanity check, we then compute the correlation between those scores and the user ratings.

In [None]:
WITH EXTRACTED_SENTIMENT as (
    SELECT 
        *,
        SNOWFLAKE.CORTEX.SENTIMENT(review_text) as sentiment
    FROM product_reviews
)

SELECT
    corr(sentiment, rating) as correlation
FROM EXTRACTED_SENTIMENT;
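
To make the sanity check concrete, here is a minimal Python sketch of the Pearson correlation coefficient that `corr` reports. The sample values are made up for illustration, and the 1-5 rating scale is an assumption; only the sentiment range of -1 to 1 comes from the SENTIMENT function itself.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, left unnormalized because the factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: sentiment in [-1, 1], star ratings 1-5 (assumed scale).
sentiment = [-0.8, -0.3, 0.1, 0.5, 0.9]
rating = [1, 2, 3, 4, 5]
correlation = pearson(sentiment, rating)
print(round(correlation, 3))
```

A value close to 1 suggests that the sentiment scores and ratings move together, which is what we would hope to see before trusting the text-based analysis that follows.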

## Q1: What are the top 3 most common product issues reported in Electronics category reviews?

To answer the first question, we use the [AI_AGG](https://docs.snowflake.com/sql-reference/functions/ai_agg) function to get aggregated insights across all reviews in the Electronics category.

In [None]:
SELECT 
  AI_AGG(
    review_text, 
    'What are the top 3 most common product issues reported in Electronics category reviews?'
  ) as top_issues
FROM product_reviews pr
JOIN product_catalog pc ON pr.product_id = pc.product_id
WHERE pc.category = 'Electronics';

In [None]:
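# `cell7` is the name of the SQL cell above; adjust it if your cell is named differently.
df = cell7.to_pandas()
# Print the full AI_AGG answer instead of the truncated table preview.
print(df['TOP_ISSUES'].iloc[0])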

## Q2: What percentage of reviews mention product issues? Does it differ by category?

To answer this question, we use [AI_FILTER](https://docs.snowflake.com/sql-reference/functions/ai_filter) to flag the reviews that mention a product issue or complaint, and then compute the share of flagged reviews, both overall and per category.

In [None]:
WITH issue_detection AS (
  SELECT 
    pr.review_id,
    pc.category,
    AI_FILTER(prompt('This review mentions a product issue or complaint: {0}', pr.review_text)) as has_issue
  FROM product_reviews pr
  JOIN product_catalog pc ON pr.product_id = pc.product_id
)

-- Overall percentage
SELECT 
  'All Categories' as category,
  COUNT(*) as total_reviews,
  SUM(CASE WHEN has_issue THEN 1 ELSE 0 END) as issue_reviews,
  ROUND(SUM(CASE WHEN has_issue THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) as issue_percentage
FROM issue_detection

UNION ALL

-- Percentage by category
SELECT 
  category,
  COUNT(*) as total_reviews,
  SUM(CASE WHEN has_issue THEN 1 ELSE 0 END) as issue_reviews,
  ROUND(SUM(CASE WHEN has_issue THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) as issue_percentage
FROM issue_detection
GROUP BY category
ORDER BY category;
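
The aggregation in this query can be sketched in plain Python as follows. This is only an illustration of the arithmetic: the `issue_percentages` helper and the sample rows are hypothetical, and the booleans stand in for the `has_issue` values that AI_FILTER would return.

```python
from collections import defaultdict

def issue_percentages(rows):
    """Map (category, has_issue) pairs to {category: issue percentage}."""
    totals = defaultdict(int)
    issues = defaultdict(int)
    for category, has_issue in rows:
        # Count each review under its own category and under the overall bucket.
        for key in (category, "All Categories"):
            totals[key] += 1
            issues[key] += 1 if has_issue else 0
    return {k: round(issues[k] * 100.0 / totals[k], 2) for k in sorted(totals)}

# Hypothetical AI_FILTER results, for illustration only.
rows = [
    ("Electronics", True),
    ("Electronics", False),
    ("Clothing", True),
    ("Clothing", True),
    ("Home", False),
]
percentages = issue_percentages(rows)
print(percentages)
```

As in the SQL version, the overall row is computed from the same flagged set as the per-category rows, so the model is only asked to judge each review once.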

## Q3: Identify the most common issue in Clothing category.

To answer this question, we first use [AI_FILTER](https://docs.snowflake.com/sql-reference/functions/ai_filter) to keep only the reviews that mention a product issue, as in Q2, and store them in a temporary table.

Next, we restrict those reviews to the Clothing category and use the [AI_AGG](https://docs.snowflake.com/sql-reference/functions/ai_agg) function to get a list of all product issues mentioned.

In [None]:
CREATE OR REPLACE TEMP TABLE filtered_product_reviews AS
SELECT *
FROM product_reviews
WHERE AI_FILTER(prompt('This review mentions a product issue or complaint: {0}', review_text));

In [None]:
SELECT 
  AI_AGG(
    review_text, 
    'Analyze these clothing product reviews and provide a comprehensive list of all product issues mentioned. Format your response as a bulleted list of issues with their approximate frequency in percentage.'
  ) as clothing_issues
FROM filtered_product_reviews pr
JOIN product_catalog pc ON pr.product_id = pc.product_id
WHERE pc.category = 'Clothing';

### Productionize the pipeline
With the issue categories suggested by the [AI_AGG](https://docs.snowflake.com/sql-reference/functions/ai_agg) step above, we can use [AI_CLASSIFY](https://docs.snowflake.com/sql-reference/functions/ai_classify) to build a continuous data pipeline that keeps classifying incoming reviews.

In [None]:
WITH clothing_issue_reviews AS (
  SELECT 
    pr.review_id,
    pr.review_text
  FROM filtered_product_reviews pr
  JOIN product_catalog pc ON pr.product_id = pc.product_id
  WHERE pc.category = 'Clothing'
),
classified_reviews AS (
  SELECT 
    review_id,
    review_text,
    AI_CLASSIFY(
      review_text, 
      [
        'Sizing issue', 
        'Color issue', 
        'Fabric quality issue',
        'Washing problem',
        'Pricing issue'
      ]
    ) as classification
  FROM clothing_issue_reviews
)
SELECT 
    review_id,
    review_text,
    classification:labels[0]::text as issue_category
  FROM classified_reviews;
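
Once the reviews are classified, a natural follow-up is to count how often each issue category appears. The Python sketch below shows one way to do that outside SQL. It assumes each classification is a JSON object with a `labels` array, a shape inferred from the `classification:labels[0]` path in the query; the `first_label` helper and the sample results are hypothetical.

```python
import json
from collections import Counter

def first_label(classification_json):
    """Return the first label from an AI_CLASSIFY-style result, or None."""
    labels = json.loads(classification_json).get("labels") or []
    return labels[0] if labels else None

# Hypothetical results shaped like {"labels": [...]}, for illustration only.
results = [
    '{"labels": ["Sizing issue"]}',
    '{"labels": ["Fabric quality issue"]}',
    '{"labels": ["Sizing issue"]}',
    '{"labels": []}',
]
# Skip reviews for which no label was returned.
issue_counts = Counter(label for label in map(first_label, results) if label)
print(issue_counts.most_common())
```

Counting labels this way gives exact frequencies, which can be compared against the approximate percentages that AI_AGG reported earlier.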


## Systematically generate responses to product complaints

We use the general-purpose [AI_COMPLETE](https://docs.snowflake.com/sql-reference/functions/ai_complete) function to draft email responses to critical reviews.


In [None]:
WITH clothing_issue_reviews AS (
  SELECT 
    pr.review_id,
    pr.review_text
  FROM filtered_product_reviews pr
  JOIN product_catalog pc ON pr.product_id = pc.product_id
  WHERE pc.category = 'Clothing'
)
SELECT 
    review_id,
    review_text,
    AI_COMPLETE('llama4-maverick', 'Please help me draft a concise response to the customer complaints below. Please only include the draft and nothing else: ' || review_text) as response
  FROM clothing_issue_reviews;
