# Customer review topic understanding using Snowflake Cortex
Understanding customer feedback is critical for businesses, but analyzing large volumes of unstructured text can be challenging. In this notebook, you'll use Cortex AISQL to systematically extract insights from unstructured customer feedback.

### Context
*Tasty Bytes* is a global e-commerce company that sells a wide range of merchandise. It collects customer reviews to understand how customers feel about the products it sells.

In this notebook, we will use several AISQL functions to answer different questions about customer reviews.


## Step 1: Set up your environment and data

Let's begin by running the query below. It sets the role, database, and warehouse for this session and switches to your user-specific schema. It then creates a file format and an external stage, and creates and populates three tables, `CUSTOMER_DATA`, `PRODUCT_CATALOG`, and `PRODUCT_REVIEWS`, with sample data for our analysis.

In [None]:
USE ROLE SNOWFLAKE_LEARNING_ROLE;

-- use the existing database, schema and warehouse
USE DATABASE SNOWFLAKE_LEARNING_DB;
USE WAREHOUSE SNOWFLAKE_LEARNING_WH;

SET schema_name = CONCAT(current_user(), '_CUSTOMER_REVIEW_TOPIC_UNDERSTANDING');
USE SCHEMA IDENTIFIER($schema_name);

  /*--
  • file format and stage creation
  --*/

  CREATE OR REPLACE FILE FORMAT csv_ff 
    TYPE = 'csv'
    SKIP_HEADER = 1;

  CREATE OR REPLACE STAGE s3load
    COMMENT = 'Quickstarts S3 Stage Connection'
    URL = 's3://sfquickstarts/misc/aisql/ecommerce_customer_review/'
    FILE_FORMAT = csv_ff;

  /*--
  • raw zone table build 
  --*/
  CREATE OR REPLACE TABLE customer_data
  (
    CUSTOMER_ID	VARCHAR(16777216),
    CUSTOMER_SEGMENT	VARCHAR(16777216),
    JOIN_DATE	DATE,
    LIFETIME_VALUE	NUMBER(38,2),
    PREVIOUS_PURCHASES	NUMBER(38,0),
    AGE_RANGE	VARCHAR(16777216),
    GENDER	VARCHAR(16777216),
    PREFERRED_CATEGORY	VARCHAR(16777216)
  );

  
  CREATE OR REPLACE TABLE product_catalog
  (
    PRODUCT_ID	VARCHAR(16777216),
    PRODUCT_NAME	VARCHAR(16777216),
    CATEGORY	VARCHAR(16777216),
    SUBCATEGORY	VARCHAR(16777216),
    MANUFACTURER	VARCHAR(16777216),
    PRICE	NUMBER(38,2),
    RELEASE_DATE	DATE,
    REVIEW_COUNT	NUMBER(38,0)
  );

  CREATE OR REPLACE TABLE product_reviews
  (
    REVIEW_ID	VARCHAR(16777216),
    PRODUCT_ID	VARCHAR(16777216),
    CUSTOMER_ID	VARCHAR(16777216),
    REVIEW_TEXT	VARCHAR(16777216),
    RATING	NUMBER(38,0),
    REVIEW_DATE	DATE,
    PURCHASE_DATE	DATE,
    VERIFIED_PURCHASE	BOOLEAN,
    HELPFUL_VOTES	NUMBER(38,0)
  );
  
  /*--
  • raw zone table load 
  --*/

  COPY INTO customer_data
  FROM @s3load/customer_data.csv
  ON_ERROR = CONTINUE;

  COPY INTO product_catalog
  FROM @s3load/product_catalog.csv
  ON_ERROR = CONTINUE;

  COPY INTO product_reviews
  FROM @s3load/product_reviews.csv
  ON_ERROR = CONTINUE;


-- setup completion note
SELECT 'Setup is complete' AS note;  


#### Optional: load `customer_data.csv` into a table with an inferred schema

In [None]:
-- Create table with inferred schema
CREATE
OR REPLACE TABLE my_table USING TEMPLATE (
  SELECT
    ARRAY_AGG(OBJECT_CONSTRUCT(*))
  FROM
    TABLE(
      INFER_SCHEMA(
        LOCATION => '@s3load/customer_data.csv',
        FILE_FORMAT => 'csv_ff'
      )
    )
);
-- Load the data
COPY INTO my_table
FROM @s3load/customer_data.csv
FILE_FORMAT = (FORMAT_NAME = 'csv_ff');

## Step 2: Correlate sentiment with ratings

As a first step, let's perform a quick sanity check. We'll use the `SNOWFLAKE.CORTEX.SENTIMENT` function to score the sentiment of each review. We can then check its correlation with the user-provided star rating to see if they align.

In [None]:
WITH EXTRACTED_SENTIMENT AS (
    SELECT 
        RATING,
        SNOWFLAKE.CORTEX.SENTIMENT(REVIEW_TEXT) AS SENTIMENT
    FROM PRODUCT_REVIEWS
)
SELECT CORR(SENTIMENT, RATING) AS SENTIMENT_RATING_CORRELATION
FROM EXTRACTED_SENTIMENT;
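
To help interpret the number returned by `CORR`, here is a minimal Python sketch of the Pearson correlation coefficient. The helper name `pearson_corr` and the sample values are hypothetical and are not taken from the dataset; a result near 1 would suggest that sentiment scores and star ratings rise together.

```python
import math

def pearson_corr(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined when one column is constant
    return cov / math.sqrt(var_x * var_y)

# Hypothetical (sentiment, rating) pairs, for illustration only
sample_sentiment = [-0.8, -0.3, 0.1, 0.5, 0.9]
sample_rating = [1, 2, 3, 4, 5]
print(round(pearson_corr(sample_sentiment, sample_rating), 3))
```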

## Step 3: Find top issues in a category

Now, let's dig deeper. Suppose you want to know what the biggest complaints are for 'Electronics'. You can use `AI_AGG` to analyze all relevant reviews and aggregate the common themes into a single summary.

In [None]:
SELECT 
  AI_AGG(
    REVIEW_TEXT, 
    'What are the top 3 most common product issues reported in these reviews?'
  ) AS TOP_ISSUES
FROM PRODUCT_REVIEWS pr
JOIN PRODUCT_CATALOG pc ON pr.product_id = pc.product_id
WHERE pc.category = 'Electronics';

In [None]:
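# View the result. This assumes the SQL cell above is named `agg_insights`,
# so that its output can be referenced from Python by that name.
df = agg_insights.to_pandas()
print(df['TOP_ISSUES'].iloc[0])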

## Step 4: What percentage of reviews mention product issues? Is it differentiated by category? 

Next, let's quantify how widespread product problems are. `AI_FILTER` evaluates a natural-language condition against each review and returns a boolean, so you can flag the reviews that mention a product issue and then compute the share of flagged reviews, both overall and per category.

In [None]:
WITH issue_detection AS (
  SELECT 
    pr.review_id,
    pc.category,
    AI_FILTER(prompt('This review mentions a product issue or complaint: {0}', pr.review_text)) as has_issue
  FROM product_reviews pr
  JOIN product_catalog pc ON pr.product_id = pc.product_id
)

-- Overall percentage
SELECT 
  'All Categories' as category,
  COUNT(*) as total_reviews,
  SUM(CASE WHEN has_issue THEN 1 ELSE 0 END) as issue_reviews,
  ROUND(SUM(CASE WHEN has_issue THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) as issue_percentage
FROM issue_detection

UNION ALL

-- Percentage by category
SELECT 
  category,
  COUNT(*) as total_reviews,
  SUM(CASE WHEN has_issue THEN 1 ELSE 0 END) as issue_reviews,
  ROUND(SUM(CASE WHEN has_issue THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) as issue_percentage
FROM issue_detection
GROUP BY category
ORDER BY category;
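
The aggregation in the query above can also be sketched in plain Python, which may be handy for spot-checking a small sample of results. This is an illustrative sketch only: `issue_percentages` and the sample rows are hypothetical, and each row is assumed to be a `(category, has_issue)` pair like the output of the `issue_detection` CTE.

```python
from collections import defaultdict

def issue_percentages(rows):
    """Map each category (plus 'All Categories') to (total, issues, percentage)."""
    totals = defaultdict(int)
    issues = defaultdict(int)
    for category, has_issue in rows:
        # Every row counts once toward its own category and once toward the overall total
        for key in ("All Categories", category):
            totals[key] += 1
            issues[key] += 1 if has_issue else 0
    return {
        key: (totals[key], issues[key], round(issues[key] * 100.0 / totals[key], 2))
        for key in totals
    }

# Hypothetical rows, for illustration only
sample_rows = [("Clothing", True), ("Clothing", False), ("Electronics", True)]
for category, stats in sorted(issue_percentages(sample_rows).items()):
    print(category, stats)
```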

## Step 5: Identify the most common issue in the Clothing category

To answer this question, we first use [AI_FILTER](https://docs.snowflake.com/sql-reference/functions/ai_filter) to keep only the reviews that mention a product issue, as in Step 4, and store them in a temporary table.

Next, we join to the product catalog, restrict the results to the Clothing category, and use the [AI_AGG](https://docs.snowflake.com/sql-reference/functions/ai_agg) function to get a list of all product issues mentioned.

In [None]:
CREATE OR REPLACE TEMP TABLE filtered_product_reviews AS
SELECT *
FROM product_reviews
WHERE AI_FILTER(PROMPT('This review mentions a product issue or complaint: {0}', review_text));

In [None]:
SELECT
  AI_AGG(
    review_text, 
    'Analyze these clothing product reviews and provide a comprehensive list of all product issues mentioned. Format your response as a bulleted list of issues with their approximate frequency in percentage.'
  ) as clothing_issues
FROM filtered_product_reviews pr
JOIN product_catalog pc ON pr.product_id = pc.product_id
WHERE pc.category = 'Clothing';

## Step 6: Productionize the pipeline
With the issues surfaced by the [AI_AGG](https://docs.snowflake.com/sql-reference/functions/ai_agg) query above, use [AI_CLASSIFY](https://docs.snowflake.com/sql-reference/functions/ai_classify) to turn them into a continuous data pipeline that keeps classifying incoming reviews into known issue categories such as 'Sizing issue' or 'Color issue'. This helps you systematically track and report on known problems.

In [None]:
WITH clothing_issue_reviews AS (
  SELECT 
    pr.review_id,
    pr.review_text
  FROM filtered_product_reviews pr
  JOIN product_catalog pc ON pr.product_id = pc.product_id
  WHERE pc.category = 'Clothing'
),
classified_reviews AS (
  SELECT 
    review_id,
    review_text,
    AI_CLASSIFY(
      review_text, 
      [
        'Sizing issue', 
        'Color issue', 
        'Fabric quality issue',
        'Washing problem',
        'Pricing issue'
      ]
    ) as classification
  FROM clothing_issue_reviews
)
SELECT 
  review_id,
  review_text,
  classification:labels[0]::text as issue_category
FROM classified_reviews;
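
If you pull the classified rows into Python, the first label can be extracted and tallied along the following lines. This is a sketch under assumptions: the helper names are hypothetical, the sample values are invented, and each `classification` value is assumed to arrive as a JSON string (or parsed dict) with a `labels` array, which is the shape the `classification:labels[0]` path in the query above relies on.

```python
import json
from collections import Counter

def first_label(classification):
    """Return the first label of an AI_CLASSIFY-style result, or None if absent."""
    # Accept either a JSON string or an already-parsed dict
    result = json.loads(classification) if isinstance(classification, str) else classification
    labels = (result or {}).get("labels") or []
    return labels[0] if labels else None

def tally_issues(classifications):
    """Count reviews per first label, skipping rows without a label."""
    labels = (first_label(c) for c in classifications)
    return Counter(label for label in labels if label is not None)

# Hypothetical classifier outputs, for illustration only
sample = [
    '{"labels": ["Sizing issue"]}',
    '{"labels": ["Color issue"]}',
    '{"labels": ["Sizing issue"]}',
    '{"labels": []}',
]
print(tally_issues(sample).most_common())
```

Taking only the first label mirrors the SQL above; if a review can carry several labels, you would count every entry in `labels` instead.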


## Step 7: Generate responses to customer complaints

Finally, let's close the loop. You can use `AI_COMPLETE` to help your support team draft empathetic and relevant responses to negative reviews, improving customer satisfaction at scale.

In [None]:
WITH clothing_issue_reviews AS (
  SELECT 
    pr.review_id,
    pr.review_text
  FROM filtered_product_reviews pr
  JOIN product_catalog pc ON pr.product_id = pc.product_id
  WHERE pc.category = 'Clothing'
)
SELECT 
  review_id,
  review_text,
  AI_COMPLETE('llama4-maverick', 'Please help me draft a concise response to the customer complaints below. Please only include the draft and nothing else: ' || review_text) as response
FROM clothing_issue_reviews;


## Key Takeaways

* **End-to-End Workflow**: You can chain Cortex AI functions together (`SENTIMENT` -> `AI_AGG` -> `AI_FILTER` -> `AI_CLASSIFY` -> `AI_COMPLETE`) to build a powerful analysis pipeline entirely within Snowflake.
* **Insight from Unstructured Data**: You don't need complex data science tools to extract valuable insights from text. All of this was done with familiar SQL.
* **Automate and Scale**: By identifying common issues and creating classifiers, you can automate the process of tracking feedback and responding to customers more efficiently.

## Additional Resources

* [Documentation: Cortex AI SQL Functions](https://docs.snowflake.com/en/user-guide/snowflake-cortex/aisql)