# Build a Customer Review Analytics Dashboard with Streamlit on Snowflake

In this notebook, we process the Avalanche customer review data. By the end of the tutorial, we'll have created a few data visualizations that give insight into the overall sentiment toward the products.

## Avalanche data

The Avalanche data set is based on a hypothetical company that sells winter sports gear. As a whole, the data set comprises the product catalog, customer reviews, shipping logistics, and order history.

In this notebook, we'll use only the customer review data. We'll start from customer review files in DOCX format stored on a stage. Next, we'll parse and reshape the data into a more structured form. Finally, we'll apply LLMs for language translation, text summarization, and sentiment analysis.

## Retrieve customer review data

First, we query and parse the content of the DOCX files stored on the `@avalanche_db.avalanche_schema.customer_reviews` stage.

In [None]:
-- Parse content from DOCX files
WITH files AS (
  SELECT 
    REPLACE(REGEXP_SUBSTR(file_url, '[^/]+$'), '%2e', '.') as filename
  FROM DIRECTORY('@avalanche_db.avalanche_schema.customer_reviews')
  WHERE filename LIKE '%.docx'
)
SELECT 
  filename,
  SNOWFLAKE.CORTEX.PARSE_DOCUMENT(
    @avalanche_db.avalanche_schema.customer_reviews,
    filename,
    {'mode': 'layout'}
  ):content AS layout
FROM files;
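
The file name extraction in the `files` CTE can be sketched in plain Python to show what the two string operations do: the pattern `[^/]+$` keeps the last path segment of the URL, and the replacement turns the encoded `%2e` back into a dot. The helper name `filename_from_url` and the example URL are hypothetical and only serve as an illustration; the actual `file_url` values on your stage may look different.

```python
import re


def filename_from_url(file_url: str) -> str:
    """Return the last path segment of a URL, decoding '%2e' back to '.'."""
    # Same pattern as the SQL: one or more non-slash characters at the end
    match = re.search(r"[^/]+$", file_url)
    if match is None:
        return ""  # The URL ends with '/', so there is no file name segment
    # Same replacement as the SQL REPLACE call (case-sensitive)
    return match.group(0).replace("%2e", ".")


# Hypothetical file URL, for illustration only
example_url = "https://example.invalid/files/CUSTOMER_REVIEWS/review-001%2edocx"
print(filename_from_url(example_url))
```

Only the lowercase `%2e` is replaced, which matches the case-sensitive behavior of the SQL `REPLACE` call above.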

## Data reshaping

We reshape the data into a more structured form by using regular expressions to create additional columns from the `LAYOUT` column.

In [None]:
-- Extract PRODUCT name, DATE, and CUSTOMER_REVIEW from the LAYOUT column
SELECT 
  filename,
  REGEXP_SUBSTR(layout, 'Product: (.*?) Date:', 1, 1, 'e') as product,
  REGEXP_SUBSTR(layout, 'Date: (202[0-9]-[0-9]{2}-[0-9]{2})', 1, 1, 'e') as date,
  REGEXP_SUBSTR(layout, '## Customer Review\n([\\s\\S]*?)$', 1, 1, 'es') as customer_review
FROM {{sql1}};
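
To see what the three patterns capture, here is a minimal Python sketch that applies equivalent regular expressions to a made-up layout string. The sample text and the helper names `first_group` and `extract_fields` are assumptions for illustration; the real `LAYOUT` values come from `PARSE_DOCUMENT` and may differ in spacing. Taking `group(1)` plays the role of the `'e'` parameter in `REGEXP_SUBSTR`, which returns the captured group rather than the whole match.

```python
import re

# Hypothetical layout text, shaped the way the SQL patterns expect it
sample_layout = (
    "Product: Alpine Skis Date: 2023-10-15\n"
    "## Customer Review\n"
    "Great skis, very stable at speed.\n"
    "Would buy again."
)


def first_group(pattern: str, text: str):
    """Return the first capture group of the first match, or None."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


def extract_fields(layout: str) -> dict:
    """Split a layout string into product, date, and customer review."""
    return {
        "product": first_group(r"Product: (.*?) Date:", layout),
        "date": first_group(r"Date: (202[0-9]-[0-9]{2}-[0-9]{2})", layout),
        # [\s\S] matches any character including newlines, up to the end
        "customer_review": first_group(r"## Customer Review\n([\s\S]*?)$", layout),
    }


fields = extract_fields(sample_layout)
print(fields["product"], fields["date"])
```

A pattern that finds no match yields `None` in this sketch, which is comparable to the `NULL` that `REGEXP_SUBSTR` returns in that case.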

## Apply Cortex LLM on customer review data

Here, we'll apply Cortex LLM functions to perform the following three tasks:
- Translation: reviews written in other languages are translated to English.
- Summarization: the translated text is condensed into a more concise summary.
- Sentiment analysis: a sentiment score is calculated to indicate whether the review is positive or negative.

In [None]:
-- Perform translation, summarization and sentiment analysis on customer review
SELECT 
    product,
    date,
    SNOWFLAKE.CORTEX.TRANSLATE(customer_review, '', 'en') as translated_review,
    SNOWFLAKE.CORTEX.SUMMARIZE(translated_review) as summary,
    SNOWFLAKE.CORTEX.SENTIMENT(translated_review) as sentiment_score
FROM {{sql2}}
ORDER BY date;

## Convert SQL output to Pandas DataFrame

Here, we'll convert the SQL output to a pandas DataFrame by calling the `to_pandas()` method and store it in `df`, which the chart cells below use.

In [None]:
df = sql3.to_pandas()
df

## Bar charts

Here, we create bar charts of the sentiment scores.

### Daily sentiment scores

Note: Positive values are shown in green and negative values in red.

In [None]:
import streamlit as st
import altair as alt
import pandas as pd

# Ensure SENTIMENT_SCORE is numeric
df['SENTIMENT_SCORE'] = pd.to_numeric(df['SENTIMENT_SCORE'])

# Create the base chart with bars
chart = alt.Chart(df).mark_bar(size=15).encode(
    x=alt.X('DATE:T',
            axis=alt.Axis(
                format='%Y-%m-%d',  # YYYY-MM-DD format
                labelAngle=90)  # Rotate labels 90 degrees
            ),
    y=alt.Y('SENTIMENT_SCORE:Q'),
    color=alt.condition(
        alt.datum.SENTIMENT_SCORE >= 0,
        alt.value('#2ecc71'),  # green for positive
        alt.value('#e74c3c')   # red for negative
    ),
    tooltip=['PRODUCT:N', 'DATE:T'] # Add tooltip
).properties(
    height=500
)

# Display the chart
st.altair_chart(chart, use_container_width=True)
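
The color rule used in the chart above can be sketched as a plain Python function, which makes the treatment of a score of exactly zero explicit. The function name `bar_color` is hypothetical; the hex values and the `>= 0` threshold are taken from the `alt.condition` call.

```python
GREEN = "#2ecc71"  # Used for scores that are zero or positive
RED = "#e74c3c"    # Used for negative scores


def bar_color(sentiment_score: float) -> str:
    """Mirror the chart's condition: score >= 0 is green, otherwise red."""
    return GREEN if sentiment_score >= 0 else RED


for score in (0.6, 0.0, -0.3):
    print(score, bar_color(score))
```

Because the condition is `>= 0`, a neutral score of zero is drawn in green rather than red.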

### Product sentiment scores

In [None]:
import streamlit as st
import altair as alt
import pandas as pd

# Create the base chart with aggregation by PRODUCT
bars = alt.Chart(df).mark_bar(size=15).encode(
    y=alt.Y('PRODUCT:N', 
            axis=alt.Axis(
                labelAngle=0,  # Horizontal labels
                labelOverlap=False,  # Prevent label overlap
                labelPadding=10  # Add some padding
            )
    ),
    x=alt.X('mean(SENTIMENT_SCORE):Q',  # Aggregate mean sentiment score
            title='MEAN SENTIMENT_SCORE'),
    color=alt.condition(
        alt.datum.mean_SENTIMENT_SCORE >= 0,
        alt.value('#2ecc71'),  # green for positive
        alt.value('#e74c3c')   # red for negative
    ),
    tooltip=['PRODUCT:N', 'mean(SENTIMENT_SCORE):Q']
).properties(
    height=400
)

# Display the chart
st.altair_chart(bars, use_container_width=True)
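
The `mean(SENTIMENT_SCORE)` encoding asks the chart to average the scores per product before drawing the bars. As a rough sketch of that aggregation, the plain Python below groups a few made-up rows by product and averages them. The rows and the function name `mean_sentiment_by_product` are assumptions for illustration, not values from the Avalanche data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows, shaped like the PRODUCT and SENTIMENT_SCORE columns of df
rows = [
    {"PRODUCT": "Alpine Skis", "SENTIMENT_SCORE": 0.5},
    {"PRODUCT": "Alpine Skis", "SENTIMENT_SCORE": -0.25},
    {"PRODUCT": "Thermal Gloves", "SENTIMENT_SCORE": -0.5},
]


def mean_sentiment_by_product(rows: list) -> dict:
    """Group scores by product and return the mean score for each product."""
    scores = defaultdict(list)
    for row in rows:
        scores[row["PRODUCT"]].append(row["SENTIMENT_SCORE"])
    return {product: mean(values) for product, values in scores.items()}


averages = mean_sentiment_by_product(rows)
print(averages)
```

With a pandas DataFrame, the same result could be obtained with `df.groupby('PRODUCT')['SENTIMENT_SCORE'].mean()`.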

In [None]:
# Download button for the CSV file
st.subheader('Processed Customer Reviews Data')
st.download_button(
    label="Download CSV",
    data=df[['PRODUCT', 'DATE', 'SUMMARY', 'SENTIMENT_SCORE']].to_csv(index=False).encode('utf-8'),
    mime="text/csv"
)