# Text Analytics with Cortex AISQL

This notebook demonstrates text-focused AISQL functions:
- **AI_COMPLETE**: Generate completions for text prompts
- **AI_FILTER**: Boolean classification for filtering data
- **AI_CLASSIFY**: Assign text to one of several categories
- **AI_SENTIMENT**: Classify sentiment (positive, negative, neutral, or mixed)
- **AI_EXTRACT**: Extract structured information
- **SUMMARIZE**: Summarize text content


In [1]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from snowflake.snowpark import Session
from IPython.display import display, Markdown, HTML

# Try to get active session (for Snowflake Notebooks)
# Otherwise, connect using ~/.snowflake/connections.toml
try:
    from snowflake.snowpark.context import get_active_session
    session = get_active_session()
    print("✅ Connected using Snowflake Notebooks session")
except Exception:
    # Fallback: Connect using connections.toml
    import toml
    from pathlib import Path
    
    # Load connection parameters from ~/.snowflake/connections.toml
    toml_path = Path.home() / ".snowflake" / "connections.toml"
    
    if toml_path.exists():
        connections = toml.load(toml_path)
        # Use the default connection or specify a different one
        connection_name = "default"  # Change this to your connection name if needed
        
        if connection_name in connections:
            conn_params = connections[connection_name]
            session = Session.builder.configs(conn_params).create()
            print(f"✅ Connected using connection profile: {connection_name}")
        else:
            raise Exception(f"❌ Connection '{connection_name}' not found in {toml_path}")
    else:
        print(f"❌ Connection file not found at {toml_path}")
        print("Please create ~/.snowflake/connections.toml with your Snowflake credentials")
        print("""
# Example ~/.snowflake/connections.toml format:
[default]
account = "your_account"
user = "your_username"
password = "your_password"
warehouse = "AISQL_WH"
database = "AISQL_DB"
schema = "AISQL_SCHEMA"
        """)
        raise Exception("Connection file not found")


✅ Connected using connection profile: vinodshiv


In [2]:
# Set context
session.sql("USE DATABASE AISQL_DB").collect()
session.sql("USE SCHEMA AISQL_SCHEMA").collect()
session.sql("USE WAREHOUSE AISQL_WH").collect()


[Row(status='Statement executed successfully.')]

## 1. AI_COMPLETE: Text Generation

Generate draft customer-service responses to customer emails


In [7]:
# Generate responses to customer emails
sql = """
SELECT 
    ticket_id,
    SUBSTR(content, 1, 150) as customer_message,
    AI_COMPLETE('claude-3-7-sonnet', 
        'Generate a professional customer service response to this inquiry: ' || content) as ai_response
FROM emails
LIMIT 5
"""

df = session.sql(sql).to_pandas()
display(Markdown("### AI-Generated Customer Responses"))
display(df)


### AI-Generated Customer Responses

|  | TICKET_ID | CUSTOMER_MESSAGE | AI_RESPONSE |
|---|---|---|---|
| 0 | 1830 | I'd appreciate a refund for the unused Saturda... | "# Response to Refund Request - Order #TR78945... |
| 1 | 1462 | Also, quick heads up - there seems to be a gli... | "# Response to Customer Inquiry\n\nThank you f... |
| 2 | 177 | While I have you, I wanted to share some feedb... | "# Customer Service Response\n\nThank you for ... |
| 3 | 632 | Also, the new paperless ticket system is gener... | "# Response to Customer Inquiry\n\nThank you f... |
| 4 | 1813 | Also, I purchased tickets for the Summer Elect... | "# Customer Service Response\n\nDear Valued Cu... |


## 2. AI_FILTER: Boolean Text Classification

Classify each row as TRUE or FALSE against a natural-language condition.

`AI_FILTER` returns a boolean, so it can be used in the `SELECT` list to label rows and in the `WHERE` clause to keep only the rows that match.


In [8]:
# Filter emails that express satisfaction
# For text, use PROMPT to combine the predicate with the content
sql = """
SELECT 
    ticket_id,
    SUBSTR(content, 1, 150) as content_preview,
    AI_FILTER(PROMPT('Does the customer sound satisfied or happy? {0}', content)) as is_satisfied
FROM emails
WHERE AI_FILTER(PROMPT('Does the customer sound satisfied or happy? {0}', content)) = TRUE
LIMIT 20
"""

df_filter = session.sql(sql).to_pandas()
display(Markdown("### Satisfied Customer Emails"))
display(df_filter)

# Count satisfied vs unsatisfied
sql_counts = """
SELECT 
    AI_FILTER(PROMPT('Does the customer sound satisfied or happy? {0}', content)) as is_satisfied,
    COUNT(*) as count
FROM emails
GROUP BY is_satisfied
"""

df_counts = session.sql(sql_counts).to_pandas()
display(Markdown("### Satisfaction Distribution"))
display(df_counts)

# Visualize
fig = px.pie(df_counts, 
             values='COUNT', 
             names='IS_SATISFIED',
             title='Customer Satisfaction Distribution',
             color='IS_SATISFIED',
             color_discrete_map={True: 'green', False: 'red'})
fig.update_layout(height=400)
fig.show()


### Satisfied Customer Emails

|  | TICKET_ID | CONTENT_PREVIEW | IS_SATISFIED |
|---|---|---|---|
| 0 | 202 | That said, I absolutely loved the production q... | True |
| 1 | 914 | I'd appreciate a response within 48 hours as t... | True |
| 2 | 1202 | I had an amazing time at the symphonic metal s... | True |
| 3 | 1228 | I'd greatly appreciate it if you could assist ... | True |


### Satisfaction Distribution

|  | IS_SATISFIED | COUNT |
|---|---|---|
| 0 | False | 46 |
| 1 | True | 4 |
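The counts above are easier to read as shares of the total. The following is a minimal sketch in plain Python, with the two counts copied by hand from the output above rather than taken from `df_counts`; it also shows why the `LIMIT 20` query returned only four rows.

```python
# Counts copied from the "Satisfaction Distribution" output above.
counts = {False: 46, True: 4}

total = sum(counts.values())
# Percentage of emails in each group, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"is_satisfied={label}: {counts[label]} of {total} ({pct}%)")
```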


### More AI_FILTER Examples

Filter emails by different criteria


In [9]:
# Filter urgent requests
sql = """
SELECT 
    ticket_id,
    SUBSTR(content, 1, 150) as content_preview,
    AI_FILTER(PROMPT('Is this request urgent or time-sensitive? {0}', content)) as is_urgent
FROM emails
WHERE AI_FILTER(PROMPT('Is this request urgent or time-sensitive? {0}', content)) = TRUE
LIMIT 15
"""

df_urgent = session.sql(sql).to_pandas()
display(Markdown("### Urgent Requests"))
display(df_urgent)

# Filter refund requests
sql = """
SELECT 
    ticket_id,
    SUBSTR(content, 1, 150) as content_preview,
    AI_FILTER(PROMPT('Does the customer want a refund? {0}', content)) as wants_refund
FROM emails
WHERE AI_FILTER(PROMPT('Does the customer want a refund? {0}', content)) = TRUE
LIMIT 15
"""

df_refund = session.sql(sql).to_pandas()
display(Markdown("### Refund Requests"))
display(df_refund)

# Filter emails that mention technical issues or bugs
sql = """
SELECT 
    ticket_id,
    SUBSTR(content, 1, 150) as content_preview,
    AI_FILTER(PROMPT('Does this email mention technical issues or bugs? {0}', content)) as has_technical_issue
FROM emails
WHERE AI_FILTER(PROMPT('Does this email mention technical issues or bugs? {0}', content)) = TRUE
LIMIT 15
"""

df_technical = session.sql(sql).to_pandas()
display(Markdown("### Technical Issues"))
display(df_technical)


### Urgent Requests

|  | TICKET_ID | CONTENT_PREVIEW | IS_URGENT |
|---|---|---|---|
| 0 | 1813 | Also, I purchased tickets for the Summer Elect... | True |
| 1 | 466 | I'm also wondering if you could clarify your r... | True |
| 2 | 1033 | I urgently need assistance with transferring m... | True |
| 3 | 681 | Also, I purchased VIP passes for the summer mu... | True |
| 4 | 1399 | On a separate note, I processed a refund for t... | True |
| 5 | 66 | Also, heads up - the drink prices shown on you... | True |
| 6 | 519 | I've been trying to transfer my tickets for ne... | True |
| 7 | 914 | I'd appreciate a response within 48 hours as t... | True |
| 8 | 1164 | Also, quick question - my payment for the Summ... | True |
| 9 | 607 | This is really frustrating because I spent $3... | True |


### Refund Requests

|  | TICKET_ID | CONTENT_PREVIEW | WANTS_REFUND |
|---|---|---|---|
| 0 | 1830 | I'd appreciate a refund for the unused Saturda... | True |
| 1 | 1462 | Also, quick heads up - there seems to be a gli... | True |
| 2 | 177 | While I have you, I wanted to share some feedb... | True |
| 3 | 1813 | Also, I purchased tickets for the Summer Elect... | True |
| 4 | 466 | I'm also wondering if you could clarify your r... | True |
| 5 | 681 | Also, I purchased VIP passes for the summer mu... | True |
| 6 | 1035 | Here is the rewritten version:\n\nI am writing... | True |
| 7 | 1399 | On a separate note, I processed a refund for t... | True |
| 8 | 890 | I attended the Summer Vibes Festival last week... | True |
| 9 | 66 | Also, heads up - the drink prices shown on you... | True |


### Technical Issues

|  | TICKET_ID | CONTENT_PREVIEW | HAS_TECHNICAL_ISSUE |
|---|---|---|---|
| 0 | 1462 | Also, quick heads up - there seems to be a gli... | True |
| 1 | 177 | While I have you, I wanted to share some feedb... | True |
| 2 | 632 | Also, the new paperless ticket system is gener... | True |
| 3 | 1310 | I attended the electronic music festival at Mo... | True |
| 4 | 66 | Also, heads up - the drink prices shown on you... | True |
| 5 | 519 | I've been trying to transfer my tickets for ne... | True |
| 6 | 88 | There was also a small issue with the sound sy... | True |
| 7 | 202 | That said, I absolutely loved the production q... | True |
| 8 | 1088 | During last week's show, we encountered a mino... | True |
| 9 | 92 | While I have you, I just wanted to say that th... | True |
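The three queries in this cell differ only in the question and the output column name, and each writes the same `AI_FILTER` call twice. The sketch below is one possible way to factor that out. It is not part of the original notebook, and `build_filter_query` is a hypothetical helper name. It writes the `AI_FILTER` expression once in a subquery and filters on the resulting column. How many times Snowflake actually evaluates the call in either form is up to the query optimizer, so treat this mainly as a readability change.

```python
def build_filter_query(question: str, alias: str, limit: int = 15) -> str:
    """Build a query that writes the AI_FILTER call once and filters on its result.

    `question` is the natural-language predicate; `alias` names the boolean column.
    Hypothetical helper, not a Snowflake or Snowpark API.
    """
    if not alias.isidentifier():
        raise ValueError(f"alias must be a simple identifier, got {alias!r}")
    # Double any single quotes so the question is a valid SQL string literal.
    literal = question.replace("'", "''")
    return f"""
SELECT ticket_id, content_preview, {alias}
FROM (
    SELECT
        ticket_id,
        SUBSTR(content, 1, 150) AS content_preview,
        AI_FILTER(PROMPT('{literal} {{0}}', content)) AS {alias}
    FROM emails
)
WHERE {alias}
LIMIT {int(limit)}
"""


sql = build_filter_query("Does the customer want a refund?", "wants_refund")
print(sql)
# In the notebook this could then be run with: session.sql(sql).to_pandas()
```

Validating `alias` and escaping quotes in `question` keeps the generated SQL well-formed when the inputs come from somewhere other than a hard-coded string.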


## 3. AI_CLASSIFY: Multi-Category Text Classification

Assign each customer email one label from a list of categories (in contrast to AI_FILTER's boolean result)


In [10]:
# Classify emails by issue type
sql = """
SELECT 
    ticket_id,
    SUBSTR(content, 1, 100) as content_preview,
    AI_CLASSIFY(
        'Classify this support ticket into one of these categories: ' || content,
        ARRAY_CONSTRUCT('Billing', 'Technical Issue', 'Event Inquiry', 'Refund Request', 'General Question')
    )['labels'][0] as category
FROM emails
LIMIT 50
"""

df_classify = session.sql(sql).to_pandas()
display(Markdown("### Email Classification"))
display(df_classify.head(10))

# Visualize category distribution
category_counts = df_classify['CATEGORY'].value_counts().reset_index()
category_counts.columns = ['Category', 'Count']

display(Markdown("### Category Distribution"))
fig = px.bar(category_counts, 
             x='Count', 
             y='Category', 
             orientation='h',
             color='Category',
             title='Email Category Distribution')
fig.update_layout(showlegend=False, height=400)
fig.show()


### Email Classification

|  | TICKET_ID | CONTENT_PREVIEW | CATEGORY |
|---|---|---|---|
| 0 | 1830 | I'd appreciate a refund for the unused Saturda... | "Refund Request" |
| 1 | 1462 | Also, quick heads up - there seems to be a gli... | "Technical Issue" |
| 2 | 177 | While I have you, I wanted to share some feedb... | "Technical Issue" |
| 3 | 632 | Also, the new paperless ticket system is gener... | "Technical Issue" |
| 4 | 1813 | Also, I purchased tickets for the Summer Elect... | "Refund Request" |
| 5 | 1320 | Additionally, I wanted to provide some feedbac... | "General Question" |
| 6 | 466 | I'm also wondering if you could clarify your r... | "Event Inquiry" |
| 7 | 603 | One suggestion - it would be amazing if you co... | "Event Inquiry" |
| 8 | 1033 | I urgently need assistance with transferring m... | "Event Inquiry" |
| 9 | 681 | Also, I purchased VIP passes for the summer mu... | "Billing" |


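### Category Distribution

[Figure: horizontal bar chart "Email Category Distribution" (x: Count, y: Category); not captured in this export]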
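The CATEGORY values in the table above still carry their JSON double quotes, because `['labels'][0]` is returned without a string cast, so the quotes would also appear in the chart labels. One option is to add `::STRING` in the SQL, as the sentiment cell later does. Another is to decode the values in Python. The following is a minimal sketch of the second option; `clean_label` is a hypothetical helper and the labels are copied by hand from the ten rows shown above.

```python
import json
from collections import Counter

# Labels as they appear in the CATEGORY column above (JSON-quoted strings).
raw_labels = [
    '"Refund Request"', '"Technical Issue"', '"Technical Issue"',
    '"Technical Issue"', '"Refund Request"', '"General Question"',
    '"Event Inquiry"', '"Event Inquiry"', '"Event Inquiry"', '"Billing"',
]


def clean_label(value: str) -> str:
    """Decode a JSON-quoted label; return the input unchanged if it is not JSON."""
    try:
        decoded = json.loads(value)
    except (TypeError, ValueError):
        return value
    return decoded if isinstance(decoded, str) else value


counts = Counter(clean_label(v) for v in raw_labels)
print(counts.most_common())
```

In the notebook, the same function could be applied with `df_classify['CATEGORY'].map(clean_label)` before calling `value_counts()`.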

## 4. AI_SENTIMENT: Sentiment Analysis

Analyze sentiment of customer emails


In [11]:
# Analyze sentiment
# AI_SENTIMENT returns an OBJECT with categories array containing overall sentiment
sql = """
SELECT 
    ticket_id,
    user_id,
    SUBSTR(content, 1, 150) as content_preview,
    AI_SENTIMENT(content) as sentiment_result,
    AI_SENTIMENT(content)['categories'][0]['sentiment']::STRING as overall_sentiment
FROM emails
LIMIT 100
"""

df_sentiment = session.sql(sql).to_pandas()
display(Markdown("### Sentiment Analysis"))
display(df_sentiment.head(10))

# Sentiment distribution
sentiment_dist = df_sentiment['OVERALL_SENTIMENT'].value_counts().reset_index()
sentiment_dist.columns = ['Sentiment', 'Count']

display(Markdown("### Sentiment Distribution"))
# Map sentiment values to colors
color_map = {
    'positive': 'green',
    'negative': 'red', 
    'neutral': 'gray',
    'mixed': 'orange',
    'unknown': 'lightgray'
}
fig = px.pie(sentiment_dist, 
             values='Count', 
             names='Sentiment',
             title='Overall Sentiment Distribution',
             color='Sentiment',
             color_discrete_map=color_map)
fig.update_layout(height=400)
fig.show()


### Sentiment Analysis

|  | TICKET_ID | USER_ID | CONTENT_PREVIEW | SENTIMENT_RESULT | OVERALL_SENTIMENT |
|---|---|---|---|---|---|
| 0 | 1830 | 14 | I'd appreciate a refund for the unused Saturda... | {\n "categories": [\n {\n "name": "ov... | neutral |
| 1 | 1462 | 29 | Also, quick heads up - there seems to be a gli... | {\n "categories": [\n {\n "name": "ov... | neutral |
| 2 | 177 | 166 | While I have you, I wanted to share some feedb... | {\n "categories": [\n {\n "name": "ov... | mixed |
| 3 | 632 | 57 | Also, the new paperless ticket system is gener... | {\n "categories": [\n {\n "name": "ov... | mixed |
| 4 | 1813 | 91 | Also, I purchased tickets for the Summer Elect... | {\n "categories": [\n {\n "name": "ov... | neutral |
| 5 | 1320 | 149 | Additionally, I wanted to provide some feedbac... | {\n "categories": [\n {\n "name": "ov... | mixed |
| 6 | 466 | 16 | I'm also wondering if you could clarify your r... | {\n "categories": [\n {\n "name": "ov... | neutral |
| 7 | 603 | 128 | One suggestion - it would be amazing if you co... | {\n "categories": [\n {\n "name": "ov... | positive |
| 8 | 1033 | 649 | I urgently need assistance with transferring m... | {\n "categories": [\n {\n "name": "ov... | neutral |
| 9 | 681 | 139 | Also, I purchased VIP passes for the summer mu... | {\n "categories": [\n {\n "name": "ov... | negative |
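The query reads the overall sentiment by position (`['categories'][0]`). If the raw SENTIMENT_RESULT is kept as a JSON string, the same value can be looked up by name in Python, which does not depend on element order. This is a sketch under stated assumptions: the payload shape follows the query above, the category name `"overall"` is assumed (the table truncates it to `"ov...`), and `overall_sentiment` is a hypothetical helper.

```python
import json
from typing import Optional


def overall_sentiment(result_json: str) -> Optional[str]:
    """Return the sentiment of the category named "overall", or None if absent."""
    try:
        payload = json.loads(result_json)
    except (TypeError, ValueError):
        return None
    if not isinstance(payload, dict):
        return None
    # Assumption: each category is an object with "name" and "sentiment" keys.
    for category in payload.get("categories") or []:
        if isinstance(category, dict) and category.get("name") == "overall":
            return category.get("sentiment")
    return None


# Hypothetical payload; the real SENTIMENT_RESULT values are truncated above.
sample = '{"categories": [{"name": "overall", "sentiment": "mixed"}]}'
print(overall_sentiment(sample))
```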


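### Sentiment Distribution

[Figure: pie chart "Overall Sentiment Distribution" (slices by Sentiment, sized by Count); not captured in this export]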

## 5. AI_EXTRACT: Information Extraction

Extract structured information from unstructured text


In [12]:
# Extract structured information from emails
# AI_EXTRACT requires responseFormat as an object, array, or JSON schema
sql = """
SELECT 
    ticket_id,
    SUBSTR(content, 1, 150) as content_preview,
    AI_EXTRACT(
        content, 
        {
            'main_issue': 'What is the main issue or problem? Be concise.',
            'requested_action': 'What action does the customer want? Be specific.',
            'urgency': 'Is this urgent? Answer yes or no.'
        }
    ):response as extracted_info
FROM emails
LIMIT 20
"""

df_extract = session.sql(sql).to_pandas()
display(Markdown("### Extracted Information"))
display(df_extract)


### Extracted Information

|  | TICKET_ID | CONTENT_PREVIEW | EXTRACTED_INFO |
|---|---|---|---|
| 0 | 1830 | I'd appreciate a refund for the unused Saturda... | {\n "main_issue": "I'd appreciate a refund fo... |
| 1 | 1462 | Also, quick heads up - there seems to be a gli... | {\n "main_issue": "venue map isn't loading pr... |
| 2 | 177 | While I have you, I wanted to share some feedb... | {\n "main_issue": "payment processing seems s... |
| 3 | 632 | Also, the new paperless ticket system is gener... | {\n "main_issue": "app has been super glitchy... |
| 4 | 1813 | Also, I purchased tickets for the Summer Elect... | {\n "main_issue": "double-booked that weekend... |
| 5 | 1320 | Additionally, I wanted to provide some feedbac... | {\n "main_issue": "notifications about countr... |
| 6 | 466 | I'm also wondering if you could clarify your r... | {\n "main_issue": "None",\n "requested_actio... |
| 7 | 603 | One suggestion - it would be amazing if you co... | {\n "main_issue": "None",\n "requested_actio... |
| 8 | 1033 | I urgently need assistance with transferring m... | {\n "main_issue": "I urgently need assistance... |
| 9 | 681 | Also, I purchased VIP passes for the summer mu... | {\n "main_issue": "VIP passes for the summer ... |
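EXTRACTED_INFO comes back as one JSON object per row, and some fields hold the literal string `"None"` when nothing was found. The following is a minimal sketch of flattening one such value into separate fields. The field names come from the query above, while `flatten_extracted` and the sample payload are hypothetical, since the real values are truncated in the output.

```python
import json
from typing import Dict, Optional

# Keys requested in the AI_EXTRACT call above.
FIELDS = ("main_issue", "requested_action", "urgency")


def flatten_extracted(raw: str) -> Dict[str, Optional[str]]:
    """Turn one EXTRACTED_INFO JSON string into a flat dict, one key per field.

    The literal string "None" and missing keys both become Python None.
    """
    data = json.loads(raw)
    row: Dict[str, Optional[str]] = {}
    for field in FIELDS:
        value = data.get(field)
        row[field] = None if value in (None, "None") else value
    return row


# Hypothetical payload; the real values are truncated in the output above.
sample = (
    '{"main_issue": "None", '
    '"requested_action": "clarify the refund policy", '
    '"urgency": "no"}'
)
print(flatten_extracted(sample))
```

Applied row by row, the resulting dicts could be passed to `pd.DataFrame(...)` to get one column per extracted field.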


## 6. SUMMARIZE: Text Summarization

Generate concise summaries of customer emails


In [13]:
# Summarize long emails
sql = """
SELECT 
    ticket_id,
    content as original_content,
    SNOWFLAKE.CORTEX.SUMMARIZE(content) as summary,
    LENGTH(content) as original_length,
    LENGTH(SNOWFLAKE.CORTEX.SUMMARIZE(content)) as summary_length
FROM emails
WHERE LENGTH(content) > 200
ORDER BY original_length DESC
LIMIT 5
"""

df_summary = session.sql(sql).to_pandas()
display(Markdown("### Email Summaries"))

for idx, row in df_summary.iterrows():
    display(Markdown(f"#### Ticket {row['TICKET_ID']}"))
    display(Markdown("**Original:**"))
    display(Markdown(row['ORIGINAL_CONTENT']))
    display(Markdown("**Summary:**"))
    display(Markdown(row['SUMMARY']))
    display(Markdown("---"))


### Email Summaries

#### Ticket 1208

**Original:**

I recently attended the indie folk festival at Riverbank Arena and had a mixed experience. The acoustics were top-notch, and the sunset show on the Garden Stage was a highlight. However, I ran into trouble with the mobile ticketing system when trying to enter the venue. The QR code failed to load due to weak signal strength (see attached screenshot), resulting in a 20-minute wait. Unfortunately, I wasn't the only one - several other festival-goers experienced similar issues.

**Summary:**

I attended the indie folk festival at Riverbank Arena and had a mixed experience with great acoustics and a sunset show highlight, but faced ticketing issues due to weak signal strength causing QR code failure and long waits for several attendees.

---

#### Ticket 1520

**Original:**

While I have you, I wanted to mention that I've been a regular attendee at your monthly jazz nights for the past year, and I particularly love the intimate setting of your basement venue. The acoustics are fantastic! However, lately it feels like these events have become less frequent - we used to have them every first Friday, but the last two months were skipped without notice. Any chance you could keep us updated about scheduling changes through your newsletter?

**Summary:**

The speaker has attended the monthly jazz nights at the venue for a year, appreciates the intimate setting and good acoustics, but has noticed a decrease in frequency and requests updates through the newsletter.

---

#### Ticket 1320

**Original:**

Additionally, I wanted to provide some feedback about the new notification system. As someone who primarily listens to indie rock and alternative music, I'm getting bombarded with notifications about country music events. While I appreciate staying informed about upcoming shows, it would be great if the notifications were more tailored to my interests based on my previous ticket purchases or maybe a preference setting I could adjust.

**Summary:**

The user expressed feedback about the new notification system, mentioning an excess of country music notifications as a primary concern, and suggested personalized notifications based on previous ticket purchases or a preference setting.

---

#### Ticket 208

**Original:**

I attended last weekend's indie folk festival at Riverbank Arena and wanted to share my experience. While the acoustics were fantastic and I particularly enjoyed the sunset performance on the Garden Stage, I encountered some issues with the mobile ticketing system. During entry, the QR code wouldn't load properly due to poor reception (screenshot attached), causing a 20-minute delay. Several other attendees faced similar problems.

**Summary:**

Attended indie folk festival at Riverbank Arena last weekend, enjoyed the acoustics and sunset performance, but experienced issues with mobile ticketing system due to poor reception, causing delays for several attendees.

---

#### Ticket 1454

**Original:**

While I've got your attention, I also  wanted to mention that I love what you're doing with the new underground electronic music series. The monthly frequency is perfect, and as someone who's really into experimental techno and ambient, it's refreshing to see these genres getting proper representation. However, the sound system at The Blue Room venue could use some improvement, especially for the bass-heavy sets.

**Summary:**

The person expresses appreciation for the new electronic music series, enjoys the monthly frequency, and is a fan of experimental techno and ambient genres. They suggest improving the sound system at The Blue Room venue, particularly for bass-heavy sets.

---
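The query also returns `original_length` and `summary_length`, which the display loop does not use. One way to put them to work is a compression ratio per ticket. This is a sketch only: `compression_ratio` is a hypothetical helper, and the rows below are made-up numbers, not the lengths of the emails shown above.

```python
def compression_ratio(original_length: int, summary_length: int) -> float:
    """Return the summary length as a fraction of the original length."""
    if original_length <= 0:
        raise ValueError("original_length must be positive")
    return summary_length / original_length


# Hypothetical (ticket_id, original_length, summary_length) rows.
rows = [(1, 480, 240), (2, 400, 100)]

for ticket_id, original_length, summary_length in rows:
    ratio = compression_ratio(original_length, summary_length)
    print(f"Ticket {ticket_id}: summary is {ratio:.0%} of the original")
```

In the notebook, the same calculation could be written as `df_summary['SUMMARY_LENGTH'] / df_summary['ORIGINAL_LENGTH']`.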