# NovaConnect Snowflake Intelligence - Hands-On Lab

## Learning Objectives

This hands-on lab teaches you to analyze telecommunications data using Snowflake Cortex AI functions. You will:

1. Query and explore structured and unstructured data
2. Use Cortex AISQL functions for text analysis and translation
3. Combine customer, network, and call data for insights
4. Create visualizations to identify business opportunities
5. Leverage Cortex Search for semantic queries
6. Calculate business metrics and risk indicators

**Prerequisites:**
- `data_processing.ipynb` has been run successfully
- **Important:** `matplotlib` has been added to the notebook packages (every visualization in this lab depends on it)
  - Open the Snowflake notebook settings
  - Click the "Packages" dropdown
  - Type "matplotlib" and add it

**Duration:** 45-60 minutes


In [None]:
# Setup
from snowflake.snowpark import Session
import pandas as pd
import matplotlib.pyplot as plt

session = Session.builder.getOrCreate()
print(f"Connected: {session.get_current_database()}.{session.get_current_schema()}")


## Exercise 1: Data Exploration and Summary Statistics

**What you'll learn:**
- Query multiple tables to understand data volume
- View sample records from processed transcripts
- Verify data loading was successful

**Business value:** Understand the scope of available data for analysis


In [None]:
# Check all tables (using fully qualified names)
print("Data Volume Summary:\n")

session.sql("""
SELECT 'network_performance' AS TABLE_NAME, COUNT(*) AS RECORD_COUNT FROM TELCO_OPERATIONS_AI.DEFAULT_SCHEMA.network_performance
UNION ALL
SELECT 'customer_details', COUNT(*) FROM TELCO_OPERATIONS_AI.DEFAULT_SCHEMA.customer_details
UNION ALL
SELECT 'customer_call_transcripts', COUNT(*) FROM TELCO_OPERATIONS_AI.DEFAULT_SCHEMA.customer_call_transcripts
UNION ALL
SELECT 'customer_complaint_documents', COUNT(*) FROM TELCO_OPERATIONS_AI.DEFAULT_SCHEMA.customer_complaint_documents
UNION ALL
SELECT 'csat_surveys', COUNT(*) FROM TELCO_OPERATIONS_AI.DEFAULT_SCHEMA.csat_surveys
UNION ALL
SELECT 'customer_interaction_history', COUNT(*) FROM TELCO_OPERATIONS_AI.DEFAULT_SCHEMA.customer_interaction_history
ORDER BY RECORD_COUNT DESC
""").show()

# Sample calls
print("\nSample Customer Calls:")
session.sql("""
SELECT 
    call_id AS CALL_ID,
    call_reason AS CALL_REASON,
    summary AS SUMMARY
FROM TELCO_OPERATIONS_AI.DEFAULT_SCHEMA.customer_call_transcripts
LIMIT 5
""").show()


## Exercise 2: Multilingual Translation with AI_TRANSLATE

**What you'll learn:**
- Use AI_TRANSLATE to convert text between languages
- Support multilingual teams with automatic translation
- Understand supported language codes

**Business value:** Enable global teams to analyze customer feedback in their preferred language
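
If a regional team needs more target languages than the two used in this exercise, the translation columns can be generated instead of typed by hand. The sketch below only builds the SQL text. The helper `build_translation_query` and the alias names are hypothetical, and it assumes the same `AI_TRANSLATE(text, source, target)` call form that this lab uses.

```python
TABLE = "TELCO_OPERATIONS_AI.DEFAULT_SCHEMA.customer_call_transcripts"

def build_translation_query(targets, source_lang="en", limit=2):
    """Build a SELECT with one AI_TRANSLATE column per target language.

    targets maps a language code to the output column alias,
    for example {"zh": "CHINESE"} (hypothetical helper, not part of the lab).
    """
    if not source_lang.isalpha():
        raise ValueError(f"unexpected source language: {source_lang!r}")
    for code, alias in targets.items():
        # Only plain alphabetic codes and identifier-like aliases are interpolated
        if not (code.isalpha() and alias.isidentifier()):
            raise ValueError(f"unexpected language code or alias: {code!r}, {alias!r}")

    translated = ",\n".join(
        f"    SNOWFLAKE.CORTEX.AI_TRANSLATE(summary, '{source_lang}', '{code}') AS {alias}"
        for code, alias in targets.items()
    )
    return (
        "SELECT\n"
        "    call_id AS CALL_ID,\n"
        "    summary AS ENGLISH,\n"
        f"{translated}\n"
        f"FROM {TABLE}\n"
        f"LIMIT {int(limit)}"
    )

query = build_translation_query({"zh": "CHINESE", "ja": "JAPANESE", "ko": "KOREAN"})
print(query)
```

In the notebook, the resulting string could be passed to `session.sql(query).show()`.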


In [None]:
# Translate summaries to Chinese and Japanese (for regional teams)
# Note: AI_TRANSLATE supports: en, zh, ja, ko, es, fr, de, it, pt, etc.

print("Multi-Language Translation Example:\n")
print("Original Summaries:")
session.sql("""
SELECT 
    call_id AS CALL_ID,
    summary AS SUMMARY_ENGLISH
FROM TELCO_OPERATIONS_AI.DEFAULT_SCHEMA.customer_call_transcripts 
LIMIT 2
""").show()

print("\n" + "="*80)
print("Translations:")
print("="*80 + "\n")

# Show translations for first 2 call summaries
session.sql("""
SELECT 
    call_id AS CALL_ID,
    summary AS ENGLISH,
    SNOWFLAKE.CORTEX.AI_TRANSLATE(summary, 'en', 'zh') AS CHINESE,
    SNOWFLAKE.CORTEX.AI_TRANSLATE(summary, 'en', 'ja') AS JAPANESE
FROM TELCO_OPERATIONS_AI.DEFAULT_SCHEMA.customer_call_transcripts 
LIMIT 2
""").show()


## Exercise 3: Customer Sentiment Analysis with Visualization

**What you'll learn:**
- Extract sentiment from AI_SENTIMENT VARIANT results
- Aggregate sentiment across all customer calls
- Create bar charts to visualize sentiment distribution

**Business value:** Identify patterns in customer emotions and satisfaction levels
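
The query in this exercise reads the sentiment label through the VARIANT path `sentiment_score:categories[0]:sentiment`. As a rough illustration of what that path does, here is the same traversal in plain Python. The sample payload is hypothetical and the real `sentiment_score` values may carry additional fields.

```python
import json

# Hypothetical AI_SENTIMENT-style result; only the fields used by the lab query are shown
sample_variant = json.loads("""
{"categories": [{"name": "overall", "sentiment": "negative"}]}
""")

def overall_sentiment(variant):
    """Mirror sentiment_score:categories[0]:sentiment; return None if the path is missing."""
    categories = (variant or {}).get("categories") or []
    if not categories:
        return None
    return categories[0].get("sentiment")

print(overall_sentiment(sample_variant))
print(overall_sentiment({}))
```

Like the SQL path, the function yields a missing value rather than an error when `categories` is absent, which is why rows without a sentiment show up as a NULL group in the result.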


In [None]:
# Sentiment distribution
sentiment_query = """
SELECT 
    sentiment_score:categories[0]:sentiment::VARCHAR as SENTIMENT,
    COUNT(*) as COUNT
FROM TELCO_OPERATIONS_AI.DEFAULT_SCHEMA.customer_call_transcripts
GROUP BY sentiment
"""

# Display nicely with Snowflake native format
print("Customer Sentiment Distribution:\n")
session.sql(sentiment_query).show()

# Get data for chart
sentiment = session.sql(sentiment_query).to_pandas()

# Visualize (column names are uppercase from Snowflake)
colors = {'positive': 'green', 'negative': 'red', 'neutral': 'gray', 'mixed': 'orange'}
plt.figure(figsize=(8, 5))
plt.bar(sentiment['SENTIMENT'], sentiment['COUNT'], 
        color=[colors.get(s, 'blue') for s in sentiment['SENTIMENT']])
plt.ylabel('Number of Calls')
plt.title('Sentiment Distribution')
plt.show()


## Exercise 4: Customer Satisfaction (CSAT) Analysis

**What you'll learn:**
- Analyze CSAT score distribution across customer surveys
- Calculate average satisfaction metrics
- Create histograms to identify satisfaction trends

**Business value:** Measure customer satisfaction and track service quality performance
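
Because the CSAT query returns one row per score (not one row per survey), the average has to be weighted by the count of each score. A minimal sketch with made-up counts shows the difference between the weighted average and a plain mean of the score column:

```python
# Hypothetical (csat_score, count) rows, shaped like the output of the GROUP BY query
rows = [(1, 2), (2, 3), (3, 5), (4, 6), (5, 4)]

def weighted_average(score_counts):
    """Average score where each score is weighted by how many surveys reported it."""
    total = sum(count for _, count in score_counts)
    if total == 0:
        return None
    return sum(score * count for score, count in score_counts) / total

avg = weighted_average(rows)
naive = sum(score for score, _ in rows) / len(rows)  # ignores the counts

print(f"Weighted average CSAT: {avg:.2f}")
print(f"Unweighted mean of scores: {naive:.2f}")
```

With these sample counts the weighted average is 67 / 20 = 3.35, while the unweighted mean of the five scores is 3.00.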


In [None]:
# CSAT distribution
csat_query = """
SELECT csat_score AS CSAT_SCORE, 
       COUNT(*) as COUNT
FROM TELCO_OPERATIONS_AI.DEFAULT_SCHEMA.csat_surveys
GROUP BY csat_score
ORDER BY csat_score
"""

# Display nicely with Snowflake native format
print("CSAT Score Distribution:\n")
session.sql(csat_query).show()

# Get data for chart and average calculation
csat = session.sql(csat_query).to_pandas()

# Visualize
plt.figure(figsize=(8, 5))
plt.bar(csat['CSAT_SCORE'], csat['COUNT'], color='teal')
plt.xlabel('CSAT Score (1-5)')
plt.ylabel('Count')
plt.title('CSAT Distribution')
plt.xticks([1, 2, 3, 4, 5])
plt.show()

avg = (csat['CSAT_SCORE'] * csat['COUNT']).sum() / csat['COUNT'].sum()
print(f"\nAverage CSAT: {avg:.2f}")


## Exercise 5: Network Performance Analysis by Region

**What you'll learn:**
- Aggregate network metrics by geographic region
- Compare performance across Malaysian states
- Identify regions with network quality issues

**Business value:** Prioritize infrastructure investments based on regional performance


In [None]:
# Performance by region (use all available data)
perf_query = """
SELECT region AS REGION, 
       AVG(avg_latency_ms) as LATENCY,
       AVG(packet_loss_pct) as PACKET_LOSS
FROM TELCO_OPERATIONS_AI.DEFAULT_SCHEMA.network_performance
GROUP BY region
ORDER BY latency DESC
"""

# Display nicely with Snowflake native format
print("Network Performance by Region:\n")
session.sql(perf_query).show()

# Get data for chart
perf = session.sql(perf_query).to_pandas()

# Visualize
if len(perf) > 0:
    plt.figure(figsize=(10, 5))
    plt.barh(perf['REGION'], perf['LATENCY'])
    plt.xlabel('Average Latency (ms)')
    plt.title('Network Latency by Region')
    plt.tight_layout()
    plt.show()
else:
    print("No network performance data available")


## Exercise 6: Semantic Search with Cortex Search

**What you'll learn:**
- Perform semantic search across call transcripts using Cortex Search
- Use natural language queries to find relevant customer interactions
- Search without exact keyword matching

**Business value:** Quickly find relevant customer interactions using natural language queries

**Available Search Services:**
- `CALL_TRANSCRIPT_SEARCH` - Search across customer call transcripts
- `SUPPORT_TICKET_SEARCH` - Search across support tickets

**Note:** If the query fails because a search service does not exist, list the available services by running `SHOW CORTEX SEARCH SERVICES;` in a SQL cell.
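
The search call in this exercise returns a JSON document whose `results` array holds one object per hit, and the SQL uses `LATERAL FLATTEN` to turn that array into rows. The sketch below shows the same reshaping in plain Python. The response text is a hypothetical sample, and `flatten_results` is an illustrative helper, not a Snowflake API.

```python
import json

# Hypothetical response text; keys follow the columns requested in this exercise
raw = """
{"results": [
  {"CALL_ID": "CALL_001", "SPEAKER_ROLE": "customer", "CALL_TIMESTAMP": "2024-01-05 10:15:00"},
  {"CALL_ID": "CALL_002", "SPEAKER_ROLE": "agent"}
]}
"""

def flatten_results(response_text, columns):
    """Return one dict per search hit, keeping only the requested columns."""
    response = json.loads(response_text)
    return [
        {col: hit.get(col) for col in columns}  # missing keys become None, like NULL in SQL
        for hit in response.get("results", [])
    ]

rows = flatten_results(raw, ["CALL_ID", "SPEAKER_ROLE", "CALL_TIMESTAMP"])
for row in rows:
    print(row)
```

A key that is absent from a hit comes back as `None`, which corresponds to the NULL you see in SQL when a column is selected but was not returned by the service.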


In [None]:
# Search across call transcripts using Cortex Search
print("Searching for 'network problems' in customer call transcripts...\n")

session.sql("""
WITH search_response AS (
  SELECT PARSE_JSON(
    SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
        'TELCO_OPERATIONS_AI.DEFAULT_SCHEMA.CALL_TRANSCRIPT_SEARCH',
        '{
           "query": "network problems",
           "columns": ["CALL_ID", "SPEAKER_ROLE", "SENTIMENT_SCORE", "CALL_TIMESTAMP", "SEGMENT_TEXT"],
           "limit": 5
        }'
    )
  ) as response
)
SELECT 
    value:CALL_ID::VARCHAR as CALL_ID,
    value:SPEAKER_ROLE::VARCHAR as SPEAKER_ROLE,
    value:SENTIMENT_SCORE::FLOAT as SENTIMENT_SCORE,
    value:CALL_TIMESTAMP::TIMESTAMP as CALL_TIMESTAMP,
    value:SEGMENT_TEXT::VARCHAR as MATCHING_TEXT
FROM search_response,
LATERAL FLATTEN(input => response:results)
""").show()


## Exercise 7: Customer Risk Identification and Segmentation

**What you'll learn:**
- Use customer_360_view to identify at-risk customers
- Segment customers by risk level
- Visualize risk distribution across customer segments

**Business value:** Proactively identify customers likely to churn for retention interventions
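
Raw counts can mislead when segments differ in size, so it can help to look at the at-risk rate next to the totals. The sketch below mirrors the `COUNT(*)` and `SUM(CASE WHEN is_at_risk ...)` aggregation in plain Python and adds the rate. The segment names and rows are invented for illustration.

```python
from collections import defaultdict

# Hypothetical (customer_segment, is_at_risk) rows; segment names are illustrative only
customers = [
    ("Consumer", True), ("Consumer", False), ("Consumer", True), ("Consumer", False),
    ("Business", True), ("Business", False), ("Business", False), ("Business", False),
]

def risk_by_segment(rows):
    """Count customers and at-risk customers per segment, then derive the at-risk rate."""
    totals = defaultdict(int)
    at_risk = defaultdict(int)
    for segment, flagged in rows:
        totals[segment] += 1
        at_risk[segment] += 1 if flagged else 0
    return {
        seg: {"total": totals[seg], "at_risk": at_risk[seg], "rate": at_risk[seg] / totals[seg]}
        for seg in totals
    }

summary = risk_by_segment(customers)
for seg, stats in summary.items():
    print(f"{seg}: {stats['at_risk']}/{stats['total']} at risk ({stats['rate']:.0%})")
```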


In [None]:
# Find at-risk customers
risk_query = """
SELECT customer_segment AS CUSTOMER_SEGMENT, 
       COUNT(*) as TOTAL,
       SUM(CASE WHEN is_at_risk THEN 1 ELSE 0 END) as AT_RISK
FROM TELCO_OPERATIONS_AI.DEFAULT_SCHEMA.customer_360_view
GROUP BY customer_segment
"""

# Display nicely with Snowflake native format
print("At-Risk Customers by Segment:\n")
session.sql(risk_query).show()

# Get data for chart
risk = session.sql(risk_query).to_pandas()

# Visualize
plt.figure(figsize=(10, 5))
x = range(len(risk))
plt.bar(x, risk['TOTAL'], label='Total', alpha=0.6)
plt.bar(x, risk['AT_RISK'], label='At Risk', color='red')
plt.xticks(x, risk['CUSTOMER_SEGMENT'])
plt.ylabel('Customers')
plt.title('Customer Risk by Segment')
plt.legend()
plt.show()


## Exercise 8: Revenue Impact and Risk Quantification

**What you'll learn:**
- Calculate revenue at risk from dissatisfied customers
- Aggregate financial impact by customer segment
- Visualize revenue exposure from potential churn

**Business value:** Quantify financial impact of customer dissatisfaction to justify retention investments


In [None]:
# Revenue at risk
revenue_query = """
SELECT customer_segment AS CUSTOMER_SEGMENT,
       SUM(CASE WHEN is_at_risk THEN monthly_revenue ELSE 0 END) as REVENUE_AT_RISK
FROM TELCO_OPERATIONS_AI.DEFAULT_SCHEMA.customer_360_view
GROUP BY customer_segment
ORDER BY revenue_at_risk DESC
"""

# Display nicely with Snowflake native format
print("Revenue at Risk by Customer Segment:\n")
session.sql(revenue_query).show()

# Get data for chart and total calculation
revenue = session.sql(revenue_query).to_pandas()

# Visualize
plt.figure(figsize=(8, 5))
plt.bar(revenue['CUSTOMER_SEGMENT'], revenue['REVENUE_AT_RISK'], color='darkred')
plt.ylabel('Monthly Revenue at Risk (RM)')
plt.title('Revenue at Risk by Segment')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

print(f"\nTotal at Risk: RM {revenue['REVENUE_AT_RISK'].sum():,.2f}")


## Exercise 9: Data Privacy with AI_REDACT

**What you'll learn:**
- Use AI_REDACT to automatically remove PII from text
- Protect customer privacy while enabling data analysis
- Create anonymized datasets for sharing

**Business value:** Comply with data privacy regulations while maintaining analytical capabilities
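
When a value chosen in Python (such as a selected call ID) is used to filter a query, it has to be bound into the SQL. A predicate that names the column on both sides compares the column with itself and keeps every row. The sketch below demonstrates the difference with the standard-library `sqlite3` module as a stand-in for Snowflake. The table and rows are hypothetical. Snowpark's `session.sql` accepts a `params` argument for `?` placeholders in the same spirit.

```python
import sqlite3

# In-memory stand-in table; names and rows are hypothetical
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transcripts (call_id TEXT, transcript_text TEXT)")
conn.executemany(
    "INSERT INTO transcripts VALUES (?, ?)",
    [("CALL_001", "transcript A"), ("CALL_002", "transcript B"), ("CALL_003", "transcript C")],
)

selected = "CALL_002"

# Column compared with itself: true for every row with a non-NULL call_id
unfiltered = conn.execute(
    "SELECT call_id FROM transcripts WHERE call_id = call_id"
).fetchall()

# Bound parameter: only the selected call is returned
filtered = conn.execute(
    "SELECT call_id FROM transcripts WHERE call_id = ?", (selected,)
).fetchall()

print(len(unfiltered), "rows without binding")
print(filtered)
```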


In [None]:
import streamlit as st
# Redact PII from transcript
st.markdown("## Use case: Share data for analysis while protecting customer privacy and complying with data protection regulations")
st.markdown("## PII Redaction Example:\n")
st.markdown("#### Original Call Transcript (with PII):")


call_id = st.selectbox('Select Call:',(session.table('TELCO_OPERATIONS_AI.DEFAULT_SCHEMA.customer_call_transcripts').select('CALL_ID').distinct()))

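# Show original transcript for the selected call
# (the ? placeholder binds the Python call_id value; "call_id = call_id" would compare the column with itself)
st.write(session.sql("""
SELECT 
    call_id AS CALL_ID,
    SUBSTR(transcript_text, 1, 200) AS ORIGINAL_TRANSCRIPT
FROM TELCO_OPERATIONS_AI.DEFAULT_SCHEMA.customer_call_transcripts
WHERE call_id = ?
LIMIT 1
""", params=[call_id]).collect()[0][1])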

st.markdown("#### After AI_REDACT (PII removed):")


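# Show redacted version of the selected call
# (the ? placeholder binds the Python call_id value; "call_id = call_id" would compare the column with itself)
st.write(session.sql("""
SELECT 
    call_id AS CALL_ID,
    SUBSTR(SNOWFLAKE.CORTEX.AI_REDACT(transcript_text), 1, 200) AS REDACTED_TRANSCRIPT
FROM TELCO_OPERATIONS_AI.DEFAULT_SCHEMA.customer_call_transcripts
WHERE call_id = ?
LIMIT 1
""", params=[call_id]).collect()[0][1])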



## Exercise 10: Custom Analysis and Experimentation

**What you'll learn:**
- Apply learned concepts to create custom analyses
- Combine multiple AI functions for complex insights
- Develop your own SQL queries and visualizations

**Business value:** Build custom analytics tailored to specific business questions

**Try this:**
- Analyze competitor mentions and customer sentiment
- Create additional visualizations
- Experiment with other Cortex AI functions (AI_COMPLETE, AI_EXTRACT, etc.)


In [None]:
# Example: Calls mentioning competitors
print("Calls Mentioning Competitors:\n")
session.sql("""
SELECT call_id AS CALL_ID, 
       call_reason AS CALL_REASON,
       sentiment_score:categories[0]:sentiment::VARCHAR as SENTIMENT,
       summary AS SUMMARY
FROM TELCO_OPERATIONS_AI.DEFAULT_SCHEMA.customer_call_transcripts
WHERE LOWER(transcript_text) LIKE '%maxis%' 
   OR LOWER(transcript_text) LIKE '%u mobile%'
LIMIT 5
""").show()

# Your custom analysis here:
# - Try different SQL queries
# - Use other AI functions (AI_COMPLETE, AI_CLASSIFY)
# - Create your own visualizations

print("\nLab Complete! Now use Snowflake Intelligence Agent for natural language queries.")


## Exercise 11: ML Prediction - Churn Risk Scoring Model

**What you'll learn:**
- Build a weighted scoring model to predict customer churn
- Use multiple features (CSAT, complaints, unresolved issues) for prediction
- Compare predicted vs actual churn rates
- Visualize model performance with 4 comprehensive charts

**Business objectives:**
- **Proactive Retention:** Identify at-risk customers before they churn to competitors (Maxis, U Mobile)
- **Resource Optimization:** Prioritize retention efforts on high-risk, high-value customers
- **Revenue Protection:** Quantify and prevent revenue loss from customer churn
- **Operational Efficiency:** Automate churn risk assessment instead of manual analysis

**Model features and weights:**
- Low CSAT Score (<3): 30 points - Strong predictor of dissatisfaction
- High Complaints (>2): 25 points - Indicates ongoing service issues
- Unresolved Issues (>0): 25 points - Critical driver of churn
- Frequent Calls (>2): 20 points - Sign of persistent problems

**Risk threshold:** Score >50 indicates high churn probability
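
The weights and threshold above can be written as a small pure-Python function, which makes it easy to check individual cases by hand. This is a sketch of the scoring rule as described in this exercise. The function names are illustrative.

```python
THRESHOLD = 50  # a customer is flagged only when the score is strictly above this value

def churn_risk_score(avg_csat, complaints, unresolved, calls):
    """Sum the weights of every rule that fires for one customer."""
    score = 0
    if avg_csat < 3:
        score += 30  # low satisfaction
    if complaints > 2:
        score += 25  # multiple complaints
    if unresolved > 0:
        score += 25  # unresolved issues
    if calls > 2:
        score += 20  # frequent caller
    return score

def is_high_risk(score):
    return score > THRESHOLD

examples = {
    "low CSAT + complaints": churn_risk_score(2.5, 3, 0, 1),
    "complaints + unresolved": churn_risk_score(4.0, 3, 1, 1),
    "all four rules": churn_risk_score(2.0, 5, 2, 6),
}
for label, score in examples.items():
    print(f"{label}: score={score}, high risk={is_high_risk(score)}")
```

Note that a customer with complaints and unresolved issues but a healthy CSAT lands on exactly 50 and is not flagged, because the threshold comparison is strict.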

**Expected outcome:** Actionable list of customers requiring immediate retention intervention
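
Comparing only the number of predicted and actual churners can hide disagreement at the customer level, since the two totals may match while pointing at different customers. One common way to check this is to count true and false positives and derive precision and recall. The sketch below uses invented prediction and outcome lists and is not part of the lab's own evaluation.

```python
def confusion_counts(predicted, actual):
    """Count agreement and disagreement between predicted and actual churn flags."""
    pairs = list(zip(predicted, actual))
    return {
        "tp": sum(1 for p, a in pairs if p and a),          # flagged and churned
        "fp": sum(1 for p, a in pairs if p and not a),      # flagged but stayed
        "fn": sum(1 for p, a in pairs if not p and a),      # missed churner
        "tn": sum(1 for p, a in pairs if not p and not a),  # correctly left alone
    }

def precision_recall(counts):
    flagged = counts["tp"] + counts["fp"]
    churned = counts["tp"] + counts["fn"]
    precision = counts["tp"] / flagged if flagged else None
    recall = counts["tp"] / churned if churned else None
    return precision, recall

# Hypothetical flags for seven customers
predicted = [True, True, False, False, True, False, False]
actual = [True, False, False, True, True, False, True]

counts = confusion_counts(predicted, actual)
precision, recall = precision_recall(counts)
print(counts)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In the notebook, the same function could be applied to `churn['predicted_churn']` and `churn['IS_CHURNED']` after the scoring step.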


In [None]:
# Simple churn prediction model
# Columns are qualified (c = customer_360_view, d = customer_details) so names present in both sources stay unambiguous
churn = session.sql("""
SELECT 
    c.customer_id,
    c.customer_segment,
    c.total_calls,
    c.total_complaints,
    c.avg_csat_score,
    c.unresolved_issues,
    c.is_at_risk,
    d.is_churned
FROM TELCO_OPERATIONS_AI.DEFAULT_SCHEMA.customer_360_view c
JOIN TELCO_OPERATIONS_AI.DEFAULT_SCHEMA.customer_details d ON c.customer_id = d.customer_id
WHERE c.total_calls > 0
""").to_pandas()

print(f"Dataset: {len(churn)} customers\n")

# Calculate churn risk score
churn['risk_score'] = (
    (churn['AVG_CSAT_SCORE'] < 3).astype(int) * 30 +  # Low satisfaction
    (churn['TOTAL_COMPLAINTS'] > 2).astype(int) * 25 +  # Multiple complaints
    (churn['UNRESOLVED_ISSUES'] > 0).astype(int) * 25 +  # Unresolved issues
    (churn['TOTAL_CALLS'] > 2).astype(int) * 20  # Frequent caller
)

churn['predicted_churn'] = churn['risk_score'] > 50

# Results
print("Prediction Results:")
print(f"Predicted High Risk: {churn['predicted_churn'].sum()}")
print(f"Actual Churned: {churn['IS_CHURNED'].sum()}")

# Visualizations
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(14, 10))

# 1. Risk Score Distribution
ax1.hist(churn['risk_score'], bins=15, color='coral', edgecolor='black', alpha=0.7)
ax1.axvline(x=50, color='red', linestyle='--', linewidth=2, label='Churn Threshold')
ax1.set_xlabel('Churn Risk Score')
ax1.set_ylabel('Number of Customers')
ax1.set_title('Churn Risk Score Distribution', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

# 2. Feature Importance
features = ['Low CSAT', 'High Complaints', 'Unresolved Issues', 'Frequent Caller']
weights = [30, 25, 25, 20]
ax2.barh(features, weights, color=['red', 'orange', 'coral', 'yellow'])
ax2.set_xlabel('Weight in Risk Score')
ax2.set_title('Churn Prediction Features', fontweight='bold')
ax2.grid(axis='x', alpha=0.3)

# 3. CSAT vs Risk Score
colors = churn['IS_CHURNED'].map({True: 'red', False: 'green'})
ax3.scatter(churn['AVG_CSAT_SCORE'], churn['risk_score'], c=colors, alpha=0.6, s=100)
ax3.set_xlabel('Average CSAT Score')
ax3.set_ylabel('Churn Risk Score')
ax3.set_title('CSAT vs Churn Risk (Red=Churned, Green=Active)', fontweight='bold')
ax3.axhline(y=50, color='red', linestyle='--', alpha=0.5)
ax3.axvline(x=3, color='orange', linestyle='--', alpha=0.5)
ax3.grid(True, alpha=0.3)

# 4. Predictions by Segment
seg = churn.groupby('CUSTOMER_SEGMENT').agg({
    'predicted_churn': 'sum',
    'IS_CHURNED': 'sum'
})
x = range(len(seg))
width = 0.35
ax4.bar([i-width/2 for i in x], seg['predicted_churn'], width, label='Predicted', color='orange', alpha=0.8)
ax4.bar([i+width/2 for i in x], seg['IS_CHURNED'], width, label='Actual', color='red', alpha=0.8)
ax4.set_xticks(x)
ax4.set_xticklabels(seg.index, rotation=45, ha='right')
ax4.set_ylabel('Number of Customers')
ax4.set_title('Churn: Predicted vs Actual by Segment', fontweight='bold')
ax4.legend()
ax4.grid(axis='y', alpha=0.3)

plt.suptitle('Churn Prediction Model', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("\nUse this model to:")
print("- Identify at-risk customers before they churn")
print("- Prioritize retention efforts by risk score")
print("- Target high-value customers with proactive support")

# Show top 10 at-risk customers in a table
print("\n" + "="*80)
print("TOP 10 AT-RISK CUSTOMERS - IMMEDIATE ACTION REQUIRED")
print("="*80 + "\n")

session.sql("""
WITH churn_scores AS (
    SELECT 
        c.customer_id,
        c.customer_segment,
        c.total_calls,
        c.total_complaints,
        c.avg_csat_score,
        c.unresolved_issues,
        c.is_at_risk,
        d.is_churned,
        d.monthly_revenue,
        (
            (CASE WHEN c.avg_csat_score < 3 THEN 1 ELSE 0 END * 30) +
            (CASE WHEN c.total_complaints > 2 THEN 1 ELSE 0 END * 25) +
            (CASE WHEN c.unresolved_issues > 0 THEN 1 ELSE 0 END * 25) +
            (CASE WHEN c.total_calls > 2 THEN 1 ELSE 0 END * 20)
        ) as risk_score
    FROM TELCO_OPERATIONS_AI.DEFAULT_SCHEMA.customer_360_view c
    JOIN TELCO_OPERATIONS_AI.DEFAULT_SCHEMA.customer_details d ON c.customer_id = d.customer_id
    WHERE c.total_calls > 0
)
SELECT 
    customer_id AS CUSTOMER_ID,
    customer_segment AS SEGMENT,
    risk_score AS RISK_SCORE,
    monthly_revenue AS MONTHLY_REVENUE,
    avg_csat_score AS CSAT,
    total_complaints AS COMPLAINTS,
    unresolved_issues AS UNRESOLVED
FROM churn_scores
WHERE risk_score > 50
ORDER BY risk_score DESC, monthly_revenue DESC
LIMIT 10
""").show()


## Lab Summary and Completion

### What You Accomplished

Congratulations! You have completed all 11 exercises and learned to:

**Cortex AI Functions:**
- AI_TRANSLATE for multilingual support
- AI_REDACT for data privacy compliance
- AI_SENTIMENT for emotion analysis (VARIANT handling)
- Cortex Search for semantic queries across structured and unstructured data

**Data Analysis:**
- Explored 81,425+ records across multiple tables
- Analyzed customer call transcripts and PDF documents
- Calculated CSAT scores and sentiment patterns
- Identified network performance issues by region

**Business Intelligence:**
- Identified at-risk customers using customer_360_view
- Quantified revenue at risk from potential churn
- Built churn prediction model with feature scoring
- Created visualizations for executive dashboards

### Key Takeaways

1. **Structured + Unstructured Integration:** Snowflake seamlessly combines CSV data, audio transcripts, and PDF documents
2. **AI-Powered Insights:** Cortex AI functions extract meaning from unstructured text without manual coding
3. **Business Impact:** Data directly translates to actionable metrics (revenue at risk, churn probability)
4. **Scalability:** Same techniques work for 25 calls or 25,000 calls

### Next Steps

**1. Use Snowflake Intelligence Agent:**
Ask natural language questions like:
- "Which customers in Penang have the highest churn risk?"
- "What are the top network issues causing customer complaints?"
- "Show me revenue at risk from business customers"

**2. Build Production Solutions:**
- Create scheduled dashboards
- Set up alerts for at-risk customers
- Automate churn prediction scoring

**3. Extend the Analysis:**
- Add more data sources (social media, NPS surveys)
- Build advanced ML models
- Create real-time monitoring dashboards

**Thank you for completing the NovaConnect Snowflake Intelligence Hands-On Lab!**