# PawCore Data Quality Check & AISQL Analysis

## Business Intelligence with Snowflake AI Functions
This notebook demonstrates how to use Snowflake's AI SQL functions to investigate a real business problem: Why did EMEA revenue drop significantly in Q4 2024?

**Data Sources:**
- **Structured Data**: Device telemetry, quality logs, customer reviews, internal communications
- **Unstructured Data**: PDFs, documentation, and multimedia content
- **Expected Results**: Non-zero row counts across all tables indicating successful data loading
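
The expected-results check above can be sketched in Python. This is a minimal sketch and not part of the original notebook: the helper names `build_row_count_query` and `find_empty_tables` are hypothetical, and the table list is simply the set of tables queried elsewhere in this notebook. In a Snowflake notebook the generated SQL could be handed to `session.sql(...)`; that call is left out so the block runs without a session.

```python
# Tables queried elsewhere in this notebook (schema-qualified names).
TABLES = [
    "DEVICE_DATA.TELEMETRY",
    "SUPPORT.SLACK_MESSAGES",
    "SUPPORT.CUSTOMER_REVIEWS",
    "UNSTRUCTURED.PARSED_CONTENT",
]


def build_row_count_query(tables):
    """Return one SQL statement that reports COUNT(*) for every table."""
    selects = [
        f"SELECT '{name}' AS table_name, COUNT(*) AS row_count FROM {name}"
        for name in tables
    ]
    return "\nUNION ALL\n".join(selects) + ";"


def find_empty_tables(row_counts):
    """Return the names of tables whose row count is zero, sorted by name."""
    return sorted(name for name, count in row_counts.items() if count == 0)


if __name__ == "__main__":
    print(build_row_count_query(TABLES))
    # Hypothetical counts, only to show how an empty table would be flagged.
    example_counts = {"DEVICE_DATA.TELEMETRY": 1200, "SUPPORT.SLACK_MESSAGES": 0}
    print(find_empty_tables(example_counts))
```

Any table name returned by `find_empty_tables` would point to a load that needs to be re-run before the analysis below is meaningful.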

# AI SQL Functions Analysis

## 1. Document Analysis (AI_EXTRACT)
**Purpose**: Extract structured insights from unstructured QC documentation  
**Use Case**: Analyze quality control testing protocols for gaps  
**Focus**: Environmental testing coverage (humidity, temperature, water resistance)  
**Output**: Structured JSON revealing testing gaps that explain regional failures

## 2. Customer Feedback Analysis (AI_SENTIMENT)
**Purpose**: Multi-category sentiment analysis on customer reviews  
**Use Case**: Analyze customer complaints and satisfaction patterns in EMEA  
**Focus**: Environmental and technical issues with category-specific sentiment  
**Output**: Sentiment labels for battery_life, moisture_resistance, build_quality, comfort, value

## 3. Root Cause Analysis (AI_COMPLETE)
**Purpose**: Synthesize findings into comprehensive business insights  
**Use Case**: Connect quality gaps to revenue decline with strategic recommendations  
**Focus**: LOT341 moisture sensor defects and EMEA climate correlation  
**Output**: Executive summary linking technical issues to business impact

In [None]:
-- Set up our environment
USE ROLE accountadmin;
USE WAREHOUSE PAWCORE_DEMO_WH;
USE DATABASE PAWCORE_ANALYTICS;
USE SCHEMA SEMANTIC;

In [None]:
-- Telemetry data (random sample)
SELECT 'TELEMETRY' as source, device_id, lot_number, battery_level, humidity_reading, region, timestamp::date as date
FROM DEVICE_DATA.TELEMETRY 
ORDER BY RANDOM()
LIMIT 5;

In [None]:
-- Slack messages (random sample)
SELECT 'SLACK_MESSAGES' as source, slack_channel, user_name, text, thread_id
FROM SUPPORT.SLACK_MESSAGES 
ORDER BY RANDOM()
LIMIT 10;

In [None]:
-- Random sample of parsed document content
SELECT 
    relative_path,
    file_name,
    LENGTH(content) as content_length,
    LEFT(content, 200) || '...' as content_preview
FROM UNSTRUCTURED.PARSED_CONTENT
ORDER BY RANDOM()
LIMIT 5;

In [None]:
# Resume Image: Technical Wrong Focus Resume
import base64
import mimetypes
import warnings

import streamlit as st
from snowflake.snowpark.context import get_active_session

warnings.filterwarnings("ignore")
session = get_active_session()

try:
    # Read resume image directly from stage
    stage_path = "@PAWCORE_ANALYTICS.SEMANTIC.PAWCORE_DATA_STAGE/HR/resume_technical_wrong_focus-1.png"
    img_bytes = session.file.get_stream(stage_path, decompress=False).read()
    
    # Guess MIME type from extension
    mime, _ = mimetypes.guess_type("resume_technical_wrong_focus-1.png")
    mime = mime or "image/png"
    
    # Convert to base64 and display
    b64 = base64.b64encode(img_bytes).decode("utf-8")
    st.title("📄 Resume Example")
    st.markdown(f'<img src="data:{mime};base64,{b64}" alt="resume_technical_wrong_focus" style="max-width:800px; border: 1px solid #ddd;">',
                unsafe_allow_html=True)
    st.caption("PawCore HR: Resume")
    
except Exception as e:
    st.write(f"Could not load resume image: {e}")
    st.write("Checking available resume images...")
    
    # Fallback: check what resume images exist
    check_result = session.sql("""
        SELECT relative_path 
        FROM DIRECTORY(@PAWCORE_ANALYTICS.SEMANTIC.PAWCORE_DATA_STAGE) 
        WHERE relative_path ILIKE '%resume%' 
        AND (relative_path ILIKE '%.png' OR relative_path ILIKE '%.jpg' OR relative_path ILIKE '%.jpeg')
    """).collect()
    
    if check_result:
        st.write("Available resume images:")
        for img in check_result:
            st.write(f"- {img['RELATIVE_PATH']}")
    else:
        st.write("No resume images found in stage")

In [None]:
# Product Image: Barkour.jpg
import base64
import mimetypes
import warnings

import streamlit as st
from snowflake.snowpark.context import get_active_session

warnings.filterwarnings("ignore")
session = get_active_session()

try:
    # Read image directly from stage
    stage_path = "@PAWCORE_ANALYTICS.SEMANTIC.PAWCORE_DATA_STAGE/images/barkour.jpg"
    img_bytes = session.file.get_stream(stage_path, decompress=False).read()
    
    # Guess MIME type from extension (fallback to jpeg)
    mime, _ = mimetypes.guess_type("barkour.jpg")
    mime = mime or "image/jpeg"
    
    # Convert to base64 and display
    b64 = base64.b64encode(img_bytes).decode("utf-8")
    st.markdown(f'<img src="data:{mime};base64,{b64}" alt="barkour.jpg" style="max-width:600px;">',
                unsafe_allow_html=True)
    
except Exception as e:
    st.write(f"Could not load image: {e}")
    st.write("Checking if image exists in stage...")
    
    # Fallback: check if file exists
    check_result = session.sql("""
        SELECT relative_path 
        FROM DIRECTORY(@PAWCORE_ANALYTICS.SEMANTIC.PAWCORE_DATA_STAGE) 
        WHERE relative_path ILIKE '%barkour%'
    """).collect()
    
    if check_result:
        st.write(f"Found image at: {check_result[0]['RELATIVE_PATH']}")
    else:
        st.write("No barkour image found in stage")

## 🔧 **How AI_EXTRACT Works**

**AI_EXTRACT** is like having a smart assistant read through documents and pull out specific information you need:

- **Input**: Unstructured text (documents, reports, emails) + list of fields to extract
- **Process**: Uses AI to understand context and identify relevant information
- **Output**: Structured JSON with the exact data points you requested
- **Business Value**: Transforms messy documents into clean, queryable data

**Example**: Give it a QC report and ask for `['testing_protocols', 'identified_gaps']` → Get structured data about testing procedures and quality issues!
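
Once AI_EXTRACT returns, the JSON can be scanned for fields that came back empty, which is how a testing gap shows up. The sketch below assumes the result is an object with `response` and `error` keys; that shape and the helper name `unanswered_fields` are assumptions for illustration, and the sample payload is hand-written, not real function output.

```python
import json


def unanswered_fields(raw_json):
    """List requested fields that came back empty or as the literal 'None'."""
    payload = json.loads(raw_json)
    # Assumed shape: {"error": ..., "response": {field: answer, ...}}
    response = payload.get("response") or {}
    gaps = []
    for field, answer in response.items():
        if answer is None or answer == []:
            gaps.append(field)
        elif str(answer).strip().lower() in ("", "none"):
            gaps.append(field)
    return sorted(gaps)


if __name__ == "__main__":
    # Hand-written sample payload, for illustration only.
    sample = json.dumps({
        "error": None,
        "response": {
            "temperature_tests": ["Thermal Shock", "Thermal Cycling"],
            "humidity_testing": "None",
        },
    })
    print(unanswered_fields(sample))
```

Treating the string `None` as empty is a design choice for this sketch: the model may answer a question it cannot ground with a literal "None" rather than a null value.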

In [None]:
-- 🔍 EXECUTIVE QUESTION: "EMEA revenue dropped significantly in Q4 - what quality issues could explain this?"
-- 
-- BUSINESS CONTEXT: EMEA region showing major revenue decline while other regions stable.
-- Need to investigate if manufacturing/QC gaps could cause region-specific failures.
--
-- AI_EXTRACT GOAL: Find quality control testing gaps that could explain why products
-- fail specifically in EMEA's environmental conditions (high humidity climate).

-- Pass the staged PDF as a FILE via TO_FILE; a bare string in the first
-- position is treated as the text to analyze, not as a path to a document.
SELECT
    AI_EXTRACT(
        file => TO_FILE('@PAWCORE_ANALYTICS.SEMANTIC.PAWCORE_DATA_STAGE', 'Document_Stage/QC_standards_SEPT24.pdf'),
        responseFormat => {
            'humidity_testing': 'How is humidity testing documented or required?',
            'temperature_tests': 'List: What temperature testing procedures are performed?',
            'water_resistance_tests': 'List: What water resistance tests are performed?',
            'environmental_gaps': 'What environmental factors are mentioned but not fully tested?'
        }
    ) AS quality_insights;

## 🔍 **Results: AI_EXTRACT Analysis**

**What We Discovered:**
- **Temperature Testing**: 2 comprehensive procedures found (Thermal Shock, Thermal Cycling)
- **Water Resistance Testing**: 4 rigorous IPX standards implemented (IPX4, IPX5, IPX6, IPX7)
- **Humidity Testing**: **NONE** - Complete absence of humidity validation protocols
- **Environmental Gaps**: **NONE** reported - the document mentions no environmental factors beyond temperature and water
- **Critical Discovery**: Comprehensive testing for temperature/water vs. **zero humidity testing**

**Revenue Mystery Connection**: Found the first major clue! AI_EXTRACT revealed that while PawCore has rigorous temperature and water resistance protocols, there is **no humidity testing whatsoever**. A humidity-sensitive defect could therefore pass QC and still fail in EMEA's humid climate (65-75% humidity), which makes this gap the leading explanation for the regional revenue drop.

## 📊 **How AI_SENTIMENT Works**

**AI_SENTIMENT** is like having an expert analyst read customer feedback and tell you exactly how people feel:

- **Input**: Text (reviews, comments, social media) + optional categories to analyze
- **Process**: AI analyzes emotional tone and context for overall and specific aspects
- **Output**: Sentiment labels (positive, negative, neutral, mixed) for each category
- **Business Value**: Instantly understand customer satisfaction patterns and identify problem areas

**Example**: Analyze customer reviews for `['battery_life', 'moisture_resistance']` → Discover that customers love battery life but hate moisture issues!
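
To work with category-level results rather than only the overall label, the returned JSON can be flattened into a name-to-label mapping. This is a sketch under assumptions: the `categories` array and its `sentiment` key match the path used in the queries below, while the `name` key and the helper names are assumptions. The sample payload is hand-written for illustration.

```python
import json


def sentiment_by_category(raw_json):
    """Map each category name to its sentiment label."""
    payload = json.loads(raw_json)
    return {
        item["name"]: item["sentiment"]
        for item in payload.get("categories", [])
    }


def negative_categories(raw_json):
    """Return the category names labelled 'negative', sorted by name."""
    labels = sentiment_by_category(raw_json)
    return sorted(name for name, label in labels.items() if label == "negative")


if __name__ == "__main__":
    # Hand-written sample payload, for illustration only.
    sample = json.dumps({
        "categories": [
            {"name": "overall", "sentiment": "mixed"},
            {"name": "battery_life", "sentiment": "positive"},
            {"name": "moisture_resistance", "sentiment": "negative"},
        ]
    })
    print(sentiment_by_category(sample))
    print(negative_categories(sample))
```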


In [None]:
-- Overall sentiment (AI_SENTIMENT) plus extracted issues (AI_EXTRACT) for Q4 2024 EMEA reviews
SELECT
    AI_SENTIMENT(REVIEW_TEXT):categories[0]:sentiment::STRING as overall_sentiment,
    AI_EXTRACT(REVIEW_TEXT, ['specific_issues', 'problems_mentioned', 'complaints']) as issues,
    DATE,
    REVIEW_TEXT
FROM SUPPORT.CUSTOMER_REVIEWS
WHERE DATE >= '2024-10-01' 
AND DATE <= '2024-12-31'
AND REGION = 'EMEA'
ORDER BY DATE;

In [None]:
-- Classify negative reviews into problem categories using AI_CLASSIFY
SELECT
    AI_CLASSIFY(REVIEW_TEXT, ['battery_issues', 'moisture_problems', 'build_quality', 'performance_issues', 'durability_concerns']):labels[0]::STRING as problem_category,
    AI_SENTIMENT(REVIEW_TEXT):categories[0]:sentiment::STRING as overall_sentiment,
    AI_EXTRACT(REVIEW_TEXT, ['specific_issues', 'root_cause', 'failure_mode']) as extracted_details,
    DATE,
    LOT_NUMBER,
    DEVICE_ID,
    LEFT(REVIEW_TEXT, 150) || '...' as review_preview
FROM SUPPORT.CUSTOMER_REVIEWS
WHERE DATE >= '2024-10-01' 
AND DATE <= '2024-12-31'
AND REGION = 'EMEA'
AND AI_SENTIMENT(REVIEW_TEXT):categories[0]:sentiment::STRING = 'negative'
ORDER BY DATE;

## 📊 **Results: AI_SENTIMENT Analysis**

**What We Discovered:**
- **Sentiment Pattern**: 90% negative sentiment across all Q4 2024 EMEA reviews
- **Specific Issues Extracted**: 
  - Battery drain in humid conditions (mentioned in 8/10 reviews)
  - Moisture sensor false alarms (6/10 reviews)
  - Complete device failure in wet weather (5/10 reviews)
- **Timeline Correlation**: Issues escalated from October to December 2024
- **Mystery Clue**: Multiple customers specifically mention "lot 341" and humidity-related failures

**Revenue Mystery Connection**: Customer sentiment reveals the exact problem - devices from lot 341 are failing specifically in humid conditions, leading to returns and negative reviews in EMEA!
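
The percentages reported above are simple shares of sentiment labels across the reviews returned by the query. A minimal sketch of that arithmetic follows; the helper name `sentiment_shares` and the label list are hypothetical and only show how a figure such as 90% arises from ten reviews.

```python
from collections import Counter


def sentiment_shares(labels):
    """Return each label's share of the total, as a percentage."""
    total = len(labels)
    if total == 0:
        return {}
    counts = Counter(labels)
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical labels for ten reviews: nine negative, one positive.
    labels = ["negative"] * 9 + ["positive"]
    print(sentiment_shares(labels))
```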

## 🎯 **How AI_COMPLETE Works**

**AI_COMPLETE** is like having a senior consultant analyze your data and provide strategic insights:

- **Input**: Model name + detailed prompt with context and specific question
- **Process**: Advanced AI reasoning that connects dots across multiple data points
- **Output**: Comprehensive analysis, recommendations, and actionable insights
- **Business Value**: Get expert-level analysis and conclusions from your data

**Example**: Provide engineering findings about defective sensors → Get root cause analysis explaining exactly how this impacts revenue and what to do next!
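
The "model name + prompt" input can be assembled programmatically when the findings come from earlier steps. The sketch below is one possible approach, not the notebook's method: `build_prompt` and `build_ai_complete_sql` are hypothetical helpers, and the escaping covers only backslashes and single quotes. For untrusted text, bind parameters would be the safer route.

```python
def build_prompt(findings, question, max_sentences=2):
    """Join findings into context, then append the question and a length limit."""
    context = " ".join(f.strip().rstrip(".") + "." for f in findings)
    return f"{context} {question.strip()} Answer in {max_sentences} sentences."


def build_ai_complete_sql(model, prompt):
    """Wrap a prompt in an AI_COMPLETE call as a single-quoted SQL string."""
    # Double backslashes first, then single quotes, so the literal stays intact.
    escaped = prompt.replace("\\", "\\\\").replace("'", "''")
    return f"SELECT AI_COMPLETE('{model}', '{escaped}') AS summary;"


if __name__ == "__main__":
    prompt = build_prompt(
        [
            "PawCore engineering found: Lot 341 moisture sensors trigger at "
            "65% humidity instead of 85% spec",
            "EMEA has 65-75% humidity",
        ],
        "What caused Q4 2024 EMEA revenue drop?",
    )
    print(build_ai_complete_sql("mistral-large2", prompt))
```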

In [None]:
-- Final Analysis: AI_COMPLETE for Root Cause Summary
SELECT
    '🔍 ROOT CAUSE ANALYSIS' as analysis_type,
    AI_COMPLETE(
        'mistral-large2',
        'PawCore engineering found: Lot 341 moisture sensors trigger at 65% humidity instead of 85% spec. EMEA has 65-75% humidity. What caused Q4 2024 EMEA revenue drop? Answer in 2 sentences.'
    ) as summary;

## 🎯 **Results: AI_COMPLETE Analysis**

**What We Discovered:**
- **Root Cause Confirmed**: Manufacturing defect in lot 341 moisture sensors
- **Technical Issue**: Sensors triggering at 65% humidity instead of 85% specification
- **Business Impact**: Devices fail in normal EMEA humidity conditions (65-75%)
- **Timeline**: Issue discovered in October but not immediately addressed
- **Mystery Solved**: This explains the Q4 2024 EMEA revenue drop

**Revenue Mystery Connection**: AI_COMPLETE ties the evidence together. Given the engineering finding supplied in the prompt, it concludes that lot 341's defective moisture sensors cause excessive battery drain and device failures in EMEA's humid climate, which drove the revenue loss!
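
The threshold reasoning can be checked with a few lines of arithmetic. This sketch assumes that "trigger at X%" means the sensor fires whenever the reading is at or above X, and it samples whole-percent readings across the 65-75% range; the function name `fraction_triggering` is hypothetical.

```python
def fraction_triggering(threshold, humidity_low, humidity_high):
    """Share of whole-percent readings in [low, high] at or above threshold."""
    readings = range(humidity_low, humidity_high + 1)
    triggered = sum(1 for h in readings if h >= threshold)
    return triggered / len(readings)


if __name__ == "__main__":
    emea_range = (65, 75)  # EMEA humidity range cited in this notebook
    # Lot 341 trigger point (65%) versus the specified trigger point (85%).
    print("lot 341:", fraction_triggering(65, *emea_range))
    print("spec:   ", fraction_triggering(85, *emea_range))
```

Under these assumptions a 65% trigger point fires across the entire EMEA range, while the specified 85% trigger point never fires within it.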

## 🎯 **Mystery Solved: The Complete Picture**

**The Revenue Mystery Revealed Through AI:**

Using Snowflake's AI SQL functions, we uncovered the complete story behind EMEA's revenue decline:

1. **AI_EXTRACT** → Discovered QC testing gaps: no humidity testing at all vs. rigorous temperature and water resistance protocols
2. **AI_SENTIMENT** → Revealed 90% negative sentiment in EMEA reviews mentioning LOT341 and humidity failures  
3. **AI_COMPLETE** → Connected all findings: LOT341 moisture sensors trigger at 65% (EMEA normal) vs 85% specification

**Business Impact & ROI:**
- **Root Cause**: Manufacturing defect in LOT341 → devices fail in normal EMEA humidity (65-75%)
- **Customer Impact**: Massive returns, negative reviews, lost customer trust in key market
- **Revenue Impact**: Q4 2024 EMEA revenue drop directly linked to LOT341 device failures
- **AI Value**: Solved complex cross-functional mystery in minutes vs. weeks of manual analysis

**Key Takeaway**: Snowflake's AI functions transform unstructured data (documents, reviews, communications) into actionable business insights, enabling rapid root cause analysis and data-driven decision making! 🚀