# Build RAG-Based Equipment Maintenance App Using Snowflake Cortex

## Step 1: Set Up Environment


## Database Schema Setup for LLM RAG System

### Purpose
This SQL script establishes the core database infrastructure for a Retrieval Augmented Generation (RAG) system. The setup creates a dedicated environment for LLM operations with appropriate compute and storage resources.

### Architecture Components
1. **Warehouse Layer**
   - Medium-sized compute resources
   - 5-minute auto-suspension for cost optimization

2. **Database Layer** 
   - Dedicated LLM database
   - RAG-specific schema for organized data storage

### Technical Specifications
- Warehouse: Medium size with 300s auto-suspend
- Database Name: LLM
- Schema: RAG
- Role: DEMOADMIN for administrative access

### Usage Notes
- The warehouse auto-suspends after 5 minutes of inactivity
- All RAG-related objects should be created within the LLM.RAG schema



In [None]:
USE ROLE DEMOADMIN;

CREATE OR REPLACE WAREHOUSE Medium WAREHOUSE_SIZE='Medium' AUTO_SUSPEND = 300;
CREATE OR REPLACE DATABASE LLM;
CREATE OR REPLACE SCHEMA RAG;

USE LLM.RAG;


## Internal Stage Creation for RAG System

### Purpose
Creates a dedicated internal stage to store and manage repair manual documents for the RAG-based maintenance system.

### Technical Details
- Stage Name: REPAIR_MANUALS
- Type: Internal Snowflake stage
- Access: Controlled through standard Snowflake permissions
- Supported Files: PDF documents containing equipment repair manuals

### Usage
- Upload repair manuals directly through Snowsight UI
- Access documents for processing and vectorization
- Enables secure document storage within Snowflake environment


In [None]:
CREATE STAGE REPAIR_MANUALS;

## Step 2: Repair Manual Upload Guide

### Purpose
Instructions for obtaining and uploading equipment repair manuals to Snowflake internal stage.

### Required Manuals
- Otto 1500 Manual
- Otto 600 Manual
- Otto 100 Manual
- Lifter Manual

### Manual Acquisition
1. **GitHub Repository Access**
   - HTTPS Clone:
     ```bash
     git clone https://github.com/Snowflake-Labs/sfguide-build-rag-based-equipment-maintenance-app-using-snowflake-cortex.git
     ```
   - SSH Clone:
     ```bash
     git clone git@github.com:Snowflake-Labs/sfguide-build-rag-based-equipment-maintenance-app-using-snowflake-cortex.git
     ```

### Upload Process
1. **Access Snowflake UI**
   - Navigate to Data -> Databases
   - Select LLM database
   - Open RAG schema
   - Access Stages -> REPAIR_MANUALS

2. **Upload Files**
   - Click "+ Files" button in top right
   - Browse to downloaded manual PDFs
   - Select all four manuals
   - Confirm upload completion

### Validation
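- Verify that all four manuals appear in the stage, for example by running `LIST @REPAIR_MANUALS;`
- Ensure the files are readable PDFs and were not truncated during upload
- Confirm that the role in use (DEMOADMIN) can read from the stage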

## Step 3: Data Engineering

## PDF Parser Function Creation

### Purpose
Creates a Python UDF (User-Defined Function) that extracts text content from PDF files stored in Snowflake stages.

### Technical Specifications
- Function Name: py_read_pdf
- Input: File path (string)
- Output: Extracted text (string)
- Runtime: Python 3.8
- Dependencies: 
  - snowflake-snowpark-python
  - pypdf2

### Implementation Details
- Uses PyPDF2 library for PDF processing
- Leverages SnowflakeFile for secure file access
- Processes PDF files page by page
- Concatenates text from all pages into single string

### Usage
- Call function with stage file path
- Returns extracted text for further processing
- Supports RAG pipeline document ingestion

In [None]:
----------------------------------------------------------------------
-- Create a python function to parse PDF files
----------------------------------------------------------------------  
CREATE OR REPLACE FUNCTION py_read_pdf(file string)
    returns string
    language python
    runtime_version = 3.8
    packages = ('snowflake-snowpark-python','pypdf2')
    handler = 'read_file'
as
$$
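from io import BytesIO

# PdfReader replaces PdfFileReader, which is deprecated in PyPDF2.
from PyPDF2 import PdfReader
from snowflake.snowpark.files import SnowflakeFile

def read_file(file_path):
    # Load the staged PDF into memory through the scoped file URL.
    with SnowflakeFile.open(file_path, 'rb') as file:
        f = BytesIO(file.readall())
    pdf_reader = PdfReader(f)
    whole_text = ""
    # Append the text of each page; fall back to "" if a page yields nothing.
    for page in pdf_reader.pages:
        whole_text += page.extract_text() or ""
    return whole_text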
$$;

## PDF Content Storage Table Creation

### Purpose
Creates a table to store extracted text content from PDF repair manuals, using the previously defined PDF parser function.

### Table Structure
- Table Name: repair_manuals
- Columns:
  - file_name: Name of source PDF file
  - contents: Extracted text content from PDF

### Technical Implementation
1. Uses CTE to get unique filenames from stage
2. Processes each PDF using py_read_pdf function
3. Stores results in persistent table
4. Leverages build_scoped_file_url for secure file access

### Data Flow
1. Reads filenames from REPAIR_MANUALS stage
2. Extracts text content using Python UDF
3. Stores processed content in structured table

### Validation
- Includes verification query to check stored content
- Enables immediate data quality assessment

In [None]:
USE ROLE DEMOADMIN;
USE DATABASE LLM;
USE SCHEMA RAG;
----------------------------------------------------------------------
-- Create a table for storing the text parsed from each PDF
----------------------------------------------------------------------  
CREATE OR REPLACE TABLE repair_manuals AS
    WITH filenames AS (SELECT DISTINCT METADATA$FILENAME AS file_name FROM @repair_manuals)
    SELECT 
        file_name, 
        py_read_pdf(build_scoped_file_url(@repair_manuals, file_name)) AS contents
    FROM filenames;

--Validate
SELECT * FROM repair_manuals;

## Text Chunking Implementation for RAG System

### Purpose
Creates a table of chunked text from repair manual documents, optimizing content for RAG processing by breaking large documents into manageable, overlapping segments.

### Technical Specifications
- Table Name: repair_manuals_chunked
- Chunk Size: 3000 characters
- Overlap Size: 1000 characters
- Implementation: Recursive CTE

### Table Structure
- file_name: Source document identifier
- chunk_number: Sequential chunk identifier
- chunk_text: Raw text segment
- combined_chunk_text: Formatted text with context

### Processing Logic
1. Initial split: First 3000 characters with 1000 character overlap
2. Recursive processing: Continues splitting remaining text
3. Maintains document context through file name prefixing
4. Preserves sequential order of chunks

### Usage
- Enables efficient text embedding generation
- Maintains context through chunk overlap
- Facilitates precise information retrieval
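
The overlap scheme described above can be sketched outside SQL. The following is a minimal Python illustration, assuming each chunk starts `chunk_size - overlap` characters after the previous one. The function name `chunk_text` and the toy sizes are hypothetical and chosen only to make the overlap visible; this is not the notebook's SQL.

```python
def chunk_text(contents: str, chunk_size: int = 3000, overlap: int = 1000) -> list[str]:
    """Split text into fixed-size chunks where neighbours share `overlap` characters."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    stride = chunk_size - overlap  # distance between consecutive chunk starts
    chunks = []
    start = 0
    while start < len(contents):
        chunks.append(contents[start:start + chunk_size])
        start += stride
    return chunks


# Toy example with small sizes so the shared characters are easy to spot.
sample = "abcdefghijklmnopqrstuvwxyz"
toy_chunks = chunk_text(sample, chunk_size=10, overlap=4)
```

With these toy sizes, the last four characters of each full chunk reappear as the first four characters of the next one, which is the property the 3000/1000 settings aim for at document scale.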

In [None]:
----------------------------------------------------------------------
-- Chunk the file contents into 3000 character chunks, overlap each
-- chunk by 1000 characters.
----------------------------------------------------------------------
SET chunk_size = 3000;
SET overlap = 1000;
CREATE OR REPLACE TABLE repair_manuals_chunked AS 
WITH RECURSIVE split_contents AS (
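    -- Anchor: the first chunk; the remainder starts $overlap characters
    -- before the end of that chunk.
    SELECT 
        file_name,
        SUBSTRING(contents, 1, $chunk_size) AS chunk_text,
        SUBSTRING(contents, $chunk_size - $overlap + 1) AS remaining_contents,
        1 AS chunk_number
    FROM 
        repair_manuals

    UNION ALL

    -- Recursive step: advance by the same stride (chunk_size - overlap)
    -- so that every pair of consecutive chunks overlaps, not only the first two.
    SELECT 
        file_name,
        SUBSTRING(remaining_contents, 1, $chunk_size),
        SUBSTRING(remaining_contents, $chunk_size - $overlap + 1),
        chunk_number + 1
    FROM 
        split_contents
    WHERE 
        LENGTH(remaining_contents) > 0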
)
SELECT 
    file_name,
    chunk_number,
    chunk_text,
    CONCAT(
        'Sampled contents from repair manual [', 
        file_name,
        ']: ', 
        chunk_text
    ) AS combined_chunk_text
FROM 
    split_contents
ORDER BY 
    file_name,
    chunk_number;

--Validate
SELECT * FROM repair_manuals_chunked;

## Text Vectorization for RAG System

### Purpose
Creates embeddings for chunked repair manual text using Snowflake Cortex's embedding functionality, preparing the content for semantic search and retrieval.

### Technical Specifications
- Table Name: repair_manuals_chunked_vectors
- Model: e5-base-v2
- Vector Dimension: 768
- Source: Chunked repair manual text

### Table Structure
- file_name: Source document identifier
- chunk_number: Sequential chunk identifier
- chunk_text: Original text segment
- combined_chunk_text: Formatted text with context
- combined_chunk_vector: Text embedding vector

### Implementation Details
- Uses Snowflake Cortex embed_text_768 function
- Processes each text chunk into vector representation
- Maintains original text and metadata
- Enables semantic similarity search

### Usage
- Powers semantic search capabilities
- Enables similarity matching for RAG
- Facilitates efficient information retrieval

In [None]:
----------------------------------------------------------------------
-- "Vectorize" the chunked text into a language encoded representation
----------------------------------------------------------------------  
CREATE OR REPLACE TABLE repair_manuals_chunked_vectors AS 
SELECT 
    file_name, 
    chunk_number, 
    chunk_text, 
    combined_chunk_text,
    snowflake.cortex.embed_text_768('e5-base-v2', combined_chunk_text) as combined_chunk_vector
FROM 
    repair_manuals_chunked;

--Validate
SELECT * FROM repair_manuals_chunked_vectors;

## RAG Query Function Implementation

### Purpose
Creates a function that implements the RAG (Retrieval Augmented Generation) pattern, combining vector similarity search with LLM-based response generation for equipment repair queries.

### Technical Specifications
- Function Name: REPAIR_MANUALS_LLM
- Input: User prompt (string)
- Output: Table with response and context details
- Models Used:
  - e5-base-v2 for embedding
  - mixtral-8x7b for text generation

### Function Components
1. **Vector Similarity Search**
   - Uses cosine similarity for context retrieval
   - Retrieves top 10 most relevant chunks
   - Scores matches based on vector similarity

2. **Response Generation**
   - Combines user query with retrieved context
   - Uses Mixtral 8x7B model for response generation
   - Returns structured output with context

### Output Structure
- response: LLM-generated answer
- file_name: Source document
- chunk_text: Retrieved context
- chunk_number: Context identifier
- score: Similarity score

### Usage
- Input maintenance-related questions
- Receives contextualized responses
- Includes source documentation references
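
The retrieval step can be illustrated with a small, self-contained Python sketch. It assumes toy 3-dimensional vectors in place of the 768-dimensional embeddings, and the names `cosine_similarity`, `top_k_chunks`, and `toy_vectors` are hypothetical. In the notebook itself the ranking is done in SQL with `VECTOR_COSINE_SIMILARITY`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vector, chunk_vectors, k=10):
    """Score every chunk against the query and keep the k highest scores."""
    scored = [
        (chunk_id, cosine_similarity(query_vector, vector))
        for chunk_id, vector in chunk_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy stand-ins for embedded chunks (real embeddings have 768 dimensions).
toy_vectors = {
    "chunk_1": [1.0, 0.0, 0.0],
    "chunk_2": [0.6, 0.8, 0.0],
    "chunk_3": [0.0, 0.0, 1.0],
}
ranked = top_k_chunks([1.0, 0.0, 0.0], toy_vectors, k=2)
```

Here `chunk_1` points in the same direction as the query and ranks first, `chunk_2` is partially aligned and ranks second, and the orthogonal `chunk_3` is dropped by the `k=2` cutoff.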

In [None]:
----------------------------------------------------------------------
-- Invoke an LLM once per retrieved chunk, sending our question as part of the
-- prompt along with "context" from each of the 10 best-matching chunks
-- (ranked by cosine similarity)
----------------------------------------------------------------------  
SET prompt = 'OTTO 1500 agv is not driving straight.  How do I troubleshoot and resolve this issue?';

CREATE OR REPLACE FUNCTION REPAIR_MANUALS_LLM(prompt string)
RETURNS TABLE (response string, file_name string, chunk_text string, chunk_number int, score float)
AS
    $$
    WITH best_match_chunk AS (
        SELECT
            v.file_name,
            v.chunk_number,
            v.chunk_text,
            VECTOR_COSINE_SIMILARITY(v.combined_chunk_vector, snowflake.cortex.embed_text_768('e5-base-v2', prompt)) AS score
        FROM 
            repair_manuals_chunked_vectors v
        ORDER BY 
            score DESC
        LIMIT 10
    )
    SELECT 
        SNOWFLAKE.cortex.COMPLETE('mixtral-8x7b', 
            CONCAT('Answer this question: ', prompt, '\n\nUsing this repair manual text: ', chunk_text)
        ) AS response,
        file_name,
        chunk_text,
        chunk_number,
        score
    FROM
        best_match_chunk
    $$;

In [None]:
SELECT * FROM TABLE(REPAIR_MANUALS_LLM($prompt));


## Step 4: LLM-Generated Repair Logs

## Equipment Repair Logs Table Creation

### Purpose
Creates a structured table to track and store equipment maintenance records, enabling systematic logging of repair incidents and their resolutions.

### Table Structure
- date_reported: Timestamp of issue reporting
- equipment_model: Model identifier of the equipment
- equipment_id: Unique identifier for specific equipment unit
- problem_reported: Description of the reported issue
- resolution_notes: Documentation of repair actions and solutions

### Usage Scenarios
- Track maintenance history
- Monitor equipment issues
- Document repair solutions
- Enable maintenance analysis
- Support preventive maintenance

### Integration Points
- Connects with RAG system for solution retrieval
- Supports maintenance pattern analysis
- Enables historical repair tracking

In [None]:
----------------------------------------------------------------------
-- Create a table to represent equipment repair logs
----------------------------------------------------------------------  
CREATE OR REPLACE TABLE repair_logs (
    date_reported datetime, 
    equipment_model string,
    equipment_id string,
    problem_reported string,
    resolution_notes string
);

## Simulated Repair Log Data Population

### Purpose
Populates the repair_logs table with simulated maintenance records for various Otto AGV models, establishing a historical maintenance record that the RAG system can retrieve as context.

### Data Structure
- Timestamps: January 2023 - December 2023
- Equipment Models: Otto Forklift, Otto 100, Otto 600, Otto 1500
- Record Types:
  - Equipment issues
  - Resolution actions
  - Maintenance outcomes

### Implementation Details
- 40+ maintenance records
- Diverse problem scenarios
- Detailed resolution notes
- Multiple equipment models
- Various AGV units (AGV-001 to AGV-012)

### Data Usage
- Retrieval context for the RAG system
- Historical maintenance patterns
- Equipment issue analysis
- Resolution effectiveness tracking

In [None]:
----------------------------------------------------------------------
-- Load (simulated) repair logs.
----------------------------------------------------------------------  
INSERT INTO repair_logs (date_reported, equipment_model, equipment_id, problem_reported, resolution_notes) VALUES
('2023-03-23 08:42:48', 'Otto Forklift', 'AGV-010', 'Vision System Calibration Error', 'Recalibrated the vision system and replaced damaged image sensors. Tested object recognition accuracy.'),
('2023-09-30 04:42:47', 'Otto 100', 'AGV-011', 'Wireless Receiver Malfunction', 'Replaced faulty wireless receiver and updated communication protocols. Ensured robust signal reception.'),
('2023-09-27 05:01:16', 'Otto Forklift', 'AGV-006', 'Inadequate Lifting Force', 'Adjusted the hydraulic pressure settings and replaced weak hydraulic pistons. Tested lifting capacity with maximum load.'),
('2023-02-16 09:42:31', 'Otto 1500', 'AGV-001', 'Hydraulic System Overpressure', 'Adjusted hydraulic system and replaced faulty pressure valves. Ensured safe and stable operation.'),
('2023-10-29 23:44:57', 'Otto 600', 'AGV-003', 'Erratic Forklift Movement', 'Repaired damaged forklift steering components and recalibrated steering controls. Ensured smooth and accurate movement.'),
('2023-11-21 18:35:09', 'Otto 600', 'AGV-002', 'Motor Torque Fluctuations', 'Replaced worn motor brushes and serviced motor components. Calibrated motor for consistent torque output.'),
('2023-07-04 14:22:33', 'Otto Forklift', 'AGV-005', 'Control Software Hangs', 'Diagnosed software hanging issue, optimized system resources, and applied software updates. Conducted stress tests for reliability.'),
('2023-12-13 21:16:49', 'Otto 1500', 'AGV-004', 'Path Deviation in Navigation', 'Updated navigation algorithms and recalibrated wheel encoders. Performed path accuracy tests in different layouts.'),
('2023-08-10 10:55:43', 'Otto 100', 'AGV-012', 'Steering Response Delay', 'Diagnosed and fixed the delay in steering response. Calibrated the steering system for immediate and accurate response.'),
('2023-05-15 16:11:28', 'Otto Forklift', 'AGV-009', 'Unresponsive Touch Panel', 'Replaced the touch panel and updated the interface software. Tested for user interaction and responsiveness.'),
('2023-08-31 02:54:20', 'Otto 100', 'AGV-003', 'Charging System Inefficiency', 'Upgraded the charging system components and optimized charging algorithms for faster and more efficient charging.'),
('2023-10-05 20:24:19', 'Otto Forklift', 'AGV-008', 'Payload Sensor Inaccuracy', 'Calibrated payload sensors and replaced defective units. Ensured accurate load measurement and handling.'),
('2023-02-19 22:29:24', 'Otto 1500', 'AGV-009', 'Cooling Fan Malfunction', 'Replaced malfunctioning cooling fans and cleaned air vents. Tested under load to ensure effective heat dissipation.'),
('2023-05-29 15:09:15', 'Otto 100', 'AGV-011', 'Drive Motor Overheating', 'Serviced drive motors and replaced worn components. Improved motor cooling and monitored temperature during operation.'),
('2023-04-30 01:03:03', 'Otto 600', 'AGV-002', 'Laser Scanner Inaccuracy', 'Calibrated laser scanners and updated scanning software. Ensured precise environmental mapping and obstacle detection.'),
('2023-03-14 13:15:52', 'Otto Forklift', 'AGV-006', 'Conveyor Belt Misalignment', 'Realigned the conveyor belt and adjusted tension settings. Conducted operational tests for smooth and consistent movement.'),
('2023-11-14 08:11:58', 'Otto 1500', 'AGV-012', 'Forklift Sensor Misalignment', 'Realigned forklift sensors and calibrated for precise object positioning and handling.'),
('2023-12-24 22:35:13', 'Otto 600', 'AGV-008', 'Erratic Forklift Movement', 'Repaired damaged forklift steering components and recalibrated steering controls. Ensured smooth and accurate movement.'),
('2023-09-20 08:08:16', 'Otto 100', 'AGV-007', 'Hydraulic System Overpressure', 'Adjusted hydraulic system pressure settings and replaced faulty pressure valves. Ensured safe and stable operation.'),
('2023-10-20 00:37:29', 'Otto 600', 'AGV-003', 'Forklift Sensor Misalignment', 'Performed alignment on forklift sensors and calibrated for precise object positioning and handling.'),
('2023-08-20 12:49:44', 'Otto 1500', 'AGV-008', 'Control Software Hangs', 'Diagnosed software hanging issue, optimized system resources, and applied software updates. Conducted stress tests for reliability.'),
('2023-07-08 03:37:26', 'Otto 1500', 'AGV-002', 'Wireless Receiver Malfunction', 'Replaced faulty wireless receiver and updated communication protocols. Ensured robust signal reception.'),
('2023-10-12 09:05:07', 'Otto 1500', 'AGV-001', 'Laser Scanner Inaccuracy', 'Calibrated laser scanners and updated scanning software. Ensured precise environmental mapping and obstacle detection.'),
('2023-03-12 19:28:34', 'Otto 1500', 'AGV-008', 'Hydraulic System Overpressure', 'Adjusted hydraulic system pressure settings and replaced faulty pressure valves. Ensured safe and stable operation.'),
('2023-01-19 23:10:03', 'Otto 600', 'AGV-006', 'Inconsistent Conveyor Speed', 'Repaired gearbox in conveyor attachment and adjusted speed control settings. Verified consistent conveyor operation.'),
('2023-06-29 20:02:38', 'Otto 600', 'AGV-002', 'Battery Overheating', 'Replaced faulty battery cells and improved battery ventilation system. Monitored temperature during charging and operation.'),
('2023-05-09 23:19:03', 'Otto 600', 'AGV-011', 'Inconsistent Conveyor Speed', 'Repaired gearbox in conveyor attachment and adjusted speed control settings. Verified consistent conveyor operation.'),
('2023-06-09 17:56:51', 'Otto Forklift', 'AGV-002', 'Motor Torque Fluctuations', 'Replaced worn motor brushes and serviced motor components. Calibrated motor for consistent torque output.'),
('2023-03-02 09:21:22', 'Otto 1500', 'AGV-004', 'Payload Sensor Inaccuracy', 'Calibrated payload sensors and replaced defective units. Ensured accurate load measurement and handling.'),
('2023-07-16 00:00:54', 'Otto 1500', 'AGV-003', 'Drive Motor Overheating', 'Serviced drive motors and replaced worn components. Improved motor cooling and monitored temperature during operation.'),
('2023-02-28 12:48:29', 'Otto 600', 'AGV-001', 'Inadequate Lifting Force', 'Adjusted the hydraulic pressure settings and replaced weak hydraulic pistons. Tested lifting capacity with maximum load.'),
('2023-10-10 23:04:35', 'Otto Forklift', 'AGV-010', 'Unresponsive Touch Panel', 'Replaced the touch panel and updated the interface software. Tested for user interaction and responsiveness.'),
('2023-08-01 13:37:16', 'Otto 600', 'AGV-004', 'Cooling Fan Malfunction', 'Replaced malfunctioning cooling fans and cleaned air vents. Tested under load to ensure effective heat dissipation.'),
('2023-05-10 17:48:27', 'Otto Forklift', 'AGV-005', 'Battery Overheating', 'Replaced faulty battery cells and improved battery ventilation system. Monitored temperature during charging and operation.'),
('2023-02-05 12:37:50', 'Otto Forklift', 'AGV-010', 'Charging System Inefficiency', 'Upgraded the charging system components and optimized charging algorithms for faster and more efficient charging.'),
('2023-08-24 15:29:05', 'Otto 600', 'AGV-012', 'Inconsistent Conveyor Speed', 'Repaired gearbox in conveyor attachment and adjusted speed control settings. Verified consistent conveyor operation.'),
('2023-03-28 02:59:06', 'Otto Forklift', 'AGV-011', 'Inadequate Lifting Force', 'Adjusted the hydraulic pressure settings and replaced weak hydraulic pistons. Tested lifting capacity with maximum load.'),
('2023-08-07 20:55:21', 'Otto 600', 'AGV-007', 'Cooling Fan Malfunction', 'Replaced malfunctioning cooling fans and cleaned air vents. Tested under load to ensure effective heat dissipation.'),
('2023-05-24 15:45:35', 'Otto 600', 'AGV-008', 'Charging System Inefficiency', 'Upgraded the charging system components and optimized charging algorithms for faster and more efficient charging.'),
('2023-08-06 21:27:28', 'Otto Forklift', 'AGV-008', 'Path Deviation in Navigation', 'Updated navigation algorithms and recalibrated wheel encoders. Performed path accuracy tests in different layouts.'),
('2023-02-18 15:41:59', 'Otto 1500', 'AGV-002', 'Battery Overheating', 'Replaced faulty battery cells and improved battery ventilation system. Monitored temperature during charging and operation.'),
('2023-08-11 11:55:51', 'Otto Forklift', 'AGV-003', 'Charging System Inefficiency', 'Upgraded the charging system components and optimized charging algorithms for faster and more efficient charging.'),
('2023-11-11 14:43:55', 'Otto 100', 'AGV-001', 'Charging System Inefficiency', 'Upgraded the charging system components and optimized charging algorithms for faster and more efficient charging.'),
('2023-02-17 09:23:34', 'Otto 600', 'AGV-001', 'Control Software Hangs', 'Diagnosed software hanging issue, optimized system resources, and applied software updates. Conducted stress tests for reliability.'),
('2023-03-13 18:19:47', 'Otto 100', 'AGV-011', 'Path Deviation in Navigation', 'Updated navigation algorithms and recalibrated wheel encoders. Performed path accuracy tests in different layouts.'),
('2023-12-02 02:13:06', 'Otto 1500', 'AGV-001', 'Drive Motor Overheating', 'Serviced drive motors and replaced worn components. Improved motor cooling and monitored temperature during operation.');

--Validate
SELECT * FROM repair_logs;

## Repair Logs Formatting for LLM Context

### Purpose
Creates a formatted version of repair logs with structured text representation, optimizing the maintenance records for LLM consumption and context retrieval.

### Technical Specifications
- Table Name: repair_logs_formatted
- Source: repair_logs table
- Format: Natural language narrative structure
- Preserves: All original columns

### Text Structure
1. Equipment Identification
   - Model information
   - AGV reference
2. Problem Description
   - Clear problem statement
   - Issue details
3. Resolution Details
   - Solution steps
   - Maintenance actions

### Usage Benefits
- Enhanced LLM comprehension
- Structured context retrieval
- Consistent format for responses
- Improved semantic search results
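
As a rough illustration of the narrative format, the sketch below builds the same kind of text in Python. The function name `format_repair_log` is hypothetical and the sample values are taken from the simulated logs; the notebook performs this step in SQL with `CONCAT`.

```python
def format_repair_log(equipment_model: str, problem_reported: str, resolution_notes: str) -> str:
    """Combine one repair log row into a single block of narrative text."""
    return (
        f"The following Problem was Reported for a {equipment_model} AGV.\n\n"
        f"Problem:\n{problem_reported}\n\n"
        f"Resolution:\n{resolution_notes}"
    )


example_text = format_repair_log(
    "Otto 1500",
    "Path Deviation in Navigation",
    "Updated navigation algorithms and recalibrated wheel encoders.",
)
```

Keeping the model, problem, and resolution in one string means a single embedding captures all three, so a query that mentions only the symptom can still match on the resolution text.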

In [None]:
----------------------------------------------------------------------
-- Format the logs in a way that will be helpful context for the LLM
----------------------------------------------------------------------  
CREATE OR REPLACE TABLE repair_logs_formatted AS
SELECT
    *,
    CONCAT(
        'The following Problem was Reported for a ',
        equipment_model,
        ' AGV.\n\nProblem:\n', 
        problem_reported, 
        '\n\nResolution:\n', 
        resolution_notes) AS combined_text
FROM
    repair_logs;

--Validate
SELECT * FROM repair_logs_formatted;

## Repair Logs Vectorization

### Purpose
Creates embeddings for formatted repair log entries using Snowflake Cortex, enabling semantic search capabilities across maintenance records.

### Technical Specifications
- Table Name: repair_logs_vectors
- Model: e5-base-v2
- Vector Dimension: 768
- Source: Formatted repair logs

### Table Structure
- date_reported: Timestamp of issue
- equipment_model: AGV model identifier
- equipment_id: Specific unit ID
- problem_reported: Issue description
- resolution_notes: Solution details
- combined_vector: Text embedding vector

### Implementation Features
- Preserves all original log fields
- Generates vectors from formatted text
- Enables semantic similarity search
- Supports maintenance pattern analysis

### Applications
- Similar case retrieval
- Pattern recognition
- Maintenance trend analysis
- Knowledge base search

In [None]:
----------------------------------------------------------------------
-- "Vectorize" the formatted contents
----------------------------------------------------------------------  
CREATE OR REPLACE TABLE repair_logs_vectors AS
SELECT 
    date_reported, 
    equipment_model,
    equipment_id,
    problem_reported,
    resolution_notes,
    snowflake.cortex.embed_text_768('e5-base-v2', combined_text) as combined_vector
FROM repair_logs_formatted;

--Validate
SELECT * FROM repair_logs_vectors;

## Repair Logs RAG Function Implementation

### Purpose
Creates a table-valued function that implements RAG pattern for maintenance queries, combining vector similarity search of repair logs with LLM-based response generation.

### Technical Specifications
- Function Name: REPAIR_LOGS_LLM
- Input: Maintenance query (string)
- Output: LLM response and relevant repair logs
- Models:
  - e5-base-v2 for embeddings
  - mixtral-8x7b for response generation

### Function Components
1. **Vector Search (best_match_repair_logs CTE)**
   - Performs cosine similarity matching
   - Retrieves top 10 similar repair logs
   - Ranks matches by similarity score

2. **Response Generation (combined_notes CTE)**
   - Aggregates relevant repair notes
   - Structures prompt with maintenance context
   - Generates targeted repair recommendations

### Output Structure
- response: LLM-generated repair guidance
- relevant_repair_logs: Supporting historical repairs

### Implementation Features
- Context-aware responses
- Historical repair reference
- Similarity-based retrieval
- Structured maintenance guidance
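
The prompt assembly can be sketched in Python as follows. The function name `build_repair_prompt` is hypothetical, and `str.join` stands in for SQL `LISTAGG` with a delimiter; treat this as an illustration of the idea rather than the function's exact behavior.

```python
DELIMITER = "\n\nResolution Note:\n"


def build_repair_prompt(question: str, resolution_notes: list[str]) -> str:
    """Join retrieved resolution notes and wrap them in the technician prompt."""
    # Like LISTAGG, join places the delimiter between items only,
    # so the first note is not preceded by a "Resolution Note:" label.
    joined_notes = DELIMITER.join(resolution_notes)
    return (
        "An equipment technician is dealing with this problem on an AGV: "
        + question
        + "\n\nUsing these previous similar resolution notes, what is the "
        "recommended course of action to troubleshoot and repair the AGV?\n\n"
        + joined_notes
    )


notes = [
    "Updated navigation algorithms and recalibrated wheel encoders.",
    "Calibrated laser scanners and updated scanning software.",
    "Serviced drive motors and replaced worn components.",
]
prompt_text = build_repair_prompt("OTTO 1500 agv is not driving straight.", notes)
```

Because the delimiter only separates items, three notes produce two "Resolution Note:" labels. If a label on every note matters, one option is to prefix the joined text with the label as well.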

In [None]:
----------------------------------------------------------------------
-- Create a table valued function that looks for the best repair logs 
-- (based upon cosine similarity) and pass those as context to the LLM.
----------------------------------------------------------------------  
CREATE OR REPLACE FUNCTION REPAIR_LOGS_LLM(prompt string)
RETURNS TABLE (response string, relevant_repair_logs string)
AS
    $$
       WITH best_match_repair_logs AS (
            SELECT 
                *, 
                VECTOR_COSINE_SIMILARITY(
                    combined_vector,
                    snowflake.cortex.embed_text_768('e5-base-v2', prompt)
                ) AS score
            FROM
                repair_logs_vectors
            ORDER BY
                score DESC
            LIMIT 10
        ),
        combined_notes AS (
            SELECT 
                SNOWFLAKE.CORTEX.COMPLETE('mixtral-8x7b', 
                    CONCAT('An equipment technician is dealing with this problem on an AGV: ', 
                    prompt, 
                    '\n\nUsing these previous similar resolution notes, what is the recommended course of action to troubleshoot and repair the AGV?\n\n', 
                    LISTAGG(resolution_notes, '\n\nResolution Note:\n')
                    )
                ) AS response,
                LISTAGG(resolution_notes, '\n\nResolution Note:\n') AS relevant_repair_logs
            FROM best_match_repair_logs
        ) 
        SELECT * FROM combined_notes
    $$;

## RAG System Testing Query

### Purpose
Tests the RAG-based maintenance advisory system with a specific AGV issue, demonstrating the system's ability to provide contextual troubleshooting guidance.

### Test Configuration
- Equipment: OTTO 1500 AGV
- Issue Type: Navigation/Movement
- Query Focus: Straight-line deviation
- Function: REPAIR_LOGS_LLM

### Query Components
1. **Test Prompt**
   - Specific equipment model (OTTO 1500)
   - Clear problem description
   - Request for troubleshooting steps

2. **Expected Output**
   - LLM-generated response
   - Relevant historical repair logs
   - Context-aware solutions

In [None]:
----------------------------------------------------------------------
-- Test the LLM
----------------------------------------------------------------------  
SET prompt = 'OTTO 1500 agv is not driving straight.  How do I troubleshoot and resolve this issue?';

SELECT * FROM TABLE(REPAIR_LOGS_LLM($prompt));

## Step 5: Combined Logs and Manuals

## Combined RAG Response Function

### Purpose
Creates a function that integrates responses from both repair manuals and historical logs, providing a comprehensive summarized solution using Snowflake Cortex.

### Technical Specifications
- Function Name: COMBINED_REPAIR_LLM
- Input: Maintenance query (string)
- Output: Summarized response
- Components:
  - REPAIR_MANUALS_LLM
  - REPAIR_LOGS_LLM
  - Cortex Summarization

### Function Architecture
1. **Response Collection (stacked_results CTE)**
   - Retrieves manual-based solution
   - Retrieves historical repair solution
   - Combines both perspectives

2. **Text Aggregation (collapsed_results CTE)**
   - Combines multiple responses
   - Prepares text for summarization

3. **Final Processing**
   - Summarizes combined knowledge
   - Generates concise solution

### Implementation Benefits
- Comprehensive solution coverage
- Balanced theoretical/practical guidance
- Concise, actionable responses
- Integrated knowledge sources
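
The stacking and collapsing steps can be approximated in Python as shown below. The name `collapse_responses` is hypothetical. The sketch keeps input order for readability, whereas SQL does not guarantee row order for `UNION` or for `LISTAGG` without an ordering clause.

```python
def collapse_responses(manual_responses: list[str], log_responses: list[str]) -> str:
    """Mimic TOP 1 + UNION + LISTAGG (no delimiter) on plain Python lists."""
    # TOP 1: keep a single manual-based response.
    top_manual = manual_responses[:1]
    # UNION: stack both sources and drop exact duplicates.
    stacked = list(dict.fromkeys(top_manual + log_responses))
    # LISTAGG without a delimiter: concatenate with nothing in between.
    return "".join(stacked)


collapsed = collapse_responses(
    ["Check the wheel encoders.", "Inspect the drive motors."],
    ["Recalibrate the wheel encoders."],
)
```

The collapsed text has no separator between the two responses, which is usually acceptable as summarizer input but worth keeping in mind when inspecting intermediate results.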

In [None]:
----------------------------------------------------------------------
-- Run both RAG functions, combine their responses, and ask Snowflake Cortex to summarize
----------------------------------------------------------------------  
CREATE OR REPLACE FUNCTION COMBINED_REPAIR_LLM(prompt string)
RETURNS TABLE (response string)
AS
    $$
       WITH stacked_results AS
        (
            SELECT TOP 1 response FROM TABLE(REPAIR_MANUALS_LLM(prompt)) 
            UNION
            SELECT response FROM TABLE(REPAIR_LOGS_LLM(prompt))
        ),
        collapsed_results AS (
            SELECT 
                LISTAGG(response) AS collapsed_text 
            FROM 
                stacked_results
        )
        SELECT
            SNOWFLAKE.CORTEX.SUMMARIZE(collapsed_text) AS response
        FROM
            collapsed_results
    $$;

## Combined RAG System Testing Query

### Purpose
Tests the integrated RAG solution that combines both repair manual knowledge and historical repair logs, providing a comprehensive summarized solution for AGV maintenance issues.

### Test Configuration
- Equipment: OTTO 1500 AGV
- Issue Type: Navigation/Movement
- Query Focus: Straight-line deviation
- Function: COMBINED_REPAIR_LLM

### Test Components
1. **Input Query**
   - Specific equipment identification
   - Clear problem description
   - Request for troubleshooting steps

2. **Expected Output**
   - Summarized solution combining:
     - Manual-based guidance
     - Historical repair solutions
   - Concise, actionable recommendations

In [None]:
----------------------------------------------------------------------
-- Test the combined function
----------------------------------------------------------------------  
SET prompt = 'OTTO 1500 agv is not driving straight.  How do I troubleshoot and resolve this issue?';

SELECT * FROM TABLE(COMBINED_REPAIR_LLM($prompt));

## Step 6: Set Up Streamlit App

## AI-Guided Maintenance Streamlit Application

### Purpose
Creates an interactive web interface for the AI-powered maintenance advisory system, combining repair manual knowledge and historical repair logs.

### Configuration Details
1. **Basic Settings**
   - App Name: AI-Guided Equipment Maintenance
   - Location: Compute_ML
   - Warehouse: Medium
   - Database/Schema: LLM.RAG

2. **Resource Allocation**
   - Warehouse Size: Medium
   - Auto-suspend: 300 seconds
   - Concurrency: Standard

3. **Security Settings**
   - Role: DEMOADMIN
   - Authentication: Snowflake native
   - Access Control: Role-based

### Application Components
1. **Interface Elements**
   - Title: AI-Guided Maintenance
   - Equipment Image: OTTO 100 AGV visual reference
   - Question Input: Text field for maintenance queries
   - Submit Button: Snowflake-branded trigger

2. **Tab Structure**
   - Tab 1: Repair Manuals Analysis
     - Maintenance manual recommendations
     - Relevant text chunks and scores
   
   - Tab 2: Repair Logs Analysis
     - Historical repair recommendations
     - Relevant repair log entries
   
   - Tab 3: Combined Insights
     - Synthesized recommendations
     - Integrated knowledge base

### Technical Implementation
1. **Dependencies**
   - streamlit: Web interface framework
   - snowflake.snowpark: Database connectivity
   
2. **Session Management**
   - Active Snowflake session
   - Query execution context
   
3. **Data Processing**
   - RAG-based query processing
   - Multi-source response generation
   - Dynamic result rendering

### Implementation Features
- Interactive query interface
- Real-time LLM responses
- Multi-tab result display
- Visual equipment references
- Contextual repair guidance

### Usage Flow
1. User enters maintenance question
2. System processes query across knowledge bases
3. Results displayed in organized tabs
4. Recommendations provided with context
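
The guide describes the app but does not include its source, so the following is only a minimal sketch of how such an app might be laid out, not the actual application. It assumes a Streamlit in Snowflake environment where `get_active_session` is available, and it assumes bind parameters are accepted as table function arguments. `TAB_QUERIES` and `render_app` are hypothetical names; only the three SQL functions come from this guide.

```python
# Hypothetical mapping from tab label to the query behind it.
TAB_QUERIES = {
    "Repair Manuals": "SELECT * FROM TABLE(REPAIR_MANUALS_LLM(?))",
    "Repair Logs": "SELECT * FROM TABLE(REPAIR_LOGS_LLM(?))",
    "Combined Insights": "SELECT * FROM TABLE(COMBINED_REPAIR_LLM(?))",
}


def render_app():
    """Draw the question box and one result tab per knowledge source."""
    # Imported here so the query mapping can be inspected without Streamlit installed.
    import streamlit as st
    from snowflake.snowpark.context import get_active_session

    session = get_active_session()
    st.title("AI-Guided Maintenance")
    question = st.text_input("Question")

    if st.button("Submit") and question:
        tabs = st.tabs(list(TAB_QUERIES))
        for tab, query in zip(tabs, TAB_QUERIES.values()):
            with tab:
                # Bind the question instead of concatenating it into the SQL text.
                st.dataframe(session.sql(query, params=[question]).to_pandas())
```

In an app file, `render_app()` would be called at the bottom of the script. Binding the question as a parameter avoids having to escape quotes in user input.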



## Cleanup

In [None]:
DROP FUNCTION IF EXISTS py_read_pdf(STRING);
DROP FUNCTION IF EXISTS REPAIR_MANUALS_LLM(STRING);
DROP FUNCTION IF EXISTS REPAIR_LOGS_LLM(STRING);
DROP FUNCTION IF EXISTS COMBINED_REPAIR_LLM(STRING);

In [None]:

-- Drop tables
DROP TABLE IF EXISTS repair_manuals;
DROP TABLE IF EXISTS repair_manuals_chunked;
DROP TABLE IF EXISTS repair_manuals_chunked_vectors;
DROP TABLE IF EXISTS repair_logs;
DROP TABLE IF EXISTS repair_logs_formatted;
DROP TABLE IF EXISTS repair_logs_vectors;

In [None]:
-- Drop stage
DROP STAGE IF EXISTS REPAIR_MANUALS;

-- Drop schema
DROP SCHEMA IF EXISTS LLM.RAG;

In [None]:
SELECT 
    CURRENT_WAREHOUSE(),
    CURRENT_DATABASE(),
    CURRENT_SCHEMA(),
    CURRENT_ROLE();

In [None]:
use warehouse compute_wh;
use database notebooks_db;

In [None]:
-- Drop database
DROP DATABASE IF EXISTS LLM;

-- Drop warehouse
DROP WAREHOUSE IF EXISTS Medium;