# RAG Chatbot for KubeCon Sessions

This guide walks through building a Retrieval-Augmented Generation (RAG) chatbot using Snowflake Cortex and Streamlit for KubeCon session data.



## Prerequisites

Before proceeding, ensure you have:
- A Snowflake account with access to Cortex.
- Required permissions to create tables and search services.
- A Python environment with `streamlit`, `snowflake-core`, and `snowflake-snowpark-python`.
- A downloaded copy of the KubeCon schedule PDF: [View the KCCNCEU 2025 Schedule](https://kccnceu2025.sched.com/print?iframe=yes&w=100%&sidebar=yes&bg=no)

## Step 1: Staging and Listing Available Files in Snowflake

To create a named internal stage using Snowsight, follow these steps:  

1. **Sign in to Snowsight.**  
2. In the navigation menu, select **Create » Stage » Snowflake Managed**.  
3. In the **Create Stage** dialog, enter a **Stage Name**.  
4. Select the **database and schema** where you want to create the stage.  
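5. Optionally, **deselect Directory table**.  
   - Directory tables allow you to see files on the stage but require a warehouse, which incurs a cost.  
   - You can choose to deselect this option now and enable a directory table later.  
   - Step 2 of this guide reads the file list with `directory(@stage)`, which needs the directory table, so keep it enabled or enable it before parsing.  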
6. Select the type of **Encryption** supported for all files on your stage.  
   - For details, see [Encryption for Internal Stages](#).  
   - **Note:** You cannot change the encryption type after creating the stage. 

To upload files onto your stage, follow these steps:  

1. **Sign in to Snowsight.**  
2. Select **Data » Add Data**.  
3. On the **Add Data** page, select **Load files into a Stage**.  
4. In the **Upload Your Files** dialog, select the files you want to upload.  
   - You can upload multiple files at the same time.  
5. Select the **database and schema** where you created the stage, then select the **stage**.  
6. Optionally, select or create a **path** where you want to save your files within the stage.  
7. Click **Upload**.  


In [None]:
--list the staged file(s)
ls @FAWAZG_SCHEMA.KUBECON;

## Step 2: Parsing KubeCon Session Document

The `PARSE_DOCUMENT` function extracts text, data, and layout elements from documents. It can be used for:

1. Powering **RAG pipelines** for Cortex Search.
2. Enabling **LLM processing** like document summarization or translation using Cortex AI Functions.
3. Performing **zero-shot entity extraction** with Cortex AI Structured Outputs.


In [None]:
-- Parse each staged PDF in LAYOUT mode and keep the extracted text
CREATE OR REPLACE TABLE FAWAZG_DB.FAWAZG_SCHEMA.KUBECON_PARSED_CONTENT AS
SELECT
    relative_path,
    TO_VARCHAR(
        SNOWFLAKE.CORTEX.PARSE_DOCUMENT(
            @FAWAZG_SCHEMA.KUBECON,
            relative_path,
            {'mode': 'LAYOUT'}
        ):content
    ) AS parsed_text
FROM DIRECTORY(@FAWAZG_SCHEMA.KUBECON)
WHERE relative_path LIKE '%.pdf';

In [None]:
-- Check the results of Step 2: Parsing KubeCon Session Document
SELECT * FROM FAWAZG_DB.FAWAZG_SCHEMA.KUBECON_PARSED_CONTENT LIMIT 2
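
The `:content` path in the parsing query selects one field from the JSON value that `PARSE_DOCUMENT` returns. The sketch below mimics that extraction in plain Python so the shape is easier to see. The sample object is hypothetical, and its field names (`content`, `metadata`, `pageCount`) are assumptions for illustration rather than output captured from this notebook.

```python
import json

# Hypothetical sample shaped like a PARSE_DOCUMENT result. The field names
# ("content", "metadata", "pageCount") are assumptions for illustration.
sample_result = json.dumps({
    "content": "# KubeCon EU 2025\n\n## Keynote: Welcome + Opening Remarks",
    "metadata": {"pageCount": 1},
})


def extract_content(raw_result: str) -> str:
    """Return the text under the 'content' key, mirroring `:content` in SQL."""
    parsed = json.loads(raw_result)
    return str(parsed["content"])


parsed_text = extract_content(sample_result)
print(parsed_text.splitlines()[0])
```

Only the text is kept in `KUBECON_PARSED_CONTENT`; anything else in the returned object is discarded by the query.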


## Step 3: Chunking the Parsed Content

The `SPLIT_TEXT_RECURSIVE_CHARACTER` function splits text into smaller chunks for text embedding or search indexing. It works as follows:

- Splits text based on separators (default or custom).
- Recursively splits chunks longer than the specified `chunk_size`.
- Example: With `format='none'`, it first splits on `\n\n` (paragraphs), then `\n` (line breaks), repeating until all chunks are under the `chunk_size`.
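
The recursion described above can be sketched in a few lines of plain Python. This is a simplified illustration, not Snowflake's implementation: the helper `split_recursive` is hypothetical, it does not merge small pieces back together, and it ignores overlap.

```python
from typing import List


def split_recursive(text: str, chunk_size: int, separators: List[str]) -> List[str]:
    """Split text on the first separator, recursing with the remaining
    separators into any piece that is still longer than chunk_size."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to fixed-width slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    head, rest = separators[0], separators[1:]
    chunks: List[str] = []
    for piece in text.split(head):
        chunks.extend(split_recursive(piece, chunk_size, rest))
    return chunks


# Made-up sample text: paragraphs separated by blank lines, one with a line break.
sample = "Keynote at 9am.\n\nWorkshop on eBPF.\nRoom B12.\n\nLunch."
for chunk in split_recursive(sample, chunk_size=20, separators=["\n\n", "\n"]):
    print(repr(chunk))
```

The middle paragraph is longer than 20 characters, so it is the only one that gets split a second time, on `\n`.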


In [None]:
CREATE OR REPLACE TABLE FAWAZG_DB.FAWAZG_SCHEMA.KUBECON_CHUNKED_CONTENT (
    file_name VARCHAR,
    CHUNK VARCHAR
);

INSERT INTO FAWAZG_DB.FAWAZG_SCHEMA.KUBECON_CHUNKED_CONTENT (file_name, CHUNK)
SELECT
    relative_path,
    c.value AS CHUNK
FROM
    FAWAZG_DB.FAWAZG_SCHEMA.KUBECON_PARSED_CONTENT,
    LATERAL FLATTEN( input => SNOWFLAKE.CORTEX.SPLIT_TEXT_RECURSIVE_CHARACTER (
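        parsed_text,  -- text to split
        'markdown',   -- format
        300,          -- chunk_size: maximum characters per chunk
        250           -- overlap: characters shared by consecutive chunks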
    )) c;
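
The third and fourth arguments in the call above are the chunk size (300) and the overlap (250). To get a feel for what a large overlap does to the number of chunks, the sketch below assumes a simple fixed-stride sliding window, where each chunk starts `chunk_size - overlap` characters after the previous one. That model is an assumption for illustration; the Cortex splitter also respects separators, so its real output will differ.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters where each window
    starts (chunk_size - overlap) characters after the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    stride = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += stride
    return chunks


text = "x" * 1000  # placeholder text; only its length matters here
print(len(sliding_chunks(text, 300, 250)))  # stride of 50 characters
print(len(sliding_chunks(text, 300, 0)))    # no overlap, stride of 300
```

Under this model a 250-character overlap on 300-character chunks stores most of the text several times over, which is worth keeping in mind when choosing the two values.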

In [None]:
-- Check the results of Step 3: Chunking the Parsed Content
SELECT * FROM FAWAZG_DB.FAWAZG_SCHEMA.KUBECON_CHUNKED_CONTENT LIMIT 10

## Step 4: Creating a Search Service in Snowflake Cortex

The `CREATE CORTEX SEARCH SERVICE` command below builds the search service over the chunked data with the following behavior:

- **Queries** search for matches in the `chunk` column; `file_name` is available to return alongside each match.
- **TARGET_LAG** of `'1 minute'` sets the search service to check for updates to `KUBECON_CHUNKED_CONTENT` approximately once per minute.
- The **warehouse** `fawazg_wh` is used to materialize query results initially and whenever the base table updates.
- **EMBEDDING_MODEL** selects `snowflake-arctic-embed-l-v2.0` to embed the chunks.

![Cortex Search RAG](https://docs.snowflake.com/en/_images/cortex-search-rag.png)

In [None]:
CREATE OR REPLACE CORTEX SEARCH SERVICE FAWAZG_DB.FAWAZG_SCHEMA.KUBECON_SEARCH_SERVICE
    ON chunk
    WAREHOUSE = fawazg_wh
    TARGET_LAG = '1 minute'
    EMBEDDING_MODEL = 'snowflake-arctic-embed-l-v2.0'
    AS (
    SELECT
        file_name,
        chunk
    FROM FAWAZG_DB.FAWAZG_SCHEMA.KUBECON_CHUNKED_CONTENT
    );


In [None]:
-- Query Step 4 with SQL
SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
      'FAWAZG_DB.FAWAZG_SCHEMA.KUBECON_SEARCH_SERVICE',
      '{
         "query": "Any talks about Snowflake?",
         "columns":[
            "file_name",
            "CHUNK"
         ],
         "limit":1
      }'
  )
)['results'] as results;
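
The second argument to `SEARCH_PREVIEW` is a JSON string, which is easy to get wrong when it is typed by hand. A minimal sketch of building it from Python follows; the helper names `build_search_preview_payload` and `extract_results` are hypothetical, and the sample response is made up to match the `['results']` access in the SQL above.

```python
import json
from typing import List


def build_search_preview_payload(query: str, columns: List[str], limit: int = 1) -> str:
    """Serialize the search request; json.dumps handles quoting and escaping."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return json.dumps({"query": query, "columns": columns, "limit": limit})


def extract_results(response_json: str) -> list:
    """Pull the 'results' array out of a response string, like ['results'] in SQL."""
    return json.loads(response_json).get("results", [])


payload = build_search_preview_payload('Any talks about "Snowflake"?', ["file_name", "CHUNK"])
print(payload)

# Hypothetical response, shaped after the ['results'] access in the SQL query.
fake_response = json.dumps(
    {"results": [{"file_name": "schedule.pdf", "CHUNK": "Sample chunk text"}]}
)
print(extract_results(fake_response))
```

Letting `json.dumps` produce the string means a question containing double quotes does not break the payload.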

## Step 5: Building the KubeCon Chatbot with Streamlit

1. **Imports and Setup**  
   Imports necessary libraries: `streamlit` for UI, `Root` and `get_active_session` for Snowflake interaction.

2. **Initialize Chatbot and Service Metadata**  
   Fetches Cortex Search service metadata and initializes conversation state. Provides options to clear chat history or use it in the conversation.

3. **Query the Search Service**  
   Executes a search query on the selected Cortex Search service and retrieves relevant context documents for the chatbot.

4. **Create and Process Prompts**  
   Constructs the prompt by combining chat history, search context, and the user’s question, then sends it to the model through `SNOWFLAKE.CORTEX.COMPLETE` for response generation.

5. **Main Function and Chat Interaction**  
   Displays chat history, handles user input, and processes queries. Uses the generated response from the model to continue the conversation.
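
Before reading the full app, it may help to see the retrieve-then-generate flow from steps 3 and 4 in isolation. The sketch below runs offline: `fake_search` and `fake_complete` are hypothetical stand-ins for the Cortex Search call and the model call, and the prompt is shortened compared with the one the app uses.

```python
from typing import Callable, Dict, List


def format_context(results: List[Dict[str, str]], search_col: str) -> str:
    """Number each retrieved chunk so the model can tell documents apart."""
    return "".join(
        f"Context document {i + 1}: {r[search_col]} \n\n"
        for i, r in enumerate(results)
    )


def answer(question: str,
           search: Callable[[str], List[Dict[str, str]]],
           complete: Callable[[str], str],
           search_col: str = "chunk") -> str:
    """Retrieve context, build the prompt, and ask the model."""
    context = format_context(search(question), search_col)
    prompt = (
        f"<context>\n{context}</context>\n"
        f"<question>\n{question}\n</question>\nAnswer:"
    )
    return complete(prompt)


# Stand-ins for Cortex Search and the LLM call (hypothetical, for offline runs).
def fake_search(query: str) -> List[Dict[str, str]]:
    return [{"chunk": "Sample session: Intro to eBPF, Room B12"}]


def fake_complete(prompt: str) -> str:
    return f"(model saw {prompt.count('Context document')} context document(s))"


print(answer("Any talks about eBPF?", fake_search, fake_complete))
```

Passing the search and completion functions in as arguments keeps the prompt logic testable without a Snowflake session.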


In [None]:
import streamlit as st
from snowflake.core import Root # requires snowflake>=0.8.0
from snowflake.snowpark.context import get_active_session

## Initialize Chatbot

def init_chatbot():
    if "service_metadata" not in st.session_state:
        services = session.sql("SHOW CORTEX SEARCH SERVICES;").collect()
        service_metadata = []
        if services:
            for s in services:
                svc_name = s["name"]
                svc_search_col = session.sql(
                    f"DESC CORTEX SEARCH SERVICE {svc_name};"
                ).collect()[0]["search_column"]
                service_metadata.append(
                    {"name": svc_name, "search_column": svc_search_col}
                )

        st.session_state.service_metadata = service_metadata


    st.sidebar.button("Clear conversation", key="clear_conversation")
    st.sidebar.toggle("Use chat history", key="use_chat_history", value=True)

    
    if st.session_state.clear_conversation or "messages" not in st.session_state:
        st.session_state.messages = []


## Query the Search Service
def query_cortex_search_service(query):
    db, schema = session.get_current_database(), session.get_current_schema()

    cortex_search_service = (
        root.databases[db]
        .schemas[schema]
        .cortex_search_services[st.session_state.selected_cortex_search_service]
    )

    context_documents = cortex_search_service.search(
        query, columns=[], limit=st.session_state.num_retrieved_chunks
    )
    results = context_documents.results

    service_metadata = st.session_state.service_metadata
    search_col = [s["search_column"] for s in service_metadata
                    if s["name"] == st.session_state.selected_cortex_search_service][0]

    context_str = ""
    for i, r in enumerate(results):
        context_str += f"Context document {i+1}: {r[search_col]} \n" + "\n"

   
    return context_str
    
## Get the chat history
def get_chat_history():
    start_index = max(
        0, len(st.session_state.messages) - st.session_state.num_chat_messages
    )
    return st.session_state.messages[start_index : len(st.session_state.messages) - 1]

def complete(model, prompt):
    return session.sql("SELECT snowflake.cortex.complete(?,?)", (model, prompt)).collect()[0][0]

def make_chat_history_summary(chat_history, question):
    prompt = f"""
        [INST]
        Based on the chat history below and the question, generate a query that extends the question
        with the chat history provided. The query should be in natural language.
        Answer with only the query. Do not add any explanation.

        <chat_history>
        {chat_history}
        </chat_history>
        <question>
        {question}
        </question>
        [/INST]
    """

    summary = complete(st.session_state.model_name, prompt)

   

    return summary

def create_prompt(user_question):
    """
    Create a prompt for the language model by combining the user question with context retrieved
    from the cortex search service and chat history (if enabled). Format the prompt according to
    the expected input format of the model.

    Args:
        user_question (str): The user's question to generate a prompt for.

    Returns:
        str: The generated prompt for the language model.
    """
    if st.session_state.use_chat_history:
        chat_history = get_chat_history()
        if chat_history != []:
            question_summary = make_chat_history_summary(chat_history, user_question)
            prompt_context = query_cortex_search_service(question_summary)
        else:
            prompt_context = query_cortex_search_service(user_question)
    else:
        prompt_context = query_cortex_search_service(user_question)
        chat_history = ""

    prompt = f"""
            [INST]
            You are a helpful AI chat assistant with RAG capabilities. When a user asks you a question,
            you will also be given context provided between <context> and </context> tags. Use that context
            with the user's chat history provided between <chat_history> and </chat_history> tags
            to provide a summary that addresses the user's question. Ensure the answer is coherent, concise,
            and directly relevant to the user's question.

            If the user asks a generic question which cannot be answered with the given context or chat_history,
            just say "I don't know the answer to that question."

            Don't say things like "according to the provided context".

            <chat_history>
            {chat_history}
            </chat_history>
            <context>
            {prompt_context}
            </context>
            <question>
            {user_question}
            </question>
            [/INST]
            Answer:
        """
    return prompt

def main():
    st.title(":speech_balloon: KubeCon 2025 Chatbot with Snowflake Cortex and Unstructured Data")
    init_chatbot()
    icons = {"assistant": "❄️", "user": "👤"}

    # Display chat messages from history on app rerun
    for message in st.session_state.messages:
        with st.chat_message(message["role"], avatar=icons[message["role"]]):
            st.markdown(message["content"])

    disable_chat = (
        "service_metadata" not in st.session_state
        or len(st.session_state.service_metadata) == 0
    )
    if question := st.chat_input("Any talks about Snowflake?", disabled=disable_chat):
        # Add user message to chat history
        st.session_state.messages.append({"role": "user", "content": question})
        # Display user message in chat message container
        with st.chat_message("user", avatar=icons["user"]):
            # Escape "$" so Markdown does not treat it as a LaTeX delimiter
            st.markdown(question.replace("$", "\\$"))

        # Display assistant response in chat message container
        with st.chat_message("assistant", avatar=icons["assistant"]):
            message_placeholder = st.empty()
            question = question.replace("'", "")
            with st.spinner("Thinking..."):
                generated_response = complete(
                    st.session_state.model_name, create_prompt(question)
                )
                message_placeholder.markdown(generated_response)

        st.session_state.messages.append(
            {"role": "assistant", "content": generated_response}
        )

if __name__ == "__main__":
    session = get_active_session()
    st.session_state.model_name = "snowflake-arctic"
    st.session_state.num_chat_messages = 5
    st.session_state.num_retrieved_chunks = 5
    st.session_state.selected_cortex_search_service  = "KUBECON_SEARCH_SERVICE"
    root = Root(session)
    main()
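
The slice in `get_chat_history` is easy to misread, so here is the same windowing logic as a standalone sketch. The function name `history_window` and the sample messages are hypothetical; the arithmetic is copied from the app code above.

```python
from typing import Dict, List


def history_window(messages: List[Dict[str, str]], num_chat_messages: int) -> List[Dict[str, str]]:
    """Return up to num_chat_messages - 1 earlier messages, excluding the
    last entry (the question that was just appended)."""
    start_index = max(0, len(messages) - num_chat_messages)
    return messages[start_index:len(messages) - 1]


# Eight alternating messages; the last one stands for the newest user question.
messages = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"message {i}"}
    for i in range(8)
]
window = history_window(messages, 5)
print([m["content"] for m in window])
```

With `num_chat_messages = 5`, the window holds the four messages before the newest question, because the final element is always dropped from the slice.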

## Conclusion

This notebook showed how to build a RAG chatbot over KubeCon session data with Snowflake Cortex and Streamlit. It starts by staging the schedule PDF and listing the staged files, then parses the document with `PARSE_DOCUMENT` to extract its content. The parsed text is split into smaller chunks with `SPLIT_TEXT_RECURSIVE_CHARACTER` so it can be indexed for search. A Cortex Search service is then created on the chunked content and queried to retrieve relevant passages. In the final step, Streamlit provides the chat interface, letting users ask questions about the parsed content.

For more information on how to get started with Snowflake Cortex, including Retrieval Augmented Generation (RAG) applications, refer to the following links:  
- [Snowflake Quickstarts](https://quickstarts.snowflake.com/)  
- [RAG Applications with Snowflake](https://www.snowflake.com/en/fundamentals/rag/)  
- [Cortex Search Overview](https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-search/cortex-search-overview)