# Data Analysis with LLM in Snowflake Notebooks

This notebook answers questions about a dataset using an LLM reasoning model, namely DeepSeek-R1.

Here's what we'll implement to explore the data:
1. Retrieve penguins data
2. Convert table to a DataFrame
3. Create helper functions for calling the LLM and parsing its response
4. Create an app that accepts a question and generates an LLM response about the data

## 1. Retrieve penguins data

We'll start by performing a simple SQL query to retrieve the penguins data.

In [None]:
SELECT * FROM CHANINN_DEMO_DATA.PUBLIC.PENGUINS

## 2. Convert table to a DataFrame

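Next, we'll convert the query result to a pandas DataFrame. Here, `sql_output` refers to the result of the SQL cell above (a SQL cell's result can be referenced in Python by the cell's name).

In [None]:
# Store the result as `df` so the app cell further down can use it
df = sql_output.to_pandas()
df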
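
Once the data is in pandas, the sample questions asked later in the app can also be cross-checked directly, which is a useful sanity check on the LLM's answers. The following is a minimal sketch on a tiny made-up DataFrame; the column names and values are assumptions for illustration and may not match the actual `PENGUINS` table.

```python
import pandas as pd

# Hypothetical sample rows; the real table's column names and values may differ
sample = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Chinstrap"],
    "island": ["Torgersen", "Biscoe", "Dream"],
    "bill_length_mm": [39.1, 47.5, 49.0],
    "flipper_length_mm": [181, 217, 195],
    "body_mass_g": [3750, 5200, 3800],
})

# idxmax()/idxmin() return the row label of the largest/smallest value
longest_bill = sample.loc[sample["bill_length_mm"].idxmax(), "species"]
heaviest_island = sample.loc[sample["body_mass_g"].idxmax(), "island"]
shortest_flipper = sample.loc[sample["flipper_length_mm"].idxmin(), "species"]

print(longest_bill, heaviest_island, shortest_flipper)
```

Comparing values computed this way against the model's answer helps catch cases where the LLM misreads the table.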

## 3. Create helper functions

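Here, we'll create the helper functions used by the app we're building:
1. `generate_deepseek_response()` - accepts a `prompt`, sends it to the DeepSeek-R1 model through Snowflake Cortex, and returns the model's response. In the app, `prompt` is built from the question the user asks about the data.
2. `extract_think_content()` - splits the response into the model's reasoning (the text inside the `<think>` tags) and the final answer.
3. `escape_sql_string()` - doubles single quotes so that a string can be embedded in a SQL string literal.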

In [None]:
# Helper functions
import json
import re

def generate_deepseek_response(prompt):
    # `session` is created in the app cell below, before this function is called
    cortex_prompt = f"'[INST] {prompt} [/INST]'"
    prompt_data = [{'role': 'user', 'content': cortex_prompt}]
    # Values passed via `params` are bound, so no manual quote escaping is needed
    prompt_json = json.dumps(prompt_data)
    response = session.sql(
        "select snowflake.cortex.complete(?, ?)",
        params=['deepseek-r1', prompt_json]
    ).collect()[0][0]

    return response

def extract_think_content(response):
    think_pattern = r'<think>(.*?)</think>'
    think_match = re.search(think_pattern, response, re.DOTALL)
    
    if think_match:
        think_content = think_match.group(1).strip()
        main_response = re.sub(think_pattern, '', response, flags=re.DOTALL).strip()
        return think_content, main_response
    return None, response

def escape_sql_string(s):
    return s.replace("'", "''")
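
To see what `extract_think_content()` returns, here is a minimal sketch that runs the same regex logic on a made-up response string. The sample text is an assumption for illustration, not actual model output.

```python
import re

def extract_think_content(response):
    # Same logic as the helper above, repeated so this cell runs on its own
    think_pattern = r'<think>(.*?)</think>'
    think_match = re.search(think_pattern, response, re.DOTALL)
    if think_match:
        think_content = think_match.group(1).strip()
        main_response = re.sub(think_pattern, '', response, flags=re.DOTALL).strip()
        return think_content, main_response
    return None, response

# Made-up response for illustration; real model output will differ
sample_response = (
    "<think>\nCompare the bill lengths row by row.\n</think>\n\n"
    "Gentoo penguins have the longest bills."
)

thoughts, answer = extract_think_content(sample_response)
print("Reasoning:", thoughts)
print("Answer:", answer)

# Without a <think> block, the response is returned unchanged
no_thoughts, unchanged = extract_think_content("No reasoning block here.")
print(no_thoughts, unchanged)
```

The `re.DOTALL` flag matters here because the reasoning usually spans several lines, and without it `.` would not match newlines.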

## 4. Create the Ask about Penguins app

Now that we have the data and helper functions ready, let's wrap up by creating the app.



In [None]:
import streamlit as st
from snowflake.snowpark.context import get_active_session
import json
import pandas as pd
import re

# Write directly to the app
st.title("🐧 Ask about Penguins")

# Get the current credentials
session = get_active_session()

# Convert the SQL cell result to a pandas DataFrame (harmless if `df` already exists)
df = sql_output.to_pandas()

user_queries = ["Which penguins have the longest bill length?",
                "Where do the heaviest penguins live?",
                "Which penguins have the shortest flippers?"]

question = st.selectbox("What would you like to know?", user_queries)
# question = st.text_input("Ask a question", user_queries[0])

prompt = [
    {
        'role': 'system',
        'content': 'You are a helpful assistant that uses provided data to answer natural language questions.'
    },
    {
        'role': 'user',
        'content': (
            f'The user has asked a question: {question}. '
            f'Please use this data to answer the question: {df.to_markdown(index=False)}'
        )
    },
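    {
        # Note: generate_deepseek_response() serializes this whole list into the
        # prompt text, so these settings reach the model as text, not as Cortex options
        'temperature': 0.7,
        'max_tokens': 1000,
        'guardrails': True
    }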
]

df

if st.button("Submit"):
    status_container = st.status("Thinking ...", expanded=True)
    with status_container:
        response = generate_deepseek_response(prompt)
        think_content, main_response = extract_think_content(response)
        if think_content:
            st.write(think_content)
                
    status_container.update(label="Thoughts", state="complete", expanded=False)
    st.markdown(main_response)
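
The `prompt` list above mixes chat messages with generation settings, and the helper sends everything as a single text prompt. As a possible alternative, Cortex `COMPLETE` also accepts the message history and the options as separate arguments, in which case it returns a JSON string rather than plain text. The sketch below only builds the call and parses a made-up response of that shape; the helper names `build_complete_call` and `parse_complete_response` are hypothetical, and binding the arguments through `parse_json(?)` is an assumption that should be checked against the Cortex documentation before use.

```python
import json

def build_complete_call(model, messages, options):
    """Return (sql, params) for a COMPLETE call with separate messages and options."""
    # Assumption: the JSON strings are bound and converted with parse_json()
    sql = "select snowflake.cortex.complete(?, parse_json(?), parse_json(?))"
    params = [model, json.dumps(messages), json.dumps(options)]
    return sql, params

def parse_complete_response(raw):
    """Extract the generated text from the JSON string returned when options are passed."""
    return json.loads(raw)["choices"][0]["messages"]

messages = [
    {"role": "system",
     "content": "You are a helpful assistant that uses provided data to answer natural language questions."},
    {"role": "user", "content": "Which penguins have the longest bill length?"},
]
options = {"temperature": 0.7, "max_tokens": 1000, "guardrails": True}

sql, params = build_complete_call("deepseek-r1", messages, options)

# In the notebook this would be run as:
#   raw = session.sql(sql, params=params).collect()[0][0]
# Made-up response for illustration only; real output will differ
raw = json.dumps({"choices": [{"messages": "<think>Compare bills.</think>Gentoo."}]})
text = parse_complete_response(raw)
print(text)
```

The extracted `text` could then be passed to `extract_think_content()` exactly as in the app above.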

## Want to learn more?

- More about the [palmerpenguins](https://allisonhorst.github.io/palmerpenguins/) dataset
- More about [Snowflake Notebooks](https://docs.snowflake.com/en/user-guide/ui-snowsight/notebooks-use-with-snowflake)
- For more inspiration on how to use Streamlit widgets in Notebooks, check out [Streamlit Docs](https://docs.streamlit.io/) and this list of what is currently supported inside [Snowflake Notebooks](https://docs.snowflake.com/en/user-guide/ui-snowsight/notebooks-use-with-snowflake#label-notebooks-streamlit-support)