# A Getting Started Guide With Snowflake Arctic and Snowflake Cortex 

## Overview

Getting started with AI on enterprise data can seem overwhelming: you need to get familiar with LLMs, learn how to perform custom prompt engineering, and deploy and integrate a wide range of LLMs to run multiple tests, all while keeping that valuable enterprise data secure. Snowflake Cortex abstracts away many of these complexities for you.

In this guide, we will go through two flows. The first three examples use task-specific functions, so we will not have to worry about prompt engineering. As a bonus, a final example builds a prompt for a custom task so we can see [Snowflake Arctic](https://www.snowflake.com/en/data-cloud/arctic/) in action!

### What is Snowflake Cortex?
Snowflake Cortex is an intelligent, fully managed service that offers machine learning and AI solutions to Snowflake users. Snowflake Cortex capabilities include:

- LLM Functions: SQL and Python functions that leverage large language models (LLMs) for understanding, querying, translating, summarizing, and generating free-form text.
- ML Functions: SQL functions that perform predictive analysis such as forecasting and anomaly detection using machine learning to help you gain insights into your structured data and accelerate everyday analytics.

Learn more about [Snowflake Cortex](https://docs.snowflake.com/en/user-guide/snowflake-cortex/overview).

### What is Snowflake Arctic?

Snowflake Arctic is a family of enterprise-grade models built by Snowflake. The family includes a set of embedding models that excel in retrieval use cases and a general-purpose LLM that exhibits top-tier intelligence in enterprise tasks such as SQL generation, code generation, instruction following and more. All of these models are available for all types of academic and commercial use under an Apache 2.0 license. 

Learn more about [benchmarks and how Snowflake Arctic was built](https://www.snowflake.com/blog/arctic-open-efficient-foundation-language-models-snowflake/).

### What You Will Learn

How to use Snowflake Arctic with prompt engineering for custom tasks, such as summarizing long-form text into JSON-formatted output, and how to use Snowflake Cortex task-specific LLM functions to translate text between languages or score the sentiment of a piece of text.

### What You Will Build

An interactive Streamlit application running in Snowflake.



## Setup

Prior to GenAI, a lot of information was buried in text and went underutilized for root cause analysis because natural language processing was complex to implement. With Snowflake Cortex, it’s as easy as writing a SQL statement!

In this guide, we'll use synthetic call transcript data that mimics text sources commonly overlooked by organizations, including customer calls and chats, surveys, interviews, and other text data generated by marketing and sales teams.

Let’s create the table and load the data.

### Create Table and Load Data

*Note: If you use different names for objects created in this section, be sure to update scripts and code in the following sections accordingly.*


In [None]:
USE ROLE ACCOUNTADMIN;

CREATE WAREHOUSE DASH_S WAREHOUSE_SIZE=SMALL;
CREATE DATABASE DASH_DB;
CREATE SCHEMA DASH_SCHEMA;

USE DASH_DB.DASH_SCHEMA;
USE WAREHOUSE DASH_S;

CREATE or REPLACE file format csvformat
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  type = 'CSV';

CREATE or REPLACE stage call_transcripts_data_stage
  file_format = csvformat
  url = 's3://sfquickstarts/misc/call_transcripts/';

CREATE or REPLACE table CALL_TRANSCRIPTS ( 
  date_created date,
  language varchar(60),
  country varchar(60),
  product varchar(60),
  category varchar(60),
  damage_type varchar(90),
  transcript varchar
);

COPY into CALL_TRANSCRIPTS
  from @call_transcripts_data_stage;

## Snowflake Cortex

Given the data in the `call_transcripts` table, let’s see how we can use Snowflake Cortex. It offers access to industry-leading AI models without requiring any knowledge of how the models work, how to deploy LLMs, or how to manage GPU infrastructure.

### Translate
Using the Snowflake Cortex function **snowflake.cortex.translate**, we can translate text from one language to another. The function takes the text, the source language code, and the target language code. Let’s try it on a single sentence.


In [None]:
select snowflake.cortex.translate('wie geht es dir heute?','de','en');

Now let’s see how you can translate call transcripts from German to English in batch mode using just SQL.

In [None]:
select transcript,snowflake.cortex.translate(transcript,'de','en') from call_transcripts where language = 'German';

### Sentiment Score
Now let’s see how we can use the **snowflake.cortex.sentiment** function to generate sentiment scores on call transcripts.

*Note: Score is between -1 and 1; -1 = most negative, 1 = most positive, 0 = neutral*


In [None]:
select transcript, snowflake.cortex.sentiment(transcript) from call_transcripts where language = 'English';
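
Raw scores are easier to report on once they are bucketed into labels. The following is a minimal Python sketch of that idea. The function name `label_sentiment` and the `neutral_band` threshold of 0.1 are hypothetical choices for illustration, not something Snowflake Cortex defines, so tune the band to your own data.

```python
def label_sentiment(score: float, neutral_band: float = 0.1) -> str:
    """Map a sentiment score in [-1, 1] to a coarse label.

    neutral_band is an assumed cutoff: scores within +/- neutral_band
    of zero are treated as neutral.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score out of range [-1, 1]: {score}")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


# Example: label a few scores such as those returned by the query above
for s in (0.82, -0.47, 0.03):
    print(s, label_sentiment(s))
```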

### Summarize
Now that we know how to translate call transcripts into English, it would be great to have the model pull out the most important details from each transcript so we don’t have to read the whole thing. Let’s see how the **snowflake.cortex.summarize** function can do this by trying it on one record.

In [None]:
select transcript,snowflake.cortex.summarize(transcript) as summary from call_transcripts where language = 'English' limit 1;


### Summary With Token Count

*Note: Snowflake Cortex LLM functions incur compute cost based on the number of tokens processed. Refer to the [consumption table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf#page=9) for each function’s cost in credits per million tokens.*

In [None]:

select transcript,snowflake.cortex.summarize(transcript) as summary,snowflake.cortex.count_tokens('summarize',transcript) as number_of_tokens from call_transcripts where language = 'English' limit 1;
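
Because cost is quoted in credits per million tokens, a token count can be turned into a rough estimate with simple arithmetic: credits ≈ tokens / 1,000,000 × rate. The sketch below shows that calculation in Python. The function name is hypothetical, and the rate in the usage example is a placeholder rather than a real price, so take the actual rate for each function from the consumption table.

```python
def estimate_credits(token_count: int, credits_per_million_tokens: float) -> float:
    """Estimate credits consumed for a given number of processed tokens.

    credits_per_million_tokens must come from the consumption table;
    this function does not know any real rates.
    """
    if token_count < 0 or credits_per_million_tokens < 0:
        raise ValueError("token_count and rate must be non-negative")
    return token_count / 1_000_000 * credits_per_million_tokens


# Usage with a placeholder rate (NOT an actual Snowflake price)
placeholder_rate = 0.10
print(estimate_credits(1_200, placeholder_rate))
```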

## Snowflake Arctic

### Prompt Engineering
Being able to pull out the summary is good, but it would be even better if we could specifically pull out the product name and the part of the product that was defective, and limit the summary to 200 words. Let’s see how we can accomplish this using the **snowflake.cortex.complete** function.

*Note: Besides Snowflake Arctic, you can also [use other supported LLMs](https://docs.snowflake.com/en/user-guide/snowflake-cortex/llm-functions#availability) in Snowflake with the snowflake.cortex.complete function.*

In [None]:
SET prompt = 
'### 
Summarize this transcript in less than 200 words. 
Put the product name, defect and summary in JSON format. 
###';

select snowflake.cortex.complete('snowflake-arctic',concat($prompt,transcript)) as summary
from call_transcripts where language = 'English' limit 1;
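
The prompt asks for JSON, but the completion comes back as plain text and may include extra wording or a Markdown code fence around the JSON object. If you want to use the fields downstream, a defensive parsing step helps. The following is a minimal Python sketch under that assumption. The helper name `extract_json` is hypothetical, and the approach (take everything from the first `{` to the last `}`) is a simple heuristic that will fail if the surrounding prose itself contains braces.

```python
import json


def extract_json(completion: str) -> dict:
    """Pull the first-to-last brace span out of an LLM completion and parse it.

    Raises ValueError if no JSON object is found or it cannot be parsed
    (json.JSONDecodeError is a subclass of ValueError).
    """
    start = completion.find("{")
    end = completion.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in completion")
    return json.loads(completion[start:end + 1])


# Usage with an illustrative completion string (not real model output)
example = 'Here is the result:\n{"product": "DryProof670", "defect": "broken zippers", "summary": "Order arrived damaged."}'
print(extract_json(example)["product"])
```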

## Streamlit Application

Now let's put it all together in a Streamlit application.

In [None]:
import streamlit as st
from snowflake.snowpark.context import get_active_session
session = get_active_session()
# Add a query tag to the session. This helps with performance monitoring and troubleshooting
session.query_tag = {"origin":"sf_sit-is", 
                     "name":"aiml_notebooks_artic_cortex", 
                     "version":{"major":1, "minor":0},
                     "attributes":{"is_quickstart":1, "source":"notebook"}}

sample_transcript = """Customer: Hello!
Agent: Hello! I hope you're having a great day. To best assist you, can you please share your first and last name and the company you're calling from?
Customer: Sure, I'm Michael Green from SnowSolutions.
Agent: Thanks, Michael! What can I help you with today?
Customer: We recently ordered several DryProof670 jackets for our store, but when we opened the package, we noticed that half of the jackets have broken zippers. We need to replace them quickly to ensure we have sufficient stock for our customers. Our order number is 60877.
Agent: I apologize for the inconvenience, Michael. Let me look into your order. It might take me a moment.
Customer: Thank you."""

def process_request(req,txt,from_lang='',to_lang=''):
    # req is 'sentiment', 'summary', or anything else for translation;
    # from_lang and to_lang are language codes such as 'de' and 'en'
    with st.status("In progress...") as status:
        # Escape backslashes first, then single quotes, so the text can be embedded in a SQL string literal
        txt = txt.replace("\\", "\\\\").replace("'", "\\'")
        cortex_response = ''
        if req == 'sentiment':
            cortex_response = session.sql(f"select snowflake.cortex.sentiment('{txt}') as sentiment").to_pandas().iloc[0]['SENTIMENT']
            st.caption("Note: Score is between -1 and 1; -1 = Most negative, 1 = Most positive, 0 = Neutral")  
        elif req == 'summary':
            prompt = f"Summarize this transcript in less than 200 words. Put the product name, defect if any, and summary in JSON format: {txt}"
            cortex_prompt = "'[INST] " + prompt + " [/INST]'"
            cortex_response = session.sql(f"select snowflake.cortex.complete('snowflake-arctic', {cortex_prompt}) as summary").to_pandas().iloc[0]['SUMMARY']
        else:
            # Use the language codes passed in by the caller instead of reading module-level variables
            cortex_response = session.sql(f"""select snowflake.cortex.translate('{txt}','{from_lang}','{to_lang}') 
                                            as translation""").to_pandas().iloc[0]['TRANSLATION']
        st.write(cortex_response)
        status.update(label="Done!", state="complete", expanded=True)

with st.container():
    entered_text = st.text_area("Enter text",height=200,value=sample_transcript)
    col1,col2,col3 = st.columns(3)

    with col1:
        st.subheader("JSON Summary")
        btn_summarize = st.button("Summarize")
        if entered_text and btn_summarize:
            process_request('summary',entered_text)
            
    with col2:
        st.subheader("Sentiment Analysis")
        btn_sentiment = st.button("Sentiment Score")
        if entered_text and btn_sentiment:
            process_request('sentiment',entered_text)
                
    with col3:
        st.subheader("Translate")
        supported_languages = {'German':'de','French':'fr','Korean':'ko','Portuguese':'pt','English':'en','Italian':'it','Russian':'ru','Swedish':'sv','Spanish':'es','Japanese':'ja','Polish':'pl'}
        # Left column holds the source language, right column the target language
        col_from,col_to = st.columns(2)
        with col_from:
            from_language = st.selectbox('From',dict(sorted(supported_languages.items())))
        with col_to:
            to_language = st.selectbox('To',dict(sorted(supported_languages.items())))
        btn_translate = st.button("Translate")
        if entered_text and btn_translate:
            process_request('translate',entered_text,supported_languages[from_language],supported_languages[to_language])
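
The application embeds user-entered text directly into SQL strings and escapes quotes by hand. An alternative worth considering is to pass the text as a bind variable, which avoids manual escaping altogether. The sketch below assumes your Snowpark version supports the `params` argument of `session.sql` with `?` placeholders. The helper name `sentiment_score` is hypothetical and not part of the original app.

```python
def sentiment_score(session, text: str) -> float:
    """Score the sentiment of text using a bind variable instead of string formatting.

    session is expected to be a Snowpark Session (for example, the one
    returned by get_active_session()); support for params= is assumed.
    """
    rows = session.sql(
        "select snowflake.cortex.sentiment(?) as sentiment",
        params=[text],
    ).collect()
    # collect() returns a list of rows; read the single result column by name
    return rows[0]["SENTIMENT"]
```

Inside the app, a call such as `sentiment_score(session, entered_text)` could replace the formatted query in the sentiment branch, and the same pattern would apply to the translate and complete calls.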

## Conclusion And Resources

Congratulations! You've successfully completed the Getting Started with Snowflake Arctic and Snowflake Cortex quickstart guide. 

### What You Learned

- How to use Snowflake Arctic with prompt engineering for custom tasks, such as summarizing long-form text into JSON-formatted output, and how to use Snowflake Cortex task-specific LLM functions to translate text between languages or score the sentiment of a piece of text.
- How to build an interactive Streamlit application running in Snowflake.

### Related Resources

- [Snowflake Cortex: Overview](https://docs.snowflake.com/en/user-guide/snowflake-cortex/overview)
- [Snowflake Cortex: LLM Functions](https://docs.snowflake.com/user-guide/snowflake-cortex/llm-functions)
- [Snowflake Cortex: LLM Functions Cost Considerations](https://docs.snowflake.com/user-guide/snowflake-cortex/llm-functions#cost-considerations) and [Consumption Table](https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf#page=9)
- [Snowflake Cortex: ML Functions](https://docs.snowflake.com/en/guides-overview-ml-functions)
- [Snowflake Arctic: Hugging Face](https://huggingface.co/Snowflake/snowflake-arctic-instruct)
- [Snowflake Arctic: Cookbooks](https://www.snowflake.com/en/data-cloud/arctic/cookbook/)
- [Snowflake Arctic: Benchmarks](https://www.snowflake.com/blog/arctic-open-efficient-foundation-language-models-snowflake/)
- [Snowflake Arctic: GitHub repo](https://github.com/Snowflake-Labs/snowflake-arctic)