# DSPy + Snowflake: Towards Secure and Future-Proof LLM Pipelines

DSPy is an open source framework for declaratively building LLM pipelines and automatically optimizing prompts. Snowflake Cortex is a managed LLM service that allows users to leverage LLMs without taking data out of Snowflake.

This notebook is based on the [DSPy tutorials](https://dspy-docs.vercel.app/docs/category/tutorials) and will walk you through how to use DSPy for:
* Basic RAG - Build a simple RAG program using the declarative programming paradigm and Snowflake Cortex LLMs
* End to End RAG in Snowflake - RAG example using a knowledge base and embeddings stored in Snowflake using the DSPy Snowflake Retriever
* Multi Hop RAG - Build an architecture that can break down complex questions and ask follow ups
* Pipeline Optimization - Automatically optimize Snowflake Cortex prompts to eliminate the need for manual prompt engineering

# DSPy Setup

If you don't already have dspy and the snowpark dependencies installed on your machine, you can install them with pip:

In [None]:
!pip install dspy-ai[snowflake]

The fundamental elements of a RAG architecture include:
* A Language Model (LM) - Given context from a particular knowledge base, an LM generates the response to the user's prompt.
* Knowledge Base - A database with passages and embeddings of content that will be required to generate the desired response.
* Retriever - A mechanism for retrieving the relevant context required to generate a response to the user's prompt.



To start, we will import the requirements for our program and load our Snowflake credentials.

In [1]:
import dspy
from dspy.evaluate.evaluate import Evaluate
from dspy.retrieve.snowflake_rm import SnowflakeRM
from dspy.teleprompt import BootstrapFewShotWithRandomSearch
from dsp.utils import deduplicate
from snowflake.snowpark import Session
import os
import warnings
warnings.filterwarnings("ignore")

connection_parameters = {
    "account": os.getenv('SNOWFLAKE_ACCOUNT'),
    "user": os.getenv('SNOWFLAKE_USER'),
    "password": os.getenv('SNOWFLAKE_PASSWORD'),
    "role": os.getenv('SNOWFLAKE_ROLE'),
    "warehouse": os.getenv('SNOWFLAKE_WAREHOUSE'),
    "database": os.getenv('SNOWFLAKE_DATABASE'),
    "schema": os.getenv('SNOWFLAKE_SCHEMA'),
}

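Because every credential above is read from an environment variable, an unset variable silently becomes `None`. The sketch below is one way to surface that early. `missing_env_vars` is a hypothetical helper (not part of DSPy or Snowpark), and the variable names are simply the ones used in the cell above.

```python
import os

# Variable names copied from the connection_parameters cell above.
REQUIRED_VARS = [
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_USER",
    "SNOWFLAKE_PASSWORD",
    "SNOWFLAKE_ROLE",
    "SNOWFLAKE_WAREHOUSE",
    "SNOWFLAKE_DATABASE",
    "SNOWFLAKE_SCHEMA",
]


def missing_env_vars(names, environ=None):
    """Hypothetical helper: return the names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing Snowflake settings:", ", ".join(missing))
```

Running a check like this before creating a session tends to give a clearer message than a failed login.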

Below we configure the basic program requirements:
* LM: A Snowflake Cortex hosted Mixtral 8x7B model
* Knowledge Base: Publicly hosted Wikipedia Abstracts from [2017 dump](https://hotpotqa.github.io/wiki-readme.html)
* Retriever: ColBERTv2

In [39]:
turbo = dspy.Snowflake(model="mixtral-8x7b",credentials=connection_parameters)
colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
dspy.settings.configure(lm=turbo,rm=colbertv2_wiki17_abstracts)

# Basic RAG Example

Given a user's query, the simplest RAG architecture will:
1) Retrieve the K most relevant passages from our knowledge base for the user's query
2) Generate a response to the query using the relevant passages retrieved in step 1.
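The two steps above can be sketched without any framework. This is a toy illustration only: the word-overlap scoring and the `retrieve` / `generate` functions are hypothetical stand-ins, and `generate` just builds the prompt that would be sent to an LM rather than calling one.

```python
def retrieve(query, passages, k=2):
    """Step 1: rank passages by how many query words they share (toy scoring)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        passages,
        key=lambda p: len(query_words & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def generate(query, context):
    """Step 2 (stand-in for an LM call): build the prompt from the context."""
    joined = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(context))
    return f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"


knowledge_base = [
    "David Gregory inherited Kinnairdy Castle in 1664.",
    "The Boleyn Inheritance is a novel by Philippa Gregory.",
    "Snowflake is a cloud data platform.",
]
question = "What castle did David Gregory inherit?"
top_passages = retrieve(question, knowledge_base, k=2)
prompt = generate(question, top_passages)
print(prompt)
```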

The building blocks of DSPy include:
- DSPy Signatures to define the expected inputs and outputs of the program
- DSPy Modules to define the core flow of your program

In [56]:
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 10 words")

class RAG(dspy.Module):
    def __init__(self, num_passages=5):
        super().__init__()

        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
    
    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)


Notice above that the implementation of our RAG pipeline is decoupled from our underlying data, from our language model, and from our prompt. `dspy.Retrieve` and `dspy.ChainOfThought` will use the user-configured retriever (`colbertv2_wiki17_abstracts`) and the user-configured language model (`turbo`) when the RAG pipeline is called.

In [4]:
rag = RAG()
rag("What castle did David Gregory inherit?")

Prediction(
    context=['David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors. He is credited with inventing a military cannon that Isaac Newton described as "being destructive to the human species". Copies and details of the model no longer exist. Gregory\'s use of a barometer to predict farming-related weather conditions led him to be accused of witchcraft by Presbyterian ministers from Aberdeen, although he was never convicted.', 'Gregory Tarchaneiotes | Gregory Tarchaneiotes (Greek: Γρηγόριος Ταρχανειώτης , Italian: "Gregorio Tracanioto" or "Tracamoto" ) was a "protospatharius" and the long-reigning catepan of Italy from 998 to 1006. In December 999, and again on February 2, 1002, he reinstituted and confirmed the possessions of the abbey and monks of 

We can see that the pipeline returns a `dspy.Prediction` object that contains the relevant context retrieved from our knowledge base and the final answer generated by the language model using Chain of Thought.

In [6]:
rag("How many stories did the castle that David Gregory inherited have?")

Prediction(
    context=['David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors. He is credited with inventing a military cannon that Isaac Newton described as "being destructive to the human species". Copies and details of the model no longer exist. Gregory\'s use of a barometer to predict farming-related weather conditions led him to be accused of witchcraft by Presbyterian ministers from Aberdeen, although he was never convicted.', 'The Boleyn Inheritance | The Boleyn Inheritance is a novel by British author Philippa Gregory which was first published in 2006. It is a direct sequel to her previous novel "The Other Boleyn Girl," and one of the additions to her six-part series on the Tudor royals. (The other titles in the series are "The Constant Princess," 

Note that while the basic RAG pipeline we built can answer the first simple question, it struggles when more complex reasoning is required. Our retriever effectively finds information about David Gregory and which castle he inherited, but additional queries are required to retrieve information about the characteristics of the castle. We will explore building a more complex reasoning pipeline later in this notebook.

# DSPy Snowflake Retriever

The first example uses a Snowflake Cortex hosted Mixtral 8x7B model with an open source retriever and knowledge base (a ColBERTv2 server with the Wikipedia abstracts). But what if I want to securely store and manage my data inside Snowflake?

### What if my knowledge base and embeddings are stored in Snowflake?

We can use DSPy's Snowflake Retriever Module (RM) to point at whichever Snowflake table contains our embeddings. If our embeddings are not yet stored in Snowflake, but we have the raw data in a local directory or a Snowflake stage, we can generate and load the embeddings into a new Snowflake table using the `SnowVectorDB` helper, which you can find [here](to_be_updated).

To demonstrate the DSPy Snowflake RM in this notebook, we've used `SnowVectorDB` to generate the embeddings for the latest Snowflake 10Ks from the investor relations page [here](https://investors.snowflake.com/financials/sec-filings/default.aspx).

#### Prepare the Embeddings [Optional]

Below we create a new `10K_EMBEDDINGS` table using the annual reports that we've downloaded to a local directory. However, if your data is already in a Snowflake Stage, instead of using a local directory, you can point the helper module to generate the embeddings using your staged files, by using the `stage` argument in SVDB.

In [None]:
from snowvecdb import SnowVectorDB
snowpark = Session.builder.configs(connection_parameters).create()
SVDB = SnowVectorDB(snowflake_session=snowpark, chunk_size=500, chunk_overlap=75)
SVDB(vector_table_name="10K_EMBEDDINGS", data_source_directory="/your_local_directory_with_files")
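To make the `chunk_size` and `chunk_overlap` arguments more concrete, here is a minimal sketch of overlapping chunking. It assumes the sizes are measured in characters and that chunks are cut at fixed offsets; how `SnowVectorDB` actually splits documents is not shown in this notebook, so `chunk_text` is a hypothetical illustration rather than the helper's implementation.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=75):
    """Split text into windows of chunk_size that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


document = "x" * 1200
chunks = chunk_text(document)
print([len(c) for c in chunks])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.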

#### Set Up the Snowflake Retriever

With your embeddings in a Snowflake table, you can now set up the DSPy Snowflake retriever.

The SVDB helper above creates the embeddings table with generic column names for the passage (CHUNK) and the related embedding (CHUNK_VEC). By default, `SnowflakeRM` assumes your embeddings table has these column names, but you can override them with the `embeddings_field` and `embeddings_field_text` arguments.

In [4]:
snowflake_retriever = SnowflakeRM(snowflake_table_name="10K_EMBEDDINGS",snowflake_credentials=connection_parameters)
dspy.settings.configure(lm=turbo,rm=snowflake_retriever)
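Conceptually, an embeddings-based retriever compares the query embedding against each stored embedding and returns the closest passages. The sketch below illustrates that idea with cosine similarity over in-memory `(chunk, chunk_vec)` pairs. The similarity measure, the toy three-dimensional vectors, and the `top_k` helper are assumptions for illustration; this is not how `SnowflakeRM` is implemented.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, rows, k=3):
    """rows: (chunk, chunk_vec) pairs, mirroring the CHUNK / CHUNK_VEC columns."""
    ranked = sorted(
        rows,
        key=lambda row: cosine_similarity(query_vec, row[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:k]]


# Toy embeddings; real embeddings have hundreds of dimensions.
rows = [
    ("revenue discussion", [0.9, 0.1, 0.0]),
    ("risk factors", [0.0, 1.0, 0.0]),
    ("IPO details", [0.1, 0.0, 0.9]),
]
best = top_k([0.0, 0.1, 1.0], rows, k=2)
print(best)
```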

#### Test the Snowflake Retriever

Above, we update our DSPy settings to use the Snowflake retriever instead of `colbertv2_wiki17_abstracts`, so future calls to the same RAG pipeline will use the `10K_EMBEDDINGS` table under the hood.

In [8]:
rag = RAG()
rag("In what year did Snowflake IPO?")

Prediction(
    context=['Our platform is the innovative technology that powers the Data Cloud, enabling customers to consolidate data into a single source of truth to drive meaningful insights, apply\nAI to solve business problems, build data applications, and share data and data products. We provide our platform through a customer-centric, consumption-based business model,\nonly charging customers for the resources they use.\nSnowflake solves the decades-old problem of data silos and data governance. Leveraging the elasticity and performance of the public cloud, our platform enables customers to\nunify and query data to support a wide variety of use cases. It also provides frictionless and governed data access so users can securely share data inside and outside of their\norganizations, generally without copying or moving the underlying data. As a result, customers can blend existing data with new data for broader context, augment data science\nefforts, and create new monetization str

Note that Snowflake went public in 2020, while our original Wikipedia retriever only contains abstracts from 2017. Retrieving passages about the IPO therefore confirms that the new retriever is working as intended.

In [9]:
rag("In what fiscal year did snowflake IPO?")

Prediction(
    context=['fiscal year ended January 31, 2021 filed with the SEC on March 31, 2021.\nOverview\nWe believe in a data connected world where organizations have seamless access to explore, share, and unlock the value of data. To realize this vision, we deliver the Data\nCloud, a network where Snowflake customers, partners, data providers, and data consumers can break down data silos and derive value from rapidly growing data sets in secure,\ngoverned, and compliant ways.\nOur platform is the innovative technology that powers the Data Cloud, enabling customers to consolidate data into a single source of truth to drive meaningful business insights,\nbuild data-driven applications, and share data. We provide our platform through a customer-centric, consumption-based business model, only charging customers for the resources\nthey use.\nOur cloud-native architecture consists of three independently scalable but logically integrated layers across storage, compute, and cloud service

# Multi Hop RAG

The zero shot RAG examples above can struggle to find the right answer in some cases. To build a more effective program, we can build a pipeline that has the ability to generate follow up questions and answers, as seen below:

In [47]:
class GenerateSearchQuery(dspy.Signature):
    """Write a simple search query that will help answer a complex question."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    query = dspy.OutputField()


class MultiHopPipeline(dspy.Module):
    def __init__(self, passages_per_hop=5, max_hops=3):
        super().__init__()

        self.generate_query = [dspy.ChainOfThought(GenerateSearchQuery) for _ in range(max_hops)]
        self.retrieve = dspy.Retrieve(k=passages_per_hop)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
        self.max_hops = max_hops
    
    def forward(self, question):
        context = []
        
        for hop in range(self.max_hops):
            query = self.generate_query[hop](context=context, question=question).query
            passages = self.retrieve(query).passages
            context = deduplicate(context + passages)

        pred = self.generate_answer(context=context, question=question)

        return dspy.Prediction(context=context,answer=pred.answer)
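Each hop appends newly retrieved passages to the running context, so the same passage can be retrieved more than once. The sketch below shows the kind of order-preserving de-duplication that the loop relies on. `deduplicate_keep_order` is a hypothetical stand-in written for illustration and is assumed, not verified, to behave like `dsp.utils.deduplicate`.

```python
def deduplicate_keep_order(items):
    """Drop repeated items while keeping the first occurrence of each."""
    seen = set()
    unique = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


# Context after hop 1, then passages retrieved in hop 2 (one is a repeat).
context = ["passage A", "passage B"]
new_passages = ["passage B", "passage C"]
context = deduplicate_keep_order(context + new_passages)
print(context)
```

Keeping the original order matters because earlier hops usually retrieve the passages most directly related to the question.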

#### Now, let's switch back to using the Wikipedia knowledge base to see if the multi hop approach can answer the question 

In [48]:
dspy.settings.configure(rm=colbertv2_wiki17_abstracts)
multi_rag = MultiHopPipeline()
multi_rag("How many stories did the castle that David Gregory inherited have?")

Prediction(
    context=['David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors. He is credited with inventing a military cannon that Isaac Newton described as "being destructive to the human species". Copies and details of the model no longer exist. Gregory\'s use of a barometer to predict farming-related weather conditions led him to be accused of witchcraft by Presbyterian ministers from Aberdeen, although he was never convicted.', 'The Boleyn Inheritance | The Boleyn Inheritance is a novel by British author Philippa Gregory which was first published in 2006. It is a direct sequel to her previous novel "The Other Boleyn Girl," and one of the additions to her six-part series on the Tudor royals. (The other titles in the series are "The Constant Princess," 

Indeed, the more advanced reasoning in our `MultiHopPipeline` allows us to get to the correct answer now. Below, we can see what's happening under the hood by inspecting the calls to Snowflake Cortex with the `inspect_history` method.

In [49]:
turbo.inspect_history(n=3)




Write a simple search query that will help answer a complex question.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the query}. We ...

Query: ${query}

---

Context:
[1] «David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors. He is credited with inventing a military cannon that Isaac Newton described as "being destructive to the human species". Copies and details of the model no longer exist. Gregory's use of a barometer to predict farming-related weather conditions led him to be accused of witchcraft by Presbyterian ministers from Aberdeen, although he was never convicted.»
[2] «The Boleyn Inheritance | The Boleyn Inheritance is a novel by British


# LLM Evaluation

There are various ways to evaluate the performance of RAG systems. Fortunately, with DSPy, evaluation metrics are easy to define.

Below we'll demonstrate two different approaches:

* Exact Answer Match: Using DSPy utils to evaluate whether the generated response is an exact match of the ground truth answer
* Semantic Match: Using an LLM as a Judge to determine whether the answer is correct


**Note: because these LLMs exhibit non-deterministic behavior, as you rerun the cells below your results may vary.**


### Training Data Ingestion

For evaluation purposes, we'll use a standard labeled dataset, HotPotQA, so we will switch back to using our Wikipedia retriever.

In [60]:
from dspy.datasets import HotPotQA

# Load the dataset.
dataset = HotPotQA(train_seed=1, train_size=50, eval_seed=1000, dev_size=50, test_size=0)

# Tell DSPy that the 'question' field is the input. Any other fields are labels and/or metadata.
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]

dspy.settings.configure(rm=colbertv2_wiki17_abstracts)

len(trainset), len(devset)

(50, 50)

### Exact Answer Evaluation

The most stringent evaluation metric requires that our output exactly match the ground truth answer.

In [16]:
def validate_exact_answer(example, pred, trace=None):
    # True only when DSPy's exact-match check passes for this example.
    return dspy.evaluate.answer_exact_match(example, pred)

evaluate = Evaluate(devset=devset, num_threads=1, display_progress=True, display_table=0)
evaluate(RAG(),validate_exact_answer)

Average Metric: 7 / 50  (14.0): 100%|██████████| 50/50 [01:45<00:00,  2.11s/it]


14.0
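To see why an exact-match metric scores so low, it helps to look at what such a check typically does. The sketch below uses a SQuAD-style normalization (lowercasing, stripping punctuation and articles) before comparing strings. It is an approximation written for illustration; the exact normalization inside `dspy.evaluate.answer_exact_match` is not shown in this notebook and may differ.

```python
import re
import string


def normalize(text):
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """True only if the normalized strings are identical."""
    return normalize(prediction) == normalize(gold)


# Formatting differences are forgiven...
print(exact_match("The Kinnairdy Castle.", "kinnairdy castle"))
# ...but any extra wording counts as a miss, even if the answer is right.
print(exact_match("Kinnairdy Castle in Scotland", "Kinnairdy Castle"))
```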

### Semantic Evaluation

For some use cases, an exact match metric may be too restrictive. If we don't care about the exact language and simply want to evaluate whether the generated response is factually correct, we can use another LLM to do so (sometimes known as the LLM-as-a-Judge approach).

DSPy allows us to use arbitrary user-defined functions as evaluation metrics, so we can use another DSPy program to evaluate the performance of our primary agent's predictions. Below is an example of how we can do this by defining a semantic similarity metric that uses a separate DSPy program for evaluation.

In [20]:
class Judge(dspy.Signature):
    """Judge if the predicted answer contains the correct response based on the ground truth answer."""

    ground_truth = dspy.InputField(desc="ground truth")
    prediction = dspy.InputField(desc="predicted answer")
    assessment_answer: bool = dspy.OutputField(desc="only True or False without any rationale")

judge = dspy.ChainOfThought(Judge)

def semantic_similarity(example, pred, trace=None):
    
    equivalent = judge(ground_truth=example.answer, prediction=pred.answer)
        
    return True if "true" in equivalent.assessment_answer.lower() else False
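The substring check in `semantic_similarity` counts any judge output containing "true" as a positive verdict, which would include outputs such as "untrue". A stricter parser is sketched below as one possible alternative. `parse_verdict` is a hypothetical helper, not part of DSPy, and returning `None` for ambiguous outputs is a design choice made for this illustration.

```python
import re


def parse_verdict(raw):
    """Return True or False for a clear verdict, or None if it is ambiguous."""
    tokens = re.findall(r"[a-z]+", str(raw).lower())
    has_true = "true" in tokens
    has_false = "false" in tokens
    if has_true and not has_false:
        return True
    if has_false and not has_true:
        return False
    return None  # no verdict found, or both words present


for sample in ["True", " false.", "untrue", "True or False"]:
    print(repr(sample), "->", parse_verdict(sample))
```

A metric could treat `None` as a failure, or log those cases for manual review.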

## Performance Comparison: Llama vs. Mixtral

Let's evaluate the performance of our zero-shot pipeline using two different models: Mixtral 8x7B and Llama3-70B.

### Mixtral 8x7b Performance - Out of the Box

In [61]:
evaluate(RAG(),semantic_similarity)

Average Metric: 27 / 50  (54.0): 100%|██████████| 50/50 [03:09<00:00,  3.78s/it]


54.0

### Llama3-70B Performance - Out of the Box

To evaluate the performance of the Llama3-70B model, all we need to do is change the DSPy settings. We don't need to change anything about the RAG pipeline.

In [63]:
llama_turbo = dspy.Snowflake(model="llama3-70b",credentials=connection_parameters)
dspy.settings.configure(lm=llama_turbo,rm=colbertv2_wiki17_abstracts)

evaluate(RAG(),semantic_similarity)

Average Metric: 31 / 50  (62.0): 100%|██████████| 50/50 [06:39<00:00,  7.99s/it]


62.0

#### Llama3-70B outperforms Mixtral 8x7B by 8 percentage points (62% vs. 54%, roughly 15% relative)


# Pipeline Optimization

DSPy's built-in optimizers let us tune our LLM pipelines, automatically adjusting our prompts and LM weights to improve performance. Below we use the `BootstrapFewShotWithRandomSearch` optimizer to maximize our `semantic_similarity` metric.
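The random-search part of this idea can be sketched in a few lines: try several random subsets of candidate demonstrations, score each one, and keep the best. This is a simplified illustration under stated assumptions. DSPy's optimizer also bootstraps demonstrations by running a teacher program, which is omitted here, and `random_search_demos`, `toy_score`, and the demo names are all hypothetical.

```python
import random


def random_search_demos(candidates, score_fn, num_demos=2, num_trials=5, seed=0):
    """Try random demo subsets and keep the highest-scoring one."""
    rng = random.Random(seed)
    best_demos, best_score = [], score_fn([])  # baseline: no demonstrations
    for _ in range(num_trials):
        demos = rng.sample(candidates, k=min(num_demos, len(candidates)))
        score = score_fn(demos)
        if score > best_score:
            best_demos, best_score = demos, score
    return best_demos, best_score


def toy_score(demos):
    """Stand-in for evaluating a pipeline on a dev set with these demos."""
    helpful = {"demo_a", "demo_c"}
    return sum(1 for demo in demos if demo in helpful)


candidates = ["demo_a", "demo_b", "demo_c", "demo_d"]
best_demos, best_score = random_search_demos(candidates, toy_score)
print(best_demos, best_score)
```

In the real optimizer the scoring step is the expensive part, since each candidate program is evaluated against the metric on held-out examples.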

### Mixtral 8x7B - Optimized Performance

In [64]:
optimizer = BootstrapFewShotWithRandomSearch(metric=semantic_similarity)
optimized_pipeline = optimizer.compile(RAG(), teacher=RAG(), trainset=trainset)
evaluate(optimized_pipeline,semantic_similarity)

Average Metric: 31 / 50  (62.0): 100%|██████████| 50/50 [01:20<00:00,  1.61s/it]
Average Metric: 35 / 50  (70.0): 100%|██████████| 50/50 [01:15<00:00,  1.51s/it]
 10%|█         | 5/50 [00:27<04:10,  5.57s/it]
Average Metric: 35 / 50  (70.0): 100%|██████████| 50/50 [00:57<00:00,  1.14s/it]
 18%|█▊        | 9/50 [00:51<03:53,  5.70s/it]
Average Metric: 31 / 50  (62.0): 100%|██████████| 50/50 [00:59<00:00,  1.18s/it]
  6%|▌         | 3/50 [00:14<03:48,  4.87s/it]
Average Metric: 33 / 50  (66.0): 100%|██████████| 50/50 [00:51<00:00,  1.03s/it]
  2%|▏         | 1/50 [00:06<05:16,  6.45s/it]
Average Metric: 34 / 50  (68.0): 100%|██████████| 50/50 [00:57<00:00,  1.16s/it]
  4%|▍         | 2/50 [00:09<03:43,  4.66s/it]
Average Metric: 31 / 50  (62.0): 100%|██████████| 50/50 [00:52<00:00,  1.06s/it]
 12%|█▏        | 6/50 [00:28<03:28,  4.74s/it]
Average Metric: 35 / 50  (70.0): 100%|██████████| 50/50 [00:50<00:00,  1.00s/it]
  6%|▌         | 3/50 [00:16<04:16,  5.46s/it]
Average Metric: 13 / 17

Average Metric: 32 / 50  (64.0): 100%|██████████| 50/50 [04:24<00:00,  5.29s/it]


64.0

### Optimizer improves the Mixtral pipeline from 54% to 64% (almost 20% relative) and slightly exceeds out-of-the-box Llama3-70B (62%)
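The comparison can be checked directly from the evaluation counts reported in this notebook (27, 31, and 32 correct out of 50). The helper names below are hypothetical; the sketch only distinguishes an absolute gain in percentage points from a relative gain.

```python
def accuracy(correct, total):
    """Accuracy as a percentage."""
    return 100.0 * correct / total


def point_gain(new, old):
    """Absolute difference in percentage points."""
    return new - old


def relative_gain(new, old):
    """Improvement relative to the old score, in percent."""
    return 100.0 * (new - old) / old


mixtral = accuracy(27, 50)            # Mixtral 8x7B, out of the box
llama = accuracy(31, 50)              # Llama3-70B, out of the box
mixtral_optimized = accuracy(32, 50)  # Mixtral 8x7B after optimization

print(point_gain(mixtral_optimized, mixtral), relative_gain(mixtral_optimized, mixtral))
print(point_gain(llama, mixtral), relative_gain(llama, mixtral))
```

With only 50 dev examples, a single question changes the score by 2 points, so small differences between runs should be read with caution.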


### What's going on under the hood?

Our optimized pipeline includes question/answer examples from the training data, as well as examples of Q&A responses that were generated by our teacher program during the bootstrapping process.
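As a rough illustration of how such demonstrations end up in a prompt, the sketch below renders question/answer pairs ahead of the new question. It is a simplified stand-in: `render_prompt` is hypothetical, and DSPy's actual template also includes a format description, context, and reasoning fields.

```python
def render_prompt(instructions, demos, question):
    """Place few-shot question/answer pairs before the question to answer."""
    parts = [instructions, "---"]
    for demo_question, demo_answer in demos:
        parts.append(f"Question: {demo_question}\nAnswer: {demo_answer}")
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


# Two demonstrations taken from the optimized prompt shown in this notebook.
demos = [
    ('Who composed "Sunflower Slow Drag" with the King of Ragtime?', "Scott Hayden"),
    ("Tombstone stared an actor born May 17, 1955 known as who?", "Bill Paxton"),
]
prompt = render_prompt(
    "Answer questions with short factoid answers.",
    demos,
    "What castle did David Gregory inherit?",
)
print(prompt)
```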

In [34]:
optimized_pipeline("What castle did David Gregory inherit?")
turbo.inspect_history()




Answer questions with short factoid answers.

---

Question: Which American actress who made their film debut in the 1995 teen drama "Kids" was the co-founder of Voto Latino?
Answer: Rosario Dawson

Question: which American actor was Candace Kita guest starred with
Answer: Bill Murray

Question: Who composed "Sunflower Slow Drag" with the King of Ragtime?
Answer: Scott Hayden

Question: Who acted in the shot film The Shore and is also the youngest actress ever to play Ophelia in a Royal Shakespeare Company production of "Hamlet." ?
Answer: Kerry Condon

Question: Tombstone stared an actor born May 17, 1955 known as who?
Answer: Bill Paxton

Question: Remember Me Ballin' is a CD single by Indo G that features an American rapper born in what year?
Answer: 1979

Question: What evening cable television station programming block has a show with Ashley Holliday as a cast member?
Answer: Nick at Nite

Question: Which is taller, the Empire State Building or the Bank of America Tower?
Answer
