# Llama3

Learn more about Llama3 in Meta's [release notes!](https://ai.meta.com/blog/meta-llama-3/)

Massive thank you to our friends at [Ollama](https://ollama.com/library/llama3:latest) for supporting this so quickly!

This notebook will:

1. Show you how to build a RAG system with Llama3, Ollama, Weaviate, and DSPy
2. Use DSPy's MIPRO optimizer to search for a better-performing RAG prompt for Llama3

Note that the best prompt is not the same for every language model! We recently published a [blog post](https://weaviate.io/blog/dspy-optimizers) explaining this.

### Connect to Llama3 (hosted with Ollama) and Weaviate

In [None]:
import dspy
llama3_ollama = dspy.OllamaLocal(model="llama3:8b-instruct-q5_1", max_tokens=4000, timeout_s=480)

import weaviate
from dspy.retrieve.weaviate_rm import WeaviateRM
weaviate_client = weaviate.connect_to_local()
retriever_model = WeaviateRM("WeaviateBlogChunk", weaviate_client=weaviate_client, k=10)

dspy.settings.configure(lm=llama3_ollama, rm=retriever_model)

In [None]:
llama3_ollama("say hello")

In [None]:
# Generate QA pairs
qdict = {}
qdict["What is the capital of France?"] = "Paris"
qdict["What is the capital of Germany?"] = "Berlin"
qdict["What is the capital of Italy?"] = "Rome"
qdict["What is the capital of Spain?"] = "Madrid"
qdict["What is the capital of Portugal?"] = "Lisbon"
qdict["What is the capital of the United Kingdom?"] = "London"
qdict["What is the capital of the United States?"] = "Washington, D.C."
qdict["What is the capital of Canada?"] = "Ottawa"
qdict["What is the capital of Mexico?"] = "Mexico City"
qdict["What is the capital of Brazil?"] = "Brasília"
qdict["What is the capital of Argentina?"] = "Buenos Aires"
qdict["What is the capital of Chile?"] = "Santiago"
qdict["What is the capital of Australia?"] = "Canberra"
qdict["What is the capital of New Zealand?"] = "Wellington"
qdict["What is the capital of Japan?"] = "Tokyo"
qdict["What is the capital of South Korea?"] = "Seoul"
# Split the dict into parallel lists of questions and gold answers.
queries = list(qdict.keys())
gold_answers = list(qdict.values())

### Load Dataset (Questions derived from Weaviate's Blog Posts)

In [None]:
# import json

# file_path = './WeaviateBlogRAG-0-0-0.json'
# with open(file_path, 'r') as file:
#     dataset = json.load(file)

# gold_answers = []
# queries = []

# for row in dataset:
#     gold_answers.append(row["gold_answer"])
#     queries.append(row["query"])
    
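# Wrap each question/answer pair in a dspy.Example; only "question" is a program input.
data = [
    dspy.Example(gold_answer=gold_answer, question=query).with_inputs("question")
    for query, gold_answer in zip(queries, gold_answers)
]

# These slice sizes assume the larger blog dataset. With the 16 pairs defined above,
# trainset holds all 16 examples and devset and testset are empty.
trainset, devset, testset = data[:25], data[25:35], data[35:]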
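
The fixed slice boundaries above only produce non-empty dev and test sets when there are more than 35 examples. As a minimal sketch, assuming you want the split to scale with the dataset size, a hypothetical helper `split_dataset` (not part of DSPy) could use fractions instead. The default fractions are illustrative and do not come from this notebook.

```python
def split_dataset(data, train_frac=0.6, dev_frac=0.2):
    """Split a list into train/dev/test by fraction (hypothetical helper, not a DSPy API)."""
    if not 0 <= train_frac <= 1 or not 0 <= dev_frac <= 1 or train_frac + dev_frac > 1:
        raise ValueError("fractions must be in [0, 1] and sum to at most 1")
    n_train = int(len(data) * train_frac)
    n_dev = int(len(data) * dev_frac)
    train = data[:n_train]
    dev = data[n_train:n_train + n_dev]
    test = data[n_train + n_dev:]  # whatever remains becomes the test set
    return train, dev, test


# Stand-in for a list of 16 dspy.Example objects.
examples = list(range(16))
train, dev, test = split_dataset(examples)
print(len(train), len(dev), len(test))  # prints: 9 3 4
```

The same call should work on the `data` list built in the cell above, since it only relies on list slicing.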

### Metric to Assess Response Quality

In [None]:
class TypedEvaluator(dspy.Signature):
    """Evaluate the quality of a system's answer to a question according to a given criterion."""
    
    criterion: str = dspy.InputField(desc="The evaluation criterion.")
    question: str = dspy.InputField(desc="The question asked to the system.")
    ground_truth_answer: str = dspy.InputField(desc="An expert written Ground Truth Answer to the question.")
    predicted_answer: str = dspy.InputField(desc="The system's answer to the question.")
    rating: float = dspy.OutputField(desc="A float rating between 1 and 5. IMPORTANT!! ONLY OUTPUT THE RATING!!")


def MetricWrapper(gold, pred, trace=None):
    # LLM-as-judge metric: rate how well the prediction matches the gold answer (1 to 5).
    alignment_criterion = "How aligned is the predicted_answer with the ground_truth_answer?"
    return dspy.TypedPredictor(TypedEvaluator)(criterion=alignment_criterion,
                                               question=gold.question,
                                               ground_truth_answer=gold.gold_answer,
                                               predicted_answer=pred.answer).rating
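
The metric above depends on the judge model returning a clean float. As a hedged sketch, assuming you instead have the judge's raw text in hand, a hypothetical helper `parse_rating` (not part of DSPy) could pull out the first number and clamp it to the 1 to 5 range that the signature asks for.

```python
import re


def parse_rating(text, low=1.0, high=5.0, default=None):
    """Extract the first number from free-form judge output and clamp it to [low, high].

    Hypothetical helper for illustration; it is not a DSPy API.
    """
    match = re.search(r"-?\d+(?:\.\d+)?", text)
    if match is None:
        return default  # no number found; the caller decides how to handle it
    return min(high, max(low, float(match.group())))


print(parse_rating("Rating: 4.5"))            # prints: 4.5
print(parse_rating("I would give this a 7"))  # prints: 5.0 (clamped to the upper bound)
print(parse_rating("no rating given"))        # prints: None
```

Returning `default` rather than raising keeps a single malformed judgement from aborting a long optimization run; whether that trade-off is right depends on how noisy the judge model is.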

### DSPy RAG Program 

In [None]:
class GenerateAnswer(dspy.Signature):
    """Assess the context and answer the question."""

    context = dspy.InputField(desc="Helpful information for answering the question.")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="A detailed answer that is supported by the context. ONLY OUTPUT THE ANSWER!!")
    
class RAG(dspy.Module):
    def __init__(self, k=3):
        super().__init__()
        
        self.retrieve = dspy.Retrieve(k=k)
        self.generate_answer = dspy.Predict(GenerateAnswer)
    
    def forward(self, question):
        context = self.retrieve(question).passages
        pred = self.generate_answer(context=context, question=question).answer
        return dspy.Prediction(context=context, answer=pred, question=question)

### Run!

In [None]:
print(RAG()("What is binary quantization?").answer)

### Compile with MIPRO

What is the optimal prompt for Llama3 when answering questions about Weaviate?

Starting with the prompt,

`Assess the context and answer the question.`

DSPy's MIPRO optimizer finds better performance with,

`Given the provided context, your task is to understand the content and accurately answer the question based on the information available in the context. You should use formal English with technical terminologies where necessary and provide a detailed, relevant response.`
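
One way to check a claim like this on your own data is to average the metric over held-out examples for both the baseline and the compiled program. The sketch below is an assumption about how such a comparison could be done, using a hypothetical `average_metric` helper rather than DSPy's own evaluation utilities. The toy program and metric at the bottom are stand-ins so the snippet runs without a model.

```python
def average_metric(program, examples, metric):
    """Average metric(example, prediction) over examples (hypothetical helper, not a DSPy API)."""
    if not examples:
        raise ValueError("need at least one example to evaluate")
    scores = [metric(example, program(example["question"])) for example in examples]
    return sum(scores) / len(scores)


# Toy stand-ins so the sketch is runnable without Llama3 or Weaviate.
toy_examples = [
    {"question": "2+2", "gold_answer": "4"},
    {"question": "3+3", "gold_answer": "6"},
]


def toy_program(question):
    return "4"  # always answers "4", so it is right on one of the two examples


def toy_metric(example, prediction):
    return 5.0 if prediction == example["gold_answer"] else 1.0


print(average_metric(toy_program, toy_examples, toy_metric))  # prints: 3.0
```

In the notebook's setting, the idea would be to call such a helper once with the uncompiled `RAG()` and once with the compiled program, passing `MetricWrapper` as the metric, and compare the two averages on examples that were not used for optimization.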


In [None]:
from dspy.teleprompt import MIPRO

import openai
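# GPT-4 acts as the prompt model that proposes instructions; Llama3 is the task model that runs them.
gpt4 = dspy.OpenAI(model="gpt-4", max_tokens=4000, model_type="chat")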

teleprompter = MIPRO(prompt_model=gpt4, 
                     task_model=llama3_ollama, 
                     metric=MetricWrapper, 
                     num_candidates=3, 
                     init_temperature=0.5)
kwargs = dict(num_threads=1, 
              display_progress=True, 
              display_table=0)
MIPRO_compiled_RAG = teleprompter.compile(
    RAG(),
    trainset=trainset[:5],
    num_trials=3,
    max_bootstrapped_demos=1,
    max_labeled_demos=0,
    eval_kwargs=kwargs,
)

In [None]:
MIPRO_compiled_RAG("what are cross encoders?").answer

In [None]:
llama3_ollama.inspect_history(n=1)