In [1]:
import json
import time

import kscope

# Getting Started

Documentation on how to interact with the large models is available [here](https://kaleidoscope-sdk.readthedocs.io/en/latest/). The SDK is on GitHub [here](https://github.com/VectorInstitute/kaleidoscope-sdk), and the underlying code is [here](https://github.com/VectorInstitute/kaleidoscope).

First, we connect to the service through which we'll interact with the LLMs and see which models are available to us.

In [2]:
# Establish a client connection to the kscope service
client = kscope.Client(gateway_host="llm.cluster.local", gateway_port=3001)

Show all supported models

In [3]:
client.models

['gpt2',
 'llama2-7b',
 'llama2-7b_chat',
 'llama2-13b',
 'llama2-13b_chat',
 'llama2-70b',
 'llama2-70b_chat',
 'falcon-7b',
 'falcon-40b',
 'sdxl-turbo']

Show all model instances that are currently active

In [4]:
client.model_instances

[{'id': 'a33c0f4d-da2b-4861-8c3d-91e66955e879',
  'name': 'falcon-7b',
  'state': 'ACTIVE'},
 {'id': '7389b196-9637-4a42-adca-7bfb4f59733d',
  'name': 'llama2-7b',
  'state': 'ACTIVE'},
 {'id': 'b486d208-a570-47ed-bb35-9de310f9cd02',
  'name': 'llama2-70b',
  'state': 'ACTIVE'},
 {'id': 'b3871a00-4848-49be-a1c8-c8f6c47ad8b2',
  'name': 'falcon-40b',
  'state': 'ACTIVE'},
 {'id': '99bee87e-abc4-44fd-b4d3-ea2c527bb93e',
  'name': 'llama2-13b',
  'state': 'ACTIVE'}]

To start, we obtain a handle to a model. In this example, let's use the Falcon 40B parameter model.

In [5]:
model = client.load_model("falcon-40b")
# If this model is not actively running, it will get launched in the background.
# In this case, wait until it moves into an "ACTIVE" state before proceeding.
while model.state != "ACTIVE":
    time.sleep(1)
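
The loop above waits indefinitely if the model never reaches the "ACTIVE" state. A minimal sketch of the same polling pattern with a timeout is shown below. The helper `wait_until_active` and its parameters are hypothetical and not part of the kscope SDK; the only thing assumed about the model handle is the `state` attribute used above.

```python
import time
from typing import Callable


def wait_until_active(
    get_state: Callable[[], str],
    timeout_s: float = 600.0,
    poll_interval_s: float = 1.0,
) -> None:
    """Poll get_state() until it returns "ACTIVE" or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while get_state() != "ACTIVE":
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Model did not become ACTIVE within {timeout_s} seconds")
        time.sleep(poll_interval_s)
```

With the handle from the cell above, this could be called as `wait_until_active(lambda: model.state)`.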

We need to configure the model to generate in the way we want it to. So we set a number of important parameters. For a discussion of the configuration parameters see: `src/reference_implementations/prompting_vector_llms/CONFIG_README.md`

**NOTE**: We'll use deterministic decoding for most of this notebook (`"do_sample"` is `False` by default). To turn sampling on in the `long_generation_config`, which is used throughout this notebook, set `"do_sample": True` as we did in the `short_generation_config`.

In [6]:
short_generation_config = {"max_tokens": 20, "top_p": 1.0, "temperature": 0.7, "do_sample": True}
long_generation_config = {"max_tokens": 50, "top_p": 1.0}

Let's try a basic prompt for factual information.

__Note__ that if you run the cell multiple times, you'll get different responses due to sampling in the short config.
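
To make the difference between deterministic decoding and sampling concrete, here is a toy sketch over a made-up four-token vocabulary. It illustrates the general technique (greedy argmax versus temperature and top-p sampling) and is not the kscope or Falcon implementation; the function names and logit values are invented for the example.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits into probabilities, sharpened or flattened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Deterministic decoding: always take the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_pick(logits, temperature=0.7, top_p=1.0, rng=random):
    """Sample from the smallest set of tokens whose cumulative probability reaches top_p."""
    probs = softmax(logits, temperature)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


toy_logits = [2.0, 1.5, 0.3, -1.0]  # invented scores for four candidate tokens
print(greedy_pick(toy_logits))  # the same index on every call
print(sample_pick(toy_logits))  # may differ from call to call
```

Greedy decoding returns the same token every time, while sampling can return a different token on each call, which is why repeated runs of the cell below can give different text.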

In [7]:
generation = model.generate("What is the capital of Canada?", short_generation_config)
# Extract the text from the returned generation
generation.generation["sequences"][0]

'\n- Ottawa\nCanada is a federal state with ten provinces and three territories. Ottawa is the capital'

### Some __Very__ Basic Retrieval Augmented Generation (RAG) Examples

In this notebook, we'll consider how RAG improves a language model's ability to answer questions in a factually correct way. The main driver of RAG's benefits is the presentation of a model with "reference" material that is useful for answering questions. The key term here is __"useful."__ RAG is only as powerful as the relevance of the reference material that is retrieved and presented to the model to help it answer the questions being asked.

In this notebook, we are going to leave the "retrieval" part of RAG out of consideration. While this is a __big__ assumption, the goal here is to demonstrate why good RAG methods can be so important, how they can better "tailor" a model to a particular setting without fine-tuning, and how RAG can fill gaps left by fine-tuning as a model ages.

So, the question here is: Given an effective reference retrieval mechanism, what does it do for us in terms of improving model responses?

First, let's load some reference information encoded in a JSON format. Feel free to inspect the contents in the resources folder.

In [8]:
sports_json_path = "resources/sports_reference.json"
weather_json_path = "resources/past_weather.json"

with open(sports_json_path, "r") as f:
    sports_json = json.load(f)

with open(weather_json_path, "r") as f:
    weather_json = json.load(f)

The function below creates a RAG prompt for us. If the reference string is non-empty, the prompt will be prefixed with some reference information in a JSON format.

__NOTE__: Theoretically, we could attempt to parse the JSON string in a way that is more "natural" with respect to language. However, we'll see below that the model is actually able to leverage raw JSON fairly effectively.

In [9]:
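def create_rag_prompt(question: str, reference: str) -> str:
    # `reference` is interpolated as-is. Later cells also pass dicts and lists,
    # which appear in the prompt as their Python repr.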
    non_reference = f"Question: {question}\n\nAnswer:"
    if len(reference) > 0:
        return f"Reference: {reference}\n\n{non_reference}"
    return non_reference
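
Note that the cells below pass parsed Python objects (dicts and lists) as `reference`, so the f-string inserts their Python `repr` (single quotes) rather than strict JSON. If strict JSON in the prompt is preferred, one option is to serialize the reference first. The sketch below is a hypothetical variant, `create_rag_prompt_json`, and is not the function used to produce the outputs in this notebook.

```python
import json
from typing import Any


def create_rag_prompt_json(question: str, reference: Any = None) -> str:
    """Build a RAG prompt, serializing non-string references as strict JSON."""
    non_reference = f"Question: {question}\n\nAnswer:"
    if not reference:  # None, "", {} and [] all mean "no reference"
        return non_reference
    if not isinstance(reference, str):
        reference = json.dumps(reference, ensure_ascii=False)
    return f"Reference: {reference}\n\n{non_reference}"


print(
    create_rag_prompt_json(
        "Who were the all-pro NFL running backs in 2015?",
        {"2015": ["Adrian Peterson", "Doug Martin"]},
    )
)
```

Whether the model responds better to strict JSON than to the Python `repr` is not something this notebook tests.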

In [10]:
sports_json

{'all_pro_rbs_by_year': {'2014': ['DeMarco Murray', "Le'Veon Bell"],
  '2015': ['Adrian Peterson', 'Doug Martin'],
  '2016': ['Ezekiel Elliott', 'David Johnson'],
  '2017': ['Todd Gurley', "Le'Veon Bell"],
  '2018': ['Todd Gurley', 'Saquon Barkley'],
  '2019': ['Christian McCaffrey', 'Derrick Henry', 'Dalvin Cook'],
  '2020': ['Derrick Henry', 'Dalvin Cook'],
  '2021': ['Jonathan Taylor', 'Nick Chubb', 'Joe Mixon'],
  '2022': ['Josh Jacobs', 'Nick Chubb', 'Saquon Barkley', 'Derrick Henry'],
  '2023': ['Christian McCaffrey', 'Kyren Williams', 'Raheem Mostert']},
 'marathon_world_records': [{'name': 'Kelvin Kiptum',
   'date': 'October 8, 2023',
   'time': '2:00:35'},
  {'name': 'Eliud Kipchoge', 'date': 'September 25, 2022', 'time': '2:01:09'},
  {'name': 'Eliud Kipchoge', 'date': 'September 16, 2018', 'time': '2:01:39'},
  {'name': 'Dennis Kimetto', 'date': 'September 28, 2014', 'time': '2:02:57'},
  {'name': 'Wilson Kipsang', 'date': 'September 29, 2013', 'time': '2:03:23'},
  {'name'

### All-Pro NFL Running Backs

In this section, we ask the model a fairly straightforward question: who were the all-pro NFL running backs in various years? There are a few things at play here that are worth noting. Falcon was trained prior to June 2023, which is well before the all-pro NFL players for 2023 were announced. So Falcon has no way of "knowing" this information for 2023 without some sort of reference material. Essentially, its "world-view" pre-dates the event being asked about.

The other years being queried occurred before the model was trained and technically could be part of its "knowledge." So it's possible that the model may answer these questions accurately without RAG.

In [10]:
QUESTION_1 = "Who were the all-pro NFL running backs in 2023?"
QUESTION_2 = "Who were the all-pro NFL running backs in 2019?"
QUESTION_3 = "Who were the all-pro NFL running backs in 2015?"

In [11]:
zero_shot_prompt = create_rag_prompt(question=QUESTION_1, reference="")
print(zero_shot_prompt)
generation = model.generate(zero_shot_prompt, long_generation_config)
# Extract the text from the returned generation
print(generation.generation["sequences"][0])

Question: Who were the all-pro NFL running backs in 2023?

Answer:
 The all-pro NFL running backs in 2023 were:

1. **Clyde Edwards-Helaire**
2. **Alvin Kamara**
3. **Saquon Barkley**



As discussed above, Falcon cannot possibly "know" the all-pro NFL RBs from 2023 given when it was trained. It essentially has no choice but to "hallucinate" a response. 

For example:
* Clyde Edwards-Helaire was a backup on the Chiefs
* Alvin Kamara was suspended for three games to start the season and was fine for the remainder.

Neither was an all-pro running back, nor was Saquon Barkley, though he was a better player than the first two.

In [12]:
zero_shot_prompt = create_rag_prompt(question=QUESTION_2, reference="")
print(zero_shot_prompt)
generation = model.generate(zero_shot_prompt, long_generation_config)
# Extract the text from the returned generation
print(generation.generation["sequences"][0])

Question: Who were the all-pro NFL running backs in 2019?

Answer:


A. 1,000-yard rushers

B. 1,000-yard rushers and 1,000-yard receivers

C. 1,000-yard rushers and 1,000-


This zero-shot prompt example doesn't work much better. Even though Falcon could, theoretically, "know" who the running backs named to the all-pro list were in 2019, it doesn't answer our question, at least directly, in the first 50 tokens of generation.

In [13]:
zero_shot_prompt = create_rag_prompt(question=QUESTION_3, reference="")
print(zero_shot_prompt)
generation = model.generate(zero_shot_prompt, long_generation_config)
# Extract the text from the returned generation
print(generation.generation["sequences"][0])

Question: Who were the all-pro NFL running backs in 2015?

Answer:


A. Adrian Peterson

B. Le'Veon Bell

C. Todd Gurley

D. Ezekiel Elliott

E. None of the above

Answer:

A. Adrian Peterson

B.


For this question, the model doesn't really do what it's supposed to do (i.e. answer the question), but it at least names some relevant backs from the period. Adrian Peterson was, in fact, an all-pro running back in 2015. So we're sort of on the right track, but it's still disappointing.

#### RAG Examples

In [14]:
rag_prompt = create_rag_prompt(reference=sports_json["all_pro_rbs_by_year"], question=QUESTION_1)
print(rag_prompt)
generation = model.generate(rag_prompt, long_generation_config)
# Extract the text from the returned generation
print(generation.generation["sequences"][0])

Reference: {'2014': ['DeMarco Murray', "Le'Veon Bell"], '2015': ['Adrian Peterson', 'Doug Martin'], '2016': ['Ezekiel Elliott', 'David Johnson'], '2017': ['Todd Gurley', "Le'Veon Bell"], '2018': ['Todd Gurley', 'Saquon Barkley'], '2019': ['Christian McCaffrey', 'Derrick Henry', 'Dalvin Cook'], '2020': ['Derrick Henry', 'Dalvin Cook'], '2021': ['Jonathan Taylor', 'Nick Chubb', 'Joe Mixon'], '2022': ['Josh Jacobs', 'Nick Chubb', 'Saquon Barkley', 'Derrick Henry'], '2023': ['Christian McCaffrey', 'Kyren Williams', 'Raheem Mostert']}

Question: Who were the all-pro NFL running backs in 2023?

Answer:
 ['Christian McCaffrey', 'Kyren Williams', 'Raheem Mostert']

Solution:

import pandas as pd

df = pd.read_csv('data/all-pro-running


With the addition of the RAG reference, the model is able to extract the three correct running backs from the provided JSON. This is an important result because, as stated earlier, the model cannot "know" from its training that these running backs were selected. It requires us to "patch" its knowledge with relevant reference material.

In [15]:
rag_prompt = create_rag_prompt(reference=sports_json["all_pro_rbs_by_year"], question=QUESTION_2)
print(rag_prompt)
generation = model.generate(rag_prompt, long_generation_config)
# Extract the text from the returned generation
print(generation.generation["sequences"][0])

Reference: {'2014': ['DeMarco Murray', "Le'Veon Bell"], '2015': ['Adrian Peterson', 'Doug Martin'], '2016': ['Ezekiel Elliott', 'David Johnson'], '2017': ['Todd Gurley', "Le'Veon Bell"], '2018': ['Todd Gurley', 'Saquon Barkley'], '2019': ['Christian McCaffrey', 'Derrick Henry', 'Dalvin Cook'], '2020': ['Derrick Henry', 'Dalvin Cook'], '2021': ['Jonathan Taylor', 'Nick Chubb', 'Joe Mixon'], '2022': ['Josh Jacobs', 'Nick Chubb', 'Saquon Barkley', 'Derrick Henry'], '2023': ['Christian McCaffrey', 'Kyren Williams', 'Raheem Mostert']}

Question: Who were the all-pro NFL running backs in 2019?

Answer:
 ['Christian McCaffrey', 'Dalvin Cook', 'Derrick Henry', 'Nick Chubb', 'Saquon Barkley']

Solution:

import pandas as pd

df = pd


In this example, the model produces the three running backs named in 2019 (Christian McCaffrey, Dalvin Cook, and Derrick Henry), but it also adds two other RBs who were named to the list in other years. We still answer the question more effectively than in the "reference-less" case, but RAG does not *guarantee* that the response will be correct, even if the material is relevant to answering the question.
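
Because the first line of the generation above happens to be a Python-style list, the mismatch with the reference can be checked mechanically. The sketch below is a hypothetical helper (`compare_to_reference` is not part of the notebook or the SDK) and only works when the model's answer parses as a list literal, which is not guaranteed in general.

```python
import ast

# The 2019 entry from the reference, and the first line of the generation above
reference_2019 = ["Christian McCaffrey", "Derrick Henry", "Dalvin Cook"]
answer_line = (
    "['Christian McCaffrey', 'Dalvin Cook', 'Derrick Henry', "
    "'Nick Chubb', 'Saquon Barkley']"
)


def compare_to_reference(answer_line: str, expected: list) -> tuple:
    """Return (missing, extra) names relative to the expected list."""
    predicted = set(ast.literal_eval(answer_line.strip()))
    expected_set = set(expected)
    return sorted(expected_set - predicted), sorted(predicted - expected_set)


missing, extra = compare_to_reference(answer_line, reference_2019)
print("missing:", missing)  # names in the reference that the model omitted
print("extra:", extra)  # names the model added beyond the 2019 entry
```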

In [16]:
rag_prompt = create_rag_prompt(reference=sports_json["all_pro_rbs_by_year"], question=QUESTION_3)
print(rag_prompt)
generation = model.generate(rag_prompt, long_generation_config)
# Extract the text from the returned generation
print(generation.generation["sequences"][0])

Reference: {'2014': ['DeMarco Murray', "Le'Veon Bell"], '2015': ['Adrian Peterson', 'Doug Martin'], '2016': ['Ezekiel Elliott', 'David Johnson'], '2017': ['Todd Gurley', "Le'Veon Bell"], '2018': ['Todd Gurley', 'Saquon Barkley'], '2019': ['Christian McCaffrey', 'Derrick Henry', 'Dalvin Cook'], '2020': ['Derrick Henry', 'Dalvin Cook'], '2021': ['Jonathan Taylor', 'Nick Chubb', 'Joe Mixon'], '2022': ['Josh Jacobs', 'Nick Chubb', 'Saquon Barkley', 'Derrick Henry'], '2023': ['Christian McCaffrey', 'Kyren Williams', 'Raheem Mostert']}

Question: Who were the all-pro NFL running backs in 2015?

Answer:
 ['Adrian Peterson', 'Doug Martin']

Question: Who were the all-pro NFL running backs in 2016?

Answer: ['Ezekiel Elliott', 'David Johnson']

Question: Who were


Again, with the reference, our RAG prompt produces the correct answer. In addition, based on the continuation of the generation, the model has caught on to the pattern we're providing with the JSON.
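
Several generations above keep going after the answer (a new "Question:", a "Solution:" block, and so on). One simple way to keep only the answer is to cut the text client-side at the first such marker. The sketch below is a hypothetical post-processing helper; the stop strings are chosen from the patterns seen in the outputs of this notebook, and nothing here assumes that the kscope service itself supports stop sequences.

```python
def truncate_at_stop(
    text: str,
    stop_sequences=("\n\nQuestion:", "\n\nSolution:", "\n\nReference:"),
) -> str:
    """Cut the generation at the earliest stop sequence, if any appears."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut].strip()


# The generation from the cell above, written as a single string
raw = (
    " ['Adrian Peterson', 'Doug Martin']\n\n"
    "Question: Who were the all-pro NFL running backs in 2016?\n\n"
    "Answer: ['Ezekiel Elliott', 'David Johnson']\n\n"
    "Question: Who were"
)
print(truncate_at_stop(raw))
```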

### Marathon World Record Holders

The NFL is a very popular league, at least in the USA. In this example, we consider a slightly less famous, but no less impressive, set of sports accomplishments: setting a world record in the marathon. As with the 2023 all-pro running backs, the key event is recent: Kelvin Kiptum set the current world record in October 2023. This is well after Falcon was trained. So without RAG, the model is unlikely to "know" who holds the current world record.

In [17]:
QUESTION_1 = "Who holds the current world record in the marathon?"
QUESTION_2 = "Which runners have recently set the marathon world record twice?"
QUESTION_3 = "How fast has Kelvin Kiptum run the marathon?"

In [18]:
zero_shot_prompt = create_rag_prompt(question=QUESTION_1, reference="")
print(zero_shot_prompt)
generation = model.generate(zero_shot_prompt, long_generation_config)
# Extract the text from the returned generation
print(generation.generation["sequences"][0])

Question: Who holds the current world record in the marathon?

Answer:
 Eliud Kipchoge, Kenya, 2:01:39, 2018

Question: What is the world record for the 100m sprint?

Answer: Usain Bolt, Jamaica, 9.58 seconds


The model actually does a good job answering the question, even if the answer is inaccurate. Kipchoge did, in fact, set the world record at 2:01:39 in 2018. So the model is clearly able to surface relevant information during its generation. However, Kipchoge reset this record in 2022. It was then surpassed by Kiptum in 2023.

In [19]:
zero_shot_prompt = create_rag_prompt(question=QUESTION_2, reference="")
print(zero_shot_prompt)
generation = model.generate(zero_shot_prompt, long_generation_config)
# Extract the text from the returned generation
print(generation.generation["sequences"][0])

Question: Which runners have recently set the marathon world record twice?

Answer:
 Eliud Kipchoge and Zersenay Tadese

Question: Which runner has set the world record for the 10,000m three times?

Answer: Haile Gebrselassie

Question: Which


The first answer, Kipchoge, is correct. He set the record in 2018 and then lowered it in 2022. Zersenay Tadese, on the other hand, has never held the marathon world record, though he held the world record in the half marathon for a number of years.

In [20]:
zero_shot_prompt = create_rag_prompt(question=QUESTION_3, reference="")
print(zero_shot_prompt)
generation = model.generate(zero_shot_prompt, long_generation_config)
# Extract the text from the returned generation
print(generation.generation["sequences"][0])

Question: How fast has Kelvin Kiptum run the marathon?

Answer:
 2:05:29

Question: How fast has the fastest marathon runner run the marathon?

Answer: 2:02:57

Question: How fast has the slowest marathon runner run the marathon?

Answer


Kelvin Kiptum holds the current world record, but set it after Falcon was trained, so the LM does not "know" Kiptum's most recent times. It is unclear whether the time of 2:05:29 is purely fabricated or whether this represents a previous result that Kiptum has run. Regardless, the model is unable to correctly answer the question without help.

In [21]:
rag_prompt = create_rag_prompt(reference=sports_json["marathon_world_records"], question=QUESTION_1)
print(rag_prompt)
generation = model.generate(rag_prompt, long_generation_config)
# Extract the text from the returned generation
print(generation.generation["sequences"][0])

Reference: [{'name': 'Kelvin Kiptum', 'date': 'October 8, 2023', 'time': '2:00:35'}, {'name': 'Eliud Kipchoge', 'date': 'September 25, 2022', 'time': '2:01:09'}, {'name': 'Eliud Kipchoge', 'date': 'September 16, 2018', 'time': '2:01:39'}, {'name': 'Dennis Kimetto', 'date': 'September 28, 2014', 'time': '2:02:57'}, {'name': 'Wilson Kipsang', 'date': 'September 29, 2013', 'time': '2:03:23'}, {'name': 'Patrick Makau', 'date': 'September 25, 2011', 'time': '2:03:38'}, {'name': 'Haile Gebrselassie', 'date': 'September 28, 2008', 'time': '2:03:59'}, {'name': 'Haile Gebrselassie', 'date': 'September 30, 2007', 'time': '2:04:26'}, {'name': 'Paul Tergat', 'date': 'September 28, 2003', 'time': '2:04:55'}]

Question: Who holds the current world record in the marathon?

Answer:
 Eliud Kipchoge

Reference: [{'name': 'Eliud Kipchoge', 'date': 'September 25, 2022', 'time': '2:01:09


This example is quite interesting. Despite being provided a reference containing updated information on world records, the model still answers Kipchoge. It is a bit unclear whether the follow-up generated reference string is meant to point to where it pulled that information. However, it is notable that the generated reference leaves Kiptum off the list and starts with the most recent record that Kipchoge set.
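
The expected answer is recoverable from the reference with ordinary code, which gives us a ground truth to compare the model's answer against. The sketch below uses the first three entries of the reference printed above; `latest_record` is a hypothetical helper, and the date format string assumes the "Month day, year" layout used in the JSON.

```python
from datetime import datetime

# First three entries of the marathon reference shown above
records = [
    {"name": "Kelvin Kiptum", "date": "October 8, 2023", "time": "2:00:35"},
    {"name": "Eliud Kipchoge", "date": "September 25, 2022", "time": "2:01:09"},
    {"name": "Eliud Kipchoge", "date": "September 16, 2018", "time": "2:01:39"},
]


def latest_record(records):
    """Return the entry with the most recent date."""
    return max(records, key=lambda r: datetime.strptime(r["date"], "%B %d, %Y"))


current = latest_record(records)
print(current["name"], current["time"])
```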

In [22]:
rag_prompt = create_rag_prompt(reference=sports_json["marathon_world_records"], question=QUESTION_2)
print(rag_prompt)
generation = model.generate(rag_prompt, long_generation_config)
# Extract the text from the returned generation
print(generation.generation["sequences"][0])

Reference: [{'name': 'Kelvin Kiptum', 'date': 'October 8, 2023', 'time': '2:00:35'}, {'name': 'Eliud Kipchoge', 'date': 'September 25, 2022', 'time': '2:01:09'}, {'name': 'Eliud Kipchoge', 'date': 'September 16, 2018', 'time': '2:01:39'}, {'name': 'Dennis Kimetto', 'date': 'September 28, 2014', 'time': '2:02:57'}, {'name': 'Wilson Kipsang', 'date': 'September 29, 2013', 'time': '2:03:23'}, {'name': 'Patrick Makau', 'date': 'September 25, 2011', 'time': '2:03:38'}, {'name': 'Haile Gebrselassie', 'date': 'September 28, 2008', 'time': '2:03:59'}, {'name': 'Haile Gebrselassie', 'date': 'September 30, 2007', 'time': '2:04:26'}, {'name': 'Paul Tergat', 'date': 'September 28, 2003', 'time': '2:04:55'}]

Question: Which runners have recently set the marathon world record twice?

Answer:
 Eliud Kipchoge and Dennis Kimetto

Reference: [{'name': 'Eliud Kipchoge', 'date': 'September 25, 2022', 'time': '2


As with the non-RAG prompt example above, the first answer, Kipchoge, is correct, as he has set the world record twice in recent years. Another runner fits the answer to this question and is contained in the reference: Haile Gebrselassie. Unfortunately, the model is unable to extract that from the provided reference. However, Kimetto is a recent world record holder. So in that sense, the model's answer is "better," though still inaccurate, compared with that of the non-RAG prompt.
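
For comparison, counting names in the reference directly shows which runners appear twice. This is a small check of the expected answer, using the names exactly as they appear in the reference printed above.

```python
from collections import Counter

# Names from the marathon reference shown above, most recent record first
record_names = [
    "Kelvin Kiptum",
    "Eliud Kipchoge",
    "Eliud Kipchoge",
    "Dennis Kimetto",
    "Wilson Kipsang",
    "Patrick Makau",
    "Haile Gebrselassie",
    "Haile Gebrselassie",
    "Paul Tergat",
]

counts = Counter(record_names)
twice = [name for name, n in counts.items() if n == 2]
print(twice)
```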

In [23]:
rag_prompt = create_rag_prompt(reference=sports_json["marathon_world_records"], question=QUESTION_3)
print(rag_prompt)
generation = model.generate(rag_prompt, long_generation_config)
# Extract the text from the returned generation
print(generation.generation["sequences"][0])

Reference: [{'name': 'Kelvin Kiptum', 'date': 'October 8, 2023', 'time': '2:00:35'}, {'name': 'Eliud Kipchoge', 'date': 'September 25, 2022', 'time': '2:01:09'}, {'name': 'Eliud Kipchoge', 'date': 'September 16, 2018', 'time': '2:01:39'}, {'name': 'Dennis Kimetto', 'date': 'September 28, 2014', 'time': '2:02:57'}, {'name': 'Wilson Kipsang', 'date': 'September 29, 2013', 'time': '2:03:23'}, {'name': 'Patrick Makau', 'date': 'September 25, 2011', 'time': '2:03:38'}, {'name': 'Haile Gebrselassie', 'date': 'September 28, 2008', 'time': '2:03:59'}, {'name': 'Haile Gebrselassie', 'date': 'September 30, 2007', 'time': '2:04:26'}, {'name': 'Paul Tergat', 'date': 'September 28, 2003', 'time': '2:04:55'}]

Question: How fast has Kelvin Kiptum run the marathon?

Answer:
 2:00:35

Question: How fast has Eliud Kipchoge run the marathon?

Answer: 2:01:09

Question: How fast has Eliud Kipchoge run the marathon?


This response is now accurate. 2:00:35 is Kiptum's world record time (a sub-2-hour marathon has actually been achieved by Kipchoge, but under very controlled conditions that made it ineligible as a world record). The model extracted the time correctly!

### Recent Weather

In this case, we're going to ask the model about weather that has occurred recently. The information we retrieve is going to be a bit more "mixed together." That is, the weather data in the JSON alternates between Calgary and Toronto, but we'll want to know about one of them specifically in each question.

All of the questions below refer to weather that "occurred" (see note below) after Falcon was trained. Without help, the model will, at best, be taking a guess.

__NOTE__: These weather reports are fabricated. However, the model has no notion of weather in 2024 on its own due to when it was trained. So there is no "internal" conflict with the model's own representations.
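
In this notebook we deliberately pass the whole weather JSON as the reference. A retrieval step could instead narrow the reference to the entries that match the question. The sketch below shows that idea with two days copied from the weather reference printed in the RAG cells of this section; `select_weather` is a hypothetical helper and assumes the date and location are already known.

```python
# Two days copied from the weather reference used in this section
weather = {
    "March 3, 2024": [
        {"location": "Calgary", "weather": "Overcast and windy with a high temperature of 1"},
        {"location": "Toronto", "weather": "Sun all day with a high temperature of 9"},
    ],
    "March 5, 2024": [
        {"location": "Calgary", "weather": "Sunny and warm with a high temperature of 10"},
        {"location": "Toronto", "weather": "Sun and clouds with a high temperature of 12"},
    ],
}


def select_weather(reference: dict, date: str, location: str) -> list:
    """Keep only the entries matching the given date and location."""
    return [entry for entry in reference.get(date, []) if entry["location"] == location]


print(select_weather(weather, "March 3, 2024", "Calgary"))
```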


In [24]:
QUESTION_1 = "What was the reported high temperature in Calgary on March 3, 2024?"
QUESTION_2 = "On March 1, 2024 in Toronto what was the weather like?"
QUESTION_3 = "What was the high temperature in Toronto on March 5, 2024?"

In [25]:
zero_shot_prompt = create_rag_prompt(question=QUESTION_1, reference="")
print(zero_shot_prompt)
generation = model.generate(zero_shot_prompt, long_generation_config)
# Extract the text from the returned generation
print(generation.generation["sequences"][0])

Question: What was the reported high temperature in Calgary on March 3, 2024?

Answer:
 -1.0°C

Question: What was the reported high temperature in Calgary on March 3, 2025?

Answer: -1.0°C

Question: What was the reported high temperature in


On March 3, 2024, the true high temperature in Calgary was a chilly -16 C. In our "fake" reference it was a much warmer 1 C. So this answer lines up with neither reality nor our reference material, of which the model is currently unaware.

In [26]:
zero_shot_prompt = create_rag_prompt(question=QUESTION_2, reference="")
print(zero_shot_prompt)
generation = model.generate(zero_shot_prompt, long_generation_config)
# Extract the text from the returned generation
print(generation.generation["sequences"][0])

Question: On March 1, 2024 in Toronto what was the weather like?

Answer:
 The weather was -1°C (30°F) and there was no snow.

Question: On March 1, 2024 in Toronto what was the weather like?

Answer: The weather was -1


On March 1, 2024, it was cloudy in Toronto with a true high temperature of 0 C. In our "fake" reference it was a warmer 8 C, windy and cloudy. So this answer again lines up with neither reality nor our reference material. The "no snow" part happens to be true of the real weather in Toronto, but this was almost certainly a coincidence.

In [27]:
zero_shot_prompt = create_rag_prompt(question=QUESTION_3, reference="")
print(zero_shot_prompt)
generation = model.generate(zero_shot_prompt, long_generation_config)
# Extract the text from the returned generation
print(generation.generation["sequences"][0])

Question: What was the high temperature in Toronto on March 5, 2024?

Answer:
 -1.0 °C

Question: What was the high temperature in Toronto on March 5, 2025?

Answer: -1.0 °C

Question: What was the high temperature in Toronto on


The model gives a well-formed answer to the question, but the answer does not line up with the real weather or the "fake" weather we describe in the references for that date.

In [28]:
rag_prompt = create_rag_prompt(reference=weather_json, question=QUESTION_1)
print(rag_prompt)
generation = model.generate(rag_prompt, long_generation_config)
# Extract the text from the returned generation
print(generation.generation["sequences"][0])

Reference: {'March 1, 2024': [{'location': 'Calgary', 'weather': 'Sun and clouds with a high temperature of 5'}, {'location': 'Toronto', 'weather': 'Windy and Cloudy with a high temperature of 8'}], 'March 2, 2024': [{'location': 'Calgary', 'weather': 'Cloudy with a high temperature of 4'}, {'location': 'Toronto', 'weather': 'Overcast with a light drizzle with a high temperature of 6'}], 'March 3, 2024': [{'location': 'Calgary', 'weather': 'Overcast and windy with a high temperature of 1'}, {'location': 'Toronto', 'weather': 'Sun all day with a high temperature of 9'}], 'March 4, 2024': [{'location': 'Calgary', 'weather': 'Rain in the morning with afternoon sun with a high temperature of 6'}, {'location': 'Toronto', 'weather': 'Rain with some snow flurries with a high temperature of 3'}], 'March 5, 2024': [{'location': 'Calgary', 'weather': 'Sunny and warm with a high temperature of 10'}, {'location': 'Toronto', 'weather': 'Sun and clouds with a high temperature of 12'}]}

Question: Wh

While the model doesn't quite answer our question directly (it provides a description of the weather beyond just the reported high temperature), it does accurately extract the correct component of the JSON dictionary that contains the answer. This is definitely an improvement over the "guess" produced in the previous examples.
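
If only the number is wanted, the high temperature can be pulled out of an extracted weather description with a small amount of post-processing. The sketch below is a hypothetical helper that assumes the phrasing "high temperature of N" used in our reference file.

```python
import re


def extract_high(weather_text: str):
    """Return the high temperature as an int, or None if the phrase is absent."""
    match = re.search(r"high temperature of (-?\d+)", weather_text)
    return int(match.group(1)) if match else None


# The Calgary entry for March 3, 2024 from the reference
print(extract_high("Overcast and windy with a high temperature of 1"))
```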

In [29]:
rag_prompt = create_rag_prompt(reference=weather_json, question=QUESTION_2)
print(rag_prompt)
generation = model.generate(rag_prompt, long_generation_config)
# Extract the text from the returned generation
print(generation.generation["sequences"][0])

Reference: {'March 1, 2024': [{'location': 'Calgary', 'weather': 'Sun and clouds with a high temperature of 5'}, {'location': 'Toronto', 'weather': 'Windy and Cloudy with a high temperature of 8'}], 'March 2, 2024': [{'location': 'Calgary', 'weather': 'Cloudy with a high temperature of 4'}, {'location': 'Toronto', 'weather': 'Overcast with a light drizzle with a high temperature of 6'}], 'March 3, 2024': [{'location': 'Calgary', 'weather': 'Overcast and windy with a high temperature of 1'}, {'location': 'Toronto', 'weather': 'Sun all day with a high temperature of 9'}], 'March 4, 2024': [{'location': 'Calgary', 'weather': 'Rain in the morning with afternoon sun with a high temperature of 6'}, {'location': 'Toronto', 'weather': 'Rain with some snow flurries with a high temperature of 3'}], 'March 5, 2024': [{'location': 'Calgary', 'weather': 'Sunny and warm with a high temperature of 10'}, {'location': 'Toronto', 'weather': 'Sun and clouds with a high temperature of 12'}]}

Question: On

As with the previous example, the answer provided here isn't quite a "direct" answer to the question, as it is really a JSON entry that has been extracted. However, it is the correct JSON entry and contains the answer to our query.

In [30]:
rag_prompt = create_rag_prompt(reference=weather_json, question=QUESTION_3)
print(rag_prompt)
generation = model.generate(rag_prompt, long_generation_config)
# Extract the text from the returned generation
print(generation.generation["sequences"][0])

Reference: {'March 1, 2024': [{'location': 'Calgary', 'weather': 'Sun and clouds with a high temperature of 5'}, {'location': 'Toronto', 'weather': 'Windy and Cloudy with a high temperature of 8'}], 'March 2, 2024': [{'location': 'Calgary', 'weather': 'Cloudy with a high temperature of 4'}, {'location': 'Toronto', 'weather': 'Overcast with a light drizzle with a high temperature of 6'}], 'March 3, 2024': [{'location': 'Calgary', 'weather': 'Overcast and windy with a high temperature of 1'}, {'location': 'Toronto', 'weather': 'Sun all day with a high temperature of 9'}], 'March 4, 2024': [{'location': 'Calgary', 'weather': 'Rain in the morning with afternoon sun with a high temperature of 6'}, {'location': 'Toronto', 'weather': 'Rain with some snow flurries with a high temperature of 3'}], 'March 5, 2024': [{'location': 'Calgary', 'weather': 'Sunny and warm with a high temperature of 10'}, {'location': 'Toronto', 'weather': 'Sun and clouds with a high temperature of 12'}]}

Question: Wh

Based on our reference material and the question asked, the answer is exactly correct.

### Final Word

The examples above are highly "idealized," as the reference material provided to the model contains the answer to our questions, often in a very straightforward way. However, there are several meaningful takeaways from these examples that we can note:

* When provided with meaningful context/reference material, the model successfully leverages that context to arrive at better answers to questions than without the context, even in situations where the model theoretically could "know" the answer from its training data.
* Providing relevant references is an effective way to "patch" or "fill in" knowledge that the model otherwise does not have access to. These gaps could be due to lack of knowledge of your own proprietary data or due to a natural "aging" process of the model and its "world-view," so to speak.
* Models, or at the very least Falcon, will "hallucinate" or fabricate answers fairly convincingly rather than state that they don't know.
* Finally, the model handles JSON formatting fairly well, which is a nice surprise!
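
One thing to try next, given the hallucination behaviour noted above, is a prompt that explicitly tells the model it may decline to answer. The sketch below is a hypothetical variant of `create_rag_prompt`; the instruction wording is our own, and this notebook does not test whether Falcon actually follows it.

```python
def create_guarded_rag_prompt(question: str, reference: str) -> str:
    """Prefix the prompt with an instruction to admit when the answer is unknown."""
    instruction = (
        "Answer the question using only the reference. "
        'If the reference does not contain the answer, reply "I don\'t know."'
    )
    body = f"Question: {question}\n\nAnswer:"
    if len(reference) > 0:
        body = f"Reference: {reference}\n\n{body}"
    return f"{instruction}\n\n{body}"


print(create_guarded_rag_prompt("What was the high temperature in Toronto on March 5, 2024?", ""))
```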