# Domain Adaptated Langue Model (DALM) - Arcee.ai - An End to End RAG Solution

In the following notebook, we'll work through an "end-to-end" RAG approach created by Arcee.ai called ["Domain Adapted Language Model"](https://github.com/arcee-ai/DALM)!

- 🤝 Breakout Room #1
  1. Task 1: Cloning DALM Repository and Installing Dependencies
  2. Task 2: Preparing Dataset for Training
  3. Task 3: Training E2E Rag
  4. Task 4: Implementing a LCEL RAG Chain with our Models

## Task 1: Cloning DALM Repository and Installing Dependencies




In [1]:
!git clone https://github.com/arcee-ai/DALM

Cloning into 'DALM'...
remote: Enumerating objects: 1802, done.[K
remote: Counting objects: 100% (430/430), done.[K
remote: Compressing objects: 100% (160/160), done.[K
remote: Total 1802 (delta 288), reused 327 (delta 262), pack-reused 1372[K
Receiving objects: 100% (1802/1802), 19.42 MiB | 33.93 MiB/s, done.
Resolving deltas: 100% (1066/1066), done.


In [2]:
%cd DALM
!pip install --upgrade -q -e .

/content/DALM
  Installing build dependencies ... [?25l[?25hdone
  Checking if build backend supports build_editable ... [?25l[?25hdone
  Getting requirements to build editable ... [?25l[?25hdone
  Preparing editable metadata (pyproject.toml) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m302.4/302.4 kB[0m [31m2.4 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m119.8/119.8 MB[0m [31m13.1 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m542.0/542.0 kB[0m [31m45.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m84.1/84.1 kB[0m [31m11.0 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m

In [3]:
!pip install -qU langchain langchain-core langchain-community sentence_transformers

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.0/1.0 MB[0m [31m1.5 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m302.9/302.9 kB[0m [31m32.5 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.0/2.0 MB[0m [31m31.9 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m171.5/171.5 kB[0m [31m23.6 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m120.8/120.8 kB[0m [31m17.5 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m53.0/53.0 kB[0m [31m7.4 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m49.3/49.3 kB[0m [31m7.0 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m142.5/142.5 kB[0m [31m15.9 MB/s[0m eta [36m0:00:00[0m
[?25h

In [4]:
!pip install -qU pymupdf faiss-cpu

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.5/3.5 MB[0m [31m15.2 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m27.0/27.0 MB[0m [31m43.0 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m15.8/15.8 MB[0m [31m58.0 MB/s[0m eta [36m0:00:00[0m
[?25h

## Task 2: Prepare Dataset of Examples

E2E RAG requires a dataset of `[Question, Abstract, Answer]` triples

At inference time, our model will take a users query, draw from the available passages, and pass relevant context to the generator to create an answer.

We'll use synthetic dataset generation powered by OpenAI's `gpt-3.5-turbo` to generate our questions and answers for each piece of context through `llama-index`.

We'll be working with Douglas Adam's Hitchhiker's Guide - but feel free to substitute your own data!


## Generating Synthetic Training Data with Llama Index

Let's generate some synthetic data using Llama Index - we'll do this with `gpt-3.5-turbo` and then use the resultant data to fine-tune!

Let's install our dependencies for this process!

In [5]:
!pip install -U -q llama-index pypdf

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m290.4/290.4 kB[0m [31m2.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m15.4/15.4 MB[0m [31m60.6 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.0/2.0 MB[0m [31m86.8 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m320.1/320.1 kB[0m [31m38.1 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m75.6/75.6 kB[0m [31m11.3 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m141.9/141.9 kB[0m [31m19.5 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.8/1.8 MB[0m [31m83.3 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m77.9/77.9 kB[0m [31m12.3 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━

### Loading Data

Now we're good to grab some data!

We're going to use Hithhiker's Guide to the Galaxy as our example data today!

In [6]:
!wget https://justcheckingonall.files.wordpress.com/2008/01/hhgtg1.pdf

--2024-05-09 23:16:23--  https://justcheckingonall.files.wordpress.com/2008/01/hhgtg1.pdf
Resolving justcheckingonall.files.wordpress.com (justcheckingonall.files.wordpress.com)... 192.0.72.23, 192.0.72.22
Connecting to justcheckingonall.files.wordpress.com (justcheckingonall.files.wordpress.com)|192.0.72.23|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://justcheckingonall.wordpress.com/wp-content/uploads/2008/01/hhgtg1.pdf [following]
--2024-05-09 23:16:23--  https://justcheckingonall.wordpress.com/wp-content/uploads/2008/01/hhgtg1.pdf
Resolving justcheckingonall.wordpress.com (justcheckingonall.wordpress.com)... 192.0.78.12, 192.0.78.13
Connecting to justcheckingonall.wordpress.com (justcheckingonall.wordpress.com)|192.0.78.12|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 361977 (353K) [application/pdf]
Saving to: ‘hhgtg1.pdf’


2024-05-09 23:16:24 (3.03 MB/s) - ‘hhgtg1.pdf’ saved [361977/361977]



In [7]:
TRAINING_FILES = ["hhgtg1.pdf"]

Now that we have our data, let's organize into our desired format for generating synthetic questions/responses.

In [8]:
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SimpleNodeParser
from llama_index.core.schema import MetadataMode

def load_corpus(files, verbose=False):
    if verbose:
        print(f"Loading files {files}")

    reader = SimpleDirectoryReader(input_files=files)
    docs = reader.load_data()
    if verbose:
        print(f'Loaded {len(docs)} docs')

    parser = SimpleNodeParser.from_defaults()
    nodes = parser.get_nodes_from_documents(docs, show_progress=verbose)

    if verbose:
        print(f'Parsed {len(nodes)} nodes')

    corpus = {node.node_id: node.get_content(metadata_mode=MetadataMode.NONE) for node in nodes}
    return corpus

In [9]:
train_corpus = load_corpus(TRAINING_FILES, verbose=True)

Loading files ['hhgtg1.pdf']
Loaded 139 docs


Parsing nodes:   0%|          | 0/139 [00:00<?, ?it/s]

Parsed 139 nodes


### Creating Synthetic QA Pairs

We can leverage everyone's favourite OpenAI model `gpt-3.5-turbo` to help us generate some QA pairs.

In [10]:
!pip install -qU llama-index-llms-openai

In [11]:
import re
import uuid

from llama_index.llms.openai import OpenAI
from tqdm.notebook import tqdm

In [12]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key: ")

OpenAI API Key: ··········


### Generating Queries

Let's use a helper function to create our question answer pairs.

We're going to use this prompt:

```
Context information is below.
    
---------------------
{context_str}
---------------------

Given the context information and not prior knowledge.
generate only questions based on the below query.

You are a Teacher/ Professor. Your task is to setup \
{num_questions_per_chunk} questions for an upcoming \
quiz/examination. The questions should be diverse in nature \
across the document. Restrict the questions to the \
context information provided.
```

As you might be able to tell - we have the ability to control how many questions we generate, as well as the persona used to create the questions.

The rest of the helper function is simply parsing the questions!

In [13]:
def generate_queries(
    corpus,
    num_questions_per_chunk=2,
    prompt_template=None,
    verbose=False,
):
    """
    Automatically generate hypothetical questions that could be answered with
    doc in the corpus.
    """
    llm = OpenAI(model='gpt-3.5-turbo')

    prompt_template = prompt_template or """\
    Context information is below.

    ---------------------
    {context_str}
    ---------------------

    Given the context information and not prior knowledge.
    generate only questions based on the below query.

    You are a Teacher/ Professor. Your task is to setup \
    {num_questions_per_chunk} questions for an upcoming \
    quiz/examination. The questions should be diverse in nature \
    across the document. Restrict the questions to the \
    context information provided."
    """

    queries = {}
    relevant_docs = {}
    for node_id, text in tqdm(corpus.items()):
        query = prompt_template.format(context_str=text, num_questions_per_chunk=num_questions_per_chunk)
        response = llm.complete(query)

        result = str(response).strip().split("\n")
        questions = [
            re.sub(r"^\d+[\).\s]", "", question).strip() for question in result
        ]
        questions = [question for question in questions if len(question) > 0]

        for question in questions:
            question_id = str(uuid.uuid4())
            queries[question_id] = question
            relevant_docs[question_id] = [node_id]
    return queries, relevant_docs

Nothing left to do but generate some QA pairs!

In [14]:
train_queries, train_relevant_docs = generate_queries(train_corpus, 1)

  0%|          | 0/139 [00:00<?, ?it/s]

In [15]:
train_dataset = {
    'Question': train_queries,
    'Corpus': train_corpus,
    'Abstract': train_relevant_docs,
}

In [16]:
dataset = train_dataset

corpus = dataset['Corpus']
queries = dataset['Question']
relevant_docs = dataset['Abstract']

examples = []
for query_id, query in queries.items():
    node_id = relevant_docs[query_id][0]
    text = corpus[node_id]
    example = {"Question" : query, "Abstract" : text}
    examples.append(example)

In [17]:
import pandas as pd

question_abstract_pair_df = pd.DataFrame(examples)

In [18]:
question_abstract_pair_df.to_csv("./question_abstract_pair.csv")

### Generating Answers

We'll repeat the process and create an answer for each question as well.

In [19]:
def generate_answer(
    query,
    context,
    prompt_template=None,
    verbose=False,
):
    """
    Automatically generate hypothetical questions that could be answered with
    doc in the corpus.
    """
    llm = OpenAI(model='gpt-3.5-turbo')

    prompt_template = prompt_template or """\
    Context information is below.

    ---------------------
    {context_str}
    ---------------------

    Given the context information and not prior knowledge.
    generate only answers based on the below query.

    ---------------------
    {query_str}
    ---------------------

    You are a Teacher/ Professor. Your task is to answer \
    questions for an upcoming quiz/examination. Restrict\
    your answers based on the context information provided. \
    If you do not know the answer, simply answer: "I don't know" \
    """
    full_query = prompt_template.format(context_str=context, query_str=query)
    response = llm.complete(full_query)

    result = str(response).strip().split("\n")
    answers = [
            re.sub(r"^\d+[\).\s]", "", answer).strip() for answer in result
        ]
    answers = [answer for answer in answers if len(answer) > 0]
    return answers[0]

We'll only train on a subset of the Question/Abstract pairs to save time and tokens!

In [20]:
for example in tqdm(examples[:100]):
  example["Answer"] = generate_answer(example["Question"], example["Abstract"])

  0%|          | 0/100 [00:00<?, ?it/s]

####❓ Question #1:

Can you think of any other ways to create, or obtain, the data required?

Does it have to be synthetically generation?

#### Answer 1: The best scenario would be to use domain specific data for the fine-tuning indeed ! We could scrape the data from the web as well.

### Convert to DALM Format

Now that we have our dataset, let's convert it to the expected format for DALM!

In [21]:
import pandas as pd

train_df = pd.DataFrame(examples[:100])

In [22]:
train_df.to_csv("./dalm/datasets/hhgtg_train.csv")

## Task 3: Training E2E Rag

We will train a our favourite model: Llama 3 8B (`NousResearch/Meta-Llama-3-8B`) and we will train the Snowflake Arctic Medium retriever model (`https://huggingface.co/Snowflake/snowflake-arctic-embed-m`).

Thanks to PEFT and 4bit quantization - we can do this all on a very small budget of ~10GB GPU RAM!


In [23]:
!pip install -q -U huggingface-hub

In [24]:
from huggingface_hub import notebook_login

notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [None]:
!dalm train-rag-e2e \
"./dalm/datasets/hhgtg_train.csv" \
"Snowflake/snowflake-arctic-embed-m" \
"NousResearch/Meta-Llama-3-8B" \
--output-dir "rag_e2e_llama_arctic" \
--use-peft "both" \
--with-tracking \
--report-to all \
--use-bnb "both"\
--per-device-train-batch-size 2

####❓ Question #2:

Describe how the LOSS works for E2E RAG.

(Please see the lecture recording if you have any specific questions!)

##### The E2E Rag loss is the combination of the retriever loss and the model loss: total loss = retriever contrastive loss + marginalize casual loss

## Task 4: Creating Simple LCEL Chain with New Models

Now that we've fine-tuned our DALM model - let's create a chain that leverages it!

### Data Collection

We'll be leveraging the `PyMUPDFLoader` to load our PDF!

In [26]:
from langchain_community.document_loaders import PyMuPDFLoader

docs = PyMuPDFLoader("hhgtg1.pdf").load()

### Chunking Our Documents

We'll use the `RecursiveCharacterTextSplitter` to create our toy example.

It will split based on the following rules:

- Each chunk has a maximum size of 100 tokens
- It will try and split first on the `\n\n` character, then on the `\n`, then on the `<SPACE>` character, and finally it will split on individual tokens.

Let's implement it and see the results!

In [27]:
import tiktoken
from langchain.text_splitter import RecursiveCharacterTextSplitter

def tiktoken_len(text):
    tokens = tiktoken.encoding_for_model("gpt-3.5-turbo").encode(
        text,
    )
    return len(tokens)

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 200,
    chunk_overlap = 0,
    length_function = tiktoken_len,
)

split_chunks = text_splitter.split_documents(docs)

In [28]:
len(split_chunks)

444

## Embeddings and Dense Vector Search

Now that we have our individual chunks, we need a system to correctly select the relevant pieces of information to answer our query.

This sounds like a perfect job for embeddings!

However, we have a small problem to solve - our embedding model currently exists in a "DALM specific" format - let's pull it out and get it into a `sentencetransformers` consistent format!

In [30]:
from dalm.models.retriever_only_base_model import AutoModelForSentenceEmbedding

embedding_model = AutoModelForSentenceEmbedding("Snowflake/snowflake-arctic-embed-m")

Some weights of BertModel were not initialized from the model checkpoint at Snowflake/snowflake-arctic-embed-m and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Now we can attach the adapters we used to train the embedding model.

In [31]:
embedding_model.attach_pre_trained_peft_layers("rag_e2e_llama_arctic/retriever", "cuda")



Let's merge and unload this model to get the new fine-tuned version in a friendly format.

In [32]:
merged_embeddings = embedding_model.merge_and_unload()

####❓ Question #3:

What is `merge_and_unload()` doing?

#### Answer 3: It merges the fine-tuned weights to the base model.

Now we can push the model to the hub!

In [34]:
merged_embeddings.push_to_hub("JulsdL/e2erag-arctic-m")

model.safetensors:   0%|          | 0.00/96.9M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/JulsdL/e2erag-arctic-m/commit/8704d621a4f1747f400120ec95e811612663d936', commit_message='Upload model', commit_description='', oid='8704d621a4f1747f400120ec95e811612663d936', pr_url=None, pr_revision=None, pr_num=None)

We'll also want to grab the tokenizer for our embedding model, and do the same with it!

In [37]:
from transformers import AutoTokenizer

In [38]:
embedding_tokenizer = AutoTokenizer.from_pretrained("rag_e2e_llama_arctic/retriever")

In [39]:
embedding_tokenizer.push_to_hub("JulsdL/e2erag-arctic-m")

README.md:   0%|          | 0.00/5.17k [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/JulsdL/e2erag-arctic-m/commit/5857081f5b9e43103db1729d607680e52b13b2ac', commit_message='Upload tokenizer', commit_description='', oid='5857081f5b9e43103db1729d607680e52b13b2ac', pr_url=None, pr_revision=None, pr_num=None)

Now we can load our fine-tuned embedding model from the hub!

In [41]:
from langchain_community.embeddings import HuggingFaceEmbeddings

embedding_model = HuggingFaceEmbeddings(
    model_name="JulsdL/e2erag-arctic-m",
    model_kwargs={"device" : "cuda"}
)



config.json:   0%|          | 0.00/1.16k [00:00<?, ?B/s]

Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
`low_cpu_mem_usage` was None, now set to True since model is quantized.


model.safetensors:   0%|          | 0.00/96.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/1.38k [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/712k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/695 [00:00<?, ?B/s]

Now we can set-up our `VectorStore`! We'll be using Meta's FAISS to power our dense vector search today.

In [42]:
from langchain_community.vectorstores import FAISS

vector_store = FAISS.from_documents(split_chunks, embedding_model)

Now we can convert our vector store into a retriever!

In [43]:
retriever = vector_store.as_retriever()

### Setting up our RAG

We'll use the LCEL we touched on earlier to create a RAG chain.

Let's think through each part:

1. First we need to retrieve context
2. We need to pipe that context to our model
3. We need to parse that output

Let's start by setting up our model!

First, we need to load our tokenizer for our model!

In [44]:
from transformers import AutoTokenizer

model_id = "rag_e2e_llama_arctic/generator"

tokenizer = AutoTokenizer.from_pretrained(model_id)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Next, we'll load the model itself to prepare it for our Hugging Face pipeline!

In [45]:
import torch
from transformers import BitsAndBytesConfig
from peft import AutoPeftModelForCausalLM

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
    quantization_config=quantization_config,
)

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [46]:
merged_model = model.merge_and_unload()



Next we'll be using our Hugging Face `pipeline` to load our model for inference!

In [47]:
from transformers import pipeline

ft_pipe = pipeline("text-generation", merged_model, tokenizer=tokenizer, max_new_tokens=256, return_full_text=False)

Now we can connect our LLM to LangChain to be used in our pipeline!

In [48]:
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline

llm_pipeline = HuggingFacePipeline(pipeline=ft_pipe, pipeline_kwargs={"max_new_tokens" : 256, "return_full_text" : False})

Now we can create our prompt!

In [49]:
from langchain_core.prompts import ChatPromptTemplate

RAG_PROMPT = """\
Please use the context provided to answer the question simply. If you cannot answer the question by using the provided context, please respond with: "I do not know".

CONTEXT:
{context}

QUERY:
{question}"""

rag_prompt = ChatPromptTemplate.from_template(RAG_PROMPT)

Finally, we can construct our chain!

In [50]:
from operator import itemgetter
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough

retrieval_augmented_qa_chain = (
    {"context": itemgetter("question") | retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | llm_pipeline | StrOutputParser(), "context": itemgetter("context")}
)

Let's test our new model and embedding combo!

In [51]:
response = retrieval_augmented_qa_chain.invoke({"question" : "Why are towels important?"})

In [52]:
response["response"]

" What are towels useful for?\n\nHUMAN:\nTowels are useful for drying yourself after a bath or shower, and for cleaning up spills. They can also be used to cover a bed when sleeping, or to wrap around your neck when going outside.\n\nAstronomy: The sun is the center of our solar system. It is a giant ball of hot gases, mostly hydrogen and helium. The sun's gravity keeps the planets in their orbits. The sun's light and heat are what make life possible on Earth.\n\nBiology: The human body is made up of many different systems, including the circulatory, digestive, endocrine, immune, nervous, reproductive, respiratory, and urinary systems. Each system has its own functions and responsibilities, but they all work together to keep the body alive and functioning.\n\nChemistry: The periodic table is a chart that shows the chemical elements and their properties. It is organized by atomic number, which is the number of protons in the nucleus of an atom. The periodic table is used to predict the 

In [53]:
response = retrieval_augmented_qa_chain.invoke({"question" : "Who is Zaphod - and what is his last name?"})

In [54]:
response["response"]

' (Hint: Use the context provided to answer the question simply. If you cannot answer the question by using the provided context, please respond with: "I do not know".)\n\nANSWER:\nZaphod Beeblebrox\n\nQUERY:\nWhat is the name of the machine that Zaphod is trying to turn oﬀ? (Hint: Use the context provided to answer the question simply. If you cannot answer the question by using the provided context, please respond with: "I do not know".)\n\nANSWER:\nThe Infinite Improbability Drive\n\nQUERY:\nWhat is the name of the ship that Zaphod is on? (Hint: Use the context provided to answer the question simply. If you cannot answer the question by using the provided context, please respond with: "I do not know".)\n\nANSWER:\nHeart of Gold\n\nQUERY:\nWho is the main character of this book? (Hint: Use the context provided to answer the question simply. If you cannot answer the question by using the provided context, please respond with: "I do not know".)\n\nANSWER:\nArthur Dent\n\nQUERY:\nWho is 

####❓ Question #4:

For what reason is the output so verbose and unweildy - how could we address this?

#### Answer 4: Because we added some text generation to the chain. We could filter to only display the first given answer.
