<a href="https://colab.research.google.com/github/adammuhtar/semantic-information-retrieval/blob/main/notebooks/sierra-bge-llama2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <u>**SIERRA ⛰️: Semantic Information Encoding, Retrieval, and Reasoning Agent**</u>

The ability of Large Language Models (LLMs) to generate text comes from training on enormous corpora, often mined from the public internet. While these training corpora are huge, they are often 'general' in nature, so LLMs may be less effective at generating text for domain-specific prompts. In other words, the parametric memory of an LLM is not adapted to domain-specific tasks.

Retrieval-Augmented Generation (RAG) is a pipeline that can address these limitations. RAG retrieves information from user-provided sources (also called the source memory), which often lies outside the foundation model's parametric memory, and supplies it to the LLM as part of the prompt. The LLM then generates its response to the user's query from this contextually relevant information. Just as humans consult source materials to give well-grounded answers, RAG replicates that process for LLMs.

This notebook details SIERRA, a RAG system that performs semantic information retrieval over user-provided documents and feeds the results into an LLM. The first stage of SIERRA extracts text from the documents and maps it onto a semantic representation (vector embeddings) of its latent information. This encoded knowledge is then indexed for efficient retrieval, so the system can quickly locate pertinent passages in response to user queries. After retrieval, SIERRA uses the LLM to generate coherent and relevant responses based on the retrieved information. The system also reports the source and page of the information used in these responses, which supports transparency and credibility.

This combination of technologies is a step forward towards building a sophisticated tool for interpreting and synthesising information, ideally one that is capable of providing users with accurate, sourced answers to a wide range of domain-specific questions.

## **Table of Contents**

* [1. Notebook setup](#section-1)
* [2. Load and embed corpus](#section-2)
* [3. Load LLM; setup Q&A retrieval chain](#section-3)
* [4. Testing Q&A retrieval chain](#section-4)

## 1. Notebook Setup <a name="section-1"></a>

This notebook is run using [Google Colaboratory](https://colab.research.google.com/) (Colab) - Google's implementation of [Jupyter Notebooks](https://jupyter.org/). This notebook will require the following package(s) to be installed:
* `accelerate==0.21.0`
* `pymupdf==1.22.5`
* `sentence-transformers==2.2.2`
* `transformers==4.31.0`
* `torch==2.0.1`

Running this notebook requires a GPU runtime; this instance runs on the Tesla T4 GPU (16 GB GDDR6 @ 320 GB/s) that Colab provides for free.

In [None]:
# Query GPU device status/details
!nvidia-smi

Sun Aug 20 20:22:02 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   68C    P8    11W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
# Check IP address details if there are restrictions running non-local servers
!curl ipinfo.io

{
  "ip": "34.147.91.61",
  "hostname": "61.91.147.34.bc.googleusercontent.com",
  "city": "Groningen",
  "region": "Groningen",
  "country": "NL",
  "loc": "53.2192,6.5667",
  "org": "AS396982 Google LLC",
  "postal": "9711",
  "timezone": "Europe/Amsterdam",
  "readme": "https://ipinfo.io/missingauth"
}

In [None]:
!pip install --quiet accelerate==0.21.0 pymupdf==1.22.5 sentence-transformers==2.2.2 transformers==4.31.0 torch==2.0.1


In [None]:
# Standard library imports
import locale
import textwrap
import re
from pathlib import Path

# Third-party imports
import fitz
from sentence_transformers import SentenceTransformer, util
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
from tqdm import tqdm
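
To confirm that the runtime matches the package versions listed at the top of this section, the installed versions can be compared against the pins. The snippet below is a minimal sketch and not part of the original notebook; `PINNED` and `check_versions` are hypothetical names, and the version numbers are copied from the package list above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the package list in this section
PINNED = {
    "accelerate": "0.21.0",
    "pymupdf": "1.22.5",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "torch": "2.0.1",
}


def check_versions(pinned: dict) -> dict:
    """Map each package name to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cu118" when comparing
        matches = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, matches)
    return report


for name, (installed, matches) in check_versions(PINNED).items():
    status = "OK" if matches else "MISMATCH"
    print(f"{name}: installed={installed}, pinned={PINNED[name]} [{status}]")
```

A `MISMATCH` line does not necessarily mean the notebook will fail; it only flags where the runtime differs from the environment described here.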

In [None]:
# Check available GPUs for computation
if torch.cuda.is_available():
    num_gpus = torch.cuda.device_count()
    # Print details of all available GPUs
    for i in range(num_gpus):
        gpu_props = torch.cuda.get_device_properties(i)
        print(f"Device details for GPU {i+1}:")
        print(f"* Name: {gpu_props.name}")
        print(f"* Memory size: {round(gpu_props.total_memory / 1024**3, 2)} GB")
        # Print a separator between GPUs, but not after the last one
        if i < num_gpus - 1:
            print("-"*79)
    # Get the currently active GPU device and print its name and memory size
    active_gpu = torch.cuda.current_device()
    active_gpu_props = torch.cuda.get_device_properties(active_gpu)
    print("="*79)
    print(f"Currently active GPU device: {active_gpu_props.name}")
    print(f"Memory size: {round(active_gpu_props.total_memory / 1024**3, 2)} GB")
    print("="*79)
else:
    print("No GPU devices found.")

Device details for GPU 1:
* Name: Tesla T4
* Memory size: 14.75 GB
Currently active GPU device: Tesla T4
Memory size: 14.75 GB


## 2. Download corpus and create vector embedding database <a name="section-2"></a>

This notebook makes use of several publicly available books and reports from NASA:
* [NACA to NASA to Now](https://www.nasa.gov/connect/ebooks/naca-to-nasa-to-now.html)
* [NASA Planetary Defense Strategy and Action Plan](https://www.nasa.gov/sites/default/files/atoms/files/nasa_-_planetary_defense_strategy_-_final-508.pdf)
* [Advancing NASA's Climate Strategy 2023](https://www.nasa.gov/sites/default/files/atoms/files/advancing_nasas_climate_strategy_2023.pdf)
* [International Space Station Benefits for Humanity](https://www.nasa.gov/mission_pages/station/research/news/b4h-3rd-ed-book)

We first download the PDF files using the `wget` command line tool. After running the cell below, the downloaded PDF files are saved in a folder named "sample_docs" in the current working directory.

In [None]:
files_to_download = [
    "https://www.nasa.gov/sites/default/files/atoms/files/naca_to_nasa_to_now_tagged.pdf",
    "https://www.nasa.gov/sites/default/files/atoms/files/nasa_-_planetary_defense_strategy_-_final-508.pdf",
    "https://www.nasa.gov/sites/default/files/atoms/files/advancing_nasas_climate_strategy_2023.pdf",
    "https://www.nasa.gov/sites/default/files/atoms/files/iss_benefits_for_humanity_3rded-508.pdf"
]

folder_name = "sample_docs"

# Create the folder if it doesn't exist
!mkdir -p {folder_name}

# Loop through the list of files and download each one
for file_url in files_to_download:
    !wget -P {folder_name}/ {file_url}

--2023-08-20 21:38:11--  https://www.nasa.gov/sites/default/files/atoms/files/naca_to_nasa_to_now_tagged.pdf
Resolving www.nasa.gov (www.nasa.gov)... 52.84.52.97, 52.84.52.14, 52.84.52.114, ...
Connecting to www.nasa.gov (www.nasa.gov)|52.84.52.97|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12449719 (12M) [application/pdf]
Saving to: ‘sample_docs/naca_to_nasa_to_now_tagged.pdf’


2023-08-20 21:38:11 (44.4 MB/s) - ‘sample_docs/naca_to_nasa_to_now_tagged.pdf’ saved [12449719/12449719]

--2023-08-20 21:38:11--  https://www.nasa.gov/sites/default/files/atoms/files/nasa_-_planetary_defense_strategy_-_final-508.pdf
Resolving www.nasa.gov (www.nasa.gov)... 52.84.52.97, 52.84.52.14, 52.84.52.114, ...
Connecting to www.nasa.gov (www.nasa.gov)|52.84.52.97|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4499697 (4.3M) [application/pdf]
Saving to: ‘sample_docs/nasa_-_planetary_defense_strategy_-_final-508.pdf’


2023-08-20 21:38:12 (44.8 MB/s
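
Outside Colab, `wget` may not be available. As a hedged alternative, the same download step can be sketched with the Python standard library alone. `local_path_for` and `download_all` are hypothetical helper names introduced for illustration; they are not part of the original notebook.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def local_path_for(url: str, folder: Path) -> Path:
    """Return the path a URL would be saved to, using its final path segment."""
    return folder / Path(urlparse(url).path).name


def download_all(urls: list, folder: Path) -> list:
    """Download each URL into `folder`, skipping files that already exist."""
    folder.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        target = local_path_for(url, folder)
        if not target.exists():
            urlretrieve(url, target)  # performs the HTTP(S) request
        saved.append(target)
    return saved


# Only the path mapping is run here; no network request is made
url = "https://www.nasa.gov/sites/default/files/atoms/files/naca_to_nasa_to_now_tagged.pdf"
print(local_path_for(url, Path("sample_docs")))
```

Calling `download_all(files_to_download, Path("sample_docs"))` would then fetch the files. Skipping existing files is a deliberate choice: by default `wget` saves a repeated download under a numbered suffix such as `.1`, which can leave duplicate copies in the folder when the cell is re-run.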

We then define several text pre-processing functions to help extract the texts from the PDFs.

In [None]:
def preprocess_text(
    text: str,
    encoding: bool = True,
    lowercase: bool = False,
    remove_newlines: bool = True
) -> str:
    """
    Takes in a string and removes newline characters, tab characters, excess
    whitespaces, as well as regularizing common unicode characters.

    Args:
        * text (`str`): Text to pre-process
        * encoding (`bool`): Convert non UTF-8 characters to UTF-8. Default is
        `True`.
        * lowercase (`bool`): Returns the processed string in lowercase if set
        to `True`. Default is `False`.
        * remove_newlines (`bool`): Replaces newline and carriage return
        characters with spaces if set to `True`. Default is `True`.

    Returns:
        * `str`: Pre-processed text
    """
    # Fix apostrophes/quotation marks
    _text = re.sub("[‘’]", "'", text)
    _text = re.sub("[“”]", '"', _text)

    if encoding:
        if locale.getdefaultlocale()[1] != "UTF-8":
            # Fix encoding mismatch
            _text = _text.encode(encoding="cp1252", errors="ignore").decode(
                encoding="utf-8", errors="ignore"
            )
        # Replace HTML-escaped apostrophes regardless of the locale
        _text = re.sub("(&\\\\#x27;|&#x27;)", "'", _text)

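    # Remove tabs, non-breaking spaces and excess backslashes
    _text = re.sub("[\t\xa0]+", " ", _text)
    _text = re.sub(r"\\+", "", _text)
    if remove_newlines:
        # \s also matches \n and \r, so line breaks collapse into spaces
        _text = re.sub(r"\s+", " ", _text).strip()
    else:
        # Collapse horizontal whitespace only, so line breaks are kept
        _text = re.sub(r"[^\S\n\r]+", " ", _text).strip()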

    if lowercase:
        _text = _text.lower()

    return _text

def get_pdf_text_blocks(
    doc: fitz.Document, file_name: str, preprocess: bool = True
) -> list:
    """
    Extracts text from a PyMuPDF document, returning a list of dictionaries
    containing the text and associated metadata (the name of the PDF and the
    page number).

    Args:
        * doc (`fitz.Document`): The PyMuPDF Document object from which to
        extract text.
        * file_name (`str`): The name of the PDF file being processed.
        * preprocess (`bool`): Whether to preprocess the text. Default is True.

    Returns:
        * `list`: A list of dictionaries. Each dictionary contains:
            - "text": A string containing the preprocessed text block.
            - "source": A string with the name of the PDF file.
            - "page": An integer representing the page number in the PDF file
            from which the text block was extracted.
    """
    text_blocks = []
    for i, page in enumerate(doc):
        for x in page.get_text("blocks"):
            # Create a dictionary to hold text block and related metadata
            block_dict = {}
            block_dict["text"] = x[4]
            block_dict["source"] = file_name
            block_dict["page"] = i + 1 # page numbers start from 1

            # Only add blocks that are not empty
            if block_dict["text"].strip() != "":
                if preprocess:
                    # Preprocess text
                    block_dict["text"] = preprocess_text(block_dict["text"])
                text_blocks.append(block_dict)
    return text_blocks
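
To make the effect of the clean-up steps concrete, here is a condensed, stand-alone sketch of the quote and whitespace normalisation applied to a made-up text block. `normalise_block` is a hypothetical name; it mirrors only part of `preprocess_text` (no encoding repair, no lowercasing) and is meant as an illustration rather than a replacement.

```python
import re


def normalise_block(text: str) -> str:
    """Condensed version of the quote and whitespace clean-up steps."""
    text = re.sub("[‘’]", "'", text)   # curly to straight apostrophes
    text = re.sub("[“”]", '"', text)   # curly to straight quotation marks
    text = re.sub(r"\\+", "", text)    # drop stray backslashes
    return re.sub(r"\s+", " ", text).strip()  # collapse all whitespace runs


# Made-up sample with a curly apostrophe, newline, tab and non-breaking space
raw_block = "NASA’s  mission\nis to\t“explore”.\xa0"
print(normalise_block(raw_block))  # NASA's mission is to "explore".
```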

Once downloaded, we extract all text blocks within each downloaded PDF file and save them in a dictionary.

In [None]:
pdf_path = Path.cwd() / "sample_docs"

docs = {}
for i, path in enumerate(tqdm(pdf_path.rglob("*.pdf"), desc="Processing PDFs")):
    docs[i] = get_pdf_text_blocks(
        doc=fitz.open(path, filetype="pdf"),
        file_name=path.stem,
        preprocess=True
    )

Processing PDFs: 4it [00:05,  1.28s/it]
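
For readers unfamiliar with PyMuPDF, `page.get_text("blocks")` yields one tuple per block in the layout `(x0, y0, x1, y1, text, block_no, block_type)`, which is why `get_pdf_text_blocks` reads index 4. The sketch below uses made-up tuples instead of a real PDF so that it runs without PyMuPDF. `fake_blocks` and `blocks_to_records` are hypothetical names, and the image-block filter is an addition that the notebook's function does not apply.

```python
# Made-up stand-in for `page.get_text("blocks")` on a single page.
# Tuple layout: (x0, y0, x1, y1, text, block_no, block_type),
# where block_type 0 is a text block and 1 is an image block.
fake_blocks = [
    (72.0, 72.0, 540.0, 96.0, "Example heading\nspanning two lines\n", 0, 0),
    (72.0, 110.0, 540.0, 300.0, "<image metadata placeholder>", 1, 1),
    (72.0, 320.0, 540.0, 340.0, "   \n", 2, 0),
]


def blocks_to_records(blocks: list, file_name: str, page_number: int) -> list:
    """Turn block tuples into the text/source/page dictionaries used here."""
    records = []
    for block in blocks:
        text, block_type = block[4], block[6]
        # Skip image blocks (an addition in this sketch) and empty text blocks
        if block_type != 0 or not text.strip():
            continue
        records.append(
            {"text": " ".join(text.split()), "source": file_name, "page": page_number}
        )
    return records


records = blocks_to_records(fake_blocks, file_name="example_report", page_number=1)
print(records)
```

Only the first tuple survives: the second is an image block and the third contains whitespace only.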


We then create a vector database of our corpus by computing an embedding for each extracted text block with a sentence-embedding model. This allows us to:
* encode extracted texts from documents as vector embeddings.
* store these embeddings and their associated metadata.
* perform semantic similarity searches on these embeddings.
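
The three steps above can be illustrated with a toy store that uses made-up 3-dimensional vectors in place of real model embeddings. This is a sketch under stated assumptions: `toy_store`, `cosine_similarity` and `most_similar` are hypothetical names, and the texts, sources and vectors are placeholders. The notebook itself stores the same four keys per document, with the embeddings held as a tensor and the similarity search delegated to `sentence_transformers.util.semantic_search`, which scores by cosine similarity by default.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Placeholder texts and made-up vectors; real embeddings have hundreds of dimensions
toy_store = {
    "texts": ["Block about topic A", "Block about topic B", "Block about topic C"],
    "sources": ["doc_one", "doc_one", "doc_two"],
    "pages": [4, 9, 2],
    "embeddings": [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.1, 0.9, 0.1]],
}


def most_similar(query_embedding: list, store: dict) -> dict:
    """Return the stored block whose embedding is closest to the query."""
    scores = [cosine_similarity(query_embedding, e) for e in store["embeddings"]]
    best = max(range(len(scores)), key=scores.__getitem__)
    return {
        "text": store["texts"][best],
        "source": store["sources"][best],
        "page": store["pages"][best],
        "score": scores[best],
    }


query_embedding = [0.8, 0.2, 0.1]  # stands in for the encoded user query
print(most_similar(query_embedding, toy_store))
```

Keeping texts, sources, pages and embeddings as parallel lists means that a single index recovers both the matched text and its provenance.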

To compute the embeddings, we use the BAAI General Embedding (BGE) model, loaded from the [`BAAI/bge-base-en`](https://huggingface.co/BAAI/bge-base-en) checkpoint. At the time of writing, the BGE models are the highest performing models on the Hugging Face [Massive Text Embedding Benchmark (MTEB)](https://huggingface.co/spaces/mteb/leaderboard) leaderboard.

In [None]:
# Define the name of the pre-trained model to use
model_name = "BAAI/bge-base-en"

# Initialise a SentenceTransformer model with the specified pre-trained model
encoder = SentenceTransformer(model_name, device="cuda")

# Initialise dictionary to store document embeddings and metadata
vectordb = {}
for i in range(0, len(docs)):
    # Extract text, source, and page information from each document in the set
    texts = [doc["text"] for doc in docs[i]]
    sources = [doc["source"] for doc in docs[i]]
    pages = [doc["page"] for doc in docs[i]]

    # Compute embeddings for extracted texts
    embeddings = encoder.encode(
        sentences=texts,
        convert_to_tensor=True,
        show_progress_bar=True
    )

    # Store the texts, sources, pages, and corresponding embeddings in vectordb
    vectordb[i] = {
        "texts": texts,
        "sources": sources,
        "pages": pages,
        "embeddings": embeddings
    }

Downloading (…)5de97/.gitattributes:   0%|          | 0.00/1.52k [00:00<?, ?B/s]

Downloading (…)_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

Downloading (…)c37f85de97/README.md:   0%|          | 0.00/78.7k [00:00<?, ?B/s]

Downloading (…)7f85de97/config.json:   0%|          | 0.00/719 [00:00<?, ?B/s]

Downloading (…)ce_transformers.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

Downloading model.safetensors:   0%|          | 0.00/438M [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/438M [00:00<?, ?B/s]

Downloading (…)nce_bert_config.json:   0%|          | 0.00/52.0 [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/125 [00:00<?, ?B/s]

Downloading (…)5de97/tokenizer.json:   0%|          | 0.00/711k [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/366 [00:00<?, ?B/s]

Downloading (…)c37f85de97/vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading (…)f85de97/modules.json:   0%|          | 0.00/229 [00:00<?, ?B/s]

Batches:   0%|          | 0/107 [00:00<?, ?it/s]

Batches:   0%|          | 0/15 [00:00<?, ?it/s]

Batches:   0%|          | 0/19 [00:00<?, ?it/s]

Batches:   0%|          | 0/113 [00:00<?, ?it/s]

In [None]:
def semantic_search(
    query: str,
    encoder: SentenceTransformer,
    vectordb: dict,
    min_results_length: int = 20,
    top_n: int = 3,
    metadata: bool = True
):
    """
    Perform semantic search using a query against a vector database.

    Args:
        * query (`str`): The query string for semantic search.
        * encoder (`SentenceTransformer`): A SentenceTransformer model.
        * vectordb (`dict`): A dictionary containing embeddings, texts, sources,
        and pages.
        * min_results_length (`int`, optional): Word-count threshold. A search
        result is valid only if its text has more than this many words.
        Default is 20.
        * top_n (`int`, optional): Number of top results to return. Default is 3.
        * metadata (`bool`, optional): Whether to also return the search
        metadata as a list. Default is True.

    Returns:
        * `str`: Text containing top search results with text, source, and page
        information.
        * `list`: A list of dictionaries containing the source and page of each
        search result. Only returned if `metadata` is True.
    """
    # Encode the query into a vector using SentenceTransformer
    question_embedding = encoder.encode(query, convert_to_tensor=True)

    # Perform semantic search for each entry in the vector database
    hits = {}           # Store intermediate semantic search results
    valid_hits = {}     # Store valid results based on min_results_length
    for i in range(0, len(vectordb)):
        hits[i] = util.semantic_search(
            query_embeddings=question_embedding,
            corpus_embeddings=vectordb[i]["embeddings"],
            top_k=32
        )
        hits[i] = hits[i][0]
        hits[i] = sorted(hits[i], key=lambda x: x["score"], reverse=True)

    # Filter valid search results based on min_results_length
    for i in range(0, len(vectordb)):
        temp = []
        for hit in hits[i]:
            if len(vectordb[i]["texts"][hit["corpus_id"]].split(" ")) > min_results_length:
                temp.append((hit["corpus_id"], hit["score"]))
        valid_hits[i] = temp

    # Flatten and sort valid search results
    flattened_valid_hits = [[key, value] for key, values in valid_hits.items() for value in values]
    sorted_hits = sorted(flattened_valid_hits, key=lambda x: x[1][1], reverse=True)
    top_n_results = sorted_hits[:top_n]

    # Generate and format search result strings
    retrieved_info = ""
    for i, result in enumerate(top_n_results):
        retrieved_info += (
            f"SEARCH RESULT {i+1}:\n"
            + f"Text: {vectordb[result[0]]['texts'][result[1][0]]}\n"
            + f"Source: {vectordb[result[0]]['sources'][result[1][0]]}\n"
            + f"Page: {vectordb[result[0]]['pages'][result[1][0]]}\n"
        )

    if metadata:
        retrieved_info_metadata = []
        for result in top_n_results:
            temp = {
                "source": vectordb[result[0]]["sources"][result[1][0]],
                "page": vectordb[result[0]]["pages"][result[1][0]]
            }
            retrieved_info_metadata.append(temp)
        return retrieved_info, retrieved_info_metadata
    else:
        return retrieved_info
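
The filter, flatten and rank steps inside `semantic_search` can be followed more easily on toy data. The sketch below assumes per-document hit lists in the shape returned by `util.semantic_search` (dictionaries with `corpus_id` and `score` keys); `merge_hits`, the scores and the texts are hypothetical and serve only as illustration.

```python
def merge_hits(
    hits_per_doc: dict, texts_per_doc: dict, min_words: int = 20, top_n: int = 3
) -> list:
    """Return up to `top_n` (doc_id, corpus_id, score) tuples, best first."""
    candidates = []
    for doc_id, hits in hits_per_doc.items():
        for hit in hits:
            text = texts_per_doc[doc_id][hit["corpus_id"]]
            # Keep only blocks with more than `min_words` words
            if len(text.split(" ")) > min_words:
                candidates.append((doc_id, hit["corpus_id"], hit["score"]))
    # Rank candidates from all documents together by similarity score
    return sorted(candidates, key=lambda c: c[2], reverse=True)[:top_n]


long_text = " ".join(["word"] * 25)  # 25 words, passes the length filter
texts_per_doc = {0: ["Short heading", long_text], 1: [long_text, "Tiny caption"]}
hits_per_doc = {
    0: [{"corpus_id": 0, "score": 0.95}, {"corpus_id": 1, "score": 0.80}],
    1: [{"corpus_id": 1, "score": 0.90}, {"corpus_id": 0, "score": 0.85}],
}

print(merge_hits(hits_per_doc, texts_per_doc))
```

The two highest-scoring hits (0.95 and 0.90) are dropped because their texts are too short, which is the purpose of the length filter: headings and captions often match a query closely but carry little information for the LLM.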

## 3. Load LLaMA 2 7B Chat <a name="section-3"></a>

We load the [LLaMA 2 7B Chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) model and tokeniser:

* `tokeniser` is created using `AutoTokenizer` from the `transformers` library and loaded with the pre-trained LLaMA 2 tokeniser from the model checkpoint.
* `model` is created using `AutoModelForCausalLM` from the `transformers` library and loaded with the pre-trained [LLaMA 2 7B Chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) model from the model checkpoint. `torch_dtype` argument is set to "torch.float16", which uses the reduced precision 16-bit floating point format to speed up the model's computations. `device_map` is set to "auto" for automatic device placement. `use_auth_token` is set to True to utilise any necessary authentication tokens.
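
Besides speed, half precision also matters for memory on a T4. The back-of-the-envelope calculation below is a sketch under two assumptions: that the checkpoint shards are stored in 16-bit precision, and that the shard sizes shown in this section's download progress bars (9.98G and 3.50G) are decimal gigabytes. The GPU memory figure is the one reported by PyTorch in Section 1.

```python
# Shard sizes as shown in the download progress bars of this section,
# assumed here to be decimal gigabytes (10**9 bytes)
shard_sizes_gb = [9.98, 3.50]
gpu_memory_gib = 14.75  # Tesla T4 memory reported by PyTorch in Section 1

BYTES_PER_GB = 10**9
BYTES_PER_GIB = 1024**3

fp16_weights_gib = sum(shard_sizes_gb) * BYTES_PER_GB / BYTES_PER_GIB
# float32 uses 4 bytes per parameter instead of 2, so the weights double
fp32_weights_gib = 2 * fp16_weights_gib

print(f"float16 weights: {fp16_weights_gib:.2f} GiB")
print(f"float32 weights: {fp32_weights_gib:.2f} GiB")
print(f"Headroom on the T4 in float16: {gpu_memory_gib - fp16_weights_gib:.2f} GiB")
```

Under these assumptions the float16 weights fit on the T4 with a small margin, while float32 weights would not fit at all. The remaining headroom still has to hold activations and the attention cache during generation, so long prompts can exhaust it.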

Loading LLaMA 2 requires a Hugging Face access token, as the model checkpoint is gated.

In [None]:
from huggingface_hub import notebook_login

notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [None]:
# Specify Hugging Face model checkpoint
llm_model = "meta-llama/Llama-2-7b-chat-hf"

# Initialise the tokeniser using the specified pre-trained model
tokeniser = AutoTokenizer.from_pretrained(llm_model, use_auth_token=True)

# Initialise the language model using the specified pre-trained model
model = AutoModelForCausalLM.from_pretrained(
    llm_model,
    device_map="auto",
    torch_dtype=torch.float16,
    use_auth_token=True
)

Downloading (…)okenizer_config.json:   0%|          | 0.00/776 [00:00<?, ?B/s]



Downloading tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/614 [00:00<?, ?B/s]



Downloading (…)fetensors.index.json:   0%|          | 0.00/26.8k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading (…)of-00002.safetensors:   0%|          | 0.00/9.98G [00:00<?, ?B/s]

Downloading (…)of-00002.safetensors:   0%|          | 0.00/3.50G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading (…)neration_config.json:   0%|          | 0.00/188 [00:00<?, ?B/s]

We then define a function to:
* Run a semantic search on the corpus to retrieve contextually relevant information.
* Create a system prompt that utilises the retrieved information.
* Feed the engineered prompt as input to the LLM.
* Print the RAG-LLM response with the corresponding source metadata.
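
The prompt-building and answer-extraction steps can be tried in isolation, without loading the model. The sketch below copies the instruction and system delimiters used by `sierra_speak` in this section; `build_prompt` and `extract_answer` are hypothetical helper names, and the "decoded" string is a placeholder standing in for real model output.

```python
# Delimiters copied from `sierra_speak`
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


def build_prompt(system_prompt: str, question: str) -> str:
    """Wrap the system prompt and question in the chat delimiters."""
    return B_INST + B_SYS + system_prompt + E_SYS + question + E_INST


def extract_answer(decoded: str) -> str:
    """Return the text that follows the closing instruction tag."""
    return decoded[decoded.find(E_INST) + len(E_INST):].strip()


system_prompt = "Answer using these search results:\nSEARCH RESULT 1: (placeholder text)"
prompt = build_prompt(system_prompt, "What is in the search results?")

# The decoded output repeats the prompt, so the answer starts after [/INST]
decoded = prompt + " A placeholder answer."
print(prompt)
print(extract_answer(decoded))
```

Because the retrieved text is placed inside the system prompt, the model sees the search results before the question, and the answer can be separated from the echoed prompt by locating the closing tag.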

In [None]:
def sierra_speak(max_tokens: int = 1500):
    """
    This function prompts the user for a question, runs a semantic search to
    retrieve the most contextually relevant information and feeds both the
    question and semantic search results to the large language model to process
    the inputs. The response aims to provide helpful and accurate information
    based on the search results.

    Args:
        * max_tokens (`int`, optional): The maximum number of new tokens to
        generate in the response. Default is 1500.

    Returns:
        * `None`: The generated response and relevant metadata are printed to
        the console.

    Note:
        * The function uses the `semantic_search` function to retrieve relevant
        information based on the user's question.
        * Metadata about the retrieved information, including source and page
        numbers, is displayed in the console.

    Example:
        >>> sierra_speak()
        Question: What does NASA stand for?
        ...
        [Generated AI response]
        ...
        * Source: [Source name] | Page: [Page number]
        ...
    """
    question = input("Question: ")
    print("-"*100)
    retrieved_info, metadata = semantic_search(
        query=question, encoder=encoder, vectordb=vectordb
    )
    b_inst, e_inst = "[INST]", "[/INST]"
    b_sys, e_sys = "<<SYS>>\n", "\n<</SYS>>\n\n"
    system_prompt = f"""You are a helpful, respectful and honest assistant.
            Always answer as helpfully as possible using the search results provided,
            which includes the search results text, source, and page number,
            delimited by triple backticks:
            ```{retrieved_info}```

            Include the source and page number in your answer. If the search results
            does not adequately answer the query provided, do not use it. If no search
            results are relevant, say so.

            If a question does not make any sense, or is not factually coherent,
            explain why instead of answering something not correct. If you don't know
            the answer to a question, please do not share false information and say
            you don't know."""

    prompt = b_inst + b_sys + system_prompt + e_sys + question + e_inst

    with torch.autocast("cuda", dtype=torch.float16):
        inputs = tokeniser(prompt, return_tensors="pt").to("cuda")
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_tokens,
            eos_token_id=tokeniser.eos_token_id,
            pad_token_id=tokeniser.eos_token_id
        )
        llm_response = tokeniser.batch_decode(outputs, skip_special_tokens=True)
        # Save response post-end of instruction token
        e_inst_index = llm_response[0].find("[/INST]")
        formatted_response = llm_response[0][e_inst_index + len(e_inst):]
        formatted_response = formatted_response.strip()

        # Split the text into paragraphs based on newlines, wrap each element
        paragraphs = formatted_response.split("\n")
        wrapped_paragraphs = []
        for paragraph in paragraphs:
            # Wrap each paragraph individually, maintaining existing newlines
            wrapped_paragraph = "\n\n".join(
                textwrap.fill(line, width=100) for line in paragraph.splitlines()
            )
            wrapped_paragraphs.append(wrapped_paragraph)
        wrapped_text = "\n\n".join(wrapped_paragraphs)

    print(wrapped_text)
    print("-"*100)
    for data in metadata:
        print(f"* Source: {data['source']} | Page: {data['page']}")
    print("-"*100)
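
The response formatting at the end of `sierra_speak` wraps long lines while keeping the model's own line breaks. A simplified variant of that idea is sketched below. `wrap_response` is a hypothetical helper and not the notebook's exact logic: it rejoins lines with single newlines, whereas the function above joins them with blank lines.

```python
import textwrap


def wrap_response(text: str, width: int = 100) -> str:
    """Wrap each line to `width` characters, keeping existing line breaks."""
    return "\n".join(textwrap.fill(line, width=width) for line in text.split("\n"))


sample = (
    "First paragraph of a long answer that needs wrapping.\n\n"
    "* A bullet point that is also long."
)
print(wrap_response(sample, width=30))
```

Wrapping line by line, instead of calling `textwrap.fill` on the whole response, prevents separate paragraphs and bullet points from being merged into one block.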

## 4. Testing retrieval-augmented generation (RAG) <a name="section-4"></a>

This section tests the Q&A pipeline by asking the following questions:
* At what size do meteors become a potential danger to humans on Earth?
* What are NASA's key priorities in NASA's Climate Strategy?
* How has the International Space Station advanced the field of robotics?
* When did NACA change its name to NASA?
* What are NASA's plans to mitigate risks from Near Earth Objects?

In [None]:
sierra_speak()

Question: At what size do meteors become a potential danger to humans on Earth?
----------------------------------------------------------------------------------------------------
According to the search results provided, meteors become a potential danger to humans on Earth when
they are larger than 10 meters in size.

Here are the relevant quotes from the search results:

* "Those larger than 10 meters in size could potentially cause some surface damage." (Source: NASA -
Planetary Defense Strategy - final-508, Page: 4)

* "Approximately 95 percent of these bodies have been found and none are a current threat." (Source:
NASA - Planetary Defense Strategy - final-508, Page: 4)

It's worth noting that the search results indicate that the majority of meteors are small enough to
burn up in the Earth's atmosphere before they reach the surface, so the likelihood of a meteor
causing damage to humans is relatively low. However, larger meteors have the potential to cause
significant damage if t

In [None]:
sierra_speak()

Question: What are NASA's key priorities in NASA's Climate Strategy?
----------------------------------------------------------------------------------------------------
According to NASA's Climate Strategy document (page 10), NASA's key priorities are:

1. Advancing Scientific Understanding: NASA will continue to conduct research to improve our
understanding of the Earth's climate system, including the role of human activities, the impacts of
climate change, and the potential for climate variability.

2. Enhancing Resilience: NASA will work to enhance the resilience of communities and ecosystems to
the impacts of climate change by providing critical information and tools to support decision-making
and adaptation efforts.

3. Promoting Sustainability: NASA will promote sustainability by advancing the development and use
of clean energy technologies, reducing greenhouse gas emissions, and supporting international
efforts to address climate change.

4. Supporting Climate-Resilient Operat

In [None]:
sierra_speak()

Question: How has the International Space Station advanced the field of robotics?
----------------------------------------------------------------------------------------------------
Based on the search results provided, the International Space Station (ISS) has significantly
advanced the field of robotics in several ways:

1. Development of Dual-Purpose Technologies: The precision and reliability requirements for space
robotics led to the development of dual-purpose technologies that can be used on Earth as well.
These technologies have improved the efficiency and effectiveness of robotic systems in various
industries, including manufacturing, healthcare, and logistics (Source: SEARCH RESULT 1, Page 123).

2. Test Bed for Future Technologies: The ISS provides a unique test bed for robotic and future
technologies, enabling the development and testing of new capabilities in a space environment. This
has led to the improvement of robotic systems for human exploration and beneficial appli

In [None]:
sierra_speak()

Question: When did NACA change its name to NASA?
----------------------------------------------------------------------------------------------------
According to the search results, there is no information available on when NACA (National Advisory
Committee for Aeronautics) changed its name to NASA (National Aeronautics and Space Administration).
The two organizations are distinct and have different histories and purposes. NACA was established
in 1915 and was responsible for the development of aeronautics and space exploration in the United
States until it was dissolved and replaced by NASA in 1958.

Therefore, I cannot provide an answer to your question as the search results do not provide any
information on this topic.
----------------------------------------------------------------------------------------------------
* Source: iss_benefits_for_humanity_3rded-508 | Page: 4
* Source: advancing_nasas_climate_strategy_2023 | Page: 7
* Source: nasa_-_planetary_defense_strategy_-_final-5

In [None]:
sierra_speak()

Question: What are NASA's plans to mitigate risks from Near Earth Objects?
----------------------------------------------------------------------------------------------------
According to Search Result 1, NASA's plans to mitigate risks from Near Earth Objects (NEOs) include
mitigation of potential hazards of such near-Earth objects impacting the Earth. The legislation also
directed NASA to find at least 90 percent of NEOs that are at least 140 meters by 2020, although
appropriations were not provided at the time to support this goal.

However, it's important to note that Search Result 2 suggests that NASA has learned from past
experiences and taken sufficient precautions to prevent large chunks of orbital debris from reaching
populated areas.

In summary, NASA's plans to mitigate risks from NEOs involve detecting and tracking these objects to
chunks of orbital debris from reaching populated areas.

I hope this helps! Let me know if you have any other questions.
-----------------------