# Chatting with your Data
### From RAG(s) to Riches

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/deptofdefense/LLMs-at-DoD/blob/main/tutorials/Chatting%20with%20your%20Docs.ipynb)

**By: Glenn Parham, [Defense Digital Service](https://dds.mil)**

[Retrieval Augmented Generation (R.A.G.)](https://gpt-index.readthedocs.io/en/latest/getting_started/concepts.html) has proven to be a valuable paradigm for using Large Language Models with your own (unstructured) data: the passages most relevant to a question are retrieved from your documents and passed to the model as context for its answer.

In this notebook, we will explore using open-source Large Language Models via RAG over unclassified [DoD Policy documents](https://www.esd.whs.mil/DD/DoD-Issuances/).
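To make the retrieve-then-generate idea concrete before any libraries are installed, here is a minimal sketch in plain Python. It is not how Llama-Index works internally: word counts stand in for a real embedding model, the two chunks are invented placeholder sentences (not quotes from the policies), and the function names `bag_of_words`, `cosine`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, top_k=1):
    """Rank chunks by similarity to the question and keep the best top_k."""
    query_vector = bag_of_words(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine(query_vector, bag_of_words(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, context_chunks):
    """Place the retrieved text in front of the question for the LLM."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Invented placeholder chunks, for illustration only.
chunks = [
    "Laser incidents must be reported to the laser safety officer.",
    "Litigation matters are coordinated with the Department of Justice.",
]
question = "Who handles laser incidents?"
top_chunks = retrieve(question, chunks)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

The rest of the notebook follows the same two steps, with a neural embedding model doing the ranking and Mistral-7B answering the assembled prompt.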

This notebook leverages the following open-source resources:
- Llama-Index
- Mistral-7B

**Note:** If you're running this in Google Colab, please make sure you're only handling unclassified documents.

## Installing Dependencies

In [1]:
## Installing General Dependencies
!pip install huggingface-hub -q
!pip install llama-index -q
!pip install transformers -q

## Installing Dependencies for parsing PDFs
!pip install pypdf -q
!pip install "unstructured[all-docs]" -q
!pip install llama-hub -q
!sudo apt install tesseract-ocr -q
!pip install pytesseract -q
!apt-get install poppler-utils -q

## Installing llama-cpp-python
# GPU llama-cpp-python; Starting from version llama-cpp-python==0.1.79, it supports GGUF
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir


Reading package lists...
Building dependency tree...
Reading state information...
tesseract-ocr is already the newest version (4.1.1-2.1build1).
0 upgraded, 0 newly installed, 0 to remove and 15 not upgraded.
Reading package lists...
Building dependency tree...
Reading state information...
poppler-utils is already the newest version (22.02.0-2ubuntu0.3).
0 upgraded, 0 newly installed, 0 to remove and 15 not upgraded.
Collecting llama-cpp-python
  Downloading llama_cpp_python-0.2.20.tar.gz (8.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.7/8.7 MB 25.1 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting typing-extensions>=4.5.0 (from llama-cpp-python)
  Downloading typing_extensions-4.8.0-py3-none-any.whl (31 kB)
Collecting numpy>=1.20.

## Formatting Colab Display

In [2]:
from IPython.display import HTML, display

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)

## Setting up Llama Index

In [3]:
from llama_index import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    ServiceContext,
)
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import (
    messages_to_prompt,
    completion_to_prompt,
)

## Pulling Model Weights

In [4]:
# model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_0.gguf"
model_url = "https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GGUF/resolve/main/mistral-7b-openorca.Q5_K_M.gguf"


In [5]:
llm = LlamaCPP(
    model_url=model_url,
    # optionally, you can set the path to a pre-downloaded model instead of model_url
    model_path=None,
    temperature=0.1,
    max_new_tokens=256,
    # this value comes from Llama 2's 4096-token context window, minus some wiggle room;
    # it is kept here as a conservative setting for the Mistral model
    context_window=3900,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # kwargs to pass to __init__()
    # set to at least 1 to use GPU
    model_kwargs={"n_gpu_layers": 30},
    # transform inputs
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
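The `messages_to_prompt` and `completion_to_prompt` helpers imported from `llama_utils` build Llama 2 style prompts (`[INST] ... [/INST]`), which may explain the stray `[/INST]` at the end of the answers later in this notebook. The Mistral-7B-OpenOrca model card describes the ChatML format instead. The following is a minimal sketch of ChatML formatters, assuming that template; the function names and the default system prompt are hypothetical, and the `role` and `content` attributes mirror what a chat message object is expected to carry.

```python
from types import SimpleNamespace


def chatml_completion_to_prompt(completion, system_prompt="You are a helpful assistant."):
    """Wrap a single completion request in ChatML turns."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{completion}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )


def chatml_messages_to_prompt(messages):
    """Render a list of chat messages (objects with .role and .content) as ChatML."""
    parts = []
    for message in messages:
        # Roles may be plain strings or enum members with a .value attribute.
        role = getattr(message.role, "value", message.role)
        parts.append(f"<|im_start|>{role}\n{message.content}<|im_end|>\n")
    # Leave the assistant turn open so the model writes the reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Stand-in message objects so the sketch runs without any model loaded.
demo_messages = [
    SimpleNamespace(role="system", content="Answer briefly."),
    SimpleNamespace(role="user", content="What is the DoD?"),
]
demo_prompt = chatml_messages_to_prompt(demo_messages)
print(demo_prompt)
```

If this template suits your model, these two functions could be passed as the `messages_to_prompt` and `completion_to_prompt` arguments of `LlamaCPP` in place of the Llama 2 helpers.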


In [6]:
# Non-streaming LLMs
response = llm.complete("Hello! Can you tell me a little about the US Department of Defense?")
print(response)



The U.S. Department of Defense (DoD) is the federal executive department responsible for coordinating and supervising all agencies and functions concerned with national security and the armed forces of the United States. Established in 1947, the DoD oversees the country's military forces, including the Army, Navy, Air Force, Marine Corps, and Coast Guard.

The Department of Defense is headed by the Secretary of Defense, who is a member of the President's Cabinet. The organization is divided into three major components: the Office of the Secretary of Defense (OSD), the Joint Staff, and the Combat Support Agencies.

The OSD is responsible for developing and implementing defense policies and strategies, managing the budget, and overseeing the acquisition and development of weapons systems. The Joint Staff provides integrated military advice to the President, the Secretary of Defense, and other senior officials on matters related to national security. The Combat Support Agencies are resp

In [7]:
## Streaming LLMs
response_iter = llm.stream_complete("Can you write a short poem about the US Department of Defense?")
for response in response_iter:
    print(response.delta, end="", flush=True)

Llama.generate: prefix-match hit




The US Department of Defense,
Protects our land with strength and grace;
Guardians of freedom's cause,
They stand firm in every place.

With courage and commitment,
They serve our nation with pride;
Their dedication is unwavering,
In times of war or peace, they abide.

Throughout the years, they've fought for us,
And kept us safe from harm;
Their sacrifices are immense,
Our gratitude is their reward.

As we stand together,
We honor those who serve;
May their strength and valor never cease,
Long may they endure.

## Configuring Embedding Model

In [8]:
# Use Huggingface embeddings
from llama_index.embeddings import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

In [9]:
# BUG: You might need to restart runtime at this point via Menu > Runtime > Restart Runtime.
# Otherwise, you'll get an error with the numpy library.
# Looking into this...

In [10]:
# create a service context
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
)

[nltk_data] Downloading package punkt to /tmp/llama_index...
[nltk_data]   Unzipping tokenizers/punkt.zip.


## Fetching DoD Policy Documents

For this example, we'll use the following documents:
- [DOD INSTRUCTION 5030.07 COORDINATION OF SIGNIFICANT LITIGATION AND OTHER MATTERS INVOLVING THE DEPARTMENT OF JUSTICE](https://www.esd.whs.mil/Portals/54/Documents/DD/issuances/dodi/503007p.pdf?ver=FdbnkRjs8wfSzwTV7XNPGw%3d%3d), October 12, 2023
- [DOD INSTRUCTION 6055.15 DOD LASER PROTECTION PROGRAM FOR MILITARY LASERS](https://www.esd.whs.mil/Portals/54/Documents/DD/issuances/dodi/605515p.pdf?ver=NL-WXDYnI9H5TOwUUi82lw%3d%3d), August 25, 2023

In [11]:
# create "sample_documents" directory (-p avoids an error if it already exists)
!mkdir -p sample_documents

In [12]:
import requests

def download_pdf(url, destination_filename):
    """
    Download a PDF from a URL and save it to a specified location in Google Colab.

    Parameters:
    url (str): The URL of the PDF to download.
    destination_filename (str): The filename to save the downloaded PDF as.

    Returns:
    None
    """
    # Send an HTTP request to the URL of the PDF (the timeout keeps the cell from hanging)
    try:
        response = requests.get(url, timeout=60)
        response.raise_for_status()  # Raise an exception for HTTP errors
    except requests.RequestException as e:
        print(f"An HTTP error occurred: {e}")
    else:
        # If the request was successful, write the content to a local file
        with open(destination_filename, 'wb') as pdf_file:
            pdf_file.write(response.content)
        print(f"PDF successfully downloaded and saved as {destination_filename}")


download_pdf("https://www.esd.whs.mil/Portals/54/Documents/DD/issuances/dodi/503007p.pdf?ver=FdbnkRjs8wfSzwTV7XNPGw%3d%3d", "sample_documents/dod_doj_policy.pdf")
download_pdf("https://www.esd.whs.mil/Portals/54/Documents/DD/issuances/dodi/605515p.pdf?ver=NL-WXDYnI9H5TOwUUi82lw%3d%3d", "sample_documents/dod_lasers_policy.pdf")



PDF successfully downloaded and saved as sample_documents/dod_doj_policy.pdf
PDF successfully downloaded and saved as sample_documents/dod_lasers_policy.pdf
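`download_pdf` writes whatever bytes the server returns, so an HTML error page served with a 200 status would still be saved under a `.pdf` name. As an optional sanity check, the sketch below tests whether each file starts with the PDF signature `%PDF-`. The helper name `looks_like_pdf` is hypothetical and is not part of the original notebook.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if the file exists and starts with the PDF signature '%PDF-'."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as handle:
        return handle.read(5) == b"%PDF-"


# Check everything downloaded so far; prints nothing if the folder is empty or missing.
for pdf_path in sorted(Path("sample_documents").glob("*.pdf")):
    status = "ok" if looks_like_pdf(pdf_path) else "NOT a PDF"
    print(f"{pdf_path}: {status}")
```

Running this before building the index makes it easier to tell a failed download apart from a parsing problem later on.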


## Loading Documents into (Llama)Index

In [13]:
from llama_index import SimpleDirectoryReader, download_loader

UnstructuredReader = download_loader('UnstructuredReader')

dir_reader = SimpleDirectoryReader('/content/sample_documents', file_extractor={
  ".pdf": UnstructuredReader(),
})

documents = dir_reader.load_data()

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


In [14]:
# create vector store index
index = VectorStoreIndex.from_documents(
    documents, service_context=service_context
)

In [15]:
# set up query engine
query_engine = index.as_query_engine()

In [16]:
# Sample queries:
# - What happens when DoD senior officials are involved with DOJ litigation? Answer in haiku form.
# - What should I do in the event of some laser incident?

response = query_engine.query("What should I do in the event of some laser incident?")
print(response)

Llama.generate: prefix-match hit


 In the event of a laser incident, you should follow these steps: 1) Assess the situation to determine if there is an immediate danger or harm to any individuals or property. 2) If there is an immediate danger, take appropriate action to mitigate the risk and ensure safety. This may include evacuating the area, alerting emergency responders, or implementing control measures as recommended by the LHA. 3) Document the incident, including details such as the type of laser, location, time, and any injuries or damages sustained. 4) Report the incident to the appropriate authorities, such as the LSRC or LSRA, depending on the severity of the incident and your jurisdiction. 5) Follow up with the responsible party to ensure corrective actions are taken to prevent future incidents. [/INST]


In [17]:
# inspect response
response

Response(response=' In the event of a laser incident, you should follow these steps: 1) Assess the situation to determine if there is an immediate danger or harm to any individuals or property. 2) If there is an immediate danger, take appropriate action to mitigate the risk and ensure safety. This may include evacuating the area, alerting emergency responders, or implementing control measures as recommended by the LHA. 3) Document the incident, including details such as the type of laser, location, time, and any injuries or damages sustained. 4) Report the incident to the appropriate authorities, such as the LSRC or LSRA, depending on the severity of the incident and your jurisdiction. 5) Follow up with the responsible party to ensure corrective actions are taken to prevent future incidents. [/INST]', source_nodes=[NodeWithScore(node=TextNode(id_='ab019b7e-dbfc-4233-a0e0-bdec10368924', embedding=None, metadata={'file_path': '/content/sample_documents/dod_lasers_policy.pdf', 'file_name'
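The raw `Response` repr is hard to read. Since each entry in `source_nodes` wraps a text node whose metadata includes `file_name` (visible in the output above), a small helper can list which document each retrieved chunk came from. This is a sketch with a hypothetical helper name, `summarize_sources`; it assumes each entry exposes `.score` and `.node.text`, which may differ across library versions. Stand-in objects with placeholder text are used so it runs without a model.

```python
from types import SimpleNamespace


def summarize_sources(response, max_chars=80):
    """Return (file_name, score, text preview) for each retrieved source node."""
    rows = []
    for item in response.source_nodes:
        file_name = item.node.metadata.get("file_name", "<unknown>")
        # Collapse whitespace so the preview fits on one line.
        preview = " ".join(item.node.text.split())[:max_chars]
        rows.append((file_name, item.score, preview))
    return rows


# Stand-in objects mimicking only the attributes used above; the text is a
# placeholder, not a real excerpt. In the notebook, pass the real `response`.
fake_response = SimpleNamespace(
    source_nodes=[
        SimpleNamespace(
            score=0.82,
            node=SimpleNamespace(
                metadata={"file_name": "dod_lasers_policy.pdf"},
                text="Placeholder   chunk text,\nnot a real excerpt.",
            ),
        )
    ]
)
rows = summarize_sources(fake_response)
for file_name, score, preview in rows:
    print(f"{file_name}  score={score}  {preview}")
```

Checking the sources this way is a quick test of whether an answer is grounded in the right policy document.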

In [18]:
def query_docs(question):
    """Query the index and return the answer as plain text."""
    print(question)
    response = query_engine.query(question)
    print(response)
    # str(response) yields the answer text
    return str(response)

In [19]:
# Save Index to local storage
index.storage_context.persist("test_index")

In [20]:
# View index in notebook
index.storage_context.vector_store.to_dict()

{'embedding_dict': {'5696e8b7-30e9-4dbb-bf7a-b2e67e6e080f': [-0.054576270282268524,
   -0.03527482971549034,
   -0.013195806182920933,
   -0.038566287606954575,
   0.06982273608446121,
   -0.027088694274425507,
   -0.02037728950381279,
   0.013043100945651531,
   -0.021424023434519768,
   -0.016438685357570648,
   0.05720081552863121,
   0.06085643917322159,
   -0.005900850053876638,
   -4.737916970043443e-05,
   -0.05250061675906181,
   0.03685837239027023,
   0.017539730295538902,
   0.05051422864198685,
   0.06141273304820061,
   0.06942179799079895,
   0.017453324049711227,
   0.03022758662700653,
   0.007991461083292961,
   0.03964424878358841,
   -0.01854620687663555,
   0.04961195960640907,
   -0.04927770048379898,
   -0.0629214346408844,
   -0.031387872993946075,
   -0.1750672310590744,
   0.03773382678627968,
   0.00970580242574215,
   -0.03970843926072121,
   -0.026385482400655746,
   0.017772817984223366,
   0.022741520777344704,
   -0.020993825048208237,
   0.03132896497845
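Printing every vector is rarely useful. A short summary of the same dictionary (how many chunks were embedded and how long each vector is) is usually enough. The sketch below assumes only the `embedding_dict` key seen in the output above; the helper name `describe_embeddings` is hypothetical, and a toy dictionary stands in for the real store.

```python
def describe_embeddings(store_dict):
    """Return (number of embedded chunks, sorted list of distinct vector lengths)."""
    embeddings = store_dict.get("embedding_dict", {})
    dimensions = sorted({len(vector) for vector in embeddings.values()})
    return len(embeddings), dimensions


# Toy stand-in for index.storage_context.vector_store.to_dict()
toy_store = {
    "embedding_dict": {
        "node-a": [0.1, 0.2, 0.3],
        "node-b": [0.0, 0.5, 0.9],
    }
}
count, dimensions = describe_embeddings(toy_store)
print(f"{count} vectors, dimensions: {dimensions}")
```

On the real index, a single dimension value should come back, matching the output size of the embedding model (384 for `BAAI/bge-small-en-v1.5`).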

## Gradio

For a better user interface, we can use Gradio to interact with our LLM!

**Note:** In this demo, we host our Gradio app publicly (`share=True`), since all of this information is unclassified. If you run this with anything above unclassified, make sure `share` is set to `False`.
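One way to reduce the risk of sharing by accident is to make public hosting opt-in rather than hard-coded. The sketch below is an illustration only: the environment variable name `GRADIO_ALLOW_SHARE` and the helper `resolve_share_flag` are hypothetical and are not part of Gradio.

```python
import os


def resolve_share_flag(env_var="GRADIO_ALLOW_SHARE"):
    """Return True only if the environment variable explicitly opts in to sharing."""
    value = os.environ.get(env_var, "")
    return value.strip().lower() in {"1", "true", "yes"}


share = resolve_share_flag()
print(f"share={share}")
```

The result can then be passed as `launch(share=share)`, so the app stays private unless someone deliberately sets the variable.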

In [30]:
!pip install -q gradio

In [31]:
!pip install pydantic
import gradio

# IF RUNNING THIS WITH INFO ABOVE UNCLASSIFIED, MAKE SURE share=False
gradio.Interface(fn=query_docs, inputs="text", outputs="text").launch(share=True, debug=True)



ImportError: ignored