# Chatting with your Data
### From RAG(s) to Riches

[Retrieval-Augmented Generation (RAG)](https://gpt-index.readthedocs.io/en/latest/getting_started/concepts.html) has proven to be a valuable paradigm for using Large Language Models with your own (unstructured) data.

In this notebook, we will explore using open-source Large Language Models via RAG over unclassified [DoD Policy documents](https://www.esd.whs.mil/DD/DoD-Issuances/).
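
Before wiring up the real libraries, it may help to see the retrieve-then-generate idea in miniature. The following is a minimal sketch in plain Python, not how LlamaIndex implements it: relevance is approximated by word overlap rather than embeddings, and the passages, the question, and the helper names (`tokenize`, `overlap_score`, `retrieve`, `build_prompt`) are all hypothetical and invented for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(question, passage):
    """Toy relevance score: number of question tokens that also occur in the passage."""
    q = Counter(tokenize(question))
    p = Counter(tokenize(passage))
    return sum((q & p).values())


def retrieve(question, passages, top_k=1):
    """Return the top_k passages ranked by the toy overlap score."""
    ranked = sorted(passages, key=lambda p: overlap_score(question, p), reverse=True)
    return ranked[:top_k]


def build_prompt(question, context_passages):
    """Assemble the augmented prompt that would be sent to the LLM."""
    context = "\n".join(f"- {p}" for p in context_passages)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Hypothetical passages, invented for illustration only
passages = [
    "Travel vouchers must be filed within five days of returning.",
    "Cyber incident response support may be provided remotely or on location.",
    "Records are retained according to the published schedule.",
]
question = "How is cyber incident response support provided?"

top = retrieve(question, passages, top_k=1)
prompt = build_prompt(question, top)
print(prompt)
```

The rest of the notebook follows the same shape, with an embedding model doing the retrieval and a local LLM completing the prompt.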

## Installing Dependencies

In [None]:
## Installing General Dependencies
!pip install huggingface-hub -q
!pip install llama-index -q
!pip install transformers -q

## Installing Dependencies for parsing PDFs
!pip install pypdf -q
!pip install "unstructured[all-docs]" -q
!pip install llama-hub -q
!apt-get install -y -q tesseract-ocr
!pip install pytesseract -q
!apt-get install -y -q poppler-utils

## Installing llama-cpp-python
# GPU build of llama-cpp-python; GGUF is supported starting from llama-cpp-python==0.1.79
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir


## Formatting Colab Display

In [None]:
from IPython.display import HTML, display

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)

## Setting up Llama Index

In [None]:
from llama_index import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    ServiceContext,
)
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import (
    messages_to_prompt,
    completion_to_prompt,
)

## Pulling Model Weights

In [None]:
# model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_0.gguf"
model_url = "https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GGUF/resolve/main/mistral-7b-openorca.Q5_K_M.gguf"


In [None]:
llm = LlamaCPP(
    # You can pass in the URL to a GGUF model to download it automatically
    model_url=model_url,
    # optionally, you can set the path to a pre-downloaded model instead of model_url
    model_path=None,
    temperature=0.1,
    max_new_tokens=256,
    # 3900 leaves some wiggle room under Llama 2's 4096-token context window;
    # it is also a conservative setting for the Mistral model selected above
    context_window=3900,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # kwargs to pass to __init__()
    # set to at least 1 to use GPU
    model_kwargs={"n_gpu_layers": 30},
    # transform inputs into the Llama 2 chat format (other models may expect a different prompt template)
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
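
The `context_window` and `max_new_tokens` settings above interact: tokens reserved for the generated answer are not available to the prompt. The arithmetic can be sketched as follows; `prompt_token_budget` is a hypothetical helper written for this illustration, not a LlamaIndex function, and the real library may reserve a few extra tokens for its own templates.

```python
def prompt_token_budget(context_window, max_new_tokens):
    """Tokens left for the prompt (instructions + retrieved context + question)."""
    if max_new_tokens >= context_window:
        raise ValueError("max_new_tokens must be smaller than context_window")
    return context_window - max_new_tokens


# Values taken from the LlamaCPP configuration above
budget = prompt_token_budget(context_window=3900, max_new_tokens=256)
print(budget)  # 3644
```

Under these settings, roughly 3,644 tokens remain for the question plus whatever document chunks are retrieved, which is why long retrieved passages may need to be truncated or summarized.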


In [None]:
response = llm.complete("Hello! Can you tell me about the US Department of Defense?")
print(response.text)

In [None]:
response_iter = llm.stream_complete("Can you write me a poem about fast cars?")
for response in response_iter:
    print(response.delta, end="", flush=True)

## Configuring Embedding Model

In [None]:
# Use Huggingface embeddings
from llama_index.embeddings import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

In [None]:
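# Restart the runtime so that freshly installed packages are picked up.
# This kills the kernel: all variables (including llm and embed_model) are lost,
# so re-run the import, model, and embedding cells above afterwards.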
import os
os.kill(os.getpid(), 9)

In [None]:
# create a service context
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
)

[nltk_data] Downloading package punkt to /tmp/llama_index...
[nltk_data]   Unzipping tokenizers/punkt.zip.


## Loading Documents into Index

In [None]:
from pathlib import Path
from llama_index import SimpleDirectoryReader, download_loader

UnstructuredReader = download_loader('UnstructuredReader')

dir_reader = SimpleDirectoryReader('/content/data', file_extractor={
  ".pdf": UnstructuredReader(),
})

documents = dir_reader.load_data()

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


In [None]:
# create vector store index
index = VectorStoreIndex.from_documents(
    documents, service_context=service_context
)
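
Conceptually, a vector index stores one embedding per document chunk and answers a query by returning the chunks whose embeddings are closest to the query embedding. The sketch below illustrates that lookup with cosine similarity over tiny hand-written vectors. It is an assumption-laden toy, not the `VectorStoreIndex` implementation: the 3-dimensional vectors, the chunk ids, and the helpers `cosine_similarity` and `top_k` are all made up for illustration, and `k=2` is an arbitrary choice.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the ids of the k stored vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), node_id) for node_id, vec in store.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [node_id for _, node_id in scored[:k]]


# Hypothetical chunk embeddings; a real embedding model produces hundreds of dimensions
store = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.9, 0.1, 0.0],
    "chunk-c": [0.0, 0.0, 1.0],
}
nearest = top_k([1.0, 0.0, 0.0], store, k=2)
print(nearest)
```

The retrieved chunks are then placed into the prompt as context, as in the query cells that follow.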

In [None]:
# set up query engine
query_engine = index.as_query_engine(streaming=True)

In [None]:
response = query_engine.query("what is DSCIR")
print(response)

 Defense Support to Cyber Incident Response (DSCIR) refers to the support provided by the Department of Defense (DoD) to assist in responding to cyber incidents or threats that may impact DoD networks, systems, and capabilities. This support can include direct on-location support, remote support, or a combination of both as appropriate. DSCIR is authorized under specific conditions and guidelines, such as written acknowledgment from the entity receiving federal support and permission for DoD to access information and information systems related to the incident.


In [None]:
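def query_docs(question):
    """Query the index and return the answer as plain text (used by the Gradio UI below)."""
    print(question)
    response = query_engine.query(question)
    # str() consumes the streamed tokens (streaming=True above) and yields the full answer,
    # so this also works if the query engine is later switched to non-streaming mode.
    answer = str(response)
    print(answer)
    return answer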
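
A UI callback like the one above can also be written so that it does not depend on a global query engine and handles empty input. The following is a minimal sketch under stated assumptions: it assumes only that the engine has a `.query(str)` method and that `str()` on the result yields the answer text, which is what `print(response)` in the earlier cells relies on. `make_query_fn` and `EchoEngine` are hypothetical names introduced for this example.

```python
def make_query_fn(engine):
    """Wrap any object with a .query(str) method into a str -> str function."""
    def ask(question):
        question = (question or "").strip()
        if not question:
            return "Please enter a question."
        # str() is used instead of a specific attribute so the wrapper does not
        # depend on whether the engine returns a streaming or non-streaming response.
        return str(engine.query(question))
    return ask


class EchoEngine:
    """Hypothetical stand-in for a query engine, used only to exercise the wrapper."""
    def query(self, question):
        return f"You asked: {question}"


ask = make_query_fn(EchoEngine())
print(ask("  what is DSCIR  "))
print(ask(""))
```

With the real index, `make_query_fn(query_engine)` would produce a function that could be passed to a UI in place of `query_docs`.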

In [None]:
index.storage_context.persist("test_index")

In [None]:
index.storage_context.vector_store.to_dict()

## Gradio Section

In [None]:
!pip install -q gradio

In [None]:
import gradio

gradio.Interface(fn=query_docs, inputs="text", outputs="text").launch(share=True, debug=True)

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://21a4e424256b36fefa.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
