# Talk to Your Documents

This [OnPrem.LLM](https://github.com/amaiya/onprem) example demonstrates retrieval-augmented generation (RAG).

## Set Up the `LLM` Instance

In this notebook, we will use a model called **[Zephyr-7B-beta](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF)**, which [performs well on RAG tasks](https://www.rungalileo.io/hallucinationindex). When selecting a model, inspect the model's home page to identify the correct prompt format. The prompt format for this model is [located here](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF#prompt-template-zephyr), and we will supply it directly to the `LLM` constructor along with the URL of the specific model file we want (i.e., *zephyr-7b-beta.Q4_K_M.gguf*). We will offload layers to our GPU(s) using the `n_gpu_layers` parameter to speed up inference. (For more information on GPU acceleration, see [the documentation on speeding up inference with a GPU](https://amaiya.github.io/onprem/#speeding-up-inference-using-a-gpu).) For the purposes of this notebook, we also supply `temperature=0` to minimize variability in outputs. You can increase this value for more creativity in the outputs. Finally, we will choose a non-default location for our vector database.
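
To make the role of the prompt format concrete, the sketch below fills the `{prompt}` placeholder of the Zephyr template with a question. It assumes plain Python `str.format`-style substitution; the helper name `render_prompt` is hypothetical, and how OnPrem.LLM performs the substitution internally is not shown here.

```python
# Zephyr prompt template, as passed to the LLM constructor in this notebook.
ZEPHYR_TEMPLATE = "<|system|>\n</s>\n<|user|>\n{prompt}</s>\n<|assistant|>"


def render_prompt(template: str, prompt: str) -> str:
    """Fill the {prompt} placeholder (hypothetical helper, for illustration only)."""
    return template.format(prompt=prompt)


rendered = render_prompt(ZEPHYR_TEMPLATE, "What is said about hypersonics?")
print(rendered)
```

A template copied from a different model family would wrap the question in the wrong control tokens, which is why the model's home page is worth checking.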

In [None]:
# | notest

from onprem import LLM, utils as U
import tempfile

In [None]:
# | notest

vectordb_path = tempfile.mkdtemp()

llm = LLM(model_url='https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/resolve/main/zephyr-7b-beta.Q4_K_M.gguf', 
          prompt_template= "<|system|>\n</s>\n<|user|>\n{prompt}</s>\n<|assistant|>",
          n_gpu_layers=-1,
          temperature=0,
          store_type='dense',
          vectordb_path=vectordb_path,
          verbose=False)

llama_new_context_with_model: n_ctx_per_seq (3904) < n_ctx_train (32768) -- the full capacity of the model will not be utilized


Since OnPrem.LLM includes built-in support for Zephyr, an easier way to instantiate the LLM with Zephyr is as follows:

```python
llm = LLM(default_model='zephyr', 
          n_gpu_layers=-1,
          temperature=0,
          store_type='dense',
          vectordb_path=vectordb_path)
```
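
Why `temperature=0` removes variability can be seen in a toy sampler. The sketch below is not OnPrem.LLM code: `sample_token` and the logit values are hypothetical, and it assumes the common convention that a temperature of 0 means greedy decoding (always pick the highest-scoring token).

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick a token index from raw scores (toy illustration, not library code)."""
    if temperature == 0:
        # Greedy decoding: always return the index of the largest logit.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Scale logits by the temperature, then sample from the softmax weights.
    scaled = [value / temperature for value in logits]
    largest = max(scaled)
    weights = [math.exp(value - largest) for value in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


rng = random.Random(0)
logits = [1.0, 3.0, 2.0]  # hypothetical scores for three candidate tokens
greedy_picks = [sample_token(logits, 0, rng) for _ in range(5)]
print(greedy_picks)
```

Higher temperatures flatten the distribution, so lower-scoring tokens are chosen more often.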



## Ingest Documents

Ingested documents can be stored in one of two ways:
1. a **dense** vector store:  a conventional vector database like Chroma
2. a **sparse** vector store: a keyword-search engine

Sparse vector stores compute embeddings on-the-fly at inference time. As a result, they trade a small amount of inference speed for significantly faster ingestion, which makes them better suited to larger document sets. Note that sparse vector stores impose the constraint that any passage considered as a source for an answer must share at least one word with the question being asked. You can specify the kind of vector store by supplying either `store_type="dense"` or `store_type="sparse"` when creating the `LLM` above. We use a dense vector store in this example, as shown above.
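
The word-overlap constraint of sparse stores can be illustrated with a toy filter. This is a minimal sketch, not the search engine OnPrem.LLM uses: `tokenize` and `keyword_candidates` are hypothetical helpers, and a real engine would typically also apply stemming, stop-word handling, and ranking.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_candidates(question: str, passages: list[str]) -> list[str]:
    """Keep only passages sharing at least one word with the question."""
    question_words = tokenize(question)
    return [p for p in passages if question_words & tokenize(p)]


passages = [
    "Hypersonics test infrastructure requires investment.",
    "The committee supports data literacy programs.",
]
candidates = keyword_candidates("What is said about hypersonics?", passages)
print(candidates)
```

A passage that paraphrases the question without reusing any of its words would be filtered out, which is the price paid for faster ingestion.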

For this example, we will download the 2024 National Defense Authorization Act (NDAA) report and ingest it.

In [None]:
# | notest

U.download('https://www.congress.gov/118/crpt/hrpt125/CRPT-118hrpt125.pdf', '/tmp/ndaa/ndaa.pdf', verify=True)

[██████████████████████████████████████████████████]

In [None]:
# | notest
llm.ingest("/tmp/ndaa/")

Creating new vectorstore at /tmp/tmp076q1l40/dense
Loading documents from /tmp/ndaa/


Loading new documents: 100%|██████████████████████| 1/1 [00:00<00:00,  1.51it/s]
Processing and chunking 672 new documents: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  9.82it/s]


Split into 5202 chunks of text (max. 500 chars each for text; max. 2000 chars for tables)
Creating embeddings. May take some minutes...


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:17<00:00,  2.96s/it]

Ingestion complete! You can now query your documents using the LLM.ask or LLM.chat methods





## Asking Questions to Your Documents

In [None]:
# | notest

result = llm.ask("What is said about artificial intelligence training and education?")


The context provided discusses the implementation of an AI education strategy required by Section 256 of the National Defense Authorization Act for Fiscal Year 2020. The strategy aims to educate servicemembers in relevant occupational fields, with a focus on data literacy across a broader population within the Department of Defense. The committee encourages the Air Force and Space Force to leverage government-owned training platforms informed by private sector expertise to accelerate learning and career path development. Additionally, the committee suggests expanding existing mobile enabled platforms to train and develop the cyber workforce of the Air Force and Space Force. Overall, there is a recognition that AI continues to be central to warfighting and that proper implementation of these new technologies requires a focus on education and training.

The answer is stored in `result['answer']`. The documents retrieved from the vector store and used to generate the answer are stored in `result['source_documents']`.
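
Because several retrieved chunks may come from the same page, it can be handy to collapse the sources into a short citation list. The sketch below assumes, as in the cells of this notebook, that each source document exposes a `metadata` dict with `source` and `page` keys; `unique_citations` is a hypothetical helper, and `SimpleNamespace` objects with illustrative values stand in for real documents so the snippet runs on its own.

```python
from types import SimpleNamespace


def unique_citations(source_documents):
    """Return (source, page) pairs in first-seen order, without duplicates."""
    seen = []
    for doc in source_documents:
        key = (doc.metadata["source"], doc.metadata["page"])
        if key not in seen:
            seen.append(key)
    return seen


# Stand-in for the dict returned by llm.ask (illustrative values only).
fake_result = {
    "answer": "...",
    "source_documents": [
        SimpleNamespace(metadata={"source": "/tmp/ndaa/ndaa.pdf", "page": 359}, page_content="..."),
        SimpleNamespace(metadata={"source": "/tmp/ndaa/ndaa.pdf", "page": 359}, page_content="..."),
        SimpleNamespace(metadata={"source": "/tmp/ndaa/ndaa.pdf", "page": 120}, page_content="..."),
    ],
}

for source, page in unique_citations(fake_result["source_documents"]):
    print(f"{source}, page {page}")
```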

In [None]:
# | notest


print('REFERENCES')
print()
for d in result['source_documents']:
    print(f"On Page {d.metadata['page']} in {d.metadata['source']}:")
    print(d.page_content)
    print('----------------------------------------')
    print()

REFERENCES

On Page 359 in /tmp/ndaa/ndaa.pdf:
‘‘servicemembers in relevant occupational fields on matters relating 
to artificial intelligence.’’ 
Given the continued centrality of AI to warfighting, the com-
mittee directs the Chief Digital and Artificial Intelligence Officer of 
the Department of Defense to provide a briefing to the House Com-
mittee on Armed Services not later than March 31, 2024, on the 
implementation status of the AI education strategy, with emphasis 
on current efforts underway, such as the AI Primer course within
----------------------------------------

On Page 359 in /tmp/ndaa/ndaa.pdf:
intelligence (AI) and machine learning capabilities available within 
the Department of Defense. To ensure the proper implementation 
of these new technologies, there must be a focus on data literacy 
across a broader population within the Department. Section 256 of 
the National Defense Authorization Act for Fiscal Year 2020 (Pub-
lic Law 116–92) required the Department of D

In [None]:
# | notest

result = llm.ask("What is said about hypersonics?")


The context provided highlights the importance of expanding and fully funding programs related to hypersonic technology. The House Committee on Armed Services has directed the Secretary of Defense to submit a report by December 1, 2023, detailing efforts to ensure the development and sustainment of a future hypersonic workforce. The committee notes concerns about advancements in hypersonic capabilities made by peer and near-peer adversaries, emphasizing the need for investments to enhance the ability to develop, test, and field advanced hypersonic capabilities. The lack of research and development funding directed towards fielding a reusable hypersonic platform with aircraft-like operations and qualities is also raised as a concern. To address this issue, the committee directs the Under Secretary of Defense to develop graduate and pre-doctoral degree programs for the hypersonics workforce and increase funding for advanced hypersonics facilities for research and graduate-level educatio

In [None]:
# | notest

print('REFERENCES')
print()
for d in result['source_documents']:
    print(f"On Page {d.metadata['page']} in {d.metadata['source']}:")
    print(d.page_content)
    print('----------------------------------------')
    print()

REFERENCES

On Page 120 in /tmp/ndaa/ndaa.pdf:
lieves those programs should be expanded and fully funded, par-
ticularly in the field of hypersonic technology. 
Therefore, the committee directs the Secretary of Defense to sub-
mit a report to the House Committee on Armed Services not later 
than December 1, 2023, on the Department’s efforts to ensure the 
development and sustainment of its future hypersonic workforce. 
The report shall include: 
(1) an overview of hypersonic workforce development objectives
----------------------------------------

On Page 81 in /tmp/ndaa/ndaa.pdf:
velopment of carbon-carbon high temperature composites for 
hypersonic weapons. 
Hypersonics test infrastructure 
The committee notes with concern the advancements in 
hypersonic capabilities made by peer and near-peer adversaries. To 
ensure the U.S. military can effectively deter and, if necessary, de-
feat these national security threats, the Department of Defense 
must make investments to enhance its abi


## Additional Tips

The `LLM.ask` and `LLM.ingest` methods include many options for more complex scenarios.

#### LLM.ingest options

- If `infer_table_structure=True` is supplied to `LLM.ingest`, the `LLM.ask` method will consider tables within PDFs when answering questions. This behavior can be controlled with the `table_k` and `table_score_threshold` parameters of `LLM.ask`.
- If `extract_document_titles=True` is supplied to `LLM.ingest`, the title of each document will be inferred and added to each document chunk for potentially better retrieval.
- If `caption_tables=True` is supplied, an LLM-generated caption will be added to every extracted table for potentially better table retrieval.
- The chunk size of sources can be increased to give answers more context.
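
To see what chunk size means in practice, the sketch below splits text into fixed-size character windows with a small overlap. It is a simplified stand-in, not the splitter OnPrem.LLM uses: `chunk_text` and its `chunk_size` and `chunk_overlap` parameters are hypothetical names chosen for this illustration.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters, so text near a
    boundary appears in both neighbouring chunks.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


text = "".join(str(i % 10) for i in range(1200))  # 1200 characters of dummy text
small = chunk_text(text, chunk_size=500, chunk_overlap=50)
large = chunk_text(text, chunk_size=1000, chunk_overlap=50)
print(len(small), len(large))
```

Larger chunks give the model more context per source but yield fewer, coarser retrieval units.
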

#### LLM.ask options

- If `selfask=True` is supplied as an argument, a [Self-Ask prompting strategy](https://learnprompting.org/docs/advanced/few_shot/self_ask) will be used to decompose the question into subquestions.
- Adjust the prompt used for QA with the `prompt_template` argument to `LLM.ask`.
- Increase the number of sources to consider with the `k` parameter to `LLM.ask`.
- Filter sources with `filters` and `where_document`.
- Add a score threshold for sources.
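
The interaction between the number of sources and a score threshold can be sketched with a toy selector. This is an illustration only: `select_sources`, its parameters, and the scores are hypothetical, and it assumes higher scores mean more relevant passages, which may not match the scoring used by a given vector store.

```python
def select_sources(scored_passages, k=4, score_threshold=0.0):
    """Return up to k passages with score >= score_threshold, best first.

    scored_passages is a list of (passage, score) pairs.
    """
    kept = [(p, s) for p, s in scored_passages if s >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Hypothetical relevance scores for three retrieved passages.
scored = [("passage A", 0.82), ("passage B", 0.35), ("passage C", 0.67)]
top = select_sources(scored, k=2, score_threshold=0.5)
print(top)
```

Raising `k` widens the context given to the model, while the threshold keeps weakly related passages out even when fewer than `k` remain.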