Here we demonstrate how to run a local *Retrieval Augmented Generation* (RAG) setup in which both the vector store and the language models (embedding model and LLM) run locally. This can be desirable if you want to make internal documents *chattable* but do not wish to send them across the Internet for processing in the cloud.


## Brief tech stack overview
- The official __Ollama__ __Docker__ image is used for managing and running the language models.
- __Weaviate__ is used as the vector store.
- The __LangChain__ Python module is used for orchestration of the RAG setup.
- __FastAPI__ is chosen as the web server.

## Deployment of models
Firstly, we build and run the docker images with `make run`. We can then deploy the Ollama models using the command `make fetch_models`.
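
After `make fetch_models` it can be useful to confirm that the models are actually available. The following is a minimal sketch, assuming the container publishes Ollama's default port 11434 on localhost; it uses Ollama's `/api/tags` endpoint, which lists locally available models. The helper names `fetch_tags` and `missing_models` are hypothetical and not part of this repository.

```python
import json
from typing import Dict, Iterable, List
from urllib.request import urlopen

# Assumption: the Ollama container publishes its default port on localhost.
OLLAMA_URL = "http://localhost:11434"


def fetch_tags(base_url: str = OLLAMA_URL) -> Dict:
    """Return the parsed JSON from Ollama's model listing endpoint."""
    with urlopen(f"{base_url}/api/tags", timeout=10) as response:
        return json.load(response)


def missing_models(tags: Dict, required: Iterable[str]) -> List[str]:
    """Return the required model names that are absent from the listing."""
    available = {model["name"] for model in tags.get("models", [])}
    return [name for name in required if name not in available]
```

Calling `missing_models(fetch_tags(), [...])` with the model names your setup expects (including the tag, for example `name:latest`) returns an empty list when everything has been fetched.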

Before setting up the vector database containing the internal documents (or, in our case, the Wikipedia page on the Napoleonic Wars), we first ask the raw model, without any retrieved context, the following question:

> "Under the Napoleonic wars, which countries took part in the fifth coalition against France?"

Because LLM sampling is random, we repeat the query five times. The core members of the Fifth Coalition were the Austrian Empire and the United Kingdom. From the five answers below, we see that the model either states up front that it does not know the answer or generates lists that include countries that were not actually members of the coalition.

A larger model such as ChatGPT would likely have answered correctly, but that is beside the point. We are considering the case of internal documents that neither ChatGPT nor our local llama model knows about.

In [3]:
from backend.rag import query_raw_model

q = "Under the Napoleonic wars, which countries took part in the fifth coalition against France?"

responses = [
    query_raw_model(q) for _ in range(5)
]

In [5]:
for i, response in enumerate(responses):
    print(f'Response {i+1}:')
    print(response)
    print("-" * 100)


Response 1:
The Fifth Coalition was formed in 1809 and consisted of Austria, Britain, Russia, Sweden, and the Ottoman Empire.
----------------------------------------------------------------------------------------------------
Response 2:
I cannot verify which country took part in the Fifth Coalition of the Napoleonic Wars.
----------------------------------------------------------------------------------------------------
Response 3:
The fifth coalition against France consisted of Britain and Russia.
----------------------------------------------------------------------------------------------------
Response 4:
The Fifth Coalition was formed in 1809, during the Napoleonic Wars. The countries that took part in this coalition were:

1. Austria
2. Russia
3. Sweden-Norway
4. Great Britain (United Kingdom)
5. Ottoman Empire
----------------------------------------------------------------------------------------------------
Response 5:
The Fifth Coalition was formed against Napoleon's Frenc

## Weaviate vector store

The script `populate_db.py` fetches the Napoleonic Wars Wikipedia page. Each section of the page is split into chunks using an instance of the `RecursiveCharacterTextSplitter` from LangChain. Chunk boundaries therefore follow the section layout of the page, which should give higher-quality chunks.
Each chunk is then stored in the Weaviate database together with the relevant section title and an embedding vector.
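
The section-wise chunking described above can be sketched in plain Python. This is a simplified stand-in for illustration, not LangChain's `RecursiveCharacterTextSplitter` (it has no chunk overlap, for instance) and not the code in `populate_db.py`. The function names, the separator list, and the `chunk_size` value are assumptions; the `title` and `text` keys mirror the properties stored in the collection.

```python
from typing import Dict, List, Optional

# Coarse-to-fine separators; the empty string means "hard cut".
SEPARATORS = ["\n\n", "\n", " ", ""]


def split_recursive(text: str, chunk_size: int,
                    separators: Optional[List[str]] = None) -> List[str]:
    """Split text into pieces of at most chunk_size characters,
    preferring the coarsest separator that works."""
    if separators is None:
        separators = SEPARATORS
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators or separators[0] == "":
        # Last resort: cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate  # keep growing the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= chunk_size:
            current = part
        else:
            # The part alone is too long: retry with finer separators.
            chunks.extend(split_recursive(part, chunk_size, finer))
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]


def chunk_sections(sections: Dict[str, str],
                   chunk_size: int = 500) -> List[Dict[str, str]]:
    """Chunk each section separately so no chunk spans two sections."""
    records = []
    for title, body in sections.items():
        for chunk in split_recursive(body, chunk_size):
            records.append({"title": title, "text": chunk})
    return records
```

Because every section is split on its own, each record carries exactly one section title, which is what allows the title to be stored alongside the chunk.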

In [4]:
from scripts import populate_db
populate_db.run()

Imported 155 chunks into the Napoleonic Wars collection


In [2]:
import weaviate

client = weaviate.connect_to_local()
collection = client.collections.get('napoleonic_wars')

for item in collection.iterator(include_vector=True):
    print(f'Section title: {item.properties["title"]}')
    print(f'Chunk text: {item.properties["text"][0:100]} ...')
    print(f'Embedding vector: {item.vector["default"][0:5]} ...')
    break

client.close()

Section title: Invasion of Russia, 1812
Chunk text: The central issue for both Emperor Napoleon I and Tsar Alexander I was control over Poland. Each wan ...
Embedding vector: [-0.00635263929143548, 0.03718530014157295, -0.15025673806667328, -0.012791609391570091, 0.0685529112815857] ...


With the vector database set up, we are now ready to build our RAG pipeline.

## RAG

In [2]:
from backend import rag

my_RAGAgent = rag.LocalRAGAgent()

In [2]:
docs = my_RAGAgent.vector_store.similarity_search_with_score("Under the Napoleonic wars, which countries took part in the fifth coalition against France?")

In [3]:
docs[0]

(Document(metadata={'title': 'War of the Fifth Coalition, 1809'}, page_content="The Fifth Coalition (1809) of Britain and Austria against France formed as Britain engaged in the Peninsular War in Spain and Portugal. The sea became a major theatre of war against Napoleon's allies. Austria, previously an ally of France, took the opportunity to attempt to restore its imperial territories in Germany as held prior to Austerlitz. During the time of the Fifth Coalition, the Royal Navy won a succession of victories in the French colonies. On land the major battles included Battles"),
 1.0)
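
To illustrate the idea behind a similarity search, here is a minimal sketch that ranks chunks by cosine similarity between a query embedding and the chunk embeddings. It uses toy two-dimensional vectors and hypothetical helper names. How the score returned by `similarity_search_with_score` is computed and normalized in this setup depends on the vector store configuration, so the sketch should not be read as the exact scoring used here.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          chunks: List[Tuple[str, Sequence[float]]],
          k: int = 4) -> List[Tuple[str, float]]:
    """Return the k chunk titles most similar to the query, best first."""
    scored = [(title, cosine_similarity(query_vec, vec)) for title, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (made up for illustration only).
toy_chunks = [
    ("War of the Fifth Coalition, 1809", [1.0, 0.0]),
    ("Invasion of Russia, 1812", [0.0, 1.0]),
]
best = top_k([0.9, 0.1], toy_chunks, k=1)
```

A real vector store does not compare the query against every chunk like this; it typically uses an approximate nearest-neighbour index to avoid the full scan.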

In [4]:
query = "Under the Napoleonic wars, which countries took part in the fifth coalition against France?"

for event in my_RAGAgent.stream(query):
    event["messages"][-1].pretty_print()




Under the Napoleonic wars, which countries took part in the fifth coalition against France?
Tool Calls:
  retrieve_context (7669cd28-30c3-475f-b084-1130d8573b48)
 Call ID: 7669cd28-30c3-475f-b084-1130d8573b48
  Args:
    query: Fifth Coalition against France
Name: retrieve_context

Source: {'title': 'War of the Fifth Coalition, 1809'}
Content: The Fifth Coalition (1809) of Britain and Austria against France formed as Britain engaged in the Peninsular War in Spain and Portugal. The sea became a major theatre of war against Napoleon's allies. Austria, previously an ally of France, took the opportunity to attempt to restore its imperial territories in Germany as held prior to Austerlitz. During the time of the Fifth Coalition, the Royal Navy won a succession of victories in the French colonies. On land the major battles included Battles

Source: {'title': 'War of the Fifth Coalition, 1809'}
Content: On land, the Fifth Coalition attempted few extensive military endeavours. One, the Walche
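
The `retrieve_context` tool output above presents each retrieved chunk as a `Source:` / `Content:` pair. A minimal sketch of such formatting is shown below, assuming each retrieved document is available as a `(metadata, text)` pair; `format_context` is a hypothetical helper and not necessarily how `backend.rag` implements it.

```python
from typing import Dict, List, Tuple


def format_context(docs: List[Tuple[Dict[str, str], str]]) -> str:
    """Serialize retrieved chunks as Source/Content blocks separated by blank lines."""
    return "\n\n".join(
        f"Source: {metadata}\nContent: {content}"
        for metadata, content in docs
    )


# Example with a shortened chunk text.
example = format_context([
    ({"title": "War of the Fifth Coalition, 1809"},
     "The Fifth Coalition (1809) of Britain and Austria against France ..."),
])
```

Keeping the section title next to each chunk gives the LLM a hint about where the text comes from, which can help it attribute its answer.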
