RAGnarok

RAGnarok is a Retrieval-Augmented Generation chatbot frontend for Nemesis. It allows you to ask questions about text extracted from compatible documents processed by Nemesis.

RAG

Short explanation: The general idea with Retrieval-Augmented Generation (RAG) is to allow a large language model (LLM) to answer questions about documents you've indexed.

Medium explanation: RAG involves processing and turning text inputs into fixed-length vectors via an embedding model, which are then stored in a backend vector database. Questions to the LLM are used to look up the "most similar" chunks of text, which are then fed into the context prompt for an LLM.
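As a minimal sketch of that core idea, assuming the sentence-transformers library and the TaylorAI/gte-tiny model mentioned later in this README (any embedding model works the same way):

```python
# Minimal sketch: embed two text chunks and a question, then pick the
# chunk whose vector is most similar to the question's vector.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("TaylorAI/gte-tiny")

chunks = [
    "A certificate binds a public key to an identity.",
    "The cafeteria menu changes every Tuesday.",
]
chunk_vecs = model.encode(chunks)                # fixed-length float vectors
query_vec = model.encode("What is a certificate?")

sims = util.cos_sim(query_vec, chunk_vecs)       # cosine similarity scores
best_chunk = chunks[int(sims.argmax())]          # the semantically closest chunk
```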


Longer explanation: the rest of this section :)

Even longer explanation: this blog post.

Indexing

Retrieval-augmented generation is an architecture in which incoming documents undergo the following indexing process:

  1. Plaintext is extracted from any incoming documents.
    • Nemesis uses Apache Tika to extract text from compatible documents.
  2. The text is tokenized into chunks of up to X tokens, where X depends on the context window of the embedding model used.
    • Nemesis uses Langchain's TokenTextSplitter, a chunk size of 510 tokens, and a 15% overlap between chunks (see the sketch after this list).
  3. Each chunk of text is processed by an embedding model which turns the input text into a fixed-length vector of floats.
    • As Pinecone explains, what's cool about embedding models is that the vector representations they produce preserve "semantic similarity", meaning that more similar chunks of text will have more similar vectors.
    • Nemesis currently uses the TaylorAI/gte-tiny embedding model as it's fast, but others are possible.
  4. Each vector and associated snippet of text is stored in a vector database.
    • Nemesis uses Elasticsearch for vector storage.
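Put together, steps 2–4 look roughly like the following. This is a sketch, not Nemesis's actual code; the Elasticsearch endpoint, index name, and mapping are assumptions:

```python
# Sketch of the indexing pipeline: chunk -> embed -> store in Elasticsearch.
from elasticsearch import Elasticsearch
from langchain.text_splitter import TokenTextSplitter
from sentence_transformers import SentenceTransformer

es = Elasticsearch("http://localhost:9200")      # assumed local instance
model = SentenceTransformer("TaylorAI/gte-tiny")

# 510-token chunks with ~15% overlap (0.15 * 510 ≈ 76 tokens)
splitter = TokenTextSplitter(chunk_size=510, chunk_overlap=76)

def index_document(text: str, index: str = "text_chunks") -> None:
    for chunk in splitter.split_text(text):
        vector = model.encode(chunk).tolist()    # fixed-length vector of floats
        # Store the vector alongside the original text (the index's mapping
        # would declare "vector" as a dense_vector field sized to the model's
        # output dimension).
        es.index(index=index, document={"text": chunk, "vector": vector})
```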

Semantic Search

This is the initial indexing process that Nemesis has been performing for a while. However, in order to complete a RAG pipeline, the next steps are:

  1. Take an input prompt, such as "What is a certificate?", and run it through the same embedding model the files were indexed with.
  2. Query the vector database (e.g., Elasticsearch) for the nearest k vectors + associated text chunks that are "closest" to the prompt input vector.
    • This will return the k chunks of text that are the most similar to the input query.
  3. We also use Elasticsearch's traditional(-ish) BM25 text search over the text for each chunk.
    • These two lists of results are combined with Reciprocal Rank Fusion, and the top results from the fused list are returned.
    • Note: steps 2 and 3 of this list happen in the nlp container in Nemesis. This is exposed at http://<nemesis>/nlp/hybrid_search (a sketch follows below).
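A rough sketch of that hybrid search. The index name, field names, and the RRF constant here are assumptions; the real implementation lives in Nemesis's nlp container:

```python
# Sketch of hybrid search: kNN vector query + BM25 query, fused with
# Reciprocal Rank Fusion (RRF): score(doc) = sum over lists of 1 / (C + rank).
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

es = Elasticsearch("http://localhost:9200")
model = SentenceTransformer("TaylorAI/gte-tiny")

def hybrid_search(question: str, k: int = 10, C: int = 60) -> list[str]:
    qvec = model.encode(question).tolist()

    # Nearest-k vectors to the embedded question
    knn_hits = es.search(
        index="text_chunks",
        knn={"field": "vector", "query_vector": qvec,
             "k": k, "num_candidates": 100},
    )["hits"]["hits"]

    # Traditional BM25 text search over the chunk text
    bm25_hits = es.search(
        index="text_chunks", query={"match": {"text": question}}, size=k
    )["hits"]["hits"]

    # Reciprocal Rank Fusion over the two ranked result lists
    scores: dict[str, float] = {}
    texts: dict[str, str] = {}
    for hits in (knn_hits, bm25_hits):
        for rank, hit in enumerate(hits, start=1):
            scores[hit["_id"]] = scores.get(hit["_id"], 0.0) + 1.0 / (C + rank)
            texts[hit["_id"]] = hit["_source"]["text"]

    fused = sorted(scores, key=scores.get, reverse=True)
    return [texts[doc_id] for doc_id in fused[:k]]
```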

Reranking

We now have the k chunks of text most similar to our input query. If we want to get a bit fancier, we can execute what's called reranking.

  1. With reranking, the prompt question and each text result are paired up as (question, text) and fed into a more powerful model (well, more powerful than the embedding model) tuned for this task, known as a reranker. The reranker generates a similarity score for each (input prompt, text chunk) pair.
  2. The results are then reranked by that score and the top X results are selected.
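A sketch of that reranking step using a cross-encoder from sentence-transformers. The model name here is an illustrative choice, not necessarily the one RAGnarok uses:

```python
# Sketch: score (question, chunk) pairs with a cross-encoder, keep the best X.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # illustrative

def rerank(question: str, chunks: list[str], top_x: int = 5) -> list[str]:
    pairs = [(question, chunk) for chunk in chunks]
    scores = reranker.predict(pairs)             # one similarity score per pair
    ranked = sorted(zip(scores, chunks), key=lambda p: p[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_x]]
```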

LLM Processing

  1. Finally, the resulting texts are combined with the question into a prompt for the (local) LLM. Think something along the lines of "Given these chunks of text {X}, answer this question {Y}".
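A minimal sketch of assembling that prompt. The wording is illustrative, and how the prompt actually reaches the local LLM depends on the model and runtime used:

```python
# Sketch: combine the reranked chunks with the user's question into one prompt.
def build_prompt(chunks: list[str], question: str) -> str:
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Given these chunks of text:\n\n"
        f"{context}\n\n"
        f"Answer this question: {question}"
    )
```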
