RAGnarok

RAGnarok is a Retrieval-Augmented Generation chatbot frontend for Nemesis. It allows you to ask questions about text extracted from compatible documents processed by Nemesis.

RAG

Short explanation: The general idea with Retrieval-Augmented Generation (RAG) is to allow a large language model (LLM) to answer questions about documents you've indexed.

Medium explanation: RAG involves processing and turning text inputs into fixed-length vectors via an embedding model, which are then stored in a backend vector database. Questions to the LLM are used to look up the "most similar" chunks of text, which are then fed into the context prompt for an LLM.
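As a minimal sketch of that core idea, assuming the sentence-transformers library and the TaylorAI/gte-tiny model mentioned later in this README (any embedding model works the same way):

```python
# Minimal sketch: embed two text chunks and a question, then pick the
# chunk whose vector is most similar to the question's vector.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("TaylorAI/gte-tiny")

chunks = [
    "A certificate binds a public key to an identity.",
    "The cafeteria menu changes every Tuesday.",
]
chunk_vecs = model.encode(chunks)                # fixed-length float vectors
query_vec = model.encode("What is a certificate?")

sims = util.cos_sim(query_vec, chunk_vecs)       # cosine similarity scores
best_chunk = chunks[int(sims.argmax())]          # the semantically closest chunk
```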


Longer explanation: the rest of this section :)

Even longer explanation: this blog post.

Indexing

Retrieval-augmented generation is an architecture in which incoming documents undergo the following indexing process:

  1. Plaintext is extracted from any incoming documents.
    • Nemesis uses Apache Tika to extract text from compatible documents.
  2. The text is tokenized into chunks of up to X tokens, where X depends on the context window of the embedding model used.
    • Nemesis uses Langchain's TokenTextSplitter, a chunk size of 510 tokens, and a 15% overlap between chunks (see the sketch after this list).
  3. Each chunk of text is processed by an embedding model which turns the input text into a fixed-length vector of floats.
    • As Pinecone explains, what's cool about embedding models is that the vector representations they produce preserve "semantic similarity", meaning that more similar chunks of text will have more similar vectors.
    • Nemesis currently uses the TaylorAI/gte-tiny embedding model as it's fast, but others are possible.
  4. Each vector and associated snippet of text is stored in a vector database.
    • Nemesis uses Elasticsearch for vector storage.
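Put together, steps 2–4 look roughly like the following. This is a sketch, not Nemesis's actual code; the Elasticsearch endpoint, index name, and mapping are assumptions:

```python
# Sketch of the indexing pipeline: chunk -> embed -> store in Elasticsearch.
from elasticsearch import Elasticsearch
from langchain.text_splitter import TokenTextSplitter
from sentence_transformers import SentenceTransformer

es = Elasticsearch("http://localhost:9200")      # assumed local instance
model = SentenceTransformer("TaylorAI/gte-tiny")

# 510-token chunks with ~15% overlap (0.15 * 510 ≈ 76 tokens)
splitter = TokenTextSplitter(chunk_size=510, chunk_overlap=76)

def index_document(text: str, index: str = "text_chunks") -> None:
    for chunk in splitter.split_text(text):
        vector = model.encode(chunk).tolist()    # fixed-length vector of floats
        # Store the vector alongside the original text (the index's mapping
        # would declare "vector" as a dense_vector field sized to the model's
        # output dimension).
        es.index(index=index, document={"text": chunk, "vector": vector})
```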

Semantic Search

This is the initial indexing process that Nemesis has been performing for a while. However, in order to complete a RAG pipeline, the next steps are:

  1. Take an input prompt, such as "What is a certificate?", and run it through the same embedding model the files were indexed with.
  2. Query the vector database (e.g., Elasticsearch) for the nearest k vectors + associated text chunks that are "closest" to the prompt input vector.
    • This will return the k chunks of text that are the most similar to the input query.
  3. We also use Elasticsearch's traditional(-ish) BM25 text search over the text for each chunk.
    • These two lists of results are combined with Reciprocal Rank Fusion, and the top results from the fused list are returned.
    • Note: steps 2 and 3 of this list happen in the nlp container in Nemesis. This is exposed at http://<nemesis>/nlp/hybrid_search (a sketch follows below).
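A rough sketch of that hybrid search. The index name, field names, and the RRF constant here are assumptions; the real implementation lives in Nemesis's nlp container:

```python
# Sketch of hybrid search: kNN vector query + BM25 query, fused with
# Reciprocal Rank Fusion (RRF): score(doc) = sum over lists of 1 / (C + rank).
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

es = Elasticsearch("http://localhost:9200")
model = SentenceTransformer("TaylorAI/gte-tiny")

def hybrid_search(question: str, k: int = 10, C: int = 60) -> list[str]:
    qvec = model.encode(question).tolist()

    # Nearest-k vectors to the embedded question
    knn_hits = es.search(
        index="text_chunks",
        knn={"field": "vector", "query_vector": qvec,
             "k": k, "num_candidates": 100},
    )["hits"]["hits"]

    # Traditional BM25 text search over the chunk text
    bm25_hits = es.search(
        index="text_chunks", query={"match": {"text": question}}, size=k
    )["hits"]["hits"]

    # Reciprocal Rank Fusion over the two ranked result lists
    scores: dict[str, float] = {}
    texts: dict[str, str] = {}
    for hits in (knn_hits, bm25_hits):
        for rank, hit in enumerate(hits, start=1):
            scores[hit["_id"]] = scores.get(hit["_id"], 0.0) + 1.0 / (C + rank)
            texts[hit["_id"]] = hit["_source"]["text"]

    fused = sorted(scores, key=scores.get, reverse=True)
    return [texts[doc_id] for doc_id in fused[:k]]
```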

Reranking

We now have the k chunks of text most similar to our input query. If we want to get a bit fancier, we can execute what's called reranking.

  1. With reranking, the prompt question and each text result are paired up as (question, text) and fed into a more powerful model (well, more powerful than the embedding model) tuned for this task, known as a reranker. The reranker generates a similarity score for each (input prompt, text chunk) pair.
  2. The results are then reranked by that score and the top X results are selected.
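A sketch of that reranking step using a cross-encoder from sentence-transformers. The model name here is an illustrative choice, not necessarily the one RAGnarok uses:

```python
# Sketch: score (question, chunk) pairs with a cross-encoder, keep the best X.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # illustrative

def rerank(question: str, chunks: list[str], top_x: int = 5) -> list[str]:
    pairs = [(question, chunk) for chunk in chunks]
    scores = reranker.predict(pairs)             # one similarity score per pair
    ranked = sorted(zip(scores, chunks), key=lambda p: p[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_x]]
```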

LLM Processing

  1. Finally, the resulting texts are combined with the question into a prompt for the (local) LLM. Think something along the lines of "Given these chunks of text {X}, answer this question {Y}".
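A minimal sketch of assembling that prompt. The wording is illustrative, and how the prompt actually reaches the local LLM depends on the model and runtime used:

```python
# Sketch: combine the reranked chunks with the user's question into one prompt.
def build_prompt(chunks: list[str], question: str) -> str:
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Given these chunks of text:\n\n"
        f"{context}\n\n"
        f"Answer this question: {question}"
    )
```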
