# Retrieval Augmented Generation (RAG)

## What is bodhilib?
**bodhilib** is an Open Source (MIT License), Plugin Architecture based, Pythonic and Composable LLM Library.

**Bodhi** is a Sanskrit term for deep insight into reality. With bodhilib, we aspire to provide tools for a deeper understanding of the data-rich world around us.

### Plugin Architecture?
**bodhilib** itself defines only the models and interfaces. All implementations are provided by plugins that are installed separately.
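
A minimal sketch of the idea, not bodhilib's actual implementation: the core declares an interface and a lookup function, and a plugin registers a concrete class under a name. The names `LLM`, `register_llm`, `get_llm_sketch` and `EchoLLM` are hypothetical and exist only for this illustration.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict


# --- "core": only the interface and a registry (hypothetical names) ---
class LLM(ABC):
    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Return a response for the given prompt."""


_REGISTRY: Dict[str, Callable[..., LLM]] = {}


def register_llm(name: str, factory: Callable[..., LLM]) -> None:
    """Called by a plugin to make its implementation discoverable."""
    _REGISTRY[name] = factory


def get_llm_sketch(name: str, **kwargs) -> LLM:
    """Look up a registered implementation by name and instantiate it."""
    if name not in _REGISTRY:
        raise KeyError(f"no plugin registered for {name!r}")
    return _REGISTRY[name](**kwargs)


# --- "plugin": installed separately, provides the implementation ---
class EchoLLM(LLM):
    def __init__(self, prefix: str = "echo: "):
        self.prefix = prefix

    def generate(self, prompt: str) -> str:
        return self.prefix + prompt


register_llm("echo", EchoLLM)

llm_sketch = get_llm_sketch("echo", prefix=">> ")
print(llm_sketch.generate("hello"))  # >> hello
```

Because callers only depend on the interface and the name, swapping one implementation for another is a one-line change at the lookup site.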

### Pythonic?
**bodhilib** prefers Pythonic (over Java-like) syntax, uses Python's dynamic language features, and follows Postel's Law: be conservative in what you send, liberal in what you accept.
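
As an illustration of Postel's Law in Python, the hypothetical helper below accepts either a single string or any iterable of strings, but always returns the same type. It is a sketch of the principle, not code taken from bodhilib.

```python
from typing import Iterable, List, Union


def to_texts(value: Union[str, Iterable[str]]) -> List[str]:
    """Liberal input (str or iterable of str), conservative output (always a list)."""
    if isinstance(value, str):
        return [value]
    return [str(item) for item in value]


print(to_texts("one query"))   # ['one query']
print(to_texts(("a", "b")))    # ['a', 'b']
```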

### Composable?
The interfaces take inspiration from functional languages, so that components can be composed into pipelines.

### LLM Library?
**bodhilib** aspires to be the LLM library of choice: developer-friendly and feature-rich through plugin extensions.

---

## What is Retrieval Augmented Generation (RAG)?
RAG retrieves relevant information from a large corpus of data and passes it to an LLM as context, so that the LLM can generate a response grounded in that data.

## What are the components of RAG?
1. **DataLoader** - loads data from a variety of sources as **Documents**
1. **Splitter** - splits the **Documents** into smaller processable units, or **Nodes**
1. **Embedder** - embeds the **Nodes** into a vector representation, or **Embedding**
1. **VectorDB** - stores the **Embeddings** along with metadata (*insert*) and retrieves them by similarity (*query*)
1. **LLM** - a Large Language Model that generates a response given an input, or **Prompt**

## How does ingestion work in RAG?
1. **Data** is converted to **Documents** using **DataLoader**
1. **Documents** are converted to **Nodes** using **Splitter**
1. **Nodes** are enriched with **Embedding** using **Embedder**
1. **Nodes** and **Embeddings** are inserted as **Records** using **VectorDB** *insert*

![RAG Ingestion Pipeline](../images/rag-ingestion.png)
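
The ingestion steps can be sketched with the standard library alone. Everything below is a toy stand-in under stated assumptions: the `Node` dataclass, the sentence splitter, the hashed bag-of-words "embedding" and the in-memory list are hypothetical simplifications, not the bodhilib components used later in this notebook.

```python
import math
import re
import zlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    text: str
    embedding: List[float] = field(default_factory=list)


def load(raw_texts: List[str]) -> List[str]:
    """DataLoader stand-in: each non-empty raw string becomes one Document."""
    return [t.strip() for t in raw_texts if t.strip()]


def split(docs: List[str]) -> List[Node]:
    """Splitter stand-in: one Node per sentence."""
    nodes = []
    for doc in docs:
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            if sentence:
                nodes.append(Node(text=sentence))
    return nodes


def embed(nodes: List[Node], dim: int = 16) -> List[Node]:
    """Embedder stand-in: hashed bag-of-words, scaled to unit length."""
    for node in nodes:
        vec = [0.0] * dim
        for word in re.findall(r"\w+", node.text.lower()):
            # crc32 is used only as a hash that is stable across runs
            vec[zlib.crc32(word.encode()) % dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        node.embedding = [v / norm for v in vec]
    return nodes


def insert(store: List[Node], nodes: List[Node]) -> int:
    """VectorDB insert stand-in: append records to an in-memory list."""
    store.extend(nodes)
    return len(store)


toy_store: List[Node] = []
toy_docs = load(["Cats purr. Dogs bark!", "Birds sing."])
toy_count = insert(toy_store, embed(split(toy_docs)))
print(toy_count)  # 3
```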

## How does query work in RAG?
1. **User** provides **Input Query**
1. **Input Query** is converted into **Embedding** using **Embedder**
1. **Embedding** is used to fetch similar **Records** using **VectorDB** *query*
1. **Input Query** and **Records** are used to create **Prompt** using **PromptTemplate**
1. **Prompt** is used to generate **Response** using **LLM**

![RAG Query Pipeline](../images/rag-query.png)
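
The query steps can be sketched the same way. The example assumes cosine similarity as the ranking measure and uses hand-written three-dimensional vectors in place of real embeddings; `cosine`, `query` and `make_prompt` are hypothetical helpers, and the final call to an LLM is left out.

```python
import math
from typing import List, Tuple

Record = Tuple[str, List[float]]  # (text, embedding)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def query(records: List[Record], embedding: List[float], limit: int) -> List[Record]:
    """VectorDB query stand-in: rank every record by similarity, keep the top `limit`."""
    ranked = sorted(records, key=lambda r: cosine(r[1], embedding), reverse=True)
    return ranked[:limit]


def make_prompt(question: str, texts: List[str]) -> str:
    """PromptTemplate stand-in: retrieved texts first, then the question."""
    context = "\n".join(f"### START\n{t}\n### END" for t in texts)
    return f"{context}\n\nQuestion: {question}\nAnswer:"


toy_records = [
    ("Cats purr.", [1.0, 0.0, 0.0]),
    ("Dogs bark.", [0.0, 1.0, 0.0]),
    ("Kittens purr softly.", [0.9, 0.1, 0.0]),
]
toy_query_vec = [1.0, 0.05, 0.0]  # pretend this is the embedded input query
toy_top = query(toy_records, toy_query_vec, limit=2)
toy_prompt = make_prompt("Which animals purr?", [text for text, _ in toy_top])
print(toy_prompt)
```

A real vector database avoids scoring every record by using an index, but the contract is the same: an embedding goes in and the most similar records come out.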


# RAG using bodhilib
## 1. Installation
Install the required libraries:

1. `bodhilib` - the core library that defines the models and interfaces
1. `bodhiext.openai` - the plugin implementing **LLM** interface for **OpenAI**
1. `bodhiext.qdrant` - the plugin implementing **VectorDB** interface for **Qdrant**
1. `bodhiext.sentence_transformers` - the plugin implementing **Embedder** interface to use **Sentence Transformers**
1. `bodhiext.file` - the plugin implementing **DataLoader** interface to load local files (packaged with bodhilib)
1. `bodhiext.text_splitter` - the plugin implementing **Splitter** interface to split based on sentences (packaged with bodhilib)
1. `python-dotenv` - utility library to load environment variables from a local `.env` file
1. `fn.py` - optional Python library providing functional programming constructs, used for the Composability demo

In [1]:
!pip install -q bodhilib bodhiext.openai bodhiext.qdrant bodhiext.sentence_transformers python-dotenv fn.py

## 2. API Keys

In [2]:
import os
from getpass import getpass

from dotenv import load_dotenv

load_dotenv()
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")

In [3]:
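# utility functions for printing long outputs
import textwrap
from reprlib import repr  # shadows the built-in repr with a length-limited version

def wrap_text(text, width=100):
    """Wrap each line of text to at most width characters; blank lines are dropped."""
    wrapped_lines = []
    for line in text.splitlines():
        wrapped_lines.extend(textwrap.fill(line, width=width).splitlines())
    return '\n'.join(wrapped_lines)

def trim_text(text):
    """Return an abbreviated representation, e.g. the first few items of a long list."""
    return repr(text)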
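
For reference, this is how `reprlib.repr` abbreviates a long list with its default settings: it keeps the first six items and replaces the rest with `...`. That is why the embeddings printed later in this notebook show only a handful of values.

```python
import reprlib

toy_values = [i / 10 for i in range(10)]
toy_trimmed = reprlib.repr(toy_values)
print(toy_trimmed)  # [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, ...]
```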

## 3. Initialize the Components

In [4]:
from bodhilib import (
    Distance,
    get_data_loader,
    get_embedder,
    get_llm,
    get_splitter,
    get_vector_db,
)

data_loader = get_data_loader("file")
splitter = get_splitter("text_splitter", max_len=300, overlap=30)
embedder = get_embedder("sentence_transformers")
vector_db = get_vector_db("qdrant", location=":memory:")
llm = get_llm('openai_chat', model='gpt-3.5-turbo')

# recreate the VectorDB collection
collection_name = "test_collection"

if collection_name in vector_db.get_collections():
    vector_db.delete_collection(collection_name)
vector_db.create_collection(
    collection_name=collection_name,
    dimension=embedder.dimension,
    distance=Distance.COSINE,
)

True

## 4. RAG Ingestion

### 4.1 Load the data as Documents

In [5]:
from pathlib import Path

current_dir = Path.cwd()
data_dir = current_dir / ".." / "data" / "data-loader"
data_loader.add_resource(dir=str(data_dir))
docs = data_loader.load()
len(docs)

2

### 4.2 Split the Document into Nodes

In [6]:
nodes = splitter.split(docs)
len(nodes)

47
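
The splitter above was configured with `max_len=300` and `overlap=30`. The sketch below shows the general sliding-window idea behind such settings, assuming the units are words; whether the bodhilib splitter counts characters, words or tokens is not stated here, and `split_with_overlap` is a hypothetical helper rather than the plugin's code.

```python
from typing import List


def split_with_overlap(words: List[str], max_len: int, overlap: int) -> List[List[str]]:
    """Sliding window: each chunk holds at most max_len items and repeats
    the last `overlap` items of the previous chunk."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + max_len])
        if start + max_len >= len(words):
            break  # this window already reached the end of the input
    return chunks


toy_words = "a b c d e f g h i j".split()
toy_chunks = split_with_overlap(toy_words, max_len=4, overlap=1)
print([" ".join(chunk) for chunk in toy_chunks])
# ['a b c d', 'd e f g', 'g h i j']
```

The overlap means a sentence that straddles a chunk boundary is still seen in full by at least one chunk, at the cost of storing some text twice.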

### 4.3 Enrich the Nodes with Embeddings

In [7]:
_ = embedder.embed(nodes)

print(trim_text(nodes[0].embedding))

[-0.046271130442619324, -0.0969996377825737, 0.09419207274913788, 0.014190234243869781, 0.02280914969742298, -0.01539283711463213, ...]


### 4.4 Insert the Nodes into VectorDB

In [8]:
_ = vector_db.upsert(collection_name, nodes)

print(vector_db.client.get_collection(collection_name).vectors_count)

47


## 5. RAG Query

### 5.1 Input Query

In [9]:
input_query = "According to Paul Graham, how to tackle when you are in doubt?"

### 5.2 Embed the Query

In [10]:
query_embedding = embedder.embed(input_query)

print(trim_text(query_embedding[0].embedding))

[-0.043427370488643646, 0.03586525842547417, 0.00045173554099164903, -0.00947035662829876, -0.021341439336538315, 0.02608679234981537, ...]


### 5.3 Get Similar Records

In [11]:
records = vector_db.query(collection_name, query_embedding[0].embedding, limit=5)

print(wrap_text(records[0].text))

who sits back and offers sophisticated-sounding criticisms of them. "It's easy to criticize" is true
in the most literal sense, and the route to great work is never easy.
There may be some jobs where it's an advantage to be cynical and pessimistic, but if you want to do
great work it's an advantage to be optimistic, even though that means you'll risk looking like a
fool sometimes. There's an old tradition of doing the opposite. The Old Testament says it's better
to keep quiet lest you look like a fool. But that's advice for seeming smart. If you actually want
to discover new things, it's better to take the risk of telling people your ideas.
Some people are naturally earnest, and with others it takes a conscious effort. Either kind of
earnestness will suffice. But I doubt it would be possible to do great work without being earnest.
It's so hard to do even if you are. You don't have enough margin for error to accommodate the
distortions introduced by being affected, intellectually dishon

### 5.4 Compose the Prompt

In [12]:
# prepare prompt template
from bodhilib import PromptTemplate

template = """Below are the text chunks from a blog/article. 
1. Read and understand the text chunks
2. After the text chunks, there are list of questions starting with `Question:`
3. Answer the questions from the information given in the text chunks
4. If you don't find the answer in the provided text chunks, say 'I couldn't find the answer to this question in the given text'


{% for text in texts %}
### START
{{ text }}
### END
{% endfor %}

Question: {{ query }}
Answer: 
"""

prompt_template = PromptTemplate(template=template, format='jinja2')

In [13]:
# compose the prompt
texts = [r.text for r in records]
prompt = prompt_template.to_prompts(texts=texts, query=input_query)

print(wrap_text(prompt[0].text))

Below are the text chunks from a blog/article.
1. Read and understand the text chunks
2. After the text chunks, there are list of questions starting with `Question:`
3. Answer the questions from the information given in the text chunks
4. If you don't find the answer in the provided text chunks, say 'I couldn't find the answer to this
question in the given text'
### START
who sits back and offers sophisticated-sounding criticisms of them. "It's easy to criticize" is true
in the most literal sense, and the route to great work is never easy.
There may be some jobs where it's an advantage to be cynical and pessimistic, but if you want to do
great work it's an advantage to be optimistic, even though that means you'll risk looking like a
fool sometimes. There's an old tradition of doing the opposite. The Old Testament says it's better
to keep quiet lest you look like a fool. But that's advice for seeming smart. If you actually want
to discover new things, it's better to take the risk of tel

### 5.5 Generate Response from LLM

In [14]:
response = llm.generate(prompt)

print(wrap_text(response.text))

According to the text, when in doubt, it is recommended to optimize for interestingness.


---

# Benefits of Bodhilib

## 1. Benefits of Plugin Architecture

- Modular and clean core
- Easy to understand, decluttered core
- Easy to extend for specific use-case using custom plugins
- Interchangeable components, uniform interface
- Selective integration, install only what you need
- Democratic and distributed development, no single repo for all implementations
- No single-repo bottleneck: no long PR queue, pile of open issues, or dependence on a few core committers
- Stable core library, scheduled releases
- Independent plugin library fixes and releases
- No preferred partner integrations; a common interface for third parties to implement
- No paywall, plus-offering, or walled-garden approach
- No reinventing the wheel in a closed, custom ecosystem

## 2. Composable Functional Interface

See [Composability](../deep-dive/Composability.ipynb) notebook for the full example.

In [15]:
from fn import F # fn.py

# recreate the collection so the pipeline starts from an empty store
if collection_name in vector_db.get_collections():
    vector_db.delete_collection(collection_name)
vector_db.create_collection(
    collection_name=collection_name,
    dimension=embedder.dimension,
    distance=Distance.COSINE,
)

# RAG Ingestion Pipeline
f = (
    F(data_loader.load)
    >> F(splitter.split)
    >> F(embedder.embed)
    >> F(lambda nodes: vector_db.upsert(collection_name=collection_name, nodes=nodes))
)()

# RAG Query Pipeline
response = (
    F(embedder.embed)
    >> F(
        lambda e: vector_db.query(
            collection_name=collection_name, embedding=e[0].embedding, limit=5
        )
    )
    >> F(lambda nodes: prompt_template.to_prompts(query=input_query, texts = [node.text for node in nodes]))
    >> F(llm.generate)
)(input_query)

print(wrap_text(response.text))

According to Paul Graham, when in doubt, one should optimize for interestingness and give different
types of work a chance to show what they're like.
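
The `>>` chains above read left to right: the output of each stage becomes the input of the next. The stand-in class below sketches that behaviour with the standard library only. `Pipe` is a hypothetical name for illustration and is not fn.py's `F` implementation.

```python
class Pipe:
    """Wraps a callable; `a >> b` builds a new Pipe that runs a, then b."""

    def __init__(self, func):
        self.func = func

    def __rshift__(self, other):
        # accept either another Pipe or a plain callable on the right-hand side
        nxt = other.func if isinstance(other, Pipe) else other
        return Pipe(lambda *args, **kwargs: nxt(self.func(*args, **kwargs)))

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)


toy_pipeline = Pipe(str.strip) >> Pipe(str.split) >> Pipe(len)
print(toy_pipeline("  when in doubt  "))  # 3
```

Only the first stage receives the arguments passed to the pipeline; every later stage receives a single value, which is why the notebook wraps multi-argument calls such as `vector_db.query` in a `lambda`.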


---

# Roadmap and Contributing

Refer to the [Roadmap](../contributing/roadmap.md) page for the draft roadmap and for how to contribute to Bodhilib.

# Contact

GitHub: [https://github.com/BodhiSearch/bodhilib](https://github.com/BodhiSearch/bodhilib)


Guide: [https://github.com/BodhiSearch/bodhilib-guide](https://github.com/BodhiSearch/bodhilib-guide)


Follow on Twitter [@BodhilibAI](https://twitter.com/BodhilibAI)


Join us on Bodhilib WhatsApp Community - [https://bit.ly/bodhilib-wa](https://bit.ly/bodhilib-wa)


![Bodhilib WhatsApp Community](../images/bodhilib-wa.svg)

---

🙏🏽 Thanks 🙏🏽