# Build a Contextual RAG System with Hybrid Search and Reranking

![](https://i.imgur.com/BNeL4OZ.png)

## Install OpenAI and LangChain dependencies

In [None]:
!pip install langchain==0.3.4
!pip install langchain-openai==0.2.3
!pip install langchain-community==0.3.3
!pip install jq==1.8.0
!pip install pymupdf==1.24.12
!pip install httpx==0.27.2
!pip install -qU langchain-openai

Collecting langchain==0.3.4
  Downloading langchain-0.3.4-py3-none-any.whl.metadata (7.1 kB)
Collecting langsmith<0.2.0,>=0.1.17 (from langchain==0.3.4)
  Downloading langsmith-0.1.147-py3-none-any.whl.metadata (14 kB)
Downloading langchain-0.3.4-py3-none-any.whl (1.0 MB)
Downloading langsmith-0.1.147-py3-none-any.whl (311 kB)
Installing collected packages: langsmith, langchain
  Attempting uninstall: langsmith
    Found existing installation: langsmith 0.2.10
    Uninstalling langsmith-0.2.10:
      Successfully uninstalled langsmith-0.2.10
  Attempting uninstall: langchain
    Found existing installation: langchain 0.3.14
    Uninstalling langchain-0.3.14:
      Successfully uninstalled langchain-0.3.14
Successfully installed langchain-0.

## Install Chroma Vector DB LangChain wrapper

In [None]:
!pip install langchain-chroma==0.1.4

Collecting langchain-chroma==0.1.4
  Downloading langchain_chroma-0.1.4-py3-none-any.whl.metadata (1.6 kB)
Collecting chromadb!=0.5.4,!=0.5.5,<0.6.0,>=0.4.0 (from langchain-chroma==0.1.4)
  Downloading chromadb-0.5.23-py3-none-any.whl.metadata (6.8 kB)
Collecting fastapi<1,>=0.95.2 (from langchain-chroma==0.1.4)
  Downloading fastapi-0.115.6-py3-none-any.whl.metadata (27 kB)
Collecting build>=1.0.3 (from chromadb!=0.5.4,!=0.5.5,<0.6.0,>=0.4.0->langchain-chroma==0.1.4)
  Downloading build-1.2.2.post1-py3-none-any.whl.metadata (6.5 kB)
Collecting chroma-hnswlib==0.7.6 (from chromadb!=0.5.4,!=0.5.5,<0.6.0,>=0.4.0->langchain-chroma==0.1.4)
  Downloading chroma_hnswlib-0.7.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (252 bytes)
Collecting uvicorn>=0.18.3 (from uvicorn[standard]>=0.18.3->chromadb!=0.5.4,!=0.5.5,<0.6.0,>=0.4.0->langchain-chroma==0.1.4)
  Downloading uvicorn-0.34.0-py3-none-any.whl.metadata (6.5 kB)
Collecting posthog>=2.4.0 (from chromadb!=0.5.4,!=0.

## Install BM25 dependencies

In [None]:
!pip install rank_bm25==0.2.2

Collecting rank_bm25==0.2.2
  Downloading rank_bm25-0.2.2-py3-none-any.whl.metadata (3.2 kB)
Downloading rank_bm25-0.2.2-py3-none-any.whl (8.6 kB)
Installing collected packages: rank_bm25
Successfully installed rank_bm25-0.2.2


## Enter OPENROUTER API Key

In [None]:
import getpass
import os

if not os.environ.get("OPENROUTER_API_KEY"):
  os.environ["OPENROUTER_API_KEY"] = getpass.getpass("Enter API key for OpenRouter: ")

Enter API key for OpenRouter: ··········


### Get the dataset

In [None]:
!unzip rag_docs.zip

Archive:  rag_docs.zip
   creating: rag_docs/
  inflating: rag_docs/buddhism_wikidata.jsonl  
  inflating: rag_docs/Deepseekqna.pdf  
  inflating: rag_docs/norbuqna.pdf   


### Load and Process JSON Documents

In [None]:
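from langchain_community.document_loaders import JSONLoader

# Each line of the .jsonl file is one JSON record. jq_schema='.' keeps the whole
# record, and text_content=False allows page_content to be a serialized JSON
# object instead of requiring a plain string.
loader = JSONLoader(file_path='./rag_docs/buddhism_wikidata.jsonl',
                    jq_schema='.',
                    text_content=False,
                    json_lines=True)
wiki_docs = loader.load()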

In [None]:
len(wiki_docs)

100

In [None]:
wiki_docs[3]

Document(metadata={'source': '/content/rag_docs/buddhism_wikidata.jsonl', 'seq_num': 4}, page_content='{"id": "BUD004", "title": "Non-existence Craving", "paragraphs": ["Vibhava-tanha represents the craving for non-existence or self-annihilation, often arising from aversion to suffering and the desire to escape from unpleasant experiences."]}')

In [None]:
import json
from langchain_core.documents import Document

wiki_docs_processed = []

for doc in wiki_docs:
    # page_content holds the raw JSON record as a string; parse it back into a dict
    record = json.loads(doc.page_content)
    metadata = {
        "title": record['title'],
        "id": record['id'],
        "source": "Wikipedia",
        "page": 1
    }
    # join the paragraph list into a single text block for indexing
    data = ' '.join(record['paragraphs'])
    wiki_docs_processed.append(Document(page_content=data, metadata=metadata))

In [None]:
wiki_docs_processed[3]

Document(metadata={'title': 'Non-existence Craving', 'id': 'BUD004', 'source': 'Wikipedia', 'page': 1}, page_content='Vibhava-tanha represents the craving for non-existence or self-annihilation, often arising from aversion to suffering and the desire to escape from unpleasant experiences.')

### Load and Process PDF documents

#### Create Chunk Contexts for Contextual Retrieval

![](https://i.imgur.com/cqJqEv0.png)


- Prepend chunk-specific explanatory context to each chunk before building the vector DB embeddings and the BM25 keyword index.
- The added context gives each chunk keywords and phrases that tie it to the overall document, which the chunk may lack on its own.
- Chunks that match queries better improve retrieval, which in turn improves the answers the RAG pipeline generates from that context.
- The contextual chunking prompt can be written in various ways depending on your use case.
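The idea in the bullets above can be sketched with plain strings. This is only an illustration: `build_contextual_chunk` is a hypothetical helper name, and the context and chunk values are made-up examples, not output from the notebook.

```python
def build_contextual_chunk(context: str, chunk: str) -> str:
    """Prepend the generated context to the raw chunk text."""
    return context.strip() + "\n" + chunk.strip()


# Hypothetical example values, for illustration only.
context = "Focuses on the twelve links of dependent origination."
chunk = "Ignorance conditions mental formations, which condition consciousness."

contextual_chunk = build_contextual_chunk(context, chunk)
print(contextual_chunk)
```

The combined string is what gets embedded and keyword-indexed, so a query that mentions the document's topic can match a chunk even when the chunk text alone never names that topic.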

In [None]:
from langchain_openai import ChatOpenAI

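# NOTE: base_url points at DeepSeek's own OpenAI-compatible endpoint, so the key
# stored in OPENROUTER_API_KEY above must be a DeepSeek API key for this call to work.
# The variable name `chatgpt` is kept because later cells reference it.
chatgpt = ChatOpenAI(
    base_url="https://api.deepseek.com",
    api_key=os.environ["OPENROUTER_API_KEY"],
    model="deepseek-chat",
)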

In [None]:
# create chunk context generation chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser


def generate_chunk_context(document, chunk):

    chunk_process_prompt = """You are an AI assistant specializing in research paper and books analysis.
                            Your task is to provide brief, relevant context for a chunk of text
                            based on the following research paper and books.

                            Here is the research paper/books:
                            <paper>
                            {paper}
                            </paper>

                            Here is the chunk we want to situate within the whole document:
                            <chunk>
                            {chunk}
                            </chunk>

                            Provide a concise context (3-4 sentences max) for this chunk,
                            considering the following guidelines:

                            - Give a short succinct context to situate this chunk within the overall document
                            for the purposes of improving search retrieval of the chunk.
                            - Answer only with the succinct context and nothing else.
                            - Context should be mentioned like 'Focuses on ....'
                            do not mention 'this chunk or section focuses on...'

                            Context:
                        """

    prompt_template = ChatPromptTemplate.from_template(chunk_process_prompt)

    agentic_chunk_chain = (prompt_template
                                |
                            chatgpt
                                |
                            StrOutputParser())

    context = agentic_chunk_chain.invoke({'paper': document, 'chunk': chunk})

    return context

In [None]:
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
import uuid

def create_contextual_chunks(file_path, chunk_size=3500, chunk_overlap=0):

    print('Loading pages:', file_path)
    loader = PyMuPDFLoader(file_path)
    doc_pages = loader.load()

    print('Chunking pages:', file_path)
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size,
                                              chunk_overlap=chunk_overlap)
    doc_chunks = splitter.split_documents(doc_pages)

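    print('Generating contextual chunks:', file_path)
    # Rebuild the full document text from the chunks. It is passed to the LLM once per
    # chunk, so the number and size of LLM calls grow with document length.
    original_doc = '\n'.join([doc.page_content for doc in doc_chunks])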
    contextual_chunks = []
    for chunk in doc_chunks:
        chunk_content = chunk.page_content
        chunk_metadata = chunk.metadata
        chunk_metadata_upd = {
            'id': str(uuid.uuid4()),
            'page': chunk_metadata['page'],
            'source': chunk_metadata['source'],
            'title': chunk_metadata['source'].split('/')[-1]
        }
        context = generate_chunk_context(original_doc, chunk_content)
        contextual_chunks.append(Document(page_content=context+'\n'+chunk_content,
                                          metadata=chunk_metadata_upd))
    print('Finished processing:', file_path)
    print()
    return contextual_chunks

In [None]:
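# Loading back from the JSON file written by the save cell further down.
# This reuses chunks from a previous run so their contexts need not be regenerated.
# Skip this cell on a first run, when contextual_chunks.json does not exist yet.
from langchain_core.documents import Document

def load_chunks_from_json(file_path):
    with open(file_path, 'r') as f:
        data = json.load(f)
    return [Document(page_content=item["page_content"], metadata=item["metadata"]) for item in data]

# Loading chunks
paper_docs = load_chunks_from_json("contextual_chunks.json")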

In [None]:
from glob import glob

pdf_files = glob('./rag_docs/*.pdf')
pdf_files

['./rag_docs/norbuqna.pdf', './rag_docs/Deepseekqna.pdf']

In [None]:
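# paper_docs already holds the chunks reloaded from contextual_chunks.json above;
# uncomment the next line to start from an empty list instead.
#paper_docs = []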
for fp in pdf_files:
    paper_docs.extend(create_contextual_chunks(file_path=fp, chunk_size=3500))

Loading pages: ./rag_docs/norbuqna.pdf
Chunking pages: ./rag_docs/norbuqna.pdf
Generating contextual chunks: ./rag_docs/norbuqna.pdf
Finished processing: ./rag_docs/norbuqna.pdf

Loading pages: ./rag_docs/Deepseekqna.pdf
Chunking pages: ./rag_docs/Deepseekqna.pdf
Generating contextual chunks: ./rag_docs/Deepseekqna.pdf
Finished processing: ./rag_docs/Deepseekqna.pdf



In [None]:
import json

# Convert chunks to a serializable format
def save_chunks_to_json(chunks, file_path):
    serializable_chunks = [
        {
            "page_content": chunk.page_content,
            "metadata": chunk.metadata
        } for chunk in chunks
    ]
    with open(file_path, 'w') as f:
        json.dump(serializable_chunks, f, indent=4)

# Saving to a file (relative path, matching the one read by load_chunks_from_json)
save_chunks_to_json(paper_docs, "contextual_chunks.json")

In [None]:
len(paper_docs)

503

In [None]:
paper_docs[0]

Document(metadata={'id': '93e2fe8f-c971-41ac-944e-38f400b09801', 'page': 0, 'source': './rag_docs/dependant-ori-multi-views.pdf', 'title': 'dependant-ori-multi-views.pdf'}, page_content="Focuses on introducing the research paper's thesis and scope regarding Pratityasamutpada (Dependent Origination) in Buddhist philosophy. Presents the title page, authors from Nalanda University, and abstract that outlines the study's examination of causality, liberation, and the twelve links of dependent origination. The abstract emphasizes how understanding Pratityasamutpada provides insights into breaking the cycle of suffering and achieving enlightenment through awareness of impermanence and interconnectivity.\nwww.ijcrt.org                                                              © 2023 IJCRT | Volume 11, Issue 10 October 2023 | ISSN: 2320-2882 \nIJCRT2310409 \nInternational Journal of Creative Research Thoughts (IJCRT) www.ijcrt.org \nd626 \n \n“Analyzing The Concept Of Pratityasamutpada \n(De

### Combine all document chunks in one list

In [None]:
len(wiki_docs_processed)

100

In [None]:
total_docs = wiki_docs_processed + paper_docs
len(total_docs)

603

## Index Document Chunks and Embeddings in Vector DB

Here we initialize a connection to a Chroma vector DB client. Because we want the index saved to disk, we pass the directory where the data should be persisted.

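### Hugging Face Embedding Model

LangChain can also access OpenAI embedding models such as `text-embedding-3-small` and `text-embedding-3-large`, but this notebook uses an open-source model instead: `sentence-transformers/all-MiniLM-L6-v2`, loaded through the `langchain_huggingface` package. It is a small model that produces 384-dimensional embeddings and runs locally.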

In [None]:
!pip install langchain_huggingface

Collecting langchain_huggingface
  Downloading langchain_huggingface-0.1.2-py3-none-any.whl.metadata (1.3 kB)
Collecting tokenizers>=0.19.1 (from langchain_huggingface)
  Downloading tokenizers-0.21.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Downloading langchain_huggingface-0.1.2-py3-none-any.whl (21 kB)
Downloading tokenizers-0.21.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)
Installing collected packages: tokenizers, langchain_huggingface
  Attempting uninstall: tokenizers
    Found existing installation: tokenizers 0.20.3
    Uninstalling tokenizers-0.20.3:
      Successfully uninstalled tokenizers-0.20.3
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
chromadb 0.5.23 requires

In [None]:
from langchain_huggingface import HuggingFaceEmbeddings

# Load a pre-trained model from Hugging Face
hf_embed_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



### Vector DB Indexing for Semantic Search

In [None]:
from langchain_chroma import Chroma

# create vector DB of docs and embeddings - takes < 30s on Colab
chroma_db = Chroma.from_documents(documents=total_docs,
                                  collection_name='my_context_db',
                                  embedding=hf_embed_model,
                                  # need to set the distance function to cosine else it uses euclidean by default
                                  # check https://docs.trychroma.com/guides#changing-the-distance-function
                                  collection_metadata={"hnsw:space": "cosine"},
                                  persist_directory="./my_context_db")

### Load Vector DB from disk

This shows that once a vector database is persisted on disk, you can load it and reconnect to it at any time.

In [None]:
# load from disk
chroma_db = Chroma(persist_directory="./my_context_db",
                   collection_name='my_context_db',
                   embedding_function=hf_embed_model)

In [None]:
chroma_db

<langchain_chroma.vectorstores.Chroma at 0x7970f16914b0>

### Semantic Similarity based Retrieval

We use cosine similarity here and retrieve the 5 documents most similar to the user's query.

In [None]:
similarity_retriever = chroma_db.as_retriever(search_type="similarity",
                                              search_kwargs={"k": 5})
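To make the retrieval step concrete, here is a minimal pure-Python sketch of cosine-similarity top-k search. The vectors and document ids are made up for illustration, and the exhaustive scan is an assumption chosen for clarity; Chroma searches an approximate HNSW index rather than comparing the query against every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=5):
    """Return the k (doc_id, score) pairs with the highest similarity."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (hypothetical values).
doc_vecs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.1, 0.0], doc_vecs, k=2)
print(results)
```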

### BM25 Indexing for Keyword based Retrieval

In [None]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(documents=total_docs,
                                              k=5)
bm25_retriever

BM25Retriever(vectorizer=<rank_bm25.BM25Okapi object at 0x797100fdf490>, k=5)
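For intuition about what a BM25 index scores, here is a small self-contained sketch of the Okapi BM25 formula. It is not the `rank_bm25` implementation: the lowercased whitespace tokenizer, the smoothed IDF variant, and the `k1` and `b` values are assumptions chosen for illustration, and `bm25_scores` is a hypothetical helper that expects a non-empty document list.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with an Okapi BM25 variant."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF that stays positive even for very common terms.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "craving for sensual pleasure",
    "dependent origination explains suffering",
    "the twelve links of dependent origination",
]
scores = bm25_scores("dependent origination", docs)
print(scores)
```

Both matching documents contain each query term once, so the shorter one scores higher because of the length normalization term.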

## Build Retrieval Strategy

### Build Base Ensemble Retriever

In [None]:
from langchain.retrievers import EnsembleRetriever

# combine the rankings of both retrievers with weighted reciprocal rank fusion (equal weights)
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, similarity_retriever],
    weights=[0.5, 0.5]
)
ensemble_retriever

EnsembleRetriever(retrievers=[BM25Retriever(vectorizer=<rank_bm25.BM25Okapi object at 0x797100fdf490>, k=5), VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x7970f16914b0>, search_kwargs={'k': 5})], weights=[0.5, 0.5])
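Weighted reciprocal rank fusion can be sketched in a few lines. This is an illustrative version, not LangChain's code: `weighted_rrf` is a hypothetical helper, the document ids are made up, and the constant `c=60` is the value commonly used for RRF (assumed here rather than read from the notebook).

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several rankings: each list adds weight / (rank + c) to a document's score."""
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical rankings from a keyword retriever and a vector retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]

fused = weighted_rrf([bm25_ranking, vector_ranking], weights=[0.5, 0.5])
print(fused)
```

Because only ranks are used, the fusion does not need BM25 scores and cosine similarities to be on a comparable scale; documents ranked well by both retrievers rise to the top.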

### Chained Retrieval with Reranker

In [None]:
from langchain_community.cross_encoders import HuggingFaceCrossEncoder
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain.retrievers import ContextualCompressionRetriever

# download an open-source reranker model - BAAI/bge-reranker-v2-m3
reranker = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-v2-m3")
reranker_compressor = CrossEncoderReranker(model=reranker, top_n=5)
# Final retriever: the cross-encoder rescores the ensemble's candidates
# (up to 10 unique chunks, 5 from each base retriever) and keeps the 5 best
final_retriever = ContextualCompressionRetriever(
    base_compressor=reranker_compressor,
    base_retriever=ensemble_retriever
)
final_retriever

ContextualCompressionRetriever(base_compressor=CrossEncoderReranker(model=HuggingFaceCrossEncoder(client=<sentence_transformers.cross_encoder.CrossEncoder.CrossEncoder object at 0x7971096f1030>, model_name='BAAI/bge-reranker-v2-m3', model_kwargs={}), top_n=5), base_retriever=EnsembleRetriever(retrievers=[BM25Retriever(vectorizer=<rank_bm25.BM25Okapi object at 0x79710c86bf70>, k=5), VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x7971096f3e80>, search_kwargs={'k': 5})], weights=[0.5, 0.5]))
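The retrieve-then-rerank pattern can be sketched independently of any model. In this illustration the scoring function is a deliberately crude stand-in (token overlap) for the cross-encoder, which instead reads the query and the chunk together; `overlap_score`, `rerank`, and the candidate texts are all hypothetical.

```python
def overlap_score(query, text):
    """Stand-in relevance score: fraction of query tokens that appear in the text."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & text_tokens) / len(query_tokens)


def rerank(query, candidates, score_fn, top_n=5):
    """Score each (query, candidate) pair and keep the top_n best candidates."""
    scored = [(score_fn(query, text), text) for text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_n]]


# Hypothetical candidates, as a first-stage retriever might return them.
candidates = [
    "meditation on emptiness",
    "dependent origination and the twelve links",
    "what dependent origination is and is not",
]
reranked = rerank("what is dependent origination", candidates, overlap_score, top_n=2)
print(reranked)
```

The design point is the two stages: a cheap first stage gathers a broad candidate set, and a more expensive scorer is applied only to those few candidates.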

In [None]:
from IPython.display import display, Markdown

def display_docs(docs):
    for doc in docs:
        print('Metadata:', doc.metadata)
        print('Content Brief:')
        display(Markdown(doc.page_content[:1000]))
        print()

In [None]:
query = "what is dependent origination?"
top_docs = final_retriever.invoke(query)
display_docs(top_docs)

Metadata: {'id': '781d93f3-7e60-421b-beb6-9f7a6cffa4dd', 'page': 10, 'source': './rag_docs/Early_Meanings_of_Dependent_Origination.pdf', 'title': 'Early_Meanings_of_Dependent_Origination.pdf'}
Content Brief:


Focuses on analyzing the meaning and scope of the abstract formula of dependent-origination in early Buddhism. Examines textual evidence to argue that dependent-origination deals exclusively with mental conditioning through the 12 links, rather than being a general principle about all phenomena. Challenges the common interpretation that dependent-origination refers broadly to how all things exist in dependence on other things.
that is—exists in dependence.’’ We are concerned with the meaning of the
abstract formula quoted above, as well as with the meaning of the term
idappaccayata¯.
I am arguing that the abstract formula of dependent-origination deals exclusively
with the process encapsulated in the 12 links. When the Buddha says ‘‘When
this is, that is, etc.,’’ he is speaking only of mental conditioning, and is saying
absolutely nothing about existence per se. The most signiﬁcant evidence for this
fact is that the phrase ‘‘imasmim: sati idam: hoti…’’ never occurs detached from the
articulation 


Metadata: {'id': '2f182097-e0f6-4ec0-b7a5-9d2dc5f9707e', 'page': 3, 'source': './rag_docs/8678-Article Text-8486-1-10-20110301.pdf', 'title': '8678-Article Text-8486-1-10-20110301.pdf'}
Content Brief:


Focuses on introducing and defining the special theory of dependent origination (pratityasamutpada) in Buddhism, which is presented as one of Buddhism's most central theories. The text appears as the main article in an academic Buddhist studies journal (JIABS Volume 9, 1986) and begins exploring the twelve components (nidanas) of dependent origination, starting with an examination of nescience (avidya). Serves as the foundational opening to a comprehensive analysis of how beings arise in samsara and potentially achieve liberation.
The Special Theory of Pratityasamutpada: 
The Cycle of Dependent Origination1 
by Geshe Lhundub Sopa 
"Whoever sees dependent origination 
sees the Dharma. Whoever sees the 
Dharma sees the Buddha." (Majjhima 
Nikdya, 1:28) 
The idea of dependent origination, the seeing of which is 
said to be coextensive with the seeing of the Dharma itself, is 
clearly one of the most central theories in all of Buddhism. 
There is both the general theory and a special theor


Metadata: {'id': 'e28cf316-597f-4a5e-b975-37053a14d24b', 'page': 2, 'source': './rag_docs/Early_Meanings_of_Dependent_Origination.pdf', 'title': 'Early_Meanings_of_Dependent_Origination.pdf'}
Content Brief:


Focuses on challenging the common scholarly interpretation of dependent-origination (pratītya-samutpāda) in early Buddhism. The author argues against the widespread view that dependent-origination describes how all phenomena exist in dependence on other phenomena. Instead, the author contends that dependent-origination originally dealt solely with mental conditioning and the nature of self, rather than being a general ontological principle about reality.
scholars support such a claim.3 Most popular is the view that the teaching of the
12 links of dependent-origination—which as we will soon see discusses the
workings of the mind—is a ‘‘particular case’’ of the more general principle of
idappaccayata¯ (‘‘dependence’’) and of the abstract formula quoted above.
This paper will claim that the reading of dependent-origination thus
described deviates signiﬁcantly from the initial meaning of the concept.
Although the teaching does have ontological implications, it is not an
ontological teachin


Metadata: {'id': 'ccea3e88-cddf-4253-9a8c-bb9afd885529', 'page': 5, 'source': './rag_docs/MN 121_ The Shorter Discourse on EmptinessBhikkhu Sujato.pdf', 'title': 'MN 121_ The Shorter Discourse on EmptinessBhikkhu Sujato.pdf'}
Content Brief:


Focuses on the progression through meditative states in the Buddha's teaching on emptiness. Describes the transition from perceptions of earth and space to the dimension of infinite consciousness, explaining how the meditator recognizes what is empty and what remains. Details how stress is reduced as coarser perceptions fall away and more subtle states of consciousness emerge.
So ‘suññamidaṃ saññāgataṃ araññasaññāyā’ti pajānāti, ‘suññamidaṃ saññāgataṃ
pathavīsaññāyā’ti pajānāti, ‘atthi cevidaṃ asuññataṃ yadidaṃ—
SC 6.7
There is only this that is not emptiness, namely the oneness dependent on
the perception of the dimension of inﬁnite space.’
ākāsānañcāyatanasaññaṃ paṭicca ekattan’ti.
SC 6.8
And so they regard it as empty of what is not there, but as to what remains
they understand that it is present.
Iti yañhi kho tattha na hoti tena taṃ suññaṃ samanupassati, yaṃ pana tattha
avasiṭṭhaṃ hoti taṃ ‘santamidaṃ atthī’ti pajānāti.
SC 6.9
That’s how emptiness is born in them—genuine, undistorte


Metadata: {'id': 'ae36dcae-4f00-4c00-ac69-e5ed2602e754', 'page': 7, 'source': './rag_docs/MN 121_ The Shorter Discourse on EmptinessBhikkhu Sujato.pdf', 'title': 'MN 121_ The Shorter Discourse on EmptinessBhikkhu Sujato.pdf'}
Content Brief:


Focuses on the progression through meditative states in the Buddha's teaching on emptiness, specifically the transition from infinite space and consciousness to the dimension of nothingness. This is part of a systematic meditation instruction where each level transcends the previous one by recognizing what is empty and what remains. The passage describes how the meditator moves beyond perceptions of infinite space and consciousness to focus solely on nothingness, while maintaining awareness of what is present and absent.
SC 8.4
‘Here there is no stress due to the perception of the dimension of inﬁnite
space or the perception of the dimension of inﬁnite consciousness.
‘ye assu darathā ākāsānañcāyatanasaññaṃ paṭicca tedha na santi, ye assu darathā
viññāṇañcāyatanasaññaṃ paṭicca tedha na santi, atthi cevāyaṃ darathamattā
yadidaṃ—
SC 8.5
There is only this modicum of stress, namely the oneness dependent on
the perception of the dimension of nothingness.’
ākiñcaññāyatanasaññaṃ paṭicca ekatta




## Build the RAG Pipeline

In [None]:
from langchain_core.prompts import ChatPromptTemplate

rag_prompt = """You are an assistant who is an expert in question-answering tasks.
                Answer the following question using only the following pieces of retrieved context.
                If the answer is not in the context, do not make up answers, just say that you don't know.
                Keep the answer detailed and well formatted based on the information from the context.

                Question:
                {question}

                Context:
                {context}

                Answer:
            """

rag_prompt_template = ChatPromptTemplate.from_template(rag_prompt)

In [None]:
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

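# The dict step fans the incoming question out to two branches:
#   "context"  -> retrieve and rerank chunks, then join them into one string
#   "question" -> pass the question through unchanged
# The filled prompt then goes to the LLM.
qa_rag_chain = (
    {
        "context": (final_retriever
                      |
                    format_docs),
        "question": RunnablePassthrough()
    }
      |
    rag_prompt_template
      |
    chatgpt
)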
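The data flow of this chain can be sketched without LangChain, using plain functions. Everything here is a stand-in for illustration: `run_rag`, `build_prompt`, `fake_retrieve`, and `echo_generate` are hypothetical names, and the prompt is a shortened version of the one above.

```python
def format_docs(texts):
    """Join retrieved chunk texts into one context string."""
    return "\n\n".join(texts)


def build_prompt(question, context):
    """Fill a simplified RAG prompt with the question and the retrieved context."""
    return f"Question:\n{question}\n\nContext:\n{context}\n\nAnswer:"


def run_rag(question, retrieve, generate):
    """Retrieve context for the question, build the prompt, and call the generator."""
    context = format_docs(retrieve(question))
    return generate(build_prompt(question, context))


def fake_retrieve(question):
    # Stand-in for the reranking retriever; returns fixed chunks.
    return ["chunk one", "chunk two"]


def echo_generate(prompt):
    # Stand-in for the LLM; returns the prompt so the flow can be inspected.
    return prompt


answer = run_rag("What is dependent origination?", fake_retrieve, echo_generate)
print(answer)
```

Swapping `fake_retrieve` and `echo_generate` for a real retriever and model call gives the same shape as the LangChain pipeline: the question reaches the prompt twice, once directly and once through the retrieved context.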

In [None]:
from IPython.display import display, Markdown

query = "What is dependent origination?"
result = qa_rag_chain.invoke(query)
display(Markdown(result.content))

Dependent origination, or *pratītyasamutpāda* in Sanskrit, is a central concept in Buddhism that explains the interdependent nature of phenomena, particularly focusing on the process of mental conditioning and the cycle of suffering (*samsara*). According to the context provided, dependent origination is not a general ontological principle about all phenomena, as commonly interpreted, but rather a specific framework that deals exclusively with the mental conditioning process encapsulated in the **12 links of dependent origination**.

### Key Points:
1. **Exclusive Focus on Mental Conditioning**:  
   The abstract formula of dependent origination, often summarized as "When this is, that is; when this arises, that arises; when this ceases, that ceases," is interpreted as referring solely to the 12 links of mental conditioning. These links describe how ignorance (*avidya*) leads to mental formations (*samskara*), which in turn lead to consciousness (*vijnana*), and so on, culminating in aging, death, and suffering. This process is cyclical and perpetuates the cycle of rebirth (*samsara*).

2. **12 Links of Dependent Origination**:  
   The 12 links are:
   - Ignorance (*avidya*)
   - Mental formations (*samskara*)
   - Consciousness (*vijnana*)
   - Name and form (*nama-rupa*)
   - Six sense bases (*sadayatana*)
   - Contact (*sparsha*)
   - Feeling (*vedana*)
   - Craving (*trishna*)
   - Clinging (*upadana*)
   - Becoming (*bhava*)
   - Birth (*jati*)
   - Aging and death (*jara-marana*)

   These links illustrate how mental states and actions condition future experiences, leading to suffering and rebirth.

3. **Not a General Ontological Principle**:  
   The context challenges the widespread view that dependent origination describes how all phenomena exist in dependence on other phenomena. Instead, it argues that the concept originally dealt with the workings of the mind and the nature of the self (or lack thereof). It is not a teaching about the nature of reality in general but rather an inquiry into the process of mental conditioning and the absence of an inherent self (*anatta*).

4. **Special Theory of Dependent Origination**:  
   The "special theory" of dependent origination applies specifically to the genesis of a sentient being in *samsara* and the means of liberation from it. This theory is encapsulated in the 12 links, which explain how beings are bound to the cycle of birth and death and how they can achieve liberation (*nirvana*) by breaking this chain.

5. **Implications for Suffering and Liberation**:  
   Dependent origination is not just a theoretical framework but also a practical tool for understanding the causes of suffering and the path to liberation. By recognizing the interdependent nature of mental states and actions, one can work to eliminate ignorance and craving, thereby breaking the cycle of suffering.

### Conclusion:
Dependent origination is a profound teaching in Buddhism that explains the process of mental conditioning and the cycle of suffering through the 12 links. It is not a general principle about the interdependence of all phenomena but rather a specific framework focused on the mind and the nature of the self. Understanding dependent origination is essential for comprehending the Buddhist path to liberation from suffering.

In [None]:
from IPython.display import display, Markdown

query = "Craving as the attachment to inherent existence versus craving as the desire for sensual pleasure and existence."
result = qa_rag_chain.invoke(query)
display(Markdown(result.content))

The question explores the distinction between craving as attachment to inherent existence (emphasized in Mahāyāna Buddhism) and craving as desire for sensual pleasure and existence (emphasized in Theravāda Buddhism). Here’s a detailed explanation based on the provided context:

---

### **Craving as Attachment to Inherent Existence (Mahāyāna Perspective)**
In **Mahāyāna Buddhism**, particularly in the **Madhyamaka** school founded by **Nāgārjuna**, craving is understood as rooted in the misconception of **inherent existence** (*svabhāva*). This view is tied to the concept of **emptiness** (*śūnyatā*), which asserts that all phenomena lack an intrinsic, independent nature. 

- **Key Insight**: Craving arises from **ignorance** (*avidyā*)—the mistaken belief that things, including the self, possess inherent existence. This attachment to the illusion of permanence and independence leads to suffering.
- **Textual Reference**: In the *Mūlamadhyamakakārikā*, Nāgārjuna explains that phenomena are "dependently co-arisen" and thus empty of inherent existence. Craving is a result of failing to recognize this emptiness.
- **Practical Implication**: To overcome craving, one must realize the empty nature of all phenomena, thereby dissolving the attachment to inherent existence.

---

### **Craving as Desire for Sensual Pleasure and Existence (Theravāda Perspective)**
In **Theravāda Buddhism**, craving (*taṇhā*) is categorized into three types, focusing on psychological and experiential aspects:
1. **Kāma-taṇhā**: Craving for sensual pleasures (e.g., pleasant sights, sounds, tastes).
2. **Bhava-taṇhā**: Craving for existence or becoming (desire for continued existence in a particular state).
3. **Vibhava-taṇhā**: Craving for non-existence or annihilation (desire to escape unpleasant experiences).

- **Key Insight**: Craving is identified as the root cause of suffering (*dukkha*) in the **Second Noble Truth**. It perpetuates the cycle of birth and death (*saṃsāra*) by fueling attachment and clinging.
- **Textual Reference**: In the *Dhammacakkappavattana Sutta* (SN 56.11), the Buddha describes craving as the origin of suffering, linking it to renewed existence and the pursuit of pleasure.
- **Practical Implication**: Theravāda practitioners focus on mindfulness, ethical conduct, and insight meditation (*vipassanā*) to observe and let go of craving, recognizing the impermanent and selfless nature of phenomena.

---

### **Comparison and Synthesis**
- **Mahāyāna** delves into the **philosophical underpinnings** of craving, emphasizing the misconception of inherent existence as its root cause.
- **Theravāda** focuses on the **psychological manifestations** of craving, describing what we crave (sensual pleasures, existence, non-existence).
- **Common Ground**: Both traditions agree that craving is central to suffering and must be overcome for liberation. The Mahāyāna view can be seen as a deeper analysis of why we crave, while the Theravāda view describes what we crave.

---

### **Conclusion**
The two perspectives are complementary rather than contradictory. Mahāyāna’s emphasis on **emptiness** provides a profound understanding of the nature of craving, while Theravāda’s focus on **sensual and existential desires** offers practical guidance for overcoming craving in daily life. Both paths ultimately lead to the cessation of suffering through the realization of impermanence and the absence of inherent existence.

In [None]:
from IPython.display import display, Markdown

query = "How is the concept of compassion understood in Mahāyāna and Theravāda Buddhism?"
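# Send the query through the RAG chain built earlier in the notebook
result = qa_rag_chain.invoke(query)
# The answer text is Markdown, so render it instead of printing the raw string
display(Markdown(result.content))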

In both Mahāyāna and Theravāda Buddhism, compassion (*karuṇā*) is a central virtue, but its interpretation, emphasis, and role in practice differ between the two traditions.

### **Theravāda Buddhism**
In Theravāda, compassion is one of the four *Brahmavihāras* (divine abodes), alongside loving-kindness (*mettā*), sympathetic joy (*muditā*), and equanimity (*upekkhā*). It is understood as the desire to alleviate the suffering of others and is cultivated through meditation practices aimed at developing empathy and a heart free from ill-will.

- **Scope**: Compassion in Theravāda is often directed toward individuals and is practiced as part of the path to personal liberation (*arahantship*). The focus is on developing a mind free from hatred and capable of feeling empathy for others' suffering.
- **Practice**: Compassion is cultivated through specific meditation practices, such as *mettā-bhāvanā* (loving-kindness meditation) and *karuṇā-bhāvanā* (compassion meditation), where practitioners extend feelings of goodwill and compassion toward themselves and others.
- **Role in Path**: While compassion is important, it is not the primary focus of the path to enlightenment. Wisdom (*paññā*) is often emphasized more, and compassion is seen as one of several qualities to be developed.

### **Mahāyāna Buddhism**
In Mahāyāna, compassion takes on a more expansive and central role, often equated with the essence of the Bodhisattva path. The Bodhisattva ideal is to attain enlightenment not just for oneself but for the benefit of all sentient beings.

- **Scope**: Compassion in Mahāyāna is universal and all-encompassing. It extends to all beings without exception, and the ultimate expression of compassion is the Bodhisattva's vow to postpone their own final enlightenment until all beings are liberated.
- **Practice**: Compassion is deeply integrated into all aspects of Mahāyāna practice. The cultivation of *bodhicitta* (the mind of enlightenment) is central, involving the aspiration to achieve Buddhahood for the sake of all beings. Practices such as the *Six Perfections* (*Pāramitās*)—giving, ethical conduct, patience, effort, meditation, and wisdom—are all infused with compassion. Specific practices like *tonglen* (giving and taking) in Tibetan Buddhism are designed to cultivate compassion.
- **Figures**: Mahāyāna Buddhism emphasizes compassionate beings like Avalokiteśvara (the Bodhisattva of Compassion), who embodies the ideal of compassion and is often invoked for help and guidance.

### **Key Differences**
1. **Focus**: Theravāda emphasizes personal liberation with compassion as a supportive quality, while Mahāyāna places compassion at the core of the Bodhisattva's mission to save all beings.
2. **Scope**: Compassion in Theravāda is more individual and immediate, whereas in Mahāyāna, it is universal and all-encompassing.
3. **Role in Path**: In Theravāda, compassion is one of many qualities to be developed on the path to enlightenment. In Mahāyāna, compassion is inseparable from the path itself, as the Bodhisattva's primary motivation is the alleviation of all beings' suffering.

Despite these differences, both traditions agree on the fundamental importance of compassion as a key ethical and spiritual virtue in the Buddhist path.

In [None]:
from IPython.display import display, Markdown

query = "What role does the concept of emptiness (śūnyatā) play in Mahāyāna versus Theravāda Buddhism?"
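# Send the query through the RAG chain built earlier in the notebook
result = qa_rag_chain.invoke(query)
# The answer text is Markdown, so render it instead of printing the raw string
display(Markdown(result.content))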

The concept of **emptiness (śūnyatā)** plays a significant role in both **Mahāyāna** and **Theravāda Buddhism**, but its interpretation and emphasis differ between the two traditions.

### **Theravāda Buddhism**
In Theravāda, emptiness is primarily understood as the **absence of a permanent, independent self (anattā)** in all phenomena. This is closely tied to the **three characteristics of existence**: impermanence (*anicca*), suffering (*dukkha*), and non-self (*anattā*). 

- **Emptiness of Self**: Theravāda focuses on the emptiness of a self or soul in persons, emphasizing that what we consider the "self" is actually a collection of impermanent, interdependent processes (the five aggregates: form, sensation, perception, mental formations, and consciousness).
- **Practical Application**: Theravāda uses the concept of emptiness to support **insight meditation (Vipassanā)**, where practitioners observe the impermanent and selfless nature of phenomena to develop wisdom and attain liberation (*Nibbāna*).
- **Key Texts**: The *Cūḷasuññata Sutta* (MN 121) and *Mahāsuññata Sutta* (MN 122) in the *Majjhima Nikāya* discuss emptiness in this context.

### **Mahāyāna Buddhism**
In Mahāyāna, emptiness is a **central philosophical concept**, especially in the **Madhyamaka school** founded by Nāgārjuna. Here, emptiness is understood as the **lack of inherent existence in all phenomena**, not just the self.

- **Emptiness of Phenomena**: Mahāyāna extends the concept of emptiness to all phenomena, asserting that everything is empty of inherent existence and arises dependently (*pratītyasamutpāda*).
- **Philosophical Depth**: The Madhyamaka school elaborates on emptiness using dialectical methods to deconstruct inherent existence, arguing that true reality is beyond conceptual elaboration.
- **Practical Application**: In Mahāyāna, understanding emptiness is crucial for developing **bodhicitta** (the mind of enlightenment) and practicing the **Bodhisattva path**. It helps practitioners transcend dualistic thinking and cultivate compassion by seeing the interconnectedness of all beings.
- **Key Texts**: The *Heart Sutra* famously declares, "Form is emptiness, emptiness is form," and Nāgārjuna's *Mūlamadhyamakakārikā* explores the relationship between emptiness and dependent origination.

### **Key Differences**
1. **Scope**: Theravāda focuses on the **emptiness of self (anattā)**, while Mahāyāna extends this to the **emptiness of all phenomena (śūnyatā)**.
2. **Philosophical Depth**: Mahāyāna provides a more detailed and philosophical exploration of emptiness, particularly through the Madhyamaka school.
3. **Practical Emphasis**: In Theravāda, emptiness is primarily a tool for **insight and liberation**. In Mahāyāna, it is integral to the **Bodhisattva path**, emphasizing **compassion** and the **interconnectedness of all beings**.

### **Conclusion**
Both traditions use the concept of emptiness to help practitioners overcome attachment, develop wisdom, and realize the ultimate nature of reality. However, **Theravāda** emphasizes the **emptiness of self** within the framework of impermanence and non-self, while **Mahāyāna** expands emptiness to **all phenomena**, integrating it into a broader philosophical and soteriological framework centered on compassion and the Bodhisattva path.

In [None]:
from IPython.display import display, Markdown

query = "What is the nature of enlightenment according to Mahāyāna Buddhism versus Theravāda Buddhism?"
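# Send the query through the RAG chain built earlier in the notebook
result = qa_rag_chain.invoke(query)
# The answer text is Markdown, so render it instead of printing the raw string
display(Markdown(result.content))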

The nature of enlightenment in **Mahāyāna Buddhism** and **Theravāda Buddhism** differs in scope, path, and philosophical emphasis, though both traditions share the ultimate goal of liberation from suffering and the cycle of rebirth (saṃsāra). Below is a detailed comparison based on the provided context:

---

### **Theravāda Buddhism**
1. **Goal of Enlightenment**:  
   - The ultimate goal is **Nibbāna (Nirvāṇa)**, the cessation of suffering and the end of the cycle of rebirth.
   - Enlightenment is achieved through **individual liberation**, focusing on personal efforts to eradicate defilements (kilesas).

2. **Path to Enlightenment**:  
   - The **Arahant path** is emphasized, where one becomes an **Arahant**—a being who has eradicated all defilements and attained liberation.
   - The path involves progressing through the **four stages of enlightenment**: Sotāpanna (stream-enterer), Sakadāgāmī (once-returner), Anāgāmī (non-returner), and Arahant.
   - Enlightenment is achieved through **direct insight** into the three characteristics of existence: impermanence (anicca), suffering (dukkha), and non-self (anattā).

3. **Philosophical Emphasis**:  
   - Theravāda focuses on **personal liberation** and the **Noble Eightfold Path** as the means to achieve enlightenment.
   - The Dhammapada encapsulates this view: "The cessation of desire, of hatred and of delusion: this indeed is called Nibbāna."

---

### **Mahāyāna Buddhism**
1. **Goal of Enlightenment**:  
   - The ultimate goal is **Buddhahood**, becoming a fully enlightened Buddha for the benefit of all sentient beings.
   - Enlightenment is not just personal liberation but also involves **universal liberation**, helping all beings achieve enlightenment.

2. **Path to Enlightenment**:  
   - The **Bodhisattva path** is central, where one vows to attain enlightenment for the sake of all beings.
   - The path emphasizes **bodhicitta** (the aspiration for enlightenment) and the **Six Perfections** (e.g., generosity, ethical conduct, patience).
   - Enlightenment involves realizing the **two-fold emptiness**: the emptiness of self and phenomena (śūnyatā).

3. **Philosophical Emphasis**:  
   - Mahāyāna introduces concepts like **Buddha-nature**, the idea that all beings possess the innate potential for enlightenment.
   - It emphasizes **non-duality**, transcending dualistic thinking to realize the non-dual nature of reality.
   - The **Trikāya doctrine** describes the three bodies of the Buddha: Dharmakāya (truth body), Sambhogakāya (enjoyment body), and Nirmāṇakāya (emanation body).
   - The Heart Sutra encapsulates this view: "Form is emptiness, emptiness is form. ... There is no attainment, with nothing to attain."

---

### **Key Differences**
1. **Scope of Enlightenment**:  
   - Theravāda focuses on **individual liberation**, while Mahāyāna emphasizes **universal liberation** and the realization of the ultimate nature of reality.

2. **Path and Practices**:  
   - Theravāda emphasizes the **Arahant path** and the **Noble Eightfold Path**, while Mahāyāna emphasizes the **Bodhisattva path** and practices like developing bodhicitta and the Six Perfections.

3. **Philosophical Elaboration**:  
   - Mahāyāna places greater emphasis on **non-duality**, **Buddha-nature**, and more extensive philosophical systems around the nature of enlightenment.

---

### **Shared Goal**
Both traditions ultimately aim to overcome suffering and realize the true nature of reality, reflecting their distinct philosophical and ethical approaches. While Theravāda focuses on personal liberation, Mahāyāna extends this to include the liberation of all beings and the realization of ultimate reality.

In [None]:
from IPython.display import display, Markdown

query = "How would a practitioner in the Theravāda tradition approach meditation compared to a Mahāyāna practitioner?"
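# Send the query through the RAG chain built earlier in the notebook
result = qa_rag_chain.invoke(query)
# The answer text is Markdown, so render it instead of printing the raw string
display(Markdown(result.content))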

A practitioner in the Theravāda tradition approaches meditation with a focus on **insight (vipassanā)** and **concentration (samatha)** as the primary means to achieve **personal liberation (Nibbāna)**. Theravāda meditation emphasizes:

1. **Samatha (Calm Abiding)**:
   - **Purpose**: To develop deep states of concentration and mental tranquility.
   - **Practices**: Common techniques include mindfulness of breathing (ānāpānasati) and cultivating the four divine abodes (brahmavihāras): loving-kindness (mettā), compassion (karuṇā), sympathetic joy (muditā), and equanimity (upekkhā).
   - **Outcome**: Achieving jhānas (meditative absorptions), which are states of deep mental focus and stillness.

2. **Vipassanā (Insight)**:
   - **Purpose**: To develop insight into the true nature of reality, particularly the three marks of existence: impermanence (anicca), suffering (dukkha), and not-self (anattā).
   - **Practices**: Mindfulness (sati) is central, often practiced through the Four Foundations of Mindfulness (satipaṭṭhāna): mindfulness of the body, feelings, mind, and mental objects.
   - **Outcome**: Direct realization of the nature of phenomena, leading to wisdom (paññā) and ultimately Nibbāna.

3. **Integration**:
   - **Path**: The Noble Eightfold Path guides the practitioner, with right concentration (sammā samādhi) and right mindfulness (sammā sati) being key components.
   - **Goal**: Personal liberation and the cessation of suffering through the eradication of defilements (kilesa).

In contrast, a **Mahāyāna practitioner** integrates meditation with the **Bodhisattva path**, emphasizing **bodhicitta** (the mind of enlightenment) and the realization of **emptiness (śūnyatā)**. Mahāyāna meditation includes:

1. **Samatha and Vipassanā**:
   - Similar to Theravāda, but often with different objects of meditation.

2. **Bodhicitta Cultivation**:
   - Practices like tonglen (giving and taking) to develop the aspiration for enlightenment for the benefit of all beings.

3. **Emptiness Meditation**:
   - Contemplation on the empty nature of phenomena, as described in texts like the Heart Sutra.

4. **Visualization Practices**:
   - Especially in Vajrayāna traditions, practitioners may visualize deities or mandalas.

5. **Kōan Practice**:
   - In Zen traditions, practitioners meditate on paradoxical statements or questions.

6. **Mindfulness in Action**:
   - Emphasis on maintaining meditative awareness during daily activities.

7. **Guru Yoga**:
   - In some traditions, meditation on one's teacher as an embodiment of enlightenment.

8. **Group Practice**:
   - While individual practice is important, there is often more emphasis on group meditation sessions.

### Key Differences:
1. **Goal Orientation**: Theravāda meditation aims at **individual liberation**, while Mahāyāna emphasizes **liberation for the benefit of all beings**.
2. **Objects of Meditation**: Mahāyāna includes a wider range of meditation objects, such as visualizations and abstract concepts.
3. **Philosophical Framework**: Theravāda is framed within the **Four Noble Truths** and **Noble Eightfold Path**, while Mahāyāna emphasizes concepts like **emptiness** and **Buddha-nature**.
4. **Role of Devotion**: Mahāyāna incorporates more devotional elements, especially in Pure Land traditions.
5. **Scope of Practice**: Mahāyāna integrates meditation more fully into daily life activities.

In summary, while both traditions share foundational practices like samatha and vipassanā, Theravāda focuses on personal liberation through insight and concentration, whereas Mahāyāna integrates meditation with the Bodhisattva path, emphasizing universal liberation, emptiness, and diverse meditation techniques.
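
The cells above repeat the same three steps for every question: set `query`, call `qa_rag_chain.invoke(query)`, and render `result.content`. A minimal sketch of how that pattern could be wrapped in a loop follows. The helper name `ask_all` is hypothetical and is not part of the notebook or of LangChain; it only assumes that `chain.invoke(query)` returns an object with a `.content` attribute, as `qa_rag_chain` does in the cells above.

```python
from typing import Any, Dict, Iterable


def ask_all(chain: Any, queries: Iterable[str]) -> Dict[str, str]:
    """Invoke `chain` once per query and collect the answer text.

    Hypothetical helper: it assumes only that `chain.invoke(query)`
    returns an object exposing the answer as `.content`.
    """
    answers: Dict[str, str] = {}
    for query in queries:
        result = chain.invoke(query)
        # Keep insertion order so answers line up with the question list
        answers[query] = result.content
    return answers


# Usage in the notebook (assumes `qa_rag_chain` is already defined):
# from IPython.display import display, Markdown
# queries = [
#     "How is the concept of compassion understood in Mahāyāna and Theravāda Buddhism?",
#     "How would a practitioner in the Theravāda tradition approach meditation compared to a Mahāyāna practitioner?",
# ]
# for query, answer in ask_all(qa_rag_chain, queries).items():
#     display(Markdown(f"**Q: {query}**\n\n{answer}"))
```

Keeping the answers in a dictionary keyed by question makes it easier to compare responses side by side or save them for later inspection, though each call still runs sequentially and costs one model request per question.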