# RAGBuilder Optimization Demo

In [None]:
!uv venv

In [None]:
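# Note: `!` commands run in a subshell, so this activation does not carry over
# to the notebook kernel; select the .venv interpreter as the kernel instead.
!source .venv/bin/activate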

In [None]:
# Install the RAGBuilder package
!uv pip install ragbuilder

## Quickstart - Basic Configuration

In [1]:
from ragbuilder import RAGBuilder

In [2]:
from dotenv import load_dotenv
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings

# Load the Azure OpenAI credentials from a local .env file
load_dotenv()

llm = AzureChatOpenAI(model='gpt-4o-mini', temperature=0.2)
emb = AzureOpenAIEmbeddings(model='text-embedding-3-large')
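`load_dotenv()` only helps if the `.env` file actually defines the Azure settings. Below is a minimal pre-flight check, assuming the notebook relies on the variables that `langchain_openai` reads by default for Azure (`AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`, `OPENAI_API_VERSION`). The helper name `missing_env_vars` is hypothetical and not part of RAGBuilder.

```python
import os

# Variables langchain_openai's Azure clients read by default (assumed to be
# the ones this notebook relies on; adjust for your deployment).
REQUIRED_AZURE_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "OPENAI_API_VERSION",
)

def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

missing = missing_env_vars(REQUIRED_AZURE_VARS)
if missing:
    print("Missing Azure OpenAI settings:", ", ".join(missing))
```

Running this right after `load_dotenv()` surfaces a misconfigured `.env` before the first optimization trial starts.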

In [7]:
builder = RAGBuilder.from_source_with_defaults(
    input_source='https://lilianweng.github.io/posts/2023-06-23-agent/',
    test_dataset="rag_test_data_lilianweng_gpt-4o_1721032414.736622.csv",
    default_llm=llm,
    default_embeddings=emb,
    n_trials=5
)

In [8]:
results = builder.optimize()

[I 2024-12-31 22:20:39,355] A new study created in RDB with name: data_ingest_1735663839280


  0%|          | 0/5 [00:00<?, ?it/s]

[I 2024-12-31 22:20:47,057] Trial 0 finished with value: 0.8156415328383447 and parameters: {'chunk_size': 1000}. Best is trial 0 with value: 0.8156415328383447.


[I 2024-12-31 22:20:50,786] Trial 1 finished with value: 0.7929137279589971 and parameters: {'chunk_size': 3000}. Best is trial 0 with value: 0.8156415328383447.


[I 2024-12-31 22:20:56,317] Trial 2 finished with value: 0.7926271418730417 and parameters: {'chunk_size': 2500}. Best is trial 0 with value: 0.8156415328383447.


[I 2024-12-31 22:20:56,336] Trial 3 finished with value: 0.7926271418730417 and parameters: {'chunk_size': 2500}. Best is trial 0 with value: 0.8156415328383447.


[I 2024-12-31 22:20:56,351] Trial 4 finished with value: 0.7929137279589971 and parameters: {'chunk_size': 3000}. Best is trial 0 with value: 0.8156415328383447.


[I 2024-12-31 22:20:59,424] A new study created in RDB with name: retriever_1735663859509


  0%|          | 0/5 [00:00<?, ?it/s]

Loading TransformerRanker model BAAI/bge-reranker-base


Evaluating:   0%|          | 0/6 [00:00<?, ?it/s]

[I 2024-12-31 22:21:10,368] Trial 0 finished with value: 0.9999999999716667 and parameters: {'n_retrievers': 1, 'retriever_0_index': 0, 'final_k': 5}. Best is trial 0 with value: 0.9999999999716667.


Loading TransformerRanker model BAAI/bge-reranker-base


Evaluating:   0%|          | 0/6 [00:00<?, ?it/s]

[I 2024-12-31 22:21:17,823] Trial 1 finished with value: 0.9411764705636293 and parameters: {'n_retrievers': 2, 'retriever_0_index': 1, 'retriever_1_index': 0, 'final_k': 3}. Best is trial 0 with value: 0.9999999999716667.


Loading TransformerRanker model BAAI/bge-reranker-base


Evaluating:   0%|          | 0/6 [00:00<?, ?it/s]

[I 2024-12-31 22:21:25,303] Trial 2 finished with value: 0.8749999999702256 and parameters: {'n_retrievers': 1, 'retriever_0_index': 1, 'final_k': 3}. Best is trial 0 with value: 0.9999999999716667.


Loading TransformerRanker model BAAI/bge-reranker-base


Evaluating:   0%|          | 0/6 [00:00<?, ?it/s]

[I 2024-12-31 22:21:32,750] Trial 3 finished with value: 0.9411764705611687 and parameters: {'n_retrievers': 1, 'retriever_0_index': 0, 'final_k': 3}. Best is trial 0 with value: 0.9999999999716667.


[I 2024-12-31 22:21:32,770] Trial 4 finished with value: 0.8749999999702256 and parameters: {'n_retrievers': 1, 'retriever_0_index': 1, 'final_k': 3}. Best is trial 0 with value: 0.9999999999716667.


Loading TransformerRanker model BAAI/bge-reranker-base


Map:   0%|          | 0/15 [00:00<?, ? examples/s]

Evaluating:   0%|          | 0/15 [00:00<?, ?it/s]

In [9]:
response = results.invoke("What is HNSW?")

In [10]:
print(f"Question: {response['question']}\nAnswer: {response['answer']}")

Question: What is HNSW?
Answer: HNSW (Hierarchical Navigable Small World) is a data structure inspired by small world networks, where most nodes can be reached from any other node within a small number of steps. It builds hierarchical layers of small-world graphs, with the bottom layers containing actual data points. The middle layers create shortcuts to speed up search. During a search, HNSW starts from a random node in the top layer and navigates towards the target, moving down layers until it reaches the bottom layer. Moves in the upper layers can cover large distances in the data space, while moves in the lower layers refine the search quality.


In [11]:
import json
print(json.dumps(results.summary(), indent=2))

{
  "data_ingest": {
    "score": 0.8156415328383447,
    "optimization_time": 16.987402,
    "config": {
      "document_loader": "unstructured",
      "chunking_strategy": "RecursiveCharacterTextSplitter",
      "chunk_size": 1000,
      "chunk_overlap": 100,
      "embedding_model": "EmbeddingType.HUGGINGFACE:mixedbread-ai/mxbai-embed-large-v1",
      "vector_database": "chroma"
    },
    "metrics": {
      "avg_latency": 127.93433333333333,
      "error_rate": 0.0
    }
  },
  "retrieval": {
    "score": 0.9999999999716667,
    "optimization_time": 33.336164,
    "config": {
      "retrievers": [
        "vector_similarity"
      ],
      "top_k": 5,
      "rerankers": [
        "BAAI/bge-reranker-base"
      ]
    },
    "metrics": {
      "avg_latency": 451.14433333333335,
      "error_rate": 0.0
    }
  },
  "generation": {
    "score": 0.7403833697261465,
    "optimization_time": 68.449513,
    "config": {
      "model": "AzureChatOpenAI:gpt-4o-mini",
      "temperature": 0.2,

## Advanced Configuration

In [None]:
!uv pip install pymupdf pypdf

In [12]:
adv_builder = RAGBuilder(
    default_llm=llm,
    default_embeddings=emb,
    n_trials=5
)

In [13]:
from ragbuilder.config import (
    DataIngestOptionsConfig,
    RetrievalOptionsConfig,
    GenerationOptionsConfig
)

We start by defining a data ingestion configuration with more options (e.g., multiple parsers/loaders, a knowledge graph, etc.).

In [24]:
data_ingest_config = DataIngestOptionsConfig(
    input_source="lillog_agents.pdf",
    document_loaders=[
        {"type": "pymupdf"},
        {"type": "unstructured"},
        {"type": "pypdf"}
    ],
    chunking_strategies=[
        {
            "type": "RecursiveCharacterTextSplitter",
            "chunker_kwargs": {"separators": ["\n\n", "\n", " ", ""]}
        }
    ],
    chunk_size={
        "min": 500,
        "max": 3000,
        "stepsize": 500
    },
    chunk_overlap=[100],
    embedding_models=[
        {
            "type": "azure_openai",
            "model_kwargs": {
                "model": "text-embedding-3-large",
            }
        }
    ],
    vector_databases=[
        {
            "type": "chroma",
            "vectordb_kwargs": {
                'persist_directory': 'chroma_sample2123',
                'collection_metadata': {'hnsw:space': 'cosine'}
            }
        }
    ],
    graph={
        # Requires a running Neo4j instance (the repo includes a Docker Compose file)
        # and the NEO4J_URI, NEO4J_USER, and NEO4J_PASSWORD environment variables.
        "type": "neo4j",
    },
    optimization={
        "n_trials": 10,
        "n_jobs": 1,
        "study_name": "lillog_agents_study",
        "optimization_direction": "maximize"
    },
    evaluation_config={
        "type": "similarity",
        "test_dataset": "rag_test_data_lilianweng_gpt-4o_1721032414.736622.csv",
        "evaluator_kwargs": {
            "top_k": 3,
            "relevance_threshold": 0.2,   # Minimum relevance to consider for scoring
            "position_weights": [1.0, 0.5, 0.3]  # More weight to top results
        }
    },
    database_logging=True,
    database_path="eval.db"
)
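For intuition about the search space this config defines: the `chunk_size` range expands to discrete candidates, and each trial picks one loader and one chunk size (the trial log reports `document_loader_index` and `chunk_size`). Below is a rough sketch, assuming both ends of the range are inclusive. `expand_range` is a hypothetical helper, not a RAGBuilder function.

```python
from itertools import product

def expand_range(spec):
    """Expand {'min', 'max', 'stepsize'} into a list of candidate values,
    treating both ends as inclusive (an assumption)."""
    return list(range(spec["min"], spec["max"] + 1, spec["stepsize"]))

loaders = ["pymupdf", "unstructured", "pypdf"]
chunk_sizes = expand_range({"min": 500, "max": 3000, "stepsize": 500})

# Only the loader and the chunk size vary here; every other option in the
# config above has a single value.
search_space = list(product(loaders, chunk_sizes))

print(chunk_sizes)
print(len(search_space), "combinations")
```

With 18 combinations and `n_trials` set to 10, the optimizer samples the space rather than exhausting it.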

Now let's run only the data ingestion optimization:

In [25]:
adv_builder.optimize_data_ingest(data_ingest_config)

[I 2024-12-31 22:29:27,421] A new study created in RDB with name: lillog_agents_study


  0%|          | 0/10 [00:00<?, ?it/s]

[I 2024-12-31 22:29:36,421] Trial 0 finished with value: 0.6993674072954389 and parameters: {'document_loader_index': 2, 'chunk_size': 500}. Best is trial 0 with value: 0.6993674072954389.


[I 2024-12-31 22:29:40,881] Trial 1 finished with value: 0.7543484701703952 and parameters: {'document_loader_index': 2, 'chunk_size': 3000}. Best is trial 1 with value: 0.7543484701703952.


[I 2024-12-31 22:29:45,793] Trial 2 finished with value: 0.7606029867262851 and parameters: {'document_loader_index': 1, 'chunk_size': 1500}. Best is trial 2 with value: 0.7606029867262851.


[I 2024-12-31 22:29:54,125] Trial 3 finished with value: 0.6949703637441452 and parameters: {'document_loader_index': 0, 'chunk_size': 500}. Best is trial 2 with value: 0.7606029867262851.


[I 2024-12-31 22:29:59,009] Trial 4 finished with value: 0.7509295219266461 and parameters: {'document_loader_index': 2, 'chunk_size': 1500}. Best is trial 2 with value: 0.7606029867262851.


[I 2024-12-31 22:30:07,339] Trial 5 finished with value: 0.7083337836795384 and parameters: {'document_loader_index': 1, 'chunk_size': 500}. Best is trial 2 with value: 0.7606029867262851.


[I 2024-12-31 22:30:07,363] Trial 6 finished with value: 0.7606029867262851 and parameters: {'document_loader_index': 1, 'chunk_size': 1500}. Best is trial 2 with value: 0.7606029867262851.


[I 2024-12-31 22:30:13,127] Trial 7 finished with value: 0.7347048121641531 and parameters: {'document_loader_index': 1, 'chunk_size': 1000}. Best is trial 2 with value: 0.7606029867262851.


[I 2024-12-31 22:30:17,423] Trial 8 finished with value: 0.7669465389795974 and parameters: {'document_loader_index': 1, 'chunk_size': 3000}. Best is trial 8 with value: 0.7669465389795974.


[I 2024-12-31 22:30:17,452] Trial 9 finished with value: 0.7543484701703952 and parameters: {'document_loader_index': 2, 'chunk_size': 3000}. Best is trial 8 with value: 0.7669465389795974.


100%|██████████| 14/14 [01:18<00:00,  5.57s/it]


DataIngestResults(best_config=DataIngestConfig(input_source='lillog_agents.pdf', document_loader=LoaderConfig(type=<ParserType.UNSTRUCTURED: 'unstructured'>, loader_kwargs=None, custom_class=None), chunking_strategy=ChunkingStrategyConfig(type=<ChunkingStrategy.RECURSIVE: 'RecursiveCharacterTextSplitter'>, chunker_kwargs={'separators': ['\n\n', '\n', ' ', '']}, custom_class=None), chunk_size=3000, chunk_overlap=100, embedding_model=EmbeddingConfig(type=<EmbeddingType.AZURE_OPENAI: 'azure_openai'>, model_kwargs={'model': 'text-embedding-3-large'}, custom_class=None), vector_database=VectorDBConfig(type=<VectorDatabase.CHROMA: 'chroma'>, vectordb_kwargs={'persist_directory': 'chroma_sample2123/9', 'collection_metadata': {'hnsw:space': 'cosine'}}, custom_class=None), sampling_rate=None), best_score=0.7669465389795974, best_pipeline=<ragbuilder.data_ingest.pipeline.DataIngestPipeline object at 0x56a02c740>, n_trials=10, completed_trials=10, optimization_time=50.021133, avg_latency=349.0873
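The trial values above come from the `similarity` evaluator configured with `top_k`, `relevance_threshold`, and `position_weights`. The notebook does not show how RAGBuilder combines these internally. The sketch below is only one plausible reading (a weighted mean over ranks, with hits below the threshold counted as zero), and both the function name and the sample similarities are made up for illustration.

```python
def position_weighted_score(similarities, weights, threshold):
    """Weighted mean of per-rank similarities; hits below the threshold count as 0.

    This is an illustrative assumption, not RAGBuilder's actual formula.
    """
    pairs = list(zip(similarities, weights))
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        return 0.0
    kept = sum(s * w for s, w in pairs if s >= threshold)
    return kept / total_weight

# Hypothetical similarities for the top 3 retrieved chunks of one question
score = position_weighted_score([0.9, 0.6, 0.1], [1.0, 0.5, 0.3], 0.2)
print(round(score, 4))
```

Under this reading, a strong hit at rank 1 matters roughly three times as much as one at rank 3, which matches the "more weight to top results" comment in the config.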

Excellent! We now have a vector store tuned for retrievability.
Next, let's define the retrieval options config:

In [26]:
retrieval_config = RetrievalOptionsConfig(
    retrievers=[
        {
            "type": "vector_similarity", # Vector similarity search
            "retriever_k": [20],
            "weight": 0.25
        },
        {
            "type": "bm25",  # BM25 keyword search retriever
            "retriever_k": [20],
            "weight": 0.25
        },
        {
            "type": "parent_doc_large", # Parent doc retriever with large chunks as parent docs
            "retriever_k": [20],
            "weight": 0.25
        },
        {
            "type": "graph",  # Graph retriever using the knowledge graph
            "retriever_k": [20],
            "weight": 0.25
        }
    ],
    rerankers=[
        {"type": "BAAI/bge-reranker-base"},
        {"type": "mixedbread-ai/mxbai-rerank-large-v1"}
    ],
    top_k=[3, 5, 10]
)
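Each retriever above carries a `weight` of 0.25. The generation output later in this notebook shows that the combined retriever is a LangChain `EnsembleRetriever`, which merges ranked lists using weighted Reciprocal Rank Fusion. Below is a standalone sketch of that fusion rule with made-up document IDs. The constant `c=60` mirrors LangChain's default, and the function name `weighted_rrf` is hypothetical.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists: each document earns weight / (rank + c) per list."""
    scores = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(docs, start=1):
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical result lists from two retrievers
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]

fused = weighted_rrf([vector_hits, bm25_hits], weights=[0.5, 0.5])
print(fused)
```

Documents that appear near the top of several lists rise in the fused ranking, which is why combining keyword and vector retrievers can help before the reranker trims the result to `top_k`.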

In [27]:
adv_builder.optimize_retrieval(retrieval_config)

[I 2024-12-31 22:32:30,424] A new study created in RDB with name: retriever_1735664550667


  0%|          | 0/5 [00:00<?, ?it/s]

Loading TransformerRanker model mixedbread-ai/mxbai-rerank-large-v1


Evaluating:   0%|          | 0/6 [00:00<?, ?it/s]

[I 2024-12-31 22:33:02,285] Trial 0 finished with value: 0.9787234042476052 and parameters: {'n_retrievers': 3, 'retriever_0_index': 2, 'retriever_1_index': 1, 'retriever_2_index': 3, 'use_rerankers': True, 'reranker_index': 1, 'final_k': 10}. Best is trial 0 with value: 0.9787234042476052.


Loading TransformerRanker model mixedbread-ai/mxbai-rerank-large-v1


Evaluating:   0%|          | 0/6 [00:00<?, ?it/s]

[I 2024-12-31 22:33:17,136] Trial 1 finished with value: 0.9999999999805556 and parameters: {'n_retrievers': 3, 'retriever_0_index': 1, 'retriever_1_index': 1, 'retriever_2_index': 3, 'use_rerankers': True, 'reranker_index': 1, 'final_k': 3}. Best is trial 1 with value: 0.9999999999805556.


Evaluating:   0%|          | 0/6 [00:00<?, ?it/s]

[I 2024-12-31 22:33:22,825] Trial 2 finished with value: 0.9999999999736111 and parameters: {'n_retrievers': 2, 'retriever_0_index': 1, 'retriever_1_index': 1, 'use_rerankers': False, 'final_k': 10}. Best is trial 1 with value: 0.9999999999805556.


Loading TransformerRanker model BAAI/bge-reranker-base


Evaluating:   0%|          | 0/6 [00:00<?, ?it/s]

[I 2024-12-31 22:33:35,182] Trial 3 finished with value: 0.9999999999833333 and parameters: {'n_retrievers': 3, 'retriever_0_index': 1, 'retriever_1_index': 3, 'retriever_2_index': 0, 'use_rerankers': True, 'reranker_index': 0, 'final_k': 3}. Best is trial 3 with value: 0.9999999999833333.


Loading TransformerRanker model BAAI/bge-reranker-base


Evaluating:   0%|          | 0/6 [00:00<?, ?it/s]

[I 2024-12-31 22:33:45,328] Trial 4 finished with value: 0.9999999999805556 and parameters: {'n_retrievers': 1, 'retriever_0_index': 2, 'use_rerankers': True, 'reranker_index': 0, 'final_k': 3}. Best is trial 3 with value: 0.9999999999833333.


Loading TransformerRanker model BAAI/bge-reranker-base


RetrievalResults(best_config=RetrievalConfig(retrievers=[BaseRetrieverConfig(type=<RetrieverType.BM25: 'bm25'>, retriever_kwargs={}, custom_class=None, retriever_k=[20], weight=0.25), BaseRetrieverConfig(type=<RetrieverType.GRAPH_RETRIEVER: 'graph'>, retriever_kwargs={}, custom_class=None, retriever_k=[20], weight=0.25), BaseRetrieverConfig(type=<RetrieverType.VECTOR_SIMILARITY: 'vector_similarity'>, retriever_kwargs={}, custom_class=None, retriever_k=[20], weight=0.25)], rerankers=[RerankerConfig(type=<RerankerType.BGE_BASE: 'BAAI/bge-reranker-base'>, reranker_kwargs={}, custom_class=None)], top_k=3), best_score=0.9999999999833333, best_pipeline=<ragbuilder.retriever.pipeline.RetrieverPipeline object at 0x307f1ad80>, n_trials=5, completed_trials=5, optimization_time=74.894355, avg_latency=1613.4706666666668, error_rate=0.0)

In [29]:
adv_builder.optimized_retriever.invoke("What is HNSW?")

[Document(metadata={'source': 'lillog_agents.pdf', 'relevance_score': 1.8486328125}, page_content="experiences) and semantic memory (facts and concepts).\n\nImplicit / procedural memory: This type of memory is unconscious and involves skills and\n\nroutines that are performed automatically, like riding a bike or typing on a keyboard.\n\nFig. 8. Categorization of human memory.\n\nWe can roughly consider the following mappings:\n\nSensory memory as learning embedding representations for raw inputs, including text, image or\n\nother modalities;\n\nShort-term memory as in-context learning. It is short and finite, as it is restricted by the finite\n\ncontext window length of Transformer.\n\nLong-term memory as the external vector store that the agent can attend to at query time,\n\naccessible via fast retrieval.\n\nLil'Log\n\nMaximum Inner Product Search (MIPS)\n\nThe external memory can alleviate the restriction of finite attention span. A standard practice is to\n\nsave the embedding repr

Great! We now have a retrieval setup optimized for F1 score (precision and recall).
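As a reminder of what that score measures, F1 is the harmonic mean of precision and recall. Here is a small worked sketch with illustrative numbers (not taken from this run):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: 3 documents retrieved, 2 of them relevant, 4 relevant overall.
precision = 2 / 3
recall = 2 / 4
print(round(f1_score(precision, recall), 4))
```

Because the harmonic mean is pulled toward the smaller value, a retriever cannot score well by maximizing only precision or only recall.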

Now let's proceed to define the LLM generation options config:

In [32]:
from ragbuilder.config.base import LLMConfig

gen_config = GenerationOptionsConfig(
    llms = [
        LLMConfig(type="azure_openai", model_kwargs={'model':'gpt-4o-mini', 'temperature':0.2}),
        LLMConfig(type="azure_openai", model_kwargs={'model':'gpt-4o', 'temperature':0.2}),
    ],
    optimization={
        "n_trials": 12, 
        "n_jobs": 1,
        "study_name": "lillog_agents_study",
        "optimization_direction": "maximize"
    },
    evaluation_config={"type": "ragas"},
)

In [33]:
adv_builder.optimize_generation(gen_config)

Map:   0%|          | 0/33 [00:00<?, ? examples/s]

Evaluating:   0%|          | 0/33 [00:00<?, ?it/s]

GenerationResults(best_config=GenerationConfig(llm=LLMConfig(type=<LLMType.AZURE_OPENAI: 'azure_openai'>, model_kwargs={'model': 'gpt-4o', 'temperature': 0.2}), prompt_template='You are a helpful assistant. Answer any questions solely based on the context provided below. \nIf the provided context does not have the relevant facts to answer the question, say "I don\'t know."\n\n<context>\n{context}\n</context>\n', prompt_key='default_informative'), best_score=0.8154456849114275, best_pipeline={
  context: ContextualCompressionRetriever(base_compressor=DocumentCompressorPipeline(transformers=[RerankerLangChainCompressor(model=<rerankers.models.transformer_ranker.TransformerRanker object at 0x569de2480>, kwargs={}, k=3)]), base_retriever=EnsembleRetriever(retrievers=[BM25Retriever(vectorizer=<rank_bm25.BM25Okapi object at 0x56a0d2fc0>), Neo4jGraphRetriever(graph=<langchain_community.graphs.neo4j_graph.Neo4jGraph object at 0x3744347a0>, embeddings=AzureOpenAIEmbeddings(client=<openai.resour

In [34]:
adv_results = adv_builder.optimization_results

In [35]:
response = adv_results.invoke("What is HNSW?")

In [36]:
print(f"Question: {response['question']}\nAnswer: {response['answer']}")

Question: What is HNSW?
Answer: HNSW (Hierarchical Navigable Small World) is inspired by the idea of small world networks where most nodes can be reached by any other nodes within a small number of steps, such as the “six degrees of separation” feature of social networks. HNSW builds hierarchical layers of these small-world graphs, where the bottom layers contain the actual data points. The layers in the middle create shortcuts to speed up search. When performing a search, HNSW starts from a random node in the top layer and navigates towards the target. When it can’t get any closer, it moves down to the next layer, until it reaches the bottom layer. Each move in the upper layers can potentially cover a large distance in the data space, and each move in the lower layers refines the search quality.


In [38]:
print(json.dumps(adv_results.summary(), indent=2))

{
  "data_ingest": {
    "score": 0.7669465389795974,
    "optimization_time": 50.021133,
    "config": {
      "document_loader": "unstructured",
      "chunking_strategy": "RecursiveCharacterTextSplitter",
      "chunk_size": 3000,
      "chunk_overlap": 100,
      "embedding_model": "EmbeddingType.AZURE_OPENAI:text-embedding-3-large",
      "vector_database": "chroma"
    },
    "metrics": {
      "avg_latency": 349.08733333333333,
      "error_rate": 0.0
    }
  },
  "retrieval": {
    "score": 0.9999999999833333,
    "optimization_time": 74.894355,
    "config": {
      "retrievers": [
        "bm25",
        "graph",
        "vector_similarity"
      ],
      "top_k": 3,
      "rerankers": [
        "BAAI/bge-reranker-base"
      ]
    },
    "metrics": {
      "avg_latency": 1613.4706666666668,
      "error_rate": 0.0
    }
  },
  "generation": {
    "score": 0.8154456849114275,
    "optimization_time": 234.716064,
    "config": {
      "model": "LLMType.AZURE_OPENAI:gpt-4o",
  

We can also access individual module components:

In [39]:
adv_results.data_ingest.best_index.similarity_search_with_relevance_scores("What is HNSW?")

[(Document(metadata={'doc_id': 'b841a780-cbf5-40fc-9921-029b8ba33b8b', 'source': 'lillog_agents.pdf'}, page_content="Sensory memory typically only lasts for up to a few seconds. Subcategories include iconic\n\nmemory (visual), echoic memory (auditory), and haptic memory (touch).\n\n\x00. Short-Term Memory (STM) or Working Memory: It stores information that we are currently\n\naware of and needed to carry out complex cognitive tasks such as learning and reasoning.\n\nShort-term memory is believed to have the capacity of about 7 items (Miller 1956) and lasts for\n\n20-30 seconds.\n\n\x00. Long-Term Memory (LTM): Long-term memory can store information for a remarkably long\n\ntime, ranging from a few days to decades, with an essentially unlimited storage capacity. There\n\nare two subtypes of LTM:\n\nExplicit / declarative memory: This is memory of facts and events, and refers to those\n\nmemories that can be consciously recalled, including episodic memory (events and\n\nexperiences) and 

In [40]:
adv_results.retrieval.invoke("What is HNSW?")

[Document(metadata={'source': 'lillog_agents.pdf', 'relevance_score': 1.8486328125}, page_content="experiences) and semantic memory (facts and concepts).\n\nImplicit / procedural memory: This type of memory is unconscious and involves skills and\n\nroutines that are performed automatically, like riding a bike or typing on a keyboard.\n\nFig. 8. Categorization of human memory.\n\nWe can roughly consider the following mappings:\n\nSensory memory as learning embedding representations for raw inputs, including text, image or\n\nother modalities;\n\nShort-term memory as in-context learning. It is short and finite, as it is restricted by the finite\n\ncontext window length of Transformer.\n\nLong-term memory as the external vector store that the agent can attend to at query time,\n\naccessible via fast retrieval.\n\nLil'Log\n\nMaximum Inner Product Search (MIPS)\n\nThe external memory can alleviate the restriction of finite attention span. A standard practice is to\n\nsave the embedding repr

In [41]:
import nest_asyncio
nest_asyncio.apply()

In [42]:
builder.serve()
# Starts an API server; the log below shows it listening on port 8005.
# /retrieve -- retriever only
# /invoke   -- full RAG pipeline

INFO:     Started server process [3063]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8005 (Press CTRL+C to quit)
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [3063]


KeyboardInterrupt: 