# Note: Sub-components and bottlenecks are explained in depth in the attached documentation.

In [1]:
from core.embeddings import fetch_embedding, build_vectorstore
from core.model import load_llm
from core.graph import build_graph
import time



Step 1: Fetch the embeddings from Hugging Face, then load or create the vector database.

In [2]:
start_time = time.time()
cached_embedding = fetch_embedding()
print(f"Execution time: {time.time() - start_time} second(s).")



Execution time: 2.4252164363861084 second(s).
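
The name `cached_embedding` suggests that embeddings are cached so repeated texts are not re-embedded. The internals of `fetch_embedding` are not shown in this notebook, so the following is only a minimal sketch of the idea in plain Python. `CachedEmbedder` and `toy_embedding` are hypothetical names used for illustration, and the toy function stands in for a real Hugging Face model.

```python
import hashlib
from typing import Callable, Dict, List


class CachedEmbedder:
    """Wrap an embedding function so each distinct text is embedded only once."""

    def __init__(self, embed_fn: Callable[[str], List[float]]):
        self._embed_fn = embed_fn
        self._cache: Dict[str, List[float]] = {}
        self.misses = 0  # number of times the underlying function was called

    def embed(self, text: str) -> List[float]:
        # Hash the text so the cache key has a fixed size.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._embed_fn(text)
        return self._cache[key]


def toy_embedding(text: str) -> List[float]:
    # Hypothetical stand-in for a real embedding model (assumption).
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


embedder = CachedEmbedder(toy_embedding)
first = embedder.embed("hello")
second = embedder.embed("hello")  # served from the cache, no second model call
print(f"Underlying calls: {embedder.misses}")
```

A persistent cache would store the vectors on disk instead of in a dictionary, which is what makes a warm start faster than the first run.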




In [4]:
db = build_vectorstore(cached_embedding, backend=["chroma"])

Loading ChromaDB from 'chroma_db'...
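
The "fetch or create" behaviour of this step can be illustrated with a load-or-create pattern. This is a sketch under stated assumptions, not the code behind `build_vectorstore`: `load_or_create_store` is a hypothetical helper, and a JSON file stands in for the persisted Chroma directory.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Tuple

Store = Dict[str, List[float]]


def load_or_create_store(persist_dir: Path, build_fn: Callable[[], Store]) -> Tuple[Store, str]:
    """Load a persisted store if it exists, otherwise build and persist it."""
    index_file = Path(persist_dir) / "index.json"  # hypothetical on-disk layout
    if index_file.exists():
        print(f"Loading store from '{persist_dir}'...")
        return json.loads(index_file.read_text()), "loaded"
    print(f"Creating store in '{persist_dir}'...")
    store = build_fn()
    Path(persist_dir).mkdir(parents=True, exist_ok=True)
    index_file.write_text(json.dumps(store))
    return store, "created"


with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "demo_db"
    # First call builds the store, second call finds it on disk.
    _, first_status = load_or_create_store(demo_dir, lambda: {"doc1": [0.1, 0.2]})
    store, second_status = load_or_create_store(demo_dir, lambda: {"doc1": [0.1, 0.2]})
```

The expensive build step runs only on the first call, so later runs pay only the load cost.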


  db = Chroma(persist_directory=VECTOR_DB_DIR, embedding_function=cached_embedding)


Step 2: Load the retriever and the LLM that answers questions.

*Note: In production, this step should load multiple LLM agents rather than a single one.*

In [5]:
retriever = db.as_retriever()

In [6]:
start_time = time.time()
llm = load_llm()
print(f"LLM was loaded in {time.time() - start_time} second(s).")

Loading model meta-llama/Llama-3.2-3B-Instruct...


`torch_dtype` is deprecated! Use `dtype` instead!
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
Loading checkpoint shards: 100%|██████████| 2/2 [00:04<00:00,  2.41s/it]
Device set to use cuda:0


LLM was loaded in 6.163252830505371 second(s).
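
The `start_time = time.time()` pattern is repeated in several cells. One possible way to avoid the repetition is a small context manager. This is only a sketch: `timed` is a hypothetical helper that is not part of the `core` package, and it uses `time.perf_counter()`, which is better suited to measuring intervals than `time.time()`.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(label: str, results: Dict[str, float]) -> Iterator[None]:
    """Measure the wall-clock duration of the enclosed block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record and report the duration even if the block raised.
        elapsed = time.perf_counter() - start
        results[label] = elapsed
        print(f"{label}: {elapsed:.2f} second(s).")


timings: Dict[str, float] = {}
with timed("Sleep demo", timings):
    time.sleep(0.05)  # stand-in for load_llm() or fetch_embedding()
```

Collecting the durations in a dictionary makes it easy to compare steps afterwards, for example embedding load time against LLM load time.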


  return HuggingFacePipeline(pipeline=prepare_pipeline(model, tokenizer))


Step 3: After loading the components, add the controller and build the RAG chain.

In [7]:
rag_chain = build_graph(retriever, llm)

In [8]:
def rag_chain_executor(rag_chain, question: str):
    state = {"question": question, "context": [], "answer": ""}
    final_state = rag_chain.invoke(state)
    return final_state["answer"]
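
The implementation of `build_graph` is not shown in this notebook. To make the control flow easier to follow, here is a plain-Python sketch of a controller that either routes a question through retrieval or sends it straight to the chatbot node. `ToyGraph`, the keyword rule, and the stand-in retriever and generator are all assumptions made for illustration; the real controller may decide differently.

```python
from typing import Callable, Dict, List

State = Dict[str, object]


class ToyGraph:
    """Hypothetical stand-in for the compiled graph, exposing the same invoke() call."""

    def __init__(
        self,
        needs_retrieval: Callable[[str], bool],
        retrieve: Callable[[str], List[str]],
        generate: Callable[[str, List[str]], str],
    ):
        self._needs_retrieval = needs_retrieval
        self._retrieve = retrieve
        self._generate = generate

    def invoke(self, state: State) -> State:
        state = dict(state)  # copy so the caller's state is not mutated
        question = str(state["question"])
        if self._needs_retrieval(question):
            print("[Controller] Use retrieval.")
            state["context"] = self._retrieve(question)
        else:
            print("[Controller] No need for retrieval.")
        state["answer"] = self._generate(question, list(state["context"]))
        return state


# Assumed routing rule: retrieve only when the question mentions a known topic.
KNOWN_TOPICS = {"pi"}

toy = ToyGraph(
    needs_retrieval=lambda q: any(t in q.lower().rstrip("?.").split() for t in KNOWN_TOPICS),
    retrieve=lambda q: ["Pi is approximately 3.14159."],
    generate=lambda q, ctx: f"answer using {len(ctx)} document(s)",
)

no_ctx = toy.invoke({"question": "Count to three.", "context": [], "answer": ""})
with_ctx = toy.invoke({"question": "What is pi?", "context": [], "answer": ""})
```

Because `ToyGraph` exposes `invoke` with the same state dictionary, it could be passed to `rag_chain_executor` in place of `rag_chain` for quick experiments without loading the model.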

# Demo:

In [9]:
question = "How are you?"

print("Q:", question)
start_time = time.time()
print("A:", rag_chain_executor(rag_chain, question))
print(f"Response time: {time.time() - start_time} second(s).")

Q: How are you?
[Controller] No need for retrieval.
[Chatbot Node] Generating answer...
A: I'm doing well, thank you for asking.
Response time: 1.4096548557281494 second(s).


In [10]:
# Agentic demonstration. Should be more sophisticated in production.

print("Short question (no retrieval):")
start_time = time.time()
print(rag_chain_executor(rag_chain, "Count to three."))
print(f"Response time Q1: {time.time() - start_time} second(s).")

print("\nLonger question (uses retrieval):")
start_time = time.time()
print(rag_chain_executor(rag_chain, "What is the exact number of pi?"))
print(f"Response time Q2: {time.time() - start_time} second(s).")

Short question (no retrieval):
[Controller] No need for retrieval.
[Chatbot Node] Generating answer...
One, two, three.
Response time Q1: 0.6226944923400879 second(s).

Longer question (uses retrieval):
[Controller] Use retrieval.
[Retriever] Searching for documents...
[Chatbot Node] Generating answer...
Pi is an irrational number, which means it is not equal to the quotient of any two whole numbers. In other words, it cannot be expressed as a finite decimal or fraction. Its decimal representation goes on forever without repeating, and it is approximately equal to 3.14159 (but this is an approximation, not the exact value). Pi is a transcendental number, which means it is not the solution of any polynomial equation with rational coefficients. This means that there is no simple formula that can be used to calculate pi exactly. However, mathematicians have been able to calculate pi to billions of digits using advanced computer algorithms and mathematical techniques. The exact value of pi 

In [11]:
for q in ["What is biology?", "How much legs do a dog have?", "What animal gives sound of meow?"]:
    print(f"Q: {q}")
    print(f"A: {rag_chain_executor(rag_chain, q)}")
    print("-" * 50)

Q: What is biology?
[Controller] No need for retrieval.
[Chatbot Node] Generating answer...
A: Biology is the study of living organisms and their interactions with the environment. It encompasses the fields of botany, zoology, and ecology, and seeks to understand the structure, function, growth, evolution, and distribution of all living things.
--------------------------------------------------
Q: How much legs do a dog have?
[Controller] Use retrieval.
[Retriever] Searching for documents...
[Chatbot Node] Generating answer...
A: Dogs have four legs. 
            Note: etc. 
            (No, I won't leave a note here, I'll just answer in a friendly, helpful, short and concise manner.)

Dogs have four legs.
--------------------------------------------------
Q: What animal gives sound of meow?
[Controller] Use retrieval.
[Retriever] Searching for documents...
[Chatbot Node] Generating answer...
A: Cats give sound of meow. They make soft grunting sounds as they forage and loud grunts as t