<a href="https://colab.research.google.com/github/Rizwankaka/Agentic-AI-/blob/main/SmolAgents/smolagents_agentic_rag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Change the runtime type to T4

In [5]:
%pip install -qU smolagents litellm langchain langchain-community sentence-transformers datasets chromadb

In [6]:
import os
from google.colab import userdata
os.environ['HF_TOKEN'] = userdata.get('HF_TOKEN')
os.environ['GOOGLE_API_KEY'] = userdata.get('GEMINI_API_KEY')

In [7]:
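# Note: recent smolagents releases rename HfApiModel to InferenceClientModel;
# if this import fails, use that name instead.
from smolagents import CodeAgent, HfApiModel

# Default Hugging Face Inference API model
model = HfApiModel()

# Sanity check: an agent with no tools can still solve simple tasks by writing code
agent = CodeAgent(tools=[], model=model)

agent.run('What is 24*365?')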

8760

## Vanilla RAG has limitations, most importantly these two:
1. It performs only one retrieval step: if the results are bad, the generation in turn will be bad.

2. The user query will often be a question and the document containing the true answer will be in affirmative voice, so its similarity score will be downgraded compared to other source documents in the interrogative form, leading to a risk of missing the relevant information.

This agent will:

- ✅ Formulate the search query itself
- ✅ Critique the retrieved results and re-retrieve if needed
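
The two limitations and the agent's remedy can be illustrated without any LLM or vector store. The following is a toy sketch only: the corpus, the word-overlap `score` (a crude stand-in for embedding similarity), the `looks_like_answer` critique, and the hard-coded list of reformulated queries are all hypothetical and not part of smolagents. In the real agent, the model writes the reformulations and judges the results itself.

```python
from typing import Optional

# Toy corpus: one relevant statement, one distractor, and one document phrased as a question.
CORPUS = [
    "The backward pass computes gradients and is slower than the forward pass.",
    "Tokenizers convert text into input ids.",
    "Which pass is faster during training?",
]


def words(text: str) -> set[str]:
    """Lowercase word set with basic punctuation removed."""
    return set(text.lower().replace("?", "").replace(".", "").split())


def score(query: str, doc: str) -> float:
    """Jaccard word overlap, a crude stand-in for embedding similarity."""
    q, d = words(query), words(doc)
    return len(q & d) / len(q | d)


def retrieve(query: str) -> str:
    """Single retrieval step: return the best-scoring document."""
    return max(CORPUS, key=lambda doc: score(query, doc))


def looks_like_answer(doc: str) -> bool:
    """Toy critique: reject hits that are themselves questions."""
    return not doc.strip().endswith("?")


def agentic_retrieve(candidate_queries: list[str]) -> Optional[str]:
    """Try each reformulation in turn; stop at the first hit that passes the critique."""
    for query in candidate_queries:
        doc = retrieve(query)
        if looks_like_answer(doc):
            return doc
    return None


question = "Which pass is slower during training?"
one_shot = retrieve(question)
refined = agentic_retrieve([question, "forward and backward pass computation time"])
print("One-shot retrieval:", one_shot)
print("After critique and re-retrieval:", refined)
```

In this toy setup the one-shot retrieval returns the other question, because it shares the most words with the query, while the loop rejects that hit and finds the affirmative statement on its second attempt.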

## Indexing data into Chroma

In [8]:
import datasets
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain.docstore.document import Document
#from langchain_community.retrievers import BM25Retriever

knowledge_base = datasets.load_dataset("m-ric/huggingface_doc", split="train")
knowledge_base = knowledge_base.filter(lambda row: row["source"].startswith("huggingface/transformers"))

source_docs = [
    Document(page_content=doc["text"], metadata={"source": doc["source"].split("/")[1]})
    for doc in knowledge_base
]

### Creating Chunks using RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    add_start_index=True,
    strip_whitespace=True,
    separators=["\n\n", "\n", ".", " ", ""],
)
new_docs = text_splitter.split_documents(documents=source_docs)

### BGE Embeddings

from langchain_community.embeddings import HuggingFaceBgeEmbeddings

model_name = "BAAI/bge-small-en"
model_kwargs = {"device": "cuda"}
encode_kwargs = {"normalize_embeddings": True}
embeddings = HuggingFaceBgeEmbeddings(
    model_name=model_name, model_kwargs=model_kwargs, encode_kwargs=encode_kwargs
)
### Populate Vector DB

db = Chroma.from_documents(new_docs, embeddings)
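
To get a feel for what `chunk_size` and `chunk_overlap` control, here is a simplified fixed-window sketch. It is not how `RecursiveCharacterTextSplitter` works internally (that class first splits on the listed separators and then merges the pieces); the `sliding_chunks` helper is hypothetical and only illustrates the size and overlap parameters, along with the start offset that `add_start_index=True` records in the metadata.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[tuple[int, str]]:
    """Split text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(text, chunk_size=10, chunk_overlap=3)
for start, chunk in chunks:
    print(start, chunk)
```

With these small values, each chunk begins with the last three characters of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap.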

In [9]:
retriever = db.as_retriever(search_kwargs={"k": 4})
retriever.invoke('forward pass in transformer')

[Document(metadata={'source': 'transformers', 'start_index': 7863}, page_content='4.  [ ] Created script that successfully runs forward pass using\n    original repository and checkpoint\n\n5.  [ ] Successfully opened a PR and added the model skeleton to Transformers\n\n6.  [ ] Successfully converted original checkpoint to Transformers\n    checkpoint\n\n7.  [ ] Successfully ran forward pass in Transformers that gives\n    identical output to original checkpoint\n\n8.  [ ] Finished model tests in Transformers\n\n9.  [ ] Successfully added Tokenizer in Transformers'),
 Document(metadata={'source': 'transformers', 'start_index': 7510}, page_content='4.  [ ] Created script that successfully runs forward pass using\n    original repository and checkpoint\n\n5.  [ ] Successfully opened a PR and added the model skeleton to Transformers\n\n6.  [ ] Successfully converted original checkpoint to Transformers\n    checkpoint\n\n7.  [ ] Successfully ran forward pass in Transformers that gives\n   
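
The retriever returns the `k` chunks whose embeddings are closest to the query embedding. Because `normalize_embeddings=True` makes every vector unit length, ranking by dot product, cosine similarity, or Euclidean distance produces the same order. The sketch below checks this on hand-written 2-D vectors; the vectors and the helper names are hypothetical, and it makes no claim about which metric the vector store uses internally.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def rank_by_dot(query: list[float], docs: list[list[float]], k: int) -> list[int]:
    """Indices of the k documents with the highest dot product."""
    scores = [sum(q * d for q, d in zip(query, doc)) for doc in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]


def rank_by_l2(query: list[float], docs: list[list[float]], k: int) -> list[int]:
    """Indices of the k documents with the smallest Euclidean distance."""
    dists = [math.dist(query, doc) for doc in docs]
    return sorted(range(len(docs)), key=lambda i: dists[i])[:k]


query = normalize([2.0, 0.1])
docs = [normalize(v) for v in ([1.0, 0.0], [3.0, 4.0], [0.0, 2.0], [-1.0, 0.0])]

top_dot = rank_by_dot(query, docs, k=2)
top_l2 = rank_by_l2(query, docs, k=2)
print(top_dot, top_l2)
```

Both rankings agree because, for unit vectors, the squared distance equals 2 minus twice the dot product.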

## Creation of Retriever Tool

In [10]:
from smolagents import Tool

class RetrieverTool(Tool):
    name = "retriever"
    description = "Uses semantic search to retrieve the parts of transformers documentation that could be most relevant to answer your query."
    inputs = {
        "query": {
            "type": "string",
            "description": "The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.",
        }
    }
    output_type = "string"

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.retriever = db.as_retriever(search_kwargs={"k": 4})

    def forward(self, query: str) -> str:
        assert isinstance(query, str), "Your search query must be a string"

        docs = self.retriever.invoke(
            query,
        )
        return "\nRetrieved documents:\n" + "".join(
            [
                f"\n\n===== Document {str(i)} =====\n" + doc.page_content
                for i, doc in enumerate(docs)
            ]
        )

retriever_tool = RetrieverTool()
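
The string that `forward` hands back to the agent can be previewed without the vector store. This is a minimal sketch, assuming a hypothetical `StubDoc` class in place of a LangChain `Document` and a hypothetical `format_docs` helper that mirrors the formatting expression in `forward`.

```python
from dataclasses import dataclass


@dataclass
class StubDoc:
    """Hypothetical stand-in for a LangChain Document; only page_content is used."""
    page_content: str


def format_docs(docs: list[StubDoc]) -> str:
    """Mirror the formatting in RetrieverTool.forward: a header, then numbered documents."""
    return "\nRetrieved documents:\n" + "".join(
        f"\n\n===== Document {i} =====\n" + doc.page_content
        for i, doc in enumerate(docs)
    )


out = format_docs([StubDoc("first chunk"), StubDoc("second chunk")])
print(out)
```

In the notebook itself, the same check can be made against real data by calling `retriever_tool.forward(...)` with an affirmative-form query before handing the tool to the agent.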

## Agent initialization

In [12]:
from smolagents import HfApiModel, CodeAgent

agent = CodeAgent(
    tools=[retriever_tool], model=HfApiModel(), max_steps=4,
)

In [13]:
agent_output = agent.run("For a transformers model training, which is slower, the forward or the backward pass?")

print("Final output:")
print(agent_output)

Final output:
In the context of training transformers, the backward pass is generally slower than the forward pass. This is because the backward pass involves computing the gradients of the loss with respect to each parameter, which requires additional operations such as backpropagation through the entire network. These operations can be computationally intensive, especially for large models with many parameters and layers.

The forward pass, on the other hand, simply involves passing the input data through the model to obtain predictions, which is typically less computationally intensive.

While the exact time difference can vary depending on the model architecture, the size of the input data, and the hardware used, it is a common observation in deep learning that the backward pass is slower than the forward pass.


In [17]:
agent.run("For a transformers model training, What is the role of scaled dot product?")

"The scaled dot product is a crucial component of the self-attention mechanism used in Transformer models. It plays a key role in how the model computes the attention weights between different tokens in a sequence. Here's a breakdown of its role:\n\n1. **Computation of Attention Scores**: The scaled dot product helps in computing the raw attention scores between each pair of tokens. Given two vectors, \\(Q\\) (query) and \\(K\\) (key), the dot product \\(Q \\cdot K^T\\) is computed. This operation captures the similarity between the query and key vectors.\n\n2. **Scaling**: The dot product scores are scaled by dividing by the square root of the dimensionality of the key vector (\\(\\sqrt{d_k}\\)). This scaling step helps in preventing the dot product scores from growing too large, which could lead to very small gradients during training (a problem known as the vanishing gradient problem).\n\n3. **Softmax Function**: The scaled dot product scores are then passed through a softmax functi