# **Agentic AI RAG with VectorDB:**

**Steps:**

1. Load the LLM (from Groq) and the embedding model (from HF).

2. Set up the vector database (Pinecone):
   * Create the index.
   * Set up the vector DB for the knowledge base.

3. Create a knowledge base from the PDF:
   * Load the PDF.
   * Split it into chunks with a chunking method (this notebook uses fixed-size chunking).
   * Store the chunks in the vector DB.

4. Create a RAG agent from the knowledge base, model, embeddings, and prompts (a set of instructions).

In [None]:
# Install the necessary libraries:

%pip install --upgrade --quiet sentence-transformers
%pip install --upgrade --quiet langchain langchain-community langchainhub langchain-google-genai langchain-chroma bs4 boto3
%pip install --upgrade --quiet langchain-aws
%pip install --upgrade --quiet langgraph langsmith langchain_anthropic
%pip install --upgrade --quiet langchain_groq
%pip install --upgrade --quiet "pinecone[grpc]"  # `pinecone` replaces the deprecated `pinecone-client` package
%pip install --upgrade --quiet phidata duckduckgo-search yfinance

In [2]:
import os
from google.colab import userdata


os.environ["HF_TOKEN"] = userdata.get('HF_TOKEN') # HF_TOKEN
os.environ["GROQ_API_KEY"] = userdata.get('GROQ_API_KEY') # GROQ_API_KEY
os.environ["PHIDATA_API_KEY"] = userdata.get('PHIDATA_API_KEY') # PHIDATA_API_KEY
os.environ["PINECONE_API_KEY"] = userdata.get('PINECONE_API_KEY') # PINECONE_API_KEY

## **Step 1: Load the LLM (from Groq) & Embeddings (from HF):**

In [3]:
# Load model/LLM from Groq

from phi.model.groq import Groq


model = Groq(
    id="llama-3.3-70b-versatile",
    max_tokens=512,
    temperature=0.5,
)

model

Groq(id='llama-3.3-70b-versatile', name='Groq', provider='Groq', metrics={}, response_format=None, tools=None, tool_choice=None, run_tools=True, show_tool_calls=None, tool_call_limit=None, functions=None, function_call_stack=None, system_prompt=None, instructions=None, session_id=None, structured_outputs=None, supports_structured_outputs=False, frequency_penalty=None, logit_bias=None, logprobs=None, max_tokens=512, presence_penalty=None, seed=None, stop=None, temperature=0.5, top_logprobs=None, top_p=None, user=None, extra_headers=None, extra_query=None, request_params=None, api_key=None, base_url=None, timeout=None, max_retries=None, default_headers=None, default_query=None, http_client=None, client_params=None, client=None, async_client=None)

In [4]:
# Load Embeddings from HF:

from phi.embedder.sentence_transformer import SentenceTransformerEmbedder


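embedder = SentenceTransformerEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2",  # HF model id (`id=` is not a field and was ignored)
    dimensions=384,  # all-MiniLM-L6-v2 produces 384-dimensional vectors (`dims=` was ignored)
)

embedder

SentenceTransformerEmbedder(dimensions=384, model='sentence-transformers/all-MiniLM-L6-v2', sentence_transformer_client=None)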

In [5]:
# Sanity check: the embedding length should match the index dimension (384).
len(embedder.get_embedding("Ta-ta, bye-bye, goodbye!!!"))

384

## **Step 2: Set Up the Vector Database (Pinecone):**

### **Create Index:**

In [6]:
from pinecone.grpc import PineconeGRPC as Pinecone
from pinecone import ServerlessSpec


PC = Pinecone()  # reads the API key from the PINECONE_API_KEY environment variable set above

In [42]:
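# Create the index (skipped if it already exists, so the cell can be re-run):

index_name = "pdfs"

if not PC.has_index(index_name):
  PC.create_index(
    name=index_name,
    dimension=384,  # must match the embedding size (384 for all-MiniLM-L6-v2)
    metric="cosine",
    spec=ServerlessSpec(
      cloud="aws",
      region="us-east-1"
    ),
    deletion_protection="disabled"  # "enabled" blocks deletion until protection is disabled; "disabled" allows deletion
  )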
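`metric="cosine"` makes Pinecone rank stored vectors by cosine similarity to the query vector, i.e. dot(a, b) / (|a| · |b|), which compares direction and ignores vector length. A pure-Python sketch of the score (the function name is illustrative, not a Pinecone API):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        # A vector must have the same length as the ones it is compared against.
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # same direction, different length
```

The dimension check mirrors why `dimension=384` has to match the length returned by the embedder (384 for all-MiniLM-L6-v2): Pinecone rejects upserts whose vector length differs from the index dimension.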

In [43]:
# Check whether index_name exists:

PC.has_index(index_name)

True

### **Set Up the Vector DB for the Knowledge Base:**

In [None]:
%pip install --upgrade --quiet pinecone-text

In [9]:
from phi.vectordb.pineconedb import PineconeDB


index_name = "pdfs"

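vector_db = PineconeDB(
    name=index_name,
    dimension=384,
    metric="cosine",
    spec={"serverless": {"cloud": "aws", "region": "us-east-1"}},
    # Hybrid search combines dense (embedding) and sparse (keyword) vectors.
    # Pinecone's docs state sparse-dense vectors need an index created with metric="dotproduct".
    use_hybrid_search=True,
    hybrid_alpha=0.5,  # balance between dense and sparse scores
    embedder=embedder,  # embedder used for both documents and queries
    # namespace="",  # optional Pinecone namespace
)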

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
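With `use_hybrid_search=True`, each chunk gets a dense vector from the embedder and a sparse keyword vector, presumably from the BM25 encoder in `pinecone-text` (which would explain the NLTK downloads above). Pinecone's hybrid-search guide weights the two with a convex combination: dense values times alpha, sparse values times (1 - alpha), so alpha=1.0 is pure semantic search and alpha=0.0 is pure keyword search. The sketch below assumes `hybrid_alpha` plays that role here, which is not confirmed by this notebook; `hybrid_scale` is an illustrative name:

```python
def hybrid_scale(dense, sparse, alpha):
    """Weight a dense vector by alpha and a sparse vector by (1 - alpha).

    `sparse` uses Pinecone's sparse format: {"indices": [...], "values": [...]}.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {
        "indices": list(sparse["indices"]),  # token ids are unchanged
        "values": [v * (1 - alpha) for v in sparse["values"]],
    }
    return scaled_dense, scaled_sparse


dense = [0.2, 0.4, 0.6]
sparse = {"indices": [10, 42], "values": [1.0, 0.5]}
print(hybrid_scale(dense, sparse, alpha=0.5))  # equal weight, as with hybrid_alpha=0.5
```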


## **Step 3: Create the Knowledge Base:**

In [10]:
from phi.knowledge.pdf import PDFKnowledgeBase, PDFReader
from phi.document.chunking.fixed import FixedSizeChunking

import nltk

# Newer NLTK releases tokenize with the "punkt_tab" resource, which is not fetched with "punkt".
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

In [11]:
# Set up the knowledge base:

pdf_knowledge_base = PDFKnowledgeBase(
    path="/content/LLM_Questions.pdf",  # path to the PDF file
    vector_db=vector_db,  # vector database that stores the knowledge base
    # Fixed-size chunking splits documents into chunks of a set size, with optional overlap.
    chunking_strategy=FixedSizeChunking(),
    reader=PDFReader(chunk=True),  # converts the PDF into Documents for the vector database
)


# Read and chunk the PDF, embed the chunks, and upsert them into Pinecone:
pdf_knowledge_base.load(upsert=True)
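Fixed-size chunking cuts the extracted text into windows of a set number of characters, optionally overlapping so that text split at a boundary still appears whole in one chunk. A minimal sketch of the idea; this is not phidata's `FixedSizeChunking` code, and the `fixed_size_chunks` helper and its `chunk_size`/`overlap` values are illustrative:

```python
def fixed_size_chunks(text, chunk_size=500, overlap=50):
    """Split `text` into chunks of at most `chunk_size` characters.

    Consecutive chunks share `overlap` characters, so each window starts
    (chunk_size - overlap) characters after the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghij" * 3  # 30 characters
for chunk in fixed_size_chunks(sample, chunk_size=12, overlap=4):
    print(repr(chunk))
```

Overlap trades a little extra storage for context continuity between neighbouring chunks.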

## **Step 4: Create the RAG Agent:**

In [12]:
from phi.agent import Agent

In [17]:
agentic_ai_rag = Agent(
    description="You are an LLM Q&A bot that answers only LLM-related queries.",
    instructions=["Answer the AI/ML-related query."],
    model=model,
    knowledge=pdf_knowledge_base,
    search_knowledge=True,  # gives the agent a tool to search the knowledge base
    show_tool_calls=True,  # display the tool calls in the response
    markdown=True,  # format responses as Markdown
)
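With `search_knowledge=True`, phidata gives the agent a tool for searching the knowledge base, so the model itself decides when to retrieve chunks before answering. The underlying retrieve-then-generate pattern can be sketched without any framework: rank stored chunks by similarity to the query vector, keep the top k, and place them in the prompt. The tiny hand-written vectors and the `top_k_chunks`/`build_prompt` helpers are hypothetical stand-ins for the embedder and Pinecone, not phidata's internals:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k_chunks(query_vec, store, k=2):
    """Return the k stored texts whose vectors are most similar to query_vec."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Place the retrieved chunks above the question so the LLM can ground its answer."""
    context = "\n\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {question}"


# Hypothetical 3-dimensional "embeddings" standing in for the 384-d vectors in Pinecone.
store = [
    ("Foundation models are pre-trained on broad data.", [0.9, 0.1, 0.0]),
    ("LoRA fine-tunes a model with low-rank adapters.", [0.2, 0.9, 0.1]),
    ("The weather today is sunny.", [0.0, 0.1, 0.9]),
]
query_vec = [0.8, 0.3, 0.0]  # pretend embedding of "What are foundation models?"

print(build_prompt("What are foundation models?", top_k_chunks(query_vec, store, k=2)))
```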

In [18]:
agentic_ai_rag.print_response("What are different types of Foundation Models?", stream=True)


In [19]:
agentic_ai_rag.print_response("Tell me a 2 sentence horror story.", stream=True)


## **Note:**

* You can store data and embeddings in the vector DB using the traditional LangChain approach,

* Then create a knowledge base using `LangChainKnowledgeBase`,

* Then create an agentic RAG with Phidata.

https://docs.phidata.com/knowledge/langchain
