## Agentic RAG With Multiple Documents

In [2]:
import dotenv
%load_ext dotenv
%dotenv

The dotenv extension is already loaded. To reload it, use:
  %reload_ext dotenv


In [3]:
import nest_asyncio
nest_asyncio.apply()

In [4]:
papers = [
    "./datasets/lora_paper.pdf",
    "./datasets/longlora_efficient_fine_tuning.pdf"
]

In [5]:
from utils import create_doc_tools
from pathlib import Path

In [6]:
paper_to_tools_dict = {}


for paper in papers:
    print(f"Creating {paper} tool")
    path = Path(paper)
    vector_tool, summary_tool = await create_doc_tools(doc_name=path.stem, document_fp=path)
    paper_to_tools_dict[path.stem] = [vector_tool, summary_tool]

Creating ./datasets/lora_paper.pdf tool
Creating ./datasets/longlora_efficient_fine_tuning.pdf tool


In [7]:
paper_to_tools_dict

{'lora_paper': [<llama_index.core.tools.query_engine.QueryEngineTool at 0x7e37acfd20b0>,
  <llama_index.core.tools.query_engine.QueryEngineTool at 0x7e37acfd2500>],
 'longlora_efficient_fine_tuning': [<llama_index.core.tools.query_engine.QueryEngineTool at 0x7e37ab5e0b80>,
  <llama_index.core.tools.query_engine.QueryEngineTool at 0x7e37ab5e0c40>]}

In [8]:
initial_tools = [t for paper in papers for t in paper_to_tools_dict[Path(paper).stem]]
print(initial_tools)

[<llama_index.core.tools.query_engine.QueryEngineTool object at 0x7e37acfd20b0>, <llama_index.core.tools.query_engine.QueryEngineTool object at 0x7e37acfd2500>, <llama_index.core.tools.query_engine.QueryEngineTool object at 0x7e37ab5e0b80>, <llama_index.core.tools.query_engine.QueryEngineTool object at 0x7e37ab5e0c40>]


In [9]:
len(initial_tools)

4

#### Agent Worker

In [10]:
from llama_index.llms.openai import OpenAI


llm = OpenAI(model="gpt-3.5-turbo")

In [12]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner


agent_worker = FunctionCallingAgentWorker.from_tools(
    initial_tools,
    llm=llm,
    verbose=True
)

agent = AgentRunner(agent_worker)

In [13]:
response = agent.query(
    "Explain to me what is the Lora and why it's being used."
    "Explain to me what is LongLoRA and why it's being used."
    "Compare and contract LongLoRA and Lora."
)


print(str(response))

Added user message to memory: Explain to me what is the Lora and why it's being used.Explain to me what is LongLoRA and why it's being used.Compare and contract LongLoRA and Lora.
=== Calling Function ===
Calling function: lora_paper_summary_query_engine_tool with args: {"input": "Explain what is Lora and why it's being used."}
=== Function Output ===
LoRA, or Low-Rank Adaptation, is a method utilized for adapting large-scale pre-trained language models to specific tasks or domains. It involves introducing trainable rank decomposition matrices into each layer of the Transformer architecture while freezing the pre-trained model weights. This approach significantly reduces the number of trainable parameters for downstream tasks, making it more efficient and cost-effective to fine-tune large models like GPT-3 with a high number of parameters. LoRA allows for the efficient adaptation of models to new tasks while preserving the learned knowledge from pre-training, ultimately improving model