In [1]:
import os

import dotenv
%load_ext dotenv
%dotenv

In [2]:
import nest_asyncio
nest_asyncio.apply()

In [4]:
from llama_index.core import SimpleDirectoryReader

# load lora_paper.pdf documents
documents = SimpleDirectoryReader(input_files=["./documents/lora_paper.pdf"]).load_data()

from llama_index.core.node_parser import SentenceSplitter

# chunk_size is measured in tokens; 1024 is also the SentenceSplitter default
splitter = SentenceSplitter(chunk_size=1024)
# Create nodes from documents
nodes = splitter.get_nodes_from_documents(documents)
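To make the chunking step concrete, here is a minimal, library-free sketch of sentence-aware chunking. It is not how `SentenceSplitter` is implemented: the real splitter measures `chunk_size` in tokens and applies an overlap between chunks, while this sketch counts characters, uses a naive regex for sentence boundaries, and has no overlap. The name `split_into_chunks` is hypothetical.

```python
import re


def split_into_chunks(text: str, chunk_size: int = 1024) -> list[str]:
    """Greedily pack whole sentences into chunks of at most chunk_size characters."""
    # Naive sentence boundary: whitespace that follows '.', '!' or '?'.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= chunk_size or not current:
            # The sentence fits, or it is an over-long sentence opening an
            # empty chunk (kept whole rather than cut mid-sentence).
            current = candidate
        else:
            # The sentence does not fit: close the chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks("One two. Three four. Five six.", chunk_size=20))
# ['One two. Three four.', 'Five six.']
```

Keeping sentences intact means a chunk never ends mid-sentence, which is the main reason to prefer a sentence-aware splitter over fixed-width slicing.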

In [5]:
from llama_index.core import set_global_service_context
from my_lib.openai_config import service_context_openai

set_global_service_context(service_context_openai)

In [6]:
from llama_index.core import SummaryIndex, VectorStoreIndex

# summary index
summary_index = SummaryIndex(nodes)
# vector store index
vector_index = VectorStoreIndex(nodes)

# summary query engine
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)

# vector query engine
vector_query_engine = vector_index.as_query_engine()

In [7]:
from llama_index.core import Settings

llm = Settings.llm

In [9]:
from llama_index.core.tools import QueryEngineTool

summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for summarization questions related to the Lora paper."
    ),
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context from the Lora paper."
    ),
)

In [10]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    tools=[vector_tool, summary_tool], 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

In [11]:
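# The query is in Spanish. In English: "Explain to me what Lora is and why it is
# being used. Aren't the existing solutions good enough?"
response = agent.query(
    "Explicame que es Lora y por que se esta usando. No son suficientemente buenas las soluciones existentes?"
)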

Added user message to memory: Explicame que es Lora y por que se esta usando. No son suficientemente buenas las soluciones existentes?
=== Calling Function ===
Calling function: query_engine_tool with args: {"input": "What is Lora and why is it being used? Are existing solutions not good enough?"}
=== Function Output ===
Lora is a method that enhances the adaptation of large language models to specific tasks or domains efficiently by injecting trainable rank decomposition matrices into each layer of the Transformer architecture. It aims to amplify task-specific directions in pre-trained models for better adaptation to downstream tasks, emphasizing important but not emphasized directions in the original model. Lora is being used to improve model performance, reduce memory requirements, maintain model quality, and enable efficient task-switching without introducing additional latency during inference. Existing solutions like adapter layers or optimizing input layer activations have limit

In [12]:
print(str(response))

Lora is a method that enhances the adaptation of large language models to specific tasks or domains efficiently by injecting trainable rank decomposition matrices into each layer of the Transformer architecture. It aims to amplify task-specific directions in pre-trained models for better adaptation to downstream tasks, emphasizing important but not emphasized directions in the original model. Lora is being used to improve model performance, reduce memory requirements, maintain model quality, and enable efficient task-switching without introducing additional latency during inference.

Existing solutions like adapter layers or optimizing input layer activations have limitations such as introducing inference latency, reducing model quality, or facing challenges in optimizing prompts. This makes Lora a valuable approach for improving model adaptation, especially in low-data scenarios where existing solutions may not be sufficient.


In [13]:
response = agent.chat(
    "Explain to me what is Lora and why it's being used. Are existing solutions not good enough?"
)

print(str(response))

Added user message to memory: Explain to me what is Lora and why it's being used. Are existing solutions not good enough?
=== Calling Function ===
Calling function: query_engine_tool with args: {"input": "What is Lora and why is it being used?"}
=== Function Output ===
Lora is a method used for adapting large pre-trained language models to specific tasks or domains by introducing trainable rank decomposition matrices into each layer of the Transformer architecture. It significantly reduces the number of trainable parameters for downstream tasks, making it more memory-efficient and computationally efficient. Lora is being used to fine-tune models for better performance on specific tasks without the need for extensive retraining from scratch, addressing the challenge of adapting large models like GPT-3 with a massive number of parameters in a more feasible and cost-effective manner.
=== Calling Function ===
Calling function: query_engine_tool with args: {"input": "Are existing solutions 

In [14]:
response = agent.chat(
    "What was my last question to you?"
)

print(str(response))

Added user message to memory: What was my last question to you?
=== LLM Response ===
Your last question was: "Explain to me what is Lora and why it's being used. Are existing solutions not good enough?"
Your last question was: "Explain to me what is Lora and why it's being used. Are existing solutions not good enough?"
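The agent can answer this because `agent.chat` keeps earlier turns in memory (see the "Added user message to memory" log lines). The sketch below illustrates the idea with a plain message buffer. `ChatHistory` and its methods are hypothetical names for illustration and do not reflect LlamaIndex's actual memory classes. In the real agent the LLM reads the stored history and answers from it.

```python
class ChatHistory:
    """Toy conversation buffer: an ordered list of role/content messages."""

    def __init__(self):
        self.messages = []

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})

    def previous_user_message(self):
        """Return the user message before the most recent one, or None."""
        user_messages = [m["content"] for m in self.messages if m["role"] == "user"]
        return user_messages[-2] if len(user_messages) >= 2 else None


history = ChatHistory()
history.add("user", "Explain to me what is Lora and why it's being used.")
history.add("assistant", "Lora is a method for adapting large language models...")
history.add("user", "What was my last question to you?")

print(history.previous_user_message())
# Explain to me what is Lora and why it's being used.
```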


In [15]:
agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_tool, summary_tool], 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

In [16]:
task = agent.create_task(
    "Explain to me what is Lora and why it's being used. "
    "Are existing solutions not good enough?"
)
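A note on the two-line prompt: Python joins adjacent string literals with no separator, so any space between the two sentences has to be inside one of the literals. The short check below shows the difference. The variable names `joined` and `spaced` are only for illustration.

```python
# Adjacent literals are concatenated at compile time, with nothing in between.
joined = (
    "Explain to me what is Lora and why it's being used."
    "Are existing solutions not good enough?"
)

# A trailing space inside the first literal keeps the sentences apart.
spaced = (
    "Explain to me what is Lora and why it's being used. "
    "Are existing solutions not good enough?"
)

print("used.Are" in joined)   # True
print("used. Are" in spaced)  # True
```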

In [26]:
completed_steps = agent.get_completed_steps(task.task_id)

print(f"Number of completed steps for task ID {task.task_id} is {len(completed_steps)}")

if len(completed_steps) > 0:
    print(completed_steps[0].output.sources[0].raw_output)

Number of completed steps for task ID ed01f8a8-74b7-435b-a3df-f40f01bd40ab is 2
Lora is a method used to adapt large language models to specific tasks efficiently by freezing pre-trained model weights and injecting trainable rank decomposition matrices into each layer of the Transformer architecture. It significantly reduces the number of trainable parameters for downstream tasks, making it more feasible to adapt large models without the need for full fine-tuning. Lora is used to reduce the computational cost and memory requirements associated with fine-tuning large models like GPT-3, while maintaining or even improving model quality on various tasks.


In [27]:
upcoming_steps = agent.get_upcoming_steps(task.task_id)
print(f"Number of upcoming steps for task ID {task.task_id} is {len(upcoming_steps)}")

if len(upcoming_steps) > 0:
    print(upcoming_steps[0].input)

Number of upcoming steps for task ID ed01f8a8-74b7-435b-a3df-f40f01bd40ab is 1
None


In [28]:
step_output = agent.run_step(task.task_id)

=== LLM Response ===
Lora is a method used to adapt large language models efficiently by reducing the number of trainable parameters for downstream tasks. It helps in maintaining or improving model quality while reducing computational costs and memory requirements. Existing solutions for language model adaptation may have limitations in terms of inference latency, model quality, and scalability, which is why Lora is being used to address these challenges and provide a more efficient adaptation strategy.


In [29]:
print(step_output.is_last)

True


In [30]:
task = agent.create_task(
    "Explain to me what is Lora and why it's being used. "
    "Are existing solutions not good enough?"
)

In [31]:
step_output = agent.run_step(
    task.task_id, input="Explain to me the dataset used to fine-tune in the Lora paper."
)

Added user message to memory: Explain to me the dataset used to fine-tune in the Lora paper.
=== Calling Function ===
Calling function: query_engine_tool with args: {"input": "Dataset used to fine-tune in the Lora paper"}
=== Function Output ===
The dataset used for fine-tuning in the LoRA paper is MultiNLI.


In [33]:
step_output = agent.run_step(task.task_id)
print(step_output.is_last)

=== LLM Response ===
The dataset used for fine-tuning in the Lora paper is MultiNLI.
True


In [34]:
response = agent.finalize_response(task.task_id)
print(str(response))

The dataset used for fine-tuning in the Lora paper is MultiNLI.
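The manual cells above (`create_task`, repeated `run_step`, a check on `is_last`, then `finalize_response`) can be wrapped in a small loop. The sketch below relies only on the method and attribute names used in this notebook. `run_to_completion` and the `Stub*` classes are hypothetical: the stubs exist only so the loop can be run without an LLM. With the real agent you would pass the `AgentRunner` instance instead of `StubAgent`.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class StubTask:
    task_id: str = field(default_factory=lambda: str(uuid4()))


@dataclass
class StubStepOutput:
    output: str
    is_last: bool


class StubAgent:
    """Hypothetical stand-in that mimics only the methods used in this notebook."""

    def __init__(self, scripted_steps):
        self._scripted_steps = list(scripted_steps)
        self._cursor = {}  # task_id -> index of the next scripted step
        self._last = {}    # task_id -> most recent step output

    def create_task(self, input):
        task = StubTask()
        self._cursor[task.task_id] = 0
        return task

    def run_step(self, task_id):
        index = self._cursor[task_id]
        self._cursor[task_id] = index + 1
        is_last = index == len(self._scripted_steps) - 1
        step = StubStepOutput(self._scripted_steps[index], is_last)
        self._last[task_id] = step
        return step

    def finalize_response(self, task_id):
        return self._last[task_id].output


def run_to_completion(agent, question, max_steps=10):
    """Run steps until one reports is_last, then return the finalized response."""
    task = agent.create_task(question)
    for _ in range(max_steps):
        step_output = agent.run_step(task.task_id)
        if step_output.is_last:
            return agent.finalize_response(task.task_id)
    # Guard against an agent that never signals completion.
    raise RuntimeError(f"Task {task.task_id} did not finish in {max_steps} steps")


stub = StubAgent(["tool call result", "final answer"])
print(run_to_completion(stub, "What is Lora?"))
# final answer
```

The `max_steps` cap is a design choice of this sketch, not something the notebook uses. It keeps a misbehaving agent from looping indefinitely.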
