In [1]:
import dotenv
%load_ext dotenv
%dotenv

In [2]:
import nest_asyncio
nest_asyncio.apply()

In [3]:
from llama_index.core import SimpleDirectoryReader

# load lora_paper.pdf documents
documents = SimpleDirectoryReader(input_files=["./datasets/lora_paper.pdf"]).load_data()

In [4]:
from llama_index.core.node_parser import SentenceSplitter

# 1024 tokens per chunk (also SentenceSplitter's default chunk_size)
splitter = SentenceSplitter(chunk_size=1024)
# Create nodes from documents
nodes = splitter.get_nodes_from_documents(documents)

In [5]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# LLM model
Settings.llm = OpenAI(model="gpt-3.5-turbo")
# embedding model
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

In [6]:
from llama_index.core import SummaryIndex, VectorStoreIndex

# summary index
summary_index = SummaryIndex(nodes)
# vector store index
vector_index = VectorStoreIndex(nodes)

In [7]:
# summary query engine
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)

# vector query engine
vector_query_engine = vector_index.as_query_engine()

In [9]:
llm = OpenAI(model="gpt-3.5-turbo", temperature=0)

In [10]:
from llama_index.core.tools import QueryEngineTool

In [11]:
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for summarization questions related to the Lora paper."
    ),
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context from the Lora paper."
    ),
)
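
Neither tool above is given a `name`, and the agent logs further down report every call as `query_engine_tool`, which suggests both tools share one default name. Below is a minimal sketch of a sanity check for that situation. It assumes each tool exposes its name as `tool.metadata.name`; `find_duplicate_tool_names` is a hypothetical helper written for illustration, not part of LlamaIndex.

```python
from collections import Counter


def find_duplicate_tool_names(tools):
    """Return the sorted tool names that appear more than once.

    Assumption: each tool exposes its name as `tool.metadata.name`.
    """
    counts = Counter(tool.metadata.name for tool in tools)
    return sorted(name for name, n in counts.items() if n > 1)
```

Calling `find_duplicate_tool_names([vector_tool, summary_tool])` before building the agent would flag a shared name. Passing a distinct `name=` to each `QueryEngineTool.from_defaults` call is one way to resolve it.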

#### Creating Agent Worker

In [12]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    tools=[vector_tool, summary_tool], 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

In [15]:
response = agent.query(
    "Explain to me what is Lora and why it's being used. Are existing solutions not good enough?"
)

Added user message to memory: Explain to me what is Lora and why it's being used. Are existing solutions not good enough?
=== Calling Function ===
Calling function: query_engine_tool with args: {"input": "Explain what Lora is and why it is being used."}
=== Function Output ===
Lora is a method used to adapt large-scale pre-trained language models to specific tasks or domains by introducing trainable rank decomposition matrices into each layer of the Transformer architecture. It aims to improve performance on target tasks by updating the pre-trained model weights efficiently without the need for extensive retraining from scratch. Lora significantly reduces the number of trainable parameters, making it more memory and computationally efficient for downstream tasks. It allows for switching tasks efficiently by replacing the rank decomposition matrices, reducing storage requirements and task-switching overhead. Additionally, Lora enables training with fewer GPUs, avoids I/O bottlenecks, and p

In [17]:
print(response.source_nodes[0].get_content(metadata_mode="all"))

page_label: 1
file_name: lora_paper.pdf
file_path: datasets/lora_paper.pdf
file_type: application/pdf
file_size: 1609513
creation_date: 2024-05-10
last_modified_date: 2024-05-10

LORA: L OW-RANK ADAPTATION OF LARGE LAN-
GUAGE MODELS
Edward Hu∗Yelong Shen∗Phillip Wallis Zeyuan Allen-Zhu
Yuanzhi Li Shean Wang Lu Wang Weizhu Chen
Microsoft Corporation
{edwardhu, yeshe, phwallis, zeyuana,
yuanzhil, swang, luw, wzchen }@microsoft.com
yuanzhil@andrew.cmu.edu
(Version 2)
ABSTRACT
An important paradigm of natural language processing consists of large-scale pre-
training on general domain data and adaptation to particular tasks or domains. As
we pre-train larger models, full ﬁne-tuning, which retrains all model parameters,
becomes less feasible. Using GPT-3 175B as an example – deploying indepen-
dent instances of ﬁne-tuned models, each with 175B parameters, is prohibitively
expensive. We propose Low-RankAdaptation, or LoRA, which freezes the pre-
trained model weights and injects trainable ran
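
Printing the full content of one source node is verbose. A compact view of where each retrieved chunk came from can be easier to scan. The sketch below assumes each entry in `response.source_nodes` has a `.node.metadata` dict and a `.score`, and it reads the `file_name` and `page_label` keys visible in the output above. `summarize_sources` is a hypothetical helper name.

```python
def summarize_sources(source_nodes):
    """Return one (file_name, page_label, score) tuple per source node.

    Assumption: each item has `.node.metadata` (a dict) and `.score`.
    Missing metadata keys come back as None.
    """
    rows = []
    for item in source_nodes:
        meta = item.node.metadata
        rows.append((meta.get("file_name"), meta.get("page_label"), item.score))
    return rows
```

For example, `for row in summarize_sources(response.source_nodes): print(row)` lists one line per retrieved chunk.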

#### Using Chat Memory

In [18]:
response = agent.chat(
    "Explain to me what is Lora and why it's being used. Are existing solutions not good enough?"
)

print(str(response))

Added user message to memory: Explain to me what is Lora and why it's being used. Are existing solutions not good enough?
=== Calling Function ===
Calling function: query_engine_tool with args: {"input": "Explain what Lora is and why it is being used."}
=== Function Output ===
Lora is a method used in deep learning for adapting large-scale pre-trained language models to specific tasks or domains. It involves freezing the pre-trained model weights and introducing trainable rank decomposition matrices into each layer of the Transformer architecture. This approach significantly reduces the number of trainable parameters for downstream tasks, making training more efficient and lowering hardware requirements. Lora is employed to fine-tune models for improved performance on specific tasks without the need to retrain the entire model from scratch, thus optimizing computational resources while enhancing task-specific capabilities.
=== Calling Function ===
Calling function: query_engine_tool wi

In [19]:
response = agent.chat(
    "What was my last question to you?"
)

print(str(response))

Added user message to memory: What was my last question to you?
=== LLM Response ===
Your last question to me was: "Explain to me what is Lora and why it's being used. Are existing solutions not good enough?"
assistant: Your last question to me was: "Explain to me what is Lora and why it's being used. Are existing solutions not good enough?"
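
The agent can answer "What was my last question?" because `chat` keeps earlier messages in memory. To inspect that memory directly, a small formatter like the one below may help. It is a sketch that assumes the stored messages are available as a list (for example through an `agent.chat_history` attribute, which is an assumption here) and that each message has `.role` and `.content`. `format_history` is a hypothetical helper.

```python
def format_history(messages):
    """Render chat messages as 'role: content' lines.

    Assumption: each message has `.role` (a string or an enum with a
    `.value`) and `.content`.
    """
    lines = []
    for msg in messages:
        # Enum roles carry their text in `.value`; plain strings are used as-is.
        role = getattr(msg.role, "value", msg.role)
        lines.append(f"{role}: {msg.content}")
    return "\n".join(lines)
```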


#### Low-Level Understanding

In [24]:
agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_tool, summary_tool], 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

##### Creating A Task

In [29]:
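# Note: adjacent string literals are concatenated with no separator, so this
# prompt reads "...being used.Are existing..." (as the outputs below show).
# Add a trailing space to the first literal if you re-run this cell.
task = agent.create_task(
    "Explain to me what is Lora and why it's being used."
    "Are existing solutions not good enough?"
)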

In [30]:
completed_steps = agent.get_completed_steps(task.task_id)

print(f"Number of completed steps for task ID {task.task_id} is {len(completed_steps)}")

# No step has run yet, so this list is expected to be empty
if len(completed_steps) > 0:
    print(completed_steps[0].output.sources[0].raw_output)

Number of completed steps for task ID 21671153-bb37-4ef2-a653-bd9090c39d00 is 0


In [33]:
upcoming_steps = agent.get_upcoming_steps(task.task_id)
print(f"Number of upcoming steps for task ID {task.task_id} is {len(upcoming_steps)}")

if len(upcoming_steps) > 0:
    print(upcoming_steps[0].input)

Number of upcoming steps for task ID 21671153-bb37-4ef2-a653-bd9090c39d00 is 1
Explain to me what is Lora and why it's being used.Are existing solutions not good enough?


##### Execute A Step

In [34]:
step_output = agent.run_step(task.task_id)

Added user message to memory: Explain to me what is Lora and why it's being used.Are existing solutions not good enough?
=== Calling Function ===
Calling function: query_engine_tool with args: {"input": "Explain what Lora is and why it is being used."}
=== Function Output ===
Lora is a method used to adapt large-scale pre-trained language models to specific tasks or domains by introducing trainable rank decomposition matrices into each layer of the Transformer architecture. It aims to reduce the number of trainable parameters for downstream tasks, making training more efficient and lowering hardware requirements. Lora allows for quick task-switching without introducing additional inference latency by updating only the injected low-rank matrices instead of all model parameters. It is being used to address the challenge of fine-tuning large models like GPT-3, which can be costly to deploy independently fine-tuned models for each task. Lora enhances model performance by amplifying task-sp

In [37]:
completed_steps = agent.get_completed_steps(task.task_id)

print(f"Number of completed steps for task ID {task.task_id} is {len(completed_steps)}")

if len(completed_steps) > 0:
    print(completed_steps[0].output.sources[0].raw_output)

Number of completed steps for task ID 21671153-bb37-4ef2-a653-bd9090c39d00 is 1
Lora is a method used to adapt large-scale pre-trained language models to specific tasks or domains by introducing trainable rank decomposition matrices into each layer of the Transformer architecture. It aims to reduce the number of trainable parameters for downstream tasks, making training more efficient and lowering hardware requirements. Lora allows for quick task-switching without introducing additional inference latency by updating only the injected low-rank matrices instead of all model parameters. It is being used to address the challenge of fine-tuning large models like GPT-3, which can be costly to deploy independently fine-tuned models for each task. Lora enhances model performance by amplifying task-specific directions in the model's feature space, improving its ability to handle specific tasks effectively and efficiently.


In [38]:
upcoming_steps = agent.get_upcoming_steps(task.task_id)
print(f"Number of upcoming steps for task ID {task.task_id} is {len(upcoming_steps)}")

if len(upcoming_steps) > 0:
    print(upcoming_steps[0].input)

Number of upcoming steps for task ID 21671153-bb37-4ef2-a653-bd9090c39d00 is 1
None


In [39]:
step_output = agent.run_step(task.task_id)

=== LLM Response ===
Lora is a method used to adapt large-scale pre-trained language models to specific tasks or domains by introducing trainable rank decomposition matrices into each layer of the Transformer architecture. It aims to reduce the number of trainable parameters for downstream tasks, making training more efficient and lowering hardware requirements. Lora allows for quick task-switching without introducing additional inference latency by updating only the injected low-rank matrices instead of all model parameters. It is being used to address the challenge of fine-tuning large models like GPT-3, which can be costly to deploy independently fine-tuned models for each task. Lora enhances model performance by amplifying task-specific directions in the model's feature space, improving its ability to handle specific tasks effectively and efficiently.

Existing solutions may not be good enough for Lora technology, as indicated by the need for advancements like Lora to address speci

In [40]:
print(step_output.is_last)

True
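
The cells above run one step at a time and check `is_last` by hand. The same flow can be wrapped in a loop. This is a minimal sketch that uses only calls appearing in this notebook (`run_step`, `is_last`, `finalize_response`). The `run_to_completion` name and the `max_steps` guard are assumptions added for illustration.

```python
def run_to_completion(agent, task_id, max_steps=10):
    """Run steps for one task until a step reports `is_last`, then finalize.

    `max_steps` is a hypothetical safety limit so a task that never
    finishes cannot loop forever.
    """
    for _ in range(max_steps):
        step_output = agent.run_step(task_id)
        if step_output.is_last:
            return agent.finalize_response(task_id)
    raise RuntimeError(f"Task {task_id} did not finish within {max_steps} steps")
```

With a freshly created task, `response = run_to_completion(agent, task.task_id)` would replace the repeated `run_step` cells.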


#### Human Feedback In The Loop

In [42]:
task = agent.create_task(
    "Explain to me what is Lora and why it's being used. "
    "Are existing solutions not good enough?"
)

In [43]:
step_output = agent.run_step(
    task.task_id, input="Explain to me the dataset used to fine-tune in the Lora paper."
)

Added user message to memory: Explain to me the dataset used to fine-tune in the Lora paper.
=== Calling Function ===
Calling function: query_engine_tool with args: {"input": "Dataset used to fine-tune in the Lora paper"}
=== Function Output ===
The datasets used for fine-tuning in the LoRA paper are GLUE benchmark, WikiSQL, MultiNLI, SAMSum, E2E NLG Challenge, and GPT-3 175B.


In [45]:
step_output = agent.run_step(task.task_id)
print(step_output.is_last)

=== LLM Response ===
The dataset used for fine-tuning in the Lora paper includes the GLUE benchmark, WikiSQL, MultiNLI, SAMSum, E2E NLG Challenge, and GPT-3 175B.
True


In [46]:
response = agent.finalize_response(task.task_id)
print(str(response))

assistant: The dataset used for fine-tuning in the Lora paper includes the GLUE benchmark, WikiSQL, MultiNLI, SAMSum, E2E NLG Challenge, and GPT-3 175B.
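
As the cells above show, passing `input=` to `run_step` steers the task: the final answer addresses the dataset question, not the original prompt. One way to make that interactive is to ask for feedback before every step. The sketch below assumes a caller-supplied `get_feedback` callable that returns a string, or an empty value to continue unchanged. `run_with_feedback`, `get_feedback`, and `max_steps` are hypothetical names introduced here.

```python
def run_with_feedback(agent, task_id, get_feedback, max_steps=10):
    """Run a task step by step, optionally injecting user input before each step.

    `get_feedback` is called once per step. A non-empty return value is
    passed to `run_step` as `input=`; an empty one runs the step as-is.
    """
    for _ in range(max_steps):
        feedback = get_feedback()
        if feedback:
            step_output = agent.run_step(task_id, input=feedback)
        else:
            step_output = agent.run_step(task_id)
        if step_output.is_last:
            return agent.finalize_response(task_id)
    raise RuntimeError(f"Task {task_id} did not finish within {max_steps} steps")
```

In a notebook, `get_feedback` could be as simple as `lambda: input("Feedback (blank to continue): ")`.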
