In [1]:
import dotenv
%load_ext dotenv
%dotenv

In [2]:
import nest_asyncio
nest_asyncio.apply()

In [3]:
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader(input_files=["./datasets/lora_paper.pdf"]).load_data()

In [4]:
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)
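As a rough illustration of what sentence-aware chunking does, here is a minimal pure-Python sketch. It is not the `SentenceSplitter` implementation: the helper name `split_into_chunks` is hypothetical, whitespace-separated words stand in for tokens, and chunk overlap is ignored.

```python
import re

def split_into_chunks(text, chunk_size=1024):
    """Greedily pack whole sentences into chunks of at most `chunk_size` words.

    Assumption of this sketch: words approximate tokens. A single sentence
    longer than `chunk_size` is kept whole rather than split further.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and current_len + n_words > chunk_size:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks

sample = (
    "LoRA freezes the pre-trained weights. "
    "It injects low-rank matrices. "
    "This cuts trainable parameters."
)
toy_chunks = split_into_chunks(sample, chunk_size=8)
print(toy_chunks)
```

With a budget of 8 words, the first sentence fills one chunk and the remaining two sentences share the second, so no sentence is cut in the middle.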

In [5]:
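from llama_index.core.schema import MetadataMode

# Show the first chunk's text together with all of its metadata.
node_content = nodes[0].get_content(metadata_mode=MetadataMode.ALL)
print(node_content)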

page_label: 1
file_name: lora_paper.pdf
file_path: datasets/lora_paper.pdf
file_type: application/pdf
file_size: 1609513
creation_date: 2024-06-08
last_modified_date: 2024-06-08

LORA: L OW-RANK ADAPTATION OF LARGE LAN-
GUAGE MODELS
Edward Hu∗Yelong Shen∗Phillip Wallis Zeyuan Allen-Zhu
Yuanzhi Li Shean Wang Lu Wang Weizhu Chen
Microsoft Corporation
{edwardhu, yeshe, phwallis, zeyuana,
yuanzhil, swang, luw, wzchen }@microsoft.com
yuanzhil@andrew.cmu.edu
(Version 2)
ABSTRACT
An important paradigm of natural language processing consists of large-scale pre-
training on general domain data and adaptation to particular tasks or domains. As
we pre-train larger models, full ﬁne-tuning, which retrains all model parameters,
becomes less feasible. Using GPT-3 175B as an example – deploying indepen-
dent instances of ﬁne-tuned models, each with 175B parameters, is prohibitively
expensive. We propose Low-RankAdaptation, or LoRA, which freezes the pre-
trained model weights and injects trainable ran

In [6]:
len(nodes)

38

In [8]:
# LLM and embedding model
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

In [9]:
from llama_index.core import SummaryIndex, VectorStoreIndex

summary_index = SummaryIndex(nodes=nodes)
vector_index = VectorStoreIndex(nodes=nodes)

CREATING QUERY ENGINES

In [11]:
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)

vector_query_engine = vector_index.as_query_engine()
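The two engines differ mainly in what they read: the summary engine is expected to pass over every node, while the vector engine retrieves only the few chunks whose embeddings are closest to the query. The sketch below illustrates that top-k step with cosine similarity in plain Python. The three-dimensional vectors, the chunk names, and the helper `top_k` are made up for illustration; real embeddings come from the configured embedding model and have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k=2):
    """Return the names of the k chunks most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), name)
        for name, vec in chunk_vecs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Hypothetical toy embeddings, not produced by any real model.
toy_vectors = {
    "abstract": [0.9, 0.1, 0.0],
    "datasets": [0.1, 0.9, 0.2],
    "appendix": [0.0, 0.4, 0.9],
}
toy_query = [0.2, 1.0, 0.1]

retrieved = top_k(toy_query, toy_vectors, k=2)
print(retrieved)
```

Here `k=2` mirrors the two source nodes reported by `len(response.source_nodes)` further down; treat that correspondence as an assumption about the engine's default rather than a documented guarantee.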

#### QUERY TOOL

In [16]:
from llama_index.core.tools import QueryEngineTool

summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine, 
    description=(
        "Useful for summarization of the lora paper."
    )
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context related to the lora paper."
    )
)

In [17]:
#router query engine
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[summary_tool, vector_tool],
    verbose=True,
)
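Conceptually, the router shows the selector each tool's description, gets back the index of one tool, and forwards the query to it. The sketch below mimics that flow without an LLM: `ToyTool`, `select_tool`, and `route` are hypothetical names, and word overlap between the query and the description is a crude stand-in for the model's judgement, not how `LLMSingleSelector` works internally.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class ToyTool:
    name: str
    description: str
    run: Callable[[str], str]

def _words(text: str) -> set:
    """Lowercase alphanumeric tokens, with punctuation dropped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def select_tool(query: str, tools: List[ToyTool]) -> int:
    """Pick the tool whose description shares the most words with the query.

    Stand-in for the LLM-based choice; ties go to the earliest tool.
    """
    query_words = _words(query)
    overlaps = [len(query_words & _words(t.description)) for t in tools]
    return overlaps.index(max(overlaps))

def route(query: str, tools: List[ToyTool]) -> Tuple[int, str]:
    """Select one tool and forward the query to it."""
    index = select_tool(query, tools)
    return index, tools[index].run(query)

toy_tools = [
    ToyTool(
        name="summary",
        description="Useful for summarization of the lora paper.",
        run=lambda q: f"[summary engine] {q}",
    ),
    ToyTool(
        name="vector",
        description="Useful for retrieving specific context related to the lora paper.",
        run=lambda q: f"[vector engine] {q}",
    ),
]

print(route("Give me a summarization of the paper", toy_tools))
print(route("Find the specific context about rank", toy_tools))
```

The sketch also shows why the `description` strings matter: they are the only information the selector has about each tool, so vague or overlapping descriptions make the routing decision less reliable.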

In [18]:
response = query_engine.query("What is the lora paper about?")
print(str(response))

Selecting query engine 0: Useful for summarization of the lora paper..
The LoRA paper introduces a novel adaptation strategy called Low-Rank Adaptation, which aims to address the challenges of fine-tuning large language models for specific tasks. LoRA freezes the pre-trained model weights and introduces trainable rank decomposition matrices into each layer of the Transformer architecture. This approach significantly reduces the number of trainable parameters for downstream tasks, leading to improved efficiency in terms of memory requirements and training throughput. The paper demonstrates that LoRA outperforms traditional fine-tuning methods on various language models like RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters. Additionally, LoRA does not introduce any additional inference latency, making it a promising solution for adapting large language models to specific tasks efficiently.


In [21]:
len(response.source_nodes)

2

In [19]:
response = query_engine.query("What eval datasets were used in the lora paper?")
print(str(response))

Selecting query engine 1: The question is asking about specific datasets used in the lora paper, which is more related to retrieving specific context rather than summarization..
The evaluation datasets used in the LoRA paper include MNLI, STS-B, WikiSQL, SAMSum, E2E NLG Challenge, DART, and WebNLG.


In [20]:
print(response.source_nodes[0].get_content())

MNLI-
ndescribes a subset with ntraining examples. We evaluate with the full validation set. LoRA
performs exhibits favorable sample-efﬁciency compared to other methods, including ﬁne-tuning.
To be concrete, let the singular values of Ui⊤
AUj
Bto beσ1,σ2,···,σpwherep= min{i,j}. We
know that the Projection Metric Ham & Lee (2008) is deﬁned as:
d(Ui
A,Uj
B) =√p−p∑
i=1σ2
i∈[0,√p]
23
