In [2]:
from dotenv import load_dotenv
import nest_asyncio
import os

# Load environment variables
load_dotenv()
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')

# Apply nest_asyncio
nest_asyncio.apply()
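
`os.getenv` returns `None` when a variable is missing, so a misconfigured `.env` file tends to surface only later as an authentication error. A minimal sketch of a fail-fast check follows. The helper name `require_env` is hypothetical and is not part of `python-dotenv` or LlamaIndex.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return value
```

One possible use is to call `require_env("OPENAI_API_KEY")` right after `load_dotenv()` in place of the plain `os.getenv` lookup.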

In [3]:
from llama_index.core import SimpleDirectoryReader

# load lora_paper.pdf documents
documents = SimpleDirectoryReader(input_files=["./data/lora_paper.pdf"]).load_data()

In [4]:
from llama_index.core.node_parser import SentenceSplitter

# chunk_size=1024 (tokens) matches SentenceSplitter's default; set explicitly for clarity
splitter = SentenceSplitter(chunk_size=1024)
# Create nodes from documents
nodes = splitter.get_nodes_from_documents(documents)
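
To make the effect of `chunk_size` concrete, here is a simplified, standard-library sketch of sentence-aware chunking. It is an illustration only, not LlamaIndex's implementation: the hypothetical `chunk_sentences` function counts whitespace-separated words rather than model tokens and ignores chunk overlap.

```python
import re
from typing import List


def chunk_sentences(text: str, chunk_size: int) -> List[str]:
    """Greedily pack whole sentences into chunks of at most `chunk_size` words.

    A single sentence longer than `chunk_size` becomes its own oversized chunk.
    """
    # Naive sentence boundary: split after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the budget.
        if current and current_len + n_words > chunk_size:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


print(chunk_sentences("One two three. Four five. Six seven eight nine.", chunk_size=5))
# ['One two three. Four five.', 'Six seven eight nine.']
```

Sentences are never cut in half, which is the property that distinguishes this kind of splitter from a fixed-width one.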

In [5]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# LLM model
Settings.llm = OpenAI(model="gpt-3.5-turbo")
# embedding model
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

In [6]:
from llama_index.core import VectorStoreIndex


# vector store index
vector_index = VectorStoreIndex(nodes)

In [7]:
from llama_index.core.vector_stores import MetadataFilters

# Create vector search query engine
query_engine = vector_index.as_query_engine(
    similarity_top_k=2,
    filters=MetadataFilters.from_dicts(
        [
            {"key": "page_label", "value": "2"}
        ]
    )
)

response = query_engine.query(
    "Tell me about the Problem statement as explained", 
)
print(str(response))

The problem statement focuses on language modeling, particularly in the context of adapting a pre-trained autoregressive language model to downstream conditional text generation tasks. It discusses the use of a pre-trained model like GPT based on the Transformer architecture for tasks such as summarization, machine reading comprehension, and natural language to SQL. The goal is to maximize conditional probabilities given a task-specific prompt by adapting the pre-trained model to new tasks represented by context-target pairs.


In [8]:
for n in response.source_nodes:
    print(n.metadata)
    print("=============Text=============")
    print(n.get_text())
    print("=============Text=============")

{'page_label': '2', 'file_name': 'lora_paper.pdf', 'file_path': 'data/lora_paper.pdf', 'file_type': 'application/pdf', 'file_size': 1609513, 'creation_date': '2024-08-03', 'last_modified_date': '2024-08-03'}
often introduce inference latency (Houlsby et al., 2019; Rebuffi et al., 2017) by extending model
depth or reduce the model’s usable sequence length (Li & Liang, 2021; Lester et al., 2021;
Hambardzumyan et al., 2020; Liu et al., 2021) (Section 3). More importantly, these method often fail to
match the fine-tuning baselines, posing a trade-off between efficiency and model quality.
We take inspiration from Li et al. (2018a); Aghajanyan et al. (2020) which show that the learned
over-parametrized models in fact reside on a low intrinsic dimension. We hypothesize that the
change in weights during model adaptation also has a low “intrinsic rank”, leading to our proposed
Low-Rank Adaptation (LoRA) approach. LoRA allows us to train some dense layers in a neural
network indirectly by optimizi
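
The query in cell 7 combines two steps: a metadata filter that keeps only nodes whose `page_label` is `"2"`, and a similarity ranking that returns the top `similarity_top_k` of the remaining nodes. The sketch below illustrates that idea in plain Python with toy 2-dimensional vectors. The `filtered_top_k` function and the sample data are hypothetical, and a real vector store may apply the filter differently internally.

```python
import math
from typing import Dict, List, Sequence, Tuple

Node = Tuple[Dict[str, str], Sequence[float]]  # (metadata, embedding)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filtered_top_k(
    nodes: List[Node], query_vec: Sequence[float], key: str, value: str, top_k: int
) -> List[Node]:
    """Keep nodes whose metadata[key] equals value, then return the top_k most similar."""
    candidates = [n for n in nodes if n[0].get(key) == value]
    ranked = sorted(candidates, key=lambda n: cosine(n[1], query_vec), reverse=True)
    return ranked[:top_k]


# Toy data: the page-1 node is the closest match but is excluded by the filter.
toy_nodes: List[Node] = [
    ({"page_label": "1"}, [1.0, 0.0]),
    ({"page_label": "2"}, [0.9, 0.1]),
    ({"page_label": "2"}, [0.0, 1.0]),
    ({"page_label": "2"}, [0.7, 0.7]),
]
hits = filtered_top_k(toy_nodes, [1.0, 0.0], "page_label", "2", top_k=2)
print([embedding for _, embedding in hits])
# [[0.9, 0.1], [0.7, 0.7]]
```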

In [9]:
from typing import List
from llama_index.core.vector_stores import FilterCondition


def vector_search_query(
    query: str, 
    page_numbers: List[str]
) -> str:
    """Run a vector search over the index, optionally restricted to specific pages.

    query (str): The text to embed and search for in the index.
    page_numbers (List[str]): Page labels to restrict the search to, e.g. ["2"].
    If the list is empty, all pages are searched; otherwise only nodes from
    the listed pages are considered.

    """

    metadata_dicts = [
        {"key": "page_label", "value": p} for p in page_numbers
    ]
    
    query_engine = vector_index.as_query_engine(
        similarity_top_k=2,
        filters=MetadataFilters.from_dicts(
            metadata_dicts,
            condition=FilterCondition.OR
        )
    )
    response = query_engine.query(query)
    # Return the answer text so the result matches the declared `-> str` return type
    return str(response)
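
Two details of this function are easy to miss. The `page_label` metadata values are strings (the node metadata printed in cell 8 shows `'page_label': '2'`), so integer page numbers would not match. With `FilterCondition.OR`, a node passes if it matches any one of the listed pages. The sketch below shows both points in plain Python. The helpers `build_page_filters` and `matches_any` are hypothetical and only mimic the intended semantics, including the docstring's statement that an empty list searches all pages.

```python
from typing import Dict, Iterable, List, Union


def build_page_filters(page_numbers: Iterable[Union[int, str]]) -> List[Dict[str, str]]:
    """Build filter dicts, normalising page numbers to the string form used in metadata."""
    return [{"key": "page_label", "value": str(p).strip()} for p in page_numbers]


def matches_any(metadata: Dict[str, str], filters: List[Dict[str, str]]) -> bool:
    """OR semantics: pass if any filter matches; an empty filter list passes everything."""
    if not filters:
        return True
    return any(metadata.get(f["key"]) == f["value"] for f in filters)


filters = build_page_filters([2, "3"])
print(filters)
# [{'key': 'page_label', 'value': '2'}, {'key': 'page_label', 'value': '3'}]
print(matches_any({"page_label": "3"}, filters))  # True
print(matches_any({"page_label": "5"}, filters))  # False
```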

In [11]:
from llama_index.core.tools import FunctionTool

vector_query_tool = FunctionTool.from_defaults(
    name="vector_search_tool",
    fn=vector_search_query
)


In [13]:
llm = OpenAI(model="gpt-3.5-turbo")

response = llm.predict_and_call(
    [vector_query_tool], 
    "What was mentioned about the problem statement in page 2?", 
    verbose=True
)

=== Calling Function ===
Calling function: vector_search_tool with args: {"query": "problem statement", "page_numbers": ["2"]}
=== Function Output ===
The problem statement focuses on language modeling, particularly in the context of adapting a pre-trained autoregressive language model to downstream conditional text generation tasks. The goal is to maximize conditional probabilities given a task-specific prompt. The proposal is agnostic to the training objective but emphasizes language modeling as the motivating use case. The adaptation involves tasks like summarization, machine reading comprehension, and natural language to SQL translation, each represented by context-target pairs in a training dataset.
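
The log above shows the two halves of a tool call: the model chooses a tool name and JSON arguments, and the framework then invokes the matching Python function. The following sketch imitates only the second half, using a hypothetical `dispatch_tool_call` helper and a stub tool. It makes no claim about how `predict_and_call` is implemented internally.

```python
import json
from typing import Any, Callable, Dict, List


def dispatch_tool_call(
    tools: Dict[str, Callable[..., Any]], name: str, raw_args: str
) -> Any:
    """Look up a tool by name and call it with JSON-encoded keyword arguments."""
    if name not in tools:
        raise KeyError(f"Unknown tool: {name}")
    kwargs = json.loads(raw_args)
    return tools[name](**kwargs)


def fake_vector_search(query: str, page_numbers: List[str]) -> str:
    """Stub standing in for a real search function."""
    pages = ", ".join(page_numbers) if page_numbers else "all"
    return f"searched {query!r} on pages: {pages}"


tools = {"vector_search_tool": fake_vector_search}
result = dispatch_tool_call(
    tools,
    "vector_search_tool",
    '{"query": "problem statement", "page_numbers": ["2"]}',
)
print(result)
# searched 'problem statement' on pages: 2
```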


In [15]:
from llama_index.core import SummaryIndex
from llama_index.core.tools import QueryEngineTool

summary_index = SummaryIndex(nodes)

summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)

summary_tool = QueryEngineTool.from_defaults(
    name="summary_tool",
    query_engine=summary_query_engine,
    description=(
        "Useful for summarization questions related to the LoRA paper."
    ),
)
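
The `tree_summarize` response mode builds its answer bottom-up: chunks are summarized in groups, and the resulting summaries are summarized again until one remains. A rough sketch of that recursion follows. The function below and the `stub_summarize` callable (a stand-in for an LLM call) are hypothetical, and the fixed group size of 2 is an arbitrary assumption; LlamaIndex packs text by the model's context budget rather than by a fixed count.

```python
from typing import Callable, List


def tree_summarize(
    chunks: List[str], summarize: Callable[[List[str]], str], fanout: int = 2
) -> str:
    """Recursively summarize groups of `fanout` texts until one text remains.

    A single input chunk is returned unchanged; an empty input yields "".
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    if not chunks:
        return ""
    level = list(chunks)
    while len(level) > 1:
        # Summarize each group of `fanout` neighbours to form the next level.
        level = [summarize(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0]


def stub_summarize(texts: List[str]) -> str:
    """Stand-in for an LLM call: shows the grouping instead of summarizing."""
    return "(" + "+".join(texts) + ")"


print(tree_summarize(["a", "b", "c"], stub_summarize))
# ((a+b)+(c))
```

The nested parentheses in the output make the tree shape visible: `a` and `b` are combined first, and `c` joins at the next level.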

In [17]:
response = llm.predict_and_call(
    [vector_query_tool, summary_tool], 
    "What was mentioned about the problem statement in page 2?", 
    verbose=True
)

=== Calling Function ===
Calling function: vector_search_tool with args: {"query": "problem statement", "page_numbers": ["2"]}
=== Function Output ===
The problem statement focuses on language modeling as the motivating use case. It describes the adaptation of a pre-trained autoregressive language model to downstream conditional text generation tasks, such as summarization, machine reading comprehension (MRC), and natural language to SQL (NL2SQL). Each downstream task is represented by a training dataset of context-target pairs, where both the context and target are sequences of tokens. For example, in NL2SQL, the context is a natural language query and the target is its corresponding SQL command; for summarization, the context is the content of an article and the target is its summary.


In [18]:
response = llm.predict_and_call(
    [vector_query_tool, summary_tool], 
    "Give me a 10 point from the paper.", 
    verbose=True
)

=== Calling Function ===
Calling function: vector_search_tool with args: {"query": "10 point", "page_numbers": []}
=== Function Output ===
The context provided does not contain information related to a "10 point" query.


In [19]:
response = llm.predict_and_call(
    [vector_query_tool, summary_tool], 
    "Give me a summary of the paper.", 
    verbose=True
)

=== Calling Function ===
Calling function: summary_tool with args: {"input": "summary"}
=== Function Output ===
LoRA is a method proposed for efficient adaptation of large language models by injecting trainable rank decomposition matrices into each layer of the Transformer architecture. It reduces the number of trainable parameters for downstream tasks while maintaining or surpassing the model quality of full fine-tuning. The experiments conducted compared LoRA with other adaptation methods on tasks like WikiSQL, MNLI, DART, WebNLG, and E2E NLG Challenge, showing LoRA's favorable sample-efficiency, especially in low-data scenarios. The study also examined the correlation between LoRA modules, the impact of different ranks in GPT-2, and the amplification factor of task-specific directions in the adaptation process. Additionally, it explored measuring the similarity between subspaces and the influence of varying the rank parameter in the adaptation process.
