https://www.youtube.com/watch?v=fnDYXE7BFto

In [60]:
import dotenv

%load_ext dotenv
%dotenv

The dotenv extension is already loaded. To reload it, use:
  %reload_ext dotenv


In [61]:
%reload_ext dotenv
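`%dotenv` reads a `.env` file from the working directory and copies its `KEY=VALUE` pairs into `os.environ`, which is where the OpenAI client used later looks for `OPENAI_API_KEY`. By default it does not overwrite variables that are already set (the magic's `-o` flag enables overriding). Below is a deliberately simplified sketch of that behavior. The `load_env_file` helper and the `DEMO_OPENAI_API_KEY` name are hypothetical, and the real python-dotenv parser also handles `export` prefixes, multiline values and variable expansion.

```python
import os
import tempfile


def load_env_file(path: str, override: bool = False) -> dict[str, str]:
    """Hypothetical, simplified stand-in for python-dotenv's loader."""
    loaded = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            # Skip blank lines, comments and lines without an assignment
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            key, value = key.strip(), value.strip()
            # Drop one pair of matching surrounding quotes, if present
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
                value = value[1:-1]
            # Mirror the default: keep variables that are already set
            if override or key not in os.environ:
                os.environ[key] = value
                loaded[key] = value
    return loaded


with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as fh:
        fh.write('# secrets for the notebook\nDEMO_OPENAI_API_KEY="sk-placeholder"\n')
    loaded_vars = load_env_file(env_path)

print(loaded_vars)
```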

In [62]:
import nest_asyncio
nest_asyncio.apply()
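A Jupyter kernel already has a running asyncio event loop, and `asyncio.run()` refuses to start a loop inside a running one. The summary query engine below uses `use_async=True`, which drives LlamaIndex's async code from synchronous calls, so the notebook likely needs `nest_asyncio.apply()` to make such nested calls possible. The standard-library-only sketch below reproduces the error the patch works around; it does not use nest_asyncio itself, and `notebook_cell` is just a hypothetical name for "code running while a loop is active".

```python
import asyncio


async def answer() -> int:
    return 42


async def notebook_cell() -> str:
    # An event loop is running here, just as it is inside a Jupyter cell
    coro = answer()
    try:
        asyncio.run(coro)
        return "no error"
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)


message = asyncio.run(notebook_cell())
print(message)
```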

In [63]:
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader(input_files=["./datasets/lora_paper.pdf"]).load_data()

In [64]:
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)
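`SentenceSplitter(chunk_size=1024)` tries to keep sentences whole while packing them into chunks of at most 1024 tokens, with consecutive chunks sharing some text (the `chunk_overlap` parameter, left at its default here). The sketch below shows the greedy packing idea under simplifying assumptions: whitespace-separated words stand in for tokens, a regex stands in for the sentence tokenizer, and overlap is carried as whole trailing sentences. It is an illustration, not LlamaIndex's implementation.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive sentence boundary: ".", "!" or "?" followed by whitespace
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def n_tokens(text: str) -> int:
    # Assumption: one whitespace-separated word counts as one token
    return len(text.split())


def chunk_sentences(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    chunks: list[str] = []
    current: list[str] = []
    for sent in split_sentences(text):
        candidate = current + [sent]
        if current and n_tokens(" ".join(candidate)) > chunk_size:
            chunks.append(" ".join(current))
            # Carry trailing sentences forward while they fit the overlap budget
            overlap: list[str] = []
            for prev in reversed(current):
                if n_tokens(" ".join([prev] + overlap)) > chunk_overlap:
                    break
                overlap.insert(0, prev)
            # Shrink the overlap if it would push the new chunk past chunk_size
            while overlap and n_tokens(" ".join(overlap + [sent])) > chunk_size:
                overlap.pop(0)
            current = overlap + [sent]
        else:
            # Note: a single sentence longer than chunk_size stays oversized here
            current = candidate
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = ("LoRA freezes the weights. It adds low-rank matrices. "
          "Only they are trained. Latency is unchanged.")
demo_chunks = chunk_sentences(sample, chunk_size=8, chunk_overlap=4)
for chunk in demo_chunks:
    print(chunk)
```

Each chunk repeats the last sentence of the previous one, which is the point of the overlap: a fact that straddles a chunk boundary still appears intact in at least one chunk.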

In [65]:
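from llama_index.core.schema import MetadataMode
# Prepend all metadata fields to the node text
node_metadata = nodes[0].get_content(metadata_mode=MetadataMode.ALL)
print(node_metadata)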

page_label: 1
file_name: lora_paper.pdf
file_path: datasets\lora_paper.pdf
file_type: application/pdf
file_size: 1609513
creation_date: 2024-06-09
last_modified_date: 2024-06-09

LORA: L OW-RANK ADAPTATION OF LARGE LAN-
GUAGE MODELS
Edward Hu∗Yelong Shen∗Phillip Wallis Zeyuan Allen-Zhu
Yuanzhi Li Shean Wang Lu Wang Weizhu Chen
Microsoft Corporation
{edwardhu, yeshe, phwallis, zeyuana,
yuanzhil, swang, luw, wzchen }@microsoft.com
yuanzhil@andrew.cmu.edu
(Version 2)
ABSTRACT
An important paradigm of natural language processing consists of large-scale pre-
training on general domain data and adaptation to particular tasks or domains. As
we pre-train larger models, full ﬁne-tuning, which retrains all model parameters,
becomes less feasible. Using GPT-3 175B as an example – deploying indepen-
dent instances of ﬁne-tuned models, each with 175B parameters, is prohibitively
expensive. We propose Low-RankAdaptation, or LoRA, which freezes the pre-
trained model weights and injects trainable ran
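The printed node shows the layout used when metadata is included: one `key: value` line per metadata field, a blank line, then the chunk text. `MetadataMode` decides which keys appear: `NONE` returns only the text, while `EMBED` and `LLM` drop the keys listed in the node's `excluded_embed_metadata_keys` and `excluded_llm_metadata_keys`. The plain-Python sketch below mimics that layout; `render_node`, the `Mode` enum and the example exclusion list are hypothetical stand-ins, not LlamaIndex code.

```python
from enum import Enum


class Mode(str, Enum):
    # Hypothetical mirror of the four metadata modes
    ALL = "all"
    EMBED = "embed"
    LLM = "llm"
    NONE = "none"


def render_node(text: str, metadata: dict, mode: Mode,
                excluded_embed: tuple = (), excluded_llm: tuple = ()) -> str:
    if mode is Mode.NONE:
        return text
    keys = list(metadata)
    if mode is Mode.EMBED:
        keys = [k for k in keys if k not in excluded_embed]
    elif mode is Mode.LLM:
        keys = [k for k in keys if k not in excluded_llm]
    # One "key: value" line per field, then a blank line, then the content
    header = "\n".join(f"{k}: {metadata[k]}" for k in keys)
    return f"{header}\n\n{text}" if header else text


title = "LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS"
meta = {"page_label": "1", "file_name": "lora_paper.pdf", "file_size": 1609513}
embed_view = render_node(title, meta, Mode.EMBED, excluded_embed=("file_size",))
print(embed_view)
```

Excluding noisy fields such as file sizes from the embedding view keeps them from influencing similarity search while still letting them be shown elsewhere.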

In [66]:
len(nodes)

39

### Create LLM and Embedding Model

In [73]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.llm = OpenAI(model='gpt-3.5-turbo')
# `embed_model` is the Settings attribute the indexes read for embeddings
Settings.embed_model = OpenAIEmbedding(model='text-embedding-ada-002')

### Creating Indexes

In [74]:
from llama_index.core import SummaryIndex, VectorStoreIndex

summary_index = SummaryIndex(nodes=nodes)
vecto_index = VectorStoreIndex(nodes=nodes)
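The two indexes store the same nodes differently. A `SummaryIndex` keeps them in a plain list and, at query time, passes every node to the response synthesizer. A `VectorStoreIndex` embeds each node once with the configured embedding model and, at query time, returns only the nodes whose embeddings are most similar to the query embedding (`similarity_top_k`, 2 by default). The sketch below contrasts the two retrieval behaviors; the bag-of-words `embed` function is a hypothetical stand-in for `text-embedding-ada-002`, and the class names are made up for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: bag-of-words counts
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ListIndex:
    """Summary-index-like: retrieval returns every node."""

    def __init__(self, nodes: list[str]):
        self.nodes = list(nodes)

    def retrieve(self, query: str) -> list[str]:
        return list(self.nodes)


class VectorIndex:
    """Vector-index-like: embed at build time, rank by cosine at query time."""

    def __init__(self, nodes: list[str]):
        self.entries = [(node, embed(node)) for node in nodes]

    def retrieve(self, query: str, top_k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [node for node, _ in ranked[:top_k]]


toy_nodes = [
    "lora freezes the pretrained weights",
    "lora injects trainable rank decomposition matrices",
    "gpt-3 175b is expensive to fine-tune",
]
all_hits = ListIndex(toy_nodes).retrieve("anything")
top_hit = VectorIndex(toy_nodes).retrieve("trainable rank matrices", top_k=1)
print(len(all_hits), top_hit)
```

This is why the summary index suits "what is the whole paper about" questions, while the vector index suits pinpoint questions whose answer lives in a few chunks.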

### Creating Query Engines

In [75]:
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)

vector_query_engine = vecto_index.as_query_engine()
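With `response_mode="tree_summarize"`, the synthesizer summarizes groups of chunks, then summarizes those summaries, repeating until a single answer remains; `use_async=True` lets the calls within one level run concurrently. The sketch below shows only the shape of that recursion. `summarize` is a hypothetical placeholder for one LLM call (it just keeps the first word of each input), and the fixed `group_size` is an assumption standing in for LlamaIndex's prompt-size-based packing.

```python
def summarize(texts: list[str]) -> str:
    # Hypothetical placeholder for one LLM call: keep each text's first word
    return " ".join(t.split()[0] for t in texts if t.split())


def tree_summarize(chunks: list[str], group_size: int = 2) -> tuple[str, int]:
    """Return (final summary, number of summarize calls)."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so each level shrinks")
    level, calls = list(chunks), 0
    while True:
        groups = [level[i:i + group_size] for i in range(0, len(level), group_size)]
        # One call per group; these are the calls use_async can run concurrently
        level = [summarize(g) for g in groups]
        calls += len(groups)
        if len(level) == 1:
            return level[0], calls


result = tree_summarize(["alpha one", "beta two", "gamma three", "delta four"])
print(result)
```

Four chunks with a fan-in of 2 take two levels and three calls in total; the number of levels grows only logarithmically with the number of chunks.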

### Query Tools

In [76]:
from llama_index.core.tools import QueryEngineTool

summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for summarizing the lora paper."
    )
)


vecto_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context related to the lora paper."
    )
)
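Each `QueryEngineTool` pairs an engine with a natural-language description, and that description is what the router's selector reads when deciding where to send a question. The sketch below shows the general pattern: the descriptions become a numbered choice list, and a 1-based answer from the model is mapped back to a 0-based tool index (consistent with the `Selecting query engine 0` log at the end of this notebook). The prompt wording, the `{"choice": ..., "reason": ...}` answer format and both helper names are assumptions for illustration, not LlamaIndex's exact prompt or parser.

```python
import json

# Hypothetical answer-format hint appended to the prompt
FORMAT_HINT = 'Answer as JSON: {"choice": <number>, "reason": <string>}'


def build_choice_prompt(descriptions: list[str], query: str) -> str:
    # Number the choices from 1 for the model
    lines = [f"({i}) {d}" for i, d in enumerate(descriptions, start=1)]
    return ("Some choices are given below as a numbered list:\n"
            + "\n".join(lines)
            + f"\n\nReturn the choice most relevant to the question: '{query}'\n"
            + FORMAT_HINT)


def parse_choice(llm_output: str, n_choices: int) -> tuple[int, str]:
    answer = json.loads(llm_output)
    idx = int(answer["choice"]) - 1  # convert the 1-based answer to a 0-based index
    if not 0 <= idx < n_choices:
        raise ValueError(f"choice out of range: {answer['choice']}")
    return idx, answer["reason"]


descriptions = [
    "Useful for summarizing the lora paper.",
    "Useful for retrieving specific context related to the lora paper.",
]
prompt = build_choice_prompt(descriptions, "What is the lora paper about?")
# Hypothetical model reply; a real selector obtains this from the LLM
tool_index, reason = parse_choice('{"choice": 1, "reason": "asks for an overview"}',
                                  len(descriptions))
print(tool_index, reason)
```

Because the descriptions are the selector's only evidence, writing them to clearly separate "overview" questions from "specific detail" questions matters more than their exact wording.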

### Router Query Engine

In [77]:
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

query_egine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[summary_tool, vecto_tool],
    verbose=True,
)
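`RouterQueryEngine` asks its selector to pick one tool, forwards the query to that tool's engine, and returns that engine's response. The sketch below reproduces this select-then-dispatch flow so that it runs offline. It swaps the LLM-based selector for a simple keyword-overlap score (a different, much cruder technique), and the `Tool` class and lambda engines are hypothetical stand-ins for LlamaIndex tools and query engines.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    description: str
    engine: Callable[[str], str]  # hypothetical stand-in for a query engine


def keyword_selector(tools: list[Tool], query: str) -> int:
    # Swapped-in heuristic (not an LLM): most words shared with the description wins
    q = set(query.lower().split())
    scores = [len(q & set(t.description.lower().split())) for t in tools]
    return max(range(len(tools)), key=scores.__getitem__)


def route(tools: list[Tool], query: str, verbose: bool = True) -> str:
    choice = keyword_selector(tools, query)
    if verbose:
        print(f"Selecting query engine {choice}")
    return tools[choice].engine(query)


tools = [
    Tool("summarize overview of the paper", lambda q: "summary answer"),
    Tool("retrieve specific details and numbers from the paper", lambda q: "retrieved answer"),
]
answer = route(tools, "give me an overview summary")
print(answer)
```

Only the chosen engine runs, so a router adds one selection step but avoids paying for both the full-document summary and the vector retrieval on every question.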

In [78]:
response = query_egine.query("What is the subject of the lora paper?")
print(str(response))

Selecting query engine 0: The first choice is more relevant as it focuses on summarizing the lora paper, which would provide an overview of the subject matter.
The subject of the "LoRA" paper is the proposal of an efficient adaptation strategy for language models that introduces no inference latency and preserves model quality. The LoRA approach allows fast task switching when deployed as a service, because most of the model parameters are shared. The research focuses on Transformer language models, but the proposed principles apply to any neural network with dense layers.
