# LMQL with LlamaIndex

CONTENT:


---
CONCLUSIONS:

---
---

## VectorStoreIndex with documents vs nodes

https://lmql.ai/docs/latest/lib/integrations/llama_index.html - outdated example with regard to query/query_engine \
https://docs.llamaindex.ai/en/stable/examples/llm/llama_2_llama_cpp/


NOTE: to run the notebook
1. Remove 'local:' from llm = lmql.model("local:llama.cpp:/home/dorota/models/mistral-7b-instruct-v0.2.Q6_K.gguf", tokenizer="mistralai/Mistral-7B-Instruct-v0.2")
2. Start a service in the terminal with: lmql serve-model llama.cpp:/home/dorota/models/mistral-7b-instruct-v0.2.Q6_K.gguf --verbose True --n_gpu_layers 20 --n_ctx 0


In [1]:
import lmql
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from transformers import AutoTokenizer



In [57]:
# llama.cpp endpoint: https://lmql.ai/docs/models/llama.cpp.html#running-without-a-model-server
# tokenizer.model from https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/tree/main

llm = lmql.model("llama.cpp:/home/dorota/models/mistral-7b-instruct-v0.2.Q6_K.gguf", tokenizer="mistralai/Mistral-7B-Instruct-v0.2", n_gpu_layers=10, n_ctx=0, verbose=False) 

In [58]:
# load a single PDF; returns a list of Document objects, one per page of the article,
# each with metadata and tags (the page text is in documents[0].text)
documents = SimpleDirectoryReader(input_files=["/home/dorota/LLM-diploma-project/00_concept_tests/data/40001_2023_Article_1364.pdf"]).load_data()

In [59]:
# set global variables to create vector embeddings for text nodes
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
Settings.tokenizer = AutoTokenizer.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2').encode

with documents

In [14]:
# use documents[0:1] to index only the first page; alternative: index = VectorStoreIndex(nodes)
index = VectorStoreIndex.from_documents(documents, show_progress=True)

Settings.llm = None  # disable the LlamaIndex LLM: the query engine is used for retrieval only, generation is done by LMQL
query_engine = index.as_query_engine(streaming=True, llm=None)  # llm=None falls back to Settings.llm, which is None here

Parsing nodes: 100%|██████████| 21/21 [00:00<00:00, 251.36it/s]
Generating embeddings: 100%|██████████| 47/47 [00:00<00:00, 191.11it/s]

LLM is explicitly disabled. Using MockLLM.





In [None]:
# question = "What is the main topic of the article?"
# response = query_engine.query(question)
# response.source_nodes

In [None]:
# print(response.source_nodes[0].node.text)

with Semantic nodes

In [60]:
from llama_index.core.node_parser import SemanticSplitterNodeParser

splitter = SemanticSplitterNodeParser(
    buffer_size=1, breakpoint_percentile_threshold=95, embed_model=Settings.embed_model
)

nodes = splitter.get_nodes_from_documents(documents)

index = VectorStoreIndex(nodes)

Settings.llm = None
query_engine = index.as_query_engine(streaming=True, llm=None) 

LLM is explicitly disabled. Using MockLLM.


In [25]:
len(nodes)

96

In [94]:
similarity_top_k = 2  # NOTE: not passed to query_engine above, which therefore uses its own default

@lmql.query(model=llm)
async def index_query(question: str):
    '''lmql
    "You are a QA bot that helps users answer questions.\n"
    
    # ask the question
    "Question: {question}\n"

    # look up and insert relevant information into the context
    response = query_engine.query(question)
    for s in response.source_nodes:
        print(s.node.get_text())
        print('----------------------------------------------------------')
    information = "\n\n".join([s.node.get_text() for s in response.source_nodes])
    "\nRelevant Information: {information}\n"
    
    # generate a response
    "Your response based on relevant information:[RESPONSE]" where len(RESPONSE) < 200 and STOPS_AT(RESPONSE, ".")
    '''


In [95]:
result = await index_query("What is the main finding?", 
                   output_writer=lmql.stream(variable="RESPONSE"))


Page 14 of 21 Xu et al. European Journal of Medical Research          (2023) 28:461 
and new knowledge that emerged. 
----------------------------------------------------------
Employing a segmentation process, topics exhibit -
ing akin clusters were deftly allocated to cohesive areas, 
thereby engendering a heightened sense of organization 
and a more comprehensive grasp of the underlying data 
(Fig.  8a). In this analysis, a keyword co-occurrence analy -
sis was conducted to identify the most frequently appear -
ing terms. The analysis included five keywords: “breast 
cancer” with 1339 occurrences, “expression” with 831 
occurrences, “cancer” with 407 occurrences, “protein” 
with 358 occurrences, and “translation” with 350 occur -
rences. These results suggest that the analysis primarily 
focused on the relationship between breast cancer and 
protein synthesis, including gene expression, translation, 
and apoptosis. The aim of this analysis was to identify the 
most frequent keywords

The main finding of the study is that the analysis primarily focused on the relationship between breast cancer and protein synthesis, including gene expression, translation, and apoptosis.

## LMQL output in a dataclass for structured output
https://lmql.ai/blog/ \
https://www.timlrx.com/blog/generating-structured-output-from-llms#lmql

In [75]:
import lmql
from dataclasses import dataclass

@dataclass
class Ingredient:
    name: str
    weight_in_grams: int

@dataclass
class Recipe:
    recipe_name: str
    servings: int
    ingredient1: Ingredient
    ingredient2: Ingredient
    ingredient3: Ingredient
    ingredient4: Ingredient
    ingredient5: Ingredient
    ingredient6: Ingredient
    ingredient7: Ingredient
    ingredient8: Ingredient
    # list not supported...

@lmql.query(model=llm)
async def spaghetti():
    '''lmql
    "Spaghetti bolognese recipe for a family of 4."
    "[RECIPE_DATA]\\n" where type(RECIPE_DATA) is Recipe
    return RECIPE_DATA
    '''

result = await spaghetti()

Recipe(recipe_name='Spaghetti Bolognese', servings=4, ingredient1=Ingredient(name='Spaghetti', weight_in_grams=450), ingredient2=Ingredient(name='Olive Oil', weight_in_grams=2), ingredient3=Ingredient(name='Onion', weight_in_grams=150), ingredient4=Ingredient(name='Garlic', weight_in_grams=2), ingredient5=Ingredient(name='Carrots', weight_in_grams=150), ingredient6=Ingredient(name='Celery', weight_in_grams=50), ingredient7=Ingredient(name='Ground Beef', weight_in_grams=450), ingredient8=Ingredient(name='Tomato Sauce', weight_in_grams=800))


In [91]:
result

Recipe(recipe_name='Spaghetti Bolognese', servings=4, ingredient1=Ingredient(name='Spaghetti', weight_in_grams=450), ingredient2=Ingredient(name='Olive Oil', weight_in_grams=2), ingredient3=Ingredient(name='Onion', weight_in_grams=150), ingredient4=Ingredient(name='Garlic', weight_in_grams=2), ingredient5=Ingredient(name='Carrots', weight_in_grams=150), ingredient6=Ingredient(name='Celery', weight_in_grams=50), ingredient7=Ingredient(name='Ground Beef', weight_in_grams=450), ingredient8=Ingredient(name='Tomato Sauce', weight_in_grams=800))

In [90]:
result.__dict__

{'recipe_name': 'Spaghetti Bolognese',
 'servings': 4,
 'ingredient1': Ingredient(name='Spaghetti', weight_in_grams=450),
 'ingredient2': Ingredient(name='Olive Oil', weight_in_grams=2),
 'ingredient3': Ingredient(name='Onion', weight_in_grams=150),
 'ingredient4': Ingredient(name='Garlic', weight_in_grams=2),
 'ingredient5': Ingredient(name='Carrots', weight_in_grams=150),
 'ingredient6': Ingredient(name='Celery', weight_in_grams=50),
 'ingredient7': Ingredient(name='Ground Beef', weight_in_grams=450),
 'ingredient8': Ingredient(name='Tomato Sauce', weight_in_grams=800)}

In [92]:
result.recipe_name

'Spaghetti Bolognese'
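
`result.__dict__` converts only the top level, so the nested `Ingredient` objects stay dataclass instances. If a plain nested dict is needed (e.g. for JSON), the standard-library `dataclasses.asdict` converts recursively. The sketch below assumes the object returned by LMQL is a regular instance of the dataclass, as the output above suggests, and uses a shortened, hand-built stand-in for the recipe rather than model output.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class Ingredient:
    name: str
    weight_in_grams: int

@dataclass
class Recipe:
    # shortened stand-in for the Recipe above: two ingredients instead of eight
    recipe_name: str
    servings: int
    ingredient1: Ingredient
    ingredient2: Ingredient

# hand-built example object (not model output)
recipe = Recipe(
    "Spaghetti Bolognese",
    4,
    Ingredient("Spaghetti", 450),
    Ingredient("Onion", 150),
)

as_dict = asdict(recipe)       # nested dataclasses become nested dicts
as_json = json.dumps(as_dict)  # which makes the result JSON-serialisable
```
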

## Chain of thought example

In [102]:
import nest_asyncio
nest_asyncio.apply()

@lmql.query(model=llm)
def chain_of_thought(question):
    '''lmql
    # Q&A prompt template
    "Q: {question}\n"
    "A: Let's think step by step.\n"
    "[REASONING]"
    "Thus, the answer is:[ANSWER]." where STOPS_AT(ANSWER, ".")

    # return just the ANSWER to the caller
    return ANSWER.strip()
    '''

result = chain_of_thought('Today is the 12th of June, what day was it 1 week ago?')
result

'The day one week ago was the 5th of June.'

---
---

### Read in data with SimpleDirectoryReader
https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/

More readers are available at https://llamahub.ai/

In [None]:
# # You can specify a function that will read each file and extract metadata that gets attached to the resulting Document
# def get_meta(file_path):
#     return {"foo": "bar", "file_path": file_path}


# SimpleDirectoryReader(input_files=["/home/dorota/LLM-diploma-project/00_concept_tests/data/40001_2023_Article_1364.pdf"], file_metadata=get_meta)

In [None]:
# # additional possibilities with SimpleDirectoryReader
# documents = SimpleDirectoryReader(input_dir="/home/dorota/LLM-diploma-project/00_concept_tests/data", recursive=True).load_data(num_workers=4)

In [None]:
documents[0].metadata

In [None]:
print(documents[0].text)

---
---

## Create nodes with:

### 1. SentenceSplitter
The SentenceSplitter attempts to split text in chunks while respecting the boundaries of sentences. \
https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/modules/
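
To make the idea concrete, here is a minimal pure-Python sketch of sentence-aware chunking. It is not LlamaIndex's implementation: the helper name `chunk_by_sentence`, the regex-based sentence split, and measuring chunk size in characters rather than tokens are all simplifying assumptions, and chunk overlap is left out.

```python
import re

def chunk_by_sentence(text: str, chunk_size: int = 80) -> list[str]:
    """Greedily pack whole sentences into chunks of at most chunk_size characters.

    Hypothetical helper for illustration only. A single sentence longer than
    chunk_size is kept whole rather than cut in the middle.
    """
    # naive sentence split: break after '.', '!' or '?' followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > chunk_size:
            chunks.append(current)  # close the chunk at a sentence boundary
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

sample = "Breast cancer is common. Translation is regulated. Apoptosis matters too."
chunks = chunk_by_sentence(sample, chunk_size=50)
```

With `chunk_size=50` the first two sentences fit into one chunk and the third starts a new one, so no sentence is split.
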

In [None]:
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(
    chunk_size=1024,
    chunk_overlap=20,
)
nodes = splitter.get_nodes_from_documents(documents)

In [None]:
# # can be defined globally
# Settings.text_splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=20)

# # can be defined per-index through transformations
# index = VectorStoreIndex.from_documents(
#     documents,
#     transformations=[SentenceSplitter(chunk_size=1024, chunk_overlap=20)],
# )

In [None]:
len(nodes)

In [None]:
print(nodes[2].text)

### 2. SentenceWindowNodeParser
Splits all documents into individual sentences. The resulting nodes also contain the surrounding "window" of sentences around each node in the metadata.\
https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/modules/
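
A minimal pure-Python sketch of the sentence-window idea, to show what ends up in each node. This is an illustration rather than the library's code: `sentence_windows` is a hypothetical helper, the sentence split is a naive regex, and the dictionary keys simply mirror the metadata keys used in the cell below.

```python
import re

def sentence_windows(text: str, window_size: int = 2) -> list[dict]:
    """Return one record per sentence together with its surrounding window.

    Hypothetical helper for illustration only.
    """
    # naive sentence split: break after '.', '!' or '?' followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    records = []
    for i, sentence in enumerate(sentences):
        lo = max(0, i - window_size)                    # clamp at the start
        hi = min(len(sentences), i + window_size + 1)   # clamp at the end
        records.append({
            "original_sentence": sentence,
            "window": " ".join(sentences[lo:hi]),
        })
    return records

records = sentence_windows("One. Two. Three. Four. Five.", window_size=1)
```

The single sentence is what would be embedded and matched, while the wider window is kept in metadata so that more context can be handed to the LLM afterwards.
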

In [None]:
import nltk
from llama_index.core.node_parser import SentenceWindowNodeParser

node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=2,  # how many sentences on either side to capture
    window_metadata_key="window", # the metadata key that holds the window of surrounding sentences
    original_text_metadata_key="original_sentence", # the metadata key that holds the original sentence
)

In [None]:
nodes = node_parser.get_nodes_from_documents(documents)

In [None]:
len(nodes)

In [None]:
print(nodes[3])

In [None]:
print(nodes[3].text)

### 3. SemanticSplitterNodeParser
https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/modules/

In [None]:
from llama_index.core.node_parser import SemanticSplitterNodeParser

splitter = SemanticSplitterNodeParser(
    buffer_size=1, breakpoint_percentile_threshold=95, embed_model=Settings.embed_model
)

In [None]:
nodes = splitter.get_nodes_from_documents(documents)

In [None]:
len(nodes)

In [None]:
print(nodes[2].text)

### 4. HierarchicalNodeParser
Input is chunked into several hierarchies of chunk sizes, with each node containing a reference to its parent node. https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/modules/ \
When combined with the AutoMergingRetriever, this lets us automatically replace retrieved nodes with their parent when a majority of that parent's children are retrieved. https://docs.llamaindex.ai/en/stable/examples/retrievers/auto_merging_retriever/ (The tutorial concludes that output quality is similar to the non-hierarchical approach.)
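
The merging rule can be sketched in plain Python. This is a simplified illustration, not the AutoMergingRetriever code: the function `auto_merge`, the id-based data structures, the 0.5 threshold and the single merge pass are all assumptions made for the example.

```python
from collections import defaultdict

def auto_merge(
    retrieved: list[str],
    parent_of: dict[str, str],
    children_of: dict[str, list[str]],
    threshold: float = 0.5,
) -> list[str]:
    """Replace retrieved child ids by their parent id when more than
    `threshold` of that parent's children were retrieved (one pass only).

    Hypothetical helper for illustration only.
    """
    # count how many children of each parent were retrieved
    hits = defaultdict(list)
    for node_id in retrieved:
        parent = parent_of.get(node_id)
        if parent is not None:
            hits[parent].append(node_id)

    merged_parents = {
        parent
        for parent, kids in hits.items()
        if len(kids) / len(children_of[parent]) > threshold
    }

    # rebuild the result list, keeping retrieval order and avoiding duplicates
    result = []
    for node_id in retrieved:
        parent = parent_of.get(node_id)
        if parent in merged_parents:
            if parent not in result:
                result.append(parent)
        else:
            result.append(node_id)
    return result

children_of = {"P1": ["a", "b", "c"], "P2": ["d", "e", "f"]}
parent_of = {child: parent for parent, kids in children_of.items() for child in kids}

merged = auto_merge(["a", "b", "d"], parent_of, children_of)
```

Here two of the three children of `P1` were retrieved, so they are replaced by `P1`, while `d` stays as it is because only one of the three children of `P2` was retrieved.
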

Chunk into parent, child, grandchild (leaf) nodes

In [None]:
from llama_index.core.node_parser import HierarchicalNodeParser

splitter = HierarchicalNodeParser.from_defaults(
    chunk_sizes=[2048, 512, 128] # chunk size parent, child, grandchild
)

In [None]:
nodes = splitter.get_nodes_from_documents(documents)

In [None]:
len(nodes)

In [None]:
nodes[10]

Isolate grandchild nodes from root nodes

In [None]:
from llama_index.core.node_parser import get_leaf_nodes, get_root_nodes

base_nodes = get_leaf_nodes(nodes)
root_nodes = get_root_nodes(nodes)

len(base_nodes), len(root_nodes)

Load all nodes into a SimpleDocumentStore and only the leaf nodes into the vector store

In [None]:
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.core import StorageContext

docstore = SimpleDocumentStore()
docstore.add_documents(nodes)
storage_context = StorageContext.from_defaults(docstore=docstore) # define storage context (will include vector store by default too)

# load the leaf nodes into the vector index
from llama_index.core import VectorStoreIndex

base_index = VectorStoreIndex(
    base_nodes,
    storage_context=storage_context,
)

Define Retriever

In [None]:
from llama_index.core.retrievers import AutoMergingRetriever

base_retriever = base_index.as_retriever(similarity_top_k=3)
retriever = AutoMergingRetriever(base_retriever, storage_context, verbose=True)

# query_str = ("What is the title of the article?")
query_str = ("What is the main topic of the article?")

nodes = retriever.retrieve(query_str)
base_nodes = base_retriever.retrieve(query_str)

len(nodes), len(base_nodes)

In [None]:
from llama_index.core.response.notebook_utils import display_source_node
import matplotlib

for node in base_nodes:
    display_source_node(node, source_length=10000)

In [None]:
for node in nodes:
    display_source_node(node, source_length=10000)

---
---
---

TokenTextSplitter https://docs.llamaindex.ai/en/stable/module_guides/loading/documents_and_nodes/usage_metadata_extractor/

In [None]:
# NOTE: for a single node, node.get_content(), node.text and node.get_text() seem to return the same output