# LMQL with LlamaIndex

CONTENT:


---
CONCLUSIONS:
* LMQL does not seem to support Pydantic; `@dataclass` is used instead
---
---

## VectorStoreIndex with documents vs nodes

https://lmql.ai/docs/latest/lib/integrations/llama_index.html - the example there is outdated with regard to query/query_engine \
https://docs.llamaindex.ai/en/stable/examples/llm/llama_2_llama_cpp/


NOTE: to run the notebook:
1. Remove 'local:' from llm = lmql.model("local:llama.cpp:/home/dorota/models/mistral-7b-instruct-v0.2.Q6_K.gguf", tokenizer="mistralai/Mistral-7B-Instruct-v0.2")
2. Start a service in a terminal with: lmql serve-model llama.cpp:/home/dorota/models/mistral-7b-instruct-v0.2.Q6_K.gguf --verbose True --n_gpu_layers 20 --n_ctx 0


In [1]:
import lmql
from llama_index.core import GPTVectorStoreIndex, VectorStoreIndex, SimpleDirectoryReader, ServiceContext, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from transformers import AutoTokenizer



In [2]:
# llama.cpp endpoint: https://lmql.ai/docs/models/llama.cpp.html#running-without-a-model-server
# tokenizer.model from https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/tree/main

llm = lmql.model("llama.cpp:/home/dorota/models/mistral-7b-instruct-v0.2.Q6_K.gguf", tokenizer="mistralai/Mistral-7B-Instruct-v0.2", n_gpu_layers=10, n_ctx=0, verbose=False) 

In [90]:
# read in all documents from assigned folder
documents = SimpleDirectoryReader(input_files=["/home/dorota/LLM-diploma-project/00_concept_tests/data/40001_2023_Article_1364.pdf"]).load_data() # -> list of Document objects with 1 doc/page in article with metadata and tags (documents[0].text)

In [3]:
# set global variables to create vector embeddings for text nodes
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
Settings.tokenizer = AutoTokenizer.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2').encode

with documents

In [14]:
index = VectorStoreIndex.from_documents(documents, show_progress=True) #[0:1] # index = VectorStoreIndex(nodes)

Settings.llm = None # =None to enable correct setting in query_engine
query_engine = index.as_query_engine(streaming=True, llm=None) # llm=None sets llm to Settings.llm thus defined as None

Parsing nodes: 100%|██████████| 21/21 [00:00<00:00, 251.36it/s]
Generating embeddings: 100%|██████████| 47/47 [00:00<00:00, 191.11it/s]

LLM is explicitly disabled. Using MockLLM.





In [None]:
# question = "What is the main topic of the article?"
# response = query_engine.query(question)
# response.source_nodes

In [None]:
# print(response.source_nodes[0].node.text)
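
As a rough mental model of what the retrieval step of the query engine does, the sketch below ranks text chunks by cosine similarity to a query. The bag-of-words `embed` function, the `top_k` helper and the sample chunks are hypothetical stand-ins; the index above uses the `BAAI/bge-small-en-v1.5` embeddings set in `Settings.embed_model`, not word counts.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: lower-cased word counts (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

chunks = [
    "breast cancer and protein synthesis",
    "keyword co-occurrence analysis of publications",
    "funding and acknowledgements",
]
hits = top_k("protein synthesis in breast cancer", chunks, k=2)
```

Each entry in `hits` is a `(score, chunk)` pair, loosely analogous to the scored entries in `response.source_nodes`.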

with Semantic nodes

In [5]:
from llama_index.core.node_parser import SemanticSplitterNodeParser

splitter = SemanticSplitterNodeParser(
    buffer_size=1, breakpoint_percentile_threshold=95, embed_model=Settings.embed_model
)

nodes = splitter.get_nodes_from_documents(documents)

index= VectorStoreIndex(nodes)

Settings.llm = None
query_engine = index.as_query_engine(streaming=True, llm=None) 

LLM is explicitly disabled. Using MockLLM.


In [25]:
len(nodes)

96

In [94]:
similarity_top_k = 2  # NOTE: defined here but not passed to query_engine below

@lmql.query(model=llm)
async def index_query(question: str):
    '''lmql
    "You are a QA bot that helps users answer questions.\n"
    
    # ask the question
    "Question: {question}\n"

    # look up and insert relevant information into the context
    response = query_engine.query(question)
    for s in response.source_nodes:
        print(s.node.get_text())
        print('----------------------------------------------------------')
    information = "\n\n".join([s.node.get_text() for s in response.source_nodes])
    "\nRelevant Information: {information}\n"
    
    # generate a response
    "Your response based on relevant information:[RESPONSE]" where len(RESPONSE) < 200 and STOPS_AT(RESPONSE, ".")
    '''


In [95]:
result = await index_query("What is the main finding?", 
                   output_writer=lmql.stream(variable="RESPONSE"))


Page 14 of 21 Xu et al. European Journal of Medical Research          (2023) 28:461 
and new knowledge that emerged. 
----------------------------------------------------------
Employing a segmentation process, topics exhibit -
ing akin clusters were deftly allocated to cohesive areas, 
thereby engendering a heightened sense of organization 
and a more comprehensive grasp of the underlying data 
(Fig.  8a). In this analysis, a keyword co-occurrence analy -
sis was conducted to identify the most frequently appear -
ing terms. The analysis included five keywords: “breast 
cancer” with 1339 occurrences, “expression” with 831 
occurrences, “cancer” with 407 occurrences, “protein” 
with 358 occurrences, and “translation” with 350 occur -
rences. These results suggest that the analysis primarily 
focused on the relationship between breast cancer and 
protein synthesis, including gene expression, translation, 
and apoptosis. The aim of this analysis was to identify the 
most frequent keywords

The main finding of the study is that the analysis primarily focused on the relationship between breast cancer and protein synthesis, including gene expression, translation, and apoptosis.

## LMQL output to a @dataclass for structured output
https://lmql.ai/blog/ \
https://www.timlrx.com/blog/generating-structured-output-from-llms#lmql

### Tutorial example

In [3]:
import lmql
from dataclasses import dataclass

@dataclass
class Ingredient:
    name: str
    weight_in_grams: int

@dataclass
class Recipe:
    recipe_name: str
    servings: int
    ingredient1: Ingredient
    ingredient2: Ingredient
    ingredient3: Ingredient
    ingredient4: Ingredient
    ingredient5: Ingredient
    ingredient6: Ingredient
    ingredient7: Ingredient
    ingredient8: Ingredient
    # list not supported...

@lmql.query(model=llm)
async def spaghetti():
    '''lmql
    "Spaghetti bolognese recipe for a family of 4."
    "[RECIPE_DATA]\\n" where type(RECIPE_DATA) is Recipe
    return RECIPE_DATA
    '''

result = await spaghetti()

In [4]:
result

Recipe(recipe_name='Spaghetti Bolognese', servings=4, ingredient1=Ingredient(name='Spaghetti', weight_in_grams=454), ingredient2=Ingredient(name='Ground Beef', weight_in_grams=454), ingredient3=Ingredient(name='Olive Oil', weight_in_grams=2), ingredient4=Ingredient(name='Onion', weight_in_grams=150), ingredient5=Ingredient(name='Garlic', weight_in_grams=3), ingredient6=Ingredient(name='Tomato Sauce', weight_in_grams=845), ingredient7=Ingredient(name='Tomato Paste', weight_in_grams=113), ingredient8=Ingredient(name='Salt', weight_in_grams=2))

In [8]:
result.__dict__

{'recipe_name': 'Spaghetti Bolognese',
 'servings': 4,
 'ingredient1': Ingredient(name='Spaghetti', weight_in_grams=450),
 'ingredient2': Ingredient(name='Olive Oil', weight_in_grams=2),
 'ingredient3': Ingredient(name='Onion', weight_in_grams=150),
 'ingredient4': Ingredient(name='Garlic', weight_in_grams=2),
 'ingredient5': Ingredient(name='Carrots', weight_in_grams=100),
 'ingredient6': Ingredient(name='Celery', weight_in_grams=50),
 'ingredient7': Ingredient(name='Ground Beef', weight_in_grams=400),
 'ingredient8': Ingredient(name='Tomato Sauce', weight_in_grams=800)}

In [11]:
result.ingredient1

Ingredient(name='Spaghetti', weight_in_grams=450)

In [5]:
Recipe.__dataclass_fields__['ingredient1'].name #Recipe.__dataclass_fields__ is a dict

'ingredient1'

In [22]:
Recipe.__annotations__ # dict with name:type 

{'recipe_name': str,
 'servings': int,
 'ingredient1': str,
 'ingredient2': str,
 'ingredient3': str}
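
Because list-typed fields did not work in the tutorial example, the ingredient slots are written out one by one. One possible way to avoid the repetition is to build the class with the standard library's `dataclasses.make_dataclass`. This is a sketch only: the helper `make_recipe_class` and the name `Recipe8` are hypothetical, and it has not been checked here whether LMQL accepts a dynamically created dataclass in a `type(...) is ...` constraint.

```python
from dataclasses import dataclass, fields, make_dataclass

@dataclass
class Ingredient:
    name: str
    weight_in_grams: int

def make_recipe_class(n_ingredients: int, cls_name: str = "Recipe"):
    """Build a dataclass with fields ingredient1..ingredientN (hypothetical helper)."""
    spec = [("recipe_name", str), ("servings", int)]
    spec += [(f"ingredient{i}", Ingredient) for i in range(1, n_ingredients + 1)]
    return make_dataclass(cls_name, spec)

Recipe8 = make_recipe_class(8, "Recipe8")
field_names = [f.name for f in fields(Recipe8)]
```

`field_names` then holds `recipe_name`, `servings` and `ingredient1` to `ingredient8`, in the same order as the hand-written class.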

### Own example with recipe

In [59]:
import lmql
from dataclasses import dataclass
from typing import List

recipe = """Ingredients for 4 servings:
• 1 cup all-purpose flour
• 3 tablespoons granulated sugar
• 1 teaspoon baking powder
• 1/2 teaspoon baking soda
• 1/2 teaspoon salt
• 1 cup milk
• 1 egg
• 3 tablespoons unsalted butter, melted
• Toppings of your choice (fresh fruit, whipped cream, syrup, etc.)"""


@dataclass
class Recipe:
    recipe_name: str
    servings: int
    ingredient1: str
    ingredient2: str
    ingredient3: str

@lmql.query(model=llm, verbose=False)
async def get_recipe(recipe):
    '''lmql
    "{recipe}"
    "[RECIPE_DATA]\\n" where type(RECIPE_DATA) is Recipe
    return RECIPE_DATA
    '''

result = await get_recipe(recipe)

In [60]:
result

Recipe(recipe_name='Buttermilk Pancakes', servings=4, ingredient1='1 cup all-purpose flour', ingredient2='3 tablespoons granulated sugar', ingredient3='1 teaspoon baking powder')

In [102]:
import lmql
from dataclasses import dataclass
from typing import List

recipe = """Ingredients:
• 1 cup all-purpose flour
• 3 tablespoons granulated sugar
• 1 teaspoon baking powder
• 1/2 teaspoon baking soda
• 1/2 teaspoon salt
• 1 cup milk
• 1 egg
• 3 tablespoons unsalted butter, melted
• Toppings of your choice (fresh fruit, whipped cream, syrup, etc.)"""


@dataclass
class Recipe:
    recipe_name: str
    servings: int
    ingredient1: str
    ingredient2: str
    ingredient3: str

field_descriptions = [
    "Generate a short recipe name",
    "Generate number of servings based on ingredient amounts",
    "Extract ingredient name only",
    "Extract ingredient amount only",
    "Extract ingredient name and ingredient amount",
]

field_prompting = ""
for field_name, field_descr in zip(Recipe.__annotations__.keys(), field_descriptions):
    field_prompting += "For field " + "'" + field_name + "'" + " " + "follow these instructions: " + field_descr + "\n "

@lmql.query(model=llm, verbose=False)
async def get_recipe(recipe, field_prompting):
    '''lmql
    "{recipe}"
    "{field_prompting}"
    "[RECIPE_DATA]\\n" where type(RECIPE_DATA) is Recipe
    return RECIPE_DATA
    '''

result = await get_recipe(recipe, field_prompting)


# Open question: how to add e.g. WHERE review IN ('Good', 'Bad'), or a limit on the length of the answer?

In [103]:
result

Recipe(recipe_name='Buttermilk Pancakes', servings=4, ingredient1='all-purpose flour', ingredient2='3 tbsp', ingredient3='1 cup')
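
The `zip` over `Recipe.__annotations__` and `field_descriptions` pairs entries purely by position, so the two can drift apart without any error. A possible alternative is to keep each instruction next to its field via `dataclasses.field(metadata=...)`. The metadata key `"description"` and the helpers `described` and `build_field_prompt` are assumptions made for this sketch, not LMQL features, and the class is shortened to three fields.

```python
from dataclasses import dataclass, field, fields

def described(text: str):
    """Attach a prompt instruction to a dataclass field (hypothetical helper)."""
    return field(metadata={"description": text})

@dataclass
class Recipe:
    recipe_name: str = described("Generate a short recipe name")
    servings: int = described("Generate number of servings based on ingredient amounts")
    ingredient1: str = described("Extract ingredient name only")

def build_field_prompt(cls) -> str:
    """Create one instruction line per field, in declaration order."""
    lines = [
        f"For field '{f.name}' follow these instructions: {f.metadata['description']}"
        for f in fields(cls)
    ]
    return "\n".join(lines) + "\n"

field_prompting = build_field_prompt(Recipe)
```

With this layout a field cannot exist without its instruction, since `f.metadata['description']` raises a `KeyError` for any field declared without one.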

### Own example with article as text

In [93]:
from pypdf import PdfReader 
  
reader = PdfReader('/home/dorota/LLM-diploma-project/00_concept_tests/data/40001_2023_Article_1364.pdf') 
num_pages = len(reader.pages)
TEXT = ""
for page_num in range(1): #change to range(num_pages) for whole document
    page = reader.pages[page_num]  
    TEXT += page.extract_text()

In [106]:
import lmql
from dataclasses import dataclass

@dataclass
class NodeMetadata:
    title: str
    authors: str
    pub_year: int
    key_words: str
    summary: str
    research_area: str
    quality: str
    quality_reason: str

field_descriptions = [
    "extract title from article",
    "extract authors from article",
    "extract publication year",
    "generate 5 new key words based on content in Abstract",
    "generate summary in 3 sentences",
    "generate 1 main research area described in article",
    "select 1 value from ['GOOD', 'BAD', 'EXCELLENT', 'CAN NOT SET QUALITY'] to define quality of article",
    "describe reason for chosen quality in 1 sentence",
]

field_prompting = ""
for field_name, field_descr in zip(NodeMetadata.__annotations__.keys(), field_descriptions):  # use NodeMetadata, not Recipe
    field_prompting += "For field " + "'" + field_name + "'" + " " + "follow these instructions: " + field_descr + "\n "

@lmql.query(model=llm, verbose=True)
async def get_recipe(TEXT, field_prompting):
    '''lmql
    "{TEXT}"
    "{field_prompting}"
    "[NODE_METADATA]\\n" where type(NODE_METADATA) is NodeMetadata
    return NODE_METADATA
    '''

result = await get_recipe(TEXT, field_prompting)
result


# Open question: how to add e.g. WHERE review IN ('Good', 'Bad'), or a limit on the length of the answer?

lmtp generate: [1, 28814, 28718, 29000, 299, 29000, 282, 28723, 28705, 13, 19298, 276, 9983, 302, 12195, 7982, 2600, 325, 28750, 28734, 28750, 28770, 28731, 28705, 28750, 28783, 28747, 28781, 28784, 28740, 259, 13, 3887, 1508, 2432, 28710, 28723, 1909, 28748, 28740, 28734, 28723, 28740, 28740, 28783, 28784, 28748, 28713, 28781, 28734, 28734, 28734, 28740, 28733, 28734, 28750, 28770, 28733, 28734, 28740, 28770, 28784, 28781, 28733, 28781, 13, 896, 1151, 17046, 13, 21551, 1837, 302, 29000, 2152, 529, 8875, 387, 9646, 14311, 28705, 13, 3467, 448, 21537, 477, 29000, 1237, 29000, 4837, 8524, 302, 29000, 28726, 10798, 14840, 28705, 13, 23206, 13, 28798, 515, 25257, 1500, 28718, 28740, 28725, 17038, 26211, 566, 26424, 28740, 28725, 1500, 23985, 28775, 28710, 602, 1054, 980, 28740, 28725, 816, 335, 980, 320, 602, 28750, 28725, 8693, 28724, 28710, 1500, 28718, 28740, 28725, 1337, 28710, 320, 602, 28740, 28725, 18421, 28744, 23985, 16168, 602, 28740, 28725, 28705, 13, 28828, 28550, 28729, 1724, 

Task was destroyed but it is pending!
task: <Task cancelling name='lmtp_ws_client_loop' coro=<LMTPDcModel.ws_client_loop() running at /home/dorota/LLM-diploma-project/venv/lib/python3.10/site-packages/lmql/models/lmtp/lmtp_dcmodel.py:108> wait_for=<Future finished result=True>>
Exception ignored in: <coroutine object LMTPDcModel.ws_client_loop at 0x7f6c7df940b0>
Traceback (most recent call last):
  File "/home/dorota/LLM-diploma-project/venv/lib/python3.10/site-packages/lmql/utils/graph.py", line 169, in add_edge
    self.edges.append({ "data": { "source": src, "target": dst } })
RuntimeError: coroutine ignored GeneratorExit


UnboundLocalError: local variable 'json_payload' referenced before assignment
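
The open question in the cell above (restricting a field to a fixed set of values, or capping the answer length) can at least be checked after generation in plain Python. The sketch below is a post-hoc validator, not an LMQL constraint; `ALLOWED_QUALITY`, `MAX_SUMMARY_CHARS` and the trimmed-down `NodeMetadata` are assumptions for illustration.

```python
from dataclasses import dataclass

ALLOWED_QUALITY = {"GOOD", "BAD", "EXCELLENT", "CAN NOT SET QUALITY"}
MAX_SUMMARY_CHARS = 500  # hypothetical limit

@dataclass
class NodeMetadata:
    title: str
    summary: str
    quality: str

    def __post_init__(self):
        # Normalise the value, then reject anything outside the allowed set.
        self.quality = self.quality.strip().upper()
        if self.quality not in ALLOWED_QUALITY:
            raise ValueError(f"unexpected quality: {self.quality!r}")
        if len(self.summary) > MAX_SUMMARY_CHARS:
            raise ValueError("summary is too long")

ok = NodeMetadata(title="Example", summary="Short summary.", quality=" good ")

try:
    NodeMetadata(title="Example", summary="Short summary.", quality="great")
    rejected = False
except ValueError:
    rejected = True
```

A rejected value could then trigger a retry of the query, at the cost of one more model call.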

---
## Chain of thought example

In [102]:
import nest_asyncio
nest_asyncio.apply()

@lmql.query(model=llm)
def chain_of_thought(question):
    '''lmql
    # Q&A prompt template
    "Q: {question}\n"
    "A: Let's think step by step.\n"
    "[REASONING]"
    "Thus, the answer is:[ANSWER]." where STOPS_AT(ANSWER, ".")

    # return just the ANSWER to the caller
    return ANSWER.strip()
    '''

result = chain_of_thought('Today is the 12th of June, what day was it 1 week ago?')
result

'The day one week ago was the 5th of June.'

---
---

### Read in data with SimpleDirectoryReader
https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/

More readers are available at https://llamahub.ai/

In [None]:
# # You can specify a function that will read each file and extract metadata that gets attached to the resulting Document
# def get_meta(file_path):
#     return {"foo": "bar", "file_path": file_path}


# SimpleDirectoryReader(input_files=["/home/dorota/LLM-diploma-project/00_concept_tests/data/40001_2023_Article_1364.pdf"], file_metadata=get_meta)
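
A slightly fuller version of such a metadata function might look like the sketch below. It uses only the standard library; the keys `file_name`, `suffix` and `size_bytes` are arbitrary choices for illustration, and the temporary file stands in for a real document.

```python
import tempfile
from pathlib import Path

def get_meta(file_path: str) -> dict:
    """Return metadata for one file; the chosen keys are illustrative."""
    path = Path(file_path)
    return {
        "file_path": str(path),
        "file_name": path.name,
        "suffix": path.suffix.lower(),
        "size_bytes": path.stat().st_size,
    }

# Demonstrate on a throwaway file so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "article.PDF"
    sample.write_bytes(b"%PDF-1.4 demo")
    meta = get_meta(str(sample))
```

The function keeps the same one-argument shape as the commented-out `get_meta` above, so it could be passed as `file_metadata` in the same way.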

In [None]:
# # additional possibilities with SimpleDirectoryReader
# documents = SimpleDirectoryReader(input_dir="/home/dorota/LLM-diploma-project/00_concept_tests/data", recursive=True).load_data(num_workers=4)

In [None]:
documents[0].metadata

In [None]:
print(documents[0].text)

---
---

## Create nodes with:

### 1. SentenceSplitter
The SentenceSplitter attempts to split text in chunks while respecting the boundaries of sentences. \
https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/modules/

In [None]:
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(
    chunk_size=1024,
    chunk_overlap=20,
)
nodes = splitter.get_nodes_from_documents(documents)
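
To make the idea concrete, here is a simplified sketch of sentence-aware chunking. It is not LlamaIndex's implementation: it splits on a naive regex, measures size in characters rather than tokens, and ignores `chunk_overlap`. `split_sentences` and `chunk_by_sentence` are hypothetical names.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Naive sentence split on ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def chunk_by_sentence(text: str, chunk_size: int = 40) -> list[str]:
    """Greedily pack whole sentences into chunks of at most chunk_size characters.

    A single sentence longer than chunk_size becomes its own oversized chunk.
    """
    chunks, current = [], ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > chunk_size:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

text = "Cats purr. Dogs bark loudly at night. Birds sing. Fish swim in silence."
chunks = chunk_by_sentence(text, chunk_size=40)
```

No chunk ever ends in the middle of a sentence, which is the property the splitter aims for.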

In [None]:
# # can be defined globally
# Settings.text_splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=20)

# # can be defined per-index through transformations
# index = VectorStoreIndex.from_documents(
#     documents,
#     transformations=[SentenceSplitter(chunk_size=1024, chunk_overlap=20)],
# )

In [None]:
len(nodes)

In [None]:
print(nodes[2].text)

### 2. SentenceWindowNodeParser
Splits all documents into individual sentences. The resulting nodes also contain the surrounding "window" of sentences around each node in the metadata.\
https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/modules/

In [None]:
import nltk
from llama_index.core.node_parser import SentenceWindowNodeParser

node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=2,  # how many sentences on either side to capture
    window_metadata_key="window", # the metadata key that holds the window of surrounding sentences
    original_text_metadata_key="original_sentence", # the metadata key that holds the original sentence
)

In [None]:
nodes = node_parser.get_nodes_from_documents(documents)
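
The window mechanism can be pictured with the plain-Python sketch below. It reuses the names from the cell above (`window_size` and the metadata keys `window` and `original_sentence`) but is otherwise a simplification under stated assumptions: sentences are split with a naive regex and nodes are plain dicts.

```python
import re

def sentence_windows(text: str, window_size: int = 2) -> list[dict]:
    """Return one record per sentence with its neighbours stored as metadata."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    records = []
    for i, sentence in enumerate(sentences):
        lo = max(0, i - window_size)
        hi = min(len(sentences), i + window_size + 1)
        records.append({
            "text": sentence,
            "metadata": {
                "window": " ".join(sentences[lo:hi]),
                "original_sentence": sentence,
            },
        })
    return records

records = sentence_windows("One. Two. Three. Four. Five.", window_size=1)
```

The single sentence is what gets embedded and matched, while the wider `window` text is available as context for the answer.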

In [None]:
len(nodes)

In [None]:
print(nodes[3])

In [None]:
print(nodes[3].text)

### 3. SemanticSplitterNodeParser
https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/modules/

In [None]:
from llama_index.core.node_parser import SemanticSplitterNodeParser

splitter = SemanticSplitterNodeParser(
    buffer_size=1, breakpoint_percentile_threshold=95, embed_model=Settings.embed_model
)

In [None]:
nodes = splitter.get_nodes_from_documents(documents)
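
As a hedged illustration of what semantic splitting means: consecutive sentences are embedded, the distance between neighbours is measured, and a new chunk starts wherever that distance exceeds a percentile threshold. The sketch below uses a toy word-count embedding and a nearest-rank percentile, ignores `buffer_size`, and is not the library's actual algorithm. The demo uses a percentile of 50 because only three gaps exist.

```python
import math
from collections import Counter

def embed(sentence: str) -> Counter:
    """Toy embedding: lower-cased word counts without punctuation."""
    return Counter(w.strip(".,").lower() for w in sentence.split())

def distance(a: Counter, b: Counter) -> float:
    """Cosine distance (1 - cosine similarity)."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - (dot / norm if norm else 0.0)

def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def semantic_chunks(sentences: list[str], breakpoint_percentile: float = 95) -> list[str]:
    """Start a new chunk where neighbouring sentences are unusually dissimilar."""
    if len(sentences) < 2:
        return list(sentences)
    vectors = [embed(s) for s in sentences]
    gaps = [distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    threshold = percentile(gaps, breakpoint_percentile)
    chunks, current = [], [sentences[0]]
    for gap, sentence in zip(gaps, sentences[1:]):
        if gap > threshold:
            chunks.append(" ".join(current))
            current = [sentence]
        else:
            current.append(sentence)
    chunks.append(" ".join(current))
    return chunks

sentences = [
    "Breast cancer cells grow.",
    "Cancer cells need protein.",
    "The weather was sunny.",
    "Sunny weather is pleasant.",
]
chunks = semantic_chunks(sentences, breakpoint_percentile=50)
```

The split lands between the second and third sentence, where the topic changes and the neighbouring sentences share no words.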

In [None]:
len(nodes)

In [None]:
print(nodes[2].text)

### 4. HierarchicalNodeParser
Input is chunked into several hierarchies of chunk sizes, with each node containing a reference to its parent node. https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/modules/ \
When combined with the AutoMergingRetriever, this lets us automatically replace retrieved nodes with their parent when a majority of that parent's children are retrieved. https://docs.llamaindex.ai/en/stable/examples/retrievers/auto_merging_retriever/ (the tutorial concludes that output quality is similar to the non-hierarchical approach)

Chunk into parent, child, grandchild (leaf) nodes

In [None]:
from llama_index.core.node_parser import HierarchicalNodeParser

splitter = HierarchicalNodeParser.from_defaults(
    chunk_sizes=[2048, 512, 128] # chunk size parent, child, grandchild
)

In [None]:
nodes = splitter.get_nodes_from_documents(documents)

In [None]:
len(nodes)

In [None]:
nodes[10]

Isolate grandchild nodes from root nodes

In [None]:
from llama_index.core.node_parser import get_leaf_nodes, get_root_nodes

base_nodes = get_leaf_nodes(nodes)
root_nodes = get_root_nodes(nodes)

len(base_nodes), len(root_nodes)
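
The parent/child structure can be sketched without the library as follows. Sizes are in characters rather than tokens, the `Node` class and `build_hierarchy` are hypothetical, and the splitting is a plain fixed-width slice, so this only illustrates how each level points back to its parent.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    node_id: int
    text: str
    parent_id: Optional[int]  # None for root nodes
    level: int                # 0 = largest chunks

def build_hierarchy(text: str, chunk_sizes: list[int]) -> list[Node]:
    """Split text level by level; every chunk records the id of its parent."""
    nodes: list[Node] = []
    frontier = [(None, text)]  # (parent id, text still to be split)
    for level, size in enumerate(chunk_sizes):
        next_frontier = []
        for parent_id, chunk in frontier:
            for start in range(0, len(chunk), size):
                node = Node(len(nodes), chunk[start:start + size], parent_id, level)
                nodes.append(node)
                next_frontier.append((node.node_id, node.text))
        frontier = next_frontier
    return nodes

nodes = build_hierarchy("abcdefghijklmnop", chunk_sizes=[8, 4, 2])
leaf_nodes = [n for n in nodes if n.level == 2]
root_nodes = [n for n in nodes if n.parent_id is None]
```

For this 16-character input the three levels yield 2, 4 and 8 nodes, which is the kind of leaf/root split that `get_leaf_nodes` and `get_root_nodes` expose above.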

Load all nodes into SimpleDocumentStore and only leaf nodes into the vector store

In [None]:
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.core import StorageContext

docstore = SimpleDocumentStore()
docstore.add_documents(nodes)
storage_context = StorageContext.from_defaults(docstore=docstore) # define storage context (will include vector store by default too)

# load the leaf nodes into the vector index
from llama_index.core import VectorStoreIndex

base_index = VectorStoreIndex(
    base_nodes,
    storage_context=storage_context,
)

Define Retriever

In [None]:
from llama_index.core.retrievers import AutoMergingRetriever

base_retriever = base_index.as_retriever(similarity_top_k=3)
retriever = AutoMergingRetriever(base_retriever, storage_context, verbose=True)

# query_str = ("What is the title of the article?")
query_str = ("What is the main topic of the article?")

nodes = retriever.retrieve(query_str)
base_nodes = base_retriever.retrieve(query_str)

len(nodes), len(base_nodes)
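
The merging rule described earlier (replace retrieved children with their parent when a majority of that parent's children were retrieved) can be sketched as below. The function `auto_merge`, the 0.5 threshold and the id-based data layout are assumptions for illustration; the real retriever works on node objects and the docstore.

```python
from collections import defaultdict

def auto_merge(retrieved: list[str], parent_of: dict[str, str],
               children_of: dict[str, list[str]], ratio: float = 0.5) -> list[str]:
    """Replace retrieved child ids by their parent id when more than
    `ratio` of that parent's children were retrieved (single pass)."""
    hits = defaultdict(list)
    for node_id in retrieved:
        parent = parent_of.get(node_id)
        if parent is not None:
            hits[parent].append(node_id)

    merged_parents = {
        parent for parent, kids in hits.items()
        if len(kids) / len(children_of[parent]) > ratio
    }

    result = []
    for node_id in retrieved:
        parent = parent_of.get(node_id)
        if parent in merged_parents:
            if parent not in result:
                result.append(parent)  # keep the parent once, at the first hit
        else:
            result.append(node_id)
    return result

children_of = {"P1": ["a", "b", "c"], "P2": ["d", "e", "f"]}
parent_of = {child: p for p, kids in children_of.items() for child in kids}
merged = auto_merge(["a", "d", "b"], parent_of, children_of)
```

Here two of the three children of `P1` were retrieved, so they collapse into `P1`, while the lone child `d` of `P2` is kept as it is. This is why `len(nodes)` can be smaller than `len(base_nodes)` in the cell above.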

In [None]:
from llama_index.core.response.notebook_utils import display_source_node
import matplotlib

for node in base_nodes:
    display_source_node(node, source_length=10000)

In [None]:
for node in nodes:
    display_source_node(node, source_length=10000)

---
---
---

TokenTextSplitter https://docs.llamaindex.ai/en/stable/module_guides/loading/documents_and_nodes/usage_metadata_extractor/

In [None]:
# NOTE: these seem to give the same output for a node: node.get_content(), node.text, node.get_text()