# **LlamaIndex: the ultimate LLM framework for indexing and retrieval**[<sup>source</sup>](https://towardsdatascience.com/llamaindex-the-ultimate-llm-framework-for-indexing-and-retrieval-fa588d8ca03e)

**`LlamaIndex`**, previously known as **GPT Index**, is a data framework for building applications with LLMs. It provides tools for data ingestion, structuring, and retrieval, and it integrates with various application frameworks. Its main capabilities are:

✅ Ingest from different data sources and data formats using `Data connectors` (Llama Hub).

✅ Enable document operations such as inserting, deleting, updating, and refreshing the document index.

✅ Support synthesis over heterogeneous data and multiple documents.

✅ Use “Router” to pick between different query engines.

✅ Support hypothetical document embeddings (HyDE) to improve output quality.

✅ Offer a wide range of integrations with various vector stores, ChatGPT plugins, tracing tools, and LangChain, among others.

✅ Support the OpenAI function calling API.

## Data connectors (LlamaHub)
[LlamaHub](https://llamahub.ai) offers over 100 data sources and formats, allowing LlamaIndex or LangChain to ingest data in a consistent manner.

You can `pip install llama-hub` and use it as a standalone package. Alternatively, you can use the `download_loader` method of LlamaIndex to download an individual data loader.

In [5]:
from llama_hub.wikipedia.base import WikipediaReader
from llama_index import download_loader

# Alternative: download the loader at runtime instead of importing it
# WikipediaReader = download_loader("WikipediaReader")

loader = WikipediaReader()
documents = loader.load_data(pages=['Berlin', 'Rome', 'Tokyo', 'Canberra', 'Santiago'])

Each document contains the following fields:
```python 
'id_', 'embedding', 'metadata', 'excluded_embed_metadata_keys', 'excluded_llm_metadata_keys', 'relationships', 'hash', 'text', 'start_char_idx', 'end_char_idx', 'text_template', 'metadata_template', 'metadata_seperator'
```

## Load LLM

In [None]:
from langchain import HuggingFaceHub
import os
from dotenv import load_dotenv

# load the variables located in the file .env
load_dotenv()
# print(os.environ.get("HUGGINGFACEHUB_API_TOKEN"))


In [1]:
# load the model
from langchain.llms import GPT4All
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

local_path = 'F:\\DOCUMENTOS\\DATA_SCIENCE\\Large Language Models LLM\\ggml-gpt4all-j-v1.3-groovy.bin'
# Callbacks support token-wise streaming
callbacks = [StreamingStdOutCallbackHandler()]
# Verbose is required to pass to the callback manager
llm = GPT4All(model=local_path, callbacks=callbacks, verbose=True, backend='gptj')
llm_local = GPT4All(model=local_path, callbacks=callbacks, verbose=True, backend='gptj')


Found model file at  F:\\DOCUMENTOS\\DATA_SCIENCE\\Large Language Models LLM\\ggml-gpt4all-j-v1.3-groovy.bin
Found model file at  F:\\DOCUMENTOS\\DATA_SCIENCE\\Large Language Models LLM\\ggml-gpt4all-j-v1.3-groovy.bin


In [2]:
from llama_index.llms import LangChainLLM
from llama_index.indices.service_context import ServiceContext

llm = LangChainLLM(llm_local)

service_context = ServiceContext.from_defaults(llm=llm, embed_model='local')

## Basic query functionalities
**Index, retriever, and query engine** are the three basic components for asking questions over your data or documents:

* `Index` is a data structure that lets us quickly retrieve the information relevant to a user query from external documents. It works by parsing documents into text chunks, called ***“Node”*** objects, and then building an index from those chunks.

* `Retriever` fetches the relevant information for a given user query.

* `Query engine` is built on top of the index and the retriever and provides a generic interface for asking questions about your data.
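
To make the division of labour between the three components concrete, here is a minimal pure-Python sketch. It is not the LlamaIndex implementation: `ToyIndex`, `ToyRetriever`, and `ToyQueryEngine` are hypothetical names, sentences stand in for nodes, and word overlap stands in for embedding similarity.

```python
import re
from dataclasses import dataclass


@dataclass
class Node:
    """A text chunk, loosely analogous to a LlamaIndex Node."""
    doc_id: str
    text: str


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class ToyIndex:
    """Parses documents into sentence-sized nodes (hypothetical helper)."""

    def __init__(self, documents):
        self.nodes = []
        for doc_id, text in documents.items():
            for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
                if sentence:
                    self.nodes.append(Node(doc_id, sentence))


class ToyRetriever:
    """Ranks nodes by how many words they share with the query."""

    def __init__(self, index, top_k=1):
        self.index = index
        self.top_k = top_k

    def retrieve(self, query):
        query_words = tokenize(query)
        ranked = sorted(
            self.index.nodes,
            key=lambda node: len(query_words & tokenize(node.text)),
            reverse=True,
        )
        return ranked[: self.top_k]


class ToyQueryEngine:
    """Wraps a retriever; a real engine would pass the nodes to an LLM."""

    def __init__(self, retriever):
        self.retriever = retriever

    def query(self, question):
        return " ".join(node.text for node in self.retriever.retrieve(question))


docs = {
    "berlin": "Berlin is the capital of Germany. The city lies on the river Spree.",
    "rome": "Rome is the capital of Italy. The city lies on the river Tiber.",
}
engine = ToyQueryEngine(ToyRetriever(ToyIndex(docs), top_k=1))
answer = engine.query("Which city is the capital of Italy?")
print(answer)
```

The query engine here only concatenates the retrieved chunks; in a real pipeline those chunks would be handed to the LLM as context for answer synthesis.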

In [4]:
from llama_index import VectorStoreIndex

index = VectorStoreIndex.from_documents(
    documents, service_context=service_context)
query_engine = index.as_query_engine()
response = query_engine.query("Who is Paul Graham.")


Paul Graham was a prominent Australian businessman who made his fortune in retailing during the 1950s and 1960s through his company "The Piggly Wigglys". He later became involved with various philanthropic organizations such as the National Museum of Australia Foundation and the Canberra Symphony Orchestra, where he served on their board for many years. Graham was also a strong supporter of education and established several schools in Canberra including St John's School (now known as The Australian College) which is now part of the University of Notre Dame Australia. He passed away at his home in Red Hill in 2019 after suffering from Alzheimer's disease, leaving behind an extensive legacy that continues to be celebrated by many people today.

In [12]:
response = query_engine.query("What is the population in Berlin actually ?")

The population of Berlin as of 2019 was approximately 5.2 million people within a city area of 891.1 km2 (344.1 sq mi).

## Handle document updates
Once we have created an index for our documents, we often need to update those documents periodically. Recreating the embeddings for every document each time would be costly. The LlamaIndex index structure avoids this by supporting efficient insert, delete, update, and refresh operations.

For example, a new document can be inserted as additional nodes (text chunks) without the need to recreate nodes from previous documents:

In [6]:
# Source: https://gpt-index.readthedocs.io/en/latest/how_to/index/document_management.html
from llama_index import ListIndex, Document

index = ListIndex([], service_context=service_context)
text_chunks = ['text_chunk_1', 'text_chunk_2', 'text_chunk_3']

doc_chunks = []
for i, text in enumerate(text_chunks):
    doc = Document(text=text, doc_id=f"doc_id_{i}")
    doc_chunks.append(doc)

# insert
for doc_chunk in doc_chunks:
    index.insert(doc_chunk)
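
The cost argument behind these operations can be illustrated with a small standalone sketch. It is an assumption-laden toy, not the LlamaIndex code: `ToyDocStore` is a hypothetical class, the "embedding" is a trivial stand-in, and change detection is done with a content hash so that only new or modified documents trigger the expensive step.

```python
import hashlib


class ToyDocStore:
    """Keeps one 'embedding' per document and recomputes it only on change."""

    def __init__(self):
        self.hashes = {}      # doc_id -> content hash
        self.embeddings = {}  # doc_id -> stand-in embedding
        self.embed_calls = 0  # counts how often the costly step ran

    def _embed(self, text):
        # Stand-in for an expensive embedding call (assumption: a real
        # index would call an embedding model here).
        self.embed_calls += 1
        return [len(text), text.count(" ")]

    def insert(self, doc_id, text):
        self.hashes[doc_id] = hashlib.sha256(text.encode("utf-8")).hexdigest()
        self.embeddings[doc_id] = self._embed(text)

    def delete(self, doc_id):
        self.hashes.pop(doc_id, None)
        self.embeddings.pop(doc_id, None)

    def refresh(self, documents):
        """Insert new or changed documents; return one flag per document."""
        refreshed = []
        for doc_id, text in documents.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            changed = self.hashes.get(doc_id) != digest
            if changed:
                self.insert(doc_id, text)
            refreshed.append(changed)
        return refreshed


store = ToyDocStore()
store.insert("doc_id_0", "text_chunk_1")
store.insert("doc_id_1", "text_chunk_2")

flags = store.refresh({
    "doc_id_0": "text_chunk_1",           # unchanged: skipped
    "doc_id_1": "text_chunk_2 (edited)",  # changed: re-embedded
    "doc_id_2": "text_chunk_3",           # new: embedded
})
print(flags, store.embed_calls)
```

Only two of the three refreshed documents reach `_embed`, which is the saving the section describes: unchanged documents keep their existing nodes and embeddings.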


## Query multiple documents
With LlamaIndex, it’s easy to query multiple documents. This functionality is provided by the `SubQuestionQueryEngine` class. Given a query, the engine generates a “query plan” of sub-questions, each directed at one sub-document; the answers to those sub-questions are then synthesized into the final answer.

In [21]:
# Source: https://gpt-index.readthedocs.io/en/latest/examples/usecases/10q_sub_question.html

from llama_index import download_loader, SimpleDirectoryReader
from pathlib import Path
from llama_index import VectorStoreIndex
from llama_index.response.pprint_utils import pprint_response

from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.query_engine import SubQuestionQueryEngine

# Load data
# PDFReader = download_loader("PDFReader")
# loader = PDFReader()
document1 = SimpleDirectoryReader(input_files=['resources/cv_david_smith.pdf']).load_data()
document2 = SimpleDirectoryReader(input_files=['resources/cv_Jo Brown.pdf']).load_data()


# Build indices
index1 = VectorStoreIndex.from_documents(document1, service_context=service_context,)
index2 = VectorStoreIndex.from_documents(document2, service_context=service_context,)

# Build query engines
engine1 = index1.as_query_engine(similarity_top_k=3)
engine2 = index2.as_query_engine(similarity_top_k=3)

query_engine_tools = [
    QueryEngineTool(
        query_engine=engine2,
        metadata=ToolMetadata(
            name='jo_brow',
            description="Provides information about Jo Brown's curriculum vitae")),
    QueryEngineTool(
        query_engine=engine1,
        metadata=ToolMetadata(
            name='david_smith',
            description="Provides information about David Smith's curriculum vitae")),
]

# Run
from llama_index.question_gen.llm_generators import LLMQuestionGenerator

# Generate the sub-questions with the LLM configured in service_context
question_gen = LLMQuestionGenerator.from_defaults(service_context=service_context)
s_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    service_context=service_context,
    question_gen=question_gen,
)
response = s_engine.query('Give me the names of the candidates you have')

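Running this cell raises a `RuntimeError` after the two sub-questions are generated. This may be a consequence of forcing the use of a local LLM and open-source embeddings, although the error message itself points to `asyncio.run()` being called while the notebook's event loop is already running.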

```json
[
    {
        "sub_question": "What is your name",
        "tool_name": "jo_brow"
    },
    {
        "sub_question": "Who are the candidates you have?",
        "tool_name": "david_smith"
    }
]
```

Generated 2 sub questions.


RuntimeError: asyncio.run() cannot be called from a running event loop
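
The error above can be reproduced with the standard library alone, which suggests it comes from nested event loops rather than from the choice of model. The sketch below is an illustration under that assumption: `answer_sub_question` is a hypothetical stand-in for an asynchronous sub-question call, and the coroutine `cell_with_nested_run` plays the role of a notebook cell, where an event loop is already running.

```python
import asyncio


async def answer_sub_question(question):
    """Hypothetical stand-in for one asynchronous sub-question call."""
    await asyncio.sleep(0)
    return f"answer to: {question}"


async def cell_with_nested_run():
    # We are inside a coroutine, so an event loop is already running,
    # just as it is inside a Jupyter cell.
    coro = answer_sub_question("What is your name")
    try:
        return asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # discard the coroutine that was never started
        return f"RuntimeError: {exc}"


async def cell_with_await():
    # Awaiting reuses the running loop instead of starting a second one.
    return await answer_sub_question("What is your name")


nested_result = asyncio.run(cell_with_nested_run())
awaited_result = asyncio.run(cell_with_await())
print(nested_result)
print(awaited_result)
```

When the nested call sits inside library code, as it does here, it cannot simply be replaced by `await`. A commonly used workaround in notebooks is the third-party `nest_asyncio` package (calling `nest_asyncio.apply()` before running the query); it is not shown in the sketch because it is not part of the standard library.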

## Use “Router” to pick between different query engines

Imagine you are building a bot to retrieve information from both Notion and Slack. How does the language model know which tool to use to search for information? LlamaIndex’s “Router” is a simple abstraction that allows “picking” between different query engines.

In this example, the two CVs loaded in the previous section play the role of the Notion and Slack documents: we build an index and a query engine for each of them. We then wrap each engine in a tool and combine the tools in a `RouterQueryEngine`, which picks the tool to use based on the description we gave to each one. This way, when we ask a question about David Smith, the router automatically looks for the answer in David Smith's CV.

In [None]:

# routing-over-heterogeneous-data
from llama_index.query_engine import RouterQueryEngine
from llama_index import VectorStoreIndex
from llama_index.tools import QueryEngineTool

# define sub-indices
index1 = VectorStoreIndex.from_documents(document1, service_context=service_context)
index2 = VectorStoreIndex.from_documents(document2, service_context=service_context)

# define query engines and tools
tool1 = QueryEngineTool.from_defaults(
    query_engine=index1.as_query_engine(),
    description="Use this query engine to obtain data about David Smith",
)
tool2 = QueryEngineTool.from_defaults(
    query_engine=index2.as_query_engine(),
    description="Use this query engine to obtain data about Jo Brown",
)
query_engine = RouterQueryEngine.from_defaults(
    query_engine_tools=[tool1, tool2],service_context=service_context)

response = query_engine.query(
    "Give me a summary of David Smith."
)
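
The routing step itself can be pictured with a small standalone sketch. It rests on a loud assumption: the real router asks an LLM to choose a tool from the descriptions, whereas the hypothetical `ToyRouter` below simply counts the words a question shares with each description. The engines are plain callables standing in for query engines.

```python
import re


def words(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class ToyRouter:
    """Picks the tool whose description overlaps most with the question."""

    def __init__(self, tools):
        # tools: list of (description, engine) pairs; engine is a callable
        self.tools = tools

    def select(self, question):
        question_words = words(question)
        scores = [len(question_words & words(desc)) for desc, _ in self.tools]
        return scores.index(max(scores))  # ties go to the first tool

    def query(self, question):
        _, engine = self.tools[self.select(question)]
        return engine(question)


tools = [
    ("Use this query engine to obtain data about David Smith",
     lambda question: "answer from David Smith's CV"),
    ("Use this query engine to obtain data about Jo Brown",
     lambda question: "answer from Jo Brown's CV"),
]
router = ToyRouter(tools)

first_choice = router.select("Give me a summary of David.")
second_answer = router.query("Where did Jo Brown study?")
print(first_choice, second_answer)
```

The sketch shows why the tool descriptions matter: they are the only information the selector has about what each engine can answer, so vague or misspelled descriptions make routing less reliable.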


## Hypothetical document embeddings (HyDE)
Typically, when we ask a question about an external document, we use text **embeddings** to create *vector representations* of both the *question* and the *document*. We then use *semantic search* to find the text chunks most relevant to the question. However, the answer to a question may look very different from the question itself. What if we could **generate a hypothetical answer** to our question first, and then find the text chunks most relevant to that hypothetical answer? That’s where hypothetical document embeddings (HyDE) come into play; they can potentially improve output quality.

In [29]:
# Source: https://gpt-index.readthedocs.io/en/latest/examples/query_transformations/HyDEQueryTransformDemo.html

from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.indices.query.query_transform import HyDEQueryTransform
from llama_index.query_engine.transform_query_engine import TransformQueryEngine
from IPython.display import Markdown, display

# load documents
documents = SimpleDirectoryReader('resources/paul_graham_essay/').load_data()
index = VectorStoreIndex.from_documents(documents,service_context=service_context)

query_str = "what did paul graham do after going to RISD"


# First, we query without transformation: The same query string is used for embedding lookup and also summarization.

query_engine = index.as_query_engine()
response = query_engine.query(query_str)
display(Markdown(f"<b>{response}</b>"))


After attending RISD for his undergraduate degree in painting, Paul Graham went on to study at Accademia di Belle Arti (ABA) in Florence. He was accepted into their program but had trouble adjusting due to language barriers and cultural differences between the United States and Italy. Despite these challenges, he eventually completed a Master's Degree from ABA with an emphasis in painting. After graduation, Graham returned home to Providence where he worked as a freelance artist for clients such as hedge fund managers and real estate developers. He also continued his studies at RISD by taking classes on color theory and composition.

Now, we use HyDEQueryTransform to generate a hypothetical document and use it for embedding lookup.

In [None]:
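# llm_predictor expects a LlamaIndex predictor rather than the raw LangChain model,
# so reuse the one held by service_context (assumption: the llama_index version used in this notebook)
hyde = HyDEQueryTransform(llm_predictor=service_context.llm_predictor, include_original=True)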
hyde_query_engine = TransformQueryEngine(query_engine, hyde)
response = hyde_query_engine.query(query_str)
display(Markdown(f"<b>{response}</b>"))



In [None]:
# In this example, HyDE improves output quality significantly by accurately hallucinating what Paul Graham did after RISD (see below), which improves the embedding quality and the final output.
query_bundle = hyde(query_str)
hyde_doc = query_bundle.embedding_strs[0]
hyde_doc

`Failure case`: HyDE may mislead when the query can be misinterpreted without context. See more in https://github.com/jerryjliu/llama_index/blob/main/docs/examples/query_transformations/HyDEQueryTransformDemo.ipynb
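
The retrieval effect of HyDE can be illustrated without any model. In the sketch below everything is an assumption made for illustration: the "embedding" is a bag-of-words count vector, the two chunks are invented, and the hypothetical answer is hard-coded where a real pipeline would ask the LLM to write it.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(text, chunks):
    """Return the position of the chunk most similar to the given text."""
    vector = embed(text)
    scores = [cosine(vector, embed(chunk)) for chunk in chunks]
    return scores.index(max(scores))


chunks = [
    "Readers often ask what the author did before and after each book.",
    "She moved to Florence and spent a year painting still lifes.",
]
query = "what did the author do after art school"

# Hard-coded stand-in for the hypothetical answer an LLM would generate.
hypothetical_answer = "After art school she moved to Florence and kept painting."

plain_hit = retrieve(query, chunks)
hyde_hit = retrieve(hypothetical_answer, chunks)
print(plain_hit, hyde_hit)
```

The plain query matches the chunk that merely repeats the wording of the question, while the hypothetical answer matches the chunk that actually contains an answer. The same mechanism explains the failure case: if the hypothetical answer is off topic, the lookup is pulled toward the wrong chunks.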

## Use LlamaIndex with LangChain

`LangChain`, with its extensive list of features, casts a wider net, concentrating on the use of chains and agents to connect with external APIs. `LlamaIndex`, on the other hand, has a narrower focus and shines in data indexing and document retrieval.

Here is an example where we use LlamaIndex to keep the chat history of a LangChain agent. When we ask “what’s my name?” in the second round of conversation, the language model knows the answer from the “I am Bob” of the first round:


In [5]:
# Source: https://github.com/jerryjliu/llama_index/blob/main/examples/langchain_demo/LangchainDemo.ipynb
# Using LlamaIndex as a memory module
from langchain import OpenAI
from langchain.llms import OpenAIChat
from langchain.agents import initialize_agent

from llama_index import ListIndex
from llama_index.langchain_helpers.memory_wrapper import GPTIndexChatMemory

index = ListIndex([])
memory = GPTIndexChatMemory(
    index=index,
    memory_key="chat_history",
    query_kwargs={"response_mode": "compact"},
    # return_source returns source nodes instead of querying index
    return_source=True,
    # return_messages returns context in message format
    return_messages=True
)
llm = OpenAIChat(temperature=0)
# llm=OpenAI(temperature=0)
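agent_executor = initialize_agent([], llm, agent="conversational-react-description", memory=memory)

# Two rounds of conversation, as described above
agent_executor.run(input="hi, i am bob")
agent_executor.run(input="what's my name?")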
