# Introduction to LlamaIndex

**LlamaIndex** is a framework for connecting data to LLMs. The data is loaded into an index structure that is later queried to supply the LLM with context.

## Overview of RAG and its components with LlamaIndex

The main objective of retrieval augmentation is to add relevant context to the prompt.

RAG works in two steps:

1.  A document is loaded and divided into chunks. These chunks are processed by an embedding model. Finally, their vector representations are stored in a vector database. **This first step is the data ingestion**.
2.  **The second step is data querying (retrieval + synthesis)**. At this step, chunks of data are extracted from the vector database based on their similarity to the user's prompt and given as context to the LLM. You can extract the top-k most similar chunks from the vector database and pass them to the synthesis module.
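The two steps above can be sketched in plain Python. This is only a toy illustration, not how LlamaIndex implements them: the "embedding" of a chunk is assumed to be its set of words, similarity is assumed to be Jaccard overlap, and the names `ingest`, `retrieve` and `build_prompt` are hypothetical.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def ingest(document: str, chunk_size: int = 6) -> list:
    """Step 1: split the document into chunks and store each with its 'vector'."""
    words = document.split()
    store = []
    for start in range(0, len(words), chunk_size):
        chunk = " ".join(words[start:start + chunk_size])
        # Toy stand-in for an embedding model: the set of words in the chunk.
        store.append({"text": chunk, "vector": tokenize(chunk)})
    return store


def retrieve(store: list, query: str, top_k: int = 1) -> list:
    """Step 2 (retrieval): return the top_k chunks most similar to the query."""
    query_vector = tokenize(query)

    def jaccard(entry: dict) -> float:
        union = query_vector | entry["vector"]
        return len(query_vector & entry["vector"]) / len(union) if union else 0.0

    ranked = sorted(store, key=jaccard, reverse=True)
    return [entry["text"] for entry in ranked[:top_k]]


def build_prompt(context_chunks: list, query: str) -> str:
    """Step 2 (synthesis input): place the retrieved chunks before the question."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


document = (
    "LlamaIndex connects data to language models. "
    "Paris is the capital of France. "
    "Vector databases store embeddings for similarity search."
)
question = "What is the capital of France?"
store = ingest(document, chunk_size=6)
top = retrieve(store, question, top_k=1)
prompt = build_prompt(top, question)
print(prompt)
```

The resulting prompt is what would be sent to the LLM in the synthesis step; a real pipeline replaces the word sets with dense vectors from an embedding model.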

So, the main components of this framework are:

-   *LlamaHub (Data ingestion)*: Connect to your existing data, such as PDFs, documents, databases...
-   *Data Structures*: Store and index your data for different use cases. They can be integrated with different databases, such as vector databases.
-   *Queries*: Retrieve and query the data stored in the data structures. This includes agents, QA, summarization...

## Vector Stores

Vector stores hold high-dimensional data and provide the essential tools for retrieving semantically relevant documents. These systems compare the embedding vectors that encode each document's meaning.

Their primary function is similarity search. Semantic search goes beyond traditional keyword matching: it captures meaning in vectorized representations, and the technique can be applied to any data format that can be embedded. Once the data is embedded, we can compute similarities against the indexed vectors and capture the context expressed in the query. This ensures that the results are relevant and in line with the contextual and conceptual nuances of the user's input.
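As a small worked example of similarity search, the sketch below ranks vectors by cosine similarity, a metric commonly used for embeddings (whether a given vector store uses it by default is an assumption here). The three-dimensional vectors are hand-made illustrative values, not the output of any model.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made "embeddings" (illustrative values only).
embeddings = {
    "kitten": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.1],
    "truck": [0.0, 0.2, 0.9],
}

query_vector = embeddings["kitten"]
scores = {
    word: cosine_similarity(query_vector, vector)
    for word, vector in embeddings.items()
    if word != "kitten"
}
closest = max(scores, key=scores.get)
print(closest, scores)
```

Although "kitten" and "cat" share no characters that a keyword match could exploit, their vectors point in nearly the same direction, so "cat" scores far higher than "truck".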

## Data Connectors

Managing data in diverse formats, such as PDFs, documents, databases or CSV files, can be challenging. To solve this problem we use data connectors, also called `Readers`. Readers are responsible for parsing the data and converting it into a simplified `Document` representation, **consisting of text and basic metadata**.

In summary, data connectors are designed to streamline data ingestion by automating the process of fetching data from different sources and formatting it.
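To make the reader idea concrete, here is a minimal sketch in plain Python. `ToyDocument`, `read_plain_text`, `read_csv` and `load` are hypothetical names invented for this illustration; they only mimic the idea of turning heterogeneous sources into text plus basic metadata and are not LlamaIndex classes.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a Document: text plus basic metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def read_plain_text(content: str, source: str) -> list:
    """Wrap a whole text file in a single document."""
    return [ToyDocument(text=content.strip(), metadata={"source": source, "format": "txt"})]


def read_csv(content: str, source: str) -> list:
    """Emit one document per CSV row, rendered as 'column: value' pairs."""
    documents = []
    for row_number, row in enumerate(csv.DictReader(io.StringIO(content))):
        text = ", ".join(f"{column}: {value}" for column, value in row.items())
        metadata = {"source": source, "format": "csv", "row": row_number}
        documents.append(ToyDocument(text=text, metadata=metadata))
    return documents


# One reader per format; the file extension selects the reader.
READERS = {"txt": read_plain_text, "csv": read_csv}


def load(content: str, source: str) -> list:
    """Dispatch to the reader that matches the source's extension."""
    extension = source.rsplit(".", 1)[-1].lower()
    return READERS[extension](content, source)


docs = load("name,role\nAda,engineer\nAlan,scientist\n", "people.csv")
docs += load("LlamaIndex connects data to LLMs.\n", "notes.txt")
for doc in docs:
    print(doc.metadata["format"], "|", doc.text)
```

Whatever the original format, downstream steps only see a uniform list of documents, which is the point of a data connector.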

In [1]:
from llama_index.core import download_loader

WikipediaReader = download_loader("WikipediaReader") # Download the wikipedia reader to fetch documents from that website
loader = WikipediaReader() # Create an object of Wikipedia reader
documents = loader.load_data(pages=['Natural Language Processing', 'Artificial Intelligence']) # Get documents about NLP and AI
print(len(documents))



2


## Nodes

Once the data is ingested as documents, it passes through a processing step that transforms these documents into `Node` objects. Nodes are data units created from the original documents; they also contain metadata and contextual information. LlamaIndex provides the `NodeParser` class, designed to convert the content of documents into structured nodes automatically. The `SimpleNodeParser` converts a list of document objects into nodes.

In [2]:
from llama_index.core.node_parser import SimpleNodeParser
from llama_index.core import download_loader

# Download the document loader
WikipediaReader = download_loader("WikipediaReader")
# Create an object to get documents from Wikipedia
loader = WikipediaReader()
# Load documents
documents = loader.load_data(pages=['Natural Language Processing', 'Artificial Intelligence'])

# Initialize the parser
parser = SimpleNodeParser.from_defaults(chunk_size=512, chunk_overlap=20) # Define number of token per chunk, and overlap between chunks
# Parse the documents into nodes
nodes = parser.get_nodes_from_documents(documents)
print(len(nodes))



58


We can see that 58 chunks were generated from the 2 documents fetched from Wikipedia.
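The effect of `chunk_size` and `chunk_overlap` can be illustrated with a simplified splitter. This is a sketch under stated assumptions: it slices a plain list of tokens with a fixed stride, whereas the real parser counts tokenizer tokens and may also respect sentence boundaries. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(tokens: list, chunk_size: int, chunk_overlap: int) -> list:
    """Split tokens into chunks of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks


tokens = [f"t{i}" for i in range(10)]
chunks = split_with_overlap(tokens, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

With 10 tokens, a chunk size of 4 and an overlap of 1, the window advances 3 tokens at a time, so each chunk starts with the last token of the previous one. The overlap keeps some context shared between neighbouring chunks.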

## Indexes

Indexing is an initial step for storing information in a database: it transforms the unstructured data into embeddings that capture semantic meaning and optimizes the data format so it can be easily accessed and queried. The most popular indexes are:

### Summary Index

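Extracts a summary from each document and stores it with all the nodes in that document. Since it is not always easy to match small node embeddings with a query, a document-level summary can help retrieval. In LlamaIndex, this behavior corresponds to the `DocumentSummaryIndex` class; the class named `SummaryIndex` instead stores the nodes as a sequential list.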

### Vector Store Index

The vector store index generates an embedding for each node during index construction. At query time, it embeds the query and retrieves the top-k most similar nodes.

It's suitable for small-scale applications and easily scalable to accommodate larger datasets using high-performance vector databases. 

We can create a dataset in **Activeloop** and append documents to it by employing the **DeepLakeVectorStore** class.

In [3]:
import json
from llama_index.vector_stores.deeplake import DeepLakeVectorStore
import os

# Change this code to load your ActiveLoop key
with open('../data/keys.json', 'r') as file:
    data = json.load(file)
    ActiveLoopKey = data['ActiveLoopKey'] # Activeloop personal key
    name_org = data['NameOrg'] # Activeloop name org

os.environ['ACTIVELOOP_TOKEN'] = ActiveLoopKey

# Create/connect an empty dataset in ActiveLoop cloud
my_activeloop_dataset_name = "LlamaIndex_intro"
dataset_path = f"hub://{name_org}/{my_activeloop_dataset_name}"
vector_store = DeepLakeVectorStore(dataset_path=dataset_path, overwrite=False)



Deep Lake Dataset in hub://alejandrotormun/LlamaIndex_intro already exists, loading from the storage


In the previous code, we loaded our personal credentials (the API key and the organization name) and then created an empty dataset in the Activeloop cloud, or connected to it if it already existed.

Next, we create a `StorageContext` object, which stores nodes, indexes and embedding vectors. With it, we instantiate a `VectorStoreIndex` to build the index (generate the embeddings) and store the results in the dataset defined above.

In [7]:
from llama_index.core.storage.storage_context import StorageContext
from llama_index.core import download_loader
from llama_index.core import VectorStoreIndex
from langchain_ollama import OllamaEmbeddings
from llama_index.core import Settings

# We create the storage context from the ActiveLoop's dataset we created in the previous code.
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# We download the documents of interest
WikipediaReader = download_loader("WikipediaReader") # Download the wikipedia reader to fetch documents from that website
loader = WikipediaReader() # Create an object of Wikipedia reader
documents = loader.load_data(pages=['Natural Language Processing', 'Artificial Intelligence']) # Get documents about NLP and AI

# Finally, we create the Vector Store index to store the embeddings from the chunks generated from the documents
embedding_model = OllamaEmbeddings(model="llama3.1:8b") # Load the model to get the embeddings from the chunks
Settings.embed_model = embedding_model # Load it into the setting of llama index
index = VectorStoreIndex.from_documents( # Load the embeddings into the database in the cloud
    documents=documents,
    storage_context=storage_context,
    show_progress=True
)


Parsing nodes: 100%|██████████| 2/2 [00:00<00:00, 86.85it/s]

Generating embeddings: 100%|██████████| 31/31 [00:20<00:00,  1.50it/s]

Uploading data to deeplake dataset.

100%|██████████| 31/31 [00:02<00:00, 13.52it/s]

Dataset(path='hub://alejandrotormun/LlamaIndex_intro', tensors=['embedding', 'id', 'metadata', 'text'])

  tensor      htype      shape      dtype  compression
  -------    -------    -------    -------  ------- 
 embedding  embedding  (31, 4096)  float32   None   
    id        text      (31, 1)      str     None   
 metadata     json      (31, 1)      str     None   
   text       text      (31, 1)      str     None   


 

In the previous code, we generated the embeddings with the Ollama model **llama3.1:8b**, the smallest model in the Llama 3.1 family.

## Query Engines

The next step is to query the stored information. To do that, we use a query engine, which combines a retriever and a response synthesizer into a pipeline.

The pipeline uses the query string to fetch nodes and then sends them to the LLM to generate a response.
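The retriever-plus-synthesizer composition can be sketched as follows. Everything here is a hypothetical stand-in: `ToyQueryEngine` is not a LlamaIndex class, the retriever ranks nodes by shared words instead of embeddings, and the synthesizer only formats a string where a real pipeline would call the LLM.

```python
from typing import Callable


class ToyQueryEngine:
    """Hypothetical pipeline: retrieve nodes, then synthesize a response."""

    def __init__(
        self,
        retriever: Callable[[str], list],
        synthesizer: Callable[[str, list], str],
    ) -> None:
        self.retriever = retriever
        self.synthesizer = synthesizer

    def query(self, question: str) -> str:
        nodes = self.retriever(question)            # 1. fetch relevant nodes
        return self.synthesizer(question, nodes)    # 2. generate the response


NODES = [
    "NLP stands for natural language processing.",
    "Deep Lake is a vector store.",
    "Routers pick a query engine.",
]


def words_of(text: str) -> set:
    """Lowercased words with surrounding punctuation removed."""
    return {word.strip("?.,").lower() for word in text.split()}


def keyword_retriever(question: str) -> list:
    """Toy retriever: return the single node sharing the most words with the question."""
    question_words = words_of(question)
    best = max(NODES, key=lambda node: len(question_words & words_of(node)))
    return [best]


def echo_synthesizer(question: str, nodes: list) -> str:
    """Stand-in for the LLM call: just show which context would be used."""
    return f"Q: {question} | Context used: {' '.join(nodes)}"


engine = ToyQueryEngine(keyword_retriever, echo_synthesizer)
response = engine.query("What does NLP stand for?")
print(response)
```

Keeping the two stages behind separate callables means either one can be swapped (a different retrieval strategy, a different LLM) without touching the other.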

In [10]:
from llama_index.core import GPTVectorStoreIndex
from llama_index.core import download_loader
from langchain_ollama import OllamaLLM
from llama_index.core import Settings


# We download the documents of interest
WikipediaReader = download_loader("WikipediaReader") # Download the wikipedia reader to fetch documents from that website
loader = WikipediaReader() # Create an object of Wikipedia reader
documents = loader.load_data(pages=['Natural Language Processing', 'Artificial Intelligence']) # Get documents about NLP and AI

# Load a local Ollama model to use as LLM 
llm_model = OllamaLLM(model="llama3.1:8b")
Settings.llm = llm_model
# Create an index using the loaded documents from Wikipedia
index = GPTVectorStoreIndex.from_documents(documents=documents)
query_engine = index.as_query_engine()
response = query_engine.query("What does the NLP stands for?")
print(response.response)



Based on the provided context information, it appears that "NLP" likely stands for "Natural Language Processing". This is inferred from the various topics discussed in the text, including approaches to NLP (symbolic, statistical, and neural networks), common NLP tasks (such as syntactic analysis, lexical semantics, and relational semantics), and specific subfields within NLP.


## Routers

Routers select the most suitable query engine for each task, improving performance and accuracy. They are useful when dealing with multiple data sources, each holding unique information. Consider an application that uses a SQL database and a vector store as its knowledge base. In this setup, the router determines which data source is most applicable to the given query. **We will work through an example in a future notebook**. See the [router documentation](https://docs.llamaindex.ai/en/stable/module_guides/querying/router/#routers).
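Until then, the routing idea can be illustrated with a toy sketch. In practice the choice is usually delegated to a selector, often an LLM that reads a description of each engine; here a simple keyword rule stands in for that selector. The engines, keyword sets and the `route` function are all hypothetical.

```python
import re


def sql_engine(query: str) -> str:
    """Stand-in for a query engine backed by a SQL database."""
    return f"[sql] {query}"


def vector_engine(query: str) -> str:
    """Stand-in for a query engine backed by a vector store."""
    return f"[vector] {query}"


# Each route pairs an engine with keywords hinting that it is the right choice.
ROUTES = [
    {"engine": sql_engine, "keywords": {"count", "average", "total", "sum"}},
    {"engine": vector_engine, "keywords": {"explain", "describe", "summarize", "why"}},
]


def route(query: str) -> str:
    """Send the query to the engine whose keywords match best (vector by default)."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    best = max(ROUTES, key=lambda candidate: len(words & candidate["keywords"]))
    if not words & best["keywords"]:
        return vector_engine(query)  # no keyword matched: fall back to the vector store
    return best["engine"](query)


print(route("What is the total revenue per region?"))
print(route("Explain how transformers work"))
```

Aggregation-style questions go to the SQL engine, while open-ended questions go to the vector store.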

## Saving and Loading Indexes Locally

In [11]:
import os.path
from llama_index.core import VectorStoreIndex, StorageContext, load_index_from_storage
from llama_index.core import download_loader

if not os.path.exists("./storage"):
    WikipediaReader = download_loader("WikipediaReader")
    loader = WikipediaReader()
    documents = loader.load_data(pages=['Natural Language Processing', 'Artificial Intelligence'])
    index = VectorStoreIndex.from_documents(documents=documents)
    index.storage_context.persist()
else:
    # If the index already exists, we'll just load it:
    storage_context = StorageContext.from_defaults(persist_dir="./storage")
    index = load_index_from_storage(storage_context)



# Summary and Main Challenges with naive RAG

In this notebook, we have covered the main points of RAG: how to collect data, how to process it and store it in a vector database, and finally how to retrieve it for use by an LLM.


The main challenges with naive RAG are:
-   **Bad Retrieval**:
    -   *Low Precision*: Not all chunks in the retrieved set are relevant. This can cause hallucination.
    -   *Low Recall*: Not all relevant chunks are actually retrieved. The LLM lacks enough context.
    -   *Outdated information*: The data is redundant or out of date.
-   **Bad Response Generation**:
    -   *Hallucination*: The model produces an answer that is not supported by the context.
    -   *Irrelevance*: The model produces an answer that doesn't address the question.
    -   *Toxicity/Bias*: The model produces an offensive answer.
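The precision and recall mentioned under bad retrieval can be computed directly once we know which chunks are relevant to a query. The sketch below uses hypothetical chunk identifiers; in a real evaluation the relevant set would come from labeled data.

```python
def precision_recall(retrieved: list, relevant: list) -> tuple:
    """Precision: share of retrieved chunks that are relevant.
    Recall: share of relevant chunks that were retrieved."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


retrieved = ["c1", "c2", "c3", "c4"]  # chunks returned by the retriever
relevant = ["c2", "c4", "c7"]         # chunks that actually answer the query
precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four retrieved chunks are relevant (precision 2/4), and two of the three relevant chunks were found (recall 2/3): the LLM receives two distracting chunks and misses one useful one.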

There are also other challenges with RAG, such as storing additional data (metadata), optimizing the embeddings, or using the LLM for more than just text generation.

To address these challenges, the next notebooks will extend the set of tools used in RAG.