# LlamaIndex

### Within the **LlamaIndex** framework, the main elements of **RAG** (retrieval-augmented generation) are represented by the following modules:

1. **Data Connectors:** These modules load and process data. They let you load data, create nodes from it, and build an index.

2. **Retrievers:** These modules search the index and return the nodes most relevant to a query.

3. **Query Engines:** These modules are responsible for generating a response to a query. They use information obtained from retrievers to generate a response.
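The division of labour between the three modules can be sketched in plain Python. This is a toy illustration, not the LlamaIndex API: `load_nodes`, `retrieve`, and `answer` are hypothetical names, relevance is plain word overlap rather than embeddings, and the "response" simply echoes the retrieved text where a real query engine would call an LLM.

```python
# Toy stand-ins for the three roles; none of these names come from LlamaIndex.

def load_nodes(text: str) -> list[str]:
    """Data connector role: turn raw text into nodes (here, one per non-empty paragraph)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def retrieve(nodes: list[str], query: str, top_k: int = 1) -> list[str]:
    """Retriever role: rank nodes by how many query words they share."""
    query_words = set(query.lower().split())

    def overlap(node: str) -> int:
        return len(query_words & set(node.lower().split()))

    return sorted(nodes, key=overlap, reverse=True)[:top_k]


def answer(nodes: list[str], query: str) -> str:
    """Query engine role: build a response from the retrieved nodes."""
    context = " ".join(retrieve(nodes, query))
    return f"Based on the context: {context}"


nodes = load_nodes("Cats sleep most of the day.\n\nDogs enjoy long walks.")
print(answer(nodes, "what do dogs enjoy"))
```

The point of the sketch is only the flow: data becomes nodes, the retriever picks nodes, and the query engine turns the picked nodes into a response.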


In [2]:
from llama_index import VectorStoreIndex, SimpleDirectoryReader # Data Connectors

In [3]:
import openai
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())  # read local .env file

In [4]:
openai.api_key = os.getenv("OPENAI_API_KEY")

In [5]:
DATA = os.getenv('DATA')
documents = SimpleDirectoryReader(DATA).load_data()

In [6]:
documents

[Document(id_='03b36c9e-ebb4-450d-a295-12abfffad50b', embedding=None, metadata={'file_path': '/Users/gala/PycharmProjects/Scalian_Chat/ScalianChatBot/Data/www_scalian-spain_es_.txt', 'file_name': 'www_scalian-spain_es_.txt', 'file_type': 'text/plain', 'file_size': 7028, 'creation_date': '2024-02-06', 'last_modified_date': '2024-02-06', 'last_accessed_date': '2024-02-06'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='title:\nurl:\n\nDESCUBRE SCALIAN SPAIN\n\n\n - Descubre las nuevas perspectivas del negocio basadas en datos\n - Una nueva era de datos, procesamiento masivo de información e Inteligencia Artificial está marcando el futuro de los procesos empresariales, sociales, económicos y políticos. \u200b\n - Descubre nuestras especialidades\n\n - La 

As expected, the list contains one document of type `llama_index.schema.Document`. Now let's index our document in a vector database. This is handled by the class **ChatGPTRetrievalPluginIndex**. It has a method called `from_documents` that creates an index from a list of documents. Various vector databases are available for this purpose, each with its own index class:


- GPTSimpleVectorIndex
- GPTFaissIndex
- GPTWeaviateIndex
- GPTPineconeIndex
- GPTQdrantIndex
- GPTChromaIndex
- GPTMilvusIndex
- GPTDeepLakeIndex
- GPTMyScaleIndex

## GPTSimpleVectorIndex

In [7]:
from llama_index.indices.vector_store import GPTVectorStoreIndex
from llama_index.indices.keyword_table import GPTKeywordTableIndex

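For now, we use the simple option, GPTSimpleVectorIndex (the import above uses the name `GPTVectorStoreIndex`, under which later library versions expose the simple vector index). Before building the index, the OpenAI API key must be set, because the embedding model needs it.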


In LlamaIndex, the `query_engine` is an object that manages the search process in the index. The index is a data structure that stores "nodes". A node corresponds to a fragment of text from a document. LlamaIndex takes document objects and internally breaks them into node objects.

In our case, it searches only through one node because we haven't split our text into nodes.
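To make the idea of a node concrete, here is a minimal sketch of splitting text into overlapping fragments. It is an illustration only: `split_into_nodes`, `chunk_size`, and `overlap` are hypothetical names, and the function counts whitespace-separated words, which is an assumption made for simplicity and not how the library's own node parsing is implemented.

```python
def split_into_nodes(text: str, chunk_size: int = 5, overlap: int = 2) -> list[str]:
    """Split text into fragments of `chunk_size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    nodes = []
    for start in range(0, len(words), step):
        nodes.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last fragment already reaches the end of the text
    return nodes


sample = "one two three four five six seven eight"
for node in split_into_nodes(sample):
    print(node)
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring fragments, at the cost of storing some words twice.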


In [9]:
# index object
index = GPTVectorStoreIndex.from_documents(documents)


# query engine object
query_engine = index.as_query_engine()

# query to the index
response2 = query_engine.query(
	'What strategies does Scalian Spain employ to ensure a smooth transition to new operational management paradigms, and how do they integrate digitalization, performance management, and expertise to achieve this goal?'
)
print(response2.response)

Scalian Spain employs a combination of digitalization of processes, digital transformation, and performance management expertise to ensure a smooth transition to new operational management paradigms. They integrate these strategies by leveraging the latest technologies and the support of their Data & Insights division. This includes optimizing processes, improving operational intelligence, forecasting and planning, establishing production policies and programs, and developing data collection systems and digital processes. By visualizing and analyzing real-time data in their supply chain and production, Scalian Spain aims to enhance response capability, performance, quality, cost, and efficiency. Additionally, they focus on improving asset reliability and availability while minimizing operational risks and costs. Overall, Scalian Spain's approach involves a holistic integration of digitalization, performance management, and expertise to drive business growth and enhance operational perf

In [10]:
# similarity score of the top retrieved node
response2.source_nodes[0].score

0.8539750999601693

In [11]:
response2

Response(response="Scalian Spain employs a combination of digitalization of processes, digital transformation, and performance management expertise to ensure a smooth transition to new operational management paradigms. They integrate these strategies by leveraging the latest technologies and the support of their Data & Insights division. This includes optimizing processes, improving operational intelligence, forecasting and planning, establishing production policies and programs, and developing data collection systems and digital processes. By visualizing and analyzing real-time data in their supply chain and production, Scalian Spain aims to enhance response capability, performance, quality, cost, and efficiency. Additionally, they focus on improving asset reliability and availability while minimizing operational risks and costs. Overall, Scalian Spain's approach involves a holistic integration of digitalization, performance management, and expertise to drive business growth and enhan

It is important to note that "relevance" here is determined by an embedding model: the query and each node are converted to embedding vectors that capture their semantic content, and the nodes whose vectors are most similar to the query vector are treated as the most relevant. The score shown above is that similarity value for the top node.
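One common way to turn semantic closeness into a number like the score above is cosine similarity between embedding vectors. The sketch below assumes that measure; the three-dimensional vectors and the node names are made up for illustration, and `cosine_similarity` is a hand-written helper, not a LlamaIndex function.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings"; real embedding vectors have far more dimensions.
query_embedding = [1.0, 0.0, 1.0]
node_embeddings = {
    "node_a": [1.0, 0.0, 1.0],
    "node_b": [0.0, 1.0, 0.0],
    "node_c": [1.0, 1.0, 0.0],
}

# Rank nodes from most to least similar to the query.
ranked = sorted(
    ((cosine_similarity(query_embedding, emb), name) for name, emb in node_embeddings.items()),
    reverse=True,
)
for score, name in ranked:
    print(f"{name}: {score:.3f}")
```

A node pointing in the same direction as the query scores close to 1, an unrelated (orthogonal) node scores close to 0, and the retriever keeps the highest-scoring nodes.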

