# LlamaIndex

### Within **LlamaIndex**, the main elements of **RAG** are represented by the following modules:

1. **Data Connectors:** These modules load and process data. They let you read the source data, create nodes from it, and build an index.

2. **Retrievers:** These modules search the index and return the nodes most relevant to a query.

3. **Query Engines:** These modules generate a response to a query, using the nodes returned by the retrievers.
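
To show how the three stages fit together, here is a minimal, library-free sketch. It is not how LlamaIndex implements them: the names `Node`, `load_nodes`, `retrieve`, and `build_prompt` are hypothetical, and the retriever ranks by simple word overlap rather than embeddings.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A fragment of text plus the name of the document it came from."""
    text: str
    source: str


def load_nodes(docs: dict[str, str]) -> list[Node]:
    """'Data connector' stage: turn raw documents into nodes (one per paragraph)."""
    return [
        Node(text=para.strip(), source=name)
        for name, body in docs.items()
        for para in body.split("\n\n")
        if para.strip()
    ]


def retrieve(nodes: list[Node], query: str, top_k: int = 1) -> list[Node]:
    """'Retriever' stage: rank nodes by how many query words they share."""
    query_words = set(query.lower().split())

    def overlap(node: Node) -> int:
        return len(query_words & set(node.text.lower().split()))

    return sorted(nodes, key=overlap, reverse=True)[:top_k]


def build_prompt(nodes: list[Node], query: str) -> str:
    """'Query engine' stage: assemble the prompt an LLM would receive."""
    context = "\n".join(node.text for node in nodes)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Toy data, invented for illustration only.
docs = {"toy.txt": "Cats sleep most of the day.\n\nDogs enjoy long walks outside."}
nodes = load_nodes(docs)
top = retrieve(nodes, "what do dogs enjoy")
prompt = build_prompt(top, "what do dogs enjoy")
print(prompt)
```

In a real query engine the assembled prompt is sent to an LLM, which writes the final answer from the retrieved context.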


In [1]:
from llama_index import VectorStoreIndex, SimpleDirectoryReader # Data Connectors

In [6]:
import openai
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())  # read local .env file

In [3]:
openai.api_key = os.getenv("OPENAI_API_KEY")

In [7]:
DATA = os.getenv('DATA')
documents = SimpleDirectoryReader(DATA).load_data()
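
Note that `os.getenv` returns `None` when a variable is missing, so a missing `DATA` or `OPENAI_API_KEY` entry in the `.env` file only surfaces later as a less obvious error. A small guard can fail early instead. This is a sketch: `require_env` and the `DEMO_DATA_DIR` variable are hypothetical names, not part of the notebook or of any library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; add it to your .env file."
        )
    return value


# Demo with a made-up variable name so the snippet runs anywhere.
os.environ["DEMO_DATA_DIR"] = "./Data"
data_dir = require_env("DEMO_DATA_DIR")
print(data_dir)
```

In the notebook, the same guard could wrap the lookups of `DATA` and `OPENAI_API_KEY`.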

In [8]:
documents

[Document(id_='8fedd91f-a1ad-458c-92ac-e5ebe0291d8f', embedding=None, metadata={'file_path': '/Users/gala/PycharmProjects/Scalian_Chat/ScalianChatBot/Data/scalian_es.txt', 'file_name': 'scalian_es.txt', 'file_type': 'text/plain', 'file_size': 4541, 'creation_date': '2024-02-02', 'last_modified_date': '2024-01-30', 'last_accessed_date': '2024-02-02'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text="Scalian Spain is a leading technology consulting and engineering company that provides innovative solutions to clients across various industries. With a strong focus on digital transformation, Scalian Spain helps businesses optimize their operations, improve efficiency, and drive growth.\n\nEstablished in 2009, Scalian Spain is part of the Scalian Group, a glob

As expected, the list contains a single document of type `llama_index.schema.Document`. Now let's index this document in a vector database. This is handled by an index class such as **ChatGPTRetrievalPluginIndex** or the `GPTVectorStoreIndex` used below. Each has a `from_documents` method that builds an index from a list of documents. Index classes exist for various vector databases:


- GPTSimpleVectorIndex
- GPTFaissIndex
- GPTWeaviateIndex
- GPTPineconeIndex
- GPTQdrantIndex
- GPTChromaIndex
- GPTMilvusIndex
- GPTDeepLakeIndex
- GPTMyScaleIndex

## GPTSimpleVectorIndex

In [9]:
from llama_index.indices.vector_store import GPTVectorStoreIndex
from llama_index.indices.keyword_table import GPTKeywordTableIndex

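For now, we use the simple option, GPTSimpleVectorIndex, which the version imported above appears to expose under the name `GPTVectorStoreIndex`. Before building the index, make sure the OpenAI API key is set, because the embedding model needs it.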


In LlamaIndex, the `query_engine` is an object that manages the search process over the index. The index is a data structure that stores "nodes". A node corresponds to a fragment of text from a document: LlamaIndex takes document objects and internally breaks them into node objects.

In our case, the search covers very few nodes, because the source is a single short document and we have not configured any custom splitting.
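
To make the document-to-node step concrete, here is a minimal sketch of splitting a text into overlapping chunks. It is an illustration only: `split_into_chunks` is a hypothetical helper that counts whitespace-separated words, whereas LlamaIndex's own splitters work differently and have their own defaults.

```python
def split_into_chunks(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into chunks of `chunk_size` words.

    Consecutive chunks share `overlap` words, so a sentence cut at a
    boundary still appears with some context in the next chunk.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# Toy input: twenty placeholder words w0 .. w19.
text = " ".join(f"w{i}" for i in range(20))
chunks = split_into_chunks(text, chunk_size=8, overlap=2)
for chunk in chunks:
    print(chunk)
```

Each chunk would become one node. A short text therefore yields only a handful of nodes, while a long one yields many.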


In [10]:
# index object
index = GPTVectorStoreIndex.from_documents(documents)


# query engine object
query_engine = index.as_query_engine()

# query to the index
response2 = query_engine.query(
	'What services offer scalain?'
)
print(response2.response)

Scalian offers a wide range of services to its clients. These services include software development, system integration, data analytics, cloud computing, and cybersecurity.


In [11]:
# similarity score of the top retrieved node (a relevance measure, not a calibrated confidence)
response2.source_nodes[0].score

0.7906174234772851

In [12]:
response2

Response(response='Scalian offers a wide range of services to its clients. These services include software development, system integration, data analytics, cloud computing, and cybersecurity.', source_nodes=[NodeWithScore(node=TextNode(id_='cc8b5f7b-416f-4883-8339-cecbd7e20818', embedding=None, metadata={'file_path': '/Users/gala/PycharmProjects/Scalian_Chat/ScalianChatBot/Data/scalian_es.txt', 'file_name': 'scalian_es.txt', 'file_type': 'text/plain', 'file_size': 4541, 'creation_date': '2024-02-02', 'last_modified_date': '2024-01-30', 'last_accessed_date': '2024-02-02'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='8fedd91f-a1ad-458c-92ac-e5ebe0291d8f', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'f

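It is important to note that "relevance" here is determined by the embedding model rather than by keyword matching. The query and each node are converted to vectors that capture their semantic content, and nodes are ranked by how similar their vectors are to the query vector (cosine similarity in the default in-memory store). The score shown above is this similarity value for the top node.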
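
The score attached to each source node can be illustrated with a small cosine-similarity ranking. This is a sketch under stated assumptions: the three-dimensional vectors and node names are invented for the example, while real embeddings come from the embedding model and have far more dimensions. `cosine_similarity` and `rank_nodes` are hypothetical helper names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_nodes(
    query_vec: list[float],
    node_vecs: dict[str, list[float]],
    top_k: int = 2,
) -> list[tuple[str, float]]:
    """Return the `top_k` (node_id, score) pairs, highest similarity first."""
    scored = [
        (node_id, cosine_similarity(query_vec, vec))
        for node_id, vec in node_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Made-up embeddings, for illustration only.
node_vecs = {
    "services": [0.9, 0.1, 0.0],
    "history": [0.1, 0.9, 0.2],
    "offices": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.2, 0.0]

ranking = rank_nodes(query_vec, node_vecs)
for node_id, score in ranking:
    print(node_id, round(score, 3))
```

A score close to 1 means the node's vector points in nearly the same direction as the query's. It does not measure how correct the generated answer is.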

