# What Llama-Index is and which problems it solves

Llama-Index is a system for building retrieval-augmented generation (RAG) applications using large language models (LLMs). It aims to solve common issues with naive RAG systems, such as poor response quality due to bad retrieval, hallucination, and loss of context. Llama-Index improves RAG through better retrieval, filtering, chunking, reranking, and fine-tuning.

![](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*h4qoECB8eLEPQchsR5qA5A.gif)

Source: [LlamaIndex — Unleashes the power of ChatGPT over your own data](https://sharmadave.medium.com/llama-index-unleashes-the-power-of-chatgpt-over-your-own-data-b67cc2e4e277)

## Main components of Llama-Index 

![](./images/rag-stages.png)

There are five key stages within RAG, which in turn will be a part of any larger application you build. Llama-Index provides abstractions and classes for each of these steps. These are:

- **Loading**: this refers to getting your data from where it lives – whether it’s text files, PDFs, another website, a database, or an API – into your pipeline. LlamaHub provides hundreds of connectors to choose from.
    - **Nodes and Documents**: A `Document` is a container around any data source - for instance, a PDF, an API output, or data retrieved from a database. A `Node` represents a “chunk” of a source Document. *Documents* can be split into *Nodes*.
    - **Connectors**: A data connector (often called a `Reader`) ingests data from different data sources and data formats into `Documents` and `Nodes`.
- **Indexing**: this means creating a data structure that allows for querying the data. For LLMs this nearly always means creating vector embeddings, numerical representations of the meaning of your data, as well as numerous other metadata strategies to make it easy to accurately find contextually relevant data.
    - **Indexes**: An index organizes your data into a structure that is easy to retrieve from, typically by generating vector embeddings that are stored in a specialized database called a vector store.
    - **Embeddings**: Embedding models generate numerical representations of data called embeddings. When filtering your data for relevance, LlamaIndex will convert queries into embeddings, and your vector store will find data that is numerically similar to the embedding of your query.
- **Storing**: once your data is indexed you will almost always want to store your index, as well as other metadata, to avoid having to re-index it.
    - A **vector store**: a type of database that is designed to store and handle vector data - a set of coordinates in a multidimensional space which represents mathematically complex data such as images, text, and audio. Vector stores are optimized for operations such as nearest neighbor search, which is a common operation in these applications. In the context of LlamaIndex, the vector store contains the embedding vectors of the ingested document chunks.
    - A **document store**, also known as a **document-oriented database**: a type of non-relational database that is designed to store, retrieve, and manage document-oriented information. These documents can be stored in formats such as JSON, XML, BSON, and others. Document stores are optimized for operations such as querying and processing of documents. In the context of LlamaIndex, the document store would contain the actual documents that are being indexed and queried.
- **Querying**: for any given indexing strategy there are many ways you can utilize LLMs and LlamaIndex data structures to query, including sub-queries, multi-step queries and hybrid strategies.
    - **Retrievers**: A retriever defines how to efficiently retrieve relevant context from an index when given a query. Your retrieval strategy is key to the relevancy of the data retrieved and the efficiency with which it’s done.
    - **Routers**: A router determines which retriever will be used to retrieve relevant context from the knowledge base. More specifically, the `RouterRetriever` class is responsible for selecting one or multiple candidate retrievers to execute a query. It uses a selector to choose the best option based on each candidate’s metadata and the query.
    - **Node Postprocessors**: A node postprocessor takes in a set of retrieved nodes and applies transformations, filtering, or re-ranking logic to them.
    - **Response Synthesizers**: A response synthesizer generates a response from an LLM, using a user query and a given set of retrieved text chunks.
- **Evaluation**: a critical step in any pipeline is checking how effective it is relative to other strategies, or when you make changes. Evaluation provides objective measures of how accurate, faithful and fast your responses to queries are.
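To make the loading, indexing, storing, and querying stages more concrete, here is a minimal, library-free sketch. It deliberately does not use the LlamaIndex API: the `Document`, `Node`, `ToyVectorStore`, `embed`, and `similarity_cutoff` names are hypothetical stand-ins, and the word-count "embedding" is an assumption chosen only so the example runs without a model. A real pipeline would use an embedding model and a proper vector store.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Document:
    """Container around one data source (toy stand-in, not the LlamaIndex class)."""
    doc_id: str
    text: str


@dataclass
class Node:
    """A chunk of a source Document."""
    doc_id: str
    text: str


def split_into_nodes(doc, chunk_size=8):
    """Split a Document into Nodes of at most `chunk_size` words."""
    words = doc.text.split()
    return [
        Node(doc.doc_id, " ".join(words[i:i + chunk_size]))
        for i in range(0, len(words), chunk_size)
    ]


def embed(text):
    """Toy embedding: a sparse word-count vector (assumption for illustration)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory (embedding, node) pairs with brute-force nearest-neighbor search."""

    def __init__(self):
        self.entries = []

    def add(self, nodes):
        for node in nodes:
            self.entries.append((embed(node.text), node))

    def query(self, question, top_k=2):
        q = embed(question)
        scored = [(cosine(q, vec), node) for vec, node in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


def similarity_cutoff(scored_nodes, cutoff=0.1):
    """Node postprocessor: drop retrieved nodes scoring below `cutoff`."""
    return [(score, node) for score, node in scored_nodes if score >= cutoff]


# Loading: wrap raw text in Documents.
docs = [
    Document("bedrock", "Amazon Bedrock offers foundation models through a managed API"),
    Document("opensearch", "Amazon OpenSearch can store vectors and run nearest neighbor search"),
]

# Indexing and storing: chunk into Nodes, embed, and keep them in the store.
store = ToyVectorStore()
for doc in docs:
    store.add(split_into_nodes(doc))

# Querying: retrieve the closest nodes, then postprocess them.
results = similarity_cutoff(store.query("Which service runs nearest neighbor search?"))
for score, node in results:
    print(f"{score:.2f} {node.doc_id}: {node.text}")
```

The brute-force scan in `ToyVectorStore.query` is what a dedicated vector store replaces with an optimized nearest-neighbor index, and the cutoff function plays the role of a node postprocessor sitting between retrieval and response synthesis.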

## Integrations with AWS

Llama-Index integrates with multiple AWS services:
1. LLMs and embedding models
    - Amazon Bedrock (`pip install llama-index-llms-bedrock`)
    - Amazon SageMaker Endpoints (`pip install llama-index-llms-sagemaker-endpoint`)
2. Vector Stores
    - Amazon OpenSearch (`pip install llama-index-vector-stores-opensearch`)
    - Amazon Keyspaces (for Apache Cassandra) (`pip install llama-index-vector-stores-cassandra`)
    - Amazon DynamoDB (`pip install llama-index-vector-stores-dynamodb`)
    - Postgres (`pip install llama-index-vector-stores-postgres`)
    - Redis (`pip install llama-index-vector-stores-redis`)
    - Any self-hosted database ☺️️

## How to get started
You can get started with Llama-Index in just a few lines of code. The key steps are:

1. Ingest your data into a vector database 
2. Index your data for retrieval
3. Load an LLM
4. Query over the data by retrieving chunks and synthesizing with the LLM
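Steps 3 and 4 can be illustrated without any model access. The sketch below is a toy, not the Llama-Index API: `build_prompt`, `synthesize`, and `stub_llm` are hypothetical names, the prompt wording is an assumption, and the "LLM" is a stub that only reports how many context chunks it received. It shows the general shape of response synthesis, where retrieved chunks and the user question are combined into one prompt and handed to the model.

```python
from typing import Callable, List


def build_prompt(question: str, chunks: List[str]) -> str:
    """Combine retrieved chunks and the user question into one prompt."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


def synthesize(question: str, chunks: List[str], llm: Callable[[str], str]) -> str:
    """Send the retrieved context plus the question to the LLM callable."""
    return llm(build_prompt(question, chunks))


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for a hosted model; it only counts context lines."""
    n_chunks = sum(1 for line in prompt.splitlines() if line.startswith("- "))
    return f"(stub answer based on {n_chunks} context chunks)"


# Chunks that a retriever would normally return for the question.
retrieved = [
    "Vector stores are optimized for nearest neighbor search.",
    "A Node represents a chunk of a source Document.",
]
answer = synthesize("What are vector stores optimized for?", retrieved, stub_llm)
print(answer)
```

Passing the model in as a plain callable keeps the synthesis step independent of where the LLM is hosted, so swapping the stub for a real endpoint would only change the `llm` argument.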

```python
%pip install -r requirements.txt
```

Collecting llama-index (from -r requirements.txt (line 1))
  Using cached llama_index-0.10.30-py3-none-any.whl.metadata (11 kB)
Collecting llama-index-llms-sagemaker-endpoint (from -r requirements.txt (line 2))
  Using cached llama_index_llms_sagemaker_endpoint-0.1.3-py3-none-any.whl.metadata (731 bytes)
Collecting llama-index-embeddings-sagemaker-endpoint (from -r requirements.txt (line 3))
  Using cached llama_index_embeddings_sagemaker_endpoint-0.1.3-py3-none-any.whl.metadata (690 bytes)
Collecting llama-index-llms-bedrock (from -r requirements.txt (line 4))
  Using cached llama_index_llms_bedrock-0.1.6-py3-none-any.whl.metadata (687 bytes)
Collecting llama-index-embeddings-bedrock (from -r requirements.txt (line 5))
  Using cached llama_index_embeddings_bedrock-0.1.4-py3-none-any.whl.metadata (646 bytes)
Collecting sagemaker (from -r requirements.txt (line 6))
  Using cached sagemaker-2.216.0-py3-none-any.whl.metadata (14 kB)
Collecting boto3 (from -r requirements.txt (line 7))
  U