<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/vector_stores/RedisIndexDemo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Redis Vector Store

This notebook is a quick demo of the `RedisVectorStore`.

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [None]:
%pip install -U llama-index-vector-stores-redis

In [None]:
%pip install -U llama-index llama-index-embeddings-cohere

In [None]:
import os
import sys
import logging
import textwrap

import warnings

warnings.filterwarnings("ignore")


# Send INFO-level logs to stdout (change the level to logging.DEBUG for more detail)
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Document
from llama_index.vector_stores.redis import RedisVectorStore
from IPython.display import Markdown, display

### Start Redis

The easiest way to start Redis is to run the [Redis Stack](https://hub.docker.com/r/redis/redis-stack) Docker image or
to sign up for a [free Redis Cloud](https://redis.com/try-free) instance.

To follow every step of this tutorial, launch the image as follows:

```bash
docker run --name redis-vecdb -d -p 6379:6379 -p 8001:8001 redis/redis-stack:latest
```

This will also launch the RedisInsight UI on port 8001 which you can view at http://localhost:8001.
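
Before moving on, it can help to confirm that something is listening on the Redis port. The following is a minimal sketch that uses only the standard library. The helper name `port_is_open` is hypothetical, and the default host and port assume the `docker run` mapping above. It only checks TCP reachability, not that the server on the other end is actually Redis.

```python
import socket


def port_is_open(
    host: str = "localhost", port: int = 6379, timeout: float = 1.0
) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError if nothing accepts the connection
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("Redis port reachable:", port_is_open())
```

If this prints `False`, check that the container is running (for example with `docker ps`) before continuing.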


### Setup OpenAI
Let's begin by adding the OpenAI API key. This will allow us to use OpenAI for embeddings and as the LLM.

In [None]:
import getpass

oai_api_key = getpass.getpass("OpenAI API Key:")
os.environ["OPENAI_API_KEY"] = oai_api_key

### Download Data

In [None]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

### Read in a dataset
Here we will use a Paul Graham essay as the source text: it is turned into embeddings, stored in a `RedisVectorStore`, and queried to find context for our LLM QnA loop.

In [None]:
# load documents
documents = SimpleDirectoryReader("./data/paul_graham").load_data()
print(
    "Document ID:",
    documents[0].id_,
    "Document Filename:",
    documents[0].metadata["file_name"],
)

You can process your files individually using [SimpleDirectoryReader](/examples/data_connectors/simple_directory_reader.ipynb):

In [None]:
loader = SimpleDirectoryReader("./data/paul_graham")
documents = loader.load_data()
for file in loader.input_files:
    print(file)
    # Here is where you would do any preprocessing

### Initialize the default Redis Vector Store

Now that we have our documents prepared, we can initialize the Redis Vector Store with **default** settings. This will allow us to store our vectors in Redis and create an index for real-time search.

In [None]:
from llama_index.core import StorageContext
from redis import Redis

# create a Redis client connection
redis_client = Redis.from_url("redis://localhost:6379")

# create the vector store wrapper
vector_store = RedisVectorStore(redis_client=redis_client, overwrite=True)

# load storage context
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# build and load index from documents and storage context
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

### Query the default vector store

Now that we have our data stored in the index, we can ask questions against the index.

The index will use the data as the knowledge base for an LLM. By default, `as_query_engine()` uses OpenAI embeddings and an OpenAI GPT model as the language model, so an OpenAI key is required unless you configure a custom or local model.
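
To build intuition for the retrieval step, here is a toy sketch of similarity search in plain Python. It is an illustration only: the vectors are made up, the names `cosine_similarity` and `top_k` are hypothetical, and Redis performs this search inside its own vector index rather than with a Python loop.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query: list[float], chunks: dict[str, list[float]], k: int = 2
) -> list[tuple[str, float]]:
    """Score every chunk against the query and keep the k best matches."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in chunks.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Made-up 3-dimensional "embeddings"; real embeddings are much larger.
toy_chunks = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.7, 0.7, 0.0],
    "chunk_c": [0.0, 0.0, 1.0],
}

results = top_k([1.0, 0.1, 0.0], toy_chunks, k=2)
print([name for name, _ in results])  # ['chunk_a', 'chunk_b']
```

The retrieved chunks are what the query engine passes to the LLM as context.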

Below we will test searches against our index and then full RAG with an LLM.

In [None]:
query_engine = index.as_query_engine()
retriever = index.as_retriever()

In [None]:
sources = retriever.retrieve("What did the author learn?")
print(sources)

In [None]:
response = query_engine.query("What did the author learn?")
print(textwrap.fill(str(response), 100))

In [None]:
sources = retriever.retrieve("What was a hard moment for the author?")
print(sources)

In [None]:
response = query_engine.query("What was a hard moment for the author?")
print(textwrap.fill(str(response), 100))

### Use a custom index schema

In most use cases, you need the ability to customize the underlying index configuration
and specification. For example, this lets you define the specific metadata fields you wish to filter on.

With Redis, this is as simple as defining an index schema object
(from file or dict) and passing it through to the vector store client wrapper.

For this example, we will:
1. switch the embedding model to [Cohere](https://cohere.com)
2. add an additional metadata field for the document `updated_at` timestamp
3. index the existing `file_name` metadata field

In [None]:
# set up Cohere Key
import getpass

co_api_key = getpass.getpass("Cohere API Key:")
os.environ["COHERE_API_KEY"] = co_api_key

In [None]:
documents[0]

In [None]:
from llama_index.core.settings import Settings
from llama_index.embeddings.cohere import CohereEmbedding

# set llamaindex to use Cohere embeddings
Settings.embed_model = CohereEmbedding()

In [None]:
from redisvl.schema import IndexSchema


custom_schema = IndexSchema.from_dict(
    {
        # customize basic index specs
        "index": {
            "name": "paul_graham",
            "prefix": "essay",
            "key_separator": ":",
        },
        # customize fields that are indexed
        "fields": [
            # required fields for llamaindex
            {"type": "tag", "name": "id"},
            {"type": "tag", "name": "doc_id"},
            {"type": "text", "name": "text"},
            # custom metadata fields
            {"type": "numeric", "name": "updated_at"},
            {"type": "tag", "name": "file_name"},
            # custom vector field definition
            {
                "type": "vector",
                "name": "vector",
                "attrs": {
                    "dims": 1024,
                    "algorithm": "hnsw",
                    "distance_metric": "cosine",
                },
            },
        ],
    }
)
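
The `dims` value of the vector field has to match the length of the vectors produced by the embedding model. The sketch below shows one way to check this on a plain schema dict before indexing. The helper names are hypothetical, and the sample embedding is a placeholder; in practice you could obtain one with something like `Settings.embed_model.get_text_embedding("test")`.

```python
def vector_dims(schema_dict: dict, field_name: str = "vector") -> int:
    """Look up the `dims` attribute of a vector field in a schema dict."""
    for field in schema_dict["fields"]:
        if field["type"] == "vector" and field["name"] == field_name:
            return field["attrs"]["dims"]
    raise KeyError(f"no vector field named {field_name!r}")


def check_embedding_dims(schema_dict: dict, embedding: list[float]) -> None:
    """Raise ValueError if the embedding length differs from the schema."""
    expected = vector_dims(schema_dict)
    if len(embedding) != expected:
        raise ValueError(
            f"embedding has {len(embedding)} dims, schema expects {expected}"
        )


# Trimmed-down copy of the schema dict, for illustration only.
schema_dict = {
    "index": {"name": "paul_graham", "prefix": "essay"},
    "fields": [
        {"type": "tag", "name": "file_name"},
        {
            "type": "vector",
            "name": "vector",
            "attrs": {"dims": 1024, "algorithm": "hnsw", "distance_metric": "cosine"},
        },
    ],
}

# Placeholder embedding with the expected length; passes silently.
check_embedding_dims(schema_dict, [0.0] * 1024)
```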

In [None]:
custom_schema.index

In [None]:
custom_schema.fields

Learn more about [schema and index design](https://redisvl.com) with Redis.

In [None]:
from datetime import datetime


def date_to_timestamp(date_string: str) -> int:
    date_format: str = "%Y-%m-%d"
    return int(datetime.strptime(date_string, date_format).timestamp())


# iterate through documents and add new field
for document in documents:
    document.metadata["updated_at"] = date_to_timestamp(
        document.metadata["last_modified_date"]
    )
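
Note that `datetime.strptime` returns a naive datetime, and `.timestamp()` interprets a naive datetime in the machine's local timezone, so the value above can differ between machines. If that matters for your filters, a UTC-pinned variant is one option. This is a sketch, and the name `date_to_utc_timestamp` is hypothetical:

```python
from datetime import datetime, timezone


def date_to_utc_timestamp(date_string: str) -> int:
    """Parse a YYYY-MM-DD date and return the Unix timestamp of midnight UTC."""
    parsed = datetime.strptime(date_string, "%Y-%m-%d")
    # Attach UTC explicitly so the result does not depend on the local timezone
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


print(date_to_utc_timestamp("2024-01-01"))  # 1704067200
```

Whichever variant you choose, use the same function when writing the metadata and when building filter values.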

In [None]:
vector_store = RedisVectorStore(
    schema=custom_schema,  # provide customized schema
    redis_client=redis_client,
    overwrite=True,
)

storage_context = StorageContext.from_defaults(vector_store=vector_store)

# build and load index from documents and storage context
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

### Query the vector store and filter on metadata
Now that we have additional metadata indexed in Redis, let's try some queries with filters.

In [None]:
from llama_index.core.vector_stores import (
    MetadataFilters,
    MetadataFilter,
    ExactMatchFilter,
)


retriever = index.as_retriever(
    similarity_top_k=3,
    filters=MetadataFilters(
        filters=[
            ExactMatchFilter(key="file_name", value="paul_graham_essay.txt"),
            MetadataFilter(
                key="updated_at",
                value=date_to_timestamp("2024-01-01"),
                operator=">=",
            ),
        ],
        condition="and",
    ),
)

sources = retriever.retrieve("What did the author learn?")
print(sources)
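
For intuition, the filters above correspond to a tag match and a numeric range in Redis query syntax. The sketch below builds such a filter string by hand. It is an approximation for illustration: the helper names are hypothetical, the set of escaped characters is an assumption, and the exact string the vector store sends to Redis may differ.

```python
# Punctuation and whitespace that need a backslash inside a tag value
# (assumed set, based on the usual Redis tag-escaping rules).
_TAG_SPECIAL = set(",.<>{}[]\"':;!@#$%^&*()-+=~ ")


def escape_tag(value: str) -> str:
    """Backslash-escape special characters in a tag value."""
    return "".join("\\" + ch if ch in _TAG_SPECIAL else ch for ch in value)


def tag_equals(field: str, value: str) -> str:
    """Exact match on a tag field: @field:{value}"""
    return f"@{field}:{{{escape_tag(value)}}}"


def numeric_at_least(field: str, minimum: int) -> str:
    """Lower-bounded numeric range: @field:[minimum +inf]"""
    return f"@{field}:[{minimum} +inf]"


def all_of(*clauses: str) -> str:
    """Clauses separated by spaces are intersected (logical AND)."""
    return "(" + " ".join(clauses) + ")"


query = all_of(
    tag_equals("file_name", "paul_graham_essay.txt"),
    numeric_at_least("updated_at", 1704067200),
)
print(query)  # (@file_name:{paul_graham_essay\.txt} @updated_at:[1704067200 +inf])
```

Both fields must be declared in the index schema for a filter like this to match anything.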

### Restoring from an existing index in Redis
Restoring from an existing index requires a Redis connection client (or URL), `overwrite=False`, and the same schema object used to create the index. For convenience, the schema can be saved to a YAML file with `.to_yaml()` and loaded back with `IndexSchema.from_yaml()`.

In [None]:
custom_schema.to_yaml("paul_graham.yaml")

In [None]:
vector_store = RedisVectorStore(
    schema=IndexSchema.from_yaml("paul_graham.yaml"),
    redis_client=redis_client,
    overwrite=False,
)
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

**In the near future** -- we will implement a convenience method to load just using an index name:
```python
RedisVectorStore.from_existing_index(index_name="paul_graham", redis_client=redis_client)
```

### Deleting documents or index completely

Sometimes it may be useful to delete documents or the entire index. This can be done using the `delete` and `delete_index` methods.

In [None]:
document_id = documents[0].doc_id
document_id

In [None]:
print("Number of documents before deleting", redis_client.dbsize())
vector_store.delete(document_id)
print("Number of documents after deleting", redis_client.dbsize())

However, the Redis index still exists (with no associated documents) for continuous upsert.

In [None]:
vector_store.index_exists()

In [None]:
# now let's delete the index entirely
# this will delete all the documents and the index
vector_store.delete_index()

In [None]:
print("Number of documents after deleting", redis_client.dbsize())

### Troubleshooting

If you get an empty query result, there are a couple of issues to check:

#### Prefix issues

Redis expects all records to have a key prefix that segments the keyspace into "partitions"
for potentially different applications, use cases, and clients.

Make sure that the chosen `prefix`, as part of the index schema, is consistent across your code (tied to a specific index).

To see what prefix your index was created with, you can run `FT.INFO <name of your index>` in the Redis CLI and look under `index_definition` => `prefixes`.
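
A complementary check is to count how many keys actually live under the prefix you expect. The sketch below uses the redis-py `scan_iter` method. The helper name and the `FakeRedis` stand-in are hypothetical and exist only so the example runs without a server, and the key names are made up. With a real connection you would pass `redis_client` instead.

```python
from fnmatch import fnmatchcase


def count_keys_with_prefix(client, prefix: str, separator: str = ":") -> int:
    """Count keys named '<prefix><separator>...' using an incremental SCAN."""
    pattern = f"{prefix}{separator}*"
    return sum(1 for _ in client.scan_iter(match=pattern))


class FakeRedis:
    """Tiny in-memory stand-in that mimics scan_iter glob matching."""

    def __init__(self, keys):
        self._keys = list(keys)

    def scan_iter(self, match=None):
        for key in self._keys:
            if match is None or fnmatchcase(key, match):
                yield key


fake = FakeRedis(["essay:node-1", "essay:node-2", "other:node-3"])
print(count_keys_with_prefix(fake, "essay"))   # 2
print(count_keys_with_prefix(fake, "essays"))  # 0
```

A count of zero for the prefix in your schema suggests the records were written under a different prefix, or not written at all.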

#### Data versus Index
Redis treats the records in the dataset and the index as different entities. This allows you more flexibility in performing updates, upserts, and index schema migrations.

If you have an existing index and want to make sure it's dropped, you can run `FT.DROPINDEX <name of your index>` in the Redis CLI. Note that this will *not* drop your actual data unless you also pass the `DD` option.
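
The same command can be sent from Python through the generic `execute_command` method of a redis-py client. The following is a sketch: `drop_index` is a hypothetical helper, and `RecordingClient` is a stand-in that only records the arguments so the example runs without a server.

```python
def drop_index(client, index_name: str, delete_documents: bool = False):
    """Send FT.DROPINDEX, adding DD only when the stored records should go too."""
    args = ["FT.DROPINDEX", index_name]
    if delete_documents:
        args.append("DD")
    return client.execute_command(*args)


class RecordingClient:
    """Stand-in client that records commands instead of sending them."""

    def __init__(self):
        self.calls = []

    def execute_command(self, *args):
        self.calls.append(args)
        return "OK"


recorder = RecordingClient()
drop_index(recorder, "paul_graham")                         # index only
drop_index(recorder, "paul_graham", delete_documents=True)  # index and data
print(recorder.calls)
# [('FT.DROPINDEX', 'paul_graham'), ('FT.DROPINDEX', 'paul_graham', 'DD')]
```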

#### Empty queries when using metadata

If you add metadata to the index *after* it has already been created and then try to query over that metadata, your queries will come back empty.

Redis indexes fields upon index creation only (similar to how it indexes the prefixes, above).
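
One way to catch this early is to compare a document's metadata keys with the field names declared in the schema before building the index. A minimal sketch, with a hypothetical helper name and made-up metadata values:

```python
def unindexed_metadata_keys(schema_fields: list[dict], metadata: dict) -> set[str]:
    """Return metadata keys that have no matching field in the index schema."""
    indexed = {field["name"] for field in schema_fields}
    return set(metadata) - indexed


# Field list in the same shape as the "fields" entry of a schema dict.
fields = [
    {"type": "tag", "name": "id"},
    {"type": "tag", "name": "doc_id"},
    {"type": "text", "name": "text"},
    {"type": "numeric", "name": "updated_at"},
    {"type": "tag", "name": "file_name"},
]

# Example metadata; the values are placeholders.
metadata = {
    "file_name": "paul_graham_essay.txt",
    "updated_at": 1704067200,
    "file_size": 75042,
}

print(unindexed_metadata_keys(fields, metadata))  # {'file_size'}
```

Any key reported here is stored with the record but cannot be used in a filter until the index is recreated with a matching field.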