# Introduction

Redis Enterprise is an enterprise-grade Redis, available both on-premises and in the cloud (on AWS, Google Cloud, or Azure). 
Redis Enterprise simplifies operations, scaling, and multi-tenancy, includes many integrations (for example, Kubernetes), and provides multiple tiers of support.
<br>Redis Enterprise offers vector database features, with an API for vector index creation and management, distance metric selection, similarity search, and hybrid filtering. Combined with its versatile data structures (including lists, hashes, JSON, and sets), this makes Redis Enterprise a strong fit for building Large Language Model (LLM)-based applications. Its streamlined architecture and low-latency performance make it well suited to production environments.

### Important use cases include:
* __Chatbots with RAG__
    <br>Ground chatbots in your data using Retrieval Augmented Generation (RAG) to enhance the quality of LLM responses.

* __Semantic caching__
    <br>Identify and retrieve cached LLM outputs to reduce response times and the number of requests to your LLM provider, which saves time and money.

* __Recommendation systems__
    <br>Power recommendation engines with fresh, relevant suggestions at low latency, and point your users to the products they’re most likely to buy.

* __Document Search__
    <br>Make it easier to discover and retrieve information across documents and knowledge bases, using natural language and semantic search.

# Google's Vertex AI
Google's Vertex AI has expanded its capabilities by introducing Generative AI. It comes with a specialized in-console studio experience, a dedicated API, and a Python SDK for deploying and managing instances of Google's Gemini language models.

# Lab overview & Objective

In this lab, we will implement a production-ready proof of concept by building a vector similarity search (VSS) application deployed on Google Cloud's infrastructure, with Redis as its backbone.
We will use a sample IMDB movies dataset, load it into Redis, and then run different types of search queries to get insights from the dataset.
We will use the following libraries and frameworks:
* Google Colab for hosting the Jupyter notebook
* Redis Enterprise Cloud as the vector database provider
* redis-py Python library for Redis
* redis-vl Python library for vector-specific tasks
* LangChain for other vector management tasks



### Install dependencies


In [None]:
!pwd
!pip install --upgrade pip

# Install required libraries
!python3 -m pip -q install redis pandas
!pip install -U git+https://github.com/RedisVentures/redisvl.git google-cloud-aiplatform langchain gradio


#### Configure Redis 

Here we will use Redis Enterprise Cloud, available through the GCP Marketplace.
Follow these steps to get a Redis Enterprise Cloud database up and running:
* Log in to the GCP Console, navigate to Marketplace, and search for Redis Enterprise
* Click the option that displays Redis Enterprise Cloud and subscribe to it
* The console will redirect you to the Redis Enterprise Cloud URL
* Sign up for Redis Enterprise Cloud. The system will ask for confirmation
* Finally, create the database in your preferred region. For this exercise, an 'Essential' subscription will be sufficient. Select the Redis Stack option
* Once the database is active, note down its URL. This will be needed for subsequent steps


##### Alternative (in case Redis Enterprise Cloud is not configured)
Install the community Redis Stack server using the commands in the following cell.

In [None]:
## Uncomment & execute the following code in case Redis Enterprise is not available
##################################################################################

# %%sh
# curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
# echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
# sudo apt-get update  > /dev/null 2>&1
# sudo apt-get install redis-stack-server  > /dev/null 2>&1
# redis-stack-server --daemonize yes

In [None]:
## Update the 'host' field with the correct Redis host URL
host = ''
port = 17001
password = 'admin'
requirePass = True

## For redis-stack-server, comment out the above code and uncomment the following:
# host = 'localhost'
# requirePass = False

#### Create Redis connection object


In [None]:
import redis

if requirePass:
    client = redis.Redis(host = host, port=port, decode_responses=True, password=password)
else:
    client = redis.Redis(host = 'localhost', decode_responses=True)

print(client.ping())
# Clear the current Redis database (optional; this deletes all existing keys)
client.flushdb()
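
Some tools accept a single connection URL instead of separate host, port, and password values. The following is a minimal sketch of how such a URL could be assembled, assuming the standard `redis://` URL scheme; the helper name `build_redis_url` and the sample host are hypothetical and not part of redis-py.

```python
from urllib.parse import quote


def build_redis_url(host, port=6379, password=None, db=0):
    """Build a redis:// URL from separate connection settings.

    Hypothetical helper for illustration; not part of redis-py.
    """
    # Percent-encode the password so characters such as '@' or '/' stay valid.
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"redis://{auth}{host}:{port}/{db}"


# Placeholder values; substitute the endpoint shown for your own database.
url = build_redis_url("redis.example.com", 17001, "p@ss/word")
print(url)
```

The resulting string can then be passed to `redis.Redis.from_url(url, decode_responses=True)` in place of the keyword arguments used above.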

#### Authenticate with GCP & set project id and region
We will use a Vertex AI embedding model to create embeddings for our dataset. Before doing that, we must authenticate with GCP and set the appropriate Google Cloud project ID and region.

In [None]:
## Authenticate with GCP & set project id and region
from google.colab import auth
from getpass import getpass

auth.authenticate_user()
print('Authenticated')

# Replace these with your own GCP project ID and region for Vertex AI
PROJECT_ID = 'central-beach-194106' #getpass("PROJECT_ID:")
REGION = 'asia-south1' #input("REGION:")

print(f'PROJECT_ID: {PROJECT_ID} & REGION: {REGION}')
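
If you prefer not to keep the project ID in the notebook itself, one option is to read it from the environment. This is a minimal sketch under stated assumptions: the variable names `GCP_PROJECT_ID` and `GCP_REGION`, the helper `load_gcp_settings`, and the default region are all hypothetical choices, not something Colab or Vertex AI defines.

```python
import os


def load_gcp_settings(default_region="us-central1"):
    """Read project ID and region from the environment (hypothetical variable names)."""
    project_id = os.environ.get("GCP_PROJECT_ID", "").strip()
    if not project_id:
        raise ValueError("Set GCP_PROJECT_ID before running the notebook")
    region = os.environ.get("GCP_REGION", default_region).strip()
    return project_id, region


# Example with a placeholder project ID and no region set.
os.environ["GCP_PROJECT_ID"] = "my-sample-project"
os.environ.pop("GCP_REGION", None)
project_id, region = load_gcp_settings()
print(f"PROJECT_ID: {project_id} & REGION: {region}")
```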


#### Download the sample dataset
We will use a sample IMDB Movies dataset available from Kaggle.
Next, we will load it into a Pandas DataFrame and inspect its columns and their data types.

In [None]:
!wget https://storage.googleapis.com/abhi-data-2024/MOVIES.csv


In [None]:

import pandas as pd

df = pd.read_csv('MOVIES.csv')


In [None]:
df.head(5)

### Create text embeddings with Vertex AI embedding model

Use the Vertex AI API for text embeddings, developed by Google.

Text embeddings are a dense vector representation of a piece of content such that, if two pieces of content are semantically similar, their respective embeddings are located near each other in the embedding vector space. This representation can be used to solve common NLP tasks, such as:


*   Semantic search: Search text ranked by semantic similarity.
*   Recommendation: Return items with text attributes similar to the given text.
*   Classification: Return the class of items whose text attributes are similar to the given text.
*   Clustering: Cluster items whose text attributes are similar to the given text.
*   Outlier Detection: Return items where text attributes are least related to the given text.

The Vertex AI text-embeddings API lets you create a text embedding using Generative AI on Vertex AI. The textembedding-gecko model accepts a maximum of 3,072 input tokens and outputs 768-dimensional vector embeddings. Note that tokens are subword units produced by the model's tokenizer, so a token count is not the same as a word count.
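
To make "located near each other" concrete, here is a small sketch that ranks texts by cosine similarity. The three-dimensional vectors are made up for illustration; real embeddings from the model have 768 dimensions, and the texts are not taken from the dataset.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings" for illustration only.
toy_embeddings = {
    "love story": [0.9, 0.1, 0.0],
    "romantic drama": [0.8, 0.2, 0.1],
    "space battle": [0.0, 0.1, 0.9],
}

query = toy_embeddings["love story"]
# Rank the other texts from most to least similar to the query.
ranked = sorted(
    (name for name in toy_embeddings if name != "love story"),
    key=lambda name: cosine_similarity(query, toy_embeddings[name]),
    reverse=True,
)
print(ranked)
```

With the `cosine` distance metric used later in this lab, Redis reports a distance of one minus this similarity, so smaller values mean closer matches.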

In [None]:
## Use the redis-vl library and select the embedding model provider as "textembedding-gecko@003" 

from redisvl.utils.vectorize import VertexAITextVectorizer

vectorizer = VertexAITextVectorizer(
    model = "textembedding-gecko@003",
    api_config = {"project_id": PROJECT_ID, "location": REGION}
)


In [None]:
## 1. Create embeddings for the 'overview' column of the movies DataFrame
## 2. Insert them as a new column 'overview_embedding', placed just before the last existing column
## 3. Preview the updated DataFrame

embeddings = vectorizer.embed_many(df['overview'].tolist())
df.insert(len(df.columns)-1, "overview_embedding", embeddings)
df.head()

### Store the dataframe in Redis database
First, we store each row of the DataFrame in Redis as a JSON document. Once done, we will move on to the next part of our exercise, which is creating suitable indexes.
We will use these indexes to build:
* VSS queries
* Standard search query
* Hybrid queries


In [None]:
## Store the Dataframe in Redis
import json

pipeline = client.pipeline()

for index, row in df.iterrows():
    redis_key = f"doc:{index}"
    pipeline.json().set(redis_key, '$', row.to_dict())

pipeline.execute()
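
A redis-py pipeline buffers its commands on the client until `execute()` is called, so for larger datasets it can be useful to send rows in batches. The following is a minimal sketch of the batching part only; the `chunked` helper, the batch size, and the stand-in rows are assumptions made for illustration.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Stand-in for df.iterrows(): (index, record) pairs.
rows = [(i, {"original_title": f"Movie {i}"}) for i in range(5)]

batch_sizes = []
for batch in chunked(rows, 2):
    # With a real client, this is where a fresh pipeline would be filled with
    # pipeline.json().set(f"doc:{index}", "$", record) calls and then executed.
    batch_sizes.append(len(batch))

print(batch_sizes)
```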

In [None]:
## Verify that the record is present in Redis
print(client.json().get('doc:4', '$.overview_embedding'))

### Create Indexes in Redis 
Before we can query the Redis database, we need to create an index. For that we will use the __redis-vl__ Python library.
<br>Redis provides the following search and query features:

* A rich query language
* Incremental indexing on JSON and hash documents
* Vector search
* Full-text search
* Geospatial queries
* Aggregations

You can find a complete list of features here: https://redis.io/docs/latest/develop/interact/search-and-query/advanced-concepts/

The search and query features of Redis Stack allow you to use Redis as a:

* Document database
* Vector database
* Secondary index
* Search engine


In [None]:
## create index using redis-vl Python library.

from redisvl.schema import IndexSchema
from redisvl.index import SearchIndex


index_name = "idx_movie"

schema = IndexSchema.from_dict({
  "index": {
    "name": index_name,
    "prefix": "doc:",
    "storage_type": "json"
  },
  "fields": [
    {
        "name": "budget",
        "type": "numeric",
        "attrs": {
            "sortable": True
        }
    },
    {
        "name": "original_title",
        "type": "text"
    },
    {
        "name": "overview",
        "type": "text"
    },
    {
        "name": "revenue",
        "type": "numeric"
    },
    {
        "name": "vote_count",
        "type": "numeric",
        "attrs": {
            "sortable": True
        }
    },
    {
        "name": "popularity",
        "type": "numeric",
        "attrs": {
            "sortable": True
        }
    },
    {
        "name": "overview_embedding",
        "type": "vector",
        "attrs": {
            "dims": vectorizer.dims,
            "distance_metric": "cosine",
            "algorithm": "flat",
            "datatype": "float32"
        }
    }
  ]
})
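
It can help to see roughly what such a schema corresponds to at the Redis level. The sketch below renders a schema dictionary of the same shape into `FT.CREATE` arguments. It is an approximation written for illustration, not the code path redis-vl actually runs: the `to_ft_create` helper is hypothetical, handles only the field types used here, and assumes JSON paths of the form `$.<name>` and 768 dimensions.

```python
def to_ft_create(schema):
    """Render a simplified schema dict as a list of FT.CREATE arguments."""
    index = schema["index"]
    args = ["FT.CREATE", index["name"], "ON", index["storage_type"].upper(),
            "PREFIX", "1", index["prefix"], "SCHEMA"]
    for field in schema["fields"]:
        name, attrs = field["name"], field.get("attrs", {})
        # JSON documents are indexed by JSONPath and aliased to the field name.
        args += [f"$.{name}", "AS", name]
        if field["type"] == "vector":
            params = ["TYPE", attrs["datatype"].upper(),
                      "DIM", str(attrs["dims"]),
                      "DISTANCE_METRIC", attrs["distance_metric"].upper()]
            # The vector algorithm is followed by the number of parameter tokens.
            args += ["VECTOR", attrs["algorithm"].upper(), str(len(params)), *params]
        else:
            args.append(field["type"].upper())
            if attrs.get("sortable"):
                args.append("SORTABLE")
    return args


# A shortened version of the schema above, with dims written out explicitly.
sample_schema = {
    "index": {"name": "idx_movie", "prefix": "doc:", "storage_type": "json"},
    "fields": [
        {"name": "vote_count", "type": "numeric", "attrs": {"sortable": True}},
        {"name": "overview", "type": "text"},
        {"name": "overview_embedding", "type": "vector",
         "attrs": {"dims": 768, "distance_metric": "cosine",
                   "algorithm": "flat", "datatype": "float32"}},
    ],
}

print(" ".join(to_ft_create(sample_schema)))
```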

In [None]:
print(vectorizer.dims)

In [None]:
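# Create an index from the schema and the client
index = SearchIndex(schema, client)
# overwrite=True recreates the index if it already exists;
# drop=False keeps the JSON documents loaded earlier (drop=True would delete them)
index.create(overwrite=True, drop=False)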

In [6]:
## The redis-vl library also provides a CLI. You can list and inspect the created indexes using the following commands

!rvl index listall

# inspect the index fields
!rvl index info -i idx_movie



### Querying Redis 
Now it is time to query the Redis database. Again, we will use the __redis-vl__ Python library to do this.
<br>We will run the following types of queries against the records stored in Redis:

* VSS queries
* Standard search query
* Hybrid queries
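
As a rough guide, the three query types differ mainly in the query string sent to Redis. The sketch below builds illustrative strings in Redis query syntax (dialect 2). The helper names are hypothetical, and the exact strings redis-vl generates may differ.

```python
def text_query(field, term):
    """Standard full-text search on one field."""
    return f"@{field}:({term})"


def numeric_gt(field, value):
    """Numeric filter with an exclusive lower bound."""
    return f"@{field}:[({value} +inf]"


def knn_query(vector_field, k, prefilter=None):
    """KNN vector search; a prefilter turns it into a hybrid query."""
    scope = f"({prefilter})" if prefilter else "*"
    # $vector is a placeholder; the query embedding is passed separately as a parameter.
    return f"{scope}=>[KNN {k} @{vector_field} $vector AS vector_distance]"


standard = text_query("overview", "romantic")
vss = knn_query("overview_embedding", 5)
hybrid = knn_query("overview_embedding", 5, numeric_gt("vote_count", 4000))

for label, query_string in [("standard", standard), ("vss", vss), ("hybrid", hybrid)]:
    print(f"{label}: {query_string}")
```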


In [None]:

from redisvl.query import VectorQuery
from redisvl.query.filter import Num

query = "romantic movie"

vote_count_filter = Num("vote_count") > 4000

query_embedding = vectorizer.embed(query)

vector_query = VectorQuery(
    vector=query_embedding,
    vector_field_name="overview_embedding",
    num_results=5,
    return_fields=["original_title", "overview", "popularity", "revenue", "vote_count"],
    return_score=True,
    filter_expression=vote_count_filter
)

# show the raw redis query
str(vector_query)

In [None]:
# execute the query with RedisVL
print(json.dumps(index.query(vector_query), indent=2))
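
Because the index uses the `cosine` distance metric, a smaller score means a closer match. If a similarity score is easier to read, the distance can be inverted. This is a minimal sketch that assumes results arrive as a list of dictionaries with string values and a `vector_distance` key; the sample rows and the `with_similarity` helper are hypothetical, not real query output.

```python
# Hypothetical results, shaped like a list of dicts with string values.
sample_results = [
    {"original_title": "Movie A", "vote_count": "5200", "vector_distance": "0.21"},
    {"original_title": "Movie B", "vote_count": "4100", "vector_distance": "0.34"},
]


def with_similarity(results, distance_key="vector_distance"):
    """Return copies of the results with a cosine similarity score added."""
    scored = []
    for item in results:
        distance = float(item[distance_key])
        # Cosine distance is 1 - cosine similarity, so invert it here.
        scored.append({**item, "similarity": round(1.0 - distance, 4)})
    return scored


for item in with_similarity(sample_results):
    print(item["original_title"], item["similarity"])
```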

In [None]:
!rvl stats -i idx_movie