<a href="https://colab.research.google.com/github/denisabrantesredis/denisd-GenAI-Workshop/blob/main/Labs/01-RAG_VectorDB_Cache/01_Redis_Langchain.ipynb" target="_newt">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

<div style="display:flex;width:100%;">
<img src="https://redis.io/wp-content/uploads/2024/04/Logotype.svg?auto=webp&quality=85,75&width=120" alt="Redis" width="90"/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<img src="https://www.gstatic.com/devrel-devsite/prod/v0e0f589edd85502a40d78d7d0825db8ea5ef3b99ab4070381ee86977c9168730/cloud/images/cloud-logo.svg" alt="Google Cloud" width="140"/>
</div>

# Vector Similarity Search with Redis & Google Cloud

<img src="https://github.com/denisabrantesredis/denisd-GenAI-Workshop/blob/main/_assets/images/redis_gcp.png?raw=true" alt="Redis and Google Cloud" align="center"/>

[Try a similar app with an always-on demo](https://antonum-redis-vss-streamlit-streamlit-app-p4z5th.streamlit.app/)

In this notebook, we will build a RAG use case using data from a web page. Redis will be used as the Vector Database and Cache for our use case, while Google Gemini is the LLM that will help generate the answers to the user's questions.

## Installing the Pre-Reqs

In [None]:
!pip install -q sentence-transformers==3.0.1 >> /.tmp
!pip install -q unstructured==0.15.10 >> /.tmp
!pip install -q unstructured[pdf] >> /.tmp
!pip install -q redis==5.0.8 >> /.tmp
!pip install -q redisvl==0.3.5 >> /.tmp
!pip install -q langchain==0.2.16 >> /.tmp
!pip install -q langchain-core==0.3.6 >> /.tmp
!pip install -q langchain-huggingface==0.0.3 >> /.tmp
!pip install -q langchain-redis==0.0.4 >> /.tmp
!pip install -q langchain-google-genai==2.0.0 >> /.tmp
!pip install -q nltk==3.9.1 >> /.tmp

In [None]:
# patch an issue with RedisVL
!wget https://github.com/denisabrantesredis/denisd-GenAI-Workshop/raw/refs/heads/main/_assets/files/semantic.py
!rm /usr/local/lib/python3.10/dist-packages/redisvl/extensions/llmcache/semantic.py
!cp semantic.py /usr/local/lib/python3.10/dist-packages/redisvl/extensions/llmcache/

## Part 1 - Declare a Document class to handle web site data

In this lab, we will use the [Unstructured](https://docs.unstructured.io/open-source/core-functionality/partitioning#partition-html) API to load data from a web page, parse it, and break it into chunks.

A web page can have multiple different types of content; this class will help us identify the type of content being collected from the page, so we can make sure we're only getting the text from the page.

In [None]:
from typing import List, Optional
from enum import Enum
from torch import Tensor
from uuid import uuid4

In [None]:
class DataType(str, Enum):
    TITLE = "Title"
    TEXT = "Text"
    UNCATEGORIZED_TEXT = "UncategorizedText"
    NARRATIVE_TEXT = "NarrativeText"
    BULLETED_TEXT = "BulletedText"
    PARAGRAPH = "Paragraph"
    ABSTRACT = "Abstract"
    THREADING = "Threading"
    FORM = "Form"
    FIELD_NAME = "Field-Name"
    VALUE = "Value"
    LINK = "Link"
    COMPOSITE_ELEMENT = "CompositeElement"
    IMAGE = "Image"
    PICTURE = "Picture"
    FIGURE_CAPTION = "FigureCaption"
    FIGURE = "Figure"
    CAPTION = "Caption"
    LIST = "List"
    LIST_ITEM = "ListItem"
    LIST_ITEM_OTHER = "List-item"
    CHECKED = "Checked"
    UNCHECKED = "Unchecked"
    CHECK_BOX_CHECKED = "CheckBoxChecked"
    CHECK_BOX_UNCHECKED = "CheckBoxUnchecked"
    RADIO_BUTTON_CHECKED = "RadioButtonChecked"
    RADIO_BUTTON_UNCHECKED = "RadioButtonUnchecked"
    ADDRESS = "Address"
    EMAIL_ADDRESS = "EmailAddress"
    PAGE_BREAK = "PageBreak"
    FORMULA = "Formula"
    TABLE = "Table"
    HEADER = "Header"
    HEADLINE = "Headline"
    SUB_HEADLINE = "Subheadline"
    PAGE_HEADER = "Page-header"  # Title?
    SECTION_HEADER = "Section-header"
    FOOTER = "Footer"
    FOOTNOTE = "Footnote"
    PAGE_FOOTER = "Page-footer"
    PAGE_NUMBER = "PageNumber"
    CODE_SNIPPET = "CodeSnippet"


class Metadata(dict):
    """Metadata fields that pertain to the data source."""
    source: str
    url: Optional[str] = None
    text_as_html: Optional[str] = None


class DataElement(dict):
    """A data element is a piece of text, image, link, or table.

    The content field can contain text or Base64 encoded image data.
    """
    id: str  # element ID string (uuid4 is a function, not a type)
    data_type: DataType
    content: str | bytes
    metadata: Metadata
    embeddings: Optional[Tensor] = None


class Document(List[DataElement]):
    """A document is a list of data elements."""
    def from_dict(self, data: List[dict]):
        # Each item is expected to be an element dictionary
        for element in data:
            meta = element["metadata"]
            self.append(DataElement(
                id=element["element_id"],
                data_type=DataType(element["type"]),
                content=element["text"],
                metadata=Metadata(
                    source=meta["source"],
                    # url and text_as_html are optional and may be missing
                    url=meta.get("url"),
                    text_as_html=meta.get("text_as_html")
                )
            ))
        return self

## Part 2 - Extract text from the Web Site

### Step 1: Parsing

The process of 'parsing' or 'partitioning' will extract the text from the source (in this case, a web page), and group it into Elements.

Since we're not interested in all the possible content in a web page, we will filter these elements so that we only capture `NarrativeText`, `List` and `ListItem` elements, meaning we will only get paragraphs and bullet-point lists. We also skip any element with fewer than 20 characters of text, to filter out very short fragments.

In [None]:
from unstructured.partition.html import partition_html

In [None]:
def parse(url):
    print(f"--> Starting parse: {url}")
    # Keep only paragraphs and bullet-point lists
    acceptable_types = ["NarrativeText", "List", "ListItem"]
    min_length = 20  # skip very short fragments
    elements = partition_html(url=url)
    output_list = Document()
    for element in elements:
        el = element.to_dict()
        if el["type"] in acceptable_types and len(el["text"]) >= min_length:
            output_list.append(el)
    print(f"--> Total Elements: {len(output_list)}")
    return output_list

#### Define the web page you want to capture

<img src="https://github.com/denisabrantesredis/denisd-GenAI-Workshop/blob/main/_assets/images/callout_search.png?raw=true" alt="Callout - Value can be changed"/>

In [None]:
blog_page = parse("https://redis.io/blog/redis-insight-makes-rdi-even-simpler/")

In [None]:
blog_page[3]

### Step 2: Chunking

Chunking is the process of grouping elements together into more meaningful text blocks for vector generation.

Due to time and scope constraints, this lab will use a very simple chunking function. However, it's important to keep in mind that this is one of the most important steps in a RAG implementation; a well-designed (and tested) chunking strategy is vital for the success of your RAG project.
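To make the idea of size-based chunking more concrete, here is a minimal, standard-library-only sketch of a greedy chunker. It is not how Unstructured's `chunk_by_title` is implemented; the function name `greedy_chunk` and its parameters `soft_limit` and `hard_limit` are hypothetical, loosely mirroring the `new_after_n_chars` and `max_characters` settings used in this lab.

```python
from typing import List


def greedy_chunk(texts: List[str], soft_limit: int = 500, hard_limit: int = 750) -> List[str]:
    """Group consecutive text elements into chunks.

    A chunk is closed once it has reached soft_limit characters, or when
    adding the next element would push it past hard_limit characters.
    A single element longer than hard_limit is kept as its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for text in texts:
        candidate = f"{current} {text}".strip()
        if current and (len(current) >= soft_limit or len(candidate) > hard_limit):
            chunks.append(current)
            current = text
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Placeholder "paragraphs" of known lengths
paragraphs = ["a" * 300, "b" * 300, "c" * 300, "d" * 100]
chunks = greedy_chunk(paragraphs)
print([len(c) for c in chunks])
```

With these inputs, the first two paragraphs are merged, and the third starts a new chunk because the first chunk has already passed the soft limit.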

In [None]:
from unstructured.chunking.title import chunk_by_title
from unstructured.staging.base import convert_to_dict, dict_to_elements

In [None]:
def chunk_docs_unstruct(elements):
    chunking_settings = {
        "combine_text_under_n_chars": 50,
        "max_characters": 750,
        "new_after_n_chars": 500
    }
    chunked_raw = chunk_by_title(elements=elements, **chunking_settings)
    results = convert_to_dict(chunked_raw)
    return results


def chunk(input_data):
    print("--> Generating Chunks")
    elements_raw = dict_to_elements(input_data)
    elements = chunk_docs_unstruct(elements_raw)
    print(f"--> Generated {len(elements)} chunks")
    return elements

In [None]:
chunked_page = chunk(blog_page)

In [None]:
chunked_page[1]

## Part 3: Generating the Vectors & Saving to Redis

In this lab, we will leverage the Langchain package for Redis, which automates most of the functions required to set up and use Redis as a Vector Database and Cache service.

To learn more about the Langchain package for Redis, visit the official documentation: [https://python.langchain.com/docs/integrations/vectorstores/redis/](https://python.langchain.com/docs/integrations/vectorstores/redis/)

To generate the embeddings, we will use the [Huggingface embedding model](https://python.langchain.com/docs/integrations/text_embedding/huggingfacehub/).

### Importing Required Packages

In [None]:
import os
import time
import redis
from google.colab import userdata

os.environ["PYDANTIC_SKIP_VALIDATING_CORE_SCHEMAS"] = "True"

from langchain_redis import RedisConfig, RedisVectorStore
from langchain_huggingface import HuggingFaceEmbeddings

### Step 1: Setting Up Connection String

<img src="https://github.com/denisabrantesredis/denisd-GenAI-Workshop/blob/main/_assets/images/callout_secrets.png?raw=true" alt="Callout - Use Google Colab secrets instead"/>

In [None]:
if "GOOGLE_API_KEY" not in os.environ:
    if userdata.get('GOOGLE_API_KEY'):
      os.environ["GOOGLE_API_KEY"] = userdata.get('GOOGLE_API_KEY')
    else:
      os.environ["GOOGLE_API_KEY"] = "<insert API key here>"

if userdata.get('REDIS_HOST'):
  REDIS_HOST = userdata.get('REDIS_HOST')
else:
  REDIS_HOST="127.0.0.1"

if userdata.get('REDIS_PORT'):
  REDIS_PORT = userdata.get('REDIS_PORT')
else:
  REDIS_PORT=12000

if userdata.get('REDIS_PASSWORD'):
  REDIS_PASSWORD = userdata.get('REDIS_PASSWORD')
else:
  REDIS_PASSWORD="password"

REDIS_URL = f"redis://default:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"
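If the password contains characters such as `@`, `:` or `/`, embedding it directly in the URL can produce a connection string that is parsed incorrectly. The following is a minimal sketch of percent-encoding the credentials with the standard library; the `build_redis_url` helper and the sample values are hypothetical and not part of the lab's code.

```python
from urllib.parse import quote, urlsplit


def build_redis_url(host: str, port: int, password: str, username: str = "default") -> str:
    """Build a redis:// URL, percent-encoding the credentials."""
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"redis://{user}:{secret}@{host}:{port}"


# Sample values for illustration only
demo_url = build_redis_url("127.0.0.1", 12000, "p@ss:word/1")
print(demo_url)

# The host and port are still parsed correctly after encoding
parts = urlsplit(demo_url)
print(parts.hostname, parts.port)
```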

#### Testing the Connection to Redis

<img src="https://github.com/denisabrantesredis/denisd-GenAI-Workshop/blob/main/_assets/images/callout_connection.png?raw=true" alt="Callout - Make sure connection works"/>

In [None]:
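r = redis.from_url(REDIS_URL)

# ping() raises an exception on failure instead of returning False
try:
    r.ping()
    print("Connection successful!")
except redis.exceptions.RedisError as err:
    print(f"Connection issue: {err}")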

The Redis configuration includes the name of the index that will be used for Vector search. This index is created automatically by the Langchain package, while allowing developers to control the additional metadata that will be stored with the vectors (for hybrid searches).

In [None]:
embeddings = HuggingFaceEmbeddings()

config = RedisConfig(
    index_name="idx:web",
    redis_url=REDIS_URL,
    metadata_schema=[
        {"name": "id", "type": "text"},
        {"name": "url", "type": "text"},
        {"name": "filetype", "type": "text"},
        {"name": "languages", "type": "tag"}
    ]
)
vector_store = RedisVectorStore(embeddings, config=config)

### Step 2 - Add the chunks to a JSON list and store in Redis

In this step, we prepare a list of JSON objects containing the data from our chunks. Here is where we can map the metadata fields we want to store in Redis to be used in hybrid searches. Notice how we are not generating the vectors manually as part of this step; this is fully automated by the Langchain package, based on the embedding model we've selected.

In [None]:
texts = []
metadata = []

# Number the chunks from 1 so the IDs look like webdoc:00001
for counter, document in enumerate(chunked_page, start=1):
    texts.append(document['text'])
    metadata_obj = {
        "id": f"webdoc:{counter:05}",
        "url": document["metadata"]["url"],
        "filetype": document["metadata"]["filetype"],
        "languages": document["metadata"]["languages"],
    }
    metadata.append(metadata_obj)

<img src="https://github.com/denisabrantesredis/denisd-GenAI-Workshop/blob/main/_assets/images/callout_save.png?raw=true" alt="Callout - Saving to Redis"/>

In [None]:
timer_start = time.perf_counter()
ids = vector_store.add_texts(texts, metadata)
timer_end = time.perf_counter()
total_time = round(timer_end - timer_start, 4)
print(f"Total Time: {total_time}s")

List the IDs of all documents saved to Redis:

In [None]:
ids

&nbsp;

<img src="https://github.com/denisabrantesredis/denisd-GenAI-Workshop/blob/main/_assets/images/callout_insight.png?raw=true" alt="Callout - Check Redis Insight"/>

Open Redis Insight and confirm that all documents were generated. Notice how each document contains the vector that was automatically generated by the Langchain package. You may also notice that the vectors are not presented as a list; this is because they are stored as binary strings, which is more efficient for retrieval and storage.
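To illustrate why a vector can look like an opaque string, here is a small sketch that packs a list of floats into bytes and decodes it again using only the standard library. It assumes 32-bit little-endian floats, which is a common layout for vector fields but is an assumption here; the helper names `to_bytes` and `from_bytes` are hypothetical.

```python
import struct
from typing import List


def to_bytes(vector: List[float]) -> bytes:
    """Pack floats as consecutive little-endian float32 values."""
    return struct.pack(f"<{len(vector)}f", *vector)


def from_bytes(blob: bytes) -> List[float]:
    """Decode a float32 blob back into a list of Python floats."""
    count = len(blob) // 4  # each float32 takes 4 bytes
    return list(struct.unpack(f"<{count}f", blob))


# Values chosen to be exactly representable as float32
vector = [0.5, -1.25, 2.0]
blob = to_bytes(vector)
print(len(blob))         # 4 bytes per dimension
print(from_bytes(blob))  # decodes back to the original values
```

A text representation of the same numbers would take more space and would need to be parsed on every read, which is the efficiency argument made above.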

You can also go to the **Workbench** and get a list of indexes using the command:

```
FT._list
```

Finally, you can get more details about the index that was automatically generated by Langchain with this command:
```
FT.info "idx:web"
```
&nbsp;

&nbsp;

## Part 4: Running a Vector Search

<img src="https://github.com/denisabrantesredis/denisd-GenAI-Workshop/blob/main/_assets/images/callout_question.png?raw=true" alt="Callout - Change Question"/>

In [None]:
query = "How does Redis Insight make RDI simpler?"

### Running a Semantic Search

The Langchain integration greatly simplifies the process of running a semantic search. A single function call is enough. Notice how we do not need to generate a vector for our question manually; this is handled automatically by the function, based on the embedding model we've selected before.

For more details on the different ways to run vector searches, check the [Langchain documentation page](https://python.langchain.com/docs/integrations/vectorstores/redis/#query-vector-store).
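Conceptually, a semantic search ranks stored vectors by their distance to the query vector. The sketch below shows that idea in plain Python with made-up three-dimensional vectors. It assumes cosine distance as the metric and is not the code path that `RedisVectorStore` or Redis actually executes; the `cosine_distance` and `top_k` helpers and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: List[float], store: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Rank stored vectors by distance to the query and keep the k closest."""
    scored = [(key, cosine_distance(query, vec)) for key, vec in store.items()]
    return sorted(scored, key=lambda item: item[1])[:k]


# Toy "embeddings"; real models produce hundreds of dimensions
store = {
    "webdoc:00001": [1.0, 0.0, 0.0],
    "webdoc:00002": [0.0, 1.0, 0.0],
    "webdoc:00003": [1.0, 1.0, 0.0],
}
results_demo = top_k([1.0, 0.1, 0.0], store)
for key, distance in results_demo:
    print(f"{key}: {distance:.4f}")
```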

&nbsp;


In [None]:
timer_start = time.perf_counter()
results = vector_store.similarity_search_with_score(query)
timer_end = time.perf_counter()
total_time = round(timer_end - timer_start, 4)
print(f"Total Time: {total_time}s")

### Visualizing the search results with the score for each result

In [None]:
print(f"Search results for '{query}':")
for doc in results:
    print("----")
    print(f"Score: {doc[1]} - {doc[0].page_content} (Source: {doc[0].metadata['url']})")

&nbsp;

## Part 5: Using an LLM

In this lab, we will use the Gemini 1.5 Pro model from Google to generate a response to the user, based on the documents retrieved from Redis. The Google API key (`GOOGLE_API_KEY`) that we set earlier is required to allow access to the model.

### Step 1: Load the Model

In [None]:
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_google_genai import ChatGoogleGenerativeAI

In [None]:
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-pro",
    temperature=0.5,
    top_p=0.95,
    top_k=64,
    max_output_tokens=8192
    )

### Step 2: Prepare a list with the text from the documents retrieved by the vector search

We will ask the model to respond to the user's questions. To help with the answer, we want to provide the text from the documents that were retrieved by the semantic search.

In [None]:
text_list = []
distance_list = []

for node in results:
    text_list.append(node[0].page_content)
    distance = node[1]
    distance_list.append(distance)

Print the list that will be sent to the model.

In [None]:
text_list

### Step 3: Prepare the Prompt

Since this is just a lab, we will keep the prompt very simple, with just basic instructions for the model to answer based on the documents from the semantic search, and to stick to those documents for the response. Production applications will benefit from more sophisticated prompts, as well as other controls such as guardrails.
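One small refinement worth considering is to join the retrieved chunks into a single readable block before placing them in the prompt, instead of formatting a Python list directly. The sketch below does this with a hypothetical `format_context` helper; the numbering scheme is an illustrative choice and is not part of the lab's code.

```python
from typing import List


def format_context(chunks: List[str]) -> str:
    """Number each chunk and separate chunks with a blank line."""
    return "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )


# Placeholder strings standing in for retrieved chunks
demo_chunks = ["First retrieved passage. ", "Second retrieved passage."]
context = format_context(demo_chunks)
print(context)
```

The resulting string could then be passed as the `context` value of a prompt template, so the model sees clean text rather than list brackets and quotes.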

In [None]:
def get_system_template(text_list, query):
  system_template = """
  Your task is to answer questions by using a given context.

  Don't invent anything that is outside of the context.

  %CONTEXT%
  {context}

  """
  messages = [
      SystemMessage(content=system_template.format(context=text_list)),
      HumanMessage(content=query)
  ]

  return messages

In [None]:
messages = get_system_template(text_list, query)

### Step 4: Invoke the Model

Since we are not yet using Redis as a cache, the model will be called every time, even if the same question (or a similar one) is asked multiple times.

In [None]:
timer_start = time.perf_counter()
llm_response = llm.invoke(messages)
timer_end = time.perf_counter()
total_time = round(timer_end - timer_start, 4)
print(f"Total Time: {total_time}s")

Visualizing the model response:

In [None]:
llm_response.content

&nbsp;

## Part 6: Leveraging Redis for Basic Cache

Redis can be used not only as the Vector Database, but also as a cache to store responses from the Large Language Model, which can significantly improve user experience, by retrieving responses in milliseconds instead of seconds.

In [None]:
from langchain_redis import RedisCache
from langchain.globals import set_llm_cache

To use Redis as a cache, we only need 2 lines of code:

In [None]:
redis_cache = RedisCache(redis_url=REDIS_URL)
set_llm_cache(redis_cache)

We will repeat the same question from before. Since we have just enabled the cache, it will be empty, which means that this next question will require a vector search and will need to go through the Large Language Model again.

In [None]:
query = "How does Redis Insight make RDI simpler?"

In [None]:
timer_start = time.perf_counter()
result_nodes = vector_store.similarity_search_with_score(query)
timer_end = time.perf_counter()
total_time = round(timer_end - timer_start, 4)
print(f"Total Time: {total_time}s")

Prepare the list of texts to send to the LLM:

In [None]:
text_list = []
distance_list = []

for node in result_nodes:
    text_list.append(node[0].page_content)
    distance = node[1]
    distance_list.append(distance)

Display the search results:

In [None]:
print(f"--> Total Documents Found: {len(result_nodes)}")
for node in result_nodes:
  print(f"--> {node[1]} | {node[0].page_content}")

Prepare the prompt:

In [None]:
messages = get_system_template(text_list, query)

Call the model (the `invoke` function will check and populate the cache automatically):

In [None]:
timer_start = time.perf_counter()
llm_response = llm.invoke(messages)
timer_end = time.perf_counter()
total_time = round(timer_end - timer_start, 4)
print(f"Total Time: {total_time}s")

Print the LLM response:

In [None]:
llm_response.content

&nbsp;

<img src="https://github.com/denisabrantesredis/denisd-GenAI-Workshop/blob/main/_assets/images/callout_insight.png?raw=true" alt="Callout - Check Redis Insight"/>

A new document of type JSON should appear in Redis. This is the cached response from the LLM.
Notice that the key is made from a long string; this is a hash of the question.

Because this is a basic cache, each question from the user is hashed and compared against the stored keys, which means that a question must match a previous one exactly for the cached response to be used.
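The exact-match behaviour can be sketched with a plain dictionary keyed by a hash of the prompt. This is only an illustration: it uses SHA-256 and an in-memory `dict`, whereas the hashing scheme and key layout that `RedisCache` actually uses are not shown in this lab. The names `cache_key`, `store_response` and `lookup` are hypothetical.

```python
import hashlib
from typing import Dict, Optional

# In-memory stand-in for the Redis cache (assumption for illustration)
cache: Dict[str, str] = {}


def cache_key(prompt: str) -> str:
    """Hash the full prompt; any change in wording yields a different key."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def store_response(prompt: str, response: str) -> None:
    """Save a response under the hash of its prompt."""
    cache[cache_key(prompt)] = response


def lookup(prompt: str) -> Optional[str]:
    """Return the cached response, or None on a cache miss."""
    return cache.get(cache_key(prompt))


store_response("How does Redis Insight make RDI simpler?", "cached answer")

hit = lookup("How does Redis Insight make RDI simpler?")
miss = lookup("What does Redis Insight do to make RDI simpler?")
print(hit)   # same wording: the cached answer is found
print(miss)  # reworded question: nothing is found
```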

&nbsp;


#### Repeating the question to fetch results from the cache

When we ask exactly the same question as before, it should trigger a cache hit, meaning we will receive the answer from the Redis cache, much faster than calling the model.

In [None]:
timer_start = time.perf_counter()
llm_response = llm.invoke(messages)
timer_end = time.perf_counter()
total_time = round(timer_end - timer_start, 4)
print(f"Total Time: {total_time}s")

Print the cached response:

In [None]:
llm_response.content

&nbsp;

#### Asking the same question (worded differently) will cause a cache miss

If the question is not an exact match, it will cause a cache miss. This can be an issue for most RAG use cases, which is why we will be exploring Semantic Cache next.

In [None]:
# original query = "How does Redis Insight make RDI simpler?"
query = "What does Redis Insight do to make RDI simpler?"

Prepare the prompt with the new query (note: we're skipping the vector search on purpose):

In [None]:
messages = get_system_template(text_list, query)

Call the model:

In [None]:
timer_start = time.perf_counter()
llm_response = llm.invoke(messages)
timer_end = time.perf_counter()
total_time = round(timer_end - timer_start, 4)
print(f"Total Time: {total_time}s")

Print the response:

In [None]:
llm_response.content

&nbsp;

## Part 7 - Leveraging Redis for Semantic Cache

The Semantic Cache will generate vectors for each prompt, and store the response from the LLM. That way, new prompts are converted into vectors automatically and a semantic search is executed on Redis, looking for similar questions.

It is possible to set the threshold for the semantic search; for this lab, we are using a distance threshold of 0.2 (20%). In your project, you can run multiple tests with different thresholds to determine what works best for your use case.
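The effect of the threshold can be sketched in plain Python: a cached answer is reused only if the distance between the new prompt's vector and a stored prompt's vector is at or below the threshold. The vectors below are made up, cosine distance is an assumed metric, and `semantic_lookup` is a hypothetical helper rather than how `RedisSemanticCache` is implemented.

```python
import math
from typing import List, Optional, Tuple


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def semantic_lookup(
    query_vec: List[float],
    entries: List[Tuple[List[float], str]],
    threshold: float = 0.2,
) -> Optional[str]:
    """Return the closest cached response within the threshold, else None."""
    best_response = None
    best_distance = threshold
    for vec, response in entries:
        distance = cosine_distance(query_vec, vec)
        if distance <= best_distance:
            best_response, best_distance = response, distance
    return best_response


# One cached prompt vector with its stored answer (toy data)
entries = [([1.0, 0.0], "cached answer about RDI")]

similar = semantic_lookup([0.9, 0.3], entries)    # close to the cached prompt
unrelated = semantic_lookup([0.1, 1.0], entries)  # far from the cached prompt
print(similar)
print(unrelated)
```

A lower threshold makes the cache stricter (fewer hits, less risk of returning an answer to a different question), while a higher one makes it more permissive.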

In [None]:
from langchain_redis import RedisSemanticCache

<img src="https://github.com/denisabrantesredis/denisd-GenAI-Workshop/blob/main/_assets/images/callout_threshold.png?raw=true" alt="Callout - Semantic Threshold"/>

In [None]:
redis_cache = RedisSemanticCache(redis_url=REDIS_URL, embeddings=embeddings, distance_threshold=0.2)
set_llm_cache(redis_cache)

Since the Semantic Cache is new, it will be empty. We will ask the original question first, to generate the cache entry:

In [None]:
query = "How does Redis Insight make RDI simpler?"

Prepare the prompt:

In [None]:
messages = get_system_template(text_list, query)

Invoke the model (it will cause a cache miss):

In [None]:
timer_start = time.perf_counter()
llm_response = llm.invoke(messages)
timer_end = time.perf_counter()
total_time = round(timer_end - timer_start, 4)
print(f"Total Time: {total_time}s")

Print the response:

In [None]:
llm_response.content

&nbsp;

<img src="https://github.com/denisabrantesredis/denisd-GenAI-Workshop/blob/main/_assets/images/callout_insight.png?raw=true" alt="Callout - Check Redis Insight"/>

A new Hash document will appear in Redis, with a key prefix of `llmcache`. This is the cached prompt, which includes the question and the answer. The `invoke` function will run a semantic search for these documents, to look for similar questions.

&nbsp;

#### Ask a similar question to trigger a cache hit

In [None]:
query = "What does Redis Insight do to make RDI simpler?"

Prepare the prompt:

In [None]:
messages = get_system_template(text_list, query)

Invoke the model:

In [None]:
timer_start = time.perf_counter()
llm_response = llm.invoke(messages)
timer_end = time.perf_counter()
total_time = round(timer_end - timer_start, 4)
print(f"Total Time: {total_time}s")

Print the response:

In [None]:
llm_response.content

&nbsp;


&nbsp;



# Congrats, this is the end of the lab!!