# LLM Applications With Redis Enterprise

In this demo we'll show 3 common use cases for Redis Enterprise in LLM applications:
1. **Semantic Search** (i.e., Vector Search), and **RAG (Retrieval-Augmented Generation)** to chat with a knowledge base
2. **Semantic Cache**
3. **Chat Memory**

We'll use [LangChain](https://www.langchain.com/) to compose these use cases. You can sign up for a free Redis database [here](https://redis.com/try-free/).

The diagram below shows the demo architecture.

![](vss-vw-demo.png)

## Prerequisites

Install packages

In [18]:
%pip install redis langchain rich spacy google-cloud-aiplatform unstructured markdown python-dotenv requests

Note: you may need to restart the kernel to use updated packages.


Load environment variables

In [19]:
from rich import print # this will pretty-print python objects
import warnings
import dotenv

# mute warnings
warnings.filterwarnings('ignore')

# load env vars from .env file
dotenv.load_dotenv()

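def download_file(url, filename):
    """Download `url` and save the response body to `filename`."""
    import requests
    r = requests.get(url, allow_redirects=True)
    r.raise_for_status()  # fail early instead of saving an error page
    with open(filename, 'wb') as f:  # the context manager closes the file
        f.write(r.content)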

## 0. Data Preparation

### Load Documents

Let's talk to the Redis documentation. We'll load a local copy of the Search [Aggregations](https://redis.io/docs/interact/search-and-query/search/aggregations/) and [Query](https://redis.io/docs/interact/search-and-query/query/) pages and use them to answer questions.

In [20]:
# Load documents
from langchain.document_loaders import UnstructuredMarkdownLoader

# download aggregation doc
aggs_url = "https://github.com/RediSearch/RediSearch/raw/master/docs/docs/advanced-concepts/aggregations.md"
download_file(aggs_url, "aggregations.md")
aggs_doc_path = "aggregations.md"
docs = UnstructuredMarkdownLoader(aggs_doc_path).load()

# download query syntax doc
query_syntax_url = "https://github.com/RediSearch/RediSearch/raw/master/docs/docs/advanced-concepts/query_syntax.md"
download_file(query_syntax_url, "query_syntax.md")
query_doc_path = "query_syntax.md"
docs.extend(UnstructuredMarkdownLoader(query_doc_path).load())

print(f"Loaded {len(docs)} documents")

### Split Documents

Next, we'll split the documents into chunks and index each chunk as a separate document.

This will allow us to retrieve specific, smaller, relevant chunks of the document to add context to our prompt.

In [21]:
# Split documents into chunks
from langchain.text_splitter import SpacyTextSplitter

text_splitter = SpacyTextSplitter(chunk_size=750, chunk_overlap=100, strip_whitespace=True)
splits = text_splitter.split_documents(docs)
print(f"Generated {len(splits)} splits")
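To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of overlapping chunking. It is a plain character-window splitter, not what `SpacyTextSplitter` does internally (that splitter works on sentence boundaries detected by spaCy), and `chunk_text` is a hypothetical helper introduced only for illustration.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

Each chunk repeats the last three characters of the previous one, so text that straddles a boundary still appears intact in at least one chunk.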

### Create Embeddings, Load Into Redis and Create Search Index

Let's create our embeddings transformer. We will use it to transform our documents, the user's questions, and our prompts into vectors.

In [22]:
from langchain.embeddings import VertexAIEmbeddings

# Define Text Embeddings model
embedding = VertexAIEmbeddings()

Model_name will become a required arg for VertexAIEmbeddings starting from Feb-01-2024. Currently the default is set to textembedding-gecko@001
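As a rough illustration of what "transforming text into vectors" buys us, the sketch below embeds short texts and compares them with cosine similarity. The embedding here is a toy word-count vector over a made-up vocabulary (`VOCAB` and `toy_embed` are hypothetical); a real model such as the Vertex AI one returns dense learned vectors, but the comparison step works the same way.

```python
import math
from collections import Counter

# Hypothetical toy vocabulary; a real embedding model has no such fixed word list.
VOCAB = ["redis", "search", "query", "vector", "noodles"]


def toy_embed(text: str) -> list[float]:
    """Count how often each vocabulary word appears in the text."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in VOCAB]


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc_vec = toy_embed("redis vector search")
query_vec = toy_embed("vector search query")
other_vec = toy_embed("noodles noodles")

print(cosine_similarity(doc_vec, query_vec))  # texts sharing words score higher
print(cosine_similarity(doc_vec, other_vec))  # texts sharing no words score 0.0
```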


We can now use the embeddings object to transform our documents content, then load the documents into Redis.

This step will also create a search index called `redis-docs` on the documents.

In [23]:
%%time
# Create embeddings and load data into Redis
from langchain.vectorstores import Redis

vectordb = Redis.from_documents(documents=splits, embedding=embedding, index_name="redis-docs")

CPU times: user 90.9 ms, sys: 9.89 ms, total: 101 ms
Wall time: 1.49 s


### Test: Retrieve Documents Related to a Question

In [24]:
question = "How can I load the redis key name (Document ID) and filter results based on that field?"

`k` is the number of documents to retrieve.

In [25]:
results = vectordb.similarity_search_with_score(question, k=3)
print(results)
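Conceptually, a top-`k` similarity search ranks every indexed vector by its distance to the query vector and keeps the closest `k`. The sketch below does this by brute force over a tiny in-memory dictionary; it assumes cosine distance as the score (lower means more similar) and uses made-up two-dimensional vectors. Redis answers the same kind of query from a vector index rather than a Python loop.

```python
import math


def cosine_distance(a, b) -> float:
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def top_k(query_vec, indexed, k):
    """Return the k (doc_id, distance) pairs closest to the query."""
    scored = [(doc_id, cosine_distance(query_vec, vec)) for doc_id, vec in indexed.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Hypothetical chunk vectors, for illustration only.
indexed = {
    "chunk-a": [1.0, 0.0],
    "chunk-b": [0.8, 0.6],
    "chunk-c": [0.0, 1.0],
}
hits = top_k([1.0, 0.0], indexed, k=2)
print(hits)
```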

A different type of search is Maximal Marginal Relevance (MMR) search. MMR scores each candidate document by its similarity to the query, penalized by its similarity to the documents already selected for the result set. It is useful when you want to retrieve a set of documents that are relevant to a query, but also diverse from each other.

In [26]:
results = vectordb.max_marginal_relevance_search(question, k=3, top_k=5, threshold=0.5)
print(results)
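The trade-off MMR makes can be sketched in a few lines. This is a generic greedy formulation, not the code LangChain or Redis runs; the function name `mmr`, the `lambda_mult` weight, and the example vectors are assumptions chosen for illustration.

```python
import math


def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def unit(degrees: float) -> list[float]:
    """2-D unit vector at the given angle, to keep the example easy to picture."""
    return [math.cos(math.radians(degrees)), math.sin(math.radians(degrees))]


def mmr(query_vec, candidates, k, lambda_mult=0.5):
    """Greedily pick k candidate indices, balancing relevance against redundancy."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine_similarity(query_vec, candidates[i])
            # Redundancy: similarity to the closest document already picked.
            redundancy = max(
                (cosine_similarity(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


query = unit(0)
candidates = [
    unit(10),   # 0: most relevant
    unit(15),   # 1: nearly as relevant, but a near-duplicate of 0
    unit(-40),  # 2: less relevant, but covers a different direction
]
diverse = mmr(query, candidates, k=2, lambda_mult=0.5)
relevance_only = mmr(query, candidates, k=2, lambda_mult=1.0)
print(diverse, relevance_only)
```

With `lambda_mult=1.0` the redundancy penalty vanishes and the selection reduces to plain similarity ranking; lower values favor documents that differ from those already chosen.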

## 1. Semantic Search - Question Answering (Q&A)

We will create a prompt template that will provide instructions to the LLM,
as well as contain placeholders for the context (retrieved from Redis) and the question (asked by the user).

In [27]:
from langchain.prompts import PromptTemplate

QA_TEMPLATE = """
Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum. Keep the answer as concise as possible. 
-----
Context: 

{context}
-----
Question: {question}
Helpful Answer:"""

QA_CHAIN_PROMPT = PromptTemplate(
    input_variables=["context", "question"], 
    template=QA_TEMPLATE
)


Next we will create an LLM object and use it to generate answers to our questions.

We will also create the `RetrievalQA` chain, which will retrieve the most relevant documents from Redis, and use them as context for the LLM.

We are specifying:
* The LLM model name (`text-bison@001`)
* The maximum length of the generated answer (`max_output_tokens`)
* The LLM temperature (`temperature`), which controls the randomness of the generated text. Higher values will result in more random text while lower values will result in more predictable text.
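For intuition about `temperature`, the sketch below applies the textbook formulation: token scores (logits) are divided by the temperature before a softmax turns them into sampling probabilities. The logits are made up, and how Vertex AI applies temperature internally is not covered here; this only illustrates the general effect.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
low = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 2.0)
print([round(p, 3) for p in low])   # mass concentrates on the top token
print([round(p, 3) for p in high])  # mass spreads across tokens
```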

The type of chain we're creating is a `stuff` chain, as in "stuff the retrieved documents into the LLM".

In [28]:
from langchain.chains import RetrievalQA
from langchain.llms import VertexAI

# Define LLM to generate response
llm = VertexAI(model_name='text-bison@001', max_output_tokens=512, temperature=0.5)

# Create QA chain to respond to user query along with source documents
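qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    # pass the prompt template defined above; otherwise the chain uses its default prompt
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
)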
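What "stuffing" amounts to can be shown without any LLM: the retrieved chunks are concatenated and dropped into the `{context}` placeholder, and the user's question fills `{question}`. The sketch below uses a shortened copy of the template and made-up chunk texts; `stuff_prompt` is a hypothetical helper, not a LangChain function.

```python
# Shortened copy of the Q&A template, for illustration.
QA_TEMPLATE = """Use the following pieces of context to answer the question at the end.
-----
Context:

{context}
-----
Question: {question}
Helpful Answer:"""


def stuff_prompt(chunks, question):
    """Join all retrieved chunks into one context string and fill the template."""
    context = "\n\n".join(chunks)
    return QA_TEMPLATE.format(context=context, question=question)


retrieved = ["chunk one text", "chunk two text"]  # stand-ins for chunks returned by Redis
prompt = stuff_prompt(retrieved, "How do I filter by key name?")
print(prompt)
```

Because every retrieved chunk goes into a single prompt, this approach only works while the combined chunks fit in the model's context window, which is one reason to keep `k` and the chunk size small.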

In [29]:
question = "How can I load the redis key name (Document ID) and filter results based on that field?"

In [30]:
%%time
# Run QA chain
result = qa({"query": question})
print(result['result'])

score_threshold is deprecated. Use distance_threshold instead.score_threshold should only be used in similarity_search_with_relevance_scores.score_threshold will be removed in a future release.


CPU times: user 28.4 ms, sys: 8 ms, total: 36.4 ms
Wall time: 970 ms


## 2. Semantic Cache


Making calls to a (paid) LLM API can get very expensive, very quickly. We can use Redis to cache the results of our LLM calls, and use the cache to answer questions that we've already answered before.

This will not only save on API usage costs, but will also significantly speed up our response times.

In [31]:
import langchain
from langchain.cache import RedisSemanticCache

langchain.llm_cache = RedisSemanticCache(
    embedding=embedding,
    redis_url="redis://localhost:6379",
    score_threshold=0.2  # what is the maximum distance between the query and the retrieved document
)

In [32]:
question = "How do I get documents within a certain radius from a point?"

In [33]:
%%time
result = qa({"query": question})
print(result['result'])

score_threshold is deprecated. Use distance_threshold instead.score_threshold should only be used in similarity_search_with_relevance_scores.score_threshold will be removed in a future release.


CPU times: user 48 ms, sys: 9.74 ms, total: 57.7 ms
Wall time: 1.99 s


In [34]:
question = "How do I get documents within a certain radius from a coordinate?"

In [35]:
%%time

result = qa({"query": question})
print(result['result'])

score_threshold is deprecated. Use distance_threshold instead.score_threshold should only be used in similarity_search_with_relevance_scores.score_threshold will be removed in a future release.


CPU times: user 20.5 ms, sys: 7.15 ms, total: 27.6 ms
Wall time: 319 ms
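The second, reworded question returned much faster because it was close enough to the first to be served from the cache. The sketch below imitates that behavior in memory: it embeds each prompt, looks for a stored prompt within a distance threshold, and only calls the LLM on a miss. Everything here is a stand-in (`toy_embed`, `ToySemanticCache`, `fake_llm` are hypothetical, and the bag-of-words embedding is far cruder than a real model); it is not how `RedisSemanticCache` is implemented.

```python
import math
from collections import Counter


def toy_embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().replace("?", "").split())


def cosine_distance(a, b):
    dot = sum(a[w] * b[w] for w in a)  # Counter returns 0 for missing words
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


class ToySemanticCache:
    def __init__(self, distance_threshold):
        self.distance_threshold = distance_threshold
        self.entries = []  # (prompt vector, cached answer)

    def lookup(self, prompt):
        """Return the answer of the nearest stored prompt, if it is close enough."""
        query_vec = toy_embed(prompt)
        best_answer, best_distance = None, math.inf
        for vec, answer in self.entries:
            distance = cosine_distance(query_vec, vec)
            if distance < best_distance:
                best_answer, best_distance = answer, distance
        return best_answer if best_distance <= self.distance_threshold else None

    def update(self, prompt, answer):
        self.entries.append((toy_embed(prompt), answer))


llm_calls = []  # records every prompt that actually reaches the (fake) LLM


def fake_llm(prompt):
    llm_calls.append(prompt)
    return f"answer #{len(llm_calls)}"


cache = ToySemanticCache(distance_threshold=0.2)


def ask(prompt):
    cached = cache.lookup(prompt)
    if cached is not None:
        return cached  # cache hit: no LLM call
    answer = fake_llm(prompt)
    cache.update(prompt, answer)
    return answer


first = ask("How do I get documents within a certain radius from a point?")
second = ask("How do I get documents within a certain radius from a coordinate?")
third = ask("Who won the World Cup in 2018?")
print(first, second, third, len(llm_calls))
```

The threshold is the key tuning knob: too loose and different questions receive the same cached answer, too tight and rephrasings miss the cache.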


## 3. Chat Memory

In this use case, we'll use Redis to provide a memory to our chatbot. We'll use the memory to store the user's questions and the LLM's answers, and use them to provide context to the LLM in subsequent questions.

In [36]:
import langchain

# Clear cache
langchain.llm_cache = None

### I Do Not Recall

First, let's have a chat with the LLM ***without*** any memory.

In [37]:
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.llms import VertexAI

# Define LLM to generate response
llm = VertexAI(model_name='text-bison@001', max_output_tokens=512, temperature=0.2)

template = """Assistant is a large language model, designed to be able to assist with a wide range of tasks, 
from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. 
As a language model, Assistant is able to generate human-like text based on the input it receives, 
allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.

Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a 
wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, 
Assistant is here to assist.

Human: {human_input}
Assistant:"""

prompt = PromptTemplate(
    input_variables=["human_input"], 
    template=template
    )

# Create a plain LLM chain (no retrieval, no memory)
chat = LLMChain(
    llm=llm,
    prompt=prompt,
    verbose=True,
)

With `verbose=True`, the chain prints the fully formatted prompt it sends to the LLM.

In [38]:
reply = chat.predict(human_input="Hi, my name is Eli. I like eating noodles and I work at Redis. What is your name?")
print(reply)

> Entering new LLMChain chain...
Prompt after formatting:
Assistant is a large language model, designed to be able to assist with a wide range of tasks, 
from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. 
As a language model, Assistant is able to generate human-like text based on the input it receives, 
allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.

Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a 
wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, 
Assistant is here to assist.

Human: Hi, my name is Eli. I like eating noodles and I work at Redis. What is your name?
Assistant:

> Finished chain.


In [39]:
reply = chat.predict(human_input="Who won the World Cup in 2018?")
print(reply)

> Entering new LLMChain chain...
Prompt after formatting:
Assistant is a large language model, designed to be able to assist with a wide range of tasks, 
from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. 
As a language model, Assistant is able to generate human-like text based on the input it receives, 
allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.

Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a 
wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, 
Assistant is here to assist.

Human: Who won the World Cup in 2018?
Assistant:

> Finished chain.


If we had memory, the LLM would know the answer to the next question:

In [40]:
reply = chat.predict(human_input="What's my name?")
print(reply)

> Entering new LLMChain chain...
Prompt after formatting:
Assistant is a large language model, designed to be able to assist with a wide range of tasks, 
from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. 
As a language model, Assistant is able to generate human-like text based on the input it receives, 
allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.

Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a 
wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, 
Assistant is here to assist.

Human: What's my name?
Assistant:

> Finished chain.


---
### Total Recall
Now let's build the same chatbot ***with*** memory.

The message history will be stored in Redis, and the LLM will use it to provide context to the next question.

In [41]:
from langchain.memory import RedisChatMessageHistory, ConversationBufferMemory
from langchain.prompts import PromptTemplate
from langchain.llms import VertexAI
from langchain.chains import LLMChain

# Define LLM to generate response
llm = VertexAI(model_name='text-bison', max_output_tokens=512, temperature=0.4)

template = """Assistant is a large language model, designed to be able to assist with a wide range of tasks, 
from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. 
As a language model, Assistant is able to generate human-like text based on the input it receives, 
allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.

Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a 
wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, 
Assistant is here to assist.
----------
{history}
----------
Human: {human_input}
Assistant:"""

prompt = PromptTemplate(
    input_variables=["history", "human_input"], 
    template=template
    )

# define the chat message memory
message_history = RedisChatMessageHistory(key_prefix="chat-history:", session_id="vs-demo")
message_history.clear()
memory = ConversationBufferMemory(
    memory_key="history", chat_memory=message_history
)

# Create an LLM chain that loads the Redis-backed chat history into each prompt
chat = LLMChain(
    llm=llm,
    prompt=prompt,
    verbose=True,
    memory=memory,
)
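The mechanics of buffer-style memory are simple enough to sketch in plain Python: before each call the stored turns are rendered into the `{history}` placeholder, and after each call the new question and reply are appended. The sketch below keeps messages in a Python list instead of Redis and uses a fake LLM; `ToyChatMemory`, `chat_turn`, and `fake_llm` are hypothetical names, not LangChain classes.

```python
# Shortened template with the same placeholders as the one above.
TEMPLATE = """{history}
----------
Human: {human_input}
Assistant:"""


class ToyChatMemory:
    """In-memory stand-in for a Redis-backed message history."""

    def __init__(self):
        self.messages = []  # (role, text) pairs in conversation order

    def add(self, role, text):
        self.messages.append((role, text))

    def render(self):
        return "\n".join(f"{role}: {text}" for role, text in self.messages)


def chat_turn(memory, llm, human_input):
    """Build the prompt from stored history, call the LLM, then record the turn."""
    prompt = TEMPLATE.format(history=memory.render(), human_input=human_input)
    reply = llm(prompt)
    memory.add("Human", human_input)
    memory.add("AI", reply)
    return prompt, reply


def fake_llm(prompt):
    return "Nice to meet you!"  # canned reply; a real LLM would read the prompt


memory = ToyChatMemory()
first_prompt, _ = chat_turn(memory, fake_llm, "Hi, my name is Adam.")
second_prompt, _ = chat_turn(memory, fake_llm, "Do you remember my name?")
print(second_prompt)
```

The second prompt carries the first exchange, which is what lets the model answer questions about earlier turns. Storing the history in Redis rather than in process memory means it can survive restarts and be shared across application instances.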

In [42]:
reply = chat.predict(human_input="Hi, my name is Adam. I have 3 kids and I like gardening. What is your name?")
print(reply)

> Entering new LLMChain chain...
Prompt after formatting:
Assistant is a large language model, designed to be able to assist with a wide range of tasks, 
from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. 
As a language model, Assistant is able to generate human-like text based on the input it receives, 
allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.

Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a 
wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, 
Assistant is here to assist.
----------

----------
Human: Hi, my name is Adam. I have 3 kids and I like gardening. What is your name?
Assistant:

> Finished chain.


In [43]:
reply = chat.predict(human_input="How long was the last Harry Potter book?")
print(reply)

> Entering new LLMChain chain...
Prompt after formatting:
Assistant is a large language model, designed to be able to assist with a wide range of tasks, 
from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. 
As a language model, Assistant is able to generate human-like text based on the input it receives, 
allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.

Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a 
wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, 
Assistant is here to assist.
----------
Human: Hi, my name is Adam. I have 3 kids and I like gardening. What is your name?
AI:  Hello Adam! My name is Assistant, and I'm a large language model designed to help with a wide rang

In [44]:
reply = chat.predict(human_input="What's the name of their school?")
print(reply)

> Entering new LLMChain chain...
Prompt after formatting:
Assistant is a large language model, designed to be able to assist with a wide range of tasks, 
from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. 
As a language model, Assistant is able to generate human-like text based on the input it receives, 
allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.

Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a 
wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, 
Assistant is here to assist.
----------
Human: Hi, my name is Adam. I have 3 kids and I like gardening. What is your name?
AI:  Hello Adam! My name is Assistant, and I'm a large language model designed to help with a wide rang

In [45]:
reply = chat.predict(human_input="What train platform was the train on?")
print(reply)

> Entering new LLMChain chain...
Prompt after formatting:
Assistant is a large language model, designed to be able to assist with a wide range of tasks, 
from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. 
As a language model, Assistant is able to generate human-like text based on the input it receives, 
allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.

Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a 
wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, 
Assistant is here to assist.
----------
Human: Hi, my name is Adam. I have 3 kids and I like gardening. What is your name?
AI:  Hello Adam! My name is Assistant, and I'm a large language model designed to help with a wide rang

In [46]:
reply = chat.predict(human_input="Do you remember my name?")
print(reply)

> Entering new LLMChain chain...
Prompt after formatting:
Assistant is a large language model, designed to be able to assist with a wide range of tasks, 
from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. 
As a language model, Assistant is able to generate human-like text based on the input it receives, 
allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.

Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a 
wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, 
Assistant is here to assist.
----------
Human: Hi, my name is Adam. I have 3 kids and I like gardening. What is your name?
AI:  Hello Adam! My name is Assistant, and I'm a large language model designed to help with a wide rang