# [Redis LangChain OpenAI eCommerce Chatbot](https://redis.com/blog/build-ecommerce-chatbot-with-redis/)

In [None]:
# Install requirements
# %pip install -r requirements.txt

## Fetch and prepare the products dataset

### Download the Dataset

We will be working with the [Amazon Berkeley Objects](https://amazon-berkeley-objects.s3.amazonaws.com/index.html) dataset.

In [None]:
# !gdown 1tHWB6u3yQCuAgOYc-DxtZ8Mru3uV5_lj

### Preprocess Dataset

We truncate the longer text fields to keep the dataset lean, which saves memory and compute time.

In [None]:
import numpy as np
import pandas as pd

MAX_TEXT_LENGTH = 512  # Maximum number of characters to keep per text field


def auto_truncate(text: str) -> str:
    """Truncate the given text."""
    return text[:MAX_TEXT_LENGTH]


# Load Product data and truncate long text fields
all_prods_df = pd.read_csv(
    "product_data.csv",
    converters={
        "bullet_point": auto_truncate,
        "item_keywords": auto_truncate,
        "item_name": auto_truncate,
    },
)

print(all_prods_df.shape)

Perform some final preprocessing steps: construct a primary key, clean up the keywords field, and drop rows with missing keywords.

In [None]:
# Construct a primary key from item ID and domain name
all_prods_df["primary_key"] = (
    all_prods_df["item_id"] + "-" + all_prods_df["domain_name"]
)
# Replace empty strings with NaN and drop those rows
all_prods_df["item_keywords"] = all_prods_df["item_keywords"].replace("", np.nan)
all_prods_df.dropna(subset=["item_keywords"], inplace=True)

# Reset pandas dataframe index
all_prods_df.reset_index(drop=True, inplace=True)

all_prods_df.head()

The full dataset contains over 100,000 products, but we will restrict it to a subset of 2,500.

In [None]:
# Number of products to use (subset)
NUMBER_PRODUCTS = 2500

# Get the first NUMBER_PRODUCTS products with non-empty item keywords
product_metadata = all_prods_df.head(NUMBER_PRODUCTS).to_dict(orient="index")

# Check one of the products
product_metadata[0]
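
To make the shape of `product_metadata` concrete, here is a minimal sketch of the same preprocessing on plain Python dicts instead of a DataFrame. The sample rows and the `prepare` helper are hypothetical and exist only for illustration; the result is a dict keyed by row position, where each value maps field names to values.

```python
# Hypothetical rows standing in for product_data.csv (only the fields used here)
rows = [
    {"item_id": "B001", "domain_name": "amazon.com", "item_keywords": "mug ceramic"},
    {"item_id": "B002", "domain_name": "amazon.co.uk", "item_keywords": ""},
    {"item_id": "B003", "domain_name": "amazon.com", "item_keywords": "desk lamp"},
]

NUMBER_PRODUCTS = 2  # small subset size for the illustration


def prepare(rows, limit):
    """Drop rows without keywords, add a primary key, keep the first `limit`."""
    kept = [dict(r) for r in rows if r["item_keywords"]]
    for r in kept:
        r["primary_key"] = r["item_id"] + "-" + r["domain_name"]
    # Key each record by its position, like an index-keyed dict
    return {i: r for i, r in enumerate(kept[:limit])}


product_metadata_demo = prepare(rows, NUMBER_PRODUCTS)
print(product_metadata_demo[0]["primary_key"])
```

The row with empty keywords is skipped, so position 1 holds the third input row.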

## Set up Redis as a Vector Database

LangChain has a simple wrapper around [Redis](https://github.com/hwchase17/langchain/blob/master/langchain/vectorstores/redis.py) to help you load text data and to create embeddings that capture “meaning.” In this code, we prepare the product text and metadata, prepare the text embeddings provider (OpenAI), assign a name to the search index, and provide a Redis URL for connection.

In [None]:
import os
import dotenv

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.redis import Redis as RedisVectorStore

# loads .env file with your OPENAI_API_KEY
dotenv.load_dotenv()

In [None]:
# Data that will be embedded and converted to vectors
texts = [v["item_name"] for _, v in product_metadata.items()]

# Product metadata that we'll store along our vectors
metadatas = list(product_metadata.values())

# Define embedding model
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

# Name of the Redis search index to create
index_name = "products"

# Assumes you have a Redis Stack server running on localhost
redis_url = "redis://localhost:6379"

Create the Redis vectorstore.

In [None]:
# Create and load redis with documents
vectorstore = RedisVectorStore.from_texts(
    texts=texts,
    metadatas=metadatas,
    embedding=embeddings,
    index_name=index_name,
    redis_url=redis_url,
)

## Create the LangChain conversational chain

### Build the ChatBot with ConversationalRetrievalChain

In [None]:
from langchain.callbacks.base import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import ConversationalRetrievalChain, LLMChain
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI
from langchain.prompts.prompt import PromptTemplate

Redis holds our product catalog including metadata and OpenAI-generated embeddings that capture the semantic properties of the product content. Under the hood, using [Redis Vector Similarity Search](https://redis.io/docs/stack/search/reference/vectors/) (VSS), the chatbot queries the catalog for products that are most similar to or relevant to what the user is shopping for. No fancy keyword search or manual filtering is needed; VSS takes care of it.
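
As a rough intuition for what a similarity search computes, the sketch below ranks a toy catalog by cosine similarity to a query vector. It is a brute-force illustration in plain Python, not how Redis implements VSS. The three-dimensional vectors are made up, and `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, catalog, k=2):
    """Return the names of the k catalog entries most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in catalog.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Made-up 3-dimensional "embeddings"; real embeddings have far more dimensions
catalog = {
    "running shoes": [0.9, 0.1, 0.0],
    "hiking boots": [0.7, 0.3, 0.1],
    "coffee mug": [0.0, 0.2, 0.9],
}

print(top_k([1.0, 0.0, 0.0], catalog))
```

Items whose vectors point in nearly the same direction as the query rank first, which is why related products surface even when they share no keywords with the query.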

The `ConversationalRetrievalChain` that forms the chatbot operates in three phases:

1. **Question creation** evaluates the input question and uses the OpenAI GPT model to combine it with knowledge from previous conversational interactions (if any).
2. **Retrieval** searches Redis for the best available products, given the items in which the shopper expressed interest.
3. **Question answering** gets the product results from the vector search query and uses the OpenAI GPT model to help the shopper navigate the options.
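
The three phases can be sketched as plain functions to show how data flows between them. Everything here is a stand-in: `condense_question`, `retrieve`, `answer`, and `chat_turn` are hypothetical names. String concatenation replaces the first LLM call, keyword overlap replaces the vector search, and a template replaces the final LLM answer.

```python
def condense_question(chat_history, question):
    """Phase 1 (stub): fold the previous user turn into a standalone question."""
    if not chat_history:
        return question
    last_user_turn, _ = chat_history[-1]
    return f"{last_user_turn} {question}"


def retrieve(standalone_question, catalog):
    """Phase 2 (stub): keyword overlap stands in for vector search."""
    words = set(standalone_question.lower().split())
    return [item for item in catalog if words & set(item.lower().split())]


def answer(docs):
    """Phase 3 (stub): a template stands in for the LLM-written answer."""
    if not docs:
        return "I couldn't find a match."
    return "You might like: " + ", ".join(docs)


def chat_turn(question, chat_history, catalog):
    standalone = condense_question(chat_history, question)
    docs = retrieve(standalone, catalog)
    return answer(docs)


catalog = ["red running shoes", "blue hiking boots", "ceramic coffee mug"]
history = [("I need shoes", "Sure, what kind?")]
print(chat_turn("something red", history, catalog))
```

The follow-up "something red" only makes sense once it is combined with the earlier turn, which is the job of the question creation phase.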

### Prompt Engineering

Below are the prompts defined for steps 1 and 3 above.

In [None]:
template = """Given the following chat history and a follow up question, rephrase the follow up input question to be a standalone question.
Or end the conversation if it seems like it's done.

Chat History:\"""
{chat_history}
\"""

Follow Up Input: \"""
{question}
\"""

Standalone question:"""

condense_question_prompt = PromptTemplate.from_template(template)

template = """You are a friendly, conversational retail shopping assistant. Use the following context including product names, descriptions, and keywords to show the shopper what's available, help find what they want, and answer any questions.
It's ok if you don't know the answer.

Context:\"""
{context}
\"""

Question:\"""
{question}
\"""

Helpful Answer:"""

qa_prompt = PromptTemplate.from_template(template)
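
To see what the model receives once the placeholders are filled, the sketch below renders a shortened stand-in for the condense-question template with plain `str.format`. The `render_history` helper and its `Human:`/`Assistant:` labels are assumptions for illustration; the chain may format the history differently.

```python
# Shortened stand-in for the condense-question template
demo_template = (
    "Chat History:\n{chat_history}\n\n"
    "Follow Up Input: {question}\n\n"
    "Standalone question:"
)


def render_history(chat_history):
    """Flatten (question, answer) tuples into text; the labels are an assumption."""
    lines = []
    for human, assistant in chat_history:
        lines.append(f"Human: {human}")
        lines.append(f"Assistant: {assistant}")
    return "\n".join(lines)


chat_history = [("Do you have mugs?", "Yes, several ceramic mugs.")]
prompt_text = demo_template.format(
    chat_history=render_history(chat_history),
    question="Any in blue?",
)
print(prompt_text)
```

The prompt ends with `Standalone question:` so that the model's completion is the rewritten question itself.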

Define two OpenAI LLMs and wrap them with chains for question generation and question answering respectively. The `streaming_llm` allows us to pipe the chatbot responses to stdout, token by token, giving it a charming, chatbot-like user experience.

In [None]:
# Define two LLM models from OpenAI
llm = OpenAI(temperature=0)

streaming_llm = OpenAI(
    streaming=True,
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
    verbose=True,
    temperature=0.2,
    max_tokens=150,
)

# Use the LLM Chain to create a question creation chain
question_generator = LLMChain(llm=llm, prompt=condense_question_prompt)

# Use the streaming LLM to create a question answering chain
doc_chain = load_qa_chain(llm=streaming_llm, chain_type="stuff", prompt=qa_prompt)
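
The idea behind token-by-token streaming can be sketched without any LLM: a generator yields pieces of the answer, and a handler writes each piece out as soon as it arrives. `PrintTokenHandler`, `fake_token_stream`, and `generate` are hypothetical names used only in this sketch; they are not part of LangChain.

```python
import sys


class PrintTokenHandler:
    """Hypothetical handler: writes each token as soon as it arrives."""

    def __init__(self, stream=None):
        self.stream = stream or sys.stdout

    def on_new_token(self, token):
        self.stream.write(token)
        self.stream.flush()


def fake_token_stream(text):
    """Stand-in for a streaming LLM: yields the text word by word."""
    for word in text.split(" "):
        yield word + " "


def generate(text, handler):
    """Forward every token to the handler, then return the full answer."""
    pieces = []
    for token in fake_token_stream(text):
        handler.on_new_token(token)  # shown to the user immediately
        pieces.append(token)
    return "".join(pieces).strip()


full_answer = generate("We have three blue mugs in stock.", PrintTokenHandler())
```

The caller still gets the complete answer at the end, so it can be stored in the chat history even though the user has already seen it on screen.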

Finally, we tie it all together with the `ConversationalRetrievalChain` that wraps all three steps.

In [None]:
chatbot = ConversationalRetrievalChain(
    retriever=vectorstore.as_retriever(),
    combine_docs_chain=doc_chain,
    question_generator=question_generator,
)

## Experiment with the friendly virtual shopping assistant

In [None]:
# Create a chat history buffer
chat_history = []

# Gather user input for the first question to kick off the bot
question = input("Hi! What are you looking for today?")

# Keep the bot running in a loop to simulate a conversation
while True:
    result = chatbot({"question": question, "chat_history": chat_history})
    print("\n")
    chat_history.append((result["question"], result["answer"]))
    question = input()
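
The loop above runs until the kernel is interrupted. A variation with an explicit exit condition might look like the sketch below. The `run_chat` helper, the `echo_bot` stand-in, and the "quit"/"exit" keywords are assumptions for illustration. Input and output are passed in as functions so the loop can be exercised with scripted input.

```python
def run_chat(chatbot, read_input, write_output):
    """Run the conversation until the user sends an empty line, 'quit', or 'exit'."""
    chat_history = []
    question = read_input("Hi! What are you looking for today? ")
    while question.strip().lower() not in {"", "quit", "exit"}:
        result = chatbot({"question": question, "chat_history": chat_history})
        write_output(result["answer"])
        chat_history.append((question, result["answer"]))
        question = read_input("> ")
    return chat_history


def echo_bot(inputs):
    """Hypothetical stand-in for the chain: reports how much history it saw."""
    turns = len(inputs["chat_history"])
    return {"answer": f"(turn {turns + 1}) You asked: {inputs['question']}"}


scripted = iter(["blue mugs", "under $20?", "quit"])
history = run_chat(echo_bot, lambda prompt: next(scripted), print)
```

In the notebook, `run_chat(chatbot, input, print)` would wire this to real user input. With a streaming LLM the answer has already been printed, so `write_output` could instead just print a newline.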

## Customize your chains for better performance

We subclass `BaseRetriever` so that each retrieved document's content combines the item name, description, and keywords before it is passed to the question-answering step.

In [None]:
import json

from langchain.schema import BaseRetriever
from langchain.vectorstores import VectorStore
from langchain.schema import Document
from pydantic import BaseModel


class RedisProductRetriever(BaseRetriever, BaseModel):
    vectorstore: VectorStore

    class Config:
        arbitrary_types_allowed = True

    def combine_metadata(self, doc) -> str:
        metadata = doc.metadata
        return (
            "Item Name: "
            + metadata["item_name"]
            + ". "
            + "Item Description: "
            + metadata["bullet_point"]
            + ". "
            + "Item Keywords: "
            + metadata["item_keywords"]
            + "."
        )

    def get_relevant_documents(self, query):
        docs = []
        for doc in self.vectorstore.similarity_search(query):
            content = self.combine_metadata(doc)
            docs.append(Document(page_content=content, metadata=doc.metadata))
        return docs
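
To preview the text the LLM sees for each product, the sketch below applies the same formatting to a sample metadata dict. It is a standalone variation rather than the retriever's own method: the `FIELDS` table and the sample values are hypothetical, and unlike `combine_metadata` above it skips fields that are missing or empty instead of raising an error.

```python
# (label, metadata key) pairs, in the order they appear in the output
FIELDS = [
    ("Item Name", "item_name"),
    ("Item Description", "bullet_point"),
    ("Item Keywords", "item_keywords"),
]


def combine_metadata(metadata):
    """Join the labeled fields into one string, skipping empty or missing ones."""
    parts = []
    for label, key in FIELDS:
        value = str(metadata.get(key) or "").strip()
        if value:
            parts.append(f"{label}: {value}.")
    return " ".join(parts)


sample = {
    "item_name": "Ceramic Mug",
    "bullet_point": "Holds 12 oz",
    "item_keywords": "mug coffee",
}
print(combine_metadata(sample))
```

When all three fields are present, the output has the same layout as the string built by the retriever.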

## Set up the ChatBot with the new retriever

Update the retrieval class and chatbot to use the custom implementation above.

In [None]:
redis_product_retriever = RedisProductRetriever(vectorstore=vectorstore)

chatbot = ConversationalRetrievalChain(
    retriever=redis_product_retriever,
    combine_docs_chain=doc_chain,
    question_generator=question_generator,
)

## Retry

In [None]:
# Create a chat history buffer
chat_history = []

# Gather user input for the first question to kick off the bot
question = input("Hi! What are you looking for today?")

# Keep the bot running in a loop to simulate a conversation
while True:
    result = chatbot({"question": question, "chat_history": chat_history})
    print("\n")
    chat_history.append((result["question"], result["answer"]))
    question = input()