**Install required libraries**

In [None]:
!pwd
!pip install --upgrade pip

# Install required libraries
!pip install -qU redis redisvl langchain langchain-core langchain-community langchain-openai openai pypdf gradio

**Configure Redis Enterprise**

In [2]:
# Set 'host' to your Redis host name (host name only, without scheme or port)
host = ''
port = 16533
password = 'admin'
requirePass = True


In [3]:
import redis

if requirePass:
    client = redis.Redis(host=host, port=port, decode_responses=True, password=password)
    REDIS_URL = f"redis://:{password}@{host}:{port}"
else:
    # No password: connect to a local Redis on the default port
    client = redis.Redis(host='localhost', port=6379, decode_responses=True)
    REDIS_URL = "redis://localhost:6379"

print(client.ping())
# Clear the current Redis database (optional; this deletes every key in it)
client.flushdb()

INDEX_NAME = "idx_qna"

True


**Download sample PDF report and create knowledge base**

The RAG pipeline retrieves context from this PDF report to answer questions about its contents.

In [4]:
!wget https://storage.googleapis.com/abhi-data-2024/how_india_shops_online.pdf -O report.pdf


--2025-01-10 18:35:28--  https://storage.googleapis.com/abhi-data-2024/how_india_shops_online.pdf
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.184.207, 108.177.121.207, 209.85.145.207, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.184.207|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13218601 (13M) [application/pdf]
Saving to: ‘report.pdf’


2025-01-10 18:35:33 (4.42 MB/s) - ‘report.pdf’ saved [13218601/13218601]



In [5]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader


file = "report.pdf"

# set up the file loader/extractor and text splitter to create chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2500, chunk_overlap=50, add_start_index=True
)

loader = PyPDFLoader(file)
documents = loader.load()

chunks = text_splitter.split_documents(documents)
#chunked_docs = [doc.page_content for doc in chunks]
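To make the `chunk_size` and `chunk_overlap` settings concrete, here is a minimal sketch of a fixed-window splitter. It is a simplification and not the algorithm `RecursiveCharacterTextSplitter` uses (that one splits on separators such as paragraph and line breaks first). The function name `split_with_overlap` and the sample string are hypothetical, chosen only for illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return pieces


# Hypothetical sample text, small enough to inspect by eye
sample = "abcdefghijklmnopqrstuvwxyz"
sample_chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(sample_chunks)
```

Each chunk repeats the last three characters of the previous one, so a sentence that straddles a chunk boundary is less likely to be cut off from its context.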

In [6]:
chunked_docs = [doc.page_content for doc in chunks]

print("*****")
print(chunked_docs)

*****
['PwC | How India shops online: Consumer preferences in the metropolises and tier 1-4 cities\n1 \nHow India \nshops online:', 'PwC | How India shops online: Consumer preferences in the metropolises and tier 1-4 cities\n2 \nTable of contents\n3 6 8\nForeword About this report Executive summary: \nUnderstanding the \nshopping experience \nand behaviour of urban \ndwellers and the rest of \nIndia\n18\n19\n48\nFrom fashion to \ngourmet food: \nExploring trends \nacross categories \nFashion and accessories\nSports and fitness\nElectronics\nHome and kitchen \nBeauty and personal care\nHealth and wellness\nGrocery\nSmall towns can drive big \nbusiness\nPwC | How India shops online: Consumer preferences in the metropolises and tier 1– 4 cities 2  \n23\n27\n31\n35\n40\n44', 'PwC | How India shops online: Consumer preferences in the metropolises and tier 1-4 cities\n3 \nForeword\nIn the past few years, about 12.5 crore 1 consumers in India shopped online for the first time \nand since then

**Configure OpenAI API key**

In [7]:
import getpass

# setup the API Key
api_key = getpass.getpass("Enter your OpenAI API key: ")

Enter your OpenAI API key: ··········


# Create text embeddings with an OpenAI embedding model

Use an OpenAI embedding model to create the text embeddings.

Text embeddings are dense vector representations of a piece of content such that, if two pieces of content are semantically similar, their respective embeddings are located near each other in the embedding vector space. This representation can be used to solve common NLP tasks, such as:


*   Semantic search: Search text ranked by semantic similarity.
*   Recommendation: Return items with text attributes similar to the given text.
*   Classification: Return the class of items whose text attributes are similar to the given text.
*   Clustering: Cluster items whose text attributes are similar to the given text.
*   Outlier Detection: Return items where text attributes are least related to the given text.

The OpenAI embeddings API returns a text embedding for each input string. The text-embedding-3-large model accepts roughly 8,000 input tokens (tokens are sub-word units, not words) and outputs 3072-dimensional vectors by default; setting the `dimensions` parameter to 1024, as this notebook does, shortens the output to 1024-dimensional vectors.
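As a rough illustration of what "located near each other" means, the sketch below compares vectors by cosine similarity. The three-dimensional vectors are hypothetical values made up by hand, not real model output (real embeddings here have 1024 dimensions), and `cosine_similarity` is a helper written for this example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings"; the labels only suggest what each text might be about
vec_upi_payment = [0.9, 0.1, 0.0]
vec_wallet_payment = [0.8, 0.2, 0.1]
vec_furniture = [0.0, 0.2, 0.9]

sim_related = cosine_similarity(vec_upi_payment, vec_wallet_payment)
sim_unrelated = cosine_similarity(vec_upi_payment, vec_furniture)

print(f"related texts:   {sim_related:.3f}")
print(f"unrelated texts: {sim_unrelated:.3f}")
```

A vector search ranks stored chunks by a score of this kind (or by the corresponding distance) against the embedding of the query.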

In [8]:
from langchain_community.vectorstores import Redis
from langchain.chains import RetrievalQA
from langchain_openai import OpenAIEmbeddings


embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large",
    dimensions=1024,
    api_key=api_key
)

def get_vectordb() -> Redis:
    """Create the Redis vectordb."""
    # Load Redis with documents
    vectordb = Redis.from_documents(
        documents=chunks,
        embedding=embeddings,
        index_name=INDEX_NAME,
        redis_url=REDIS_URL
    )
    return vectordb


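# Note: this rebinds the name `redis` from the imported module to the vector store.
# Later cells use `redis` as the vector store, so re-running `import redis` breaks them.
redis = get_vectordb()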


# Include RAG

We're going to build a complete RAG pipeline from scratch incorporating the following components:

*   Standard retrieval and chat completion
*   Dense content representation to improve accuracy
*   Query re-writing to improve accuracy
*   Semantic caching to improve performance
*   Conversational session history to improve personalization

### Define Prompt template
PromptTemplate defines the exact text of the prompt that is fed to the LLM. This step is optional: the default prompt usually works well for OpenAI models but might fall short for other models.

In [9]:
#@title Function to define prompt template

def create_prompt():
    """Create the prompt template used by the QA chain."""
    from langchain.prompts import PromptTemplate

    # Define our prompt
    prompt_template = """Use only the following pieces of context to answer the question. If you don't know the answer, say that you don't know, don't try to make up an answer.

    This should be in the following format:

    Question: [question here]
    Answer: [answer here]

    Begin!

    Context:
    ---------
    {context}
    ---------
    Question: {question}
    Answer:"""

    prompt = PromptTemplate(
        template=prompt_template,
        input_variables=["context", "question"]
    )
    return prompt
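To see the kind of text the LLM ends up receiving, the sketch below fills a shortened copy of the template with plain `str.format`. This only approximates what the chain does: joining the retrieved chunks with a blank line is an assumption about the "stuff" chain's default behaviour, and the chunk strings are placeholders, not text from the report.

```python
# Shortened copy of the notebook's template, kept as a plain Python string
template = """Use only the following pieces of context to answer the question.

Context:
---------
{context}
---------
Question: {question}
Answer:"""

# Placeholder chunks standing in for whatever the retriever returns
retrieved_chunks = [
    "<text of retrieved chunk 1>",
    "<text of retrieved chunk 2>",
]

# Assumption: retrieved chunks are concatenated with a blank line between them
context = "\n\n".join(retrieved_chunks)
question = "How do Indians like to pay for shopping online?"

filled_prompt = template.format(context=context, question=question)
print(filled_prompt)
```

Because the instructions, the context, and the question all travel in one string, the wording of the template directly shapes how strictly the model sticks to the retrieved context.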


In [10]:
from langchain_openai import OpenAI

llm = OpenAI(
    model="gpt-3.5-turbo-instruct",
    temperature=0.5,
    max_retries=2,
    api_key=api_key,
    verbose=True
)

from langchain.cache import RedisSemanticCache
from langchain.globals import set_llm_cache

# Semantic cache
set_llm_cache(
    RedisSemanticCache(redis_url=REDIS_URL, embedding=embeddings, score_threshold=0.05)
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=redis.as_retriever(
        search_type="similarity_distance_threshold",
        search_kwargs={"distance_threshold": 0.5},
    ),
    #return_source_documents=True,
    chain_type_kwargs={"prompt": create_prompt()},
    verbose=True
)
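The idea behind a semantic cache can be sketched in a few lines: store each answer next to the embedding of its prompt, and reuse the answer when a new prompt's embedding is close enough. The toy class below is not how `RedisSemanticCache` is implemented; `ToySemanticCache`, the two-dimensional vectors, and the reading of the threshold as a maximum cosine distance are all assumptions made for illustration.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity, so 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


class ToySemanticCache:
    """In-memory stand-in for a semantic cache (hypothetical, for illustration only)."""

    def __init__(self, distance_threshold: float):
        self.distance_threshold = distance_threshold
        self.entries: list[tuple[list[float], str]] = []

    def update(self, prompt_vec: list[float], response: str) -> None:
        """Remember the response under the embedding of its prompt."""
        self.entries.append((prompt_vec, response))

    def lookup(self, prompt_vec: list[float]):
        """Return the closest cached response if it is within the threshold, else None."""
        best_response, best_distance = None, None
        for vec, response in self.entries:
            distance = cosine_distance(prompt_vec, vec)
            if best_distance is None or distance < best_distance:
                best_response, best_distance = response, distance
        if best_distance is not None and best_distance <= self.distance_threshold:
            return best_response
        return None


cache = ToySemanticCache(distance_threshold=0.05)

# Hypothetical embeddings: a first question, a paraphrase of it, and an unrelated question
first_question = [1.0, 0.0]
paraphrase = [0.99, 0.05]
unrelated = [0.0, 1.0]

cache.update(first_question, "cached answer")
hit = cache.lookup(paraphrase)   # close enough: reuses the stored answer
miss = cache.lookup(unrelated)   # too far away: the LLM would have to be called
print(hit, miss)
```

A tight threshold keeps the cache from returning an answer to a question that merely looks similar, at the cost of fewer hits. In this notebook, the shorter wall times reported for rephrased questions are consistent with such cache hits.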

In [31]:
%%time
qa.invoke('What are some motivations for shopping online?')['result']

CPU times: user 104 ms, sys: 4.92 ms, total: 109 ms
Wall time: 1.91 s


' Planned purchases with specific timelines, wider range of price options and brands available, absence of physical stores for premium brands, stockouts of certain products, lack of knowledgeable staff in offline stores, lack of discounts and special offers in physical stores, and browsing through online shopping platforms during free time.'

In [32]:
%%time
qa.invoke('What are the some drivers for Indians to shop online?')['result']

CPU times: user 33.2 ms, sys: 2.92 ms, total: 36.1 ms
Wall time: 475 ms


{'query': 'What are the some drivers for Indians to shop online?',
 'result': ' Planned purchases with specific timelines, wider range of price options and brands available, absence of physical stores for premium brands, stockouts of certain products, lack of knowledgeable staff in offline stores, lack of discounts and special offers in physical stores, and browsing through online shopping platforms during free time.'}

In [33]:
%%time
qa.invoke('How do Indians like to pay for shopping online?')['result']

CPU times: user 90.2 ms, sys: 5.03 ms, total: 95.2 ms
Wall time: 1.81 s


' Indians like to pay for shopping online using UPI, cash on delivery, credit/debit cards, EMI, and e-wallets.'

In [17]:
%%time
qa.invoke('How do Indians prefer to pay when they shop online?')['result']

CPU times: user 35.9 ms, sys: 2.88 ms, total: 38.7 ms
Wall time: 522 ms


' UPI, cash on delivery, credit/debit cards, EMI, and e-wallets are all popular methods of payment for online shopping in India. However, cash on delivery remains the preferred option among rest of India consumers due to concerns about fraud.'

In [28]:
%%time
qa.invoke('What are some known challenges in shopping online?')['result']

CPU times: user 46.6 ms, sys: 743 µs, total: 47.3 ms
Wall time: 1.25 s


' Some known challenges in shopping online include concerns about payment fraud, credibility of unfamiliar websites, doubts regarding product quality matching the images shown, logistical overload, regulations, access restrictions, and inconsistent delivery experiences.'

In [27]:
%%time
qa.invoke('What are some known challenges when shopping online?')['result']

CPU times: user 24.8 ms, sys: 26 µs, total: 24.9 ms
Wall time: 576 ms


' Concerns about payment fraud, credibility of unfamiliar websites, doubts regarding product quality matching the images shown, logistical overload, regulations, access restrictions, and inconsistent experiences with deliveries.'

In [29]:
%%time
qa.invoke('How home and kitchen segment is growing?')['result']

CPU times: user 71.1 ms, sys: 2.16 ms, total: 73.2 ms
Wall time: 1.97 s


' The home and kitchen segment in the Indian market is growing at a CAGR of 10%, driven by factors such as urbanization, a young population, and increasing aspirations of the middle class. The demand for modern and modular furniture solutions, expanding urban landscape, and growing preference for durable and versatile home spaces have also contributed to the growth in this category.'

In [30]:
%%time
qa.invoke('How home and kitchen segment is growing in India?')['result']

CPU times: user 28.3 ms, sys: 876 µs, total: 29.2 ms
Wall time: 389 ms


' The home and kitchen segment in the Indian market is growing at a CAGR of 10%, driven by factors such as urbanization, a young population, and increasing aspirations of the middle class. The demand for modern and modular furniture solutions, expanding urban landscape, and growing preference for durable and versatile home spaces have also contributed to the growth in this category.'

In [None]:
qa.invoke('What are the effects of social media on online shopping?')['result']

In [None]:
qa.invoke('What are some relevant items that are shopped online?')['result']

In [None]:
import gradio as gr

def handle(query):
    # Use invoke() like the cells above; it returns a dict with the answer under 'result'
    response = qa.invoke(query)['result']
    return response

iface = gr.Interface(fn=handle, inputs="text", outputs="text")
iface.launch(share=True)

In [None]:
iface.close()