# Using Redis and Azure OpenAI to chat with PDF documents

This notebook demonstrates how to use Redis and Azure OpenAI to chat with PDF documents. The PDF included is
an informational document about AI/ML published by SDAIA.

In this notebook, we will use LlamaIndex to chunk, vectorize, and store the PDF document in Redis as vectors
alongside the associated text. The query interface provided by LlamaIndex will be used to search for relevant
information given queries from the user.

In [None]:
# Install the requirements
%pip install redis pypdf PyPDF2 python-dotenv transformers tiktoken llama_index==0.6.8

In [1]:
import os
import textwrap
import openai
from langchain.llms import AzureOpenAI, OpenAI
from langchain.embeddings import OpenAIEmbeddings
from llama_index.vector_stores import RedisVectorStore
from llama_index import LangchainEmbedding
from llama_index import (
    GPTVectorStoreIndex,
    SimpleDirectoryReader,
    LLMPredictor,
    PromptHelper,
    ServiceContext,
    StorageContext
)
import sys

import logging
logging.basicConfig(stream=sys.stdout, level=logging.INFO) # logging.DEBUG for more verbose output
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))


In [2]:
# load the .env file in the current directory into the environment
from dotenv import load_dotenv
load_dotenv('./.env')

True

# Azure OpenAI 

The notebook allows the user to use Azure OpenAI endpoints. Make sure to follow the instructions in the README and set the values in .env to the key and endpoint shown in the Azure portal for the resource you are using.
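
Before running the cells below, it can help to confirm that the expected settings are actually present. The following is a minimal sketch, not part of the original notebook: the variable names are the ones this notebook reads with `os.getenv`, while `REQUIRED_VARS` and `missing_env_vars` are hypothetical helpers introduced for illustration.

```python
import os

# Names taken from the os.getenv calls in this notebook.
REQUIRED_VARS = [
    "AZURE_OPENAI_API_BASE",
    "OPENAI_API_KEY",
    "OPENAI_EMBEDDING_MODEL",
    "OPENAI_TEXT_MODEL",
    "AZURE_EMBED_MODEL_DEPLOYMENT_NAME",
    "AZURE_TEXT_MODEL_DEPLOYMENT_NAME",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Report anything that still needs to be filled in before continuing.
missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing settings:", ", ".join(missing))
```

Passing a plain dictionary as `env` makes the helper easy to check without touching the real environment.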

## Azure OpenAI 

Here we set up the Azure OpenAI models and API keys by reading the values loaded into the environment above. The ``PromptHelper`` sets the parameters for the OpenAI model. The classes defined here are used together to provide a QnA interface between the user and the LLM.

In [3]:
# set up LlamaIndex to use Azure OpenAI
openai.api_type = "azure"
openai.api_base = os.getenv("AZURE_OPENAI_API_BASE")
openai.api_version = "2022-12-01"
openai.api_key = os.getenv("OPENAI_API_KEY")

# Get the OpenAI model names, e.g. "text-embedding-ada-002"
embedding_model = os.getenv("OPENAI_EMBEDDING_MODEL")
text_model = os.getenv("OPENAI_TEXT_MODEL")


print(f"Using models: {embedding_model} and {text_model}")

# get the Azure Deployment name for the model
embedding_model_deployment = os.getenv("AZURE_EMBED_MODEL_DEPLOYMENT_NAME")
text_model_deployment = os.getenv("AZURE_TEXT_MODEL_DEPLOYMENT_NAME")

print(f"Using deployments: {embedding_model_deployment} and {text_model_deployment}")


Using models: text-embedding-ada-002 and gpt-35-turbo
Using deployments: embedding-model and gpt35-model


In [4]:

llm = AzureOpenAI(deployment_name=text_model_deployment, model_kwargs={
    "api_key": openai.api_key,
    "api_base": openai.api_base,
    "api_type": openai.api_type,
    "api_version": openai.api_version,
})
llm_predictor = LLMPredictor(llm=llm)

embedding_llm = LangchainEmbedding(
    OpenAIEmbeddings(
        model=embedding_model,
        deployment=embedding_model_deployment,
        openai_api_key=openai.api_key,
        openai_api_base=openai.api_base,
        openai_api_type=openai.api_type,
        openai_api_version=openai.api_version,
    ),
    embed_batch_size=1,
)

### LlamaIndex

[LlamaIndex](https://github.com/jerryjliu/llama_index) (GPT Index) is a project that provides a central interface to connect your LLMs with external data sources. It provides a simple interface to vectorize and store embeddings in Redis, create search indices using Redis, and perform vector search to find context for generative models like GPT.

Here we will use it to load in the documents (the SDAIA AI/ML PDF).

In [14]:
# load documents
documents = SimpleDirectoryReader('./docs').load_data()
print('Documents Count: ', len(documents))

Documents Count:  1038


LlamaIndex also works with frameworks like LangChain to make prompting and other aspects of a chat-based application easier. Here we can use the ``PromptHelper`` class to help us generate prompts for the (Azure) OpenAI model. It is off by default, as it can be tricky to set up correctly.

In [6]:
# set number of output tokens
num_output = int(os.getenv("OPENAI_MAX_TOKENS"))
# max LLM token input size
max_input_size = int(os.getenv("CHUNK_SIZE"))
# set maximum chunk overlap
max_chunk_overlap = int(os.getenv("CHUNK_OVERLAP"))

prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
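
To make the chunk size and overlap settings more concrete, here is a small standalone sketch of overlapping chunking. It is an illustration only and not the splitter LlamaIndex uses internally: `chunk_with_overlap` is a hypothetical helper, and whitespace-separated words stand in for model tokens.

```python
def chunk_with_overlap(tokens, chunk_size, overlap):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Words stand in for tokens here; a real pipeline would use the model's tokenizer.
words = "Redis stores each chunk next to its embedding vector".split()
for chunk in chunk_with_overlap(words, chunk_size=4, overlap=1):
    print(chunk)
```

Each window repeats the last word of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk.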

In [7]:

# define the service we will use to answer questions
# if you executed the Azure OpenAI code above, your Azure models and credentials will be used (the same applies to OpenAI)
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    embed_model=embedding_llm,
#    prompt_helper=prompt_helper # uncomment to use prompt_helper.
)

## Initialize Redis as a Vector Database

Now that our documents are read in, we can initialize the ``RedisVectorStore``. This will allow us to store our vectors in Redis and create an index.

The ``GPTVectorStoreIndex`` will then create the embeddings from the text chunks by calling out to OpenAI's API. The embeddings will be stored in Redis and an index will be created.

NOTE: If you didn't set the ``OPENAI_API_KEY`` environment variable, you will get an error here.
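
To illustrate what a vector search over stored embeddings does conceptually, the sketch below ranks a few toy vectors by cosine similarity in plain Python. This is an assumption-laden illustration: Redis performs the search server-side with its own index and configured distance metric, and `cosine_similarity`, `top_k`, and the chunk names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=2):
    """Return the k (key, score) pairs with the highest cosine similarity."""
    scored = [(key, cosine_similarity(query_vec, vec)) for key, vec in stored.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Toy 3-dimensional vectors; text-embedding-ada-002 embeddings have 1536 dimensions.
stored = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.7, 0.7, 0.0],
    "chunk_c": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.1, 0.0], stored, k=2)
print(results)
```

The chunks whose vectors point in nearly the same direction as the query vector are returned first, and their text becomes the context handed to the LLM.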

In [8]:
def format_redis_conn_from_env(using_ssl=False):
    start = "rediss://" if using_ssl else "redis://"
    # if using RBAC
    password = os.getenv("REDIS_PASSWORD", None)
    username = os.getenv("REDIS_USERNAME", "default")
    if password is not None:
        start += f"{username}:{password}@"

    return start + f"{os.getenv('REDIS_ADDRESS')}:{os.getenv('REDIS_PORT')}"


# set using_ssl=True to use SSL with Azure Cache for Redis Enterprise (ACRE)
redis_address = format_redis_conn_from_env(using_ssl=False)

print(f"Using Redis address: {redis_address}")
vector_store = RedisVectorStore(
    index_name="chevy_docs",
    index_prefix="blog",
    redis_url=redis_address,
    overwrite=True
)

# access the underlying client in the RedisVectorStore implementation to ping the redis instance
vector_store.client.ping()

Using Redis address: redis://default:@localhost:6379


True
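
The helper above interpolates the username and password directly into the URL, so a password containing characters such as `@` or `:` would produce an ambiguous address. A possible variant, sketched under the assumption that percent-encoding the credentials is acceptable for your client, is shown below. `build_redis_url` is a hypothetical name, and unlike the original helper it omits the credentials entirely when the password is empty.

```python
from urllib.parse import quote


def build_redis_url(host, port, username="default", password=None, using_ssl=False):
    """Build a redis:// or rediss:// URL, percent-encoding any credentials."""
    scheme = "rediss" if using_ssl else "redis"
    auth = ""
    if password:
        # safe='' forces characters such as '@', ':' and '/' to be encoded.
        auth = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    return f"{scheme}://{auth}{host}:{port}"


print(build_redis_url("localhost", 6379))
print(build_redis_url("example.test", 6380, password="p@ss:word", using_ssl=True))
```

Taking the host, port, and credentials as arguments rather than reading the environment inside the function keeps it easy to test; the values can still come from `os.getenv` at the call site.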

In [9]:
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GPTVectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context
)

Token indices sequence length is longer than the specified maximum sequence length for this model (1179 > 1024). Running this sequence through the model will result in indexing errors


INFO:llama_index.vector_stores.redis:Deleting index chevy_docs
Deleting index chevy_docs
INFO:llama_index.vector_stores.redis:Creating index chevy_docs
Creating index chevy_docs
INFO:llama_index.vector_stores.redis:Added 1933 documents to index chevy_docs
Added 1933 documents to index chevy_docs
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 1483011 tokens
> [build_index_from_nodes] Total embedding token usage: 1483011 tokens
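
Each log message above appears twice, once with the `INFO:logger_name:` prefix and once bare. This is consistent with the logging setup in the first cell, which attaches two handlers to the root logger: one through `logging.basicConfig` and one through `addHandler`. The sketch below reproduces the effect on an isolated logger writing to an in-memory stream; `log_once` and the logger name are hypothetical and exist only for this demonstration.

```python
import io
import logging


def log_once(message, with_second_handler):
    """Log one INFO record and return the lines that were written."""
    stream = io.StringIO()
    logger = logging.getLogger("duplicate_logging_demo")
    logger.handlers.clear()
    logger.propagate = False  # keep the demo isolated from the root logger
    logger.setLevel(logging.INFO)

    # Stand-in for logging.basicConfig(stream=...): one handler with the default format.
    formatted = logging.StreamHandler(stream)
    formatted.setFormatter(logging.Formatter(logging.BASIC_FORMAT))
    logger.addHandler(formatted)

    if with_second_handler:
        # Stand-in for the extra addHandler(...) call: a handler with no formatter.
        logger.addHandler(logging.StreamHandler(stream))

    logger.info(message)
    return stream.getvalue().splitlines()


print(log_once("Creating index", with_second_handler=True))
print(log_once("Creating index", with_second_handler=False))
```

Dropping the extra `addHandler` call in the setup cell should leave a single, prefixed line per message.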


## Start Querying information from the Document

Now that we have our document stored in the index, we can ask questions against it. The query engine uses the data stored in the index as the knowledge base for the LLM.
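
Conceptually, answering a question means retrieving the most relevant chunks and placing them in the prompt ahead of the question. The sketch below shows that assembly step in isolation. The template wording and the `build_prompt` helper are hypothetical and are not the exact prompt LlamaIndex sends.

```python
def build_prompt(question, chunks):
    """Join retrieved chunks into a context block and append the question."""
    context = "\n---\n".join(chunk.strip() for chunk in chunks)
    return (
        "Context information is below.\n"
        f"{context}\n"
        "Given the context and no prior knowledge, answer the question.\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Illustrative chunk texts; in the notebook these come from the Redis index.
prompt = build_prompt(
    "What national AI efforts are described?",
    ["  First retrieved chunk.  ", "Second retrieved chunk."],
)
print(prompt)
```

Because the retrieved text travels inside the prompt, the number and size of chunks directly affect the LLM token usage reported in the logs.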

In [15]:
query_engine = index.as_query_engine()
# Query (Arabic): "An overview of Germany's efforts in developing artificial intelligence"
response = query_engine.query("لمحة عن جهود ألمانيا في تطوير الذكاء الاصطناعي")
print("\n", textwrap.fill(str(response), 100))

INFO:llama_index.vector_stores.redis:Querying index chevy_docs
Querying index chevy_docs
INFO:llama_index.vector_stores.redis:Found 2 results for query with id ['blog_d1184027-d6c2-492b-9121-8d8d369388dc', 'blog_6f3f67e1-2c5f-46f1-b858-02419df98d9d']
Found 2 results for query with id ['blog_d1184027-d6c2-492b-9121-8d8d369388dc', 'blog_6f3f67e1-2c5f-46f1-b858-02419df98d9d']
INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 47 tokens
> [retrieve] Total embedding token usage: 47 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 2482 tokens
> [get_response] Total LLM token usage: 2482 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
> [get_response] Total embedding token usage: 0 tokens

 الذكاء الاصطناعي (A

In [17]:
# Query (Arabic): "What efforts has China made in developing artificial intelligence?"
response = query_engine.query("ما الجهود التي بذلتها الصين في تطوير الذكاء الاصطناعي?")
print("\n", textwrap.fill(str(response), 100))

INFO:llama_index.vector_stores.redis:Querying index chevy_docs
Querying index chevy_docs
INFO:llama_index.vector_stores.redis:Found 2 results for query with id ['blog_d1184027-d6c2-492b-9121-8d8d369388dc', 'blog_fb789e7d-32ba-4483-b4b2-b83589500f75']
Found 2 results for query with id ['blog_d1184027-d6c2-492b-9121-8d8d369388dc', 'blog_fb789e7d-32ba-4483-b4b2-b83589500f75']
INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 51 tokens
> [retrieve] Total embedding token usage: 51 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 2469 tokens
> [get_response] Total LLM token usage: 2469 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
> [get_response] Total embedding token usage: 0 tokens

 استخدام الذكاء الا

In [18]:
# Query (Arabic): "What are the national efforts in the field of artificial intelligence?"
response = query_engine.query("ماهي الجهود الوطنية في مجال الذكاء الاصطناعي?")
print("\n", textwrap.fill(str(response), 100))

INFO:llama_index.vector_stores.redis:Querying index chevy_docs
Querying index chevy_docs
INFO:llama_index.vector_stores.redis:Found 2 results for query with id ['blog_6f3f67e1-2c5f-46f1-b858-02419df98d9d', 'blog_84c122d7-bf27-4422-b1c7-140e6d03083c']
Found 2 results for query with id ['blog_6f3f67e1-2c5f-46f1-b858-02419df98d9d', 'blog_84c122d7-bf27-4422-b1c7-140e6d03083c']
INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 41 tokens
> [retrieve] Total embedding token usage: 41 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 1850 tokens
> [get_response] Total LLM token usage: 1850 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
> [get_response] Total embedding token usage: 0 tokens

 ุงูุฌููุฏ ุงููุทููุฉ ูู ู

In [12]:
# Prompt (Arabic): "Tell me a joke"
llm('عطيني نكتة')

'")\n\n# C:\\Python27\\python.exe -u "c:\\Users\\Ahmed\\Desktop\\python\\if_elif_else.py"\n# ูุฑุญุจุง ุจู ุจูุนุจุฉ ุงูููุช\n# ุฃูุชุจ ููุชุฉ: \n# ูุญุดุด ุณุฃู ุฃุจูู: ุงูุจูุฒูู ุจูุชุญุฑู ุจุงูููุชุฑุ ูุงู: ูุงููุฏู ุงูุจูุฒูู ุจูุชุญุฑู ุจุงูููุชุฑ.. ููู ุงููู ูุชุญุฑู ูู ุงูููุชุฑ\n# ุงูููุชุฉ ูุถุญูุฉ ุฌุฏุง! ูููููููููููููู\n# ูุฑุญุจุง ุจู ุจูุนุจุฉ ุงูููุช\n# ุฃูุชุจ ููุชุฉ: \n# ููุชู ุงุฎุฑู\n# ุงูููุชุฉ ูุถุญูุฉ ุฌุฏุง! ูููููููููููููู\n# ูุฑุญุจุง ุจู ุจูุนุจุฉ ุงูููุช\n# ุฃูุช'

In [16]:
llm_result = llm.generate(["ุนุทููู ููุชุฉ", "ุฎุจุฑูู ููุชุจุฉ"]*15)

In [17]:
len(llm_result.generations)

30

In [18]:
llm_result.generations[0]

[Generation(text=' ุงุญูู ูู ุงูุซุงููุฉ๐๐๐\n\nููุด ุงูุดุงูุจ ุจูุญุจ ุงูุญุฑุจ ุุุุุุุุุุ ูุงูู ุจูุดุฑุจ ุดุงู ุงูุจูุฏููุฉ ๐๐๐\n\nุจููููู ุทุงูุจ ูุด ุนุงุฑู ูุชููู ุงูุฌููุฒู ูุนุฏ ูุชุนูู ุงูููุฌุงุช ูุจุนุฏ ุณูุชูู ูุงุจู ุงูุฌููุฒู ูุนุฏ ูุชููู ุงูููุฌุงุช ุจุณ ุงูุฌููุฒู ุจูุทููุง ุตุญ\n\nุฒูุฌุฉ ุชููู ูุฒูุฌูุง: ุงุฐุง ุจุชุนุฑู ุงูุง ุงูุด ุงุณูู ุงุทููููุุุุุ ูุงููุง : ููุถูุน ูุด ูุทุฑูุญ ุงูุณูุฏุฉ๐๐๐๐\n\nุงุณุชุงุฐ ุจุญูููู ุนู ูุฑุฉ', generation_info={'finish_reason': 'length', 'logprobs': None})]
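
The `finish_reason` of `'length'` in the output above indicates that the completion stopped because it reached the token limit rather than ending naturally. When generating in bulk, it can be useful to find which completions were cut off. The sketch below works on a plain list of finish reasons; `truncated_indices` is a hypothetical helper, and the commented line shows how the reasons might be collected, assuming every generation carries a `generation_info` dictionary like the one shown.

```python
def truncated_indices(finish_reasons):
    """Return positions of completions that stopped because of the token limit."""
    return [i for i, reason in enumerate(finish_reasons) if reason == "length"]


# With llm_result from the cell above, the reasons could be gathered like this
# (assumes generation_info is populated for every generation):
# reasons = [gens[0].generation_info.get("finish_reason") for gens in llm_result.generations]
reasons = ["length", "stop", "length"]  # illustrative values
print(truncated_indices(reasons))
```

Completions flagged this way are candidates for a retry with a larger output token budget or a shorter prompt.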

In [11]:
response

Response(response="{'question': 'ุนุทูู ููุญุฉ ุชุงุฑูุฎูุฉ ุนู ุชุทูุฑ ุงูููุงุฐุฌ ููุบููุฉ ุงููุจูุฑุฉ', 'context': 'ุงูุชุชุงุญูุฉ\\nูููุฑ ุงูุนุงูููู ุจุชุญููููุงุช ูุจูููุฑุฉ ุฎููุงู ุงูุนุงููููู ุงููุงุถูููููุ ููููุน ุฒูุงู ุงูุฌุงุฆุญููุฉ ูุงูุนูููุฏุฉ ุฅูููู ุงููุถููุน ุงูุทุจูุนููู \\nูููู ุชุนููุฏ ูู ุงูุฃุดูููุงุก ุฅูููู ุทุจูุนุชูููุงุ ููููุฏ ุดูููุฏ ุงูุนุงูููู ุชูุฏูููุงู... ุงูุงุตุทูุงุนููู ุฎ", source_nodes=[NodeWithScore(node=Node(text='page_label: 50\n\nุณูุงู ุงููุตุ ูุฃู ุงูุฃูุฑ ุฃุนูู ูุฃุจุนุฏ ูู ุจูุงุก ูููุฐุฌ ูููุณ ุงูุฑูุงุจุท ุจูู ุงููุตูุต ููุทุ ูุฃูุฏ ุฃููุง ูุง ุฒููุง ุจุญุงุฌุฉ ุฅูู ูููุฐุฌ ููุง ูุงูุด ุงูุญุงุฌุฉ ุฅูู ูุนุงููุฑ ููุงุณ ุฏูููุฉ ุชุณุงุนุฏ ุนูู ูุนุฑูุฉ ูุณุชูู ุชูุฏู ุงูุขูุฉ ูู ูุญุงูุงุชูุง ููุจุดุฑ ุจุฅุดุฑุงู ุนููุงุก ุงูููุณ ุงูุฅ