<a href="https://colab.research.google.com/github/Andrei-Larionov/RAGTest/blob/main/2023-11-30%20RAG%20demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Document Question Answering with LlamaIndex, OpenAI and Redis

![Redis](https://redis.com/wp-content/themes/wpx/assets/images/logo-redis.svg?auto=webp&quality=85,75&width=120)

This notebook uses OpenAI, Redis with Vector Similarity Search, and LlamaIndex to answer questions about the information contained in a document.
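As a rough illustration of what vector similarity search does, the sketch below ranks a few text chunks by cosine similarity to a query vector. The three-dimensional vectors and chunk labels are made-up toy values, not real OpenAI embeddings, and the helper names `cosine_similarity` and `top_k` are hypothetical; in the notebook itself Redis performs this search.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the ids of the k chunks most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), chunk_id)
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]


# Toy "embeddings" (assumed values, for illustration only)
toy_chunks = {
    "journal entries": [0.9, 0.1, 0.0],
    "account balances": [0.7, 0.6, 0.1],
    "printer setup": [0.0, 0.1, 0.9],
}
nearest = top_k([1.0, 0.2, 0.0], toy_chunks, k=2)
print(nearest)
```

The retrieved chunks are then passed to the language model as context for answering the question.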

In [1]:
!pip install -q llama_index redis html2text trafilatura
!pip install -q pypdf



In [3]:


from llama_index import (
    GPTVectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    ServiceContext,
)
from llama_index.vector_stores import RedisVectorStore



In [4]:
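import sys

import logging
# DEBUG is the most verbose level; switch to logging.INFO for quieter output.
# basicConfig already attaches a stdout handler, so no second handler is added
# (an extra StreamHandler would print every log line twice).
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)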

Initialize OpenAI. You need to supply the OpenAI API key (starts with `sk-...`) when prompted. You can find your API key at https://platform.openai.com/account/api-keys

In [5]:
import openai
import os
import getpass

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")
# Prompt for the key only when it is not already set in the environment
if not OPENAI_API_KEY:
    key = getpass.getpass(prompt='OpenAI Key: ', stream=None)
    os.environ['OPENAI_API_KEY'] = key

openai.api_key = os.getenv("OPENAI_API_KEY")
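The key lookup can also be wrapped in a small helper, sketched below under the assumption that a key is valid only if it starts with `sk-` as noted above. The name `resolve_api_key` and the injected `prompt` callable are hypothetical; they only exist so the logic can be exercised without an interactive prompt.

```python
def resolve_api_key(env, prompt):
    """Return the OpenAI key from `env`, asking via `prompt` if it is missing.

    `env` is any mapping (for example os.environ) and `prompt` is any callable
    that takes a prompt string and returns the typed value (for example
    getpass.getpass).
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        key = prompt("OpenAI Key: ").strip()
    if not key.startswith("sk-"):
        raise ValueError("expected an OpenAI key starting with 'sk-'")
    return key


# Demonstration with a fake environment and a fake prompt (placeholder keys)
from_env = resolve_api_key({"OPENAI_API_KEY": "sk-from-env"}, lambda _: "unused")
from_prompt = resolve_api_key({}, lambda _: "sk-typed-in")
print(from_env, from_prompt)
```

In the notebook this might be called as `resolve_api_key(os.environ, getpass.getpass)`, which keeps the key out of the notebook source.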

In [16]:
import os
import openai

# Read the key from the environment; never hardcode it in the notebook
openai.api_key = os.getenv("OPENAI_API_KEY")

### Install Redis Stack

Redis Search is used as the Vector Similarity Search engine for LlamaIndex. Instead of the in-notebook Redis Stack (https://redis.io/docs/getting-started/install-stack/), you can provision your own free Redis instance in the cloud at https://redis.com/try-free/

In [6]:
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install -y redis-stack-server  > /dev/null 2>&1
redis-stack-server --daemonize yes

deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb jammy main
Starting redis-stack-server, database path /var/lib/redis-stack


### Connect to Redis

By default, this notebook connects to the local instance of Redis Stack. If you have your own Redis Cloud instance, replace the REDIS_PASSWORD, REDIS_HOST and REDIS_PORT values with your own.

In [7]:
import redis
import os


REDIS_HOST = os.getenv("REDIS_HOST", "localhost")
REDIS_PORT = os.getenv("REDIS_PORT", "6379")
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "")
#Replace values above with your own if using Redis Cloud instance
#REDIS_HOST="redis-18374.c253.us-central1-1.gce.cloud.redislabs.com"
#REDIS_PORT=18374
#REDIS_PASSWORD="1TNxTEdYRDgIDKM2gDfasupCADXXXX"

#shortcut for redis-cli $REDIS_CONN command
if REDIS_PASSWORD!="":
  os.environ["REDIS_CONN"]=f"-h {REDIS_HOST} -p {REDIS_PORT} -a {REDIS_PASSWORD} --no-auth-warning"
else:
  os.environ["REDIS_CONN"]=f"-h {REDIS_HOST} -p {REDIS_PORT}"

REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"
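A password containing characters such as `@` or `/` would break the URL built above. The sketch below shows one way to guard against that by percent-encoding the password and omitting the credentials part when no password is set. The helper name `build_redis_url` and the sample host and password are hypothetical, and the sketch assumes the Redis client decodes percent-encoded passwords when parsing the URL.

```python
from urllib.parse import quote


def build_redis_url(host, port, password=""):
    """Build a redis:// URL, percent-encoding the password if one is given."""
    if password:
        # safe="" makes quote() encode every reserved character, including "/"
        return f"redis://:{quote(password, safe='')}@{host}:{port}"
    return f"redis://{host}:{port}"


local_url = build_redis_url("localhost", "6379")
cloud_url = build_redis_url("example.host", 18374, "p@ss/word")
print(local_url)
print(cloud_url)
```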



In [None]:
%%sh
# Start from a clean docs directory, then download the sample PDF into it
rm -rf docs
mkdir -p docs
wget "https://raw.githubusercontent.com/Andrei-Larionov/RAGTest/main/General%20ledger%20users%20guide.pdf?raw=true" \
  -O "./docs/General%20ledger%20users%20guide.pdf"

--2023-09-25 21:56:38--  https://raw.githubusercontent.com/Andrei-Larionov/RAGTest/main/General%20ledger%20users%20guide.pdf?raw=true
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6751120 (6.4M) [application/octet-stream]
Saving to: ‘./docs/General%20ledger%20users%20guide.pdf -P docs’


In [8]:
documents = SimpleDirectoryReader('./docs').load_data()

In [11]:
#documents

In [10]:
#help(documents)

### Load web documents

Optionally, load web documents instead of the PDF to answer questions. The cell below is commented out; uncomment it and replace the links with the ones you would like to use.

In [None]:
#documents = TrafilaturaWebReader().load_data(
#    [
#        "https://www.cnn.com/2023/05/18/media/disney-florida-desantis/index.html",
#        "https://www.cnn.com/2022/11/12/business/disney-hiring-freeze-job-cuts/index.html"
#        ]
#)


In [None]:
# optionally examine the retrieved documents
#documents

### Create vector store using Redis as Vector Database

In [12]:
print(f"Using Redis address: {REDIS_URL}")
vector_store = RedisVectorStore(
    index_name="docs",
    index_prefix="orcl",
    redis_url=REDIS_URL,
    overwrite=True
)
vector_store.client.ping()

Using Redis address: redis://:@localhost:6379


True

In [17]:

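storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Split documents into 100-token chunks that overlap by 20 tokens
service_context = ServiceContext.from_defaults(chunk_size=100, chunk_overlap=20)
index = GPTVectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context  # without this, the chunk settings are ignored
)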

[nltk_data] Downloading package punkt to /tmp/llama_index...
[nltk_data]   Unzipping tokenizers/punkt.zip.
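To make the `chunk_size=100, chunk_overlap=20` settings concrete, here is a minimal sliding-window sketch. It is a simplification, not LlamaIndex's splitter: it treats each list item as one token and ignores sentence boundaries, and the name `chunk_tokens` is hypothetical.

```python
def chunk_tokens(tokens, chunk_size=100, chunk_overlap=20):
    """Split a token list into windows of chunk_size that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input
        if start + chunk_size >= len(tokens):
            break
    return chunks


words = [f"w{i}" for i in range(250)]
chunks = chunk_tokens(words)
print([len(c) for c in chunks])
```

Each chunk repeats the last 20 tokens of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk. Smaller chunks give more precise matches but less context per retrieved chunk.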


In [19]:
from IPython.display import Markdown, display

In [21]:
query_engine = index.as_query_engine()
response = query_engine.query("how can i use In-Situ Adaptive Tabulation in ansys fluent. Please provide detailed instructions")
display(Markdown(f"<b>{response}</b>"))


<b>To use In-Situ Adaptive Tabulation in Ansys Fluent, follow these steps:

1. Open Ansys Fluent and load your simulation case.
2. Go to the "Adaptive" tab in the Fluent interface.
3. Click on the "In-Situ Adaptive Tabulation" option.
4. In the In-Situ Adaptive Tabulation window, you will find various settings and options.
5. Specify the variables that you want to use for tabulation by selecting them from the available options.
6. Set the desired refinement criteria for the tabulation process.
7. Adjust the refinement levels and other parameters according to your simulation requirements.
8. Click on the "Generate" button to start the tabulation process.
9. Once the tabulation is complete, you can use the generated tabulated data for further analysis or visualization.

Please note that the specific steps and options may vary depending on the version of Ansys Fluent you are using. It is recommended to refer to the official Ansys Fluent documentation or user guide for detailed instructions specific to your version.</b>

In [22]:
response

Response(response='To use In-Situ Adaptive Tabulation in Ansys Fluent, follow these steps:\n\n1. Open Ansys Fluent and load your simulation case.\n2. Go to the "Adaptive" tab in the Fluent interface.\n3. Click on the "In-Situ Adaptive Tabulation" option.\n4. In the In-Situ Adaptive Tabulation window, you will find various settings and options.\n5. Specify the variables that you want to use for tabulation by selecting them from the available options.\n6. Set the desired refinement criteria for the tabulation process.\n7. Adjust the refinement levels and other parameters according to your simulation requirements.\n8. Click on the "Generate" button to start the tabulation process.\n9. Once the tabulation is complete, you can use the generated tabulated data for further analysis or visualization.\n\nPlease note that the specific steps and options may vary depending on the version of Ansys Fluent you are using. It is recommended to refer to the official Ansys Fluent documentation or user gu