<a href="https://colab.research.google.com/github/Andrei-Larionov/RAGTest/blob/main/06-LlamaIndex_Redis%20/06.1_OpenAI_LlamaIndex_Redis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Document Question Answering with LlamaIndex, OpenAI and Redis

![Redis](https://redis.com/wp-content/themes/wpx/assets/images/logo-redis.svg?auto=webp&quality=85,75&width=120)

This notebook would use OpenAI, Redis with Vector Similarity Search and LlamaIndex to answer questions about the information contained in a document.

In [55]:
!pip install -q llama_index redis html2text trafilatura
!pip install -q pypdf

In [56]:


from llama_index import (
      TrafilaturaWebReader,
      GPTVectorStoreIndex,
      SimpleDirectoryReader,
      StorageContext,
      ServiceContext
    )
from llama_index.vector_stores import RedisVectorStore



In [57]:
import sys

import logging
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG) # logging.DEBUG for more verbose output
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

Initialize OpenAI. You need to supply the OpenAI API key (starts with `sk-...`) when prompted. You can find your API key at https://platform.openai.com/account/api-keys

In [59]:
import openai
import os
import getpass

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY","")
if OPENAI_API_KEY == "":
    key=getpass.getpass(prompt='OpenAI Key: ', stream=None)
    os.environ['OPENAI_API_KEY']=key

openai.api_key = os.getenv("OPENAI_API_KEY")

### Install Redis Stack

Redis Search will be used as Vector Similarity Search engine for LangChain. Instead of using in-notebook Redis Stack https://redis.io/docs/getting-started/install-stack/ you can provision your own free instance of Redis in the cloud. Get your own Free Redis Cloud instance at https://redis.com/try-free/

In [60]:
%%sh
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update  > /dev/null 2>&1
sudo apt-get install redis-stack-server  > /dev/null 2>&1
redis-stack-server --daemonize yes

deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb jammy main
Starting redis-stack-server, database path /var/lib/redis-stack


gpg: cannot open '/dev/tty': No such device or address
curl: (23) Failed writing body


### Connect to Redis

By default this notebook would connect to the local instance of Redis Stack. If you have your own Redis Cloud instance - replace REDIS_PASSWORD, REDIS_HOST and REDIS_PORT values with your own.

In [61]:
import redis
import os


REDIS_HOST = os.getenv("REDIS_HOST", "localhost")
REDIS_PORT = os.getenv("REDIS_PORT", "6379")
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD", "")
#Replace values above with your own if using Redis Cloud instance
#REDIS_HOST="redis-18374.c253.us-central1-1.gce.cloud.redislabs.com"
#REDIS_PORT=18374
#REDIS_PASSWORD="1TNxTEdYRDgIDKM2gDfasupCADXXXX"

#shortcut for redis-cli $REDIS_CONN command
if REDIS_PASSWORD!="":
  os.environ["REDIS_CONN"]=f"-h {REDIS_HOST} -p {REDIS_PORT} -a {REDIS_PASSWORD} --no-auth-warning"
else:
  os.environ["REDIS_CONN"]=f"-h {REDIS_HOST} -p {REDIS_PORT}"

REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"



In [31]:
%%sh
rm -rf docs
if [ ! -d "docs" ]
then
  mkdir docs
  wget https://raw.githubusercontent.com/Andrei-Larionov/RAGTest/main/General%20ledger%20users%20guide.pdf?raw=true \
  -O ./docs/General%20ledger%20users%20guide.pdf -P docs
fi

--2023-09-25 21:56:38--  https://raw.githubusercontent.com/Andrei-Larionov/RAGTest/main/General%20ledger%20users%20guide.pdf?raw=true
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6751120 (6.4M) [application/octet-stream]
Saving to: ‘./docs/General%20ledger%20users%20guide.pdf -P docs’

     0K .......... .......... .......... .......... ..........  0%  656K 10s
    50K .......... .......... .......... .......... ..........  1% 4.21M 6s
   100K .......... .......... .......... .......... ..........  2% 1.85M 5s
   150K .......... .......... .......... .......... ..........  3% 5.63M 4s
   200K .......... .......... .......... .......... ..........  3% 7.33M 3s
   250K .......... .......... .......... .......... ..........  4% 9.04M 3s
   300K .......... ..

In [70]:
documents = SimpleDirectoryReader('./docs').load_data()

In [72]:
help(documents)

Help on list object:

class list(object)
 |  list(iterable=(), /)
 |  
 |  Built-in mutable sequence.
 |  
 |  If no argument is given, the constructor creates a new empty list.
 |  The argument must be an iterable if specified.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __iadd__(self, value, /)
 |      Implement self+=value.
 |  
 |  __imul__(self, value, /)
 |      Implement self*=value.
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate sign

### Load web documents

Load web documents that would be used to answer questions. Feel free to replace the links with the ones you would like to use.

In [8]:
#documents = TrafilaturaWebReader().load_data(
#    [
#        "https://www.cnn.com/2023/05/18/media/disney-florida-desantis/index.html",
#        "https://www.cnn.com/2022/11/12/business/disney-hiring-freeze-job-cuts/index.html"
#        ]
#)


In [9]:
# optionally examine the retrieved documents
#documents

### Create vector store using Redis as Vector Database

In [73]:
print(f"Using Redis address: {REDIS_URL}")
vector_store = RedisVectorStore(
    index_name="docs",
    index_prefix="orcl",
    redis_url=REDIS_URL,
    overwrite=True
)
vector_store.client.ping()

Using Redis address: redis://:@localhost:6379


True

In [74]:
storage_context = StorageContext.from_defaults(vector_store=vector_store)
service_context = ServiceContext.from_defaults(chunk_size=100, chunk_overlap=20)
index = GPTVectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context
)

## Finally - let's ask questions!

Examples:
- What plans is Disney cancelling?
- Who is Bob Chapek?
- Why Disney cancelling the plans?

In [75]:
query_engine = index.as_query_engine()
response = query_engine.query("I am very confused about what flexfield is, and what is the difference between descriptive and key. Can you explain please")
print(str(response).replace(".", ".\n"))

Flexfields are a feature in Oracle General Ledger that allow you to customize and tailor the application to fit your organization's unique information needs.
 There are two types of flexfields: descriptive flexfields and key flexfields.


Descriptive flexfields are used to collect additional information that is specific to your organization.
 They allow you to define segments and prompt for additional information based on previous entries.
 For example, you can define a descriptive flexfield to collect information about budget organizations, such as the manager and the size of the organization.
 Descriptive flexfields can be global, collecting the same information all the time, or context-sensitive, collecting different information depending on the situation.


Key flexfields, on the other hand, are used to define the structure of your General Ledger accounts.
 They allow you to design an account structure that best meets the needs of your organization.
 Key flexfields enable you to de