## Creating an index and populating it with documents using Elasticsearch

A simple example of how to ingest PDF documents, then web page content, into an Elasticsearch vector store.

Requirements:
- An Elasticsearch cluster reachable at the URL configured below
- An Elasticsearch user allowed to create and write to the index

In [None]:
!pip install langchain langchain-community sentence-transformers elasticsearch pypdf langchain-elasticsearch beautifulsoup4

### Base parameters, the Elasticsearch info

In [28]:
es_url = "http://es-vectordb-es-http:9200"
es_user = "elastic"
es_password = "xxxxxxxxxxxxxxxxxxxxxxx"
index = "docs"
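
The password above is a placeholder. As a minimal sketch, the same four settings could instead be read from environment variables so that credentials stay out of the notebook. The variable names `ES_URL`, `ES_USER`, `ES_PASSWORD` and `ES_INDEX`, and the helper `load_es_settings`, are assumptions made for illustration and are not part of the original setup.

```python
import os


def load_es_settings(env=None):
    """Return the connection settings, falling back to the notebook defaults."""
    env = os.environ if env is None else env
    return {
        "es_url": env.get("ES_URL", "http://es-vectordb-es-http:9200"),
        "es_user": env.get("ES_USER", "elastic"),
        # No default for the password: a missing value raises KeyError early.
        "es_password": env["ES_PASSWORD"],
        "index": env.get("ES_INDEX", "docs"),
    }


# Example with an explicit mapping instead of the real environment.
settings = load_es_settings({"ES_PASSWORD": "example-only"})
print(settings["es_url"], settings["index"])
```

The returned values can then be assigned to `es_url`, `es_user`, `es_password` and `index` before the rest of the notebook runs.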

#### Imports

In [31]:
# Loaders and embeddings live in langchain-community; the splitter in langchain-text-splitters.
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_elasticsearch import ElasticsearchStore

## Initial index creation and document ingestion

#### Document loading from a folder containing PDFs

In [32]:
pdf_folder_path = 'rhods-doc'

loader = PyPDFDirectoryLoader(pdf_folder_path)
docs = loader.load()
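
Before splitting, it can help to check what was actually loaded. The loader returns one document per PDF page, with the file path stored under the `source` metadata key. The sketch below counts pages per file. It uses stand-in objects and made-up file names (`a.pdf`, `b.pdf`) so that it runs without the PDF folder, and the helper `pages_per_source` is hypothetical.

```python
from collections import Counter
from types import SimpleNamespace


def pages_per_source(documents):
    """Count how many loaded pages come from each source file."""
    return Counter(doc.metadata.get("source", "unknown") for doc in documents)


# Stand-in objects mimicking loaded pages; in the notebook, pass `docs` instead.
sample_docs = [
    SimpleNamespace(page_content="Intro", metadata={"source": "rhods-doc/a.pdf", "page": 0}),
    SimpleNamespace(page_content="Setup", metadata={"source": "rhods-doc/a.pdf", "page": 1}),
    SimpleNamespace(page_content="Notes", metadata={"source": "rhods-doc/b.pdf", "page": 0}),
]

counts = pages_per_source(sample_docs)
for source, n_pages in sorted(counts.items()):
    print(f"{source}: {n_pages} page(s)")
```

A file that is missing from the output, or that has far fewer pages than expected, is worth checking before going further.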

#### Split documents into chunks with some overlap

In [33]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024,
                                               chunk_overlap=40)
all_splits = text_splitter.split_documents(docs)
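
To make the two parameters concrete: `chunk_size` caps the length of each chunk in characters, and `chunk_overlap` is how many characters two neighbouring chunks may share. The sketch below is a deliberately simplified fixed-window version written for illustration. It is not how `RecursiveCharacterTextSplitter` works internally, since that splitter prefers to cut at separators such as paragraph breaks and spaces. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut `text` into fixed-size windows that share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


example_chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(example_chunks)
```

With the notebook's values (1024 and 40), each chunk would repeat at most the last 40 characters of the previous one, which helps keep sentences that straddle a boundary retrievable.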

#### Create the index and ingest the documents

In [35]:
embeddings = HuggingFaceEmbeddings()
rds = ElasticsearchStore(embedding=embeddings,
                         es_url=es_url,
                         es_user=es_user,
                         es_password=es_password,
                         index_name=index)


#### Ingest the documents

In [None]:
rds.add_documents(all_splits)
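
Re-running the cell above may store every chunk a second time, because no document IDs are supplied. One possible way around this is to derive a repeatable ID from each chunk's source and text. The sketch below only computes such IDs: the helper `stable_id` is hypothetical, the sample objects are stand-ins, and passing the result as an `ids` keyword to `add_documents` is an assumption to verify against the installed `langchain-elasticsearch` version.

```python
import hashlib
from types import SimpleNamespace


def stable_id(doc):
    """Derive a repeatable ID from a chunk's source and text."""
    source = str(doc.metadata.get("source", ""))
    digest = hashlib.sha256()
    digest.update(source.encode("utf-8"))
    digest.update(b"\x00")  # separator so source and text cannot run together
    digest.update(doc.page_content.encode("utf-8"))
    return digest.hexdigest()


# Stand-in chunks; in the notebook these would be the items of `all_splits`.
sample_splits = [
    SimpleNamespace(page_content="First chunk", metadata={"source": "rhods-doc/a.pdf"}),
    SimpleNamespace(page_content="Second chunk", metadata={"source": "rhods-doc/a.pdf"}),
]
ids = [stable_id(doc) for doc in sample_splits]
print(ids)

# Assumed usage, not verified here:
# rds.add_documents(all_splits, ids=[stable_id(d) for d in all_splits])
```

Two chunks with identical text from the same source get the same ID, so exact duplicates would collapse into one entry.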

#### Reopening the index later on

In [6]:
# No schema file is needed here: `write_schema` belongs to the Redis vector store.
# To reopen the index, create an ElasticsearchStore with the same `index_name`.

## Ingesting new documents

#### Example with Web pages

In [None]:
from langchain_community.document_loaders import WebBaseLoader

In [8]:
loader = WebBaseLoader(["https://ai-on-openshift.io/getting-started/openshift/",
                        "https://ai-on-openshift.io/getting-started/opendatahub/",
                        "https://ai-on-openshift.io/getting-started/openshift-data-science/",
                        "https://ai-on-openshift.io/odh-rhods/configuration/",
                        "https://ai-on-openshift.io/odh-rhods/custom-notebooks/",
                        "https://ai-on-openshift.io/odh-rhods/nvidia-gpus/",
                        "https://ai-on-openshift.io/odh-rhods/custom-runtime-triton/",
                        "https://ai-on-openshift.io/odh-rhods/openshift-group-management/",
                        "https://ai-on-openshift.io/tools-and-applications/minio/minio/"
                       ])

In [9]:
data = loader.load()

In [10]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024,
                                               chunk_overlap=40)
all_splits = text_splitter.split_documents(data)

In [None]:
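# Open the existing index (named by `index` above) without ingesting anything yet;
# `from_documents` would already add the chunks, and the next cell adds them.
embeddings = HuggingFaceEmbeddings()
rds = ElasticsearchStore(embedding=embeddings,
                         es_url=es_url,
                         es_user=es_user,
                         es_password=es_password,
                         index_name=index)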

In [None]:
rds.add_documents(all_splits)
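
For larger collections, it may be more comfortable to ingest in fixed-size batches, so that each call stays small and progress is visible. The sketch below shows only the batching logic on a stand-in list. The helper `batches` and the batch size of 3 are illustrative choices, not values from the original notebook.

```python
def batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Stand-in list; in the notebook this would be `all_splits`.
sample_splits = [f"chunk-{i}" for i in range(7)]
for number, batch in enumerate(batches(sample_splits, 3), start=1):
    # In the notebook, this is where `rds.add_documents(batch)` would go.
    print(f"batch {number}: {len(batch)} chunk(s)")
```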