## Knowledge Creation - Storing docs on a ChromaDB instance

In [2]:
from chromadb import HttpClient
from dotenv import load_dotenv
import os

load_dotenv()

chroma_client = HttpClient(host=os.environ['HOST'], port=8000)
chroma_client

<chromadb.api.client.Client at 0x7fbbf176ae70>

In [3]:
chroma_client.list_collections()

[]

In [5]:
# Load and parse the Word files using LangChain
from langchain_community.document_loaders import Docx2txtLoader, DirectoryLoader

loader = DirectoryLoader(
    path="./RH_Docs",
    glob="**/*.docx",           # only pick up Word files
    loader_cls=Docx2txtLoader,  # parse each matching file with Docx2txtLoader
)

docs = loader.load()
print(f"Loaded {len(docs)} documents!")

Loaded 2 documents!


In [6]:
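# Split the loaded documents into chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter

txt_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", " ", "."],  # tried in order, coarsest first
    chunk_size=8192,   # maximum chunk length, measured in characters
    chunk_overlap=0    # no text shared between neighbouring chunks
)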

doc_chunks = txt_splitter.split_documents(docs)
print(f"Total chunks: {len(doc_chunks)}")

Total chunks: 13
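To make the role of the `separators` list more concrete, here is a simplified, pure-Python sketch of recursive splitting. It is not LangChain's implementation; `recursive_split` is a hypothetical helper that only illustrates the idea of trying the coarsest separator first and falling back to finer ones when a piece is still too long.

```python
def recursive_split(text, separators, chunk_size):
    """Split text into chunks of at most chunk_size characters.

    Separators are tried in order. Pieces that are still too long are
    split again with the remaining separators; small neighbouring pieces
    are merged back together while they fit.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to a hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    if sep not in text:
        return recursive_split(text, rest, chunk_size)

    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate          # keep merging while it fits
            continue
        if current:
            chunks.append(current)       # flush what has been merged so far
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: recurse with finer separators.
            chunks.extend(recursive_split(piece, rest, chunk_size))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa bbbb\n\ncccc dddd eeee"
print(recursive_split(sample, ["\n\n", "\n", " "], chunk_size=10))
```

With a `chunk_size` as large as 8192 characters, most splits in the cell above probably happen at paragraph breaks, and the finer separators only come into play for unusually long paragraphs.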


In [7]:
# Now that we have the chunks, load them into the vector database
# along with the embedding function that turns each chunk into a vector
from langchain_chroma.vectorstores import Chroma
from langchain_ollama.embeddings import OllamaEmbeddings

EMBEDDING_MODEL = "bge-m3:latest"

vector_store = Chroma(
    collection_name="red_hat",
    embedding_function=OllamaEmbeddings(model=EMBEDDING_MODEL, base_url=f"http://{os.environ['HOST']}:11434"),
    client=chroma_client
)

chunk_ids = vector_store.add_documents(doc_chunks)
print(chunk_ids)

['ee573577-bb36-4331-8d25-ba863ae091de', 'b295135e-70ae-4064-bffc-412daf7bfa8e', '306fbc5f-6851-4063-a3fa-f48122741403', '668dd304-cd2f-4d67-822f-bbe8de85a218', 'ef45a6f6-bd5f-4d89-ba62-2ab0890fda3a', 'e035088f-e149-4eae-8391-650bddca8929', '22cc3995-4911-451f-bb7d-04f65c3f1928', 'bb3308b8-42cc-4e80-8c7f-15450210564f', 'aa2564fb-53ae-40af-b09a-60e2991a6d11', '2fbebd78-a6e2-4b1e-858a-38c04dc1eede', 'adf4f1dd-d5aa-4280-8b43-c3410630b59b', '72d98749-61fe-4448-8832-6b870132f4fd', 'b0942fe9-fbf9-4d5e-98e2-6cd458cf3a2e']
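The IDs printed above are random UUIDs, so re-running the cell would likely store the same chunks again under new IDs. One possible way around this is to derive each ID from the chunk itself. The sketch below is only an illustration: `chunk_id` is a hypothetical helper, and the commented usage assumes that `add_documents` accepts an `ids` argument, as recent LangChain versions do.

```python
import hashlib


def chunk_id(source: str, content: str) -> str:
    """Return a stable ID built from the chunk's source path and its text."""
    digest = hashlib.sha256(f"{source}\n{content}".encode("utf-8")).hexdigest()
    return digest[:32]  # shortened for readability; still effectively unique


# Hypothetical usage with the objects from the cell above (not executed here):
# ids = [chunk_id(c.metadata.get("source", ""), c.page_content) for c in doc_chunks]
# chunk_ids = vector_store.add_documents(doc_chunks, ids=ids)

print(chunk_id("RH_Docs/example.docx", "Back up your cluster's etcd data"))
```

Note that two chunks with identical source and text would map to the same ID under this scheme, which may or may not be what you want.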


In [8]:
top_searches = vector_store.similarity_search(query="Tell me about RHOCP?", k=2)

for search in top_searches:
    print(search.page_content)

Back up your cluster’s etcd data by performing a single invocation of the backup script on a control plane host.

Only save a backup from a single control plane host. Do not take a backup from each control plane host in the cluster.

Decisions:

1.	CUSTOMER Team has confirmed to take etcd backup and store in NFS Server. Details are added in the Pre-Req sheet.

Note: At This point of Time, complete rollback up of the Openshift cluster is not supported. For more information please refer to:

https://docs.openshift.com/container-platform/4.14/backup_and_restore/control_plane_backup_an d_restore/backing-up-etcd.html

12 DNS requirements(Section 23.1.5):

https://access.redhat.com/documentation/en-us/openshift_container_platform/4.14/html/installing/installing-on-an y-platform#installation-user-provisioned-validating-dns_installing-platform-agnostic



Software Versions

Product Version OpenShift Container Platform 4.14 OpenShift Data Foundation 4.14 Red Hat Enterprise Linux 9.x

Hardware S
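Conceptually, `similarity_search` embeds the query and returns the `k` stored chunks whose vectors are closest to it. The following toy sketch shows that ranking step with cosine similarity on hand-made three-dimensional vectors. The vectors, the chunk names, and the choice of cosine similarity are all assumptions for illustration; the metric actually used depends on how the Chroma collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=2):
    """Return the names of the k chunks most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), name)
              for name, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Made-up embeddings standing in for real bge-m3 vectors.
toy_vectors = {
    "etcd-backup": [0.9, 0.1, 0.0],
    "dns": [0.1, 0.9, 0.0],
    "versions": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.0, 0.0], toy_vectors, k=2))
```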

#### Similarity Search Test

In [9]:
top_searches = vector_store.similarity_search(query="Tell me about bastion?", k=2)

for search in top_searches:
    print(search.page_content)

Back up your cluster’s etcd data by performing a single invocation of the backup script on a control plane host.

Only save a backup from a single control plane host. Do not take a backup from each control plane host in the cluster.

Decisions:

1.	CUSTOMER Team has confirmed to take etcd backup and store in NFS Server. Details are added in the Pre-Req sheet.

Note: At This point of Time, complete rollback up of the Openshift cluster is not supported. For more information please refer to:

https://docs.openshift.com/container-platform/4.14/backup_and_restore/control_plane_backup_an d_restore/backing-up-etcd.html

12 DNS requirements(Section 23.1.5):

https://access.redhat.com/documentation/en-us/openshift_container_platform/4.14/html/installing/installing-on-an y-platform#installation-user-provisioned-validating-dns_installing-platform-agnostic



Software Versions

Product Version OpenShift Container Platform 4.14 OpenShift Data Foundation 4.14 Red Hat Enterprise Linux 9.x

Hardware S

In [10]:
top_searches = vector_store.similarity_search(query="Tell me about control plane?", k=2)

for search in top_searches:
    print(search.page_content)

Back up your cluster’s etcd data by performing a single invocation of the backup script on a control plane host.

Only save a backup from a single control plane host. Do not take a backup from each control plane host in the cluster.

Decisions:

1.	CUSTOMER Team has confirmed to take etcd backup and store in NFS Server. Details are added in the Pre-Req sheet.

Note: At This point of Time, complete rollback up of the Openshift cluster is not supported. For more information please refer to:

https://docs.openshift.com/container-platform/4.14/backup_and_restore/control_plane_backup_an d_restore/backing-up-etcd.html

12 DNS requirements(Section 23.1.5):

https://access.redhat.com/documentation/en-us/openshift_container_platform/4.14/html/installing/installing-on-an y-platform#installation-user-provisioned-validating-dns_installing-platform-agnostic



Software Versions

Product Version OpenShift Container Platform 4.14 OpenShift Data Foundation 4.14 Red Hat Enterprise Linux 9.x

Hardware S
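All three queries above ("RHOCP", "bastion", "control plane") print the same two chunks. A simple way to quantify this is to measure how much the result sets of different queries overlap. The sketch below uses Jaccard overlap; `jaccard` and `pairwise_overlap` are hypothetical helpers, and the sample result lists are placeholders rather than real search output.

```python
def jaccard(a, b):
    """Jaccard overlap of two result lists: 1.0 means identical sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def pairwise_overlap(results_by_query):
    """Map each pair of queries to the overlap of their result sets."""
    queries = list(results_by_query)
    return {
        (q1, q2): jaccard(results_by_query[q1], results_by_query[q2])
        for i, q1 in enumerate(queries)
        for q2 in queries[i + 1:]
    }


# Placeholder results; in the notebook these could be built with, for example:
# {q: [d.page_content for d in vector_store.similarity_search(q, k=2)] for q in queries}
sample_results = {
    "RHOCP": ["chunk-a", "chunk-b"],
    "bastion": ["chunk-a", "chunk-b"],
    "control plane": ["chunk-a", "chunk-c"],
}
for pair, score in pairwise_overlap(sample_results).items():
    print(pair, round(score, 2))
```

Overlap values near 1.0 for unrelated queries would suggest that retrieval is not discriminating between them. In that case it may be worth inspecting the scores with `similarity_search_with_score`, trying a smaller `chunk_size`, or checking that the embedding endpoint returns distinct vectors for distinct inputs.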