# VKS document with llama3


In [1]:
%load_ext autoreload
%autoreload 2

## On its own


In [2]:
from haystack import Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.converters import PDFMinerToDocument
from haystack.components.preprocessors import DocumentCleaner
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter

In [3]:
raw_document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

pipeline = Pipeline()
pipeline.add_component("converter", PDFMinerToDocument())
pipeline.add_component("cleaner", DocumentCleaner())
pipeline.add_component("splitter", DocumentSplitter(split_by="sentence", split_length=20, split_overlap=15))
pipeline.add_component("writer", DocumentWriter(document_store=raw_document_store))

pipeline.connect("converter", "cleaner")
pipeline.connect("cleaner", "splitter")
pipeline.connect("splitter", "writer")

<haystack.core.pipeline.pipeline.Pipeline object at 0x79c9108ef140>
🚅 Components
  - converter: PDFMinerToDocument
  - cleaner: DocumentCleaner
  - splitter: DocumentSplitter
  - writer: DocumentWriter
🛤️ Connections
  - converter.documents -> cleaner.documents (List[Document])
  - cleaner.documents -> splitter.documents (List[Document])
  - splitter.documents -> writer.documents (List[Document])

In [4]:
file_names = ["sample.pdf"]
pipeline.run({"converter": {"sources": file_names}})

{'writer': {'documents_written': 687}}
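The splitter above uses `split_length=20` with `split_overlap=15`, so each new chunk starts only 5 sentences after the previous one and most sentences appear in several chunks. The sketch below illustrates that windowing idea in plain Python. The helper name `sliding_windows` and the sample data are hypothetical, and this is not Haystack's actual implementation; `DocumentSplitter` may handle sentence detection and the final window differently.

```python
def sliding_windows(units, split_length=20, split_overlap=15):
    """Group units into windows of split_length that share split_overlap units."""
    if split_overlap >= split_length:
        raise ValueError("split_overlap must be smaller than split_length")
    step = split_length - split_overlap  # 5 with the settings used above
    windows = []
    for start in range(0, len(units), step):
        windows.append(units[start:start + split_length])
        # Stop once a window reaches the end of the input.
        if start + split_length >= len(units):
            break
    return windows


# Placeholder "sentences" standing in for the cleaned PDF text.
sentences = [f"s{i}" for i in range(30)]
windows = sliding_windows(sentences)
for window in windows:
    print(window[0], "...", window[-1], f"({len(window)} sentences)")
```

With a stride of 5 and a window of 20, a sentence in the middle of a long document lands in up to four chunks, which increases both the number of stored chunks and the redundancy among retrieved ones.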

In [5]:
from haystack.components.embedders import SentenceTransformersDocumentEmbedder

In [6]:
document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

Initialize a document embedder

In [7]:
doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
doc_embedder.warm_up()

In [8]:
docs_with_embeddings = doc_embedder.run(raw_document_store.filter_documents())
document_store.write_documents(docs_with_embeddings["documents"])

687

In [9]:
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.ollama import OllamaChatGenerator
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever

In [10]:
text_embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")

In [11]:
template = [ChatMessage.from_user("""
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{question}}
Answer:
""")]

prompt_builder = ChatPromptBuilder(template=template)
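To make the template easier to picture, here is a rough plain-Python approximation of the text it produces once the retrieved documents and the question are filled in. The function `render_prompt` and the placeholder chunk strings are hypothetical; the real rendering is done by `ChatPromptBuilder` with Jinja, and its exact whitespace will differ from this sketch.

```python
def render_prompt(documents, question):
    """Approximate the rendered template: instruction, one line per document, question."""
    context = "\n".join(f"    {content}" for content in documents)
    return (
        "Given the following information, answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Placeholder strings standing in for retrieved document contents.
prompt = render_prompt(["first chunk text", "second chunk text"], "What is VKS?")
print(prompt)
```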

In [12]:
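chat_generator = OllamaChatGenerator(
    model="llama3.1:8b",
    # Print tokens as they arrive so the answer streams into the cell output.
    streaming_callback=lambda chunk: print(chunk.content, end="", flush=True),
    url="http://localhost:11434",
    generation_kwargs={
        # num_predict caps each reply at 100 tokens, so longer answers are cut off.
        "num_predict": 100,
        "temperature": 0.9,
    },
)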

Initialize the retriever

In [13]:
retriever = InMemoryEmbeddingRetriever(document_store)
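The retriever ranks the stored chunks against the query embedding using the similarity function chosen for the document store (`"cosine"` here). The following is a minimal brute-force sketch of that idea, assuming tiny two-dimensional vectors as stand-ins for real embeddings. The names `cosine_similarity`, `top_k`, and the sample data are hypothetical and do not reflect Haystack's internals.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query_embedding, stored, k=2):
    """Return the k highest-scoring (score, text) pairs from (text, embedding) pairs."""
    scored = [(cosine_similarity(query_embedding, emb), text) for text, emb in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy embeddings; real ones from all-MiniLM-L6-v2 have many more dimensions.
stored = [("chunk A", [1.0, 0.0]), ("chunk B", [0.0, 1.0]), ("chunk C", [1.0, 1.0])]
results = top_k([1.0, 0.1], stored, k=2)
for score, text in results:
    print(f"{text}: {score:.3f}")
```

Because cosine similarity depends only on direction, the query and the documents must be embedded with the same model, which is why both embedders in this notebook use `sentence-transformers/all-MiniLM-L6-v2`.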

In [14]:
basic_rag_pipeline = Pipeline()
# Add components to your pipeline
basic_rag_pipeline.add_component("text_embedder", text_embedder)
basic_rag_pipeline.add_component("retriever", retriever)
basic_rag_pipeline.add_component("prompt_builder", prompt_builder)
basic_rag_pipeline.add_component("llm", chat_generator)

In [15]:
# Now, connect the components to each other
basic_rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
basic_rag_pipeline.connect("retriever", "prompt_builder")
basic_rag_pipeline.connect("prompt_builder.prompt", "llm.messages")

<haystack.core.pipeline.pipeline.Pipeline object at 0x79c77afa7710>
🚅 Components
  - text_embedder: SentenceTransformersTextEmbedder
  - retriever: InMemoryEmbeddingRetriever
  - prompt_builder: ChatPromptBuilder
  - llm: OllamaChatGenerator
🛤️ Connections
  - text_embedder.embedding -> retriever.query_embedding (List[float])
  - retriever.documents -> prompt_builder.documents (List[Document])
  - prompt_builder.prompt -> llm.messages (List[ChatMessage])

# Asking questions

In [16]:
question = "What is VKS?"

response = basic_rag_pipeline.run({"text_embedder": {"text": question}, "prompt_builder": {"question": question}})

print(f"\n\n\n\n\n==========================\nAnswer: {response['llm']['replies'][0].text}")

VKS stands for VNGCloud Kubernetes Service. It appears to be a cloud-based platform that provides managed Kubernetes services, allowing users to create and manage clusters, nodes, volumes, load balancers, and other resources. VKS offers various features such as monitoring, event history, re-activation of default IAM service accounts, and garbage collection of unused containers and images.




Answer: VKS stands for VNGCloud Kubernetes Service. It appears to be a cloud-based platform that provides managed Kubernetes services, allowing users to create and manage clusters, nodes, volumes, load balancers, and other resources. VKS offers various features such as monitoring, event history, re-activation of default IAM service accounts, and garbage collection of unused containers and images.


In [17]:
question = "Compare VKS private clusters and VKS public clusters"

response = basic_rag_pipeline.run({"text_embedder": {"text": question}, "prompt_builder": {"question": question}})

print(f"\n\n\n\n\n\n\n================\nAnswer: {response['llm']['replies'][0].text}")

Here's a comparison of VKS (Virtual Kubernetes Service) private clusters and public clusters:

**Security**

* Private Cluster: Provides higher security with all connections within VNG Cloud's private network, minimizing the risk of external network attacks.
* Public Cluster: Offers medium security as it uses public IP addresses to communicate between nodes and control plane.

**Access Management**

* Private Cluster: Offers strict access control through VNG Cloud's internal network.
* Public Cluster: Allows easier access from anywhere with internet but






Answer: Here's a comparison of VKS (Virtual Kubernetes Service) private clusters and public clusters:

**Security**

* Private Cluster: Provides higher security with all connections within VNG Cloud's private network, minimizing the risk of external network attacks.
* Public Cluster: Offers medium security as it uses public IP addresses to communicate between nodes and control plane.

**Access Management**

* Private Cluster: Offe

In [18]:
question = "What is vngcloud-blockstorage-csi-driver?"

response = basic_rag_pipeline.run({"text_embedder": {"text": question}, "prompt_builder": {"question": question}})

print(f"\n\n\n\n\n\n\n====================\nAnswer: {response['llm']['replies'][0].text}")

vngcloud-blockstorage-csi-driver is a CSI (Container Storage Interface) driver developed by VNG Cloud, which allows Kubernetes clusters to manage and attach BlockStorage volumes from the VNG Cloud platform. It enables users to provision and use persistent storage for their containers in a Kubernetes environment.






Answer: vngcloud-blockstorage-csi-driver is a CSI (Container Storage Interface) driver developed by VNG Cloud, which allows Kubernetes clusters to manage and attach BlockStorage volumes from the VNG Cloud platform. It enables users to provision and use persistent storage for their containers in a Kubernetes environment.


In [19]:
question = "I want to create a pvc of 20Gi using vngcloud-blockstorage-csi-driver"

response = basic_rag_pipeline.run({"text_embedder": {"text": question}, "prompt_builder": {"question": question}})

print(f"\n\n\n\n\n==============================\nAnswer: {response['llm']['replies'][0].text}")

To create a Persistent Volume Claim (PVC) of 20Gi using the vngcloud-blockstorage-csi-driver, you would typically follow these steps:

1. **Create a Kubernetes cluster** on VNGCloud or use an existing one.
2. **Install Helm version 3.0 or higher**, as per the requirements for installing vngcloud-blockstorage-csi-driver.
3. **Add the vks-helm-charts repository** to your cluster using `helm repo add v




Answer: To create a Persistent Volume Claim (PVC) of 20Gi using the vngcloud-blockstorage-csi-driver, you would typically follow these steps:

1. **Create a Kubernetes cluster** on VNGCloud or use an existing one.
2. **Install Helm version 3.0 or higher**, as per the requirements for installing vngcloud-blockstorage-csi-driver.
3. **Add the vks-helm-charts repository** to your cluster using `helm repo add v


In [20]:
question = "How to use volume snapshot in VKS cluster"

response = basic_rag_pipeline.run({"text_embedder": {"text": question}, "prompt_builder": {"question": question}})

print(f"\n\n\n===========================\nAnswer: {response['llm']['replies'][0].text}")

To use Volume Snapshot in a VKS (VNGCloud Snapshot) cluster, follow these steps:

1. **Enable the Snapshot service**: Go to the VNGCloud console and select "Activate Snapshot Service" in the Block Store > Snapshot menu.
2. **Install Helm version 3.0 or higher**: Refer to the official Helm documentation for instructions on how to install Helm.
3. **Add the VKS Helm repository**: Run `helm repo add vks-helm-charts https://


Answer: To use Volume Snapshot in a VKS (VNGCloud Snapshot) cluster, follow these steps:

1. **Enable the Snapshot service**: Go to the VNGCloud console and select "Activate Snapshot Service" in the Block Store > Snapshot menu.
2. **Install Helm version 3.0 or higher**: Refer to the official Helm documentation for instructions on how to install Helm.
3. **Add the VKS Helm repository**: Run `helm repo add vks-helm-charts https://
