In [1]:
%pip install langchain pgvector psycopg2-binary tiktoken kubernetes

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [9]:
import os, json
from langchain.llms import HuggingFacePipeline
from langchain.llms import HuggingFaceTextGenInference
from langchain.document_loaders import TextLoader
from langchain.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain.vectorstores.pgvector import PGVector

from kubernetes import client, config

In [20]:
# get the ip address of our Mistral 7B inference service
config.load_kube_config()
api_instance = client.CoreV1Api()

mistral_7b = api_instance.read_namespaced_service(name="mistral-7b-service", namespace="default")
mistral_7b_ip = mistral_7b.status.load_balancer.ingress[0].ip
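
The lookup above assumes the load balancer already has an address and that it is exposed as an IP. While a `LoadBalancer` service is still being provisioned, `ingress` is `None`, and some providers publish a hostname instead of an IP. A minimal defensive sketch follows; `ingress_address` is a hypothetical helper, not part of the Kubernetes client, and the `SimpleNamespace` objects only imitate the fields read above.

```python
from types import SimpleNamespace


def ingress_address(service):
    """Return the first external IP or hostname of a service, or None if not ready."""
    load_balancer = getattr(service.status, "load_balancer", None)
    entries = getattr(load_balancer, "ingress", None) or []
    for entry in entries:
        address = getattr(entry, "ip", None) or getattr(entry, "hostname", None)
        if address:
            return address
    return None


# Stand-in objects shaped like the fields used above (not real API responses).
pending = SimpleNamespace(
    status=SimpleNamespace(load_balancer=SimpleNamespace(ingress=None))
)
ready = SimpleNamespace(
    status=SimpleNamespace(
        load_balancer=SimpleNamespace(
            ingress=[SimpleNamespace(ip="10.0.0.5", hostname=None)]
        )
    )
)

print(ingress_address(pending))  # None
print(ingress_address(ready))    # 10.0.0.5
```

In the notebook, `ingress_address(mistral_7b)` could replace the direct `ingress[0].ip` access, with a check for `None` before building the inference URL.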

In [21]:
# test our Mistral 7B interface with langchain
llm = HuggingFaceTextGenInference(
    inference_server_url=f"http://{mistral_7b_ip}",
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
)
llm("[INST] You are a helpful, respectful and honest assistant who is an expert in explaining Kubernetes concepts. Always answer as helpfully as possible, while being safe and keep your responses less than 200 words. What is a deployment?[/INST]")

' A deployment in Kubernetes is a way to manage the lifecycle of a set of replicas of a pod. It allows you to specify the desired state of the deployment, including the number of replicas, the desired CPU and memory resources, and the container image to use. Kubernetes will then automatically manage the deployment, scaling it up or down as needed to meet the desired state. Deployments also provide rolling updates, which means that new versions of the pods are introduced gradually, with each new version replacing one of the old ones. This ensures that the application remains available during the update process.'

In [4]:
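# Load the list of URLs (pages under kubernetes.io/docs/concepts/)
with open('./data/k8s-urls.samples.txt', 'r') as url_file:
    # Strip trailing newlines and skip blank lines so each entry is a clean URL
    urls = [line.strip() for line in url_file if line.strip()]

loader = WebBaseLoader(urls)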
documents = loader.load()

In [44]:
# Chunk all the Kubernetes concept documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
docs = text_splitter.split_documents(documents)

print("%s chunks in %s pages" % (len(docs), len(documents)))

102 chunks in 2 pages
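
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of fixed-size chunking with overlap. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break on separators such as paragraph and line boundaries); `chunk_with_overlap` is a hypothetical helper that only illustrates the two parameters.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters sharing chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_with_overlap(sample, chunk_size=1000, chunk_overlap=20)
print([len(c) for c in chunks])          # [1000, 1000, 540]
print(chunks[0][-20:] == chunks[1][:20])  # True
```

The overlap means the tail of one chunk is repeated at the head of the next, so a sentence cut at a boundary still appears whole in at least one chunk.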


In [6]:
# Load sentence transformer embeddings
model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {"device":"cpu"} # use {"device":"cuda"} to compute embeddings on a GPU

embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)

In [7]:
# Connection string for connecting to Postgres
CONNECTION_STRING = PGVector.connection_string_from_db_params(
    driver=os.environ.get("PGVECTOR_DRIVER", "psycopg2"),
    host=os.environ.get("PGVECTOR_HOST", "localhost"),
    port=int(os.environ.get("PGVECTOR_PORT", "5432")),
    database=os.environ.get("PGVECTOR_DATABASE", "postgres"),
    user=os.environ.get("PGVECTOR_USER", "postgres"),
    password=os.environ.get("PGVECTOR_PASSWORD", "secretpassword"),
)
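
For reference, the result is a SQLAlchemy-style database URL. The sketch below builds one by hand, assuming the `postgresql+<driver>://user:password@host:port/database` layout; `build_connection_string` is a hypothetical stand-in, and the URL-quoting of the credentials is an addition here that matters when a password contains characters such as `@` or `/`.

```python
from urllib.parse import quote_plus


def build_connection_string(driver, host, port, database, user, password):
    """Assemble a SQLAlchemy-style PostgreSQL URL with URL-quoted credentials."""
    return (
        f"postgresql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


url = build_connection_string(
    driver="psycopg2",
    host="localhost",
    port=5432,
    database="postgres",
    user="postgres",
    password="p@ss/word",  # illustrative value only
)
print(url)  # postgresql+psycopg2://postgres:p%40ss%2Fword@localhost:5432/postgres
```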

In [8]:
import sqlalchemy

print(sqlalchemy.__version__)

1.4.50


In [41]:
COLLECTION_NAME = "k8s_concepts"

db = PGVector.from_documents(
    embedding=embeddings,
    documents=docs,
    collection_name=COLLECTION_NAME,
    connection_string=CONNECTION_STRING,
)

In [None]:
# A more scalable alternative: queue the documents for distributed indexing jobs on Kubernetes
! sh ./queue-documents.sh -f './data/k8s-urls.txt' -t k8s_concepts

In [None]:
# A more scalable alternative: deploy the indexer as distributed jobs on Kubernetes
! sh ./deploy-indexer.sh

In [43]:
query = "What's a deployment?" # "Should I use gateway API in my app?"
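# Use a separate name so the chunk list in `docs` is not overwritten
results = db.similarity_search(query)  # returns the 4 closest chunks by default
print(f"Query: {query}")
print(f"Retrieved documents: {len(results)}")
for doc in results:
    # Document exposes its fields directly; no need to go through to_json()
    print("Source: ", doc.metadata['source'])
    print("Text: ", doc.page_content, "\n")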

Query: What's a deployment?
Retrieved documents: 4
Source:  https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
Text:  Create an issue
 Print entire sectionUse CaseCreating a DeploymentPod-template-hash labelUpdating a DeploymentRollover (aka multiple updates in-flight)Label selector updatesRolling Back a DeploymentChecking Rollout History of a DeploymentRolling Back to a Previous RevisionScaling a DeploymentProportional scalingPausing and Resuming a rollout of a DeploymentDeployment statusProgressing DeploymentComplete DeploymentFailed DeploymentOperating on a failed deploymentClean up PolicyCanary DeploymentWriting a Deployment SpecPod TemplateReplicasSelectorStrategyProgress Deadline SecondsMin Ready SecondsRevision History LimitPausedWhat's nextDocumentation
Blog
Training
Partners
Community 

Source:  https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
Text:  Create an issue
 Print entire sectionUse CaseCreating a DeploymentPod-template-hash labe
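
Similarity search ranks stored chunks by how close their embedding vectors are to the embedding of the query. The sketch below shows the idea with cosine similarity on toy 3-dimensional vectors; it assumes cosine distance is the strategy in use (worth confirming for your installed version), and `cosine_similarity`, `top_k` and the vectors are made up for illustration. Real embeddings from `all-mpnet-base-v2` have 768 dimensions and the ranking happens inside Postgres.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, stored, k=4):
    """Return the texts of the k stored (text, vector) pairs most similar to query_vec."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy vectors, invented for this example.
stored = [
    ("deployment overview", [0.9, 0.1, 0.0]),
    ("gateway api", [0.0, 0.2, 0.9]),
    ("replicaset", [0.7, 0.3, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], stored, k=2))  # ['deployment overview', 'replicaset']
```

Because ranking depends only on vector closeness, chunks made of page navigation text (as in the output above) can score well when they repeat the query's keywords.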

In [35]:
from langchain.prompts import PromptTemplate
prompt_template = """[INST] You are a helpful, respectful and honest assistant who is an expert in explaining Kubernetes concepts. Always answer as helpfully as possible, while being safe.
        Use the following pieces of context to answer the question. If you don't know the answer, just say that you don't know, don't try to make up an answer.
        
        {context}

        Question: {question}
        Answer:[/INST]"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

chain_type_kwargs = {"prompt": PROMPT}

In [38]:
retriever = db.as_retriever()

qa = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever,
    chain_type_kwargs=chain_type_kwargs,
    verbose=True
)
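
The `"stuff"` chain type places all retrieved chunks into a single prompt. Conceptually it joins the chunk texts and substitutes them for `{context}`, as in the sketch below; `stuff_prompt` is a hypothetical helper, the blank-line separator is an assumption about the default, and the template is a shortened version of the one defined above.

```python
def stuff_prompt(template, chunks, question, separator="\n\n"):
    """Join retrieved chunk texts and fill them into a {context}/{question} template."""
    context = separator.join(chunks)
    return template.format(context=context, question=question)


template = (
    "[INST] Use the following pieces of context to answer the question.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:[/INST]"
)

# Placeholder chunk texts standing in for retrieved documents.
prompt = stuff_prompt(
    template,
    chunks=["First retrieved chunk.", "Second retrieved chunk."],
    question="When should I use the Gateway API?",
)
print(prompt)
```

Since every chunk lands in one prompt, the number of retrieved documents times `chunk_size` has to fit within the model's context window alongside `max_new_tokens`.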

In [39]:
result = qa.run("When should I use the Gateway API?")
print(result)



> Entering new RetrievalQA chain...

> Finished chain.
 The Gateway API is used to manage and configure ingress controllers in Kubernetes. It allows you to define and manage ingress resources, such as services and routes, and to configure the behavior of ingress controllers.

You should use the Gateway API when you need to manage and configure ingress resources in your Kubernetes cluster. This might include defining routes for incoming traffic, configuring load balancing and security policies, and managing the behavior of ingress controllers.

It's important to note that the Gateway API is still in the experimental phase and may not be suitable for all use cases. Additionally, it requires the use of an ingress controller, which may not be available in all environments.
