LangChain currently offers [two wrappers around Hugging Face LLMs](https://langchain.readthedocs.io/en/latest/ecosystem/huggingface.html), one for a local pipeline and one to access a hosted model in Hugging Face Hub.

This notebook uses neither wrapper directly: it defines a custom LLM class that calls a text-generation model served behind an HTTP API.

## Setting up the model

In [1]:
import json
import os
import requests
from langchain.chains import RetrievalQA
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms.base import LLM
from langchain.prompts import PromptTemplate
from langchain.text_splitter import MarkdownTextSplitter
from langchain.vectorstores import Chroma
from typing import Optional, List, Mapping, Any

## Custom LLM

Here we define a custom LLM wrapper for LangChain to use. It sends each prompt, together with the generation parameters, to a text-generation API over HTTP and returns the generated text.

In [2]:
class ApiLLM(LLM):
    
    api_url: str

    api_params: dict

    @property
    def _llm_type(self) -> str:
        return "custom"
    
    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        """Call the LLM."""
        payload = json.dumps([prompt, self.api_params]) 
        response = requests.post(self.api_url, json={"data":[payload]}).json()
        return response["data"][0]
    
    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        """Get the identifying parameters."""
        return {"api_url": self.api_url, "api_params": self.api_params}
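
To make the request format concrete, the sketch below rebuilds the body that `_call` sends and reads a reply of the same shape, without touching the network. The helper names `build_request_body` and `read_reply` are hypothetical, and the sample reply is made up for illustration; only the `{"data": [...]}` envelope comes from the class above.

```python
import json

def build_request_body(prompt: str, api_params: dict) -> dict:
    """Mirror ApiLLM._call: the prompt and parameters are JSON-encoded
    into one string, which is then wrapped in a {"data": [...]} envelope."""
    payload = json.dumps([prompt, api_params])
    return {"data": [payload]}

def read_reply(reply: dict) -> str:
    """Mirror ApiLLM._call: the generated text is the first item of "data"."""
    return reply["data"][0]

body = build_request_body("This is a test", {"max_new_tokens": 500})
# The inner payload is itself a JSON string, so it decodes back to [prompt, params]
decoded_prompt, decoded_params = json.loads(body["data"][0])
sample_reply = {"data": ["This is a test reply"]}  # made-up reply for illustration
print(read_reply(sample_reply))
```

Note that the parameters travel inside a JSON string nested in the JSON body, so the server has to decode twice.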

In [3]:
# API URL
# get the endpoint from an environment value
# endpoint = os.environ.get("LLM_API_URL", None)
# or set it manually
endpoint = "https://text-generation-api-llm.apps.et-cluster.6mwp.p1.openshiftapps.com/api"

# Generation parameters
# Reference: https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig
params = {
    'max_new_tokens': 500,
    'do_sample': True,
    'temperature': 0.7,
    'top_p': 0.73,
    'typical_p': 1,
    'repetition_penalty': 1.0,
    'encoder_repetition_penalty': 1.0,
    'top_k': 0,
    'min_length': 10,
    'no_repeat_ngram_size': 0,
    'num_beams': 1,
    'penalty_alpha': 0,
    'length_penalty': 0,
    'early_stopping': False,
    'seed': -1,
    'add_bos_token': True,
    'custom_stopping_strings': [],
    'truncation_length': 2048,
    'ban_eos_token': False,
    'skip_special_tokens': True,
}

llm = ApiLLM(api_url=endpoint, api_params=params)

In [6]:
print(llm("This is a test"))

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
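
`Expecting value: line 1 column 1 (char 0)` is what Python's JSON decoder raises when the body it receives is empty or does not start with valid JSON, for example an HTML error page. A minimal sketch of a stricter parser follows. `parse_api_reply` is a hypothetical helper, not part of LangChain or `requests`, and it only assumes the `{"data": [...]}` reply shape used by `ApiLLM`.

```python
import json

def parse_api_reply(status_code: int, body: str) -> str:
    """Return the generated text, or raise an error that shows what the server sent."""
    if status_code != 200:
        raise RuntimeError(f"API returned HTTP {status_code}: {body[:200]!r}")
    try:
        reply = json.loads(body)
    except json.JSONDecodeError as exc:
        # An empty body or an HTML page ends up here
        raise RuntimeError(f"API reply is not JSON: {body[:200]!r}") from exc
    return reply["data"][0]

ok = parse_api_reply(200, '{"data": ["hello"]}')
try:
    parse_api_reply(200, "")
    error_message = None
except RuntimeError as err:
    error_message = str(err)
print(ok, "|", error_message)
```

Inside `_call`, such a helper could be fed `response.status_code` and `response.text` from the `requests` response instead of calling `.json()` directly, so a failing endpoint reports what it actually returned.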

## Document indexing

In [6]:
# Note: using TextLoader here instead of UnstructuredMarkdownLoader. MarkdownTextSplitter will do the job of parsing markdown
loader = DirectoryLoader('../data/external', glob="**/*.md", loader_cls=TextLoader)
documents = loader.load()  # FYI with the current dataset, documents[42] is the FAQ

text_splitter = MarkdownTextSplitter(chunk_overlap=0, chunk_size=500)  # Consider setting chunk_size=1000
texts = text_splitter.split_documents(documents)
print(f"{len(documents)} documents were loaded in {len(texts)} chunks")

74 documents were loaded in 10349 chunks


In [7]:
# Find the index of the longest text in texts
max_len = max(len(text.page_content) for text in texts)
max_len_idx = [i for i, text in enumerate(texts) if len(text.page_content) == max_len][0]
print(f"The longest text chunk is index {max_len_idx}, with length {max_len}")

The longest text chunk is index 334, with length 500
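
The same lookup can be done in a single pass with `max` and a `key` function. The sketch below uses a small stand-in class instead of real LangChain documents; `Chunk` and the sample strings are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    """Stand-in for a LangChain Document; only page_content is needed here."""
    page_content: str

sample_chunks = [Chunk("short"), Chunk("the longest chunk"), Chunk("medium one")]

# max() with a key returns the first index whose chunk has the greatest length
longest_idx = max(range(len(sample_chunks)),
                  key=lambda i: len(sample_chunks[i].page_content))
longest_len = len(sample_chunks[longest_idx].page_content)
print(longest_idx, longest_len)
```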


In [8]:
# show the biggest chunk
print(texts[max_len_idx])

page_content='```bash\noc create imagestream ruby\noc tag openshift/ruby:2.5-ubi8 ruby:2.5\necho << EOF | oc apply -f -\nkind: BuildConfig\napiVersion: build.openshift.io/v1\nmetadata:\n  name: ruby-sample-build\n  namespace: test-ecr-secret-operator\nspec:\n  runPolicy: Serial\n  source:\n    git:\n      uri: "https://github.com/openshift/ruby-hello-world"\n  strategy:\n    sourceStrategy:\n      from:\n        kind: "ImageStreamTag"\n        name: "ruby:2.5"\n      incremental: true\n  output:\n    to:\n      kind: "DockerImage"' metadata={'source': 'data\\external\\rh-mobb\\ecr-secret-operator-_index.md'}


In [9]:
embeddings = HuggingFaceEmbeddings()

# https://langchain.readthedocs.io/en/latest/modules/indexes/vectorstore_examples/chroma.html#persist-the-database
db_dir = "../data/interim"
docsearch = None
if os.path.isdir(os.path.join(db_dir, "index")):
    # Load the existing vector store
    docsearch = Chroma(persist_directory=db_dir, embedding_function=embeddings)
else:
    # Create a new vector store
    docsearch = Chroma.from_documents(texts[:1000], embeddings, persist_directory=db_dir)
    docsearch.persist()

Using embedded DuckDB with persistence: data will be stored in: ./data/interim
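
Conceptually, the vector store keeps one embedding per chunk and answers a query by returning the chunks whose embeddings are closest to the query embedding. The sketch below illustrates that idea with hand-written 3-dimensional vectors and cosine similarity as one common closeness measure. It is not Chroma's implementation, and the vectors, texts, and the `top_k` helper are invented for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=2):
    """Return the texts of the k entries most similar to query_vec."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in scored[:k]]

# Toy index of (text, embedding) pairs; real embeddings have hundreds of dimensions
toy_index = [
    ("deleting a cluster", [1.0, 0.0, 0.0]),
    ("pricing", [0.0, 1.0, 0.0]),
    ("autoscaling", [0.0, 0.0, 1.0]),
]
nearest = top_k([0.9, 0.1, 0.0], toy_index, k=1)
print(nearest)
```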


## Chain definition

In [35]:
# template = """When learning about Red Hat OpenShift Service on AWS (ROSA), and considering the following context:
# ========= snippets start here =========
# {context}
# ========= snippets end here =========
# Given this question: {question}
# The answer to the question is:"""
#

# Use the following pieces of context to answer the question at the end. Answer with as much detail and explanation as possible.
template = """You are a talkative AI model who loves to explain how things work. You are smart and constantly learning.
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Detailed answer:"""
qa_prompt = PromptTemplate(template=template, input_variables=["question", "context"])
chain_type_kwargs = {"prompt": qa_prompt}

qa_chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever(), chain_type_kwargs=chain_type_kwargs)
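
With `chain_type="stuff"`, all retrieved chunks are placed into a single prompt, filling the `{context}` slot, and the model is called once. The sketch below imitates that step with plain string formatting. `stuff_prompt` is a hypothetical helper, the template is shortened, and the blank-line separator between chunks is an assumption about the chain's default rather than something this notebook sets.

```python
template = """Use the following pieces of context to answer the question at the end.

{context}

Question: {question}
Detailed answer:"""

def stuff_prompt(chunks, question, separator="\n\n"):
    """Join all retrieved chunks into one context string and fill the template."""
    context = separator.join(chunks)
    return template.format(context=context, question=question)

prompt = stuff_prompt(["chunk A", "chunk B"], "How can I delete ROSA cluster?")
print(prompt)
```

Because every chunk goes into one prompt, the number of retrieved chunks times the chunk size has to fit in the model's context window, together with the template and the generated answer.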


## Question Answering

References:
- https://langchain.readthedocs.io/en/latest/modules/indexes/combine_docs.html for types of chains to combine documents
- https://langchain.readthedocs.io/en/latest/modules/indexes/chain_examples/question_answering.html for the QA example

In [None]:
queries = ['Where can I see a roadmap or make feature requests for the service?',
           'How is the pricing of Red Hat OpenShift Service on AWS calculated?',
           'Is there an upfront commitment?',
           'How can I delete ROSA cluster?',
           'Can I shut down my VMs temporarily?', # https://docs.openshift.com/rosa/rosa_architecture/rosa_policy_service_definition/rosa-service-definition.html#rosa-sdpolicy-instance-types_rosa-service-definition
           'How can I automatically deploy ROSA cluster?',
           'How can my ROSA cluster autoscale?',
           'How can I install aws load balancer controller',
           'How can I install Prometheus Operator with my ROSA cluster?',
           'What time is it?',
           'How can I federate metrics to a centralized Prometheus Cluster?',
           'What is the meaning of life?']

In [None]:
answers = []
for query in queries:
    answers.append(qa_chain(query))

# Print the answers
for result in answers:
    print("Question:", result["query"])
    # keep only the text after the prompt's "Detailed answer:" marker;
    # [-1] also works when the model's reply does not echo the prompt
    answer = result["result"].split("Detailed answer:")[-1]
    print("Answer: ", answer)



Question: Where can I see a roadmap or make feature requests for the service?
Answer:   https://ai.google.com/
Question: How is the pricing of Red Hat OpenShift Service on AWS calculated?
Answer:   not enough information
Question: Is there an upfront commitment?
Answer:   No
Question: How can I delete ROSA cluster?
Answer:   Delete ROSA cluster from the command line.
Question: Can I shut down my VMs temporarily?
Answer:   No
Question: How can I automatically deploy ROSA cluster?
Answer:   ROSA is a set of tools for building, running, and monitoring large distributed computing clusters.
Question: How can my ROSA cluster autoscale?
Answer:   You don't know
Question: How can I install aws load balancer controller
Answer:   you can download the Aws Load Balancer Controller from the Cloud Controller
Question: How can I install Prometheus Operator with my ROSA cluster?
Answer:   See the installation guide for ROSA cluster.
Question: What time is it?
Answer:   It is 1am
Question: How can I fe