# Mixtral-8x7B-Instruct-v0.1 + Haystack: build RAG pipelines🤘

### A Retrieval Augmented Generation pipeline using the new, powerful [Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/blog/mixtral/) model and the [Haystack](https://github.com/deepset-ai/haystack) LLM orchestration framework.



<img src="https://codeandhack.com/wp-content/uploads/2023/12/Mixtral-8x7B-SMoE-Model.jpeg" width="270" style="display:inline;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<img src="https://haystack.deepset.ai/images/haystack-ogimage.png" width="360" style="display:inline;">

In [None]:
#%%capture

!pip install farm-haystack[colab]

Collecting farm-haystack[colab]
  Downloading farm_haystack-1.23.0-py3-none-any.whl (764 kB)
Collecting boilerpy3 (from farm-haystack[colab])
  Downloading boilerpy3-1.0.7-py3-none-any.whl (22 kB)
Collecting events (from farm-haystack[colab])
  Downloading Events-0.5-py3-none-any.whl (6.8 kB)
Collecting httpx (from farm-haystack[colab])
  Downloading httpx-0.26.0-py3-none-any.whl (75 kB)
Collecting lazy-imports==0.3.1 (from farm-haystack[colab])
  Downloading lazy_imports-0.3.1-py3-none-any.whl (12 kB)
Collecting posthog (from farm-haystack[colab])
  Downloading posthog-3.1.0-py2.py3-none-any.whl (37 kB)
Collecting prompthub-py==4.0.0 (from farm-haystack[colab])
  Downloading prompthub_py-4.0.0-py3-none-any.whl (6.9 kB)
Collecting quantulum3

In [None]:

HF_TOKEN = ""
# Paste your Hugging Face access token above (create one at https://huggingface.co/settings/tokens)
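
A token typed directly into a notebook is easy to leak when the notebook is shared. As a minimal sketch, assuming the token has been exported as an environment variable (the helper `load_hf_token` and the variable name `HF_TOKEN` used as an environment key are assumptions of this example, not part of the original notebook), it could be read like this:

```python
import os

def load_hf_token(env_var="HF_TOKEN"):
    """Return the token stored in the given environment variable, or "" if unset."""
    # env_var is a hypothetical name chosen for this sketch
    return os.environ.get(env_var, "").strip()

HF_TOKEN = load_hf_token()
```

If the variable is not set, `HF_TOKEN` stays an empty string, which matches the placeholder used in the cell above.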

`from haystack.nodes import PreProcessor` gives us preprocessing functionality to clean and split our data.

`from haystack.nodes import PromptModel` gives access to the prompt model.

`from haystack.nodes import PromptTemplate` provides the flexibility to define custom prompt templates.

`from haystack.nodes import PromptNode` provides the node that sends prompts to the model.

In [None]:
from haystack.nodes import PreProcessor, PromptModel, PromptTemplate, PromptNode

In [None]:
start_url = "https://user-guide.cloud-platform.service.justice.gov.uk/"

In [None]:
import requests
from bs4 import BeautifulSoup

from urllib.parse import urljoin, urlparse

visited_urls = set()

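def scrape_page(url, depth=0, max_depth=1, all_texts=None):
    # Create a fresh list per top-level call (a mutable default would be shared across calls)
    if all_texts is None:
        all_texts = []
    if depth > max_depth or url in visited_urls:
        return all_texts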

    visited_urls.add(url)
#     print(url)

    try:
        page = requests.get(url, timeout=10)  # Do not hang forever on an unresponsive server
        page.raise_for_status()
    except requests.RequestException as e:
        print(f"Request failed: {e}")
        return all_texts

    soup = BeautifulSoup(page.content, 'html.parser')
    text = soup.get_text(separator=' ', strip=True)
    all_texts.append(text)

    links = soup.find_all('a')
    for link in links:
        href = link.get('href')
        if href and is_valid_link(href, url):
            absolute_url = urljoin(url, href)
            scrape_page(absolute_url, depth + 1, max_depth, all_texts)

    return all_texts

def is_valid_link(href, base_url):
    # Skip in-page fragment links
    if href.startswith('#'):
        return False
    # Skip e-mail links
    if href.startswith('mailto:'):
        return False
    # Every other link is accepted, including links to external sites
    return True



all_texts = scrape_page(start_url)


output_file = 'scraped_texts.txt'

# Open the file in write mode and write each text to the file
with open(output_file, 'w', encoding='utf-8') as file:
    for text in all_texts:
        file.write(text + '\n\n')  # Adding two line breaks as a separator

print(f"Texts saved to {output_file}")

Texts saved to scraped_texts.txt
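
The crawler above follows every link it finds, including links that leave the user guide. If you only want pages from the guide itself, one option is to compare hosts before following a link. This is a minimal sketch rather than part of the original crawl; `is_same_site` is a hypothetical helper and the example paths are made up for illustration:

```python
from urllib.parse import urljoin, urlparse

def is_same_site(href, base_url):
    """Return True if href, resolved against base_url, stays on base_url's host."""
    absolute_url = urljoin(base_url, href)
    return urlparse(absolute_url).netloc == urlparse(base_url).netloc

start = "https://user-guide.cloud-platform.service.justice.gov.uk/"
print(is_same_site("some-page.html", start))            # relative link, same host: True
print(is_same_site("https://github.com/example", start))  # external host: False
```

Calling such a check alongside `is_valid_link` would shrink the scraped corpus, so the character and chunk counts shown in this notebook would no longer apply.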


In [None]:
with open(output_file, mode='r', encoding='utf-8') as file:
    text = file.read()
print(len(text))

503099


In [None]:
print(text[:10000])

Cloud Platform user guide - Cloud Platform User Guide Skip to main content Cloud Platform User Guide Menu Feedback / Report a problem Documentation GitHub Table of contents Search (via Google) Search Cloud Platform user guide Overview Getting started Containers Databases Relational databases Key-value databases Storage Other topics OpenSearch Messaging Publish/subscribe Queue Custom domains Security Continuous deployment Observability Monitoring Logging Deprecations Other topics Tutorials Reference Cloud Platform Kubernetes Getting help Adding to the guide Cloud Platform user guide This user guide is for teams with applications or services deployed on, or
intending to deploy to, the Ministry of Justice’s Cloud Platform. Overview What is the Cloud Platform? What can I host on the Cloud Platform? Getting started Using the Cloud Platform CLI Creating a Cloud Platform environment Connecting to the Cloud Platform’s Kubernetes cluster Accessing the AWS console (read-only) Deploying an exampl

In [None]:
from haystack import Document

doc = Document(
    content=text
)



**Initializing the PreProcessor:** With `processor = PreProcessor(...)`, we configure a preprocessor that cleans the documents and splits them into smaller chunks.

   - **Parameters:**
     - `clean_empty_lines`: remove excess empty lines.
     - `clean_whitespace`: strip leading and trailing whitespace from each line.
     - `clean_header_footer`: remove repeated header and footer text.
     - `split_by`: We split the text by word. 📊
     - `split_length`: We set the split length to 500 words. 📏
     - `split_respect_sentence_boundary`: We respect sentence boundaries when splitting. 🗣️
     - `split_overlap`: No overlap between consecutive chunks. 🚫
     - `language`: We use English (`"en"`) for sentence tokenization.


 **Processing the Document List:** `preprocessed_docs = processor.process(docs)` applies the preprocessor to the document list.
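
To make the idea of word-based splitting that respects sentence boundaries concrete, here is a simplified pure-Python sketch. It is not Haystack's implementation: `split_by_words` is a hypothetical helper, and its sentence splitter is a naive regular expression rather than the NLTK tokenizer that the `PreProcessor` downloads.

```python
import re

def split_by_words(text, split_length=500):
    """Greedily pack whole sentences into chunks of about split_length words.

    A single sentence longer than split_length is kept whole in its own chunk.
    """
    # Naive sentence splitter: break after ., ! or ? followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the word budget
        if current and count + n_words > split_length:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks

sample = "One two three. Four five six seven. Eight nine."
print(split_by_words(sample, split_length=7))
# ['One two three. Four five six seven.', 'Eight nine.']
```

With a budget of 7 words, the first two sentences fit together exactly and the third starts a new chunk, so no sentence is cut in the middle.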


In [None]:
# Create a list containing single document
docs = [doc]

processor = PreProcessor(
    clean_empty_lines=True,
    clean_whitespace=True,
    clean_header_footer=True,
    split_by="word",
    split_length=500,
    split_respect_sentence_boundary=True,
    split_overlap=0,
    language="en",
)

preprocessed_docs = processor.process(docs)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
Preprocessing: 100%|██████████| 1/1 [00:00<00:00,  1.48docs/s]


In [None]:
print(len(preprocessed_docs))

# a smaller chunked document

preprocessed_docs[10]

143


<Document: {'content': 'This guide will take you through installing and configuring kubectl . kubectl is the official command line tool for Kubernetes. Once installed and configured, you can use kubectl to deploy and manage your applications on the Cloud Platform, in addition to the Cloud Platform CLI . Installing kubectl You should install a version of kubectl that is within one minor version of the Cloud Platform’s Kubernetes cluster. The Cloud Platform’s current Kubernetes cluster version is 1.25 .\nTherefore, you should install kubectl version 1.24, 1.25, or 1.26. There is official documentation on how to install kubectl , including how to install a specific version: Install kubectl on Linux Install kubectl on macOS Install kubectl on Windows Authenticating with the Cloud Platform’s Kubernetes cluster Once you’ve installed kubectl , you will need to authenticate with the Cloud Platform’s Kubernetes cluster. You will need to: be part of the ministryofjustice GitHub organisation be p

# Creating an In-Memory DocumentStore: 📚💾

We now store our documents in an easily accessible place! In the following code:

1. **Initializing InMemoryDocumentStore:** With `document_store = InMemoryDocumentStore(use_bm25=True)`, we create an in-memory document store. 💡🔍

   - **Key Parameter:**
     - `use_bm25`: enable BM25, a keyword-based ranking algorithm, so the store can be searched by a BM25 retriever.

2. **Writing Documents to the Store:** With `document_store.write_documents(preprocessed_docs)`, we write the preprocessed documents to the store.

In [None]:
from haystack.document_stores import InMemoryDocumentStore

document_store = InMemoryDocumentStore(use_bm25=True)
document_store.write_documents(preprocessed_docs)

Updating BM25 representation...: 100%|██████████| 143/143 [00:00<00:00, 2382.67 docs/s]


# Configuring a BM25 Retriever with Haystack: 🔍🚀


1. **Configuring BM25Retriever:** `retriever = BM25Retriever(document_store, top_k=2)` creates a [BM25 Retriever](https://docs.haystack.deepset.ai/docs/retriever#bm25-recommended) backed by the in-memory document store. It ranks the stored documents against each query and, in our case, returns the top 2 results! 📈🔗

   - **Key Parameters:**
     - `document_store`: We use our in-memory document store as the basis for searching. 📚💾
     - `top_k`: We ask for the 2 best-matching documents for each query. 🔝2️⃣

This is how the pipeline finds the most relevant documents to pass to the subsequent stages.
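
For intuition about what a BM25 ranking with `top_k` does, here is a toy pure-Python sketch of the basic Okapi BM25 formula. It is an illustration only, not the code Haystack runs: the functions `bm25_scores` and `top_k_docs`, the whitespace tokenization, the values `k1=1.5` and `b=0.75`, and the three example sentences are all assumptions of this sketch.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each non-empty document in docs against query with basic Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Number of documents that contain the term
            df = sum(1 for other in tokenized if term in other)
            # Smoothed inverse document frequency (always non-negative)
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Term frequency, dampened and normalized by document length
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

def top_k_docs(query, docs, k=2):
    """Return the k highest-scoring documents for the query."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in ranked[:k]]

docs = [
    "delete a namespace from the cluster",
    "install kubectl on linux",
    "clean up finished jobs with ttl",
]
print(top_k_docs("finished jobs", docs, k=1))
# ['clean up finished jobs with ttl']
```

Documents that share no words with the query score zero, which is why keyword overlap between the question and the chunk matters so much for this kind of retriever.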


In [None]:
from haystack import Pipeline
from haystack.nodes import BM25Retriever
retriever = BM25Retriever(document_store, top_k=2)

# Defining a PromptTemplate for Question Answering:

We give structure to our question-answer interactions with a `PromptTemplate`! In the following code:

1. **Template Definition:** With `qa_template = PromptTemplate(...)`, we create a template that guides the question-answering process. 📝💬

   - **Prompt Structure:**
     - "Using the information contained in the context, answer only the question asked without adding question suggestions"
     - "If the answer cannot be inferred from the context, reply: 'I don't know'."
     - "Context: {join(documents)};"
     - "Question: {query}"

   - **Clear Description:** The template tells the model to answer from the retrieved context only and to say so when the context does not contain the answer. 💡

2. **Template Variables:** `{join(documents)}` and `{query}` are filled in at run time with the retrieved documents and the user's question. Our answers are now driven by context! 🔄🌐

This template is the key to structured, contextually informed interactions.
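
To show what filling in the template amounts to, here is a plain-Python stand-in. It does not use Haystack's template engine: `render_prompt` is a hypothetical helper, the `{context}` placeholder replaces `{join(documents)}` for the sake of `str.format`, and joining the documents with a single space is an assumption of this sketch.

```python
def render_prompt(template, documents, query):
    """Fill the {context} and {query} slots of a template from retrieved texts."""
    context = " ".join(documents)  # Concatenate the retrieved chunks into one context string
    return template.format(context=context, query=query)

template = "Context: {context};\nQuestion: {query}"
retrieved = ["Jobs can be cleaned up automatically.", "Set ttlSecondsAfterFinished."]
print(render_prompt(template, retrieved, "How do I clean up finished jobs?"))
```

The model never sees the documents separately. It receives one string in which the retrieved chunks and the question are already combined.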

In [None]:
qa_template = PromptTemplate(prompt=
  """ Using the information contained in the context, answer only the question asked without adding question suggestions
  If the answer cannot be inferred from the context, reply: 'I don't know'.
  Context: {join(documents)};
  Question: {query}
  """)

# PromptNode Initialization: 🌐🤖

The [PromptNode](https://docs.haystack.deepset.ai/docs/prompt_node) sends our prompts to the hosted model over HTTP. In the following code:

1. **Initializing PromptNode:** With `prompt_node = PromptNode(...)`, we configure our prompt node to interact with the Mixtral model. 🌬️🚀

   - **Model:** `model_name_or_path="mistralai/Mixtral-8x7B-Instruct-v0.1"` selects Mixtral. You can substitute another model, such as `HuggingFaceH4/zephyr-7b-beta`.

   - **Hugging Face API Key:** `api_key=HF_TOKEN` passes the Hugging Face token.

   - **Default Prompt Template:** `default_prompt_template=qa_template` sets the template defined above as the default.

   - **Additional Model Parameters:** `max_length=500` sets the maximum number of tokens the generated output can have, and `model_kwargs={"model_max_length": 5000}` sets the maximum number of tokens the model is assumed to handle for the prompt and the answer combined.
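
The two length settings interact: the tokens reserved for the answer come out of the overall budget. The arithmetic can be sketched as follows, under the assumption that the prompt may use whatever remains of `model_max_length` once `max_length` tokens are set aside for the answer (`prompt_token_budget` is a hypothetical helper for illustration, not a Haystack function):

```python
def prompt_token_budget(model_max_length, max_length):
    """Tokens left for the prompt once max_length tokens are reserved for the answer."""
    if max_length >= model_max_length:
        raise ValueError("max_length must be smaller than model_max_length")
    return model_max_length - max_length

print(prompt_token_budget(model_max_length=5000, max_length=500))  # 4500
```

With the values used in this notebook, about 4500 tokens remain for the instructions, the two retrieved chunks, and the question.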






In [None]:


prompt_node = PromptNode(
    model_name_or_path="mistralai/Mixtral-8x7B-Instruct-v0.1",
    api_key=HF_TOKEN,
    default_prompt_template=qa_template,
    max_length=500,
    model_kwargs={"model_max_length": 5000}
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/1.46k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.80M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/72.0 [00:00<?, ?B/s]

# Configuring a Pipeline with Haystack: 🚀🔗

A pipeline is a sequence of steps in which each node passes its output to the next. In the following code:

1. **Creating a RAG Pipeline:** With `rag_pipeline = Pipeline()`, we start building our pipeline for Retrieval Augmented Generation (RAG). 🔄🚀

2. **Adding a Retriever Node:** With `rag_pipeline.add_node(...)`, we add our previously created BM25 retriever. The "retriever" node receives its input from the "Query" and provides the most relevant documents.

3. **Adding a Prompt Node:** With `rag_pipeline.add_node(...)`, we add our prompt node. The "prompt_node" node receives its input from the "retriever" node.
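
The data flow of such a two-node pipeline can be sketched in plain Python with stand-in components. Nothing here is Haystack code: `run_rag`, `toy_retrieve`, and `toy_generate` are hypothetical functions, and the two-sentence corpus is made up. Only the output keys `"results"` and `"documents"` mirror the ones this notebook reads from the real pipeline output.

```python
def run_rag(query, retrieve, generate):
    """Retrieve documents for the query, then hand query and documents to the generator."""
    documents = retrieve(query)
    answer = generate(query, documents)
    return {"query": query, "documents": documents, "results": [answer]}

def toy_retrieve(query):
    """Stand-in retriever: keep documents that share at least one word with the query."""
    corpus = [
        "Jobs can be cleaned up with ttlSecondsAfterFinished.",
        "kubectl is the Kubernetes CLI.",
    ]
    words = set(query.lower().split())
    return [doc for doc in corpus if words & set(doc.lower().rstrip(".").split())]

def toy_generate(query, documents):
    """Stand-in generator: a real one would send a filled-in prompt to the model."""
    return f"Answer based on {len(documents)} document(s)"

output = run_rag("clean up jobs", toy_retrieve, toy_generate)
print(output["results"][0])  # Answer based on 1 document(s)
```

The generator only ever sees what the retriever returned, so a poor retrieval step limits the quality of the final answer.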



In [None]:
rag_pipeline = Pipeline()
rag_pipeline.add_node(component=retriever, name="retriever", inputs=["Query"])
rag_pipeline.add_node(component=prompt_node, name="prompt_node", inputs=["retriever"])

In [None]:
from pprint import pprint

def print_answer(output):
    """Pretty-print the first generated answer."""
    pprint(output["results"][0].strip())

def print_answer_with_score(output):
    """Pretty-print the first answer together with the score of the top retrieved document."""
    if output["results"]:
        top_result = output["results"][0].strip()  # First generated answer
        top_document = output["documents"][0]  # Top-ranked retrieved document

        # This is the retriever's relevance score for the document, not a confidence in the answer
        score = top_document.score
        pprint(f"{top_result}\nscore = {score}")
    else:
        pprint("No results found.")

## Let's try the RAG Pipeline
Finally, we run the RAG pipeline with a query and print the response:



 **Running the RAG Pipeline:** `rag_pipeline.run(query=...)` invokes our RAG pipeline with a specific query.





In [None]:
print_answer_with_score(rag_pipeline.run(query="how to clean up  finished jobs in kubernetes?"))


('Answer: You can clean up finished jobs in kubernetes by using the '
 'Time-to-live (TTL mechanism) for Jobs by setting ttlSecondsAfterFinished, so '
 'that a Job can be cleaned up automatically some time after it finishes. '
 'Note: Kubernetes uses UTC exclusively. Make sure you take that into account '
 'when you’re creating your schedule or setting up ttlSecondsAfterFinished. We '
 'also have a delete-completed-jobs concourse job which will clean up all '
 'completed jobs which do not have ttlSecondsAfterFinished defined.\n'
 'score = 0.9242783497119054')


In [None]:
print_answer_with_score(rag_pipeline.run(query="how to remove a unneeded namespace? "))

('To remove an unneeded namespace, follow these steps:\n'
 '\n'
 '1. If there are any ECR resources in your namespace that still contain '
 'images, you need to delete the images manually or prepare the ECR for '
 'deletion by setting the `deletion_protection` argument to `false` in your '
 'ECR module code and raising a PR to apply this configuration change to your '
 'ECR resource.\n'
 '2. Merge the PR and proceed with the namespace deletion process.\n'
 '3. Raise a PR that deletes the associated folder from the '
 '`cloud-platform-environments` repository. The folder name should be in the '
 'format '
 '`namespaces/live.cloud-platform.service.justice.gov.uk/<namespace_name>`.\n'
 '4. Merging this PR will trigger the `destroy-deleted-namespaces` Pipeline, '
 "which will delete all of your non-production namespace's AWS resources, "
 'followed by the namespace itself for production namespaces.\n'
 '5. If you have set up a CI/CD pipeline that deploys into your namespace, '
 'delete it 

In [None]:
print_answer_with_score(rag_pipeline.run(query="Can non-prod namespaces get scaled down overnight"))

('Answer: Yes, non-production workloads can be auto-scaled down overnight or '
 'at the weekend using the Horizontal Pod Autoscaler (HPA) feature in '
 'Kubernetes. The HPA can ensure that critical applications are elastic and '
 'can scale out to meet increasing demand as well as scale down to ensure '
 'optimal resource usage. It calculates the number of replicas by calculating '
 'the ratio between desired metric value and current metric value. Important '
 'aspects of the HorizontalPodAutoscaler to be aware of include setting '
 'resource limits, the 15-second interval for checking the value of the metric '
 'used, and the 3-minute and 5-minute intervals for scaling up and down pods '
 'based on the metric threshold.\n'
 'score = 0.8569081092034656')


In [None]:
print_answer_with_score(rag_pipeline.run(query="what to do  add secrets to my application"))

('Answer: To add secrets to your application, you can store them in AWS '
 'Secrets Manager and then access them from your namespace. Here are the steps '
 'to do this:\n'
 '\n'
 '1. Create the secret in AWS Secrets Manager and store the required value '
 '(e.g. APPINSIGHTS\\_INSTRUMENTATIONKEY).\n'
 '2. Base64-encode the secret value.\n'
 '3. Create the kubernetes\\_secret using the base64-encoded value.\n'
 '4. List the secrets in your namespace to confirm that the new secret has '
 'been created.\n'
 '5. Update the secret value in AWS Secrets Manager if needed.\n'
 '6. Decode the secret value from base64 to use it in your application.\n'
 '\n'
 'Note: You can also configure secrets manually using kubectl if you prefer. '
 'In this case, you would need to base64-encode the secret value, create the '
 'secret using kubectl, list the secrets in your namespace, update the secret '
 'value using kubectl if needed, and decode the secret value from base64 to '
 'use it in your application.

In [None]:
print_answer_with_score(rag_pipeline.run(query="any guidelines available please on how to create multiple environments (dev, test, staging, etc.) in cloud platform?"))



('Answer: Yes, there are guidelines available on how to create multiple '
 'environments (dev, test, staging, etc.) in cloud platform. The Cloud '
 'Platform provides a Terraform module for creating multiple environments with '
 'a consistent set of infrastructure components. The module creates a virtual '
 'private cloud (VPC) with public and private subnets, network access control '
 'lists (ACLs), security groups, and other resources. The module also creates '
 'a Kubernetes cluster in each environment with a consistent set of node '
 'pools, add-ons, and other resources. The module is designed to be customized '
 'to meet the specific needs of each environment. The documentation provides '
 'detailed instructions on how to use the module to create and manage multiple '
 'environments. Additionally, the documentation provides guidelines on how to '
 'configure and use the resources in each environment, such as how to '
 'configure domain names, how to configure ingress controllers, 

In [None]:
print_answer_with_score(rag_pipeline.run(query="how to configure role-based access control (RBAC)"))

('Answer: Role-Based Access Control (RBAC) is a method of regulating access to '
 'cluster resources based on the roles of individual users within an '
 'organization. To configure RBAC in Kubernetes, you need to create roles and '
 'bindings.\n'
 '\n'
 '  A role is a set of permissions that specify what actions can be performed '
 'on which resources. To create a role, you can use the `kubectl create role` '
 'command followed by the name of the role and the permissions. For example, '
 'the following command creates a role called `pod-reader` that allows users '
 'to view pods in the `default` namespace:\n'
 '\n'
 '  ```\n'
 '  kubectl create role pod-reader --verb=get,list --resource=pods '
 '--namespace=default\n'
 '  ```\n'
 '\n'
 '  A binding is a relationship between a role and a user or group of users. '
 'To create a binding, you can use the `kubectl create rolebinding` command '
 'followed by the name of the binding, the role to be bound, and the users or '
 'groups to be bou

In [None]:
print_answer_with_score(rag_pipeline.run(query="do we create environments in separate cluster? What clusters are available?"))

('Answer: No, we do not create environments in separate clusters. The Cloud '
 'Platform consists of two kubernetes clusters: manager and live. The manager '
 'cluster runs shared services such as monitoring and CI/CD pipelines, while '
 'the live cluster is the EKS "application" cluster that runs all hosted '
 'services. Both clusters are hosted in a single VPC in Amazon AWS, which is '
 'only accessible to the outside world via HTTPS connections or the kubernetes '
 'API and SSH through a bastion host. Services hosted in the Cloud Platform '
 'run in one or more namespaces, which are isolated from each other by '
 'kubernetes NetworkPolicy configurations. Service teams can choose to allow '
 'traffic between specific namespaces, but by default no inter-namespace '
 'network traffic is permitted. Security groups cannot be used in the Cloud '
 'Platform as they work at the EC2 instance level, and requests to resources '
 'could come from any of the worker nodes in the cluster.\n'
 'sco

In [None]:

print_answer_with_score(rag_pipeline.run(query="how to reset  authentication is not working"))


('Answer: To reset authentication, you can try the following steps:\n'
 '\n'
 '  1. Delete any existing credentials or tokens associated with the '
 'Kubernetes API.\n'
 '  2. Restart your kubeconfig file or the kubectl configuration.\n'
 '  3. Re-authenticate with the Kubernetes API using the appropriate '
 'authentication method.\n'
 'score = 0.8163386010944413')
