# GPT3.5-Turbo with Retrieval Augmentation over Cilium v1.13.x Docs

In this notebook I'll work throught an example of using GPT3.5-Turbo with retrieval augmentation to answer questions about the Cilium v1.13.x Docs.

Install the required dependencies

In [32]:
!pip install -qU bs4 tiktoken openai langchain "pinecone-client[grpc]" python-dotenv

I'm using a dotenv file to store environment variables like the OpenAI API key. You can also just set them directly in the notebook. 

In [33]:
%load_ext dotenv
%dotenv

The dotenv extension is already loaded. To reload it, use:
  %reload_ext dotenv


https://docs.cilium.io/en/v1.13 is the starting point for the docs. From here we will look for all links to other pages and scrape them.


In [34]:
import requests

res = requests.get("https://docs.cilium.io/en/v1.13")

A quick check to see whether we can extract the text from and links from the docs.

In [35]:
from bs4 import BeautifulSoup
import urllib.parse
import html
import re

domain = "https://docs.cilium.io/en/v1.13/"
domain_full = domain

soup = BeautifulSoup(res.text, 'html.parser')

# Find all links to local pages on the website
local_links = []
for link in soup.find_all('a', href=True, class_=["reference", "internal"]):
    href = link['href']

    if not href.startswith("https://") and not href.startswith("http://") and not href.startswith("dns:query;ignoreAAAA") and not href.startswith("mailto:"):
      local_links.append(urllib.parse.urljoin(domain_full, href))

# Find the main content using CSS selectors
main_content = soup.select('body div section div div')[0]

# Extract the HTML code of the main content
main_content_html = str(main_content)

# Extract the plaintext of the main content
main_content_text = main_content.get_text()

# Remove all HTML tags
main_content_text = re.sub(r'<[^>]+>', '', main_content_text)

# Remove extra white space
main_content_text = ' '.join(main_content_text.split())

# Replace HTML entities with their corresponding characters
main_content_text = html.unescape(main_content_text)

print(main_content_text)
print(local_links)

» Welcome to Cilium’s documentation! Welcome to Cilium’s documentation! The documentation is divided into the following sections: Cilium Quick Installation: Provides a simple tutorial for running a small Cilium setup on your laptop. Intended as an easy way to get your hands dirty applying Cilium security policies between containers. Getting Started : Details instructions for installing, configuring, and troubleshooting Cilium in different deployment modes. Network Policy : Detailed walkthrough of the policy language structure and the supported formats. Monitoring & Metrics : Instructions for configuring metrics collection from Cilium. Troubleshooting : Describes how to troubleshoot Cilium in different deployment modes. BPF and XDP Reference Guide : Provides a technical deep dive of eBPF and XDP technology, primarily focused at developers. API Reference : Details the Cilium agent API for interacting with a local Cilium instance. Development : Gives background to those looking to develo

A function that returns the link, the text and any sublinks from a given page.

In [36]:
def scrape(url: str):
    res = requests.get(url)
    if res.status_code != 200:
        print(f"{res.status_code} for '{url}'")
        return None
    soup = BeautifulSoup(res.text, 'html.parser')

    # Find all links to local pages on the website
    local_links = []
    for link in soup.find_all('a', href=True, class_=["reference", "internal"]):
      href = link['href']

      if not href.startswith("https://") and not href.startswith("http://") and not href.startswith("dns:query;ignoreAAAA") and not href.startswith("mailto:"):
        local_links.append(urllib.parse.urljoin(domain_full, href))

    # Find the main content using CSS selectors
    main_content = soup.select('body div section div div')[0]

    # Extract the HTML code of the main content
    main_content_html = str(main_content)

    # Extract the plaintext of the main content
    main_content_text = main_content.get_text()

    # Remove all HTML tags
    main_content_text = re.sub(r'<[^>]+>', '', main_content_text)

    # Remove extra white space
    main_content_text = ' '.join(main_content_text.split())

    # Replace HTML entities with their corresponding characters
    main_content_text = html.unescape(main_content_text)
    
    # return as json
    return {
        "url": url,
        "text": main_content_text
    }, local_links

Now loop through all links and execute the scrape function against them.

In [None]:
links = ["https://docs.cilium.io/en/v1.13/"]
scraped = set()
data = []

while True:
    if len(links) == 0:
        print("Complete")
        break
    url = links[0]
    print(url)
    res = scrape(url)
    scraped.add(url)
    if res is not None:
        page_content, local_links = res
        data.append(page_content)
        # add new links to links list
        links.extend(local_links)
        # remove duplicates
        links = list(set(links))
    # remove links already scraped
    links = [link for link in links if link not in scraped]

In [38]:
data[5]

{'url': 'https://docs.cilium.io/en/v1.13/api/#compatibility-guarantees',

Create a tokenizer that will be used to tokenize the text.

In [39]:
import tiktoken

tokenizer = tiktoken.get_encoding('p50k_base')

# create the length function
def tiktoken_len(text):
    tokens = tokenizer.encode(
        text,
        disallowed_special=()
    )
    return len(tokens)

Create a text spliter that splits the text into chunks of 500 

In [40]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=20,
    length_function=tiktoken_len,
    separators=["\n\n", "\n", " ", ""]
)

print(text_splitter)

<langchain.text_splitter.RecursiveCharacterTextSplitter object at 0x169c11b10>


Create chunks that inlcude a UUID, the text, a chunk number and the URL.

In [41]:
from uuid import uuid4
from tqdm.auto import tqdm

chunks = []

for idx, record in enumerate(tqdm(data)):
    texts = text_splitter.split_text(record['text'])
    chunks.extend([{
        'id': str(uuid4()),
        'text': texts[i],
        'chunk': i,
        'url': record['url']
    } for i in range(len(texts))])

print(chunks[0])

100%|██████████| 1167/1167 [00:07<00:00, 153.47it/s]

{'id': '333f264c-f939-417b-bde8-78bea7d8a123', 'text': '» Welcome to Cilium’s documentation! Welcome to Cilium’s documentation!\uf0c1 The documentation is divided into the following sections: Cilium Quick Installation: Provides a simple tutorial for running a small Cilium setup on your laptop. Intended as an easy way to get your hands dirty applying Cilium security policies between containers. Getting Started : Details instructions for installing, configuring, and troubleshooting Cilium in different deployment modes. Network Policy : Detailed walkthrough of the policy language structure and the supported formats. Monitoring & Metrics : Instructions for configuring metrics collection from Cilium. Troubleshooting : Describes how to troubleshoot Cilium in different deployment modes. BPF and XDP Reference Guide : Provides a technical deep dive of eBPF and XDP technology, primarily focused at developers. API Reference : Details the Cilium agent API for interacting with a local Cilium instan




Our chunks are ready so now we move onto embedding and indexing everything.

## Initialize Embedding Model

We use `text-embedding-ada-002` as the embedding model. We can embed text like so:

In [42]:
import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

openai.Engine.list()  # check we have authenticated

<OpenAIObject list at 0x1696dc9b0> JSON: {
  "data": [
    {
      "created": null,
      "id": "babbage",
      "object": "engine",
      "owner": "openai",
      "permissions": null,
      "ready": true
    },
    {
      "created": null,
      "id": "davinci",
      "object": "engine",
      "owner": "openai",
      "permissions": null,
      "ready": true
    },
    {
      "created": null,
      "id": "text-davinci-edit-001",
      "object": "engine",
      "owner": "openai",
      "permissions": null,
      "ready": true
    },
    {
      "created": null,
      "id": "babbage-code-search-code",
      "object": "engine",
      "owner": "openai-dev",
      "permissions": null,
      "ready": true
    },
    {
      "created": null,
      "id": "text-similarity-babbage-001",
      "object": "engine",
      "owner": "openai-dev",
      "permissions": null,
      "ready": true
    },
    {
      "created": null,
      "id": "code-davinci-edit-001",
      "object": "engine",
      "ow

In [43]:
embed_model = "text-embedding-ada-002"

res = openai.Embedding.create(
    input=[
        "Sample document text goes here",
        "there will be several phrases in each batch"
    ], engine=embed_model
)

In the response `res` we will find a JSON-like object containing our new embeddings within the `'data'` field.

In [44]:
res.keys()

dict_keys(['object', 'data', 'model', 'usage'])

Inside `'data'` we will find two records, one for each of the two sentences we just embedded. Each vector embedding contains `1536` dimensions (the output dimensionality of the `text-embedding-ada-002` model.

In [45]:
len(res['data'])

2

In [46]:
len(res['data'][0]['embedding']), len(res['data'][1]['embedding'])

(1536, 1536)

We will apply this same embedding logic to the langchain docs dataset we've just scraped. But before doing so we must create a place to store the embeddings.

## Initializing the Index

Now we need a place to store these embeddings and enable a efficient vector search through them all. To do that we use Pinecone, we can get a [free API key](https://app.pinecone.io/) and enter it below where we will initialize our connection to Pinecone and create a new index.

In [47]:
import pinecone

api_key = os.getenv("PINECONE_API_KEY")
env = os.getenv("PINECONE_ENVIRONMENT")

pinecone.init(api_key=api_key, environment=env)
pinecone.whoami()

WhoAmIResponse(username='1946a7a', user_label='default', projectname='26d3f25')

In [48]:
index_name = 'cilium-docs-langchain'

In [49]:
# check if index already exists (it shouldn't if this is first time)
if index_name not in pinecone.list_indexes():
    # if does not exist, create index
    pinecone.create_index(
        index_name,
        dimension=len(res['data'][0]['embedding']),
        metric='dotproduct'
    )
# connect to index
index = pinecone.GRPCIndex(index_name)
# view index stats
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.1,
 'namespaces': {'': {'vector_count': 9136}},
 'total_vector_count': 9136}

We can see the index is currently empty with a `total_vector_count` of `0`. We can begin populating it with OpenAI `text-embedding-ada-002` built embeddings like so:

In [50]:
from tqdm.auto import tqdm
from time import sleep

batch_size = 100  # how many embeddings we create and insert at once

for i in tqdm(range(0, len(chunks), batch_size)):
    # find end of batch
    i_end = min(len(chunks), i+batch_size)
    meta_batch = chunks[i:i_end]
    # get ids
    ids_batch = [x['id'] for x in meta_batch]
    # get texts to encode
    texts = [x['text'] for x in meta_batch]
    # create embeddings (try-except added to avoid RateLimitError)
    try:
        res = openai.Embedding.create(input=texts, engine=embed_model)
    except:
        done = False
        while not done:
            sleep(5)
            try:
                res = openai.Embedding.create(input=texts, engine=embed_model)
                done = True
            except:
                pass
    embeds = [record['embedding'] for record in res['data']]
    # cleanup metadata
    meta_batch = [{
        'text': x['text'],
        'chunk': x['chunk'],
        'url': x['url']
    } for x in meta_batch]
    to_upsert = list(zip(ids_batch, embeds, meta_batch))
    # upsert to Pinecone
    index.upsert(vectors=to_upsert)

100%|██████████| 91/91 [04:32<00:00,  2.99s/it]


Now we've added all of our langchain docs to the index. With that we can move on to retrieval and then answer generation using GPT-4.

## Retrieval

To search through our documents we first need to create a query vector `xq`. Using `xq` we will retrieve the most relevant chunks from the LangChain docs, like so:

In [65]:
query = "How can I configure LB IPAM. Show me the YAML config"

res = openai.Embedding.create(
    input=[query],
    engine=embed_model
)

# retrieve from Pinecone
xq = res['data'][0]['embedding']

# get relevant contexts (including the questions)
res = index.query(xq, top_k=5, include_metadata=True)

In [66]:
res

{'matches': [{'id': '81bd18a7-6afb-45c2-94cc-1bc088202bca',
              'metadata': {'chunk': 0.0,
                           'text': '» BGP » LoadBalancer IP Address Management '
                                   '(LB IPAM) LoadBalancer IP Address '
                                   'Management (LB IPAM)\uf0c1 LB IPAM is a '
                                   'feature that allows Cilium to assign IP '
                                   'addresses to Services of type '
                                   'LoadBalancer. This functionality is '
                                   'usually left up to a cloud provider, '
                                   'however, when deploying in a private cloud '
                                   'environment, these facilities are not '
                                   'always available. LB IPAM works in '
                                   'conjunction with features like the Cilium '
                                   'BGP Control Plane. Where LB

With retrieval complete, we move on to feeding these into GPT-4 to produce answers.

## Retrieval Augmented Generation

GPT-4 is currently accessed via the `ChatCompletions` endpoint of OpenAI. To add the information we retrieved into the model, we need to pass it into our user prompts *alongside* our original query. We can do that like so:

In [67]:
# get list of retrieved text
contexts = [item['metadata']['text'] for item in res['matches']]

augmented_query = "\n\n---\n\n".join(contexts)+"\n\n-----\n\n"+query

In [68]:
print(augmented_query)

» BGP » LoadBalancer IP Address Management (LB IPAM) LoadBalancer IP Address Management (LB IPAM) LB IPAM is a feature that allows Cilium to assign IP addresses to Services of type LoadBalancer. This functionality is usually left up to a cloud provider, however, when deploying in a private cloud environment, these facilities are not always available. LB IPAM works in conjunction with features like the Cilium BGP Control Plane. Where LB IPAM is responsible for allocation and assigning of IPs to Service objects and other features are responsible for load balancing and/or advertisement of these IPs. LB IPAM is always enabled but dormant. The controller is awoken when the first IP Pool is added to the cluster. Pools LB IPAM has the notion of IP Pools which the administrator can create to tell Cilium which IP ranges can be used to allocate IPs from. A basic IP Pools with both an IPv4 and IPv6 range looks like this: apiVersion: "cilium.io/v2alpha1" kind: CiliumLoadBalancerIPPool metadata: 

Now we ask the question:

In [69]:
# system message to 'prime' the model
primer = f"""You are Q&A bot. A highly intelligent system that answers
user questions based on the information provided by the user above
each question. If the information can not be found in the information
provided by the user you truthfully say "I don't know".
"""

res = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": primer},
        {"role": "user", "content": augmented_query}
    ]
)

To display this response nicely, we will display it in markdown.

In [70]:
from IPython.display import Markdown

display(Markdown(res['choices'][0]['message']['content']))

Here is an example of how to configure LB IPAM using YAML config:

```
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: "blue-pool"
spec:
  cidrs:
    - cidr: "10.0.10.0/24"
    - cidr: "2004::0/64"
```

This example creates an IP pool named "blue-pool" with an IPv4 CIDR block of "10.0.10.0/24" and an IPv6 CIDR block of "2004::0/64".

Let's compare this to a non-augmented query...

In [71]:
res = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": primer},
        {"role": "user", "content": query}
    ]
)
display(Markdown(res['choices'][0]['message']['content']))

I'm sorry, but I don't have enough information to answer your question. Can you provide me with more context and details about what you mean by "LB IPAM" and what type of system or platform you are referring to? Additionally, without knowing your specific requirements and environment, it's not possible to provide a YAML configuration that would be suitable for your needs.

If we drop the `"I don't know"` part of the `primer`?

In [72]:
res = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are Q&A bot. A highly intelligent system that answers user questions"},
        {"role": "user", "content": query}
    ]
)
display(Markdown(res['choices'][0]['message']['content']))

To configure LoadBalancer IPAM (LB IPAM) in Kubernetes, you can use the following YAML config as an example:

```
apiVersion: crd.projectcalico.org/v1alpha1
kind: IPPool
metadata:
  name: lb-ippool
spec:
  blockSize: 26
  cidr: {{LB_CIDR}}
  ipipMode: Always
  natOutgoing: true
  nodeSelector: has(lb-node)=true
  vxlanMode: Never

---

apiVersion: projectcalico.org/v3
kind: IPPoolClaim
metadata:
  name: lb-ippool-claim
spec:
  cidr: {{LB_CIDR}}
  ipam: ipam-controller
```

In this config, you define an IP pool for LB IPAM using the `IPPool` resource, which sets parameters like CIDR range and node selection. Then, you create a pool claim using the `IPPoolClaim` resource, which associates the IP pool with an IPAM controller. 

You will need to replace `{{LB_CIDR}}` with your desired CIDR range for the load balancer IP addresses.