<a href="https://colab.research.google.com/github/badlogic/genai-workshop/blob/main/09_simple_rag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Building a simple RAG
Let's put what we've learned into practice. We want to build a chatbot that can answer questions about a subset of the WinCC OA documentation, namely the [Getting Started](https://www.winccoa.com/documentation/WinCCOA/latest/en_US/GettingStarted/GETTINGSTARTED_DE.html) documentation.

> **Note:** Execute the code cells below, as the remainder of this section requires those helper functions.

In [1]:
!pip -q install openai tiktoken umap-learn

In [2]:
from openai import OpenAI
import tiktoken

# Use your own OpenAI API key here. Never commit a real key to a shared notebook.
client = OpenAI(api_key = "<YOUR_OPENAI_API_KEY>")

messages = []
model_name="gpt-3.5-turbo"
max_tokens = 12000
temperature=0

# Uncomment to use a model served locally via `ollama serve`
# client = OpenAI(
#    base_url = 'http://localhost:11434/v1',
#    api_key='ollama', # required, but unused
#)
#model_name="mixtral:latest"

enc = tiktoken.get_encoding("cl100k_base")
def num_tokens(message):
    return len(enc.encode(message))

def truncate_messages(messages, max_tokens):
    total_tokens = sum(num_tokens(message["content"]) for message in messages)
    if total_tokens <= max_tokens:
        return messages

    truncated_messages = messages[:1]
    remaining_tokens = max_tokens - num_tokens(truncated_messages[0]["content"])
    for message in reversed(messages[1:]):
        tokens = num_tokens(message["content"])
        if remaining_tokens >= tokens:
            truncated_messages.insert(1, message)
            remaining_tokens -= tokens
        else:
            break
    return truncated_messages

def complete(message, max_response_tokens=2048, silent=False):
    global messages
    messages.append({"role": "user", "content": message})
    truncated_messages = truncate_messages(messages, max_tokens=max_tokens)
    stream = client.chat.completions.create(
        model=model_name,
        messages=truncated_messages,
        stream=True,
        temperature=temperature,
        max_tokens=max_response_tokens
    )
    reply = ""
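    for response in stream:
        # Some chunks carry no text (e.g. role-only or final chunks); skip them
        # instead of aborting the stream.
        if not response.choices:
            continue
        token = response.choices[0].delta.content
        if token is None:
            continue
        reply += token
        if not silent:
            print(token, end='')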

    reply = {"role": "assistant", "content": reply}
    messages.append(reply)
    total_tokens = sum(num_tokens(message["content"]) for message in truncated_messages)
    if not silent:
      print(f'\nTokens: {total_tokens}')

def clear_history():
  global messages
  messages = []

def print_history():
  global messages
  for message in messages:
    print("<" + message["role"] + ">")
    print(message["content"])
    print()

def system_prompt(message):
  global messages
  prompt = { "role": "system", "content": message }
  if (len(messages) == 0):
    messages.append(prompt)
  else:
    messages[0] = prompt

## Data
I've taken each raw HTML file from the Getting Started guide, converted it to Markdown offline, zipped the result, and uploaded it to [https://marioslab.io/uploads/genai/wincc-oa-getting-started.zip](https://marioslab.io/uploads/genai/wincc-oa-getting-started.zip).

Each file starts with a line containing the URL it came from.

Let's fetch and load the data.


In [3]:
!wget https://marioslab.io/uploads/genai/wincc-oa-getting-started.zip
!unzip -o wincc-oa-getting-started.zip

--2024-02-25 22:07:49--  https://marioslab.io/uploads/genai/wincc-oa-getting-started.zip
Resolving marioslab.io (marioslab.io)... 95.216.8.184
Connecting to marioslab.io (marioslab.io)|95.216.8.184|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 97486 (95K) [application/zip]
Saving to: ‘wincc-oa-getting-started.zip.3’


2024-02-25 22:07:50 (292 KB/s) - ‘wincc-oa-getting-started.zip.3’ saved [97486/97486]

Archive:  wincc-oa-getting-started.zip
  inflating: GettingStarted-01.md    
  inflating: GettingStarted-02.md    
  inflating: GettingStarted-06.md    
  inflating: GettingStarted-09.md    
  inflating: GettingStarted-12.md    
  inflating: GettingStarted-13.md    
  inflating: GettingStarted-14.md    
  inflating: GettingStarted-20.md    
  inflating: GettingStarted-21.md    
  inflating: GettingStarted-22.md    
  inflating: GettingStarted-23.md    
  inflating: GettingStarted-24.md    
  inflating: GettingStarted-25.md    
  inflating: GettingStarted-26.m

Next, let's load the contents of all `.md` files. Each file is transformed into a dictionary with the fields `url`, `content`, and `num_tokens`.

In [4]:
import glob
import os
import pandas as pd

documents = []
for md_file in glob.glob("*.md"):
    with open(md_file, 'r', encoding='utf-8') as file:
        lines = file.readlines()
        url = lines[0].strip()
        content = ''.join(lines[1:])
        documents.append({"url": url, "content": content, "num_tokens": num_tokens(content)})

df = pd.DataFrame(documents, columns=["url", "content", "num_tokens"])
print(documents[10]["content"])
df

# Modeling of Data Point Types

In the previous sections we took a look at existing data points that are available from the start of any project. However, usually we want to **design****our own** device-oriented data points in order to cope with the **specific project or industry requirements**.

We will design some data point types, since we need a couple of data points for our **example.** A \[rightMouseClick] on the empty white space in the data point tree displays the "**Create data point type**" option.

Figure 1. Create a new data point Type "GS\_PUMP2" in the Dp Type Editor of the Module PARA

![](GettingStarted-39.png)

This will open the data point editor, where the required **data structure can be easily designed**:

1. Replace the selected type name "NewDpType" with "GS\_PUMP2".

2. Open the context menu with a \[rightMouseClick] and select "**Insert node**".

3. Assign the name "state" to the node.

4. Repeat the step "**Insert node**" directly at the node "state" and to in

| | url | content | num_tokens |
|---|---|---|---|
| 0 | `https://www.winccoa.com/documentation/WinCCOA/...` | `# Addressing of Data Point Elements / Configs ...` | 899 |
| 1 | `https://www.winccoa.com/documentation/WinCCOA/...` | `# Simple Draw Operations\n\nBy selecting a **g...` | 1027 |
| 2 | `https://www.winccoa.com/documentation/WinCCOA/...` | `# Control Manager (Runtime Scripts)\n\nIn addi...` | 1205 |
| 3 | `https://www.winccoa.com/documentation/WinCCOA/...` | `# Master Data Points\n\nIt is recommended to u...` | 1826 |
| 4 | `https://www.winccoa.com/documentation/WinCCOA/...` | `# Dynamic Colors - Blinking\n\nWinCC OA allows...` | 286 |
| 5 | `https://www.winccoa.com/documentation/WinCCOA/...` | `# Emergency Mode\n\nWinCC OA automatically mon...` | 136 |
| 6 | `https://www.winccoa.com/documentation/WinCCOA/...` | `# System, Distribution, Configuration\n\nA str...` | 512 |
| 7 | `https://www.winccoa.com/documentation/WinCCOA/...` | `# Import From ASCII File (Mass Engineering)\n\...` | 783 |
| 8 | `https://www.winccoa.com/documentation/WinCCOA/...` | `# Data point Concept, Process Image\n\nThe **v...` | 995 |
| 9 | `https://www.winccoa.com/documentation/WinCCOA/...` | `# Add the Simulation\n\nIn order to configure ...` | 717 |


Next, we split all documents that are larger than 1024 tokens into smaller chunks of at most 1024 tokens each. Looking at the data frame above, only a handful of documents exceed 1024 tokens.

Let's write a function that does the splitting into chunks for us.



In [5]:
def chunk(documents, max_tokens):
    chunked_documents = []

    for doc in documents:
        content = doc['content']
        if num_tokens(content) <= max_tokens:
            chunked_documents.append(doc)
        else:
            content_parts = []
            words = content.split()
            current_part = []
            for word in words:
                current_part.append(word)
                if num_tokens(' '.join(current_part)) > max_tokens:
                    current_part.pop()
                    content_parts.append(' '.join(current_part))
                    current_part = [word]
            if current_part:
                content_parts.append(' '.join(current_part))

            for part in content_parts:
                chunked_documents.append({'url': doc['url'], 'content': part, 'num_tokens': num_tokens(part)})

    return chunked_documents
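
The greedy word-packing idea behind `chunk` can be tried on a toy input without loading any documents. The following is a minimal sketch, not the notebook's implementation: `split_into_parts` and `count_words` are hypothetical names, and the word-count function is an assumed stand-in for the `tiktoken`-based `num_tokens`.

```python
def split_into_parts(text, max_tokens, count_tokens):
    """Greedily pack whitespace-separated words into parts of at most max_tokens."""
    parts, current = [], []
    for word in text.split():
        candidate = current + [word]
        # Start a new part once adding the word would exceed the budget.
        if current and count_tokens(' '.join(candidate)) > max_tokens:
            parts.append(' '.join(current))
            current = [word]
        else:
            current = candidate
    if current:
        parts.append(' '.join(current))
    return parts

# Stand-in token counter (assumption): one token per word.
def count_words(text):
    return len(text.split())

parts = split_into_parts("one two three four five six seven", 3, count_words)
print(parts)  # ['one two three', 'four five six', 'seven']
```

Note that splitting on whitespace discards the original line breaks, which is why the chunked contents in the next table no longer contain `\n`.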

In [6]:
chunked_documents = chunk(documents, 1024)

In [7]:
pd.DataFrame(chunked_documents, columns=["url", "content", "num_tokens"])

| | url | content | num_tokens |
|---|---|---|---|
| 0 | `https://www.winccoa.com/documentation/WinCCOA/...` | `# Addressing of Data Point Elements / Configs ...` | 899 |
| 1 | `https://www.winccoa.com/documentation/WinCCOA/...` | `# Simple Draw Operations By selecting a **grap...` | 1013 |
| 2 | `https://www.winccoa.com/documentation/WinCCOA/...` | `Basics](../Native_GEDI/Referenz_Native_GEDI.ht...` | 20 |
| 3 | `https://www.winccoa.com/documentation/WinCCOA/...` | `# Control Manager (Runtime Scripts) In additio...` | 1024 |
| 4 | `https://www.winccoa.com/documentation/WinCCOA/...` | `scripts for the simulation run in a single CON...` | 175 |
| ... | ... | ... | ... |
| 63 | `https://www.winccoa.com/documentation/WinCCOA/...` | `# Panel Topology The panel topology in WinCC O...` | 1024 |
| 64 | `https://www.winccoa.com/documentation/WinCCOA/...` | `on "**OK**". 15. A dialog box pop-ups: "Apply ...` | 1018 |
| 65 | `https://www.winccoa.com/documentation/WinCCOA/...` | `![](pt_configur_directAccess_collage_de.png) D...` | 371 |
| 66 | `https://www.winccoa.com/documentation/WinCCOA/...` | `# System Requirements\n\nYou can find the syst...` | 212 |


## Index & retrieval
We have a total of 67 chunks, which is ridiculously small. We do not really need a full-blown vector store like [Chroma](https://www.trychroma.com/) or [Pinecone](https://www.pinecone.io/).

Instead, we'll build a simple in-memory vector store. To embed our chunks, we'll use [OpenAI's embedding API](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings), which is cheap, fast, and produces great embeddings.

Let's write a function that takes our list of `chunked_documents` and embeds each chunk's text using OpenAI's `text-embedding-3-small` embedder. Each returned vector has 1536 dimensions. We submit the chunks to the OpenAI API in batches.

In [8]:
def embed_chunks(chunks):
    def embed_batch(batch):
        # Embed all chunks of a batch with a single API call and store the vectors.
        response = client.embeddings.create(
            input=[item['content'] for item in batch],
            model="text-embedding-3-small"
        )
        for item, data in zip(batch, response.data):
            item['vector'] = data.embedding

    batch = []
    batch_tokens = 0
    for chunk in chunks:
        # Flush the current batch if adding this chunk would exceed 7000 tokens.
        # Using separate names in embed_batch avoids overwriting the loop
        # variable, which previously caused the current chunk to be skipped.
        if batch and batch_tokens + chunk['num_tokens'] > 7000:
            embed_batch(batch)
            batch = []
            batch_tokens = 0
        batch.append(chunk)
        batch_tokens += chunk['num_tokens']

    # Embed whatever is left over.
    if batch:
        embed_batch(batch)

In [9]:
embed_chunks(chunked_documents)
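
The batching logic can be checked in isolation, without calling the embedding API. This is a minimal sketch under stated assumptions: `make_batches` is a hypothetical helper that only groups chunks by their `num_tokens` field, and the toy token counts are made up for illustration.

```python
def make_batches(chunks, max_batch_tokens):
    """Group chunks into consecutive batches whose token sum stays within the budget."""
    batches, batch, batch_tokens = [], [], 0
    for chunk in chunks:
        # Close the current batch before it would exceed the budget.
        if batch and batch_tokens + chunk["num_tokens"] > max_batch_tokens:
            batches.append(batch)
            batch, batch_tokens = [], 0
        batch.append(chunk)
        batch_tokens += chunk["num_tokens"]
    if batch:
        batches.append(batch)
    return batches

# Toy chunks (assumption): only the token counts matter here.
toy = [{"num_tokens": n} for n in (3000, 3000, 3000, 500, 6800)]
batches = make_batches(toy, 7000)
print([[c["num_tokens"] for c in b] for b in batches])
# [[3000, 3000], [3000, 500], [6800]]
```

A useful invariant to verify is that every chunk ends up in exactly one batch, so that no chunk is left without a `vector` after embedding.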

In [10]:
pd.DataFrame(chunked_documents, columns=["url", "content", "vector", "num_tokens"])

| | url | content | vector | num_tokens |
|---|---|---|---|---|
| 0 | `https://www.winccoa.com/documentation/WinCCOA/...` | `# Addressing of Data Point Elements / Configs ...` | `[0.004041814710944891, 0.03711238130927086, 0....` | 899 |
| 1 | `https://www.winccoa.com/documentation/WinCCOA/...` | `# Simple Draw Operations By selecting a **grap...` | `[-0.042427342385053635, 0.05359514430165291, 0...` | 1013 |
| 2 | `https://www.winccoa.com/documentation/WinCCOA/...` | `Basics](../Native_GEDI/Referenz_Native_GEDI.ht...` | `[-0.05739807337522507, 0.043246570974588394, 0...` | 20 |
| 3 | `https://www.winccoa.com/documentation/WinCCOA/...` | `# Control Manager (Runtime Scripts) In additio...` | `[-0.017730658873915672, 0.07946362346410751, 0...` | 1024 |
| 4 | `https://www.winccoa.com/documentation/WinCCOA/...` | `scripts for the simulation run in a single CON...` | `[0.016824405640363693, 0.041737232357263565, 0...` | 175 |
| ... | ... | ... | ... | ... |
| 63 | `https://www.winccoa.com/documentation/WinCCOA/...` | `# Panel Topology The panel topology in WinCC O...` | `[-0.052004531025886536, 0.02570650540292263, 0...` | 1024 |
| 64 | `https://www.winccoa.com/documentation/WinCCOA/...` | `on "**OK**". 15. A dialog box pop-ups: "Apply ...` | `[-0.043920110911130905, 0.026816731318831444, ...` | 1018 |
| 65 | `https://www.winccoa.com/documentation/WinCCOA/...` | `![](pt_configur_directAccess_collage_de.png) D...` | `[-0.015205268748104572, 0.013240096159279346, ...` | 371 |
| 66 | `https://www.winccoa.com/documentation/WinCCOA/...` | `# System Requirements\n\nYou can find the syst...` | `[-0.057609036564826965, 0.039825376123189926, ...` | 212 |


To complete our in-memory vector store, we need a way to find the top-k most similar chunks given a user query. We do so by:
1. Embedding the user query using the same embedding model.
2. Iterating through all chunks and calculating the cosine similarity between the query vector and each chunk vector.
3. Returning the top-k most similar chunks.

For the cosine similarity, we reuse the `cosine_similarity` function from an earlier section on embeddings.

Let's wrap this up in a function:

In [11]:
from numpy.linalg import norm
import numpy as np

cosine_similarity = lambda a,b: (a @ b.T) / (norm(a)*norm(b))

In [12]:
def query_chunks(query, chunks, top_k):
  response = client.embeddings.create(
    input=query,
    model="text-embedding-3-small"
  )
  query_vector = response.data[0].embedding

  cosine_similarity = lambda a, b: np.dot(a, b) / (norm(a) * norm(b))
  similarities = []
  for chunk in chunks:
    if "vector" not in chunk:
      continue
    similarity = cosine_similarity(np.array(query_vector), np.array(chunk["vector"]))
    similarities.append((chunk, similarity))
  sorted_chunks = sorted(similarities, key=lambda x: x[1], reverse=True)[:top_k]

  return [chunk[0] for chunk in sorted_chunks]
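
The ranking step can be tried on hand-made vectors without an API call. The following is a minimal sketch using only the standard library; `cosine`, `top_k_chunks`, and the 2-dimensional toy vectors are assumptions for illustration, whereas real embeddings from `text-embedding-3-small` have 1536 dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vector, chunks, k):
    """Return the k chunks whose vectors are most similar to the query vector."""
    scored = [(cosine(query_vector, c["vector"]), c) for c in chunks if "vector" in c]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]

# Toy chunks with made-up 2-D vectors (assumption).
toy_chunks = [
    {"url": "a", "vector": [1.0, 0.0]},
    {"url": "b", "vector": [0.0, 1.0]},
    {"url": "c", "vector": [1.0, 1.0]},
]
result = top_k_chunks([1.0, 0.1], toy_chunks, 2)
print([c["url"] for c in result])  # ['a', 'c']
```

The query vector points almost exactly along the first axis, so chunk `a` ranks first, the diagonal chunk `c` second, and the orthogonal chunk `b` is dropped.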

Let's try it out.

In [13]:
top_k = query_chunks("What is WinCC OA?", chunked_documents, 5)
pd.DataFrame(top_k, columns=["url", "content", "vector", "num_tokens"])

| | url | content | vector | num_tokens |
|---|---|---|---|---|
| 0 | `https://www.winccoa.com/documentation/WinCCOA/...` | `# What is WinCC OA?\n\n"WinCC OA" is the abbre...` | `[-0.03152685612440109, 0.012403362430632114, 0...` | 391 |
| 1 | `https://www.winccoa.com/documentation/WinCCOA/...` | `# System Requirements\n\nYou can find the syst...` | `[-0.057609036564826965, 0.039825376123189926, ...` | 212 |
| 2 | `https://www.winccoa.com/documentation/WinCCOA/...` | `# WinCC OA Getting Started - Tutorial\n\nThis ...` | `[-0.05473048612475395, 0.022419877350330353, 0...` | 880 |
| 3 | `https://www.winccoa.com/documentation/WinCCOA/...` | `# General Information\n\nWinCC OA possesses a ...` | `[-0.013674133457243443, 0.041188377887010574, ...` | 290 |
| 4 | `https://www.winccoa.com/documentation/WinCCOA/...` | `# Architecture\n\nWinCC OA is a **modularly bu...` | `[-0.03362273424863815, 0.024104533717036247, 0...` | 1024 |


Looks like this returns pretty good results. We also want query expansion to resolve references in a user query.

## Putting it all together

We can now build our RAG system on top of these primitives. We create an API that works similarly to the API for normal completion above.

Our RAG system needs to keep track of the conversation history, which we store in `rag_history`. We also provide a function to clear the history called `rag_clear_history`, analogous to our `clear_history` function.



In [14]:
rag_history = []

def rag_clear_history():
  global rag_history
  rag_history = []


Next, we need a few helper functions to format the conversation history and chunks so we can inject them into our prompt.

In [15]:
def format_messages(messages):
  formatted_str = ""
  for message in messages:
      formatted_str += f"<{message['role']}>:\n{message['content']}\n\n"
  return formatted_str

def format_chunks(chunks):
    formatted_str = "[\n"
    chunk_strings = []
    for chunk in chunks:
        content = chunk['content'].replace("\n", "\\n")
        formatted_chunk = f"   {{ url: \"{chunk['url']}\", content: \"{content}\"}}"
        chunk_strings.append(formatted_chunk)
    formatted_str += ",\n".join(chunk_strings)
    formatted_str += "\n]"
    return formatted_str
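
`format_chunks` escapes newlines by hand but leaves double quotes inside the content untouched. If that ever matters, one alternative is to let the standard library do the escaping. This is a sketch of that alternative, not the notebook's method: `format_chunks_json` and the example chunk are hypothetical.

```python
import json

def format_chunks_json(chunks):
    """Serialize the url and content of each chunk as a JSON array."""
    # Keep only the fields the prompt needs; json.dumps escapes quotes and newlines.
    slim = [{"url": c["url"], "content": c["content"]} for c in chunks]
    return json.dumps(slim, indent=2)

# Example chunk with a quote and a newline in its content (assumption).
example = [{"url": "https://example.com/doc", "content": 'Click "OK".\nDone.', "num_tokens": 6}]
formatted = format_chunks_json(example)
print(formatted)
```

The output is valid JSON, so it can be parsed back without loss, at the cost of a few extra tokens for the quoted keys.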

Since this is a turn-by-turn system, we also need query expansion. We reuse what we had in the previous section and wrap it in a function. Note how we use our `clear_history` and `complete` primitive functions from above.

In [16]:
def expand_query(query, history):
  clear_history()
  complete(f"""You are given a conversation and a new message, both delimited by triple backticks.
  Expand the new message by resolving any references to persons, entities or locations in the
  conversation with their full name.

  Conversation:
  ```
  {format_messages(history)}
  ```

  New Message:
  ```
  {query}
  ```
  """, 2024, True)
  expanded = messages[-1]["content"]
  clear_history()
  return expanded
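
To inspect what the expansion request looks like without calling the LLM, the prompt construction can be separated from the completion call. The following is a minimal sketch: `build_expansion_prompt` and `FENCE` are hypothetical names, and the history is a shortened example conversation.

```python
FENCE = "`" * 3  # triple backticks, built here to avoid nesting code fences

def build_expansion_prompt(query, history):
    """Build the query-expansion prompt from a query and a list of messages."""
    conversation = "".join(f"<{m['role']}>:\n{m['content']}\n\n" for m in history)
    return (
        "You are given a conversation and a new message, both delimited by triple backticks.\n"
        "Expand the new message by resolving any references to persons, entities or locations "
        "in the conversation with their full name.\n\n"
        f"Conversation:\n{FENCE}\n{conversation}{FENCE}\n\n"
        f"New Message:\n{FENCE}\n{query}\n{FENCE}\n"
    )

# Shortened example conversation (assumption).
history = [
    {"role": "user", "content": "Can you tell me what WinCC OA is?"},
    {"role": "assistant", "content": "WinCC OA stands for SIMATIC WinCC Open Architecture."},
]
prompt = build_expansion_prompt("What does it cost?", history)
print(prompt)
```

Given this prompt, the model is expected to replace "it" with the full product name, so the retrieval step receives a query that is meaningful on its own.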

Finally, we create a function called `rag_complete`, which works analogously to `complete`, except it will augment the prompt with relevant information for the LLM to use for answering the user query.

The function first truncates the RAG history to 4000 tokens. Next, we expand the query using the truncated history, so the expansion request does not overflow the LLM's context window.

We then retrieve the top-k relevant chunks for the expanded query. With 5 chunks of at most 1024 tokens each, this injects up to 5120 additional tokens into the prompt.

We then construct the final prompt from all the pieces, with instructions for the LLM on how to use the information to answer the query.

Finally, we send off the completion request and record the query and the response in the RAG history.

In [17]:

def rag_complete(query, debug=False):
  global rag_history
  truncated_history = truncate_messages(rag_history, 4000) # 4000 tokens
  expanded_query = expand_query(query, truncated_history)
  if debug:
    print("Expanded query: " + expanded_query)
  top_k = query_chunks(expanded_query, chunked_documents, 5) # 5 * 1024 tokens = 5120 tokens
  prompt = f"""
You are provided with a conversation history, a set of relevant information, and a user query.

The conversation history:
```
{format_messages(truncated_history)}
```

The relevant information:
```
{format_chunks(top_k)}
```

The query:
```
{query}
```

- Answer the query in your own words based on the relevant information and conversation history.
- Output your answer in Markdown
- Include links to the relevant information you used to answer the query inline.
- Use a descriptive label for each link, based on the contents of the relevant information you cite
  """
  if debug:
    print(prompt)
  clear_history()
  complete(prompt, 2048, False)
  rag_history.append({"role": "user", "content": query})
  rag_history.append(messages[-1])
  if debug:
    print(rag_history)
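
The token budget of `rag_complete` can be worked out with a few lines of arithmetic. This is a rough sketch that reuses the limits from this notebook; it treats the chunk size as an upper bound and ignores that the escaped chunk text may tokenize slightly differently from the raw text.

```python
history_budget = 4000        # tokens kept from rag_history
top_k = 5                    # number of retrieved chunks
max_chunk_tokens = 1024      # upper bound per chunk
max_response_tokens = 2048   # max_tokens passed to the completion request
prompt_budget = 12000        # max_tokens used by truncate_messages in complete()

context_tokens = top_k * max_chunk_tokens
worst_case_prompt = history_budget + context_tokens
remaining = prompt_budget - worst_case_prompt

print(context_tokens)     # 5120
print(worst_case_prompt)  # 9120
print(remaining)          # 2880 tokens left for the query and the instructions
```

In the worst case, history and chunks together use 9120 of the 12000 prompt tokens, which leaves 2880 tokens for the query and the instructions.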

Let's give it a try!

In [18]:
rag_clear_history()
rag_complete("Can you tell me what WinCC OA is?")

WinCC OA, which stands for SIMATIC WinCC Open Architecture, is a software package designed for automation technology, specifically for the operation and control of technical plants using VDU workstations with full graphical capability. It serves as a supervisory software for control centers or machine operation, utilizing PC-based servers and workstations as the hardware platform. WinCC OA interacts with control systems of basic automation, such as PLCs, DDC, and RTUs, forming a complete automation system. It allows for the presentation of current process states, transfer of conditions and commands, alerting in case of critical conditions, historical archiving of data, and more. Additionally, WinCC OA is modularly built, with specific units called managers handling different tasks within the system. These managers include process interface modules (drivers), event manager, data manager, control language (CTRL), user interface manager, and more. The software also features a powerful scr

In [19]:
rag_complete("What does it cost?", True)

Expanded query: What does SIMATIC WinCC Open Architecture (WinCC OA) cost?

You are provided with a conversation history, a set of relevant information, and a user query.

The conversation history:
```
<user>:
Can you tell me what WinCC OA is?

<assistant>:
WinCC OA, which stands for SIMATIC WinCC Open Architecture, is a software package designed for automation technology, specifically for the operation and control of technical plants using VDU workstations with full graphical capability. It serves as a supervisory software for control centers or machine operation, utilizing PC-based servers and workstations as the hardware platform. WinCC OA interacts with control systems of basic automation, such as PLCs, DDC, and RTUs, forming a complete automation system. It allows for the presentation of current process states, transfer of conditions and commands, alerting in case of critical conditions, historical archiving of data, and more. Additionally, WinCC OA is modularly built, with specif

In [20]:
rag_complete("Describe what the process interface is")

The process interface in WinCC OA, also known as drivers, serves as the communication link between the software system and the control and field level devices such as PLCs and remote control nodes. These drivers are specialized programs that handle the conversion of various communication protocols into the internal communication format of WinCC OA. By utilizing different drivers based on the communication protocol used by the PLC or device, WinCC OA can read current states, measured values, and execute commands within the automation system. The process interface modules ensure seamless data exchange between the software and the control equipment, enabling efficient monitoring and control of the technical plants.

For more detailed information on the process interface in WinCC OA, you can refer to the [Drivers](https://www.winccoa.com/documentation/WinCCOA/latest/en_US/GettingStarted/GettingStarted-79.html) section of the WinCC OA documentation.
Tokens: 4488


In [33]:
rag_complete("What programming languages can I use?")

### Programming Languages in WinCC OA

In SIMATIC WinCC Open Architecture (WinCC OA), you can utilize a powerful script language called **Control** for implementing various logics, design symbols, configuration dialogs, reports, and calculation rules. The Control language is a **procedural high-level language** with all typical control structures, and its syntax closely resembles the **ANSI-C standard**. It offers functions tailored to the needs of automation technology, providing access to various elements such as current process images, historical data, configuration details, graphic objects, operating system functions, external databases, and more.

The Control language in WinCC OA is processed interpretatively, eliminating the need for compilation or linking. It supports **event-driven** execution but can also be implemented in a time-controlled or cyclical manner. Additionally, WinCC OA offers a **common programming interface API** (Application Programming Interface) for developer

In [34]:
rag_complete("How do i create a new project?")

To create a new project in WinCC OA, you need to follow the steps outlined in the [Configuration of a Project](https://www.winccoa.com/documentation/WinCCOA/latest/en_US/GettingStarted/GettingStarted-23.html) section of the WinCC OA documentation.

1. **Console**: In the console, you can select the managers that belong to your project by adding a new manager and specifying the start properties. This can be done through the console interface where you can manage and configure different components of your project.

2. **System Management**: The system management serves as an administration center for various settings related to your project. You can access it via the database editor, graphics editor, or runtime user interface module. It allows you to configure project settings and utilize diverse tools for project administration.

3. **Configuration Files**: WinCC OA uses configuration files to make specific project settings, especially in conjunction with connections (drivers). The most

A pretty decent system for the amount of code. We can compare its outputs with the outputs of the current system.

<center><img src="https://marioslab.io/uploads/genai/brag.png" width=480></center>
<center><img src="https://marioslab.io/uploads/genai/brag-2.png" width=480></center>
<center><img src="https://marioslab.io/uploads/genai/brag-3.png" width=480></center>
<center><img src="https://marioslab.io/uploads/genai/brag-5.png" width=480></center>
<center><img src="https://marioslab.io/uploads/genai/brag-4.png" width=480></center>