# Praktikum Wirtschaftsinformatik: Exercise Sheet 6

Group Nr.: 3

Student Names: Felix Findeisen; Lucas Janssen

# 6.1 Pipelines

## Setup and Imports

In [25]:
import os
from haystack import Pipeline, component
from haystack.components.builders import PromptBuilder
from haystack_integrations.components.generators.ollama import OllamaGenerator
from typing import List, Set, Dict, Tuple
import re

# Configuration for Ollama (Example from Sheet 3 and 5)
model_name = "llama3.2:3b" # or "llama3.1:8b" depending on your setup
ollama_url = "http://localhost:11434"

## a) Generator Pipeline

Task: Create a pipeline that receives a .bpmn file and generates a descriptive text. The prompt should contain a BPMN-text pair as a few-shot example.

In [2]:
# 1. Define Template (based on Sheet 3, PromptBuilder)
# We add a fixed example (Few-Shot) to the prompt to improve quality.
generator_template = """
You are a strict BPMN-to-Text converter. Your sole task is to describe the process flow exactly as defined in the XML.

### CRITICAL RULES:
1. **Exact Naming (Most Important):** Whenever you mention a BPMN element (Task, Event, Gateway), you MUST copy the exact string from the `name="..."` attribute and enclose it in double quotes. 
   - Example: name="Check Stock" -> The task "Check Stock" is performed.
2. **No Interpretation:** Do not infer logic. Only describe elements that actually exist as tags (e.g., `<bpmn:task>`). If a task implies a check but no gateway follows, do NOT invent a gateway.
3. **Trace the Flow:** Follow the `<bpmn:sequenceFlow>` elements strictly.
4. **Output:** Only output the text description, no conversational filler.

### Example 1 (Simple Sequence)
BPMN XML:
<bpmn:startEvent id="Start_1" name="Order received">
  <bpmn:outgoing>Flow_1</bpmn:outgoing>
</bpmn:startEvent>
<bpmn:sequenceFlow id="Flow_1" sourceRef="Start_1" targetRef="Task_1" />
<bpmn:task id="Task_1" name="Check stock">
  <bpmn:incoming>Flow_1</bpmn:incoming>
  <bpmn:outgoing>Flow_2</bpmn:outgoing>
</bpmn:task>
<bpmn:endEvent id="End_1" name="Order fulfilled">
  <bpmn:incoming>Flow_2</bpmn:incoming>
</bpmn:endEvent>

Description:
The process begins with the start event "Order received". Afterwards, the task "Check stock" is executed. Finally, the process ends with the event "Order fulfilled".

### Example 2 (Exclusive Decision)
BPMN XML:
<bpmn:exclusiveGateway id="Gateway_1" name="Is item available?">
  <bpmn:outgoing>Flow_Yes</bpmn:outgoing>
  <bpmn:outgoing>Flow_No</bpmn:outgoing>
</bpmn:exclusiveGateway>
<bpmn:sequenceFlow id="Flow_Yes" name="Yes" sourceRef="Gateway_1" targetRef="Task_Ship" />
<bpmn:sequenceFlow id="Flow_No" name="No" sourceRef="Gateway_1" targetRef="Task_Reject" />
<bpmn:task id="Task_Ship" name="Ship Item" />
<bpmn:task id="Task_Reject" name="Reject Order" />

Description:
The exclusive gateway checks "Is item available?". If the condition is "Yes", the process proceeds to the task "Ship Item". If the condition is "No", the process proceeds to the task "Reject Order".

### Your Task
BPMN XML:
{{ bpmn_content }}

Description:
"""

# 2. Initialize Components
prompt_builder_gen = PromptBuilder(template=generator_template)
llm_gen = OllamaGenerator(model=model_name, url=ollama_url, generation_kwargs={"temperature": 0.0}) # to be as strict as possible

# 3. Create Pipeline
generator_pipeline = Pipeline()
generator_pipeline.add_component("prompt_builder", prompt_builder_gen)
generator_pipeline.add_component("llm", llm_gen)

# Connect Components
generator_pipeline.connect("prompt_builder", "llm")

# Test Call (Example)
bpmn_str = """<bpmn:process id="Process_Airport">
    <bpmn:startEvent id="StartEvent_1" name="Passenger arrives">
      <bpmn:outgoing>Flow_1</bpmn:outgoing>
    </bpmn:startEvent>
    <bpmn:task id="Task_1" name="Check-in at Counter">
      <bpmn:incoming>Flow_1</bpmn:incoming>
      <bpmn:outgoing>Flow_2</bpmn:outgoing>
    </bpmn:task>
    <bpmn:sequenceFlow id="Flow_1" sourceRef="StartEvent_1" targetRef="Task_1" />
    <bpmn:task id="Task_2" name="Pass Security Check">
      <bpmn:incoming>Flow_2</bpmn:incoming>
      <bpmn:outgoing>Flow_3</bpmn:outgoing>
    </bpmn:task>
    <bpmn:sequenceFlow id="Flow_2" sourceRef="Task_1" targetRef="Task_2" />
    <bpmn:endEvent id="EndEvent_1" name="Passenger ready">
      <bpmn:incoming>Flow_3</bpmn:incoming>
    </bpmn:endEvent>
    <bpmn:sequenceFlow id="Flow_3" sourceRef="Task_2" targetRef="EndEvent_1" />
  </bpmn:process>"""
result = generator_pipeline.run({"prompt_builder": {"bpmn_content": bpmn_str}})
generated_text = result["llm"]["replies"][0]

print(generated_text)

PromptBuilder has 1 prompt variables, but `required_variables` is not set. By default, all prompt variables are treated as optional, which may lead to unintended behavior in multi-branch pipelines. To avoid unexpected execution, ensure that variables intended to be required are explicitly set in `required_variables`.


The process begins with the start event "Passenger arrives". Afterwards, the task "Check-in at Counter" is executed. The output of this task is then used to proceed to the task "Pass Security Check". Finally, the process ends with the event "Passenger ready".


## b) Extractor Pipeline

Task: Build a pipeline that analyzes the descriptive text generated in 1a) and extracts the names of the BPMN elements mentioned in it.

In [3]:
# --- Custom Component: List[str] Input ---
@component
class TextToListParser:
    """
    Parses a comma-separated string from the LLM into a Python list.
    The input type is List[str] so that it matches the 'replies' output of OllamaGenerator.
    """
    @component.output_types(elements=List[str])
    def run(self, replies: List[str]):
        # List empty?
        if not replies:
            return {"elements": []}
        
        # Simply take the first reply from the list
        llm_text = replies[0]
        
        # 1. Clean the string
        clean_text = llm_text.replace("[", "").replace("]", "").replace('"', '').replace("'", "")
        
        # 2. Split by comma and strip whitespace
        if not clean_text.strip():
            return {"elements": []}
            
        # Split and clean
        elements = [item.strip() for item in clean_text.split(",") if item.strip()]
        return {"elements": elements}

# --- Pipeline Setup ---

extractor_template = """
You are a precise data extractor. 
Your goal is to extract the **exact labels** of the BPMN elements from the text.

### STRICT RULES:
1. Extract ONLY the names that are written inside double quotes ("...").
2. Do NOT include words like "Task", "Event", "Gateway" unless they are explicitly inside the quotes.
3. Do NOT add prefixes like "Task - ".
4. Return a simple comma-separated list.

### Example:
Text: The process starts with "Order received". Then the task "Check Stock" is executed.
Output: Order received, Check Stock

### Process Description:
{{ description }}

### Output:
"""

prompt_builder_ext = PromptBuilder(template=extractor_template)
llm_ext = OllamaGenerator(model=model_name, url=ollama_url, generation_kwargs={"temperature": 0})
parser = TextToListParser()

extractor_pipeline = Pipeline()
extractor_pipeline.add_component("prompt_builder", prompt_builder_ext)
extractor_pipeline.add_component("llm", llm_ext)
extractor_pipeline.add_component("parser", parser)

# --- CONNECTIONS ---
extractor_pipeline.connect("prompt_builder", "llm")

# Connect the LLM output 'replies' to the parser input 'replies'
extractor_pipeline.connect("llm.replies", "parser.replies")

# 1. Example-Input
# Here we test whether both tasks and events are recognized correctly.
sample_description = """
The process starts at the "Passenger arrives". The task "Check-in at Counter" is performed. Afterwards, the process continues directly with the task "Pass Security Check". The task "Pass Security Check" is performed. Afterwards, the process ends when the passenger is ready.
"""

# 2. Pipeline execution
# Pass the prompt variable 'description' to the prompt builder
result_b = extractor_pipeline.run({
    "prompt_builder": {
        "description": sample_description
    }
})

# 3. Result
# The output of the 'parser' component is stored under the key 'elements'
extracted_elements = result_b["parser"]["elements"]

print("Input Text:", sample_description)
print("-" * 30)
print("Extrahierte Liste:", extracted_elements)


PromptBuilder has 1 prompt variables, but `required_variables` is not set. By default, all prompt variables are treated as optional, which may lead to unintended behavior in multi-branch pipelines. To avoid unexpected execution, ensure that variables intended to be required are explicitly set in `required_variables`.


Input Text: 
The process starts at the "Passenger arrives". The task "Check-in at Counter" is performed. Afterwards, the process continues directly with the task "Pass Security Check". The task "Pass Security Check" is performed. Afterwards, the process ends when the passenger is ready.

------------------------------
Extrahierte Liste: ['Passenger arrives', 'Check-in at Counter', 'Pass Security Check']
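
Because the extractor prompt only asks for names written inside double quotes, a deterministic baseline can cross-check the LLM output. The following is a minimal sketch, not part of the submitted pipeline; the helper name `quoted_names` is hypothetical. Unlike a comma split, it also keeps labels that contain commas intact, such as "Check amount, weight and dimensions".

```python
import re
from typing import List


def quoted_names(description: str) -> List[str]:
    """Return every double-quoted label, in order of first appearance."""
    found = re.findall(r'"([^"]+)"', description)
    # dict.fromkeys keeps insertion order and drops repeated labels
    return list(dict.fromkeys(name.strip() for name in found if name.strip()))


sample = (
    'The process starts at the "Passenger arrives". '
    'The task "Check-in at Counter" is performed. '
    'Afterwards, the process continues with the task "Pass Security Check". '
    'The task "Pass Security Check" is performed.'
)
print(quoted_names(sample))
# → ['Passenger arrives', 'Check-in at Counter', 'Pass Security Check']
print(quoted_names('The task "Check amount, weight and dimensions" is executed.'))
# → ['Check amount, weight and dimensions']
```

If the LLM list and this baseline disagree, the difference points to either an extraction error or a label that the description did not put in quotes.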


## c) Comparison Function

This section defines a custom Haystack component called MistakeFinder. It serves as the evaluator in our system. It extracts the "Ground Truth" (actual element names) directly from the BPMN XML using Regular Expressions and compares them against the list of elements extracted from the text by the LLM in the previous step. It identifies two types of errors:

Missing: Elements present in the XML but not found in the text.

Hallucinated: Elements found in the text that do not exist in the XML.
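
The two error types correspond to two set differences. The sketch below illustrates the idea outside of Haystack; it additionally assumes that labels should match regardless of case and repeated whitespace, which the component itself does not do. The names `normalize` and `compare_labels` are hypothetical.

```python
import re
from typing import Iterable, List, Tuple


def normalize(label: str) -> str:
    """Collapse whitespace and ignore case so that 'Check  Stock' equals 'Check stock'."""
    return re.sub(r"\s+", " ", label).strip().casefold()


def compare_labels(ground_truth: Iterable[str], extracted: Iterable[str]) -> Tuple[List[str], List[str]]:
    truth = {normalize(n): n for n in ground_truth}
    found = {normalize(n): n for n in extracted if n.strip()}
    # Missing: in the model but not in the text
    missing = sorted(truth[k] for k in truth.keys() - found.keys())
    # Hallucinated: in the text but not in the model
    hallucinated = sorted(found[k] for k in found.keys() - truth.keys())
    return missing, hallucinated


print(compare_labels({"Check stock", "Order received"}, ["Check  Stock", "Ship Item"]))
# → (['Order received'], ['Ship Item'])
```

Sorting the results makes the order of the reported mistakes reproducible, which is not guaranteed when iterating over a plain set.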

In [4]:
@component
class MistakeFinder:
    """
    Compares the elements extracted by the LLM with the actual elements in the BPMN XML.
    Identifies 'Missing' (exists in XML, missing in text) and 'Hallucinated' (exists in text, missing in XML) elements.
    References logic from Exercise Sheet 5.
    """
    
    # We return two values:
    # 1. A list of error messages (for the Corrector Pipeline in 1d).
    # 2. A boolean to control the loop in Exercise 2 later (Stop if False).
    @component.output_types(mistakes=List[str], has_mistakes=bool)
    def run(self, bpmn_content: str, extracted_elements: List[str]):
        
        # --- 1. Extract Ground Truth from XML ---
        # We use Regex to find all 'name="..."' attributes in relevant BPMN tags.
        # This covers Tasks, Events, and Gateways.
        pattern = re.compile(r'<bpmn:(?:task|startEvent|endEvent|exclusiveGateway|parallelGateway)[^>]*name="([^"]+)"')
        ground_truth_names = set(pattern.findall(bpmn_content))
        
        # --- 2. Clean Extracted List ---
        # Remove empty entries and whitespace
        text_names = set([x.strip() for x in extracted_elements if x.strip()])
        
        mistakes = []
        
        # --- 3. Comparison: Missing Elements ---
        # Elements that are in the XML but were NOT found in the text.
        missing = ground_truth_names - text_names
        for m in missing:
            # Formatting is important for the Corrector Prompt later
            mistakes.append(f"Missing element: The element '{m}' is in the BPMN model but missing in the description.")
            
        # --- 4. Comparison: Hallucinated Elements ---
        # Elements that are in the text but do NOT exist in the XML.
        hallucinated = text_names - ground_truth_names
        for h in hallucinated:
            mistakes.append(f"Hallucinated element: The element '{h}' is in the description but not in the BPMN model.")
            
        # Debugging Output (optional, good for presentation)
        print(f"DEBUG: Found {len(missing)} missing and {len(hallucinated)} hallucinated elements.")
        
        return {
            "mistakes": mistakes, 
            "has_mistakes": len(mistakes) > 0
        }

# Instantiate the component
mistake_finder = MistakeFinder()

# 1. Simulated XML (ground truth: "Passenger arrives", "Check-in at Counter", "Pass Security Check", "Passenger ready")
mock_bpmn = """
<bpmn:process id="Process_Airport">
    <bpmn:startEvent id="StartEvent_1" name="Passenger arrives">
      <bpmn:outgoing>Flow_1</bpmn:outgoing>
    </bpmn:startEvent>
    <bpmn:task id="Task_1" name="Check-in at Counter">
      <bpmn:incoming>Flow_1</bpmn:incoming>
      <bpmn:outgoing>Flow_2</bpmn:outgoing>
    </bpmn:task>
    <bpmn:sequenceFlow id="Flow_1" sourceRef="StartEvent_1" targetRef="Task_1" />
    <bpmn:task id="Task_2" name="Pass Security Check">
      <bpmn:incoming>Flow_2</bpmn:incoming>
      <bpmn:outgoing>Flow_3</bpmn:outgoing>
    </bpmn:task>
    <bpmn:sequenceFlow id="Flow_2" sourceRef="Task_1" targetRef="Task_2" />
    <bpmn:endEvent id="EndEvent_1" name="Passenger ready">
      <bpmn:incoming>Flow_3</bpmn:incoming>
    </bpmn:endEvent>
    <bpmn:sequenceFlow id="Flow_3" sourceRef="Task_2" targetRef="EndEvent_1" />
  </bpmn:process>"""

# 2. Simulated text extraction (faulty: "Passenger ready" is missing, "End Event" is hallucinated)
mock_extracted = ['Check-in at Counter', 'Pass Security Check', 'Passenger arrives', 'End Event']

# 3. Execution
result_c = mistake_finder.run(bpmn_content=mock_bpmn, extracted_elements=mock_extracted)

print("-" * 30)
print("Mistakes found:")
for m in result_c["mistakes"]:
    print("-", m)
    
print("Has Mistakes:", result_c["has_mistakes"])

DEBUG: Found 1 missing and 1 hallucinated elements.
------------------------------
Mistakes found:
- Missing element: The element 'Passenger ready' is in the BPMN model but missing in the description.
- Hallucinated element: The element 'End Event' is in the description but not in the BPMN model.
Has Mistakes: True
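
The regular expression in `MistakeFinder` only covers five tag types (`task`, `startEvent`, `endEvent`, `exclusiveGateway`, `parallelGateway`). As a possible alternative, the ground truth could be read with an XML parser. The sketch below assumes that the input is a bare `<bpmn:process>` fragment without namespace declarations, so it wraps the fragment in a helper root element that declares the `bpmn` prefix; fragments using other prefixes (for example `xsi:`) would need those declared too. The function name and the tag filter are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Set

BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"


def is_flow_node(local_name: str) -> bool:
    """Assumed filter: tasks, events, gateways, sub-processes and call activities."""
    return (
        local_name == "task"
        or local_name.endswith(("Task", "Event", "Gateway"))
        or local_name in {"subProcess", "callActivity"}
    )


def named_flow_nodes(process_fragment: str) -> Set[str]:
    # Wrap the fragment so that the 'bpmn:' prefix is bound to a namespace
    wrapped = f'<root xmlns:bpmn="{BPMN_NS}">{process_fragment}</root>'
    root = ET.fromstring(wrapped)
    names = set()
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]  # strip '{namespace}'
        name = element.get("name")
        if name and is_flow_node(local_name):
            names.add(name)
    return names


fragment = """<bpmn:process id="P" name="Airport">
  <bpmn:startEvent id="S" name="Passenger arrives" />
  <bpmn:userTask id="T" name="Check-in at Counter" />
  <bpmn:sequenceFlow id="F1" name="Yes" sourceRef="S" targetRef="T" />
</bpmn:process>"""
print(sorted(named_flow_nodes(fragment)))
# → ['Check-in at Counter', 'Passenger arrives']
```

The process name and the sequence flow label are skipped, while the `userTask`, which the regex would not match, is included.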


## d) Corrector Pipeline

This pipeline is responsible for the self-correction mechanism. It uses a specific prompt that receives both the original text and the list of errors found (Missing/Hallucinated elements). The LLM is instructed to rewrite the text to fix these specific issues while keeping the correct parts of the description intact. This concept is based on the "Correction" approach from Exercise Sheet 5.

In [5]:
# --- 1. Corrector Prompt Template ---
# This template instructs the LLM to modify the text based on the error list.
# We use a loop {% for ... %} to list all mistakes clearly.

corrector_template = """
You are a helpful assistant that corrects BPMN process descriptions based on a list of errors.

### Instructions:
1. Read the 'Original Text' and the 'List of Mistakes'.
2. Rewrite the text to fix the mistakes.
3. If a mistake says "Missing element", add it to the text in a logical position.
4. If a mistake says "Hallucinated element", remove it from the text.
5. Do not change parts of the text that are already correct.
6. Do not explain what you did, just correct the current description.

### Original Text:
{{ original_text }}

### List of Mistakes:
{% for mistake in mistakes %}
- {{ mistake }}
{% endfor %}

### Corrected Text:
"""

# --- 2. Initialize Components ---
prompt_builder_corr = PromptBuilder(template=corrector_template)

# We use a low temperature (0.1) to ensure the model focuses on the correction
# rather than "creatively" rewriting the whole story.
llm_corr = OllamaGenerator(model=model_name, url=ollama_url, generation_kwargs={"temperature": 0.1})

# --- 3. Create Pipeline ---
corrector_pipeline = Pipeline()
corrector_pipeline.add_component("prompt_builder", prompt_builder_corr)
corrector_pipeline.add_component("llm", llm_corr)

# --- 4. Connect Components ---
corrector_pipeline.connect("prompt_builder", "llm")

# Dummy data for testing
mock_text = """
The process starts at the "Passenger arrives". The task "Check-in at Counter" is performed. Afterwards, the process continues directly with the task "Pass Security Check". The task "Pass Security Check" is performed. Afterwards, the process ends when the passenger is ready.
"""
mock_mistakes = [
    "Missing element: The element 'Passenger ready' is in the BPMN model but missing in the description.",
    "Hallucinated element: The element 'End Event' is in the description but not in the BPMN model."
]

# Run the pipeline
result_d = corrector_pipeline.run({
    "prompt_builder": {
        "original_text": mock_text,
        "mistakes": mock_mistakes
    }
})

corrected_text = result_d["llm"]["replies"][0]

print("--- Original ---")
print(mock_text)
print("\n--- Mistakes ---")
for m in mock_mistakes: print("-", m)
print("\n--- Corrected ---")
print(corrected_text)

PromptBuilder has 2 prompt variables, but `required_variables` is not set. By default, all prompt variables are treated as optional, which may lead to unintended behavior in multi-branch pipelines. To avoid unexpected execution, ensure that variables intended to be required are explicitly set in `required_variables`.


--- Original ---

The process starts at the "Passenger arrives". The task "Check-in at Counter" is performed. Afterwards, the process continues directly with the task "Pass Security Check". The task "Pass Security Check" is performed. Afterwards, the process ends when the passenger is ready.


--- Mistakes ---
- Missing element: The element 'Passenger ready' is in the BPMN model but missing in the description.
- Hallucinated element: The element 'End Event' is in the description but not in the BPMN model.

--- Corrected ---
The process starts at the "Passenger arrives". The task "Check-in at Counter" is performed. Afterwards, the process continues directly with the task "Pass Security Check". The task "Pass Security Check" is performed. When the passenger is ready, the process ends.
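
The corrected text above still does not contain "Passenger ready" in double quotes, so the next extraction step would report the same missing element again. A simple deterministic check can flag this directly after the correction. This is a sketch under the assumption that element names always appear in double quotes, as the generator prompt demands; the function name `unresolved_mistakes` is hypothetical.

```python
from typing import List, Tuple


def unresolved_mistakes(text: str, missing: List[str], hallucinated: List[str]) -> Tuple[List[str], List[str]]:
    """Return the missing names that are still absent and the hallucinated names that are still present."""
    still_missing = [name for name in missing if f'"{name}"' not in text]
    still_present = [name for name in hallucinated if f'"{name}"' in text]
    return still_missing, still_present


corrected = (
    'The task "Pass Security Check" is performed. '
    "When the passenger is ready, the process ends."
)
print(unresolved_mistakes(corrected, ["Passenger ready"], ["End Event"]))
# → (['Passenger ready'], [])
```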


# 6.2 - Connecting Pipelines

In [None]:
def extract_process_only(input_file_path):
    """
    Reads a .bpmn file and extracts ONLY the <bpmn:process> part.
    Returns the string (or None on error).
    """
    try:
        with open(input_file_path, 'r', encoding='utf-8') as f:
            content = f.read()
        
        # The pattern matches from the opening <bpmn:process ...> to the closing </bpmn:process>
        # .*? is non-greedy (stops at the first </bpmn:process>)
        pattern = r'<bpmn:process.*?</bpmn:process>'
        
        # re.DOTALL is passed directly as the third argument (without 'flags=') so that '.' also matches newlines
        match = re.search(pattern, content, re.DOTALL)
        
        if match:
            process_content = match.group(0)
            print("Erfolg: <bpmn:process> Abschnitt gefunden.")
            return process_content
        else:
            print("Warnung: Kein <bpmn:process> Tag gefunden. Prüfe die Datei.")
            # Debugging aid: show the first 100 characters of the file
            print(f"Datei-Inhalt Vorschau: {content[:100]}...") 
            return None

    except FileNotFoundError:
        print(f"Fehler: Die Datei '{input_file_path}' wurde nicht gefunden.")
        return None
    except Exception as e:
        print(f"Ein unerwarteter Fehler ist aufgetreten: {e}")
        return None
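
The pattern in `extract_process_only` is tied to the `bpmn:` prefix. Files exported by other modeling tools may use a different prefix (for example `bpmn2:`) or none at all. The following sketch shows one way to make the pattern independent of the prefix; it is an assumption about possible input files, not something required by the three models used here, and `find_process_block` is a hypothetical name.

```python
import re
from typing import Optional

# The named group 'prefix' matches an optional 'something:' and is reused
# in the closing tag, so opening and closing tags must use the same prefix.
PROCESS_RE = re.compile(
    r"<(?P<prefix>(?:[\w.-]+:)?)process\b.*?</(?P=prefix)process\s*>",
    re.DOTALL,
)


def find_process_block(xml_text: str) -> Optional[str]:
    match = PROCESS_RE.search(xml_text)
    return match.group(0) if match else None


xml_with_other_prefix = (
    '<bpmn2:definitions><bpmn2:process id="p">'
    '<bpmn2:task name="A"/></bpmn2:process></bpmn2:definitions>'
)
print(find_process_block(xml_with_other_prefix))
# → <bpmn2:process id="p"><bpmn2:task name="A"/></bpmn2:process>
```

A self-closing, empty `<process/>` element is deliberately not matched, since it contains nothing to describe.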


In [6]:
def run_bpmn_rag_system(bpmn_xml_string, max_iterations=5):
    """
    Orchestrates the Self-Correcting RAG pipeline.
    
    Args:
        bpmn_xml_string (str): The content of the BPMN file.
        max_iterations (int): Maximum number of correction loops.
        
    Returns:
        final_text (str): The refined description.
        history (list): A log of steps (useful for documentation/presentation).
    """
    
    history = [] # To store results for Exercise 3 documentation
    
    print(f"--- STARTING PIPELINE ---")
    
    # ---------------------------------------------------------
    # Step 1: Initial Generation (Exercise 2a)
    # ---------------------------------------------------------
    print("1. Generating initial description...")
    gen_result = generator_pipeline.run({
        "prompt_builder": {"bpmn_content": bpmn_xml_string}
    })
    current_text = gen_result["llm"]["replies"][0]
    
    # Log initial state
    history.append({
        "step": "Initial Generation",
        "text": current_text,
        "mistakes": None
    })
    
    print(f"   -> Initial text generated (Length: {len(current_text)} chars).")
    print("Initial Text: " + current_text)

    # ---------------------------------------------------------
    # Step 2: The Correction Loop (Exercise 2b & 2c)
    # ---------------------------------------------------------
    for i in range(max_iterations):
        print(f"\n--- Iteration {i+1}/{max_iterations} ---")
        
        # A. Extract Elements (Pipeline 1b)
        ext_result = extractor_pipeline.run({
            "prompt_builder": {"description": current_text}
        })
        # Note: Our custom component puts the list in 'elements'
        extracted_elements = ext_result["parser"]["elements"]
        
        # B. Find Mistakes (Component 1c)
        # We pass the ORIGINAL XML (Ground Truth) and the extracted list
        validation_result = mistake_finder.run(
            bpmn_content=bpmn_xml_string, 
            extracted_elements=extracted_elements
        )
        
        mistakes = validation_result["mistakes"]
        has_mistakes = validation_result["has_mistakes"]
        
        # Log this iteration
        history.append({
            "step": f"Iteration {i+1}",
            "extracted": extracted_elements,
            "mistakes": mistakes,
            "text_before_correction": current_text
        })
        
        # C. Decision Logic
        if not has_mistakes:
            print(">>> SUCCESS: No mistakes found! The text matches the BPMN model.")
            break
        else:
            print(f">>> Found {len(mistakes)} mistakes:")
            for m in mistakes:
                print(f"    - {m}")
            
            # Check if this was the last allowed iteration
            if i == max_iterations - 1:
                print(">>> WARNING: Max iterations reached. Stopping loop.")
                break
            
            # D. Run Correction (Pipeline 1d)
            print("    -> Running Corrector Pipeline...")
            corr_result = corrector_pipeline.run({
                "prompt_builder": {
                    "original_text": current_text,
                    "mistakes": mistakes
                }
            })
            
            # Update current_text for the next loop
            current_text = corr_result["llm"]["replies"][0]
            print("    -> Text corrected: " + current_text)

    return current_text, history
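
Printing the raw `history` list quickly becomes hard to read. A small helper can condense it to one line per step. The sketch relies only on the `step` and `mistakes` keys that `run_bpmn_rag_system` writes; the helper name `summarize_history` and the demo data are hypothetical.

```python
from typing import Dict, List


def summarize_history(history: List[Dict]) -> List[str]:
    """Return one summary line per logged step."""
    lines = []
    for entry in history:
        mistakes = entry.get("mistakes")
        if mistakes is None:
            status = "not checked yet"  # initial generation, no validation run
        elif not mistakes:
            status = "no mistakes"
        else:
            status = f"{len(mistakes)} mistake(s)"
        lines.append(f"{entry['step']}: {status}")
    return lines


demo_history = [
    {"step": "Initial Generation", "text": "...", "mistakes": None},
    {"step": "Iteration 1", "mistakes": ["Missing element: ..."]},
    {"step": "Iteration 2", "mistakes": []},
]
for line in summarize_history(demo_history):
    print(line)
```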

### 0. Test run

In [31]:
# use a dummy string for quick testing right now:
bpmn_content = """
<bpmn:process id="Process_Turnaround" isExecutable="false">
    <bpmn:startEvent id="StartEvent_1" name="Aircraft at Gate">
      <bpmn:outgoing>Flow_1</bpmn:outgoing>
    </bpmn:startEvent>
    <bpmn:task id="Task_Deboard" name="Deboard Passengers">
      <bpmn:incoming>Flow_1</bpmn:incoming>
      <bpmn:outgoing>Flow_2</bpmn:outgoing>
    </bpmn:task>
    <bpmn:sequenceFlow id="Flow_1" sourceRef="StartEvent_1" targetRef="Task_Deboard" />
    <bpmn:parallelGateway id="Gateway_Split" name="Start Ground Ops">
      <bpmn:incoming>Flow_2</bpmn:incoming>
      <bpmn:outgoing>Flow_To_Refuel</bpmn:outgoing>
      <bpmn:outgoing>Flow_To_Baggage</bpmn:outgoing>
      <bpmn:outgoing>Flow_To_Cabin</bpmn:outgoing>
    </bpmn:parallelGateway>
    <bpmn:sequenceFlow id="Flow_2" sourceRef="Task_Deboard" targetRef="Gateway_Split" />
    
    <bpmn:task id="Task_Refuel" name="Refuel Aircraft">
      <bpmn:incoming>Flow_To_Refuel</bpmn:incoming>
      <bpmn:outgoing>Flow_From_Refuel</bpmn:outgoing>
    </bpmn:task>
    <bpmn:sequenceFlow id="Flow_To_Refuel" sourceRef="Gateway_Split" targetRef="Task_Refuel" />
    
    <bpmn:task id="Task_Unload" name="Unload Baggage">
      <bpmn:incoming>Flow_To_Baggage</bpmn:incoming>
      <bpmn:outgoing>Flow_Bag_1</bpmn:outgoing>
    </bpmn:task>
    <bpmn:sequenceFlow id="Flow_To_Baggage" sourceRef="Gateway_Split" targetRef="Task_Unload" />
    <bpmn:task id="Task_Load" name="Load Baggage">
      <bpmn:incoming>Flow_Bag_1</bpmn:incoming>
      <bpmn:outgoing>Flow_From_Baggage</bpmn:outgoing>
    </bpmn:task>
    <bpmn:sequenceFlow id="Flow_Bag_1" sourceRef="Task_Unload" targetRef="Task_Load" />
    
    <bpmn:task id="Task_Clean" name="Clean Cabin">
      <bpmn:incoming>Flow_To_Cabin</bpmn:incoming>
      <bpmn:outgoing>Flow_Cab_1</bpmn:outgoing>
    </bpmn:task>
    <bpmn:sequenceFlow id="Flow_To_Cabin" sourceRef="Gateway_Split" targetRef="Task_Clean" />
    <bpmn:task id="Task_Cater" name="Load Catering">
      <bpmn:incoming>Flow_Cab_1</bpmn:incoming>
      <bpmn:outgoing>Flow_From_Cabin</bpmn:outgoing>
    </bpmn:task>
    <bpmn:sequenceFlow id="Flow_Cab_1" sourceRef="Task_Clean" targetRef="Task_Cater" />
    
    <bpmn:parallelGateway id="Gateway_Join" name="Ground Ops Done">
      <bpmn:incoming>Flow_From_Refuel</bpmn:incoming>
      <bpmn:incoming>Flow_From_Baggage</bpmn:incoming>
      <bpmn:incoming>Flow_From_Cabin</bpmn:incoming>
      <bpmn:outgoing>Flow_Merge</bpmn:outgoing>
    </bpmn:parallelGateway>
    <bpmn:sequenceFlow id="Flow_From_Refuel" sourceRef="Task_Refuel" targetRef="Gateway_Join" />
    <bpmn:sequenceFlow id="Flow_From_Baggage" sourceRef="Task_Load" targetRef="Gateway_Join" />
    <bpmn:sequenceFlow id="Flow_From_Cabin" sourceRef="Task_Cater" targetRef="Gateway_Join" />
    
    <bpmn:task id="Task_Board" name="Board Passengers">
      <bpmn:incoming>Flow_Merge</bpmn:incoming>
      <bpmn:outgoing>Flow_End</bpmn:outgoing>
    </bpmn:task>
    <bpmn:sequenceFlow id="Flow_Merge" sourceRef="Gateway_Join" targetRef="Task_Board" />
    <bpmn:endEvent id="EndEvent_1" name="Ready for Departure">
      <bpmn:incoming>Flow_End</bpmn:incoming>
    </bpmn:endEvent>
    <bpmn:sequenceFlow id="Flow_End" sourceRef="Task_Board" targetRef="EndEvent_1" />
  </bpmn:process>
"""

# 2. Run the system
final_description, run_history = run_bpmn_rag_system(bpmn_content)

print("\n\n=== FINAL RESULT ===")
print(final_description)
print(run_history)

--- STARTING PIPELINE ---
1. Generating initial description...
   -> Initial text generated (Length: 690 chars).
Initial Text: The process begins with the start event "Aircraft at Gate". Afterwards, the task "Deboard Passengers" is executed. The gateway "Start Ground Ops" splits into three paths:

- If the condition is "To Refuel", the process proceeds to the task "Refuel Aircraft".
- If the condition is "To Baggage", the process proceeds to the task "Unload Baggage".
- If the condition is "To Cabin", the process proceeds to the task "Clean Cabin".

Each of these paths then continues independently until they reach the gateway "Ground Ops Done". The gateway joins all three paths together, and the process proceeds to the task "Board Passengers". Finally, the process ends with the event "Ready for Departure".

--- Iteration 1/5 ---
DEBUG: Found 3 missing and 0 hallucinated elements.
>>> Found 3 mistakes:
    - Missing element: The element 'Start Ground Ops' is in the BPMN model but missin

### 1. Run

In [None]:
# Example Usage:
# 1. Load a file 
input_bpmn = "/Users/felixfindeisen/Documents/Uni/Praktikum Wirtschaftsinformatik/Excercise 3/carry_on_baggage.bpmn"
bpmn_content = extract_process_only(input_bpmn)

# print(bpmn_content)

# 2. Run the system
final_description, run_history = run_bpmn_rag_system(bpmn_content)

print("\n\n=== FINAL RESULT ===")
print(final_description)
print(run_history)

Erfolg: <bpmn:process> Abschnitt gefunden.
--- STARTING PIPELINE ---
1. Generating initial description...
   -> Initial text generated (Length: 1042 chars).
Initial Text: The process begins with the start event "Passenger arrives at Airport". Afterwards, the task "Check amount, weight and dimensions" is executed. 

If the baggage is not compliant, the passenger is required to pay a fee via credit card. Subsequently, prohibited items such as valuables and batteries must be removed, and the baggage is transported in the aircraft hold.

If the baggage is compliant, the process checks for high flight occupancy. If occupancy is high, the staff offers a free check-in option for the compliant baggage. The passenger can then decide whether to accept this offer. 

If accepted, the baggage is prepared for the hold after removing sensitive items. If the offer is declined, or if the flight occupancy is normal, the passenger proceeds to boarding and enters the cabin.

Inside the cabin, the stowage 

### 2. Run

In [36]:
# Example Usage:
# 1. Load a file 
input_bpmn = "/Users/felixfindeisen/Documents/Uni/Praktikum Wirtschaftsinformatik/Excercise 3/delayed_baggage.bpmn"
bpmn_content = extract_process_only(input_bpmn)

# print(bpmn_content)

# 2. Run the system
final_description, run_history = run_bpmn_rag_system(bpmn_content)

print("\n\n=== FINAL RESULT ===")
print(final_description)
print(run_history)

Erfolg: <bpmn:process> Abschnitt gefunden.
--- STARTING PIPELINE ---
1. Generating initial description...
   -> Initial text generated (Length: 1388 chars).
Initial Text: The process begins with the start event "Baggage missing at destination". Afterwards, the task "Report delayed baggage (Online or at Airport)" is executed. The exclusive gateway "Chosen option?" checks if a chosen option has been selected. If so, the process proceeds to the task "Deliver to provided address". If not, the process proceeds to the task "Collect baggage from airport".

The parallel gateway "Gateway_Parallel" branches into two paths: one for checking status online and another for collecting baggage. The exclusive gateway "Chosen option?" determines which path to take.

If the chosen option is delivery, the process proceeds to the task "Deliver to provided address". If the chosen option is collection, the process proceeds to the task "Collect baggage from airport".

The parallel gateway "Gateway_closeParall

### 3. Run

In [37]:
# Example Usage:
# 1. Load a file 
input_bpmn = "/Users/felixfindeisen/Documents/Uni/Praktikum Wirtschaftsinformatik/Excercise 3/passenger_arrival.bpmn"
bpmn_content = extract_process_only(input_bpmn)

# print(bpmn_content)

# 2. Run the system
final_description, run_history = run_bpmn_rag_system(bpmn_content)

print("\n\n=== FINAL RESULT ===")
print(final_description)
print(run_history)

Erfolg: <bpmn:process> Abschnitt gefunden.
--- STARTING PIPELINE ---
1. Generating initial description...
   -> Initial text generated (Length: 1251 chars).
Initial Text: The process begins with the start event "StartEvent_1". Afterwards, the task "Passenger arrives at terminal" is executed. The exclusive gateway "Arrival from Schengen area" checks if the arrival is from the Schengen area. If the condition is "Yes", the passenger proceeds to the task "Passenger goes to passport control" and then shows their passport in the task "Passenger shows passport". If the condition is "No", these control steps are skipped.

Subsequently, the passenger goes to the baggage claim area. The exclusive gateway "Passenger has Baggage?" determines whether the passenger has baggage. If the passenger has baggage, they execute the task of taking their baggage in the task "Passenger takes baggage". If they do not have baggage, they proceed directly to the next stage without taking baggage.

Finally, the pro