# 🛠️ Timeseries QA with agentic LLMs & manuals

Welcome to this **hands-on workshop** where we explore how to combine **LLMs (Large Language Models)** with **machine telemetry data** and **equipment manuals** for smart diagnostics. With the help of an **AI agent** we will talk to our data in natural language.

---

## 🎯 Workshop Goals

By the end of this session, you'll be able to:

- ✅ **Query timeseries data** using natural language
- ✅ **Detect anomalies** using both data and manual thresholds
- ✅ **Interpret machine behavior** by combining real-time metrics with context from manuals
- ✅ **Generate SQL and visualizations** with the help of an LLM
- ✅ **Explain telemetry results** in plain English
- ✅ **Define an AI agent** and understand its basic building blocks, such as Model Context Protocol (MCP) servers

---

## 🧠 Why This Matters

Modern machines generate a huge volume of telemetry data (vibration, temperature, speed, etc.). Understanding this data is critical for:

- 🛑 **Detecting anomalies before they cause failures**
- 🔍 **Explaining why something is behaving abnormally**
- 🧰 **Making maintenance more proactive and data-driven**

But reading raw data isn't enough...

That's why this notebook shows how **AI assistants can help domain experts and analysts** by combining:

- 📊 **Telemetry data** (from CrateDB)
- 📘 **Manuals and expert context** (stored in SQL)
- 💬 **Natural language** (as the interface)

---

## 📦 What You'll Build

Over the course of this workshop, you'll create a system that can:

- Load and explore machine telemetry stored in CrateDB
- Ask questions like:
  - _“Is machine 5 overheating?”_
  - _“What should I do if machine 3 has an anomaly?”_
- Generate relevant SQL queries, tapping into the large number of LLMs available on AWS Bedrock
- Use an MCP server in connection with an agent, to make the smart assistant extendable with other services

All of this will run **in a single Jupyter notebook** — no frontend or backend code needed.

---

Let’s get started! 🚀

## Step 1: Setup & Installation 

In this step, we install all Python packages required to run the workshop.
Our hosted Jupyter notebook service already ships with many of them, but installing them here keeps the versions aligned.

In [None]:
# Install dependencies
%pip install -U \
    pandas matplotlib ipython-sql tqdm \
    crate cratedb-mcp==0.0.3 sqlalchemy-cratedb \
    "inlineagent @ git+https://github.com/awslabs/amazon-bedrock-agent-samples@4a5d72a#subdirectory=src/InlineAgent"

## Step 2: Generate and Store Synthetic Timeseries Data

### Connect to CrateDB


For this workshop, we’ll use **CrateDB** as our database to store both:

- 📈 **Timeseries telemetry data** (e.g., vibration, temperature, rotations)
- 📘 **Machine manuals** (e.g., anomaly thresholds, emergency protocols)

CrateDB is a distributed SQL database optimized for **real-time analytics on machine data and IoT workloads**. It blends the scalability of NoSQL with the familiarity and power of SQL — making it ideal for hybrid scenarios like combining sensor readings with structured documents.

In this notebook, we’ll use CrateDB to:

- Store synthetic telemetry data across multiple machines
- Store matching operational manuals per machine
- Use natural language to **query both datasets together**
- Detect anomalies, extract insights, and generate contextual diagnostics

You can use **CrateDB Cloud** to get started without any setup:
🔗 [Launch a free cluster on CrateDB Cloud](https://console.cratedb.cloud/)

Alternatively, you can run CrateDB locally using Docker.

Let’s connect and load our first dataset.

Please **adjust the connection string** to point to your CrateDB cluster.
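Before connecting, it can help to check that the connection string has the expected parts. The following is a minimal sketch using only the standard library; `describe_connection_string` is a hypothetical helper that is not part of the workshop code, and the URL is a made-up example.

```python
from urllib.parse import parse_qs, urlparse


def describe_connection_string(connection_string):
    """Split a SQLAlchemy-style URL into its parts, leaving out the password."""
    parsed = urlparse(connection_string)
    query = parse_qs(parsed.query)
    return {
        "scheme": parsed.scheme,
        "user": parsed.username,
        "host": parsed.hostname,
        # ssl=true in the query string means the cluster is reached via TLS
        "ssl": query.get("ssl", ["false"])[0].lower() == "true",
    }


# Made-up example URL, not a real cluster
print(describe_connection_string("crate://admin:secret@example.cratedb.net/?ssl=true"))
```

If `host` or `user` comes back as `None`, the connection string is most likely malformed.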

In [None]:
import os
import sqlalchemy as sa
import pandas as pd

# Option 1: CrateDB Cloud (Update or set via CRATEDB_CONNECTION_STRING env variable)
# Example: crate://admin:ooToh2Paecielun@demo.eks1.eu-west-1.aws.cratedb.net/?ssl=true
CONNECTION_STRING = os.environ.get(
    "CRATEDB_CONNECTION_STRING",
    "crate://USER:PASSWORD@CRATEDB_HOST/?ssl=true",
)

# Option 2: Localhost setup
# CONNECTION_STRING = os.environ.get("CRATEDB_CONNECTION_STRING", "crate://crate@localhost/")

# Try to connect
try:
    engine = sa.create_engine(CONNECTION_STRING)
    connection = engine.connect()

    # Run a simple query to validate connection
    result = pd.read_sql("SELECT mountain FROM sys.summits LIMIT 1", con=engine)
    print("✅ Successfully connected to CrateDB!")
    print("Sample query result from sys.summits:", result.iloc[0]["mountain"])
except Exception as e:
    print("❌ Failed to connect to CrateDB. Please check your connection string.")
    print("Error:", e)

### Define the Table Schema in CrateDB

Before inserting data, we explicitly define the `motor_readings` table in CrateDB. This ensures consistent data types and structure, which is especially important when working in production environments or collaborating across teams.

The table will store telemetry for each machine, including timestamped readings for vibration, temperature, and rotations.

In [None]:
from sqlalchemy import text

# Define the CREATE TABLE statement
create_table_sql = text(
    """
    CREATE TABLE IF NOT EXISTS motor_readings (
        machine_id INTEGER,
        timestamp TIMESTAMP WITHOUT TIME ZONE,
        vibration DOUBLE PRECISION,
        temperature DOUBLE PRECISION,
        rotations DOUBLE PRECISION,
        PRIMARY KEY (machine_id, timestamp)
    );
    """
)

try:
    connection.execute(create_table_sql)
    print("✅ Table 'motor_readings' created (if not already existing).")
except Exception as e:
    print("❌ Failed to create table.")
    print("Error:", e)

### Generate & Load Timeseries Data

Let’s generate synthetic telemetry data for 10 machines and store it in CrateDB under the table `motor_readings`.
This table will serve as the base for all LLM queries and visual analytics in the next steps.

You can modify the number of machines, simulation days, or reading frequency by adjusting the configuration block below.
This gives you full control over the size and granularity of your synthetic timeseries dataset.
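To get a feeling for how the configuration affects the dataset size, the row count can be estimated up front. This is a small sketch that mirrors the interval arithmetic of the generator below; `expected_row_count` is a hypothetical helper introduced only for illustration.

```python
def expected_row_count(num_machines, days, freq_minutes):
    """Number of rows one generator run produces for the given settings."""
    # Same arithmetic as the generator: readings per day times number of days
    total_intervals = int((24 * 60 / freq_minutes) * days)
    return num_machines * total_intervals


# Default workshop settings: 10 machines, 30 days, one reading every 15 minutes
print(expected_row_count(num_machines=10, days=30, freq_minutes=15))  # 28800
```

Halving `freq_minutes` doubles the number of rows, so it is worth checking the estimate before simulating long periods at a high frequency.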

In [None]:
from tqdm import tqdm
import numpy as np
import datetime

# --- Configuration ---
num_machines = 10  # Number of machines to simulate
days = 30  # Number of days to simulate
freq_minutes = 15  # Frequency of readings (in minutes)


# --- Data Generation ---
def generate_timeseries_data(num_machines, days, freq_minutes):
    total_intervals = int((24 * 60 / freq_minutes) * days)
    timestamps = [
        datetime.datetime.now() - datetime.timedelta(minutes=freq_minutes * i)
        for i in range(total_intervals)
    ]
    data = []

    for machine_id in tqdm(range(num_machines)):
        for t in timestamps:
            vibration = np.round(np.random.normal(1.0, 0.2), 4)
            temperature = np.round(np.random.normal(45, 2.5), 2)
            rotations = np.round(np.random.normal(1600, 30), 2)
            data.append([t, vibration, temperature, rotations, machine_id])

    df = pd.DataFrame(
        data,
        columns=["timestamp", "vibration", "temperature", "rotations", "machine_id"],
    )
    return df


# --- Generate & Preview ---
df_ts = generate_timeseries_data(num_machines, days, freq_minutes)
print(f"✅ Generated {len(df_ts)} rows of synthetic timeseries data.")

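# --- Load to CrateDB ---
df_ts.to_sql("motor_readings", con=engine, if_exists="append", index=False)
# Make the new rows visible to subsequent queries right away
connection.execute(sa.text("REFRESH TABLE motor_readings"))
print("✅ Data loaded into CrateDB table 'motor_readings'.")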

## Step 3: Previewing and Exploring the Data

### Explore the Timeseries Data in CrateDB
Now that we’ve generated and loaded our synthetic telemetry data, let’s run some SQL queries to explore it.
We’ll check how many rows were inserted, and preview a few example records.

In [None]:
# Query: Count total records
df_count = pd.read_sql("SELECT COUNT(*) as total_rows FROM motor_readings", con=engine)
print(f"🔢 Total rows in 'motor_readings': {df_count.iloc[0]['total_rows']}")

# Query: Preview 5 sample rows (formatted timestamps)
query_preview = """
SELECT
    timestamp,
    vibration,
    temperature,
    rotations,
    machine_id
FROM motor_readings
ORDER BY timestamp DESC
LIMIT 5
"""

df_preview = pd.read_sql(query_preview, con=engine)
df_preview["timestamp"] = pd.to_datetime(df_preview.timestamp, unit="ms")

print("👀 Sample records:")
df_preview

## Step 4: Natural Language Querying with an LLM (Table-Augmented Generation)

### Ask Questions in Natural Language

In this step, we use an LLM to convert plain language questions into SQL queries and run them against our timeseries data in CrateDB.
This is an example of **Table-Augmented Generation (TAG)** — combining large language models with structured data.

You’ll be able to ask questions like:

- What is the average rotation for machine 3?
- When was the last anomaly for machine 5?
- How many temperature spikes were there last week?

We will start with an implementation that still relies on classical programming to extract schema information. Our assistant will become more lightweight later on.

### Schema Extraction

Here we define a helper method that queries column information for a given table. It tells the LLM how the columns are named and what their data types are.

In [None]:
# Get table schema (columns and types) from CrateDB
def fetch_table_schema(table_name):
    "Fetch the column names and data types for a given table from CrateDB's system catalog."

    query = text(
        """
        SELECT column_name, data_type
        FROM information_schema.columns
        WHERE table_name = :tbl
        ORDER BY ordinal_position
        """
    )
    try:
        stmt = query.bindparams(tbl=table_name)
        df = pd.read_sql(stmt, con=engine)

        schema_text = f"Table: {table_name}\nColumns:\n"
        for _, row in df.iterrows():
            schema_text += f"- {row['column_name']} ({row['data_type']})\n"
        return schema_text
    except Exception as e:
        print(f"❌ Error fetching schema for table '{table_name}':", e)
        return f"Error fetching schema for {table_name}"


print("📁 Schema output for motor_readings table:\n")
print(fetch_table_schema("motor_readings"))

### Define Prompt Template

We now define the prompt to the LLM. We use [AWS Bedrock](https://aws.amazon.com/bedrock/), which is a marketplace for a large number of LLMs. The `modelId` parameter configures which exact model to use, and we can easily change it to query other models if needed. For this workshop, we will go with AWS' [Nova model](https://aws.amazon.com/ai/generative-ai/nova/). 

The LLM returns a reply with the resulting SQL query, as well as some additional explanations about its thought process. The LLM doesn't execute the SQL itself, so we need to:

1. **Extract the SQL statement from the LLM's reply.**

   The LLM comes back with a rather verbose reply. It may repeat the question it was asked or state certain assumptions that it made. We are only interested in the plain SQL, so we need to extract it from the reply.

2. **Connect to CrateDB and execute the SQL statement ourselves.**

   Once we know the SQL statement, we can easily connect to CrateDB using our standard Python driver and run the statement.
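The extraction step can be tried in isolation, without calling the LLM. The sketch below assumes that the reply wraps the statement in a Markdown code fence tagged `sql`. The sample reply is made up, and `extract_sql` is a hypothetical variant, not the exact pattern used in the workshop function.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def extract_sql(reply):
    """Return the first SQL code block found in an LLM reply, or None."""
    pattern = FENCE + r"sql\s*(.*?)" + FENCE
    match = re.search(pattern, reply, flags=re.DOTALL | re.IGNORECASE)
    return match.group(1).strip() if match else None


# Made-up reply that imitates a verbose LLM answer
sample_reply = (
    "To answer the question, you can use this query:\n"
    f"{FENCE}sql\n"
    "SELECT AVG(rotations) FROM motor_readings WHERE machine_id = 3;\n"
    f"{FENCE}\n"
    "This assumes that rotations are stored per reading."
)

print(extract_sql(sample_reply))
```

Returning `None` instead of raising makes it easy to decide at the call site whether to retry the prompt or report an error.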

***

The LLM prompt itself also consists of two parts:

1. 🤖 **System prompt**

   This is a set of instructions for the LLM on how it should behave. We can explain what expectations we have towards the LLM, and help it understand the context of our user questions.
   When it comes to the generation of SQL statements, we can formulate certain rules that it should follow, e.g. patterns to apply when a specific keyword is used in the question.

   The system prompt is typically static and doesn't change across different user questions.

2. 💁 **User prompt**

   Here we inject the actual user question. We do not do any preprocessing, validation, etc. but leave it all to the LLM to make sense of it in the context of the provided system prompt.

Let's define a method that talks to the LLM for us:

In [None]:
import json
import re
import boto3

client = boto3.client("bedrock-runtime")


def prompt_llm(table_schema: str, question: str) -> str:
    """
    Defines the prompt for the LLM.

    It uses system prompts to give context to the LLM.
    The user prompt consists of schema information and the actual question.
    """

    # Format the request payload using the model's native structure.
    # See https://docs.aws.amazon.com/nova/latest/userguide/complete-request-schema.html
    native_request = {
        "system": [
            {
                "text": "You are a CrateDB expert. Your task is to generate SQL queries in CrateDB's SQL syntax.",
            },
            {
                "text": """
                    Assume an **anomaly** is defined as:
                    vibration > 1.5 OR temperature > 80 OR rotations > 500

                    This definition is for reference only — do not apply anomaly filters unless the user’s question explicitly asks about anomalies.
                """
            },
            {
                "text": """
                    Rules for generating SQL queries:
                        - Always include the 'timestamp' column in the SELECT clause for any question involving plotting, visualizations, trends, over time, or per day / week / hour.
                        - Only exclude 'timestamp' for pure aggregations (e.g., total counts without time).
                        - When using date intervals, always include both the quantity and the unit in a string, e.g. INTERVAL '7 days'.
                        - If using an aggregation function (e.g., MAX, AVG) with other fields, include a proper GROUP BY clause.
                """
            },
        ],
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "text": f"The following table schema is available: {table_schema}",
                    },
                    {
                        "text": question,
                    },
                ],
            },
        ],
    }

    # Convert the native request to JSON.
    request = json.dumps(native_request)

    # Invoke the model
    response = client.invoke_model(
        modelId="us.amazon.nova-pro-v1:0",
        body=request,
    )

    # There is lots of metadata returned, we are only interested in the actual response
    response_body = json.loads(response["body"].read())
    return response_body["output"]["message"]["content"][0]["text"]


def get_sql_from_llm(question: str, table_name: str = "motor_readings"):
    """
    This puts all the building blocks together by:
    1. Retrieving the table schema
    2. Invoking the LLM
    3. Extracting the SQL query
    """

    # Dynamically fetch schema
    table_schema = fetch_table_schema(table_name)

    # Prompt the LLM
    response = prompt_llm(table_schema, question)

    # Optional: Uncomment the line below to see the complete LLM response
    # print("Debug: The LLM output was: \n", response)

    # Extract the actual SQL query from the response
    match = re.search("(?<=`sql)([^`]*)(?=`)", response)
    if match:
        return match.group(1).strip()

    raise ValueError("Failed to extract SQL from LLM reply", response)

print("✅ Functions `prompt_llm` and `get_sql_from_llm` defined successfully")

### Code Cell: Ask a Question → Run SQL → Show Result
You can customize the question below to ask anything about the timeseries data using plain English.

The assistant will translate your question into SQL, run it on the `motor_readings` table, and return the result.

Below you will find a set of example questions. **Uncomment** one of the `question` assignments **and run** the cell to see the result. Feel free to **try your own questions** as well to get a feeling for how the LLM works.


---
**💡 NOTE**

When trying your own questions, you may also have to extend the system prompt if the LLM has difficulties with your question.

---
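One way to keep the system prompt easy to extend is to assemble it from a list of rules. The sketch below follows the same `{"text": ...}` structure that `prompt_llm` passes in its `system` field. `build_system_prompt` and the extra rule are hypothetical and only serve as an illustration.

```python
BASE_RULES = [
    "You are a CrateDB expert. Your task is to generate SQL queries in CrateDB's SQL syntax.",
]


def build_system_prompt(extra_rules=()):
    """Return the system prompt as a list of {'text': ...} entries."""
    return [{"text": rule} for rule in [*BASE_RULES, *extra_rules]]


# Made-up rule that teaches the LLM a term it cannot know on its own
system = build_system_prompt(
    extra_rules=["A temperature spike is a reading with temperature > 50."]
)
print(len(system))  # 2
```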

In [None]:
# Ask a question in natural language

question = "What was the average rotation for machine 3 the last week?"
# question = "When was the last recorded anomaly?"
# question = "How many readings had a vibration greater than 1.5?"
# question = "What was the number of anomalies per machine in the last 48 hours?"

# Convert to SQL
sql_query = get_sql_from_llm(question)

# Collect and format output in a single list
output = []
output.append("🧠 Generated SQL:")
output.append(sql_query)

try:
    # Execute the SQL
    df_result = pd.read_sql(sql_query, con=engine)
    output.append("\n✅ Query executed. Result:")
    output.append(df_result.to_string(index=False))  # One box output
except Exception as e:
    output.append("❌ Error running query:")
    output.append(str(e))

# Print all in one block
print("\n".join(output))

## Step 5: Visualizing Timeseries Data with Natural Language

Getting a reply to our questions is already a nice result. But often, the answer to our questions is not just a single number or a few rows. And at some point, humans have difficulties grasping the content of large amounts of text.

Therefore, we add a bit of processing logic to detect whether it makes sense to switch to a visual representation of the result. In one of the system prompts, we told the LLM to include the `timestamp` column for any question about trends or values over time, and to leave it out only for pure aggregations. We now take advantage of this rule: if the result contains a `timestamp` column, we plot it with Matplotlib; otherwise, we show it as a table.

This lets you:
- Plot machine readings over time
- Compare metrics like vibration, temperature, and rotations
- Quickly identify anomalies or trends

Example questions:

- Show temperature and vibration for machine 2 over time.
- Plot the average rotation per machine.
- Show the number of anomalies per day.

### Ask a Question → LLM Generates SQL → Plot with Matplotlib

We’ll use the same `get_sql_from_llm` function, then add a logic layer to check if the result has a `timestamp` column (for time-based plotting). 

The assistant will generate SQL and visualize the result as a timeseries chart.
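The decision logic itself is small. The following sketch shows it without a database or plotting library, assuming that timestamps may arrive as epoch milliseconds. `should_plot` and `epoch_ms_to_datetime` are hypothetical helper names, not functions used in the cell below.

```python
from datetime import datetime, timezone


def should_plot(columns):
    """A result is plotted only if it carries a 'timestamp' column."""
    return "timestamp" in columns


def epoch_ms_to_datetime(value_ms):
    """Convert epoch milliseconds into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(value_ms / 1000, tz=timezone.utc)


print(should_plot(["timestamp", "temperature"]))  # True
print(should_plot(["avg_rotations"]))             # False
print(epoch_ms_to_datetime(86_400_000))           # 1970-01-02 00:00:00+00:00
```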

In [None]:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Step 1: Ask a visualization-friendly question
question = "Show temperature, rotation and vibration for machine 2 last Monday."
# question = "Show average temperature, rotation and vibration for machine 2 on Mondays."
# question = "Plot the average rotation and temperature for the last week. Rename the start_time column to timestamp."

# Step 2: Get SQL from LLM
sql_query = get_sql_from_llm(question)
print("🧠 Generated SQL:\n", sql_query)

# Step 3: Run the query
try:
    df_result = pd.read_sql(sql_query, con=engine)
    print("✅ Query returned", len(df_result), "rows.")

    # Step 4: Try to plot if timestamp column is present
    if "timestamp" in df_result.columns:
        # Ensure timestamp is datetime and sorted (handle epoch ms)
        df_result = df_result.sort_values("timestamp")

        # Convert epoch millis to datetime if needed
        if df_result["timestamp"].dtype in ["int64", "float64"]:
            df_result["timestamp"] = pd.to_datetime(df_result["timestamp"], unit="ms")
        else:
            df_result["timestamp"] = pd.to_datetime(df_result["timestamp"])

        df_result.set_index("timestamp", inplace=True)

        # Plot numeric columns
        fig, ax = plt.subplots(figsize=(14, 5))
        df_result.plot(ax=ax, title=question)

        # Format x-axis for better readability
        ax.set_xlabel("Timestamp")
        ax.set_ylabel("Value")
        ax.grid(True)
        ax.xaxis.set_major_locator(mdates.AutoDateLocator())
        ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d\n%H:%M"))

        plt.xticks(rotation=45)
        plt.tight_layout()
        plt.show()
    else:
        print("No 'timestamp' column in result — skipping visualization.")
        print(
            "Tip: Ask a time-based question like '...over time' or 'per day' to enable plotting."
        )
        display(df_result)
except Exception as e:
    print("❌ Error during SQL execution or plotting:")
    print(e)

## Step 6: Turn it into an AI Agent

The previous steps have established the ability to generate and run queries from natural language. However, there are a few difficulties with this approach:

🟧 We need to manually implement retrieval of table schemas, and pass them as part of the prompt.

🟧 The LLM doesn't get the CrateDB SQL grammar right in a number of cases. System prompts are needed to steer the LLM in the right direction. Asking new questions will likely require additional system prompts.

🟧 We still need to program the execution of the SQL query the LLM came up with.

Advantages of the **agent approach**:

✅ An agent can integrate multiple services (called "tools") and process information from them, bridging the gap between tools.

✅ The CrateDB MCP Server implements a `get_table_metadata` tool for retrieving schema information. The LLM will call this tool when it needs schema information to generate a query. We no longer need to manually encode the schema into the prompt.

✅ Thanks to the `query_sql` tool of the CrateDB MCP Server, the agent can also execute the SQL query it came up with.

✅ The CrateDB MCP Server also has a `fetch_cratedb_docs` tool. If the LLM is in doubt how to generate a query, it can consult the CrateDB documentation. This is also helpful for features added to CrateDB after the LLM's cutoff date, which is often more than a year back.
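Conceptually, an agent is a loop in which the LLM picks a tool, the tool is executed, and the result is fed back to the LLM. The following toy sketch illustrates only the dispatch part. The tools are stubs that return canned data, the steps are scripted rather than chosen by an LLM, and `run_agent` is a hypothetical name. It is not how Bedrock or the MCP server are implemented.

```python
def get_table_metadata():
    """Stub standing in for the MCP tool of the same name."""
    return {"motor_readings": ["machine_id", "timestamp", "vibration", "temperature", "rotations"]}


def query_sql(statement):
    """Stub standing in for the MCP tool; returns a canned result instead of querying CrateDB."""
    return [{"avg_temperature": 45.1}]


TOOLS = {"get_table_metadata": get_table_metadata, "query_sql": query_sql}


def run_agent(planned_steps):
    """Execute the tool calls in order and collect what each one returned."""
    observations = []
    for tool_name, arguments in planned_steps:
        result = TOOLS[tool_name](**arguments)
        observations.append((tool_name, result))
    return observations


# In a real agent, the LLM decides these steps one at a time; here they are scripted.
steps = [
    ("get_table_metadata", {}),
    ("query_sql", {"statement": "SELECT AVG(temperature) AS avg_temperature FROM motor_readings"}),
]

for name, result in run_agent(steps):
    print(name, "->", result)
```

Adding another service to such an agent means registering more tools, which is why the approach extends well beyond a single database.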

---

**Agents** are usually defined as permanent resources in AWS Bedrock. But there is also the option for them to be **transient (inline)** and only exist during runtime. This is very useful for **development and rapid prototyping**, which comes in handy for this workshop.

Let's start with defining an inline agent, configure the CrateDB MCP server, and pass an action group with it to the agent.

In [None]:
from urllib import parse

from mcp import StdioServerParameters

from InlineAgent.tools.mcp import MCPStdio
from InlineAgent.action_group import ActionGroup
from InlineAgent.agent import InlineAgent
from InlineAgent import AgentAppConfig

config = AgentAppConfig()


def sqlalchemy_to_http(connection_string: str) -> str:
    """
    Due to a technicality, we need to translate between two different types of connection URLs.
    pandas was using a SQLAlchemy-style connection URL (crate://...),
    while the MCP server uses a standard HTTP URL with basic auth.
    """

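    parsed = parse.urlparse(connection_string)
    if "ssl=true" in parsed.query:
        protocol = "https"
    else:
        protocol = "http"

    # The password is optional, e.g. for a local setup such as crate://crate@localhost/
    credentials = parsed.username or ""
    if parsed.password:
        credentials += f":{parse.quote(parsed.password)}"
    if credentials:
        credentials += "@"

    return f"{protocol}://{credentials}{parsed.hostname}:4200"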


async def query_agent(question):
    """
    This method defines an AWS Bedrock inline agent during runtime:
        https://docs.aws.amazon.com/bedrock/latest/userguide/agents-create-inline.html

    The action group contains one or more actions that the agent can perform.
    In our case, we have only one action, which is reaching out to the CrateDB MCP server.
    The CrateDB MCP server has several tools that it offers, such as `query_sql`, `get_table_metadata`, etc.
    """

    cratedb_mcp_client = await MCPStdio.create(
        server_params=StdioServerParameters(
            # The MCP server is a Python script that we installed earlier as part of `pip install`.
            # It got placed in the `bin` directory of our virtual environment.
            command="cratedb-mcp",
            args=["serve"],
            env={
                "CRATEDB_CLUSTER_URL": sqlalchemy_to_http(CONNECTION_STRING),
                "CRATEDB_MCP_TRANSPORT": "stdio",
            },
        )
    )

    # The action group containing our MCP server.
    # Other types of actions may be OpenAPI schemas or Python methods.
    # https://docs.aws.amazon.com/bedrock/latest/userguide/action-define.html
    #
    # We stick to only CrateDB here for the sake of the workshop,
    # although the real power of agents comes from connecting multiple components.
    action_group = ActionGroup(
        name="CrateDBActionGroup",
        mcp_clients=[cratedb_mcp_client],
    )

    return await InlineAgent(
        foundation_model="us.amazon.nova-pro-v1:0",
        instruction=f"""
            You are a friendly assistant who receives information from CrateDB.
            Your task is to translate questions into SQL queries, run them on CrateDB, and return back results.
            Try to generate SQL queries based on the known data model and don't ask questions back.

            You have the following tools available:
            1. `query_sql`: Executes SQL queries on CrateDB
            2. `get_cratedb_documentation`: Returns the table of contents for the CrateDB documentation. If in doubt about CrateDB-specific syntax, you can obtain the documentation here.
            3. `fetch_cratedb_docs`: Once a specific link within the CrateDB documentation is identified, you can download its content here by providing the link.
            4. `get_table_metadata`: This returns all metadata for tables in CrateDB.

            Try to reason and give an interpretation of the result.

            When asked about manuals, query the `manual` column of the `machine_manuals` table to retrieve the manual. Interpret its content to provide an answer.

            Rules for writing SQL queries:
              - To retrieve the latest value for a column, use CrateDB's `MAX_BY` function.
              - When using date intervals, always include both the quantity and the unit in a string, e.g. INTERVAL '7 days'.
              - Don't use DATE_SUB, it does not exist in CrateDB. Use DATE_TRUNC instead.
        """,
        agent_name="cratedb_query_agent",
        action_groups=[action_group],
    ).invoke(input_text=question)

print("✅ Function `query_agent` defined successfully")

As before, when we communicated directly with the LLM, we can try out a few different questions. This time, however, we ask the agent instead of calling the LLM ourselves.

In [None]:
# fmt: off
question = "Is any of my machines behaving significantly different compared to others? I'm interested in vibration from motor_readings."
# question = "Did the vibration of machine 4 change between today and yesterday? Query the table motor_readings."
# question = "How recent is my data in motor_readings? Is there any machine that lags behind?"
# question = "What was the highest temperature ever observed over all machines? Apply DATE_TRUNC to generate a weekly overview and include the week in your reply. The week is returned as a timestamp in milliseconds, format it in a human-readable way."
# fmt: on

print(await query_agent(question))

## Step 7: Integrate Machine Manuals into the QA Pipeline

So far, we have worked with timeseries data only. Now we dynamically generate fictional manuals for each machine based on the IDs in `motor_readings`.

Each manual includes:
- Operational limits
- Maintenance schedules
- Emergency protocols
- Manufacturer and contact info

This ensures the manual data matches whatever telemetry data has been created, even if someone customized the setup earlier.

We store the results in a CrateDB table: `machine_manuals`.


In [None]:
import random

# === Configuration ===
include_branding = True
include_contact_info = True

brands = ["AtlasTech", "RotoFlow", "MechAxis", "IndustraCore"]
models = ["VX100", "MX200", "TQ350", "RG450"]
year_range = list(range(2017, 2023))

# === Load unique machine IDs from CrateDB ===
machine_ids = pd.read_sql("SELECT DISTINCT machine_id FROM motor_readings", con=engine)
machine_ids = machine_ids["machine_id"].tolist()


def generate_manual(machine_id):
    brand = random.choice(brands)
    model = random.choice(models)
    year = random.choice(year_range)

    vib_max = round(random.uniform(1.2, 1.6), 2)
    temp_max = round(random.uniform(65, 75), 1)
    rpm_max = random.randint(1550, 1650)

    # Build optional blocks
    branding_section = ""
    if include_branding:
        branding_section = f"""**Manufacturer:** {brand}
**Model:** {model}
**Year of Installation:** {year}"""

    contact_section = ""
    if include_contact_info:
        contact_section = f"""**Contact:**
- Support: support@{brand.lower()}.com
- Manual Version: 1.0"""

    # Build the full manual string
    content = f"""
🛠️ Machine Manual — ID: {machine_id}

{branding_section}

---

**Operational Limits:**
- Max Vibration: {vib_max} units
- Max Temperature: {temp_max}°C
- Max RPM: {rpm_max} rotations/min

**Anomaly Detection:**
- Vibration > {vib_max} may indicate imbalance or bearing issues
- Temperature > {temp_max} may suggest overheating
- RPM deviations > ±100 RPM require inspection

---

**Maintenance Schedule:**
- Weekly: Inspect vibration and temperature logs
- Monthly: Lubricate bearings and check alignment
- Quarterly: Full motor calibration and safety check

**Emergency Protocol:**
If vibration exceeds {vib_max + 0.2:.2f} or temperature exceeds {temp_max + 5:.1f}:
1. Immediately reduce load
2. Shut down the motor if anomaly persists for >5 mins
3. Notify operations lead and schedule maintenance

---

{contact_section}
""".strip()

    return {"machine_id": machine_id, "manual": content}


# Generate manuals for all machine IDs found
manuals = [generate_manual(mid) for mid in machine_ids]
df_manuals = pd.DataFrame(manuals)

# Store in CrateDB
df_manuals.to_sql("machine_manuals", con=engine, if_exists="append", index=False)
print(f"✅ Stored manuals for {len(df_manuals)} machines in 'machine_manuals'.")
_ = connection.execute(sa.text("REFRESH TABLE machine_manuals;"))

### View a Random Machine Manual

Below is a randomly selected machine manual from the `machine_manuals` table.  
Each manual includes operational guidelines, maintenance schedules, and emergency protocols — all of which can be referenced by the assistant in later steps.

In [None]:
from IPython.display import display, HTML

# Step 1: Load a random manual's content
manual = pd.read_sql(
    "SELECT machine_id, manual FROM machine_manuals ORDER BY RANDOM() LIMIT 1",
    con=engine,
)
machine_id = manual.iloc[0]["machine_id"]
manual_text = manual.iloc[0]["manual"]

# Step 2: Display in scrollable, formatted box
display(
    HTML(
        f"""
<h4>📘 Manual for Machine ID: {machine_id}</h4>
<div style="border:1px solid #ccc; padding:10px; max-height:400px; overflow:auto; font-family:monospace; white-space:pre-wrap;">
{manual_text}
</div>
"""
    )
)

### Context-Aware Assistant Using Data + Manuals

We can now ask our assistant questions involving both telemetry data (`motor_readings`) and manual guidance (`machine_manuals`).

This allows it to:
- Detect anomalies
- Reference emergency protocols or limits
- Provide maintenance guidance

This is your main interaction point with the assistant. Just type in a natural language question, and the assistant will:
- Analyze your question
- Decide if telemetry data or manual context is needed
- Generate and run one or more SQL queries
- Explain the results in plain language
- Optionally summarize emergency protocols from manuals


Here are some example queries and what kind of answers you can expect:

| Question                                                                 | Assistant Behavior                                                                 |
|--------------------------------------------------------------------------|------------------------------------------------------------------------------------|
| What was the average temperature for machine 3 last week?                | Retrieves average temperature from telemetry and explains the result.              |
| Is machine 4 overheating?                                                | Checks the latest temperature and compares it with manual thresholds if present.   |
| What should I do if machine 2 has an anomaly?                            | Loads the manual for machine 2 and summarizes anomaly and emergency protocols.     |
| Give me the max and min vibration for machine 6 when rotations > 1600.   | Executes a filtered SQL query and summarizes the max/min vibration values.         |
| Show me the maintenance steps for machine 1.                             | Extracts and summarizes the maintenance section from the manual.                   |
| Is machine 5's most recent temperature still ok according to the manual? | Retrieves the latest temperature from telemetry and correlates it with the manual. |
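The agent interprets the manual with the help of the LLM. For a deterministic cross-check of answers such as "Is machine 4 overheating?", the limit can also be read from the manual text directly. The sketch below assumes the manual format produced by `generate_manual`. `parse_max_temperature` and `is_overheating` are hypothetical helpers, and the excerpt is made up.

```python
import re


def parse_max_temperature(manual_text):
    """Read the 'Max Temperature' limit from a manual, or return None if absent."""
    match = re.search(r"Max Temperature:\s*([0-9]+(?:\.[0-9]+)?)", manual_text)
    return float(match.group(1)) if match else None


def is_overheating(latest_temperature, manual_text):
    """Compare a reading against the limit stated in the manual."""
    limit = parse_max_temperature(manual_text)
    if limit is None:
        raise ValueError("Manual does not state a maximum temperature")
    return latest_temperature > limit


# Made-up excerpt in the format produced by generate_manual
sample_manual = "**Operational Limits:**\n- Max Vibration: 1.4 units\n- Max Temperature: 70.5°C\n"

print(is_overheating(45.2, sample_manual))  # False
print(is_overheating(72.0, sample_manual))  # True
```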

In [None]:
# fmt: off
question = "Show me the maintenance schedule for machine 5. Retrieve the manual column from the machine_manuals table and extract the maintenance schedule from its content."
# question = "Is machine 4 overheating?"
# question = "Give me the max and min vibration observed for machine 6 when rotations > 1600. Rotations are stored in the rotations column."
# question = "What should I do if machine 2 has an anomaly?"
# question = "What can be a reason for higher than usual values for the column vibration in motor_readings for machine 2?"
# question = "Is machine 5's most recent temperature still ok according to the manual? Look up the most recent temperature from motor_readings."
# fmt: on

print(await query_agent(question))

--- 
## Step 8: Recap & Lessons Learned

Congratulations on completing the Timeseries QA with LLMs & Manuals workshop!
Let’s wrap things up with a quick recap of what we’ve achieved and the key takeaways.

### Workshop Recap

Over the course of this notebook, you:
- Set up your environment and connected to CrateDB, a powerful distributed SQL database optimized for timeseries data.
- Generated and loaded synthetic telemetry data simulating real-world sensor readings from industrial machines.
- Built a natural language interface with an LLM to convert plain English into CrateDB-compatible SQL queries.
- Visualized time-series data directly from natural language questions, bringing clarity to trends, outliers, and anomalies.
- Defined an inline AI agent, communicating with CrateDB through an MCP server.
- Generated structured machine manuals, complete with thresholds, anomaly rules, and emergency procedures.
- Merged telemetry with manual context, allowing a single question to yield data-driven insights and operational guidance.

### Key Lessons Learned

| Skill                                | What You Practiced                                                                     |
|--------------------------------------|----------------------------------------------------------------------------------------|
| **LLM Prompt Engineering**           | How to design system prompts and templates that turn natural language into SQL.        |
| **Table-Augmented Generation (TAG)** | Augmenting LLMs with live table schemas to improve query accuracy.                     |
| **Timeseries Analysis**              | Using SQL and pandas to inspect, filter, and visualize sensor data over time.          |
| **AI agents**                        | The key components of an agent, including action groups and MCP servers.               |
| **CrateDB Features**                 | Leveraging a scalable time-series database with SQL support.                           |
| **RAG-like Patterns**                | Combining structured telemetry with unstructured manuals for richer QA experiences.    |


🙌 Thanks for Participating!

We hope this workshop has inspired you to combine structured data, unstructured manuals, and language models in powerful new ways.