135 changes: 107 additions & 28 deletions README.md
@@ -1,6 +1,6 @@
<div align="center">

# AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance
# AssetOpsBench: Benchmarking AI Agents for Industrial Asset Operations & Maintenance

![AssetOps](https://img.shields.io/badge/Domain-Asset_Operations-blue)
![MultiAgentBench](https://img.shields.io/badge/Domain-Multi--agent_Bench-blue)
@@ -9,37 +9,116 @@
![Mistral](https://img.shields.io/badge/Model-Mistral-21C2A4)
![Granite](https://img.shields.io/badge/Model-Granite-21C2A4)

📄 [Paper](https://arxiv.org/pdf/2506.03828), 🤗 [Huggingface](https://huggingface.co/papers/2506.03828), 📢 [Blog](https://research.ibm.com/blog/asset-ops-benchmark)
📄 [Paper](https://arxiv.org/pdf/2506.03828) | 🤗 [Huggingface](https://huggingface.co/papers/2506.03828) | 📢 [Blog](https://research.ibm.com/blog/asset-ops-benchmark)

</div>

## Introduction
AssetOpsBench is a unified framework and environment designed to guide the development, orchestration, and evaluation of domain-specific agents for task automation in industrial asset operations and maintenance. The release of the benchmark focuses on scenarios commonly posed by domain experts—such as maintenance engineers, reliability specialists, and facility planners. We devloped 4 individual domain-specific agents and 2 multi-agent orchestration frameworks to create a simulated industrial environment enabling end-to-end benchmarking of multi-agent workflows in asset operations.

## Datasets: 140+ Scenarios
AssetOpsBench created a collection of tasks that we call scenarios, which covers domains of IoT data retrieval (IoT), failure mode and sensor relation discovery (FSMR), time series anomaly detection (TSFM) and work order generation (WO). Some of the tasks are focused on solving problems in single domain, e.g. "List all sensors of Chiller 6 in MAIN site". Others are focused on end-to-end multi-step tasks, e.g. "What is the forecast for 'Chiller 9 Condenser Water Flow' in the week of 2020-04-27 based on data from the MAIN site?" All scenarios can be found [here](https://github.com/IBM/AssetOpsBench/tree/main/scenarios).

## AI Agents and Multi-agent Frameworks
We developed 4 domain-specific AI agents while each agent has its own agent tools to be invoked.
- IoT Agent: `get_sites`, `get_history`, `get_assets`, `get_sensors`, ...
- FMSR Agent: `get_sensors`, `get_failure_modes`, `get_failure_sensor_mapping`.
- TSFM Agent: `forecasting`, `timeseries_anomaly_detection`, ...
- WO Agent: `generate_word_order`

To orchestrate multiple agents and run end-to-end workflow, we developed two frameworks:
- [MetaAgent](https://github.com/IBM/AssetOpsBench/tree/main/src/meta_agent): a reAct based single-agent-as-tool agent
- [AgentHive](https://github.com/IBM/AssetOpsBench/tree/main/src/agent_hive): a plan-and-execute sequential workflow

## Leaderboards
We run AssetOpsBench with 7 Large Language Models and evaluate the trajectories of each run using LLM judge (Llama-4-Maverick-17B) on 6-dimentional criteria. The following is the result of MetaAgent. Please find more results in the paper.
---

## 📑 Table of Contents
1. [Announcements](#announcements)
2. [Introduction](#introduction)
3. [Datasets](#datasets-140-scenarios)
4. [AI Agents](#ai-agents)
5. [Multi-Agent Frameworks](#multi-agent-frameworks)
6. [System Diagram](#system-diagram)
7. [Leaderboards](#leaderboards)
8. [Docker Setup](#run-assetopsbench-in-docker)
9. [Talks & Events](#talks--events)
10. [External Resources](#external-resources)
11. [Contributors](#contributors)

---

## 📣 Announcements
- **2025-06-01**: AssetOpsBench v1.0 released with 140+ industrial scenarios.
- **2025-09-01**: [CODS](https://ikdd.acm.org/cods-2025/) Competition launched.
- **Upcoming Events**: *Tutorial at AAAI 2026* – Agents for Industry 4.0 Applications.
- Stay tuned for new tracks, competitions, and community events.
---

## 🏗️ Introduction
AssetOpsBench is a **unified framework for developing, orchestrating, and evaluating domain-specific AI agents** in industrial asset operations and maintenance.

It provides:
- 4 **domain-specific agents**
- 2 **multi-agent orchestration frameworks**

Designed for **maintenance engineers, reliability specialists, and facility planners**, it allows reproducible evaluation of multi-step workflows in simulated industrial environments.

---

## 📂 Datasets: 140+ Scenarios
AssetOpsBench scenarios span multiple domains:

| Domain | Example Task |
|--------|--------------|
| IoT | "List all sensors of Chiller 6 in MAIN site" |
| FMSR | "Identify failure modes detected by Chiller 6 Supply Temperature" |
| TSFM | "Forecast 'Chiller 9 Condenser Water Flow' for the week of 2020-04-27" |
| WO | "Generate a work order for Chiller 6 anomaly detection" |

Some tasks focus on a **single domain**, others are **multi-step end-to-end workflows**.
Explore all scenarios [here](https://github.com/IBM/AssetOpsBench/tree/main/scenarios).
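
As a rough illustration of how scenarios like those in the table could be represented and filtered, here is a minimal Python sketch. The `Scenario` class, its field names, and the `by_domain` helper are hypothetical and are not the format used by the files in the scenarios directory; the task strings are taken from the table above.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical schema: these field names are illustrative assumptions,
# not the layout of the actual scenario files.
@dataclass(frozen=True)
class Scenario:
    scenario_id: int
    domain: str               # e.g. "IoT", "TSFM", "WO"
    text: str                 # natural-language task handed to the agent
    multi_step: bool = False  # True for end-to-end, multi-step workflows


SCENARIOS = [
    Scenario(1, "IoT", "List all sensors of Chiller 6 in MAIN site"),
    Scenario(
        2,
        "TSFM",
        "Forecast 'Chiller 9 Condenser Water Flow' for the week of 2020-04-27",
        multi_step=True,
    ),
    Scenario(3, "WO", "Generate a work order for Chiller 6 anomaly detection"),
]


def by_domain(scenarios: List[Scenario], domain: str) -> List[Scenario]:
    """Return the scenarios whose domain matches, ignoring case."""
    return [s for s in scenarios if s.domain.lower() == domain.lower()]


iot_tasks = by_domain(SCENARIOS, "iot")
multi_step_tasks = [s for s in SCENARIOS if s.multi_step]
```

Keeping the domain and a multi-step flag on each record makes it straightforward to run a single-domain subset before attempting the end-to-end workflows.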

---

## 🤖 AI Agents
### Domain-Specific Agents
- **IoT Agent**: `get_sites`, `get_history`, `get_assets`, `get_sensors`
- **FMSR Agent**: `get_sensors`, `get_failure_modes`, `get_failure_sensor_mapping`
- **TSFM Agent**: `forecasting`, `timeseries_anomaly_detection`
- **WO Agent**: `generate_work_order`

### Multi-Agent Frameworks
- **[MetaAgent](https://github.com/IBM/AssetOpsBench/tree/main/src/meta_agent)**: reAct-based single-agent-as-tool orchestration
- **[AgentHive](https://github.com/IBM/AssetOpsBench/tree/main/src/agent_hive)**: plan-and-execute sequential workflow
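
To make the plan-and-execute idea concrete, the sketch below runs a fixed plan against a small tool registry. It is not the AgentHive implementation: the tool names mirror the lists above, but their signatures, return values, and the `execute_plan` helper are assumptions made for illustration, and the plan is hard-coded where a real system would have an LLM generate it.

```python
from typing import Callable, Dict, List, Tuple


# Stub tools standing in for real agent tools. The names come from the tool
# lists above; the signatures and return values are invented for illustration.
def get_sensors(site: str, asset: str) -> List[str]:
    return [f"{asset} Supply Temperature", f"{asset} Condenser Water Flow"]


def get_failure_modes(asset: str) -> List[str]:
    return ["Compressor overheating"]


TOOLS: Dict[str, Callable[..., object]] = {
    "get_sensors": get_sensors,
    "get_failure_modes": get_failure_modes,
}

Step = Tuple[str, dict]  # (tool name, keyword arguments)


def execute_plan(plan: List[Step]) -> List[object]:
    """Run each planned step in order and collect the results."""
    results = []
    for tool_name, kwargs in plan:
        if tool_name not in TOOLS:
            raise KeyError(f"unknown tool: {tool_name}")
        results.append(TOOLS[tool_name](**kwargs))
    return results


# Hard-coded plan; in a plan-and-execute agent a planner would produce this.
plan: List[Step] = [
    ("get_sensors", {"site": "MAIN", "asset": "Chiller 6"}),
    ("get_failure_modes", {"asset": "Chiller 6"}),
]
results = execute_plan(plan)
```

A ReAct-style agent differs in that it chooses the next tool only after observing the previous result, rather than fixing the whole sequence up front.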

---

## 🖼️ System Diagram
Visual overview of AssetOpsBench workflow:

![System Diagram](path/to/system_diagram.png) <!-- Replace with your image path -->

---

## 🏆 Leaderboards
- Evaluated with **7 Large Language Models**
- Trajectories scored using **LLM Judge (Llama-4-Maverick-17B)**
- **6-dimensional criteria** measure reasoning, execution, and data handling

Example: MetaAgent leaderboard

![meta_agent_leaderboard](https://github.com/user-attachments/assets/615059be-e296-40d3-90ec-97ee6cb00412)
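
One way per-trajectory judge verdicts could be rolled up into leaderboard numbers is sketched below. This is an assumption-laden illustration, not the benchmark's scoring code: it assumes the judge returns a pass/fail verdict for each criterion, and the criterion names are placeholders because the six dimensions are defined in the paper rather than in this README.

```python
from statistics import mean
from typing import Dict, List

# Placeholder names for the six criteria; the real rubric is in the paper.
CRITERIA = [f"criterion_{i}" for i in range(1, 7)]


def aggregate(judgements: List[Dict[str, bool]]) -> Dict[str, float]:
    """Return, per criterion, the fraction of trajectories the judge passed."""
    if not judgements:
        raise ValueError("at least one judged trajectory is required")
    for judgement in judgements:
        missing = set(CRITERIA) - set(judgement)
        if missing:
            raise ValueError(f"missing criteria: {sorted(missing)}")
    return {
        c: mean(1.0 if j[c] else 0.0 for j in judgements)
        for c in CRITERIA
    }


# Two hypothetical judged trajectories.
judged = [
    {c: True for c in CRITERIA},
    {c: (c == "criterion_1") for c in CRITERIA},
]
scores = aggregate(judged)
```

Reporting a pass rate per criterion, instead of a single blended score, keeps it visible which dimension a model is weak on.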

## Run AssetOpsBench in Docker
We provide a comprehensive documentation on how to run AssetOpsBench in a pre-built dockerized environment. Please refer to the [guidance](https://github.com/IBM/AssetOpsBench/tree/main/benchmark/README.md).
---

## 🐳 Run AssetOpsBench in Docker
- Pre-built Docker Images: `assetopsbench-basic` (minimal) & `assetopsbench-extra` (full)
- Conda environment: `assetopsbench`
- [Full setup guide](https://github.com/IBM/AssetOpsBench/tree/main/benchmark/README.md)

```bash
cd /path/to/AssetOpsBench
chmod +x benchmark/entrypoint.sh
docker-compose -f benchmark/docker-compose.yml build
docker-compose -f benchmark/docker-compose.yml up
```

---

## 🎤 Talks & Events
- **Workshops**: Participate in *GenAIBench-26* at AAAI 2026, focusing on multi-agent AI workflows.
- **Webinars & Seminars**: Learn best practices for industrial task automation with AI agents.
- **Competitions**: Benchmark your agents on real-world industrial scenarios using AssetOpsBench.

---

## Contributors and Contact(*)
## 🔗 External Resources
- 📄 **Paper**: [AssetOpsBench: Benchmarking AI Agents for Industrial Asset Operations](https://arxiv.org/pdf/2506.03828)
- 🤗 **HuggingFace**: [Scenario & Model Hub](https://huggingface.co/papers/2506.03828)
- 📢 **Blog**: [Insights, Tutorials, and Updates](https://research.ibm.com/blog/asset-ops-benchmark)
- 🎥 **Recorded Talks**: Link coming soon.

- Dhaval Patel (pateldha@us.ibm.com)
- Shuxin Lin
- James Rayfield
- Nianjun Zhou
---
7 changes: 7 additions & 0 deletions aaai_website/Part 1/README.md
@@ -0,0 +1,7 @@
# [Part 1] - Introduction and Overview

📂✨
## **Objectives of the Lab**


🔙 [Return to the main page](../)
7 changes: 7 additions & 0 deletions aaai_website/Part 2/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
# [Part 2] -

📂 ✨



🔙 [Return to the main page](../)
4 changes: 4 additions & 0 deletions aaai_website/Part 3/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# [Part 3] - Addressing Data Scarcity and Improving the Quality


🔙 [Return to the main page](../)
6 changes: 6 additions & 0 deletions aaai_website/Part 4/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# [Part 4] - Model Selection, Development and Evaluation




🔙 [Return to the main page](../)
6 changes: 6 additions & 0 deletions aaai_website/Part 5/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# [Part 5] - The Development and Use of Process Ontology




🔙 [Return to the main page](../)
92 changes: 92 additions & 0 deletions aaai_website/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
# [AAAI 2026] From Inception to Productization: Hands-on Lab for the Lifecycle of Multimodal Agentic AI in Industry 4.0

### Introduction
Welcome to our interactive lab session at **AAAI 2026**! 🎉
This lab focuses on **Agentic AI** in Industry 4.0, where autonomous agents must reason across multimodal data sources (sensor streams, structured knowledge graphs, maintenance logs) and operate adaptively under uncertainty.

Participants will gain **hands-on experience** with data integration, benchmarking, and evaluation of multimodal agents. The lab bridges concepts of symbolic-neural integration, agent reasoning strategies, and real-world evaluation workflows. By the end of the session, attendees will leave with the practical skills and modular tools needed to build trustworthy, explainable, and deployable **Agentic AI systems**.

---

### Lab Schedule

#### **Part 1 – Introduction and Overview**
- Multimodal AI agents in Industry 4.0.
- Use cases and journey from inception to productization.
- Tutorial structure and objectives.

#### **Part 2 – Architectures for Multimodal Agents**
- Overview of agent designs (Plan-Execute, ReAct, Reflexion, RAFA).
- Hybrid symbolic + neural integration for interoperability and explainability.

#### **Part 3 – Lab Session 1: Addressing Data Silos**
- Hands-on session on multimodal data integration in hybrid multi-agent systems.
- Techniques for handling fragmented data and modality alignment.

#### **Part 4 – Governance and Operationalization**
- Best practices for deploying multimodal agents in real-world Industry 4.0.
- Governance, monitoring, observability, and traceability techniques.

#### **Part 5 – Lab Session 2: Evaluation Benchmarking**
- Hands-on session with **AssetOpsBench** and **SmartPilot**.
- Evaluation with human feedback, sensor-grounded metrics, and LLM-as-a-judge.

#### **Summary and Q&A**
- Wrap-up and open discussion with participants.

---

### **Prerequisites**
Participants should have:
- Basic understanding of AI/ML concepts.
- Familiarity with Python programming and libraries (PyTorch, TensorFlow, Scikit-learn).
- Interest in multimodal data, intelligent agents, and Industry 4.0.

---

### **Resources**
- [Slides](https://shorturl.at/s7QKA)
- [AssetOpsBench dataset on Hugging Face](https://huggingface.co/datasets/ibm-research/AssetOpsBench)
- [SmartPilot CoPilot System](https://github.com/ChathurangiShyalika/SmartPilot)
- [FailureSensorIQ dataset](https://github.com/IBM/FailureSensorIQ)

---

### **Contributors**
- **[Chathurangi Shyalika](https://www.linkedin.com/in/chathurangi-shyalika-1b89229b/)** (University of South Carolina)
- **[Saumya Ahuja](https://www.linkedin.com/in/saumyahuja/)** (IBM, WatsonX ASEAN)
- **[Shuxin Lin](https://www.linkedin.com/in/shuxin-lin/)** (IBM Research)
- **[Ruwan Wickramarachchi](https://ruwantw.github.io/)** (Bosch Center for AI)
- **[Dhaval Patel](https://www.linkedin.com/in/dhaval-patel-2b287033/)** (IBM Research)
- **[Amit Sheth](https://amit.aiisc.ai/)** (University of South Carolina)

---

### **Contact**
📎 **Chathurangi Shyalika**: [jayakodc@email.sc.edu](mailto:jayakodc@email.sc.edu)
📎 **Saumya Ahuja**: [saumya.ahuja@ibm.com](mailto:saumya.ahuja@ibm.com)
📎 **Shuxin Lin**: [shuxin.lin@ibm.com](mailto:shuxin.lin@ibm.com)
📎 **Ruwan Wickramarachchi**: [ruwan@email.sc.edu](mailto:ruwan@email.sc.edu)
📎 **Dhaval Patel**: [pateldha@us.ibm.com](mailto:pateldha@us.ibm.com)
📎 **Amit Sheth**: [amit@sc.edu](mailto:amit@sc.edu)

---

### **Contributing**
We welcome contributions and feedback to improve the lab and its materials. Please open an issue or PR in this repository.

---

### **Citation**
If you use materials from this lab, please cite as:

**Chathurangi Shyalika, Saumya Ahuja, Shuxin Lin, Ruwan Wickramarachchi, Dhaval Patel, & Amit Sheth** (2026, January).
*From Inception to Productization: Hands-on Lab for the Lifecycle of Multimodal Agentic AI in Industry 4.0*. In AAAI Conference on Artificial Intelligence.

```bibtex
@inproceedings{agenticai2026lab,
title={From Inception to Productization: Hands-on Lab for the Lifecycle of Multimodal Agentic AI in Industry 4.0},
author={Shyalika, Chathurangi and Ahuja, Saumya and Lin, Shuxin and Wickramarachchi, Ruwan and Patel, Dhaval and Sheth, Amit},
booktitle={AAAI Conference on Artificial Intelligence},
year={2026}
}
```
Binary file not shown.
Binary file added aaai_website/figs/AIISC_logo.png
Binary file added aaai_website/figs/Amit_Sheth.png
Binary file added aaai_website/figs/Chathurangi_Shyalika.jpeg
Binary file added aaai_website/figs/Dhaval_Patel.png
Binary file added aaai_website/figs/Ruwan_Wickramarachchi.png
Binary file added aaai_website/figs/USClogo.jpg
Binary file added aaai_website/figs/aaai-logo-RGB.jpg
Binary file added aaai_website/figs/ibm.png
Binary file added aaai_website/figs/ibm1.png
Binary file added aaai_website/figs/saumya_ahuja.jpeg
Binary file added aaai_website/figs/shuxin_lin.jpeg