feat: planning task and megamind agent #679
Merged: jmatejcz merged 32 commits into development from jm/feat/warehouse-task-and-megamind-agent on Sep 12, 2025.
Commits (all by jmatejcz):
ccc8459 feat: add bench for demo and supervisor agent
d96b113 feat: add validator to demo task
38b3799 feat: customized planner supervisor
cfd4313 refactor: moved planner supoervisor agent to rai core
8e386b0 feat: added megamind
43f9cef refactor: level arg to logger
1eaed3e feat: modifications to megamind
7941b41 fix: validators in SortTask fix
7ca6c75 style: formatting fixed
b331e58 refactor: updated prompts
9ea3239 style: remove unused code
68ee2a5 style: remove commented code
cfef6a7 style: change file name
100e409 refactor: move plan agent to rai core
3cae02b style: format
b2633e7 style: names change
1249161 style: renamed example file
44d4a6d fix: import fix
01cb10d refactor: remove examples from megamind prompt
658d1ce style: applied requested changes
ab78de7 refactor: added toolrunner for subagents
6d0acc4 refactor: megamind into generic class and fucntion
6b8b683 chore: delete planner supervisor
84205ef fix: remove left import
2123d25 refactor: change to gpts in example
6c8f224 refactor: recurssion limit adjutment
5ad369e fix: deafult langfuse to false
a8b3b5d refactor: move langgraph agents to core
7ce2cf5 refactor: remove megamind class
f365ecd style: applied requested changes
ac60d58 feat: introduced planning prompt
c98a3f7 feat: added helper funtions
```diff
@@ -31,7 +31,6 @@
         model_name=args.model_name,
         vendor=args.vendor,
     )
-
     run_benchmark(
         llm=llm,
         out_dir=experiment_dir,
```
src/rai_bench/rai_bench/examples/tool_calling_custom_agent.py (new file: 100 additions, 0 deletions)
```python
# Copyright (C) 2025 Robotec.AI
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import uuid
from datetime import datetime
from pathlib import Path

from rai.agents.langchain.core import (
    Executor,
    create_megamind,
    get_initial_megamind_state,
)

from rai_bench import (
    define_benchmark_logger,
)
from rai_bench.tool_calling_agent.benchmark import ToolCallingAgentBenchmark
from rai_bench.tool_calling_agent.interfaces import TaskArgs
from rai_bench.tool_calling_agent.tasks.warehouse import SortingTask
from rai_bench.utils import get_llm_for_benchmark

if __name__ == "__main__":
    now = datetime.now()
    out_dir = f"src/rai_bench/rai_bench/experiments/tool_calling/{now.strftime('%Y-%m-%d_%H-%M-%S')}"
    experiment_dir = Path(out_dir)
    experiment_dir.mkdir(parents=True, exist_ok=True)
    bench_logger = define_benchmark_logger(out_dir=experiment_dir, level=logging.DEBUG)

    task = SortingTask(task_args=TaskArgs(extra_tool_calls=50))
    task.set_logger(bench_logger)

    supervisor_name = "gpt-4o"

    executor_name = "gpt-4o-mini"
    model_name = f"supervisor-{supervisor_name}_executor-{executor_name}"
    supervisor_llm = get_llm_for_benchmark(model_name=supervisor_name, vendor="openai")
    executor_llm = get_llm_for_benchmark(
        model_name=executor_name,
        vendor="openai",
    )

    benchmark = ToolCallingAgentBenchmark(
        tasks=[task],
        logger=bench_logger,
        model_name=model_name,
        results_dir=experiment_dir,
    )
    manipulation_system_prompt = """You are a manipulation specialist robot agent.
Your role is to handle object manipulation tasks including picking up and dropping objects using provided tools.

Ask the VLM for object detection and positions before performing any manipulation action.
If VLM doesn't see objects that are objectives of the task, return this information, without proceeding"""

    navigation_system_prompt = """You are a navigation specialist robot agent.
Your role is to handle navigation tasks in space using provided tools.

After performing navigation action, always check your current position to ensure success"""

    executors = [
        Executor(
            name="manipulation",
            llm=executor_llm,
            tools=task.manipulation_tools(),
            system_prompt=manipulation_system_prompt,
        ),
        Executor(
            name="navigation",
            llm=executor_llm,
            tools=task.navigation_tools(),
            system_prompt=navigation_system_prompt,
        ),
    ]
    agent = create_megamind(
        megamind_llm=supervisor_llm,
        megamind_system_prompt=task.get_system_prompt(),
        executors=executors,
        task_planning_prompt=task.get_planning_prompt(),
    )

    experiment_id = uuid.uuid4()
    benchmark.run_next(
        agent=agent,
        initial_state=get_initial_megamind_state(task=task.get_prompt()),
        experiment_id=experiment_id,
    )

    bench_logger.info("===============================================================")
    bench_logger.info("ALL SCENARIOS DONE. BENCHMARK COMPLETED!")
    bench_logger.info("===============================================================")
```
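The example wires one supervisor LLM (megamind) to two specialist executors, each with its own tools and system prompt. The delegation idea can be sketched without any LLM as a toy. This is a hypothetical illustration, not the `rai.agents.langchain.core` API: `ToyExecutor`, `toy_supervisor`, and the tool names `navigate_to` and `pick_up` are made up, and the supervisor's decisions are hard-coded as a plan, whereas in the real agent the supervisor LLM decides which executor to call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A step the supervisor delegates: (executor name, tool name, tool arguments).
Step = Tuple[str, str, Dict[str, object]]


@dataclass
class ToyExecutor:
    """Hypothetical stand-in for an executor: a name, a system prompt, and its tools."""

    name: str
    system_prompt: str
    tools: Dict[str, Callable[..., str]] = field(default_factory=dict)

    def run(self, tool_name: str, **kwargs: object) -> str:
        # An executor may only use its own tools; anything else is reported back.
        if tool_name not in self.tools:
            return f"{self.name}: unknown tool {tool_name!r}"
        return self.tools[tool_name](**kwargs)


def toy_supervisor(plan: List[Step], executors: Dict[str, ToyExecutor]) -> List[str]:
    """Delegate each planned step to the named executor and collect the results.

    In the real agent an LLM produces these decisions; here they are hard-coded.
    """
    history = []
    for executor_name, tool_name, kwargs in plan:
        history.append(executors[executor_name].run(tool_name, **kwargs))
    return history


if __name__ == "__main__":
    executors = {
        "navigation": ToyExecutor(
            "navigation",
            "You are a navigation specialist robot agent.",
            {"navigate_to": lambda x, y: f"moved to ({x}, {y})"},
        ),
        "manipulation": ToyExecutor(
            "manipulation",
            "You are a manipulation specialist robot agent.",
            {"pick_up": lambda obj: f"picked up {obj}"},
        ),
    }
    plan = [
        ("navigation", "navigate_to", {"x": 1.0, "y": 2.0}),
        ("manipulation", "pick_up", {"obj": "box"}),
        ("navigation", "pick_up", {"obj": "box"}),
    ]
    print(toy_supervisor(plan, executors))
    # ['moved to (1.0, 2.0)', 'picked up box', "navigation: unknown tool 'pick_up'"]
```

Keeping each tool set private to one executor mirrors the split in the example, where manipulation and navigation tools are passed to different `Executor` instances; a misrouted step fails visibly instead of silently using the wrong tool.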
We could also add a `planning_prompt` for megamind (example) and use it in `strucuture_output_node` to summarize the overall task progress based on the steps done so far. In my experiments, this fairly consistently helps the decision-making of an agent using qwen3. The prompt would be provided by the bench task (example). Note that the `planning_prompt` didn't help the agent using `gpt-4o-mini` much; perhaps it needs a different prompt. I'm not sure how `gpt-4o` works for you; for me, the results vary a lot.
I like this idea; I copied it to this branch here: ac60d58
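The suggestion in this thread, feeding a planning prompt plus the steps done so far to the supervisor so it can summarize overall progress, can be sketched as plain prompt assembly. `build_progress_prompt`, its wording, and the sample task are hypothetical; the prompt actually added in ac60d58 and the node that consumes it may look quite different.

```python
from typing import List


def build_progress_prompt(task: str, planning_prompt: str, steps_done: List[str]) -> str:
    """Assemble text asking the supervisor LLM to summarize task progress.

    Hypothetical helper for illustration; not part of rai or rai_bench.
    """
    # Number the completed steps so the LLM can refer to them explicitly.
    if steps_done:
        done = "\n".join(f"{i}. {step}" for i, step in enumerate(steps_done, start=1))
    else:
        done = "(no steps completed yet)"
    return (
        f"{planning_prompt}\n\n"
        f"Task: {task}\n\n"
        f"Steps completed so far:\n{done}\n\n"
        "Summarize the overall progress and decide the next step."
    )


if __name__ == "__main__":
    prompt = build_progress_prompt(
        task="Sort the boxes by color",  # made-up example values
        planning_prompt="Keep track of which objects are already sorted.",
        steps_done=["navigation: moved to the table", "manipulation: picked up the red box"],
    )
    print(prompt)
```

Because the task-specific guidance lives in `planning_prompt`, a benchmark task can supply its own wording while the summarization step stays generic, which also makes it easy to try a different prompt per model, as the comment suggests for `gpt-4o-mini`.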