feat: planning task and megamind agent #679
Conversation
Manipulation O3DE Benchmark

TESTING NEW AGENTS

Testing new agents. The aim is to modify the plan-and-execute agent so it can do better. I test agents on 10 trivial tasks, single repeat. I know 10 scenarios are not the most representative, but more samples would take forever with this many different setups. LLMs were run on CPU. For reference, these are the results of a simple ReAct agent and the standard plan-and-execute agent from https://github.com/langchain-ai/langgraph/blob/main/docs/docs/tutorials/plan-and-execute/plan-and-execute.ipynb.
The standard agent from the link does not work as it is supposed to, because the executor tries to do many steps at once and returns only the last message as past_steps, which confuses the replanner. The replanner does not remove the 1st step, so the executor gets it again, and so on (version c393921). I modified the agent slightly, so that after every executor invoke the 1st step is removed from the plan (66caef9; a sketch of this fix follows below). But timeouts and recursion limits still occurred, as the executor wants to do everything. The main issues I noticed are shown in the attached screenshots.
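A minimal sketch of that plan-trimming fix (the state shape follows the linked tutorial; agent_executor and its binding are assumptions, in the real graph it would be bound with functools.partial so the node keeps LangGraph's single-argument signature):

```python
from typing import Any, List, Tuple

from typing_extensions import TypedDict


class PlanExecuteState(TypedDict):
    input: str
    plan: List[str]
    past_steps: List[Tuple[str, str]]


def execute_step(state: PlanExecuteState, agent_executor: Any) -> dict:
    # Execute only the first remaining step of the plan...
    task = state["plan"][0]
    response = agent_executor.invoke({"messages": [("user", task)]})
    return {
        "past_steps": [(task, response["messages"][-1].content)],
        # ...and drop it, so the replanner never re-issues an executed step.
        "plan": state["plan"][1:],
    }
```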
Solutions to try (screenshots attached):

It helped: the executor now in the majority of cases calls the tool matching the task, and sometimes still calls excessive tools, but less often (version 6d57ad9).
WAREHOUSE TASK

A mock of a warehouse state was created to test agents and models in a scenario somewhat similar to the use case of the demo; check out src/rai_bench/rai_bench/examples/tool_calling_demo.py. There is a state of the env that keeps track of the robot and object positions; based on that state, the tools return output. The tools are very simple mocks of navigation, detection, and manipulation tools.

DEVELOPING AGENTS

I tried to develop a new agent that will perform better in this scenario. I began with the classic plan & execute agent, but it had issues with reflecting on whether a step was done correctly. I also tested the classic supervisor agent, but it lacked planning capabilities. Then I started to modify it by adding and removing different nodes. It resulted in the plan & supervisor agent, which is a hybrid: a planner node makes a plan at the beginning, then the supervisor delegates each step to a certain executor node (like a navigation agent). After the executor node there is a structured output node which assesses whether the step was completed (bool). This info is brought back to the supervisor; if success, go on with the next step, if not, repeat (maybe change something). The problem is that the supervisor doesn't have replan capabilities, which matters when a step depends on what happened in previous steps. https://github.com/RobotecAI/rai/blob/jm/feat/demo-bench/src/rai_core/rai/agents/langchain/planner_supervisor.py

I tried to introduce more nodes to split the decision making even more, but I felt like more nodes made making a good decision on the next step harder. All these conclusions brought me to another idea - MegaMind B)

MegaMind merges the supervisor, planner, and replanner into one node. Instead of making the whole plan at the start, it makes only the first step, delegates it to the proper executor, then the structured output node marks success and provides a brief explanation of what was done. This explanation is then saved and provided to the megamind node as a step already done. Based on that, megamind comes up with the next step. Version f353762c9886e74d30d10a97f8954ae6296fe9af, https://github.com/RobotecAI/rai/blob/jm/feat/demo-bench/src/rai_core/rai/agents/langchain/megamind.py

With a lot of iterations of prompt adjustment, this is the best performing agent so far. To be fair, I tested all previous agents mostly with gpt-4o, but I assume that if an LLM does better with a certain agent, the rest of the models will too. gpt was performing promisingly on MegaMind, so I decided to test other models on this agent. It is worth noting that the executor LLM is qwen3:8b, but that is not a problem, as the tools are very simple.

MEGAMIND TESTING

Running on a GPU with 16GB VRAM; if a model does not fit in the VRAM, it is partially offloaded to CPU.

gpt-4o-mini - quite bad, gets stuck after around 4-5 steps, keeps doing the same steps in a loop.

qwen3:30b-a3b-instruct-2507-q8_0 - better, but sometimes forgets to drop before picking up another object and then starts to make mistakes; it can't salvage the scenario when a mistake was made earlier. Sometimes it manages to do the objective properly, but doesn't know when to stop afterwards. Takes from 8 to 15 sec per step.

gpt-oss:20b (partially on CPU) - slightly better than gpt-4o-mini, but still quite bad. In thinking mode it takes 45-60 seconds per step and sometimes just generates text instead of calling a tool. With thinking switched off it takes 20-60 seconds and uses tools properly. In both cases it starts to struggle after around 8-10 steps. When navigating, it keeps coming up with coords that are slightly off, which is weird (probably prompt related).

qwen3:30b (partially on CPU) - (thinking mode cannot be switched off?) Sometimes it just outputs a think block without the tool call, ending the scenario, so it is hard to evaluate.

qwen3:14b - 4-8 seconds per step, gets stuck after ~10 steps, starts to loop around navigating back and forth.

Overall, all models, when they get stuck (or even qwen3:30b-a3b-instruct-2507-q8_0 when it manages to do the whole sequence properly), keep going until the recursion limit. They do not recognize that the objective was done; maybe it can be fixed via prompts.
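For reference, a minimal sketch of how such a state-backed mock tool can be structured (names are illustrative; the real mocks live in the file linked above):

```python
from typing import Optional, Tuple

from langchain_core.tools import tool


class WarehouseState:
    """Single source of truth; tools only read and update it."""

    def __init__(self) -> None:
        self._robot_position: Tuple[float, float] = (0.0, 0.0)
        self.held_object: Optional[str] = None

    def get_position(self) -> Tuple[float, float]:
        return self._robot_position

    def set_position(self, x: float, y: float) -> None:
        self._robot_position = (x, y)


state = WarehouseState()


@tool
def navigate(x: float, y: float) -> str:
    """Drive the robot base to the given world coordinates."""
    state.set_position(x, y)
    return f"Navigated to ({x}, {y})."


@tool
def where_am_i() -> str:
    """Return the robot's current position."""
    x, y = state.get_position()
    return f"Robot is at ({x}, {y})."
```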
```python
robot_x, robot_y = self.get_position()
# Find which box we're dropping into
for box_id, box_data in self._boxes.items():
    if relative_pos == box_data["relative"]:
```
Based on my observations, this check failed most of the time.
Just out of curiosity, why does this require a relative position? Is it coming from a VLM for the real-world application? For qwen3:8b, you can see the agent rationalize how to deal with this relative position; it was kinda funny.
It is relative to the robot, as manipulation operates on coordinates relative to the arm.
```python
    box_data["world_position"][1] + relative_pos[1],
)
```
```python
self._state["held_object"] = None
```
This is a bit unexpected. Basically, as long as drop_object is called, we consider the object dropped, regardless of whether any box was found to drop it into.
This variable says whether an object is currently held in the gripper, regardless of where it was dropped, or even of whether an object was in the gripper before this function was called.
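A condensed sketch of the semantics in question, rewritten as a free function for brevity (field names are assumptions, not the exact source):

```python
def drop_object(state: dict, boxes: dict, relative_pos: tuple) -> str:
    # Try to find which box we're dropping into (there may be none)
    target_box = None
    for box_id, box_data in boxes.items():
        if relative_pos == box_data["relative"]:
            target_box = box_id
            break
    # held_object only tracks whether the gripper holds something,
    # so it is cleared regardless of where the object landed
    dropped = state["held_object"]
    state["held_object"] = None
    if target_box is None:
        return f"Dropped {dropped}, but not into any box."
    return f"Dropped {dropped} into {target_box}."
```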
```python
[
    SystemMessage(
        content=f"""
Analyze if this task was completed successfully:
```
This is very helpful for the agent to track the progress. As an exploration idea, what would you think about introducing a status_summary_prompt_template that tasks could optionally provide for more structured status tracking? For instance, in the warehouse scenario, such a template might help the agent better understand slot-level progress: whether the slot has been checked, the object picked up and sorted, etc.
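For illustration, such a hypothetical status_summary_prompt_template for the warehouse scenario might look like this (the name and hook are assumptions, not an existing API):

```python
STATUS_SUMMARY_PROMPT_TEMPLATE = """\
Determine success and report progress per slot in this format:
Slot <n>: [NAVIGATED | DETECTED | PICKED UP | DROPPED] \
Object status: <COLOR or UNKNOWN - NEEDS CHECKING>

Below you have messages of the agent doing the task:
"""


def get_status_summary_prompt(default_prompt: str, task) -> str:
    # Fall back to the generic "analyze if this task was completed" prompt
    # when a task does not provide its own template.
    return getattr(task, "status_summary_prompt_template", default_prompt)
```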
That sounds like a good idea. I will test it when I have time, or maybe you want to test it?
See my comments below on introducing planning_prompt.
```python
For example for the navigation tasks include the final location.
For manipulation the coordinates of objects that have been detected, picked up from or dropped to.
Below you have messages of agent doing the task:"""
),
```
BTW, I changed this prompt to "Determine success and provide brief explanation of what happened by slot, for example Slot 2: [NAVIGATED] Object status: UNKNOWN - NEEDS CHECKING.
Below you have messages of agent doing the task:"
This is the output from qwen3:8b:
Steps that were already done successfully:
1. Slot 1: [NAVIGATED] Object status: UNKNOWN - NEEDS CHECKING
2. Slot 1: [DETECTED] Object status: BLUE - IDENTIFIED
3. Slot 1: [PICKED UP] Object status: SUCCESSFUL - Object was picked up and transported to the first box.
4. Slot 2: [NAVIGATED] Object status: UNKNOWN - NEEDS CHECKING.
5. Slot 2: [OBJECT_DETECTED] Object status: RED - NO ACTION REQUIRED
6. Slot 3: [NAVIGATED] Object status: UNKNOWN - NEEDS CHECKING
7. Slot 3: [NAVIGATED] Object status: UNKNOWN - NEEDS CHECKING.
8. Slot 3: [PICKED UP] Object status: GREEN - [DROPPED] in second box at x: 5.0, y: 5.0.
9. Slot 4: [NAVIGATED] Object status: UNKNOWN - NEEDS CHECKING
10. Slot 4: [SUCCESS] Object status: GREEN - DROPPED IN SECOND BOX
Oh, that seems very helpful for the megamind node. It has to be adjusted to every scenario though, so I can't put it in the framework.
Yep, the above example was just to experiment with a structured status template and see its effect. It is scenario-dependent, and we could construct it dynamically if the task/scenario provides it.
I tried something similar recently: giving the agent current info about all slots in the prompt every time it was its turn to think, and it worked! I think it can be helpful, as the agent can understand the "map" of the environment better.
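A rough sketch of that idea: regenerate a compact snapshot of all slots and inject it into the prompt before every megamind turn (hypothetical names and state shape):

```python
def render_slot_map(slots: dict) -> str:
    # Compact, always-current "map" injected before each thinking turn
    lines = ["Current environment state:"]
    for slot_id, data in slots.items():
        obj = data.get("object") or "empty/unknown"
        checked = "checked" if data.get("checked") else "not checked yet"
        lines.append(f"- {slot_id}: {obj} ({checked})")
    return "\n".join(lines)


slots = {
    "slot_1": {"object": "blue_cube", "checked": True},
    "slot_2": {"object": None, "checked": False},
}
print(render_slot_map(slots))
# Current environment state:
# - slot_1: blue_cube (checked)
# - slot_2: empty/unknown (not checked yet)
```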
```python
return [
    self.pick_up_object,
    self.drop_object,
    self.ask_vlm,
```
Based on my experiments, adding nav_tool and where_am_i to manipulation_tools improved results with Qwen3:8b. The issue was that once an object is picked up, Megamind was unable to transfer control back to the navigation specialist. This resulted in drop_object being called before navigating to the correct box location.
Yes, I encountered this issue too. I think it is hard for it to complete a sequence like pick, navigate, and drop when these are split across different agents. On the other hand, in a real-world scenario manipulation is a lot harder, and I still think that having a separate agent for that would be better. But I agree, when the tools are simple we can just make a "movement agent" (see the sketch below). I don't know how to make MegaMind better at executing sequences across different subagents. Maybe combining it with status prompts, like "Currently holding: Box", or something like that.
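A sketch of that toolset merge, mirroring the hunk above (the nav_tool and where_am_i names follow the experiment described; the method shape is an assumption):

```python
def manipulation_tools(self) -> list:
    return [
        self.pick_up_object,
        self.drop_object,
        self.ask_vlm,
        # navigation added so pick -> navigate -> drop stays in one specialist
        self.nav_tool,
        self.where_am_i,
    ]
```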
```python
},
)
validators = [
    #### navigate to slot1, detect and pick up 1st object
```
It is very challenging to write validators that work in a non-deterministic and hard scenario like this one.
I agree, but there is an optimal execution flow, and even qwen3 a3b sometimes managed to do it ;p

I refactored megamind to be more generic, like we talked about @boczekbartek. I made both a class and a function, because I didn't know which way is better, so tell me which you like more and I will delete the other.
"step_success": StepSuccess(success=False, explanation=""), | ||
"step_messages": [], | ||
} | ||
megamind_system_prompt=task.get_system_prompt(), |
We could also add a planning_prompt for megamind (example) and use it in structure_output_node to summarize the overall task progress based on the steps done. In my experiments, this helps the decision-making for agents using qwen3 fairly consistently. It would be provided by the bench task (example).
Note, the planning_prompt didn't help the agent with gpt-4o-mini that much; perhaps we need a different prompt for it. Not sure how gpt-4o works for you; for me, the results vary a lot.
I like this idea; I copied it to this branch here: ac60d58. A sketch of the wiring is below.
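A hedged sketch of how the planning_prompt could be threaded through (argument names are assumptions; ac60d58 holds the actual change):

```python
from typing import Optional


def build_status_prompt(
    base_prompt: str, planning_prompt: Optional[str] = None
) -> str:
    # When the bench task provides a planning_prompt, append it so the
    # structured-output node summarizes overall task progress instead of
    # only the last step.
    if planning_prompt is not None:
        return base_prompt + "\n" + planning_prompt
    return base_prompt
```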
```python
self, input: dict[str, Any], outputs: List[Any]
) -> None:
    input["messages"].extend(outputs)
    input["step_messages"].extend(outputs)
```
These will be available to both megamind and the sub-task agents. Do we have concerns about providing "too" much context, i.e. context pollution? ATM, megamind is structured for task planning and the specialists carry out the actual steps. IMO, more focused context could help decision-making :). Do you have any observations on how context affects agent behavior?
These messages are indeed passed to the State, but the megamind LLM does not have them in its prompt, so I'm not sure what you mean by context in this case.
model_name = f"supervisor-{supervisor_name}_executor-{executor_name}" | ||
supervisor_llm = get_llm_for_benchmark(model_name=supervisor_name, vendor="ollama") | ||
executor_llm = get_llm_for_benchmark( | ||
model_name=executor_name, vendor="ollama", reasoning=False |
We'd want to take out reasoning=False. Like you said, we can't turn off qwen3 thinking, even with this. This code will fail for gpt-4o-*.
reasoning=False works for smaller qwen3 models, like 4b or 8b. It doesn't work, for example, with the bigger 30b, I don't know why.
But this is an example script; if you want to use gpt, just replace these lines. Maybe you are right that by default there should be a model which does not use this param, as it might confuse a new user. I applied it here: 2123d25. A sketch of the idea is below.
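A sketch consistent with that change: only pass reasoning=False for models known to honor it (the model allowlist and wrapper are assumptions; get_llm_for_benchmark is the helper used in the script above):

```python
def make_executor_llm(model_name: str, vendor: str):
    kwargs: dict = {"model_name": model_name, "vendor": vendor}
    # reasoning=False is honored by smaller qwen3 variants (4b/8b), ignored
    # by qwen3:30b, and rejected by gpt-4o-*, so pass it selectively.
    if vendor == "ollama" and model_name in ("qwen3:4b", "qwen3:8b"):
        kwargs["reasoning"] = False
    return get_llm_for_benchmark(**kwargs)
```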
Sorry, my bad, I mistakenly closed this PR by clicking the wrong button.
As the code stands now, it looks solid overall! I have two minor suggestions which could be addressed in subsequent PRs:
- The SortTask validators might be too strict for tool calling validation, which causes the scoring to be inaccurate.
- Adding some validation for the megamind decision-making process would be a nice enhancement.
```python
}


class SortTask(Task):
```
An alternative name is SortingTask.
@jmatejcz +1
Applied here: ac60d58.
```python
]

def get_system_prompt(self) -> str:
    return SYSTEM_PROMPT + "\n" + WAREHOUSE_ENVIRONMENT_DESCRIPTION
```
I also added a couple of helper functions for my experiments that might be useful to incorporate, your choice.
- report_sorting_status - outputs the sorted objects from boxes
- get_slot_position - helps the agent get back on track.
Yeah, sure, they will definitely be useful; I added them here: c98a3f7.
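Hedged sketches of the two helpers as described above (the merged versions are in c98a3f7; the exact signatures there may differ):

```python
def report_sorting_status(self) -> str:
    """Output the sorted objects currently in each box."""
    lines = []
    for box_id, box_data in self._boxes.items():
        contents = ", ".join(box_data["contents"]) or "empty"
        lines.append(f"{box_id}: {contents}")
    return "\n".join(lines)


def get_slot_position(self, slot_id: str) -> tuple:
    """Return a slot's world position, so a lost agent can get back on track."""
    return self._slots[slot_id]["world_position"]
```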
Please use the function convention and move the pure langgraph agent to […]. Also it would be good to avoid making nested functions and move […].
Thanks for the info, applied here: 7ce2cf5.
@jmatejcz I left some minor comments - please have a look. Overall the PR looks good and I think it's ready to be merged.
I tested with tool_calling_custom_agent.py and got this score:
TASK SCORE: 0.2, TOTAL TIME: 50.188
```python
)
return {
    "past_steps": [(task, agent_response["messages"][-1].content)],
    # "plan": plan[1:], # removing the step that was executed
```
Remove the comment if it's not needed.
fixed here f365ecd
```python
def execute_step(state: PlanExecuteState):
    """Execute the current step of the plan."""
    # TODO (jmatejcz) should we pass the whole plan or only a single step to the executor?
```
Is this TODO required?
fixed here f365ecd
```python
class PlanExecuteState(ReActAgentState):
    """State for the plan and execute agent."""

    # TODO (jmatejcz) should original_task be replaced with
```
How about this TODO? Change it to NOTE if it's just a mention for the future, or remove it.
fixed here f365ecd
```diff
@@ -0,0 +1,277 @@
+# Copyright (C) 2024 Robotec.AI
```
```diff
- # Copyright (C) 2024 Robotec.AI
+ # Copyright (C) 2025 Robotec.AI
```
fixed here f365ecd
Thank you! I won't change the validation or add more tasks in this PR; it can be done in the future.
Co-authored-by: Julia Jia"
Thank you @Juliaj for the attention and comments on this PR. I'm merging it, but we can continue the discussion here or anywhere you like.
feat: tool calling benchmark unified across types and prompts variety… (#620) feat: basic tasks extension (#644) feat: tool calling custom interfaces tasks extension (#636) feat: tool calling spatial reasoning tasks extension (#637) refactor: remove navigation tasks (#638) refactor: o3de config (#630) refactor(nav2_toolkit): remove unused action_client (#670) fix: manipulaiton bench fixes (#653) docs: rai simbench docs update (#665) feat: planning task and megamind agent (#679) feat: megamind context providers (#687) feat: tool calling bench - manipulation tasks extenstion (#656) chore: resolving conflicts (#690) Co-authored-by: Jakub Matejczyk <58983084+jmatejcz@users.noreply.github.com> Co-authored-by: Julia Jia <juliajster@gmail.com> Co-authored-by: Magdalena Kotynia <magdalena.kotynia@robotec.ai> Co-authored-by: Pawel Kotowski <pawel.kotowski@oleahealth.ai> Co-authored-by: Brian Tuan <btuan@users.noreply.github.com> Co-authored-by: jmatejcz <jakub.matejczyk@robotec.ai>
Purpose
Add a new task that demands planning, replanning, and remembering.
Develop and test agents that perform better in such tasks than a plain ReAct agent.
Proposed Changes
A new task type for the tool calling benchmark, SortTask; it is complicated and long, so there is no need for more for now.
Added new agents:
It has subagents, like a supervisor, after which there is a structured output attacher which determines whether a step has been done successfully and explains what happened. That is fed back to the megamind node.
The megamind node itself is like a planner, but instead of developing the whole plan once at the start, it develops just the next step, taking into consideration the response from the output attacher, the previous steps, and the overall objective.
It performs better at replanning than the agent designed for that, Plan&Execute. A high-level sketch follows.
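A high-level, runnable sketch of this loop in LangGraph (node and state names are illustrative stubs, not the actual implementation in rai_core):

```python
from typing import List

from langgraph.graph import StateGraph, START, END
from typing_extensions import TypedDict


class MegamindState(TypedDict):
    objective: str
    steps_done: List[str]  # brief explanations from the structured output node
    current_step: str
    finished: bool


def megamind_node(state: MegamindState) -> dict:
    # Decide only the NEXT step from the objective plus steps already done,
    # instead of producing a full plan up front.
    done = len(state["steps_done"])
    if done >= 3:  # stand-in for "objective achieved" detection
        return {"finished": True}
    return {"current_step": f"step {done + 1}", "finished": False}


def executor_node(state: MegamindState) -> dict:
    # A delegated specialist (e.g. a navigation agent) would execute
    # state["current_step"] here; stubbed out for the sketch.
    return {}


def structured_output_node(state: MegamindState) -> dict:
    # Mark success and record a brief explanation of what was done.
    return {"steps_done": state["steps_done"] + [f"completed {state['current_step']}"]}


graph = StateGraph(MegamindState)
graph.add_node("megamind", megamind_node)
graph.add_node("executor", executor_node)
graph.add_node("structured_output", structured_output_node)
graph.add_edge(START, "megamind")
graph.add_conditional_edges(
    "megamind", lambda s: END if s["finished"] else "executor"
)
graph.add_edge("executor", "structured_output")
graph.add_edge("structured_output", "megamind")
app = graph.compile()

print(app.invoke(
    {"objective": "sort objects", "steps_done": [], "current_step": "", "finished": False}
))
```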
Additional small things:
THE RESULTS OF THE TESTS ARE IN THE COMMENTS.
The commits linked in the results could be outdated, as the testing was done some time ago.
Issues
#642
Testing