
Conversation

Contributor

@jmatejcz jmatejcz commented Sep 1, 2025

Purpose

Add a new task that demands planning, replanning, and remembering.
Develop and test agents that perform better on such tasks than a plain ReAct agent.

Proposed Changes

Added a new task type to the tool calling benchmark, SortTask. It is complicated and long; no need for more tasks for now.

Added new agents:

  • Plan&Execute - classic implementation
  • PlannerSupervisor - a mix of supervisor and plan&execute
  • Megamind - the best agent I have developed so far (tested on this task).
    It has subagents, like a supervisor, after which there is a structured output attacher that determines whether a step was done successfully and explains what happened. That is fed back to the megamind node.
    The megamind node itself is like a planner, but instead of developing the plan once at the start it develops just the next step, taking into consideration the response from the output attacher, the previous step, and the overall objective.

It performs better on replanning than the agent designed for that - Plan&Execute.
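For illustration, the megamind loop described above can be sketched in plain Python. This is a schematic only: `next_step`, `execute`, and `assess` are hypothetical stand-ins for the LLM-backed planner node, executor subagents, and structured output attacher, not the actual implementation.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    success: bool
    explanation: str


def run_megamind(objective, next_step, execute, assess, max_steps=20):
    """Single-step planning loop: plan one step, execute it, assess it,
    feed the assessment back into the planner, repeat until done."""
    history = []  # explanations of steps already done
    for _ in range(max_steps):
        step = next_step(objective, history)  # plans only the NEXT step
        if step is None:                      # planner signals completion
            return history
        messages = execute(step)              # delegate to a subagent/executor
        result = assess(step, messages)       # structured success + explanation
        history.append(result)
    return history
```

The key design point is that `history` (the assessed outcomes so far) is the only state the planner sees, so it can replan after every step instead of once at the start.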

Additional small things:

  • added level param to bench logger
  • passing initial state to bench.run_next so it is possible to run different agents

The test results are in the comments below.
The commits linked in the results may be outdated, as the testing was done some time ago.

Issues

#642

Testing

  1. I recommend using langfuse; without it you won't be able to see what the agent is doing:
python3 src/rai_bench/rai_bench/examples/tool_calling_custom_agent.py 

@jmatejcz
Contributor Author

jmatejcz commented Sep 1, 2025

Manipulation O3DE Benchmark

TESTING NEW AGENTS

The aim is to modify the plan-and-execute agent so it performs better.

I test agents on 10 trivial tasks, single repeat. I know 10 scenarios are not the most representative, but more samples would take forever with this many different setups. LLMs were run on CPU.

For reference, these are the results of a simple ReAct agent and the standard plan-and-execute agent from https://github.com/langchain-ai/langgraph/blob/main/docs/docs/tutorials/plan-and-execute/plan-and-execute.ipynb.
The system prompt is the standard one from our demo:

You are a robotic arm with interfaces to detect and manipulate objects.
Here are the coordinates information:
x - front to back (positive is forward)
y - left to right (positive is right)
z - up to down (positive is up)
Before starting the task, make sure to grab the camera image to understand the environment.
| agent | success rate | avg_time (on CPU) |
| --- | --- | --- |
| Conversational (gpt-4o) | 90% | 28.9 |
| Conversational (qwen3:8b) | 70% | 32.2 |
| Conversational (qwen3:14b) | 70% | 77.9 |
| Plan&Execute (plan/replan=gpt-4o, execute=qwen3:14b [no thinking]) | 0.0 | timeout |
| Plan&Execute (plan/replan=qwen3:14b [no thinking], execute=qwen3:8b [no thinking]) | 0.0 | timeout |

The standard agent from the link does not work as it is supposed to: the executor tries to do many steps at once and returns only the last message as past_steps, which confuses the replanner. The replanner does not remove the 1st step, so the executor gets it again, and so on. Version → c393921

I modified the agent slightly, so that after every executor invoke the 1st step is removed from the plan: 66caef9. But timeouts and recursion limits still occurred, as the executor wants to do everything.

The main issues I noticed:

  • the planner node makes steps that do not correspond to the available tools:
image
  • often the executor node tries to complete the whole plan even when asked to do only 1 step:
image
  • the replanner gets only the last message output by the executor - when the executor did a couple of steps, the replanner won't even know. A second issue is that the message output by the executor can be hallucinated. Often the replanner and agent get looped on the same step, as the replanner does not understand that a step was already done
  • the executor is a model with no vision capabilities, so the step of recognizing the image (which is suggested in the system prompt) is not doable

Solutions to try:

  • bind tools to the planner and adjust the prompt so it can come up with a plan suitable for the given tools
  • pass only the task that is supposed to be done to the executor
  • structured output from the executor with a success field, then replace the replanner with a simple if statement
  • examples of tool calls in the executor's system prompt
  • examples of a good plan for the planner
  • add a 2nd executor with a vision model to understand the camera image
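The structured-output idea from the list above can be sketched like this. It is a minimal stand-in: in the real agent the fields would be filled by an LLM returning structured output, while here `StepSuccess` and `advance` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class StepSuccess:
    success: bool
    explanation: str


def advance(plan, result):
    """Replace the replanner with a simple if statement:
    drop the first step on success, otherwise keep it for a retry."""
    if result.success:
        return plan[1:]  # step done, move on
    return plan          # step failed, retry it


plan = ["grab camera image", "detect objects", "pick up object"]
plan = advance(plan, StepSuccess(True, "image grabbed"))
plan = advance(plan, StepSuccess(False, "no objects detected"))
# plan is now ["detect objects", "pick up object"]
```

This removes the failure mode where the replanner does not realize a step was already done, since plan bookkeeping no longer depends on the LLM.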
  1. First I tried passing only the step that is supposed to be done:
image

It helped: the executor now, in the majority of cases, calls the tool matching the task. Sometimes it still calls excessive tools, but less often. Version → 6d57ad9

  2. Then I bound tools to the planner and changed the prompt. Version → 8c9d11b
image The plan is shorter but still too long; the manipulation tool is split into 2 steps, for example 3 and 4, or 5 and 6. This can be an issue with the tool description.
  3. I will try MoveFromToTool instead of MoveToPointTool, which is easier to use. 91dd2ce

  4. The plan is still too fragmented: 1 tool call's worth of action is spread across 2-3 steps. I will try adding examples of a good plan to the planner.

@jmatejcz
Contributor Author

jmatejcz commented Sep 1, 2025

WAREHOUSE TASK

A mock of a warehouse state was created to test agents and models in a scenario somewhat similar to the demo use case; check out src/rai_bench/rai_bench/examples/tool_calling_demo.py
and
https://github.com/RobotecAI/rai/blob/jm/feat/demo-bench/src/rai_bench/rai_bench/tool_calling_agent/tasks/demo.py

There is an environment state that keeps track of the robot and object positions; tools return output based on that state. The tools are very simple mocks of navigation, detection, and manipulation tools.
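A minimal version of such a mocked environment might look like the sketch below. It is illustrative only: the class, tool, and field names here are made up and do not match the ones in demo.py.

```python
class MockWarehouse:
    """Tools are thin mocks: they mutate a shared state dict and
    return strings describing what happened, like the benchmark env."""

    def __init__(self):
        self.state = {
            "position": (0.0, 0.0),
            "held_object": None,
            "objects": {"cube": (2.0, 1.0)},  # object name -> position
        }

    def navigate(self, x, y):
        self.state["position"] = (x, y)
        return f"Navigated to ({x}, {y})"

    def detect(self):
        here = [n for n, p in self.state["objects"].items()
                if p == self.state["position"]]
        return f"Detected: {here}" if here else "Nothing detected"

    def pick_up(self, name):
        if self.state["objects"].get(name) == self.state["position"]:
            self.state["held_object"] = name
            del self.state["objects"][name]
            return f"Picked up {name}"
        return f"{name} is not here"

    def drop(self):
        held, self.state["held_object"] = self.state["held_object"], None
        return f"Dropped {held}" if held else "Gripper was already empty"


env = MockWarehouse()
```

Because the tools are deterministic functions of the state, validators can check the exact sequence of tool calls without running a simulator.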

DEVELOPING AGENTS

I tried to develop a new agent that performs better in this scenario. I began with the classic plan&execute agent, but it had issues with reflecting on whether a step was done correctly.

I also tested the classic supervisor agent, but it lacked planning capabilities.

Then I started to modify it by adding and removing different nodes. This resulted in the plan & supervisor agent, which is a hybrid - the planner node makes a plan at the beginning, then the supervisor delegates each step to a certain executor node (like a navigation agent). After the executor node there is a structured output node which assesses whether the step was completed (bool). This info is brought back to the supervisor: if success, go on with the next step; if not, repeat (maybe change something). The problem is that the supervisor doesn't have replanning capabilities, which is a problem when a step depends on what happened in previous steps. https://github.com/RobotecAI/rai/blob/jm/feat/demo-bench/src/rai_core/rai/agents/langchain/planner_supervisor.py
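The plan & supervisor control flow reduces to roughly this loop. It is a schematic with hypothetical callables (`delegate`, `assess`); the real nodes are LLM-backed langgraph nodes, and the retry policy shown here is an assumption for illustration.

```python
def plan_and_supervise(plan, delegate, assess, max_retries=2):
    """Planner makes the plan once; the supervisor delegates each step to
    an executor, a structured output node assesses success (bool), and on
    failure the step is retried. No replanning: the plan never changes."""
    log = []
    for step in plan:
        for attempt in range(max_retries + 1):
            outcome = delegate(step)    # e.g. navigation or manipulation agent
            ok = assess(step, outcome)  # structured output: success bool
            log.append((step, ok))
            if ok:
                break                   # success -> next step
        # on persistent failure we still move on; the supervisor can't replan
    return log
```

The fixed outer `for step in plan` loop is exactly the limitation noted above: when a later step depends on what actually happened earlier, there is no way to rewrite the remaining plan.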

I tried to introduce more nodes to split the decision making even more, but I felt like more nodes made making a good decision on the next step harder. All these conclusions brought me to another idea - MegaMind B)

MegaMind merges the supervisor, planner, and replanner into one node. Instead of making the whole plan at the start, it makes only the first step, delegates it to the proper executor, then the structured output node marks success and provides a brief explanation of what was done. This explanation is saved and provided to the megamind node as a step already done. Based on that, megamind comes up with the next step. Version → f353762c9886e74d30d10a97f8954ae6296fe9af, https://github.com/RobotecAI/rai/blob/jm/feat/demo-bench/src/rai_core/rai/agents/langchain/megamind.py

After many iterations of prompt adjustment, this is the best performing agent so far. To be fair, I tested all previous agents mostly with gpt-4o, but I assume that if one LLM does better with a certain agent, the rest of the models will too. gpt-4o was performing promisingly on Megamind, so I decided to test other models on this agent. It is worth noting that the executor LLM is qwen3:8b, but that is not a problem as the tools are very simple.

MEGAMIND TESTING

Running on a GPU with 16 GB VRAM; if a model does not fit in VRAM, it is partially offloaded to the CPU.

gpt-4o-mini - quite bad; gets stuck after around 4-5 steps, keeps doing the same steps in a loop

qwen3:30b-a3b-instruct-2507-q8_0 - better, but sometimes forgets to drop before picking up another object and then starts to make mistakes - it can't salvage the scenario when a mistake was made earlier. Sometimes it manages to complete the objective properly, but doesn't know when to stop afterwards. Takes from 8 to 15 sec per step

gpt-oss:20b (partially on CPU) - slightly better than gpt-4o-mini, but still quite bad. In thinking mode it takes from 45-60 seconds per step and sometimes just generates text instead of calling a tool. With thinking switched off it takes 20-60 seconds and uses tools properly. In both cases, after around 8-10 steps it starts to struggle. When navigating, it keeps coming up with coords that are slightly off, which is weird (probably prompt related)

qwen3:30b (partially on CPU) - (thinking mode cannot be switched off?) sometimes it just outputs the think block without the tool call, ending the scenario, so it is hard to evaluate

qwen3:14b - 4-8 seconds per step; gets stuck after ~10 steps, starts to loop around navigating back and forth

Overall, all models keep going until the recursion limit when they get stuck - and even qwen3:30b-a3b-instruct-2507-q8_0 does so when it manages to do the whole sequence properly. They do not recognize that the objective was done; maybe this can be fixed via prompts.

@jmatejcz jmatejcz marked this pull request as ready for review September 1, 2025 13:49
@jmatejcz jmatejcz changed the title feat: warehouse task and megamind agent feat: planning task and megamind agent Sep 1, 2025
robot_x, robot_y = self.get_position()
# Find which box we're dropping into
for box_id, box_data in self._boxes.items():
if relative_pos == box_data["relative"]:
Collaborator

Based on my observations, this check failed most of the time.

Collaborator

Just out of curiosity, why does this require a relative position? Is it coming from a VLM for the real-world application? For qwen3:8b, you can see the agent rationalize how to deal with this relative position; it was kinda funny.

Contributor Author

It is relative to the robot, as manipulation operates on coordinates relative to the arm.

box_data["world_position"][1] + relative_pos[1],
)

self._state["held_object"] = None
Collaborator

This is a bit unexpected. Basically, as long as drop_object is called, we consider the object dropped regardless whether there is any box found to drop it.

Contributor Author

This variable says whether there is an object currently held in the gripper, regardless of where it was dropped, or even whether an object was in the gripper before this function.

[
SystemMessage(
content=f"""
Analyze if this task was completed successfully:
Collaborator

This is very helpful for the agent to track the progress. As an exploration idea, what would you think of introducing a status_summary_prompt_template that tasks could optionally provide for more structured status tracking? For instance, in the warehouse scenario, such a template might help the agent better understand slot-level progress: whether the slot has been checked, the object picked up and sorted, etc.

Contributor Author

That sounds like a good idea. I will test it when I have time - or maybe you want to test it?

Collaborator

See my comments below on introducing planning_prompt.

For example for the navigation tasks include the final location.
For manipulation the coordinates of objects that have been detected, picked up from or dropped to.
Below you have messages of agent doing the task:"""
),
Collaborator

BTW, I changed this prompt to "Determine success and provide brief explanation of what happened by slot, for example Slot 2: [NAVIGATED] Object status: UNKNOWN - NEEDS CHECKING.
Below you have messages of agent doing the task:"

This is the output from qwen3:8b

Steps that were already done successfully:
1. Slot 1: [NAVIGATED] Object status: UNKNOWN - NEEDS CHECKING
2. Slot 1: [DETECTED] Object status: BLUE - IDENTIFIED
3. Slot 1: [PICKED UP] Object status: SUCCESSFUL - Object was picked up and transported to the first box.
4. Slot 2: [NAVIGATED] Object status: UNKNOWN - NEEDS CHECKING.
5. Slot 2: [OBJECT_DETECTED] Object status: RED - NO ACTION REQUIRED
6. Slot 3: [NAVIGATED] Object status: UNKNOWN - NEEDS CHECKING
7. Slot 3: [NAVIGATED] Object status: UNKNOWN - NEEDS CHECKING.
8. Slot 3: [PICKED UP] Object status: GREEN - [DROPPED] in second box at x: 5.0, y: 5.0.
9. Slot 4: [NAVIGATED] Object status: UNKNOWN - NEEDS CHECKING
10. Slot 4: [SUCCESS] Object status: GREEN - DROPPED IN SECOND BOX

Contributor Author

@jmatejcz jmatejcz Sep 4, 2025

Oh, that seems very helpful for the megamind node. It has to be adjusted for every scenario though; I can't put it in the framework.

Collaborator

Yep, above example was just to experiment with a structured status template and see the effect of it. It is scenario dependent which we could dynamically construct if task/scenario provides it.

Contributor Author

I tried something similar recently: giving the agent current info about all slots in the prompt every time it was its turn to think, and it worked! I think it can be helpful, as the agent can understand the "map" of the environment better.

@jmatejcz jmatejcz requested a review from Juliaj September 4, 2025 12:20
return [
self.pick_up_object,
self.drop_object,
self.ask_vlm,
Collaborator

Based on my experiments, adding nav_tool and where_am_i to manipulation_tools improved results with Qwen3:8b. The issue was that once an object is picked up, Megamind was unable to transfer control back to the navigation specialist. This resulted in drop_object being called before navigating to the correct box location.

Contributor Author

@jmatejcz jmatejcz Sep 12, 2025

Yes, I encountered this issue too. I think it is hard for it to complete a sequence like pick, navigate, and drop when these are in different agents. On the other hand, in a real-world scenario manipulation is a lot harder, and I still think that having a separate agent for that would be better. But I agree, when tools are simple we can just make a "movement agent". I don't know how to make megamind better at executing sequences between different subagents. Maybe combine it with status prompts, like "Currently holding: Box", or something like that.

},
)
validators = [
#### navigate to slot1, detect and pick up 1st object
Collaborator

It is very challenging to write validators that work in a non-deterministic and hard scenario like this one.

Contributor Author

I agree, but there is an optimal execution flow, and sometimes even qwen3 a3b managed to do it ;p

@jmatejcz
Contributor Author

I refactored megamind to be more generic, like we talked about, @boczekbartek. I made both a class and a function because I didn't know which way is better - so tell me which you like more and I will delete the other.

"step_success": StepSuccess(success=False, explanation=""),
"step_messages": [],
}
megamind_system_prompt=task.get_system_prompt(),
Collaborator

We could also add a planning_prompt for megamind (example) and use it in structure_output_node to summarize the overall task progress based on the steps done. In my experiments, this helps the decision-making for an agent using qwen3 fairly consistently. This will be provided by the bench task (example).

Note, the planning_prompt didn't help agent with gpt-4o-mini that much, perhaps we need a different prompt for it. Not sure how gpt-4o works for you, for me, the results vary a lot.

Contributor Author

I like this idea; I copied it to this branch here: ac60d58

self, input: dict[str, Any], outputs: List[Any]
) -> None:
input["messages"].extend(outputs)
input["step_messages"].extend(outputs)
Collaborator

These will be available to both megamind and sub-task agents. Do we have concerns on providing "too" much context, i.e. context pollution ? ATM, the megamind is structured for task planning and specialists carry out actual steps. IMO, more focused context could help decision-making :). Do you have any observations how context affects agent behavior?

Contributor Author

These messages are indeed passed to the State, but the megamind LLM does not have them in its prompt, so I'm not sure what you mean by context in this case.

model_name = f"supervisor-{supervisor_name}_executor-{executor_name}"
supervisor_llm = get_llm_for_benchmark(model_name=supervisor_name, vendor="ollama")
executor_llm = get_llm_for_benchmark(
model_name=executor_name, vendor="ollama", reasoning=False
Collaborator

We'd want to take out reasoning=False. Like you said, we can't turn off qwen3 thinking, even with this. This code will fail for gpt-4o-*.

Contributor Author

reasoning = False works for the smaller qwen3 models, like 4b or 8b. It doesn't work, for example, with the bigger 30b; I don't know why.
But this is an example script; if you want to use gpt, just replace these lines. Maybe you are right that by default there should be a model which does not use this param, as it might confuse a new user. I applied it here: 2123d25

@Juliaj Juliaj closed this Sep 11, 2025
@Juliaj Juliaj reopened this Sep 11, 2025
@Juliaj
Collaborator

Juliaj commented Sep 11, 2025

Sorry, my bad - I mistakenly closed this PR by clicking the wrong button.

Collaborator

@Juliaj Juliaj left a comment

As the code stands now, it looks solid overall! I have two minor suggestions which could be addressed in subsequent PRs:

  • The SortTask validators might be too strict for tool calling validation, which causes the scoring to be inaccurate.
  • Adding some validation for the megamind decision-making process would be a nice enhancement

}


class SortTask(Task):
Collaborator

An alternative name is SortingTask.

Contributor Author

applied here ac60d58

]

def get_system_prompt(self) -> str:
return SYSTEM_PROMPT + "\n" + WAREHOUSE_ENVIRONMENT_DESCRIPTION
Collaborator

I also added a couple of helper functions for my experiments that might be useful to incorporate, your choice.

Contributor Author

Yeah sure, they will definitely be useful; added them here: c98a3f7

Member

@boczekbartek boczekbartek left a comment

@jmatejcz Thank you! Good job! I made an initial check of the PR. Please check my comments; I'll continue my review.
Also big thanks to @Juliaj for your comments!

@boczekbartek
Member

boczekbartek commented Sep 12, 2025

@jmatejcz

> i refactored megamind to be more generic, like we talked @boczekbartek, i made both class and function, because i didn't know which way is better - so tell me what you like more and i will delete the other

Please use the function convention and move the pure langgraph agent to src/rai_core/rai/agents/langchain/core module.
src/rai_core/rai/agents/langchain/core contains pure langgraph agents
src/rai_core/rai/agents/langchain contains agents that can communicate with the world using HRIMessage

Also it would be good to avoid making nested functions and move plan_step out of create_megamind

@jmatejcz
Contributor Author

@jmatejcz

> i refactored megamind to be more generic, like we talked @boczekbartek, i made both class and function, because i didn't know which way is better - so tell me what you like more and i will delete the other

> Please use the function convention and move the pure langgraph agent to src/rai_core/rai/agents/langchain/core module. src/rai_core/rai/agents/langchain/core contains pure langgraph agents src/rai_core/rai/agents/langchain contains agents that can communicate with the world using HRIMessage
>
> Also it would be good to avoid making nested functions and move plan_step out of create_megamind

Thanks for the info; applied here: 7ce2cf5

Member

@boczekbartek boczekbartek left a comment

@jmatejcz I left some minor comments - please have a look. Overall the PR looks good, and I think it's ready to be merged.

I tested with tool_calling_custom_agent.py and I got this score:

```
TASK SCORE: 0.2, TOTAL TIME: 50.188
```

)
return {
"past_steps": [(task, agent_response["messages"][-1].content)],
# "plan": plan[1:], # removing the step that was executed
Member

remove comment if it's not needed

Contributor Author

fixed here f365ecd


def execute_step(state: PlanExecuteState):
"""Execute the current step of the plan."""
# TODO (jmatejcz) should we pass the whole plan or only a single step to the executor?
Member

Is this TODO required?

Contributor Author

fixed here f365ecd

class PlanExecuteState(ReActAgentState):
"""State for the plan and execute agent."""

# TODO (jmatejcz) should original_task be replaced with
Member

How about this TODO?
Change it to NOTE if it's just a mention for the future, or remove it.

Contributor Author

fixed here f365ecd

@@ -0,0 +1,277 @@
# Copyright (C) 2024 Robotec.AI
Member

Suggested change
# Copyright (C) 2024 Robotec.AI
# Copyright (C) 2025 Robotec.AI

Contributor Author

fixed here f365ecd

@jmatejcz
Contributor Author

As the code stands now, it looks solid overall! I have two minor suggestions which could be addressed in subsequent PRs:

* The SortTask validators might be too strict for tool calling validation, which causes the scoring to be inaccurate.

* Adding some validation for the megamind decision-making process would be a nice enhancement

Thank you! I won't change the validation or add more tasks in this PR; that can be done in the future.
What do you mean by decision-making validation?

@jmatejcz
Contributor Author

Thank you @Juliaj for the attention and comments on this PR. I'm merging it, but we can continue the discussion here or anywhere you like.

@jmatejcz jmatejcz merged commit 7d4d136 into development Sep 12, 2025
6 checks passed
@jmatejcz jmatejcz deleted the jm/feat/warehouse-task-and-megamind-agent branch September 12, 2025 14:00
maciejmajek added a commit that referenced this pull request Sep 15, 2025
…#620)

feat: basic tasks extension (#644)

feat: tool calling custom interfaces tasks extension (#636)

feat: tool calling spatial reasoning tasks extension (#637)

refactor: remove navigation tasks (#638)

refactor: o3de config (#630)

refactor(`nav2_toolkit`): remove unused `action_client` (#670)

fix: manipulaiton bench fixes (#653)

docs: rai simbench docs update (#665)

feat: planning task and megamind agent (#679)

feat: megamind context providers (#687)

feat: tool calling bench - manipulation tasks extenstion (#656)

chore: resolving conflicts  (#690)

Co-authored-by: Julia Jia <juliajster@gmail.com>
Co-authored-by: Magdalena Kotynia <magdalena.kotynia@robotec.ai>
Co-authored-by: Maciej Majek <46171033+maciejmajek@users.noreply.github.com>
Co-authored-by: Pawel Kotowski <pawel.kotowski@oleahealth.ai>
Co-authored-by: Brian Tuan <btuan@users.noreply.github.com>
maciejmajek added a commit that referenced this pull request Sep 15, 2025
feat: tool calling benchmark unified across types and prompts variety… (#620)

feat: basic tasks extension (#644)

feat: tool calling custom interfaces tasks extension (#636)

feat: tool calling spatial reasoning tasks extension (#637)

refactor: remove navigation tasks (#638)

refactor: o3de config (#630)

refactor(nav2_toolkit): remove unused action_client (#670)

fix: manipulaiton bench fixes (#653)

docs: rai simbench docs update (#665)

feat: planning task and megamind agent (#679)

feat: megamind context providers (#687)

feat: tool calling bench - manipulation tasks extenstion (#656)

chore: resolving conflicts (#690)

Co-authored-by: Jakub Matejczyk <58983084+jmatejcz@users.noreply.github.com>
Co-authored-by: Julia Jia <juliajster@gmail.com>
Co-authored-by: Magdalena Kotynia <magdalena.kotynia@robotec.ai>
Co-authored-by: Pawel Kotowski <pawel.kotowski@oleahealth.ai>
Co-authored-by: Brian Tuan <btuan@users.noreply.github.com>
Co-authored-by: jmatejcz <jakub.matejczyk@robotec.ai>