Memory Intelligence Agent

An Agent Memory Framework Designed For Deep Research Agents

🚀 Latest News

  • [April 1, 2026]: 🌈 The full training and evaluation codebase, models, and datasets are released.

📌 Overview

MIA (Memory In Intelligence Agent) is a memory framework designed for deep research agents, developed by a joint team from the Shanghai Institute of Intelligence (SII) and East China Normal University (ECNU). To address the limitations of existing agents, such as ineffective memory evolution and high storage costs, MIA introduces a Manager-Planner-Executor architecture: a Manager for memory storage, a Planner for memory usage, and an Executor for memory-guided inference. MIA's core innovations are an Alternative Reinforcement Learning paradigm for seamless multi-agent cooperation and a Continual Test-Time Learning mechanism that lets the Planner evolve on the fly during inference. By establishing a collaborative loop between parametric and non-parametric memories with Reflection and Unsupervised Judgment, MIA enables efficient autonomous evolution and robust reasoning in complex, open-world scenarios.

πŸ† Performance

📊 Experimental Analysis

Our comprehensive evaluation across multiple benchmarks demonstrates that MIA significantly improves the performance of Deep Research Agents:

  • Elevating the State-of-the-Art (a & b): Comparative bar charts on HotpotQA (text-only, sandbox-based Wiki search) and LiveVQA (multimodal) show that MIA consistently boosts the performance of current SOTA large language models, demonstrating its efficacy in both text and complex multimodal reasoning tasks.
  • The "Small-to-Great" Leap (c): Using a Qwen-2.5-VL-7B-based Executor, MIA enables this 7B model to achieve a striking performance breakthrough across 7 diverse datasets. Remarkably, the MIA-enhanced 7B model not only outperforms the larger Qwen-2.5-VL-32B but also surpasses strong closed-source models such as GPT-4o and Gemini 2.5 Pro (in non-tool-calling settings). This underscores MIA's ability to unlock "super-model" intelligence at a small, efficient parameter scale.
  • Superiority in Agent Memory (d): When benchmarked against contemporary SOTA agent memory frameworks using a unified Qwen-2.5-VL-7B Executor, MIA achieves top-tier results across all 7 datasets. These results establish MIA as a new benchmark in memory-augmented architectures, offering unparalleled efficiency and reasoning depth.

🦞 OpenClaw Skills

We also provide two MIA versions of OpenClaw skills, which not only integrate the MIA memory framework but also include a trustworthy judgment mechanism. Demos of the MIA memory and trustworthy modes are shown below.

MIA Memory Demo:

Trust-Worthy Demo:

🛠️ Tools

1. Online Text Search 💻

The core implementation is mainly in web_tools/server. Open web_tools/run.sh and configure your Serper (Google Search) API key:

export SERPER_KEY_ID="xxxxx"

Start the run script:

cd web_tools
bash ./run.sh

The service is exposed at SERVICE_URL/server; the search method at SERVICE_URL/server/search.
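A client call to the search method can be sketched as below. This is a hypothetical example: the request schema of web_tools/server is not documented here, so the host and the field names ("query", "top_k") are assumptions to verify against the server code.

```python
import json
from urllib import request

SERVICE_URL = "http://localhost:8000"  # assumed host/port for the web_tools service

def build_search_request(query: str, top_k: int = 5) -> dict:
    """Build a JSON body for SERVICE_URL/server/search (assumed schema)."""
    return {"query": query, "top_k": top_k}

payload = build_search_request("latest vLLM release notes")

# Uncomment once the service from run.sh is up:
# req = request.Request(
#     f"{SERVICE_URL}/server/search",
#     data=json.dumps(payload).encode("utf-8"),
#     headers={"Content-Type": "application/json"},
# )
# print(request.urlopen(req).read().decode("utf-8"))
```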

2. Offline Text Search 📖

The core implementation is mainly in local_search. Refer to the setup instructions in search-r1. This project uses wiki25 local retrieval. Configure the path and start the run script:

cd local_search
bash ./run.sh

The service is exposed at http://localhost:8001/; the retrieval method at http://localhost:8001/retrieve.
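A retrieval call can be sketched as follows. search-r1-style servers typically accept a batch of queries; the field names below follow that convention but should be verified against the local_search implementation.

```python
import json
from urllib import request

RETRIEVE_URL = "http://localhost:8001/retrieve"

def build_retrieve_request(queries, topk: int = 3) -> dict:
    """JSON body for the /retrieve endpoint (field names assumed from search-r1)."""
    return {"queries": list(queries), "topk": topk, "return_scores": True}

payload = build_retrieve_request(["Who wrote The Histories?"])

# Uncomment once run.sh has started the service:
# req = request.Request(RETRIEVE_URL, data=json.dumps(payload).encode("utf-8"),
#                       headers={"Content-Type": "application/json"})
# results = json.loads(request.urlopen(req).read())
```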

3. Image-to-Image Search 🎨

The image search cache used in this project: image_search_cache

⚙️ Environment

conda create -n verl python==3.10.12

Run the install.sh script in the train directory to install dependencies. FlashAttention must be installed separately:

wget -nv https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.7cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install --no-cache-dir flash_attn-2.8.3+cu12torch2.7cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

🧬 Data Preparation

Training: 🤗 Train

Testing: 🤗 Test, 🤗 TTRL

✨ Two-Stage RL Training

⚑ Executor Training

Our implementation is based on VeRL. Key modifications:

  • The core interaction implementation is in /Executor-Train/Train/verl/experimental/tool_agent_loop.py.
  • Prompts are defined in /Executor-Train/Train/local_search/prompt.py.
  • Custom dataset processing (CustomRLHFDataset) and reward score computation (compute_score) are in Executor-Train/Train/local_search/mmsearch.py.
  • Tool implementations are in verl.tools.search_tool.SearchTool and verl.tools.web_image_to_image_search_tool.WebImageToImageSearchTool.
  • The run script is at /Executor-Train/Train/local_search/run_mmsearch_grpo.sh.

1. Deploy the local text search tool.

2. Configure /Executor-Train/Train/local_search/mm_search_tool_config.yaml and /Executor-Train/Train/local_search/mmsearch.yaml:

  • mm_search_tool_config.yaml
    • tools[0].config.retrieval_service_url: local search service URL
    • tools[1].config.fvqa_train_cache_path, tools[1].config.test_cache_path: image search cache paths for the test and validation sets
  • mmsearch.yaml
    • hydra.searchpath: trainer config path
    • data.custom_cls.path: custom dataset code path
    • actor_rollout_ref.rollout.multi_turn.tool_config_path: path to the tool config mm_search_tool_config.yaml

3. Deploy Qwen3-32B on Node 1 as Planner & Judger:

export VLLM_USE_FLASHINFER_SAMPLER=0
CUDA_VISIBLE_DEVICES=0,1,2,3 \
vllm serve /your_path/Qwen/Qwen3-32B \
    --tensor-parallel-size 4 \
    --served-model-name "qwen" \
    --gpu-memory-utilization 0.8 \
    --host 0.0.0.0 \
    --port 8002

The LLM service is then available at your_url/8002/v1.
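A quick smoke test of the served model can be sketched against vLLM's OpenAI-compatible API. The host is an assumption; port 8002 and the model name "qwen" match the serve flags above.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8002/v1"  # assumed host for the node running vLLM

def build_chat_request(prompt: str) -> dict:
    """OpenAI-style chat completion body for the served Qwen3-32B."""
    return {
        "model": "qwen",  # must equal --served-model-name
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,
    }

payload = build_chat_request("Reply with the single word: ready")

# Uncomment once vllm serve is running:
# req = request.Request(f"{BASE_URL}/chat/completions",
#                       data=json.dumps(payload).encode("utf-8"),
#                       headers={"Content-Type": "application/json"})
# reply = json.loads(request.urlopen(req).read())
# print(reply["choices"][0]["message"]["content"])
```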

4. Deploy the Memory-Planner service (can be on the same node as Planner & Judger):

cd Memory-Serve
cd TRAIN_PLANNER

Configure the run script run.sh: set both MEMORY_URL and PLAN_URL to the LLM service deployed in the previous step. To improve training efficiency, memory content and initial plan are collected in advance. Only the replan service is needed here: your_url/5000/replan_train.

5. Configure the training script /Executor-Train/Train/local_search/run_mmsearch_grpo.sh:

  • JUDGE_URL: judge service, set to your_url/8002/v1
  • REPLAN_URL: replan service, set to your_url/5000/replan_train
  • WANDB_API_KEY: WandB API key (optional)
  • SAVE_CHECKPOINT_DIR: model save path
  • DATASET_TRAIN: training dataset path
  • DATASET_VAL: validation dataset path
  • REF_MODEL_PATH: pretrained model path

6. Start training on Node 2. Navigate to /Executor-Train/Train/:

bash ./local_search/run_mmsearch_grpo.sh

7. Export the model:

python -m verl.model_merger merge \
    --backend fsdp \
    --local_dir /your_path/actor \
    --target_dir /your_path

Download our trained Executor 🤗 here

⚑ Planner Training

Our implementation is based on VeRL. Key modifications:

  • The core interaction implementation is in /Planner-Train/mem-plan/verl/experimental/multi_turn_loop.py.
  • Prompts are defined in /Planner-Train/mem-plan/local_search/prompt.py.
  • Custom dataset processing (CustomRLHFDataset) and reward score computation (compute_score) are in /Planner-Train/mem-plan/local_search/mmsearch.py.
  • The run script is at /Planner-Train/mem-plan/local_search/run_mmsearch_grpo.sh.

1. Deploy the Judger service on Node 1:

export VLLM_USE_FLASHINFER_SAMPLER=0
CUDA_VISIBLE_DEVICES=0,1,2,3 \
vllm serve /your_path/Qwen/Qwen3-32B \
    --tensor-parallel-size 4 \
    --served-model-name "qwen" \
    --gpu-memory-utilization 0.8 \
    --host 0.0.0.0 \
    --port 8002

2. Deploy the Executor service on Node 2.

Deploy the trained Executor:

export VLLM_USE_FLASHINFER_SAMPLER=0
CUDA_VISIBLE_DEVICES=0,1,2,3 \
vllm serve /your_path/Executor \
    --tensor-parallel-size 4 \
    --served-model-name "qwen" \
    --gpu-memory-utilization 0.8 \
    --host 0.0.0.0 \
    --port 8002

Navigate to /Serve/Train_Planner and configure the run script serve.sh:

  • AGENT_URL: Executor service URL
  • SERVICE_URL: offline text search service URL
  • TEST_CACHE_DIR: image-to-image search cache path
  • MAX_LLM_CALL_PER_RUN: maximum number of interaction rounds between Executor and tools

Start the service:

bash serve.sh

3. Configure the training script /Planner-Train/mem-plan/local_search/run_mmsearch_grpo.sh:

  • JUDGE_URL: judge service, set to your_url/8002/v1
  • PLAN_URL: Executor service for plan responses, set to your_url/5000/plan
  • REPLAN_URL: Executor service for replan responses, set to your_url/5000/replan
  • WANDB_API_KEY: WandB API key (optional)
  • SAVE_CHECKPOINT_DIR: model save path
  • DATASET_TRAIN: training dataset path
  • DATASET_VAL: validation dataset path
  • REF_MODEL_PATH: pretrained model path

To improve training efficiency, memory content and image caption are collected in advance.

4. Start training on Node 3. Navigate to /Planner-Train/mem-plan/:

bash ./local_search/run_mmsearch_grpo.sh

5. Export the model:

python -m verl.model_merger merge \
    --backend fsdp \
    --local_dir /your_path/actor \
    --target_dir /your_path

Download our trained Planner 🤗 here

πŸ” Inference

💡 TTRL

1. Deploy the Memory Manager & Judger service on Node 1:

export VLLM_USE_FLASHINFER_SAMPLER=0
CUDA_VISIBLE_DEVICES=0,1,2,3 \
vllm serve /your_path/Qwen/Qwen3-32B \
    --tensor-parallel-size 4 \
    --served-model-name "qwen" \
    --gpu-memory-utilization 0.8 \
    --host 0.0.0.0 \
    --port 8002

2. Deploy the Executor service on Node 2:

export VLLM_USE_FLASHINFER_SAMPLER=0
CUDA_VISIBLE_DEVICES=0,1,2,3 \
vllm serve /your_path/Executor \
    --tensor-parallel-size 4 \
    --served-model-name "qwen" \
    --gpu-memory-utilization 0.8 \
    --host 0.0.0.0 \
    --port 8002

Navigate to /Serve/MIA-TTRL and configure the run script serve.sh:

  • AGENT_URL: Executor service URL
  • SERVICE_URL: offline/online text search service URL
  • TEST_CACHE_DIR: image-to-image search cache path
  • MAX_LLM_CALL_PER_RUN: maximum number of interaction rounds between Executor and tools
  • MEMORY_URL: Memory Manager service URL
  • TTRL_SAVE: output path during exploration
  • PARQUET_PATH: test set path with images (required for image datasets)

In /Serve/MIA-TTRL/call_agent.py, configure the text search method (choose one):

from tool_search_local import *   # offline text search
from tool_serper import *   # online text search

Switch the Python script in /Serve/MIA-TTRL/serve.sh to select the mode (supervised / unsupervised):

python agent_serve_ttrl.py ....   # scenario where Ground-Truth is available after each question
python agent_serve_ttrl_nogt.py ....   # scenario where Ground-Truth is unavailable after each question

3. Configure the script /TTRL/TTRL/local_search/run_mmsearch_grpo.sh (supervised) or /TTRL/TTRL-nogt/local_search/run_mmsearch_grpo.sh (unsupervised):

  • JUDGE_URL: judge service, set to your_url/8002/v1
  • MEMORY_URL: memory retrieval service, set to your_url/5000/memory
  • PLAN_URL: Executor service for plan responses, set to your_url/5000/plan
  • REPLAN_URL: Executor service for replan responses, set to your_url/5000/replan
  • MEMORY_BANK_SAVE_URL: service for saving memories to the buffer (current batch exploration not yet complete), set to your_url/5000/memory_bank_save
  • BATCH_EVALUATE_URL: service for evaluating current batch samples, set to your_url/5000/batch_evaluate
  • CONSOLIDATE_MEMORIES_URL: service for extracting memories from all buffered samples, set to your_url/5000/consolidate_memories
  • SAVE_MEMORIES_URL: service for saving all memories, set to your_url/5000/save_memory
  • WANDB_API_KEY: WandB API key (optional)
  • SAVE_CHECKPOINT_DIR: model save path
  • DATASET_TRAIN: dataset path
  • DATASET_VAL: unused, set to the same as DATASET_TRAIN
  • REF_MODEL_PATH: initial Planner path
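The endpoint layout above can be restated as a small lookup table for cross-checking your configuration. The scheme and host are assumptions ("your_url" serving on port 5000, per the list); the judge service is the separate vLLM endpoint on port 8002.

```python
# Endpoint map for the TTRL services listed above (paths as given in the README).
BASE = "http://your_url:5000"  # replace with the host running the TTRL serve script

TTRL_ENDPOINTS = {
    "MEMORY_URL": f"{BASE}/memory",
    "PLAN_URL": f"{BASE}/plan",
    "REPLAN_URL": f"{BASE}/replan",
    "MEMORY_BANK_SAVE_URL": f"{BASE}/memory_bank_save",
    "BATCH_EVALUATE_URL": f"{BASE}/batch_evaluate",
    "CONSOLIDATE_MEMORIES_URL": f"{BASE}/consolidate_memories",
    "SAVE_MEMORIES_URL": f"{BASE}/save_memory",
}
```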

Start the service:

bash serve.sh

4. Start the Planner training on Node 3. Navigate to /TTRL/TTRL/ or /TTRL/TTRL-nogt/:

bash ./local_search/run_mmsearch_grpo.sh

⚖️ License

Released under the MIT License.

🎓 Contributors

PhD Students: Weicheng Meng, Yu Cheng, Zhihang Lin

Student Leader: Jingyang Qiao

Professors: Zhizhong Zhang, Xin Tan, Jingyu Gong, Zhaoxia Yin

Project Leader: Yuan Xie

🎯 To-Do List

We plan to release the following versions next:

  1. High-Efficiency Version

  2. Trustworthy Version

Star History Chart
