An Agent Memory Framework Designed For Deep Research Agents
- [April 1, 2026]: The full training and evaluation codebase, models, and datasets are released.
MIA (Memory In Intelligence Agent) is a memory framework designed for deep research agents, developed by a joint team from the Shanghai Institute of Intelligence (SII) and East China Normal University (ECNU). To address the limitations of existing agents, such as ineffective memory evolution and expensive storage costs, MIA introduces a Manager-Planner-Executor architecture: a Manager for memory storage, a Planner for memory usage, and an Executor for memory-guided inference. MIA's core innovations include an Alternative Reinforcement Learning paradigm for seamless multi-agent cooperation and a Continual Test-Time Learning mechanism that allows the Planner to evolve on the fly during inference. By establishing a collaborative loop between parametric and non-parametric memories with Reflection and Unsupervised Judgment, MIA enables efficient autonomous evolution and robust reasoning in complex, open-world scenarios.
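The Manager-Planner-Executor loop described above can be sketched as a toy Python illustration (all class and method names here are assumptions for exposition, not the released implementation):

```python
# Toy sketch of a Manager-Planner-Executor loop (illustrative names only):
# the Manager stores and retrieves memories, the Planner drafts plans from
# them, and the Executor runs memory-guided steps whose outcomes are
# reflected back into memory.
class Manager:
    def __init__(self):
        self.memories = []  # non-parametric memory store

    def store(self, item):
        self.memories.append(item)

    def retrieve(self, query):
        # naive keyword match as a stand-in for real retrieval
        return [m for m in self.memories if query.lower() in m.lower()]

class Planner:
    def plan(self, task, memories):
        return f"plan({task}) using {len(memories)} memories"

    def replan(self, plan, feedback):
        # continual test-time learning would refine the plan here
        return plan + f" | revised after: {feedback}"

class Executor:
    def act(self, plan):
        return f"result of [{plan}]"

def run_episode(task, manager, planner, executor):
    memories = manager.retrieve(task)
    plan = planner.plan(task, memories)
    result = executor.act(plan)
    # reflection: the outcome is written back as a new memory
    manager.store(f"{task}: {result}")
    return result

manager = Manager()
manager.store("wiki search works best with short keyword queries")
print(run_episode("wiki search task", manager, Planner(), Executor()))
```

The point of the sketch is the collaborative loop: each episode both consumes retrieved memories and produces a new one, which is what allows the system to evolve across episodes.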
Experimental Analysis
Our comprehensive evaluation across multiple benchmarks demonstrates that MIA significantly improves the performance of Deep Research Agents:
- Elevating the State-of-the-Art (a & b): Comparative bar charts on LiveVQA (multimodal) and HotpotQA (text-only, sandbox-based Wiki search) show that MIA consistently boosts the performance of current SOTA large language models, proving its efficacy in both text and complex multimodal reasoning tasks.
- The "Small-to-Great" Leap (c): Using a Qwen-2.5-VL-7B-based Executor, MIA enables this 7B model to achieve a striking performance breakthrough across 7 diverse datasets. Remarkably, the MIA-enhanced 7B model not only outperforms the larger Qwen-2.5-VL-32B but also surpasses leading closed-source models such as GPT-4o and Gemini 2.5 Pro (in non-tool-calling settings). This underscores MIA's ability to unlock "super-model" intelligence within efficient, smaller-scale parameters.
- Superiority in Agent Memory (d): When benchmarked against contemporary SOTA agent memory frameworks using a unified Qwen-2.5-VL-7B Executor, MIA achieves top-tier results across all 7 datasets. These results establish MIA as a new benchmark in memory-augmented architectures, offering unparalleled efficiency and reasoning depth.
We also provide two MIA versions of OpenClaw skills, which not only integrate the MIA memory framework but also include a trustworthy judgment mechanism. Here are the MIA memory and trustworthy-judgment demos.
MIA Memory Demo:
Trust-Worthy Demo:
The core implementation is mainly in web_tools/server.
Open `web_tools/run.sh` and configure the Google Search Serper key:

```shell
export SERPER_KEY_ID="xxxxx"
```

Start the run script:

```shell
cd web_tools
bash ./run.sh
```

Service: `SERVICE_URL/server`, method: `SERVICE_URL/server/search`
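A minimal client sketch for querying the search endpoint. The JSON schema (a single `query` field) and the `SERVICE_URL` value are assumptions; check `web_tools/server` for the actual request format:

```python
import json
import urllib.request

SERVICE_URL = "http://localhost:8000"  # placeholder; use your deployed URL

def build_search_request(query: str) -> urllib.request.Request:
    """Build a POST request for SERVICE_URL/server/search.

    The body schema ({"query": ...}) is an assumption; consult
    web_tools/server for the real one.
    """
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{SERVICE_URL}/server/search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def search(query: str) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_search_request(query)) as resp:
        return json.loads(resp.read())
```

With the service running, `search("capital of France")` would return the decoded JSON response from the search server.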
The core implementation is mainly in local_search.
Refer to the setup instructions in search-r1. This project uses wiki25 local retrieval.
Configure the path and start the run script:

```shell
cd local_search
bash ./run.sh
```

Service: `http://localhost:8001/`, method: `http://localhost:8001/retrieve`
The image search cache used in this project: image_search_cache
Create the conda environment:

```shell
conda create -n verl python==3.10.12
```

Run the `install.sh` script in the `train` directory to install dependencies.

Flash-attention needs to be installed separately:

```shell
wget -nv https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.7cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install --no-cache-dir flash_attn-2.8.3+cu12torch2.7cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```

Training: 🤗 Train
Our implementation is based on VeRL. Key modifications:
- The core interaction implementation is in `/Executor-Train/Train/verl/experimental/tool_agent_loop.py`. `prompt` is defined in `/Executor-Train/Train/local_search/prompt.py`.
- Custom dataset processing (`CustomRLHFDataset`) and reward score computation (`compute_score`) are in `Executor-Train/Train/local_search/mmsearch.py`.
- Tool implementations are in `verl.tools.search_tool.SearchTool` and `verl.tools.web_image_to_image_search_tool.WebImageToImageSearchTool`.
- The run script is at `/Executor-Train/Train/local_search/run_mmsearch_grpo.sh`.
1. Deploy the local text search tool.
2. Configure /Executor-Train/Train/local_search/mm_search_tool_config.yaml and /Executor-Train/Train/local_search/mmsearch.yaml:
   - `mm_search_tool_config.yaml`:
     - `tools[0].config.retrieval_service_url`: local search service URL
     - `tools[1].config.fvqa_train_cache_path`, `tools[1].config.test_cache_path`: image search cache paths for the test and validation sets
   - `mmsearch.yaml`:
     - `hydra.searchpath`: trainer config path
     - `data.custom_cls.path`: custom dataset code path
     - `actor_rollout_ref.rollout.multi_turn.tool_config_path`: path to the tool config `mm_search_tool_config.yaml`
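The tool-config keys above can be sketched as a fragment of `mm_search_tool_config.yaml`. The values and exact nesting below are illustrative assumptions that mirror the key paths listed above, not the shipped file:

```yaml
# Illustrative fragment only -- mirror the key paths listed above.
tools:
  - config:
      retrieval_service_url: "http://localhost:8001/retrieve"  # local search service URL
  - config:
      fvqa_train_cache_path: "/your_path/image_search_cache/train"
      test_cache_path: "/your_path/image_search_cache/test"
```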
3. Deploy Qwen3-32B on Node 1 as Planner & Judger:
```shell
export VLLM_USE_FLASHINFER_SAMPLER=0
CUDA_VISIBLE_DEVICES=0,1,2,3 \
vllm serve /your_path/Qwen/Qwen3-32B \
    --tensor-parallel-size 4 \
    --served-model-name "qwen" \
    --gpu-memory-utilization 0.8 \
    --host 0.0.0.0 \
    --port 8002
```

LLM service at `your_url/8002/v1`
4. Deploy the Memory-Planner service (can be on the same node as Planner & Judger):
```shell
cd Memory-Serve
cd TRAIN_PLANNER
```

Configure the run script `run.sh`: set both `MEMORY_URL` and `PLAN_URL` to the LLM service deployed in the previous step.

To improve training efficiency, memory content and the initial plan are collected in advance. Only the replan service is needed here: `your_url/5000/replan_train`.
5. Configure the training script /Executor-Train/Train/local_search/run_mmsearch_grpo.sh:
- `JUDGE_URL`: judge service, set to `your_url/8002/v1`
- `REPLAN_URL`: replan service, set to `your_url/5000/replan_train`
- `WANDB_API_KEY`: WandB API key (optional)
- `SAVE_CHECKPOINT_DIR`: model save path
- `DATASET_TRAIN`: training dataset path
- `DATASET_VAL`: validation dataset path
- `REF_MODEL_PATH`: pretrained model path
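These variables end up as environment exports at the top of the run script. A sketch with placeholder values only (every URL, path, and the reference-model choice below are assumptions to be replaced with your own):

```shell
# Placeholder values only -- substitute your own URLs and paths.
export JUDGE_URL="http://your_url:8002/v1"
export REPLAN_URL="http://your_url:5000/replan_train"
export WANDB_API_KEY=""                          # optional
export SAVE_CHECKPOINT_DIR="/your_path/checkpoints"
export DATASET_TRAIN="/your_path/data/train.parquet"
export DATASET_VAL="/your_path/data/val.parquet"
export REF_MODEL_PATH="/your_path/pretrained_model"
```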
6. Start training on Node 2. Navigate to /Executor-Train/Train/:
```shell
bash ./local_search/run_mmsearch_grpo.sh
```

7. Export the model:

```shell
python -m verl.model_merger merge \
    --backend fsdp \
    --local_dir /your_path/actor \
    --target_dir /your_path
```

Download our trained Executor 🤗 here
Our implementation is based on VeRL. Key modifications:
- The core interaction implementation is in `/Planner-Train/mem-plan/verl/experimental/multi_turn_loop.py`. `prompt` is defined in `/Planner-Train/mem-plan/local_search/prompt.py`.
- Custom dataset processing (`CustomRLHFDataset`) and reward score computation (`compute_score`) are in `/Planner-Train/mem-plan/local_search/mmsearch.py`.
- The run script is at `/Planner-Train/mem-plan/local_search/run_mmsearch_grpo.sh`.
1. Deploy the Judger service on Node 1:
```shell
export VLLM_USE_FLASHINFER_SAMPLER=0
CUDA_VISIBLE_DEVICES=0,1,2,3 \
vllm serve /your_path/Qwen/Qwen3-32B \
    --tensor-parallel-size 4 \
    --served-model-name "qwen" \
    --gpu-memory-utilization 0.8 \
    --host 0.0.0.0 \
    --port 8002
```

2. Deploy the Executor service on Node 2.
Deploy the trained Executor:
```shell
export VLLM_USE_FLASHINFER_SAMPLER=0
CUDA_VISIBLE_DEVICES=0,1,2,3 \
vllm serve /your_path/Executor \
    --tensor-parallel-size 4 \
    --served-model-name "qwen" \
    --gpu-memory-utilization 0.8 \
    --host 0.0.0.0 \
    --port 8002
```

Navigate to `/Serve/Train_Planner` and configure the run script `serve.sh`:

- `AGENT_URL`: Executor service URL
- `SERVICE_URL`: offline text search service URL
- `TEST_CACHE_DIR`: image-to-image search cache path
- `MAX_LLM_CALL_PER_RUN`: maximum number of interaction rounds between the Executor and tools
Start the service:
```shell
bash serve.sh
```

3. Configure the training script `/Planner-Train/mem-plan/local_search/run_mmsearch_grpo.sh`:

- `JUDGE_URL`: judge service, set to `your_url/8002/v1`
- `PLAN_URL`: Executor service for `plan` responses, set to `your_url/5000/plan`
- `REPLAN_URL`: Executor service for `replan` responses, set to `your_url/5000/replan`
- `WANDB_API_KEY`: WandB API key (optional)
- `SAVE_CHECKPOINT_DIR`: model save path
- `DATASET_TRAIN`: training dataset path
- `DATASET_VAL`: validation dataset path
- `REF_MODEL_PATH`: pretrained model path
To improve training efficiency, memory content and image captions are collected in advance.
4. Start training on Node 3. Navigate to /Planner-Train/mem-plan/:
```shell
bash ./local_search/run_mmsearch_grpo.sh
```

5. Export the model:

```shell
python -m verl.model_merger merge \
    --backend fsdp \
    --local_dir /your_path/actor \
    --target_dir /your_path
```

Download our trained Planner 🤗 here
1. Deploy the Memory Manager & Judger service on Node 1:
```shell
export VLLM_USE_FLASHINFER_SAMPLER=0
CUDA_VISIBLE_DEVICES=0,1,2,3 \
vllm serve /your_path/Qwen/Qwen3-32B \
    --tensor-parallel-size 4 \
    --served-model-name "qwen" \
    --gpu-memory-utilization 0.8 \
    --host 0.0.0.0 \
    --port 8002
```

2. Deploy the Executor service on Node 2:
```shell
export VLLM_USE_FLASHINFER_SAMPLER=0
CUDA_VISIBLE_DEVICES=0,1,2,3 \
vllm serve /your_path/Executor \
    --tensor-parallel-size 4 \
    --served-model-name "qwen" \
    --gpu-memory-utilization 0.8 \
    --host 0.0.0.0 \
    --port 8002
```

Navigate to `/Serve/MIA-TTRL` and configure the run script `serve.sh`:

- `AGENT_URL`: Executor service URL
- `SERVICE_URL`: offline/online text search service URL
- `TEST_CACHE_DIR`: image-to-image search cache path
- `MAX_LLM_CALL_PER_RUN`: maximum number of interaction rounds between the Executor and tools
- `MEMORY_URL`: Memory Manager service URL
- `TTRL_SAVE`: output path during exploration
- `PARQUET_PATH`: test set path with images (required for image datasets)
In /Serve/MIA-TTRL/call_agent.py, configure the text search method (choose one):
```python
from tool_search_local import *  # offline text search
from tool_serper import *        # online text search
```

Switch the Python script in `/Serve/MIA-TTRL/serve.sh` to select the mode (supervised / unsupervised):

```shell
python agent_serve_ttrl.py ....       # scenario where Ground-Truth is available after each question
python agent_serve_ttrl_nogt.py ....  # scenario where Ground-Truth is unavailable after each question
```

3. Configure the script `/TTRL/TTRL/local_search/run_mmsearch_grpo.sh` (supervised) or `/TTRL/TTRL-nogt/local_search/run_mmsearch_grpo.sh` (unsupervised):

- `JUDGE_URL`: judge service, set to `your_url/8002/v1`
- `MEMORY_URL`: memory retrieval service, set to `your_url/5000/memory`
- `PLAN_URL`: Executor service for `plan` responses, set to `your_url/5000/plan`
- `REPLAN_URL`: Executor service for `replan` responses, set to `your_url/5000/replan`
- `MEMORY_BANK_SAVE_URL`: service for saving memories to the buffer (while the current batch exploration is not yet complete), set to `your_url/5000/memory_bank_save`
- `BATCH_EVALUATE_URL`: service for evaluating current batch samples, set to `your_url/5000/batch_evaluate`
- `CONSOLIDATE_MEMORIES_URL`: service for extracting memories from all buffered samples, set to `your_url/5000/consolidate_memories`
- `SAVE_MEMORIES_URL`: service for saving all memories, set to `your_url/5000/save_memory`
- `WANDB_API_KEY`: WandB API key (optional)
- `SAVE_CHECKPOINT_DIR`: model save path
- `DATASET_TRAIN`: dataset path
- `DATASET_VAL`: unused, set to the same as `DATASET_TRAIN`
- `REF_MODEL_PATH`: initial Planner path
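The memory services above imply a fixed call order per batch during test-time training. The control flow can be sketched with local stub functions standing in for the HTTP services (function bodies and the loop logic are assumptions inferred from the service descriptions above, not the released implementation):

```python
# Stub sketch of the TTRL memory loop; each function stands in for one of the
# services configured above (MEMORY_BANK_SAVE_URL, BATCH_EVALUATE_URL,
# CONSOLIDATE_MEMORIES_URL, SAVE_MEMORIES_URL). Bodies are illustrative stubs.
memory_bank = []   # buffer for the current batch
memory_store = []  # long-term memory

def memory_bank_save(sample):
    """Buffer a rollout while the current batch is still being explored."""
    memory_bank.append(sample)

def batch_evaluate(batch):
    """Score the current batch samples (stub: sample length as a dummy score)."""
    return [len(s) for s in batch]

def consolidate_memories(batch, scores):
    """Extract memories from all buffered samples (stub: keep positive scorers)."""
    return [s for s, sc in zip(batch, scores) if sc > 0]

def save_memory(memories):
    """Persist consolidated memories and clear the batch buffer."""
    memory_store.extend(memories)
    memory_bank.clear()

# One TTRL step over a batch of rollouts:
for rollout in ["trace-a", "trace-b"]:
    memory_bank_save(rollout)
scores = batch_evaluate(memory_bank)
save_memory(consolidate_memories(memory_bank, scores))
```

In the real setup each call is an HTTP request to the corresponding `your_url/5000/...` endpoint; the sketch only shows the ordering of buffer, evaluate, consolidate, and save.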
Start the service:
```shell
bash serve.sh
```

4. Start the Planner training on Node 3. Navigate to `/TTRL/TTRL/` or `/TTRL/TTRL-nogt/`:

```shell
bash ./local_search/run_mmsearch_grpo.sh
```

Released under the MIT License.
PhD Students: Weicheng Meng, Yu Cheng, Zhihang Lin
Student Leader: Jingyang Qiao
Professors: Zhizhong Zhang, Xin Tan, Jingyu Gong, Zhaoxia Yin
Project Leader: Yuan Xie
We also plan to release the following versions:

- High-Efficiency Version
- Trust-worthy Version