Probing Semantic Insensitivity for Inference-Time Backdoor Defense in Multimodal Large Language Model
Xuankun Rong, Wenke Huang, Wenzheng Jiang, Yiming Li, Wenxuan Wang, Mang Ye*
The massive scale of data and computation required to train Multimodal Large Language Models (MLLMs) has fueled the rise of Fine-Tuning as a Service (FTaaS), enabling users to rapidly customize models for diverse real-world tasks. While FTaaS democratizes access to advanced multimodal intelligence, it also introduces serious security concerns, particularly backdoor attacks. In this work, we systematically analyze backdoor vulnerabilities in MLLMs under the FTaaS paradigm, revealing two key phenomena: (1) markedly reduced sensitivity to textual variations when a visual trigger is present, and (2) abnormally stable model confidence even under strong semantic perturbations. Building on these insights, we propose Trap on Text (ToT), a novel inference-time backdoor detection framework. ToT applies controlled semantic perturbations to textual prompts and jointly analyzes the semantic consistency and confidence drift of the model's responses, enabling robust detection of backdoor activations without requiring access to model parameters, architectures, or clean reference data. Extensive experiments across architectures and datasets show that ToT achieves strong attack mitigation while preserving clean accuracy, offering a practical solution for safeguarding FTaaS workflows.
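The detection idea sketched in the abstract can be illustrated with a minimal, hypothetical loop: perturb the text prompt, query the model, and flag the input when the answers stay semantically fixed and the confidence barely moves. The `query_model` interface, the perturbation list, and both thresholds below are illustrative assumptions, not the authors' implementation.

```python
def detect_backdoor(query_model, image, prompt, perturbations,
                    consistency_thresh=0.9, conf_drift_thresh=0.05):
    """Hypothetical ToT-style check: flag the (image, prompt) pair as a
    suspected backdoor activation if the model's answer is insensitive to
    semantic perturbations of the prompt AND its confidence stays
    abnormally stable.

    query_model(image, prompt) -> (answer: str, confidence: float) is an
    assumed interface; `perturbations` is a list of prompt-rewriting
    functions (paraphrases, negations, topic swaps, ...).
    """
    base_answer, base_conf = query_model(image, prompt)
    same, drifts = 0, []
    for perturb in perturbations:
        answer, conf = query_model(image, perturb(prompt))
        same += int(answer == base_answer)          # semantic consistency
        drifts.append(abs(conf - base_conf))        # confidence drift
    consistency = same / len(perturbations)
    mean_drift = sum(drifts) / len(drifts)
    # Backdoored behavior per the paper's observations: responses stay the
    # same and confidence does not move despite strong perturbations.
    return consistency >= consistency_thresh and mean_drift <= conf_drift_thresh
```

In practice the paper compares responses by semantic similarity rather than exact string match; exact matching is used here only to keep the sketch self-contained.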
If this work helps your research, please cite:
@inproceedings{rong2026probing,
  title={Probing Semantic Insensitivity for Inference-Time Backdoor Defense in Multimodal Large Language Model},
  author={Rong, Xuankun and Huang, Wenke and Jiang, Wenzheng and Li, Yiming and Wang, Wenxuan and Ye, Mang},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={40},
  number={42},
  pages={35775--35783},
  year={2026}
}