Probing Semantic Insensitivity for Inference-Time Backdoor Defense in Multimodal Large Language Model
Xuankun Rong, Wenke Huang, Wenzheng Jiang, Yiming Li, Wenxuan Wang, Mang Ye*
The massive scale of data and computation required to train Multimodal Large Language Models (MLLMs) has fueled the rise of Fine-Tuning as a Service (FTaaS), enabling users to rapidly customize models for diverse real-world tasks. While FTaaS democratizes access to advanced multimodal intelligence, it also introduces serious security concerns, particularly backdoor attacks. In this work, we systematically analyze backdoor vulnerabilities in MLLMs under the FTaaS paradigm, revealing two key phenomena: (1) markedly reduced sensitivity to textual variations when a visual trigger is present, and (2) abnormally stable model confidence even under strong semantic perturbations. Building on these insights, we propose Trap on Text (ToT), a novel inference-time backdoor detection framework. ToT applies controlled semantic perturbations to textual prompts and jointly analyzes the semantic consistency and confidence drift of the model's responses, enabling robust detection of backdoor activations without requiring access to model parameters, architectures, or clean reference data. Extensive experiments across architectures and datasets show that ToT achieves strong attack mitigation while preserving clean accuracy, offering a practical solution for safeguarding FTaaS workflows.
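The detection idea sketched in the abstract can be illustrated with a minimal, hypothetical loop: perturb the text prompt, query the model, and flag the input when the answers stay semantically fixed and the confidence barely moves. The `query_model` interface, the perturbation list, and both thresholds below are illustrative assumptions, not the authors' implementation.

```python
def detect_backdoor(query_model, image, prompt, perturbations,
                    consistency_thresh=0.9, conf_drift_thresh=0.05):
    """Hypothetical ToT-style check: flag the (image, prompt) pair as a
    suspected backdoor activation if the model's answer is insensitive to
    semantic perturbations of the prompt AND its confidence stays
    abnormally stable.

    query_model(image, prompt) -> (answer: str, confidence: float) is an
    assumed interface; `perturbations` is a list of prompt-rewriting
    functions (paraphrases, negations, topic swaps, ...).
    """
    base_answer, base_conf = query_model(image, prompt)
    same, drifts = 0, []
    for perturb in perturbations:
        answer, conf = query_model(image, perturb(prompt))
        same += int(answer == base_answer)          # semantic consistency
        drifts.append(abs(conf - base_conf))        # confidence drift
    consistency = same / len(perturbations)
    mean_drift = sum(drifts) / len(drifts)
    # Backdoored behavior per the paper's observations: responses stay the
    # same and confidence does not move despite strong perturbations.
    return consistency >= consistency_thresh and mean_drift <= conf_drift_thresh
```

In practice the paper compares responses by semantic similarity rather than exact string match; exact matching is used here only to keep the sketch self-contained.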
If this work helps your research, please cite:
@inproceedings{rong2026probing,
  title={Probing Semantic Insensitivity for Inference-Time Backdoor Defense in Multimodal Large Language Model},
  author={Rong, Xuankun and Huang, Wenke and Jiang, Wenzheng and Li, Yiming and Wang, Wenxuan and Ye, Mang},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={40},
  number={42},
  pages={35775--35783},
  year={2026}
}