[Bug]: ValueError: Cannot cast <zmq.Socket(zmq.ROUTER) at 0x796c63de24a0> to int

### Your current environment

<details>
<summary>The output of <code>python collect_env.py</code></summary>

```text
DEBUG 06-10 18:53:37 [__init__.py:28] No plugins for group vllm.platform_plugins found.
DEBUG 06-10 18:53:37 [__init__.py:34] Checking if TPU platform is available.
DEBUG 06-10 18:53:37 [__init__.py:44] TPU platform is not available because: No module named 'libtpu'
DEBUG 06-10 18:53:37 [__init__.py:51] Checking if CUDA platform is available.
DEBUG 06-10 18:53:37 [__init__.py:71] Confirmed CUDA platform is available.
DEBUG 06-10 18:53:37 [__init__.py:99] Checking if ROCm platform is available.
DEBUG 06-10 18:53:37 [__init__.py:113] ROCm platform is not available because: No module named 'amdsmi'
DEBUG 06-10 18:53:37 [__init__.py:120] Checking if HPU platform is available.
DEBUG 06-10 18:53:37 [__init__.py:127] HPU platform is not available because habana_frameworks is not found.
DEBUG 06-10 18:53:37 [__init__.py:137] Checking if XPU platform is available.
DEBUG 06-10 18:53:37 [__init__.py:147] XPU platform is not available because: No module named 'intel_extension_for_pytorch'
DEBUG 06-10 18:53:37 [__init__.py:154] Checking if CPU platform is available.
DEBUG 06-10 18:53:37 [__init__.py:176] Checking if Neuron platform is available.
DEBUG 06-10 18:53:37 [__init__.py:51] Checking if CUDA platform is available.
DEBUG 06-10 18:53:37 [__init__.py:71] Confirmed CUDA platform is available.
INFO 06-10 18:53:37 [__init__.py:243] Automatically detected platform cuda.
2025-06-10 18:53:38.390052: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1749581618.412744     397 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1749581618.420098     397 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
Collecting environment information...
/usr/local/lib/python3.11/dist-packages/_distutils_hack/__init__.py:31: UserWarning: Setuptools is replacing distutils. Support for replacing an already imported distutils is deprecated. In the future, this condition will fail. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
  warnings.warn(
==============================
        System Info
==============================
OS                           : Ubuntu 22.04.4 LTS (x86_64)
GCC version                  : (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version                : 14.0.0-1ubuntu1.1
CMake version                : version 3.31.6
Libc version                 : glibc-2.35

==============================
       PyTorch Info
==============================
PyTorch version              : 2.7.0+cu126
Is debug build               : False
CUDA used to build PyTorch   : 12.6
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.11.11 (main, Dec  4 2024, 08:55:07) [GCC 11.4.0] (64-bit runtime)
Python platform              : Linux-6.6.56+-x86_64-with-glibc2.35

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : 12.5.82
CUDA_MODULE_LOADING set to   : LAZY
GPU models and configuration : 
GPU 0: NVIDIA L4
GPU 1: NVIDIA L4
GPU 2: NVIDIA L4
GPU 3: NVIDIA L4

Nvidia driver version        : 560.35.03
cuDNN version                : Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.9.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.2.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.2.1
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        46 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               48
On-line CPU(s) list:                  0-47
Vendor ID:                            GenuineIntel
Model name:                           Intel(R) Xeon(R) CPU @ 2.20GHz
CPU family:                           6
Model:                                85
Thread(s) per core:                   2
Core(s) per socket:                   24
Socket(s):                            1
Stepping:                             7
BogoMIPS:                             4400.40
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat avx512_vnni md_clear arch_capabilities
Hypervisor vendor:                    KVM
Virtualization type:                  full
L1d cache:                            768 KiB (24 instances)
L1i cache:                            768 KiB (24 instances)
L2 cache:                             24 MiB (24 instances)
L3 cache:                             38.5 MiB (1 instance)
NUMA node(s):                         1
NUMA node0 CPU(s):                    0-47
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI SW loop, KVM SW loop
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown

==============================
Versions of relevant libraries
==============================
[pip3] mypy_extensions==1.1.0
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.6.4.1
[pip3] nvidia-cuda-cupti-cu12==12.6.80
[pip3] nvidia-cuda-nvcc-cu12==12.5.82
[pip3] nvidia-cuda-nvrtc-cu12==12.6.77
[pip3] nvidia-cuda-runtime-cu12==12.6.77
[pip3] nvidia-cudnn-cu12==9.5.1.17
[pip3] nvidia-cufft-cu12==11.3.0.4
[pip3] nvidia-cufile-cu12==1.11.1.6
[pip3] nvidia-curand-cu12==10.3.7.77
[pip3] nvidia-cusolver-cu12==11.7.1.2
[pip3] nvidia-cusparse-cu12==12.5.4.2
[pip3] nvidia-cusparselt-cu12==0.6.3
[pip3] nvidia-ml-py==12.575.51
[pip3] nvidia-nccl-cu12==2.26.2
[pip3] nvidia-nvcomp-cu12==4.2.0.11
[pip3] nvidia-nvjitlink-cu12==12.6.85
[pip3] nvidia-nvtx-cu12==12.6.77
[pip3] onnx==1.17.0
[pip3] optree==0.14.1
[pip3] pynvml==12.0.0
[pip3] pytorch-ignite==0.5.2
[pip3] pytorch-lightning==2.5.1.post0
[pip3] pyzmq==26.4.0
[pip3] sentence-transformers==3.4.1
[pip3] torch==2.7.0
[pip3] torchao==0.10.0
[pip3] torchaudio==2.7.0
[pip3] torchdata==0.11.0
[pip3] torchinfo==1.8.0
[pip3] torchmetrics==1.7.1
[pip3] torchsummary==1.5.1
[pip3] torchtune==0.6.1
[pip3] torchvision==0.22.0
[pip3] transformers==4.51.3
[pip3] triton==3.3.0
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
Neuron SDK Version           : N/A
vLLM Version                 : 0.9.0.1
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
  	GPU0	GPU1	GPU2	GPU3	CPU Affinity	NUMA Affinity	GPU NUMA ID
GPU0	 X 	PHB	PHB	PHB	0-45	0		N/A
GPU1	PHB	 X 	PHB	PHB	0-45	0		N/A
GPU2	PHB	PHB	 X 	PHB	0-45	0		N/A
GPU3	PHB	PHB	PHB	 X 	0-45	0		N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

==============================
     Environment Variables
==============================
NVIDIA_VISIBLE_DEVICES=all
PYTORCH_NVML_BASED_CUDA_CHECK=1
NVIDIA_REQUIRE_CUDA=cuda>=12.5 brand=unknown,driver>=470,driver<471 brand=grid,driver>=470,driver<471 brand=tesla,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=vapps,driver>=470,driver<471 brand=vpc,driver>=470,driver<471 brand=vcs,driver>=470,driver<471 brand=vws,driver>=470,driver<471 brand=cloudgaming,driver>=470,driver<471 brand=unknown,driver>=535,driver<536 brand=grid,driver>=535,driver<536 brand=tesla,driver>=535,driver<536 brand=nvidia,driver>=535,driver<536 brand=quadro,driver>=535,driver<536 brand=quadrortx,driver>=535,driver<536 brand=nvidiartx,driver>=535,driver<536 brand=vapps,driver>=535,driver<536 brand=vpc,driver>=535,driver<536 brand=vcs,driver>=535,driver<536 brand=vws,driver>=535,driver<536 brand=cloudgaming,driver>=535,driver<536 brand=unknown,driver>=550,driver<551 brand=grid,driver>=550,driver<551 brand=tesla,driver>=550,driver<551 brand=nvidia,driver>=550,driver<551 brand=quadro,driver>=550,driver<551 brand=quadrortx,driver>=550,driver<551 brand=nvidiartx,driver>=550,driver<551 brand=vapps,driver>=550,driver<551 brand=vpc,driver>=550,driver<551 brand=vcs,driver>=550,driver<551 brand=vws,driver>=550,driver<551 brand=cloudgaming,driver>=550,driver<551
NCCL_VERSION=2.22.3-1
NVIDIA_DRIVER_CAPABILITIES=compute,utility
VLLM_WORKER_MULTIPROC_METHOD=spawn
NVIDIA_PRODUCT_NAME=CUDA
TORCHINDUCTOR_CACHE_DIR=/tmp/torchinductor_root
CUDA_VERSION=12.5.1
VLLM_TRACE_FUNCTION=1
TORCHINDUCTOR_COMPILE_THREADS=1
LD_LIBRARY_PATH=/usr/local/lib/python3.11/dist-packages/cv2/../../lib64:/usr/local/lib/python3.11/dist-packages/cv2/../../lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
MKL_THREADING_LAYER=GNU
CUDA_HOME=/usr/local/cuda
CUDA_HOME=/usr/local/cuda
VLLM_LOGGING_LEVEL=DEBUG
CUDA_MODULE_LOADING=LAZY
NCCL_CUMEM_ENABLE=0
VLLM_USE_V1=1
```

</details>

### 🐛 Describe the bug

Running the following in a Kaggle Notebook (GPU L4 x4)

```Python
!pip install vllm
from vllm import LLM
llm = LLM(model="facebook/opt-125m")
```
gives the following error whenever vllm is installed for the first time.

```
INFO 06-10 19:07:55 [__init__.py:243] Automatically detected platform cuda.
2025-06-10 19:07:57.514106: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1749582477.759738      99 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1749582477.832065      99 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
INFO 06-10 19:08:14 [__init__.py:31] Available plugins for group vllm.general_plugins:
INFO 06-10 19:08:14 [__init__.py:33] - lora_filesystem_resolver -> vllm.plugins.lora_resolvers.filesystem_resolver:register_filesystem_resolver
INFO 06-10 19:08:14 [__init__.py:36] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
config.json: 100%
 651/651 [00:00<00:00, 83.0kB/s]
INFO 06-10 19:08:29 [config.py:793] This model supports multiple tasks: {'embed', 'score', 'classify', 'reward', 'generate'}. Defaulting to 'generate'.
INFO 06-10 19:08:29 [config.py:2118] Chunked prefill is enabled with max_num_batched_tokens=8192.
tokenizer_config.json: 100%
 685/685 [00:00<00:00, 101kB/s]
vocab.json: 100%
 899k/899k [00:00<00:00, 17.1MB/s]
merges.txt: 100%
 456k/456k [00:00<00:00, 26.1MB/s]
special_tokens_map.json: 100%
 441/441 [00:00<00:00, 69.6kB/s]
generation_config.json: 100%
 137/137 [00:00<00:00, 21.2kB/s]
WARNING 06-10 19:08:31 [utils.py:2531] We must use the `spawn` multiprocessing start method. Overriding VLLM_WORKER_MULTIPROC_METHOD to 'spawn'. See https://docs.vllm.ai/en/latest/usage/troubleshooting.html#python-multiprocessing for more information. Reason: CUDA is initialized
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_99/2748578261.py in <cell line: 0>()
      1 get_ipython().system('pip install vllm')
      2 from vllm import LLM
----> 3 llm = LLM(model="facebook/opt-125m")

/usr/local/lib/python3.11/dist-packages/vllm/utils.py in inner(*args, **kwargs)
   1181                     )
   1182 
-> 1183             return fn(*args, **kwargs)
   1184 
   1185         return inner  # type: ignore

/usr/local/lib/python3.11/dist-packages/vllm/entrypoints/llm.py in __init__(self, model, tokenizer, tokenizer_mode, skip_tokenizer_init, trust_remote_code, allowed_local_media_path, tensor_parallel_size, dtype, quantization, revision, tokenizer_revision, seed, gpu_memory_utilization, swap_space, cpu_offload_gb, enforce_eager, max_seq_len_to_capture, disable_custom_all_reduce, disable_async_output_proc, hf_token, hf_overrides, mm_processor_kwargs, task, override_pooler_config, compilation_config, **kwargs)
    251 
    252         # Create the Engine (autoselects V0 vs V1)
--> 253         self.llm_engine = LLMEngine.from_engine_args(
    254             engine_args=engine_args, usage_context=UsageContext.LLM_CLASS)
    255         self.engine_class = type(self.llm_engine)

/usr/local/lib/python3.11/dist-packages/vllm/engine/llm_engine.py in from_engine_args(cls, engine_args, usage_context, stat_loggers)
    499             engine_cls = V1LLMEngine
    500 
--> 501         return engine_cls.from_vllm_config(
    502             vllm_config=vllm_config,
    503             usage_context=usage_context,

/usr/local/lib/python3.11/dist-packages/vllm/v1/engine/llm_engine.py in from_vllm_config(cls, vllm_config, usage_context, stat_loggers, disable_log_stats)
    121         disable_log_stats: bool = False,
    122     ) -> "LLMEngine":
--> 123         return cls(vllm_config=vllm_config,
    124                    executor_class=Executor.get_class(vllm_config),
    125                    log_stats=(not disable_log_stats),

/usr/local/lib/python3.11/dist-packages/vllm/v1/engine/llm_engine.py in __init__(self, vllm_config, executor_class, log_stats, usage_context, stat_loggers, mm_registry, use_cached_outputs, multiprocess_mode)
     98 
     99         # EngineCore (gets EngineCoreRequests and gives EngineCoreOutputs)
--> 100         self.engine_core = EngineCoreClient.make_client(
    101             multiprocess_mode=multiprocess_mode,
    102             asyncio_mode=False,

/usr/local/lib/python3.11/dist-packages/vllm/v1/engine/core_client.py in make_client(multiprocess_mode, asyncio_mode, vllm_config, executor_class, log_stats)
     73 
     74         if multiprocess_mode and not asyncio_mode:
---> 75             return SyncMPClient(vllm_config, executor_class, log_stats)
     76 
     77         return InprocClient(vllm_config, executor_class, log_stats)

/usr/local/lib/python3.11/dist-packages/vllm/v1/engine/core_client.py in __init__(self, vllm_config, executor_class, log_stats)
    578     def __init__(self, vllm_config: VllmConfig, executor_class: type[Executor],
    579                  log_stats: bool):
--> 580         super().__init__(
    581             asyncio_mode=False,
    582             vllm_config=vllm_config,

/usr/local/lib/python3.11/dist-packages/vllm/v1/engine/core_client.py in __init__(self, asyncio_mode, vllm_config, executor_class, log_stats)
    416 
    417             # Wait for engine core process(es) to start.
--> 418             self._wait_for_engine_startup(output_address, parallel_config)
    419 
    420             self.utility_results: dict[int, AnyFuture] = {}

/usr/local/lib/python3.11/dist-packages/vllm/v1/engine/core_client.py in _wait_for_engine_startup(self, output_address, parallel_config)
    452                                  parallel_config: ParallelConfig):
    453         # Get a sync handle to the socket which can be sync or async.
--> 454         sync_input_socket = zmq.Socket.shadow(self.input_socket)
    455 
    456         # Wait for engine core process(es) to send ready messages.

/usr/local/lib/python3.11/dist-packages/zmq/sugar/socket.py in shadow(cls, address)
    166             copy_threshold=copy_threshold,
    167         )
--> 168         if self._shadow_obj and shadow_context:
    169             # keep self.context reference if shadowing a Socket object
    170             self.context = shadow_context

/usr/local/lib/python3.11/dist-packages/zmq/utils/interop.py in cast_int_addr(n)
     27             return int(ffi.cast("size_t", n))
     28 
---> 29     raise ValueError(f"Cannot cast {n!r} to int")

ValueError: Cannot cast <zmq.Socket(zmq.ROUTER) at 0x7ad308169fd0> to int
```

This issue happens 100% of the time on a new vllm install. If you restart the notebook and load the model without installing vllm again, the issue is gone. This prevents the notebook from running from a clean state.
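A quick check that may help narrow this down, assuming (this is not confirmed anywhere in this report) that the notebook kernel had already imported an older pyzmq before `pip install vllm` replaced it on disk. The sketch below compares the version loaded in the running process with the version installed on disk. The helper names `loaded_vs_installed` and `is_stale` are hypothetical and exist only for illustration.

```python
import sys
from importlib import metadata


def loaded_vs_installed(module_name, dist_name):
    """Return (version imported in this process, version installed on disk)."""
    # Look only at what is already imported; do not trigger a fresh import.
    module = sys.modules.get(module_name)
    loaded = getattr(module, "__version__", None) if module else None
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        installed = None
    return loaded, installed


def is_stale(loaded, installed):
    """True when a module is already imported but its on-disk version differs."""
    return loaded is not None and installed is not None and loaded != installed


# Run this in the same cell sequence, right after `pip install vllm`.
loaded, installed = loaded_vs_installed("zmq", "pyzmq")
print(f"pyzmq loaded in kernel: {loaded}, installed on disk: {installed}")
if is_stale(loaded, installed):
    print("Versions differ: the kernel is still running the old pyzmq.")
```

If the two versions differ, that would be consistent with the error disappearing after a notebook restart. If they match, this assumption is wrong and the cause lies elsewhere.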

This issue appears in versions after 0.8.4. I am testing a new model, so downgrading is not an option. Setting vllm to V0 triggers other OOM issues.

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
