[Bug]: DeepSeek-R1-AWQ broken in nightly #15002

Open
eldarkurtic opened this issue Mar 18, 2025 · 1 comment
Labels: bug (Something isn't working)

@eldarkurtic (Contributor)
Your current environment

Model: cognitivecomputations/DeepSeek-R1-AWQ
Served on 8xH100s: vllm serve "cognitivecomputations/DeepSeek-R1-AWQ" -tp 8 --gpu-memory-utilization 0.8 --max-model-len 4096 --enable-chunked-prefill --trust-remote-code --max-num-batched-tokens 4096 --dtype float16 --port 1234
Prompted with:

curl -X POST http://localhost:1234/v1/chat/completions   -H "Content-Type: application/json"   -d '{
    "model": "cognitivecomputations/DeepSeek-R1-AWQ",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'

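Equivalently, the same request can be sent from Python; here is a minimal sketch using the requests library (endpoint, port, and model name are taken from the serve command and curl call above; the use of requests and the timeout value are my own choices):

import requests  # sketch only: any HTTP client works, requests is used for brevity

payload = {
    "model": "cognitivecomputations/DeepSeek-R1-AWQ",
    "messages": [
        {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 512,
}

# Port 1234 matches the --port flag in the serve command above
resp = requests.post("http://localhost:1234/v1/chat/completions", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])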
🐛 Describe the bug

In vllm==0.7.3, everything looks good:

{"id":"chatcmpl-acf69072a2a545c4baedcf93efbde64e","object":"chat.completion","created":1742275076,"model":"local-awq","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"Okay, so the question is asking for the capital of France. Let me think. I know France is a country in Europe. I've heard of Paris being mentioned a lot in relation to France. Wait, is Paris the capital? I remember seeing the Eiffel Tower in Paris, so maybe that's the capital. But let me double-check in my mind. Are there other major cities in France that could be the capital? Lyon, Marseille, maybe Toulouse? No, I think Paris is the most well-known. Also, I think the government and president are based in Paris. Yeah, I'm pretty sure it's Paris.\n</think>\n\nThe capital of France is **Paris**. Known for its iconic landmarks such as the Eiffel Tower, Louvre Museum, and Notre-Dame Cathedral, Paris serves as the country's political, cultural, and economic center. 🇫🇷","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":12,"total_tokens":192,"completion_tokens":180,"prompt_tokens_details":null},"prompt_logprobs":null}

In vllm==v0.8.0rc3.dev5+g5eeabc2a (V1) the model produces garbage:

{"id":"chatcmpl-b6e33ee4c1d64a3ab785cd54e2ed11fa","object":"chat.completion","created":1742274408,"model":"local-awq","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"<think>\n\n Bible_\n\n.<think>\n\n* R<think> down\n\n** \n\n\n\n\n<think>\n\n<think></think><think><think></think><think></think><think>\n\n�</think><|place▁holder▁no▁9|><think><think><think></think><think></think><think></think><think><|place▁holder▁no▁772|><think>\n\n�</think><|place▁holder▁no▁5|><think><think></think><think></think><think></think><|place▁holder▁no▁796|><think><think><|place▁holder▁no▁7|>⠀<think><|place▁holder▁no▁9|><think><think></think>\n<think>\n\n�\n\n##</think>\n<think></think><think></think>\n\n�\n</think><think></think>\n\n><think>\n\n<think>\n\n<think>\n\n</think>\n\n<think></think><think></think>\n\n>Hey?\n\n</think><think><|place▁holder▁no▁791|><think>\n\n<think>\n\n\n\n</think><think>\n\n<|place▁holder▁no▁795|>\n\n<think>\n</think>\n\n<think>\n\n<think></think>**<|place▁holder▁no▁795|>\n\n<|place▁holder▁no▁14|>\n\n<|place▁holder▁no▁795|>\n\n<|place▁holder▁no▁370|>\n<|place▁holder▁no▁795|>\n\n<|place▁holder▁no▁795|>\n\n\n\n\n\n<think></\n\n<think>\n\n<|place▁holder▁no▁20|><think><think><think><think></think></think></think><think>\n\n<|place▁holder▁no▁795|>\n\n<|place▁holder▁no▁284|>\n\n\n\n<|place▁holder▁no▁792|>.<think></think><|place▁holder▁no▁793|></think></think><think><|place▁holder▁no▁10|><think></think><think>\n-<|place▁holder▁no▁794|>;\\<think></think><think></think>\n\n\n\n\n<|place▁holder▁no▁765|><|place▁holder▁no▁793|>endum<|place▁holder▁no▁793|></think><think><|place▁holder▁no▁794|><|place▁holder▁no▁795|><|place▁holder▁no▁794|>\n\n\n:\n\n�\n\n<think>\n\n\n</think><think><|place▁holder▁no▁772|><think><|place▁holder▁no▁797|>\n\n<|place▁holder▁no▁10|><think></think><think></think><think>\n\n<|place▁holder▁no▁558|></think><think></think></think><|place▁holder▁no▁795|><think><|place▁holder▁no▁795|><think>\n\n°<|place▁holder▁no▁793|><think></think><|place▁holder▁no▁793|><think></think><think>\nH;**\n\n<think><|place▁holder▁no▁11|><|place▁holder▁no▁794|><think>®<think><think><|place▁holder▁no▁791|></think><|place▁holder▁no▁29|><think>\n<think><|place▁holder▁no▁792|><think><|place▁holder▁no▁792|><think><|place▁holder▁no▁5|></think><|place▁holder▁no▁147|><|place▁holder▁no▁794|>\n\nH<|place▁holder▁no▁795|><think></think><|place▁holder▁no▁365|>\n\n,<think>\n<|place▁holder▁no▁794|>;\n\n</think><think><think>\n\n<|place▁holder▁no▁792|><think><|place▁holder▁no▁793|>;<think><|place▁holder▁no▁793|>\n\n<|place▁holder▁no▁10|>,\n\n</think><think>;<|place▁holder▁no▁793|>\n\\u\n\n</think><|place▁holder▁no▁793|><|place▁holder▁no▁793|><|place▁holder▁no▁794|>\n\nThe\n\n9<|place▁holder▁no▁793|><think></think><|place▁holder▁no▁793|>\n\n<|place▁holder▁no▁795|>,<|place▁holder▁no▁793|></think>\n\n**<|place▁holder▁no▁793|><think>\n\n\\<|place▁holder▁no▁793|><|place▁holder▁no▁795|><|place▁holder▁no▁793|>\n\n><think><|place▁holder▁no▁794|><|place▁holder▁no▁792|><think>\n\n<|place▁holder▁no▁792|>\n\n<|place▁holder▁no▁795|>\n\n<|place▁holder▁no▁10|>\n\n,<|place▁holder▁no▁793|>  \n  \n,<|place▁holder▁no▁793|><\n\n>,<|place▁holder▁no▁793|>\n\n><think>\n\n>,; 
hydroxide</think><think>\n\n\"\n\n<|place▁holder▁no▁761|>\n\n°.<|place▁holder▁no▁775|><|place▁holder▁no▁793|>endum\n\n,<think></think><|place▁holder▁no▁795|>&#\n\n\n<|place▁holder▁no▁40|><|place▁holder▁no▁791|></think><think></think><|place▁holder▁no▁793|>\n\n,\n-<|place▁holder▁no▁796|><|place▁holder▁no▁793|></think>\n\n.<|place▁holder▁no▁27|> \n\n</\n\n</think><|place▁holder▁no▁792|>&#R<|place▁holder▁no▁792|><think><|place▁holder▁no▁796|><|place▁holder▁no▁27|>\n\n_\n\n \n</think><|place▁holder▁no▁793|>°,</think><think>\n\n><|place▁holder▁no▁793|>\n\n,�<|place▁holder▁no▁793|>\n\n -</think><|place▁holder▁no▁59|></think><|place▁holder▁no▁793|>\n\n⁻<|place▁holder▁no▁792|><|place▁holder▁no▁793|><|place▁holder▁no▁10|><think></think><|place▁holder▁no▁797|>,\n\n.**</think>\t<think><|place▁holder▁no▁793|><|place▁holder▁no▁793|>,<think>\n\n<|place▁holder▁no▁170|>\u0007</think><|place▁holder▁no▁793|>\n\n,<|place▁holder▁no▁793|>:>\n\n</think><think>\n\n,<|place▁holder▁no▁795|><|place▁holder▁no▁795|><|place▁holder▁no▁794|></think>\n\n,:<|place▁holder▁no▁793|>:<think></think><think>\n\n,.\n\n,\n\n,\n\n,\n\n,\n\n,.ate\n\n;<|place▁holder▁no▁795|>","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":12,"total_tokens":484,"completion_tokens":472,"prompt_tokens_details":null},"prompt_logprobs":null}

@eldarkurtic (Contributor, Author)

FYI: I've also tried vllm==v0.8.0rc3.dev5+g5eeabc2a (V0, with VLLM_USE_V1=0), and in this case it can't even load the model.

❯ VLLM_USE_V1=0 vllm serve "local-awq"  -tp 8 --gpu-memory-utilization 0.8 --max-model-len 4096 --enable-chunked-prefill --trust-remote-code --max-num-batched-tokens 4096 --dtype float16 --port 1234
INFO 03-18 05:28:45 [__init__.py:256] Automatically detected platform cuda.
INFO 03-18 05:28:46 [api_server.py:972] vLLM API server version 0.8.0rc3.dev5+g5eeabc2a
INFO 03-18 05:28:46 [api_server.py:973] args: Namespace(subparser='serve', model_tag='local-awq', config='', host=None, port=1234, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, enable_ssl_refresh=False, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, tool_call_parser=None, tool_parser_plugin='', model='local-awq', task='auto', tokenizer=None, hf_config_path=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=True, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='float16', kv_cache_dtype='auto', max_model_len=4096, guided_decoding_backend='xgrammar', logits_processor_pattern=None, model_impl='auto', distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=8, enable_expert_parallel=False, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=None, enable_prefix_caching=None, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=None, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.8, num_gpu_blocks_override=None, max_num_batched_tokens=4096, max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, use_tqdm_on_load=True, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=True, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', scheduler_cls='vllm.core.scheduler.Scheduler', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', worker_extension_cls='', generation_config='auto', 
override_generation_config=None, enable_sleep_mode=False, calculate_kv_scales=False, additional_config=None, enable_reasoning=False, reasoning_parser=None, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False, enable_server_load_tracking=False, dispatch_function=<function ServeSubcommand.cmd at 0x7fab3e78d440>)
INFO 03-18 05:28:46 [config.py:208] Replacing legacy 'type' key with 'rope_type'
WARNING 03-18 05:28:46 [config.py:2583] Casting torch.bfloat16 to torch.float16.
INFO 03-18 05:28:52 [config.py:583] This model supports multiple tasks: {'generate', 'classify', 'score', 'reward', 'embed'}. Defaulting to 'generate'.
INFO 03-18 05:28:53 [awq_marlin.py:114] The model is convertible to awq_marlin during runtime. Using awq_marlin kernel.
INFO 03-18 05:28:53 [config.py:1499] Defaulting to use mp for distributed inference
INFO 03-18 05:28:53 [config.py:1677] Chunked prefill is enabled with max_num_batched_tokens=4096.
INFO 03-18 05:28:53 [cuda.py:159] Forcing kv cache block size to 64 for FlashMLA backend.
INFO 03-18 05:28:53 [api_server.py:236] Started engine process with PID 3278262
INFO 03-18 05:28:56 [__init__.py:256] Automatically detected platform cuda.
INFO 03-18 05:28:59 [llm_engine.py:241] Initializing a V0 LLM engine (v0.8.0rc3.dev5+g5eeabc2a) with config: model='local-awq', speculative_config=None, tokenizer='local-awq', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=8, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=awq_marlin, enforce_eager=False, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=local-awq, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=True, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=True,
WARNING 03-18 05:28:59 [multiproc_worker_utils.py:310] Reducing Torch parallelism from 192 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
INFO 03-18 05:28:59 [custom_cache_manager.py:19] Setting Triton cache manager to: vllm.triton_utils.custom_cache_manager:CustomCacheManager
ERROR 03-18 05:28:59 [engine.py:443] Can't pickle <class 'transformers_modules.local-awq.configuration_deepseek.DeepseekV3Config'>: it's not the same object as transformers_modules.local-awq.configuration_deepseek.DeepseekV3Config
ERROR 03-18 05:28:59 [engine.py:443] Traceback (most recent call last):
ERROR 03-18 05:28:59 [engine.py:443]   File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 431, in run_mp_engine
ERROR 03-18 05:28:59 [engine.py:443]     engine = MQLLMEngine.from_vllm_config(
ERROR 03-18 05:28:59 [engine.py:443]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-18 05:28:59 [engine.py:443]   File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 126, in from_vllm_config
ERROR 03-18 05:28:59 [engine.py:443]     return cls(
ERROR 03-18 05:28:59 [engine.py:443]            ^^^^
ERROR 03-18 05:28:59 [engine.py:443]   File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 80, in __init__
ERROR 03-18 05:28:59 [engine.py:443]     self.engine = LLMEngine(*args, **kwargs)
ERROR 03-18 05:28:59 [engine.py:443]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-18 05:28:59 [engine.py:443]   File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 280, in __init__
ERROR 03-18 05:28:59 [engine.py:443]     self.model_executor = executor_class(vllm_config=vllm_config, )
ERROR 03-18 05:28:59 [engine.py:443]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-18 05:28:59 [engine.py:443]   File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 271, in __init__
ERROR 03-18 05:28:59 [engine.py:443]     super().__init__(*args, **kwargs)
ERROR 03-18 05:28:59 [engine.py:443]   File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 52, in __init__
ERROR 03-18 05:28:59 [engine.py:443]     self._init_executor()
ERROR 03-18 05:28:59 [engine.py:443]   File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/executor/mp_distributed_executor.py", line 90, in _init_executor
ERROR 03-18 05:28:59 [engine.py:443]     worker = ProcessWorkerWrapper(result_handler,
ERROR 03-18 05:28:59 [engine.py:443]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-18 05:28:59 [engine.py:443]   File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/executor/multiproc_worker_utils.py", line 171, in __init__
ERROR 03-18 05:28:59 [engine.py:443]     self.process.start()
ERROR 03-18 05:28:59 [engine.py:443]   File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 121, in start
ERROR 03-18 05:28:59 [engine.py:443]     self._popen = self._Popen(self)
ERROR 03-18 05:28:59 [engine.py:443]                   ^^^^^^^^^^^^^^^^^
ERROR 03-18 05:28:59 [engine.py:443]   File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/context.py", line 289, in _Popen
ERROR 03-18 05:28:59 [engine.py:443]     return Popen(process_obj)
ERROR 03-18 05:28:59 [engine.py:443]            ^^^^^^^^^^^^^^^^^^
ERROR 03-18 05:28:59 [engine.py:443]   File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/popen_spawn_posix.py", line 32, in __init__
ERROR 03-18 05:28:59 [engine.py:443]     super().__init__(process_obj)
ERROR 03-18 05:28:59 [engine.py:443]   File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/popen_fork.py", line 19, in __init__
ERROR 03-18 05:28:59 [engine.py:443]     self._launch(process_obj)
ERROR 03-18 05:28:59 [engine.py:443]   File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/popen_spawn_posix.py", line 47, in _launch
ERROR 03-18 05:28:59 [engine.py:443]     reduction.dump(process_obj, fp)
ERROR 03-18 05:28:59 [engine.py:443]   File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/reduction.py", line 60, in dump
ERROR 03-18 05:28:59 [engine.py:443]     ForkingPickler(file, protocol).dump(obj)
ERROR 03-18 05:28:59 [engine.py:443] _pickle.PicklingError: Can't pickle <class 'transformers_modules.local-awq.configuration_deepseek.DeepseekV3Config'>: it's not the same object as transformers_modules.local-awq.configuration_deepseek.DeepseekV3Config
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 445, in run_mp_engine
    raise e
  File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 431, in run_mp_engine
    engine = MQLLMEngine.from_vllm_config(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 126, in from_vllm_config
    return cls(
           ^^^^
  File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 80, in __init__
    self.engine = LLMEngine(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 280, in __init__
    self.model_executor = executor_class(vllm_config=vllm_config, )
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 271, in __init__
    super().__init__(*args, **kwargs)
  File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 52, in __init__
    self._init_executor()
  File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/executor/mp_distributed_executor.py", line 90, in _init_executor
    worker = ProcessWorkerWrapper(result_handler,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/executor/multiproc_worker_utils.py", line 171, in __init__
    self.process.start()
  File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/context.py", line 289, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'transformers_modules.local-awq.configuration_deepseek.DeepseekV3Config'>: it's not the same object as transformers_modules.local-awq.configuration_deepseek.DeepseekV3Config
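For context, this "not the same object" PicklingError is what pickle raises when the class it resolves by module path and name is no longer the identical object it was asked to serialize, which is the kind of mismatch dynamically loaded trust_remote_code modules (transformers_modules.*) can run into when the spawned worker process re-creates them. A minimal standalone sketch of that failure mode (illustration only, not vLLM or transformers code; the class name is reused purely for the demo):

import pickle

# pickle serializes classes by reference (module + qualified name) and checks that
# the object found at that location is identical to the object being pickled.

class DeepseekV3Config:        # original class object
    pass

original = DeepseekV3Config    # keep a reference to the first class object

class DeepseekV3Config:        # re-defining the class, as a module re-creation would
    pass

try:
    pickle.dumps(original)     # the name now resolves to the new class object
except pickle.PicklingError as err:
    # Prints: Can't pickle <class '__main__.DeepseekV3Config'>: it's not the same
    # object as __main__.DeepseekV3Config (same failure mode as the log above)
    print(err)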
