[Bug]: DeepSeek-R1-AWQ broken in nightly #15002

Open
eldarkurtic opened this issue Mar 18, 2025 · 1 comment
Labels: bug (Something isn't working)

@eldarkurtic (Contributor)
Your current environment

Model: cognitivecomputations/DeepSeek-R1-AWQ
Served on 8xH100s: vllm serve "cognitivecomputations/DeepSeek-R1-AWQ" -tp 8 --gpu-memory-utilization 0.8 --max-model-len 4096 --enable-chunked-prefill --trust-remote-code --max-num-batched-tokens 4096 --dtype float16 --port 1234
Prompted with:

curl -X POST http://localhost:1234/v1/chat/completions   -H "Content-Type: application/json"   -d '{
    "model": "cognitivecomputations/DeepSeek-R1-AWQ",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'

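Equivalently, the same request can be sent from Python; here is a minimal sketch using the requests library (endpoint, port, and model name are taken from the serve command and curl call above; the use of requests and the timeout value are my own choices):

import requests  # sketch only: any HTTP client works, requests is used for brevity

payload = {
    "model": "cognitivecomputations/DeepSeek-R1-AWQ",
    "messages": [
        {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 512,
}

# Port 1234 matches the --port flag in the serve command above
resp = requests.post("http://localhost:1234/v1/chat/completions", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])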
🐛 Describe the bug

In vllm==0.7.3, everything looks good:

{"id":"chatcmpl-acf69072a2a545c4baedcf93efbde64e","object":"chat.completion","created":1742275076,"model":"local-awq","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"Okay, so the question is asking for the capital of France. Let me think. I know France is a country in Europe. I've heard of Paris being mentioned a lot in relation to France. Wait, is Paris the capital? I remember seeing the Eiffel Tower in Paris, so maybe that's the capital. But let me double-check in my mind. Are there other major cities in France that could be the capital? Lyon, Marseille, maybe Toulouse? No, I think Paris is the most well-known. Also, I think the government and president are based in Paris. Yeah, I'm pretty sure it's Paris.\n</think>\n\nThe capital of France is **Paris**. Known for its iconic landmarks such as the Eiffel Tower, Louvre Museum, and Notre-Dame Cathedral, Paris serves as the country's political, cultural, and economic center. 🇫🇷","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":12,"total_tokens":192,"completion_tokens":180,"prompt_tokens_details":null},"prompt_logprobs":null}

In vllm==v0.8.0rc3.dev5+g5eeabc2a (V1) the model produces garbage:

{"id":"chatcmpl-b6e33ee4c1d64a3ab785cd54e2ed11fa","object":"chat.completion","created":1742274408,"model":"local-awq","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"<think>\n\n Bible_\n\n.<think>\n\n* R<think> down\n\n** \n\n\n\n\n<think>\n\n<think></think><think><think></think><think></think><think>\n\n�</think><|place▁holder▁no▁9|><think><think><think></think><think></think><think></think><think><|place▁holder▁no▁772|><think>\n\n�</think><|place▁holder▁no▁5|><think><think></think><think></think><think></think><|place▁holder▁no▁796|><think><think><|place▁holder▁no▁7|>⠀<think><|place▁holder▁no▁9|><think><think></think>\n<think>\n\n�\n\n##</think>\n<think></think><think></think>\n\n�\n</think><think></think>\n\n><think>\n\n<think>\n\n<think>\n\n</think>\n\n<think></think><think></think>\n\n>Hey?\n\n</think><think><|place▁holder▁no▁791|><think>\n\n<think>\n\n\n\n</think><think>\n\n<|place▁holder▁no▁795|>\n\n<think>\n</think>\n\n<think>\n\n<think></think>**<|place▁holder▁no▁795|>\n\n<|place▁holder▁no▁14|>\n\n<|place▁holder▁no▁795|>\n\n<|place▁holder▁no▁370|>\n<|place▁holder▁no▁795|>\n\n<|place▁holder▁no▁795|>\n\n\n\n\n\n<think></\n\n<think>\n\n<|place▁holder▁no▁20|><think><think><think><think></think></think></think><think>\n\n<|place▁holder▁no▁795|>\n\n<|place▁holder▁no▁284|>\n\n\n\n<|place▁holder▁no▁792|>.<think></think><|place▁holder▁no▁793|></think></think><think><|place▁holder▁no▁10|><think></think><think>\n-<|place▁holder▁no▁794|>;\\<think></think><think></think>\n\n\n\n\n<|place▁holder▁no▁765|><|place▁holder▁no▁793|>endum<|place▁holder▁no▁793|></think><think><|place▁holder▁no▁794|><|place▁holder▁no▁795|><|place▁holder▁no▁794|>\n\n\n:\n\n�\n\n<think>\n\n\n</think><think><|place▁holder▁no▁772|><think><|place▁holder▁no▁797|>\n\n<|place▁holder▁no▁10|><think></think><think></think><think>\n\n<|place▁holder▁no▁558|></think><think></think></think><|place▁holder▁no▁795|><think><|place▁holder▁no▁795|><think>\n\n°<|place▁holder▁no▁793|><think></think><|place▁holder▁no▁793|><think></think><think>\nH;**\n\n<think><|place▁holder▁no▁11|><|place▁holder▁no▁794|><think>®<think><think><|place▁holder▁no▁791|></think><|place▁holder▁no▁29|><think>\n<think><|place▁holder▁no▁792|><think><|place▁holder▁no▁792|><think><|place▁holder▁no▁5|></think><|place▁holder▁no▁147|><|place▁holder▁no▁794|>\n\nH<|place▁holder▁no▁795|><think></think><|place▁holder▁no▁365|>\n\n,<think>\n<|place▁holder▁no▁794|>;\n\n</think><think><think>\n\n<|place▁holder▁no▁792|><think><|place▁holder▁no▁793|>;<think><|place▁holder▁no▁793|>\n\n<|place▁holder▁no▁10|>,\n\n</think><think>;<|place▁holder▁no▁793|>\n\\u\n\n</think><|place▁holder▁no▁793|><|place▁holder▁no▁793|><|place▁holder▁no▁794|>\n\nThe\n\n9<|place▁holder▁no▁793|><think></think><|place▁holder▁no▁793|>\n\n<|place▁holder▁no▁795|>,<|place▁holder▁no▁793|></think>\n\n**<|place▁holder▁no▁793|><think>\n\n\\<|place▁holder▁no▁793|><|place▁holder▁no▁795|><|place▁holder▁no▁793|>\n\n><think><|place▁holder▁no▁794|><|place▁holder▁no▁792|><think>\n\n<|place▁holder▁no▁792|>\n\n<|place▁holder▁no▁795|>\n\n<|place▁holder▁no▁10|>\n\n,<|place▁holder▁no▁793|>  \n  \n,<|place▁holder▁no▁793|><\n\n>,<|place▁holder▁no▁793|>\n\n><think>\n\n>,; 
hydroxide</think><think>\n\n\"\n\n<|place▁holder▁no▁761|>\n\n°.<|place▁holder▁no▁775|><|place▁holder▁no▁793|>endum\n\n,<think></think><|place▁holder▁no▁795|>&#\n\n\n<|place▁holder▁no▁40|><|place▁holder▁no▁791|></think><think></think><|place▁holder▁no▁793|>\n\n,\n-<|place▁holder▁no▁796|><|place▁holder▁no▁793|></think>\n\n.<|place▁holder▁no▁27|> \n\n</\n\n</think><|place▁holder▁no▁792|>&#R<|place▁holder▁no▁792|><think><|place▁holder▁no▁796|><|place▁holder▁no▁27|>\n\n_\n\n \n</think><|place▁holder▁no▁793|>°,</think><think>\n\n><|place▁holder▁no▁793|>\n\n,�<|place▁holder▁no▁793|>\n\n -</think><|place▁holder▁no▁59|></think><|place▁holder▁no▁793|>\n\n⁻<|place▁holder▁no▁792|><|place▁holder▁no▁793|><|place▁holder▁no▁10|><think></think><|place▁holder▁no▁797|>,\n\n.**</think>\t<think><|place▁holder▁no▁793|><|place▁holder▁no▁793|>,<think>\n\n<|place▁holder▁no▁170|>\u0007</think><|place▁holder▁no▁793|>\n\n,<|place▁holder▁no▁793|>:>\n\n</think><think>\n\n,<|place▁holder▁no▁795|><|place▁holder▁no▁795|><|place▁holder▁no▁794|></think>\n\n,:<|place▁holder▁no▁793|>:<think></think><think>\n\n,.\n\n,\n\n,\n\n,\n\n,\n\n,.ate\n\n;<|place▁holder▁no▁795|>","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":12,"total_tokens":484,"completion_tokens":472,"prompt_tokens_details":null},"prompt_logprobs":null}

@eldarkurtic (Contributor, Author)

FYI: I've also tried vllm==v0.8.0rc3.dev5+g5eeabc2a (V0, with VLLM_USE_V1=0), and in this case it can't even load the model.

❯ VLLM_USE_V1=0 vllm serve "local-awq"  -tp 8 --gpu-memory-utilization 0.8 --max-model-len 4096 --enable-chunked-prefill --trust-remote-code --max-num-batched-tokens 4096 --dtype float16 --port 1234
INFO 03-18 05:28:45 [__init__.py:256] Automatically detected platform cuda.
INFO 03-18 05:28:46 [api_server.py:972] vLLM API server version 0.8.0rc3.dev5+g5eeabc2a
INFO 03-18 05:28:46 [api_server.py:973] args: Namespace(subparser='serve', model_tag='local-awq', config='', host=None, port=1234, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, enable_ssl_refresh=False, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, tool_call_parser=None, tool_parser_plugin='', model='local-awq', task='auto', tokenizer=None, hf_config_path=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=True, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='float16', kv_cache_dtype='auto', max_model_len=4096, guided_decoding_backend='xgrammar', logits_processor_pattern=None, model_impl='auto', distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=8, enable_expert_parallel=False, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=None, enable_prefix_caching=None, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=None, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.8, num_gpu_blocks_override=None, max_num_batched_tokens=4096, max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, use_tqdm_on_load=True, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=True, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', scheduler_cls='vllm.core.scheduler.Scheduler', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', worker_extension_cls='', generation_config='auto', 
override_generation_config=None, enable_sleep_mode=False, calculate_kv_scales=False, additional_config=None, enable_reasoning=False, reasoning_parser=None, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False, enable_server_load_tracking=False, dispatch_function=<function ServeSubcommand.cmd at 0x7fab3e78d440>)
INFO 03-18 05:28:46 [config.py:208] Replacing legacy 'type' key with 'rope_type'
WARNING 03-18 05:28:46 [config.py:2583] Casting torch.bfloat16 to torch.float16.
INFO 03-18 05:28:52 [config.py:583] This model supports multiple tasks: {'generate', 'classify', 'score', 'reward', 'embed'}. Defaulting to 'generate'.
INFO 03-18 05:28:53 [awq_marlin.py:114] The model is convertible to awq_marlin during runtime. Using awq_marlin kernel.
INFO 03-18 05:28:53 [config.py:1499] Defaulting to use mp for distributed inference
INFO 03-18 05:28:53 [config.py:1677] Chunked prefill is enabled with max_num_batched_tokens=4096.
INFO 03-18 05:28:53 [cuda.py:159] Forcing kv cache block size to 64 for FlashMLA backend.
INFO 03-18 05:28:53 [api_server.py:236] Started engine process with PID 3278262
INFO 03-18 05:28:56 [__init__.py:256] Automatically detected platform cuda.
INFO 03-18 05:28:59 [llm_engine.py:241] Initializing a V0 LLM engine (v0.8.0rc3.dev5+g5eeabc2a) with config: model='local-awq', speculative_config=None, tokenizer='local-awq', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=8, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=awq_marlin, enforce_eager=False, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=local-awq, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=True, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=True,
WARNING 03-18 05:28:59 [multiproc_worker_utils.py:310] Reducing Torch parallelism from 192 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
INFO 03-18 05:28:59 [custom_cache_manager.py:19] Setting Triton cache manager to: vllm.triton_utils.custom_cache_manager:CustomCacheManager
ERROR 03-18 05:28:59 [engine.py:443] Can't pickle <class 'transformers_modules.local-awq.configuration_deepseek.DeepseekV3Config'>: it's not the same object as transformers_modules.local-awq.configuration_deepseek.DeepseekV3Config
ERROR 03-18 05:28:59 [engine.py:443] Traceback (most recent call last):
ERROR 03-18 05:28:59 [engine.py:443]   File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 431, in run_mp_engine
ERROR 03-18 05:28:59 [engine.py:443]     engine = MQLLMEngine.from_vllm_config(
ERROR 03-18 05:28:59 [engine.py:443]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-18 05:28:59 [engine.py:443]   File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 126, in from_vllm_config
ERROR 03-18 05:28:59 [engine.py:443]     return cls(
ERROR 03-18 05:28:59 [engine.py:443]            ^^^^
ERROR 03-18 05:28:59 [engine.py:443]   File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 80, in __init__
ERROR 03-18 05:28:59 [engine.py:443]     self.engine = LLMEngine(*args, **kwargs)
ERROR 03-18 05:28:59 [engine.py:443]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-18 05:28:59 [engine.py:443]   File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 280, in __init__
ERROR 03-18 05:28:59 [engine.py:443]     self.model_executor = executor_class(vllm_config=vllm_config, )
ERROR 03-18 05:28:59 [engine.py:443]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-18 05:28:59 [engine.py:443]   File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 271, in __init__
ERROR 03-18 05:28:59 [engine.py:443]     super().__init__(*args, **kwargs)
ERROR 03-18 05:28:59 [engine.py:443]   File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 52, in __init__
ERROR 03-18 05:28:59 [engine.py:443]     self._init_executor()
ERROR 03-18 05:28:59 [engine.py:443]   File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/executor/mp_distributed_executor.py", line 90, in _init_executor
ERROR 03-18 05:28:59 [engine.py:443]     worker = ProcessWorkerWrapper(result_handler,
ERROR 03-18 05:28:59 [engine.py:443]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-18 05:28:59 [engine.py:443]   File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/executor/multiproc_worker_utils.py", line 171, in __init__
ERROR 03-18 05:28:59 [engine.py:443]     self.process.start()
ERROR 03-18 05:28:59 [engine.py:443]   File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 121, in start
ERROR 03-18 05:28:59 [engine.py:443]     self._popen = self._Popen(self)
ERROR 03-18 05:28:59 [engine.py:443]                   ^^^^^^^^^^^^^^^^^
ERROR 03-18 05:28:59 [engine.py:443]   File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/context.py", line 289, in _Popen
ERROR 03-18 05:28:59 [engine.py:443]     return Popen(process_obj)
ERROR 03-18 05:28:59 [engine.py:443]            ^^^^^^^^^^^^^^^^^^
ERROR 03-18 05:28:59 [engine.py:443]   File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/popen_spawn_posix.py", line 32, in __init__
ERROR 03-18 05:28:59 [engine.py:443]     super().__init__(process_obj)
ERROR 03-18 05:28:59 [engine.py:443]   File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/popen_fork.py", line 19, in __init__
ERROR 03-18 05:28:59 [engine.py:443]     self._launch(process_obj)
ERROR 03-18 05:28:59 [engine.py:443]   File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/popen_spawn_posix.py", line 47, in _launch
ERROR 03-18 05:28:59 [engine.py:443]     reduction.dump(process_obj, fp)
ERROR 03-18 05:28:59 [engine.py:443]   File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/reduction.py", line 60, in dump
ERROR 03-18 05:28:59 [engine.py:443]     ForkingPickler(file, protocol).dump(obj)
ERROR 03-18 05:28:59 [engine.py:443] _pickle.PicklingError: Can't pickle <class 'transformers_modules.local-awq.configuration_deepseek.DeepseekV3Config'>: it's not the same object as transformers_modules.local-awq.configuration_deepseek.DeepseekV3Config
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 445, in run_mp_engine
    raise e
  File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 431, in run_mp_engine
    engine = MQLLMEngine.from_vllm_config(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 126, in from_vllm_config
    return cls(
           ^^^^
  File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 80, in __init__
    self.engine = LLMEngine(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 280, in __init__
    self.model_executor = executor_class(vllm_config=vllm_config, )
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 271, in __init__
    super().__init__(*args, **kwargs)
  File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 52, in __init__
    self._init_executor()
  File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/executor/mp_distributed_executor.py", line 90, in _init_executor
    worker = ProcessWorkerWrapper(result_handler,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/nvme2/eldar/hf_home/test/lib/python3.12/site-packages/vllm/executor/multiproc_worker_utils.py", line 171, in __init__
    self.process.start()
  File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/context.py", line 289, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/home/eldar/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'transformers_modules.local-awq.configuration_deepseek.DeepseekV3Config'>: it's not the same object as transformers_modules.local-awq.configuration_deepseek.DeepseekV3Config
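For context, this "not the same object" PicklingError is what pickle raises when the class it resolves by module path and name is no longer the identical object it was asked to serialize, which is the kind of mismatch dynamically loaded trust_remote_code modules (transformers_modules.*) can run into when the spawned worker process re-creates them. A minimal standalone sketch of that failure mode (illustration only, not vLLM or transformers code; the class name is reused purely for the demo):

import pickle

# pickle serializes classes by reference (module + qualified name) and checks that
# the object found at that location is identical to the object being pickled.

class DeepseekV3Config:        # original class object
    pass

original = DeepseekV3Config    # keep a reference to the first class object

class DeepseekV3Config:        # re-defining the class, as a module re-creation would
    pass

try:
    pickle.dumps(original)     # the name now resolves to the new class object
except pickle.PicklingError as err:
    # Prints: Can't pickle <class '__main__.DeepseekV3Config'>: it's not the same
    # object as __main__.DeepseekV3Config (same failure mode as the log above)
    print(err)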
