Description
Steps to Reproduce
I'm trying to cluster two Mac minis and an RTX 4090 (24 GB) together to run GPT-OSS 120B.
Somehow the RTX 4090 gets assigned more memory than it can handle.
I ran parallax install and parallax check to make sure my environment is up to date and correctly set up.
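For completeness, these are the exact environment checks I ran before launching the model (the launch invocation itself is omitted here):

parallax install
parallax check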
Expected Behavior
I expect the model to be partitioned across nodes according to the amount of memory each node has.
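As an illustration of what I mean (this is not Parallax code; the layer count and the Mac minis' memory sizes below are assumptions), a memory-proportional split would look roughly like this:

import torch

def split_layers_by_memory(total_layers, node_free_bytes):
    # Hypothetical sketch: assign layer counts proportionally to each node's free memory.
    total_free = sum(node_free_bytes)
    counts = [int(total_layers * b / total_free) for b in node_free_bytes]
    remainder = total_layers - sum(counts)
    # Hand any leftover layers to the nodes with the most free memory.
    for i in sorted(range(len(counts)), key=lambda i: -node_free_bytes[i])[:remainder]:
        counts[i] += 1
    return counts

free_4090, _total = torch.cuda.mem_get_info(0)  # free/total VRAM reported by the RTX 4090
mini_mem = 64 * 2**30                           # assumed unified memory per Mac mini
print(split_layers_by_memory(36, [free_4090, mini_mem, mini_mem]))

With a split like that the 4090 would end up with far fewer layers than the minis; instead it seems to be handed a shard whose weights alone already exceed its 24 GB.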
Actual Behavior
Parallax crashes with the following stack trace:
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1014.00 MiB. GPU 0 has a total capacity of 23.99 GiB of which 0 bytes is free. Including non-PyTorch memory, this process has 17179869184.00 GiB memory in use. Of the allocated memory 31.04 GiB is allocated by PyTorch, and 126.87 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Nov 10 09:03:46.891 [ERROR] launch.py:175 CUDA out of memory. Tried to allocate 1014.00 MiB. GPU 0 has a total capacity of 23.99 GiB of which 0 bytes is free. Including non-PyTorch memory, this process has 17179869184.00 GiB memory in use. Of the allocated memory 31.04 GiB is allocated by PyTorch, and 126.87 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Traceback (most recent call last):
File "/root/parallax/src/parallax/launch.py", line 127, in <module>
executor = Executor.create_from_args(args, gradient_server=gradient_server)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/parallax/src/parallax/server/executor.py", line 289, in create_from_args
return cls(**create_executor_config(args, gradient_server))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/parallax/src/parallax/server/executor.py", line 116, in __init__
self.model_runner, self.config, self.tokenizer = initialize_sgl_model_runner(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/parallax/src/parallax/sglang/model_runner.py", line 296, in initialize_sgl_model_runner
model_runner = ParallaxModelRunner(
^^^^^^^^^^^^^^^^^^^^
File "/root/parallax/src/parallax/sglang/model_runner.py", line 76, in __init__
super().__init__(
File "/root/parallax/venv/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 312, in __init__
self.initialize(min_per_gpu_memory)
File "/root/parallax/venv/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 384, in initialize
self.load_model()
File "/root/parallax/venv/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 739, in load_model
self.model = get_model(
^^^^^^^^^^
File "/root/parallax/venv/lib/python3.12/site-packages/sglang/srt/model_loader/__init__.py", line 28, in get_model
return loader.load_model(
^^^^^^^^^^^^^^^^^^
File "/root/parallax/venv/lib/python3.12/site-packages/sglang/srt/model_loader/loader.py", line 590, in load_model
model = _initialize_model(
^^^^^^^^^^^^^^^^^^
File "/root/parallax/venv/lib/python3.12/site-packages/sglang/srt/model_loader/loader.py", line 262, in _initialize_model
return model_class(**kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/root/parallax/venv/lib/python3.12/site-packages/sglang/srt/models/gpt_oss.py", line 586, in __init__
self.model = GptOssModel(
^^^^^^^^^^^^
File "/root/parallax/venv/lib/python3.12/site-packages/sglang/srt/models/gpt_oss.py", line 507, in __init__
self.layers, self.start_layer, self.end_layer = make_layers(
^^^^^^^^^^^^
File "/root/parallax/venv/lib/python3.12/site-packages/sglang/srt/utils/common.py", line 560, in make_layers
+ get_offloader().wrap_modules(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/parallax/venv/lib/python3.12/site-packages/sglang/srt/utils/offloader.py", line 36, in wrap_modules
return list(all_modules_generator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/parallax/venv/lib/python3.12/site-packages/sglang/srt/utils/common.py", line 562, in <genexpr>
layer_fn(idx=idx, prefix=add_prefix(idx, prefix))
File "/root/parallax/venv/lib/python3.12/site-packages/sglang/srt/models/gpt_oss.py", line 509, in <lambda>
lambda idx, prefix: decoder_layer_type(
^^^^^^^^^^^^^^^^^^^
File "/root/parallax/venv/lib/python3.12/site-packages/sglang/srt/models/gpt_oss.py", line 416, in __init__
self.mlp = GptOssSparseMoeBlock(
^^^^^^^^^^^^^^^^^^^^^
File "/root/parallax/venv/lib/python3.12/site-packages/sglang/srt/models/gpt_oss.py", line 139, in __init__
self.experts = experts_type(
^^^^^^^^^^^^^
File "/root/parallax/venv/lib/python3.12/site-packages/sglang/srt/layers/moe/fused_moe_triton/layer.py", line 212, in __init__
self.quant_method.create_weights(
File "/root/parallax/venv/lib/python3.12/site-packages/sglang/srt/layers/quantization/mxfp4.py", line 313, in create_weights
torch.zeros(
File "/root/parallax/venv/lib/python3.12/site-packages/torch/utils/_device.py", line 103, in __torch_function__
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1014.00 MiB. GPU 0 has a total capacity of 23.99 GiB of which 0 bytes is free. Including non-PyTorch memory, this process has 17179869184.00 GiB memory in use. Of the allocated memory 31.04 GiB is allocated by PyTorch, and 126.87 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Nov 10 09:03:46.892 [INFO] server.py:723 Leave scheduler: 12D3KooWGG6rkFt9c33wriGrnDquef4bkhWSwZTzRfmBue2jCpMy
[rank0]:[W1110 09:03:53.902193591 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[INFO] Successfully joined the distributed inference cluster.
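One note on the PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True suggestion in the error message: that setting only mitigates allocator fragmentation, so it probably won't help when ~31 GiB of weights are being placed on a 24 GiB card, but for completeness it has to be set before CUDA is initialized, e.g.:

import os
# Must be set before the CUDA allocator is first used (i.e. before any tensor touches the GPU).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
import torch
x = torch.zeros(1, device="cuda")  # allocator is initialized here and picks up the setting

Exporting the variable in the shell before launching Parallax would have the same effect.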
Version
Environment & Context
- I'm using the latest version.
- I have searched existing issues.