
AssertionError: pipeline_model parallel group is not initialized #330

@jiaqiw09

Description


When I run https://github.com/alibaba/ROLL/blob/main/examples/qwen2.5-7B-sft_megatron/sft_config.yaml with the Megatron backend, I get the following error: AssertionError: pipeline_model parallel group is not initialized.

Here is the traceback.

Traceback (most recent call last):
  File "/ROLL/roll/distributed/scheduler/decorator.py", line 296, in inner [repeated 7x across cluster]
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^ 
  File "/ROLL/roll/distributed/strategy/megatron_strategy.py", line 978, in initialize [repeated 14x across cluster]
    self.strategy.initialize(model_provider=default_actor_model_provider) [repeated 7x across cluster]
    self.forward_backward_func = get_forward_backward_func() 
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
  File "/lib/python3.11/site-packages/megatron/core/pipeline_parallel/schedules.py", line 114, in get_forward_backward_func [repeated 7x across cluster]
    pipeline_model_parallel_size = parallel_state.get_pipeline_model_parallel_world_size() 
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
  File "/lib/python3.11/site-packages/megatron/core/parallel_state.py", line 1559, in get_pipeline_model_parallel_world_size [repeated 7x across cluster]
    pp_group = get_pipeline_model_parallel_group()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
  File "/lib/python3.11/site-packages/megatron/core/parallel_state.py", line 1400, in get_pipeline_model_parallel_group [repeated 7x across cluster]
    _PIPELINE_MODEL_PARALLEL_GROUP is not None
AssertionError: pipeline_model parallel group is not initialized
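
The traceback shows the failure path: `get_forward_backward_func()` reads the pipeline-parallel world size, and that getter asserts that the module-level `_PIPELINE_MODEL_PARALLEL_GROUP` is not `None`. The plain-Python sketch below reproduces that guard to illustrate why the call fails in a process where Megatron's parallel state has not yet been set up (for example, because `parallel_state.initialize_model_parallel(...)` has not run in that worker). Only the function names and the assertion message come from the traceback; the bodies are hypothetical simplifications, and the real Megatron functions create `torch.distributed` process groups rather than storing a dict.

```python
# Simplified, hypothetical stand-in for the guard in megatron.core.parallel_state.
# Names mirror the traceback; the bodies are NOT Megatron's real implementation.

_PIPELINE_MODEL_PARALLEL_GROUP = None  # module-level state, one copy per process


def initialize_model_parallel(pipeline_model_parallel_size: int = 1) -> None:
    """Hypothetical: record a placeholder 'group' for the pipeline dimension."""
    global _PIPELINE_MODEL_PARALLEL_GROUP
    # The real function builds torch.distributed process groups; here we only
    # store the size so the getters below have something to return.
    _PIPELINE_MODEL_PARALLEL_GROUP = {"size": pipeline_model_parallel_size}


def get_pipeline_model_parallel_group() -> dict:
    # Same check and message as in the traceback.
    assert (
        _PIPELINE_MODEL_PARALLEL_GROUP is not None
    ), "pipeline_model parallel group is not initialized"
    return _PIPELINE_MODEL_PARALLEL_GROUP


def get_pipeline_model_parallel_world_size() -> int:
    return get_pipeline_model_parallel_group()["size"]


def get_forward_backward_func() -> str:
    # The schedule depends on the pipeline size, which is why parallel state
    # is read as soon as the strategy asks for a forward/backward function.
    if get_pipeline_model_parallel_world_size() > 1:
        return "pipelined schedule"
    return "no-pipelining schedule"


if __name__ == "__main__":
    try:
        get_forward_backward_func()  # state not initialized yet in this process
    except AssertionError as exc:
        print(f"AssertionError: {exc}")
    initialize_model_parallel(pipeline_model_parallel_size=1)
    print(get_forward_backward_func())
```

Run as a script, this prints the same assertion message as the traceback first, then `no-pipelining schedule` once the placeholder group has been initialized.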
