
aicb execution error #149

@ltm920716

After setting up the environment by following the SimAI tutorial, I ran the test script from the aicb directory:

sh scripts/megatron_gpt.sh --nnodes 1 --node_rank 0 --nproc_per_node 8 --master_addr localhost --master_port 29500 -m 7 --world_size 8 --tensor_model_parallel_size 2 --pipeline_model_parallel 1 --frame Megatron --global_batch 16  --micro_batch 1 --seq_length 2048 --swiglu --use_flash_attn --aiob_enable

It fails with the following exception:

Traceback (most recent call last):
  File "/home/SimAI/aicb/./aicb.py", line 23, in <module>
    from workload_generator.generate_megatron_workload import MegatronWorkload
  File "/home/SimAI/aicb/workload_generator/generate_megatron_workload.py", line 20, in <module>
    from utils.utils import CommGroup, CommType, get_params, WorkloadWriter, num_parameters_to_bytes
ImportError: cannot import name 'num_parameters_to_bytes' from 'utils.utils' (/home/SimAI/aicb/utils/utils.py)
E0630 11:04:00.986000 127638745409344 torch/distributed/elastic/multiprocessing/api.py:881] failed (exitcode: 1) local_rank: 0 (pid: 9876) of binary: /usr/bin/python

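Is the aicb code pulled in through SimAI incompatible? The aicb repository also has a similar issue that is still unresolved, and the tutorial in the aicb README does not match the actual code version or the workload example descriptions. Thanks.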
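To help narrow this down, here is a minimal sketch for checking whether a given checkout of `utils/utils.py` defines the name that the import expects. It only parses the file with Python's standard `ast` module and does not import anything from aicb. The helper name `top_level_names` is hypothetical and not part of aicb or SimAI.

```python
import ast
from typing import Set


def top_level_names(source: str) -> Set[str]:
    """Collect names bound at module top level in the given Python source.

    Covers functions, classes, simple assignments, and imports. This is a
    hypothetical diagnostic helper, not part of aicb.
    """
    names: Set[str] = set()
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name != "*":
                    # "import a.b" binds "a"; "import a as x" binds "x"
                    names.add(alias.asname or alias.name.split(".")[0])
    return names


# Example use from the aicb directory (path taken from the traceback):
#   with open("utils/utils.py") as f:
#       print("num_parameters_to_bytes" in top_level_names(f.read()))
```

If this prints `False` for the checked-out file, that would suggest `generate_megatron_workload.py` and `utils/utils.py` come from different revisions, which is consistent with the version mismatch described above, though it does not by itself say which revision is the intended one.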
