Closed
Description
After setting up the environment by following the SimAI tutorial, I ran the test script from the aicb directory:
sh scripts/megatron_gpt.sh --nnodes 1 --node_rank 0 --nproc_per_node 8 --master_addr localhost --master_port 29500 -m 7 --world_size 8 --tensor_model_parallel_size 2 --pipeline_model_parallel 1 --frame Megatron --global_batch 16 --micro_batch 1 --seq_length 2048 --swiglu --use_flash_attn --aiob_enable
It fails with the following exception:
Traceback (most recent call last):
  File "/home/SimAI/aicb/./aicb.py", line 23, in <module>
    from workload_generator.generate_megatron_workload import MegatronWorkload
  File "/home/SimAI/aicb/workload_generator/generate_megatron_workload.py", line 20, in <module>
    from utils.utils import CommGroup, CommType, get_params, WorkloadWriter, num_parameters_to_bytes
ImportError: cannot import name 'num_parameters_to_bytes' from 'utils.utils' (/home/SimAI/aicb/utils/utils.py)
E0630 11:04:00.986000 127638745409344 torch/distributed/elastic/multiprocessing/api.py:881] failed (exitcode: 1) local_rank: 0 (pid: 9876) of binary: /usr/bin/python
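Is the aicb code pulled in through SimAI incompatible? The aicb project also has a similar issue that is still unresolved. In addition, the tutorial in the aicb README does not match the actual code version or the workload example descriptions. Thanks.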
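For anyone trying to reproduce this, a small check can confirm whether the checked-out module defines the names the importer expects. This is only a sketch: `missing_names` is a hypothetical helper, not part of aicb, and it relies solely on the standard-library `importlib`.

```python
import importlib


def missing_names(module_name, names):
    """Return the entries of `names` that the imported module does not define.

    `missing_names` is a hypothetical helper for diagnosis, not an aicb API.
    """
    # Import the module by its dotted name, exactly as a `from ... import` would.
    module = importlib.import_module(module_name)
    # Keep only the names that are absent from the module's namespace.
    return [name for name in names if not hasattr(module, name)]
```

Run from the aicb directory, `missing_names("utils.utils", ["num_parameters_to_bytes"])` would be expected to return the missing name on a checkout that hits the error above, which points to `utils/utils.py` and `generate_megatron_workload.py` coming from different revisions.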