Describe the bug
(base) amd00@mz00:~/model$ git clone https://www.modelscope.cn/AI-ModelScope/gpt2.git
(ds_ex) amd00@mz00:~/repo/DeepSpeed$ deepspeed --autotuning tune --num_nodes=1 --num_gpus=4 /home/amd00/repo/transformers/examples/pytorch/language-modeling/run_clm.py --model_name_or_path /home/amd00/model/gpt2 \
    --deepspeed deepspeed/autotuning/config_templates/zero3.json \
    --do_train --do_eval --fp16 --per_device_train_batch_size 8 --gradient_accumulation_steps 1
/home/amd00/anaconda3/envs/ds_ex/bin/deepspeed:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
__import__('pkg_resources').require('deepspeed==0.16.8+ee492c30')
[2025-05-03 09:41:17,131] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-03 09:41:17,191] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-03 09:41:20,466] [WARNING] [runner.py:215:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2025-05-03 09:41:20,467] [INFO] [autotuner.py:71:__init__] Created autotuning experiments directory: autotuning_exps
[2025-05-03 09:41:20,467] [INFO] [autotuner.py:84:__init__] Created autotuning results directory: autotuning_results
[2025-05-03 09:41:20,467] [INFO] [autotuner.py:200:_get_resource_manager] active_resources = OrderedDict([('localhost', [0, 1, 2, 3])])
[2025-05-03 09:41:20,468] [INFO] [runner.py:392:run_autotuning] [Start] Running autotuning
[2025-05-03 09:41:20,468] [INFO] [autotuner.py:413:tune] Fast mode is enabled. Tuning micro batch size only.
[2025-05-03 09:41:20,468] [INFO] [autotuner.py:669:model_info_profile_run] Starting model info profile run.
0%| | 0/1 [00:00<?, ?it/s][2025-05-03 09:41:20,483] [INFO] [scheduler.py:346:run_experiment] Scheduler wrote ds_config to autotuning_results/profile_model_info/ds_config.json, /home/amd00/repo/DeepSpeed/autotuning_results/profile_model_info/ds_config.json
[2025-05-03 09:41:20,485] [INFO] [scheduler.py:353:run_experiment] Scheduler wrote exp to autotuning_results/profile_model_info/exp.json, /home/amd00/repo/DeepSpeed/autotuning_results/profile_model_info/exp.json
[2025-05-03 09:41:20,487] [INFO] [scheduler.py:380:run_experiment] Launching exp_id = 0, exp_name = profile_model_info, with resource = localhost:0,1,2,3, and ds_config = /home/amd00/repo/DeepSpeed/autotuning_results/profile_model_info/ds_config.json
[2025-05-03 09:41:47,809] [INFO] [scheduler.py:432:clean_up] Done cleaning up exp_id = 0 on the following workers: localhost
[2025-05-03 09:41:47,809] [INFO] [scheduler.py:395:run_experiment] Done running exp_id = 0, exp_name = profile_model_info, with resource = localhost:0,1,2,3
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:35<00:00, 35.02s/it]
[2025-05-03 09:41:55,501] [ERROR] [autotuner.py:700:model_info_profile_run] The model is not runnable with DeepSpeed with error = (
[2025-05-03 09:41:55,502] [INFO] [runner.py:397:run_autotuning] [End] Running autotuning
(ds_ex) amd00@mz00:~/repo/DeepSpeed$
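
The error string in the autotuner log above is truncated to "(", so the log alone does not show why the profiling run failed. The scheduler logged the per-experiment directory it wrote to (autotuning_results/profile_model_info), and the launcher output for that experiment normally lands in the same place; a minimal way to pull out the real traceback, using only standard shell tools (exact file names inside the directory vary by DeepSpeed version):

ls autotuning_results/profile_model_info/
grep -rn -i -e "error" -e "traceback" autotuning_results/profile_model_info/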
ds_report output
(ds_ex) amd00@mz00:~/repo/DeepSpeed$ ds_report
/home/amd00/anaconda3/envs/ds_ex/bin/ds_report:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
__import__('pkg_resources').require('deepspeed==0.16.8+ee492c30')
[2025-05-03 09:43:50,328] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-03 09:43:50,392] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
DeepSpeed C++/CUDA extension op report
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
JIT compiled ops requires ninja
ninja .................. [OKAY]
op name ................ installed .. compatible
async_io ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
dc ..................... [NO] ....... [OKAY]
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
[WARNING] NVIDIA Inference is only supported on Ampere and newer architectures
[WARNING] FP Quantizer is using an untested triton version (3.3.0), only 2.3.(0, 1) and 3.0.0 are known to be compatible with these kernels
fp_quantizer ........... [NO] ....... [NO]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
gds .................... [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.7
[WARNING] using untested triton version (3.3.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
DeepSpeed general environment info:
torch install path ............... ['/home/amd00/anaconda3/envs/ds_ex/lib/python3.10/site-packages/torch']
torch version .................... 2.7.0+cu126
deepspeed install path ........... ['/home/amd00/repo/DeepSpeed/deepspeed']
deepspeed info ................... 0.16.8+ee492c30, ee492c3, master
torch cuda version ............... 12.6
torch hip version ................ None
nvcc version ..................... 12.6
deepspeed wheel compiled w. ...... torch 2.7, cuda 12.6
shared memory (/dev/shm) size .... 497.78 GB
(ds_ex) amd00@mz00:~/repo/DeepSpeed$
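
The [WARNING] lines in the report are most likely unrelated to the autotuning failure: evoformer_attn, fp_quantizer, and sparse_attn are optional ops that this run does not use. For completeness, the CUTLASS warning can be cleared by pointing $CUTLASS_PATH at a checkout of NVIDIA's CUTLASS repository; a sketch, with an illustrative clone location:

# clone CUTLASS and tell DeepSpeed's op builder where it lives (path is illustrative)
git clone https://github.com/NVIDIA/cutlass.git /home/amd00/repo/cutlass
export CUTLASS_PATH=/home/amd00/repo/cutlass
ds_report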
Additional context
zero3.json:
{
  "autotuning": {
    "enabled": true,
    "arg_mappings": {
      "train_micro_batch_size_per_gpu": "--per_device_train_batch_size",
      "gradient_accumulation_steps": "--gradient_accumulation_steps"
    }
  },
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": 1,
  "fp16": {
    "enabled": "auto"
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "betas": "auto",
      "eps": "auto",
      "weight_decay": "auto"
    }
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": {
      "warmup_min_lr": "auto",
      "warmup_max_lr": "auto",
      "warmup_num_steps": "auto"
    }
  },
  "activation_checkpointing": {
    "partition_activations": true,
    "cpu_checkpointing": true,
    "contiguous_memory_optimization": false,
    "synchronize_checkpoint_boundary": false
  },
  "zero_optimization": {
    "stage": 3,
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
    "overlap_comm": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "contiguous_gradients": false,
    "stage3_max_live_parameters": 1e9,
    "stage3_max_reuse_distance": 1e9,
    "stage3_prefetch_bucket_size": 5e8,
    "stage3_param_persistence_threshold": 1e6,
    "stage3_gather_16bit_weights_on_model_save": false,
    "sub_group_size": 1e12
  }
}
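
For reference, the arg_mappings block tells the autotuner which command-line flag of the training script corresponds to each DeepSpeed config key, so the scheduler can rewrite those flags when it launches each tuning experiment. A sketch of the kind of command one experiment effectively runs, with an illustrative micro-batch size of 16 and a placeholder experiment name:

deepspeed --num_nodes=1 --num_gpus=4 \
  /home/amd00/repo/transformers/examples/pytorch/language-modeling/run_clm.py \
  --model_name_or_path /home/amd00/model/gpt2 \
  --deepspeed autotuning_results/<exp_name>/ds_config.json \
  --do_train --do_eval --fp16 \
  --per_device_train_batch_size 16 --gradient_accumulation_steps 1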