forked from NVIDIA/Megatron-LM
Runtime cannot find fused_kernels/build/scaled_upper_triang_masked_softmax_cuda.so #28
Comments
It looks like the compilation failed: 1.11.1.git.kitware.jobserver-1
How can this be resolved?
Have you tried searching for this error message?
Hello, how did you determine that this line of output indicates a failure? I searched for it and it looks like normal output to me.
Hello, on further inspection I found the problem was caused by a failed apex installation; after reinstalling apex the issue was resolved.
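The symptom described in this thread (only a build.ninja in fused_kernels/build/, no .so) means the JIT compile step never produced the extension modules. A minimal diagnostic sketch, assuming the fused_kernels/build/ layout used by Megatron-LM; the first module name is from the issue title, the other two are assumed sibling kernels:

```python
import pathlib

# Fused-kernel extensions Megatron-LM JIT-compiles into fused_kernels/build/.
# Only the first name appears in this issue; the others are assumptions.
EXPECTED = (
    "scaled_upper_triang_masked_softmax_cuda",
    "scaled_masked_softmax_cuda",
    "scaled_softmax_cuda",
)

def missing_fused_kernels(build_dir):
    """Return the expected extension modules whose .so is absent from build_dir."""
    build = pathlib.Path(build_dir)
    return [name for name in EXPECTED if not (build / f"{name}.so").exists()]
```

If only build.ninja exists, every expected module is reported missing; as the thread concludes, the usual root cause is a broken apex installation rather than the build.ninja file itself.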
Hello, when running your code I hit the error shown in the title. In the corresponding folder I can only find a build.ninja file. What could be causing this? I tried deleting the entire build folder and rerunning, but the problem persists.
The full log is as follows:
WARNING:torch.distributed.run:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
using world size: 2, data-parallel-size: 1, tensor-model-parallel size: 2, pipeline-model-parallel size: 1
WARNING: overriding default arguments for tokenizer_type:GPT2BPETokenizer with tokenizer_type:PretrainedFromHF
accumulate and all-reduce gradients in fp32 for bfloat16 data type.
using torch.bfloat16 for parameters ...
Disable contiguous grad buffer in DDP, since the optimizer would handle it.
Disable gradient accumulation fusion, since the optimizer would handle it.
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. True
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.95
adam_eps ........................................ 1e-08
add_bias_linear ................................. False
add_position_embedding .......................... False
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_layernorm_1p .............................. False
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
async_tensor_model_parallel_allreduce ........... False
attention_dropout ............................... 0.0
attention_softmax_in_fp32 ....................... False
barrier_with_L1_time ............................ True
bert_binary_head ................................ True
bert_embedder_type .............................. megatron
bert_load ....................................... None
bf16 ............................................ True
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ False
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
causal_lm ....................................... True
checkpoint_dir_name ............................. None
classes_fraction ................................ 1.0
clip_grad ....................................... 1.0
consumed_train_samples .......................... 0
consumed_valid_samples .......................... 0
data_impl ....................................... mmap
data_parallel_random_init ....................... False
data_parallel_size .............................. 1
data_path ....................................... ['1', '/data1/projects/Megatron-DeepSpeed-main/my-train-data/my-gpt2_text_document']
data_per_class_fraction ......................... 1.0
data_sharding ................................... True
dataloader_type ................................. cyclic
DDP_impl ........................................ local
decoder_num_layers .............................. None
decoder_seq_length .............................. None
dino_bottleneck_size ............................ 256
dino_freeze_last_layer .......................... 1
dino_head_hidden_size ........................... 2048
dino_local_crops_number ......................... 10
dino_local_img_size ............................. 96
dino_norm_last_layer ............................ False
dino_teacher_temp ............................... 0.07
dino_warmup_teacher_temp ........................ 0.04
dino_warmup_teacher_temp_epochs ................. 30
distribute_saved_activations .................... False
distributed_backend ............................. nccl
distributed_checkpointing ....................... False
distributed_timeout_minutes ..................... 10
embedding_path .................................. None
empty_unused_memory_level ....................... 0
encoder_num_layers .............................. 40
encoder_seq_length .............................. 2048
end_weight_decay ................................ 0.1
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 10
evidence_data_path .............................. None
exit_duration_in_mins ........................... None
exit_interval ................................... None
exit_on_missing_checkpoint ...................... False
exit_signal_handler ............................. False
ffn_hidden_size ................................. 13824
finetune ........................................ True
fp16 ............................................ False
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
fp8_amax_compute_algo ........................... most_recent
fp8_amax_history_len ............................ 1
fp8_e4m3 ........................................ False
fp8_hybrid ...................................... False
fp8_interval .................................... 1
fp8_margin ...................................... 0
fp8_wgrad ....................................... True
global_batch_size ............................... 16
gradient_accumulation_fusion .................... False
head_lr_mult .................................... 1.0
hidden_dropout .................................. 0.0
hidden_size ..................................... 5120
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_h ........................................... 224
img_w ........................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
inference_batch_times_seqlen_threshold .......... 512
init_method_std ................................. 0.01
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
iter_per_epoch .................................. 1250
job_name ........................................ LLaMA_tp2_pp1_mbs2_gpus2
kv_channels ..................................... 128
last_bucket_split_count ......................... None
layernorm_epsilon ............................... 1e-06
lazy_mpu_init ................................... None
load ............................................ /data1/projects/Megatron-DeepSpeed-main/llama_33b
local_rank ...................................... None
log_batch_size_to_tensorboard ................... True
log_interval .................................... 1
log_learning_rate_to_tensorboard ................ True
log_loss_scale_to_tensorboard ................... True
log_memory_to_tensorboard ....................... False
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
log_world_size_to_tensorboard ................... False
loss_scale ...................................... None
loss_scale_window ............................... 1000
lr .............................................. 6e-05
lr_decay_iters .................................. 10
lr_decay_samples ................................ None
lr_decay_style .................................. cosine
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 5
lr_warmup_samples ............................... 0
make_vocab_size_divisible_by .................... 1
mask_factor ..................................... 1.0
mask_prob ....................................... 0.15
mask_type ....................................... random
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
max_tokens_to_oom ............................... 12000
merge_file ...................................... None
micro_batch_size ................................ 2
min_loss_scale .................................. 1.0
min_lr .......................................... 6e-06
mmap_warmup ..................................... False
no_load_optim ................................... True
no_load_rng ..................................... None
no_persist_layer_norm ........................... False
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 40
num_channels .................................... 3
num_classes ..................................... 1000
num_experts ..................................... None
num_layers ...................................... 40
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
output_bert_embeddings .......................... False
overlapped_distributed_optimizer ................ True
override_opt_param_scheduler .................... True
params_dtype .................................... torch.bfloat16
patch_dim ....................................... 16
perform_initialization .......................... True
pipeline_model_parallel_size .................... 1
pipeline_model_parallel_split_rank .............. None
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... None
rank ............................................ 0
recompute_granularity ........................... selective
recompute_method ................................ None
recompute_num_layers ............................ 1
reduce_bucket_size .............................. 200000000.0
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
retro_add_retriever ............................. False
retro_cyclic_train_iters ........................ None
retro_encoder_attention_dropout ................. 0.1
retro_encoder_hidden_dropout .................... 0.1
retro_encoder_layers ............................ 2
retro_num_neighbors ............................. 2
retro_num_retrieved_chunks ...................... 2
retro_return_doc_ids ............................ False
retro_workdir ................................... None
RMSNorm ......................................... True
rotary_percent .................................. 1.0
sample_rate ..................................... 1.0
save ............................................ /data1/projects/Megatron-DeepSpeed-main/save_models
save_interval ................................... 100
scatter_gather_tensors_in_pipeline .............. True
seed ............................................ 1234
seq_length ...................................... 2048
sequence_parallel ............................... True
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 98,2,0
squared_relu .................................... False
standalone_embedding_stage ...................... False
start_weight_decay .............................. 0.1
swiglu .......................................... True
swin_backbone_type .............................. tiny
tensor_model_parallel_size ...................... 2
tensorboard_dir ................................. /data1/projects/Megatron-DeepSpeed-main/tensorboard_data
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 1000
test_data_path .................................. None
timing_log_level ................................ 0
timing_log_option ............................... minmax
titles_data_path ................................ None
tokenizer_model ................................. None
tokenizer_name_or_path .......................... /data1/projects/Megatron-DeepSpeed-main/33B_tokenizer
tokenizer_type .................................. PretrainedFromHF
train_data_path ................................. None
train_iters ..................................... 1000
train_samples ................................... None
transformer_impl ................................ local
transformer_pipeline_model_parallel_size ........ 1
untie_embeddings_and_output_weights ............. False
use_checkpoint_args ............................. False
use_checkpoint_opt_param_scheduler .............. False
use_contiguous_buffers_in_local_ddp ............. False
use_cpu_initialization .......................... None
use_distributed_optimizer ....................... False
use_flash_attn .................................. True
use_one_sent_docs ............................... False
use_ring_exchange_p2p ........................... False
use_rotary_position_embeddings .................. True
valid_data_path ................................. None
variable_seq_lengths ............................ False
verify_grad_order ............................... False
virtual_pipeline_model_parallel_size ............ None
vision_backbone_type ............................ vit
vision_pretraining .............................. False
vision_pretraining_type ......................... classify
vocab_extra_ids ................................. 0
vocab_file ...................................... None
vocab_size ...................................... None
weight_decay .................................... 0.1
weight_decay_incr_style ......................... constant
world_size ...................................... 2
-------------------- end of arguments ---------------------
setting number of micro-batches to constant 8