bvar is busy at sampling for 2 seconds #2701

Closed
pengfan7758258 opened this issue Jul 1, 2022 · 8 comments

@pengfan7758258

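Version and environment information
1) PaddleNLP and PaddlePaddle versions: paddlenlp 2.3.3, paddlepaddle-gpu 2.3.0
2) System environment: Linux (Ubuntu), Python 3.8.13

I am fine-tuning UIE.
The command I ran is copied from the example in the official documentation: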

python -u -m paddle.distributed.launch --gpus "0" finetune.py \
  --train_path ./data/train.txt \
  --dev_path ./data/dev.txt \
  --save_dir ./checkpoint \
  --learning_rate 1e-5 \
  --batch_size 16 \
  --max_seq_len 512 \
  --num_epochs 100 \
  --model uie-base \
  --seed 1000 \
  --logging_steps 10 \
  --valid_steps 100 \
  --device gpu

Log from the run:
[image: log]

@LemonNoel
Contributor

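Are there any other error messages in the log/workerlog.0 file in the current directory?

If not, you can run the commands below to check whether paddle was installed successfully.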

import paddle
paddle.utils.run_check()

Alternatively, upgrade paddlenlp to the latest version: pip install paddlenlp==2.3.4
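
For anyone checking log/workerlog.0 by hand, a minimal sketch of a scan for suspicious lines might look like the following. The marker list and the helper names `find_error_lines` and `scan_worker_log` are assumptions made for illustration, not part of PaddleNLP or PaddlePaddle.

```python
from pathlib import Path

# Assumed markers; adjust to whatever your log actually contains.
ERROR_MARKERS = ("error", "traceback", "nccl", "fatal")


def find_error_lines(log_text, markers=ERROR_MARKERS):
    """Return (line_number, line) pairs whose text contains any marker (case-insensitive)."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        lowered = line.lower()
        if any(marker in lowered for marker in markers):
            hits.append((number, line.rstrip()))
    return hits


def scan_worker_log(path="log/workerlog.0"):
    """Scan the worker log at `path`; return an empty list if the file does not exist."""
    log_path = Path(path)
    if not log_path.exists():
        return []
    return find_error_lines(log_path.read_text(errors="replace"))


if __name__ == "__main__":
    for number, line in scan_worker_log():
        print(f"{number}: {line}")
```

Note that the "bvar is busy at sampling" message itself does not match any of these markers, so the scan only surfaces other lines that may point at the underlying failure.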

@pengfan7758258
Author

@LemonNoel, the output is as follows:
[image: log]

@pengfan7758258
Author

@LemonNoel One more detail: when I ran the fine-tuning earlier, the memory of the GPU assigned for training was already in use.
[image: log]
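
To confirm how much memory is already taken on each GPU before launching training, something like the sketch below could be used. It assumes `nvidia-smi` is on the PATH; the helper names `parse_gpu_memory` and `query_gpu_memory` are hypothetical, and rows that do not parse as integers (for example `[N/A]` on some devices) are skipped.

```python
import subprocess

# nvidia-smi query that prints one "index, used, total" row per GPU, in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse the CSV rows produced by QUERY into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 3:
            continue
        try:
            index, used, total = (int(field) for field in fields)
        except ValueError:
            continue  # e.g. "[N/A]" values
        rows.append({"index": index, "used_mib": used, "total_mib": total})
    return rows


def query_gpu_memory():
    """Run nvidia-smi and return parsed rows, or an empty list if it is unavailable."""
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    for row in query_gpu_memory():
        print(f"GPU {row['index']}: {row['used_mib']} / {row['total_mib']} MiB used")
```

If the GPU passed to `--gpus` already shows most of its memory in use, picking a free device or freeing the memory first is worth trying before looking further.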

@LemonNoel
Contributor

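> @LemonNoel, the output is as follows: [image: log]

It looks like there is a problem with the NCCL installation. Try installing paddlepaddle-gpu with conda, then run the check again to see whether the multi-GPU installation succeeds.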

@pengfan7758258
Author

@LemonNoel
Does NCCL need to be installed separately? I created a new conda virtual environment and reinstalled paddlepaddle-gpu, but I still get the same error.

@LemonNoel
Contributor

Yes, NCCL needs to be reinstalled. See NVIDIA's official documentation at https://docs.nvidia.com/deeplearning/nccl/install-guide/index.html , or try installing it with conda: https://libraries.io/conda/nccl
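
After reinstalling, a rough way to see whether an NCCL shared library can be loaded from Python is sketched below. It assumes a Linux system where the library is named `libnccl.so.2` or `libnccl.so`, and it calls NCCL's `ncclGetVersion` through `ctypes`; the helper name `nccl_version_code` is made up for this example. A `None` result only means this lookup did not find the library, not that PaddlePaddle cannot find it through its own search paths.

```python
import ctypes
import ctypes.util


def nccl_version_code():
    """Return NCCL's integer version code, or None if no NCCL library could be loaded."""
    # Candidate names: whatever the system linker reports, then common sonames (assumed).
    candidates = [ctypes.util.find_library("nccl"), "libnccl.so.2", "libnccl.so"]
    for name in candidates:
        if not name:
            continue
        try:
            lib = ctypes.CDLL(name)
            version = ctypes.c_int(0)
            # ncclGetVersion(int* version) returns 0 (ncclSuccess) on success.
            if lib.ncclGetVersion(ctypes.byref(version)) == 0:
                return version.value
        except (OSError, AttributeError):
            continue  # library missing, unloadable, or symbol not exported
    return None


if __name__ == "__main__":
    code = nccl_version_code()
    if code is None:
        print("NCCL library not found on the default search path")
    else:
        print(f"NCCL version code: {code}")
```

When NCCL is installed into a conda environment, the library directory may need to be on `LD_LIBRARY_PATH` for this check to find it.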

@github-actions

github-actions bot commented Dec 9, 2022

This issue is stale because it has been open for 60 days with no activity.

@github-actions github-actions bot added the stale label Dec 9, 2022
@github-actions

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as not planned Dec 23, 2022