Text detection training exits immediately after "Initialize indexs of datasets:['./train_data/train_Label.txt']" #2184
Comments
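Try setting num_workers to 0 and reducing the batch size, then see whether the problem persists.
After reducing the batch size, training started successfully. Thanks.
After I set the batch size to 1 it ran successfully, but training was interrupted again shortly afterwards. What is going on?
Follow-up: after several tests, with the batch size set to 2 training reaches iter 290 of the first epoch; with the batch size set to 1 it reaches iter 590.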
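One way to read these two numbers, offered as an observation rather than a diagnosis: multiplying the batch size by the last iteration reached gives roughly the same number of samples in both runs. The short sketch below only does that arithmetic; the function name `samples_seen` is made up for illustration, and the sketch does not identify the cause of the interruption.

```python
def samples_seen(batch_size, last_iter):
    """Approximate number of training samples consumed before the run stopped."""
    return batch_size * last_iter

# (batch size, last iteration reached) as reported in the comment above
runs = [(2, 290), (1, 590)]

for batch_size, last_iter in runs:
    print(f"batch_size={batch_size}: ~{samples_seen(batch_size, last_iter)} samples")
```

Both runs stop after a similar number of samples (580 and 590), which may be worth mentioning when reporting the problem.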
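The eval batch size and num_workers also need to be adjusted. In addition, because the model contains batch norm, a training batch size below 16 is not recommended, otherwise accuracy may suffer. num_workers can be set to 0.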
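The adjustment suggested above can be sketched as a small helper that edits an already loaded config dictionary. This is a minimal sketch with assumptions: the key names `Train`/`Eval`, `loader`, `batch_size_per_card` and `num_workers` are assumed to match the layout of the YAML config used for training, and `apply_debug_loader_settings` is a hypothetical helper, not part of PaddleOCR.

```python
import copy

def apply_debug_loader_settings(config, train_batch_size=4, eval_batch_size=1):
    """Return a copy of `config` with single-process data loading and small batches.

    Key names are assumptions about the config layout; adjust them to your file.
    """
    cfg = copy.deepcopy(config)  # leave the caller's config untouched
    for section, batch_size in (("Train", train_batch_size), ("Eval", eval_batch_size)):
        loader = cfg.setdefault(section, {}).setdefault("loader", {})
        loader["num_workers"] = 0  # load data in the main process
        loader["batch_size_per_card"] = batch_size
    return cfg

# Example config fragment (values are illustrative only)
config = {
    "Train": {"loader": {"shuffle": True, "batch_size_per_card": 16, "num_workers": 8}},
    "Eval": {"loader": {"shuffle": False, "batch_size_per_card": 1, "num_workers": 8}},
}

debug_config = apply_debug_loader_settings(config)
print(debug_config["Train"]["loader"])
print(debug_config["Eval"]["loader"])
```

A small training batch size is only meant for checking whether the run starts; as noted above, values below 16 may hurt accuracy because of batch norm.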
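With the batch size set to 16 I am back to the original problem: the program simply exits. A 1060 should be able to handle this. Also, isn't there a comment saying the eval batch size must be 1?
I have the same problem: training stops abruptly. After [2021/03/31 15:35:12] root INFO: train with paddle 2.0.1 and device CPUPlace it just stops.
@Xu-xunshan if idx >= len(train_dataloader):
OK. Other users have probably run into this problem before as well; we will make a note of it. Thanks for the feedback.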
[2021/03/06 20:22:25] root INFO: shuffle : True
[2021/03/06 20:22:25] root INFO: use_shared_memory : False
[2021/03/06 20:22:25] root INFO: train with paddle 2.0.0 and device CUDAPlace(0)
[2021/03/06 20:22:25] root INFO: Initialize indexs of datasets:['./train_data/train_Label.txt']
[2021/03/06 20:22:25] root INFO: Initialize indexs of datasets:['./train_data/test_Label.txt']
W0306 20:22:25.196138 4376 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 11.2, Runtime API Version: 11.0
W0306 20:22:25.205113 4376 device_context.cc:372] device: 0, cuDNN Version: 8.0.
[2021/03/06 20:22:27] root INFO: load pretrained model from ['./pretrain_models/MobileNetV3_large_x0_5_pretrained']
[2021/03/06 20:22:27] root INFO: train dataloader has 60 iters, valid dataloader has 100 iters
[2021/03/06 20:22:27] root INFO: During the training process, after the 0th iteration, an evaluation is run every 2000 iterations
[2021/03/06 20:22:27] root INFO: Initialize indexs of datasets:['./train_data/train_Label.txt']
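At this point the program hangs briefly and then exits.
I watched the GPU: memory utilization looks normal, but CUDA utilization shows only a momentary spike.
I'm a beginner; any help would be appreciated.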