Segmentation fault (core dumped) #57

chenyangMl · 2020-05-18T09:51:38Z

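Hello, I followed the detection.md document to train the OCR model and ran into the following problem when running the training command, so I would like to ask for help.
The problem occurs both when I select the GPU via FLAGS_selected_gpus and when I change the line in train.py to place = fluid.CUDAPlace(3) if use_gpu else fluid.CPUPlace().
Running with use_gpu=False also produces the problem below.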
FLAGS_selected_gpus=3 && python3 tools/train.py -c configs/det/det_mv3_db.yml -o Optimizer.base_lr=0.0001
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
2020-05-18 09:29:29,797-INFO: {'Architecture': {'function': 'ppocr.modeling.architectures.det_model,DetModel'}, 'TestReader': {'reader_function': 'ppocr.data.det.dataset_traversal,EvalTestReader', 'single_img_path': None, 'img_set_dir': './train_data/icdar2015/text_localization/', 'label_file_path': './train_data/icdar2015/text_localization/test_icdar2015_label.txt', 'test_image_shape': [736, 1280], 'process_function': 'ppocr.data.det.db_process,DBProcessTest', 'do_eval': True}, 'Backbone': {'function': 'ppocr.modeling.backbones.det_mobilenet_v3,MobileNetV3', 'model_name': 'large', 'scale': 0.5}, 'Head': {'k': 50, 'function': 'ppocr.modeling.heads.det_db_head,DBHead', 'inner_channels': 96, 'model_name': 'large', 'out_channels': 2}, 'Optimizer': {'beta1': 0.9, 'function': 'ppocr.optimizer,AdamDecay', 'base_lr': 0.0001, 'beta2': 0.999}, 'EvalReader': {'test_image_shape': [736, 1280], 'reader_function': 'ppocr.data.det.dataset_traversal,EvalTestReader', 'img_set_dir': './train_data/icdar2015/text_localization/', 'process_function': 'ppocr.data.det.db_process,DBProcessTest', 'label_file_path': './train_data/icdar2015/text_localization/test_icdar2015_label.txt'}, 'Loss': {'function': 'ppocr.modeling.losses.det_db_loss,DBLoss', 'balance_loss': True, 'beta': 10, 'alpha': 5, 'ohem_ratio': 3, 'main_loss_type': 'DiceLoss'}, 'TrainReader': {'reader_function': 'ppocr.data.det.dataset_traversal,TrainReader', 'num_workers': 8, 'img_set_dir': './train_data/icdar2015/text_localization/', 'process_function': 'ppocr.data.det.db_process,DBProcessTrain', 'label_file_path': './train_data/icdar2015/text_localization/train_icdar2015_label.txt'}, 'PostProcess': {'unclip_ratio': 1.5, 'max_candidates': 1000, 'function': 'ppocr.postprocess.db_postprocess,DBPostProcess', 'thresh': 0.3, 'box_thresh': 0.7}, 'Global': {'save_epoch_step': 200, 'save_inference_dir': None, 'eval_batch_step': 5000, 'log_smooth_window': 20, 'algorithm': 'DB', 'epoch_num': 1200, 'use_gpu': True, 
'train_batch_size_per_card': 16, 'image_shape': [3, 640, 640], 'save_model_dir': './output/det_db/', 'save_res_path': './output/det_db/predicts_db.txt', 'checkpoints': None, 'pretrain_weights': './pretrain_models/MobileNetV3_large_x0_5_pretrained/', 'test_batch_size_per_card': 16, 'reader_yml': './configs/det/det_db_icdar15_reader.yml', 'print_batch_step': 2}}
3 640 640
3 640 640
import ujson error: No module named 'ujson' use json
2020-05-18 09:29:33,067-INFO: places would be ommited when DataLoader is not iterable
W0518 09:29:33.928460 5607 device_context.cc:237] Please NOTE: device: 3, CUDA Capability: 75, Driver API Version: 10.2, Runtime API Version: 10.0
W0518 09:29:33.932370 5607 device_context.cc:245] device: 3, cuDNN Version: 7.5.
W0518 09:29:33.932396 5607 device_context.cc:271] WARNING: device: 3. The installed Paddle is compiled with CUDNN 7.6, but CUDNN version in your machine is 7.5, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDNN version.
2020-05-18 09:29:35,015-INFO: Loading parameters from ./pretrain_models/MobileNetV3_large_x0_5_pretrained/...
2020-05-18 09:29:35,015-WARNING: ./pretrain_models/MobileNetV3_large_x0_5_pretrained/.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
2020-05-18 09:29:35,015-WARNING: ./pretrain_models/MobileNetV3_large_x0_5_pretrained/.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
2020-05-18 09:29:35,206-INFO: Finish initing model from ./pretrain_models/MobileNetV3_large_x0_5_pretrained/
I0518 09:29:35.251972 5607 parallel_executor.cc:440] The Program will be executed on CUDA using ParallelExecutor, 8 cards are used, so 8 programs are executed in parallel.
W0518 09:29:48.223284 5607 init.cc:209] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0518 09:29:48.223331 5607 init.cc:211] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0518 09:29:48.223345 5607 init.cc:214] The detail failure signal is:

LDOUBLEV · 2020-05-18T10:00:48Z

Change place = fluid.CUDAPlace(3) to place = fluid.CUDAPlace(0). For now, to switch to a different GPU, we recommend using:

export CUDA_VISIBLE_DEVICES=3
python3 tools/train.py -c
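
A minimal sketch of the renumbering behind this advice: once CUDA_VISIBLE_DEVICES=3 is exported, the process sees only that one GPU, and it is addressed as device 0, which is why CUDAPlace(0) is the matching index. The helper name logical_gpu_index is hypothetical and not part of PaddleOCR or Paddle; it assumes CUDA_VISIBLE_DEVICES holds plain integer ids rather than UUIDs.

```python
import os


def logical_gpu_index(physical_id, visible_devices=None):
    """Map a physical GPU id to the index the process sees.

    visible_devices mimics the CUDA_VISIBLE_DEVICES value, e.g. "3" or "2,3".
    If it is None, the value is read from the environment; if the variable
    is unset, every GPU is visible and the index is unchanged.
    """
    if visible_devices is None:
        visible_devices = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible_devices is None:
        return physical_id
    # Visible devices are renumbered 0, 1, 2, ... in the order listed.
    ids = [int(tok) for tok in visible_devices.split(",") if tok.strip()]
    if physical_id not in ids:
        raise ValueError(
            "GPU %d is hidden by CUDA_VISIBLE_DEVICES=%r" % (physical_id, visible_devices)
        )
    return ids.index(physical_id)


print(logical_gpu_index(3, "3"))    # 0
print(logical_gpu_index(3, "2,3"))  # 1
```

Separately, in a POSIX shell the form VAR=3 && command assigns a shell variable without exporting it (unless it was already exported), so the Python process may never see the value; export VAR=3 or the prefix form VAR=3 command passes it through.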

chenyangMl · 2020-05-18T10:08:25Z

CUDA_VISIBLE_DEVICES=3

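Thanks a lot, it runs now. I suggest updating the documentation: it currently says that for multi-GPU jobs you should first use the FLAGS_selected_gpus environment variable to set the visible GPU devices, and that the next version will fix the issue of the CUDA_VISIBLE_DEVICES environment variable having no effect.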

LDOUBLEV · 2020-05-18T10:44:35Z

Got it, thanks for the feedback.

chenyangMl · 2020-05-18T13:39:59Z

Hello, the following error occurred during training. How can I fix it?
2020-05-18 12:07:49,601-INFO: epoch: 156, iter: 4996, 'total_loss': 2.304114, 'loss_threshold_maps': 0.762533, 'loss_shrink_maps': 1.291126, 'lr': 1e-04, 'loss_binary_maps': 0.249221, time: 1.161
2020-05-18 12:07:51,780-INFO: epoch: 156, iter: 4998, 'total_loss': 2.336653, 'loss_threshold_maps': 0.762533, 'loss_shrink_maps': 1.323457, 'lr': 1e-04, 'loss_binary_maps': 0.251576, time: 1.072
2020-05-18 12:07:53,733-INFO: epoch: 156, iter: 5000, 'total_loss': 2.266071, 'loss_threshold_maps': 0.755013, 'loss_shrink_maps': 1.292108, 'lr': 1e-04, 'loss_binary_maps': 0.249244, time: 1.006
Traceback (most recent call last):
  File "tools/train.py", line 112, in <module>
    main()
  File "tools/train.py", line 104, in main
    program.train_eval_det_run(config, exe, train_info_dict, eval_info_dict)
  File "/paddle/PaddleOCR/tools/program.py", line 256, in train_eval_det_run
    metrics = eval_det_run(exe, config, eval_info_dict, "eval")
  File "/paddle/PaddleOCR/tools/eval_utils/eval_det_utils.py", line 126, in eval_det_run
    cal_det_res(exe, config, eval_info_dict)
  File "/paddle/PaddleOCR/tools/eval_utils/eval_det_utils.py", line 48, in cal_det_res
    for data in eval_info_dict['reader']():
  File "/paddle/PaddleOCR/ppocr/data/det/dataset_traversal.py", line 93, in batch_iter_reader
    img = cv2.imread(img_path)
TypeError: bad argument type for built-in operation

tink2123 · 2020-05-19T06:56:19Z

This was an error in reading data during eval while training. It has been fixed; please update to the latest code.
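
The thread does not state the root cause or show the actual fix. As a hedged illustration only: this TypeError is typical of an image loader such as cv2.imread receiving a value that is not a string (for example None). The sketch below shows one way a reader could validate a path before loading it; check_image_path is a hypothetical helper and is not taken from the PaddleOCR code.

```python
def check_image_path(img_path):
    """Validate a path before it is handed to an image loader.

    Hypothetical helper for illustration; not part of PaddleOCR.
    """
    # Label files read in binary mode yield bytes; decode them to text.
    if isinstance(img_path, bytes):
        img_path = img_path.decode("utf-8")
    # None or any other non-string value gets a descriptive error here
    # instead of an opaque failure inside the loader.
    if not isinstance(img_path, str):
        raise TypeError(
            "image path must be a string, got " + type(img_path).__name__
        )
    if not img_path.strip():
        raise ValueError("image path is empty")
    return img_path
```

Calling such a check just before the loader turns a missing or malformed path into an error that names the offending value type, which makes a misconfigured reader easier to spot.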

LDOUBLEV closed this as completed May 18, 2020

adigest mentioned this issue Oct 15, 2020

Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR) #944

Closed
