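Hello. Prediction with the pretrained model works fine, but when I train the table structure recognition model I get a Segmentation fault error.
Training command: python3 tools/train.py -c configs/table/table_mv3.yml
GPU environment: single card, cuda=10.1 cudnn=7.6
Paddle versions: paddleocr=2.3.0.2 paddlepaddle-gpu=2.1.2.post101
Changes to table_mv3.yml: only max_len : 800 and num_workers: 0 were modified; everything else keeps the default configuration.
Full configuration: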
[2021/12/16 10:25:49] root INFO: Architecture :
[2021/12/16 10:25:49] root INFO: Backbone :
[2021/12/16 10:25:49] root INFO: disable_se : True
[2021/12/16 10:25:49] root INFO: model_name : small
[2021/12/16 10:25:49] root INFO: name : MobileNetV3
[2021/12/16 10:25:49] root INFO: scale : 1.0
[2021/12/16 10:25:49] root INFO: Head :
[2021/12/16 10:25:49] root INFO: hidden_size : 256
[2021/12/16 10:25:49] root INFO: l2_decay : 1e-05
[2021/12/16 10:25:49] root INFO: loc_type : 2
[2021/12/16 10:25:49] root INFO: name : TableAttentionHead
[2021/12/16 10:25:49] root INFO: algorithm : TableAttn
[2021/12/16 10:25:49] root INFO: model_type : table
[2021/12/16 10:25:49] root INFO: Eval :
[2021/12/16 10:25:49] root INFO: dataset :
[2021/12/16 10:25:49] root INFO: data_dir : dataset/PubTabNet/images/val/
[2021/12/16 10:25:49] root INFO: label_file_path : dataset/PubTabNet/annotations/PubTabNet_2.0.0_val.jsonl
[2021/12/16 10:25:49] root INFO: name : PubTabDataSet
[2021/12/16 10:25:49] root INFO: transforms :
[2021/12/16 10:25:49] root INFO: DecodeImage :
[2021/12/16 10:25:49] root INFO: channel_first : False
[2021/12/16 10:25:49] root INFO: img_mode : BGR
[2021/12/16 10:25:49] root INFO: ResizeTableImage :
[2021/12/16 10:25:49] root INFO: max_len : 800
[2021/12/16 10:25:49] root INFO: TableLabelEncode : None
[2021/12/16 10:25:49] root INFO: NormalizeImage :
[2021/12/16 10:25:49] root INFO: mean : [0.485, 0.456, 0.406]
[2021/12/16 10:25:49] root INFO: order : hwc
[2021/12/16 10:25:49] root INFO: scale : 1./255.
[2021/12/16 10:25:49] root INFO: std : [0.229, 0.224, 0.225]
[2021/12/16 10:25:49] root INFO: PaddingTableImage : None
[2021/12/16 10:25:49] root INFO: ToCHWImage : None
[2021/12/16 10:25:49] root INFO: KeepKeys :
[2021/12/16 10:25:49] root INFO: keep_keys : ['image', 'structure', 'bbox_list', 'sp_tokens', 'bbox_list_mask']
[2021/12/16 10:25:49] root INFO: loader :
[2021/12/16 10:25:49] root INFO: batch_size_per_card : 8
[2021/12/16 10:25:49] root INFO: drop_last : False
[2021/12/16 10:25:49] root INFO: num_workers : 0
[2021/12/16 10:25:49] root INFO: shuffle : False
[2021/12/16 10:25:49] root INFO: Global :
[2021/12/16 10:25:49] root INFO: cal_metric_during_train : True
[2021/12/16 10:25:49] root INFO: character_dict_path : ppocr/utils/dict/table_structure_dict.txt
[2021/12/16 10:25:49] root INFO: character_type : en
[2021/12/16 10:25:49] root INFO: checkpoints : None
[2021/12/16 10:25:49] root INFO: debug : False
[2021/12/16 10:25:49] root INFO: distributed : False
[2021/12/16 10:25:49] root INFO: epoch_num : 50
[2021/12/16 10:25:49] root INFO: eval_batch_step : [0, 800]
[2021/12/16 10:25:49] root INFO: infer_img : doc/imgs_words/ch/word_1.jpg
[2021/12/16 10:25:49] root INFO: infer_mode : False
[2021/12/16 10:25:49] root INFO: log_smooth_window : 20
[2021/12/16 10:25:49] root INFO: max_cell_num : 500
[2021/12/16 10:25:49] root INFO: max_elem_length : 500
[2021/12/16 10:25:49] root INFO: max_text_length : 100
[2021/12/16 10:25:49] root INFO: pretrained_model : None
[2021/12/16 10:25:49] root INFO: print_batch_step : 5
[2021/12/16 10:25:49] root INFO: process_cut_num : 0
[2021/12/16 10:25:49] root INFO: process_total_num : 0
[2021/12/16 10:25:49] root INFO: save_epoch_step : 5
[2021/12/16 10:25:49] root INFO: save_inference_dir : None
[2021/12/16 10:25:49] root INFO: save_model_dir : ./output/table_mv3_pubtabnet/
[2021/12/16 10:25:49] root INFO: use_gpu : True
[2021/12/16 10:25:49] root INFO: use_visualdl : False
[2021/12/16 10:25:49] root INFO: Loss :
[2021/12/16 10:25:49] root INFO: loc_weight : 10000.0
[2021/12/16 10:25:49] root INFO: name : TableAttentionLoss
[2021/12/16 10:25:49] root INFO: structure_weight : 100.0
[2021/12/16 10:25:49] root INFO: Metric :
[2021/12/16 10:25:49] root INFO: main_indicator : acc
[2021/12/16 10:25:49] root INFO: name : TableMetric
[2021/12/16 10:25:49] root INFO: Optimizer :
[2021/12/16 10:25:49] root INFO: beta1 : 0.9
[2021/12/16 10:25:49] root INFO: beta2 : 0.999
[2021/12/16 10:25:49] root INFO: clip_norm : 5.0
[2021/12/16 10:25:49] root INFO: lr :
[2021/12/16 10:25:49] root INFO: learning_rate : 0.001
[2021/12/16 10:25:49] root INFO: name : Adam
[2021/12/16 10:25:49] root INFO: regularizer :
[2021/12/16 10:25:49] root INFO: factor : 0.0
[2021/12/16 10:25:49] root INFO: name : L2
[2021/12/16 10:25:49] root INFO: PostProcess :
[2021/12/16 10:25:49] root INFO: name : TableLabelDecode
[2021/12/16 10:25:49] root INFO: Train :
[2021/12/16 10:25:49] root INFO: dataset :
[2021/12/16 10:25:49] root INFO: data_dir : dataset/PubTabNet/images/train/
[2021/12/16 10:25:49] root INFO: label_file_path : dataset/PubTabNet/annotations/PubTabNet_2.0.0_train.jsonl
[2021/12/16 10:25:49] root INFO: name : PubTabDataSet
[2021/12/16 10:25:49] root INFO: transforms :
[2021/12/16 10:25:49] root INFO: DecodeImage :
[2021/12/16 10:25:49] root INFO: channel_first : False
[2021/12/16 10:25:49] root INFO: img_mode : BGR
[2021/12/16 10:25:49] root INFO: ResizeTableImage :
[2021/12/16 10:25:49] root INFO: max_len : 800
[2021/12/16 10:25:49] root INFO: TableLabelEncode : None
[2021/12/16 10:25:49] root INFO: NormalizeImage :
[2021/12/16 10:25:49] root INFO: mean : [0.485, 0.456, 0.406]
[2021/12/16 10:25:49] root INFO: order : hwc
[2021/12/16 10:25:49] root INFO: scale : 1./255.
[2021/12/16 10:25:49] root INFO: std : [0.229, 0.224, 0.225]
[2021/12/16 10:25:49] root INFO: PaddingTableImage : None
[2021/12/16 10:25:49] root INFO: ToCHWImage : None
[2021/12/16 10:25:49] root INFO: KeepKeys :
[2021/12/16 10:25:49] root INFO: keep_keys : ['image', 'structure', 'bbox_list', 'sp_tokens', 'bbox_list_mask']
[2021/12/16 10:25:49] root INFO: loader :
[2021/12/16 10:25:49] root INFO: batch_size_per_card : 4
[2021/12/16 10:25:49] root INFO: drop_last : True
[2021/12/16 10:25:49] root INFO: num_workers : 0
[2021/12/16 10:25:49] root INFO: shuffle : True
Error log:
[2021/12/15 11:25:48] root INFO: train with paddle 2.1.2 and device CUDAPlace(0)
[2021/12/15 11:25:48] root INFO: Initialize indexs of datasets:dataset/PubTabNet/annotations/PubTabNet_2.0.0_test.jsonl
[2021/12/15 11:25:48] root INFO: Initialize indexs of datasets:dataset/PubTabNet/annotations/PubTabNet_2.0.0_test.jsonl
W1215 11:25:48.538619 168238 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W1215 11:25:48.543943 168238 device_context.cc:422] device: 0, cuDNN Version: 7.6.
[2021/12/15 11:26:04] root INFO: train dataloader has 6 iters
[2021/12/15 11:26:04] root INFO: valid dataloader has 13 iters
[2021/12/15 11:26:04] root INFO: During the training process, after the 0th iteration, an evaluation is run every 5 iterations
[2021/12/15 11:26:04] root INFO: Initialize indexs of datasets:dataset/PubTabNet/annotations/PubTabNet_2.0.0_test.jsonl
[ERROR] 2021-12-15T04:46:31.567873Z, 168330, "Cannot create UVM block on server"
W1215 12:46:31.567991 168330 system_allocator.cc:205] cudaHostAlloc failed.
W1215 12:46:31.568040 168330 naive_best_fit_allocator.cc:519] cudaHostAlloc Cannot allocate 122880000 bytes in CUDAPinnedPlace
--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0 std::thread::_Impl<std::_Bind_simple<ThreadPool::ThreadPool(unsigned long)::{lambda()#1} ()> >::_M_run()
1 std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*)
2 paddle::framework::SignalHandle(char const*, int)
3 paddle::platform::GetCurrentTraceBackString[abi:cxx11]()
----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
[TimeInfo: *** Aborted at 1639543591 (unix time) try "date -d @1639543591" if you are using GNU date ***]
[SignalInfo: *** SIGSEGV (@0x0) received by PID 168238 (TID 0x7fc05d8c5700) from PID 0 ***]
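As a rough sanity check on the failed pinned-memory request (122880000 bytes in the cudaHostAlloc warning), the size can be compared with a batch of padded images built from the posted config. This is only a sketch: the 800 x 800 padded image shape and the 8-byte element size are assumptions, not something the log confirms, and 16 images at 4 bytes per element would give the same total.

```python
# Size of the pinned-memory request reported by cudaHostAlloc in the log.
requested_bytes = 122_880_000

# Values taken from the posted config.
max_len = 800          # ResizeTableImage.max_len
channels = 3           # NormalizeImage.mean has three entries
eval_batch_size = 8    # Eval.loader.batch_size_per_card

# ASSUMPTION: every image is padded to max_len x max_len and stored with
# 8 bytes per element. The log does not state the buffer's dtype or shape.
bytes_per_element = 8

elements_per_image = channels * max_len * max_len
batch_bytes = eval_batch_size * elements_per_image * bytes_per_element

print(batch_bytes == requested_bytes)   # True
print(requested_bytes / 2**20)          # 117.1875 (MiB)
```

If the assumption holds, the request is about 117 MiB, roughly one batch of data, so the allocation most likely failed because earlier memory was never released rather than because a single request was unusually large.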
I ran into a similar problem. Reinstalling the Paddle environment with the following command fixed it:
conda install paddlepaddle-gpu==2.2.1 cudatoolkit=10.1 --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
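I tried upgrading paddlepaddle-gpu to the version you suggested, but the same problem persists. In addition, watching memory with top shows that RES is at 3.2G while VIRT keeps growing; once it passes 100T, the error Cannot create UVM block on server appears. I suspect the training code for this table recognition model has a memory leak.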
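To record the same RES/VIRT numbers from inside the training process instead of watching top, something like the following could be called every few iterations. It is a minimal, Linux-only sketch that reads /proc/<pid>/status; the helper names parse_proc_status and read_memory_kb are hypothetical and not part of PaddleOCR.

```python
import os
import re


def parse_proc_status(text):
    """Extract VmSize (VIRT in top) and VmRSS (RES in top), in kB,
    from the contents of a Linux /proc/<pid>/status file."""
    fields = {}
    for key in ("VmSize", "VmRSS"):
        match = re.search(rf"^{key}:\s+(\d+)\s+kB", text, flags=re.MULTILINE)
        fields[key] = int(match.group(1)) if match else None
    return fields


def read_memory_kb(pid=None):
    """Read VmSize/VmRSS for the given pid (default: the current process)."""
    pid = os.getpid() if pid is None else pid
    with open(f"/proc/{pid}/status") as status_file:
        return parse_proc_status(status_file.read())


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    # Hypothetical usage: log this every N training iterations and check
    # whether VmSize grows without bound while VmRSS stays roughly flat.
    print(read_memory_kb())
```

Logging these values next to the iteration number would show whether the growth is steady per batch or jumps at evaluation time, which narrows down where the suspected leak is.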