Text detection training keeps raising errors, although the data files look fine #12081

Closed
Ghaz1i opened this issue May 9, 2024 · 9 comments

@Ghaz1i

Ghaz1i commented May 9, 2024

Please provide the following information to quickly locate the problem

  • System Environment: Win10
  • Version: Paddle: 2.7, PaddleOCR:
  • Related components:
  • Command Code: python tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml
  • Complete Error Message:
    [2024/05/09 14:50:45] ppocr ERROR: When parsing line
    , error happened with msg: Traceback (most recent call last):
    File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
    label = substr[1]
    IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\14.jpg [{"transcription": "0028.0", "points": [[467, 540], [1062, 546], [1061, 706], [466, 699]], "difficult": false}]
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 161, in __getitem__
data["ext_data"] = self.get_ext_data()
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 127, in get_ext_data
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\07.jpg [{"transcription": "0010.0", "points": [[500, 343], [1008, 339], [1009, 478], [501, 482]], "difficult": false}]
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 161, in __getitem__
data["ext_data"] = self.get_ext_data()
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 127, in get_ext_data
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\30.jpg [{"transcription": "0040.0", "points": [[579, 327], [1091, 318], [1093, 458], [581, 466]], "difficult": false}]
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 161, in __getitem__
data["ext_data"] = self.get_ext_data()
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 127, in get_ext_data
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\23.jpg [{"transcription": "0037.0", "points": [[505, 216], [1156, 216], [1156, 402], [505, 402]], "difficult": false}]
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 161, in __getitem__
data["ext_data"] = self.get_ext_data()
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 127, in get_ext_data
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\02.jpg [{"transcription": "0027.0", "points": [[521, 233], [1059, 242], [1057, 389], [519, 380]], "difficult": false}]
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 161, in __getitem__
data["ext_data"] = self.get_ext_data()
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 127, in get_ext_data
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:50:45] ppocr ERROR: When parsing line G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\18.jpg [{"transcription": "0017.0", "points": [[457, 199], [1109, 207], [1106, 405], [455, 396]], "difficult": false}]
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 161, in __getitem__
data["ext_data"] = self.get_ext_data()
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 127, in get_ext_data
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:51:09] ppocr ERROR: When parsing line G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\14.jpg [{"transcription": "0028.0", "points": [[467, 540], [1062, 546], [1061, 706], [466, 699]], "difficult": false}]
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 161, in __getitem__
data["ext_data"] = self.get_ext_data()
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 127, in get_ext_data
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:51:30] ppocr INFO: epoch: [1/1200], global_step: 2, lr: 0.000014, dml_thrink_maps_0: 0.901465, loss: 20.252010, DBLoss_Student_loss_shrink_maps: 4.710961, DBLoss_Student_loss_threshold_maps: 4.071359, DBLoss_Student_loss_binary_maps: 0.931676, DBLoss_Student_loss_cbn: 0.000000, DBLoss_Student2_loss_shrink_maps: 4.662602, DBLoss_Student2_loss_threshold_maps: 4.037912, DBLoss_Student2_loss_binary_maps: 0.936035, DBLoss_Student2_loss_cbn: 0.000000, avg_reader_cost: 0.19312 s, avg_batch_cost: 23.10311 s, avg_samples: 2.0, ips: 0.08657 samples/s, eta: 5 days, 18:36:20
[2024/05/09 14:51:30] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:51:30] ppocr ERROR: When parsing line G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\02.jpg [{"transcription": "0027.0", "points": [[521, 233], [1059, 242], [1057, 389], [519, 380]], "difficult": false}]
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 161, in __getitem__
data["ext_data"] = self.get_ext_data()
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 127, in get_ext_data
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:51:30] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:51:30] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:51:30] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:51:54] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:51:54] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:51:54] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:51:54] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:51:54] ppocr ERROR: When parsing line G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\20.jpg [{"transcription": "0027.0", "points": [[491, 238], [1128, 249], [1126, 429], [488, 418]], "difficult": false}]
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 161, in __getitem__
data["ext_data"] = self.get_ext_data()
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 127, in get_ext_data
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:52:16] ppocr INFO: epoch: [1/1200], global_step: 4, lr: 0.000042, dml_thrink_maps_0: 0.856502, loss: 20.086422, DBLoss_Student_loss_shrink_maps: 4.733825, DBLoss_Student_loss_threshold_maps: 3.983587, DBLoss_Student_loss_binary_maps: 0.940934, DBLoss_Student_loss_cbn: 0.000000, DBLoss_Student2_loss_shrink_maps: 4.737186, DBLoss_Student2_loss_threshold_maps: 3.955768, DBLoss_Student2_loss_binary_maps: 0.948461, DBLoss_Student2_loss_cbn: 0.000000, avg_reader_cost: 0.00000 s, avg_batch_cost: 22.83796 s, avg_samples: 2.0, ips: 0.08757 samples/s, eta: 5 days, 17:47:51
[2024/05/09 14:52:16] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:52:16] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 153, in __getitem__
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:52:16] ppocr ERROR: When parsing line G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\09.jpg [{"transcription": "0002.0", "points": [[456, 455], [1073, 455], [1073, 622], [456, 622]], "difficult": false}]
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 161, in __getitem__
data["ext_data"] = self.get_ext_data()
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 127, in get_ext_data
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:52:16] ppocr ERROR: When parsing line G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\10.jpg [{"transcription": "0008.0", "points": [[438, 431], [1034, 431], [1034, 596], [438, 596]], "difficult": false}]
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 161, in __getitem__
data["ext_data"] = self.get_ext_data()
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 127, in get_ext_data
label = substr[1]
IndexError: list index out of range

[2024/05/09 14:52:16] ppocr ERROR: When parsing line G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\03.jpg [{"transcription": "0007.0", "points": [[612, 379], [1087, 379], [1087, 508], [612, 508]], "difficult": false}]
, error happened with msg: Traceback (most recent call last):
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 161, in __getitem__
data["ext_data"] = self.get_ext_data()
File "G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 127, in get_ext_data
label = substr[1]
IndexError: list index out of range
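Two kinds of entries appear in the log above: ones where the quoted line is empty, and ones where a valid-looking line fails one frame deeper, in `get_ext_data`, at the same `label = substr[1]` statement. A minimal reproduction of that statement, assuming the loader splits each label line on a tab and indexes the second field as the traceback frame suggests; `parse_label_line` is a hypothetical helper, not PaddleOCR's code:

```python
import json


def parse_label_line(line, delimiter="\t"):
    """Hypothetical re-creation of the split that fails in the traceback."""
    substr = line.strip("\n").split(delimiter)
    file_name = substr[0]
    label = substr[1]  # IndexError when the line contains no delimiter
    return file_name, json.loads(label)


good = 'train/14.jpg\t[{"transcription": "0028.0", "points": [[467, 540], [1062, 546], [1061, 706], [466, 699]], "difficult": false}]\n'
print(parse_label_line(good)[0])  # prints: train/14.jpg

# An empty line and a space-separated line both fail the same way.
for bad in ["\n", 'train/14.jpg [{"transcription": "0028.0"}]\n']:
    try:
        parse_label_line(bad)
    except IndexError as exc:
        print(repr(bad), "->", exc)
```

Under that assumption, a single empty line in the label file (for example a trailing blank line) would be enough to produce the blank `When parsing line` entries.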

@UserWangZz
Collaborator

G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\03.jpg [{"transcription": "0007.0", "points": [[612, 379], [1087, 379], [1087, 508], [612, 508]], "difficult": false}]
Check whether the separator between the image path and the label is \t.
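One way to act on this before launching training is to scan the label file line by line. The sketch below assumes the `image_path<TAB>JSON list` layout and flags blank lines, lines without exactly one tab, and labels that are not valid JSON; `check_label_file` is a hypothetical helper, not part of PaddleOCR:

```python
import json


def check_label_file(path):
    """Return a list of (line_number, problem) for a detection label file.

    Assumes one sample per line in the form: image_path<TAB>JSON list.
    """
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, raw in enumerate(f, start=1):
            line = raw.rstrip("\r\n")
            if not line.strip():
                problems.append((lineno, "blank line"))
                continue
            parts = line.split("\t")
            if len(parts) != 2:
                problems.append((lineno, f"expected 1 tab, found {len(parts) - 1}"))
                continue
            try:
                json.loads(parts[1])
            except json.JSONDecodeError as exc:
                problems.append((lineno, f"label is not valid JSON: {exc.msg}"))
    return problems
```

Call it on your label file (for example `check_label_file("train_data/det/train.txt")`, where the path is an assumption) and print each `(line_number, problem)` pair; an empty list means every line has the expected shape.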

@lili-changjiang

Check how many epochs you have configured. I get this error when the epoch count is small; after setting it to 500 it went away.

@wentao-uw

The training data format may be the problem: [{"transcription": "0010.0", "points": [[500, 343], [1008, 339], [1009, 478], [501, 482]], "difficult": false}]. It should be image_path \t [{"label": "xxx", "transcription": "xxx", "points": [[x1, y1], [x2, y2], [x3, y3], [x4, y4]]}]
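A sketch of writing one line in that layout with `json.dumps`, so that the separator is a real tab character rather than the two typed characters `\` and `t`. `make_det_label_line` is a hypothetical helper, and it writes only the `transcription` and `points` keys, which is an assumption based on the lines in this thread; add other keys (for example `label` or `difficult`) if your config needs them:

```python
import json


def make_det_label_line(image_path, boxes):
    """Build one 'image_path<TAB>JSON' line for a detection label file.

    boxes: iterable of (transcription, points) pairs, where points is a
    list of four [x, y] corners.
    """
    annotations = [
        {"transcription": text, "points": [list(map(int, p)) for p in points]}
        for text, points in boxes
    ]
    # ensure_ascii=False keeps Chinese transcriptions readable in the file
    return image_path + "\t" + json.dumps(annotations, ensure_ascii=False) + "\n"


line = make_det_label_line(
    "train_data/det/train/07.jpg",
    [("0010.0", [[500, 343], [1008, 339], [1009, 478], [501, 482]])],
)
print(repr(line))  # the separator shows up as \t inside the repr
```

Writing each returned string with `f.write(line)` produces exactly one sample per line and no blank lines.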

@564142183

564142183 commented May 13, 2024

@wentao-uw I added \t to the training data format, but it still fails:

[2024/05/13 00:49:51] ppocr ERROR: When parsing line /PaddleOCR/train_data/det/train/0009.jpg \t [{"transcription": "熊杼", "points": [[95, 55], [139, 55], [139, 80], [95, 80]], "difficult": false, "key_cls": "name"}, {"transcription": "性别机器人民族汉", "points": [[45, 88], [215, 88], [215, 109], [45, 109]], "difficult": false, "key_cls": "sex"}, {"transcription": "2013年4月5日", "points": [[89, 116], [244, 116], [244, 141], [89, 141]], "difficult": false, "key_cls": "birth"}, {"transcription": "广东省广州某某666号—自制数据集", "points": [[87, 149], [279, 149], [279, 197], [87, 197]], "difficult": false, "key_cls": "adress"}, {"transcription": "661546442301175555", "points": [[145, 236], [312, 236], [312, 258], [145, 258]], "difficult": false, "key_cls": "card"}]
, error happened with msg: Traceback (most recent call last):
  File "/PaddleOCR/ppocr/data/simple_dataset.py", line 153, in __getitem__
    label = substr[1]
IndexError: list index out of range

Exception in thread Thread-2 (_thread_loop):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddle/io/dataloader/dataloader_iter.py", line 603, in _thread_loop
    batch = self._get_data()
Traceback (most recent call last):
  File "/PaddleOCR/tools/train.py", line 255, in <module>
    main(config, device, logger, vdl_writer, seed)
  File "/PaddleOCR/tools/train.py", line 208, in main
    program.train(
  File "/PaddleOCR/tools/program.py", line 304, in train
    for idx, batch in enumerate(train_dataloader):
  File "/usr/local/lib/python3.10/dist-packages/paddle/io/dataloader/dataloader_iter.py", line 826, in __next__
    self._reader.read_next_list()[0]
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
  [Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at ../paddle/fluid/operators/reader/blocking_queue.h:171)

  File "/usr/local/lib/python3.10/dist-packages/paddle/io/dataloader/dataloader_iter.py", line 752, in _get_data
    batch.reraise()
  File "/usr/local/lib/python3.10/dist-packages/paddle/io/dataloader/worker.py", line 187, in reraise
    raise self.exc_type(msg)
RecursionError: DataLoader worker(0) caught RecursionError with message:
Traceback (most recent call last):
  File "/PaddleOCR/ppocr/data/simple_dataset.py", line 153, in __getitem__
    label = substr[1]
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/paddle/io/dataloader/worker.py", line 372, in _worker_loop
    batch = fetcher.fetch(indices)
  File "/usr/local/lib/python3.10/dist-packages/paddle/io/dataloader/fetcher.py", line 77, in fetch
    data.append(self.dataset[idx])
  File "/PaddleOCR/ppocr/data/simple_dataset.py", line 177, in __getitem__
    return self.__getitem__(rnd_idx)
  File "/PaddleOCR/ppocr/data/simple_dataset.py", line 177, in __getitem__
    return self.__getitem__(rnd_idx)
  File "/PaddleOCR/ppocr/data/simple_dataset.py", line 177, in __getitem__
    return self.__getitem__(rnd_idx)
  [Previous line repeated 967 more times]
  File "/PaddleOCR/ppocr/data/simple_dataset.py", line 164, in __getitem__
    self.logger.error(
  File "/usr/lib/python3.10/logging/__init__.py", line 1506, in error
    self._log(ERROR, msg, args, **kwargs)
  File "/usr/lib/python3.10/logging/__init__.py", line 1624, in _log
    self.handle(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1634, in handle
    self.callHandlers(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1696, in callHandlers
    hdlr.handle(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 968, in handle
    self.emit(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1218, in emit
    StreamHandler.emit(self, record)
  File "/usr/lib/python3.10/logging/__init__.py", line 1100, in emit
    msg = self.format(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 943, in format
    return fmt.format(record)
  File "/usr/lib/python3.10/logging/__init__.py", line 679, in format
    if self.usesTime():
  File "/usr/lib/python3.10/logging/__init__.py", line 647, in usesTime
    return self._style.usesTime()
  File "/usr/lib/python3.10/logging/__init__.py", line 424, in usesTime
    return self._fmt.find(self.asctime_search) >= 0
RecursionError: maximum recursion depth exceeded while calling a Python object
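The `RecursionError` follows from the retry frame at line 177, where `__getitem__` calls itself with a random index after a failure; if nearly every line fails to parse, the retries nest until Python's recursion limit is reached. The sketch below mimics that pattern with a hypothetical `ToyDataset` (not PaddleOCR's class) and contrasts it with a bounded loop:

```python
import random


class ToyDataset:
    """Hypothetical stand-in for a dataset whose label lines all fail to parse."""

    def __init__(self, lines):
        self.lines = lines

    def parse(self, idx):
        return self.lines[idx].split("\t")[1]  # IndexError if there is no real tab

    def getitem_recursive(self, idx):
        # Same shape as the failing frame: on error, recurse with a random index.
        try:
            return self.parse(idx)
        except IndexError:
            return self.getitem_recursive(random.randrange(len(self.lines)))

    def getitem_bounded(self, idx, max_retries=10):
        # Iterative alternative: retry a fixed number of times, then re-raise.
        last_exc = None
        for _ in range(max_retries):
            try:
                return self.parse(idx)
            except IndexError as exc:
                last_exc = exc
                idx = random.randrange(len(self.lines))
        raise RuntimeError(f"{max_retries} samples in a row failed to parse") from last_exc


# Every line uses a literal backslash-t instead of a tab, as in the report.
ds = ToyDataset(["0009.jpg \\t [...]"] * 5)

try:
    ds.getitem_recursive(0)
except RecursionError:
    print("recursive retry -> RecursionError")

try:
    ds.getitem_bounded(0)
except RuntimeError as exc:
    print("bounded retry ->", exc, "| caused by", repr(exc.__cause__))
```

The bounded version fails fast with the original `IndexError` attached as `__cause__`, which keeps the real parsing problem visible instead of burying it under a long recursion trace.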

@UserWangZz
Collaborator

/PaddleOCR/train_data/det/train/0009.jpg \t [{"transcription": "熊杼", "points": [[95, 55], [139, 55], [139, 80], [95, 80]], "difficult": false, "key_cls": "name"}, {"transcription": "性别机器人民族汉", "points": [[45, 88], [215, 88], [215, 109], [45, 109]], "difficult": false, "key_cls": "sex"}, {"transcription": "2013年4月5日", "points": [[89, 116], [244, 116], [244, 141], [89, 141]], "difficult": false, "key_cls": "birth"}, {"transcription": "广东省广州某某666号—自制数据集", "points": [[87, 149], [279, 149], [279, 197], [87, 197]], "difficult": false, "key_cls": "adress"}, {"transcription": "661546442301175555", "points": [[145, 236], [312, 236], [312, 258], [145, 258]], "difficult": false, "key_cls": "card"}]
Is this \t the literal characters or a tab character? Please check, and make sure each line is path + tab + label.
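To answer that question for a given line, print its `repr()`: a real tab appears as `\t`, while the two typed characters appear as `\\t`. The sketch below (hypothetical helpers `classify_separator` and `fix_literal_tab`) also repairs a line whose only separator is the literal text, under the assumption that this text does not occur inside a transcription:

```python
def classify_separator(line):
    """Report whether a label line uses a real tab or the literal text \\t."""
    if "\t" in line:
        return "tab"
    if "\\t" in line:
        return "literal backslash-t"
    return "no separator"


def fix_literal_tab(line):
    """Replace the first literal \\t (and surrounding spaces) with a real tab.

    Lines that already contain a real tab are returned unchanged.
    """
    if "\t" in line:
        return line
    path, sep, label = line.partition("\\t")
    if not sep:
        return line
    return path.rstrip() + "\t" + label.lstrip()


line = '/PaddleOCR/train_data/det/train/0009.jpg \\t [{"transcription": "熊杼"}]\n'
print(repr(line))                # a literal backslash shows up doubled: \\t
print(classify_separator(line))  # prints: literal backslash-t
print(repr(fix_literal_tab(line)))
```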

@564142183

@UserWangZz Thanks for the reply. I split the dataset with PPOCRLabel, using this command:

python PPOCRLabel/gen_ocr_train_val_test.py --trainValTestRatio 6:2:2 --datasetRootPath train_data --detRootPath train_data/det --recRootPath train_data/rec

The resulting train.txt looks like this. This format does contain a tab, right?

/PaddleOCR/train_data/det/train/0009.jpg	[{"transcription": "熊杼", "points": [[95, 55], [139, 55], [139, 80], [95, 80]], "difficult": false, "key_cls": "name"}, {"transcription": "性别机器人民族汉", "points": [[45, 88], [215, 88], [215, 109], [45, 109]], "difficult": false, "key_cls": "sex"}, {"transcription": "2013年4月5日", "points": [[89, 116], [244, 116], [244, 141], [89, 141]], "difficult": false, "key_cls": "birth"}, {"transcription": "广东省广州某某666号—自制数据集", "points": [[87, 149], [279, 149], [279, 197], [87, 197]], "difficult": false, "key_cls": "adress"}, {"transcription": "661546442301175555", "points": [[145, 236], [312, 236], [312, 258], [145, 258]], "difficult": false, "key_cls": "card"}]

But training keeps failing:

[2024/05/13 02:07:25] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 1000 iterations
ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
[2024-05-13 02:18:01,015] [ WARNING] dataloader_iter.py:707 - DataLoader 1 workers exit unexpectedly, pids: 9374
Traceback (most recent call last):
  File "/PaddleOCR/tools/train.py", line 255, in <module>
    main(config, device, logger, vdl_writer, seed)
  File "/PaddleOCR/tools/train.py", line 208, in main
    program.train(
  File "/PaddleOCR/tools/program.py", line 304, in train
    for idx, batch in enumerate(train_dataloader):
  File "/usr/local/lib/python3.10/dist-packages/paddle/io/dataloader/dataloader_iter.py", line 826, in __next__
    self._reader.read_next_list()[0]
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
  [Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at ../paddle/fluid/operators/reader/blocking_queue.h:171)

@UserWangZz
Collaborator

I think I have run into this problem before. Are you training inside Docker?
ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough

Most likely the container was created without --shm-size. Depending on your server, add --shm-size {shared memory size}G when creating the container, or simply pass --ipc=host.

You can also try setting it in the config file first:

loader:
    shuffle: true
    batch_size_per_card: 96
    drop_last: true
    num_workers: 8
    use_shared_memory: false
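To confirm that shared memory is the limit before changing the container or the config, you can check the size of `/dev/shm` from inside the container. A standard-library sketch (`shm_usage_gib` is a hypothetical helper); Docker's default `/dev/shm` is 64 MB, so a total in the tens of MiB suggests `--shm-size` was never raised:

```python
import shutil

GIB = 1024 ** 3


def shm_usage_gib(path="/dev/shm"):
    """Return (total, free) in GiB for the shared-memory mount, or None if missing."""
    try:
        usage = shutil.disk_usage(path)
    except FileNotFoundError:
        return None  # e.g. native Windows has no /dev/shm
    return usage.total / GIB, usage.free / GIB


result = shm_usage_gib()
if result is None:
    print("/dev/shm not found on this system")
else:
    total, free = result
    print(f"/dev/shm total: {total:.2f} GiB, free: {free:.2f} GiB")
```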

@564142183

@UserWangZz Thanks a lot! Adding --ipc=host when starting Docker solved it.

@Ghaz1i
Author

Ghaz1i commented May 21, 2024

G:\PaddleOCR-release-2.7\PaddleOCR-release-2.7\train_data\det\train\03.jpg [{"transcription": "0007.0", "points": [[612, 379], [1087, 379], [1087, 508], [612, 508]], "difficult": false}] Check whether the separator between the image path and the label is \t.

Hi, I am sure the separator is \t. Also, replacing / with \ in the path sometimes fixes it, but sometimes it doesn't.
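One way to narrow down path failures that come and go is to check, per line, whether the image path actually resolves to a file. The sketch assumes the loader joins the configured `data_dir` with the path from the label file (an assumption about `SimpleDataSet`); note that `os.path.join` ignores `data_dir` when the path is absolute, as with the Windows paths in the log above. `find_missing_images` is a hypothetical helper:

```python
import os


def find_missing_images(label_file, data_dir=""):
    """Return (line_number, resolved_path) for images that do not exist.

    Assumes 'image_path<TAB>label' lines and that image paths are resolved
    by joining data_dir with the path from the file.
    """
    missing = []
    with open(label_file, encoding="utf-8") as f:
        for lineno, raw in enumerate(f, start=1):
            line = raw.rstrip("\r\n")
            if "\t" not in line:
                continue  # separator problems are a separate check
            image_path = line.split("\t", 1)[0]
            # normpath turns '/' into '\\' on Windows and collapses '..' parts
            resolved = os.path.normpath(os.path.join(data_dir, image_path))
            if not os.path.isfile(resolved):
                missing.append((lineno, resolved))
    return missing
```

On Windows, `os.path.normpath` converts `/` to `\`, so mixed separators are normalized; on Linux a backslash is an ordinary filename character, so Windows-style paths in a label file will not resolve there.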
