Training works on CPU, but training on GPU fails with the following error: #565

Closed
YanDingXin opened this issue Aug 19, 2020 · 5 comments

@YanDingXin

Process Process-1:
Process Process-2:
Traceback (most recent call last):
Traceback (most recent call last):
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/paddle/reader/decorator.py", line 556, in _read_into_queue
six.reraise(*sys.exc_info())
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/six.py", line 703, in reraise
raise value
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/paddle/reader/decorator.py", line 549, in _read_into_queue
for sample in reader():
File "tools/../ppocr/data/rec/dataset_traversal.py", line 303, in batch_iter_reader
for outs in sample_iter_reader():
File "tools/../ppocr/data/rec/dataset_traversal.py", line 272, in sample_iter_reader
) * self.num_workers > img_num:
File "tools/../ppocr/data/rec/dataset_traversal.py", line 241, in get_device_num
gpu_num = len(gpus.split(','))
AttributeError: 'int' object has no attribute 'split'
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/paddle/reader/decorator.py", line 556, in _read_into_queue
six.reraise(*sys.exc_info())
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/six.py", line 703, in reraise
raise value
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/paddle/reader/decorator.py", line 549, in _read_into_queue
for sample in reader():
File "tools/../ppocr/data/rec/dataset_traversal.py", line 303, in batch_iter_reader
for outs in sample_iter_reader():
File "tools/../ppocr/data/rec/dataset_traversal.py", line 272, in sample_iter_reader
) * self.num_workers > img_num:
File "tools/../ppocr/data/rec/dataset_traversal.py", line 241, in get_device_num
gpu_num = len(gpus.split(','))
AttributeError: 'int' object has no attribute 'split'
Process Process-3:
Traceback (most recent call last):
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/paddle/reader/decorator.py", line 556, in _read_into_queue
six.reraise(*sys.exc_info())
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/six.py", line 703, in reraise
raise value
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/paddle/reader/decorator.py", line 549, in _read_into_queue
for sample in reader():
File "tools/../ppocr/data/rec/dataset_traversal.py", line 303, in batch_iter_reader
for outs in sample_iter_reader():
File "tools/../ppocr/data/rec/dataset_traversal.py", line 272, in sample_iter_reader
) * self.num_workers > img_num:
File "tools/../ppocr/data/rec/dataset_traversal.py", line 241, in get_device_num
gpu_num = len(gpus.split(','))
AttributeError: 'int' object has no attribute 'split'
2020-08-19 13:58:35,693-WARNING: Your reader has raised an exception!
Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/paddle/fluid/reader.py", line 805, in thread_main
six.reraise(*sys.exc_info())
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/six.py", line 703, in reraise
raise value
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/paddle/fluid/reader.py", line 785, in thread_main
for tensors in self._tensor_reader():
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/paddle/fluid/reader.py", line 853, in tensor_reader_impl
for slots in paddle_reader():
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/paddle/fluid/data_feeder.py", line 488, in reader_creator
for item in reader():
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/paddle/reader/decorator.py", line 572, in queue_reader
raise ValueError("multiprocess reader raises an exception")
ValueError: multiprocess reader raises an exception

Process Process-4:
Traceback (most recent call last):
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/paddle/reader/decorator.py", line 556, in _read_into_queue
six.reraise(*sys.exc_info())
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/six.py", line 703, in reraise
raise value
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/paddle/reader/decorator.py", line 549, in _read_into_queue
for sample in reader():
File "tools/../ppocr/data/rec/dataset_traversal.py", line 303, in batch_iter_reader
for outs in sample_iter_reader():
File "tools/../ppocr/data/rec/dataset_traversal.py", line 272, in sample_iter_reader
) * self.num_workers > img_num:
File "tools/../ppocr/data/rec/dataset_traversal.py", line 241, in get_device_num
gpu_num = len(gpus.split(','))
AttributeError: 'int' object has no attribute 'split'
/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/paddle/fluid/executor.py:789: UserWarning: The following exception is not an EOF exception.
"The following exception is not an EOF exception.")
Traceback (most recent call last):
File "tools/train.py", line 123, in <module>
main()
File "tools/train.py", line 100, in main
program.train_eval_rec_run(config, exe, train_info_dict, eval_info_dict)
File "tools/../tools/program.py", line 336, in train_eval_rec_run
return_numpy=False)
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/paddle/fluid/executor.py", line 790, in run
six.reraise(*sys.exc_info())
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/six.py", line 703, in reraise
raise value
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/paddle/fluid/executor.py", line 785, in run
use_program_cache=use_program_cache)
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/paddle/fluid/executor.py", line 850, in _run_impl
return_numpy=return_numpy)
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/paddle/fluid/executor.py", line 684, in _run_parallel
tensors = exe.run(fetch_var_names)._move_to_list()
paddle.fluid.core_avx.EnforceNotMet:


C++ Call Stacks (More useful to developers):

0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2 paddle::operators::reader::BlockingQueue<std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> > >::Receive(std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> >*)
3 paddle::operators::reader::PyReader::ReadNext(std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> >*)
4 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<unsigned long>, std::__future_base::_Result_base::_Deleter>, unsigned long> >::_M_invoke(std::_Any_data const&)
5 std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
6 ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const


Python Call Stacks (More useful to users):

File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2525, in append_op
attrs=kwargs.get("attrs", None))
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/paddle/fluid/reader.py", line 733, in _init_non_iterable
outputs={'Out': self._feed_list})
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/paddle/fluid/reader.py", line 646, in __init__
self._init_non_iterable()
File "/home/aisrv/anaconda3/envs/ydx/lib/python3.7/site-packages/paddle/fluid/reader.py", line 280, in from_generator
iterable, return_list)
File "tools/../ppocr/modeling/architectures/rec_model.py", line 135, in create_feed
iterable=False)
File "tools/../ppocr/modeling/architectures/rec_model.py", line 188, in __call__
image, labels, loader = self.create_feed(mode)
File "tools/../tools/program.py", line 170, in build
dataloader, outputs = model(mode=mode)
File "tools/train.py", line 50, in main
config, train_program, startup_program, mode='train')
File "tools/train.py", line 123, in <module>
main()


Error Message Summary:

Error: Blocking queue is killed because the data reader raises an exception
[Hint: Expected killed_ != true, but received killed_:1 == true:1.] at (/paddle/paddle/fluid/operators/reader/blocking_queue.h:141)
[operator < read > error]

@YanDingXin

Solved: the GPU ID must be specified explicitly with export CUDA_VISIBLE_DEVICES=0.
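Why does exporting the variable help? A minimal reproduction, assuming the device-count logic at line 241 boils down to the two lines quoted in this thread (`os.environ.get("CUDA_VISIBLE_DEVICES", 1)` followed by `gpus.split(',')`). The function name `device_count_as_reported` is hypothetical and only stands in for `get_device_num` from the traceback.

```python
import os


def device_count_as_reported():
    # Hypothetical stand-in for get_device_num() in dataset_traversal.py,
    # reduced to the two lines quoted in this thread.
    gpus = os.environ.get("CUDA_VISIBLE_DEVICES", 1)  # default is the int 1, not a str
    return len(gpus.split(','))  # raises AttributeError when gpus is the int default


if __name__ == "__main__":
    # Case 1: variable unset, so os.environ.get returns the int default 1.
    os.environ.pop("CUDA_VISIBLE_DEVICES", None)
    try:
        device_count_as_reported()
    except AttributeError as exc:
        print("unset:", exc)  # 'int' object has no attribute 'split'

    # Case 2: the effect of `export CUDA_VISIBLE_DEVICES=0`, i.e. the str "0".
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"
    print("set to '0':", device_count_as_reported())  # 1
```

Environment variables are always strings, so once the variable is set, the int default is never used and `.split()` succeeds.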

@qifang-robotics

Where exactly should this be changed?

@YanDingXin

I got this error while training the recognition model. Just run export CUDA_VISIBLE_DEVICES=0 on the command line,
then run train.py again.
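If training is launched from a Python script rather than an interactive shell, the same effect can be had by passing a modified environment to the child process. This is a sketch under stated assumptions: `build_train_env` and `launch_training` are hypothetical helpers, and `extra_args` is a placeholder for whatever arguments you normally pass to `tools/train.py`.

```python
import os
import subprocess
import sys


def build_train_env(devices="0"):
    # Copy the current environment and pin the visible GPUs, which is what
    # `export CUDA_VISIBLE_DEVICES=0` does for the shell session.
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = devices  # must be a str, e.g. "0" or "0,1"
    return env


def launch_training(extra_args, devices="0"):
    # Run tools/train.py as a child process with the pinned environment.
    # extra_args is a placeholder for your usual train.py arguments.
    cmd = [sys.executable, "tools/train.py", *extra_args]
    return subprocess.run(cmd, env=build_train_env(devices), check=True)
```

Copying `os.environ` instead of mutating it keeps the parent process unchanged, so only the training run sees the pinned device list.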

@qifang-robotics

Solved, thanks!

@dream-in-night

dream-in-night commented Jun 8, 2022

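The problem is in PaddleOCR/ppocr/data/rec/dataset_traversal.py, line 241 (get_device_num).

In this function:
os.environ.get("CUDA_VISIBLE_DEVICES", 1)
returns 1 when CUDA_VISIBLE_DEVICES is not set, i.e. the second (default) argument, which is an integer.
Calling split on an integer is then bound to fail.
Fix:
gpu_num = 1 # with 8 GPUs, write 8; if cards are occupied and you train on a single card, write 1. Keep this equal to the number of devices listed in CUDA_VISIBLE_DEVICES on the command line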
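An alternative to hardcoding `gpu_num` is to keep reading the variable but use a string default, so `.split()` is always valid. This is a hypothetical sketch, not PaddleOCR's actual code; the default `"0"` (treat an unset variable as one visible GPU) is an assumption.

```python
import os


def get_device_num(default_devices="0"):
    """Count the GPU ids listed in CUDA_VISIBLE_DEVICES.

    Hypothetical replacement for the function at line 241: the default is a
    str, so .split() works even when the variable is unset.
    """
    gpus = os.environ.get("CUDA_VISIBLE_DEVICES", default_devices)
    # Drop blank entries, such as the trailing one in "0,1,".
    ids = [gpu for gpu in gpus.split(",") if gpu.strip()]
    return len(ids)
```

With `export CUDA_VISIBLE_DEVICES=0,1` this returns 2, so the count follows the command line automatically instead of needing a manual edit. Note that an empty value yields 0 here, whereas `len("".split(","))` is 1.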
