Training on my own dataset is interrupted #54

Closed
zhuyu-cs opened this issue Jun 8, 2019 · 6 comments

Comments


zhuyu-cs commented Jun 8, 2019

Training stops after printing the information below. The bug output is as follows:

File "train.py", line 43, in prefetch_data
data, ind = sample_data(db, ind, data_aug=data_aug)
File "/home/zhuyu/CenterNet/sample/coco.py", line 199, in sample_data
return globals()[system_configs.sampling_function](db, k_ind, data_aug, debug)
File "/home/zhuyu/CenterNet/sample/coco.py", line 169, in kp_detection
tl_regrs[b_ind, tag_ind, :] = [fxtl - xtl, fytl - ytl]
IndexError: index 128 is out of bounds for axis 1 with size 128
Process Process-1:
Traceback (most recent call last):
File "/home/zhuyu/anaconda3/envs/CenterNet/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/zhuyu/anaconda3/envs/CenterNet/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "train.py", line 47, in prefetch_data
raise e
File "train.py", line 43, in prefetch_data
data, ind = sample_data(db, ind, data_aug=data_aug)
File "/home/zhuyu/CenterNet/sample/coco.py", line 199, in sample_data
return globals()[system_configs.sampling_function](db, k_ind, data_aug, debug)
File "/home/zhuyu/CenterNet/sample/coco.py", line 169, in kp_detection
tl_regrs[b_ind, tag_ind, :] = [fxtl - xtl, fytl - ytl]
IndexError: index 128 is out of bounds for axis 1 with size 128
training loss at iteration 100: 10.80356502532959
focal loss at iteration 100: 10.398816108703613
pull loss at iteration 100: 0.026798348873853683
push loss at iteration 100: 0.11498038470745087
regr loss at iteration 100: 0.26297056674957275
0%| | 101/480000 [02:48<222:26:08, 1.67s/it]
Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/zhuyu/anaconda3/envs/CenterNet/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/zhuyu/anaconda3/envs/CenterNet/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "train.py", line 51, in pin_memory
data = data_queue.get()
File "/home/zhuyu/anaconda3/envs/CenterNet/lib/python3.6/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
File "/home/zhuyu/anaconda3/envs/CenterNet/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 151, in rebuild_storage_fd
fd = df.detach()
File "/home/zhuyu/anaconda3/envs/CenterNet/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/home/zhuyu/anaconda3/envs/CenterNet/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/home/zhuyu/anaconda3/envs/CenterNet/lib/python3.6/multiprocessing/connection.py", line 487, in Client
c = SocketClient(address)
File "/home/zhuyu/anaconda3/envs/CenterNet/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory

Can anyone help me? Thank you very much!
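A quick way to check whether the dataset itself triggers this: the IndexError says the per-image target array has 128 slots along the instance axis, so any image with more than 128 annotations will overflow it. Below is a hedged diagnostic sketch (the annotation file path is a placeholder; the 128 limit is inferred from the traceback, not read from the repository's config):

```python
import json
from collections import Counter

# Count annotations per image in a COCO-format annotation file to see whether
# any image carries more objects than the 128 preallocated targets per image.
# "instances_train.json" is a placeholder path for your own annotation file.
with open("instances_train.json") as f:
    coco = json.load(f)

counts = Counter(ann["image_id"] for ann in coco["annotations"])
worst_id, worst = counts.most_common(1)[0]
print(f"max annotations on a single image: {worst} (image_id={worst_id})")
if worst > 128:
    print("at least one image exceeds the 128 per-image target slots")
```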


zhuyu-cs commented Jun 9, 2019

solved!

zhuyu-cs closed this as completed Jun 9, 2019

Rguoo commented Jun 18, 2019

Hello, I've hit the same problem. Can you tell me how you solved it? Thanks!


Rguoo commented Jun 18, 2019

I have solved it, thanks!

@David-19940718

solved!
Can you tell me how to train it on my own dataset? I would really appreciate your help.

@ruinianxu

@yzhu20
Hi yzhu, how did you solve this problem? Did you just increase max_per_image so that it exceeds the maximum number of instances per image? I wonder if there is another way to do it.
I would appreciate any suggestions.
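For what it's worth, an alternative to raising the cap is to drop annotations beyond the preallocated buffer so the sampler never indexes past it. Below is a minimal sketch using variable names modeled on the traceback (tl_regrs, b_ind, tag_ind, fxtl, xtl, ...); it is not the repository's actual code, only an illustration of the clamping idea:

```python
import numpy as np

max_tag_len = 128   # matches "size 128" in the IndexError above
batch_size  = 4

# Preallocated per-image regression targets; the second axis has max_tag_len slots.
tl_regrs = np.zeros((batch_size, max_tag_len, 2), dtype=np.float32)
tag_lens = np.zeros((batch_size,), dtype=np.int32)

def add_tl_target(b_ind, fxtl, xtl, fytl, ytl):
    """Write one top-left corner offset target, skipping overflow instead of crashing."""
    tag_ind = tag_lens[b_ind]
    if tag_ind >= max_tag_len:        # buffer for this image is full
        return False                  # drop the extra annotation
    tl_regrs[b_ind, tag_ind, :] = [fxtl - xtl, fytl - ytl]
    tag_lens[b_ind] += 1
    return True
```

Dropping boxes past the cap trades a little ground truth for stability; raising the cap (or filtering crowded images out of the dataset) keeps every annotation at the cost of larger target tensors.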

@WuChannn

@zhuyu-cs Hello, could you please share your solution? Thanks a lot.
