Training on my own dataset is interrupted #54

Closed
zhuyu-cs opened this issue Jun 8, 2019 · 6 comments

Comments


zhuyu-cs commented Jun 8, 2019

Training stops after printing the information below. The bug output is as follows:

File "train.py", line 43, in prefetch_data
data, ind = sample_data(db, ind, data_aug=data_aug)
File "/home/zhuyu/CenterNet/sample/coco.py", line 199, in sample_data
return globals()[system_configs.sampling_function](db, k_ind, data_aug, debug)
File "/home/zhuyu/CenterNet/sample/coco.py", line 169, in kp_detection
tl_regrs[b_ind, tag_ind, :] = [fxtl - xtl, fytl - ytl]
IndexError: index 128 is out of bounds for axis 1 with size 128
Process Process-1:
Traceback (most recent call last):
File "/home/zhuyu/anaconda3/envs/CenterNet/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/zhuyu/anaconda3/envs/CenterNet/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "train.py", line 47, in prefetch_data
raise e
File "train.py", line 43, in prefetch_data
data, ind = sample_data(db, ind, data_aug=data_aug)
File "/home/zhuyu/CenterNet/sample/coco.py", line 199, in sample_data
return globals()[system_configs.sampling_function](db, k_ind, data_aug, debug)
File "/home/zhuyu/CenterNet/sample/coco.py", line 169, in kp_detection
tl_regrs[b_ind, tag_ind, :] = [fxtl - xtl, fytl - ytl]
IndexError: index 128 is out of bounds for axis 1 with size 128
training loss at iteration 100: 10.80356502532959
focal loss at iteration 100: 10.398816108703613
pull loss at iteration 100: 0.026798348873853683
push loss at iteration 100: 0.11498038470745087
regr loss at iteration 100: 0.26297056674957275
0%| | 101/480000 [02:48<222:26:08, 1.67s/it]
Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/zhuyu/anaconda3/envs/CenterNet/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/zhuyu/anaconda3/envs/CenterNet/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "train.py", line 51, in pin_memory
data = data_queue.get()
File "/home/zhuyu/anaconda3/envs/CenterNet/lib/python3.6/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
File "/home/zhuyu/anaconda3/envs/CenterNet/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 151, in rebuild_storage_fd
fd = df.detach()
File "/home/zhuyu/anaconda3/envs/CenterNet/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/home/zhuyu/anaconda3/envs/CenterNet/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/home/zhuyu/anaconda3/envs/CenterNet/lib/python3.6/multiprocessing/connection.py", line 487, in Client
c = SocketClient(address)
File "/home/zhuyu/anaconda3/envs/CenterNet/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory

Can anyone help me? Thank you very much!
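A quick way to check whether the dataset itself triggers this: the IndexError says the per-image target array has 128 slots along the instance axis, so any image with more than 128 annotations will overflow it. Below is a hedged diagnostic sketch (the annotation file path is a placeholder; the 128 limit is inferred from the traceback, not read from the repository's config):

```python
import json
from collections import Counter

# Count annotations per image in a COCO-format annotation file to see whether
# any image carries more objects than the 128 preallocated targets per image.
# "instances_train.json" is a placeholder path for your own annotation file.
with open("instances_train.json") as f:
    coco = json.load(f)

counts = Counter(ann["image_id"] for ann in coco["annotations"])
worst_id, worst = counts.most_common(1)[0]
print(f"max annotations on a single image: {worst} (image_id={worst_id})")
if worst > 128:
    print("at least one image exceeds the 128 per-image target slots")
```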


zhuyu-cs commented Jun 9, 2019

solved!

zhuyu-cs closed this as completed Jun 9, 2019

Rguoo commented Jun 18, 2019

Hello, I've hit the same problem. Can you tell me how you solved it? Thanks!


Rguoo commented Jun 18, 2019

I have solved it, thanks!

@David-19940718

solved!
Can you tell me how to train it on my own dataset? I would really appreciate your help.

@ruinianxu

@yzhu20
Hi yzhu, how did you solve this problem? Did you just increase max_per_image so that it exceeds the maximum number of instances per image? I wonder if there is another way to do it.
I would appreciate any suggestions.
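For what it's worth, an alternative to raising the cap is to drop annotations beyond the preallocated buffer so the sampler never indexes past it. Below is a minimal sketch using variable names modeled on the traceback (tl_regrs, b_ind, tag_ind, fxtl, xtl, ...); it is not the repository's actual code, only an illustration of the clamping idea:

```python
import numpy as np

max_tag_len = 128   # matches "size 128" in the IndexError above
batch_size  = 4

# Preallocated per-image regression targets; the second axis has max_tag_len slots.
tl_regrs = np.zeros((batch_size, max_tag_len, 2), dtype=np.float32)
tag_lens = np.zeros((batch_size,), dtype=np.int32)

def add_tl_target(b_ind, fxtl, xtl, fytl, ytl):
    """Write one top-left corner offset target, skipping overflow instead of crashing."""
    tag_ind = tag_lens[b_ind]
    if tag_ind >= max_tag_len:        # buffer for this image is full
        return False                  # drop the extra annotation
    tl_regrs[b_ind, tag_ind, :] = [fxtl - xtl, fytl - ytl]
    tag_lens[b_ind] += 1
    return True
```

Dropping boxes past the cap trades a little ground truth for stability; raising the cap (or filtering crowded images out of the dataset) keeps every annotation at the cost of larger target tensors.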

@WuChannn

@zhuyu-cs Hello, could you please share your solution? Thanks a lot.
