
RuntimeError: given chunk sizes don't sum up to the tensor's size (sum(chunk_sizes) == 48, but expected 1) #39

Closed
hsj928 opened this issue May 21, 2019 · 9 comments

hsj928 commented May 21, 2019

[screenshot attached]

@xjiao004

The length of chunk_sizes is the number of GPUs, and the batch size is the sum of the chunk sizes.
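
To illustrate the relationship, here is a minimal CPU sketch using torch.split, which applies the same size arithmetic as the CUDA scatter (the numbers match the title: a batch of 48 split across 8 GPUs; the shapes are hypothetical):

```python
import torch

# One chunk_sizes entry per GPU; the entries must sum to the batch size.
chunk_sizes = [6] * 8                      # 8 GPUs, 6 samples each
batch = torch.randn(48, 3)                 # batch_size == sum(chunk_sizes) == 48
chunks = torch.split(batch, chunk_sizes, dim=0)
print([c.shape[0] for c in chunks])        # [6, 6, 6, 6, 6, 6, 6, 6]
```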


hsj928 commented May 27, 2019

There is a new issue.
[screenshot attached]

@Duankaiwen (Owner)

@hsj928
Add this code above the line `dist = tag_mean.unsqueeze(1) - tag_mean.unsqueeze(2)` in models/py_utils/kp_utils.py:

```python
if len(tag_mean.size()) < 2:
    tag_mean = tag_mean.unsqueeze(0)
```
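
For context, a minimal sketch of why the guard helps, assuming tag_mean collapses to 1-D when the batch dimension is dropped (shapes here are illustrative):

```python
import torch

# Illustrative case: a degenerate batch can leave tag_mean 1-D.
tag_mean = torch.randn(5)             # shape (5,) -- would break the pairwise dist
if len(tag_mean.size()) < 2:
    tag_mean = tag_mean.unsqueeze(0)  # shape (1, 5): restore the batch dimension

# The line from kp_utils.py now broadcasts into pairwise differences:
dist = tag_mean.unsqueeze(1) - tag_mean.unsqueeze(2)  # shape (1, 5, 5)
print(dist.shape)
```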


Ostyk commented Dec 28, 2020

> The length of chunk_sizes is the number of GPUs, and the batch size is the sum of the chunk sizes.

So if we use one GPU, can we only use a chunk size of 1? I'm struggling to use more of my GPU's memory this way, even when I set the batch size really high.

@Duankaiwen (Owner)

@Ostyk If you only have one GPU, modify these lines in config/xxx.json:
line 4: `"batch_size": xx,` (where xx denotes the batch size on one GPU; it can be more than 1)
line 22: `"chunk_sizes": [xx],` (the chunk_sizes must equal the batch_size)
I also recommend that you try CPNDet (https://github.com/Duankaiwen/CPNDet). CPNDet is version 2.0 of CenterNet.
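
As a quick sanity check before training, the two values can be compared directly. This is only a sketch: the config path and the "system" section name are assumptions based on the CenterNet config layout.

```python
import json

# Assumed path and layout; adjust to your own config file.
with open("config/CenterNet-104.json") as f:
    cfg = json.load(f)["system"]

# chunk_sizes has one entry per GPU, and the entries must sum to batch_size.
assert sum(cfg["chunk_sizes"]) == cfg["batch_size"], \
    "sum(chunk_sizes) must equal batch_size"
print("config OK:", cfg["batch_size"], cfg["chunk_sizes"])
```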


Ostyk commented Dec 29, 2020

Thanks for the quick answer. I'll check out the new network, and since it's also anchor-free I can use it as the backbone just like CenterNet for FAIRMOT (re-identification).


Ostyk commented Dec 29, 2020

Still getting the error when I set batch_size == chunk_sizes:
`"batch_size": 20, "chunk_sizes": [20],`
`RuntimeError: given chunk sizes don't sum up to the tensor's size (sum(chunk_sizes) == 20, but expected 1) (scatter at ..\torch\csrc\cuda\comm.cpp:176)`

@Duankaiwen (Owner)

@Ostyk Can I see your full log?


Ostyk commented Dec 29, 2020

```
Traceback (most recent call last):
  File "train.py", line 210, in <module>
    train(training_dbs, validation_db, args.start_iter)
  File "train.py", line 143, in train
    out_train = nnet.train(**training)
  File "C:\Users\ai.admin\Desktop\ObjectTracking\CenterNet\nnet\py_factory.py", line 82, in train
    loss_kp = self.network(xs, ys)
  File "C:\Users\ai.admin\Anaconda3\envs\ObjectTracking\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\ai.admin\Desktop\ObjectTracking\CenterNet\models\py_utils\data_parallel.py", line 66, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
  File "C:\Users\ai.admin\Desktop\ObjectTracking\CenterNet\models\py_utils\data_parallel.py", line 77, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)
  File "C:\Users\ai.admin\Desktop\ObjectTracking\CenterNet\models\py_utils\scatter_gather.py", line 30, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
  File "C:\Users\ai.admin\Desktop\ObjectTracking\CenterNet\models\py_utils\scatter_gather.py", line 25, in scatter
    return scatter_map(inputs)
  File "C:\Users\ai.admin\Desktop\ObjectTracking\CenterNet\models\py_utils\scatter_gather.py", line 18, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "C:\Users\ai.admin\Desktop\ObjectTracking\CenterNet\models\py_utils\scatter_gather.py", line 20, in scatter_map
    return list(map(list, zip(*map(scatter_map, obj))))
  File "C:\Users\ai.admin\Desktop\ObjectTracking\CenterNet\models\py_utils\scatter_gather.py", line 15, in scatter_map
    return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
  File "C:\Users\ai.admin\Anaconda3\envs\ObjectTracking\lib\site-packages\torch\nn\parallel\_functions.py", line 89, in forward
    outputs = comm.scatter(input, target_gpus, chunk_sizes, ctx.dim, streams)
  File "C:\Users\ai.admin\Anaconda3\envs\ObjectTracking\lib\site-packages\torch\cuda\comm.py", line 147, in scatter
    return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: given chunk sizes don't sum up to the tensor's size (sum(chunk_sizes) == 10, but expected 1) (scatter at ..\torch\csrc\cuda\comm.cpp:176)
(no backtrace available)
```

I read that it might have something to do with nn.DataParallel but couldn't figure it out. I'd be grateful for any tips :)
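
One reading of the message: the "expected 1" part is the size of the tensor being scattered along dim 0, so some input reaching the custom DataParallel has a batch dimension of 1 while chunk_sizes sums to 10. A CPU-side sketch of the same size check, with hypothetical shapes:

```python
import torch

# torch.split performs the same sum check as the CUDA scatter, on CPU.
chunk_sizes = [5, 5]                     # sums to 10
ok = torch.randn(10, 3)
print([t.shape for t in torch.split(ok, chunk_sizes, dim=0)])  # two (5, 3) chunks

bad = torch.randn(1, 3)                  # batch dim of 1, as in the error message
# torch.split(bad, chunk_sizes, dim=0)   # RuntimeError: sizes sum to 10, tensor is 1
```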
