
RuntimeError: given chunk sizes don't sum up to the tensor's size (sum(chunk_sizes) == 48, but expected 1) #39

Closed
hsj928 opened this issue May 21, 2019 · 9 comments

hsj928 commented May 21, 2019

[screenshot attached]

@xjiao004

The length of chunk_sizes is the number of GPUs, and the batch size is the sum of the chunk sizes.
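
To illustrate the relationship, here is a minimal CPU sketch using torch.split, which applies the same size arithmetic as the CUDA scatter (the numbers match the title: a batch of 48 split across 8 GPUs; the shapes are hypothetical):

```python
import torch

# One chunk_sizes entry per GPU; the entries must sum to the batch size.
chunk_sizes = [6] * 8                      # 8 GPUs, 6 samples each
batch = torch.randn(48, 3)                 # batch_size == sum(chunk_sizes) == 48
chunks = torch.split(batch, chunk_sizes, dim=0)
print([c.shape[0] for c in chunks])        # [6, 6, 6, 6, 6, 6, 6, 6]
```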


hsj928 commented May 27, 2019

There is a new issue.
[screenshot attached]

@Duankaiwen (Owner)

@hsj928
Add this code above the line `dist = tag_mean.unsqueeze(1) - tag_mean.unsqueeze(2)` in models/py_utils/kp_utils.py:

```python
if len(tag_mean.size()) < 2:
    tag_mean = tag_mean.unsqueeze(0)
```
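
For context, a minimal sketch of why the guard helps, assuming tag_mean collapses to 1-D when the batch dimension is dropped (shapes here are illustrative):

```python
import torch

# Illustrative case: a degenerate batch can leave tag_mean 1-D.
tag_mean = torch.randn(5)             # shape (5,) -- would break the pairwise dist
if len(tag_mean.size()) < 2:
    tag_mean = tag_mean.unsqueeze(0)  # shape (1, 5): restore the batch dimension

# The line from kp_utils.py now broadcasts into pairwise differences:
dist = tag_mean.unsqueeze(1) - tag_mean.unsqueeze(2)  # shape (1, 5, 5)
print(dist.shape)
```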


Ostyk commented Dec 28, 2020

> The length of chunk_sizes is the number of GPUs, and the batch size is the sum of the chunk sizes.

So if we use one GPU, can we only use a chunk size of 1? I'm struggling to use more of my GPU's memory this way, even when I set the batch size really high.

@Duankaiwen (Owner)

@Ostyk If you only have one GPU, modify these lines in config/xxx.json:
line 4: `"batch_size": xx,` (where xx denotes the batch size on one GPU; it can be more than 1)
line 22: `"chunk_sizes": [xx],` (the chunk_sizes must equal the batch_size)
I also recommend that you try CPNDet (https://github.com/Duankaiwen/CPNDet). CPNDet is version 2.0 of CenterNet.
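
As a quick sanity check before training, the two values can be compared directly. This is only a sketch: the config path and the "system" section name are assumptions based on the CenterNet config layout.

```python
import json

# Assumed path and layout; adjust to your own config file.
with open("config/CenterNet-104.json") as f:
    cfg = json.load(f)["system"]

# chunk_sizes has one entry per GPU, and the entries must sum to batch_size.
assert sum(cfg["chunk_sizes"]) == cfg["batch_size"], \
    "sum(chunk_sizes) must equal batch_size"
print("config OK:", cfg["batch_size"], cfg["chunk_sizes"])
```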


Ostyk commented Dec 29, 2020

Thanks for the quick answer. I'll check out the new network, and since it's also anchor-free I can use it as the backbone just like CenterNet for FAIRMOT (re-identification).


Ostyk commented Dec 29, 2020

Still getting the error when I set batch_size == chunk_sizes:
`"batch_size": 20, "chunk_sizes": [20],`
`RuntimeError: given chunk sizes don't sum up to the tensor's size (sum(chunk_sizes) == 20, but expected 1) (scatter at ..\torch\csrc\cuda\comm.cpp:176)`

@Duankaiwen (Owner)

@Ostyk Can I see your full log?


Ostyk commented Dec 29, 2020

```
Traceback (most recent call last):
  File "train.py", line 210, in <module>
    train(training_dbs, validation_db, args.start_iter)
  File "train.py", line 143, in train
    out_train = nnet.train(**training)
  File "C:\Users\ai.admin\Desktop\ObjectTracking\CenterNet\nnet\py_factory.py", line 82, in train
    loss_kp = self.network(xs, ys)
  File "C:\Users\ai.admin\Anaconda3\envs\ObjectTracking\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\ai.admin\Desktop\ObjectTracking\CenterNet\models\py_utils\data_parallel.py", line 66, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
  File "C:\Users\ai.admin\Desktop\ObjectTracking\CenterNet\models\py_utils\data_parallel.py", line 77, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)
  File "C:\Users\ai.admin\Desktop\ObjectTracking\CenterNet\models\py_utils\scatter_gather.py", line 30, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
  File "C:\Users\ai.admin\Desktop\ObjectTracking\CenterNet\models\py_utils\scatter_gather.py", line 25, in scatter
    return scatter_map(inputs)
  File "C:\Users\ai.admin\Desktop\ObjectTracking\CenterNet\models\py_utils\scatter_gather.py", line 18, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "C:\Users\ai.admin\Desktop\ObjectTracking\CenterNet\models\py_utils\scatter_gather.py", line 20, in scatter_map
    return list(map(list, zip(*map(scatter_map, obj))))
  File "C:\Users\ai.admin\Desktop\ObjectTracking\CenterNet\models\py_utils\scatter_gather.py", line 15, in scatter_map
    return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
  File "C:\Users\ai.admin\Anaconda3\envs\ObjectTracking\lib\site-packages\torch\nn\parallel\_functions.py", line 89, in forward
    outputs = comm.scatter(input, target_gpus, chunk_sizes, ctx.dim, streams)
  File "C:\Users\ai.admin\Anaconda3\envs\ObjectTracking\lib\site-packages\torch\cuda\comm.py", line 147, in scatter
    return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: given chunk sizes don't sum up to the tensor's size (sum(chunk_sizes) == 10, but expected 1) (scatter at ..\torch\csrc\cuda\comm.cpp:176)
(no backtrace available)
```

I read that it might have something to do with nn.DataParallel but couldn't figure it out. I'd be grateful for any tips :)
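
One reading of the message: the "expected 1" part is the size of the tensor being scattered along dim 0, so some input reaching the custom DataParallel has a batch dimension of 1 while chunk_sizes sums to 10. A CPU-side sketch of the same size check, with hypothetical shapes:

```python
import torch

# torch.split performs the same sum check as the CUDA scatter, on CPU.
chunk_sizes = [5, 5]                     # sums to 10
ok = torch.randn(10, 3)
print([t.shape for t in torch.split(ok, chunk_sizes, dim=0)])  # two (5, 3) chunks

bad = torch.randn(1, 3)                  # batch dim of 1, as in the error message
# torch.split(bad, chunk_sizes, dim=0)   # RuntimeError: sizes sum to 10, tensor is 1
```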
