Training of OA-CNN with custom data fails on flat surfaces #252

Open
meyerjo opened this issue May 16, 2024 · 3 comments

meyerjo commented May 16, 2024

Hi, first of all thanks for the nice repository.

I am trying to train various models on custom data. However, with OA-CNN I encounter the problem below.

Traceback (most recent call last):
  File "/workspace/Pointcept/tests/test_models.py", line 216, in test_oacnn
    self._model_dict_for_val_loader(self.model_definition_dict["oacnn"])
  File "/workspace/Pointcept/tests/test_models.py", line 209, in _model_dict_for_val_loader
    output_dict = model(input_dict)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/Pointcept/pointcept/models/default.py", line 20, in forward
    seg_logits = self.backbone(input_dict)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/Pointcept/pointcept/models/oacnns/oacnns_v1m1_base.py", line 324, in forward
    x = self.enc[i](x)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/Pointcept/pointcept/models/oacnns/oacnns_v1m1_base.py", line 158, in forward
    x = self.down(x)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/spconv/pytorch/modules.py", line 138, in forward
    input = module(input)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/spconv/pytorch/conv.py", line 755, in forward
    return self._conv_forward(self.training,
  File "/opt/conda/lib/python3.10/site-packages/spconv/pytorch/conv.py", line 408, in _conv_forward
    raise e
  File "/opt/conda/lib/python3.10/site-packages/spconv/pytorch/conv.py", line 385, in _conv_forward
    res = ops.get_indice_pairs_implicit_gemm(
  File "/opt/conda/lib/python3.10/site-packages/spconv/pytorch/ops.py", line 460, in get_indice_pairs_implicit_gemm
    raise ValueError(
ValueError: your out spatial shape [12, 12, 0] reach zero!!! input shape: [25, 25, 1]

The number of input files that trigger this error depends on the grid size I choose for sampling: with a grid size of 0.1 it fails on 17 of 174 files; with a grid size of 0.01 it fails on only 4 of 174 files.

I looked at grid_coord and noticed that every sample on which it fails has a z-range (max grid_coord - min grid_coord) of less than 15. With a range of 15 or more, it works fine. All failure cases also correspond to flat surfaces, e.g. streets. See the example below (the blue background comes from visualizing with CloudCompare).
image
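
A check along these lines can flag affected samples before training. This is a minimal sketch, assuming grid_coord is available as a sequence of integer (x, y, z) triples; `axis_ranges`, `is_too_flat`, and the default threshold of 15 (taken from the observation above, not from any library) are hypothetical names and values.

```python
def axis_ranges(grid_coord):
    """Return (max - min) of the integer grid coordinates along each axis."""
    xs, ys, zs = zip(*grid_coord)
    return [max(axis) - min(axis) for axis in (xs, ys, zs)]


def is_too_flat(grid_coord, min_range=15):
    """Flag a sample whose extent along any axis is below min_range.

    min_range=15 is the empirical threshold reported above, not a library constant.
    """
    return min(axis_ranges(grid_coord)) < min_range


# A street-like sample: wide in x and y, only a few cells thick in z.
flat_sample = [(0, 0, 0), (50, 0, 1), (0, 50, 2), (50, 50, 0)]
# A sample with enough vertical extent.
tall_sample = [(0, 0, 0), (50, 0, 15), (0, 50, 7), (50, 50, 3)]

print(axis_ranges(flat_sample), is_too_flat(flat_sample))
print(axis_ranges(tall_sample), is_too_flat(tall_sample))
```

Running this over all files would give the list of samples to inspect, independent of the model.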

Do you have any workaround so that OA-CNN can be trained and (especially) evaluated on all data?

Member

Gofinge commented May 16, 2024

Hi, this is caused by spconv. You can refer to the discussion in #236.
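
For context, the zero in the reported output shape follows from ordinary convolution output-size arithmetic. The sketch below is a minimal illustration, assuming the failing downsampling layer uses kernel size 2, stride 2, and no padding (this is not stated in the thread, but it reproduces the shapes in the error message). `conv_out_size` is a hypothetical helper, not an spconv function.

```python
def conv_out_size(size, kernel=2, stride=2, padding=0, dilation=1):
    """Standard output-size formula for a strided convolution along one axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# Input shape taken from the error message in the traceback.
input_shape = [25, 25, 1]
output_shape = [conv_out_size(s) for s in input_shape]
print(output_shape)  # [12, 12, 0]
```

An axis of size 1 cannot fit a kernel of size 2, so the z axis collapses to 0. Flat scenes produce exactly this kind of thin axis.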

Author

meyerjo commented May 16, 2024

Thanks! I changed torch.add(..., 1) to torch.add(..., 96), as in the other issue, and it worked. Wouldn't it be better to expose this value as a configuration parameter?
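
One way such a parameter could look is sketched below. It is only an illustration: `padded_spatial_shape`, `surviving_stages`, and the `pad` argument are hypothetical names, and the shape is assumed to be the per-axis maximum grid coordinate plus `pad` (the arguments elided as `...` are not reproduced here). The downsampling is assumed to use kernel size 2 and stride 2 without padding, and the default of 4 stages is likewise an assumption.

```python
def padded_spatial_shape(grid_coord, pad=96):
    """Per-axis maximum grid coordinate plus a configurable margin (hypothetical)."""
    return [max(axis) + pad for axis in zip(*grid_coord)]


def surviving_stages(shape, max_stages=4):
    """Count kernel-2, stride-2 downsamplings applied before any axis reaches 0."""
    stages = 0
    for _ in range(max_stages):
        shape = [(s - 2) // 2 + 1 for s in shape]
        if min(shape) <= 0:
            break
        stages += 1
    return stages


# Flat sample: all points lie in the z = 0 plane.
flat = [(0, 0, 0), (24, 0, 0), (0, 24, 0), (24, 24, 0)]

for pad in (1, 96):
    shape = padded_spatial_shape(flat, pad)
    print(pad, shape, surviving_stages(shape))
```

With pad=1 this flat sample gets the shape [25, 25, 1] from the error message and fails at the first downsampling; with pad=96 every axis stays positive through all four assumed stages.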

Member

Gofinge commented May 17, 2024

I think the parameter can be fixed at 96. I will modify the model code later.
