
Index out of bounds #12

Open
dorucia opened this issue Sep 18, 2020 · 4 comments

dorucia commented Sep 18, 2020

Hi,

Thanks for sharing the code.

I'm trying to train DGR on my own dataset, so I wrote a dataloader that returns the same format as the other loaders in this repo. For example, I printed what __getitem__ returns right before the return statement, like this

print('u0: {}, u1: {}, c0: {}, c1: {}, f0: {}, f1: {}, m: {}, trans: {}'.format(
    unique_xyz0_th.shape, unique_xyz1_th.shape, coords0.shape, coords1.shape,
    feats0.shape, feats1.shape, len(matches), trans.shape))

        return (unique_xyz0_th.float(),
                unique_xyz1_th.float(), coords0.int(), coords1.int(), feats0.float(),
                feats1.float(), matches, trans, extra_package)

and this is an example of its output

u0: torch.Size([279, 3]), u1: torch.Size([281, 3]), c0: torch.Size([279, 3]), c1: torch.Size([281, 3]), f0: torch.Size([279, 1]), f1: torch.Size([281, 1]), m: 46745, trans: (4, 4)
u0: torch.Size([900, 3]), u1: torch.Size([859, 3]), c0: torch.Size([900, 3]), c1: torch.Size([859, 3]), f0: torch.Size([900, 1]), f1: torch.Size([859, 1]), m: 4696, trans: (4, 4)
u0: torch.Size([1159, 3]), u1: torch.Size([1153, 3]), c0: torch.Size([1159, 3]), c1: torch.Size([1153, 3]), f0: torch.Size([1159, 1]), f1: torch.Size([1153, 1]), m: 298974, trans: (4, 4)
u0: torch.Size([2092, 3]), u1: torch.Size([2048, 3]), c0: torch.Size([2092, 3]), c1: torch.Size([2048, 3]), f0: torch.Size([2092, 1]), f1: torch.Size([2048, 1]), m: 587866, trans: (4, 4)

These look similar to what the 3DMatch dataset returns.
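
To rule out a dataloader-side problem, a minimal per-sample consistency check like the sketch below (my own addition, not part of the repo, placed right before the return statement and using the variable names from the print above) could verify the shapes within each sample:

# Illustrative sketch only: per sample, the point cloud, its coordinates, and
# its features must have the same number of rows, and the ground-truth
# transform should be a 4x4 matrix.
assert unique_xyz0_th.shape[0] == coords0.shape[0] == feats0.shape[0]
assert unique_xyz1_th.shape[0] == coords1.shape[0] == feats1.shape[0]
assert trans.shape == (4, 4)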

So I ran the training code, but it complains with

Traceback (most recent call last):
  File "train.py", line 76, in <module>
    main(config)
  File "train.py", line 55, in main
    trainer.train()
  File "dgr/core/trainer.py", line 135, in train
    self._train_epoch(epoch)
  File "dgr/core/trainer.py", line 237, in _train_epoch
    weights=weights)
  File "dgr/core/trainer.py", line 591, in weighted_procrustes
    X=xyz0[pred_pair[:, 0]].to(self.device),
IndexError: index 588 is out of bounds for dimension 0 with size 588

To see what it means, I also printed xyz0, xyz1, and pred_pair in core/trainer.py like this

  def weighted_procrustes(self, xyz0s, xyz1s, pred_pairs, weights):
    decomposed_weights = self.decompose_by_length(weights, pred_pairs)
    RT = []
    ws = []

    for xyz0, xyz1, pred_pair, w in zip(xyz0s, xyz1s, pred_pairs, decomposed_weights):
      xyz0.requires_grad = False
      xyz1.requires_grad = False
      ws.append(w.sum().item())
      print('in trainer, xyz0: {}, xyz1: {}, pred_pair: {}'.format(xyz0.shape, xyz1.shape, pred_pair))
      predT = GlobalRegistration.weighted_procrustes(
          X=xyz0[pred_pair[:, 0]].to(self.device),
          Y=xyz1[pred_pair[:, 1]].to(self.device),
          w=w,
          eps=np.finfo(np.float32).eps)
      RT.append(predT)

and this is what I got

in trainer, xyz0: torch.Size([1201, 3]), xyz1: torch.Size([1178, 3]), pred_pair: tensor([[  0,  23],
        [  1,   5],
        [  2,   5],
        ...,
        [585, 531],
        [586, 532],
        [587, 533]])
in trainer, xyz0: torch.Size([588, 3]), xyz1: torch.Size([569, 3]), pred_pair: tensor([[   0,  998],
        [   1,  948],
        [   2,   14],
        ...,
        [1188, 1167],
        [1189, 1166],
        [1190, 1072]])

To me, it seems like pred_pair is somehow swapped, since the first pred_pair has indices up to 587, which matches the size of xyz0 in the second pair (588).
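
A minimal sanity check like the sketch below, placed inside the loop above right before the weighted_procrustes call (my own addition, not part of the repo; it only uses the variables already shown), would make the out-of-range indexing explicit:

      # Illustrative check only: every pred_pair index must address a valid row
      # of the corresponding point cloud, otherwise the indexing below fails.
      assert pred_pair[:, 0].max().item() < xyz0.shape[0], 'pred_pair col 0 exceeds xyz0 size'
      assert pred_pair[:, 1].max().item() < xyz1.shape[0], 'pred_pair col 1 exceeds xyz1 size'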

I verified that the 3DMatch training code runs for a while without this error.
Do you have any idea why this error is happening?

Best,


dorucia commented Sep 18, 2020

For more information: it intermittently throws a different type of error, like this one.

Traceback (most recent call last):
  File "train.py", line 76, in <module>
    main(config)
  File "train.py", line 55, in main
    trainer.train()
  File "dgr/core/trainer.py", line 135, in train
    self._train_epoch(epoch)
  File "dgr/core/trainer.py", line 241, in _train_epoch
    rot_error = batch_rotation_error(pred_rots, gt_rots)
  File "dgr/core/metrics.py", line 32, in batch_rotation_error
    assert len(rots1) == len(rots2)
AssertionError

My environment is PyTorch 1.5.0, CUDA 10.1.243, Python 3.7, Ubuntu 18.04, with gcc 7 installed as described in the README.


dorucia commented Sep 18, 2020

I found a bug, or at least an issue.

In the CollationFunctionFactory in base_loader.py, right before collate_pair_fn returns, I printed xyz0, xyz1, and len_batch like this

    for x0, x1, lens in zip(xyz0, xyz1, len_batch):
      print('collate xyz0 {}, xyz1 {}, lenb {}'.format(x0.shape, x1.shape, lens))

    return {
        'pcd0': xyz0,
        'pcd1': xyz1,
        'sinput0_C': coords_batch0,
        'sinput0_F': feats_batch0,
        'sinput1_C': coords_batch1,
        'sinput1_F': feats_batch1,
        'correspondences': matching_inds_batch,
        'T_gt': trans_batch,
        'len_batch': len_batch,
        'extra_packages': extra_packages,
    }

and the output is

collate xyz0 torch.Size([473, 3]), xyz1 torch.Size([473, 3]), lenb [473, 473]
collate xyz0 torch.Size([412, 3]), xyz1 torch.Size([414, 3]), lenb [412, 414]
collate xyz0 torch.Size([304, 3]), xyz1 torch.Size([298, 3]), lenb [459, 463]
collate xyz0 torch.Size([459, 3]), xyz1 torch.Size([463, 3]), lenb [411, 407]
collate xyz0 torch.Size([411, 3]), xyz1 torch.Size([407, 3]), lenb [402, 398]
collate xyz0 torch.Size([402, 3]), xyz1 torch.Size([398, 3]), lenb [269, 264]
collate xyz0 torch.Size([339, 3]), xyz1 torch.Size([334, 3]), lenb [339, 334]
collate xyz0 torch.Size([427, 3]), xyz1 torch.Size([425, 3]), lenb [427, 425]
collate xyz0 torch.Size([358, 3]), xyz1 torch.Size([362, 3]), lenb [358, 362]
collate xyz0 torch.Size([369, 3]), xyz1 torch.Size([345, 3]), lenb [296, 295]
collate xyz0 torch.Size([296, 3]), xyz1 torch.Size([295, 3]), lenb [335, 313]
collate xyz0 torch.Size([335, 3]), xyz1 torch.Size([313, 3]), lenb [366, 371]

As we can see, the shapes of xyz0 and xyz1 do not match lenb on some lines.
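
For illustration only, and not the repo's actual collate code, here is a hypothetical sketch of one way xyz0/xyz1 and len_batch can drift out of sync: if a sample is skipped after some of the per-sample lists have already been appended to, every later entry is shifted. The list_data layout and variable names beyond xyz0/xyz1/len_batch are assumptions.

def collate_pair_fn(list_data):
  # Hypothetical collate loop, sketched only to show the bug pattern.
  xyz0, xyz1, len_batch = [], [], []
  for pts0, pts1, matches in list_data:
    xyz0.append(pts0)
    xyz1.append(pts1)
    if len(matches) == 0:
      # Bug pattern: skipping here, after appending to xyz0/xyz1 but before
      # appending to len_batch, leaves the three lists misaligned.
      continue
    len_batch.append([len(pts0), len(pts1)])
  # A cheap guard that flags the mismatch before the batch is returned.
  assert all(lens == [len(p0), len(p1)]
             for p0, p1, lens in zip(xyz0, xyz1, len_batch)), 'collate lists misaligned'
  return xyz0, xyz1, len_batch

Whether the real collate_pair_fn follows this exact pattern I don't know, but a guard like the assert above, added right before the return shown earlier, would at least fail at collation time instead of deep inside weighted_procrustes.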

dorucia closed this as completed Sep 18, 2020
chrischoy reopened this Sep 24, 2020

lombardm commented Dec 9, 2020

Hi,

I have the same issue. Could you please tell me if you solved it, and how?

Thanks in advance

@pranavgundewar

I am facing a similar issue as well. How did you solve this?
