
index out of bounds #2

Closed
lu-ming-lei opened this issue Nov 20, 2019 · 12 comments

@lu-ming-lei

I got an "index out of bounds" error when I ran the code.

@lu-ming-lei
Author

The error looks like this:
/opt/conda/conda-bld/pytorch_1570910687650/work/aten/src/ATen/native/cuda/IndexKernel.cu:60: lambda ->auto::operator()(int)->auto: block: [88,0,0], thread: [105,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.

@PointsCoder

I also ran into this error:

/opt/conda/conda-bld/pytorch_1549630534704/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [134,0,0], thread: [127,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.

Traceback (most recent call last):
H = torch.matmul(src_centered, src_corr_centered.transpose(2, 1).contiguous()).cpu()
RuntimeError: CUDA error: device-side assert triggered

@cvchanghao

> I also ran into this error: [...] RuntimeError: CUDA error: device-side assert triggered

Hi, did you solve this problem?

@PointsCoder

The 'index out of bounds' error happens because torch.topk sometimes misbehaves on ill-conditioned inputs. The error actually occurs in the function get_graph_feature:
feature = x.view(batch_size * num_points, -1)[idx, :]
The variable idx, computed in the function knn:
idx = distance.topk(k=k, dim=-1)[1]
sometimes contains out-of-bound values. To verify this, you can add an assertion or raise an exception after that line.

Replacing topk with sort seems to be stable.
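
A minimal sketch of that check, assuming idx holds the per-cloud neighbor indices from knn, which must lie in [0, num_points) before any batch offset is added (num_points here is x.size(2)):

idx = distance.topk(k=k, dim=-1)[1]  # (batch_size, num_points, k)

# Sanity check: every index must address a valid point.
if idx.min() < 0 or idx.max() >= num_points:
    raise RuntimeError(
        f"knn produced out-of-bound indices: min={idx.min().item()}, "
        f"max={idx.max().item()}, num_points={num_points}")

# More stable alternative: full sort, then keep the first k columns.
idx = distance.sort(dim=-1, descending=True)[1][:, :, :k]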

@WangYueFt
Owner

> The 'index out of bounds' error happens because torch.topk sometimes misbehaves on ill-conditioned inputs. [...] Replacing topk with sort seems to be stable.

That's a great observation! Thanks for the suggestions!

@wangyujiewj

> Replacing topk with sort seems to be stable.

When I replaced topk with sort, I ran into a new problem:

Traceback (most recent call last):
u, s, v = torch.svd(H[i])
RuntimeError: Lapack Error gesdd : 2 superdiagonals failed to converge. at /opt/conda/conda-bld/pytorch_1549635019666/work/aten/src/TH/generic/THTensorLapack.cpp:493

I still don't know how to solve it.

@jl626

jl626 commented Feb 21, 2020

That LAPACK error means that your matrix H is ill-conditioned. A simple fix is to add a scaled identity matrix (e.g. torch.eye(n) * 1e-7) to H before the SVD.
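
A minimal sketch of that fix, assuming H is the batch of covariance matrices from the traceback above (the epsilon value may need tuning):

eps = 1e-7  # small ridge term; increase if gesdd still fails to converge
H_i = H[i] + torch.eye(H[i].shape[-1], device=H.device) * eps
u, s, v = torch.svd(H_i)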

@pebroe

pebroe commented Feb 26, 2020

After replacing topk with sort, I now get this error:

/opt/conda/conda-bld/pytorch_1549630534704/work/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 3]: block: [0,0,0], thread: [479,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.

The traceback points to:

/PRNet/model.py", line 453, in forward H = torch.matmul(src_centered, src_corr_centered.transpose(2, 1).contiguous()).cpu()

@PointsCoder

> After replacing topk with sort, I now get this error: [...] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.

The traceback of a CUDA error can sometimes be inaccurate. A general approach to debugging this type of error is to first set CUDA_LAUNCH_BLOCKING=1 when running your program; this forces kernels to launch synchronously, so you get a more accurate traceback location. Then you can raise an exception at that location to catch the invalid index and figure out what happened during training. Hope this helps.
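
For example, the variable can be set from the shell (CUDA_LAUNCH_BLOCKING=1 python main.py, where main.py stands in for your entry script) or in Python; a minimal sketch:

import os
# Must be set before the first CUDA call, i.e. before CUDA is initialized.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch
# Kernel launches are now synchronous, so the Python traceback points at
# the op that actually triggered the device-side assert.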

@ShengyuH

@lu-ming-lei Hi, I think it's due to your torch version; some torch versions seem to have unstable torch.svd() behavior, especially when the covariance matrix is ill-conditioned. You can install pytorch 1.0.1; this works for me.

@WangYueFt
Owner

> You can install pytorch 1.0.1; this works for me.

This is to confirm that pytorch 1.0.1 works for me, so I'll close this issue. Feel free to reopen it if it doesn't work for you.

@yangninghua

@wangyujiewj
Did you implement the "sort" replacement this way?

def knn(x, k):
    # x: (batch_size, dims, num_points)
    inner = -2 * torch.matmul(x.transpose(2, 1).contiguous(), x)
    xx = torch.sum(x ** 2, dim=1, keepdim=True)
    # negated squared pairwise distances, so larger = closer
    distance = -xx - inner - xx.transpose(2, 1).contiguous()

    # idx = distance.topk(k=k, dim=-1)[1]  # (batch_size, num_points, k)

    # full sort instead of topk, then keep the first k columns
    # (was hardcoded as [:, :, :20]; use the k parameter instead)
    d_sorted, d_index = torch.sort(distance, dim=-1, descending=True)
    d_k = d_index.shape[-1]
    d_index_top = d_index[:, :, :min(k, d_k)]
    return d_index_top
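
For reference, a quick shape check of that function (hypothetical sizes: batch_size=2, 3 coordinate dims, 1024 points, k=20):

import torch

x = torch.rand(2, 3, 1024)   # (batch_size, dims, num_points)
idx = knn(x, k=20)
assert idx.shape == (2, 1024, 20)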
