
epic-kitchens_55 error #1

Closed
richwardle opened this issue Feb 27, 2022 · 1 comment

richwardle commented Feb 27, 2022

Hi,

I'm having this error running EPIC-Kitchens-55. Have you encountered this before? Thanks

Save file name anti_mod_rgb_span_6_s1_5_s2_3_s3_2_recent_2_r1_1.6_r2_1.2_r3_0.8_r4_0.4_bs_10_drop_0.3_lr_0.0001_dimLa_512_dimLi_512_epoc_15_vb_nn
Printing Arguments
Namespace(add_noun_loss=True, add_verb_loss=True, alpha=1, batch_size=10, best_model='best', debug_on=False, display_every=10, dropout_rate=0.3, ek100=False, epochs=15, img_tmpl='frame_{:010d}.jpg', json_directory='tempAgg_ant_rec//models_anticipation/', latent_dim=512, linear_dim=512, lr=0.0001, modality='rgb', mode='train', noun_class=352, noun_loss_weight=1.0, num_class=2513, num_workers=0, past_attention=True, path_to_data='/content/drive/MyDrive/Individual_Project/Models/RULSTM/rulstm-master/RULSTM/data/ek55', path_to_models='models_anticipation/ek55', recent_dim=2, recent_sec1=1.6, recent_sec2=1.2, recent_sec3=0.8, recent_sec4=0.4, resume=False, scale=True, scale_factor=-0.5, schedule_epoch=10, schedule_on=1, span_dim1=5, span_dim2=3, span_dim3=2, spanning_sec=6, task='action_anticipation', topK=1, trainval=False, verb_class=125, verb_loss_weight=1.0, verb_noun_scores=True, video_feat_dim=1024, weight_flow=0.1, weight_obj=0.25, weight_rgb=0.4, weight_roi=0.25)
Populating Dataset: 100% 23493/23493 [00:33<00:00, 694.22it/s]
Populating Dataset: 100% 4979/4979 [00:07<00:00, 689.38it/s]
Add verb losses
Add noun losses
/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
/pytorch/aten/src/ATen/native/cuda/Loss.cu:455: nll_loss_backward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [1,0,0] Assertion t >= 0 && t < n_classes failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:455: nll_loss_backward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [4,0,0] Assertion t >= 0 && t < n_classes failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:455: nll_loss_backward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [7,0,0] Assertion t >= 0 && t < n_classes failed.
Traceback (most recent call last):
  File "main_anticipation.py", line 674, in <module>
    main()
  File "main_anticipation.py", line 531, in main
    start_epoch, start_best_perf, schedule_on)
  File "main_anticipation.py", line 400, in train_validation
    loss.backward()
  File "/usr/local/lib/python3.7/dist-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: CUDA error: device-side assert triggered
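
For context, the repeated `Assertion t >= 0 && t < n_classes failed` lines are the NLL-loss target bound check, and because CUDA kernels run asynchronously the traceback lands on `loss.backward()` rather than on the loss call itself. A minimal, self-contained sketch of one way to surface the bad target (not code from this repo; the shapes just mirror the `batch_size=10` / `num_class=2513` settings above) is to recompute the loss on CPU, where an out-of-range target raises a readable `IndexError`:

```python
import torch
import torch.nn.functional as F

# Stand-in tensors mirroring the failing run: batch_size=10, num_class=2513.
logits = torch.randn(10, 2513)
labels = torch.randint(0, 2513, (10,))
labels[3] = 2600  # deliberately outside the 2513 action classes

# On the GPU this would trigger the opaque device-side assert above; on the
# CPU it raises an IndexError that names the offending target value.
F.cross_entropy(logits, labels)
```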

dibschat (Owner) commented Feb 28, 2022

Hi,

This error occurs when the labels are outside the range of the model's outputs. I ran the code again and didn't encounter this problem.
Could you check whether you are mistakenly using the EPIC-100 annotations? You should download the EPIC-55 (only) annotations from here:
https://github.com/fpv-iplab/rulstm/tree/master/RULSTM/data/ek55

You can print the max label number for verb/noun/action and compare them with the classifier head output sizes to see where the mismatch is.
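
A minimal sketch of that check (the CSV path and column names below are assumptions about the RULSTM-style annotation files, so adjust them to the actual ek55 CSVs; the head sizes come from the Namespace printed in the log):

```python
import pandas as pd

# Assumed column layout for the annotation CSV; match it to the real files
# under data/ek55 before trusting the result.
cols = ["uid", "video", "start_frame", "end_frame", "verb", "noun", "action"]
ann = pd.read_csv("data/ek55/training.csv", header=None, names=cols)

# Classifier head sizes from the printed arguments:
# verb_class=125, noun_class=352, num_class=2513.
for col, n_classes in [("verb", 125), ("noun", 352), ("action", 2513)]:
    lo, hi = ann[col].min(), ann[col].max()
    status = "OK" if (0 <= lo and hi < n_classes) else "MISMATCH"
    print(f"{col}: min={lo}, max={hi}, head size={n_classes} -> {status}")
```

If the max for any column reaches or exceeds its head size, that column's annotations are the culprit.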

dibschat closed this as completed Mar 2, 2022