
Yolov7-w6.pt custom training runtime error: indices should be either on cpu or on the same device #1228

Closed
dsbyprateekg opened this issue Dec 9, 2022 · 9 comments


@dsbyprateekg

Hi,

Custom training with the W6 weight file is giving me the following error in Colab:

[screenshot of the error]

@d246810g2000

This answer can solve your problem:

#1101 (comment)

@dsbyprateekg
Author

dsbyprateekg commented Dec 9, 2022

@d246810g2000 No, I need to use the GPU, so that does not solve my issue.
Can you please tell me the exact line in loss.py to change for the W6 weight file?

@d246810g2000

d246810g2000 commented Dec 9, 2022

You need two changes in loss.py:

1. Replace
   from_which_layer.append(torch.ones(size=(len(b),)) * i)
   with
   from_which_layer.append((torch.ones(size=(len(b),)) * i).to('cuda'))

2. After line 756, which reads
   fg_mask_inboxes = matching_matrix.sum(0) > 0.0
   add a line to put fg_mask_inboxes on your CUDA device:
   fg_mask_inboxes = fg_mask_inboxes.to(torch.device('cuda'))
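
For anyone applying these edits, a minimal self-contained sketch of the pattern behind them is below. The variable names mirror utils/loss.py, but the data is made up and a CUDA device is assumed to be available; it is an illustration of the pattern, not the actual loss.py code.

import torch

device = torch.device('cuda')

from_which_layer = []
for i in range(3):                                  # one entry per detection layer
    b = torch.arange(4, device=device)              # stand-in for the image indices
    # Edit 1: create the layer-index tensor on the GPU rather than the CPU
    from_which_layer.append((torch.ones(size=(len(b),)) * i).to('cuda'))
from_which_layer = torch.cat(from_which_layer, dim=0)

matching_matrix = torch.zeros((3, 12))              # stand-in for the OTA matching matrix
fg_mask_inboxes = matching_matrix.sum(0) > 0.0      # boolean mask, currently on the CPU
# Edit 2: move the mask onto the same device as the tensors it will index
fg_mask_inboxes = fg_mask_inboxes.to(torch.device('cuda'))

from_which_layer = from_which_layer[fg_mask_inboxes]   # no device-mismatch error

Both edits do the same thing: they make sure the boolean mask and every tensor it indexes end up on the same device.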

@dsbyprateekg
Author

(quoting the two suggested changes above)

I am still getting the same error:

[screenshot of the same error]

Please find attached my loss.py file with the changes.
loss.txt

Can you please check and let me know if I have missed something?

@dsbyprateekg
Author

Just a quick update: I also changed line 1336 to from_which_layer.append((torch.ones(size=(len(b),)) * i).to('cuda')), and after that my issue is resolved.

@rimaexo

rimaexo commented Dec 15, 2022

Could you send me the updated file? I am still getting the same error.

@ayansaha280

I am still getting the same error:

Epoch gpu_mem box obj cls total labels img_size
  0% 0/5 [00:09<?, ?it/s]
Traceback (most recent call last):
  File "train_aux.py", line 612, in <module>
    train(hyp, opt, device, tb_writer)
  File "train_aux.py", line 362, in train
    loss, loss_items = compute_loss_ota(pred, targets.to(device), imgs)  # loss scaled by batch_size
  File "/content/gdrive/MyDrive/Capstone Project22-23 Group-5/Note Book /1st try/yolov7/utils/loss.py", line 1205, in __call__
    bs_aux, as_aux_, gjs_aux, gis_aux, targets_aux, anchors_aux = self.build_targets2(p[:self.nl], targets, imgs)
  File "/content/gdrive/MyDrive/Capstone Project22-23 Group-5/Note Book /1st try/yolov7/utils/loss.py", line 1557, in build_targets2
    from_which_layer = from_which_layer[fg_mask_inboxes]
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
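
For context, the failing line in this traceback indexes a tensor that is still on the CPU (from_which_layer, built inside build_targets2) with a boolean mask that is on the GPU, which is what the additional line-1336 edit mentioned above appears to address. The mismatch can be reproduced in isolation, assuming a CUDA device is available:

import torch

t = torch.ones(4)                                                # CPU tensor, like the unpatched from_which_layer
mask = torch.tensor([True, False, True, False], device='cuda')   # GPU boolean mask
t[mask]  # RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)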

@dsbyprateekg
Author

@rimaexo and @ayansaha280, please use the attached loss file:
loss_updated_w6.txt

@JFMeyer2k

JFMeyer2k commented May 23, 2023

For reference, I pulled the most recent version of YOLOv7 from main on 2023-05-23.
When I train with yolov7-w6.pt or yolov7-e6e.pt using train_aux.py, I get the same error reported here.

I followed the suggestions above:

  1. Within loss.py, replace from_which_layer.append(torch.ones(size=(len(b),)) * i) with from_which_layer.append((torch.ones(size=(len(b),)) * i).to('cuda')).
  2. Add the line fg_mask_inboxes = fg_mask_inboxes.to(torch.device('cuda')) after fg_mask_inboxes = matching_matrix.sum(0) > 0.0 (in the current version of loss.py it is line 756, and the code already reads fg_mask_inboxes = (matching_matrix.sum(0) > 0.0).to(device)).
  3. Change line 1336 to from_which_layer.append((torch.ones(size=(len(b),)) * i).to('cuda')). In the current code it is line 1330, which reads from_which_layer.append(torch.ones(size=(len(b),)) * i).

However, the same error still occurred. Finally, I replaced the loss.py file with the one (loss_updated_w6.txt) shared by dsbyprateekg, and it worked. I also tested e6e and it works too.
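
A device-agnostic variant of the same fixes avoids hard-coding 'cuda', so CPU-only training keeps working as well. The sketch below follows the style of the current upstream line (matching_matrix.sum(0) > 0.0).to(device); it is an illustration of the pattern, not the contents of loss_updated_w6.txt:

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

from_which_layer = []
for i in range(3):                                  # per detection layer
    b = torch.arange(4, device=device)              # stand-in for the image indices
    # build the layer-index tensor directly on the chosen device
    from_which_layer.append(torch.ones(len(b), device=device) * i)
from_which_layer = torch.cat(from_which_layer, dim=0)

matching_matrix = torch.zeros((3, 12))
fg_mask_inboxes = (matching_matrix.sum(0) > 0.0).to(device)

# everything involved in the indexing now lives on the same device
print(from_which_layer[fg_mask_inboxes].device)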
