RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu) #1101

Open
sushovanjena opened this issue Nov 14, 2022 · 12 comments


@sushovanjena

Traceback (most recent call last):
  File "/home/arnav/Sushovan/yolov7-main/train.py", line 622, in <module>
    train(hyp, opt, device, tb_writer)
  File "/home/arnav/Sushovan/yolov7-main/train.py", line 369, in train
    loss, loss_items = compute_loss_ota(pred, targets, imgs)  # loss scaled by batch_size changed
  File "/home/arnav/Sushovan/yolov7-main/utils/loss.py", line 585, in __call__
    bs, as_, gjs, gis, targets, anchors = self.build_targets(p, targets, imgs)
  File "/home/arnav/Sushovan/yolov7-main/utils/loss.py", line 759, in build_targets
    from_which_layer = from_which_layer[fg_mask_inboxes]
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

Please help me solve this error. Do I need to move some part of the code to the GPU? If so, which part?
It was working fine on a regular GPU machine, but shows this error on the HPC cluster.
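The error message states PyTorch's rule for indexing with a tensor: the index (here the boolean mask fg_mask_inboxes) must be on the CPU or on the same device as the tensor being indexed (from_which_layer, which the message reports as cpu). A minimal sketch of that pattern in isolation follows; the names mirror the traceback, but the values are made up, and the GPU branch only runs when CUDA is available.

```python
import torch

# Hypothetical stand-in for from_which_layer: one layer id per candidate anchor
from_which_layer = torch.tensor([0.0, 0.0, 1.0, 2.0])  # created on the CPU by default
fg_mask_inboxes = torch.tensor([True, False, True, True])  # also on the CPU

# Mask and indexed tensor on the same device: indexing works
print(from_which_layer[fg_mask_inboxes])  # tensor([0., 1., 2.])

if torch.cuda.is_available():
    # Mask on the GPU, indexed tensor on the CPU: the mismatch from the traceback.
    # On recent PyTorch versions this is expected to raise the RuntimeError above.
    try:
        from_which_layer[fg_mask_inboxes.cuda()]
    except RuntimeError as err:
        print(err)
```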

@jeffacce

https://github.com/WongKinYiu/yolov7/blob/main/utils/loss.py#L742

Changing this line to

matching_matrix = torch.zeros_like(cost, device="cpu")

worked for me.
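To see why allocating matching_matrix on the CPU helps, here is a heavily simplified, hypothetical stand-in for the matching step around line 742. The function name select_matched, the fixed k, and the toy tensors are assumptions (the real code computes a dynamic k per ground-truth box), and the sketch assumes, as loss.py appears to, that fg_mask_inboxes is derived from matching_matrix. Under that assumption, a CPU matching_matrix yields a CPU mask, and a CPU mask may index a tensor on any device.

```python
import torch


def select_matched(cost: torch.Tensor, k: int, from_which_layer: torch.Tensor) -> torch.Tensor:
    """Hypothetical, simplified matching step.

    cost: [num_gt, num_anchors] on any device.
    from_which_layer: [num_anchors] on the CPU.
    """
    # The suggested fix: allocate the matching matrix on the CPU regardless of cost.device
    matching_matrix = torch.zeros_like(cost, device="cpu")
    for gt_idx in range(cost.shape[0]):
        # k lowest-cost anchors for this ground-truth box
        _, pos_idx = torch.topk(cost[gt_idx], k=k, largest=False)
        # .cpu() is a precaution added in this sketch, not part of the one-line fix
        matching_matrix[gt_idx][pos_idx.cpu()] = 1.0
    # Derived from a CPU tensor, so the mask is on the CPU as well
    fg_mask_inboxes = matching_matrix.sum(0) > 0.0
    # CPU mask indexing a CPU tensor: allowed
    return from_which_layer[fg_mask_inboxes]
```

For example, select_matched(torch.tensor([[0.1, 0.9, 0.5, 0.8], [0.7, 0.2, 0.6, 0.95]]), 2, torch.tensor([0.0, 0.0, 1.0, 2.0])) keeps anchors 0, 1, and 2 and returns their layer ids; passing cost.cuda() instead of a CPU cost should give the same result.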

@sushovanjena
Author

Bro, it worked for me. Your help is godly; I have been trying to solve this for 4 days. Many thanks.
But I don't understand: the same code ran fine on a GPU machine, so what goes wrong on the HPC?

@sushovanjena
Author

I am wondering: even if we don't pass device="cpu" explicitly, it gets created on the CPU anyway, right?
Why specify "cpu" explicitly?

@jeffacce

I think torch.zeros_like(x) by default allocates on the same device as x, which is the GPU in this case.
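A quick check of the behaviour described above: torch.zeros_like copies shape, dtype, and device from its input unless the device argument overrides it. The tensor x is a made-up stand-in for cost, and the GPU lines only run when CUDA is available.

```python
import torch

x = torch.rand(2, 3)  # hypothetical CPU stand-in for cost
z = torch.zeros_like(x)                    # same shape, dtype and device as x
z_cpu = torch.zeros_like(x, device="cpu")  # same shape and dtype, device forced to the CPU
print(z.device, z_cpu.device)              # cpu cpu

if torch.cuda.is_available():
    x_gpu = x.cuda()
    print(torch.zeros_like(x_gpu).device)                # a CUDA device, e.g. cuda:0
    print(torch.zeros_like(x_gpu, device="cpu").device)  # cpu
```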

@rakshith-ramagiri

rakshith-ramagiri commented Nov 16, 2022

If you're training P6 models like e6 or w6 or x, then you'll need to change the following lines as well:

  • Line 1389: change matching_matrix = torch.zeros_like(cost) to matching_matrix = torch.zeros_like(cost, device="cpu")
  • Line 1543: change matching_matrix = torch.zeros_like(cost) to matching_matrix = torch.zeros_like(cost, device="cpu")

in the same file (utils/loss.py).
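The line numbers quoted in this thread (742, 1389, 1543) refer to one revision of utils/loss.py and may shift in other checkouts. A hypothetical helper like the one below finds and rewrites every unpatched matching_matrix line by pattern instead; the function names and the regex are assumptions, not part of YOLOv7.

```python
import re
from pathlib import Path

# Assumed to match the unpatched lines exactly as quoted in this thread
UNPATCHED = re.compile(r"matching_matrix\s*=\s*torch\.zeros_like\(cost\)")


def find_unpatched_lines(path):
    """Return 1-based line numbers that still call torch.zeros_like(cost) without a device."""
    lines = Path(path).read_text().splitlines()
    return [number for number, line in enumerate(lines, start=1) if UNPATCHED.search(line)]


def patch_file(path, device="cpu"):
    """Add an explicit device to every unpatched line; return how many lines changed."""
    text = Path(path).read_text()
    replacement = f'matching_matrix = torch.zeros_like(cost, device="{device}")'
    # A function replacement avoids any backslash/group interpretation in the new text
    new_text, count = UNPATCHED.subn(lambda _match: replacement, text)
    Path(path).write_text(new_text)
    return count
```

Running find_unpatched_lines("utils/loss.py") first shows what would change; already-patched lines (with a device argument) do not match the pattern, so patch_file is safe to run twice.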

Mauro-Antonello added a commit to Mauro-Antonello/yolov7 that referenced this issue Nov 23, 2022
@alexandrerays

Great, that worked for me!

@Manpreetkour95

This occurs because of a device mismatch. I ran the same code on Colab and it worked perfectly, but on AWS it gave this error. The answer above worked for me. Thanks.

mhwahdan added a commit to RobEn-AAST/yolov7 that referenced this issue Dec 9, 2022
When I used the command

python train.py --workers 8 --device 0 --batch-size 16 --data data.yaml --img 640 640 --cfg cfg/training/yolov7.yaml --weights yolov7x.pt --name yolov7 --hyp data/hyp.scratch.p5.yaml

I got this error

RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

I modified loss.py to automatically get the index of the default GPU via torch.device('cuda').

fixes WongKinYiu#1225 WongKinYiu#1224 WongKinYiu#1101 WongKinYiu#1045
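The commit above is described only in prose, so its exact change is not reproduced here. A different, hypothetical way to avoid hard-coding either "cpu" or "cuda" is to move the index to whatever device the indexed tensor lives on at the point of indexing. The helper name index_with_mask is an assumption, not part of YOLOv7.

```python
import torch


def index_with_mask(values: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: index `values` with a boolean `mask` along dim 0,
    first moving the mask to values.device so the devices always agree."""
    return values[mask.to(values.device)]
```

In build_targets this would read from_which_layer = index_with_mask(from_which_layer, fg_mask_inboxes). The design choice is to let the indexed tensor decide the device, at the cost of one .to() call per indexing operation.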
@Boualytpv

https://github.com/WongKinYiu/yolov7/blob/main/utils/loss.py#L742

Changing this line to

matching_matrix = torch.zeros_like(cost, device="cpu")

worked for me.

@RANA-ATI

Setting matching_matrix = torch.zeros_like(cost, device="cuda") at line 742 in loss.py worked for me.

@etale-cohomology

matching_matrix = torch.zeros_like(cost, device="cpu")

wasn't working, but

matching_matrix = torch.zeros_like(cost, device="cuda")

did, probably because I had already modified a couple of lines in loss.py to set the device to cuda.

@HUAYEFE

HUAYEFE commented Aug 6, 2023

If you're training P6 models like e6 or w6 or x, then you'll need to change the following lines as well:

  • 1389 - matching_matrix = torch.zeros_like(cost) to matching_matrix = torch.zeros_like(cost, device="cpu")
  • 1543 - matching_matrix = torch.zeros_like(cost) to matching_matrix = torch.zeros_like(cost, device="cpu")

in the same file (utils/loss.py).

I spent 3-4 hours trying other methods, and in the end I successfully trained YOLOv7-W6 on August 6, 2023. Thank you.

@LeAyky

LeAyky commented Aug 8, 2023

I feel you. I would really appreciate it if this got fixed. Thank you! :) @WongKinYiu

yuyanwang-mineral added a commit to yuyanwang-mineral/yolov7 that referenced this issue Jan 20, 2024
indices should be either on cpu or on the same device as the indexed tensor (cpu)
WongKinYiu#1101
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants