RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu) #1101

sushovanjena · 2022-11-14T22:49:43Z

Traceback (most recent call last):
  File "/home/arnav/Sushovan/yolov7-main/train.py", line 622, in <module>
    train(hyp, opt, device, tb_writer)
  File "/home/arnav/Sushovan/yolov7-main/train.py", line 369, in train
    loss, loss_items = compute_loss_ota(pred, targets, imgs) # loss scaled by batch_size changed
  File "/home/arnav/Sushovan/yolov7-main/utils/loss.py", line 585, in __call__
    bs, as_, gjs, gis, targets, anchors = self.build_targets(p, targets, imgs)
  File "/home/arnav/Sushovan/yolov7-main/utils/loss.py", line 759, in build_targets
    from_which_layer = from_which_layer[fg_mask_inboxes]
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

Please help me solve this error. Do I need to move some part of the code to the GPU? If so, which part?
The code was working fine on a simple GPU setup, but it shows this error on the HPC.

jeffacce · 2022-11-15T02:07:25Z

https://github.com/WongKinYiu/yolov7/blob/main/utils/loss.py#L742

Changing this line to

matching_matrix = torch.zeros_like(cost, device="cpu")

worked for me.

sushovanjena · 2022-11-15T14:33:17Z

Bro, it worked for me. Your help is godly. I have been trying to solve this for 4 days. Lots of thanks.
But I don't understand: if the same code ran properly on a GPU, what problem is it hitting on the HPC?

sushovanjena · 2022-11-15T14:59:43Z

I am wondering: even if we don't pass device="cpu" explicitly, the tensor gets created on the CPU anyway, right?
Why mention "cpu" explicitly?

jeffacce · 2022-11-15T18:51:40Z

I think torch.zeros_like(x) by default allocates to the same device as x, which is on GPU in this case.
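
A minimal sketch of that behavior, not taken from the yolov7 code: the variable names are illustrative, and the script falls back to the CPU when no GPU is available. It shows that torch.zeros_like(x) follows the device of x unless device= is passed explicitly.

```python
import torch

# Use the GPU when one is available; otherwise everything stays on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for the `cost` tensor in utils/loss.py (shape chosen arbitrarily).
cost = torch.rand(3, 4, device=device)

# Without `device=`, the new tensor inherits the device of `cost`.
same_device = torch.zeros_like(cost)

# With `device="cpu"`, the new tensor is allocated on the CPU regardless.
on_cpu = torch.zeros_like(cost, device="cpu")

print("cost:", cost.device)
print("zeros_like(cost):", same_device.device)
print('zeros_like(cost, device="cpu"):', on_cpu.device)
```

On a machine with a GPU, the first two devices printed are the GPU and the last one is the CPU, which is why the explicit argument changes the result.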

rakshith-ramagiri · 2022-11-16T10:45:44Z

If you're training P6 models like e6 or w6 or x, then you'll need to change the following lines as well:

Line 1389: change matching_matrix = torch.zeros_like(cost) to matching_matrix = torch.zeros_like(cost, device="cpu")
Line 1543: change matching_matrix = torch.zeros_like(cost) to matching_matrix = torch.zeros_like(cost, device="cpu")

in the same file (utils/loss.py).


alexandrerays · 2022-11-24T01:22:14Z

Great, that worked for me!

Manpreetkour95 · 2022-12-08T18:16:29Z

This happens because of a device issue. When I ran the same code on Colab it worked perfectly fine, but on AWS it gave this error. The above answer worked for me. Thanks.

When I used the command

python train.py --workers 8 --device 0 --batch-size 16 --data data.yaml --img 640 640 --cfg cfg/training/yolov7.yaml --weights yolov7x.pt --name yolov7 --hyp data/hyp.scratch.p5.yaml

I got this error:

RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

I modified the loss.py file to automatically get the index of the default GPU selected, using the torch.device('cuda') function.

Fixes WongKinYiu#1225, WongKinYiu#1224, WongKinYiu#1101, WongKinYiu#1045

Boualytpv · 2022-12-16T13:26:41Z

https://github.com/WongKinYiu/yolov7/blob/main/utils/loss.py#L742

Changing this line to

matching_matrix = torch.zeros_like(cost, device="cpu")

worked for me.

WongKinYiu/yolov7#1101

RANA-ATI · 2022-12-31T08:03:44Z

matching_matrix = torch.zeros_like(cost, device="cuda") at line 742 in loss.py worked for me

etale-cohomology · 2023-01-16T02:11:15Z

matching_matrix = torch.zeros_like(cost, device="cpu")

wasn't working, but

matching_matrix = torch.zeros_like(cost, device="cuda")

did, probably because I had already modified a couple lines in loss.py to set the device to cuda.
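
Whether "cpu" or "cuda" works seems to depend on which device the indexed tensor lives on in a given copy of loss.py. A device-agnostic alternative is sketched below, assuming the error comes from a boolean mask sitting on a different device than the tensor it indexes. The helper name index_with_mask is hypothetical and is not part of yolov7; it moves the mask to the indexed tensor's device before indexing.

```python
import torch


def index_with_mask(values: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Index `values` with a boolean `mask`, first moving the mask
    to whatever device `values` is on (hypothetical helper)."""
    return values[mask.to(values.device)]


# Illustrative data: the values stay on the CPU, while the mask is placed
# on the GPU when one is available (otherwise it is also on the CPU).
mask_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
from_which_layer = torch.arange(5)
fg_mask = torch.tensor([True, False, True, False, True], device=mask_device)

selected = index_with_mask(from_which_layer, fg_mask)
print(selected)
```

With this pattern the line does not need a hard-coded "cpu" or "cuda", at the cost of one extra device transfer for the mask.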

HUAYEFE · 2023-08-06T15:03:40Z

If you're training P6 models like e6 or w6 or x, then you'll also need to change the following lines:

Line 1389: change matching_matrix = torch.zeros_like(cost) to matching_matrix = torch.zeros_like(cost, device="cpu")

Line 1543: change matching_matrix = torch.zeros_like(cost) to matching_matrix = torch.zeros_like(cost, device="cpu")

in the same file (utils/loss.py).

I spent 3-4 hours trying other methods, and in the end I successfully ran YOLOv7 w6 training on August 6, 2023. Thank you.

LeAyky · 2023-08-08T11:02:11Z

If you're training P6 models like e6 or w6 or x, then you'll also need to change the following lines:

Line 1389: change matching_matrix = torch.zeros_like(cost) to matching_matrix = torch.zeros_like(cost, device="cpu")

Line 1543: change matching_matrix = torch.zeros_like(cost) to matching_matrix = torch.zeros_like(cost, device="cpu")

in the same file (utils/loss.py).

I spent 3-4 hours trying other methods, and in the end I successfully ran YOLOv7 w6 training on August 6, 2023. Thank you.

I feel you. I would really appreciate it if this got fixed. Thank you! :) @WongKinYiu


sushovanjena closed this as completed Nov 15, 2022

sushovanjena reopened this Nov 15, 2022

Mauro-Antonello added a commit to Mauro-Antonello/yolov7 that referenced this issue Nov 23, 2022

fix loss indexing from different devices

c1bd823

fix for WongKinYiu#1101

d246810g2000 mentioned this issue Dec 9, 2022

Yolov7-w6.pt custom training runtime error indices should be either in cpu or on the same device #1228

Closed

This was referenced Dec 9, 2022

fixed the GPU indexing error when running in colab RobEn-AAST/yolov7#1

Merged

fixed the GPU indexing error when running in colab #1229

Open

dariush-bahrami added a commit to dariush-bahrami/Yolov7-training that referenced this issue Dec 23, 2022

fix: 🐛 fix device error in indexed tensor

e552a4a

WongKinYiu/yolov7#1101

dariush-bahrami mentioned this issue Dec 23, 2022

fix: 🐛 fix device error in indexed tensor Chris-hughes10/Yolov7-training#5

Merged

Chris-hughes10 pushed a commit to Chris-hughes10/Yolov7-training that referenced this issue Dec 24, 2022

fix: 🐛 fix device error in indexed tensor (#5)

77eebd8

WongKinYiu/yolov7#1101

Samuel-wei mentioned this issue Mar 30, 2023

Train data by yolov7 in PyTorch BumbleBee-BBStream/PyTorch_YOLOv4#1

Closed

ijdoc added a commit to ijdoc/yolov7 that referenced this issue Apr 2, 2023

Fix WongKinYiu#1101 (comment)

8875587

cnavarrete mentioned this issue Jun 28, 2023

YOLOV7 problem training p6 models #1770

Open

Jacobsolawetz mentioned this issue Aug 1, 2023

Colab notebook error roboflow/roboflow-100-benchmark#51

Closed

Hoku113 mentioned this issue Dec 9, 2023

RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu) TechC-SugarCane/ImageRecognitionWinApp#38

Closed

yuyanwang-mineral added a commit to yuyanwang-mineral/yolov7 that referenced this issue Jan 20, 2024

Update loss.py

b2209b6
