integrate_batch throws error: RuntimeError: The size of tensor a (6) must match the size of tensor b (928) ... #68

Closed
jokober opened this issue Jan 7, 2022 · 4 comments

jokober commented Jan 7, 2022

Hi, I'm trying to use your tiling tools with my YOLOv5 model, but in the following line I get the following error:

self.image[:, y : y + tile_height, x : x + tile_width] += tile * self.weight

RuntimeError: The size of tensor a (6) must match the size of tensor b (928) at non-singleton dimension 2

The debugger shows a tile tensor of size (52983, 6) and a weight tensor of size (1, 928, 928). What could be the reason for the difference in tensor sizes?

Some more info:
- model size: 928x928
- image size: 3840x2160
- I am loading the model using DetectMultiBackend from yolov5

BloodAxe commented Jan 7, 2022 via email

jokober commented Jan 7, 2022

Sure, it is pretty much the code from your readme:

```python
import cv2
import numpy as np
from torch.utils.data import DataLoader

from pytorch_toolbelt.inference.tiles import ImageSlicer, CudaTileMerger
from pytorch_toolbelt.utils.torch_utils import tensor_from_rgb_image, to_numpy

# yolov5 repo utilities (assuming the yolov5 repo root is on the path)
from models.common import DetectMultiBackend
from utils.torch_utils import select_device

weights = "./yolov5/runs/train/exp9/weights/best.pt"
device = 1
img_path = "./yolov5_playground/images/10054_721234.png"

image = cv2.imread(img_path)
device = select_device(device)
model = DetectMultiBackend(weights, device=device, dnn=True)

# Cut large image into overlapping tiles
tiler = ImageSlicer(image.shape, tile_size=(928, 928), tile_step=(626, 728))

# HWC -> CHW. Optionally, do normalization here
tiles = [tensor_from_rgb_image(tile) for tile in tiler.split(image)]

# Allocate a CUDA buffer for holding the entire mask
merger = CudaTileMerger(tiler.target_shape, 1, tiler.weight)

# Run predictions for tiles and accumulate them
for tiles_batch, coords_batch in DataLoader(list(zip(tiles, tiler.crops)), batch_size=15, pin_memory=True):
    tiles_batch = tiles_batch.float().cuda()
    pred_batch = model(tiles_batch)

    merger.integrate_batch(pred_batch, coords_batch)

# Normalize accumulated mask and convert back to numpy
merged_mask = np.moveaxis(to_numpy(merger.merge()), 0, -1).astype(np.uint8)
merged_mask = tiler.crop_to_orignal_size(merged_mask)
```

BloodAxe commented Jan 7, 2022

I'm not quite familiar with the YOLO architecture, but I feel this would not work out of the box. The example from the README assumes the model returns a tensor of shape [B, Co, H, W] for an input tensor of shape [B, Ci, H, W]. In other words, it expects the model to return a same-sized segmentation map (for example). When the returned feature map is smaller than the original image, you want to instantiate CudaTileMerger with a shape scaled down according to the output stride of your model.

So here's what you can do (a rough sketch follows the list):

  1. Find the place in the model where you still have a tensor of shape [B, C, H', W'], before it gets reshaped. This output should be accumulated into the tile merger.
  2. Run all tiles and generate the final output feature map.
  3. Feed this feature map to the remaining decoder layers / NMS / whatever comes next to get predictions for the entire image.
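
A minimal sketch of how that could look, reusing `image` and `model` from the snippet above. The `extract_feature_map` hook, the stride of 8, and the channel count of 256 are assumptions for illustration; you would replace them with wherever your YOLOv5 model still exposes a dense [B, C, H', W'] tensor:

```python
import cv2
from torch.utils.data import DataLoader

from pytorch_toolbelt.inference.tiles import ImageSlicer, CudaTileMerger
from pytorch_toolbelt.utils.torch_utils import tensor_from_rgb_image


def extract_feature_map(model, batch):
    # Hypothetical hook: return the dense [B, C, H/stride, W/stride] tensor
    # from inside the model, before it is reshaped into detection outputs.
    raise NotImplementedError


stride = 8      # assumed output stride of the tapped feature map
channels = 256  # assumed channel count of that feature map

tiler = ImageSlicer(image.shape, tile_size=(928, 928), tile_step=(626, 728))
tiles = [tensor_from_rgb_image(tile) for tile in tiler.split(image)]

# Allocate the merger buffer and the per-tile weight at the downscaled resolution
weight = cv2.resize(tiler.weight, (928 // stride, 928 // stride))
merger = CudaTileMerger((tiler.target_shape[0] // stride,
                         tiler.target_shape[1] // stride), channels, weight)

for tiles_batch, coords_batch in DataLoader(list(zip(tiles, tiler.crops)), batch_size=8):
    features = extract_feature_map(model, tiles_batch.float().cuda())
    merger.integrate_batch(features, coords_batch // stride)  # scale the crop coords as well

full_features = merger.merge()
# ... feed full_features to the remaining head / NMS to get detections for the whole image
```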

I believe this is the only correct way to run detection on arbitrarily large images. One can run detection for each patch independently, but by doing so one has another problem: merging detections around patch edges.
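
For completeness, a minimal sketch of that per-patch route, assuming each tile's post-processed detections come back as an [N, 5] tensor of (x1, y1, x2, y2, score) in tile coordinates. Plain NMS like this de-duplicates overlapping boxes, but objects cut in half at tile borders can still be merged poorly, which is exactly the problem mentioned above:

```python
import torch
from torchvision.ops import nms


def merge_tile_detections(tile_outputs, tile_offsets, iou_threshold=0.5):
    """Shift per-tile boxes into full-image coordinates and de-duplicate with NMS.

    tile_outputs: list of [N_i, 5] tensors (x1, y1, x2, y2, score), one per tile
    tile_offsets: list of (x, y) top-left corners of each tile in the full image
    """
    all_boxes, all_scores = [], []
    for det, (x, y) in zip(tile_outputs, tile_offsets):
        if det.numel() == 0:
            continue
        boxes = det[:, :4].clone()
        boxes[:, [0, 2]] += x  # shift x1, x2 into full-image coordinates
        boxes[:, [1, 3]] += y  # shift y1, y2 into full-image coordinates
        all_boxes.append(boxes)
        all_scores.append(det[:, 4])
    if not all_boxes:
        return torch.empty((0, 5))
    boxes, scores = torch.cat(all_boxes), torch.cat(all_scores)
    keep = nms(boxes, scores, iou_threshold)
    return torch.cat([boxes[keep], scores[keep, None]], dim=1)
```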

jokober commented Jan 12, 2022

Thanks for your helpful explanation! It totally makes sense. I was looking into the yolov5 implementation to get the feature map, but since I had already done the tiling and dataset preparation with DarkHelp and DarkMark (which support tiling), I just switched to the YOLOv4 darknet implementation. However, I will very likely try your suggestions for a segmentation task soon.
