integrate_batch throws error: RuntimeError: The size of tensor a (6) must match the size of tensor b (928) ... #68

Closed
jokober opened this issue Jan 7, 2022 · 4 comments

jokober commented Jan 7, 2022

Hi, I'm trying to use your tiling tools with my YOLOv5 model, but in the following line I get the following error:

self.image[:, y : y + tile_height, x : x + tile_width] += tile * self.weight

RuntimeError: The size of tensor a (6) must match the size of tensor b (928) at non-singleton dimension 2

The debugger shows a tile tensor of size (52983, 6) and a weight tensor of size (1, 928, 928). What could be the reason for the difference in tensor sizes?

Some more info:
- model size: 928x928
- image size: 3840x2160
- I am loading the model using DetectMultiBackend from yolov5

BloodAxe commented Jan 7, 2022 via email

jokober commented Jan 7, 2022

Sure, it is pretty much the code from your readme:

```python
import cv2
import numpy as np
from torch.utils.data import DataLoader

from pytorch_toolbelt.inference.tiles import ImageSlicer, CudaTileMerger
from pytorch_toolbelt.utils.torch_utils import tensor_from_rgb_image, to_numpy

# yolov5 repo utilities (assuming the yolov5 repo root is on the path)
from models.common import DetectMultiBackend
from utils.torch_utils import select_device

weights = "./yolov5/runs/train/exp9/weights/best.pt"
device = 1
img_path = "./yolov5_playground/images/10054_721234.png"

image = cv2.imread(img_path)
device = select_device(device)
model = DetectMultiBackend(weights, device=device, dnn=True)

# Cut large image into overlapping tiles
tiler = ImageSlicer(image.shape, tile_size=(928, 928), tile_step=(626, 728))

# HWC -> CHW. Optionally, do normalization here
tiles = [tensor_from_rgb_image(tile) for tile in tiler.split(image)]

# Allocate a CUDA buffer for holding the entire mask
merger = CudaTileMerger(tiler.target_shape, 1, tiler.weight)

# Run predictions for tiles and accumulate them
for tiles_batch, coords_batch in DataLoader(list(zip(tiles, tiler.crops)), batch_size=15, pin_memory=True):
    tiles_batch = tiles_batch.float().cuda()
    pred_batch = model(tiles_batch)

    merger.integrate_batch(pred_batch, coords_batch)

# Normalize accumulated mask and convert back to numpy
merged_mask = np.moveaxis(to_numpy(merger.merge()), 0, -1).astype(np.uint8)
merged_mask = tiler.crop_to_orignal_size(merged_mask)
```

BloodAxe commented Jan 7, 2022

I'm not quite familiar with the YOLO architecture, but I feel this would not work out of the box. The example from the README assumes the model returns a tensor of shape [B, Co, H, W] for an input tensor of shape [B, Ci, H, W]. In other words, it expects the model to return a same-sized segmentation map (for example). When the returned feature map is smaller than the original image, you want to instantiate CudaTileMerger with a shape scaled down according to the output stride of your model.

So here's what you can do (a rough sketch follows the list):

  1. Find the place in the model where you still have a tensor of shape [B, C, H', W'], before it gets reshaped. This output should be accumulated into the tile merger.
  2. Run all tiles and generate the final output feature map.
  3. Feed this feature map to the remaining decoder layers / NMS / whatever comes next to get predictions for the entire image.
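
A minimal sketch of how that could look, reusing `image` and `model` from the snippet above. The `extract_feature_map` hook, the stride of 8, and the channel count of 256 are assumptions for illustration; you would replace them with wherever your YOLOv5 model still exposes a dense [B, C, H', W'] tensor:

```python
import cv2
from torch.utils.data import DataLoader

from pytorch_toolbelt.inference.tiles import ImageSlicer, CudaTileMerger
from pytorch_toolbelt.utils.torch_utils import tensor_from_rgb_image


def extract_feature_map(model, batch):
    # Hypothetical hook: return the dense [B, C, H/stride, W/stride] tensor
    # from inside the model, before it is reshaped into detection outputs.
    raise NotImplementedError


stride = 8      # assumed output stride of the tapped feature map
channels = 256  # assumed channel count of that feature map

tiler = ImageSlicer(image.shape, tile_size=(928, 928), tile_step=(626, 728))
tiles = [tensor_from_rgb_image(tile) for tile in tiler.split(image)]

# Allocate the merger buffer and the per-tile weight at the downscaled resolution
weight = cv2.resize(tiler.weight, (928 // stride, 928 // stride))
merger = CudaTileMerger((tiler.target_shape[0] // stride,
                         tiler.target_shape[1] // stride), channels, weight)

for tiles_batch, coords_batch in DataLoader(list(zip(tiles, tiler.crops)), batch_size=8):
    features = extract_feature_map(model, tiles_batch.float().cuda())
    merger.integrate_batch(features, coords_batch // stride)  # scale the crop coords as well

full_features = merger.merge()
# ... feed full_features to the remaining head / NMS to get detections for the whole image
```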

I believe this is the only correct way to run detection on arbitrarily large images. One can run detection for each patch independently, but by doing so one has another problem: merging detections around patch edges.
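
For completeness, a minimal sketch of that per-patch route, assuming each tile's post-processed detections come back as an [N, 5] tensor of (x1, y1, x2, y2, score) in tile coordinates. Plain NMS like this de-duplicates overlapping boxes, but objects cut in half at tile borders can still be merged poorly, which is exactly the problem mentioned above:

```python
import torch
from torchvision.ops import nms


def merge_tile_detections(tile_outputs, tile_offsets, iou_threshold=0.5):
    """Shift per-tile boxes into full-image coordinates and de-duplicate with NMS.

    tile_outputs: list of [N_i, 5] tensors (x1, y1, x2, y2, score), one per tile
    tile_offsets: list of (x, y) top-left corners of each tile in the full image
    """
    all_boxes, all_scores = [], []
    for det, (x, y) in zip(tile_outputs, tile_offsets):
        if det.numel() == 0:
            continue
        boxes = det[:, :4].clone()
        boxes[:, [0, 2]] += x  # shift x1, x2 into full-image coordinates
        boxes[:, [1, 3]] += y  # shift y1, y2 into full-image coordinates
        all_boxes.append(boxes)
        all_scores.append(det[:, 4])
    if not all_boxes:
        return torch.empty((0, 5))
    boxes, scores = torch.cat(all_boxes), torch.cat(all_scores)
    keep = nms(boxes, scores, iou_threshold)
    return torch.cat([boxes[keep], scores[keep, None]], dim=1)
```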

jokober commented Jan 12, 2022

Thanks for your helpful explanation! It totally makes sense. I was looking into the yolov5 implementation to get the feature map, but since I had already done the tiling and dataset preparation with DarkHelp and DarkMark (which support tiling), I just switched to the YOLOv4 darknet implementation. However, I will very likely try your suggestions for a segmentation task soon.
