Missing patches when zarr-chunks is not "full" #5

Closed
ClementCaporal opened this issue Apr 25, 2024 · 3 comments

Comments

ClementCaporal commented Apr 25, 2024

Related to #3 (Inference Sampler)

The grid created by zds.PatchSampler doesn't cover the border of the zarr array when zarr.shape is not a multiple of the chunk shape.

Small example:

%load_ext autoreload
%autoreload 2

import zarr
import zarrdataset as zds
from torch.utils.data import DataLoader

filename = r"data.zarr"

# create empty zarr dataset
z = zarr.zeros((1, 1, 6), chunks=(1, 1, 4), dtype='uint8')
zarr.save(filename, z)


patch_size = dict(Z=1, Y=1, X=2)
patch_sampler = zds.PatchSampler(patch_size=patch_size)

my_datasets = zds.ZarrDataset(
    [
        zds.ImagesDatasetSpecs(
            filenames=filename,
            source_axes="ZYX",
            axes="ZYX",
        )
    ],
    patch_sampler=patch_sampler,
    return_positions=True,
    return_worker_id=True,
)

my_dataloader = DataLoader(
    my_datasets,
    num_workers=0,
    worker_init_fn=zds.zarrdataset_worker_init_fn,
    batch_size=1,
)

for i, (wid, pos, sample) in enumerate(my_dataloader):
    print(pos)

Result:

tensor([[[0, 1],
         [0, 1],
         [0, 2]]])
tensor([[[0, 1],
         [0, 1],
         [2, 4]]])

Is there a reason it doesn't return the following instead (or is it a bug)?

tensor([[[0, 1],
         [0, 1],
         [0, 2]]])
tensor([[[0, 1],
         [0, 1],
         [2, 4]]])
tensor([[[0, 1],
         [0, 1],
         [4, 6]]])
Author

ClementCaporal commented Apr 25, 2024

Related to the possible solution of this issue:

In case of

z = zarr.zeros((1, 1, 5), chunks=(1, 1, 4), dtype='uint8')
zarr.save(filename, z)
patch_size = dict(Z=1, Y=1, X=2)

what would you expect as output?

  • Exact grid
tensor([[[0, 1],
         [0, 1],
         [0, 2]]])
tensor([[[0, 1],
         [0, 1],
         [2, 4]]])
tensor([[[0, 1],
         [0, 1],
         [4, 5]]]) # <--- this one is smaller than the model might expect
  • Cropped grid
tensor([[[0, 1],
         [0, 1],
         [0, 2]]])
tensor([[[0, 1],
         [0, 1],
         [2, 4]]])
# <--- But the border of this image won't be represented
  • Adapted grid
tensor([[[0, 1],
         [0, 1],
         [0, 2]]])
tensor([[[0, 1],
         [0, 1],
         [2, 4]]])
tensor([[[0, 1],
         [0, 1],
         [3, 5]]]) # <--- But the img[...,4] will be loaded twice

Intuitively I would choose the adapted grid solution.
This is what I tried to implement in #6 (I opened a draft pull request just to show the differences).
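
To make the three options concrete, here is a minimal one-axis sketch in plain Python. The function `patch_grid_1d` and its `mode` argument are hypothetical names used only for illustration; they are not part of zarrdataset, and the draft in #6 may handle this differently.

```python
def patch_grid_1d(length, patch, mode="adapted"):
    """Return (start, stop) patch intervals along a single axis.

    Hypothetical helper for illustration only, not zarrdataset API.
    mode is one of "exact", "cropped", "adapted".
    """
    if mode not in ("exact", "cropped", "adapted"):
        raise ValueError(f"unknown mode: {mode!r}")
    if patch <= 0 or length < 0:
        raise ValueError("patch must be positive and length non-negative")

    grid = []
    for start in range(0, length, patch):
        stop = start + patch
        if stop <= length:
            # The patch fits entirely inside the axis.
            grid.append((start, stop))
        elif mode == "exact":
            # Keep the remainder as a smaller patch.
            grid.append((start, length))
        elif mode == "adapted":
            # Shift the last patch back so that it ends at the border.
            # If the axis is shorter than one patch, the patch is clipped.
            grid.append((max(length - patch, 0), length))
        # mode == "cropped": drop the incomplete patch.
    return grid


for mode in ("exact", "cropped", "adapted"):
    print(mode, patch_grid_1d(5, 2, mode))
# exact [(0, 2), (2, 4), (4, 5)]
# cropped [(0, 2), (2, 4)]
# adapted [(0, 2), (2, 4), (3, 5)]
```

With a length of 6 and a patch of 2, all three modes agree on `[(0, 2), (2, 4), (4, 6)]`, so the choice only matters when the axis length is not a multiple of the patch size.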

Collaborator

fercer commented Apr 29, 2024

Hi @ClementCaporal, thanks for noticing this issue!

Patches are missing from non-full chunks because patch locations were previously computed from the chunk size instead of the patch size.
I'll review your pull request and iterate there to find a solution, but I think this will probably be solved by #4.

Thanks again!

Collaborator

fercer commented May 7, 2024

This is solved by PR #4, where the patch size is used as the base to compute the sampleable chunks in the input image.
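
As a rough illustration of the difference, the sketch below contrasts the two ways of enumerating patch starts along one axis. It is a simplified model under stated assumptions, not zarrdataset's actual code: both function names are hypothetical, and the "full chunks only" behaviour is an assumption chosen to reproduce the output reported above.

```python
def starts_from_full_chunks(length, chunk, patch):
    """Patch starts found by tiling only chunks that fit fully in the axis.

    Hypothetical model of a chunk-based grid; the trailing partial chunk
    is never visited, so patches inside it are lost.
    """
    starts = []
    for chunk_start in range(0, length - chunk + 1, chunk):
        # Tile this full chunk with non-overlapping patches.
        starts.extend(range(chunk_start, chunk_start + chunk - patch + 1, patch))
    return starts


def starts_from_patch_size(length, patch):
    """Patch starts found by stepping over the axis in units of the patch."""
    return list(range(0, length - patch + 1, patch))


# Axis of length 6, chunks of 4, patches of 2 (the case reported above).
print(starts_from_full_chunks(6, 4, 2))  # [0, 2]
print(starts_from_patch_size(6, 2))      # [0, 2, 4]
```

Stepping by the patch size recovers the `[4, 6]` patch in this case. When the axis length is not a multiple of the patch size (for example 5 with a patch of 2), this sketch still drops the remainder, so border handling remains a separate decision.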

fercer closed this as completed May 7, 2024