meggart (Collaborator) commented Apr 25, 2022

This is an attempt to solve the use case brought up here: https://discourse.julialang.org/t/is-it-possible-to-index-into-a-set-of-columns-of-a-3d-array-in-a-single-line/75695 , where one wants to read a random batch of indices from a DiskArray. Looping over the indices one by one does not help here because every single read has high latency, so it is best to first find all affected chunks and then read chunk by chunk.
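
To illustrate the "find affected chunks, then read chunk by chunk" idea, here is a rough sketch on a plain in-memory array. It is not the code in this PR: `chunk_of`, `group_by_chunk` and `read_batch` are hypothetical names, and a regular chunk grid is assumed.

```julia
# Hypothetical helper (not part of DiskArrays): index of the chunk that
# contains element `I`, assuming a regular chunk grid.
chunk_of(I::CartesianIndex{N}, chunksize::NTuple{N,Int}) where {N} =
    CartesianIndex(ntuple(d -> (I[d] - 1) ÷ chunksize[d] + 1, N))

# Group the positions of the requested indices by the chunk they fall into.
function group_by_chunk(inds, chunksize)
    groups = Dict{CartesianIndex{length(chunksize)},Vector{Int}}()
    for (k, I) in enumerate(inds)
        push!(get!(() -> Int[], groups, chunk_of(I, chunksize)), k)
    end
    return groups
end

# Read all requested elements while loading every affected chunk only once.
# `A` is an ordinary array standing in for the remote, chunked array.
function read_batch(A::AbstractArray{T,N}, inds, chunksize::NTuple{N,Int}) where {T,N}
    out = Vector{T}(undef, length(inds))
    for (c, ks) in group_by_chunk(inds, chunksize)
        # Bounds of chunk `c`, clipped to the array size.
        lo = ntuple(d -> (c[d] - 1) * chunksize[d] + 1, N)
        hi = ntuple(d -> min(c[d] * chunksize[d], size(A, d)), N)
        block = A[map(UnitRange, lo, hi)...]   # one read per chunk
        for k in ks
            # Translate the global index into a position inside the block.
            out[k] = block[(Tuple(inds[k]) .- lo .+ 1)...]
        end
    end
    return out
end

A = reshape(1:24, 4, 6)
inds = [CartesianIndex(1, 1), CartesianIndex(4, 6), CartesianIndex(2, 2), CartesianIndex(3, 5)]
groups = group_by_chunk(inds, (2, 3))
result = read_batch(A, inds, (2, 3))
```

With chunks of size `(2, 3)` the four indices fall into only two chunks, so two block reads replace four single-element reads.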

Here I implemented the function disk_getindex_batch, which supports this workflow, and I extended the normal getindex to work on (partial) vectors of CartesianIndex and Boolean masks. So the following are possible and fast, although the data are remote:

```julia
using Zarr
a = zopen("https://s3.bgc-jena.mpg.de:9000/esdl-esdc-v2.1.1/esdc-8d-0.25deg-184x90x90-2.1.1.zarr")
ar = a["air_temperature_2m"]
size(ar)
# Index with a vector of CartesianIndex
indstoread = [CartesianIndex(rand(1000:1100), rand(300:400)) for _ in 1:1000]
ar[indstoread, :]
# Index with a mask that has lower dimensionality than the array
mask = falses(1440, 720)
mask[200:202, 500:502] .= true
mask[300:305, 400:405] .= true
ar[mask, :]
```
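
For reference, this is what the same two indexing forms return on a small in-memory `Array` in Base Julia. I am assuming here that the disk-backed versions are meant to return the same shapes; the array and index values below are made up for illustration.

```julia
# Small in-memory stand-in with dimensions (2, 3, 4).
A = reshape(collect(1:24), 2, 3, 4)

# A vector of 2D CartesianIndex selects points in the first two dimensions;
# the trailing `:` keeps the full third dimension.
inds = [CartesianIndex(1, 1), CartesianIndex(2, 3)]
r1 = A[inds, :]

# A Boolean mask over the first two dimensions does the same for all
# positions that are `true`, in column-major order.
mask = falses(2, 3)
mask[1, 1] = true
mask[2, 3] = true
r2 = A[mask, :]
```

Both results are matrices with one row per selected point and one column per entry along the third dimension.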

Still missing are unit tests and maybe documentation.

meggart mentioned this pull request May 11, 2022

meggart commented May 13, 2022

This is starting to take shape. In particular, it will help with many of the use cases in #61. For example, the following code

```julia
using Zarr
a = zopen("https://s3.bgc-jena.mpg.de:9000/esdl-esdc-v2.1.1/esdc-8d-0.25deg-1x720x1440-2.1.1.zarr")["air_temperature_2m"]
av = view(a, :, :, 1:200:1840)
av[:, :, :]
```

runs pretty fast now. The data is chunked with chunk size 1 along time, and when reading from the view only the affected chunks are transferred from the remote source.
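
A back-of-the-envelope sketch of why this helps: counting which chunks along the time axis the strided range touches. `touched_chunks` is a hypothetical helper, not a DiskArrays function, and the chunk size of 184 is only an assumed value for comparison.

```julia
# Hypothetical helper: chunk numbers along one dimension that a range of
# element indices falls into, for a fixed chunk size.
touched_chunks(r, chunksize) = unique((i - 1) ÷ chunksize + 1 for i in r)

steps = 1:200:1840                 # time steps selected by the view above
chunks_size1 = touched_chunks(steps, 1)      # chunk size 1 along time
chunks_size184 = touched_chunks(steps, 184)  # assumed larger chunk size

n_steps = length(steps)
n_size1 = length(chunks_size1)
n_size184 = length(chunks_size184)
```

With chunk size 1 along time, each selected time step is exactly one chunk, so the number of chunks read equals the number of time steps requested. With longer chunks along time, each touched chunk would also carry time steps that were never asked for.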

meggart merged commit 69b7a09 into master Jun 3, 2022
meggart changed the title from "WIP: Batch getindex" to "Batch getindex" Jun 13, 2022
rafaqz deleted the batch_getindex branch January 30, 2025 18:14