Add rechunking example #47

jrbourbeau · 2024-05-15T18:08:55Z

This example reads in 1 TB worth of NVM data, rechunks it to be optimized for time selections, and then writes the rechunked dataset to S3 (in oss-scratch-space in us-east-1).

cc @mrocklin. Happy to keep iterating

review-notebook-app · 2024-05-15T18:09:00Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

mrocklin · 2024-05-15T19:10:56Z

Playing now. Neat. Some thoughts!

It might make sense to arrange the data to make spatial access cheap.

I think the most common situation I've heard from people is: "My satellite pumps out one file every day/hour, so it's organized by time, but I want it organized spatially, so that I can pick out a timeseries for a lat/lon pair really easily."
Maybe at the end we can open up the data with just zarr/xarray, without Dask, and show that it's really cheap to get these timeseries, for example from a web application (which is what they all seem to want to do). I'm actually a little curious about sub-chunk access times. It may be that we want to store the zarr array with far finer chunking than Dask would want, so that we're not reading a bunch of neighboring lat/lon pairs at once. Maybe Xarray does this by default, but maybe not. My hope is that we could show ~100ms access times for tiny timeseries.
Thoughts on combining this into the geospatial notebook? I can imagine that in many cases it'll be nice to go from one example to the next, and I wouldn't mind consolidating example notebooks a little.
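
To make the chunking trade-off above concrete, here is a rough back-of-the-envelope sketch in plain Python. It counts how many chunks, and how many uncompressed bytes, a single lat/lon timeseries read would touch under two layouts. The array shape, both chunk shapes, the 4-byte item size, and the helper names `chunks_touched` and `bytes_read` are all hypothetical and chosen for illustration; they are not taken from the notebook or the dataset, and compression is ignored.

```python
from math import prod


def chunks_touched(chunks, selection):
    """Count chunks overlapping a selection.

    chunks: chunk length along each axis.
    selection: one (start, stop) half-open index range per axis.
    """
    count = 1
    for (start, stop), size in zip(selection, chunks):
        first = start // size          # index of first chunk hit on this axis
        last = (stop - 1) // size      # index of last chunk hit on this axis
        count *= last - first + 1
    return count


def bytes_read(chunks, selection, itemsize=4):
    """Uncompressed bytes fetched, since whole chunks must be read."""
    return chunks_touched(chunks, selection) * prod(chunks) * itemsize


# Hypothetical hourly data for one year on a 1000 x 1000 grid: (time, y, x).
n_time = 8760

# Full timeseries for one grid cell at y=10, x=20.
point_series = ((0, n_time), (10, 11), (20, 21))

# Layout A (assumed): organized by time, large spatial tiles.
by_time = (24, 500, 500)
# Layout B (assumed): whole time axis in one chunk, tiny spatial tiles.
by_space = (n_time, 10, 10)

for name, chunks in [("by_time", by_time), ("by_space", by_space)]:
    n = chunks_touched(chunks, point_series)
    mb = bytes_read(chunks, point_series) / 1e6
    print(f"{name}: {n} chunks, {mb:.1f} MB read")
```

Under these assumed numbers the time-organized layout touches hundreds of large chunks for one grid cell, while the fine spatial layout touches a single small chunk. That gap is the reason finer-than-Dask-friendly chunking might be needed to get fast point lookups.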

mrocklin · 2024-05-15T19:27:20Z

Oh, I guess the rechunking isn't very impressive though, because it's mostly chunked in this way already ...

Maybe we stick with time-optimized then, but maybe some of the other feedback still holds?

jrbourbeau · 2024-05-17T18:22:57Z

@mrocklin you made some changes offline to this notebook -- want to push up those changes here, or to a different PR (whichever is easiest)?

mrocklin · 2024-05-23T15:18:12Z

I've merged your rechunk example to the xarray example.

jrbourbeau · 2024-05-23T15:29:24Z

Thanks @mrocklin -- I pushed up one minor update in #49

Add rechunking example

b6a1376

mrocklin added 2 commits May 23, 2024 10:17

add rechunking example to xarray example

0ccfa8e

remove rechunk notebook

78928c2

mrocklin merged commit 1094cf7 into main May 23, 2024
3 checks passed

mrocklin deleted the rechunking branch May 23, 2024 15:26

jrbourbeau mentioned this pull request May 23, 2024

Remove stray rechunk line #49

Merged
