Figure out feasibility of using geedim for downloading backend #55

Closed
1 of 6 tasks
aazuspan opened this issue Sep 24, 2022 · 2 comments
Assignees
Labels
enhancement New feature or request question Further information is requested

Comments

@aazuspan
Owner

aazuspan commented Sep 24, 2022

geedim is a Python package that supports downloading EE images with automatic tiling to bypass file size limits. I've been wanting to improve the download system in wxee for a while (see #19), and using geedim might be a good way to do that with the added bonus of removing most of the low-level thread and tempfile management that causes a lot of headaches. Ideally, I would replace the entire image downloading system with geedim, both for to_tif and for to_xarray.

It will be quite a bit of work just to figure out how feasible this is, so I'm going to start keeping track of and checking off potential incompatibilities below as I figure them out.

Possible Issues

  • Parallelizing - geedim uses threads to download tiles of large images whereas wxee uses threads to download images within collections. I'll need to figure out the feasibility of parallelizing on both dimensions or else download speed would tank on large collections of small images, which is the primary focus of wxee.
  • Download progress - geedim tracks progress of image tiles whereas I need to track progress of images in collections (or both would be fine). I give separate progress bars for retrieving data (requesting the download URLs) and the download itself because the URL request can take a lot of time, and I don't think this will be possible with geedim.
  • Tempfiles - I don't believe geedim supports tempfile outputs, but that's typically what you want when converting to xarray. I don't want to have to manage files manually, so I'll need to think more about how this will work. Maybe just create temp directories and download into them?
  • File-per-band - geedim automatically sets filePerBand=False for all downloads. I'll need to do some rewriting to load xarray objects from multi-band images, but that may improve performance on the IO side by reading/writing fewer files.
  • Masking - wxee takes a nodata argument and replaces masked values with that. After downloading, it sets that value in the image metadata or xarray.Dataset. geedim takes a different approach of adding a "FILL_MASK" band to the image before downloading. The advantage of the geedim approach is that you don't need to choose between exporting everything as a float or risking assigning nodata to real values. The downside is that it requires downloading more data from EE, and once you actually get the image into xarray and mask it there's no advantage, since xarray will promote integer data to float anyway to accommodate NaN values. I'll probably live with the geedim approach by applying and removing the mask band after downloading, but I should do some experiments to see how that affects performance (and to make sure I'm fully understanding the geedim approach).
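
For the parallelizing and tempfile points above, one option is to submit every tile of every image to a single shared thread pool and write each image's tiles into its own subdirectory of a temporary directory. The sketch below is a rough illustration using only the standard library. `fetch_tile`, the image IDs, and the tile counts are hypothetical stand-ins, not geedim or wxee APIs, and it does not show how geedim schedules its own threads.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def fetch_tile(image_id: str, tile: int, out_dir: Path) -> Path:
    """Hypothetical stand-in for one tile download; writes a placeholder file."""
    path = out_dir / f"{image_id}_tile{tile}.bin"
    path.write_bytes(b"\0")
    return path


def download_collection(tiles_per_image: dict, root: Path, max_workers: int = 8) -> dict:
    """Download every tile of every image through one shared thread pool."""
    results = {image_id: [] for image_id in tiles_per_image}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {}
        for image_id, n_tiles in tiles_per_image.items():
            image_dir = root / image_id
            image_dir.mkdir()
            for tile in range(n_tiles):
                future = pool.submit(fetch_tile, image_id, tile, image_dir)
                futures[future] = image_id
        # Tiles finish in arbitrary order; a progress bar could be updated here.
        for future in as_completed(futures):
            results[futures[future]].append(future.result())
    return results


# The temporary directory and everything in it is removed on exit.
with tempfile.TemporaryDirectory() as tmp:
    files = download_collection({"img_a": 1, "img_b": 3}, Path(tmp))
    counts = {image_id: len(paths) for image_id, paths in files.items()}
```

With one pool, a collection of many single-tile images and a single many-tile image both keep all workers busy, which is the case a per-image or per-tile pool alone would handle poorly.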

Solved Issues

  • Setting filenames - The geedim.MaskedImage class exposes and caches EE properties, so building filenames from metadata is straightforward. The only consideration is that we need to persist that MaskedImage instance throughout the download process to avoid having to retrieve properties multiple times.
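
The "fetch properties once, reuse the same instance" idea can be illustrated without geedim. This is a minimal sketch, assuming a hypothetical `CachedImage` wrapper and a fake `fetch_properties` callable in place of a real Earth Engine request. The filename scheme is invented for illustration and is not wxee's.

```python
from functools import cached_property


class CachedImage:
    """Hypothetical wrapper that fetches image properties at most once."""

    def __init__(self, image_id, fetch_properties):
        self.image_id = image_id
        self._fetch = fetch_properties
        self.requests = 0  # counts how often the (slow) fetch actually runs

    @cached_property
    def properties(self) -> dict:
        self.requests += 1
        return self._fetch(self.image_id)

    def filename(self, suffix: str = ".tif") -> str:
        props = self.properties
        safe_id = props["system:id"].replace("/", "_")
        timestamp = props["system:time_start"]
        return f"{safe_id}_{timestamp}{suffix}"


def fake_fetch(image_id):
    """Stand-in for a server round trip that returns image metadata."""
    return {"system:id": image_id, "system:time_start": 1600000000000}


image = CachedImage("LANDSAT/LC08/X", fake_fetch)
first = image.filename()
second = image.filename()  # reuses the cached properties, no second fetch
```

The cache lives on the instance, so creating a new wrapper for the same image would trigger another fetch, which is why the same object has to be kept for the whole download.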
@aazuspan aazuspan added the enhancement New feature or request label Sep 24, 2022
@aazuspan aazuspan changed the title Use geedim for downloading backend Figure out feasibility of using geedim for downloading backend Sep 24, 2022
@aazuspan aazuspan self-assigned this Sep 24, 2022
@aazuspan aazuspan added the question Further information is requested label Sep 24, 2022
@aazuspan
Owner Author

Worth noting that ee.Image.getDownloadURL now supports a format parameter that can be used to download images directly instead of to an intermediate ZIP. That would further simplify and speed up IO if downloading was done with filePerBand=False.
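
A rough sketch of what a direct GeoTIFF download might look like. `geotiff_params` and `download_geotiff` are hypothetical helpers, the region and scale values are made up, and `image` is assumed to be an already-initialized `ee.Image`. Only the `region`, `scale`, and `format` parameters of `getDownloadURL` are used.

```python
import urllib.request


def geotiff_params(region, scale: float) -> dict:
    """Build getDownloadURL parameters that request a bare GeoTIFF, not a ZIP."""
    return {"region": region, "scale": scale, "format": "GEO_TIFF"}


def download_geotiff(image, params: dict, out_path: str) -> str:
    """Request a download URL from an ee.Image and save the response to disk.

    Not called here because it needs an authenticated Earth Engine session.
    """
    url = image.getDownloadURL(params)
    urllib.request.urlretrieve(url, out_path)
    return out_path


# Example parameters for a small, made-up GeoJSON region at 30 m resolution.
region = {
    "type": "Polygon",
    "coordinates": [[[-120.1, 40.0], [-120.0, 40.0], [-120.0, 40.1],
                     [-120.1, 40.1], [-120.1, 40.0]]],
}
params = geotiff_params(region, scale=30)
```

Because the response is the image itself, there is no archive to extract before opening the file.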

@aazuspan
Owner Author

aazuspan commented Mar 8, 2023

I've decided not to pursue this further. There are a few too many complications in directly integrating wxee and geedim for it to be worth tackling. I also have reservations about bypassing the Earth Engine file size limits, since those are obviously in place for a reason.

@aazuspan aazuspan closed this as not planned Mar 8, 2023
1 participant