Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download raster time series from Google Earth Engine #53

Closed
fdetsch opened this issue Oct 20, 2020 · 14 comments
Closed

Download raster time series from Google Earth Engine #53

fdetsch opened this issue Oct 20, 2020 · 14 comments

Comments

@fdetsch
Copy link

fdetsch commented Oct 20, 2020

I am trying to retrieve raster time series of NDWI for a given area of interest and time period. My code runs smoothly when using the VITO backend:

library(openeo)
library(raster)

## area of interest
aoi = list(
  west = 5.61
  , east = 5.66
  , south = 51.97
  , north = 51.99
)


### TERRASCOPE ====

#+ connect
vito = connect(
  host = "https://openeo.vito.be"
  , user = "group"
  , password = "group123"
)

#+ process
p1 = processes()

dat1 = p1$load_collection(
  id = "TERRASCOPE_S2_TOC_V2"
  , spatial_extent = aoi
  , temporal_extent = list("2020-04-01", "2020-04-10")
)

spectral_reduce1 = p1$apply(
  dat1
  , function(x, context) {
    p1$normalized_difference(x[8], x[11])
  }
)

r1 = p1$save_result(
  data = spectral_reduce1
  , format = "NetCDF"
)

#+ download
ofl1 = "vito_s2_ndwi.ncdf"
compute_result(
  r1,
  format = "NetCDF",
  output_file = ofl1
)

#+ import
rst1 = brick(ofl1)
dim(rst1)
# 236 (nrow) 336 (ncol) 4 (nlayers)

Unfortunately, my area of interest lies in the United States, for which VITO doesn't seem to provide Sentinel-2 data. At least, I get

SERVER-ERROR: no fitting raster sources found

from compute_result() when specifying a bounding box located in the US.

Therefore, I modified the above code to use Google's Earth Engine instead of VITO. Here, the apply() based approach to get raster time series for the same target extent as above no longer works:

### GOOGLE EARTH ENGINE ====

#+ connect
gee = connect(
  host = "https://earthengine.openeo.org"
  , user = "group1"
  , password = "test123"
)

#+ process_1
p2 = processes()

dat2 = p2$load_collection(
  id = "COPERNICUS/S2"
  , spatial_extent = aoi
  , temporal_extent = list("2020-04-01", "2020-04-10")
)

spectral_reduce2.1 = p2$apply(dat2, function(x, context) {
  p2$normalized_difference(x["B8"], x["B12"])
})

r2.1 = p2$save_result(
  data = spectral_reduce2.1
  , format = "GTIFF-ZIP"
)

#+ download_1
ofl2 = "gee_s2_ndwi.zip"
compute_result(
  r2.1,
  format = "GTIFF-ZIP",
  output_file = ofl2
)

SERVER-ERROR: Server error: Dimension 'undefined' does not exist.

Using a reduce_dimension() based approach instead, the program finishes successfully, but the resulting GeoTIFF doesn't have the expected four layers (ie. four overflights) and, in addition, has a different spatial resolution:

#+ process_2
spectral_reduce2.2 = p2$reduce_dimension(
  data = dat2
  , reducer = function(x, context) {
    p2$normalized_difference(x["B8"], x["B12"])
  }
  , dimension = "bands"
)

r2.2 = p2$save_result(
  data = spectral_reduce2.2
  , format = "GTIFF-ZIP"
)

#+ download_2
compute_result(
  r2.2,
  format = "GTIFF-ZIP",
  output_file = ofl2
)

#+ import_2
rst2.2 = raster::brick(
  unzip(
    ofl2
    , exdir = tempdir()
  )
)
dim(rst2.2)
# 401 (nrow) 1000 (ncol) 1 (nlayers)

My question here is two-fold:

  1. Is it possible at all to extract raster time series as shown in the VITO based example from Google Earth Engine? Of course, I could create a daily loop over the desired time period and calculate the NDWI on a day-to-day basis, but this involves a considerable overhead regarding recurrent backend queries.
  2. Where exactly does the higher spatial resolution in the Earth Engine based approach come from? Sentinel-2 bands 8 and 12 have different spatial resolutions of 10 and 20 m, respectively. Does VITO favor the lower resolution, while Earth Engine uses the higher?
@flahn
Copy link
Member

flahn commented Oct 20, 2020

To answer those question we might need some help from the service provider as the client software is merely a tool to interact with the back-ends.

Regarding your first question: I'm not sure how the GEE driver serializes the data at the end. My guess is that it can only serialize 2D data, where the dimensions are the spatial ones. @m-mohr might have some more insides here as he knows probably best, how the GEE data cube is serialized at the end. From a client perspective the only workaround would really be the loop approach you suggested.

About the second question: this is also probably up to @m-mohr and @jdries to discuss and elaborate. In general you can up- or downscale the bands in order to make them compatible for further analysis - meaning you need them to have the same resolution. If GEE shows a higher resolution I would assume that the lower resolution band is resampled via a nearest neighbor approach. Since the bands of S2 are aligned you would basically split up one pixel in band 12 into 4 pixel and assign them the each the value of the source pixel. But this is just a guess, I'm not exactly sure how Google Earth Engine handles those conflicts internally. Also I don't know how VITO handles those conflicts.

@jdries
Copy link

jdries commented Oct 21, 2020

Some tips from Terrascope side:

  • please filter bands in load_collection if possible: otherwise you will trigger loading of a lot more bands than you actually use
  • our Sentinel-2 layers all work at 10m resolution in UTM. GEE is 'smarter' and works on lower resolutions by default, for performance. It does have a parameter to change the resolution, but there's still a limit.

The 'SAR processing' example in this notebook:
https://github.com/Open-EO/openeo-usecases/blob/master/final_user_workshop/openeo_finaluserworkshop_livedemo.ipynb
shows my attempt of running something on Terrascope and GEE.

This is how to change projection and resolution in GEE output:
sar_difference(gee,collection_gee,to_decibel=False).download("sar-sigma0-rgb-scaled-gee.tiff",format='GTIFF-THUMB',options={'size':2500,'epsgCode':32631})

@fdetsch
Copy link
Author

fdetsch commented Oct 21, 2020

Thanks both of you for sharing your thoughts on this.

@jdries I wasn't aware that omitting bands when loading the collection caused such an overhead on the backend side even if subsequent calculations use two bands from that collection only – thanks for the hint.

In fact, it seems like GEE returns a higher spatial resolution than VITO by default (not sure how they achieve that, though; I would assume bilinear interpolation, nearest neighbor or the like). At least, when I reproject the GEE result layer from latlon to EPSG:32631, I get

prj2.2 = projectRaster(rst2.2, crs = "+init=epsg:32631")
res(prj2.2)
# 3.43 5.56 (x, y)

Sentinel-2 layers from VITO have the regular 10 m spatial resolution.

Any confirmation on the 2D serialization that @flahn mentioned would still be highly appreciated.

@m-mohr
Copy link
Member

m-mohr commented Oct 21, 2020

Interesting topic! Here are some thought from my side as developer of the GEE back-end (although I have no deeper insights into GEE) and the person who was mainly working on the process specifications.

First of all, the way you are using apply is not specification compliant. I'm not sure why this works with VITO (@jdries), but what you should use here is reduce_dimension, as you already mentioned. apply works on a per-pixel basis and you can't access labels there as you do (e.g. x["B8"]).

spectral_reduce2.1 = p2$apply(dat2, function(x, context) {
  p2$normalized_difference(x["B8"], x["B12"])
})

The error message GEE gives:

SERVER-ERROR: Server error: Dimension 'undefined' does not exist.

is not ideal and should be improved nevertheless.

Regarding the questions I think @flahn already is on the right track.

  1. Currently, the GEE back-end basically creates a mosaic of the remaining data to get a single image (2D with bands) as we can only call the getDownloadUrl/getThumbUrl GEE API calls on single images. Requesting the logs should mention that as a warning: "Compositing the image collection to a single image.". We'd likely get better results if we'd directly run batch jobs on GEE, but that is much more complex and we never had time to implement that. The GEE implementation is only a proof of concept and can't be "completed" with the limited funding we have.

  2. I'm also not 100% sure how GEE works internally, but with the methods I have at hand for downloading files, I don't think I can influence that (reliably).

That's all things I would like to improve, but that would need support from Google and quite a lot of additional time (i.e. funding).

@fdetsch
Copy link
Author

fdetsch commented Oct 21, 2020

@m-mohr Thanks for sharing these insights. Can you elaborate a bit on the compositing procedure, ie. which function is applied (mean, max, etc.) to create the composite? Or is it simply the first/last image in the collection stack?

@m-mohr
Copy link
Member

m-mohr commented Oct 21, 2020

@fdetsch I'm using GEE's mosaic function for Image Collections: https://developers.google.com/earth-engine/apidocs/ee-imagecollection-mosaic
As the documentation is rather short, I also have no clue what GEE is doing...
I've seen that being used in examples and used it, too, but maybe there's a better way of handling this case. I'm always open for suggestions.

@m-mohr
Copy link
Member

m-mohr commented Oct 21, 2020

I just found some information in a GEE guide:

The mosaic() method composites overlapping images according to their order in the collection (last on top). To control the source of pixels in a mosaic (or a composite), use image masks.

I'm not using a mask, so it's the last image. Maybe I should use qualityMosaic() instead of mosaic() for (better?) results? Long term there should be no mosaicing, but as I mentioned above... funding and so on.

@fdetsch
Copy link
Author

fdetsch commented Oct 21, 2020

@jdries (or any of the others) Any idea how to adjust GEE output CRS and resolution in R? I tried to specify options via ... in compute_result() based on your Python code, but had no success so far.

@m-mohr
Copy link
Member

m-mohr commented Oct 23, 2020

I think @flahn can better answer R related questions... :-)

@fdetsch I'd expect that you can set it in save_result, something like:

r1 = p1$save_result(
  data = spectral_reduce1,
  format = "GTIFF-THUMB",
  options = list(size=2500,epsgCode=32631)
)

I'm not a R programmer, the syntax might slightly differ...

@fdetsch
Copy link
Author

fdetsch commented Oct 23, 2020

That was helpful! Image reprojection via epsgCode works like a charm (syntax seems to be correct ;-)).

Unfortunately, size determines the dimensions, ie. number of rows and cols, of the resulting image. Any idea if there is a parameter that controls image resolution, ie. pixel size? I tried res and resolution, but had no success so far. GEE's default resolution when projecting to UTM seems to be

resolution : 56.11531, 56.11531  (x, y)

, ie. much coarser than the expected 10 and 20 m for bands 8 and 12, respectively. This likely confirms the trade-off between file size and performance at GEE mentioned by @jdries in the beginning.

Do you know if such customization options accepted by save_result(options = list(...)) (for any supported client) are documented somewhere?

@fdetsch
Copy link
Author

fdetsch commented Oct 23, 2020

So from what I have found out so far, scale seems to be controlling pixel size on the backend side. Thing is, while epsgCode is taken into account, scale seems to be ignored both in standalone mode (ie. being the only option) and in combination with epsgCode 🤔

@m-mohr
Copy link
Member

m-mohr commented Oct 23, 2020

@fdetsch list_file_formats() gives you the options you can pass for the file formats to save_result, see an example here.

The size option is basically just passed through to https://developers.google.com/earth-engine/apidocs/ee-image-getdownloadurl and https://developers.google.com/earth-engine/apidocs/ee-image-getthumburl, depending on the file format. As you said, size corresponds to the dimensions in GEE.

We have not implemented scale yet. It only works for GTIFF-ZIP and also only works in GEE if crs is not set. We can implement it though, if there's demand for it.

@fdetsch
Copy link
Author

fdetsch commented Oct 26, 2020

Got it, thanks.

Just noticed that I was seemingly using a wrong UTM zone for reprojection although it's convention to use this zone for the area I am working with. Using the neighboring zone, I am down to a resolution of

resolution : 1.927902, 1.927902  (x, y)

(meters), so not losing any information. Resampling to 10 m needs to happen in R, then. I would definitely stand in line for a GEE based implementation, if only it weren't for the "CRS not set" limitation. Setting the target resolution in latlon native units seems a little cumbersome.

@m-mohr I consider this issue closed from my point of view. I'll leave it up to you to close it or not since you included a couple of things above that you are envisioning for the future.

@m-mohr
Copy link
Member

m-mohr commented Oct 26, 2020

I've now created issues in the GEE repository to track all remaining improvements. So closing this for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants