Local Prototyping UDF (Debugging) #94

Closed
przell opened this issue Nov 18, 2021 · 4 comments
Labels: Feature Request, R4OpenEO Feature Specifcation ready
Milestone: v1.2.0


przell (Member) commented Nov 18, 2021

| Field | Value |
|---|---|
| Title | Local Prototyping UDF (Debugging) |
| Date | 2021-11-18 |
| Issue | #94 |
| Category | Debugging |
| Description | openEO UDFs allow the user to run arbitrary R code within an openEO process graph. To debug, parametrize, and validate the function before it is sent to a back-end, the user needs the possibility to test it locally. Ideally, the user can retrieve a subset of the data with the same dimensionality as the data that arrives in the UDF service, for local prototyping. |
| Dependencies | openEO API definition |
| Links | Local Backend for testing (#88) |
| Priority | High |
| Impact | High |
przell added the Feature Request label Nov 18, 2021
przell (Member, Author) commented Nov 30, 2021

With the new approach to UDFs (a bridge to Python), one idea:
Process Graph... run_udf(, debug = TRUE) returns the stars object as .Rdata, which then needs to be saved in the user workspace (user_workspace) or returned via a synchronous call.

flahn (Member) commented Feb 11, 2022

As discussed internally, the retrieval of sample data is crucial for local prototyping. Therefore, we need a function that allows the user to retrieve this data.

Here we have several implementation choices and face some problems:

  1. configurability: the user defines the process graph, or we simply offer some options for its properties
  2. running the job: sync vs. async
  3. size: how large can the sample data get?
  4. result retrieval: in general not a problem, but what happens if there are auxiliary files that ship metadata?
  5. result interpretation: not a problem for an image with a single time instance, but how is time (and possibly band) information propagated correctly and coherently across back-ends when downloading a serialized raster time series? All of that information should resolve into a stars object the user can "play around" with.
  6. data format: different back-ends will most definitely offer different file formats, which structure the relevant dimensional metadata differently

The result-interpretation point might also be relevant for #39 and the immediate creation of a stars object. Unless there is a convenient and well-defined way of doing this, it will cause problems: every back-end provides the data differently, which leads to different data representations in R that do not properly reflect the data structure in the back-end. @m-mohr For now, results must be described as STAC elements. But for serializing raster time series or images with multiple bands, there is no recommended way of describing them, right?

flahn (Member) commented Feb 18, 2022

At this point we are not able to get the exact data that is injected into the UDF, because

  • there is no user file system at the moment (2022-02-18)
  • each back-end chunks the data differently to achieve the best performance

As an intermediate solution, we can retrieve sample data before a UDF is run, and the user can then experiment with the returned data in a convenient form (probably a stars object, which will also be used inside the R UDF).

Perhaps as an addition to the list above:
  7. processing time: with a complex user UDF, processing can be very slow, especially if the function has to be applied to each element individually rather than through vectorized functions
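To illustrate point 7, here is a minimal base-R sketch of a hypothetical UDF body (not part of the openEO client API): the same normalized difference of two bands is computed once pixel by pixel in a loop and once as a single vectorized expression. The function names `nd_loop` and `nd_vec` and the synthetic 200 x 200 bands are assumptions for illustration only; no back-end data is involved.

```r
# Hypothetical UDF body, for illustration only: a normalized difference of
# two synthetic bands (not data from any openEO back-end).
set.seed(42)
n <- 200
band_a <- matrix(runif(n * n, 0.1, 1), nrow = n)  # values > 0 avoid division by zero
band_b <- matrix(runif(n * n, 0.1, 1), nrow = n)

# Element-wise: the arithmetic is evaluated once per pixel inside R loops
nd_loop <- function(a, b) {
  out <- matrix(NA_real_, nrow = nrow(a), ncol = ncol(a))
  for (i in seq_len(nrow(a))) {
    for (j in seq_len(ncol(a))) {
      out[i, j] <- (a[i, j] - b[i, j]) / (a[i, j] + b[i, j])
    }
  }
  out
}

# Vectorized: one arithmetic expression over the whole matrices
nd_vec <- function(a, b) (a - b) / (a + b)

# Both return the same values; only the run time differs
stopifnot(isTRUE(all.equal(nd_loop(band_a, band_b), nd_vec(band_a, band_b))))
print(system.time(nd_loop(band_a, band_b)))
print(system.time(nd_vec(band_a, band_b)))
```

The vectorized version is typically much faster, and the gap grows with the size of the chunk a back-end passes to the UDF, so a locally prototyped UDF is worth checking for per-element loops before it is sent to a back-end.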

flahn added this to the v1.2.0 milestone Feb 18, 2022
flahn added a commit that referenced this issue Feb 18, 2022
* compute_results now has the parameters 'as_stars' and 'format' to open the data as a stars object
* 'output_file' can be omitted; if so, a temporary file will be created
flahn (Member) commented Mar 10, 2022

A first version is now available in the develop branch. You can now get a sample with get_sample(). A vignette with some examples will follow.
