Dedicated export processes in the process graph #153

Closed
ghost opened this issue Nov 18, 2018 · 5 comments
Labels
job management incl. /result processes Process definitions and descriptions

@ghost

ghost commented Nov 18, 2018

The output format of a process graph is currently defined in the preview and job requests. However, IMHO it makes more sense to have export definitions in the process graph itself.

If you have complex process graphs you might want to export intermediate results and statistical analysis data. It is more flexible to have dedicated export processes, in which you can set the export format directly.
With export processes one can export many different formats in a single process graph.

For example, the following process exports all its inputs as GTiff files somewhere within a process graph and passes the inputs on to the next process, so that further processing is possible with the same data:

    {
        "process_id": "raster_export",
        "format": "GTiff",
        "imagery": {
            "process_id": "get_data",
            "data_id": "nc_spm_08.landsat.raster.elevation",
            "imagery": {
                "process_id": "get_data",
                "data_id": "nc_spm_08.landsat.raster.slope"
            }
        }
    }
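
A minimal sketch of how a back-end might evaluate such a pass-through export node, assuming the nested graph structure from the example. The `evaluate` function, the `exports` list and the stand-in for data loading are hypothetical and not taken from any openEO implementation:

```python
# Hypothetical evaluator for a nested process graph (illustration only).
def evaluate(node, exports):
    """Evaluate a nested process node and return the data it produces."""
    # Evaluate the nested input node first, if there is one.
    nested = node.get("imagery")
    data = evaluate(nested, exports) if nested is not None else []

    if node["process_id"] == "get_data":
        # Stand-in for loading a dataset: append its id to the inputs.
        return data + [node["data_id"]]
    if node["process_id"] == "raster_export":
        # Record one export per input, then pass the inputs on unchanged.
        exports.extend((item, node["format"]) for item in data)
        return data
    raise ValueError(f"unknown process: {node['process_id']}")


graph = {
    "process_id": "raster_export",
    "format": "GTiff",
    "imagery": {
        "process_id": "get_data",
        "data_id": "nc_spm_08.landsat.raster.elevation",
        "imagery": {
            "process_id": "get_data",
            "data_id": "nc_spm_08.landsat.raster.slope",
        },
    },
}

exports = []
result = evaluate(graph, exports)
```

Because the export node returns its inputs untouched, any process wrapped around it would see the same data as if the export were not there.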
@m-mohr
Member

m-mohr commented Nov 18, 2018

Thanks for the idea. In general, I like it and am always open to replacing processing steps with processes, but have some concerns in this case:

  • We currently only have output formats for jobs and preview. How well does that integrate with the web services?
  • This could reduce portability of process graphs as format parameters can vary across back-ends. That's a reason why it is separated at the moment. So if we do this, we should push the standardization of the output formats and parameters more. Currently the API recommends following GDAL.
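
To illustrate the portability concern, here is a rough sketch of checking an export node against what a back-end supports. Everything in it is an assumption for illustration: the back-end names, the format tables, the `parameters` key on the node and the `portability_issues` helper are all made up.

```python
# Hypothetical per-back-end format tables: format name -> accepted parameters.
BACKEND_FORMATS = {
    "backend_a": {"GTiff": {"compress", "tiled"}, "PNG": set()},
    "backend_b": {"GTiff": {"compress"}},
}


def portability_issues(export_node, backend):
    """Return the reasons why an export node would not run on a back-end."""
    formats = BACKEND_FORMATS[backend]
    fmt = export_node["format"]
    if fmt not in formats:
        return [f"format {fmt!r} not supported"]
    # Parameters the node uses that this back-end does not know for the format.
    unknown = sorted(set(export_node.get("parameters", {})) - formats[fmt])
    return [f"parameter {name!r} not supported" for name in unknown]


node = {
    "process_id": "raster_export",
    "format": "GTiff",
    "parameters": {"tiled": True},  # hypothetical format parameter
}
issues_a = portability_issues(node, "backend_a")
issues_b = portability_issues(node, "backend_b")
```

The same node passes on one back-end and fails on the other, which is the situation a shared set of formats and parameters would avoid.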

@m-mohr m-mohr added this to the v0.4 milestone Nov 18, 2018
@m-mohr m-mohr added job management incl. /result processes Process definitions and descriptions labels Nov 18, 2018
@ghost
Author

ghost commented Nov 18, 2018

> Thanks for the idea. In general, I like it and am always open to replacing processing steps with processes, but have some concerns in this case:
>
> • We currently only have output formats for jobs and preview. How well does that integrate with the web services?

Can we support job-specific and process-specific exports at once?
How about: intermediate process graph exports are not listed in web services, but are part of the downloadable data for a specific job. The output definition of a job is used in web services.

> • This could reduce portability of process graphs as format parameters can vary across back-ends. That's a reason why it is separated at the moment. So if we do this, we should push the standardization of the output formats and parameters more. Currently the API recommends following GDAL.

The portability of process graphs between backends assumes that they provide exactly the same processes. Since the backends are quite different, there will be processes that are available on one backend but not on another. The process graph approach allows any process design we can imagine and the backend can process. Will there be a collection of standardized process definitions that each backend has to support?

We can either specify that export processes must support a subset of OGR/GDAL formats, or we can say that export processes are backend specific. But how do we standardize unstructured data that may be the result of a specific statistical analysis or a machine learning model?

@m-mohr
Member

m-mohr commented Nov 18, 2018

> Can we support job-specific and process-specific exports at once?

I guess we could, but I don't really like the idea that there are two places where this can be specified. I think we could just mention in the process description that this is not allowed and throw an error or so. I assume there are more processes that don't really make much sense for web services.

> How about: intermediate process graph exports are not listed in web services, but are part of the downloadable data for a specific job. The output definition of a job is used in web services.

What exactly do you mean by "not listed"? There's only one global place to list processes.

> The portability of process graphs between backends assumes that they provide exactly the same processes.

Yes, that's what the consortium has been working on in the last weeks (see mails/telcos) and will keep working on in the coming weeks (in the December sprint, for example). So it's definitely a strong aim to achieve this! We probably can't make it happen fully, but to a good extent.

> Since the backends are quite different, there will be processes that are available on one backend but not on another.

Sure, that is to be expected, but...

> Will there be a collection of standardized process definitions that each backend has to support?

... there will be a collection of standardized process definitions! We are working on it and I guess you are attending the Dec sprint, so you'll be involved, too. It seems you didn't get all the mails in the last weeks where we discussed the process descriptions for the use cases?

> We can either specify that export processes must support a subset of OGR/GDAL formats

Maybe we really can do something like that. Maybe we can have specific processes to export via GDAL. We should discuss that with the processes in Dec. I'll add export processes to the proposal and we'll see how things iterate.

> or we can say that export processes are backend specific.

As mentioned, I don't think this is a good idea. Portability of process graphs is strongly desired for openEO. Back-end specific things should not be in the process graph (that's also a reason why we have process graph variables).

> But how do we standardize unstructured data that may be the result of a specific statistical analysis or a machine learning model?

That specific statistical analysis should be defined as part of the process, I think. We probably can't solve all portability issues, but I aim to solve as many as possible. ;-)

@m-mohr
Member

m-mohr commented Dec 6, 2018

Sprint result: We will go this way with the introduction of the new process graph structure. It just makes more sense to have it as a process. This also has some implications for other endpoints:

  • We don't need a default output format in GET /output_formats any longer. That needs to be explicit now.
  • The output property can be removed from the preview and job endpoints.
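
A rough sketch of what this change could mean for a client that still builds old-style requests. It assumes the old request body carries an `output` object with a `format` key, and it reuses the nested `raster_export` node from the example at the top of this issue rather than the new process graph structure; the `migrate_job_request` helper is hypothetical.

```python
import copy


def migrate_job_request(old_request, export_process_id="raster_export"):
    """Wrap the process graph in an export node and drop the `output` property."""
    new_request = copy.deepcopy(old_request)  # leave the caller's dict untouched
    output = new_request.pop("output", None)
    if output is None:
        # There is no default output format any more, so it must be explicit.
        raise ValueError("no output format given; it must be explicit now")
    new_request["process_graph"] = {
        "process_id": export_process_id,
        "format": output["format"],
        "imagery": new_request["process_graph"],
    }
    return new_request


old = {
    "process_graph": {
        "process_id": "get_data",
        "data_id": "nc_spm_08.landsat.raster.elevation",
    },
    "output": {"format": "GTiff"},  # assumed shape of the old property
}
new = migrate_job_request(old)
```

The format now lives inside the process graph, and a request without any format raises an error instead of silently falling back to a default.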

m-mohr added a commit that referenced this issue Dec 6, 2018
…tput format properties in `POST /preview`, `POST /jobs`, `PATCH /jobs` and `GET /jobs/{job_id}` requests removed in favor of export processes. (#153)
@m-mohr
Member

m-mohr commented Dec 6, 2018

Implemented.

@m-mohr m-mohr closed this as completed Dec 6, 2018
soxofaan added a commit to soxofaan/openeo-api that referenced this issue Jun 27, 2019
…tput_format

see commit 5fcecc1

    Removed the default output format in `GET /output_formats` and the output format properties in `POST /preview`, `POST /jobs`, `PATCH /jobs` and `GET /jobs/{job_id}` requests in favor of export processes. (Open-EO#153)
@m-mohr m-mohr removed the PSC vote label Dec 22, 2020