Dedicated export processes in the process graph #153

ghost · 2018-11-18T14:14:43Z

The output format of a process graph is currently defined in the preview and job requests. However, IMHO it makes more sense to have export definitions in the process graph itself.

If you have complex process graphs you might want to export intermediate results and statistical analysis data. It is more flexible to have dedicated export processes, in which you can set the export format directly.
With export processes one can export many different formats in a single process graph.

For example, the following process exports all its inputs as GTiff files somewhere within a process graph and pipes the inputs on to the next process, so that further processing is possible with the same data:

    {
        "process_id": "raster_export",
        "format": "GTiff",
        "imagery": {
            "process_id": "get_data",
            "data_id": "nc_spm_08.landsat.raster.elevation",
            "imagery": {
                "process_id": "get_data",
                "data_id": "nc_spm_08.landsat.raster.slope"
            }
        }
    }
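To illustrate how several export steps could coexist in one graph, here is a minimal Python sketch. It assumes the nested structure from the example above; the helper `find_exports`, the rule that export processes have a `process_id` ending in `_export`, and the second `PNG` export are hypothetical and only serve the illustration.

```python
from typing import Any, Dict, List

# Nested process graph in the style of the example above.
# "raster_export" and "format" come from the proposal; the PNG step is an assumption.
graph: Dict[str, Any] = {
    "process_id": "raster_export",
    "format": "GTiff",
    "imagery": {
        "process_id": "raster_export",
        "format": "PNG",
        "imagery": {
            "process_id": "get_data",
            "data_id": "nc_spm_08.landsat.raster.elevation",
        },
    },
}


def find_exports(node: Any) -> List[Dict[str, Any]]:
    """Collect every export node, outermost first (hypothetical '_export' naming rule)."""
    found: List[Dict[str, Any]] = []
    if isinstance(node, dict):
        pid = node.get("process_id")
        if isinstance(pid, str) and pid.endswith("_export"):
            found.append({"process_id": pid, "format": node.get("format")})
        # Recurse into all arguments, since any of them may hold a child process.
        for value in node.values():
            found.extend(find_exports(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_exports(item))
    return found


exports = find_exports(graph)
print(exports)
```

A back-end could use a walk like this to learn which files a job will produce before running it.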


m-mohr · 2018-11-18T14:28:53Z

Thanks for the idea. In general, I like it and am always open to replacing processing steps with processes, but have some concerns in this case:

We currently only have output formats for jobs and preview. How well does that integrate with the web services?
This could reduce portability of process graphs as format parameters can vary across back-ends. That's a reason why it is separated at the moment. So if we do this, we should push the standardization of the output formats and parameters more. Currently it recommends following GDAL.

ghost · 2018-11-18T14:43:05Z

> Thanks for the idea. In general, I like it and am always open to replacing processing steps with processes, but have some concerns in this case:

> We currently only have output formats for jobs and preview. How well does that integrate with the web services?

Can we support job-specific and process-specific exports at once?
How about: intermediate process graph exports are not listed in web services, but are part of the downloadable data for a specific job. The output definition of a job is used in web services.

> This could reduce portability of process graphs as format parameters can vary across back-ends. That's a reason why it is separated at the moment. So if we do this, we should push the standardization of the output formats and parameters more. Currently it recommends following GDAL.

The portability of process graphs between backends assumes that they provide exactly the same processes. Since the backends are quite different, there will be processes that are available on one backend but not on another. The process graph approach allows any process design that we can imagine and that the backend can process. Will there be a collection of standardized process definitions that each backend has to support?

However, we can either specify that export processes must support a subset of OGR/GDAL formats, or we can say that export processes are backend specific. But how do we standardize unstructured data that may be the result of a specific statistical analysis or a machine learning model?
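The first option, a required subset of formats, could be enforced with a check like the following Python sketch. The set `ALLOWED_FORMATS`, the helper `check_export_formats`, and the `_export` naming rule are assumptions for illustration; the thread does not define which GDAL formats the subset would contain.

```python
from typing import Any, List, Set

# Hypothetical subset of GDAL raster driver short names; not defined in this thread.
ALLOWED_FORMATS: Set[str] = {"GTiff", "PNG", "netCDF"}


def check_export_formats(node: Any, allowed: Set[str] = ALLOWED_FORMATS) -> List[str]:
    """Return one error message per export node whose format is not in the subset."""
    errors: List[str] = []
    if isinstance(node, dict):
        pid = node.get("process_id")
        if isinstance(pid, str) and pid.endswith("_export"):
            fmt = node.get("format")
            if fmt not in allowed:
                errors.append(f"{pid}: unsupported format {fmt!r}")
        # Child processes may sit under any argument name, so check all values.
        for value in node.values():
            errors.extend(check_export_formats(value, allowed))
    return errors


portable = {
    "process_id": "raster_export",
    "format": "GTiff",
    "imagery": {"process_id": "get_data", "data_id": "nc_spm_08.landsat.raster.elevation"},
}
# "MyBackendFormat" is a made-up, back-end specific format name.
not_portable = dict(portable, format="MyBackendFormat")

print(check_export_formats(portable))
print(check_export_formats(not_portable))
```

A graph that passes such a check would stay portable across all back-ends that implement the agreed subset.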

m-mohr · 2018-11-18T23:34:54Z

> Can we support job-specific and process-specific exports at once?

I guess we could, but I don't really like the idea that there are two places where this can be specified. I think we could just mention in the process description that this is not allowed and throw an error or so. I assume there are more processes that don't really make much sense for web services.

> How about: intermediate process graph exports are not listed in web services, but are part of the downloadable data for a specific job. The output definition of a job is used in web services.

What exactly do you mean by "not listed"? There's only one global place to list processes.

> The portability of process graphs between backends assumes that they provide exactly the same processes.

Yes, that's what the consortium has been working on in the last weeks (see mails/telcos) and will be in the next weeks (in the December sprint, for example). So it's definitely a strong aim to achieve this! I guess we can't fully make this happen, but we can to a good extent.

> Since the backends are quite different, there will be processes that are available on one backend but not on another.

Sure, that is to be expected, but...

> Will there be a collection of standardized process definitions that each backend has to support?

... there will be a collection of standardized process definitions! We are working on it and I guess you are attending the Dec sprint, so you'll be involved, too. It seems you didn't get all the mails from the last weeks in which we discussed the process descriptions for the use cases?

> However, we can either specify that export processes must support a subset of OGR/GDAL formats

Maybe we really can do something like that. Maybe we can have specific processes to export via GDAL. We should discuss that with the processes in Dec. I'll add export processes to the proposal and we'll see how things iterate.

> or we can say that export processes are backend specific.

As mentioned I don't think this is a good idea. Portability of process graphs is strongly desired for openEO. Back-end specific things should not be in the process graph (that's also a reason why we have process graph variables).

> But how do we standardize unstructured data that may be the result of a specific statistical analysis or a machine learning model?

That specific statistical analysis should be defined as part of the process, I think. We probably can't solve all portability issues, but I aim to solve as many as possible. ;-)

m-mohr · 2018-12-06T15:46:24Z

Sprint result: We will go this way with the introduction of the new process graph structure. It just makes more sense to have it as a process. This also has some implications for other endpoints:

We don't need a default output format in `GET /output_formats` any longer. The format needs to be explicit now.
The output property can be removed from the preview and job endpoints.
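As a rough illustration of this change, the Python sketch below moves an output format from the request body into the process graph. It is only a sketch under assumptions: the request keys `process_graph` and `output.format`, the helper `move_output_into_graph`, and the reuse of the `raster_export` process from the original proposal are not confirmed by this thread, and the new process graph structure may look different.

```python
import copy
from typing import Any, Dict


def move_output_into_graph(request: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the request with its output format expressed as an export process."""
    migrated = copy.deepcopy(request)
    output = migrated.pop("output", None)
    if output is not None:
        # Wrap the existing graph so the export becomes its outermost process.
        migrated["process_graph"] = {
            "process_id": "raster_export",  # name taken from the proposal above
            "format": output["format"],
            "imagery": migrated["process_graph"],
        }
    return migrated


# Old-style request body (assumed shape) with the format outside the graph.
old_request = {
    "process_graph": {
        "process_id": "get_data",
        "data_id": "nc_spm_08.landsat.raster.elevation",
    },
    "output": {"format": "GTiff"},
}

new_request = move_output_into_graph(old_request)
print(new_request)
```

With the format inside the graph, there is no default for the back-end to fall back on, which matches the first point above.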


m-mohr · 2018-12-06T15:58:32Z

Implemented.


m-mohr added this to the v0.4 milestone Nov 18, 2018

m-mohr added the job management (incl. /result) and processes (Process definitions and descriptions) labels Nov 18, 2018

m-mohr added the PSC vote label Nov 30, 2018

m-mohr added a commit that referenced this issue Dec 6, 2018

5fcecc1 Removed the default output format in `GET /output_formats` and the output format properties in `POST /preview`, `POST /jobs`, `PATCH /jobs` and `GET /jobs/{job_id}` requests removed in favor of export processes. (#153)

m-mohr closed this as completed Dec 6, 2018

m-mohr removed the PSC vote label Dec 22, 2020
