Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch job options/configuration settings #276

Open
soxofaan opened this issue Apr 23, 2020 · 18 comments · May be fixed by #471
Open

Batch job options/configuration settings #276

soxofaan opened this issue Apr 23, 2020 · 18 comments · May be fixed by #471
Assignees
Milestone

Comments

@soxofaan
Copy link
Member

Currently in VITO backend we support (backend specific) batch job options for example to finetune spark driver and executor memory limits. These are provided in the POST /jobs request in a "job_options" field, e.g. something like this:

POST /jobs
{
    "process": { .... },
    "job_options": {"driver-memory": "10g", "executor-memory": "13g"},
     ....

This is also supported in the python client (e.g. see https://github.com/Open-EO/openeo-python-client/blob/1171465c69f85f3c225719a6664c7a3865076ff1/openeo/rest/connection.py#L463-L471)

The actual options and usage of this is of course highly backend specific, but it's probably a good idea to reserve this field "job_options" in official spec.

@m-mohr
Copy link
Member

m-mohr commented Apr 23, 2020

The JSON object in the body of /jobs (and other similar endpoints) is intentionally open (i.e. additionalProperties is implicitly set to true) for such additions. Whether you add job_options or put for example driver-memory on the top-level is pretty back-end specific and thus I don't see any to do for the API spec. I'm open to discussing this once multiple back-ends and clients implement this, although I wouldn't want to call this "job_options". Additionally, you'd need an endpoint to expose what options are available, see /services / /service_types and the configuration property for a similar construct.

By the way, the client parameter additional was meant to be merged with the request body object. Not sure whether that's documented though.

@m-mohr m-mohr closed this as completed Apr 23, 2020
@m-mohr m-mohr added this to the future milestone Apr 23, 2020
@soxofaan
Copy link
Member Author

That and how we handle it in our VITO backend is of course our business.

But we've also put that "feature" it in the python client (because we need it).
I was just wondering if there were any issues with implementing backend specific extensions in a general client.

@soxofaan
Copy link
Member Author

Also: I'm not pushing to include this in the 1.0.0 API. If there is consensus to include it in a future version, that's fine too.

@m-mohr
Copy link
Member

m-mohr commented Apr 23, 2020

@soxofaan I just added a clarification to the client library guidelines that the additional parameter should be merged with the request body: https://openeo.org/documentation/1.0/developers/clients/library-guidelines.html#parameters-3 (seems the page build hasn't finished yet)

This behavior was actually foreseen in the API and was meant to be used as follows:

So for example:
createJob(process, title, description, plan, budget, {"driver-memory": "10g", "executor-memory": "13g"})
would lead to the following request:

POST /jobs
{
    "process": { .... },
    "driver-memory": "10g",
    "executor-memory": "13g",
     ....

It just lacked the actual wording that additional was meant to be merged with the request body.

If you want to stick with job_options on the back-end side, I'd propose to just use the Python client as follows:
createJob(process, title, description, plan, budget, {"job_options": {"driver-memory": "10g", "executor-memory": "13g"}})

That would make at least the clients compatible.

@m-mohr
Copy link
Member

m-mohr commented Apr 23, 2020

The page build hasn't finished yet:

The addition is:

  • additional in createJob and createService (also below in updateJob and updateService): May hold additional key-value-pairs as object. The object should be merged with the body of the request.

And yes, that came from a time where we experimented with custom fields (i.e. publishing resources to the public with a custom publish: true/false flag).

@soxofaan
Copy link
Member Author

interesting, thanks

I think we'll stick with "namespacing" these job configuration things under a toplevel "job_options" field

@m-mohr
Copy link
Member

m-mohr commented Nov 15, 2022

Reopening this with the idea to allow JSON-schema based exploration of additional options through the /jobs and /services endpoints (we could also add two additional endpoints, if you'd like), similar to how secondary service parameters are defined.

Please note that the primary use case is still top-level options, but it's still possible to expose the job options with small caveats in the rendered UIs.

Example for the VITO job options (I'm not sure about the exact options allowed) exposed in GET /jobs:

{
  "jobs": [...],
  "links": [...],
  "create_parameters": [
    {
      "name": "job_options",
      "schema": {
        "type": "object",
        "properties": {
          "driver-memory": {
            "type": "string",
            "description": "...",
            "default": "2g"
          },
          "executor-memory": {
            "type": "string",
            "description": "...",
            "default": "2g"
          },
          "large-scale": {
            "type": "boolean",
            "default": false
          }
        }
      },
      "description": "...",
      "optional": true,
      "default": {}
    }
  ]
}

@m-mohr
Copy link
Member

m-mohr commented Nov 15, 2022

Thoughts, @jdries and @soxofaan ? PR is open at #471

@m-mohr m-mohr modified the milestones: future, 1.2.0 Nov 15, 2022
@m-mohr m-mohr self-assigned this Nov 15, 2022
m-mohr added a commit that referenced this issue Nov 15, 2022
… define the parameters that can be submitted suring creation #276
@m-mohr m-mohr linked a pull request Nov 15, 2022 that will close this issue
@m-mohr m-mohr linked a pull request Nov 15, 2022 that will close this issue
@soxofaan
Copy link
Member Author

  "create_parameters": [
    {
      "name": "job_options",

That means that each backend can choose the actual field name of these job options ("job_options" in this example).
Would it be possible to enforce a choice here, e.g. at API level, or at level of federation extension?

@m-mohr
Copy link
Member

m-mohr commented Nov 15, 2022

Your example is not ideal, because in the API the job and service options should be top-level, but you've chosen to do group them in one parameter as an object. So yes, this is free to choose and the place where you could standardize this is the openEO Platform API/federation contract. job_options would not be standardized as part of the openEO API.

@soxofaan
Copy link
Member Author

in the API the job and service options should be top-level, but you've chosen to do group them in one parameter as an object.

Indeed, that's because it feels much cleaner like that.
For example see openEOPlatform/documentation#53 for a sample of job options we currently support (note that there are more experimental/ad-hoc options and flags not documented there). It doesn't feel very clean or maintainable to put all these at the top level of the request

@m-mohr
Copy link
Member

m-mohr commented Nov 21, 2022

Honestly, I don't see a difference between both approaches.

@soxofaan
Copy link
Member Author

Another example: if we are talking about "large area" processing in the aggregator, maybe some options are intended towards the aggregator, some other towards the VITO backend, some others towards the EODC backend, etc. In such a context it feels just more maintainable to have some kind of grouping or tree structure to cleanly separate options.

@m-mohr
Copy link
Member

m-mohr commented Nov 21, 2022

I don't quite get what your aim is? The top-level approach has been in the API spec since 0.3 or so and your way of grouping is actually supported, but the grouping itself is not standardized. So are we good or what is your expectation?

The aggregator surely can expose them in multiple groups, too (e.g. eodc_job_options, vito_job_options) and merge what is specified the same way. On the other hand, grouping them in the aggregator may also feel weird from a user perspective... But the aggregator thing is also a completely new story to be discussed separately, I think.

@soxofaan
Copy link
Member Author

I understand that anything is possible and that a backend can pick any approach to their liking,
but my concern is how to handle that freedom properly in clients and whether differences between backends will be perceived as inconsistent or annoying.

for example, in the python client, Connection.create_job() has an argument additional (a dictionary of job options) which will be grouped under a top-level field job_options in the request. I'm not sure how clean it will be to also allow the non-grouped variant in a non-breaking way.

@m-mohr
Copy link
Member

m-mohr commented Nov 24, 2022

That's the issue if implementors choose to not follow the spec and even integrate proprietary stuff into only one of the clients... The "additional" thing has been there since years and now a decision VITO has taken later should not enforce a change in the spec, sorry.

The JS client for example also has an additional flag for the top level fields and the GEE driver uses the top level fields to make resources public.

@jdries
Copy link

jdries commented Feb 8, 2023

We committed a change in the geotrellis backend to also support top-level job_options. This is backward compatible, and once it's rolled out we can also adjust the Python client to prefer that style.

PS: our proposal for specifying job_options predates the clarifications made to the spec:
Open-EO/openeo-geopyspark-driver@627b380
Meaning, we never intentionally 'choose' not to follow the spec, in fact the spec choose not to follow the only implementation at that time. You even said in this issue that you were open to discussing it when other backends would also require these extra properties, but that never happened. We were even so kind to point out the lack of clear specification in the first place. So case closed for us, but just don't assume that we're somehow implementing changes that intentionally go against the spec.

@m-mohr
Copy link
Member

m-mohr commented Feb 8, 2023

@jdries Sorry, I acknowledge that this wasn't bad intention from your side. I'm also always very happy that VITO is very active in pointing out issues and giving feedback. The primary reason for this misunderstanding is that ReDoc earlier showed the "additional properties for batch job" thing in the rendered version nicely and that indicated additional properties should go top-level (through the additionalProperties: true in the openAPI document). At some point the renderer lost this ability and now it doesn't explicitly show this anymore. This is generally an issue as ReDoc doesn't show everything that is formally specified in OpenAPI. Thus the rendered version can never be the only source of truth, unfortunately.

in fact the spec choose not to follow the only implementation at that time.

This is not true. The spec fully allows job_options in the PR and GEE still has an implementation that uses the top-level fields.

Anyway, great that we are coming to a conclusion here so that we can push the open PRs and issues to harmonize this in openEO. Thanks!

@m-mohr m-mohr modified the milestones: 1.3.0, 1.2.0 Feb 10, 2023
@m-mohr m-mohr modified the milestones: 1.2.0, 1.3.0 May 2, 2023
@soxofaan soxofaan removed their assignment Aug 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants