Workspaces API #518

Merged
merged 5 commits into from
Oct 21, 2024

m-mohr
Member

@m-mohr m-mohr commented Oct 11, 2023

Implements #135
Partially implements #376
Creates the basics for #450
Related to the processes PR: Open-EO/openeo-processes#485

@jdries

jdries commented Nov 6, 2023

Trying to summarize the current proposal:

  • A 'workspace' in openEO is the description of, for instance, an object storage bucket that stores the actual data
  • Workspaces can be created by registering an existing bucket, or by letting openEO create it for you?
  • Management of the actual data within the workspace is not handled by openEO

@GeraldIr
Member

I do have some input on this after implementing our proof-of-concept:

  • I think there's no point in having a user-facing endpoint for changing the quota of a given bucket. This is either irrelevant for external workspaces, or shouldn't be controlled by the user for openeo-created workspaces.

  • This also goes for the "provisioning" status, and by extension for availability of workspaces more generally. Initializing a bucket (for s3 in my experience) is incredibly fast and a user needs to receive their credentials at the end of a call to the endpoint anyways. If a bucket is unavailable it should just not show up in the list workspace endpoint and the creation shouldn't be queued but the call should only return when the workspace is ready (like the synchronous processing endpoint)

  • There should be an endpoint for sharing access to an openeo-created workspace with other openeo users, it should be possible to do this granularly, for instance only sharing the results of a single job, or sharing a single folder/file.

  • There should be an option for storage type when registering an external workspace, so you can specify if it is for instance s3, Azure blob storage or anything else a backend decides to implement along with a list of supported storage types returned in some endpoint. This info is vital for interacting with a workspace for the backend and might not be apparent from the url or credentials.

  • I think saving results in a workspace should be configurable via an option or argument in any relevant result node process. Ideally the user gets a drop-down created from the get /workspaces endpoint for choosing which one to save to (potentially multiple).

These are my initial opinions on this extension. Excuse the delay in my response, but there was not much for me to add before.
If I think of anything else I'll be sure to make it known.

@m-mohr
Member Author

m-mohr commented Dec 1, 2023

Thank you for the feedback @GeraldIr. Please see my comments below.

> I think there's no point in having a user-facing endpoint for changing the quota of a given bucket. This is either irrelevant for external workspaces, or shouldn't be controlled by the user for openeo-created workspaces.

Okay, but this is purely for PATCH /workspaces/:id, right? I've left it in POST /workspaces for creation.
Also, I kept PATCH /workspaces/:id because I added a title and description for workspaces.
We have that all through openEO, and it should help to organize and describe the workspaces, especially for sharing later.
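
To illustrate the split described here, the following is a minimal sketch of how a client might build request bodies for the two endpoints. It assumes that `quota` is accepted only at creation and that `title` and `description` can be changed afterwards; the exact field sets, the helper `build_body`, and the example values are hypothetical and not taken from the specification.

```python
import json

# Hypothetical field sets, based only on the discussion above:
# quota may be set at creation time, but not changed later by the user.
CREATE_FIELDS = {"title", "description", "quota"}
UPDATE_FIELDS = {"title", "description"}


def build_body(fields: dict, allowed: set) -> str:
    """Return a JSON body, rejecting fields the endpoint does not accept."""
    unknown = set(fields) - allowed
    if unknown:
        raise ValueError(f"fields not allowed here: {sorted(unknown)}")
    return json.dumps(fields, sort_keys=True)


# Body for POST /workspaces (illustrative values)
create_body = build_body({"title": "Flood maps", "quota": 10_000_000}, CREATE_FIELDS)

# Body for PATCH /workspaces/:id (illustrative values)
update_body = build_body({"title": "Flood maps 2024"}, UPDATE_FIELDS)
```

With these assumed sets, passing `quota` to the PATCH body raises a `ValueError`, which mirrors the idea that users should not change the quota after creation.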

> This also goes for the "provisioning" status, and by extension for availability of workspaces more generally. Initializing a bucket (for s3 in my experience) is incredibly fast and a user needs to receive their credentials at the end of a call to the endpoint anyways. If a bucket is unavailable it should just not show up in the list workspace endpoint and the creation shouldn't be queued but the call should only return when the workspace is ready (like the synchronous processing endpoint)

Is this generally the case? We can't just base our assumptions on AWS and need more evidence across more providers. I assume the EOPCA has thought about this a bit more and as such I'd like to keep this as it is until we have more evidence around.
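
If the "provisioning" status is kept, a client would have to poll until the workspace becomes usable. The sketch below shows that pattern under stated assumptions: `fetch_status` stands in for a request such as a hypothetical `GET /workspaces/:id`, and every status value other than "provisioning" (for example "ready") is invented for illustration.

```python
import time
from typing import Callable


def wait_until_ready(fetch_status: Callable[[], str],
                     attempts: int = 10,
                     delay: float = 0.0) -> bool:
    """Poll until the workspace leaves the 'provisioning' state.

    fetch_status is a stand-in for an HTTP call that returns the status
    string. Only 'provisioning' comes from the thread; any other value is
    treated here as "no longer provisioning".
    """
    for _ in range(attempts):
        if fetch_status() != "provisioning":
            return True
        time.sleep(delay)
    return False


# Simulated back-end: two polls in 'provisioning', then 'ready' (hypothetical value).
responses = iter(["provisioning", "provisioning", "ready"])
done = wait_until_ready(lambda: next(responses))
```

With synchronous creation, as proposed in the quoted comment, this loop disappears because the creation call only returns once the workspace is usable.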

> There should be an endpoint for sharing access to an openeo-created workspace with other openeo users, it should be possible to do this granularly, for instance only sharing the results of a single job, or sharing a single folder/file.

We don't really have established sharing in openEO in general. We should probably establish this as a common concept in openEO after creating the general API for workspaces and then also apply this here.

> There should be an option for storage type when registering an external workspace, so you can specify if it is for instance s3, Azure blob storage or anything else a backend decides to implement along with a list of supported storage types returned in some endpoint. This info is vital for interacting with a workspace for the backend and might not be apparent from the url or credentials.

So I guess we need something like GET /workspace_types (comparable to GET /service_types)? What would we need to describe in there apart from title and description? And then a user can optionally also choose the workspace type during workspace creation?
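
As a rough sketch of what this could look like, the snippet below models a possible response of the proposed GET /workspace_types with only a title and description per type, and an optional type choice at creation. The type identifiers, the dictionary layout, and the helper `resolve_type` are assumptions made for illustration.

```python
# Hypothetical response of the proposed GET /workspace_types:
# one entry per type, each with a title and description.
WORKSPACE_TYPES = {
    "s3": {"title": "S3", "description": "S3-compatible object storage"},
    "azure-blob": {"title": "Azure Blob Storage", "description": "Azure blob container"},
}


def resolve_type(requested, types=WORKSPACE_TYPES):
    """Return the chosen type id, or None if the choice is left to the back-end."""
    if requested is None:
        return None
    if requested not in types:
        raise ValueError(
            f"unsupported workspace type {requested!r}; choose from {sorted(types)}"
        )
    return requested
```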

> I think saving results in a workspace should be configurable via an option or argument in any relevant result node process. Ideally the user gets a drop-down created from the get /workspaces endpoint for choosing which one to save to (potentially multiple).

How to interact with the workspace in the processes is up for discussion, as mentioned in the Teams chat in Oct and Nov. Let's discuss this in the openEO community call on the 6th of December.

Best,
Matthias

@jdries

jdries commented Dec 6, 2023

@GeraldIr do you happen to have a pointer to the demo requests that you showcased? Or otherwise the implementation itself?
Have we considered how this will work in a federated setup? Are there calls for other backends to retrieve the info and credentials to be able to store something in a workspace?

@m-mohr m-mohr changed the title Workspaces API + User Collections Workspaces API Dec 8, 2023
@m-mohr
Member Author

m-mohr commented Dec 8, 2023

I've updated the PR to only be the workspace API. The following PR adds a process to store data to a workspace:
Open-EO/openeo-processes#485

The User Collections are now defined via a STAC API extension + openEO processes:

@GeraldIr
Member

GeraldIr commented Jan 2, 2024

This is a short write-up of the main differences between our proof-of-concept and this specification:

Main differences for GET calls between our implementation and this specification are individual metadata descriptions. For instance in our proof-of-concept there is no concept of a storage quota yet.
Nothing major though.

We did have different endpoints for registering and creating workspaces. I wouldn't be opposed to combining them into a single POST, although the logic in the backend for these two actions is fundamentally different, as are the payload and response.
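
One way a single POST could still keep the two code paths apart is to dispatch on the payload. This is only a sketch: the rule "a payload with a `url` refers to existing external storage" and the handler names are assumptions, not something either implementation is confirmed to do.

```python
def register_existing(payload: dict) -> dict:
    # A back-end would check here that it can reach the external storage.
    return {"mode": "registered", "title": payload.get("title")}


def create_new(payload: dict) -> dict:
    # A back-end would provision new storage here and return credentials.
    return {"mode": "created", "title": payload.get("title")}


def handle_post_workspaces(payload: dict) -> dict:
    """Dispatch one POST to either 'register' or 'create'.

    Assumption: a payload that carries a 'url' points at existing storage.
    """
    if "url" in payload:
        return register_existing(payload)
    return create_new(payload)
```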

Creating and deleting are functionally identical otherwise.

Updating the workspace is as of yet unimplemented.

There is also functionality for sharing results/workspace access with other users; this necessitates a "register-user" and a "share-workspace" endpoint.
Register-user allows a user to get credentials without having to create a workspace first; share-workspace allows you to open up a workspace to other users.
This way nobody has to send access credentials around. Signed links that can be used n times could be a more abstracted alternative, which would not require any step from the user the results are being shared with.
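
The signed-link idea can be sketched as follows. This is a toy, in-memory illustration under stated assumptions: the secret, the message layout, and the `LinkRegistry` class are hypothetical, and a real deployment would more likely rely on the pre-signed URL support of the object storage itself.

```python
import hashlib
import hmac

SECRET = b"demo-secret"  # hypothetical server-side key


def sign(path: str, max_uses: int) -> str:
    """Sign a path together with the number of times it may be used."""
    msg = f"{path}|{max_uses}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


class LinkRegistry:
    """Tracks how often each signed link has been redeemed (in memory only)."""

    def __init__(self) -> None:
        self._uses = {}

    def redeem(self, path: str, max_uses: int, signature: str) -> bool:
        # Reject links whose path or use limit was tampered with.
        if not hmac.compare_digest(sign(path, max_uses), signature):
            return False
        used = self._uses.get(signature, 0)
        if used >= max_uses:
            return False
        self._uses[signature] = used + 1
        return True
```

The recipient only needs the link; no credentials are exchanged and no registration step is required on their side.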

Saving results to a workspace is supported by adding a "workspace" option with the workspace title/ID to save_result, but this isn't specifically part of this spec.

@m-mohr
Member Author

m-mohr commented Jan 2, 2024

@GeraldIr Do you plan to update your implementation according to this spec? What would you like to change and why? As we are both in SAP07, I think the expectation is that we come up with a common solution.

Apart from register-user and share-workspace, I don't see immediate todos for me. I'm not quite sure yet what register-user does. share-workspace is something I'd not define yet as it's a bigger thing we need to discuss for openEO so a custom solution from your side is fine for me.

Storage quota is optional and comes from the EOEPCA API you've pointed me to.

Also, do you plan to update your implementation to support Open-EO/openeo-processes#485 instead of using the save_result workspace parameter?

@GeraldIr
Member

GeraldIr commented Jan 2, 2024

@m-mohr

minIO has a list of users, which in reality is just a list of access credentials (access and secret keys) that are linked directly to policies which allows them to be used for accessing buckets. register-user allows one to create such a set of credentials without the extra step of creating a bucket. Then we can simply add policies for that pair of credentials (which represents a user) and share workspace access that way.
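
For readers unfamiliar with this model: MinIO policies use the AWS IAM policy document format. Below is a minimal sketch of a read-only policy for one bucket that could be attached to such a credential pair. The bucket name and the function name are hypothetical, and attaching the policy to a user (done through MinIO's admin tooling) is not shown.

```python
import json


def read_only_bucket_policy(bucket: str) -> str:
    """Build an IAM-style policy document granting read access to one bucket."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # the bucket itself (for listing)
                    f"arn:aws:s3:::{bucket}/*",  # all objects inside the bucket
                ],
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Hypothetical workspace bucket shared with another user
policy_json = read_only_bucket_policy("workspace-demo")
```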

Yes, I think changing away from the workspace parameter would be the preferred way to go so I will be implementing it the proposed way.

@m-mohr
Member Author

m-mohr commented Jan 2, 2024

@GeraldIr Thanks.

I effectively don't see a difference whether I submit the credentials during the workspace creation or upfront in register-user and then create the workspace with the user id. In both cases you need to send the credentials through a POST request to the server, in both cases you usually just send it once, right? Can you clarify why the separate register-user request is your preferred way?

I understand that you confirmed updating the processes, but could not understand from your reply whether you aim to align the HTTP APIs or not?

@GeraldIr
Member

GeraldIr commented Jan 2, 2024

@m-mohr

Register user is functionally only useful for sharing results, so as long as that isn't in the actual draft you can probably just ignore it. When creating a workspace for the first time that part is abstracted away from the user regardless.

And yes I can align the rest of our API with the specifications, that shouldn't be a problem.

@m-mohr
Member Author

m-mohr commented Jan 2, 2024

Great. If you have any feedback for the spec, please let me know. It's not set in stone at all...

@jdries

jdries commented Jan 24, 2024

@GeraldIr we would also like to register workspace metadata. Is the source code for the component that you built in the SAP available somewhere? Or how does it integrate with the rest of the platform?

@m-mohr
Member Author

m-mohr commented Jan 24, 2024

Generally, @GeraldIr when is it planned to close SAP07?

@GeraldIr
Member

We've recently made progress on the backend implementation of workspaces and would be ready to discuss finalizing this draft/working out all the details. (maybe in the next developer meeting for openEO on the 7th of August, or a standalone meeting).

As for a (non-comprehensive) list of topics I think we should discuss and come to a final consensus on:

  • General Management (registration, provisioning etc.) of workspaces
  • Interaction with them in actual openEO jobs (saving and loading data, own processes or options in existing ones)
  • Front End Integration of workspaces (dropdown menus for picking which registered workspace to save to for instance).
  • User Created Collections?
  • Data/Result Sharing

Also, if there are any questions about the implementation that wouldn't fit this thread, you can send me a private message and I'm happy to discuss.

@m-mohr
Member Author

m-mohr commented Jul 30, 2024

@GeraldIr Based on your current implementation, are any changes required to this PR #518 or Open-EO/openeo-processes#485 ?

Currently, there is no work planned for front end integration into the Web Editor. Nevertheless, is the current implementation deployed somewhere for testing?

@GeraldIr
Member

GeraldIr commented Aug 7, 2024

@m-mohr There are some minor differences, like having register workspace and create workspace on two different endpoints, and some major differences in how we handle integration with actual process graphs, but this won't need to be adjusted on your end at all (We handle loading and saving in load_collection and save_result respectively), because it was just more convenient to do it this way for now (We will switch to Open-EO/openeo-processes#485 as soon as that is finalized).

The only required change to the spec that I can see right now is a register_user endpoint, which our implementation relies on. It lets users get credentials for the underlying object storage without creating a workspace, so that other users can share their results/workspaces with them.

As for a deployed version, yes, the current version is up and running and demos/tutorials on how to use it can be found here
https://github.com/eodcgmbh/eodc-examples/tree/main/demos/workspaces

If there are any questions or bugs you encounter while looking through this, you can just create an issue over in that repo or bring them up in today's meeting.

@GeraldIr
Member

@m-mohr any updates on this?

@m-mohr
Member Author

m-mohr commented Aug 26, 2024

@GeraldIr Can you point me to any specific documentation around the register user endpoint?
Is this what is shown in https://github.com/eodcgmbh/eodc-examples/blob/main/demos/workspaces/demo-register-workspace.ipynb?

(Anyway, I might have issues working on this before November. This SAP has been delayed so much...)

@m-mohr m-mohr marked this pull request as ready for review September 9, 2024 11:11
@m-mohr
Member Author

m-mohr commented Sep 9, 2024

Updated according to the discussions that I had today with @GeraldIr:

  • Added GET /workspace_providers
  • Added type to workspace creation and responses
  • Added general way to specify parameters for workspace creation, instead of credentials

EODC also implements a register-user endpoint that allows granting additional users access to a workspace.
As we generally have very weak user management or sharing capabilities, we left it out here but we could eventually adopt it from EODC if others have a need for it.


@dthiex dthiex left a comment

Without having too much context, things look reasonable and relatively straightforward to me.

@m-mohr m-mohr merged commit 68fa258 into draft Oct 21, 2024
2 checks passed
@m-mohr m-mohr deleted the workspaces branch October 21, 2024 12:43
4 participants