Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add updateIdMetadata method to MNAdmin API #1766

Closed
taojing2002 opened this issue Jan 3, 2024 · 9 comments
Closed

Add updateIdMetadata method to MNAdmin API #1766

taojing2002 opened this issue Jan 3, 2024 · 9 comments
Assignees
Milestone

Comments

@taojing2002
Copy link
Contributor

We are removing the Metacat replication servlet, which has the methods to update the Datacite metadata documents for DOIs. So we need a new method in MNAdmin API to allow operator to do the same tasks.

Here is the proposed method:

updateDatacite(session, option, pid[], formatId[])

GET /updateDatacite?option={all | pids | formatIds} [&pid={pid}] [&formatId={formatId}]

This method triggers an updating task to ensure the all or partial (specified by either a list of pids or format ids) DOI objects' metadata is up-to-date. If the caller specify option all, both pid and formatId parameters will be ignored. If the caller specify option pids, the formatId parameter will be ignored. For many implementations, this method would only queue an asynchronous updating task, and not wait for the task to complete.

Parameters

  • option a choice among all, pids and formatIds
  • pid a repeatable list of identifiers of DOI objects to be updated. It will be ignored if the option is all or formatIds.
  • formatId a repeatable list of identifiers of object format. Any DOI objects with those format ids will be updated. It will be ignored if the option is all or pids.
  • Returns Boolean TRUE if the updating request is scheduled
  • Return type: Types.Boolean
  • Raises:
    - Exceptions.NotImplemented – The service is not implemented. (errorCode=501, detailCode=????)
    - Exceptions.ServiceFailure – (errorCode=500, detailCode=????)
    - Exceptions.InvalidRequest – The request was malformed. (errorCode=400, detailCode=????)
    - - Exceptions.NotAuthorized – The supplied identity information is not authorized for the requested operation. (errorCode=400, detailCode=????)
@taojing2002 taojing2002 self-assigned this Jan 3, 2024
@taojing2002 taojing2002 added this to the 3.0.0 milestone Jan 3, 2024
@artntek
Copy link
Contributor

artntek commented Jan 4, 2024

The proposed REST call (GET /updateDatacite?option=...) doesn't comply with RESTful design principles:

  • Instead of including the verb ("update") in the name updateDatacite, the verb "update" should really be represented by the HTTP method used.

  • With this in mind, I think GET is not the correct method. A GET should never cause any change of state on the object - it should simply be an idempotent read operation.

  • It seems like POST, PUT or PATCH might be more appropriate. Here are some descriptions copied from this site:

    • HTTP POST -- Used to create a new resource into the collection of resources. POST is neither safe nor idempotent, and invoking two identical POST requests will result in two different resources containing the same information (except resource ids)

    • HTTP PUT -- used primarily to update an existing resource (if the resource does not exist, then API may decide to create a new resource or not)

    • HTTP PATCH -- used to make a partial update on a resource (i.e. for partially updating an existing resource; whereas PUT is used for replacing a resource in its entirety.)

From these descriptions it sounds like PATCH might fit best, but my understanding is that it's not widely supported by browsers etc. In that case, maybe PUT?

In summary - I don't think GET is right, but I don't think there's a clear "right answer" from the other options - so am totally open to discussion.

Once that's decided, the REST call might look like this (Assuming we use PUT - just for an example)

PUT /datacite/batchUpdate?option={all | pids | formatIds} [&pid={pid}] [&formatId={formatId}]

Followup Question

Is there any reason we can't accept a mix of pids and formatIds in the same REST call? Could we allow the caller to mix them, instead of having to define ?option=?

@mbjones
Copy link
Member

mbjones commented Jan 9, 2024

I agree with much of what Matthew said. Let's discuss this before you implement.

In addition, REST is all about managing resources, rather than modeling remote procedure calls (RPC). So each API should be against a specific resource or collection of resources. Typically that is the objects resource collection in DataONE, but in this case, the resource of interest is the identifiers collection. In addition, DataCite is not the only identifier minting authority that we might use for registering identifier metadata -- we also use EZID and OSTI, and those may be extended to non-DOI types like ARKs in the future. So, a method that works on an identifiers collection and does a PUT to a specific identifier, and then based on that identifier knows which service to contact would be best. So, how about something like:

For a single identifier, the traditional REST structure would look like:

PUT /identifiers/{pid}

For the collection-wide operation, it might be more like:

PUT /identifiers?option={all | pids | formatIds} [&pid={pid}] [&formatId={formatId}]

Note there is no method like "batchUpdate" in a REST style API -- instead, we are modifying either a single resource or a whole collection. Let's discuss.

@taojing2002 taojing2002 changed the title Add updateDatacite method to MNAdmin API Add updateIdsMetadata method to MNAdmin API Jan 23, 2024
@taojing2002 taojing2002 changed the title Add updateIdsMetadata method to MNAdmin API Add updateIdMetadata method to MNAdmin API Jan 23, 2024
@taojing2002
Copy link
Contributor Author

In our today's backend developers' meeting, we revised the rest API:

PUT /identifiers/doi/?[all=true] | [&pid={pid1}] | [&formatId={formatId1}]

//examples
PUT /identifiers/doi/?all=true  //update everything
PUT /identifiers/doi/?pid={pid1}&pid={pid2}  //update pid1 and pid2
PUT /identifiers/doi/?formatId={formatId1}&formatId={formatId2} //update dois with formatId1 or formatId2
// process all elements in a mixed list:
PUT /identifiers/doi/?formatId={formatId1}&pid={pid2} 
// formatIds and pids ignored, since `all=true` updates everything anyway
PUT /identifiers/doi/?all=true&formatId={formatId}&pid={pid2}
PUT /identifiers/doi/?all=false&pid={pid2} //update pid2

Methods:
updateIdMetadata(Session session, Enum type, String[] pids, String[] formatIds)
Parameters:
type a type of the identifier, e.g. DOI
pid a repeatable list of identifiers of objects to be updated
formatId a repeatable list of identifiers of object format. Any objects of the type with those format ids will be updated.
Returns Boolean TRUE if the updating request is scheduled
Return type: Types.Boolean
Raises:

updateAllIdMetadata(Session session, Enum type)
Parameters:
type a type of the identifier, e.g. DOI
Returns Boolean TRUE if the updating request is scheduled
Return type: Types.Boolean
Raises:

@artntek
Copy link
Contributor

artntek commented Jan 24, 2024

LGTM!

@mbjones
Copy link
Member

mbjones commented Jan 24, 2024

Overall looks good, but I think you can omit the DOI part of the rest url. The service should be able to tell what kind of identifier it is.

@taojing2002
Copy link
Contributor Author

@mbjones @artntek
If we don't have DOI part of the rest url, how we interpret those two requests:

PUT /identifiers/?all=true
PUT /identifiers/doi/?formatId=formatId1

Does the first one mean we update metadata for all identifiers of all supported types (e.g. doi, ark)?
Does the second one mean we update metadata for all identifiers of all supported types which have the formatId1 format?

@mbjones
Copy link
Member

mbjones commented Jan 30, 2024

Earlier, I proposed:

For a single identifier, the service will update that one identifier's metadata based on its type (e.g., DOIs go one place and arks go another):

PUT /identifiers/{pid}

For the collection-wide operation, we can update multiple at once, and for each PID, the right location to update is determined by the identifier type:

PUT /identifiers?option={all | pids | formatIds} [&pid={pid}] [&formatId={formatId}]

So, under that scenario, your two calls are below in the proposed REST syntax:

PUT /identifiers/?option=all
PUT /identifiers/?option=formatId&formatId=formatId1

Let's discuss on slack or in our meeting and summarize here when we reach a conclusion.

@taojing2002
Copy link
Contributor Author

Final Version:

Rest Calls:

PUT /identifiers/{pid1}  
PUT /identifiers[/]?[all=true] | [&pid={pid1}] | [&formatId={formatId1}]


PUT /identifiers/{pid1}     //update pid1
PUT /identifiers/?all=true  //update everything
PUT /identifiers/?pid={pid1}&pid={pid2}  //update pid1 and pid2
PUT /identifiers/?formatId={formatId1}&formatId={formatId2} //update dois with formatId1 or formatId2

PUT /identifiers?formatId={formatId1}&pid={pid2} //process all elements in a mixed list
PUT /identifiers?all=true&formatId={formatId}&pid={pid2} //formatIds and pids ignored, since `all=true` updates everything anyway
PUT /identifiers?all=false&pid={pid2} //update pid2

Two Methods:
updateIdMetadata(Session session, String[] pids, String[] formatIds)
Parameters:
pid a repeatable list of identifiers of objects to be updated
formatId a repeatable list of identifiers of object format. Any objects of the type with those format ids will be updated.
Returns Boolean TRUE if the updating request is scheduled
Return type: Types.Boolean
Raises:

updateAllIdMetadata(Session session)
Parameters:
Returns Boolean TRUE if the updating request is scheduled
Return type: Types.Boolean
Raises:

@artntek
Copy link
Contributor

artntek commented Feb 8, 2024

need to add this to admin guide documentation too, since it's not part of D1 API docs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants