Merge pull request #10299 from IQSS/10280-get-file-api-extension
Update get file endpoint to add a datasetVersion optional query parameter and extend its payload
sekmiller committed Feb 23, 2024
2 parents dc597ae + f49a48a commit de45c13
Showing 15 changed files with 622 additions and 202 deletions.
10 changes: 10 additions & 0 deletions doc/release-notes/10280-get-file-api-extension.md
@@ -0,0 +1,10 @@
The API endpoint `api/files/{id}` has been extended to support the following optional query parameters:

- `includeDeaccessioned`: Indicates whether or not to consider deaccessioned dataset versions in the latest file search. (Default: `false`).
- `returnDatasetVersion`: Indicates whether or not to include the dataset version of the file in the response. (Default: `false`).

A new endpoint `api/files/{id}/versions/{datasetVersionId}` has been created. This endpoint returns the file metadata present in the requested dataset version. To specify the dataset version, you can use `:latest-published`, `:latest`, `:draft`, `1.0`, or any other available version identifier.

The endpoint supports the `includeDeaccessioned` and `returnDatasetVersion` optional query parameters, as does the `api/files/{id}` endpoint.

The `api/files/{id}/draft` endpoint is no longer available. Use the new endpoint `api/files/{id}/versions/{datasetVersionId}` with the version identifier `:draft` (`api/files/{id}/versions/:draft`) to obtain the same result.
104 changes: 103 additions & 1 deletion doc/sphinx-guides/source/api/native-api.rst
@@ -2722,7 +2722,9 @@ Files
Get JSON Representation of a File
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. note:: Files can be accessed using persistent identifiers. This is done by passing the constant ``:persistentId`` where the numeric id of the file is expected, and then passing the actual persistent id as a query parameter with the name ``persistentId``.
.. note:: When a file has been assigned a persistent identifier, it can be used in the API. This is done by passing the constant ``:persistentId`` where the numeric id of the file is expected, and then passing the actual persistent id as a query parameter with the name ``persistentId``.
This endpoint returns the file metadata present in the latest dataset version.
Example: Getting the file whose DOI is *10.5072/FK2/J8SJZB*:
@@ -2790,6 +2792,106 @@ The fully expanded example above (without environment variables) looks like this
The file id can be extracted from the response retrieved from the API which uses the persistent identifier (``/api/datasets/:persistentId/?persistentId=$PERSISTENT_IDENTIFIER``).
By default, files from deaccessioned dataset versions are not included in the search. If no accessible draft dataset version exists, the search for the latest published file ignores deaccessioned dataset versions unless the ``includeDeaccessioned`` query parameter is set to ``true``.
Usage example:
.. code-block:: bash
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/:persistentId/?persistentId=$PERSISTENT_IDENTIFIER&includeDeaccessioned=true"
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB&includeDeaccessioned=true"
If you want to include the dataset version of the file in the response, there is an optional parameter for this called ``returnDatasetVersion`` whose default value is ``false``.
Usage example:
.. code-block:: bash
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/:persistentId/?persistentId=$PERSISTENT_IDENTIFIER&returnDatasetVersion=true"
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB&returnDatasetVersion=true"
Get JSON Representation of a File given a Dataset Version
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. note:: When a file has been assigned a persistent identifier, it can be used in the API. This is done by passing the constant ``:persistentId`` where the numeric id of the file is expected, and then passing the actual persistent id as a query parameter with the name ``persistentId``.
This endpoint returns the file metadata present in the requested dataset version. To specify the dataset version, you can use ``:latest-published``, or ``:latest``, or ``:draft`` or ``1.0`` or any other style listed under :ref:`dataset-version-specifiers`.
Example: Getting the file whose DOI is *10.5072/FK2/J8SJZB* present in the published dataset version ``1.0``:
.. code-block:: bash
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB
export DATASET_VERSION=1.0
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/:persistentId/versions/$DATASET_VERSION?persistentId=$PERSISTENT_IDENTIFIER"
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/:persistentId/versions/1.0?persistentId=doi:10.5072/FK2/J8SJZB"
If the specified version does not exist or you do not have permission to view it, a not found error is returned.
By default, files from deaccessioned dataset versions are not included in the search unless the ``includeDeaccessioned`` query parameter is set to ``true``.
Usage example:
.. code-block:: bash
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB
export DATASET_VERSION=:latest-published
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/:persistentId/versions/$DATASET_VERSION?persistentId=$PERSISTENT_IDENTIFIER&includeDeaccessioned=true"
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/:persistentId/versions/:latest-published?persistentId=doi:10.5072/FK2/J8SJZB&includeDeaccessioned=true"
If you want to include the dataset version of the file in the response, there is an optional parameter for this called ``returnDatasetVersion`` whose default value is ``false``.
Usage example:
.. code-block:: bash
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB
export DATASET_VERSION=:draft
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/:persistentId/versions/$DATASET_VERSION?persistentId=$PERSISTENT_IDENTIFIER&returnDatasetVersion=true"
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/:persistentId/versions/:draft?persistentId=doi:10.5072/FK2/J8SJZB&returnDatasetVersion=true"
Adding Files
~~~~~~~~~~~~
74 changes: 37 additions & 37 deletions src/main/java/edu/harvard/iq/dataverse/DataFile.java
@@ -545,61 +545,61 @@ public void setDescription(String description) {
fmd.setDescription(description);
}
}

public FileMetadata getDraftFileMetadata() {
FileMetadata latestFileMetadata = getLatestFileMetadata();
if (latestFileMetadata.getDatasetVersion().isDraft()) {
return latestFileMetadata;
}
return null;
}

public FileMetadata getFileMetadata() {
return getLatestFileMetadata();
}

public FileMetadata getLatestFileMetadata() {
FileMetadata fmd = null;
FileMetadata resultFileMetadata = null;

// for newly added or harvested, just return the one fmd
if (fileMetadatas.size() == 1) {
return fileMetadatas.get(0);
}

for (FileMetadata fileMetadata : fileMetadatas) {
// if it finds a draft, return it
if (fileMetadata.getDatasetVersion().getVersionState().equals(VersionState.DRAFT)) {
return fileMetadata;
}

// otherwise return the one with the latest version number
// duplicate logic in getLatestPublishedFileMetadata()
if (fmd == null || fileMetadata.getDatasetVersion().getVersionNumber().compareTo( fmd.getDatasetVersion().getVersionNumber() ) > 0 ) {
fmd = fileMetadata;
} else if ((fileMetadata.getDatasetVersion().getVersionNumber().compareTo( fmd.getDatasetVersion().getVersionNumber())==0 )&&
( fileMetadata.getDatasetVersion().getMinorVersionNumber().compareTo( fmd.getDatasetVersion().getMinorVersionNumber()) > 0 ) ) {
fmd = fileMetadata;
}
resultFileMetadata = getTheNewerFileMetadata(resultFileMetadata, fileMetadata);
}
return fmd;

return resultFileMetadata;
}

// //Returns null if no published version

public FileMetadata getLatestPublishedFileMetadata() throws UnsupportedOperationException {
FileMetadata fmd = null;

for (FileMetadata fileMetadata : fileMetadatas) {
// if it finds a draft, skip
if (fileMetadata.getDatasetVersion().getVersionState().equals(VersionState.DRAFT)) {
continue;
}

// otherwise return the one with the latest version number
// duplicate logic in getLatestFileMetadata()
if (fmd == null || fileMetadata.getDatasetVersion().getVersionNumber().compareTo( fmd.getDatasetVersion().getVersionNumber() ) > 0 ) {
fmd = fileMetadata;
} else if ((fileMetadata.getDatasetVersion().getVersionNumber().compareTo( fmd.getDatasetVersion().getVersionNumber())==0 )&&
( fileMetadata.getDatasetVersion().getMinorVersionNumber().compareTo( fmd.getDatasetVersion().getMinorVersionNumber()) > 0 ) ) {
fmd = fileMetadata;
}
}
if(fmd == null) {
FileMetadata resultFileMetadata = fileMetadatas.stream()
.filter(metadata -> !metadata.getDatasetVersion().getVersionState().equals(VersionState.DRAFT))
.reduce(null, DataFile::getTheNewerFileMetadata);

if (resultFileMetadata == null) {
throw new UnsupportedOperationException("No published metadata version for DataFile " + this.getId());
}

return fmd;
return resultFileMetadata;
}

public static FileMetadata getTheNewerFileMetadata(FileMetadata current, FileMetadata candidate) {
if (current == null) {
return candidate;
}

DatasetVersion currentVersion = current.getDatasetVersion();
DatasetVersion candidateVersion = candidate.getDatasetVersion();

if (DatasetVersion.compareByVersion.compare(candidateVersion, currentVersion) > 0) {
return candidate;
}

return current;
}

/**
@@ -610,7 +610,7 @@ public long getFilesize() {
if (this.filesize == null) {
// -1 means "unknown"
return -1;
}
}
return this.filesize;
}
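
The refactoring in this hunk replaces two duplicated comparison loops with a shared helper (`getTheNewerFileMetadata`) and a stream reduction seeded with `null`. The following is a minimal, self-contained sketch of that pattern, not the Dataverse implementation: the `Version` and `Meta` records are hypothetical stand-ins for `DatasetVersion` and `FileMetadata`, and the ordering (major number first, then minor number) is an assumption about what `DatasetVersion.compareByVersion` does.

```java
import java.util.Comparator;
import java.util.List;

class NewerMetadataSketch {
    // Hypothetical stand-ins for DatasetVersion and FileMetadata.
    record Version(long major, long minor, boolean draft) {}
    record Meta(String label, Version version) {}

    // Assumed ordering: compare major numbers, then minor numbers.
    static final Comparator<Version> BY_VERSION =
            Comparator.comparingLong(Version::major).thenComparingLong(Version::minor);

    // Returns the candidate if it is strictly newer, otherwise keeps the current one.
    // A null current value means "nothing selected yet".
    static Meta newer(Meta current, Meta candidate) {
        if (current == null) {
            return candidate;
        }
        return BY_VERSION.compare(candidate.version(), current.version()) > 0
                ? candidate
                : current;
    }

    // Skips drafts and reduces the remaining entries to the newest one.
    static Meta latestPublished(List<Meta> metas) {
        Meta result = metas.stream()
                .filter(m -> !m.version().draft())
                .reduce(null, NewerMetadataSketch::newer);
        if (result == null) {
            throw new UnsupportedOperationException("No published metadata version");
        }
        return result;
    }

    public static void main(String[] args) {
        List<Meta> metas = List.of(
                new Meta("v1.0", new Version(1, 0, false)),
                new Meta("draft", new Version(2, 0, true)),
                new Meta("v1.2", new Version(1, 2, false)));
        System.out.println(latestPublished(metas).label()); // prints v1.2
    }
}
```

One design point worth noting: in this sketch `null` is only a left identity for `newer` (a `null` candidate would cause a `NullPointerException`), so the reduction is safe on a sequential stream, as used here, but would likely need a null-tolerant combiner before being run in parallel.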

@@ -423,7 +423,6 @@ public Command<DatasetVersion> handleLatestPublished() {
}

protected DataFile findDataFileOrDie(String id) throws WrappedResponse {

DataFile datafile;
if (id.equals(PERSISTENT_ID_KEY)) {
String persistentId = getRequestParameter(PERSISTENT_ID_KEY.substring(1));