Merge pull request #10336 from PaulBoon/RetentionPeriod
IQSS 9375 - Retention period
sekmiller committed May 1, 2024
2 parents e615050 + 8f65a46 commit a329f29
Showing 45 changed files with 1,476 additions and 117 deletions.
3 changes: 2 additions & 1 deletion conf/solr/9.3.0/schema.xml
@@ -157,7 +157,8 @@
<field name="publicationStatus" type="string" stored="true" indexed="true" multiValued="true"/>
<field name="externalStatus" type="string" stored="true" indexed="true" multiValued="false"/>
<field name="embargoEndDate" type="plong" stored="true" indexed="true" multiValued="false"/>

<field name="retentionEndDate" type="plong" stored="true" indexed="true" multiValued="false"/>

<field name="subtreePaths" type="string" stored="true" indexed="true" multiValued="true"/>

<field name="fileName" type="text_en" stored="true" indexed="true" multiValued="true"/>
8 changes: 8 additions & 0 deletions doc/release-notes/9375-retention-period.md
@@ -0,0 +1,8 @@
The Dataverse Software now supports file-level retention periods. The ability to set retention periods, with a minimum duration (in months), can be configured by a Dataverse installation administrator. For more information, see the [Retention Periods section](https://guides.dataverse.org/en/6.3/user/dataset-management.html#retention-periods) of the Dataverse Software Guides.

- Users can configure a specific retention period, defined by an end date and a short reason, on a set of selected files or an individual file, by selecting the 'Retention Period' menu item and entering information in a popup dialog. Retention periods can only be set, changed, or removed before a file has been published. After publication, only Dataverse installation administrators can make changes, using an API.

- After the retention period expires, files cannot be previewed or downloaded (as if restricted, with no option to allow access requests). The file (landing) page and all the metadata remain available.


Release notes should mention that a Solr schema update is needed.
38 changes: 36 additions & 2 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -1228,6 +1228,7 @@ File access filtering is also optionally supported. In particular, by the follow
* ``Restricted``
* ``EmbargoedThenRestricted``
* ``EmbargoedThenPublic``
* ``RetentionPeriodExpired``

If no filter is specified, the files will match all of the above categories.

@@ -1277,7 +1278,7 @@ The returned file counts are based on different criteria:
- Per content type
- Per category name
- Per tabular tag name
- Per access status (Possible values: Public, Restricted, EmbargoedThenRestricted, EmbargoedThenPublic)
- Per access status (Possible values: Public, Restricted, EmbargoedThenRestricted, EmbargoedThenPublic, RetentionPeriodExpired)

.. code-block:: bash
@@ -1331,6 +1332,7 @@ File access filtering is also optionally supported. In particular, by the follow
* ``Restricted``
* ``EmbargoedThenRestricted``
* ``EmbargoedThenPublic``
* ``RetentionPeriodExpired``

If no filter is specified, the files will match all of the above categories.

@@ -2146,6 +2148,7 @@ File access filtering is also optionally supported. In particular, by the follow
* ``Restricted``
* ``EmbargoedThenRestricted``
* ``EmbargoedThenPublic``
* ``RetentionPeriodExpired``

If no filter is specified, the files will match all of the above categories.

@@ -2583,7 +2586,38 @@ The API call requires a Json body that includes the list of the fileIds that the
export JSON='{"fileIds":[300,301]}'
curl -H "X-Dataverse-key: $API_TOKEN" -H "Content-Type:application/json" "$SERVER_URL/api/datasets/:persistentId/files/actions/:unset-embargo?persistentId=$PERSISTENT_IDENTIFIER" -d "$JSON"

Set a Retention Period on Files in a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``/api/datasets/$dataset-id/files/actions/:set-retention`` can be used to set a retention period on one or more files in a dataset. Retention periods can be set on files that are only in a draft dataset version (and are not in any previously published version) by anyone who can edit the dataset. The same API call can be used by a superuser to add a retention period to files that have already been released as part of a previously published dataset version.

The API call requires a JSON body that includes the retention period's end date (``dateUnavailable``), a short reason (``reason``, optional), and a list of the IDs of the files that the retention period should be set on (``fileIds``). The ``dateUnavailable`` must be after the current date, and the duration (``dateUnavailable`` minus today's date) must be larger than the value specified by the :ref:`:MinRetentionDurationInMonths` setting. All files listed must be in the specified dataset. For example:

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV
  export JSON='{"dateUnavailable":"2051-12-31", "reason":"Standard project retention period", "fileIds":[300,301,302]}'

  curl -H "X-Dataverse-key: $API_TOKEN" -H "Content-Type:application/json" "$SERVER_URL/api/datasets/:persistentId/files/actions/:set-retention?persistentId=$PERSISTENT_IDENTIFIER" -d "$JSON"
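
A client may want to check the end date before sending the request. The following is a minimal sketch of the date rule described above, not the server-side implementation: the class ``RetentionDateCheck`` and the method ``isAcceptable`` are hypothetical names, ``minMonths`` is assumed to be a positive minimum (the special setting values 0 and -1 are not handled), and rejecting the boundary date itself is an assumption based on the "larger than" wording.

```java
import java.time.LocalDate;

// Hypothetical client-side pre-check; the names are illustrative and are
// not part of the Dataverse code base.
final class RetentionDateCheck {

    // Returns true when dateUnavailable is after today and more than
    // minMonths months ahead of today. Assumes minMonths is positive.
    static boolean isAcceptable(LocalDate today, LocalDate dateUnavailable, int minMonths) {
        if (!dateUnavailable.isAfter(today)) {
            return false; // the end date must lie in the future
        }
        // Assumption: the boundary date itself is rejected ("larger than").
        LocalDate boundary = today.plusMonths(minMonths);
        return dateUnavailable.isAfter(boundary);
    }
}
```

With a ten-year minimum (120 months) and a current date of 2024-05-01, the example date ``2051-12-31`` passes this check, while a date in 2030 would not.
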

Remove a Retention Period on Files in a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``/api/datasets/$dataset-id/files/actions/:unset-retention`` can be used to remove a retention period on one or more files in a dataset. Retention periods can be removed from files that are only in a draft dataset version (and are not in any previously published version) by anyone who can edit the dataset. The same API call can be used by a superuser to remove retention periods from files that have already been released as part of a previously published dataset version.

The API call requires a JSON body that includes the list of the IDs of the files that the retention period should be removed from (``fileIds``). All files listed must be in the specified dataset. For example:

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV
  export JSON='{"fileIds":[300,301]}'

  curl -H "X-Dataverse-key: $API_TOKEN" -H "Content-Type:application/json" "$SERVER_URL/api/datasets/:persistentId/files/actions/:unset-retention?persistentId=$PERSISTENT_IDENTIFIER" -d "$JSON"

.. _Archival Status API:

12 changes: 12 additions & 0 deletions doc/sphinx-guides/source/installation/config.rst
@@ -4549,6 +4549,18 @@ can enter for an embargo end date. This limit will be enforced in the popup dial

``curl -X PUT -d 24 http://localhost:8080/api/admin/settings/:MaxEmbargoDurationInMonths``

.. _:MinRetentionDurationInMonths:

:MinRetentionDurationInMonths
+++++++++++++++++++++++++++++

This setting controls whether retention periods are allowed in a Dataverse instance and can limit the minimum duration users are allowed to specify. A value of 0 months, or a missing
setting, indicates that retention periods are not supported. A value of -1 allows retention periods of any length. Any other value indicates the minimum number of months (from the current date) a user
can enter for a retention period end date. This limit will be enforced in the popup dialog in which users enter the retention period end date. For example, to set a ten-year minimum:

``curl -X PUT -d 120 http://localhost:8080/api/admin/settings/:MinRetentionDurationInMonths``
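
The three cases described above can be summarized in a short sketch. This is an illustration only, not how Dataverse reads the setting: ``RetentionSetting``, ``Mode`` and ``modeFor`` are hypothetical names, and rejecting values below -1 is an assumption, since the text does not say how such values are treated.

```java
// Hypothetical helper that interprets the raw setting value; the names are
// illustrative and are not part of the Dataverse code base.
final class RetentionSetting {

    enum Mode { DISABLED, ANY_LENGTH, MINIMUM_MONTHS }

    static Mode modeFor(String rawValue) {
        // A missing (or empty) setting means retention periods are not supported.
        if (rawValue == null || rawValue.trim().isEmpty()) {
            return Mode.DISABLED;
        }
        int months = Integer.parseInt(rawValue.trim());
        if (months == 0) {
            return Mode.DISABLED;
        }
        if (months == -1) {
            return Mode.ANY_LENGTH;
        }
        if (months < -1) {
            // Assumption: values below -1 are undefined and rejected here.
            throw new IllegalArgumentException("Unsupported value: " + months);
        }
        return Mode.MINIMUM_MONTHS;
    }
}
```

Under these assumptions, the value ``120`` from the example above maps to ``MINIMUM_MONTHS``, and an unset setting maps to ``DISABLED``.
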


:DataverseMetadataValidatorScript
+++++++++++++++++++++++++++++++++

8 changes: 8 additions & 0 deletions doc/sphinx-guides/source/user/dataset-management.rst
@@ -735,6 +735,14 @@ Once a dataset with embargoed files has been published, no further action is nee

As the primary use case of embargoes is to make the existence of data known now, with a promise (to a journal, project team, etc.) that the data itself will become available at a given future date, users cannot change an embargo once a dataset version is published. Dataverse instance administrators do have the ability to correct mistakes and make changes if/when circumstances warrant.

Retention Periods
=================

Support for file-level retention periods can also be configured in a Dataverse instance. Retention periods make file content inaccessible after the retention period end date: file previews and downloads are blocked. The effect is similar to restricting a file, except that the retention period ends on the specified date without further action and, once it has expired, requests for file access cannot be made.

Retention periods are intended to support use cases where files must be made unavailable (and in most cases destroyed, e.g. to meet legal requirements) after a certain period or date.
Dataverse does not destroy the files automatically; if destruction is required, it has to be carried out on the storage itself.

Dataset Versions
================

12 changes: 12 additions & 0 deletions src/main/java/edu/harvard/iq/dataverse/DataFile.java
@@ -242,6 +242,18 @@ public void setEmbargo(Embargo embargo) {
        this.embargo = embargo;
    }

    @ManyToOne
    @JoinColumn(name="retention_id")
    private Retention retention;

    public Retention getRetention() {
        return retention;
    }

    public void setRetention(Retention retention) {
        this.retention = retention;
    }

    public DataFile() {
        this.fileMetadatas = new ArrayList<>();
        initFileReplaceAttributes();
@@ -1366,7 +1366,10 @@ public Embargo findEmbargo(Long id) {
        DataFile d = find(id);
        return d.getEmbargo();
    }


    public boolean isRetentionExpired(FileMetadata fm) {
        return FileUtil.isRetentionExpired(fm);
    }
    /**
     * Checks if the supplied DvObjectContainer (Dataset or Collection; although
     * only collection-level storage quotas are officially supported as of now)
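
The ``isRetentionExpired`` method added in this hunk delegates to ``FileUtil.isRetentionExpired``, whose body is not shown in this excerpt. As a rough illustration of what such a check might involve, the sketch below uses a simplified stand-in for the ``Retention`` entity. Everything in it is an assumption: the class names, the ``getDateUnavailable`` accessor, and the rule that a file counts as expired only on days strictly after the end date.

```java
import java.time.LocalDate;

// Illustrative sketch only; this is not the FileUtil implementation.
final class RetentionExpirySketch {

    // Simplified stand-in for the Retention entity; only the end date is modeled.
    static final class Retention {
        private final LocalDate dateUnavailable;

        Retention(LocalDate dateUnavailable) {
            this.dateUnavailable = dateUnavailable;
        }

        LocalDate getDateUnavailable() {
            return dateUnavailable;
        }
    }

    // A file without a retention period never expires. Otherwise it is treated
    // as expired once today is strictly after the end date (assumption).
    static boolean isRetentionExpired(Retention retention, LocalDate today) {
        if (retention == null || retention.getDateUnavailable() == null) {
            return false;
        }
        return retention.getDateUnavailable().isBefore(today);
    }
}
```

Passing the current date as a parameter, rather than calling ``LocalDate.now()`` inside the method, keeps the check easy to test.
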
