Merge branch 'develop' into 9601-keep-features #9601
pdurbin committed Jul 10, 2023
2 parents 1b6f4b2 + f24c7de commit bbea224
Showing 115 changed files with 4,579 additions and 1,949 deletions.
2 changes: 2 additions & 0 deletions doc/release-notes/5042-add-mydata-doc-api.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Dataverse now supports an API named 'MyData'. Documentation describing its use has been added (PR #9596).
This API is used to get a list of only the objects (datasets, dataverses, or datafiles) that an authenticated user can modify.
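A minimal sketch of calling it with curl; the server URL, token, and parameter values below are placeholders:

```shell
# Placeholders; substitute your installation's URL and your own API token.
export SERVER_URL=https://demo.dataverse.org
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

# Ask MyData for published datasets the user holds the Contributor role (id 6) on.
curl -H "X-Dataverse-key:$API_TOKEN" \
  "$SERVER_URL/api/mydata/retrieve?role_ids=6&dvobject_types=Dataset&published_states=Published"
```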
3 changes: 3 additions & 0 deletions doc/release-notes/8889-filepids-in-collections.md
@@ -0,0 +1,3 @@
It is now possible to configure registering PIDs for files in individual collections.

For example, registration of PIDs for files can be enabled in a specific collection when it is disabled instance-wide. Or it can be disabled in specific collections where it is enabled by default. See the [:FilePIDsEnabled](https://guides.dataverse.org/en/latest/installation/config.html#filepidsenabled) section of the Configuration guide for details.
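For instance, a superuser could flip the setting for a single collection through the collection attributes API described in the Native API guide; the collection alias ``mycollection`` below is a placeholder:

```shell
export SERVER_URL=https://demo.dataverse.org
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

# Enable file PID registration for one collection even if it is disabled instance-wide.
curl -X PUT -H "X-Dataverse-key:$API_TOKEN" \
  "$SERVER_URL/api/dataverses/mycollection/attribute/filePIDsEnabled?value=true"
```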
4 changes: 4 additions & 0 deletions doc/release-notes/9431-checksum-alg-in-direct-uploads.md
@@ -0,0 +1,4 @@
Direct upload via the Dataverse UI will now support any algorithm configured via the :FileFixityChecksumAlgorithm setting.
External apps using the direct upload API can now query Dataverse to discover which algorithm should be used.

Sites that use direct upload and/or dvwebloader with an algorithm other than MD5 may want to use the /api/admin/updateHashValues call (see https://guides.dataverse.org/en/latest/installation/config.html?highlight=updatehashvalues#filefixitychecksumalgorithm) to replace any MD5 hashes on existing files.
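As a sketch, an external app (or an admin checking a site) can query the configured algorithm via the new endpoint; no API token is required:

```shell
export SERVER_URL=https://demo.dataverse.org

# Returns the configured fixity algorithm, e.g. MD5, SHA-1, SHA-256, or SHA-512.
curl "$SERVER_URL/api/files/fixityAlgorithm"
```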
1 change: 1 addition & 0 deletions doc/release-notes/9480-h5web.md
@@ -0,0 +1 @@
A file previewer called H5Web is now available for exploring and visualizing NetCDF and HDF5 files.
3 changes: 3 additions & 0 deletions doc/release-notes/9558-async-indexing.md
@@ -0,0 +1,3 @@
Performance improvements, especially for large datasets containing thousands of files.
Uploading files one by one to a dataset is much faster now, making it practical to upload thousands of files in an acceptable timeframe. All edit operations on datasets containing many files, not just uploads, are faster.
Performance tweaks include indexing datasets in the background and reducing the number of indexing operations needed. Furthermore, updates to the dataset no longer wait for ingest to finish. Ingest was already running in the background, but it took a lock, preventing updates to the dataset and degrading performance for datasets containing many files.
6 changes: 6 additions & 0 deletions doc/release-notes/9588-datasets-api-extension.md
@@ -0,0 +1,6 @@
The following APIs have been added:

- /api/datasets/summaryFieldNames
- /api/datasets/privateUrlDatasetVersion/{privateUrlToken}
- /api/datasets/privateUrlDatasetVersion/{privateUrlToken}/citation
- /api/datasets/{datasetId}/versions/{version}/citation
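As one example, the summary field names endpoint can be called without authentication; the server URL below is a placeholder:

```shell
export SERVER_URL=https://demo.dataverse.org

# List the dataset fields used to summarize a dataset.
curl "$SERVER_URL/api/datasets/summaryFieldNames"
```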
5 changes: 5 additions & 0 deletions doc/release-notes/9656-api-optional-dataset-params.md
@@ -0,0 +1,5 @@
The following fields are now available in the native JSON output:

- alternativePersistentId
- publicationDate
- citationDate
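A hedged sketch of reading these fields from the native JSON export; the use of ``jq`` and the exact JSON paths are assumptions, not part of this change:

```shell
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_ID=doi:10.5072/FK2/EXAMPLE  # hypothetical PID

# Fetch the dataset's native JSON and print the new fields (paths assumed).
curl -s "$SERVER_URL/api/datasets/:persistentId/?persistentId=$PERSISTENT_ID" \
  | jq '.data | {alternativePersistentId, publicationDate, citationDate}'
```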
Expand Up @@ -2,5 +2,5 @@ Tool Type Scope Description
Data Explorer explore file A GUI which lists the variables in a tabular data file allowing searching, charting and cross tabulation analysis. See the README.md file at https://github.com/scholarsportal/dataverse-data-explorer-v2 for the instructions on adding Data Explorer to your Dataverse.
Whole Tale explore dataset A platform for the creation of reproducible research packages that allows users to launch containerized interactive analysis environments based on popular tools such as Jupyter and RStudio. Using this integration, Dataverse users can launch Jupyter and RStudio environments to analyze published datasets. For more information, see the `Whole Tale User Guide <https://wholetale.readthedocs.io/en/stable/users_guide/integration.html>`_.
Binder explore dataset Binder allows you to spin up custom computing environments in the cloud (including Jupyter notebooks) with the files from your dataset. `Installation instructions <https://github.com/data-exp-lab/girder_ythub/issues/10>`_ are in the Data Exploration Lab girder_ythub project. See also :ref:`binder`.
File Previewers explore file A set of tools that display the content of files - including audio, html, `Hypothes.is <https://hypothes.is/>`_ annotations, images, PDF, text, video, tabular data, spreadsheets, GeoJSON, zip, and NcML files - allowing them to be viewed without downloading the file. The previewers can be run directly from github.io, so the only required step is using the Dataverse API to register the ones you want to use. Documentation, including how to optionally brand the previewers, and an invitation to contribute through github are in the README.md file. Initial development was led by the Qualitative Data Repository and the spreadsheet previewer was added by the Social Sciences and Humanities Open Cloud (SSHOC) project. https://github.com/gdcc/dataverse-previewers
File Previewers explore file A set of tools that display the content of files - including audio, html, `Hypothes.is <https://hypothes.is/>`_ annotations, images, PDF, text, video, tabular data, spreadsheets, GeoJSON, zip, HDF5, NetCDF, and NcML files - allowing them to be viewed without downloading the file. The previewers can be run directly from github.io, so the only required step is using the Dataverse API to register the ones you want to use. Documentation, including how to optionally brand the previewers, and an invitation to contribute through github are in the README.md file. Initial development was led by the Qualitative Data Repository and the spreadsheet previewer was added by the Social Sciences and Humanities Open Cloud (SSHOC) project. https://github.com/gdcc/dataverse-previewers
Data Curation Tool configure file A GUI for curating data by adding labels, groups, weights and other details to assist with informed reuse. See the README.md file at https://github.com/scholarsportal/Dataverse-Data-Curation-Tool for the installation instructions.
@@ -14,14 +14,14 @@
{
"locale":"{localeCode}"
}
],
"allowedApiCalls": [
{
"name":"retrieveDatasetJson",
"httpMethod":"GET",
"urlTemplate":"/api/v1/datasets/{datasetId}",
"timeOut":10
}
]
}
]
},
"allowedApiCalls": [
{
"name":"retrieveDatasetJson",
"httpMethod":"GET",
"urlTemplate":"/api/v1/datasets/{datasetId}",
"timeOut":10
}
]
}
@@ -21,14 +21,14 @@
{
"locale":"{localeCode}"
}
],
"allowedApiCalls": [
{
"name":"retrieveDataFile",
"httpMethod":"GET",
"urlTemplate":"/api/v1/access/datafile/{fileId}",
"timeOut":270
}
]
}
},
"allowedApiCalls": [
{
"name":"retrieveDataFile",
"httpMethod":"GET",
"urlTemplate":"/api/v1/access/datafile/{fileId}",
"timeOut":270
}
]
}
47 changes: 43 additions & 4 deletions doc/sphinx-guides/source/admin/dataverses-datasets.rst
@@ -118,6 +118,28 @@ Creates a link between a dataset and a Dataverse collection (see the :ref:`dataset-linking` section of the User Guide for more information)

curl -H "X-Dataverse-key: $API_TOKEN" -X PUT http://$SERVER/api/datasets/$linked-dataset-id/link/$linking-dataverse-alias

List Collections that are Linked from a Dataset
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Lists the link(s) created between a dataset and a Dataverse collection (see the :ref:`dataset-linking` section of the User Guide for more information). ::

curl -H "X-Dataverse-key: $API_TOKEN" http://$SERVER/api/datasets/$linked-dataset-id/links

It returns a list in the following format:

.. code-block:: json

  {
    "status": "OK",
    "data": {
      "dataverses that link to dataset id 56782": [
        "crc990 (id 18802)"
      ]
    }
  }

.. _unlink-a-dataset:

Unlink a Dataset
^^^^^^^^^^^^^^^^

Expand All @@ -131,15 +153,32 @@ Mint a PID for a File That Does Not Have One
In the following example, the database id of the file is 42::

export FILE_ID=42
curl http://localhost:8080/api/admin/$FILE_ID/registerDataFile
curl "http://localhost:8080/api/admin/$FILE_ID/registerDataFile"

Mint PIDs for all unregistered published files in the specified collection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Mint PIDs for Files That Do Not Have Them
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The following API will register the PIDs for all not-yet-registered published files in the datasets **directly within the collection** specified by its alias::

If you have a large number of files, you might want to consider minting PIDs for files individually using the ``registerDataFile`` endpoint above in a for loop, sleeping between each registration::
curl "http://localhost:8080/api/admin/registerDataFiles/{collection_alias}"

It will not attempt to register the datafiles in its sub-collections, so this call will need to be repeated on any sub-collections where files need to be registered as well. File-level PID registration must be enabled on the collection. (Note that it is possible to have it enabled for a specific collection, even when it is disabled for the Dataverse installation as a whole. See :ref:`collection-attributes-api` in the Native API Guide.)

This API will sleep for 1 second between registration calls by default. A longer sleep interval can be specified with an optional ``sleep=`` parameter::

curl "http://localhost:8080/api/admin/registerDataFiles/{collection_alias}?sleep=5"

Mint PIDs for ALL unregistered files in the database
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following API will attempt to register the PIDs for all the published files in your instance that do not yet have them::

curl http://localhost:8080/api/admin/registerDataFileAll

The application will sleep for 1 second between registration attempts so as not to overload your persistent identifier service provider. Note that if you have a large number of files that need to be registered in your Dataverse, you may want to consider minting file PIDs within individual collections, or even for individual files, using the ``registerDataFiles`` and/or ``registerDataFile`` endpoints above in a loop with a longer sleep interval between calls.



Mint a New DOI for a Dataset with a Handle
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

9 changes: 9 additions & 0 deletions doc/sphinx-guides/source/admin/metadatacustomization.rst
@@ -95,6 +95,11 @@ Each of the three main sections has its own set of properties:
| displayName | Acts as a brief label for display related to this | Should be relatively brief. The limit is 256 characters,|
| | #metadataBlock. | but very long names might cause display problems. |
+----------------+---------------------------------------------------------+---------------------------------------------------------+
| displayFacet | Label displayed in the search area when this | Should be brief. Long names will cause display problems |
| | #metadataBlock is configured as a search facet | in the search area. |
| | for a collection. See | |
| | :ref:`the API <metadata-block-facet-api>`. | |
+----------------+---------------------------------------------------------+---------------------------------------------------------+
| blockURI | Associates the properties in a block with an external | The citation #metadataBlock has the blockURI |
| | URI. | https://dataverse.org/schema/citation/ which assigns a |
| | Properties will be assigned the | default global URI to terms such as |
@@ -452,12 +457,16 @@ metadatablock.name=(the value of **name** property from #metadatablock)

metadatablock.displayName=(the value of **displayName** property from #metadatablock)

metadatablock.displayFacet=(the value of **displayFacet** property from #metadatablock)

example:

metadatablock.name=citation

metadatablock.displayName=Citation Metadata

metadatablock.displayFacet=Citation

#datasetField (field) properties
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
datasetfieldtype.(the value of **name** property from #datasetField).title=(the value of **title** property from #datasetField)
14 changes: 14 additions & 0 deletions doc/sphinx-guides/source/api/getting-started.rst
@@ -52,6 +52,20 @@ If you ever want to check an environment variable, you can "echo" it like this:
echo $SERVER_URL
With curl version 7.56.0 and higher, it is recommended to use ``--form-string`` with outer quotes rather than the ``-F`` flag without outer quotes.

For example, the curl parameter below might cause an error such as ``warning: garbage at end of field specification: ,"categories":["Data"]}``.

.. code-block:: bash

  -F jsonData={\"description\":\"My description.\",\"categories\":[\"Data\"]}

Instead, use ``--form-string`` with outer quotes. See https://github.com/curl/curl/issues/2022

.. code-block:: bash

  --form-string 'jsonData={"description":"My description.","categories":["Data"]}'

If you don't like curl, don't have curl, or want to use a different programming language, you are encouraged to check out the Python, JavaScript, R, and Java options in the :doc:`client-libraries` section.

.. _curl: https://curl.haxx.se
140 changes: 140 additions & 0 deletions doc/sphinx-guides/source/api/native-api.rst
Expand Up @@ -229,6 +229,8 @@ The fully expanded example above (without environment variables) looks like this
Where :download:`dataverse-facets.json <../_static/api/dataverse-facets.json>` contains a JSON encoded list of metadata keys (e.g. ``["authorName","authorAffiliation"]``).

.. _metadata-block-facet-api:

List Metadata Block Facets Configured for a Dataverse Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -736,6 +738,24 @@ The fully expanded example above (without environment variables) looks like this
curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx https://demo.dataverse.org/api/dataverses/root/guestbookResponses?guestbookId=1 -o myResponses.csv

.. _collection-attributes-api:

Change Collection Attributes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

  curl -X PUT -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/dataverses/$ID/attribute/$ATTRIBUTE?value=$VALUE"

The following attributes are supported:

* ``alias`` Collection alias
* ``name`` Name
* ``description`` Description
* ``affiliation`` Affiliation
* ``filePIDsEnabled`` ("true" or "false") Enables or disables registration of file-level PIDs in datasets within the collection (overriding the instance-wide setting).
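For example, renaming a collection with the (hypothetical) alias ``engineering``; note that attribute values containing spaces must be URL-encoded:

```shell
export SERVER_URL=https://demo.dataverse.org
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

# Rename the collection; "Engineering%20Data" is "Engineering Data" URL-encoded.
curl -X PUT -H "X-Dataverse-key:$API_TOKEN" \
  "$SERVER_URL/api/dataverses/engineering/attribute/name?value=Engineering%20Data"
```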


Datasets
--------

@@ -2173,6 +2193,50 @@ Signposting is not supported for draft dataset versions.
curl -H "Accept:application/json" "$SERVER_URL/api/datasets/:persistentId/versions/$VERSION/linkset?persistentId=$PERSISTENT_IDENTIFIER"

Get Dataset By Private URL Token
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

  export SERVER_URL=https://demo.dataverse.org
  export PRIVATE_URL_TOKEN=a56444bc-7697-4711-8964-e0577f055fd2

  curl "$SERVER_URL/api/datasets/privateUrlDatasetVersion/$PRIVATE_URL_TOKEN"

Get Citation
~~~~~~~~~~~~

.. code-block:: bash

  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/YD5QDG
  export VERSION=1.0

  curl -H "Accept:application/json" "$SERVER_URL/api/datasets/:persistentId/versions/$VERSION/citation?persistentId=$PERSISTENT_IDENTIFIER"

Get Citation by Private URL Token
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

  export SERVER_URL=https://demo.dataverse.org
  export PRIVATE_URL_TOKEN=a56444bc-7697-4711-8964-e0577f055fd2

  curl "$SERVER_URL/api/datasets/privateUrlDatasetVersion/$PRIVATE_URL_TOKEN/citation"

.. _get-dataset-summary-field-names:

Get Summary Field Names
~~~~~~~~~~~~~~~~~~~~~~~

See :ref:`:CustomDatasetSummaryFields` in the Installation Guide for how the list of dataset fields that summarize a dataset can be customized. Here's how to list them:

.. code-block:: bash

  export SERVER_URL=https://demo.dataverse.org

  curl "$SERVER_URL/api/datasets/summaryFieldNames"

Files
-----

@@ -2963,6 +3027,22 @@ The response is a JSON object described in the :doc:`/api/external-tools` section.
curl -H "X-Dataverse-key: $API_TOKEN" -H "Accept:application/json" "$SERVER_URL/api/files/$FILE_ID/metadata/$FILEMETADATA_ID/toolparams/$TOOL_ID"

.. _get-fixity-algorithm:

Get Fixity Algorithm
~~~~~~~~~~~~~~~~~~~~~~

This API call can be used to discover the fixity/checksum algorithm configured for a Dataverse installation (see :ref:`:FileFixityChecksumAlgorithm`).
Currently, the possible values are MD5, SHA-1, SHA-256, and SHA-512.
This algorithm will be used when the Dataverse software manages a file upload and should be used by external clients uploading files to a Dataverse instance. (Existing files may or may not have checksums computed with this algorithm.)

.. code-block:: bash

  export SERVER_URL=https://demo.dataverse.org

  curl "$SERVER_URL/api/files/fixityAlgorithm"

Users Token Management
----------------------
@@ -4351,6 +4431,26 @@ It will report the specific files that have failed the validation. For example::
These are only available to super users.

.. _UpdateChecksums:

Update Checksums To Use New Algorithm
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The fixity algorithm used on existing files can be changed by a superuser using this API call. An optional query parameter (``num``) can be used to limit the number of updates attempted (i.e., to do the processing in batches).
The API call will only update the algorithm and checksum for a file if the existing checksum can be validated against the file.
Statistics concerning the updates are returned in the response to the API call, with details in the log.
The primary use for this API call is to update existing files after the algorithm used when uploading new files is changed; see :ref:`:FileFixityChecksumAlgorithm`.
Allowed values are MD5, SHA-1, SHA-256, and SHA-512.

.. code-block:: bash

  export ALG=SHA-256
  export BATCHSIZE=1

  curl "http://localhost:8080/api/admin/updateHashValues/$ALG"
  curl "http://localhost:8080/api/admin/updateHashValues/$ALG?num=$BATCHSIZE"

.. _dataset-validation-api:

Dataset Validation
@@ -4580,6 +4680,7 @@ A curl example allowing access to a dataset's metadata
Please see :ref:`dataverse.api.signature-secret` for the configuration option to add a shared secret, enabling extra
security.


.. _send-feedback:

Send Feedback To Contact(s)
Expand All @@ -4605,3 +4706,42 @@ A curl example using an ``ID``
curl -X POST -H 'Content-Type:application/json' -d "$JSON" $SERVER_URL/api/admin/feedback
Note that this call could be useful in coordinating with dataset authors (assuming they are also contacts) as an alternative/addition to the functionality provided by :ref:`return-a-dataset`.


MyData
------

The MyData API is used to get a list of just the datasets, dataverses, or datafiles an authenticated user can edit.

A curl example listing objects

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export ROLE_IDS=6
  export DVOBJECT_TYPES=Dataset
  export PUBLISHED_STATES=Unpublished
  export PER_PAGE=10

  curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/mydata/retrieve?role_ids=$ROLE_IDS&dvobject_types=$DVOBJECT_TYPES&published_states=$PUBLISHED_STATES&per_page=$PER_PAGE"

Parameters:

``role_ids`` Roles are customizable. Standard roles include:

- ``1`` = Admin
- ``2`` = File Downloader
- ``3`` = Dataverse + Dataset Creator
- ``4`` = Dataverse Creator
- ``5`` = Dataset Creator
- ``6`` = Contributor
- ``7`` = Curator
- ``8`` = Member

``dvobject_types`` Type of object; one or more of ``DataFile``, ``Dataset``, and ``Dataverse``.

``published_states`` State of the object; one or more of ``Published``, ``Unpublished``, ``Draft``, ``Deaccessioned``, and ``In+Review``.

``per_page`` Number of results returned per page.
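To pass several values for one parameter, the sketch below repeats the query parameter; whether your installation accepts repeated parameters this way is an assumption worth verifying:

```shell
export SERVER_URL=https://demo.dataverse.org
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

# Contributor (6) and Curator (7) roles, across Datasets and Dataverses
# (parameter repetition is an assumption to verify against your installation).
curl -H "X-Dataverse-key:$API_TOKEN" \
  "$SERVER_URL/api/mydata/retrieve?role_ids=6&role_ids=7&dvobject_types=Dataset&dvobject_types=Dataverse"
```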
