Commit d514545

Merge pull request #10109 from IQSS/9464-schema-creator-validator

JSON Schema creator and validator

jp-tosca committed Dec 5, 2023
2 parents e3e122a + 2379828

Showing 14 changed files with 692 additions and 33 deletions.
3 changes: 3 additions & 0 deletions doc/release-notes/9464-json-validation.md
@@ -0,0 +1,3 @@
Functionality has been added to help validate dataset JSON prior to dataset creation. There are two new API endpoints in this release. The first takes in a collection alias and returns a custom dataset schema based on the required fields of the collection. The second takes in a collection alias and a dataset JSON file and does an automated validation of the JSON file against the custom schema for the collection. In this release, functionality is limited to JSON format validation and validating required elements. Future releases will address field types, controlled vocabulary, etc. (Issues #9464 and #9465)

For documentation see the API changelog: http://preview.guides.gdcc.io/en/develop/api/changelog.html
122 changes: 122 additions & 0 deletions doc/sphinx-guides/source/_static/api/dataset-schema.json
@@ -0,0 +1,122 @@
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "$defs": {
    "field": {
      "type": "object",
      "required": ["typeClass", "multiple", "typeName"],
      "properties": {
        "value": {
          "anyOf": [
            {
              "type": "array"
            },
            {
              "type": "string"
            },
            {
              "$ref": "#/$defs/field"
            }
          ]
        },
        "typeClass": {
          "type": "string"
        },
        "multiple": {
          "type": "boolean"
        },
        "typeName": {
          "type": "string"
        }
      }
    }
  },
  "type": "object",
  "properties": {
    "datasetVersion": {
      "type": "object",
      "properties": {
        "license": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string"
            },
            "uri": {
              "type": "string",
              "format": "uri"
            }
          },
          "required": ["name", "uri"]
        },
        "metadataBlocks": {
          "type": "object",
          "properties": {
            "citation": {
              "type": "object",
              "properties": {
                "fields": {
                  "type": "array",
                  "items": {
                    "$ref": "#/$defs/field"
                  },
                  "minItems": 5,
                  "allOf": [
                    {
                      "contains": {
                        "properties": {
                          "typeName": {
                            "const": "title"
                          }
                        }
                      }
                    },
                    {
                      "contains": {
                        "properties": {
                          "typeName": {
                            "const": "author"
                          }
                        }
                      }
                    },
                    {
                      "contains": {
                        "properties": {
                          "typeName": {
                            "const": "datasetContact"
                          }
                        }
                      }
                    },
                    {
                      "contains": {
                        "properties": {
                          "typeName": {
                            "const": "dsDescription"
                          }
                        }
                      }
                    },
                    {
                      "contains": {
                        "properties": {
                          "typeName": {
                            "const": "subject"
                          }
                        }
                      }
                    }
                  ]
                }
              },
              "required": ["fields"]
            }
          },
          "required": ["citation"]
        }
      },
      "required": ["metadataBlocks"]
    }
  },
  "required": ["datasetVersion"]
}
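
For orientation, below is a minimal, hypothetical dataset JSON instance that satisfies this baseline schema. Values are placeholders: in a real Dataverse dataset, compound fields such as author, datasetContact, and dsDescription carry arrays of subfield objects, which the schema's "anyOf" for "value" also permits.

  {
    "datasetVersion": {
      "license": {
        "name": "CC0 1.0",
        "uri": "http://creativecommons.org/publicdomain/zero/1.0"
      },
      "metadataBlocks": {
        "citation": {
          "fields": [
            { "typeName": "title", "typeClass": "primitive", "multiple": false, "value": "Example Dataset" },
            { "typeName": "author", "typeClass": "compound", "multiple": true, "value": ["..."] },
            { "typeName": "datasetContact", "typeClass": "compound", "multiple": true, "value": ["..."] },
            { "typeName": "dsDescription", "typeClass": "compound", "multiple": true, "value": ["..."] },
            { "typeName": "subject", "typeClass": "controlledVocabulary", "multiple": true, "value": ["Other"] }
          ]
        }
      }
    }
  }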
13 changes: 9 additions & 4 deletions doc/sphinx-guides/source/api/changelog.rst
@@ -5,8 +5,13 @@ API Changelog
:local:
:depth: 1

v6.1
----

New
~~~
- **/api/dataverses/{id}/datasetSchema**: See :ref:`get-dataset-json-schema`.
- **/api/dataverses/{id}/validateDatasetJson**: See :ref:`validate-dataset-json`.

New
~~~
@@ -17,8 +22,8 @@ Changes
~~~~~~~
- **/api/datasets/{id}/versions/{versionId}/citation**: This endpoint now accepts a new boolean optional query parameter "includeDeaccessioned", which, if enabled, causes the endpoint to consider deaccessioned versions when searching for versions to obtain the citation. See :ref:`get-citation`.

v6.0
----

Changes
~~~~~~~
50 changes: 50 additions & 0 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -505,6 +505,56 @@ The fully expanded example above (without environment variables) looks like this
.. note:: Previous endpoints ``$SERVER/api/dataverses/$id/metadatablocks/:isRoot`` and ``POST https://$SERVER/api/dataverses/$id/metadatablocks/:isRoot?key=$apiKey`` are deprecated, but supported.

.. _get-dataset-json-schema:

Retrieve a Dataset JSON Schema for a Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Retrieves a JSON schema customized for a given collection in order to validate a dataset JSON file prior to creating the dataset. This
first version of the schema only includes required elements and fields. In the future we plan to improve the schema by adding controlled
vocabulary and more robust dataset field format testing:

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export ID=root

  curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/dataverses/$ID/datasetSchema"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/dataverses/root/datasetSchema"

Note: you must have "Add Dataset" permission in the given collection to invoke this endpoint.

While it is recommended to download a copy of the JSON Schema from the collection (as above) to account for any fields that have been marked as required, you can also download a minimal :download:`dataset-schema.json <../_static/api/dataset-schema.json>` to get a sense of the schema when no customizations have been made.
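
If you want to pre-check a dataset JSON file locally, the sketch below shows one way to do it with the third-party Python ``jsonschema`` and ``requests`` packages. This is an illustration under stated assumptions, not the validator Dataverse runs server-side; in particular, the unwrapping of the response envelope may need adjusting for your installation.

.. code-block:: python

  import json

  import requests
  from jsonschema import Draft202012Validator

  SERVER_URL = "https://demo.dataverse.org"  # replace with your installation
  API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

  # Fetch the schema customized for the "root" collection.
  resp = requests.get(
      f"{SERVER_URL}/api/dataverses/root/datasetSchema",
      headers={"X-Dataverse-key": API_TOKEN},
  )
  resp.raise_for_status()
  body = resp.json()
  # Assumption: the schema arrives in the standard Dataverse "data" envelope;
  # if the endpoint returns the bare schema instead, this falls back to the body.
  schema = body.get("data", body)

  with open("dataset.json") as f:
      dataset = json.load(f)

  # The schema declares draft-04 but also uses newer keywords ("contains",
  # "const"), so a recent draft validator exercises more of it.
  validator = Draft202012Validator(schema)
  errors = list(validator.iter_errors(dataset))
  if errors:
      for err in errors:
          print(f"{list(err.path)}: {err.message}")
  else:
      print("dataset.json satisfies the collection schema")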

.. _validate-dataset-json:

Validate Dataset JSON File for a Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Validates a dataset JSON file customized for a given collection prior to creating the dataset. The validation only tests for JSON formatting
and the presence of required elements:

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export ID=root

  curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/dataverses/$ID/validateDatasetJson" -H 'Content-type:application/json' --upload-file dataset.json

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/dataverses/root/validateDatasetJson" -H 'Content-type:application/json' --upload-file dataset.json

Note: you must have "Add Dataset" permission in the given collection to invoke this endpoint.

.. _create-dataset-command:

@@ -30,8 +30,9 @@
@NamedQuery(name = "DataverseFieldTypeInputLevel.findByDataverseIdDatasetFieldTypeId",
query = "select f from DataverseFieldTypeInputLevel f where f.dataverse.id = :dataverseId and f.datasetFieldType.id = :datasetFieldTypeId"),
@NamedQuery(name = "DataverseFieldTypeInputLevel.findByDataverseIdAndDatasetFieldTypeIdList",
query = "select f from DataverseFieldTypeInputLevel f where f.dataverse.id = :dataverseId and f.datasetFieldType.id in :datasetFieldIdList")

query = "select f from DataverseFieldTypeInputLevel f where f.dataverse.id = :dataverseId and f.datasetFieldType.id in :datasetFieldIdList"),
@NamedQuery(name = "DataverseFieldTypeInputLevel.findRequiredByDataverseId",
query = "select f from DataverseFieldTypeInputLevel f where f.dataverse.id = :dataverseId and f.required = 'true' ")
})
@Table(name="DataverseFieldTypeInputLevel"
, uniqueConstraints={
@@ -88,6 +88,16 @@ public DataverseFieldTypeInputLevel findByDataverseIdDatasetFieldTypeId(Long dat
            return null;
        }
    }

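    /**
     * Finds the input levels in the given collection whose dataset fields are
     * marked as required; added in this change to support building the
     * collection's custom dataset JSON schema from its required fields.
     */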
    public List<DataverseFieldTypeInputLevel> findRequiredByDataverseId(Long dataverseId) {
        Query query = em.createNamedQuery("DataverseFieldTypeInputLevel.findRequiredByDataverseId", DataverseFieldTypeInputLevel.class);
        query.setParameter("dataverseId", dataverseId);
        try {
            return query.getResultList();
        } catch (NoResultException nre) {
            return null;
        }
    }

    public void delete(DataverseFieldTypeInputLevel dataverseFieldTypeInputLevel) {
        em.remove(em.merge(dataverseFieldTypeInputLevel));