
JSON Schema creator and validator #10109

Merged: 45 commits, Dec 5, 2023
Commits
45 commits
61abac1
#9464 create json
sekmiller Nov 1, 2023
6ba4ef5
Merge branch 'develop' into 9464-schema-creator-validator
sekmiller Nov 1, 2023
38f09f6
#9464 fix json schema formatting
sekmiller Nov 2, 2023
5ca4cc0
#9464 remove license from required
sekmiller Nov 2, 2023
02a570a
#9464 Add commands, endpoints, IT, etc
sekmiller Nov 8, 2023
42e055f
Merge branch 'develop' into 9464-schema-creator-validator
sekmiller Nov 8, 2023
521e8d2
#9464 delete test dataverse
sekmiller Nov 8, 2023
7c630f7
#9464 add release note
sekmiller Nov 8, 2023
720b3b0
add doc for get schema
sekmiller Nov 8, 2023
7be5347
#9464 fix typo
sekmiller Nov 8, 2023
c553d1b
Add permission note
sekmiller Nov 8, 2023
a080f84
#9464 add doc for validate json
sekmiller Nov 9, 2023
7d38366
#9464 add strings to bundle
sekmiller Nov 9, 2023
7887a05
#9464 simplify commands
sekmiller Nov 9, 2023
437e7cc
#9464 remove unused import
sekmiller Nov 13, 2023
d7fccf7
Merge branch 'develop' into 9464-schema-creator-validator
sekmiller Nov 17, 2023
73593ac
#9464 query by dvo. update IT
sekmiller Nov 17, 2023
33aefff
Merge branch 'develop' into 9464-schema-creator-validator
sekmiller Nov 17, 2023
e4ede35
#9464 fix logger reference
sekmiller Nov 20, 2023
766c9c3
#9464 add base schema as a file
sekmiller Nov 20, 2023
c82faf9
#9464 fix formatting
sekmiller Nov 21, 2023
44a07a3
#9464 more code cleanup
sekmiller Nov 21, 2023
7d687e9
#9464 third time's the charm?
sekmiller Nov 21, 2023
3bc5ef7
Merge branch 'develop' into 9464-schema-creator-validator
sekmiller Nov 21, 2023
e501845
Merge branch 'develop' into 9464-schema-creator-validator
sekmiller Nov 27, 2023
212baf2
#9464 return json object as api response
sekmiller Nov 27, 2023
9367026
#9464 revert harvesting changes made in error
sekmiller Nov 27, 2023
b7a3e78
add dataset JSON Schema to API guide, add test #9464
pdurbin Nov 27, 2023
2d3f7ab
just return the JSON Schema, don't wrap in "data, message" #9464
pdurbin Nov 27, 2023
0a77e2a
tweak docs #9464
pdurbin Nov 27, 2023
7db3629
removing trailing newline #9464
pdurbin Nov 27, 2023
194945b
remove cruft (unused) #9464
pdurbin Nov 28, 2023
c1bd009
format code (no-op) #9464
pdurbin Nov 28, 2023
c4d9b6e
add new endpoints to API changelog #9464
pdurbin Nov 28, 2023
45df764
tweak release note #9464
pdurbin Nov 28, 2023
d8e327d
add "v" to make anchor links meaningful #9464 #10060
pdurbin Nov 28, 2023
866b5ea
Adds -X POST on the docs for validateDatasetJson
jp-tosca Nov 28, 2023
e235257
Merge branch 'develop' into 9464-schema-creator-validator
sekmiller Nov 30, 2023
2c41687
Merge branch 'develop' into 9464-schema-creator-validator
sekmiller Nov 30, 2023
547d71c
#9464 add more detail to validation error message
sekmiller Dec 4, 2023
c9374f3
Merge branch 'develop' into 9464-schema-creator-validator
sekmiller Dec 4, 2023
7697157
#9464 handle single errors
sekmiller Dec 4, 2023
e3bff3c
Merge branch 'develop' into 9464-schema-creator-validator
sekmiller Dec 5, 2023
c54a85f
#9464 add caveats to release note.
sekmiller Dec 5, 2023
2379828
Update native-api.rst
sekmiller Dec 5, 2023
3 changes: 3 additions & 0 deletions doc/release-notes/9464-json-validation.md
@@ -0,0 +1,3 @@
Functionality has been added to help validate dataset JSON prior to dataset creation. There are two new API endpoints in this release. The first takes a Dataverse collection alias and returns a custom JSON Schema based on the required fields of the collection.
The second takes a Dataverse collection alias and a dataset JSON file and validates the file against the custom schema for the collection. (Issues 9464 and 9465)

45 changes: 45 additions & 0 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -505,6 +505,51 @@ The fully expanded example above (without environment variables) looks like this

.. note:: Previous endpoints ``$SERVER/api/dataverses/$id/metadatablocks/:isRoot`` and ``POST https://$SERVER/api/dataverses/$id/metadatablocks/:isRoot?key=$apiKey`` are deprecated, but supported.

.. _get-dataset-json-schema:

Retrieve a JSON schema for a Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(Suggested change in a review comment from a maintainer: capitalize "Schema", i.e. "Retrieve a JSON Schema for a Collection".)

Retrieves a JSON schema customized for a given Dataverse collection in order to validate a Dataset JSON file prior to creating the dataset:

.. code-block:: bash

export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export ID=root

curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/dataverses/$ID/datasetSchema"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/dataverses/root/datasetSchema"

Note: you must have Add Dataset permission in the given Dataverse collection to invoke this endpoint.

.. _validate-dataset-json:

Validate Dataset.json file for a Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Validates a dataset JSON file against the custom schema of a given Dataverse collection prior to creating the dataset:

.. code-block:: bash

export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export ID=root

curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/dataverses/$ID/validateDatasetJson" --upload-file dataset.json -H 'Content-type:application/json'

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/dataverses/root/validateDatasetJson" --upload-file dataset.json -H 'Content-type:application/json'

Note: you must have Add Dataset permission in the given Dataverse collection to invoke this endpoint.
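
For orientation, here is a minimal, hypothetical dataset.json skeleton consistent with the base schema added later in this PR (the license object requires name and uri, and each field requires typeClass, multiple, and typeName). The "citation" block and "title" field are illustrative; the actual required blocks and fields depend on the collection:

```json
{
  "datasetVersion": {
    "license": {
      "name": "CC0 1.0",
      "uri": "http://creativecommons.org/publicdomain/zero/1.0"
    },
    "metadataBlocks": {
      "citation": {
        "fields": [
          {
            "typeName": "title",
            "typeClass": "primitive",
            "multiple": false,
            "value": "My Dataset"
          }
        ]
      }
    }
  }
}
```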

.. _create-dataset-command:

233 changes: 233 additions & 0 deletions src/main/java/edu/harvard/iq/dataverse/DataverseServiceBean.java
@@ -18,6 +18,7 @@
import edu.harvard.iq.dataverse.search.IndexServiceBean;
import edu.harvard.iq.dataverse.search.SolrIndexServiceBean;
import edu.harvard.iq.dataverse.search.SolrSearchResult;
import edu.harvard.iq.dataverse.util.BundleUtil;
import edu.harvard.iq.dataverse.util.StringUtil;
import edu.harvard.iq.dataverse.util.SystemConfig;
import java.io.File;
@@ -42,7 +43,13 @@
import jakarta.persistence.NonUniqueResultException;
import jakarta.persistence.PersistenceContext;
import jakarta.persistence.TypedQuery;
import org.apache.commons.lang3.StringUtils;
import org.apache.solr.client.solrj.SolrServerException;
import org.everit.json.schema.Schema;
import org.everit.json.schema.ValidationException;
import org.everit.json.schema.loader.SchemaLoader;
import org.json.JSONObject;
import org.json.JSONTokener;

/**
*
@@ -80,6 +87,9 @@ public class DataverseServiceBean implements java.io.Serializable {
@EJB
PermissionServiceBean permissionService;

@EJB
DataverseFieldTypeInputLevelServiceBean dataverseFieldTypeInputLevelService;

@EJB
SystemConfig systemConfig;

@@ -919,5 +929,228 @@ public List<Object[]> getDatasetTitlesWithinDataverse(Long dataverseId) {
return em.createNativeQuery(cqString).getResultList();
}


public String getCollectionDatasetSchema(String dataverseAlias) {

List<MetadataBlock> selectedBlocks = new ArrayList<>();
List<DatasetFieldType> requiredDSFT = new ArrayList<>();

Dataverse testDV = this.findByAlias(dataverseAlias);

while (!testDV.isMetadataBlockRoot()) {
if (testDV.getOwner() == null) {
break; // we are at the root, which by definition is the metadata block root, regardless of the value
}
testDV = testDV.getOwner();
}

selectedBlocks.addAll(testDV.getMetadataBlocks());

for (MetadataBlock mdb : selectedBlocks) {
for (DatasetFieldType dsft : mdb.getDatasetFieldTypes()) {
if (!dsft.isChild()) {
DataverseFieldTypeInputLevel dsfIl = dataverseFieldTypeInputLevelService.findByDataverseIdDatasetFieldTypeId(testDV.getId(), dsft.getId());
if (dsfIl != null) {
dsft.setRequiredDV(dsfIl.isRequired());
dsft.setInclude(dsfIl.isInclude());
} else {
dsft.setRequiredDV(dsft.isRequired());
dsft.setInclude(true);
}
if (dsft.isHasChildren()) {
for (DatasetFieldType child : dsft.getChildDatasetFieldTypes()) {
DataverseFieldTypeInputLevel dsfIlChild = dataverseFieldTypeInputLevelService.findByDataverseIdDatasetFieldTypeId(testDV.getId(), child.getId());
if (dsfIlChild != null) {
child.setRequiredDV(dsfIlChild.isRequired());
child.setInclude(dsfIlChild.isInclude());
} else {
// in the case of conditionally required (child = true, parent = false)
// we set this to false; i.e., this is the default "don't override" value
child.setRequiredDV(child.isRequired() && dsft.isRequired());
child.setInclude(true);
}
}
}
if(dsft.isRequiredDV()){
requiredDSFT.add(dsft);
}
}
}

}

String reqMDBNames = "";
List<MetadataBlock> hasReqFields = new ArrayList<>();
String retval = datasetSchemaPreface;
for (MetadataBlock mdb : selectedBlocks) {
for (DatasetFieldType dsft : requiredDSFT) {
if (dsft.getMetadataBlock().equals(mdb)) {
hasReqFields.add(mdb);
if (!reqMDBNames.isEmpty()) reqMDBNames += ",";
reqMDBNames += "\"" + mdb.getName() + "\"";
break;
}
}
}

for (MetadataBlock mdb : hasReqFields) {
retval += getCustomMDBSchema(mdb, requiredDSFT);
}

retval += "\n }";

retval += endOfjson.replace("blockNames", reqMDBNames);

return retval;

}

private String getCustomMDBSchema (MetadataBlock mdb, List<DatasetFieldType> requiredDSFT){
String retval = "";
boolean mdbHasReqField = false;
int numReq = 0;
List<DatasetFieldType> requiredThisMDB = new ArrayList<>();

for (DatasetFieldType dsft : requiredDSFT ){

if(dsft.getMetadataBlock().equals(mdb)){
numReq++;
mdbHasReqField = true;
requiredThisMDB.add(dsft);
}
}
if (mdbHasReqField){
retval += startOfMDB.replace("blockName", mdb.getName());

retval += minItemsTemplate.replace("numMinItems", Integer.toString(requiredThisMDB.size()));
int count = 0;
for (DatasetFieldType dsft:requiredThisMDB ){
count++;
String reqValImp = reqValTemplate.replace("reqFieldTypeName", dsft.getName());
Review comment (@pdurbin, Nov 28, 2023, on lines +1047 to +1053): I'm sort of wondering why we're starting with a string and doing replace() here and there rather than building up a JSON object, like we do with JsonPrinter.

if (count < requiredThisMDB.size()){
retval += reqValImp + "\n";
} else {
reqValImp = StringUtils.substring(reqValImp, 0, reqValImp.length() - 1);
retval += reqValImp+ "\n";
retval += endOfReqVal;
}
}

}

return retval;
}

public String isDatasetJsonValid(String dataverseAlias, String jsonInput) {
JSONObject rawSchema = new JSONObject(new JSONTokener(getCollectionDatasetSchema(dataverseAlias)));

try {
Schema schema = SchemaLoader.load(rawSchema);
schema.validate(new JSONObject(jsonInput)); // throws a ValidationException if this object is invalid
} catch (ValidationException vx) {
logger.info(BundleUtil.getStringFromBundle("dataverses.api.validate.json.failed") + " " + vx.getErrorMessage());
return BundleUtil.getStringFromBundle("dataverses.api.validate.json.failed") + " " + vx.getErrorMessage();
} catch (Exception ex) {
logger.info(BundleUtil.getStringFromBundle("dataverses.api.validate.json.exception") + ex.getLocalizedMessage());
return BundleUtil.getStringFromBundle("dataverses.api.validate.json.exception") + ex.getLocalizedMessage();
}

return BundleUtil.getStringFromBundle("dataverses.api.validate.json.succeeded");
}
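
As an aside, the schema templates below enforce required fields per block with minItems plus an allOf of contains/const clauses on typeName. A stdlib-only sketch of that semantics (the class and names are hypothetical, not part of the PR):

```java
import java.util.List;
import java.util.Set;

// Illustrative stand-in (not PR code) for what the generated schema expresses:
// the "allOf" of "contains"/"const" clauses means every required typeName must
// appear among the block's fields, and "minItems" means the fields array must
// hold at least that many entries.
public class RequiredFieldsSketch {

    static boolean blockSatisfiesSchema(List<String> fieldTypeNames,
                                        Set<String> requiredTypeNames) {
        // minItems is set to the number of required fields in the block
        if (fieldTypeNames.size() < requiredTypeNames.size()) {
            return false;
        }
        // each "contains" clause demands one field with a matching typeName
        return fieldTypeNames.containsAll(requiredTypeNames);
    }

    public static void main(String[] args) {
        System.out.println(blockSatisfiesSchema(
                List.of("title", "author", "dsDescription"),
                Set.of("title", "author")));
        System.out.println(blockSatisfiesSchema(
                List.of("author"),
                Set.of("title", "author")));
    }
}
```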

private String datasetSchemaPreface =
"{\n" +
" \"$schema\": \"http://json-schema.org/draft-04/schema#\",\n" +
" \"$defs\": {\n" +
" \"field\": {\n" +
" \"type\": \"object\",\n" +
" \"required\": [\"typeClass\", \"multiple\", \"typeName\"],\n" +
Review comment (Member, on lines +1110 to +1115): We could use the fancy new text blocks here: https://docs.oracle.com/en/java/javase/21/text-blocks/index.html

" \"properties\": {\n" +
" \"value\": {\n" +
" \"anyOf\": [\n" +
" {\n" +
" \"type\": \"array\"\n" +
" },\n" +
" {\n" +
" \"type\": \"string\"\n" +
" },\n" +
" {\n" +
" \"$ref\": \"#/$defs/field\"\n" +
" }\n" +
" ]\n" +
" },\n" +
" \"typeClass\": {\n" +
" \"type\": \"string\"\n" +
" },\n" +
" \"multiple\": {\n" +
" \"type\": \"boolean\"\n" +
" },\n" +
" \"typeName\": {\n" +
" \"type\": \"string\"\n" +
" }\n" +
" }\n" +
" }\n" +
"},\n" +
"\"type\": \"object\",\n" +
"\"properties\": {\n" +
" \"datasetVersion\": {\n" +
" \"type\": \"object\",\n" +
" \"properties\": {\n" +
" \"license\": {\n" +
" \"type\": \"object\",\n" +
" \"properties\": {\n" +
" \"name\": {\n" +
" \"type\": \"string\"\n" +
" },\n" +
" \"uri\": {\n" +
" \"type\": \"string\",\n" +
" \"format\": \"uri\"\n" +
" }\n" +
" },\n" +
" \"required\": [\"name\", \"uri\"]\n" +
" },\n" +
" \"metadataBlocks\": {\n" +
" \"type\": \"object\",\n" +
" \"properties\": {\n" +
"" ;

private String startOfMDB = "" +
" \"blockName\": {\n" +
" \"type\": \"object\",\n" +
" \"properties\": {\n" +
" \"fields\": {\n" +
" \"type\": \"array\",\n" +
" \"items\": {\n" +
" \"$ref\": \"#/$defs/field\"\n" +
" },";

private String reqValTemplate = " {\n" +
" \"contains\": {\n" +
" \"properties\": {\n" +
" \"typeName\": {\n" +
" \"const\": \"reqFieldTypeName\"\n" +
" }\n" +
" }\n" +
" }\n" +
" },";

private String minItemsTemplate = "\n \"minItems\": numMinItems,\n" +
" \"allOf\": [\n";
private String endOfReqVal = " ]\n" +
" }\n" +
" },\n" +
" \"required\": [\"fields\"]\n" +
" },";

private String endOfjson = ",\n" +
" \"required\": [blockNames]\n" +
" }\n" +
" },\n" +
" \"required\": [\"metadataBlocks\"]\n" +
" }\n" +
" },\n" +
" \"required\": [\"datasetVersion\"]\n" +
"}\n";


}
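
Picking up the reviewer suggestion above about Java 15+ text blocks, here is a hypothetical sketch of how the concatenated string templates could be rewritten; the class and method names are invented, and this is not code from the PR:

```java
// Sketch only: rewrites the concatenated-string templates in
// DataverseServiceBean as Java 15+ text blocks, as a reviewer suggested.
// Class and method names here are invented for illustration.
public class SchemaPrefaceSketch {

    // Opening portion of the generated JSON Schema as a text block,
    // replacing dozens of "..." + "\n" fragments.
    static String datasetSchemaPreface() {
        return """
            {
              "$schema": "http://json-schema.org/draft-04/schema#",
              "$defs": {
                "field": {
                  "type": "object",
                  "required": ["typeClass", "multiple", "typeName"],
                  "properties": {
                    "typeClass": { "type": "string" },
                    "multiple": { "type": "boolean" },
                    "typeName": { "type": "string" }
                  }
                }
              },
            """;
    }

    // Placeholder substitution works the same way as with the old string fields.
    static String minItems(int numMinItems) {
        return """
              "minItems": numMinItems,
              "allOf": [
            """.replace("numMinItems", Integer.toString(numMinItems));
    }

    public static void main(String[] args) {
        System.out.print(datasetSchemaPreface());
        System.out.print(minItems(3));
    }
}
```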
36 changes: 35 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/api/Dataverses.java
@@ -44,6 +44,7 @@
import edu.harvard.iq.dataverse.engine.command.impl.DeleteDataverseCommand;
import edu.harvard.iq.dataverse.engine.command.impl.DeleteDataverseLinkingDataverseCommand;
import edu.harvard.iq.dataverse.engine.command.impl.DeleteExplicitGroupCommand;
import edu.harvard.iq.dataverse.engine.command.impl.GetDatasetSchemaCommand;
import edu.harvard.iq.dataverse.engine.command.impl.UpdateMetadataBlockFacetRootCommand;
import edu.harvard.iq.dataverse.engine.command.impl.GetDataverseCommand;
import edu.harvard.iq.dataverse.engine.command.impl.GetDataverseStorageSizeCommand;
@@ -68,6 +69,7 @@
import edu.harvard.iq.dataverse.engine.command.impl.UpdateDataverseMetadataBlocksCommand;
import edu.harvard.iq.dataverse.engine.command.impl.UpdateExplicitGroupCommand;
import edu.harvard.iq.dataverse.engine.command.impl.UpdateMetadataBlockFacetsCommand;
import edu.harvard.iq.dataverse.engine.command.impl.ValidateDatasetJsonCommand;
import edu.harvard.iq.dataverse.settings.JvmSettings;
import edu.harvard.iq.dataverse.settings.SettingsServiceBean;
import edu.harvard.iq.dataverse.util.BundleUtil;
@@ -126,7 +128,6 @@
import java.util.Optional;
import java.util.stream.Collectors;
import jakarta.servlet.http.HttpServletResponse;
import jakarta.validation.constraints.NotNull;
import jakarta.ws.rs.WebApplicationException;
import jakarta.ws.rs.core.Context;
import jakarta.ws.rs.core.StreamingOutput;
@@ -232,6 +233,39 @@ public Response addDataverse(@Context ContainerRequestContext crc, String body,

}
}

@POST
@AuthRequired
@Path("{identifier}/validateDatasetJson")
@Consumes("application/json")
public Response validateDatasetJson(@Context ContainerRequestContext crc, String body, @PathParam("identifier") String idtf) {
User u = getRequestUser(crc);
try {
String validationMessage = execCommand(new ValidateDatasetJsonCommand(createDataverseRequest(u), findDataverseOrDie(idtf), body));
return ok(validationMessage);
} catch (WrappedResponse ex) {
Logger.getLogger(Dataverses.class.getName()).log(Level.SEVERE, null, ex);
return ex.getResponse();
}
}

@GET
@AuthRequired
@Path("{identifier}/datasetSchema")
@Produces(MediaType.APPLICATION_JSON)
public Response getDatasetSchema(@Context ContainerRequestContext crc, @PathParam("identifier") String idtf) {
User u = getRequestUser(crc);

try {
String datasetSchema = execCommand(new GetDatasetSchemaCommand(createDataverseRequest(u), findDataverseOrDie(idtf)));
return ok(datasetSchema);
Review comment (@pdurbin, Nov 27, 2023): @sekmiller and I were talking about how we probably want to return just the JSON (instead of escaped JSON in our normal "ok... data" data structure). Jim did this recently in commit 3a4d8f9 ("9953 - don't wrap linkset in a data element"). Otherwise, it looks like this:

{
"status": "OK",
"data": {
"message": "{\n "$schema": "http://json-schema.org/draft-04/schema#\",\n "$defs": {\n "field": {\n "type": "object",\n "required": ["typeClass", "multiple", "typeName"],\n "properties": {\n...

Review comment (@pdurbin): @sekmiller and I decided to return just the JSON Schema. See 2d3f7ab. This way, instead of the \n characters and the "data"/"message" wrapping, you get just what you want:

{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "$defs": {
        "field": {
            "type": "object",
            "required": [
                "typeClass",
...

} catch (WrappedResponse ex) {
Logger.getLogger(Dataverses.class.getName()).log(Level.SEVERE, null, ex);
return ex.getResponse();
}
}



@POST
@AuthRequired