use check-naming conventions #15

mbjones · 2023-02-15T02:44:52Z

@jeanetteclark Peter and I noted that, as the number of checks in metadig grew, they became hard to find and understand. So we shifted to a check naming pattern of entity.subentity.checktype, where entity is the kind of resource being checked, subentity is the component of that entity being checked, and checktype is a short name for the kind of check being run. We tried to be consistent with our checktypes across entities (e.g., present and resolvable have the same meaning across different entity types). This allows checks on similar entities to sort together, and makes it possible to find similar kinds of checks across different entities. For example, the following metadata check names are fairly sortable and interpretable:

metadata.identifier.present
metadata.identifier.resolvable
metadata.alternateIdentifier.present
metadata.alternateIdentifier.resolvable
resource.URLs.resolvable
resource.landingPage.present

resource refers to the thing being described by a metadata document (e.g., a Dataset), while metadata refers to the metadata about that resource. I'm not sure that distinction really holds up to scrutiny, so I'm open to discussion.
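To make the sorting and grouping benefit concrete, here is a minimal Python sketch of the entity.subentity.checktype pattern. It is only an illustration, not part of metadig: the `CheckName` type and `parse_check_name` helper are hypothetical names, and the only assumption is that a check name consists of exactly three non-empty, dot-separated parts.

```python
from collections import defaultdict
from typing import NamedTuple


class CheckName(NamedTuple):
    """Hypothetical container for the three parts of a check name."""
    entity: str
    subentity: str
    checktype: str


def parse_check_name(name: str) -> CheckName:
    """Split 'entity.subentity.checktype' into its three parts."""
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected entity.subentity.checktype, got {name!r}")
    return CheckName(*parts)


names = [
    "resource.landingPage.present",
    "metadata.identifier.resolvable",
    "metadata.alternateIdentifier.present",
    "resource.URLs.resolvable",
    "metadata.identifier.present",
    "metadata.alternateIdentifier.resolvable",
]

# Tuples compare field by field, so checks on the same entity sort together.
checks = sorted(parse_check_name(n) for n in names)

# Grouping on the last field finds the same kind of check across entities.
by_type = defaultdict(list)
for check in checks:
    by_type[check.checktype].append(f"{check.entity}.{check.subentity}")

for check in checks:
    print(".".join(check))
```

Because checktype is always the last field, listing every resolvable check regardless of entity is a single lookup in `by_type`.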

It would be good if the data checks followed a similar convention. Maybe something like:

data.table-text-delimited.well-formed
data.table-text-fixed.well-formed
data.binary-format.valid
data.text-encoding.valid
data.text-encoding.congruent

In this case, data is a stand-in for the individual data objects that might be contained within a Dataset. By valid I mean that the bytes we find on disk conform to the relevant specification. By congruent I mean that what we find on disk matches what was claimed in the metadata. Lots of ways to structure this, but I think it would be good to pick a decent convention to get started (which will also make renaming checks easier if we decide to do so later). Thoughts welcome.
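One way to read the valid/congruent distinction for the text-encoding checks is sketched below in Python. This is an assumption about how such checks might look, not how they are implemented: the function names, the candidate encoding list, and the choice to treat "decodes without error" as the test are all hypothetical. The point is only that valid looks at the bytes alone, while congruent compares the bytes against the encoding declared in the metadata.

```python
def text_encoding_valid(raw: bytes, candidates=("ascii", "utf-8")) -> bool:
    """Sketch of data.text-encoding.valid: do the bytes decode cleanly
    under at least one candidate encoding? No metadata is consulted."""
    for encoding in candidates:
        try:
            raw.decode(encoding)
            return True
        except UnicodeDecodeError:
            continue
    return False


def text_encoding_congruent(raw: bytes, declared: str) -> bool:
    """Sketch of data.text-encoding.congruent: do the bytes decode cleanly
    under the encoding that the metadata declares?"""
    try:
        raw.decode(declared)
    except (UnicodeDecodeError, LookupError):
        # LookupError covers a declared encoding name Python does not know.
        return False
    return True


raw = "café".encode("utf-8")
print(text_encoding_valid(raw))               # bytes are valid UTF-8
print(text_encoding_congruent(raw, "utf-8"))  # matches the declared encoding
print(text_encoding_congruent(raw, "ascii"))  # contradicts the declared encoding
```

A file can therefore pass the valid check and still fail the congruent check, which is why the two are worth naming separately.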


jeanetteclark · 2023-02-15T19:55:55Z

Yeah, definitely on the right track here. So with this system of entity.subentity.checktype the checks we have now are:

data.table-text-delimited.well-formed (#2)
data.format.congruent (#9)

upcoming checks to write would be:

data.text-encoding.valid (#12)
data.text-encoding.congruent (#12)
data.attribute-names.congruent (#3)

Any concern about switching from camel case to kebab case in the subentity slot? I'm guessing the camel case was chosen to match the corresponding EML slots?

jeanetteclark · 2023-02-17T21:28:25Z

existing checks have been renamed

mbjones added the enhancement (New feature or request) label Feb 15, 2023

mbjones assigned jeanetteclark Feb 15, 2023

mbjones mentioned this issue Feb 15, 2023

Check: Text file format valid #2

Closed

jeanetteclark closed this as completed Feb 17, 2023
