DCTAP JSON and YAML outputs #100

Open
kcoyle opened this issue Feb 1, 2024 · 9 comments

@kcoyle
Collaborator

kcoyle commented Feb 1, 2024

At the February 1, 2024 meeting we agreed to review the JSON (and YAML) outputs and make sure that they have a structure we can agree on. A follow-up will be to create a JSON Schema (does YAML have schemas?) that can facilitate consistent interpretation of DCTAP in common serializations.

@philbarker
Collaborator

JSON Schema can also be used to validate YAML documents: https://json-schema-everywhere.github.io/yaml

@kcoyle
Collaborator Author

kcoyle commented Feb 2, 2024

As a reminder, we have defined the DCTAP elements as:

a profile
   has zero or more shapes      #zero is being interpreted as a default, unnamed profile
   a shape is identified by its `shapeID`
   a shape
       has one or more statement_templates
       a statement_template is identified by its `propertyID`

shape and statement_template are included in the vocabulary documentation, but are not table column headers.
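
As a rough illustration of the element model above, here is a minimal Python sketch. The class names and the `propertyLabel` field are assumptions chosen for illustration; only `shapeID`, `propertyID`, and the cardinalities come from the description above. How statement templates are carried when a profile has zero shapes is left open here, as it is in this thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StatementTemplate:
    propertyID: str                      # identifies the statement template
    propertyLabel: Optional[str] = None  # stand-in for the other DCTAP columns


@dataclass
class Shape:
    shapeID: str                         # identifies the shape
    statement_templates: List[StatementTemplate] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The model says a shape has one or more statement templates.
        if not self.statement_templates:
            raise ValueError(f"shape {self.shapeID!r} needs at least one statement template")


@dataclass
class Profile:
    shapes: List[Shape] = field(default_factory=list)  # zero or more shapes


# Illustrative data only.
book = Shape("book", [StatementTemplate("dct:title", "Book title")])
profile = Profile([book])
```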

@kcoyle
Collaborator Author

kcoyle commented Feb 5, 2024

I played around with both the JSON and YAML using the idea of adding explicit shape and statement_template for each, and failed. Not surprising, since I don't know either JSON or YAML (I spent lots of time with tutorials). I did convince myself of the logic of the current output format in those serializations. I'll wait to hear from those of you who actually know what you're talking about.

I'll look at JSON Schema, but may fail there as well.

@philbarker
Collaborator

philbarker commented Feb 7, 2024

If we keep the hierarchical representation then we would maybe need to do something similar to what Tom did for DC-TAP-Python and have a default shape for when none is specified in the TAP.

Alternatively, if we take the statement_templates as foundational and kind of work upwards, or have a graph model, then we could use the shapeID column as a property of the statement_template to say what shape, if any, the template was included in.
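
The two options above (hierarchical with a default shape, or flat with `shapeID` as a property of each statement template) carry the same information, which a small Python sketch can show. The function names, the `"default"` placeholder, and the sample rows are assumptions for illustration, not part of DCTAP or of any existing tool.

```python
from collections import defaultdict

DEFAULT_SHAPE_ID = "default"  # hypothetical placeholder name, not defined by DCTAP


def flatten(shapes):
    """Hierarchical -> flat: copy each template and record its shape as a property."""
    rows = []
    for shape in shapes:
        for template in shape["statement_templates"]:
            rows.append({"shapeID": shape["shapeID"], **template})
    return rows


def group(rows):
    """Flat -> hierarchical: collect templates under their shapeID.

    Rows without a shapeID fall into the default shape.
    """
    by_shape = defaultdict(list)
    for row in rows:
        template = {k: v for k, v in row.items() if k != "shapeID"}
        by_shape[row.get("shapeID") or DEFAULT_SHAPE_ID].append(template)
    return [
        {"shapeID": shape_id, "statement_templates": templates}
        for shape_id, templates in by_shape.items()
    ]


# Illustrative rows: the first has no shape, the second names one.
rows = [
    {"propertyID": "dct:title"},
    {"shapeID": "authorShape", "propertyID": "foaf:name"},
]
shapes = group(rows)
flat_again = flatten(shapes)
```

Note that the round trip is not lossless for the unnamed case: after `group` and `flatten`, the row that had no `shapeID` now carries the placeholder, which is the same mismatch with "zero or more shapes" discussed in this thread.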

@kcoyle
Collaborator Author

kcoyle commented Feb 7, 2024

@philbarker I've been working from the dctap output so the default shape is there. I'm not sure I'm reading your comment right, so perhaps we need to chat?

@philbarker
Collaborator

@kcoyle that's fine, but it is a change from what we say about the TAP, which, as you posted above, can have "zero or more shapes". I don't think the YAML/JSON can have zero shapes, but it is in accordance with the comment.

@kcoyle
Collaborator Author

kcoyle commented Feb 7, 2024

@philbarker ah, yes. There is a difference from the table when it's output. I think the default shape came about because it worked that way in Tom's program. I removed the default shape from one JSON file and it validates.

{
    "statement_templates": [
        {
            "propertyID": "dct:title",
            "propertyLabel": "Book title",
            "mandatory": "true",
            "repeatable": "false",
            "valueDataType": "xsd:string"
        },
        {
            "propertyID": "dct:description",
            "propertyLabel": "Book description",
            "mandatory": "false",
            "repeatable": "true",
            "valueDataType": "xsd:string"
        },
        {
            "propertyID": "dct:date",
            "propertyLabel": "Publication date",
            "valueDataType": "xsd:date"
        },
        {
            "propertyID": "dct:extent",
            "propertyLabel": "Pages",
            "valueDataType": "xsd:decimal"
        },
        {
            "propertyID": "sdo:isbn",
            "propertyLabel": "ISBN",
            "valueDataType": "xsd:string"
        },
        {
            "propertyID": "dct:creator",
            "propertyLabel": "Author",
            "mandatory": "true",
            "repeatable": "true",
            "valueShape": "authorShape"
        },
        {
            "propertyID": "dct:publisher",
            "propertyLabel": "Publisher",
            "mandatory": "true",
            "repeatable": "false",
            "valueShape": "publisherShape"
        }
    ],
    "namespaces": {
        "xsd:": "http://www.w3.org/2001/XMLSchema#",
        "dct:": "http://purl.org/dc/terms/"
    }
}

I can even remove the statement_templates key (and with it the namespaces) and get valid JSON, which may or may not be useful:

[
    {
        "propertyID": "dct:title",
        "propertyLabel": "Book title",
        "mandatory": "true",
        "repeatable": "false",
        "valueDataType": "xsd:string"
    },
    {
        "propertyID": "dct:description",
        "propertyLabel": "Book description",
        "mandatory": "false",
        "repeatable": "true",
        "valueDataType": "xsd:string"
    },
    {
        "propertyID": "dct:date",
        "propertyLabel": "Publication date",
        "valueDataType": "xsd:date"
    },
    {
        "propertyID": "dct:extent",
        "propertyLabel": "Pages",
        "valueDataType": "xsd:decimal"
    },
    {
        "propertyID": "sdo:isbn",
        "propertyLabel": "ISBN",
        "valueDataType": "xsd:string"
    },
    {
        "propertyID": "dct:creator",
        "propertyLabel": "Author",
        "mandatory": "true",
        "repeatable": "true",
        "valueShape": "authorShape"
    },
    {
        "propertyID": "dct:publisher",
        "propertyLabel": "Publisher",
        "mandatory": "true",
        "repeatable": "false",
        "valueShape": "publisherShape"
    }
]

All of this confirms to me that we need to think some more about output and what will be most useful for folks down the line.
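
To make the trade-off concrete, here is a minimal Python sketch of what a downstream consumer would have to do if all three layouts were allowed: the nested form with shapes, the form with only `statement_templates`, and the bare list. The function name is hypothetical, and the top-level `"shapes"` key for the nested form is an assumption, since that layout is not reproduced in this thread.

```python
import json


def load_templates(text):
    """Return (shapeID, template) pairs from any of the three layouts."""
    data = json.loads(text)
    if isinstance(data, list):
        # Bare list of statement templates.
        return [(None, template) for template in data]
    if "shapes" in data:
        # Hierarchical layout (assumed key name "shapes").
        return [
            (shape.get("shapeID"), template)
            for shape in data["shapes"]
            for template in shape.get("statement_templates", [])
        ]
    # Object with statement_templates but no shapes.
    return [(None, template) for template in data.get("statement_templates", [])]


bare = '[{"propertyID": "dct:title"}]'
no_shape = '{"statement_templates": [{"propertyID": "dct:title"}]}'
nested = (
    '{"shapes": [{"shapeID": "default", '
    '"statement_templates": [{"propertyID": "dct:title"}]}]}'
)

for text in (bare, no_shape, nested):
    print(load_templates(text))
```

Every consumer would need this kind of branching, which is one argument for settling on a single layout.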

@philbarker
Collaborator

@kcoyle that is indeed valid JSON, but I think you would need a different JSON Schema for each of the options (this is the limit of my JSON Schema knowledge, so I may be wrong). So yes, we do need to think some more about what will be more useful.

@kcoyle
Collaborator Author

kcoyle commented Feb 7, 2024

@philbarker I'm playing around with JSON Schema (there are some sites that generate a schema from sample JSON as a way to get started). I can make the shape optional, but I'm still trying out various options. I'll be back.
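
For what it's worth, a single JSON Schema can describe several alternative layouts by combining them with `anyOf`, so one schema per option is not strictly required. The sketch below builds such a schema as a Python dict and prints it as JSON. It is only a starting point under stated assumptions: the top-level `"shapes"` key, the choice of draft 2020-12, and the decision to require only `propertyID` are all illustrative, not an agreed DCTAP schema.

```python
import json

STATEMENT_TEMPLATE = {
    "type": "object",
    "required": ["propertyID"],  # assumption: only propertyID is required
    "properties": {
        "propertyID": {"type": "string"},
        "propertyLabel": {"type": "string"},
        "valueDataType": {"type": "string"},
        "valueShape": {"type": "string"},
    },
}

SHAPE = {
    "type": "object",
    "required": ["shapeID", "statement_templates"],
    "properties": {
        "shapeID": {"type": "string"},
        "statement_templates": {"$ref": "#/$defs/templates"},
    },
}

SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$defs": {
        "template": STATEMENT_TEMPLATE,
        # One or more statement templates.
        "templates": {
            "type": "array",
            "minItems": 1,
            "items": {"$ref": "#/$defs/template"},
        },
        "shape": SHAPE,
    },
    "anyOf": [
        # Layout 1: bare list of statement templates.
        {"$ref": "#/$defs/templates"},
        # Layout 2: object with statement_templates and no shapes.
        {
            "type": "object",
            "required": ["statement_templates"],
            "properties": {"statement_templates": {"$ref": "#/$defs/templates"}},
        },
        # Layout 3: object with shapes (assumed key name "shapes").
        {
            "type": "object",
            "required": ["shapes"],
            "properties": {
                "shapes": {"type": "array", "items": {"$ref": "#/$defs/shape"}}
            },
        },
    ],
}

print(json.dumps(SCHEMA, indent=2))
```

Checking a document against this schema needs a validator, for example the third-party `jsonschema` package for Python. Because YAML parses to the same kinds of data structures as JSON, the same schema could be applied to the YAML output once it has been loaded.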

2 participants