Metadata standard (Schema) #6

nsinai · 2015-04-08T01:59:32Z

This thread is for a discussion about metadata standards. I would suggest starting with a global standard like DCAT and reducing it to a lightweight set of required fields (e.g. title, description, keywords, point of contact name, point of contact email, URL, license).

Here is the current Data.gov schema: https://project-open-data.cio.gov/v1.1/schema/

dweinberger · 2015-04-11T14:48:13Z

If we're going for simple (+1), might we consider Schema.org? https://schema.org/Dataset

rebeccawilliams · 2015-04-20T22:24:55Z

For reference @dweinberger, here is how schema.org maps to DCAT: https://project-open-data.cio.gov/v1.1/metadata-resources/#field-mappings

nsinai · 2015-05-04T18:48:31Z

Thanks @rebeccawilliams !

nsinai · 2015-05-06T14:47:59Z

Proposed MVP of Schema:
Title
Description
Tags
Last Update
Publisher
Contact Name
Contact Email
Access URL
Download URL
License
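
For illustration only, here is a minimal Python sketch of how one record with these ten fields might be written using Project Open Data v1.1 style field names. The function name `to_pod_record`, the flat input keys, and all sample values are made up for this example; the nesting of `publisher`, `contactPoint`, and `distribution` reflects my reading of the schema linked above and should be checked against it.

```python
def to_pod_record(mvp):
    """Build a Project Open Data style dict from a flat dict keyed by the
    proposed MVP labels. Hypothetical helper, not part of any library."""
    return {
        "title": mvp["Title"],
        "description": mvp["Description"],
        # "Tags" in the proposal corresponds to the "keyword" array.
        "keyword": list(mvp["Tags"]),
        # "Last Update" corresponds to "modified" (an ISO 8601 date string).
        "modified": mvp["Last Update"],
        "publisher": {"name": mvp["Publisher"]},
        "contactPoint": {
            "fn": mvp["Contact Name"],
            # Assumption: the email is stored as a mailto: URI.
            "hasEmail": "mailto:" + mvp["Contact Email"],
        },
        "distribution": [
            {
                "accessURL": mvp["Access URL"],
                "downloadURL": mvp["Download URL"],
            }
        ],
        "license": mvp["License"],
    }


# Made-up sample values, for illustration only.
sample = {
    "Title": "Example dataset",
    "Description": "A placeholder description.",
    "Tags": ["example", "placeholder"],
    "Last Update": "2015-05-06",
    "Publisher": "Example Publisher",
    "Contact Name": "Jane Doe",
    "Contact Email": "jane@example.org",
    "Access URL": "https://example.org/dataset",
    "Download URL": "https://example.org/dataset.csv",
    "License": "https://example.org/license",
}
record = to_pod_record(sample)
```

Required fields beyond these ten (for example an identifier) are left out here because they are not part of the proposed list.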

dweinberger · 2015-05-06T14:55:31Z

This seems like a reasonable list, but are we already in agreement that we want to adopt an existing schema rather than create our own, even if our own is just that simple?

hathix · 2015-05-06T15:23:24Z

The schema Nick is proposing is a pared-down version of the Data.gov schema, so you could say we're creating our own schema based on an existing schema. I feel it's a good idea to keep the schema this way until we need domain-specific extensions.

dweinberger · 2015-05-06T15:43:33Z

I personally think it's important to state from the get-go that we are not planning on creating our own schema, unless we have a compelling reason to do so. One way to flag that would be to say that we are in fact using data.gov's schema, even though we're not using all of its terms/fields/vocabulary.

If down the road it turns out that data.gov doesn't have all the fields we need, we can find one that does or extend data.gov.

But I do like saying as part of the MVP launch that we have adopted an existing standard. Sends the right signal, doesn't it?

hathix · 2015-05-07T02:41:17Z

So we'd say we're using a subset of the existing Data.gov standard, which we might extend later as needed?

dweinberger · 2015-05-07T03:09:07Z

Yes, although I think it'd be slightly better just to say that we're using the data.gov standard. That doesn't imply that we're using every available field.

And I'd leave out the "which we might extend later" part because that's always assumed. And the point of this is to signal a reluctance to create new standards and enthusiasm about using existing standards.

So, I'd say something like, "We are using the Data.gov schema to describe the data sets DROID is referencing." Perfectly true. Not misleading in the least. Clear. Excellent signal.

bsapozhnikov · 2015-11-22T04:42:23Z

Hi, I've just pushed a proposed schema based on the Data.gov schema to the master branch. Feel free to take a look; I would love any feedback you have!

philipashlock · 2015-11-23T23:47:13Z

It's probably best not to call the schema used by Data.gov the "Data.gov standard", since it's really an international standard (called DCAT) used by a lot of other countries and data catalogs too. You can also read at the bottom of http://schema.org/Dataset that their schema is also based on DCAT. We have a particular JSON serialization of DCAT with a few additional fields, which we typically refer to as the Project Open Data Metadata schema, but even that is not exclusive to Data.gov, since it's used by local governments and incorporated directly into platforms like Socrata.

As for the current schema.md in this repo, it looks like there are a number of things out of sync with DCAT, e.g. "tags" instead of "keyword" and "updated" instead of "modified", etc.
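
As a rough sketch of the kind of fix this implies, the two renames named above could be applied to an existing record like this. The helper `rename_fields` and the sample record are hypothetical, and the rename table covers only the two examples given here, not the full list of differences.

```python
# Only the two renames mentioned in this comment; the real list is longer.
RENAMES = {"tags": "keyword", "updated": "modified"}


def rename_fields(record, renames=RENAMES):
    """Return a copy of record with out-of-sync keys renamed.

    Raises ValueError if a rename would overwrite a key that is already
    present, so nothing is silently dropped.
    """
    out = {}
    for key, value in record.items():
        new_key = renames.get(key, key)
        if new_key in out:
            raise ValueError("duplicate field after rename: " + new_key)
        out[new_key] = value
    return out


# Made-up record, for illustration only.
old = {"title": "Example dataset", "tags": ["example"], "updated": "2015-11-22"}
new = rename_fields(old)
```

The function returns a new dict rather than editing in place, so the original record can still be compared against the result.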

Feel free to copy our schema.md file and make use of our JSON Schema files

nsinai · 2015-12-02T23:24:26Z

Thanks @philipashlock!

@bsapozhnikov -- can you update?

nsinai · 2015-12-02T23:41:23Z

@mcrosas asked in an email:

"Thanks, for sharing this. What's the intention of choosing this schema? For clarification, the Dataverse supports an extensive set of metadata fields (including the fields in this data.gov schema), which map to metadata standards such as Dublin Core Terms and DataCite Schema, needed to implement best practices in data sharing and publishing."

The short answer is that this is currently a student project, with faculty support and mentoring. Similar to open data portals by governments, the idea is to catalog interesting data sets that anyone can find and use. The idea isn't to host any data, but simply to be an accessible and useful catalog.

mercecrosas · 2015-12-04T20:28:55Z

@nsinai the problem with this approach is that data are not guaranteed to be accessible and reusable if the catalog doesn't point to trusted archival data repositories that provide long-term access to the data. Making data open and accessible requires not only a catalog with metadata to search and learn what the dataset is about, but also long-term access to the data in a reusable format (which is what a repository like Dataverse would provide if the actual datasets for this project were hosted and archived in the repository).

hathix mentioned this issue May 6, 2015

Dataset metadata #15

Closed

hathix closed this as completed Nov 22, 2016
