Metadata standard (Schema) #6

Closed
nsinai opened this issue Apr 8, 2015 · 14 comments

@nsinai
Contributor

nsinai commented Apr 8, 2015

This thread is for a discussion about metadata standards. I would suggest starting with a global standard like DCAT and reducing it to a lightweight set of required fields (e.g. title, description, keywords, point of contact name, point of contact email, URL, license).

Here is the current Data.gov schema: https://project-open-data.cio.gov/v1.1/schema/

@dweinberger

If we're going for simple (+1), might we consider Schema.org? https://schema.org/Dataset

@rebeccawilliams

For reference @dweinberger, here is how schema.org maps to DCAT: https://project-open-data.cio.gov/v1.1/metadata-resources/#field-mappings

@nsinai
Contributor Author

nsinai commented May 4, 2015

Thanks @rebeccawilliams !

hathix mentioned this issue May 6, 2015
@nsinai
Contributor Author

nsinai commented May 6, 2015

Proposed MVP of Schema:
Title
Description
Tags
Last Update
Publisher
Contact Name
Contact Email
Access URL
Download URL
License
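
As a rough illustration of the list above, one catalog entry could be represented as a plain Python dictionary and checked for completeness. The snake_case key names, the sample values, and the `missing_fields` helper are assumptions made for this sketch; they are not taken from the repository's schema file.

```python
# Hypothetical key names derived from the proposed MVP list (not from the repo).
MVP_FIELDS = [
    "title",
    "description",
    "tags",
    "last_update",
    "publisher",
    "contact_name",
    "contact_email",
    "access_url",
    "download_url",
    "license",
]


def missing_fields(entry):
    """Return the MVP fields that are absent or empty in a catalog entry."""
    return [name for name in MVP_FIELDS if not entry.get(name)]


# Made-up sample entry; every value is a placeholder.
example_entry = {
    "title": "Example dataset",
    "description": "Placeholder description of the dataset.",
    "tags": ["example", "placeholder"],
    "last_update": "2015-05-06",
    "publisher": "Example Publisher",
    "contact_name": "Jane Doe",
    "contact_email": "jane.doe@example.org",
    "access_url": "https://example.org/dataset",
    "download_url": "https://example.org/dataset.csv",
    "license": "https://example.org/license",
}

print(missing_fields(example_entry))  # prints []
print(missing_fields({"title": "Only a title"}))
```

Treating every field as required keeps the check trivial; if some fields turn out to be optional, the list could be split into required and optional groups.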

@dweinberger

This seems like a reasonable list, but are we already in agreement that we want to adopt an existing schema rather than create our own, even if our own is just that simple?

@hathix
Contributor

hathix commented May 6, 2015

The schema Nick is proposing is a pared-down version of the Data.gov schema, so you could say we're creating our own schema based on an existing one. I feel it's a good idea to keep the schema this way until we need domain-specific extensions.

@dweinberger

I personally think it's important to state from the get-go that we are not planning on creating our own schema, unless we have a compelling reason to do so. One way to flag that would be to say that we are in fact using data.gov's schema, even though we're not using all of its terms/fields/vocabulary.

If down the road it turns out that data.gov doesn't have all the fields we need, we can find one that does or extend data.gov.

But I do like saying as part of the MVP launch that we have adopted an existing standard. Sends the right signal, doesn't it?

@hathix
Contributor

hathix commented May 7, 2015

So we'd say we're using a subset of the existing Data.gov standard, which we might extend later as needed?

@dweinberger

Yes, although I think it'd be slightly better just to say that we're using the data.gov standard. That doesn't imply that we're using every available field.

And I'd leave out the "which we might extend later" part because that's always assumed. And the point of this is to signal a reluctance to create new standards and enthusiasm about using existing standards.

So, I'd say something like, "We are using the Data.gov schema to describe the data sets DROID is referencing." Perfectly true. Not misleading in the least. Clear. Excellent signal.

@bsapozhnikov
Contributor

Hi, I've just pushed a proposed schema based on the Data.gov schema to the master branch. Feel free to take a look; I would love any feedback you have!

@philipashlock

It's probably best not to call the schema used by Data.gov the "Data.gov standard", since it's really an international standard (called DCAT) used by a lot of other countries and data catalogs too. You can also read at the bottom of http://schema.org/Dataset that their schema is also based on DCAT. We have a particular JSON serialization of DCAT with a few additional fields which we typically refer to as the Project Open Data Metadata schema, but even that is not exclusive to Data.gov since it's used by local governments and incorporated directly into platforms like Socrata.

As for the current schema.md in this repo, it looks like there are a number of things out of sync with DCAT, e.g. "tags" instead of "keyword" and "updated" instead of "modified", etc.

Feel free to copy our schema.md file and make use of our JSON Schema files
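
To make the naming mismatch concrete, here is a minimal sketch of renaming the two out-of-sync keys mentioned above. Only the `tags` to `keyword` and `updated` to `modified` renames come from this comment; the `to_dcat_names` helper and the sample record are hypothetical, and a complete alignment with DCAT would likely involve more than key renames.

```python
# Only these two renames are named in the comment above.
# Every other key is passed through unchanged.
DCAT_RENAMES = {
    "tags": "keyword",
    "updated": "modified",
}


def to_dcat_names(entry):
    """Return a copy of the entry with the known out-of-sync keys renamed."""
    return {DCAT_RENAMES.get(key, key): value for key, value in entry.items()}


# Made-up record using the field names currently in schema.md.
record = {
    "title": "Example dataset",
    "tags": ["example"],
    "updated": "2015-12-02",
}

print(to_dcat_names(record))
```

The original dictionary is left untouched, so the rename can be applied to existing entries without modifying them in place.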

@nsinai
Contributor Author

nsinai commented Dec 2, 2015

Thanks @philipashlock!

@bsapozhnikov -- can you update?

@nsinai
Contributor Author

nsinai commented Dec 2, 2015

@mcrosas asked in an email:

"Thanks, for sharing this. What's the intention of choosing this schema? For clarification, the Dataverse supports an extensive set of metadata fields (including the fields in this data.gov schema), which map to metadata standards such as Dublin Core Terms and DataCite Schema, needed to implement best practices in data sharing and publishing."

The short answer is that this is currently a student project, with faculty support and mentoring. Similar to open data portals run by governments, the idea is to catalog interesting data sets that anyone can find and use. The idea isn't to host any data, but simply to be an accessible and useful catalog.

@mercecrosas

@nsinai the problem with this approach is that data are not guaranteed to be accessible and reusable if the catalog doesn't point to trusted archival data repositories that provide long-term access to the data. Making data open and accessible requires not only a catalog with metadata for searching and learning what a dataset is about, but also long-term access to the data in a reusable format (which is what a repository like Dataverse would provide if the actual datasets for this project were hosted and archived in the repository).

hathix closed this as completed Nov 22, 2016
7 participants