Metadata standard (Schema) #6
Comments
If we're going for simple (+1), might we consider Schema.org? https://schema.org/Dataset |
For reference @dweinberger, here is how schema.org maps to DCAT: https://project-open-data.cio.gov/v1.1/metadata-resources/#field-mappings |
Thanks @rebeccawilliams ! |
Proposed MVP of Schema: |
This seems like a reasonable list, but are we already in agreement that we want to adopt an existing schema rather than create our own, even if our own is just that simple? |
The schema Nick is proposing is a pared-down version of the Data.gov schema, so you could say we're creating our own schema based off existing schema. I feel it's a good idea to keep the schema this way until we need domain-specific extensions. |
I personally think it's important to state from the get-go that we are not planning on creating our own schema, unless we have a compelling reason to do so. One way to flag that would be to say that we are in fact using data.gov's schema, even though we're not using all of its terms/fields/vocabulary. If down the road it turns out that data.gov doesn't have all the fields we need, we can find one that does or extend data.gov. But I do like saying as part of the MVP launch that we have adopted an existing standard. Sends the right signal, doesn't it? |
So we'd say we're using a subset of the existing Data.gov standard, which we might extend later as needed? |
Yes, although I think it'd be slightly better just to say that we're using the data.gov standard. That doesn't imply that we're using every available field. And I'd leave out the "which we might extend later" part because that's always assumed. And the point of this is to signal a reluctance to create new standards and enthusiasm about using existing standards. So, I'd say something like, "We are using the Data.gov schema to describe the data sets DROID is referencing." Perfectly true. Not misleading in the least. Clear. Excellent signal. |
Hi, I've just pushed a proposed schema based on the Data.gov schema to the master branch. Feel free to take a look; I would love any feedback you have! |
It's probably best not to call the schema used by Data.gov the "Data.gov standard", since it's really an international standard (called DCAT) used by a lot of other countries and data catalogs too. You can also read at the bottom of http://schema.org/Dataset that their schema is also based on DCAT. We have a particular JSON serialization of DCAT with a few additional fields, which we typically refer to as the Project Open Data Metadata schema, but even that is not exclusive to Data.gov, since it's used by local governments and incorporated directly into platforms like Socrata. As for the current schema.md in this repo, it looks like a number of things are out of sync with DCAT, e.g. "tags" instead of "keyword" and "updated" instead of "modified", etc. Feel free to copy our schema.md file and make use of our JSON Schema files. |
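To make the naming mismatch concrete, here is a minimal sketch of how a record using the repo's current field names could be renamed to the DCAT-style names. It covers only the two renames mentioned above ("tags" to "keyword", "updated" to "modified"); the `FIELD_RENAMES` table, the `rename_fields` helper, and the sample record are hypothetical and not part of this repo or of the Project Open Data tooling.

```python
# Hypothetical rename table: only the two mismatches named in the comment above.
FIELD_RENAMES = {"tags": "keyword", "updated": "modified"}


def rename_fields(record):
    """Return a copy of `record` with out-of-sync keys renamed.

    Keys that are not in FIELD_RENAMES are copied through unchanged.
    """
    return {FIELD_RENAMES.get(key, key): value for key, value in record.items()}


# Illustrative record, not taken from the actual schema.md.
example = {"title": "Sample dataset", "tags": ["demo"], "updated": "2015-01-01"}
renamed = rename_fields(example)
```

Any other fields that turn out to differ from DCAT would need their own entries in the table.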
Thanks @philipashlock! @bsapozhnikov -- can you update? |
@mcrosas asked in an email: "Thanks, for sharing this. What's the intention of choosing this schema? For clarification, the Dataverse supports an extensive set of metadata fields (including the fields in this data.gov schema), which map to metadata standards such as Dublin Core Terms and DataCite Schema, needed to implement best practices in data sharing and publishing." The short answer is that this is currently a student project, with faculty support and mentoring. Similar to open data portals by governments, the idea is to catalog interesting data sets that anyone can find and use. The idea isn't to host any data, but simply to be an accessible and useful catalog. |
@nsinai the problem with this approach is that data are not guaranteed to be accessible and reusable if the catalog doesn't point to trusted archival data repositories that provide long-term access to the data. Making data open and accessible requires not only a catalog with metadata to search and learn what the dataset is about, but also long-term access to the data in a reusable format (which is what a repository like Dataverse would provide if the actual datasets for this project were hosted and archived in the repository). |
This thread is for a discussion about metadata standards. I would suggest starting with a global standard like DCAT and reducing it to a lightweight few fields that are required (e.g. title, description, keywords, point of contact name, point of contact email, URL, license).
Here is the current Data.gov schema: https://project-open-data.cio.gov/v1.1/schema/
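As a rough illustration of what such a lightweight required set could look like, here is a minimal sketch that checks a record for the fields listed above. The mapping from that list to concrete field names (`keyword`, `landingPage`, and a nested `contactPoint` with `fn` and `hasEmail`) follows the Project Open Data v1.1 naming and is an assumption, not something agreed in this thread; the `missing_fields` helper and the sample record are hypothetical.

```python
# Assumed field names, following Project Open Data v1.1 conventions.
REQUIRED_FIELDS = ["title", "description", "keyword", "landingPage", "license"]
REQUIRED_CONTACT_FIELDS = ["fn", "hasEmail"]  # point of contact name and email


def missing_fields(record):
    """Return the names of required fields that are absent or empty."""
    missing = [name for name in REQUIRED_FIELDS if not record.get(name)]
    contact = record.get("contactPoint") or {}
    missing += [
        "contactPoint." + name
        for name in REQUIRED_CONTACT_FIELDS
        if not contact.get(name)
    ]
    return missing


# Illustrative record; every value here is made up.
sample = {
    "title": "Sample dataset",
    "description": "A placeholder entry used to exercise the check.",
    "keyword": ["demo"],
    "contactPoint": {"fn": "Jane Doe", "hasEmail": "mailto:jane.doe@example.org"},
    "landingPage": "https://example.org/sample-dataset",
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
}
problems = missing_fields(sample)
```

A check like this only confirms that the fields are present; validating their formats would be a job for the JSON Schema files linked from the schema page.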