Add schemas for ckanext-scheming for DCAT-AP 2.1 #56
Comments
I am using I would like to have the metadata of these data types automatically represented in DCAT via this extension. I'm pretty sure that is the scope described by this issue, but I just want to clarify. Currently, when I append .jsonld or .rdf to the dataset URL it displays only the normal metadata and ignores the other fields. Do those fields need to have a namespace defined for them somehow? Is there a recommended way to achieve this functionality?
I think this is what I've been looking for (and hoping someone else would have done). When I first saw that this extension was compliant with dcat-ap, I assumed that it meant that it would automagically create the relevant fields (with the Does one already exist? If not, it looks like I'll need to create one. If it would be helpful to add it to a repo of common schemas (which may or may not already exist), I'd be happy to contribute it once it's done.
@wood-chris there is a scheming mapping for DCAT-AP Switzerland (which is a slightly adapted version of DCAT-AP EU) here: https://github.com/opendata-swiss/ckanext-switzerland/blob/master/ckanext/switzerland/dcat-ap-switzerland_scheming.json
Dumping some thoughts here on scheming support. At the time the processors (parsers/serializers) that map between CKAN and DCAT were written, ckanext-scheming was only starting to become widespread, so custom DCAT fields that didn't link directly to standard CKAN fields were stored as extras (see all the ones marked
{
"name": "test_dataset_dcat",
"title": "Test dataset DCAT",
// ....
"extras": [
{"key": "version_notes", "value": "Some version notes"}
// ....
]
}
The pattern nowadays is to create custom fields in a scheming schema, which internally handles the conversion to and from extras:
{
"name": "test_dataset_dcat",
"title": "Test dataset DCAT",
"version_notes": "Some version notes"
// ....
}
That's the goal of creating a DCAT scheming schema: all properties become custom fields of the CKAN dataset, aligned with the dcat:Dataset ones. The difficulty here is how to offer support for existing sites using previous versions of the extension.
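To make the difference between the two dataset shapes concrete, here is a minimal Python sketch that moves selected extras up to first-level dataset fields. The helper name `promote_extras` and the `dcat_fields` argument are hypothetical and exist only for this illustration; they are not part of ckanext-dcat or ckanext-scheming.

```python
def promote_extras(dataset, dcat_fields):
    """Return a copy of `dataset` where extras whose key is in `dcat_fields`
    become first-level fields. Hypothetical helper, for illustration only."""
    # Copy everything except the extras list, which is rebuilt below
    result = {k: v for k, v in dataset.items() if k != "extras"}
    remaining = []
    for extra in dataset.get("extras", []):
        if extra["key"] in dcat_fields:
            result[extra["key"]] = extra["value"]
        else:
            remaining.append(extra)
    # Only keep an extras list if something was left behind
    if remaining:
        result["extras"] = remaining
    return result


old_style = {
    "name": "test_dataset_dcat",
    "title": "Test dataset DCAT",
    "extras": [{"key": "version_notes", "value": "Some version notes"}],
}
new_style = promote_extras(old_style, {"version_notes"})
```

With the example input, `new_style` carries `version_notes` as a regular dataset field and no longer has an `extras` list, while `old_style` is left unmodified.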
The second case is relevant to CKAN sites that import DCAT RDF representations from other systems (I'd imagine that in most cases through the DCAT harvester) and create CKAN datasets from them. My current thinking on how to approach this is:
[1] At first I thought about being clever and inspecting the dataset schema (if scheming was being used) to see if there were DCAT fields defined, and not storing the values in extras if so, but that seemed brittle, and sites could have different schemas in use, potentially even for different DCAT versions. I think it's better to be explicit and make site maintainers
Sorry if this is a bit convoluted, here's a TLDR: In the next major version of ckanext-dcat I want to change the RDF DCAT parsers so they store custom DCAT fields as first-level CKAN dataset fields instead of dataset extras, but keep the old behaviour via a config option for backwards compatibility.
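The behaviour switch described in the TLDR could look roughly like the sketch below. The function name `set_custom_field` and the flag `store_as_extras` are assumptions made up for this example; they are not an actual ckanext-dcat configuration option or API.

```python
def set_custom_field(dataset, key, value, store_as_extras=False):
    """Store a parsed DCAT value on a CKAN dataset dict.

    `store_as_extras` stands in for a hypothetical site-level config option:
    True keeps the old behaviour (extras), False uses first-level fields.
    """
    if store_as_extras:
        dataset.setdefault("extras", []).append({"key": key, "value": value})
    else:
        dataset[key] = value
    return dataset


legacy = set_custom_field({"name": "d1"}, "version_notes", "Some notes", store_as_extras=True)
current = set_custom_field({"name": "d1"}, "version_notes", "Some notes")
```

Making the choice an explicit flag, rather than inferring it from the schema, mirrors the preference for being explicit stated in the comment.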
Chatting with @wardi about this, actually using
This allows checking whether a field should be stored as a custom field or an extra.
PR with a summary of the work done so far here: #281
New repeating subfield, supporting all properties for the location class: uri, text, geom, bbox and centroid. Used spatial_coverage as name to not interfere with the `spatial` field expected by ckanext-scheming, in a future commit we will extract the relevant value to index it as a geometry.
The previous field names based on indexes didn't allow results to be retrieved easily. We are now flattening all values for the same subfield to at least get a text hit. See #281 (comment)
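A rough sketch of the flattening idea, assuming a repeating subfield arrives as a list of dicts. The function name `flatten_subfields` and the output key pattern `extras_<field>__<subfield>` are assumptions for illustration and may not match what the extension actually writes to the search index.

```python
def flatten_subfields(field_name, items):
    """Join all values of the same subfield into one text value per subfield.

    Hypothetical helper: the key naming is an assumption, not the real index schema.
    """
    collected = {}
    for item in items:
        for sub_key, value in item.items():
            # Skip empty values so they don't add noise to the index
            if value in (None, "", []):
                continue
            values = value if isinstance(value, list) else [value]
            key = f"extras_{field_name}__{sub_key}"
            collected.setdefault(key, []).extend(str(v) for v in values)
    return {key: " ".join(vals) for key, vals in collected.items()}


contacts = [
    {"name": "Contact 1", "email": "contact1@example.org"},
    {"name": "Contact 2", "email": ""},
]
index_fields = flatten_subfields("contact", contacts)
```

Because there is one key per subfield instead of one key per item index, a free-text query only has to target a single, predictable field name.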
If the `spatial_coverage` field is present, store the first geometry found so ckanext-spatial can pick it up for spatial indexing. Added indexing tests
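As a sketch of what "store the first geometry found" might mean in practice, assuming `spatial_coverage` is a list of dicts using the subfield names listed earlier (`geom`, `bbox`, `centroid`). The function name `first_geometry` and the order in which the subfields are tried are assumptions, not necessarily what the commit implements.

```python
def first_geometry(dataset, keys=("geom", "bbox", "centroid")):
    """Return the first non-empty geometry in `spatial_coverage`, or None.

    Hypothetical helper; the preference order in `keys` is an assumption.
    """
    for item in dataset.get("spatial_coverage") or []:
        for key in keys:
            if item.get(key):
                return item[key]
    return None


dataset = {
    "name": "test_dataset_dcat",
    "spatial_coverage": [
        {"uri": "https://example.org/place/1", "text": "Somewhere"},
        {"centroid": {"type": "Point", "coordinates": [1.26, 41.12]}},
    ],
}
geometry = first_geometry(dataset)
if geometry is not None:
    # Copy the value to the `spatial` field so a spatial indexer can find it
    dataset["spatial"] = geometry
```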
At least the ones supported by the current processors. TODO:
* spatial_resolution in meters: needs a new multiple_text_decimal validator
* hvd_category: will be done as part of the wider HVD work
This required a new scheming_multiple_number validator, adapted from scheming_multiple_text
Support at the validator level for year, year-month, date and datetime values, which are correctly typed in the RDF serialization. At the UI level a date input is used by default as it was difficult to provide one that supported all inputs.
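To illustrate how the four accepted value shapes can be told apart, here is a minimal standalone sketch. It is not the validator from the commit; the function name `xsd_type_for` is hypothetical, and the mapping to the XSD datatype names `gYear`, `gYearMonth`, `date` and `dateTime` is an assumption about how "correctly typed" is meant.

```python
import re
from datetime import date, datetime


def xsd_type_for(value):
    """Return the XSD datatype name matching `value`, or raise ValueError.

    Hypothetical helper, for illustration only.
    """
    if re.fullmatch(r"\d{4}", value):
        return "gYear"
    if re.fullmatch(r"\d{4}-(0[1-9]|1[0-2])", value):
        return "gYearMonth"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        date.fromisoformat(value)  # raises ValueError for e.g. 2024-02-30
        return "date"
    if "T" not in value:
        raise ValueError(f"Not a year, year-month, date or datetime: {value!r}")
    datetime.fromisoformat(value)  # raises ValueError if malformed
    return "dateTime"


types = [xsd_type_for(v) for v in ("2024", "2024-05", "2024-05-17", "2024-05-17T10:30:00")]
```

Checking the narrowest shapes first avoids a plain year being misread as a truncated date.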
Mostly taken from the DCAT-AP 2.1 spec doc, adapted for CKAN
As this is a `text` field that allows free text search
Scheming adds a dict with empty keys when empty repeating subfields are submitted from the form. Check that there's an actual value before creating the triples when serializing
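The check described here can be sketched in a few lines. The helper name `has_values` and the sample data are hypothetical; the real serializer code may perform the check differently.

```python
def has_values(subfield):
    """True if at least one key of a repeating subfield dict holds a real value."""
    return any(value not in (None, "", [], {}) for value in subfield.values())


# Shape of what a form submission might produce: the first item has only empty keys
contacts = [
    {"name": "", "email": ""},
    {"name": "Jane Doe", "email": ""},
]
# Only items with an actual value should result in triples
to_serialize = [contact for contact in contacts if has_values(contact)]
```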
We could provide one or two predefined schemas covering all DCAT and DCAT-AP fields ready to be used or customized by instance maintainers.
This will also help in handling multilingual metadata (#55)
We could have:
What we need: