Add schemas for ckanext-scheming for DCAT-AP 2.1 #56

amercader · 2016-01-19T11:37:35Z

We could provide one or two predefined schemas covering all DCAT and DCAT-AP fields ready to be used or customized by instance maintainers.

This will also help in handling multilingual metadata (#55)

We could have:

- Basic schema with the most common fields (e.g. mandatory and recommended properties in DCAT-AP)
- Full schema with all fields (that can later be slimmed down)
- Presets file with validators if needed

What we need:

- JSON schema files in the relevant location
- API and functional tests
- Docs

ted-strauss-K1 · 2018-02-07T22:31:42Z

I am using ckanext-scheming to define schemas for special dataset types.
Link to example schema: https://github.com/ckan/ckanext-scheming/blob/master/ckanext/scheming/camel_photos.json

I would like to have the metadata of these data types automatically represented in DCAT via this extension. I'm pretty sure that is the scope described by this issue, but just clarifying.

Currently, when I append .jsonld or .rdf to the dataset URL, it displays only the normal metadata and ignores the other fields. Do those fields need to have a namespace defined for them somehow?

Is there a recommended way to achieve this functionality?

wood-chris · 2019-11-07T15:51:15Z

I think this is what I've been looking for (and hoping someone else would have done). When I first saw that this extension was compliant with dcat-ap, I assumed that it meant that it would automagically create the relevant fields (with the required label for properties that are mandatory in the AP) - but it seems like I was a bit optimistic!

Does one already exist? If not, it looks like I'll need to create one - if it's helpful to add to a repo of common schemas (that may or may not already exist) I'd be happy to add it when I've done it

metaodi · 2019-11-07T16:51:47Z

@wood-chris there is a scheming mapping for DCAT-AP Switzerland (which is a slightly adapted version of DCAT-AP EU) here: https://github.com/opendata-swiss/ckanext-switzerland/blob/master/ckanext/switzerland/dcat-ap-switzerland_scheming.json

amercader · 2024-04-29T13:18:59Z

Dumping some thoughts here on scheming support.

At the time the processors (parsers/serializers) that map between CKAN and DCAT were written, ckanext-scheming was only starting to become widespread, so custom DCAT fields that didn't map directly to standard CKAN fields were stored as extras (see all the ones marked extra: here). So the DCAT version_notes field would be stored as:

{
    "name": "test_dataset_dcat",
    "title": "Test dataset DCAT",
    // ....
    "extras": [
         {"key": "version_notes", "value": "Some version notes"}
         // ....
    ]
}

The pattern nowadays is to create custom fields in a scheming schema, that internally handles the conversion to / from extras:

{
    "name": "test_dataset_dcat",
    "title": "Test dataset DCAT",
    "version_notes": "Some version notes"
    // ....
}

That's the goal of creating a DCAT scheming schema, that all properties are custom fields of the CKAN Dataset, aligning with the dcat:Dataset ones.
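To make the difference between the two shapes concrete, here is a minimal sketch that turns the extras-based dict into the root-level one. The `promote_extras` helper is hypothetical and written only for illustration; it is not part of ckanext-dcat or ckanext-scheming, where scheming handles this conversion internally.

```python
from copy import deepcopy


def promote_extras(dataset_dict, field_names):
    """Return a copy of dataset_dict where extras whose key is listed in
    field_names become root-level fields (hypothetical helper)."""
    result = deepcopy(dataset_dict)
    remaining = []
    for extra in result.get("extras", []):
        key = extra["key"]
        if key in field_names and key not in result:
            # Move the value to the root of the dataset dict
            result[key] = extra["value"]
        else:
            # Leave unrelated (or clashing) extras untouched
            remaining.append(extra)
    if remaining:
        result["extras"] = remaining
    else:
        result.pop("extras", None)
    return result


old_style = {
    "name": "test_dataset_dcat",
    "title": "Test dataset DCAT",
    "extras": [{"key": "version_notes", "value": "Some version notes"}],
}
new_style = promote_extras(old_style, {"version_notes"})
```

The input dict is copied rather than modified in place, so the old representation stays available for comparison.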

The difficulty here is how to offer support for existing sites using previous versions of the extension.

- For the CKAN -> DCAT (Serialization) direction it should be fine. The serializers support both forms, and for a given field (like version_notes) will check first for a root level field in the dataset_dict and, if it's not there, look for an extra with that key (here).
- For the DCAT -> CKAN (Parsing) direction, we will find issues. As I said, the current parsers will store the custom fields in extras, which will make the resulting dataset dict incompatible with a scheming schema that defines the field as a dataset field. When creating or updating, we will get the following error:

  File "/home/adria/dev/pyenvs/ckan-py3.9/ckan/ckan/logic/action/create.py", line 185, in package_create
    raise ValidationError(errors)
ckan.logic.ValidationError: None - {'extras': [{'key': ['There is a schema field with the same name']}]}

The second case is relevant to CKAN sites that import DCAT RDF representations from other systems (I'd imagine that in most cases through the DCAT harvester) and create CKAN datasets from them.
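The serialization-side lookup described above (root-level field first, then an extra with the same key) can be sketched as follows. `get_dataset_value` is a stand-alone illustration written for this thread, not a copy of the extension's own code.

```python
def get_dataset_value(dataset_dict, key, default=None):
    """Look up a field at the root of the dataset dict first and fall
    back to the extras list (illustrative sketch)."""
    if key in dataset_dict:
        return dataset_dict[key]
    for extra in dataset_dict.get("extras", []):
        if extra.get("key") == key:
            return extra.get("value")
    return default


with_root_field = {"name": "a", "version_notes": "Root notes"}
with_extra = {
    "name": "b",
    "extras": [{"key": "version_notes", "value": "Extra notes"}],
}
```

Both `with_root_field` and `with_extra` yield a value for `version_notes`, which is why the CKAN -> DCAT direction keeps working for existing sites.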

My current thinking on how to approach this is:

Change the default parsers so they can store DCAT fields as root fields rather than extras, keeping the old behaviour via config option [1]
This will go into a new major ckanext-dcat version (2.0.0), with clear documentation:
- New sites using scheming will just need to use the included schema and everything should work
- New sites that don't want to use scheming (unlikely) will have to set the config option, otherwise the custom fields will be ignored
- Existing sites that have custom DCAT profiles and want to use scheming will have to store custom fields as root level fields (or drop any custom validators/logic they are currently using to make dcat and scheming work together)
- Existing sites that don't want to use scheming and want to keep the old behaviour just need to set the config option [1]

[1] At first I thought about being clever and inspecting the dataset schema (if scheming was being used) to see if there were DCAT fields defined, and not storing the values in extras if so, but that seemed brittle, and sites could have different schemas in use, potentially even for different DCAT versions. I think it's better to be explicit and make site maintainers

Sorry if this is a bit convoluted, here's a TLDR:

In the next major version of ckanext-dcat I want to change the RDF DCAT Parsers so they store custom DCAT fields as first level CKAN dataset fields instead of dataset extras, but keep the old behaviour via a config option for backwards compatibility.

amercader · 2024-04-30T08:47:51Z

Chatting with @wardi about this, actually using scheming_dataset_schema_show to check if there is a schema that contains DCAT fields could be a really good approach, as besides detecting if values need to be stored at the root level, we could use it to mark DCAT fields with certain keys in the schema (like dcat_validators, etc). These are available to the output of scheming_dataset_schema_show, snippets, validators etc.
As for knowing which schema to use, a good approach might be:

- One explicitly provided when creating the Profile class: for instance, for sites with harvesters that have more than one dataset schema, that could be a harvester config option
- If scheming is loaded, default to the dataset schema
- If scheming is not loaded or there is no schema defined, fall back to storing things in extras

This allows to check if a field should be stored as a custom field or an extra
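A rough sketch of that fallback order, and of the resulting "custom field or extra" decision, could look like this. The function names are hypothetical. The only real conventions assumed are that a scheming schema lists its dataset fields under `dataset_fields`, each with a `field_name` key. Whether scheming is loaded is represented by the caller passing `None` as the default schema.

```python
def resolve_schema(explicit_schema=None, scheming_default=None):
    """Pick the schema to check fields against: an explicitly provided
    one wins, then the scheming default, else None (meaning extras)."""
    if explicit_schema is not None:
        return explicit_schema
    return scheming_default


def schema_field_names(schema):
    """Names of the dataset-level fields defined in a scheming schema."""
    return {field["field_name"] for field in schema.get("dataset_fields", [])}


def store_value(dataset_dict, key, value, schema=None):
    """Store value as a root field if the schema defines it, otherwise
    as an extra (hypothetical helper, mutates dataset_dict)."""
    if schema is not None and key in schema_field_names(schema):
        dataset_dict[key] = value
    else:
        dataset_dict.setdefault("extras", []).append({"key": key, "value": value})
    return dataset_dict


dcat_schema = {
    "dataset_type": "dataset",
    "dataset_fields": [{"field_name": "title"}, {"field_name": "version_notes"}],
}

with_schema = store_value({}, "version_notes", "Notes", resolve_schema(None, dcat_schema))
without_schema = store_value({}, "version_notes", "Notes", resolve_schema(None, None))
```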

amercader · 2024-05-22T10:21:34Z

PR with summary of work done so far here: #281

New repeating subfield, supporting all properties for the location class: uri, text, geom, bbox and centroid. Used spatial_coverage as the name so it does not interfere with the `spatial` field expected by ckanext-scheming. In a future commit we will extract the relevant value to index it as a geometry.

The previous field names based on indexes didn't allow to retrieve results easily. We are now flattening all values for the same subfield to at least get a text hit. See #281 (comment)
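One way to picture the flattening is the sketch below: all values found for the same subfield across the repeated items are joined into a single text value. The `flatten_subfields` helper and the `extras_<field>__<subfield>` key naming are assumptions made for this example, not necessarily what the extension emits.

```python
def flatten_subfields(field_name, items):
    """Group the values of each subfield across all repeated items and
    join them into one space-separated string per subfield."""
    collected = {}
    for item in items:
        for key, value in item.items():
            if value in (None, "", []):
                continue  # skip empty values
            values = value if isinstance(value, list) else [value]
            collected.setdefault(key, []).extend(str(v) for v in values)
    # Assumed key pattern, for illustration only
    return {
        f"extras_{field_name}__{key}": " ".join(values)
        for key, values in collected.items()
    }


contacts = [
    {"name": "Contact A", "email": "a@example.org"},
    {"name": "Contact B", "email": ""},
]
index_fields = flatten_subfields("contact", contacts)
```

Compared with index-based names (one key per item position), a single key per subfield lets a free text query match regardless of which repeated item holds the value.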

At least the ones supported by the current processors. TODO:
- spatial_resolution in meters: needs a new multiple_text_decimal validator
- hvd_category: will be done as part of the wider HVD work

Support at the validator level for year, year-month, date and datetime values, which are correctly typed in the RDF serialization. At the UI level a date input is used by default as it was difficult to provide one that supported all inputs.
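As an illustration of how the four accepted shapes could be told apart so that each gets the right XSD datatype, here is a small pattern-based sketch. It only checks the shape of the string (it does not reject impossible dates such as month 13) and is not the validator shipped with the extension; `xsd_type_for` is a hypothetical name.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Ordered from least to most precise; shape checks only
_PATTERNS = [
    (re.compile(r"\d{4}"), XSD + "gYear"),
    (re.compile(r"\d{4}-\d{2}"), XSD + "gYearMonth"),
    (re.compile(r"\d{4}-\d{2}-\d{2}"), XSD + "date"),
    (
        re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?"),
        XSD + "dateTime",
    ),
]


def xsd_type_for(value):
    """Return the XSD datatype URI matching the shape of value,
    or None if it matches none of the supported shapes."""
    for pattern, datatype in _PATTERNS:
        if pattern.fullmatch(value):
            return datatype
    return None
```

For example, `xsd_type_for("2024-05")` returns the `gYearMonth` URI, while an unsupported string returns `None` and could be reported as a validation error.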

Scheming adds a dict with empty keys when empty repeating subfields are submitted from the form. Check that there's an actual value before creating the triples when serializing
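The check can be as simple as the following sketch, which treats an item as empty when none of its keys holds a real value. Both helper names are hypothetical.

```python
def has_values(subfield_item):
    """True if at least one key of the subfield dict holds a non-empty
    value. Zero and False count as values; None, '', [] and {} do not."""
    return any(v not in (None, "", [], {}) for v in subfield_item.values())


def non_empty_items(items):
    """Filter out the placeholder dicts created by an empty form."""
    return [item for item in items if has_values(item)]


submitted = [
    {"name": "", "email": "", "url": ""},  # what an empty form produces
    {"name": "Contact A", "email": "", "url": ""},
]
to_serialize = non_empty_items(submitted)
```

Only the items in `to_serialize` would then be turned into triples.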

amercader · 2024-07-15T10:46:33Z

Done in #281 + #288

amercader mentioned this issue Jan 19, 2016

Support for multilingual RDF #55

Open

amercader mentioned this issue Feb 5, 2016

Support for DCAT-AP 1.1 #53

Open

ted-strauss-K1 mentioned this issue Feb 8, 2018

Custom fields can't be created from API ckan/ckanext-scheming#158

Closed

amercader mentioned this issue Apr 22, 2024

[META] DCAT v3 support #271

Open

amercader changed the title ~~Add schemas for ckanext-scheming~~ Add schemas for ckanext-scheming for DCAT-AP 2.1 Apr 23, 2024

amercader added a commit that referenced this issue May 8, 2024

[#56] Allow to provide a dataset schema to profiles

65abb1f

This allows to check if a field should be stored as a custom field or an extra

amercader added a commit that referenced this issue May 8, 2024

[#56] Handle list values

9faf5f5

amercader added a commit that referenced this issue May 8, 2024

[#56] Handle repeating subfields

a808f72

amercader added a commit that referenced this issue May 8, 2024

[#56] Add draft schema

d0b219e

amercader added a commit that referenced this issue May 8, 2024

[#56] Add some examples

7ee354a

amercader added a commit that referenced this issue May 10, 2024

[#56] Fix repeating subfields index logic

9b847e9

amercader added a commit that referenced this issue May 10, 2024

[#56] [#56] Initial e2e scheming support test

e6583aa

amercader added a commit that referenced this issue May 20, 2024

[#56] Serialize repeating subfields

d86f467

amercader added a commit that referenced this issue May 20, 2024

[#56] Add sample of resource fields

000baa4

amercader added a commit that referenced this issue May 20, 2024

[#56] push

770628e

amercader added a commit that referenced this issue May 20, 2024

[#56] [#56] Serialize repeating subfields

2d8d969

amercader added a commit that referenced this issue May 20, 2024

[#56] [#56] Add sample of resource fields

c5865fb

amercader added a commit that referenced this issue May 20, 2024

[#56] Use profiles from config in CLI

a77d5c2

amercader added a commit that referenced this issue May 20, 2024

[#56] Separate scheming compat profile, parsing

35657ef

amercader added a commit that referenced this issue May 21, 2024

[#56] e2e test DCAT -> CKAN

e0f15f5

amercader added a commit that referenced this issue May 21, 2024

[#56] Scheming compatibility profile, serialization

0b6a8dd

amercader added a commit that referenced this issue May 21, 2024

[#56] Install scheming in github actions

20ac269

amercader added a commit that referenced this issue May 21, 2024

[#56] Add CKAN<2.10 before index hook variant

5375232

amercader added a commit that referenced this issue May 22, 2024

[#56] Use profiles from config in CLI

c27a456

amercader added a commit that referenced this issue May 30, 2024

[#56] Add missing var

a862d77

amercader added a commit that referenced this issue May 30, 2024

[#56] Store geometry in spatial field for indexing

4256e73

If the `spatial_coverage` field is present, store the first geometry found so ckanext-spatial can pick it up for spatial indexing. Added indexing tests
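A minimal sketch of that extraction, assuming `spatial_coverage` is a list of dicts using the subfield names mentioned in this thread (uri, text, geom, bbox, centroid). The order in which geometry subfields are tried is an assumption, and both function names are hypothetical.

```python
# Assumed preference order among the geometry-bearing subfields
GEOMETRY_KEYS = ("geom", "bbox", "centroid")


def first_geometry(spatial_coverage):
    """Return the first non-empty geometry found in the repeated
    spatial_coverage items, or None if there is none."""
    for item in spatial_coverage or []:
        for key in GEOMETRY_KEYS:
            value = item.get(key)
            if value:
                return value
    return None


def set_spatial_for_indexing(dataset_dict):
    """Copy the first geometry into the `spatial` field so a spatial
    indexer can pick it up (mutates and returns dataset_dict)."""
    geometry = first_geometry(dataset_dict.get("spatial_coverage"))
    if geometry is not None:
        dataset_dict["spatial"] = geometry
    return dataset_dict


dataset = {
    "name": "test_dataset_dcat",
    "spatial_coverage": [
        {"uri": "https://example.org/place/1"},
        {"geom": '{"type": "Point", "coordinates": [1, 2]}'},
    ],
}
indexed = set_spatial_for_indexing(dataset)
```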

amercader added a commit that referenced this issue Jun 4, 2024

[#56] Add spatial_resolution_in_meters

c6fc970

This required a new scheming_multiple_number validator, adapted from scheming_multiple_text

amercader added a commit that referenced this issue Jun 4, 2024

[#56] Review validators for resource fields

99b4c89

amercader added a commit that referenced this issue Jun 4, 2024

[#56] Fix spatial_resolution validators

1790404

amercader added a commit that referenced this issue Jun 6, 2024

[#56] Don't mess with field keys if using scheming

73523d6

amercader added a commit that referenced this issue Jun 6, 2024

[#56] Display snippets for file size, markdown

d456c00

amercader added a commit that referenced this issue Jun 6, 2024

[#56] Fix dates tests

209fda5

amercader added a commit that referenced this issue Jun 6, 2024

[#56] Fix number form snippet

634ff52

amercader added a commit that referenced this issue Jun 6, 2024

[#56] Help texts for all fields in the schema

8b78139

Mostly taken from the DCAT-AP 2.1 spec doc, adapted for CKAN

amercader added a commit that referenced this issue Jun 6, 2024

[#56] Use choices for resource status

15b0cc1

amercader added a commit that referenced this issue Jun 10, 2024

[#56] Create a full and a slimmed down schema version

602d505

amercader added a commit that referenced this issue Jun 10, 2024

[#56] Update README

614e23b

amercader added a commit that referenced this issue Jun 11, 2024

[#56] README tweaks

c11f3c2

amercader added a commit that referenced this issue Jun 11, 2024

[#56] Docstrings

030cd3d

amercader added a commit that referenced this issue Jun 11, 2024

[#56] Fix function call

5fffa15

amercader added a commit that referenced this issue Jun 13, 2024

[#56] Index subfields as extras_ Solr field

b600493

As this is a `text` field that allows free text search

amercader added a commit that referenced this issue Jun 13, 2024

[#56] Clean the index before tests

f88e433

amercader added a commit that referenced this issue Jul 2, 2024

[#56] Avoid empty list in spatial resolution

898912c

amercader added a commit that referenced this issue Jul 2, 2024

[#56] Markdown for provenance

97e68de

amercader added a commit that referenced this issue Jul 2, 2024

[#56] More robust date parsing with dateutil, expand tests

c7b8c02

amercader added a commit that referenced this issue Jul 3, 2024

[#56] Add tests for invalid and ambiguous dates

39b4d91

amercader added a commit that referenced this issue Jul 5, 2024

[#56] Update changelog with scheming changes

ae78f0f

amercader closed this as completed Jul 15, 2024
