Scheming support #281

Merged on Jul 5, 2024 (57 commits)
Changes from 47 commits
65abb1f [#56] Allow to provide a dataset schema to profiles (amercader, May 8, 2024)
9faf5f5 [#56] Handle list values (amercader, May 8, 2024)
a808f72 [#56] Handle repeating subfields (amercader, May 8, 2024)
d0b219e [#56] Add draft schema (amercader, May 8, 2024)
7ee354a [#56] Add some examples (amercader, May 8, 2024)
9b847e9 [#56] Fix repeating subfields index logic (amercader, May 9, 2024)
e6583aa [#56] [#56] Initial e2e scheming support test (amercader, May 9, 2024)
d86f467 [#56] Serialize repeating subfields (amercader, May 14, 2024)
000baa4 [#56] Add sample of resource fields (amercader, May 15, 2024)
2d8d969 [#56] [#56] Serialize repeating subfields (amercader, May 14, 2024)
c5865fb [#56] [#56] Add sample of resource fields (amercader, May 15, 2024)
a77d5c2 [#56] Use profiles from config in CLI (amercader, May 20, 2024)
35657ef [#56] Separate scheming compat profile, parsing (amercader, May 20, 2024)
62a7962 Merge branch '56-add-schema-file-dcat-ap-2.1' of github.com:ckan/ckan… (amercader, May 20, 2024)
e0f15f5 [#56] e2e test DCAT -> CKAN (amercader, May 21, 2024)
0b6a8dd [#56] Scheming compatibility profile, serialization (amercader, May 21, 2024)
20ac269 [#56] Install scheming in github actions (amercader, May 21, 2024)
5375232 [#56] Add CKAN<2.10 before index hook variant (amercader, May 21, 2024)
9b0abce [#56] dataset_schema -> dataset_type (amercader, May 23, 2024)
e1b5f32 [#56] Add most DCAT AP 1.1 standard and list fields (amercader, May 27, 2024)
2e4b4bc [#56] Test fixes (amercader, May 28, 2024)
f9467d4 [#56] Consolidate and simplify publisher handling (amercader, May 29, 2024)
214d853 Merge branch 'master' into 56-add-schema-file-dcat-ap-2.1 (amercader, May 29, 2024)
1bce834 Fix merge errors (amercader, May 29, 2024)
cd1d3f0 [#56] Add temporal extent (amercader, May 30, 2024)
103aa08 [#56] Add support for spatial_coverage (amercader, May 30, 2024)
a862d77 [#56] Add missing var (amercader, May 30, 2024)
aa23a70 [#56] Update repeating subfields indexing logic (amercader, May 30, 2024)
4256e73 [#56] Store geometry in spatial field for indexing (amercader, May 30, 2024)
afb74d1 [#56] Add rest of DCAT-AP 1 and 2.1 fields (amercader, Jun 3, 2024)
c6fc970 [#56] Add spatial_resolution_in_meters (amercader, Jun 4, 2024)
99b4c89 [#56] Review validators for resource fields (amercader, Jun 4, 2024)
4763d2b Merge branch 'master' into 56-add-schema-file-dcat-ap-2.1 (amercader, Jun 4, 2024)
1790404 [#56] Fix spatial_resolution validators (amercader, Jun 4, 2024)
73523d6 [#56] Don't mess with field keys if using scheming (amercader, Jun 6, 2024)
d456c00 [#56] Display snippets for file size, markdown (amercader, Jun 6, 2024)
b1e1718 [#56] Common preset for DCAT date-based fields (amercader, Jun 6, 2024)
209fda5 [#56] Fix dates tests (amercader, Jun 6, 2024)
634ff52 [#56] Fix number form snippet (amercader, Jun 6, 2024)
8b78139 [#56] Help texts for all fields in the schema (amercader, Jun 6, 2024)
15b0cc1 [#56] Use choices for resource status (amercader, Jun 6, 2024)
da8de09 Merge branch 'master' into 56-add-schema-file-dcat-ap-2.1 (amercader, Jun 10, 2024)
602d505 [#56] Create a full and a slimmed down schema version (amercader, Jun 10, 2024)
614e23b [#56] Update README (amercader, Jun 10, 2024)
c11f3c2 [#56] README tweaks (amercader, Jun 11, 2024)
030cd3d [#56] Docstrings (amercader, Jun 11, 2024)
5fffa15 [#56] Fix function call (amercader, Jun 11, 2024)
ad35359 Schemas description (amercader, Jun 12, 2024)
b600493 [#56] Index subfields as extras_ Solr field (amercader, Jun 13, 2024)
f88e433 [#56] Clean the index before tests (amercader, Jun 13, 2024)
898912c [#56] Avoid empty list in spatial resolution (amercader, Jun 19, 2024)
97e68de [#56] Markdown for provenance (amercader, Jun 19, 2024)
a8a3f25 [#56] Don't serialize empty repeating subfields (amercader, Jun 19, 2024)
c7b8c02 [#56] More robust date parsing with dateutil, expand tests (amercader, Jul 2, 2024)
39b4d91 [#56] Add tests for invalid and ambiguous dates (amercader, Jul 3, 2024)
31a69f5 Merge branch 'master' into 56-add-schema-file-dcat-ap-2.1 (amercader, Jul 3, 2024)
ae78f0f [#56] Update changelog with scheming changes (amercader, Jul 5, 2024)
6 changes: 4 additions & 2 deletions .github/workflows/test.yml
@@ -54,11 +54,13 @@ jobs:
         pip install -e .
         # Replace default path to CKAN core config file with the one on the container
         sed -i -e 's/use = config:.*/use = config:\/srv\/app\/src\/ckan\/test-core.ini/' test.ini
-    - name: Setup harvest extension
+    - name: Setup other extension
       run: |
         git clone https://github.com/ckan/ckanext-harvest
         pip install -e ckanext-harvest
-        pip install -r ckanext-harvest/pip-requirements.txt
+        pip install -r ckanext-harvest/requirements.txt
+        git clone https://github.com/ckan/ckanext-scheming
+        pip install -e ckanext-scheming
     - name: Setup extension
       run: |
         ckan -c test.ini db init
264 changes: 221 additions & 43 deletions README.md

Large diffs are not rendered by default.

78 changes: 77 additions & 1 deletion ckanext/dcat/plugins/__init__.py
@@ -2,6 +2,7 @@

 from builtins import object
 import os
+import json

 from ckantoolkit import config

@@ -19,6 +20,7 @@
     dcat_auth,
 )
 from ckanext.dcat import utils
+from ckanext.dcat.validators import dcat_validators


 CUSTOM_ENDPOINT_CONFIG = 'ckanext.dcat.catalog_endpoint'
@@ -28,6 +30,19 @@
 I18N_DIR = os.path.join(HERE, u"../i18n")


+def _get_dataset_schema(dataset_type="dataset"):
+    schema = None
+    try:
+        schema_show = p.toolkit.get_action("scheming_dataset_schema_show")
+        try:
+            schema = schema_show({}, {"type": dataset_type})
+        except p.toolkit.ObjectNotFound:
+            pass
+    except KeyError:
+        pass
+    return schema
+
+
 class DCATPlugin(p.SingletonPlugin, DefaultTranslation):

     p.implements(p.IConfigurer, inherit=True)
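
Review note on `_get_dataset_schema`: the helper appears to handle two separate "no schema" cases, the scheming action not being registered at all (`KeyError`) and the action existing but not knowing the dataset type (`ObjectNotFound`). The sketch below reproduces that fallback outside CKAN. `ObjectNotFound`, `make_get_action` and `fake_schema_show` are hypothetical stand-ins written for this illustration, not CKAN or ckanext-scheming APIs, and the sketch narrows the `except KeyError` to the lookup only, which differs slightly from the diff.

```python
class ObjectNotFound(Exception):
    """Stand-in for ckan.plugins.toolkit.ObjectNotFound (assumption)."""


def make_get_action(actions):
    """Build a get_action-like lookup that raises KeyError for unknown names."""
    def get_action(name):
        return actions[name]
    return get_action


def get_dataset_schema(get_action, dataset_type="dataset"):
    """Return the schema for dataset_type, or None if it cannot be found."""
    try:
        schema_show = get_action("scheming_dataset_schema_show")
    except KeyError:
        # Action not registered: the scheming extension is not enabled
        return None
    try:
        return schema_show({}, {"type": dataset_type})
    except ObjectNotFound:
        # Scheming is enabled but has no schema for this dataset type
        return None


def fake_schema_show(context, data_dict):
    """Hypothetical action that only knows the 'dataset' type."""
    schemas = {"dataset": {"dataset_fields": []}}
    if data_dict["type"] not in schemas:
        raise ObjectNotFound(data_dict["type"])
    return schemas[data_dict["type"]]


with_scheming = make_get_action({"scheming_dataset_schema_show": fake_schema_show})
without_scheming = make_get_action({})

print(get_dataset_schema(with_scheming))
print(get_dataset_schema(with_scheming, "unknown_type"))
print(get_dataset_schema(without_scheming))
```

Because every failure path returns `None`, callers such as `after_dataset_show` can treat "no scheming" and "no schema for this type" the same way with a plain truthiness check.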
@@ -38,6 +53,7 @@ class DCATPlugin(p.SingletonPlugin, DefaultTranslation):
     p.implements(p.ITranslation, inherit=True)
     p.implements(p.IClick)
     p.implements(p.IBlueprint)
+    p.implements(p.IValidators)

     # IClick

@@ -101,17 +117,31 @@ def get_auth_functions(self):
             'dcat_catalog_search': dcat_auth,
         }

+    # IValidators
+    def get_validators(self):
+        return dcat_validators
+
     # IPackageController

     # CKAN < 2.10 hooks
     def after_show(self, context, data_dict):
         return self.after_dataset_show(context, data_dict)

+    def before_index(self, dataset_dict):
+        return self.before_dataset_index(dataset_dict)
+
     # CKAN >= 2.10 hooks
     def after_dataset_show(self, context, data_dict):

+        schema = _get_dataset_schema(data_dict["type"])
         # check if config is enabled to translate keys (default: True)
-        if not p.toolkit.asbool(config.get(TRANSLATE_KEYS_CONFIG, True)):
+        # skip if scheming is enabled, as this will be handled there
+        translate_keys = (
+            p.toolkit.asbool(config.get(TRANSLATE_KEYS_CONFIG, True))
+            and not schema
+        )
+
+        if not translate_keys:
             return data_dict

         if context.get('for_view'):
@@ -132,6 +162,52 @@ def set_titles(object_dict):

         return data_dict

+    def before_dataset_index(self, dataset_dict):
+        schema = _get_dataset_schema(dataset_dict["type"])
+        spatial = None
+        if schema:
+            for field in schema['dataset_fields']:
+                if field['field_name'] in dataset_dict and 'repeating_subfields' in field:
+                    for item in dataset_dict[field['field_name']]:
+                        for key in item:
+                            value = item[key]
+                            if not isinstance(value, dict):
+                                # Index a flattened version
+                                new_key = f'{field["field_name"]}__{key}'
+                                if not dataset_dict.get(new_key):
+                                    dataset_dict[new_key] = value
+                                else:
+                                    dataset_dict[new_key] += ' ' + value
+
+                    subfields = dataset_dict.pop(field['field_name'], None)
+                    if field['field_name'] == 'spatial_coverage':
+                        spatial = subfields
+
+        # Store the first geometry found so ckanext-spatial can pick it up for indexing
+        def _check_for_a_geom(spatial_dict):
+            value = None
+
+            for field in ('geom', 'bbox', 'centroid'):
+                if spatial_dict.get(field):
+                    value = spatial_dict[field]
+                    if isinstance(value, dict):
+                        try:
+                            value = json.dumps(value)
+                            break
+                        except ValueError:
+                            pass
+            return value
+
+        if spatial and not dataset_dict.get('spatial'):
+            for item in spatial:
+                value = _check_for_a_geom(item)
+                if value:
+                    dataset_dict['spatial'] = value
+                    dataset_dict['extras_spatial'] = value
+                    break
+
+        return dataset_dict
+

 class DCATJSONInterface(p.SingletonPlugin):
     p.implements(p.IActions)
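
Review note on `before_dataset_index`: the hook seems to do two things, flatten each repeating-subfields list into `<field>__<subfield>` keys that the index can store, and pick one geometry out of `spatial_coverage`. The standalone sketch below illustrates both steps on a made-up dataset. The function names, the `contact` field and the sample values are assumptions for illustration only. Subfield values are assumed to be strings, and `first_geometry` is a simplified reading that returns the first non-empty value in `geom`, `bbox`, `centroid` order, which is not exactly how `_check_for_a_geom` treats string values.

```python
import json


def flatten_repeating_subfields(dataset_dict, schema):
    """Replace repeating-subfield lists with flat `<field>__<subfield>` keys."""
    for field in schema["dataset_fields"]:
        name = field["field_name"]
        if name not in dataset_dict or "repeating_subfields" not in field:
            continue
        # Remove the nested list so it is not sent to the index as-is
        for item in dataset_dict.pop(name):
            for key, value in item.items():
                if isinstance(value, dict):
                    continue  # nested dicts are skipped, as in the diff
                new_key = f"{name}__{key}"
                if dataset_dict.get(new_key):
                    # Values from later items are appended to the same key
                    dataset_dict[new_key] += " " + value
                else:
                    dataset_dict[new_key] = value
    return dataset_dict


def first_geometry(spatial_items):
    """Return the first usable geometry, serializing dicts to JSON strings."""
    for item in spatial_items:
        for key in ("geom", "bbox", "centroid"):
            value = item.get(key)
            if value:
                return json.dumps(value) if isinstance(value, dict) else value
    return None


# Hypothetical schema and dataset, for illustration only
example_schema = {
    "dataset_fields": [
        {"field_name": "title"},
        {
            "field_name": "contact",
            "repeating_subfields": [
                {"field_name": "name"},
                {"field_name": "email"},
            ],
        },
    ]
}
example_dataset = {
    "title": "Test",
    "contact": [
        {"name": "Ann", "email": "ann@example.org"},
        {"name": "Bob", "email": "bob@example.org"},
    ],
}

flat = flatten_repeating_subfields(dict(example_dataset), example_schema)
print(flat)
```

Concatenating the values of all items into one space-separated string keeps the index document flat, at the cost of losing which value belonged to which item.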
18 changes: 14 additions & 4 deletions ckanext/dcat/processors.py
@@ -33,12 +33,15 @@

 class RDFProcessor(object):

-    def __init__(self, profiles=None, compatibility_mode=False):
+    def __init__(self, profiles=None, dataset_type='dataset', compatibility_mode=False):
         '''
         Creates a parser or serializer instance

         You can optionally pass a list of profiles to be used.

+        A scheming dataset type can be provided, in which case the scheming schema
+        will be loaded by the base profile so it can be used by other profiles.
+
         In compatibility mode, some fields are modified to maintain
         compatibility with previous versions of the ckanext-dcat parsers
         (eg adding the `dcat_` prefix or storing comma separated lists instead
@@ -56,6 +59,8 @@ def __init__(self, profiles=None, compatibility_mode=False):
             raise RDFProfileException(
                 'No suitable RDF profiles could be loaded')

+        self.dataset_type = dataset_type
+
         if not compatibility_mode:
             compatibility_mode = p.toolkit.asbool(
                 config.get(COMPAT_MODE_CONFIG_OPTION, False))
@@ -177,11 +182,16 @@ def datasets(self):
         for dataset_ref in self._datasets():
             dataset_dict = {}
             for profile_class in self._profiles:
-                profile = profile_class(self.g, self.compatibility_mode)
+                profile = profile_class(
+                    self.g,
+                    dataset_type=self.dataset_type,
+                    compatibility_mode=self.compatibility_mode
+                )
                 profile.parse_dataset(dataset_dict, dataset_ref)

             yield dataset_dict

+
 class RDFSerializer(RDFProcessor):
     '''
     A CKAN to RDF serializer based on rdflib
@@ -245,7 +255,7 @@ def graph_from_dataset(self, dataset_dict):
         dataset_ref = URIRef(dataset_uri(dataset_dict))

         for profile_class in self._profiles:
-            profile = profile_class(self.g, self.compatibility_mode)
+            profile = profile_class(self.g, compatibility_mode=self.compatibility_mode)
             profile.graph_from_dataset(dataset_dict, dataset_ref)

         return dataset_ref
@@ -263,7 +273,7 @@ def graph_from_catalog(self, catalog_dict=None):
         catalog_ref = URIRef(catalog_uri())

         for profile_class in self._profiles:
-            profile = profile_class(self.g, self.compatibility_mode)
+            profile = profile_class(self.g, compatibility_mode=self.compatibility_mode)
             profile.graph_from_catalog(catalog_dict, catalog_ref)

         return catalog_ref
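
Review note on the `processors.py` changes: every `profile_class(...)` call moves from positional to keyword arguments. A likely reason is that the profile constructor gains a `dataset_type` parameter ahead of `compatibility_mode`. That signature is not part of this diff, so it is an assumption here. The sketch below uses hypothetical stand-in classes (`SketchProfile`, `SketchProcessor`) to show how an old-style positional call would silently bind the flag to the wrong parameter.

```python
class SketchProfile:
    """Hypothetical stand-in for an RDF profile class (signature assumed)."""

    def __init__(self, graph, dataset_type="dataset", compatibility_mode=False):
        self.g = graph
        self.dataset_type = dataset_type
        self.compatibility_mode = compatibility_mode


class SketchProcessor:
    """Hypothetical stand-in for RDFProcessor, reduced to profile creation."""

    def __init__(self, profiles, dataset_type="dataset", compatibility_mode=False):
        self._profiles = profiles
        self.dataset_type = dataset_type
        self.compatibility_mode = compatibility_mode
        self.g = object()  # placeholder for an rdflib graph

    def build_profiles(self):
        # Keyword arguments keep each value bound to the intended parameter
        return [
            profile_class(
                self.g,
                dataset_type=self.dataset_type,
                compatibility_mode=self.compatibility_mode,
            )
            for profile_class in self._profiles
        ]


processor = SketchProcessor([SketchProfile], compatibility_mode=True)
profile = processor.build_profiles()[0]

# Old-style positional call: the flag lands in dataset_type instead
misbound = SketchProfile(processor.g, processor.compatibility_mode)

print(profile.dataset_type, profile.compatibility_mode)
print(misbound.dataset_type, misbound.compatibility_mode)
```

Under that assumed signature, third-party profiles or callers that still pass `compatibility_mode` positionally would need the same keyword change.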
1 change: 1 addition & 0 deletions ckanext/dcat/profiles/__init__.py
@@ -20,4 +20,5 @@

 from .euro_dcat_ap import EuropeanDCATAPProfile
 from .euro_dcat_ap_2 import EuropeanDCATAP2Profile
+from .euro_dcat_ap_scheming import EuropeanDCATAPSchemingProfile
 from .schemaorg import SchemaOrgProfile