Distribution's format type in DCAT XML #17
Comments
Mapping the formats provided in the remote DCAT document to the ones that CKAN previews understand is out of the scope of this extension for now; it is up to extensions to define the mapping. I've created #18 to add this feature.

In this particular case the remote file is defining them "properly" as mime types, which is what CKAN core should eventually support (see ckan/ckan#1350 and ckan/ckan#1336). In the meantime, it is up to particular extensions to implement the mapping. You can do this by extending the DCAT XML harvester and overriding modify_package_dict:

    # ckanext.myext.harvesters.py
    from ckanext.dcat.harvesters import DCATXMLHarvester


    class MyDCATHarvester(DCATXMLHarvester):

        def modify_package_dict(self, package_dict, dcat_dict, harvest_object):
            # See https://github.com/ckan/ckan/blob/master/ckan/config/resource_formats.json
            mapping = {
                'application/xml': 'xml',
                'application/gzip': 'gz',
                # ...
            }
            # Replace any mime type we know about with the short format name
            for resource in package_dict.get('resources', []):
                if resource.get('format') in mapping:
                    resource['format'] = mapping[resource['format']]
            return package_dict

To register your own DCAT harvester, add it as a normal plugin to the entry_points:

    entry_points=\
    """
    [ckan.plugins]
    # Add plugins here, eg
    my_dcat_harvester=ckanext.myext.harvesters:MyDCATHarvester
    """,

Then enable it in your ini file by adding my_dcat_harvester to ckan.plugins.

Hope this helps
The RDF harvester ( |
Hello again... Now that it's all working great, we see that there are a lot of Distribution formats like these: application/rdf+xml, application/rss+xml, application/gzip.
When the datasets are harvested, the resources get exactly these formats in CKAN. So, when we try to preview or search the resources, the format does not look "good".
Would it be possible to add some kind of mapping for these formats? Or maybe ignore the "application/" and the "+xml" when inserting them?
Thanks.
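
The fallback suggested in the question (dropping the "application/" prefix and the "+xml" suffix) could be sketched roughly as follows. This is only an illustration, not part of ckanext-dcat: the function name normalize_format and the MAPPING table are hypothetical, and the short names it produces may not all match the formats that CKAN previews recognise.

```python
# Hypothetical helper, not part of ckanext-dcat or CKAN.
# Explicit overrides take priority over the generic stripping rule.
MAPPING = {
    'application/xml': 'xml',
    'application/gzip': 'gz',
}


def normalize_format(value):
    """Turn a mime type such as 'application/rdf+xml' into a short name."""
    # Leave empty values and values that are not mime types untouched
    if not value or '/' not in value:
        return value
    # Drop parameters such as '; charset=utf-8' and normalise case
    mime_type = value.split(';', 1)[0].strip().lower()
    if mime_type in MAPPING:
        return MAPPING[mime_type]
    # Drop the top-level type ('application/', 'text/', ...)
    subtype = mime_type.split('/', 1)[1]
    # Drop a structured syntax suffix such as '+xml' or '+json'
    return subtype.split('+', 1)[0]


if __name__ == '__main__':
    for fmt in ('application/rdf+xml', 'application/rss+xml', 'application/gzip'):
        print(fmt, '->', normalize_format(fmt))
```

Inside a custom harvester, the loop in modify_package_dict could then assign resource['format'] = normalize_format(resource.get('format')) instead of looking the value up in a fixed dictionary. An explicit mapping is still safer where the stripped subtype is not the name you want.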