Distribution's format type in DCAT XML #17

Closed
montxo5 opened this issue Jun 26, 2014 · 2 comments

Comments

montxo5 commented Jun 26, 2014

Hello again... Now that it's all working, we see that many Distribution formats look like this: application/rdf+xml, application/rss+xml or application/gzip.

When the datasets are harvested, the resource formats are stored as-is in CKAN, so when we try to preview or search the resources, the format does not look "good".

Would it be possible to add some kind of mapping for these formats? Or maybe strip the "application/" prefix and the "+xml" suffix when inserting them?
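The "strip the prefix and suffix" idea from the question can be sketched as below. This is a minimal illustration, not part of ckanext-dcat; the function name `strip_mime_type` is hypothetical. It is a lossy heuristic: it turns application/rdf+xml into "rdf", but a vendor type such as application/vnd.ms-excel would become "vnd.ms-excel", which is one reason an explicit mapping is safer.

```python
def strip_mime_type(mime_type: str) -> str:
    """Hypothetical helper: turn e.g. 'application/rdf+xml' into 'rdf'.

    A lossy heuristic, shown only to illustrate the idea in the question.
    """
    # Drop any parameters such as '; charset=utf-8' and normalise case
    value = mime_type.split(";", 1)[0].strip().lower()
    # Keep only the subtype after the slash, e.g. 'rdf+xml'
    _, _, subtype = value.partition("/")
    # If there was no slash, fall back to the whole value
    subtype = subtype or value
    # Remove a structured-syntax suffix such as '+xml' or '+json'
    base, _, _ = subtype.partition("+")
    return base


for m in ["application/rdf+xml", "application/rss+xml", "application/gzip"]:
    print(m, "->", strip_mime_type(m))
```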

Thanks.

amercader (Member) commented

Mapping the formats provided in the remote DCAT document to the ones that CKAN previews understand is out of the scope of this extension for now, and it is up to other extensions to do it. I've created #18 to add this feature.

In this particular case the remote file is defining them "properly" as mime types, which is what CKAN core should eventually support (see ckan/ckan#1350 and ckan/ckan#1336).

In the meantime, it is up to individual extensions to implement the mapping. You can do this by extending the DCATXMLHarvester class and implementing a modify_package_dict method in your own extension. Something like this should get you started:

# ckanext/myext/harvesters.py

from ckanext.dcat.harvesters import DCATXMLHarvester


class MyDCATHarvester(DCATXMLHarvester):

    def modify_package_dict(self, package_dict, dcat_dict, harvest_object):
        # Map MIME types from the remote DCAT document to CKAN format labels.
        # See https://github.com/ckan/ckan/blob/master/ckan/config/resource_formats.json
        mapping = {
            'application/xml': 'xml',
            'application/gzip': 'gz',
            # ...
        }

        # Replace each resource format that has a known mapping;
        # formats without an entry are left unchanged
        for resource in package_dict.get('resources', []):
            if resource.get('format') in mapping:
                resource['format'] = mapping[resource['format']]

        return package_dict
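The body of modify_package_dict only manipulates plain dicts, so the mapping logic can be pulled into a helper and tried out without a CKAN install. The sketch below assumes a hypothetical helper `map_resource_formats` and a hypothetical `FORMAT_MAPPING` constant; unlike the snippet above, it also lowercases the value and drops parameters such as "; charset=utf-8" before the lookup, since MIME types are case-insensitive.

```python
from typing import Any, Dict

# Hypothetical mapping; extend it with the entries your portal needs
FORMAT_MAPPING: Dict[str, str] = {
    "application/xml": "xml",
    "application/gzip": "gz",
}


def map_resource_formats(package_dict: Dict[str, Any],
                         mapping: Dict[str, str]) -> Dict[str, Any]:
    """Replace MIME-type formats in package_dict['resources'] using mapping."""
    for resource in package_dict.get("resources", []):
        fmt = resource.get("format")
        if not fmt:
            continue
        # MIME types are case-insensitive and may carry parameters
        key = fmt.split(";", 1)[0].strip().lower()
        if key in mapping:
            resource["format"] = mapping[key]
    return package_dict


package = {"resources": [{"format": "application/gzip"},
                         {"format": "Application/XML; charset=utf-8"},
                         {"format": "text/csv"}]}
map_resource_formats(package, FORMAT_MAPPING)
print([r["format"] for r in package["resources"]])  # ['gz', 'xml', 'text/csv']
```

Inside the harvester, modify_package_dict could then simply return map_resource_formats(package_dict, FORMAT_MAPPING).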

To register your own DCAT harvester, add it as a normal plugin to the entry_points in your extension's setup.py file:

    entry_points="""
        [ckan.plugins]
        # Add plugins here, e.g.
        my_dcat_harvester=ckanext.myext.harvesters:MyDCATHarvester
    """,

Then enable it in your CKAN ini file:

ckan.plugins = ... my_dcat_harvester

Hope this helps

amercader (Member) commented

The RDF harvester (dcat_rdf_harvester) handles this, setting resource['format'] and resource['mimetype'] correctly from the DCAT fields. Please use this harvester instead of the deprecated XML-based one (https://github.com/ckan/ckanext-dcat#xml-dcat-harvester-deprecated).
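Roughly, the idea of filling both fields is to keep the full media type in resource['mimetype'] and put a short, human-friendly label in resource['format']. The sketch below only illustrates that idea under stated assumptions; it is not the dcat_rdf_harvester implementation, and both `set_format_and_mimetype` and the `labels` dictionary are hypothetical.

```python
from typing import Dict, Optional


def set_format_and_mimetype(resource: Dict[str, str],
                            media_type: Optional[str],
                            labels: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: store the raw media type and a short label.

    Illustrative only: this is not the dcat_rdf_harvester implementation.
    """
    if not media_type:
        # Nothing to record; leave the resource untouched
        return resource
    # Keep the full media type so no information is lost
    resource["mimetype"] = media_type
    # Use a short label when one is known, otherwise fall back to the media type
    key = media_type.split(";", 1)[0].strip().lower()
    resource["format"] = labels.get(key, media_type)
    return resource


labels = {"application/rdf+xml": "RDF", "application/gzip": "GZ"}  # hypothetical labels
print(set_format_and_mimetype({}, "application/rdf+xml", labels))
# {'mimetype': 'application/rdf+xml', 'format': 'RDF'}
```

Keeping the original media type alongside the label means previews and search can rely on the short format, while nothing from the remote DCAT document is discarded.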
