Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fetch remote organization via action api #121

Merged
merged 7 commits into from Jan 15, 2015
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
22 changes: 21 additions & 1 deletion ckanext/harvest/harvesters/ckanharvester.py
Expand Up @@ -21,10 +21,14 @@ class CKANHarvester(HarvesterBase):
config = None

api_version = 2
action_api_version = 3

def _get_rest_api_offset(self):
return '/api/%d/rest' % self.api_version

def _get_action_api_offset(self):
return '/api/%d/action' % self.action_api_version

def _get_search_api_offset(self):
return '/api/%d/search' % self.api_version

Expand All @@ -48,6 +52,15 @@ def _get_group(self, base_url, group_name):
except Exception, e:
raise e

def _get_organization(self, base_url, org_name):
url = base_url + self._get_action_api_offset() + '/organization_show?id=' + org_name
try:
content = self._get_content(url)
content_dict = json.loads(content)
return content_dict['result']
except Exception, e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@metaodi this should be avoided, as it will catch Exceptions that are not related (eg if the json parsing failed). I guess that if we are targeting an old CKAN instance we will get a 404 if the action is not available, so we should only catch that (probably a urllib2 execption)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the reasoning, but here it really doesn't matter why it fails. The implementation is analog to the implementation of _get_group, but of course it could be changed in both places. If - as you wrote - the parsing of the JSON failed, it should fail. Maybe it's better to remove the try/catch entirely, and delegate the exception handling to the caller.

In this particular case the caller catches all exception in an except block, as any exception simply means, that the remote organization could not be fetched.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are also lots of programming errors that you can make, that cause different exceptions, which are nothing to do with the server's response. Not swallowing those exceptions helps fix the code. Hence why we have a CKAN policy of making the effort to identify the correct exception in except clauses. It's also mentioned in PEP8:

When catching exceptions, mention specific exceptions whenever possible instead of using a bare except: clause.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't noticed the line on 353. I still think that catch all exceptions are bad (even if wrote the _get_group one a long time ago 🙊 ) but I agree that we can let this one go in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then again @davidread is right and we shouldn't do it, so let's fix _get_group and _get_organization to be more explicit :)

raise e

def _set_config(self,config_str):
if config_str:
self.config = json.loads(config_str)
Expand Down Expand Up @@ -324,7 +337,14 @@ def import_stage(self,harvest_object):
log.info('Organization %s is not available' % remote_org)
if remote_orgs == 'create':
try:
org = self._get_group(harvest_object.source.url, remote_org)
try:
org = self._get_organization(harvest_object.source.url, remote_org)
except:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@metaodi this is bad, see comment above

# fallback if remote CKAN exposes organizations as groups
# this especially targets older versions of CKAN
log.debug('_get_organization() failed, now trying _get_group()')
org = self._get_group(harvest_object.source.url, remote_org)

for key in ['packages', 'created', 'users', 'groups', 'tags', 'extras', 'display_name', 'type']:
org.pop(key, None)
get_action('organization_create')(context, org)
Expand Down