
ckanext-datajson silently fails when it encounters an expired SSL certificate #1765

Closed
adborden opened this issue Jun 18, 2020 · 1 comment
Labels
bug Software defect or bug component/catalog Related to catalog component playbooks/roles

Comments

@adborden
Contributor

The healthdata.gov data.json harvest source shows 0 errors.

data.json: https://healthdata.gov/data.json

openssl log shows the certificate expired.

2020-06-17 23:39:05,356 DEBUG [ckanext.harvest.queue] Received harvest job id: fb7c1a84-ba62-4242-9396-b834c259ccb4

Traceback (most recent call last):
  File "/usr/bin/ckan", line 45, in <module>
    load_entry_point('PasteScript', 'console_scripts', 'paster')()
  File "/usr/lib/ckan/lib/python2.7/site-packages/paste/script/command.py", line 104, in run
    invoke(command, command_name, options, args[1:])
  File "/usr/lib/ckan/lib/python2.7/site-packages/paste/script/command.py", line 143, in invoke
    exit_code = runner.run(args)
  File "/usr/lib/ckan/lib/python2.7/site-packages/paste/script/command.py", line 238, in run
    result = self.command()
  File "/usr/lib/ckan-new/src/ckanext-harvest/ckanext/harvest/commands/harvester.py", line 135, in command
    gather_callback(consumer, method, header, body)
  File "/usr/lib/ckan-new/src/ckanext-harvest/ckanext/harvest/queue.py", line 231, in gather_callback
    harvest_object_ids = harvester.gather_stage(job)
  File "/usr/lib/ckan-new/src/ckanext-datajson/ckanext/datajson/harvester_base.py", line 126, in gather_stage
    source_datasets, catalog_values = self.load_remote_catalog(harvest_job)
  File "/usr/lib/ckan-new/src/ckanext-datajson/ckanext/datajson/harvester_datajson.py", line 35, in load_remote_catalog
    datasets = json.loads(lstrip_bom(urllib2.urlopen(req).read()))
  File "/usr/local/lib/python2.7.10/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/lib/python2.7.10/lib/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/usr/local/lib/python2.7.10/lib/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/usr/local/lib/python2.7.10/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python2.7.10/lib/python2.7/urllib2.py", line 1240, in https_open
    context=self._context)
  File "/usr/local/lib/python2.7.10/lib/python2.7/urllib2.py", line 1197, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)>

How to reproduce

  1. Log in to catalog.data.gov as admin.
  2. Re-harvest the Healthdata.gov data.json source.
  3. Wait for completion.

Expected behavior

Harvest job displays an error that it could not be harvested because the certificate is invalid.

Actual behavior

Job completes successfully with no updates.
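
One way to get the expected behavior is to catch the `URLError` around the fetch and record it against the job instead of letting it pass unreported. The following is only a sketch, not the ckanext-datajson code: it uses Python 3's `urllib.request` rather than the `urllib2` module in the traceback, and `fetch_catalog_bytes`, `describe_fetch_failure`, the `errors` list, and the `opener` parameter are hypothetical names introduced for illustration. In the real harvester, the message would presumably be saved as a gather error on the harvest job.

```python
import ssl
import urllib.error
import urllib.request


def describe_fetch_failure(exc):
    """Turn a URLError into a message suitable for a harvest error report."""
    reason = getattr(exc, "reason", None)
    # A failed certificate check surfaces as URLError wrapping an ssl.SSLError.
    if isinstance(reason, ssl.SSLError):
        return "SSL certificate problem: %s" % reason
    return "Could not fetch catalog: %s" % (reason if reason is not None else exc)


def fetch_catalog_bytes(url, errors, opener=urllib.request.urlopen):
    """Return the raw response body, or None after recording an error.

    `errors` is a plain list standing in for the job's error log (assumption).
    `opener` is injectable so the function can be exercised without a network.
    """
    try:
        response = opener(url)
        return response.read()
    except urllib.error.URLError as exc:
        errors.append(describe_fetch_failure(exc))
        return None
```

With this shape, a caller can check whether `errors` is non-empty after the fetch and mark the job as failed, rather than reporting a successful run with no updates.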

@adborden adborden added bug Software defect or bug component/catalog Related to catalog component playbooks/roles labels Jun 18, 2020
@avdata99
Contributor

More info about SSL here.

The current load_remote_catalog function should be cleaned up so that it fetches the data.json source step by step, checking for errors at each stage.

I started working with this function in the past.
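
A step-by-step version of the post-fetch half of that function might look roughly like the sketch below. It is written for Python 3 and is not the existing `load_remote_catalog`: `CatalogLoadError`, `parse_catalog`, and the stage names are hypothetical, and the assumption that a catalog is either a bare list of datasets or an object with a `dataset` list is mine, not something stated in this issue.

```python
import codecs
import json


class CatalogLoadError(Exception):
    """Hypothetical error type that records which stage failed."""

    def __init__(self, stage, detail):
        super().__init__("%s: %s" % (stage, detail))
        self.stage = stage


def parse_catalog(raw_bytes):
    """Decode and parse a data.json payload, checking each step separately."""
    if not raw_bytes:
        raise CatalogLoadError("fetch", "empty response body")

    # Step 1: drop a UTF-8 byte order mark if present, then decode.
    if raw_bytes.startswith(codecs.BOM_UTF8):
        raw_bytes = raw_bytes[len(codecs.BOM_UTF8):]
    try:
        text = raw_bytes.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise CatalogLoadError("decode", str(exc))

    # Step 2: parse the JSON text (JSONDecodeError is a ValueError subclass).
    try:
        catalog = json.loads(text)
    except ValueError as exc:
        raise CatalogLoadError("parse", str(exc))

    # Step 3: locate the dataset list (assumed shapes, see the note above).
    if isinstance(catalog, list):
        return catalog
    if isinstance(catalog, dict) and isinstance(catalog.get("dataset"), list):
        return catalog["dataset"]
    raise CatalogLoadError("validate", "no dataset list found")
```

Because every failure carries a stage name, the harvester could report "parse" and "validate" problems separately from network or certificate errors, instead of a single opaque exception from a one-line fetch-and-parse call.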

This issue was closed.