Bug adding dataset to group #5015

Open
Aaron-M opened this issue Oct 10, 2019 · 14 comments

@Aaron-M Aaron-M commented Oct 10, 2019

Note: Are you submitting a security related issue that could be a potential vulnerability? Please send it to security@ckan.org instead.

CKAN Version if known (or site URL)

2.7.2

Please describe the expected behaviour

You add a dataset to a group, and that dataset is then listed under that group's datasets.

Please describe the actual behaviour

When creating a dataset (manually or via the API) and adding it to a group, the dataset gets created and its Groups tab shows it as being part of that group, BUT when you go to the group, that dataset is not listed.

We have identified that a special character in one of our authors' names appears to be the problem. The problem surname is "Schönberger". The problem also occurs if special characters appear in the description of the dataset or of a resource.

If you then manage (edit) the dataset and change the author's ö to a plain o, i.e. Schonberger, then the dataset shows under the group.
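
Since the trigger appears to be a non-ASCII character in a metadata field, a quick way to see which fields are affected is to scan the dataset dict for such characters. The sketch below is only an illustration: `non_ascii_fields` and the sample dict are hypothetical, and it checks top-level string fields only (not resources).

```python
def non_ascii_fields(dataset):
    """Return {field: [chars]} for top-level string fields holding non-ASCII characters."""
    found = {}
    for key, value in dataset.items():
        if isinstance(value, str):
            # Collect each distinct character outside the 7-bit ASCII range.
            chars = sorted({ch for ch in value if ord(ch) > 127})
            if chars:
                found[key] = chars
    return found


# Hypothetical dataset dict, using the author name from this report.
dataset = {
    "title": "Test Ines name",
    "author": "Schönberger, Ines",
    "notes": "Plain description",
}
print(non_ascii_fields(dataset))  # → {'author': ['ö']}
```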

What steps can be taken to reproduce the issue?

See above. I was going to try to recreate it on demo.ckan.org, but there is a problem registering (on submit it says bad captcha, but the registration form doesn't contain captcha info).

@wardi wardi self-assigned this Oct 10, 2019

Member

@sivang sivang commented Oct 10, 2019

A general note: it sounds like this issue has something to do with SOLR indexing, possibly due to the i18n character (some versions require extra config for that, IIRC).

Contributor

@wardi wardi commented Oct 10, 2019

If this is related to a special character in an author's name I wonder if you're running into an encoding error when the dataset is being indexed? Do you have a traceback in your logs? That would give us more information to work from.

Please also try to reproduce the error without your extension and on the latest 2.7 patch release and let us know what happens.

Contributor

@Conzar Conzar commented Oct 14, 2019

We updated to ckan version 2.7.6 on our test system and were able to replicate the error.

This is the error in the apache logs:

[Tue Oct 15 08:08:59.212727 2019] [wsgi:error] [pid 26115:tid 139634450482944] 2019-10-15 08:08:59,212 ERROR [ckan.lib.search] 'ascii' codec can't decode byte 0xe2 in position 5200: ordinal not in range(128)
[Tue Oct 15 08:08:59.212761 2019] [wsgi:error] [pid 26115:tid 139634450482944] Traceback (most recent call last):
[Tue Oct 15 08:08:59.212765 2019] [wsgi:error] [pid 26115:tid 139634450482944]   File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/__init__.py", line 97, in dispatch_by_operation
[Tue Oct 15 08:08:59.212769 2019] [wsgi:error] [pid 26115:tid 139634450482944]     index.update_dict(entity)
[Tue Oct 15 08:08:59.212784 2019] [wsgi:error] [pid 26115:tid 139634450482944]   File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/index.py", line 101, in update_dict
[Tue Oct 15 08:08:59.212788 2019] [wsgi:error] [pid 26115:tid 139634450482944]     self.index_package(pkg_dict, defer_commit)
[Tue Oct 15 08:08:59.212792 2019] [wsgi:error] [pid 26115:tid 139634450482944]   File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/index.py", line 295, in index_package
[Tue Oct 15 08:08:59.212795 2019] [wsgi:error] [pid 26115:tid 139634450482944]     conn.add(docs=[pkg_dict], commit=commit)
[Tue Oct 15 08:08:59.212798 2019] [wsgi:error] [pid 26115:tid 139634450482944]   File "/usr/lib/ckan/default/lib/python2.7/site-packages/pysolr.py", line 891, in add
[Tue Oct 15 08:08:59.212802 2019] [wsgi:error] [pid 26115:tid 139634450482944]     overwrite=overwrite, handler=handler)
[Tue Oct 15 08:08:59.212805 2019] [wsgi:error] [pid 26115:tid 139634450482944]   File "/usr/lib/ckan/default/lib/python2.7/site-packages/pysolr.py", line 478, in _update
[Tue Oct 15 08:08:59.212809 2019] [wsgi:error] [pid 26115:tid 139634450482944]     return self._send_request('post', path, message, {'Content-type': 'text/xml; charset=utf-8'})
[Tue Oct 15 08:08:59.212812 2019] [wsgi:error] [pid 26115:tid 139634450482944]   File "/usr/lib/ckan/default/lib/python2.7/site-packages/pysolr.py", line 366, in _send_request
[Tue Oct 15 08:08:59.212816 2019] [wsgi:error] [pid 26115:tid 139634450482944]     timeout=self.timeout)
[Tue Oct 15 08:08:59.212819 2019] [wsgi:error] [pid 26115:tid 139634450482944]   File "/usr/lib/ckan/default/lib/python2.7/site-packages/requests/sessions.py", line 340, in post
[Tue Oct 15 08:08:59.212823 2019] [wsgi:error] [pid 26115:tid 139634450482944]     return self.request('POST', url, data=data, **kwargs)
[Tue Oct 15 08:08:59.212826 2019] [wsgi:error] [pid 26115:tid 139634450482944]   File "/usr/lib/ckan/default/lib/python2.7/site-packages/requests/sessions.py", line 279, in request
[Tue Oct 15 08:08:59.212830 2019] [wsgi:error] [pid 26115:tid 139634450482944]     resp = self.send(prep, stream=stream, timeout=timeout, verify=verify, cert=cert, proxies=proxies)
[Tue Oct 15 08:08:59.212833 2019] [wsgi:error] [pid 26115:tid 139634450482944]   File "/usr/lib/ckan/default/lib/python2.7/site-packages/requests/sessions.py", line 374, in send
[Tue Oct 15 08:08:59.212837 2019] [wsgi:error] [pid 26115:tid 139634450482944]     r = adapter.send(request, **kwargs)
[Tue Oct 15 08:08:59.212840 2019] [wsgi:error] [pid 26115:tid 139634450482944]   File "/usr/lib/ckan/default/lib/python2.7/site-packages/requests/adapters.py", line 174, in send
[Tue Oct 15 08:08:59.212843 2019] [wsgi:error] [pid 26115:tid 139634450482944]     timeout=timeout
[Tue Oct 15 08:08:59.212847 2019] [wsgi:error] [pid 26115:tid 139634450482944]   File "/usr/lib/ckan/default/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 422, in urlopen
[Tue Oct 15 08:08:59.212850 2019] [wsgi:error] [pid 26115:tid 139634450482944]     body=body, headers=headers)
[Tue Oct 15 08:08:59.212854 2019] [wsgi:error] [pid 26115:tid 139634450482944]   File "/usr/lib/ckan/default/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 274, in _make_request
[Tue Oct 15 08:08:59.212858 2019] [wsgi:error] [pid 26115:tid 139634450482944]     conn.request(method, url, **httplib_request_kw)
[Tue Oct 15 08:08:59.212861 2019] [wsgi:error] [pid 26115:tid 139634450482944]   File "/usr/lib/python2.7/httplib.py", line 1057, in request
[Tue Oct 15 08:08:59.212864 2019] [wsgi:error] [pid 26115:tid 139634450482944]     self._send_request(method, url, body, headers)
[Tue Oct 15 08:08:59.212867 2019] [wsgi:error] [pid 26115:tid 139634450482944]   File "/usr/lib/python2.7/httplib.py", line 1097, in _send_request
[Tue Oct 15 08:08:59.212871 2019] [wsgi:error] [pid 26115:tid 139634450482944]     self.endheaders(body)
[Tue Oct 15 08:08:59.212874 2019] [wsgi:error] [pid 26115:tid 139634450482944]   File "/usr/lib/python2.7/httplib.py", line 1053, in endheaders
[Tue Oct 15 08:08:59.212881 2019] [wsgi:error] [pid 26115:tid 139634450482944]     self._send_output(message_body)
[Tue Oct 15 08:08:59.212885 2019] [wsgi:error] [pid 26115:tid 139634450482944]   File "/usr/lib/python2.7/httplib.py", line 895, in _send_output
[Tue Oct 15 08:08:59.212888 2019] [wsgi:error] [pid 26115:tid 139634450482944]     msg += message_body
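
A note on reading this traceback: it ends at `msg += message_body` in Python 2.7's `httplib`. In Python 2, concatenating a `unicode` string with a byte `str` decodes the bytes as ASCII first, so a UTF-8 body containing any non-ASCII byte raises this kind of error if the header block happens to be `unicode`. Whether that is what happens here is an assumption; the log alone does not show why the headers would be unicode. The Python 3 sketch below only emulates that implicit coercion (`py2_concat`, the header text, and the XML snippet are made up for illustration). Note also that ö is `0xc3 0xb6` in UTF-8, so the `0xe2` byte in the log belongs to some other character; U+2019 is used here simply because its encoding starts with `0xe2`.

```python
def py2_concat(msg, body):
    """Emulate Python 2 `unicode + str`: the byte string is decoded as ASCII first."""
    return msg + body.decode("ascii")


# Hypothetical header block held as text (the Python 2 `unicode` case).
headers = "POST /solr/update HTTP/1.1\r\n\r\n"

# Hypothetical request body, already encoded to UTF-8 bytes.
# U+2019 (right single quotation mark) encodes to 0xe2 0x80 0x99.
body = '<field name="notes">it\u2019s</field>'.encode("utf-8")

try:
    py2_concat(headers, body)
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```

If this is indeed the mechanism, the dataset text itself is fine; the failure would come from mixing text and byte strings while the HTTP request to Solr is assembled.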

Contributor

@Conzar Conzar commented Oct 14, 2019

I have configured a system that doesn't use the ckanext-en-lrnz extension. It also displays the same error. I uploaded a scrubbed production.ini. Hopefully that will help debugging.

Contributor

@Conzar Conzar commented Oct 14, 2019

I have removed the ckanext-lcrnz extension as well as the ckanext-en-lrnz extension. This current build is much closer to a stock installation but still has the same ASCII error. The new scrubbed production.ini.

Contributor

@Conzar Conzar commented Oct 15, 2019

I can confirm this is still an issue with ckan version 2.8.3. I have tried the same user name 'Schönberger'.

Contributor

@Conzar Conzar commented Oct 21, 2019

I thought maybe the database locales could be a problem. Can someone have a look to see if any adjustments need to be made regarding the encoding/locale?

psql -l
                                             List of databases
       Name        |    Owner     | Encoding |   Collate   |    Ctype    |        Access privileges         
-------------------+--------------+----------+-------------+-------------+----------------------------------
 ckan_default      | ckan_default | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =T/ckan_default                 +
                   |              |          |             |             | ckan_default=CTc/ckan_default
 datastore_default | ckan_default | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =T/ckan_default                 +
                   |              |          |             |             | ckan_default=CTc/ckan_default   +
                   |              |          |             |             | datastore_default=c/ckan_default

Contributor

@mutantsan mutantsan commented Nov 5, 2019

Unfortunately, I can't reproduce your problem on 2.7.6, 2.8.3, or the master branch. I also tried it with your extension and everything still works well.

> I can confirm this is still an issue with ckan version 2.8.3. I have tried the same user name 'Schönberger'.

Please clarify: did you somehow create a user with the username Schönberger, or are you just providing an author name that contains this substring?

Author

@Aaron-M Aaron-M commented Nov 5, 2019

Hi @mutantsan, I have done a test on 2.7.6 with AUTHOR "Schönberger, Ines" and I can confirm it does not work. If you can get the demo.ckan.org site fixed (as per #5022) I can create a dataset there which you can see and check the behaviour. At the moment nobody can register to use the demo site due to a missing captcha setting. Below are some screenshots confirming the behaviour from our 2.7.6 test environment:

image

The dataset "Test Ines name" has been assigned to group "TestGroup" - as can be seen below
image

Going to the Group, only one dataset is shown, and not dataset "Test Ines name"
image

Interestingly, I noticed this time that the dataset is also not showing up in its assigned Organisation (labelled Collection in our instance).
image

Now if I go back and edit the dataset, changing the author name to "Schonberger, Ines", the dataset appears under its Organisation (aka Collection)
image

and under the Group
image

FYI, this is our install setup:
{"ckan_version": "2.7.6", "site_url": "https://test-ckan.landcareresearch.co.nz", "site_description": "A shared environment for managing Landcare Research Data.", "site_title": "Landcare Research Test CKAN Repository", "error_emails_to": "datamanager@landcareresearch.co.nz", "locale_default": "en_LRNZ", "extensions": ["recline_grid_view", "recline_map_view", "recline_graph_view", "datastore", "resource_proxy", "datapusher", "webpage_view", "geo_view", "geojson_view", "wmts_view", "pdf_view", "repeating", "ldap", "lcrnz", "spatial_metadata", "spatial_query", "recline_view", "image_view", "text_view", "zip_view", "harvest", "ckan_harvester", "dcat", "dcat_rdf_harvester", "dcat_json_harvester", "dcat_json_interface", "structured_data", "officedocs_view"]}

Contributor

@mutantsan mutantsan commented Nov 7, 2019

Can you please provide me the versions of requests and pysolr libs that you are using? And by the way, which version of SOLR?
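
One way to collect the library versions is to ask the package metadata directly. The sketch below is a suggestion rather than anything CKAN-specific: `installed_versions` is a hypothetical helper, and it relies on `importlib.metadata`, which needs Python 3.8 or newer. On the Python 2.7 virtualenv shown in the traceback, running `pip show requests pysolr` inside that virtualenv reports the same information.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


print(installed_versions(["requests", "pysolr"]))
```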

Contributor

@Conzar Conzar commented Nov 8, 2019

Greetings, I am in the process of updating the ckan puppet module in order to isolate this problem. From my testing today, I was unable to reproduce the issue with a 'trimmed' version of our ckan installation. I will update this issue when I can reproduce in a more isolated environment.

Contributor

@Conzar Conzar commented Nov 15, 2019

I have made more progress on this issue. The ckanext-geoview extension seems to be causing the error. I will prepare a vagrant configuration that will demonstrate the issue next week. Note that I have been able to demonstrate the problem on a fairly stock install of ckan (no New Zealand Language or Landcare Research extensions). The only extension installed is ckanext-geoview. I haven't had time to dig into ckanext-geoview to see where it might be failing.

Contributor

@Conzar Conzar commented Nov 17, 2019

I have updated the vagrant directory in the puppet-ckan repository. To verify the bug exists, please do a git pull on the master branch. Then do the following:

cd vagrant
scripts/run.bash

You will need vagrant installed.

Wait for the run.bash script to finish. Then do the following:

vagrant ssh
sudo ckan_create_admin.bash -u admin -e admin@local.com
(enter the password)

Then open a browser on your host computer and go to
http://192.168.33.10/

Log in with your admin credentials. Create an organisation called test. Add a dataset with the Schönberger author name.

The log files are in /var/log/apache2
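
The dataset-creation step can also be scripted against CKAN's action API (`package_create`), which makes the reproduction repeatable. A minimal sketch, assuming the Vagrant box address above, an organisation named `test`, and a placeholder API key; `build_package_create` is a hypothetical helper and the dataset name is arbitrary. The request is only built here, not sent.

```python
import json
import urllib.request


def build_package_create(site_url, api_key, name, owner_org, author):
    """Build (but do not send) a package_create request with a UTF-8 JSON body."""
    payload = {"name": name, "owner_org": owner_org, "author": author}
    # ensure_ascii=False keeps "ö" as real UTF-8 bytes instead of a \u00f6 escape.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        site_url.rstrip("/") + "/api/3/action/package_create",
        data=body,
        headers={
            "Content-Type": "application/json; charset=utf-8",
            "Authorization": api_key,  # placeholder; use the admin user's API key
        },
        method="POST",
    )


request = build_package_create(
    "http://192.168.33.10",   # Vagrant box address from the steps above
    "YOUR-API-KEY",           # assumption: replace with a real key
    "test-ines-name",         # arbitrary dataset name
    "test",                   # organisation created in the steps above
    "Schönberger, Ines",
)
# urllib.request.urlopen(request)  # uncomment to send to a running instance
```

After sending the request, check /var/log/apache2 for the `ckan.lib.search` error and see whether the dataset is listed under the organisation.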

Author

@Aaron-M Aaron-M commented Jan 7, 2020

Just to add that this bug also occurs when special characters are present in the description of the dataset, or of any of the resources within.

On demo.ckan.org I have created a dataset which demonstrates this problem (it has the special character in the author name and in the resource description):
https://demo.ckan.org/dataset/testing-ckan-issue-5015

However, on demo it does show up in the group, probably because geoview is not enabled on that site.
