Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datastore_active is being rewritten due to race condition #3245

Closed
ZtF opened this issue Sep 20, 2016 · 10 comments

Comments

Projects
None yet
6 participants
@ZtF
Copy link

commented Sep 20, 2016

CKAN Version if known (or site URL)

2.5.2

Please describe the expected behaviour

When file resource saved to FileStore, datapusher is notified about it, downloads the resource, process the file, create new table in DataStore and sets datastore_active flag for the particular resource.

Please describe the actual behaviour

It's working as described, though when uploading multiple resources (eg. through API), there is a race condition probably in ckan.lib.dictization.model_save.package_dict_save (or somewhere in the stack trace).
Sometimes the problem can even produce "Resource not found" in logs though it's most certainly there at the time. That's a case where datapusher tries to patch the resource and set datastore_active to True.

What steps can be taken to reproduce the issue?

In httpd conf file, the race condition can be "resolved" with forcing one process and one thread for CKAN.
Something like this in ckan httpd config file:
WSGIDaemonProcess ckan display-name=ckan processes=1 threads=1
If it's eg. 15 threads and 2 processes, the described issue appears.

@torfsen

This comment has been minimized.

Copy link
Contributor

commented Sep 20, 2016

This might be the same issue as ckan/datapusher#100.

@wardi

This comment has been minimized.

Copy link
Contributor

commented Sep 20, 2016

A nice solution would be to make package_update atomic, but that needs a new error class (and API version) for conflicts that should be retried.

A less-nice solutions is making sure datapusher only updates one resource at a time per dataset. Not a perfect solution, but will be much less likely to fail.

A quick hack would be to sneak the datastore_active change past package_patch/package_update so it doesn't update all the other dataset metadata fields, then make sure to re-index the dataset.

@amercader

This comment has been minimized.

Copy link
Member

commented Sep 20, 2016

to sneak the datastore_active change past package_patch/package_update

@wardi what do you mean by this?

@wardi

This comment has been minimized.

Copy link
Contributor

commented Sep 20, 2016

@amercader package_patch calls package_show+package_update to change the values passed. When the datastore just needs to set datastore_active=True on a resouce that's a lot of extra work and a much larger chance of conflict with action in progress.

Skipping all the validation and work done by package_update by changing the model directly would be a layering violation, but I could see the argument for it in this case, and it's unlikely to harm any real ckan instances. We are already doing something like this in our bulk actions that set the state to public/private.

@amercader

This comment has been minimized.

Copy link
Member

commented Sep 20, 2016

I see, sounds very sensible. +1

@wardi

This comment has been minimized.

Copy link
Contributor

commented Sep 22, 2016

I might not get to this very quickly, so if someone wants to submit a PR, here is the code that bulk updates the dataset private flag without going through package_update: https://github.com/ckan/ckan/blob/master/ckan/logic/action/update.py#L1138-L1149

And the place we need the resource-updating code is: https://github.com/ckan/ckan/blob/master/ckanext/datastore/logic/action.py#L147-L153

@smotornyuk

This comment has been minimized.

Copy link
Member

commented Nov 22, 2016

While i tried to reproduce issue, i wrote script that creates ~20 resource in different threads and it turned out that just one resource appears in package(as result of this issue). So i decided, that quickest fix won't work - it's solution just for datastore. And i implemented simplest(but, in other hand, there are less chances to break anything) decorator for atomic actions that uses conditional lock. How about this?

(but i don't know how to test it... creating threads inside CKAN test looks quite cumbersome...)

@wardi

This comment has been minimized.

Copy link
Contributor

commented Nov 29, 2016

@ZtF are you able to reproduce the issue with the #3331 applied?

@janaslo

This comment has been minimized.

Copy link
Contributor

commented Mar 14, 2017

Hi @wardi, #3331 solved the original issue, but we discovered that the same race condition appears when deleting a resource (and setting datastore_active flag to False). I took the simplest approach and applied the code from #3331 to this case too, see #3481.

@wardi

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2017

mitigated by #3331 and #3481

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.