Allow upload of invalid resource data #85
Comments
I was under the impression that if only asynchronous validation is used, then the desired behaviour is already the case; you can upload an invalid resource and just get a report on the failures. It's when you enable synchronous validation that uploads will be rejected.
You're right, I've missed this, I was testing v2.0.0 in synchronous mode. Wouldn't it make more sense to have the same behavior for both async and sync mode?
I don't think so. They're not just performance differences, they actually give you different feedback. Synchronous validation can actually give you an error, whereas async just gives you a report. If you have chosen not to validate resources until after upload is complete, then clearly you are okay with potentially invalid resources existing in your system. If you have chosen to take the performance hit to validate potentially large resources on the spot, then clearly you care a lot about ensuring that they're valid, and it makes sense to deny invalid uploads. It's like the difference between having metal detectors and X-raying luggage, vs just having security cameras and doing background checks. One has the capability and expectation to actually deny entry to those who fail, the other doesn't.
In our case we have users who are collaborating on the uploaded data. Some files can be incomplete (e.g. incomplete patient surveys) and the gaps will be filled by other users. I understand this can be achieved with asynchronous mode, but I think this behavior should be available in both modes.
Hello @ThrawnCA. I'm working together with @fulior on the issue. Thanks a lot for your valuable feedback and for pointing out the differences between sync and async validation regarding data file deletion. This resolves one of our use cases.
How so? Wouldn't it be more important to replicate production behaviour? With a local database, I would expect that validation would complete in seconds at most, so there shouldn't be any particular difficulties in testing it.
Why couldn't this use asynchronous validation? All you need is the report to tell you when you've successfully fixed it, right? So asynchronous validation works fine.
Currently ckanext-validation doesn’t allow uploading resources that don’t pass validation. If validation fails, the extension tries to remove the failed resource file. But there are cases where CKAN should be allowed to store invalid data. In cases like asynchronous bulk uploads or large tabular data compiled by multiple parties, the validation report is important for data curators to know what’s going on in the system.
This could be solved by adding an extra configuration option:
This would let us modify the logic in ckanext/validation/logic.py (https://github.com/frictionlessdata/ckanext-validation/blob/1073c80dace453a404df3d5f1ac1ae86a88b5029/ckanext/validation/logic.py#L665) to preserve the data file in case of a validation error, e.g.:
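The option and the code change themselves are not shown in the text above, so what follows is only a rough, self-contained sketch of the idea and not the actual patch. The option name `ckanext.validation.allow_invalid_data`, the `FakeStorage` class, and the `handle_sync_validation` helper are all hypothetical names made up for illustration; none of them come from ckanext-validation or CKAN.

```python
from dataclasses import dataclass, field

# Hypothetical option name, used only for this illustration. The option
# actually proposed in this issue is not shown in the text.
ALLOW_INVALID_KEY = "ckanext.validation.allow_invalid_data"


class ValidationError(Exception):
    """Stand-in for the error raised when synchronous validation fails."""


@dataclass
class FakeStorage:
    """Minimal in-memory stand-in for the resource file store."""
    files: dict = field(default_factory=dict)

    def delete(self, resource_id):
        self.files.pop(resource_id, None)


def as_bool(value):
    """Interpret common truthy config strings ("true", "1", "yes", "on")."""
    return str(value).strip().lower() in {"true", "1", "yes", "on"}


def handle_sync_validation(resource_id, report, config, storage):
    """Decide what happens to an uploaded file after synchronous validation.

    Returns the report when the resource is kept. Deletes the file and
    raises ValidationError when the data is invalid and the (hypothetical)
    option is not enabled.
    """
    if report["valid"]:
        return report
    if as_bool(config.get(ALLOW_INVALID_KEY, False)):
        # Keep the file so curators can fix it later; the report still
        # records the failure.
        return report
    storage.delete(resource_id)
    raise ValidationError(report)
```

With the flag set to `"true"` in the config dict, a failing report is returned and the file stays in storage; with the flag absent, the file is deleted and `ValidationError` is raised, which mirrors the current synchronous behaviour described above.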
I’ll gladly provide a PR for this if you think this is something useful. It’s a requirement of our users, so we’re currently using it in a forked version of ckanext-validation.