Action API is inconsistent #473

Merged
merged 32 commits into from Jun 13, 2013

Projects

None yet

4 participants

@domoritz

This pr is not supposed to make everything correct but to make it better that it is at the moment and refactor some code to a point from which we can do more refactoring.


What has changed so far:

  • Many actions do not produce a server error any more if limit and offset is a negative integer or no integer at all. Also validation for other parameters. Kind of breaking because instead of a 500, you get a 409.
  • ParameterError was removed in favour of ValidationError breaking
  • _get_or_bust goes through the validation code (common code paths)
  • _get_or_bust returns a list instead of a single string for a missing field (consistent with validate) breaking
  • Updated docstrings
    • Datastore docs indicate optional parameters
  • current_package_list_with_resources supports offset, page is deprecated
  • Auto complete actions get consistent parameters, q is now required in all auto complete actions breaking, all actions support limit
  • Added tests for changes
  • [package|resource]search have proper parameter validation
  • small fixes...

There are still many functions that have no proper validation. Often bad input is not tested and adding validation takes time because new validation definitions have to be written.

One thing to discuss is whether we want to have basic validation for all action functions in controllers/api instead of in every single action. This would require a parameter schema for every action function.

@domoritz domoritz was assigned Feb 26, 2013
@tobes

@domoritz I think ux label is really for front-end syuff so the api would fall outside that even if it is part of the ckan experience

domoritz added some commits Feb 27, 2013
@domoritz domoritz [#473] get_or_bust uses schema validation so that we use common code …
…paths and similar validation procedures
70ce14f
@domoritz domoritz [#473] related_list validates whether we have either an id or a name 0a6411e
@domoritz domoritz [#473] datastore_create should rely on a valid context
Otherwise errors occur when doing a deep copy of the context in validate
c229bc4
@domoritz domoritz [#473] indicate optional parameters in datastore API c490c17
@domoritz domoritz [#473] Test that wrong record type raises Integrity Error 19b5388
@domoritz domoritz [#473] Return IntegrityError as json from API e6fc5d9
@domoritz domoritz [#473] related_list should return Query if neither an id nor a name a…
…re provided
c977f39
@domoritz domoritz [#473] Validate parameters for current_package_list_with_resources e028159
@domoritz domoritz [#473] Add offset to current_package_list_with_resources and mark pag…
…e deprecated
4fc9644
@domoritz domoritz [#473] Unify validation for autocompletion a0d84a1
@domoritz domoritz [#473] Refactor get_or_bust, we do not have to give validate a fake c…
…ontext
433e77d
@domoritz domoritz [#473] Validate package_search parameters against a schema bd96b7e
@domoritz domoritz [#473] Validate parameters for resource_search 0acfba7
@domoritz domoritz [#473] Fix issue with empty schema in context of package_search f0cf9ca
@domoritz domoritz [#473] Add missing rtype to status_show 2939873
@domoritz domoritz [#473] Validate dashboard_activity_list, dashboard_activity_list_html…
…, {group, dataset}_followee_list,
e9512c4
@domoritz domoritz Merge branch 'master' into 473-consistent-api-validation
Conflicts:
	ckan/tests/logic/test_action.py
be5f37f
@domoritz domoritz Merge branch 'master' into 473-consistent-api-validation d6cd920
@domoritz domoritz [#473] Validate package_search input with schema and use the validate…
…d data_dict
a62367d
@domoritz domoritz [#473] Fix package_search validation.
The problem was that `facet.field` is not always a list of strings but sometimes a string of a python list of strings. Furthermore, needless parameters are reported in search and not in the validation step so that they are reported as SearchQueryErrors and not ValidationErrors.
f0abac7
@domoritz

I think this is ready for review now.

@johnglover johnglover was assigned Mar 26, 2013
@johnglover johnglover commented on an outdated diff May 29, 2013
ckan/logic/__init__.py
if isinstance(keys, basestring):
keys = [keys]
- for key in keys:
- try:
- value = data_dict[key]
- values.append(value)
- except KeyError:
- errors[key] = _('Missing value')
+
+ import ckan.logic.schema as schema_module
@johnglover
johnglover May 29, 2013

'schema_module' seems like an unusual way to import this, 'schema' can be used here as it is not defined anywhere else.

@johnglover johnglover and 1 other commented on an outdated diff May 29, 2013
ckan/logic/__init__.py
if isinstance(keys, basestring):
keys = [keys]
- for key in keys:
- try:
- value = data_dict[key]
- values.append(value)
- except KeyError:
- errors[key] = _('Missing value')
+
+ import ckan.logic.schema as schema_module
+ schema = schema_module._create_schema_for_required_keys(keys)
@johnglover
johnglover May 29, 2013

I think that we should make create_schema_for_required_keys a public function in the schema module if we're going to call it from here. It doesn't actually seem to be used anywhere else so it shouldn't be a problem.

@tobes
tobes May 29, 2013

yes also @domoritz never ever ever use private functions outside the file they are defined - it is evil and it will always end up breaking stuff.

@johnglover johnglover and 1 other commented on an outdated diff May 29, 2013
ckan/logic/action/get.py
- page = int(data_dict.get('page', 1))
+ schema = context.get('schema', logic.schema.default_package_list_schema())
+ data_dict, errors = _validate(data_dict, schema, context)
+ if errors:
+ raise ValidationError(errors)
+
+ limit = data_dict.get('limit')
+ offset = int(data_dict.get('offset', 0))
+
+ if not 'offset' in data_dict and 'page' in data_dict:
+ log.warning('"page" parameter is deprecated. '
+ 'Use the "offset" parameter instead')
+ page = int(data_dict['page'])
+ if page < 1:
+ raise ValidationError(_('Must be larger than 0'))
+ offset = (page - 1) * limit
@johnglover
johnglover May 29, 2013

limit may be None at this point, perhaps a default value should be specified on line 105

@domoritz
domoritz May 30, 2013

That won't work because it would break the check whether limit is None below these lines. But you are right that limit could be None, while page is defined (even though the docs say you shouldn't do that).

@johnglover johnglover commented on the diff May 29, 2013
ckan/logic/action/get.py
model = context['model']
session = context['session']
_check_access('format_autocomplete', context, data_dict)
- q = data_dict.get('q', None)
- if not q:
- return []
-
+ q = data_dict['q']
@johnglover
johnglover May 29, 2013

This does slightly change the behaviour of the autocomplete APIs but I think seems ok, I'm not sure why anyone would expect to be able to call them with no q parameter and get a empty list...

@johnglover johnglover and 1 other commented on an outdated diff May 29, 2013
ckan/logic/action/get.py
@@ -1237,6 +1261,20 @@ def package_search(context, data_dict):
query cannot be changed. CKAN always returns the matched datasets as
dictionary objects.
'''
+ # sometimes context['schema'] is None
+ schema = context.get('schema')
+ if not schema:
+ schema = logic.schema.default_package_search_schema()
@johnglover
johnglover May 29, 2013

These 4 lines can be replaced with schema = context.get('schema', logic.schema.default_package_search_schema()), the comment is not necessary here.

@domoritz
domoritz May 30, 2013

Thanks. Found some other places that did similar things and fixed them as well.

@domoritz
domoritz May 30, 2013

Actually, this doesn't work because context has a key schema but the value is None.

@johnglover johnglover and 1 other commented on an outdated diff May 29, 2013
ckan/logic/action/get.py
@@ -1237,6 +1261,20 @@ def package_search(context, data_dict):
query cannot be changed. CKAN always returns the matched datasets as
dictionary objects.
'''
+ # sometimes context['schema'] is None
+ schema = context.get('schema')
+ if not schema:
+ schema = logic.schema.default_package_search_schema()
+ if isinstance(data_dict.get('facet.field'), basestring):
+ data_dict['facet.field'] = ast.literal_eval(data_dict['facet.field'])
@johnglover
johnglover May 29, 2013

If we say that any string passed through the api has to be quoted with double strings we can just use json.loads here instead, which seems nicer than doing an eval.

@domoritz
domoritz May 30, 2013

I haven't had the time to look deeper into this but changing this already breaks https://github.com/okfn/ckan/blob/473-consistent-api-validation/ckan/tests/logic/test_action.py#L1287

@johnglover johnglover commented on an outdated diff May 29, 2013
ckan/logic/action/get.py
@@ -1237,6 +1261,20 @@ def package_search(context, data_dict):
query cannot be changed. CKAN always returns the matched datasets as
dictionary objects.
'''
+ # sometimes context['schema'] is None
+ schema = context.get('schema')
+ if not schema:
+ schema = logic.schema.default_package_search_schema()
+ if isinstance(data_dict.get('facet.field'), basestring):
+ data_dict['facet.field'] = ast.literal_eval(data_dict['facet.field'])
+ data_dict, errors = _validate(data_dict, schema, context)
+ # put the extras back into the data_dict so that the search can
+ # report needless parameters
@johnglover
johnglover May 29, 2013

we should discuss this further, it seems like we really should be doing validation here and not calling SearchQuery.run directly as we do in a few places

@johnglover johnglover commented on an outdated diff May 29, 2013
ckan/logic/validators.py
@@ -54,6 +54,12 @@ def int_validator(value, context):
except (AttributeError, ValueError), e:
raise Invalid(_('Invalid integer'))
+def pos_int_validator(value, context):
@johnglover
johnglover May 29, 2013

We should probably rename this to avoid later confusion as it allows 0 to pass and technically 0 is not a positive integer.

@johnglover johnglover commented on an outdated diff May 29, 2013
ckan/logic/validators.py
@@ -54,6 +54,12 @@ def int_validator(value, context):
except (AttributeError, ValueError), e:
raise Invalid(_('Invalid integer'))
+def pos_int_validator(value, context):
+ value = int_validator(value, context)
+ if value < 0:
+ raise Invalid(_('Must be positive integer'))
@johnglover
johnglover May 29, 2013

I think that we decided recently that we were not going to translate API error messages (this would also apply to 'list_of_strings below').

@johnglover johnglover and 2 others commented on an outdated diff May 29, 2013
ckan/tests/logic/test_action.py
+
+ postparams = '%s=1' % json.dumps({
+ 'limit': 1,
+ 'offset': 1})
+ res = json.loads(self.app.post(url, params=postparams).body)
+ assert res['success']
+ assert len(res['result']) == 1
+
+ postparams = '%s=1' % json.dumps({
+ 'limit': '5'})
+ res = json.loads(self.app.post(url, params=postparams).body)
+ assert res['success']
+
+ postparams = '%s=1' % json.dumps({
+ 'limit': -2})
+ res = json.loads(self.app.post(url, params=postparams, status=StatusCodes.STATUS_409_CONFLICT).body)
@johnglover
johnglover May 29, 2013

I know that there are probably PEP8 violations in this file already but it would be nice not to add any more :) Can you check the line lengths here please.

@tobes
tobes May 29, 2013

@domoritz just to say only pep8 your new lines not the whole file (unless it is like 2 lines) pep8 fixes should be done as there own pull requests to make them readable

@domoritz
domoritz May 29, 2013

@tobes Okay, will keep that in mind.

@johnglover

Github seems to lose comments when you commit which is annoying, but anyway...

Only one test breaks when this is changed to json.loads, and that is because the urlencoded string is not valid JSON. It should pass if the strings are double quoted. I don't think it's unreasonable to require JSON if doing this through a GET request.

@domoritz

@johnglover I removed the ast code and use json instead. Are there any things left that prevent the merge (apart form getting master into this branch, of course)?

@domoritz

Concerning the breaking changes: http://xkcd.com/1172/

@johnglover

@domoritz No I don't think so, I'll merge.

@johnglover

@domoritz Merged, tests pass locally but have a few failures on Travis, any ideas?
https://travis-ci.org/okfn/ckan/jobs/7978032

@domoritz

Not without further looking into the individual issues.

@johnglover

Yes, it seems to be specific to this branch though, the last time I merged it was also passing in master but failing here. Perhaps something strange is going on with this branch and Travis...

@domoritz

I restarted the build. Let's see what happens.

@johnglover

The current test failures are triggered by calling get_or_bust to get the model object from the context dict. The model is going through the validation functions and causing an exception when we try to deepcopy it in augment_data.

@domoritz is going to have a look at this.

@tobes

@johnglover when you ran the tests locally did you run them all? or just the ckan ones?

try nosetests --ckan --with-pylons=test-core.ini --nologcapture ckan ckanext I've run into this problem and it was due to the datastore tests polluting ckan

specifically

import ckanext.datastore.plugin as plugin in ckanext/datastore/tests/test_configuration.py to be honest that whole file is hopelessly broken

@johnglover

@tobes This is with just the ckan tests, but with the datastore loaded as a plugin. I think we have a fix for this, which is just to keep get_or_bust as a special case like it currently is (doesn't go through validate) but raises a proper ValidationError.

@domoritz

@tobes Can you elaborate a little bit on "hopelessly broken"?

@tobes

To be honest I'm not sure where to start.

Essentially you get the plugin and then go in and replace the methods and then test these new methods. It is so broken it makes me cry. I had looked at fixing it but the whole thing is too confused.

The plugins are a mess which I'm trying to fix in #549 but some of the stuff done to get around these problems is dubious and I think that a lot of what is being tested may not be the same as you think is being tested and more specifically often I don't think you actually are testing the plugin but instead a highly modified version of the plugin that the user will never encounter in the wild.

but things like importing a plugin in a test for example causes major issues and should never happen. in this example you permanently have the datastore plugin at least partially enabled for tests when it should not have been present.

@domoritz

Well, this file has unit tests and we had to mock functions so that we only test a single function and not the whole thing. Obviously these tests do not run in a realistic environment since we don't ever talk to the db. If you think that this is not necessary, I'd be happy to delete the whole file.

domoritz added some commits Jun 12, 2013
@domoritz

Replacing deepcopy with copy resolves all issues. @kindly has to decide whether this is okay.

@johnglover

I also think it's fine (and probably preferable) for get_or_bust to not go through the validate function but create a proper ValidationError. Using validate seems like adding unnecessary overhead to what should be a very simple function.

@kindly

Can not see obvious problem with using copy. The data is flattened and most values are strings so it should not make much difference.
I think deepcopy was used to make sure that if anyone manipulated the input data after it would not effect the contents of result. I do not see that happening though.
So should be fine.

@johnglover

Ok well if @kindly and @domoritz are both happy with this then I'll merge.

@johnglover johnglover merged commit 349e1cb into master Jun 13, 2013

1 check passed

Details default The Travis CI build passed
@johnglover johnglover deleted the 473-consistent-api-validation branch Jun 13, 2013
@domoritz

Wohooo 💃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment