Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

If ForeignKey relation has no matches, all results are returned #84

Closed
whitews opened this issue Mar 25, 2013 · 36 comments
Closed

If ForeignKey relation has no matches, all results are returned #84

whitews opened this issue Mar 25, 2013 · 36 comments

Comments

@whitews
Copy link
Contributor

whitews commented Mar 25, 2013

When filtering on a foreign key relationship, if a non-existent foreign key is specifed, the qs method in BaseFilterSet receives a ValidationError on line 259.

https://github.com/alex/django-filter/blob/master/django_filters/filterset.py#L259

This results in the queryset not getting filtered. Shouldn't one expect this case to return 0 results? The reasoning is that if no object matches the foreign key specified, then nothing in the queryset could match, so perhaps return an empty list? Maybe as simple as the following:

@property
def qs(self):
    if not hasattr(self, '_qs'):
        qs = self.queryset.all()
        for name, filter_ in self.filters.iteritems():
            if self.is_bound:
                data = self.form[name].data
            else:
                data = self.form.initial.get(name, self.form[name].field.initial)
            try:
                val = self.form.fields[name].clean(data)
            except forms.ValidationError:
                qs = []
            else:
                qs = filter_.filter(qs, val)
        if self._meta.order_by:
            try:
                value = self.form.fields[self.order_by_field].clean(self.form[self.order_by_field].data)
                if value:
                    qs = qs.order_by(value)
            except forms.ValidationError:
                pass
        self._qs = qs
    return self._qs
@apollo13
Copy link
Contributor

Hi,

could you please open an issue on github?

Regards,
Florian

@apollo13
Copy link
Contributor

Hehe, to slow I was, I am reading my emails from oldest to newest :/

@nkryptic
Copy link
Contributor

@whitews

The issue you raise is a valid concern, but returning an empty list on validation errors would mean a change to the default FilterSet behavior. Unfortunately, there is only one test which currently expects this behavior, but I have it on my list of areas to add tests for.

I'd like to hear more about your specific use case and why/how invalid values for a foreign key would be submitted, as I don't see how that would be if using the rendered form and user submitted data. But that doesn't negate the need to address the behavior when validation does fail.

Looking at other applications which provide listings that can be filtered (I looked at amazon and newegg's advanced search), if you enter a value for "min" price of 'abc', you will still get the same results as if you didn't specify anything. In those cases, the validation error is ignored and no additional filtering is done.

What you have proposed (in #85) is to always return zero results if any filter is invalid and I'm not sure that's the best approach to take. I think the behavior of skipping the filter is the appropriate "default" and feel that's justified by the behavior I've seen with other search filters, but I realize other possibilities exist. I don't mind considering the alternatives and I could also be wrong in some of my assumptions here, but I think justification of any change is in order (as I don't see this as a bug).

One possibility improvement is for the rendered form, after an invalid value is used, to reflect that fact (by resetting the value to the default or blank value), so the user can see the value submitted was ignored. Another might be providing a way for invalid values to trigger zero results, as you propose. Additionally, the form could be cleaned and errors displayed.

So, if you could describe how you're coming across this scenario, that might help in identifying what tack to take here.

Man, I need more hours in a day so I can find some time to work on this.

@whitews
Copy link
Contributor Author

whitews commented Mar 26, 2013

@nkryptic

I ran across the issue because I'm using a REST api. Imagine a Project model where a project can have multiple Site model instances (e.g. locations). There are many projects among many users and a simple project name is not unique across all the projects. Getting the primary key for a project and then filtering on all sites('site__project') for that project should return all that project's sites, which it does. However, if the project ID specified doesn't exist, I would expect the site results to be 0, else it implies the non-existent project actually exists and has all the sites returned.

I also considered not passing on the ValidationError and throwing it up the chain so the application using django-filter can decide on what to do. However, this seems far more disruptive, even though it would be arguably more useful since it contains the reason why no results were returned.

I'm not sure why Amazon or Newegg would return results for 'abc', but I'd happily give them a few letters for anything in their inventory. As a counter example, and closer to what I'm implementing is searching for a PubMed journal article like so:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=123456789

This doesn't return all the PubMed articles, and in fact throws an error.

I'd be happy to help make my proposed changes more inline with your project's goals if you can point me in the right direction.

Thanks,
Scott

@nkryptic
Copy link
Contributor

Ah, a REST api makes sense.

django-filter was designed for end-user filtering via submitting a web form. Generation of the form and appropriate fields to display is somewhat core to the current design. It's the reason the ModelChoiceFilter exists, really. But, with an API you aren't using the forms and you probably don't care about generating select widgets with the acceptable choices. You want the form because it converts the data to the correct type for filtering, but don't care if the submitted value isn't in the generated choices, right?

Wouldn't using a NumberFilter (probably even a CharFilter) accomplish what you want?

class MySiteFilterSet(FilterSet):
    project = NumberFilter(name='project__pk')

Or, if the project is a basic expectation, you could specify the project in the initial queryset:

OtherSiteFilterSet(queryset=Site.objects.filter(project__pk=project_id))

On a more general note, there seem to be two different types of "invalid" at work here. There's "invalid" data where the value isn't the correct type (alpha characters, when a number is expected - or specifying a date in an unknown format). Then there are "invalid" submissions that contain data of the correct type, but don't meet form field expectations (values aren't acceptable unless they are in the list of known choices...).

When dealing with end-users who are filtering with a web form, I think it viable to ignore the "invalid" data that can't be coerced (as I described in my previous comment with price='abc'). But, it should evident to them it is ignored - like clearing the form input - which isn't currently done). When using the web form, it isn't likely you'd see the second type of "invalid" data, unless you change the URL query string manually.

I think the option to return 0 results is also plausible, but as it changes the current behavior I don't know if it should simply be changed. Also, the current practice when using the web form isn't to call is_valid on it, so that would need to change in order to display errors for the form. Finally, it raises the issue of error rendering and customizing, which haven't needed to be addressed due to skipping filters and which are difficult to control because the form is automatically generated.

Perhaps providing an option to allow a switch from skipping filters to returning 0 results could work, but the form feedback would need to be dealt with I think.

I'm all in favor of making things better for everyone wanting to use django-filter. Maybe @tomchristie will weigh in with insight on django-rest-framework usage and what could be changed to make it better or maybe @apollo13 might have a opinion here that will help with the decision.

@whitews
Copy link
Contributor Author

whitews commented Mar 27, 2013

@nkryptic

Thanks for the detailed explanation, I didn't quite understand the context of the django-filter project. As you stated, I do have workarounds, but thought the behavior was a little odd so I figured I'd try to contribute.

I just tried playing around to see if I could separate the 2 cases you describe, hoping to find a way to tease out a ValueError for the string vs int cases and still get a ValidationError or something else for a 'that doesn't exist' case. It was a dead end. If you know of any foo I can use, I can keep trying.

The more I think about this the more I lean towards just throwing the validation error instead of passing it by. This seems to satisfy both our conditions. I am perfectly happy with getting a validation error returned, as in my case it stops any automated process from continuing to work on false information. And for an actual user submitted form, couldn't this same validation error get presented to the user so they know what happened, then they can fix their entry?

If this is agreeable, I can fix my pull request. I'd also like to hear from @tomchristie or anyone else on their thoughts.

Thanks again,
Scott

@apollo13
Copy link
Contributor

Hmm, basically I think it should always be clear what's happening, so showing all objects is fine as long as we make clear that one filter was not applied.

So my current proposal would be:

  • Don't filter using the broken parts of the data (but apply the rest)
  • Make clear that some parts of the filter weren't applied (error message etc…)

@whitews
Copy link
Contributor Author

whitews commented Mar 28, 2013

@apollo13

But, in my case, the supplied data isn't really broken per se, the value is the right 'kind' it's just that the foreign key value asked for doesn't exist. I guess I don't really understand the usefulness of returning results for a broken filter in any case, but I may be missing something. Why not just let the ValidationError get thrown and show it on the form?

This way the user knows what the problem is and can attempt to fix it. Maybe this alerts them that the date format was incorrect or something else that they did need to filter on. It seems that returning results could easily allow the error to get overlooked.

Thanks again to both of you for taking the time to respond,
Scott

@apollo13
Copy link
Contributor

I think I agree with you, I basically use filters from time to time without running through a form (and as such not having any good way to display the errors). So I support changing the default to return an empty queryset.

I have to look how django-filter generates the form and make sure that errors show up there like they should, I'll play with it tomorrow.

EDIT:// Hrmpf, this issue is ugly either way, the only thing I can come up with is that we at least should provide the users with clear feedback, I am slightly in favor of returning no results on invalid/broken data.

@apollo13
Copy link
Contributor

@nkryptic I think we should return zero objects in the case of invalid and/or broken data, since:

Errors should never pass silently.

and

In the face of ambiguity, refuse the temptation to guess.

Since we can't pass any error information in non-form solutions returning zero results would be more fitting (I don't think raising an server error is an option ;)). Any objections? We will just mention it in the release notes, I don't think that the behavioral change is to bad.

@whitews Please make sure to update the release notes and in the qs method do the same dance for the order_by stuff.

EDIT:// Another argument for the zero result approach would also be that actually filtering for the broken data would return no results either…

@nkryptic
Copy link
Contributor

@apollo13 and @whitews

Pull-request #86 is a bit of an experiment to make better use of the form and offer a solution to this issue.

I think changing the default is fine, so no objection there. I do however think that in certain use cases a more forgiving tack may be desired and wonder if we'll just have other issues posted here if we don't provide the option. One of those cases is likely sales, where it is generally considered better to show any results - even if not what the user intended - as it visibility generally leads to higher sales.

So, in the pull-request, I added a flag for strict which defaults to True and will generate 0 results on error. If not strict, then the filtering will be performed, but only on the valid data. Errors will be present in the form in either case.

I think we'll have to see whether anyone else desires the behavior for the form to simply 'reset' invalid fields, instead of displaying errors.

disclosure: I have only used filtersets with forms submitted by users.

@apollo13
Copy link
Contributor

I think the strict flag would be fine (although a bit of bloat, but I think both usecases are valid), I have some issues with the implementation of the rest though (see my comments on the other ticket).

@nkryptic
Copy link
Contributor

An alternative approach, which may be a better choice for supporting API style usage vs end-user form display usage, could be to offer two different classes. An API style filterset might use different filters as the defaults for model fields (e.g. a PositiveNumberFilter for a ForeignKey field).

By using the different filters, the primary benefit would be that the filter/field wouldn't populate choices for a select field and verify the value is in it (potentially expensive), but simply validate it is a number. It would be used in filtering the queryset, but it would mean no results if the value wasn't in the database and the query would be much lighter than populating all choices for the FK (sql + model instantiations).

All the extra work being done for form display purposes doesn't make a lot of sense when there's no user form interaction. It could be the API style FilterSet is the foundation and a display style FilterSet would inherit it (adding in display-specific functionality)

@apollo13
Copy link
Contributor

That would also work, but I am not sure if it's worth it. Let's summarize:

Strict mode:

  • The initial list can be empty cause some required fields are not set via initial.
  • If you submit the initial form you will get an error and see why it doesn't work.
  • If you submit a form with errors you will get zero results and an error message.
  • If you use stuff via an API you get zero results, the API could also serialize form.errors to give a bit of information

Lenient mode:

  • The initial list applies filters where possible, so you will usually show more fields than the filters would require.
  • If you submit the form you'd still get an form error asking you to fill in some stuff but you'd get the initial list back (eg with some of the valid filters applied)
  • If you use an API you might get more results than wanted and the API should provide form.errors to tell you something went wrong.

All in all I think strict/lenient parsing would be enough to cover all needs, aside from "heavier" validation in the sense of db queries. Can we go ahead with the "lighter" approach instead of adding more classes? I prefer the strict/lenient variant cause one might use the lenient variant even in the context of webforms like we already found out (eg amazon).

@apollo13
Copy link
Contributor

All the extra work being done for form display purposes doesn't make a lot of sense when there's no user form interaction.

I think APIs can and should provide validation and clear errors in many cases (even if it's just to make live easier for developers ;))

@nkryptic
Copy link
Contributor

@apollo13

Here are a few proof-of-concepts: https://gist.github.com/nkryptic/5275724

They address behavior you summarized above and also enforce ordering whenever a non-none queryset is returned. We'll need updates for the docs and tests to cover the enforced ordering and the strict/forgiving + bound/unbound details.

@apollo13
Copy link
Contributor

I like approach two, the only thing I'd change would be using a sentinel of object() -- a tuple seems somewhat confusing and a field might return a tuple (where it never can return the newly generated object). I suggest defaulting strict to False to keep current behavior.

Oh and another thing: The nested ifs in line 29ff and (probably) 38ff can be rewritten to just one:

if value is not sentinel: # Valid data
    qs = filter_.filter(qs, value)
else if self.strict: # invalid and in strict mode -> bail out
    self._qs = self.queryset.none()
    return self._qs

The issue with the nested if is that it's not immediately clear if the nesting can be removed by anding them together (you can't in that case).

@nkryptic
Copy link
Contributor

I updated PR #86 with approach number two and integrated your suggestions. I use a class-level attribute to set the default for strict (currently False), but then allow conditional assignment in the __init__ method. Does that work for you or is this a case where you'd want it the default set via FilterSet.Meta class?

@whitews
Copy link
Contributor Author

whitews commented Mar 30, 2013

@nkryptic Just looking through the Django REST Framework code, it looks like a Meta option may be preferable:

https://github.com/tomchristie/django-rest-framework/blob/master/rest_framework/filters.py#L46

I'll post to the DRF google group and see if I can get @tomchristie to weigh in.

@apollo13
Copy link
Contributor

@nkryptic Given that strict is something people usually don't change per FilterSet invocation I think Meta would suit better (especially if that's easier for REST Framework).

@apollo13
Copy link
Contributor

Btw, since strict=False by default, why did ordering change in your PR change?

@nkryptic
Copy link
Contributor

@apollo13 - the code in master doesn't order the queryset when unbound.

The PR introduces a few changes to the qs method:

  1. Always ensure order_by is called when the filtered result is not a none queryset. will use bound data or initial data as appropriate and then fall back to the first option from the order_by field's choices.
  2. Add strict mode which returns a none queryset on any validation error
  3. Always ensure the form has been validated and it's errors are populated when bound.

Items 1 and 3 introduce changes of the current default behavior - so are basically backwards-incompatible. If we keep them as is, we could also change strict to default to True, since we've already broken backwards compatibility.

@whitews
Copy link
Contributor Author

whitews commented Jul 15, 2013

@nkryptic @apollo13

Sorry to be a pest, but is there anyway this can get implemented? Anything I can do to help or test?

@philipdouglas
Copy link

I haven't figured out why, but I have found that explicitly defining the fields (rather than letting them be automatically created from a fields list), causes those fields to return empty instead of the full queryset when filtered by a non-existent key.

@whitews
Copy link
Contributor Author

whitews commented Jul 16, 2013

@FreakyDug

Hmm, I'm not seeing this. Perhaps it is because I'm explicitly specifying the filter as a ModelMultipleChoiceFilter. Can you post a gist of your solution? Thanks!

@philipdouglas
Copy link

So my FilterSet looks like this:

class GroupProjectRoleFilter(django_filters.FilterSet):
    project = django_filters.CharFilter(name='project__identifier')
    # For some reason explictly defining these fields fixes the default to all bug :S
    role = django_filters.CharFilter()
    group = django_filters.CharFilter()

    class Meta:
        model = GroupProjectRole
        fields = ['group', 'project', 'role']

Like this, I can filter any of the fields using an non existent key and it works. If I comment out any one of them, the exact same filter request will return everything.

@apollo13
Copy link
Contributor

@whitews Certainly; you could check the current state of the PR and see if it addresses everything etc, and if yes tell me and I'll have a look over it.

@whitews
Copy link
Contributor Author

whitews commented Jul 26, 2013

@FreakyDug I see, that works because you are using a CharFilter. By default, ForeignKey relationships use a ModelChoiceFilter which will raise a ValidationError on non-existent or non-integer values. Thanks, I may end up using this in the short term.

@apollo13 Ok, in the process of testing now.

@whitews
Copy link
Contributor Author

whitews commented Jul 26, 2013

@apollo13

For cases where the filter uses only a single field the PR works, however the problem persists when using a combination of filter fields where one of them is a ModelChoiceField. In this case, a non-existent FK will not exclude the results.

@apollo13
Copy link
Contributor

@whitews Patches welcome :)

@philipdouglas
Copy link

@whitews Ah, that explains it. Thanks.

@whitews
Copy link
Contributor Author

whitews commented Jul 29, 2013

@apollo13 I was wrong, pull request #86 is working correctly. I had installed the pull request but had not switched to the issue-84 branch. I just discovered this after debugging to investigate. My apologies.

Do you think this can be accepted? If so, what about defaulting to True as @nkryptic mentioned above?

@apollo13
Copy link
Contributor

I'll try to give it a last review this weekend

@apollo13
Copy link
Contributor

apollo13 commented Aug 4, 2013

@whitews So, finally managed to review this; I'd ask two things of you if you have a bit of time:

  • Docs
  • Try to get rid of the SENTINEL stuff and replace it with try/except, I have a feeling that this might look nicer.
  • One or two tests demonstrating the use of strict=True vs strict=False
  • I think we can switch strict to True by default and mention it in the release notes.

@whitews
Copy link
Contributor Author

whitews commented Aug 6, 2013

@apollo13 @nkryptic

Ok, working on it now. I'll comment in #86

@apollo13
Copy link
Contributor

Fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants