Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort instances query stably, if applicable to underlying DB implementation #114

Merged

Conversation

boydgreenfield
Copy link
Contributor

@boydgreenfield boydgreenfield commented Jan 31, 2017

I'm not 100% thrilled by the naming/implementation here (happy to take suggestions @lyschoening), but the current Flask-Potion code can lead to odd bugs for databases that do not guarantee the return order for queries without an ORDER BY clause (e.g., Postgres).

Specifically, we were seeing issues where we'd get duplicate records in the paginated results. The underlying issue is (depending on one's perspective) either a bug or (under/mis)documentation in Flask-SQLAlchemy's paginate method on the query.

This PR fixes this for the RelationalManager by ordering by the primary key if no other order by clause is provided.

Note that it may be necessary to also add the primary key to the order clause in the general _query_order_by call for Postgres. I'll try to investigate this more tomorrow Pacific time.

@coveralls
Copy link

coveralls commented Jan 31, 2017

Coverage Status

Coverage decreased (-0.01%) to 89.082% when pulling b1b7630 on boydgreenfield:fix_unstable_pagination into d106bd0 on biosustain:master.

@lyschoening
Copy link
Contributor

lyschoening commented Jan 31, 2017

Thanks for bringing this up. This is a valid problem and it needs a fix. There are multiple issues to consider though;

Using SQLAlchemy, the user can already specify an order with order_by as a mapper arg. Perhaps it is possible to read that in if no other value is specified?

Inspection of the model with every request is not the right approach. The sort clause should be cached in SQLAlchemyManager._init_model(). In fact the primary key is already cached as SQLAlchemyManager.id_column.

We could also reuse Meta.sort_attribute, which is already used when sorting by a relationship (e.g. sort books by authors, authors being sorted by last name). Possibly cache this as SQLAlchemyManager.sort_column as is done with id_attribute and id_column. Of course if sort_attribute is not unique then this would fail to resolve the issue.

Finally, there is the underlying issue:

Specifically, we were seeing issues where we'd get duplicate records in the paginated results.

Your fix would address pagination without sort, but it does not yet address that issue when a user retrieves items in sorted order. The user might sort by a column that is not unique and then the sorted items would not be stable.

A solution could be to have Manager._query_order_by(query, sort) always order by the primary key (in last position) unless it is already specified in sort. Then there is also no need for Manager. _query_order_by_stable(), as that would be equivalent to calling Manager._query_order_by() with an empty sort.

@boydgreenfield
Copy link
Contributor Author

@lyschoening Agreed that this is a partial/WIP fix, and I like your proposed approach (I just wanted to get something up before leaving for the day yesterday).

Let me take a look at: (1) modifying the _query_order_by clause to handle being called without a sort; and (2) getting the primary key into all order by clauses to address the issue of sorting by a non-unique column.

Will push here later today hopefully (jamming a few things in before heading off on vacation tomorrow unfortunately).

@coveralls
Copy link

coveralls commented Jan 31, 2017

Coverage Status

Coverage decreased (-0.02%) to 89.077% when pulling f585e2f on boydgreenfield:fix_unstable_pagination into d106bd0 on biosustain:master.

@coveralls
Copy link

coveralls commented Jan 31, 2017

Coverage Status

Coverage decreased (-0.02%) to 89.073% when pulling 21c52db on boydgreenfield:fix_unstable_pagination into d106bd0 on biosustain:master.

@boydgreenfield
Copy link
Contributor Author

@lyschoening I've updated my PR per your suggestions. LMK if this works for you (I can take a look at adding a test case for sort stability, but may not be able to get to that before taking off for a ~week).

@lyschoening
Copy link
Contributor

@boydgreenfield I'll have a look at it soon. It's evening here, so I won't be able to get back to you before you leave. Have a good vacation!

@lyschoening lyschoening merged commit 9e1e4f1 into biosustain:master Feb 24, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants