Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Alphabetical ordering is not consistent between DBs #6104

Closed
ErisDS opened this issue Nov 20, 2015 · 10 comments
Closed

Bug: Alphabetical ordering is not consistent between DBs #6104

ErisDS opened this issue Nov 20, 2015 · 10 comments
Labels
affects:api Affects the Ghost API bug [triage] something behaving unexpectedly server / core Issues relating to the server or core of Ghost

Comments

@ErisDS
Copy link
Member

ErisDS commented Nov 20, 2015

Take the following strings: 'photo', 'Audio', 'Video'.
Note that the first one is not capitalised, but the other two are.

When ordering by names in SQL this can be done in either a case sensitive or case insensitive way:

Case sensitive order is:

  • Audio
  • Video
  • photo

This is used by MySQL and Postgres by default

Case insensitive order is:

  • Audio
  • photo
  • Video

This is used by Sqlite3 by default

I believe that the latter is the desired output in most cases, and that case sensitive ordering is not the expected outcome. I don't have much to base this on, other than common sense - the case of a letter doesn't really change what letter that is, so if I'm searching for something in an alphabetical list, I look for A and a in the same place.

This is slightly problematic for two reasons:

  1. sqlite3 (e.g. the development environment) does the right thing - or at least does what I was expecting, whereas MySQL & pg do not. It is likely other developers using the API will fall into the same trap I did - thinking it does what you expect in one env and not realising the result will be different in other databases. They really should be consistent 😊
  2. To force MySQL/PG to do an insensitive order by, we need to pass order by LOWER(name) ASC. We could do this by default for all orders, or figure out which orders are strings and need to be made case insensitive. However, if we do this by default, there's no obvious/well understood way to turn it back off.

Solutions

I don't have a clear idea what the best way to solve this is, but I have a couple of ideas:

Idea 1: make sqlite3 match mysql & postgres

By default force 'COLLATE NOCASE' for SQLite so that all 3 dbs behave the same and use case sensitive ordering. All resources have a slug as well as a name, which could be used for ordering by something case insensitive. Not ideal, but simple-ish.

Long term, we can support adding lower and upper functions to our order syntax.

Idea 2: sensible defaults are best

By default, force case insensitive matches because this is what makes sense to anyone who isn't a computer scientist 😁. If almost noone wants case sensitive alphabetical order, then it makes more sense to make it harder to do that, rather than making the sensible thing hard.

So, if all of our queries go off with LOWER automatically wrapped around attributes which are known to be test/strings, we need to create a way for a user to say 'don't do case insensitive matches'.

A couple of different ideas for that are:

  • Use a MySQL function

Assuming we add support for all the mysql functions that could be used in an order statement, and that specifying one would override our default application of LOWER() then we can provide for using a mysql function which doesn't change the case.

Unfortunately there is no MySQL function which means 'original case', but using a no-op function like TRIM() would work because it would get used instead of LOWER() and would not change the string. That's a bit unintuitive though, so we could also just make up our own MySQL function NOCASE() or whatever we want to call it, and use that to mean "don't apply lower but don't apply this either cos it's nonsense :)".

  • Use a character or symbol to specify 'case sensitive'

Something like order="^name ASC" or order="!name ASC" could be used to indicate that case sensitivity should be used.

  • Use the case of the property

order="name ASC" - does a case insensitive order using the LOWER() function
order="NAME ASC" - does a case insensitive order using the UPPER() function
order="Name ASC" - does a case sensitive order without applying a function

  • Do something else?

I'd love to hear more ideas

Conclusion

I am very much leaning towards 'sensible defaults are best' I think doing a lower case ordering is the only thing that actually makes any sense, which means figuring out how to turn it off is all we're left with.

I'm genuinely not sure what the best mechanism for specifying turning it off would be. Using the case of the property seems elegant but a bit side-effect-y.

@ErisDS ErisDS added bug [triage] something behaving unexpectedly affects:api Affects the Ghost API labels Nov 20, 2015
@kevinansfield
Copy link
Contributor

May not help but at least for postgres it sounds like the default collation is OS-dependent? http://dba.stackexchange.com/questions/106964/why-is-my-postgresql-order-by-case-insensitive

@kevinansfield
Copy link
Contributor

One workaround I've seen before is to add a virtual column to the select (e.g. SELECT posts.*, LOWER(name) AS ci_name) so that it's always possible to order by name or ci_name.

I've seen the ci_x columns implemented both through the select as above and as an actual column that's populated either through database triggers or by the applications data layer.

One downside is that postgres from what I can tell will still use the OS' collation so case-sensitive ordering will still be ABab on OSX, AaBb on other systems.

@hwhelchel
Copy link

Postgres has an extension called citext which provides a case insensitive string type.

Benefits:

  • Don't have to remember to call LOWER()
  • Easier to create indexes for. One doesn't have to create functional indexes using LOWER()
  • Allows a primary key to be case insensitive

Limitations:

  • Not as efficient as text but more efficient than LOWER()
  • citext behavior is a function of the LC_CTYPE of the database. From postgres docs:

If you have data in different languages stored in your database, users of one language may find their query results are not as expected if the collation is for another language.

http://www.postgresql.org/docs/9.4/static/citext.html

I have less experience with MySQL but it appears that nonbinary strings are case insensitive and binary strings are case sensitive. So ensuring the nonbinary string types are used would give the desired behavior.

http://dev.mysql.com/doc/refman/5.0/en/case-sensitivity.html

@ErisDS
Copy link
Member Author

ErisDS commented Nov 30, 2015

One workaround I've seen before is to add a virtual column to the select (e.g. SELECT posts.*, LOWER(name) AS ci_name) so that it's always possible to order by name or ci_name.

This is an interesting idea, Bookshelf has some support for calculated columns. The trick would be only adding the column when an order by was requested (else you need duplicates for all alpha columns). Should be doable.

@hwhelchel Do you have any idea how to turn on that postgres extension within our bookshelf/knex/pg layers? Converting collations in MySQL is something we need to do in order to fix #5945, however this is a pretty hefty task. Your suggestions sound like the 'right' way to do things but are possibly going to be hard to manage because Ghost is distributed amongst so many different environments, databases and database versions.

@hwhelchel
Copy link

@ErisDS this would be done at the knex layer using the raw api

knex/knex#208

Hmm yes postgres citext would works in 9.1+. I'll have to take a look at the Ghost db layer to understand more what the challenges are.

You might also find this issue on knex with multiple databases relevant: knex/knex#778

@ErisDS ErisDS added the later [triage] Things we intend to work but are not immediate priority label Sep 20, 2016
@ErisDS
Copy link
Member Author

ErisDS commented Sep 20, 2016

I'm closing all OAuth and most API issues temporarily with the later label.

RE: OAuth, for the next 2-3 months we'll be implementing an official Ghost OAuth login system, providing global access to all Ghost blogs with a single login. We'll be opening issues around this system soon, and I don't want to cause confusion with OAuth for the API.

JSON API Overhaul & OAuth access are currently scheduled next on the roadmap

@ErisDS ErisDS closed this as completed Sep 20, 2016
@naz naz reopened this Jan 15, 2019
@naz naz added server / core Issues relating to the server or core of Ghost and removed later [triage] Things we intend to work but are not immediate priority labels Jan 15, 2019
@naz
Copy link
Member

naz commented Jan 15, 2019

As discussed in #10371 (comment), we will try the approach with overriding standard Bookshelf/knex order options with our own orderRaw which will use order by lower(FIELD) DIRECTION clause to ensure ordering consistency between dbs.

@naz naz self-assigned this Jan 15, 2019
naz added a commit that referenced this issue Jan 15, 2019
refs #10354

- Added ordering on input serialization layer for posts, pages, tags, and authors
- At the moment ordering is dependent on DB engine which will be resolved with #6104
@naz naz removed their assignment Apr 19, 2019
@stale
Copy link

stale bot commented Jul 18, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale [triage] Issues that were closed to to lack of traction label Jul 18, 2019
@stale stale bot closed this as completed Jul 25, 2019
naz added a commit that referenced this issue Aug 31, 2020
refs #12167

- The reason for failing tests was  #6104 . As we already test for ordering elsewhere, didn't include more "if/else" logic in tests just for the sake of passing. Refactored them to be order-independent instead
@naz naz added pinned [triage] Ignored by stalebot and removed stale [triage] Issues that were closed to to lack of traction labels Sep 20, 2020
@naz
Copy link
Member

naz commented Sep 20, 2020

This problem still exists and should not have been closed by stale-bot.

@naz naz reopened this Sep 20, 2020
@ErisDS ErisDS removed the pinned [triage] Ignored by stalebot label Jan 27, 2021
@ErisDS
Copy link
Member Author

ErisDS commented Jan 27, 2021

Closing as this hasn't really impacted anyone.

@ErisDS ErisDS closed this as completed Jan 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects:api Affects the Ghost API bug [triage] something behaving unexpectedly server / core Issues relating to the server or core of Ghost
Projects
None yet
Development

No branches or pull requests

4 participants