Problems with sort order (UTF8 locales don't work) #216

Open
carlosfrodriguez opened this Issue Oct 6, 2014 · 30 comments

@carlosfrodriguez

Hello,

pg_config points to a locale directory that does not exist:

LOCALEDIR = /Applications/Postgres.app/Contents/Versions/9.3/share/locale

Can you include it in future versions?

((enjoy))
cr

@jakob

jakob Oct 8, 2014

Member

Could you elaborate what exactly you expect to find in this directory? Do you want localized server messages?

@carlosfrodriguez

carlosfrodriguez Oct 8, 2014

Hi @jakob

I'm sorry, I'm not sure. I'm having problems with locales in PostgreSQL and I saw that this directory is missing, so I thought that could be the reason.

((enjoy))
cr

@jakob

jakob Oct 9, 2014

Member

I think this directory is only used for localized server messages.

Locales are provided by the system. PostgreSQL usually chooses your default locale when initialising the database cluster. You can choose a different locale by manually calling initdb:

initdb -D DATA_DIRECTORY -EUTF-8 --locale=XX_XX.UTF-8

@jakob

jakob Oct 9, 2014

Member

You can type locale -a to list available locales on your system.

@jakob jakob closed this Oct 9, 2014

@tbussmann

tbussmann Oct 9, 2014

I'm having problems with locales in PostgreSQL

What kind of problems do you have? If it's about strange ordering, like I encountered, you will have to accept that collations do not work on any BSD-ish OS (incl. OS X) with a UTF-8 encoding. You will never get the sorting right, unfortunately.

@jakob

jakob Oct 9, 2014

Member

@tbussmann I just tried this, and I am shocked, but it seems you are right. German strings are sorted incorrectly with a locale setting of de_DE.UTF-8. The only way I could get PostgreSQL to sort the strings correctly was to create a database that uses latin1 as its character encoding:

CREATE DATABASE latin1_test WITH ENCODING 'LATIN1' LC_COLLATE='de_DE.ISO8859-1' LC_CTYPE='de_DE.ISO8859-1' TEMPLATE=template0;

Then the strings are sorted correctly. However, with latin1 it is no longer possible to store characters outside that character set in the table... this is ridiculous. Is this behaviour documented anywhere?

@jakob jakob reopened this Oct 9, 2014

@jakob

jakob Oct 9, 2014

Member

And I also just found out that the sort command in the Terminal has the same problem. Does this mean that BSD systems don't support sorting UTF-8 strings at all?

@jakob

jakob Oct 9, 2014

Member

The solution seems to be that PostgreSQL should use ICU rather than the BSD functions for sorting. See this wiki page: https://wiki.postgresql.org/wiki/Todo:ICU

Possibly I can build PostgreSQL using a patch made by Palle Girgensohn for FreeBSD, but I'm not sure how well-supported this patch is:
http://people.freebsd.org/~girgen/postgresql-icu/README.html

@tbussmann

tbussmann Oct 9, 2014

You can see the reason for this with ls -l /usr/share/locale/de_DE.UTF-8: LC_COLLATE is only a symlink to la_LN.US-ASCII. You get the same result if you sort something in the shell, so it's an OS-specific problem, not a PG-specific one. AFAIR this affects all BSD operating systems.
Some mailing list posts by Tom Lane on the topic:
http://www.postgresql.org/message-id/16510.1263450305@sss.pgh.pa.us
http://www.postgresql.org/message-id/22721.1264203310@sss.pgh.pa.us
http://www.postgresql.org/message-id/23053.1337036410@sss.pgh.pa.us
It seems we will have to wait for http://wiki.postgresql.org/wiki/Todo:ICU , use a different OS, or use a function for sorting, which could internally use ICU from PL/Perl, the unaccent contrib module, or your own implementation.

To me this is the biggest drawback of using Postgres.app (and PostgreSQL on OS X in general), and something that should be clearly highlighted in the documentation.
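
For anyone who wants to check this on their own machine without reading ls output by eye, here is a minimal Python sketch. The function names (collate_target, falls_back_to_ascii) are made up for illustration; the only things taken from the thread are the /usr/share/locale layout and the la_LN.US-ASCII symlink target. On systems with a different layout it simply reports None.

```python
import os

def collate_target(locale_name, root="/usr/share/locale"):
    """Return the symlink target of <root>/<locale_name>/LC_COLLATE,
    or None if that path is not a symlink (or does not exist)."""
    path = os.path.join(root, locale_name, "LC_COLLATE")
    if os.path.islink(path):
        return os.readlink(path)
    return None

def falls_back_to_ascii(locale_name, root="/usr/share/locale"):
    """True if the locale's LC_COLLATE is a symlink into a US-ASCII locale."""
    target = collate_target(locale_name, root)
    return target is not None and "US-ASCII" in target

# Locale names taken from this thread; the output depends on the system.
for name in ("de_DE.UTF-8", "de_AT.UTF-8"):
    print(name, "->", collate_target(name))
```

If falls_back_to_ascii returns True for your locale, the libc collation for it is effectively plain ASCII ordering, which matches the behaviour described above.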

@jakob jakob changed the title from Locale dir to Problems with sort order (UTF8 locales don't work) Oct 9, 2014

@tbussmann

tbussmann Oct 9, 2014

Funny, you were quicker while I was still googling and typing :)
I didn't know that the FreeBSD patch existed. It would be great if you could give it a try!

@jakob

jakob Oct 9, 2014

Member

Thanks for your explanations. I'll try building with the patch. Don't have time right now, but I'll post here when I know more.

@macarthy

macarthy Oct 29, 2014

So if I want to add the Thai (TH) locale to do Thai string sorting in Postgres.app, I need to get the TH locale installed on OS X so that it is listed by locale -a.

Is that correct?

@jakob

jakob Oct 29, 2014

Member

@macarthy in principle, yes. You'd just need to create a new database with the collation. However, as a word of caution, UTF-8 locales seem to be fundamentally broken on OSX. Postgres uses the strcoll API, which unfortunately does not support multibyte encodings on OSX.
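
To see what the C library's collation does for a given locale outside of PostgreSQL, something like the following Python sketch can help. It relies on locale.strxfrm from the standard library, which transforms strings so that they compare according to the current LC_COLLATE setting; sort_with_locale is a hypothetical helper name. The result depends entirely on the operating system, so no expected output is shown for the UTF-8 locale.

```python
import locale

def sort_with_locale(words, locale_name):
    """Sort words using the C library's collation for locale_name.

    Returns None if the locale is not available on this system.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(words, key=locale.strxfrm)
    except locale.Error:
        return None
    finally:
        # Always restore the previous collation setting.
        locale.setlocale(locale.LC_COLLATE, previous)

words = ["a", "\u00e4", "b"]  # a, ä, b
print("code point order:", sorted(words))
print("de_DE.UTF-8 order:", sort_with_locale(words, "de_DE.UTF-8"))
```

If both lines print the same order, the system's collation for that locale is not doing anything language-specific.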

@macarthy

macarthy Oct 31, 2014

Looks like strcoll is a mess on OS X. Thanks for your work.

@jakob

jakob Nov 23, 2014

Member

I have built the 9.4 release candidate with the patch from Palle Girgensohn. I also had to bundle ICU, which is rather large, but now sorting seems to work as expected!

If you encountered issues with text sorting, please download the latest prerelease and see if it works: https://github.com/PostgresApp/PostgresApp/releases/tag/9.4rc1

I'd really appreciate feedback; if everything works I'll include the patch in the final release of Postgres.app 9.4.

@thaJeztah

thaJeztah Nov 23, 2014

Wondering; is the built-in version of PostgreSQL on OS X also patched with ICU? Perhaps a ticket should be created in the OS X issue tracker?

@jakob

jakob Nov 24, 2014

Member

After reading this blog post (http://blog.2ndquadrant.com/ware-yosemite-possible-postgresql-upgrade-issues-os-x-10-10/) I came to the conclusion that Apple doesn't care in the least about their PostgreSQL installation. Feel free to file a radar, but I don't feel like wasting my time reporting issues that will probably be ignored.

You can check if PostgreSQL is built with the ICU patch by running pg_config (look for the --with-icu flag in the configure parameters).
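
A small Python sketch of that check, assuming pg_config is on the PATH and that its --configure option prints the quoted configure parameters. The helper names has_icu_flag and installed_postgres_has_icu are hypothetical, and the check only looks for the literal --with-icu flag mentioned above.

```python
import shlex
import shutil
import subprocess

def has_icu_flag(configure_output):
    """True if the configure parameters contain the exact flag --with-icu."""
    return "--with-icu" in shlex.split(configure_output)

def installed_postgres_has_icu():
    """Ask pg_config for its configure parameters; None if pg_config is missing."""
    if shutil.which("pg_config") is None:
        return None
    result = subprocess.run(
        ["pg_config", "--configure"], capture_output=True, text=True
    )
    return has_icu_flag(result.stdout)

print(installed_postgres_has_icu())
```

Note that this inspects whichever pg_config comes first on the PATH, which may not be the one bundled with Postgres.app.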

Or you can try sorting text. With my locale (de_AT.UTF-8) I run the following query:

SELECT unnest(array['a','ä','b']) ORDER BY 1;

On a broken system, this will return

a
b
ä

With ICU it will be sorted correctly:

a
ä
b

@macarthy

macarthy Nov 24, 2014

Quick question: if I rename this build to PostgresWithICU.app and run it, will it use the databases etc. that I was previously using with Postgres.app?
Or do I need to move the data directories?

I'd like to try this out with Thai and Lao sorting.

Thanks

On 24 November 2014 at 14:09, Jakob Egger wrote:

After reading this blog post
http://blog.2ndquadrant.com/ware-yosemite-possible-postgresql-upgrade-issues-os-x-10-10/
I came to the conclusion that Apple doesn't care in the least about their
PostgreSQL installation. Feel free to file a radar, but I don't feel like
wasting my time reporting issues that will probably be ignored.

@jakob

jakob Nov 24, 2014

Member

Postgres.app always needs to be named "Postgres.app" otherwise it won't find the shared libraries. However, if you start it, it will offer to rename the old app automatically.

You should really dump & restore the database, because all the indexes will be incorrect due to the changed sort order. If you've previously been using 9.3, you have to do that anyway. You can use both versions in parallel: the new version will create a new database for 9.4 in a separate directory.

If you've already been on one of the 9.4 betas, you'll have to rename the data directory before switching Postgres.app.

@thaJeztah

thaJeztah Nov 24, 2014

After reading this blog post I came to the conclusion that Apple doesn't care in the least about their PostgreSQL installation. Feel free to file a radar, but I don't feel like wasting my time reporting issues that will probably be ignored.

Thanks for that link, that's quite nasty (even though it wasn't a "supported" approach). And agreed, activity on the developer-forums is quite low and little to no response from Apple employees.

@jakob

jakob Nov 26, 2014

Member

Unfortunately including this patch isn't as straightforward a decision as I thought. I'd like to invite anybody affected by this issue to discuss whether to include the patch in the 9.4 release in issue #233.

@macarthy any feedback yet on how the new build works with Thai sorting?

@Chocksy

Chocksy May 20, 2015

@jakob any solution to this using the stable version instead of the release candidate??

@jakob

jakob May 21, 2015

Member

I'm just preparing the 9.4.2 builds. I'll see if I can get a build with ICU working.

@Chocksy


Chocksy May 21, 2015

Thanks @jakob

@orbatec

orbatec Nov 20, 2015

I don't quite understand why the correct order should be a ä b. When I look at the codepoint values in UTF-8 for those characters, I see the values 97, 228, 98. Sorting them in ascending order results in 97, 98, 228, or a b ä.

What am I missing?

--Quote--
Or you can try sorting text. With my locale (de_AT.UTF-8) I run the following query:

SELECT unnest(array['a','ä','b']) ORDER BY 1;
On a broken system, this will return

a
b
ä
With ICU it will be sorted correctly:

a
ä
b
--End Quote--

@jakob

jakob Nov 20, 2015

Member

@orbatec What you describe is the expected behavior when using the POSIX locale. The locale defines how text should be sorted. In the Austrian locale, the correct sort order is a ä b. However, on OS X, strcoll is broken and therefore PostgreSQL ignores the locale and just sorts the strings by comparing their UTF-8 representation (i.e. it effectively always uses the POSIX locale).

Also, the codepoint 228 is just one possible representation of ä. You could also represent it as 97 776, which is an a followed by the combining character ¨. These two representations would be sorted differently in the POSIX locale, but for the Austrian locale it shouldn't matter which codepoints are chosen to represent a letter.
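
A short Python sketch of the two representations, for illustration only. Python compares strings by code point, which is used here as a stand-in for POSIX-locale ordering; the helper name codepoints is made up.

```python
import unicodedata

def codepoints(text):
    """Return the Unicode code points of text as a list of integers."""
    return [ord(ch) for ch in text]

words = ["a", "\u00e4", "b"]  # a, ä, b
print(sorted(words))          # code point order puts ä (228) after b (98)

composed = "\u00e4"     # ä as a single code point
decomposed = "a\u0308"  # a followed by the combining diaeresis
print(codepoints(composed))    # [228]
print(codepoints(decomposed))  # [97, 776]

# The two strings look the same but are not equal until normalized.
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

# In code point order they even land on opposite sides of "b".
print([codepoints(w) for w in sorted(["b", composed, decomposed])])
# [[97, 776], [98], [228]]
```

A locale-aware collation would treat both spellings of ä the same way, which is exactly what plain code point comparison cannot do.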

Please also note that there is a difference between Unicode code points and their encoding (e.g. UTF-8).
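
To make that difference concrete, a tiny Python illustration: 228 is the code point of ä, not its UTF-8 encoding. In UTF-8 the same character is stored as two bytes, while in Latin-1 it happens to be the single byte 228.

```python
ch = "\u00e4"  # ä

print(ord(ch))                     # 228, the Unicode code point
print(list(ch.encode("utf-8")))    # [195, 164], the UTF-8 bytes
print(list(ch.encode("latin-1")))  # [228], the Latin-1 byte
```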

Text handling is a very complex topic. If you are interested, I recommend spending a few days reading about Unicode, code points, characters, graphemes, encodings, normal forms and collations. It will change the way you think about text.

@orbatec

orbatec Nov 20, 2015

@jakob Thanks! Yep, I just spent an hour reading up on this and by god... why do we always make things so complex :-) Unfortunately I am working on a Mac with Postgres 9.4. My colleague has found a "patch" for 9.4 (actually rc01 or rc02) that seems to fix the behaviour. How come an rcxx fixes something that doesn't seem fixed in the final Postgres? (Sorry, when I say fix, I mean workaround for a Mac OS problem.)

@jakob

jakob Nov 20, 2015

Member

@orbatec There has been a lot of discussion about the ICU patch here on GitHub and on the PostgreSQL hackers mailing list. The conclusion was that I didn't want the default build of PostgreSQL to use an unofficial patch. Including separate builds of PostgreSQL with ICU is a lot of work, and also dangerous, since people switching between the two builds might corrupt their database.

There has been discussion on the mailing list about adding official support for ICU in 9.6; I hope this will fix the issue for good.

@tbussmann

tbussmann Jul 26, 2017

Now that PostgreSQL 10 is approaching, I'd like to get back to this topic.

In the recent development cycle, patches landed upstream that support ICU as an alternative collation provider (this is different from the FreeBSD patch discussed and experimented with previously in 9.4rc01/02). I have continuously tested Git master and beta builds with ICU support enabled on macOS. As far as I can see, Postgres.app 2.1beta1 also includes this. My tests were quite promising. Unfortunately, in the current state the ICU-provided collations cannot be used as the cluster or database default collation (LC_COLLATE). So, to make use of them, users of Postgres.app need to either attach a COLLATE clause on the fly to each comparison and ORDER BY expression or, for easier use, attach one to columns, indexes or domains. In my tests this works nicely. With the example from above:

# SELECT unnest(array['a','ä','b']) COLLATE "de-AT-u-co-phonebk-x-icu" ORDER BY 1;
┌────────┐
│ unnest │
├────────┤
│ a      │
│ ä      │
│ b      │
└────────┘

To make this clearer, I'd recommend adding a note to the documentation, and I'd consider explicitly stating the missing default collation by passing --lc-collate=C to initdb in initDatabaseSync, which is what it actually is:

$ ls -l /usr/share/locale/de_AT.UTF-8/LC_COLLATE
lrwxr-xr-x  1 root  wheel  28 27 Jan 22:48 /usr/share/locale/de_AT.UTF-8/LC_COLLATE -> ../la_LN.US-ASCII/LC_COLLATE

This would likely avoid further confusion and would provide consistent behaviour again in case a database is moved to a different system that supports UTF-8 libc collations.

@jakob

jakob Jul 26, 2017

Member

Ugh, I didn’t know that using ICU collations as default isn’t possible yet. That’s a bummer.

Using --lc-collate=C is a nice idea to get consistent behavior when migrating the db to a different system. And maybe when people look at that setting, it becomes a bit more obvious that they need to change something if they want sensible sorting.
