Some Unicode characters don't show up in list names #2120

Closed
soliloquist-tatoeba opened this issue Feb 3, 2020 · 3 comments · Fixed by #2475


soliloquist-tatoeba commented Feb 3, 2020

Some Unicode characters can't be used in list names, while others can. They all appear correctly on the edit screen, but some of them turn into question marks once the list is saved.

[Screenshot: list]

The ones that don't show up are from Unicode 6.0 and later versions. There seems to be no problem with characters from earlier versions. I noticed this when trying to create a list about basketball: the soccer ball and baseball characters show up fine, but the basketball doesn't. This is not an important issue, but I wanted to report it anyway.

@soliloquist-tatoeba soliloquist-tatoeba changed the title Some unicode characters don't show up in list names Some Unicode characters don't show up in list names Feb 3, 2020
@jiru jiru added the bug Issue that describes a problem with a feature that doesn't work as expected. label Feb 3, 2020
Contributor

AndiPersti commented Feb 4, 2020

The reason is that we use utf8 for storing the list name in the database (*), but in MariaDB this character set uses at most 3 bytes per character, i.e. we can only store code points up to U+FFFF. We would need to use utf8mb4 to store higher code points.

(*) Actually that is just an educated guess, because I can't reproduce this problem in my dev environment. My sentences_lists table still uses the latin1 character set because it was created with the outdated docs/database/tables/sentences_lists.sql. But 4d05f00 changed the character set for the name column.
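
To illustrate the 3-byte limit described above, here is a minimal Python sketch that checks how many UTF-8 bytes each of the characters from the report needs. The helper names `utf8_len` and `fits_in_3_byte_utf8` are hypothetical and are not part of the Tatoeba code base; the check simply assumes that a 3-byte utf8 column can hold exactly the code points up to U+FFFF.

```python
# Sketch only: helper names are made up for illustration.

def utf8_len(ch: str) -> int:
    """Return the number of bytes needed to encode ch as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_3_byte_utf8(text: str) -> bool:
    """True if every character is at or below U+FFFF, i.e. needs at most 3 UTF-8 bytes."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# The three characters mentioned in the report.
samples = [
    ("soccer ball", "\u26BD"),
    ("baseball", "\u26BE"),
    ("basketball", "\U0001F3C0"),
]

for name, ch in samples:
    print(
        f"{name}: U+{ord(ch):04X}, {utf8_len(ch)} bytes, "
        f"fits in 3-byte utf8: {fits_in_3_byte_utf8(ch)}"
    )
```

The soccer ball and baseball each take 3 bytes, while the basketball takes 4, which matches the behaviour described in the report if the column really is limited to 3 bytes per character.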

Contributor

ftumsh commented Jul 18, 2020

Having looked into this, I can confirm that changing sentences_lists.name to utf8mb4_general_ci fixes the problem.
How do I submit the SQL to change the column's collation?

Member

jiru commented Jul 18, 2020

@ftumsh Thanks for investigating! 😄 The best way is to create a migration if possible. First, execute cake migrations create FooBar (replace FooBar with an appropriate name) to generate the migration file. Then, edit the file to make it do what you want. Have a look at the migrations documentation.

@trang trang linked a pull request Aug 4, 2020 that will close this issue
@trang trang added this to To do in Kodoeba #1 via automation Aug 4, 2020
@trang trang closed this as completed Aug 4, 2020
Kodoeba #1 automation moved this from To do to Done Aug 4, 2020

5 participants