Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change mysql collation and charset from utf8 -> utf8mb4 #183

Merged
merged 4 commits into from
Jun 17, 2021

Conversation

Charl1996
Copy link
Contributor

@Charl1996 Charl1996 commented May 31, 2021

The test passes locally with my changes, but for some reason I'm inclined to feel that this PR still needs a migration to alter the existing MySQL columns' CHARACTER SET and COLLATION settings to utf8mb4 and utf8mb4_unicode_ci respectively...or is my hunch wrong?

See ticket

@snopoke
Copy link
Contributor

snopoke commented May 31, 2021

I think it depends what you're desired outcome is - should this only apply to new columns or also to existing.

Please update the PR with some more info about this change and what the goal is.

@Charl1996
Copy link
Contributor Author

@snopoke I forgot to link the ticket. I updated the initial comment.

The idea is essentially just to be able to save emoji's in MySQL, but because MySQL's default utf8 only allows storage of up to a maximum of 3 bytes I needed to update it to use utf8mb4 which allows storage of up to 4 bytes. I found this StackOverflow answer quite helpful.

I guess the idea is to apply it to existing columns also, since the issue came up when a partner reported it (see ticket for more info).

@Charl1996
Copy link
Contributor Author

@snopoke Ok, so I had a chance now to look at this again. Turns out my understanding of what was happening was wrong, so please ignore my previous comment. I don't need any migrations - only need to change the collation for future tables since the tables will be created as the export happens, so that's fine.

QA:

  • Tested locally by doing export
  • Unit test (which @proteusvacuum wrote, I just used it)

@proteusvacuum
Copy link
Contributor

Do you know if that collation works for all versions of MySQL? Also, we suggest mysql+pymysql://<username>:<password>@<host>/<database name>?charset=utf8 as the connection string for mysql. I'm guessing that needs to be changed in the docs?

@Charl1996
Copy link
Contributor Author

Charl1996 commented Jun 8, 2021

@proteusvacuum Yes the docs should also update, but I want to update it only after this has been merged.
Just a note: there are some additional steps a user might have to take w.r.t the MySQL configurations, as seen here. I guess it'd be good to mention this also somewhere?

utf8mb4_unicode_ci is supported from version 5.5.3 (based on this comment) which was released back around 2010 when utf8mb4 was also released.

@proteusvacuum
Copy link
Contributor

Do you know what happens if someone were to use the old mysql+pymysql://<username>:<password>@<host>/<database name>?charset=utf8 with the new collation?

@Charl1996
Copy link
Contributor Author

@proteusvacuum When I tried to use charset=utf8 it didn't work. If I remember correctly it complained about the same issue which raised the ticket to this PR.
I'm thinking in addition to updating the docs we can check the charset and maybe let the user know in the terminal if their character set is wrong (not utf8mb4)?

@proteusvacuum
Copy link
Contributor

@Charl1996 that sounds like a great idea

@Charl1996
Copy link
Contributor Author

@proteusvacuum @snopoke Any other concerns regarding this PR? (docs will update just after merge)

@snopoke
Copy link
Contributor

snopoke commented Jun 17, 2021

@proteusvacuum @snopoke Any other concerns regarding this PR? (docs will update just after merge)

Looks good to me. Since this will 'break' any existing mysql usages that use a different charset param we should also send out a notification on the forum: https://forum.dimagi.com/c/platform-announce/8

@Charl1996 Charl1996 merged commit d0d21b5 into master Jun 17, 2021
@Charl1996 Charl1996 deleted the cs/SC-1315-mysql-collation-issue branch April 5, 2022 14:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants