Description
When using gh-ost to migrate a table from latin1 to utf8mb3 (and likely other utf8 variants), the initial data copy works correctly. However, rows inserted during the migration and applied via binlog replication can cause the migration to fail if they contain characters whose byte encodings differ between MySQL latin1 and utf8mb3, for example: é à ç ñ Ä ß ø ÿ þ æ. An error beginning with ERROR Error 1366 (HY000): Incorrect string value: is thrown repeatedly.
These characters are all valid in latin1, but when gh-ost applies binary log events containing them to the ghost table without proper encoding conversion, errors can occur. It seems gh-ost is not converting these characters to the target character set when replaying events from the binlog.
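To illustrate the suspected mechanism, here is a minimal Python sketch. It is not gh-ost code and only assumes that the raw latin1 bytes from the binlog reach a utf8 column unconverted. The helper names `is_valid_utf8` and `latin1_to_utf8` are made up for this example. Python's `latin-1` codec is ISO-8859-1, which differs from MySQL's latin1 (cp1252-based) for some bytes, but as far as I know the two agree for the characters listed above.

```python
# Characters from this report; each one is a single byte >= 0x80 in latin1.
SAMPLE = "é à ç ñ Ä ß ø ÿ þ æ"


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode latin1 bytes as UTF-8 (the conversion that appears to be missing)."""
    return raw.decode("latin-1").encode("utf-8")


latin1_bytes = SAMPLE.encode("latin-1")

# The raw latin1 bytes are not valid UTF-8, which would match the 1366 error.
print(is_valid_utf8(latin1_bytes))

# After an explicit conversion the same text is valid UTF-8.
print(is_valid_utf8(latin1_to_utf8(latin1_bytes)))
```

For example, é is the single byte 0xE9 in latin1 but the two bytes 0xC3 0xA9 in UTF-8, so passing the latin1 byte through unchanged produces an invalid UTF-8 sequence.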
I have created a repo containing a reproduction: https://github.com/mattbooks/gh-ost-latin1-utf8-bug/tree/main
It includes a script that reproduces the bug locally (my reproduction was with gh-ost 1.1.7 and MySQL 8.0.42 on macOS).
I have also included example output.txt and error.txt files from my local run there. They demonstrate that data is inserted correctly when copied, but data inserted after the migration has started raises an error repeatedly until the migration ultimately fails.