Conversion of character set from latin1 to utf8mb3 does not work #1568

Open
@mattbooks

Description

When using gh-ost to migrate a table from latin1 to utf8mb3 (and likely other utf8 character sets), the initial data copy works correctly. However, new data inserted during the migration and applied via binlog replication can cause the migration to fail if it contains characters whose encodings differ between MySQL latin1 and utf8mb3, for example: é à ç ñ Ä ß ø ÿ þ æ. The error "ERROR Error 1366 (HY000): Incorrect string value:" is then thrown repeatedly.

These characters are all valid in latin1, but errors can occur when gh-ost applies binary log events containing them to the ghost table without proper encoding conversion. It seems gh-ost is not converting these characters to the target character set when replaying events from the binlog.
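To illustrate the byte-level mismatch described above, here is a minimal Python sketch. It is not gh-ost code and does not reproduce the bug itself. It only shows that the latin1 bytes for these characters are not valid UTF-8 until they are explicitly converted. The helper names `is_valid_utf8` and `latin1_to_utf8` are hypothetical, and Python's `cp1252` codec is used as an assumed stand-in for MySQL's latin1, with which it agrees for the characters listed here.

```python
# Characters quoted in the report; each one is a single byte in latin1.
SAMPLE = "é à ç ñ Ä ß ø ÿ þ æ"


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode latin1 bytes as UTF-8 (cp1252 is an assumed stand-in for MySQL latin1)."""
    return raw.decode("cp1252").encode("utf-8")


# Bytes as they would be stored in a latin1 column.
latin1_bytes = SAMPLE.encode("cp1252")

# Writing these bytes unchanged into a utf8 column is not valid input.
print("raw latin1 bytes valid as UTF-8:", is_valid_utf8(latin1_bytes))

# After an explicit conversion the same text is valid UTF-8.
converted = latin1_to_utf8(latin1_bytes)
print("converted bytes valid as UTF-8:", is_valid_utf8(converted))
print("é in latin1:", "é".encode("cp1252").hex(), "| é in UTF-8:", "é".encode("utf-8").hex())
```

For example, é is the single byte e9 in latin1 but the two bytes c3 a9 in UTF-8, so a value passed through without conversion is the kind of input a utf8mb3 column would reject as an incorrect string value.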

I have created a repo containing a reproduction: https://github.com/mattbooks/gh-ost-latin1-utf8-bug/tree/main

It includes a script that reproduces the bug locally (my reproduction was with gh-ost 1.1.7 and MySQL 8.0.42 on macOS).

I have also included example output.txt and error.txt files from my local run. They demonstrate that data is inserted correctly when copied, but data inserted after the migration has started raises an error repeatedly until the migration ultimately fails.
