Mangled Unicode characters in yellow message after matching using "search for match" dialog #6063
Labels: encoding, reconciliation, Type: Bug
After matching a value, there's a yellow message shown at the top of the page. If the match was done using the "Search for match" dialog, many Unicode characters are turned into "?".
To Reproduce
Steps to reproduce the behavior:
Current Results
The yellow message shown after matching shows Māori as "M?ori" and "Omaha–Ponca" as "Omaha?Ponca", but displays "Võro" correctly.
All of the names are displayed correctly if you click on the tick to accept the match instead of using "Search for match".
Expected Behavior
The Unicode characters should not turn into question marks.
Screenshots
After using "Search for match":
After clicking on the tick:
(taken from a longer list of names, so the row numbers don't match)
Versions
Datasets
Additional context
The non-ASCII characters in these three names are:
ā (U+0101, in "Māori")
– (U+2013, en dash, in "Omaha–Ponca")
õ (U+00F5, in "Võro")
It seems to only affect characters beyond U+00FF, so something is probably trying to use ISO 8859-1 (Latin-1).
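To illustrate the Latin-1 hypothesis, here is a minimal Python sketch that pushes the three names through a lossy ISO 8859-1 encode. It is not the application's actual code; the helper name `through_latin1` is made up for this example, and the assumption is that unencodable characters are substituted with "?", as happens with many lossy encoders.

```python
# Illustrative only: simulates a lossy encode to ISO 8859-1.
# `through_latin1` is a hypothetical helper, not part of the application.

def through_latin1(text: str) -> str:
    """Encode to ISO 8859-1, replacing unencodable characters with '?'."""
    return text.encode("iso-8859-1", errors="replace").decode("iso-8859-1")

names = ["Māori", "Omaha–Ponca", "Võro"]

for name in names:
    # Characters above U+00FF have no ISO 8859-1 code point and become '?'.
    print(f"{name!r} -> {through_latin1(name)!r}")
```

Under that assumption the output matches the symptoms described above: "ā" (U+0101) and "–" (U+2013) are replaced, while "õ" (U+00F5) is within the Latin-1 range and survives.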
Looking at the network requests seems to confirm that:
When using "Search for match", the browser sends the data to `/command/core/recon-judge-similar-cells`, and when clicking on the tick, it sends it to `/command/core/recon-judge-one-cell`. In both cases it uses the header `Content-Type: application/x-www-form-urlencoded; charset=UTF-8`.
`/command/core/recon-judge-similar-cells` gives a response with the header `Content-Type: application/json;charset=iso-8859-1`, while `/command/core/recon-judge-one-cell` gives a response with `Content-Type: application/json;charset=utf-8`.
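The header comparison can also be checked mechanically. The following Python sketch takes the two response `Content-Type` values quoted in this report, extracts the declared charset, and tests whether each name can be represented in it. The helper names `declared_charset` and `can_carry` are hypothetical, and the header values are copied from the observations above rather than fetched from a running server.

```python
from email.message import Message
from typing import Optional

def declared_charset(content_type: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

def can_carry(text: str, charset: str) -> bool:
    """True if every character of `text` is encodable in `charset`."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# Response headers as reported in this issue (not fetched live).
observed = {
    "/command/core/recon-judge-similar-cells": "application/json;charset=iso-8859-1",
    "/command/core/recon-judge-one-cell": "application/json;charset=utf-8",
}

for endpoint, header in observed.items():
    charset = declared_charset(header)
    for name in ["Māori", "Omaha–Ponca", "Võro"]:
        status = "ok" if can_carry(name, charset) else "lossy"
        print(f"{endpoint} [{charset}] {name}: {status}")
```

If the response body really is written in the declared charset, only the endpoint declaring `iso-8859-1` would be unable to carry "Māori" and "Omaha–Ponca", which is consistent with the behavior reported here.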