nodeLabel truncates in the middle of unicode surrogate pairs #11697

Closed
wildlyinaccurate opened this issue Nov 23, 2020 · 1 comment

wildlyinaccurate (Contributor) commented Nov 23, 2020

Provide the steps to reproduce

  1. Run LH on https://www.yfood.eu/ with JSON output

What is the current behavior?

The label audit for the .c-regionswitch__select--footer element has a badly-truncated nodeLabel:

{
"node": {
    "type": "node",
    "selector": ".c-regionswitch__select--footer",
    "path": "1,HTML,1,BODY,1,DIV,4,DIV,0,FOOTER,1,DIV,0,DIV,0,SELECT",
    "snippet": "<select class=\"c-regionswitch__select c-regionswitch__select--footer\">",
    "explanation": "Fix any of the following:\n  aria-label attribute does not exist or is empty\n  aria-labelledby attribute does not exist, references elements that do not exist or references elements that are empty\n  Form element does not have an implicit (wrapped) <label>\n  Form element does not have an explicit <label>\n  Element has no title attribute or the title attribute is empty",
    "nodeLabel": "🇩🇪 Deutschland\n🇬🇧 United Kingdom\n🇵🇱 Polska\n🇳🇱 Nederland\n🇫🇷 France\n🇨\ud83c"
}
}

Not all JSON parsers can parse this correctly. For example, PHP's json_decode function fails with the error "Single unpaired UTF-16 surrogate".

Edit: Go's json.Unmarshal also has problems.
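
For anyone who wants to check a report for this before handing it to a stricter parser, here is a minimal sketch of a detector for unpaired surrogates. The helper name findLoneSurrogate is hypothetical and not part of Lighthouse, and the sample string is a shortened stand-in for the nodeLabel above. JavaScript's own JSON.parse accepts the lone escape, which is presumably why the problem only shows up in other languages.

```typescript
// Hypothetical helper (not a Lighthouse API): returns the index of the first
// unpaired UTF-16 surrogate in `text`, or -1 if every surrogate is paired.
function findLoneSurrogate(text: string): number {
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    const isLow = unit >= 0xdc00 && unit <= 0xdfff;
    if (isHigh) {
      const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // valid pair: skip the low half
        continue;
      }
      return i; // high surrogate with no low half after it
    }
    if (isLow) return i; // low surrogate with no high half before it
  }
  return -1;
}

// Shortened stand-in for the truncated nodeLabel: one complete pair
// (\ud83c\udde8) followed by a dangling high surrogate (\ud83c).
const label: string = JSON.parse('"France\\n\\ud83c\\udde8\\ud83c"');
console.log(findLoneSurrogate(label)); // 9
```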

What is the expected behavior?

Unicode surrogate pairs should be retained when truncating strings for better compatibility with JSON parsers.
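
A rough sketch of what surrogate-aware truncation could look like, assuming the limit is counted in UTF-16 code units as with String.prototype.slice. The function name truncateSafely is made up for illustration; this is not the Lighthouse implementation and not necessarily what #11698 does.

```typescript
// Hypothetical helper (not a Lighthouse API): truncates `text` to at most
// `maxLength` UTF-16 code units without leaving a dangling high surrogate.
function truncateSafely(text: string, maxLength: number): string {
  const limit = Math.max(0, maxLength);
  if (text.length <= limit) return text;

  let end = limit;
  // If the last kept unit is a high surrogate, its low half was cut off
  // (or it was already unpaired), so drop it as well.
  const last = end > 0 ? text.charCodeAt(end - 1) : 0;
  if (last >= 0xd800 && last <= 0xdbff) end--;

  return text.slice(0, end);
}

const sample = "ab\ud83c\udde8cd"; // "ab", one surrogate pair, "cd"
console.log(truncateSafely(sample, 3) === "ab"); // true: the split pair is dropped
console.log(truncateSafely(sample, 4) === "ab\ud83c\udde8"); // true: the pair fits whole
```

Note that this only keeps surrogate pairs intact. A flag emoji is two code points (four code units), so a cut can still leave a single regional indicator such as 🇨 behind. That is valid UTF-16 and should parse everywhere, but it is not the original flag. Truncating on grapheme boundaries (for example with Intl.Segmenter, where available) would be one way to avoid that as well.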

Environment Information

  • Affected Channels: CLI
  • Lighthouse version: 6.4.1
  • Operating System: Ubuntu 20.10 (Linux 5.8.0)

patrickhulce (Collaborator) commented

Thanks @wildlyinaccurate!

Even once #11698 lands, many other areas of Lighthouse are potentially susceptible to this same issue. The remaining work is to investigate all usages for triage and potential fixes.
