Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

misc: avoid splitting node labels in the middle of unicode surrogate pairs #11698

Merged
merged 12 commits into from
Dec 6, 2020
Merged
7 changes: 7 additions & 0 deletions .github/scripts/git-get-shared-history.sh
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,13 @@ set -euxo pipefail

# We can always use some more history
git -c protocol.version=2 fetch --deepen=100

# Find out if the PR is coming from a fork
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paulirish how do you feel about this skipping this check entirely if it's from a fork for now instead and we can sort out the fix later?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that works for me.. though determining that would probably have required almost this much iteration as well... so i think it's all moot.

ALL MOOT!

base_clone_url=$(jq --raw-output '.pull_request.head.repo.clone_url' $GITHUB_EVENT_PATH)
# If it is, we need the fork's history, too.
if [[ $base_clone_url != "GoogleChrome/lighthouse.git" ]]; then
git -c protocol.version=2 fetch --deepen=100 "$base_clone_url"
fi
echo "History is deepened."

if git merge-base HEAD origin/master > /dev/null; then
Expand Down
4 changes: 3 additions & 1 deletion lighthouse-core/lib/page-functions.js
Original file line number Diff line number Diff line change
Expand Up @@ -385,7 +385,9 @@ function getNodeLabel(node) {
if (str.length <= maxLength) {
return str;
}
return str.slice(0, maxLength - 1) + '…';
// Take advantage of string iterator multi-byte character awareness.
// Regular `.slice` will ignore unicode character boundaries and lead to malformed text.
return Array.from(str).slice(0, maxLength - 1).join('') + '…';
wildlyinaccurate marked this conversation as resolved.
Show resolved Hide resolved
}
const tagName = node.tagName.toLowerCase();
// html and body content is too broad to be useful, since they contain all page content
Expand Down
8 changes: 8 additions & 0 deletions lighthouse-core/test/lib/page-functions-test.js
Original file line number Diff line number Diff line change
Expand Up @@ -116,6 +116,14 @@ describe('Page Functions', () => {
assert.equal(pageFunctions.getNodeLabel(el).length, 80);
});

it('Truncates long text containing unicode surrogate pairs', () => {
const el = dom.createElement('div');
// `getNodeLabel` truncates to 80 characters internally.
// We want to test a unicode character on the boundary.
el.innerText = Array(78).fill('a').join('') + '💡💡💡';
wildlyinaccurate marked this conversation as resolved.
Show resolved Hide resolved
assert.equal(pageFunctions.getNodeLabel(el), Array(78).fill('a').join('') + '💡…');
});

it('Uses tag name for html tags', () => {
const el = dom.createElement('html');
assert.equal(pageFunctions.getNodeLabel(el), 'html');
Expand Down