Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core(full-page-screenshot): resolve node rects during emulation #11536

Merged
merged 41 commits into from
Nov 30, 2020

Conversation

connorjclark
Copy link
Collaborator

@connorjclark connorjclark commented Oct 7, 2020

Fixes #11118

Before: https://misc-hoten.surge.sh/lh-fullpagescreenshots/noresolve2.report.html
After: https://misc-hoten.surge.sh/lh-fullpagescreenshots/resolve2.report.html

Look at most any audit in a11y. You'll find many inaccuracies in "Before", and they'll be correct in "After". Just a couple examples:

Before

image

image

image

After

image

image

image

Note, there are still many audits that Look wrong, but I think all are either: the element is 1px (ugh...) or is out-of-frame (to the left or right of viewport, such as a carousel).

@connorjclark connorjclark requested a review from a team as a code owner October 7, 2020 22:36
@connorjclark connorjclark requested review from adamraine and removed request for a team October 7, 2020 22:36
@connorjclark connorjclark marked this pull request as draft October 7, 2020 22:36
@google-cla google-cla bot added the cla: yes label Oct 7, 2020
value.clientRect = resolvedNode.newBoundingRect;
}

value.devtoolsNodePath = resolvedNode.newDevtoolsNodePath;
Copy link
Collaborator Author

@connorjclark connorjclark Oct 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this could break things if updating the path (which is the lookup key...) as we go. Perhaps should create a key -> artifact object mapping before doing the updating.

@brendankenny
Copy link
Member

Isn't this still going to have issues if the viewport isn't the same dimensions as in full-page-screenshot when collecting updated positions? Otherwise (speculating without actually running it, sorry :), this seems like it might mostly just be correcting for the axe scroll.

  • should full-page-screenshot be some kind of baseArtifact instead of a regular artifact that's collected after afterPass, and updated bounds can be produced inside there?
  • should artifacts be saving something besides NodeDetails, e.g. just a nodeId and have a separate place for dom element details, and the report can do the lookup from the first to the second? I believe all of this is just for the report's sake, right (e.g. linkifying nodes to the element's panel, correctly highlighting them in the full page screenshot, etc)?

@brendankenny
Copy link
Member

I'm also curious how often the devtoolsNodePath changes, because I assume that's not from changing the viewport and shifting layout, that's from JS in the page rearranging things? It seems like updating might fix some of the broken linkifications that can happen, but in many cases you arguably you aren't linking to the offending element anymore if it's not where it was when it was offending :)

@connorjclark
Copy link
Collaborator Author

connorjclark commented Oct 8, 2020

Isn't this still going to have issues if the viewport isn't the same dimensions as in full-page-screenshot when collecting updated positions? Otherwise (speculating without actually running it, sorry :), this seems like it might mostly just be correcting for the axe scroll.

The width and the DPR remain constant, what changes is the height, so there ought to be no content reflow. This is correcting for injected elements, or images loading too late, anything that causes late-page-load CLS.

  • should full-page-screenshot be some kind of baseArtifact instead of a regular artifact that's collected after afterPass, and updated bounds can be produced inside there?

Perhaps. You're thinking we apply the full-page emulation, run the node resolving code, and then restore the emulation, right?

Maybe the current gatherer code could move to a driver-utility class (like Fetcher).

experimental base artifacts aren't a thing, so this would have to be done just before we are good w/ shipping it.

  • should artifacts be saving something besides NodeDetails, e.g. just a nodeId and have a separate place for dom element details, and the report can do the lookup from the first to the second? I believe all of this is just for the report's sake, right (e.g. linkifying nodes to the element's panel, correctly highlighting them in the full page screenshot, etc)?

I think gatherers should continue to collect all the information on a node ASAP, just in case the node is completely deleted by the time the "resolving" code runs.

I'm also curious how often the devtoolsNodePath changes, because I assume that's not from changing the viewport and shifting layout, that's from JS in the page rearranging things? It seems like updating might fix some of the broken linkifications that can happen, but in many cases you arguably you aren't linking to the offending element anymore if it's not where it was when it was offending :)

FWIW updating the node path was a last-minute decision. It seems like it'd be more accurate and would help linkifying in CDT, but I haven't confirmed. Maybe not, by the time CDT does the linkifying, the associated element may not have "shifted" into place...

oh, and most linkfiying errors are b/c the DOM is totally different when LH does a mobile run vs when DevTools is in desktop mode (the default configuration).

@brendankenny
Copy link
Member

  • should artifacts be saving something besides NodeDetails, e.g. just a nodeId and have a separate place for dom element details, and the report can do the lookup from the first to the second? I believe all of this is just for the report's sake, right (e.g. linkifying nodes to the element's panel, correctly highlighting them in the full page screenshot, etc)?

I think gatherers should continue to collect all the information on a node ASAP, just in case the node is completely deleted by the time the "resolving" code runs.

Yes, I'm suggesting gatherers keep collecting anything necessary for the audits they support (which in many cases don't need the element coordinates at all), but that maybe coordinates in the full screenshot are something different and should be collected separately.

One approach could be the equivalent of LighthouseRunWarnings, where node ids are passed back to gather-runner as an indicator of "here's an element I was interested in", with the same id stored in the artifact the gatherer produces. GatherRunner or a special full-page-screenshot gatherer could then collect coordinates at the end of things.

This could also avoid some possible inconsistencies of updating only parts of artifacts. e.g. TapTarget extends NodeDetails so would get updated at the top level, but all the clientRects that make up each tapTarget would not get updated, so if an audit tried to use both it could find them not overlapping at all. If instead the artifact was not updated but the screenshot coords were kept separately, audit calculations would work out as they do today, but full-page-screenshot would still be able to show things correctly.

@connorjclark
Copy link
Collaborator Author

Yes, I'm suggesting gatherers keep collecting anything necessary for the audits they support (which in many cases don't need the element coordinates at all), but that maybe coordinates in the full screenshot are something different and should be collected separately.

I think I understand, but let me try fleshing it out to make sure we're on the same page.

  • Some gatherers will collect nodes, using getNodeDetails, which also gets a rect (already does this today).
  • A new step at the end of afterPass in gather runner will emulate a tall page
  • While page is tall, we can walk the artifacts and create a new BaseArtifact (let's just store it in FullPageScreenshot w/ the screenshot data, eh?) that contains a mapping of node id (either devtools node path or something new we generate just for this) to "resolved final rect".
    • I understand your LighthouseWarnings idea, making a gatherer declare what nodes it wants to track. I feel strongly that we shouldn't introduce such bookkeeping. I think it's important to have this be automatic: any gatherer that collects node details is probably gonna be used in an audit that displays a node detail type, so let's just collect final rect info for all of it. If it's not used, nbd.
  • Gathering wraps up w/o modifying any original artifact objects (except perhaps adding a new nodeId property)

return details;
}

const getNodeDetailsString = `function getNodeDetails(element) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just put getNodeDetails.toString() in the exports and remove this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i considered that as well, though this fn is a little different from the rest in that it depends on all the others. so we need those stringified functions inlined here within this one.

when i worked on this with @adrianaixba it seems the most practical to just define this fn as a string and skip the conversion in order to make sure all the deps were available. (otherwise i figured whereever we used getNodeDetails in gatherers, we'd also need to inject each of its dependency fns as well.

i still feel pretty decent this was the right compromise for DX, but if there's another good solution we can def explore it.

Copy link
Collaborator Author

@connorjclark connorjclark Nov 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to add non trivial code to this function so I split it out, but am keeping the stringified dep export.

if (value.boundingRect) {
value.boundingRect = resolvedNode.newBoundingRect;
} else if (value.clientRect) {
value.clientRect = resolvedNode.newBoundingRect;
Copy link
Member

@paulirish paulirish Oct 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just leaving this for posterity...

the _taptarget artifact_ has a `clientRects` prop (note the plural).. this can have multiple client rects per node even tho it has just a single bounding rect.

image

the iframe and image element artifacts have a clientRect prop. and yes those are identical in definition to boundingRect.

@paulirish
Copy link
Member

paulirish commented Oct 10, 2020

I had an idea for updating each artifacts nodeDetails without the magical walk-object approach . (This idea is basically moot if we go towards Brendan's idea of not touching the artifacts and only keeping the updated details local to Full Page Screenshot.)

the basic idea...
  • just like in this PR, we store stuff in window.__nodes so we can maintain element references for later
  • but we let the artifact dictate how it should be updated.

Here's the first iteration of that... a resolveNodes method for traceElements:
Screen Shot 2020-10-10 at 3 03 53 PM

But nearly all artifacts that share nodeDetails have very similar shape.. so we can refactor and push this method back to the base class:

image

It can be automatic for most gatherers, though funny artifact shapes like Accessibility can manage their own updates:
image

@paulirish
Copy link
Member

paulirish commented Oct 10, 2020

  • I understand your LighthouseWarnings idea, making a gatherer declare what nodes it wants to track.

I'm not sure we all have the same interpretation. My understanding of the idea is.. we use a nodeId of some sort (perhaps UUID-ish?) and both the artifacts/audits and the FPSS share those id's. For example, an audit has a node item.. it has all its own path/snippet/etc that was collected (and never updated). And it also knows this item is nodeId 149. The FPSS has its own dict of (post-afterPass) nodeDetails keyed by nodeId. So... the details renderer calls renderElemScreenshot(149) or w/e.

Implementation-wise: I think this would require a Map<nodeId: string, HTMLElement> as window.__nodes in the page or w/e. All artifacts add their nodes from the getNodeDetails() fn. (Also... caching bonus!) And then the FPSS would just be able to operate from that object on its own and no nothing about any artifacts.


I'll try to capture some open questions:

  1. Do we make FPSS a baseArtifact?
  2. When do we collect updated details? Immediately after all afterPass's OR before FPSS OR after FPSS OR during FPSS?
  3. Where does the updated info reside? Update the artifacts themselves OR keep updated nodeDetails local to FPSS (full page screenshot)?
    1. (If updating artifacts..)
      • How to update? Walk the artifact objects generically OR manage more explicitly?
      • Can we use devtoolsNodePath as our key? What if it changes?
      • If devtoolsNodePath changes which of the two paths should we use for devtools linkifying?
    2. (If joining on nodeId...)
      • ?
  4. Do we do anything about differing results? If there was a image-size-responsive issue during gathering, but not when we collected FPSS... what then?
  5. What about intra-audit inconsistencies? If taptarget's boundingRect is updated but not the clientRects? Or in font-size if an element has way more/less text during FPSS. Or in image-aspect-ratio if a broken aspect ratio was fixed just before FPSS.

@connorjclark
Copy link
Collaborator Author

@adamraine @patrickhulce @paulirish anything else?

Copy link
Member

@adamraine adamraine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@patrickhulce
Copy link
Collaborator

just the naming decision I don't get :) but I trust @adamraine 's final review.

We'll probably need to revisit this for FR support anyhow (most likely it just won't support it though)

Copy link
Member

@paulirish paulirish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

everything else lgtm. just this one simplification i still have faith in

window.__lighthouseNodesDontTouchOrAllVarianceGoesAway.elementToLhId.size,
element.tagName,
].join('-');
window.__lighthouseNodesDontTouchOrAllVarianceGoesAway.lhIdToElements.set(lhId, element);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya but I'm assuming not performant. and this code is simpler/fewer lines.

hmmm.. maybe you're thinking of something else?
what i'm proposing is definitely less work. and its fewer lines

diff --git a/lighthouse-core/gather/gatherers/full-page-screenshot.js b/lighthouse-core/gather/gatherers/full-page-screenshot.js
index 474eca578..5806de603 100644
--- a/lighthouse-core/gather/gatherers/full-page-screenshot.js
+++ b/lighthouse-core/gather/gatherers/full-page-screenshot.js
@@ -82,9 +82,8 @@ class FullPageScreenshot extends Gatherer {
       /** @type {LH.Artifacts.FullPageScreenshot['nodes']} */
       const nodes = {};
       if (!window.__lighthouseNodesDontTouchOrAllVarianceGoesAway) return nodes;
-
-      const {lhIdToElements} = window.__lighthouseNodesDontTouchOrAllVarianceGoesAway;
-      for (const [id, node] of lhIdToElements.entries()) {
+      const elementToLhId = window.__lighthouseNodesDontTouchOrAllVarianceGoesAway;
+      for (const [node, id] of elementToLhId.entries()) {
         // @ts-expect-error - getBoundingClientRect put into scope via stringification
         const rect = getBoundingClientRect(node);
         if (rect.width || rect.height) nodes[id] = rect;
diff --git a/lighthouse-core/lib/page-functions.js b/lighthouse-core/lib/page-functions.js
index 9769877a9..a71bfe497 100644
--- a/lighthouse-core/lib/page-functions.js
+++ b/lighthouse-core/lib/page-functions.js
@@ -452,10 +452,7 @@ function wrapRequestIdleCallback(cpuSlowdownMultiplier) {
 function getNodeDetailsImpl(element) {
   // This bookkeeping is for the FullPageScreenshot gatherer.
   if (!window.__lighthouseNodesDontTouchOrAllVarianceGoesAway) {
-    window.__lighthouseNodesDontTouchOrAllVarianceGoesAway = {
-      lhIdToElements: new Map(),
-      elementToLhId: new Map(),
-    };
+    window.__lighthouseNodesDontTouchOrAllVarianceGoesAway = new Map();
   }
 
   // Create an id that will be unique across all execution contexts.
@@ -465,17 +462,16 @@ function getNodeDetailsImpl(element) {
   // We also dedupe this id so that details collected for an element within the same
   // pass and execution context will share the same id. Not technically important, but
   // cuts down on some duplication.
-  let lhId = window.__lighthouseNodesDontTouchOrAllVarianceGoesAway.elementToLhId.get(element);
+  let lhId = window.__lighthouseNodesDontTouchOrAllVarianceGoesAway.get(element);
   if (!lhId) {
     lhId = [
       window.__lighthouseExecutionContextId !== undefined ?
         window.__lighthouseExecutionContextId :
         'page',
-      window.__lighthouseNodesDontTouchOrAllVarianceGoesAway.elementToLhId.size,
+      window.__lighthouseNodesDontTouchOrAllVarianceGoesAway.size,
       element.tagName,
     ].join('-');
-    window.__lighthouseNodesDontTouchOrAllVarianceGoesAway.lhIdToElements.set(lhId, element);
-    window.__lighthouseNodesDontTouchOrAllVarianceGoesAway.elementToLhId.set(element, lhId);
+    window.__lighthouseNodesDontTouchOrAllVarianceGoesAway.set(element, lhId);
   }
 
   const htmlElement = element instanceof ShadowRoot ? element.host : element;
diff --git a/types/externs.d.ts b/types/externs.d.ts
index a04a09b08..a4727bb45 100644
--- a/types/externs.d.ts
+++ b/types/externs.d.ts
@@ -324,11 +324,8 @@ declare global {
   }
 
   interface Window {
-    /** Used by FullPageScreenshot gatherer. */
-    __lighthouseNodesDontTouchOrAllVarianceGoesAway: {
-      lhIdToElements: Map<string, HTMLElement>;
-      elementToLhId: Map<HTMLElement, string>;
-    };
+    /** Map of Element to lhID. Used by FullPageScreenshot gatherer. */
+    __lighthouseNodesDontTouchOrAllVarianceGoesAway: Map<HTMLElement, string>;
     __lighthouseExecutionContextId?: number;
   }
 }

@connorjclark connorjclark changed the title core(full-page-screenshot): resolve node rects during full page emulation core(full-page-screenshot): resolve rects during full page emulation Nov 30, 2020
@connorjclark connorjclark changed the title core(full-page-screenshot): resolve rects during full page emulation core(full-page-screenshot): resolve node rects during emulation Nov 30, 2020
@connorjclark connorjclark merged commit 26074a7 into master Nov 30, 2020
@connorjclark connorjclark deleted the resolve-nodes branch November 30, 2020 20:19
@paulirish
Copy link
Member

@connorjclark congrats on landing this badboy. 🎉

Copy link
Member

@brendankenny brendankenny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, congratulations on shipping! Some polishing comments, but we can hold them for when full-page-screenshot moves to the default config

data: string;
width: number;
height: number;
fullPageScreenshot: LH.Artifacts['FullPageScreenshot'];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was there an advantage to nesting this? It becomes audits['full-page-screenshot'].details.fullPageScreenshot.screenshot instead of audits['full-page-screenshot'].details.screenshot? The details object is mostly empty and it seems like it just makes more work for report-renderer without saving work elsewhere?

*/
function makeMocksForGatherRunner() {
jest.mock('../lib/stack-collector.js', () => () => Promise.resolve([]));
jest.mock('../gather/gatherers/full-page-screenshot.js', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

full-page-screenshot isn't going to be a base artifact, so doesn't seem necessary to specially mock it now?

@@ -203,6 +201,8 @@ declare global {
*/
export interface NodeValue {
type: 'node';
/** Unique identifier. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is in our public facing LHR type, so it may be worth giving more of an explanation, and/or possibly a warning not to depend on the format of the id.

})()`;

// Collect nodes with the page context (`useIsolation: false`) and with our own, reused
// context (useIsolation: false). Gatherers use both modes when collecting node details,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// context (useIsolation: false). Gatherers use both modes when collecting node details,
// context (useIsolation: true). Gatherers use both modes when collecting node details,

@@ -85,6 +85,7 @@ class ExecutionContext {
expression: `(function wrapInNativePromise() {
const __nativePromise = window.__nativePromise || Promise;
const URL = window.__nativeURL || window.URL;
window.__lighthouseExecutionContextId = ${contextId};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The context id ends up being a binary signal in the lhIds anyways (either page or a single execution context id, since if there had been other contexts in the pass they wouldn't still be around to query anyways :), so maybe consider a more generalized window._isLighthouseIsolatedContext = boolean.

(Ideally we could get away without using it at all, but I can't think of a way to do that while keeping the original IDs in each artifact untouched)

left: 8,
height: '>40',
},
'5-1-P': {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should move screenshot-config off of experimental-config and to a stable set of gatherers or folks messing with experimental gatherers/audits are going to have to deal with changes to these IDs unrelated to what they're up to. (e.g. there's no reason the "The following 2" comment will still apply to these two elements)

*/
function getNodeDetailsImpl(element) {
// This bookkeeping is for the FullPageScreenshot gatherer.
if (!window.__lighthouseNodesDontTouchOrAllVarianceGoesAway) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I also don't understand what __lighthouseNodesDontTouchOrAllVarianceGoesAway is trying to communicate :) It might be worth switching to something more straightforward

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make full-page-screenshot resistant to layout shifts
6 participants