Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core(js-usage): normalize url key #11302

Merged
merged 17 commits into from
Sep 2, 2020
Merged

core(js-usage): normalize url key #11302

merged 17 commits into from
Sep 2, 2020

Conversation

connorjclark
Copy link
Collaborator

@connorjclark connorjclark commented Aug 22, 2020

For some reason the url for script coverage is relative to the document. This messes up some assumptions downstream:

EDIT: the reason is because of magic sourceURL comments.

fixes #11261

@connorjclark connorjclark requested a review from a team as a code owner August 22, 2020 00:13
@connorjclark connorjclark requested review from paulirish and removed request for a team August 22, 2020 00:13
@patrickhulce
Copy link
Collaborator

woah nice find!

@connorjclark connorjclark changed the title misc core(js-usage): normalize url key Aug 22, 2020
@connorjclark
Copy link
Collaborator Author

we really missed the mark here :o
image

@patrickhulce
Copy link
Collaborator

Hm where is that coming from @connorjclark ? we do pickup most of the inline script and the unused bytes doesn't change much

@connorjclark
Copy link
Collaborator Author

connorjclark commented Aug 22, 2020

for the bytes fixture, this is what master currently uses for keys in JS Usage:

image

and a lil' bit of what store.google.com looks like:

image

@connorjclark
Copy link
Collaborator Author

connorjclark commented Aug 22, 2020

Hm where is that coming from @connorjclark ? we do pickup most of the inline script and the unused bytes doesn't change much

bytes smoke test. Right, but we overlook some of it because the url "key" doesn't match any network record urls, because they aren't normalized. missed in L89 linked to above

@connorjclark
Copy link
Collaborator Author

(totally unsure about any of the changes in expectations, verifiying ...)

@connorjclark
Copy link
Collaborator Author

connorjclark commented Aug 22, 2020

By a manual count, there is only ~15K bytes of inline JS in tester.html. However, given this change we overcount in UnusedJavaScriptSummary.request({url, scriptCoverages, bundle}, context):

image

It's curious that some script coverages have a url of '', but others have the document url.

@connorjclark
Copy link
Collaborator Author

..... wow.

We are collecting coverage data of the code we run ourselves over the protocol.

I bet all script coverages of url '' are protocol evals.

image

@connorjclark
Copy link
Collaborator Author

connorjclark commented Aug 22, 2020

@@ -147,7 +147,7 @@ const expectations = [
// the specific ms value here is not meaningful for this smoketest
// *some* savings should be reported
overallSavingsMs: '>0',
overallSavingsBytes: '>=25000',
overallSavingsBytes: '35000 +/- 1000',
Copy link
Collaborator Author

@connorjclark connorjclark Aug 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+ / - is better, and good to be closer to the actual value.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about adding some assertions against the JsUsage artifact in smokes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, like that?

@patrickhulce
Copy link
Collaborator

sooooo, full circle this really had no impact on unused-javascript but it's nice to clean up the artifacts for other use cases and clearer understanding anyway? :)

@connorjclark
Copy link
Collaborator Author

sooooo, full circle this really had no impact on unused-javascript but it's nice to clean up the artifacts for other use cases and clearer understanding anyway? :)

not exactly. the smoke tests don't change, but we were ignoring coverage (total and wasted bytes) for scripts that for whatever reason had relative urls.

@connorjclark
Copy link
Collaborator Author

so it's a relative url when it comes from an iframe's inline scripts:

/_/gstore/_/js/k=gstore.gs.en_US.wT0--8C61Io.O/d=1/ct=zgms/rs=AF2QpWz4sGCjLARARhHywMDWhQy1Mxuxgg/m=base,main_app

or via an xhr (i don't get this. shows as an XHR in devtools, and I guess it gets eval'd, but how would CDT know to assign that script execution to this xhr URL?):

//www.gstatic.com/_/gstore/_/js/k=gstore.gs.en_US.wT0--8C61Io.O/d=0/ct=zgms/rs=AF2QpWz4sGCjLARARhHywMDWhQy1Mxuxgg/m=i50cKd,IytKEc,KtUhlb,sy2j,b4NnPb,sy3s,OiSVpc,Gph6af,NpD4ec,ws9Tlc,sy2o,mDYeOb,sy34,sy3g,IN5sze,vBnY1b,sy2r,sy4a,OZLtzd,AmCo0,tEPM8d,X63hgd,rFHKrb,ftfIab,ht3Cde,siEdTe,PqBip,eqAa9b,sy4y,sy5o,ns7pce,BH9pn,sy40,sy42,sy41,sy43,sy45,sy44,sy46,spsgvd,Xs5gRb,sy77,ZxjeB,q2fDU,gH5dsc,HV7egb,sy3u,sy58,sy3w,syy,sy3i,sy3t,sy3v,sy5x,sy59,sy6p,sy6q,sy76,sy7f,sy6r,sy7g,BlKz5e,zj4b1e,sy3y,A3htC,sy3z,ocC1vd,sy9b,sy9e,sy9f,xQtZb,sy2n,sy2q,L1AAkb,Q7UYhe,sy33,waWync,VNGsPd,rHjpXd,aW3pY,NN0gd,sy6w,WP5mZd,sy3x,AX3Okd,sy9c,uiNkee,sy9d,SM1lmd,sy6,sy2c,syg,sy2b,sy2t,syj,sy2g,sy3a,sy8,sy9,syb,syc,syd,sye,syk,sy2s,sy2y,sy2d,sy2e,sy2f,sy2l,sy3b,sy3c,sy6k,syac,fgj8Rb,BNLg8,sy57,qFbipd,RlUmc,I7Tk7e,sy72,sy73,M3lKNb,sy5,syq,sy4l,sy3l,sy4n,sy3m,sy4i,CLTtpb,sy4s,sy6z,OvhPc,sy3n,GniU2c,sy74,sy75,pAGKFb,Hchzpe,y86Xpd,pIZMdc,sy5a,vm9TOc,uaNaCf,zJdDwc,zJRogb,EVfple,rswQ7b,sy51,f1cdZb,syab,syf,K99qY,Jdbz6e,sy1x,sy9p,sy7,syl,syu,Mq9n0c,sya,syx,cQ3Jxd,sy4z,J3qlCf,bVZOGe,sy4f,WZ8Gbd,sy3h,zT0wnf,knK7bb,YiQRpc,sy54,lFKx5,KpROUe

image

@connorjclark
Copy link
Collaborator Author

anyway, let's hold this PR until I add to the bytes fixture to test coverage of JS loaded in an iframe + loaded via xhr+eval.

@connorjclark
Copy link
Collaborator Author

connorjclark commented Aug 26, 2020

Update: nothing to do with iframes or xhr. It's all because of //# sourceURL=. This overrides the actual network request url in ScriptCoverage.url. ex: normally the coverage object for an inline script uses html document for the url. but not if there's a sourceURL comment.

We have the scriptId, so we can use that to get a more proper URL.

IMO there really should be a distinction in the protocol between network urls and the magic comment urls. This has already bit us before.

@connorjclark
Copy link
Collaborator Author

in a gatherer, how might I turn a scriptId into a URL (if applicable)?

@patrickhulce
Copy link
Collaborator

in a gatherer, how might I turn a scriptId into a URL (if applicable)?

AFAIK Debugger.scriptParsed is the way to get this, so if the URL it provides is not the real URL then we might be stuck. DevTools folks/@paulirish would know better though.

I have this problem in DevTools where it's impossible to get back the non-sourcemap comment URL of what I'm looking at, so that leads me to believe it might be a limitation with the current protocol setup. I hope I'm wrong 🤞

@connorjclark
Copy link
Collaborator Author

connorjclark commented Aug 26, 2020

AFAIK Debugger.scriptParsed is the way to get this, so if the URL it provides is not the real URL then we might be stuck. DevTools folks/@paulirish would know better though.

yup, the url there is also overriden by magic url.

however, embedderName seems to be the "original" url ...

(bytes fixture)
image

(store.google.com, highlighted is a known inline script w/ a comment url)
image

(store.google.com, sometimes url is present but embedderName is blank ... I think this is eval from an xhr?)
image

@connorjclark
Copy link
Collaborator Author

connorjclark commented Aug 27, 2020

this seems inconsistent with our treatment of sourcemaps. :/

I agree, but resolving will require larger changes within unused-javascript (for one, making network record not necessary). And given it's just to give a better name for inline scripts using sourceURL, I'm not inclined to prioritize the work.

Thinking out loud, an easy resolution might be to replace the url in unused-javascript items with JsUsage[realurl][0].url, which should either be the real url or the sourceURL (LinkValue is probably better so we can link to the real url, and not end up with the same discoverability issues that exist in DevTools)

@paulirish
Copy link
Member

our treatment of sourcemaps

recapping.....

  • sourceURL is a way to tell tools that the current script has a url, mostly it's used for improved debuggability on eval() or inline scripts. used cleverly, it can mimic sourcemaps... one of webpack's many devtool settings uses sourceURL+eval. (Oh and you can change a script's real URL to a fake one with sourceURL.)
  • source maps tell tools that the current script is comprised of 1+ source files, and shows how. the URLs it includes in its sources property can be arbitrary too, so you could wreak similar kinds of havoc if you wanted to.

(fun fact: you can actually use both sourceURL and sourcemappingurl on the same anonymous script to give it identity and describe it's provenance)


for sourcemaps we use subitems to show how a generated/compiled/network source is comprised of its original mapped sources:
image

and in #10930 we are showing mapped source locations instead of the generated/compiled/network location.
image


this seems inconsistent with our treatment of sourcemaps. :/

I agree, but resolving will require larger changes within unused-javascript (for one, making network record not necessary). And given it's just to give a better name for inline scripts using sourceURL, I'm not inclined to prioritize the work.

Thinking out loud, an easy resolution might be to replace the url in unused-javascript items with JsUsage[realurl][0].url, which should either be the real url or the sourceURL (LinkValue is probably better so we can link to the real url, and not end up with the same discoverability issues that exist in DevTools)

K yeah. it sounds like we're mostly on the same page here then.

at first blush that resolution seems pretty good. i agree the complete solution is quite complicated and changes a lot of ways that we've been modeling scripts vs requests. i need some more time to noodle on the problem. and perhaps we can bring this discussion to the eng sync.

@connorjclark
Copy link
Collaborator Author

Made an issue: #11334

I think this particular PR could land, though. Keying these coverage data by the network url is the most natural thing to do here. I think any usage of urls from sourceURL comments should be done explicitly within the audits.

Copy link
Collaborator

@patrickhulce patrickhulce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to discussing further in eng sync but still landing this great progress in the meantime :)

lighthouse-core/gather/gatherers/js-usage.js Show resolved Hide resolved
@@ -147,7 +147,7 @@ const expectations = [
// the specific ms value here is not meaningful for this smoketest
// *some* savings should be reported
overallSavingsMs: '>0',
overallSavingsBytes: '>=25000',
overallSavingsBytes: '35000 +/- 1000',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about adding some assertions against the JsUsage artifact in smokes?

@@ -34,9 +43,23 @@ class JsUsage extends Gatherer {
/** @type {Record<string, Array<LH.Crdp.Profiler.ScriptCoverage>>} */
const usageByUrl = {};
for (const scriptUsage of scriptUsages) {
const scripts = usageByUrl[scriptUsage.url] || [];
// ScriptCoverages without a URL are from code we run over the protocol.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should go a bit further and also continue if scriptParsedEvent.embedderName === ''

yup sg

lighthouse-core/gather/gatherers/js-usage.js Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LightHouse HTML report generated in Debian not rendering in browser
5 participants