
Lighthouse performance results are inconsistent when tested multiple times #5064

Closed
bhargavkonkathi opened this issue Apr 28, 2018 · 11 comments

@bhargavkonkathi commented Apr 28, 2018

No description provided.

@bhargavkonkathi (Author) commented Apr 28, 2018

Lighthouse fails to show consistent results for the performance section when a site is tested multiple times.
I tested https://hn-polymer-2.firebaseapp.com/ three times in my browser and got three performance results that are not even close to each other. Performance is an important feature for any PWA. It would be nice to see consistent, or at least similar, results when a particular website is tested several times; otherwise it is very difficult to understand how the PWA is performing and how to improve it. Alternatively, if the internet connection is too slow to reliably measure interactivity or first paint, it would be nice to see a message such as "slow network connection: performance may vary."

Expected
The performance score should be the same whenever a report is generated for the same website, even across multiple runs.

Actual
Performance results vary and are not the same each time reports are generated for the same website.

Attached are some Lighthouse screenshots for https://hn-polymer-2.firebaseapp.com from multiple test runs.

Thanks.

@bhargavkonkathi bhargavkonkathi changed the title Light house results are varying not constant. Lighthouse performance results are varying and not constant when tested multiple times Apr 28, 2018

@bhargavkonkathi bhargavkonkathi changed the title Lighthouse performance results are varying and not constant when tested multiple times Lighthouse performance results are inconsistent when tested multiple times Apr 28, 2018

@patrickhulce (Collaborator) commented Apr 30, 2018

@bhargavkonkathi have you followed all the guidance for reducing variability in our scoring docs? Two of those three runs look like external interference. If you have other software on your system that makes pages load irregularly, that is what Lighthouse will observe and report.

Also note that if you're running with DevTools or the Chrome Extension, the tab will need to be in a visible window for the entire duration of the audit, or results will be skewed by backgrounding.
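
For reference, one way to sidestep the backgrounding problem entirely is to drive the Lighthouse CLI against its own headless Chrome instance instead of auditing from the DevTools panel. A minimal sketch, assuming the `lighthouse` CLI from npm is installed and on the PATH and Lighthouse 3+ JSON output (the URL is just the one from this issue):

```python
import json
import subprocess

def run_lighthouse(url: str) -> float:
    """Run one headless Lighthouse audit and return the performance score (0-100)."""
    # --chrome-flags=--headless makes Lighthouse launch its own Chrome, so no visible
    # tab is required and backgrounding cannot skew the trace.
    result = subprocess.run(
        [
            "lighthouse", url,
            "--only-categories=performance",
            "--chrome-flags=--headless",
            "--output=json",
            "--output-path=stdout",
            "--quiet",
        ],
        capture_output=True, text=True, check=True,
    )
    report = json.loads(result.stdout)
    return report["categories"]["performance"]["score"] * 100

if __name__ == "__main__":
    print(run_lighthouse("https://hn-polymer-2.firebaseapp.com/"))
```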

@sudara commented Apr 30, 2018

I very much appreciate what Lighthouse is attempting to accomplish. I have many clients who I help with frontend performance. Lighthouse has the potential to make frontend timing more available and understandable.

That being said, the wide variance degrades the usefulness of the tool. Are there plans to allow users to increase n? Even more so than backend benchmarking, frontend benchmarking varies enormously from request to request, depending on local CPU, network congestion, the stars aligning, etc. An n of fewer than 20 requests seems less than ideal, and an n of 1 is asking for wildly varying results...

Lighthouse should probably also clearly say "hey, don't touch your computer while we benchmark" — I've noticed that if I keep browsing and using my computer, I can slow down and skew a result by 1.5-3x...

For the moment, I'm cautioning my clients when they come to me with Lighthouse runs. People are also not really clear about the throttling (that Lighthouse is showing a "worst case" scenario, which is awesome). I tell them to run it multiple times, to not touch their computer, and that webpagetest.org is their best bet for stable results (even with 9 runs the variance can be kinda crazy!)

In short, I love the user friendliness of the tool. I'd love to see support for multiple requests in Lighthouse so we can also see some stability in results!
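
Until something along those lines ships, a rough way to get a larger n today is to script repeated CLI runs and aggregate the scores yourself. A sketch under the same assumptions as the snippet above (lighthouse CLI installed, Lighthouse 3+ JSON output; the run count and URL are arbitrary):

```python
import json
import statistics
import subprocess

def performance_score(url: str) -> float:
    """One headless Lighthouse run; returns the performance score on a 0-100 scale."""
    out = subprocess.run(
        ["lighthouse", url, "--only-categories=performance",
         "--chrome-flags=--headless", "--output=json",
         "--output-path=stdout", "--quiet"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)["categories"]["performance"]["score"] * 100

def benchmark(url: str, runs: int = 5) -> None:
    """Repeat the audit and report the median and spread instead of trusting a single run."""
    scores = [performance_score(url) for _ in range(runs)]
    print(f"runs:   {[round(s, 1) for s in scores]}")
    print(f"median: {statistics.median(scores):.1f}")
    print(f"stdev:  {statistics.stdev(scores):.1f}")

if __name__ == "__main__":
    benchmark("https://hn-polymer-2.firebaseapp.com/", runs=5)
```

Reporting the median alongside the standard deviation also makes it obvious when one run is an outlier rather than a real regression.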

@sudara commented Apr 30, 2018

Also note that if you're running with DevTools/Chrome Extension, the tab will need to be a visible window the entire duration of the audit or results will be skewed by backgrounding.

This might be super valuable information to tell users immediately, right after hitting "Run Audit"...

@bhargavkonkathi (Author) commented May 1, 2018

@patrickhulce Yes, I have followed the scoring docs. Also, I didn't close the window for the entire duration of the audit. I have tested the same websites many times and saw variation in the results.
Thanks.

@patrickhulce (Collaborator) commented May 1, 2018

@sudara the initiative you're describing is tracked by #4204 (codename smoothhouse) which will run LH multiple times and combine the results.

In v3 our default mode of operation will use a different performance engine that should cut variance by about 50%. Note that there will always be a trade-off between realism and stability: the real world has a high amount of variance, and so do all observed methods of performance measurement. Removing the variance therefore removes an element of realism, and we're attempting to balance the two concerns thoughtfully.

If you'd like to learn more about this, there's a detailed document where you can read all about our findings.


@bhargavkonkathi of the three runs you pasted, only one seems valid; the other two are clear error cases (the 1st is a classic example of what happens when the window is obscured until the 28s mark and Chrome finally paints; the 3rd has an error in metric computation). If you're seeing variance on a particular URL beyond what's outlined in the document for non-error cases, we'd be happy to hear about it. If a particular URL errors frequently, a separate issue is very much appreciated :)
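
For anyone who wants to experiment with that realism-vs-stability trade-off directly, the throttling strategy is selectable from the CLI via `--throttling-method`: `simulate` (the v3 default, which models a throttled load and tends to be more stable) versus `devtools` (applied throttling, closer to an observed load but noisier). A small sketch under the same assumptions as the snippets above:

```python
import json
import subprocess

def score(url: str, method: str) -> float:
    """Run Lighthouse with the given throttling method; return the performance score (0-100)."""
    out = subprocess.run(
        ["lighthouse", url, "--only-categories=performance",
         f"--throttling-method={method}",        # "simulate" (default) or "devtools"
         "--chrome-flags=--headless", "--output=json",
         "--output-path=stdout", "--quiet"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)["categories"]["performance"]["score"] * 100

for method in ("simulate", "devtools"):
    print(method, round(score("https://hn-polymer-2.firebaseapp.com/", method), 1))
```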

@sudara commented May 2, 2018

Thanks @patrickhulce!

That's fantastic that you guys have something planned. As you say, there's so much real-life variance that it's important to capture and present that variance (vs. providing only one possible data point). Sounds like you guys plan on including stddev in some shape or form, which will be helpful for guiding people's expectations.

Wow, that detailed document is above and beyond. Thanks for sharing it. Looking forward to smoothhouse!

@paulirish (Member) commented Aug 29, 2018

Resolved in Lighthouse 3.x. We don't have the ability to do multiple runs (choosing the median), but it's less important since 3.x is more consistent.

Please open a new issue if you're seeing significant variability now.

@paulirish paulirish closed this Aug 29, 2018

@fusiondesigner commented Jul 10, 2019

I removed a large payload of videos from the tested URL, and the score went DOWN 1 point. The total payload was 14 MB; surely removing these and then retesting should boost the score???

@patrickhulce (Collaborator) commented Jul 10, 2019

@fusiondesigner generally speaking the complete size of videos does not affect the performance metrics. You should have seen a significant change to the "Total Byte Weight" diagnostic, but the first paint time for example is not really affected by the duration/size of a video clip.

Also, down a single point is not a meaningful measurement. Performance scores are expected to fluctuate within a few points. Learn more.

@fusiondesigner commented Jul 10, 2019

@patrickhulce thanks Patrick, we were confused as it highlighted a large payload from two videos in the source code.
