
Production Web Apps Performance Study Q4/16 - Q1/17 #1

addyosmani opened this Issue Jan 20, 2017 · 2 comments
addyosmani commented Jan 20, 2017

Goals

  • Understand the cost of JS Parse/Compile/FunctionCall times on apps
  • Discover how quickly apps become interactive on average mobile hardware
  • Learn how much JavaScript apps are shipping down the wire on desktop and mobile

Sample information
6,000+ production sites using one of React, Angular 1.0, Ember, Polymer, jQuery or Vue.js. Site URLs were obtained from a combination of Libscore.io, BigQuery, BuiltWith-style sites and framework wikis. Roughly 10% of each sample set was eye-balled to verify framework usage; sets that proved unreliable were discarded from the final study.

URLs: https://docs.google.com/a/google.com/spreadsheets/d/1_gqtaEwjoJGbekgeEaYLbUyR4kcp5E7uZuMHYgLJjGY/edit?usp=sharing

Trivia: All in all, 85,000 WebPageTest results were generated as part of this study. Yipes.

Tools used in study
WebPageTest.org (with enhancements such as JS cost, TTI and aggregated V8 statistics added thanks to Pat Meenan as the project progressed), Catapult (an internal Google tool), Chrome Tracing.

Summary observations

[Chart: metrics comparison]

[Chart: breakdowns]

[Chart: mobile vs. desktop study]

[Screenshot: summary charts, Feb 6, 2017]

This data may be useful to developers as it shows:

  • Real production apps built on popular stacks can be much more expensive on mobile than developers might expect.
  • If you're choosing something off the shelf, pay closer attention to parse times and time-to-interactive.
  • Some, but not all, apps are shipping large bundles. Where that's the case, invest in code-splitting and in reducing how much JavaScript is shipped.

Where are the medians and aggregates?

The primary goal of this study was to highlight trends across the different data sets available to me as a whole. Initially, I focused on summarizing the data at a per-framework level (e.g. React apps in set 1 exhibited characteristic A). After reviewing this with the Chrome team, we decided that per-framework breakdowns were too susceptible to the takeaway being "oh, so I should just use framework X over Y because it is 2% better", rather than the important takeaway that parse/compile cost is a problem we all face.

To that end, the charts below were generated locally by fetching each of the WebPageTest reports for the data sets, iterating over a particular dimension (e.g. time-to-interactive, JS parse time) and computing medians for the different sets, which were then plumbed into either Google Sheets or Numbers for charting. If you wish to recreate that setup yourself, you can grab the CSVs from the reports below.

Raw WebPageTest runs - Round 2 (January, 2017)

Raw WebPageTest runs - Round 1 - Older study (December, 2016)
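The median step described above can be sketched roughly as follows. This is a minimal illustration only: the column names (`TTI`, `jsParseMs`) and the inline sample rows are placeholders, not the actual WebPageTest export schema or study data.

```python
import csv
import io
import statistics

def median_for(rows, column):
    """Median of a numeric column, skipping rows where the value is missing
    or non-numeric."""
    values = []
    for row in rows:
        try:
            values.append(float(row[column]))
        except (KeyError, TypeError, ValueError):
            continue
    return statistics.median(values)

# Stand-in for one exported WebPageTest CSV (column names are illustrative).
sample_csv = """url,TTI,jsParseMs
https://example-a.com,4200,710
https://example-b.com,5100,980
https://example-c.com,3900,640
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
print(median_for(rows, "TTI"))        # 4200.0
print(median_for(rows, "jsParseMs"))  # 710.0
```

In practice you would read the downloaded CSVs from disk instead of an inline string, then paste the per-set medians into Sheets or Numbers for charting, as described above.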

I put together this graphic when sharing the first version of this study internally. I decided to redo it because, at minimum, the network throttling setup wasn't consistent across the two to three web perf tooling systems used. This meant that while overall weight, time in script (parse/eval), FMP and load time were fine, the TTI numbers could not be confirmed as 100% accurate. I redid the study once we added TTI support to WebPageTest, and I'd trust the Round 2 numbers a lot more.

[Screenshot: graphic from the first version of the study, Dec 12, 2016]

Other data sets generated (Dec, 2016)

Note: many of the data sets below were generated before we installed Moto G4s in WebPageTest, so we had to use the Moto G1 instead. Some of the data sets also use earlier versions of the time-to-interactive metric and, in most cases, should not be directly compared to the latest data from 2017. This is historical data that's interesting and may be worth re-exploring where particular data sets didn't make it into the final study results.


digitarald commented Feb 10, 2017

@addyosmani nice work. Could you explain what "Sets not reliable were discarded" means? Are those cases where TTI is infinity and isn't reported, or just sets with high variance?

I requested access to the page sets document; I'm not sure whether it is intentionally private.


addyosmani commented Feb 12, 2017

> nice work.

Thanks!

> Could you explain what "Sets not reliable were discarded" means? Are those cases where TTI is infinity and isn't reported, or just sets with high variance?

High variance was indeed a problem with certain URLs. One of the challenges of studying web performance at scale is that data sets are susceptible to varying amounts of noise. Some of the sets I originally used (prior to filtering) only used a framework through a transitive dependency.

One example of this was sites that pulled in all of Angular for just an ad: even if the page would otherwise have had a decent TTI, their third-party includes were pushing their TTIs out heavily. Some of the other data sets I used had URLs suffering from the same problem, so I removed them after some manual tracing. This study was mostly looking at pages that use a framework for their core content.

> Are those issues where TTI is infinity and isn't reported or just sets with a high variance?

The current TTI metric we've implemented in Lighthouse and WebPageTest will very occasionally return infinity for URLs (especially those that keep the main thread busy for a long time). I locally filtered out -1/Infinity values when computing medians to account for this. My hope is that TTI will eventually be reliable enough that this kind of filtering isn't required.
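That sentinel filtering can be sketched as below. This is a minimal illustration under the assumption that unreliable TTI measurements show up as -1 or Infinity, as described above; the actual scripts used in the study aren't published here.

```python
import math
import statistics

def robust_median(values):
    """Median after dropping the -1 / Infinity sentinels that TTI
    measurement can emit when the metric could not be determined
    (e.g. a main thread that stays busy for a long time)."""
    usable = [v for v in values if v != -1 and math.isfinite(v)]
    return statistics.median(usable) if usable else None

# Illustrative TTI values in milliseconds, including two sentinels.
tti_ms = [4200, -1, 5100, float("inf"), 3900]
print(robust_median(tti_ms))  # 4200
```

Returning `None` when every value is a sentinel keeps a fully unreliable set from contributing a misleading median to the aggregate charts.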
