Averages suck, why compare to average? #8

Closed
igrigorik opened this issue Nov 5, 2014 · 10 comments
Comments

@igrigorik

File size follows a long-tail distribution, so using the average is at best meaningless and at worst completely misleading, since a single outlier can skew the value. At a minimum, we should be using medians.

Even better, we should be capturing quantiles. For example, if you're in the 90% quantile, your site is performing worse than 90% of other sites when it comes to image weight; if you're in the 10% quantile, your site is in the top 10% for image size.
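
To make the point concrete, here is a minimal sketch with made-up numbers (the `weights_kb` sample and the `percent_lighter` helper are illustrative assumptions, not HTTP Archive data). It shows how one outlier drags the mean while the median barely moves, and how a quantile-style rank can be read off a sample:

```python
from statistics import mean, median

# Hypothetical image weights in KB for ten sites; the last one is an outlier.
weights_kb = [40, 60, 90, 120, 150, 200, 260, 330, 450, 9000]


def percent_lighter(sample, value):
    """Percentage of sites in `sample` that ship fewer image bytes than `value`."""
    below = sum(1 for w in sample if w < value)
    return 100.0 * below / len(sample)


print(mean(weights_kb))                  # pulled far upward by the 9000 KB site
print(median(weights_kb))                # stays close to the typical site
print(percent_lighter(weights_kb, 400))  # share of sites lighter than 400 KB
```

In this sample the mean sits above nine of the ten sites, which is exactly the kind of comparison that misleads.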

@addyosmani
Owner

Thanks Ilya. I agree there have to be better metrics available to compare against - the median is a good first step in moving away from the average. I'm happy to dive into the quantile approach you suggested too.

On the technical side, there appear to be a few existing modules for parsing data from BigQuery, so in theory it should be fairly simple to get the median in place. I may have some follow-up questions on reasoning about quantiles for individual user queries.

@igrigorik
Author

@addyosmani the numbers don't change that much, so I think we can just import and version a CSV with the results. Here's a quick query:

SELECT * FROM
  (SELECT 'desktop' type,
    round(avg(round(bytesImg/1024))) average,
    NTH(10, quantiles(round(bytesImg/1024))) p10,
    NTH(20, quantiles(round(bytesImg/1024))) p25,
    NTH(30, quantiles(round(bytesImg/1024))) p30,
    NTH(50, quantiles(round(bytesImg/1024))) p40,
    NTH(60, quantiles(round(bytesImg/1024))) p60,
    NTH(70, quantiles(round(bytesImg/1024))) p70,
    NTH(75, quantiles(round(bytesImg/1024))) p75,
    NTH(80, quantiles(round(bytesImg/1024))) p80,
    NTH(85, quantiles(round(bytesImg/1024))) p85,
    NTH(90, quantiles(round(bytesImg/1024))) p90,
    NTH(95, quantiles(round(bytesImg/1024))) p95
  FROM [httparchive:runs.2014_10_15_pages]),
  (SELECT 'mobile' type,
    round(avg(round(bytesImg/1024))) average,
    NTH(10, quantiles(round(bytesImg/1024))) p10,
    NTH(20, quantiles(round(bytesImg/1024))) p25,
    NTH(30, quantiles(round(bytesImg/1024))) p30,
    NTH(50, quantiles(round(bytesImg/1024))) p40,
    NTH(60, quantiles(round(bytesImg/1024))) p60,
    NTH(70, quantiles(round(bytesImg/1024))) p70,
    NTH(75, quantiles(round(bytesImg/1024))) p75,
    NTH(80, quantiles(round(bytesImg/1024))) p80,
    NTH(85, quantiles(round(bytesImg/1024))) p85,
    NTH(90, quantiles(round(bytesImg/1024))) p90,
    NTH(95, quantiles(round(bytesImg/1024))) p95
  FROM [httparchive:runs.2014_10_15_pages_mobile]);

type,average,p10,p25,p30,p40,p60,p70,p75,p80,p85,p90,p95
desktop,1207.0,47.0,144.0,269.0,614.0,853.0,1189.0,1411.0,1706.0,2112.0,2762.0,4073.0
mobile,660.0,17.0,59.0,114.0,302.0,439.0,639.0,816.0,994.0,1232.0,1639.0,2513.0
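
A rough sketch of how a versioned CSV like this could be consumed, assuming it is simply embedded or read as text. The function names `load_thresholds` and `highest_threshold_exceeded` are hypothetical, and the column names are used exactly as posted above:

```python
import csv
import io

# Results exactly as posted in the thread (values in KB).
RESULTS_CSV = """\
type,average,p10,p25,p30,p40,p60,p70,p75,p80,p85,p90,p95
desktop,1207.0,47.0,144.0,269.0,614.0,853.0,1189.0,1411.0,1706.0,2112.0,2762.0,4073.0
mobile,660.0,17.0,59.0,114.0,302.0,439.0,639.0,816.0,994.0,1232.0,1639.0,2513.0
"""


def load_thresholds(text):
    """Return {type: {column: KB}}, keeping only the pNN columns."""
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        table[row["type"]] = {
            name: float(value)
            for name, value in row.items()
            if name.startswith("p")
        }
    return table


def highest_threshold_exceeded(thresholds, weight_kb):
    """Name of the largest-valued column that `weight_kb` is above, or None."""
    exceeded = [name for name, kb in thresholds.items() if weight_kb > kb]
    if not exceeded:
        return None
    return max(exceeded, key=lambda name: thresholds[name])


table = load_thresholds(RESULTS_CSV)
weight_kb = 3.31 * 1024  # a 3.31MB page, expressed in KB
print(highest_threshold_exceeded(table["desktop"], weight_kb))
print(highest_threshold_exceeded(table["mobile"], weight_kb))
```

Because the lookup only reads whatever columns are present, refreshing the CSV from a newer HTTP Archive run would not require code changes.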

@addyosmani
Owner

That approach works for me. I've signed up for BigQuery and am happy to check in and update the CSV periodically. Will let you know once we've updated to support medians. Shouldn't take long.

@addyosmani
Owner

Implemented in branch. How do you feel about this for the output?

@igrigorik
Author

I think you're still using the average in your calculation, and I just realized that I omitted p50 in the above query - doh.

What's your thinking behind asking the site author to get to a median? Seems like we should, instead, ask them to make it as small as possible for their use case: there's no reason to stop at p50, and sometimes (due to business requirements, etc.) reaching it may not be possible either. I'd suggest something like...

  • Your image weight: 3.31MB
    • Desktop: 90th percentile (median site: 700KB)
    • Mobile: 95th percentile (median site: 350KB)
  • needs some call to action at the end...

@addyosmani
Owner

It's my turn to doh! :) I completely misread the field heading. You're right - we're still reading the average here.

> What's your thinking behind asking the site author to get to a median?

The thought here was that by quantifying their image weight against how well or badly the rest of the web is doing, site authors would have some incentive to trim the overall size of their images. Percentiles make more sense to use than averages, however.

> I'd suggest something like...

I'll get a version done that works as suggested. The revised query should be:

SELECT * FROM
  (SELECT 'desktop' type,
    round(avg(round(bytesImg/1024))) average,
    NTH(10, quantiles(round(bytesImg/1024))) p10,
    NTH(25, quantiles(round(bytesImg/1024))) p25,
    NTH(30, quantiles(round(bytesImg/1024))) p30,
    NTH(40, quantiles(round(bytesImg/1024))) p40,
    NTH(50, quantiles(round(bytesImg/1024))) p50,
    NTH(60, quantiles(round(bytesImg/1024))) p60,
    NTH(70, quantiles(round(bytesImg/1024))) p70,
    NTH(75, quantiles(round(bytesImg/1024))) p75,
    NTH(80, quantiles(round(bytesImg/1024))) p80,
    NTH(85, quantiles(round(bytesImg/1024))) p85,
    NTH(90, quantiles(round(bytesImg/1024))) p90,
    NTH(95, quantiles(round(bytesImg/1024))) p95
  FROM [httparchive:runs.2014_10_15_pages]),
  (SELECT 'mobile' type,
    round(avg(round(bytesImg/1024))) average,
    NTH(10, quantiles(round(bytesImg/1024))) p10,
    NTH(20, quantiles(round(bytesImg/1024))) p25,
    NTH(30, quantiles(round(bytesImg/1024))) p30,
    NTH(40, quantiles(round(bytesImg/1024))) p40,
    NTH(50, quantiles(round(bytesImg/1024))) p50,
    NTH(60, quantiles(round(bytesImg/1024))) p60,
    NTH(70, quantiles(round(bytesImg/1024))) p70,
    NTH(75, quantiles(round(bytesImg/1024))) p75,
    NTH(80, quantiles(round(bytesImg/1024))) p80,
    NTH(85, quantiles(round(bytesImg/1024))) p85,
    NTH(90, quantiles(round(bytesImg/1024))) p90,
    NTH(95, quantiles(round(bytesImg/1024))) p95
  FROM [httparchive:runs.2014_10_15_pages_mobile]);

Does that look right?

@igrigorik
Author

Re: the query, yep. Let's drop the average column; I'm not even sure why I included it. The only thing it does is promote bad practice.
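
One possible way to keep every label tied to its `NTH` index and to leave the average column out is to generate the column list rather than hand-edit it. This is only a sketch: the helper names `percentile_columns` and `subquery` are made up for illustration, and the SQL it emits simply follows the pattern of the query in this thread.

```python
PERCENTILES = [10, 25, 30, 40, 50, 60, 70, 75, 80, 85, 90, 95]
EXPR = "round(bytesImg/1024)"  # image bytes in KB, as in the thread's query


def percentile_columns(percentiles, expr=EXPR):
    """One NTH(...) column per percentile; index and label always agree."""
    return ",\n".join(
        f"    NTH({p}, quantiles({expr})) p{p}" for p in percentiles
    )


def subquery(label, table, percentiles=PERCENTILES):
    """Build one parenthesized SELECT for a single HTTP Archive table."""
    return (
        f"  (SELECT '{label}' type,\n"
        f"{percentile_columns(percentiles)}\n"
        f"  FROM [{table}])"
    )


query = "SELECT * FROM\n" + ",\n".join([
    subquery("desktop", "httparchive:runs.2014_10_15_pages"),
    subquery("mobile", "httparchive:runs.2014_10_15_pages_mobile"),
]) + ";"

print(query)
```

Adding or removing a percentile then means editing one list instead of two blocks of near-identical lines.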

> The thought here was that by quantifying their image weight against how well or badly the rest of the web is doing, site authors would have some incentive to trim the overall size of their images. Percentiles make more sense to use than averages, however.

For the call to action, how about...

  • Your image weight: 3.31MB
    • Your site delivers more image bytes than 90% of desktop sites:
      • +2.3MB compared to a site in 75th percentile
      • +2.7MB compared to a site in 50th percentile
      • +3.1MB compared to a site in 25th percentile
    • Your site delivers more image bytes than 95% of mobile sites:
      • ...
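
The output proposed above could be produced along these lines. This is a hedged sketch: `describe` is a hypothetical helper, and the threshold values are placeholders chosen only so the deltas match the illustrative figures in the list, not measured percentiles.

```python
def describe(weight_mb, thresholds_mb):
    """Format a page's image weight and its distance from each percentile.

    `thresholds_mb` maps a percentile (e.g. 75) to that percentile's
    image weight in MB.
    """
    lines = [f"Your image weight: {weight_mb:.2f}MB"]
    for pct in sorted(thresholds_mb, reverse=True):
        delta = weight_mb - thresholds_mb[pct]
        lines.append(f"  {delta:+.1f}MB compared to a site in {pct}th percentile")
    return lines


# Placeholder thresholds in MB (assumed values, for illustration only).
example_thresholds = {75: 1.0, 50: 0.6, 25: 0.2}

for line in describe(3.31, example_thresholds):
    print(line)
```

The signed format (`+.1f`) also covers pages that are lighter than a given percentile, where the delta comes out negative.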

@addyosmani
Owner

Current progress:

Next: Your site delivers more image bytes than X% of mobile sites:

@igrigorik
Author

image

@addyosmani
Owner

Support for this has landed in master and should be in the next release.
