Averages suck, why compare to average? #8
Comments
Thanks Ilya. I agree there have to be better metrics to compare against, and the median is a good first step away from the average. I'm happy to dive into the quantile approach you suggested too. On the technical side, there appear to be a few existing modules for parsing data from BigQuery, so in theory it should be fairly trivial to get the median in place. I may have some follow-up questions on reasoning about quantiles for individual user queries. |
@addyosmani the numbers don't change that much, so I think we can just import and version a CSV with the results. Here's a quick query:
SELECT * FROM
(SELECT 'desktop' type,
round(avg(round(bytesImg/1024))) average,
NTH(10, quantiles(round(bytesImg/1024))) p10,
NTH(25, quantiles(round(bytesImg/1024))) p25,
NTH(30, quantiles(round(bytesImg/1024))) p30,
NTH(40, quantiles(round(bytesImg/1024))) p40,
NTH(60, quantiles(round(bytesImg/1024))) p60,
NTH(70, quantiles(round(bytesImg/1024))) p70,
NTH(75, quantiles(round(bytesImg/1024))) p75,
NTH(80, quantiles(round(bytesImg/1024))) p80,
NTH(85, quantiles(round(bytesImg/1024))) p85,
NTH(90, quantiles(round(bytesImg/1024))) p90,
NTH(95, quantiles(round(bytesImg/1024))) p95
FROM [httparchive:runs.2014_10_15_pages]),
(SELECT 'mobile' type,
round(avg(round(bytesImg/1024))) average,
NTH(10, quantiles(round(bytesImg/1024))) p10,
NTH(25, quantiles(round(bytesImg/1024))) p25,
NTH(30, quantiles(round(bytesImg/1024))) p30,
NTH(40, quantiles(round(bytesImg/1024))) p40,
NTH(60, quantiles(round(bytesImg/1024))) p60,
NTH(70, quantiles(round(bytesImg/1024))) p70,
NTH(75, quantiles(round(bytesImg/1024))) p75,
NTH(80, quantiles(round(bytesImg/1024))) p80,
NTH(85, quantiles(round(bytesImg/1024))) p85,
NTH(90, quantiles(round(bytesImg/1024))) p90,
NTH(95, quantiles(round(bytesImg/1024))) p95
FROM [httparchive:runs.2014_10_15_pages_mobile]);
|
That approach works for me. I've signed up for BigQuery and am happy to check in and update the CSV periodically. Will let you know once we've updated to support medians. Shouldn't take long. |
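For anyone following along, here is a rough sketch of what consuming such a versioned CSV might look like. The column names mirror the query aliases above, but the file layout, the `load_percentiles` helper, and every number are assumptions made up for illustration; they are not real HTTP Archive results and not the project's actual code.

```python
import csv
import io

# Hypothetical CSV export; the numbers are placeholders, not real query results.
CSV_TEXT = """type,p10,p25,p75,p90
desktop,50,200,1200,2500
mobile,20,100,700,1500
"""

def load_percentiles(text):
    """Map each row's type to a dict of {column name: KB as int}."""
    rows = csv.DictReader(io.StringIO(text))
    return {
        row["type"]: {k: int(v) for k, v in row.items() if k != "type"}
        for row in rows
    }

table = load_percentiles(CSV_TEXT)
print(table["mobile"]["p75"])
```

Keeping the data in a checked-in file like this means the tool needs no BigQuery access at run time; only the periodic refresh does.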
I think you're still using the average in your calculation, and I just realized that I omitted p50 in the above query - doh. What's your thinking behind asking the site author to get to a median? Seems like we should, instead, ask them to make it as small as possible to meet their use case: there's no reason to stop at p50, and sometimes (due to business requirements, etc.) it may not be possible either. I'd suggest something like...
|
It's my turn to doh! :) I completely misread the field heading. You're right - we're still reading the average here.
The thought here was that if site authors can quantify their image weight relative to how well or badly the rest of the web is doing, they have some incentive to trim the overall size of their images. Percentiles make more sense to use than averages, however.
I'll get a version done that works as suggested. The revised query should be:
Does that look right? |
Re: query: yep. Let's drop the average column. I'm not even sure why I included it; the only thing it does is promote bad practice.
For the call to action, how about...
|
Support for this has landed in master and should be in the next release. |
File size follows a long-tail distribution, so using the average is at best meaningless and at worst completely misleading, since a single outlier can skew the value. At a minimum, we should be using medians.
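A quick way to see the skew is a small sketch like the one below. The ten page weights are invented for illustration (nine ordinary pages plus one very heavy outlier) and are not taken from HTTP Archive.

```python
from statistics import mean, median

# Hypothetical image weights in KB for ten pages; the last one is an outlier.
image_kb = [120, 180, 210, 250, 300, 340, 400, 450, 520, 9000]

avg = mean(image_kb)    # dragged far above the typical page by one outlier
mid = median(image_kb)  # stays in the middle of the pack

print(avg, mid)
```

Here nine of the ten pages sit below the average, so telling an author they are "below average" says very little about where they actually stand.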
Even better, we should be capturing quantiles. For example: you're in the 90th percentile, so your site is performing worse than 90% of other sites when it comes to image weight, versus you're in the 10th percentile, so your site is in the top 10% for image size.
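As a minimal sketch of how a tool could place a site against such quantiles, assuming it already has a sorted list of (percentile, KB) thresholds: the `THRESHOLDS` values and the `percentile_bucket` helper below are hypothetical and only illustrate the lookup.

```python
from bisect import bisect_left

# Hypothetical percentile thresholds in KB (not real HTTP Archive numbers).
THRESHOLDS = [(10, 50), (25, 200), (50, 600), (75, 1200), (90, 2500)]

def percentile_bucket(image_kb, thresholds=THRESHOLDS):
    """Return the smallest listed percentile whose threshold is >= image_kb.

    Returns None when the page is heavier than every listed threshold.
    """
    values = [kb for _, kb in thresholds]
    i = bisect_left(values, image_kb)
    return thresholds[i][0] if i < len(thresholds) else None

print(percentile_bucket(150))   # falls at or below the p25 threshold
print(percentile_bucket(3000))  # heavier than the p90 threshold
```

A page of 150 KB lands in the p25 bucket here, while a 3000 KB page exceeds every listed threshold and gets None, which a caller could report as "heavier than 90% of sites".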