usage_global data for Firefox is implausible #6733

Open
foolip opened this issue Jun 3, 2023 · 14 comments

Comments

@foolip
Contributor

foolip commented Jun 3, 2023

This is the usage_global data from https://raw.githubusercontent.com/Fyrd/caniuse/cc3ccd6ca9c69dd811d2952d873abcd0750e34a2/fulldata-json/data-2.0.json:

Firefox usage by version
version usage
2 0.004118
3 0.004271
3.5 0.008786
3.6 0.00487
4 0.011703
5 0.004879
6 0.020136
7 0.005725
8 0.004525
9 0.00533
10 0.004283
11 0.008882
12 0.004471
13 0.004486
14 0.00453
15 0.008322
16 0.004417
17 0.004425
18 0.004161
19 0.004443
20 0.004283
21 0.008322
22 0.013698
23 0.004161
24 0.008786
25 0.004118
26 0.004317
27 0.004393
28 0.004418
29 0.008834
30 0.008322
31 0.008928
32 0.004471
33 0.009284
34 0.004707
35 0.009076
36 0.004081
37 0.004783
38 0.003929
39 0.004783
40 0.00487
41 0.005029
42 0.0047
43 0.022205
44 0.004441
45 0.003867
46 0.004525
47 0.004293
48 0.004081
49 0.004538
50 0.008282
51 0.011601
52 0.039969
53 0.011601
54 0.004441
55 0.004441
56 0.004441
57 0.011601
58 0.003939
59 0.004441
60 0.003929
61 0.004356
62 0.004425
63 0.008322
64 0.00415
65 0.004267
66 0.003801
67 0.004267
68 0.004081
69 0.00415
70 0.004293
71 0.004425
72 0.013323
73 0.00415
74 0.00415
75 0.004318
76 0.004356
77 0.003974
78 0.031087
79 0.004081
80 0.004081
81 0.004081
82 0.003861
83 0.004441
84 0.003929
85 0.004268
86 0.003801
87 0.008882
88 0.004441
89 0.003943
90 0.003943
91 0.008882
92 0.003801
93 0.007722
94 0.017764
95 0.003773
96 0.007886
97 0.003901
98 0.003901
99 0.004081
100 0.003861
101 0.004081
102 0.097702
103 0.017764
104 0.004441
105 0.008882
106 0.008882
107 0.008882
108 0.013323
109 0.022205
110 0.048851
111 1.00367
112 0.905964
113 0.008882
114 0
115 0

There is something quite implausible about these numbers. The sum is ~2.8%, but versions 2-71 together make up ~0.48%. Firefox 72 was released on 2020-01-07, so that would mean that ~17% of Firefox users are on a version more than 3.5 years old.

It also seems suspect that usage is relatively evenly spread among all those old versions.
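The arithmetic behind those figures can be reproduced with a short script. This is only a sketch: the `data["agents"]["firefox"]["usage_global"]` path is an assumption about the layout of `data-2.0.json`, the helper names are made up for illustration, and `float()` parsing only works for simple version strings like the Firefox ones listed above.

```python
import json
from pathlib import Path


def share_older_than(usage: dict[str, float], cutoff: str) -> tuple[float, float, float]:
    """Return (total, usage below cutoff, that usage as a fraction of total)."""
    total = sum(usage.values())
    # float() handles versions such as "3.5" and "72"; ranges would need more care.
    old = sum(v for ver, v in usage.items() if float(ver) < float(cutoff))
    return total, old, (old / total if total else 0.0)


def firefox_usage(path: str) -> dict[str, float]:
    """Load Firefox usage from a local copy of data-2.0.json (assumed layout)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return data["agents"]["firefox"]["usage_global"]


# A few rows copied from the table above, for illustration only.
sample = {"3.5": 0.008786, "52": 0.039969, "72": 0.013323, "111": 1.00367, "112": 0.905964}
total, old, fraction = share_older_than(sample, "72")
print(f"total={total:.4f}% older-than-72={old:.4f}% share={fraction:.1%}")
```

Running `share_older_than(firefox_usage("data-2.0.json"), "72")` against a local copy of the file would give the full-table numbers.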

@Fyrd can you say anything about the source of this data and what might be going on here? Might it also affect the numbers for other browsers?

@foolip
Contributor Author

foolip commented Jun 3, 2023

As for the overall usage, ~2.8% is quite different from the ~5.1% on https://radar.cloudflare.com/adoption-and-usage. But for the analysis I'm doing it's the version breakdown that matters most. I'm sure there are problems with the overall browser breakdown as well, but that's not what this issue is about.

@Fyrd
Owner

Fyrd commented Jun 3, 2023

I looked into this and found the problem: the importer script wasn't resetting values before importing data for a new month. As a result, whenever StatCounter stopped reporting data for a version, the last amount it had reported was still being used.

So it would usually be a small amount but those certainly add up. I've fixed the script to reset all amounts to 0 before importing now which should fix the issue.
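The importer script itself isn't shown in this thread, so the following is only a guess at the shape of the bug: a hypothetical `import_month` function that merges one month of figures into the stored values, with and without the reset. The version numbers and amounts are placeholders.

```python
def import_month(stored: dict[str, float], monthly: dict[str, float], reset: bool) -> dict[str, float]:
    """Merge one month's figures into the stored usage values (hypothetical importer)."""
    if reset:
        # The fix described above: zero every known version before importing.
        merged = {version: 0.0 for version in stored}
    else:
        # The bug: versions missing from this month's data keep their old amount.
        merged = dict(stored)
    merged.update(monthly)
    return merged


previous = {"11": 0.0044, "111": 1.0}   # last month: version 11 was still reported
current = {"111": 0.9, "112": 0.9}      # this month: version 11 no longer reported

print(import_month(previous, current, reset=False))  # version 11 keeps its stale amount
print(import_month(previous, current, reset=True))   # version 11 drops to 0.0
```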

The issue was specific to the global usage data, which is processed a little differently from the regional data. The regional data shouldn't have this issue; see, for example, the standalone worldwide data file (generated like other regions) here.

Thanks for bringing this up @foolip! Let me know if you run into any other oddities like this.

@foolip
Contributor Author

foolip commented Jun 4, 2023

Thank you @Fyrd, I see the data was updated in 87ff4c1.

Did this issue affect all browsers or only Firefox?

https://gist.github.com/foolip/1f6b3538aeeb03223f78c3742f3f3e07 is the new usage. All versions now add up to ~2.36%, was that an intended change?

I also see that 0.00477 is a common usage number, and that it is the smallest one. 0.00954 also appears and is 2×0.00477. Is there some kind of quantization going on here?

These small numbers still add up to make the results a bit implausible. These usage numbers suggest that you need to support Firefox 78 and later to support >95% of Firefox users, because 4% of users are on versions 11+36+43+44+52+56+72.

Is the underlying data for this, at least current if not historical, available for inspection?

@Fyrd
Owner

Fyrd commented Jun 4, 2023

This was affecting all browsers.

The source of the data is the StatCounter usage data for May, see https://gs.statcounter.com/browser-version-market-share/desktop-console/worldwide/#monthly-202305-202305-bar then "Download data (.csv)". It's multiplied by the Desktop/mobile ratio to result in the numbers you see. There might be some other tweaks but I think that's mostly it.

It seems SC rounds usage numbers to two decimal places, so 0.01 is the lowest it gets, resulting in values like 0.00477 after multiplying by the desktop/mobile ratio.
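To illustrate the quantization, here is a small sketch that expresses each value as a multiple of the smallest nonzero one and backs out the implied ratio. The 0.01 step is taken from the description above; the function name and tolerance are illustrative choices, not anything from the caniuse tooling.

```python
def multiples_of_quantum(values: list[float], tolerance: float = 0.01):
    """Express each nonzero value as a multiple of the smallest nonzero value."""
    nonzero = [v for v in values if v > 0]
    quantum = min(nonzero)
    ratios = [v / quantum for v in nonzero]
    # True when every value sits (almost) exactly on a multiple of the quantum.
    all_integer = all(abs(r - round(r)) <= tolerance for r in ratios)
    return quantum, [round(r) for r in ratios], all_integer


# The two values mentioned in this thread.
quantum, steps, quantized = multiples_of_quantum([0.00477, 0.00954])

# If the underlying step is 0.01, the implied desktop/mobile ratio is about 0.477.
implied_ratio = quantum / 0.01
print(quantum, steps, quantized, round(implied_ratio, 3))
```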

@foolip
Contributor Author

foolip commented Jun 5, 2023

It seems SC rounds usage numbers to two decimal places, so 0.01 is the lowest it gets, resulting in values like 0.00477 after multiplying by the desktop/mobile ratio.

Ah, this is interesting. Do you know if it's rounding, or if it might be flooring/truncating? If it's rounding, then 0.01 means the real usage is in 0.005≤x<0.015. If it's flooring, it means the real usage is 0.01≤x<0.02.
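The two cases can be written out as a small helper. This is a sketch of the interval reasoning only; which mode StatCounter actually uses isn't established in this thread, and the `step` of 0.01 is the granularity described above.

```python
def true_value_bounds(reported: float, step: float = 0.01, mode: str = "round") -> tuple[float, float]:
    """Return the half-open interval [low, high) of real values that map to `reported`."""
    if mode == "round":
        # Round-to-nearest: the real value lies within half a step either side.
        return max(reported - step / 2, 0.0), reported + step / 2
    if mode == "floor":
        # Flooring/truncating: the real value is at least the reported one.
        return reported, reported + step
    raise ValueError(f"unknown mode: {mode!r}")


print(true_value_bounds(0.01, mode="round"))
print(true_value_bounds(0.01, mode="floor"))
```

Under flooring every reported figure is a lower bound, so the per-version sum could only ever undercount; under rounding the errors could go either way.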

Even without knowing the answer to this, perhaps it would be possible to scale the numbers so that they add up to the overall usage of that browser claimed by SC?

@foolip
Contributor Author

foolip commented Jun 5, 2023

OK, so I've tried it myself. I downloaded the CSV data from these two pages:
https://gs.statcounter.com/browser-market-share/desktop-console/worldwide#monthly-202305-202305-bar
https://gs.statcounter.com/browser-version-market-share/desktop-console/worldwide/#monthly-202305-202305-bar

The numbers in the CSV files don't match exactly what's shown on the pages, but I've compared the percentage from the first CSV file against the sum of versions in the second. The latter is smaller in the cases I've checked:

  • Chrome is 65.11% vs. 64.96%
  • Edge is 10.95% vs. 10.84%
  • Firefox is 5.23% vs. 4.93%
  • Safari is 12.7% vs. 12.62%

So it seems like it will always undercount. As long as one is aware of this that's fine. @Fyrd are the overall numbers also in fulldata-json/data-2.0.json so that one can take it into account when running an analysis?
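For reference, the gaps in the list above, plus the proportional rescaling floated earlier in the thread, can be sketched as follows. The percentages are the ones quoted in this comment; `rescale` is a hypothetical helper, and scaling every version by the same factor assumes the missing share is spread like the reported share, which may not hold.

```python
overall = {"Chrome": 65.11, "Edge": 10.95, "Firefox": 5.23, "Safari": 12.7}
version_sum = {"Chrome": 64.96, "Edge": 10.84, "Firefox": 4.93, "Safari": 12.62}


def undercount(overall: dict[str, float], version_sum: dict[str, float]) -> dict[str, float]:
    """Percentage points missing from the per-version sum, per browser."""
    return {name: round(overall[name] - version_sum[name], 2) for name in overall}


def rescale(versions: dict[str, float], target_total: float) -> dict[str, float]:
    """Scale per-version usage so that it sums to target_total."""
    current = sum(versions.values())
    if current == 0:
        return dict(versions)
    factor = target_total / current
    return {version: usage * factor for version, usage in versions.items()}


print(undercount(overall, version_sum))
```

For Firefox the factor would be 5.23 / 4.93, a bit over 6% added to each version.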

@Fyrd
Owner

Fyrd commented Jun 6, 2023

So it seems like it will always undercount. As long as one is aware of this that's fine. @Fyrd are the overall numbers also in fulldata-json/data-2.0.json so that one can take it into account when running an analysis?

Interesting! They are not, I hadn't ever dug quite this deep into the data to notice this discrepancy.

@foolip
Contributor Author

foolip commented Jun 6, 2023

Is this the data used when computing the global availability shown on caniuse.com? If so, then fixing this somehow should move the numbers closer to 100%.

@atjn
Contributor

atjn commented Jun 6, 2023

You can use DNSSEC as an easy test for this. All browsers have partial support, so the number should be 100%, but it is currently 98.32%

@foolip
Contributor Author

foolip commented Jun 7, 2023

Yep, looks like the sum of all of the usage_global numbers is 98.362%, as of commit fbcdca0. Before 87ff4c1 it was 99.987%, so quite close to 100%. (But was that just luck, and due to the bug just fixed?)

I actually wouldn't expect these numbers to add up to 100% since there is no "other" category, but I also don't know if 98.362% is the right number.
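A sum like this can be computed along the following lines. It is a sketch that assumes each entry under `data["agents"]` carries a `usage_global` mapping of version to percentage; the tiny dictionary below is placeholder data, not real figures.

```python
def total_usage_global(data: dict) -> float:
    """Sum usage_global over every browser and version (assumed data-2.0.json layout)."""
    return sum(
        usage
        for agent in data["agents"].values()
        for usage in agent.get("usage_global", {}).values()
    )


# Placeholder data in the assumed shape, for illustration only.
example = {
    "agents": {
        "firefox": {"usage_global": {"111": 1.0, "112": 0.5}},
        "chrome": {"usage_global": {"113": 30.0}},
    }
}
print(total_usage_global(example))
```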

@Fyrd
Owner

Fyrd commented Jun 9, 2023

Is this the data used when computing the global availability shown on caniuse.com? If so, then fixing this somehow should move the numbers closer to 100%.

Yes, I'll definitely look into doing so.

@jensimmons
Contributor

Screenshot 2023-12-29 at 10 30 15 AM

It's problematic that something that's universally supported says "98.21%" support.

This makes developers believe most technology isn't supported enough yet to use. I'm sure there are people thinking "If CSS Grid is only supported by 97% of browsers, gosh I probably should not use it unless I have a fallback plan. 3% of users is a lot of users on this very popular site." When the truth is 99% of users have Grid.

I'd love for this bug to be prioritized. It impacts the purpose of Can I Use.

@andersk

andersk commented Jan 5, 2024

In caniuse-db 1.0.30001574, the US usage of Firefox 11 (released in 2012) has jumped to 0.20214%, which is pretty implausible and surprising for those using a "> 0.2% in US" browserslist query.
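To show why that matters, here is a rough sketch of what a usage-threshold query amounts to: keep every version whose regional share exceeds the threshold. This is not browserslist's implementation, and apart from the 0.20214 figure for Firefox 11 the numbers below are made-up placeholders.

```python
def versions_over(usage: dict[str, float], threshold: float) -> list[str]:
    """Versions whose usage share is strictly above the threshold, oldest first."""
    selected = (version for version, share in usage.items() if share > threshold)
    return sorted(selected, key=float)


# Only the Firefox 11 value comes from this thread; the rest are placeholders.
us_firefox = {"11": 0.20214, "115": 0.15, "120": 0.9}
print(versions_over(us_firefox, 0.2))
```

With the anomalous figure, Firefox 11 ends up in the selected set alongside current releases.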

@Fyrd
Owner

Fyrd commented Jan 5, 2024

@andersk Can confirm this is coming from the StatCounter data (download CSV for the full data). Occasionally there are some strange anomalies. You can use their feedback form to inquire further; I can update the caniuse data if they make a change.

5 participants