Hypothetical feature analysis #3

Open

dfabulich opened this issue May 30, 2023 · 13 comments · May be fixed by #5
@dfabulich
Owner

@foolip wrote in web-platform-dx/web-features#174 (comment)

I'm wondering if, instead of computing a time to availability for each feature, it's possible to compute the time to 90/95/98% availability at different points in time for a hypothetical feature that is enabled at the same time in all browsers. This is the worst case for a new feature, because any time a feature lands at different times, there's a "head start" for the early browsers by the time it lands in the final browser and we start counting.

In other words, for every month of the past 5 years, assuming a feature is available in every subsequent browser release, how many months does it take before that hypothetical feature is available to 90/95/98% of users? Both a per-browser number of months and an overall number would be interesting.

My expectation is that per-browser 95% answers will roughly match web-platform-dx/web-features#190 (comment), and that the overall 95% availability delay in months will be very sensitive to the precise inputs for the browsers with the slowest upgrade rates, both what its market share is and how fast users upgrade.

@foolip
Contributor

foolip commented May 30, 2023

Given the issue title, I'm not sure I've successfully communicated the idea. I'm not looking to identify the slowest browser, but to estimate (using browser+version data from statcounter/rumarchive/something) how long it will take a new feature released in all browsers to reach a certain level of availability. Additionally, I suspect that number has been decreasing in recent years, which is why I suggest computing it for every month over the past 5 years, assuming the required historical data exists.

@dfabulich
Owner Author

Yeah, I still don't get what you're going for, but I just published a new cohort analysis on https://github.com/dfabulich/baseline-calculator/ showing how the numbers have changed over time.

Focusing just on the "80% of features" column, here are the results for all cohort years:

| Year | 80% share | 90% share | 95% share | 97% share | 98% share | 99% share |
|------|-----------|-----------|-----------|-----------|-----------|-----------|
| 2015 | 19 months | 42 months | 54 months | 83 months | never | never |
| 2016 | 19 months | 36 months | 48 months | never | never | never |
| 2017 | 10 months | 24 months | 44 months | never | never | never |
| 2018 | 4 months | 16 months | 36 months | never | never | never |
| 2019 | 2 months | 14 months | 40 months | never | never | never |
| 2020 | 0 months | 8 months | 30 months | never | never | never |
| 2021 | 2 months | 10 months | never | never | never | never |
| 2022 | 2 months | never | never | never | never | never |
| 2023 | never | never | never | never | never | never |

Does this cohort analysis capture the question you were asking? In particular, it confirms your suspicion that the number has been decreasing in past years.

(If you're happy, LMK and I'll close the issue. If you want more analysis, I think it would help if you explained an example using fake data.)

@foolip
Contributor

foolip commented May 31, 2023

Does the cohort analysis take all of the features for a given year and treat them as a group? If so that might approximate what I'm looking for, but it seems fundamentally like an unnecessary indirection to consider real features when trying to answer the question of how long it takes new features to reach a certain threshold in the worst case.

I'm also curious what the starting point is for the measurement: is it when the feature is first available in any browser, or when it's first available in all browsers?

@dfabulich
Owner Author

Does the cohort analysis take all of the features for a given year and treat them as a group?

Yes.

I'm also curious what the starting point is for the measurement, is it when the feature is first available in any browser, or when it's first available in all browsers?

It's the date the feature is first available in all major browsers, which I define as the "keystone release date" in the README. It's the release date of the browser version of the last major browser to support the feature.

For example, the CSS revert value feature was first available in Safari 9.1 in 2016, then in Firefox 67 in 2019, then in Chrome 84 on July 14, 2020, then in Edge 84 on July 16, 2020.

So, in this example, Edge was the "keystone browser" for the feature; Edge 84 was the "keystone version," and the "keystone release date" was July 16, 2020.
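
In code, that just amounts to taking the latest of the per-browser first-support release dates. A minimal sketch (dates approximate and hard-coded just for this one example):

```js
// First-support release dates for css-revert-value, per browser
// (approximate, hard-coded for illustration).
const firstSupport = {
  safari: "2016-03-21",  // Safari 9.1
  firefox: "2019-05-21", // Firefox 67
  chrome: "2020-07-14",  // Chrome 84
  edge: "2020-07-16",    // Edge 84
};

// The keystone release date is the latest of these dates.
const keystoneDate = Object.values(firstSupport).sort().pop();
console.log(keystoneDate); // "2020-07-16" — Edge 84 is the keystone version
```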

More broadly, the premise of what I'm doing is a survival analysis. https://statsandr.com/blog/what-is-survival-analysis/

Survival analysis comes from the field of medicine, where we make a line graph (a "survival function") that answers the question: "given a group of patients who have been diagnosed with cancer on various dates, what percentage of them are alive N months after their diagnosis?"

In a survival-function graph, the X axis is a time duration (T); the Y axis is "percentage alive." At T=0, Y=100%, which is to say, each patient is alive on the day they were diagnosed. At T=1 month, X% of them are alive. At T=2 months, Y% of them are alive. The analysis might end after T=12 months, and by the end, Z% are alive. The final Z% is usually not 0%; some people manage to beat the cancer (at least for the duration of the study).

Note that T=0 is no particular date. The patients may have been diagnosed with cancer throughout the year. But, in a survival analysis, we line them all up, so T=0 for Patient 1 might be Jan 3, T=0 for Patient 2 might be Feb 19. We can then look at T=1 for Patient 1 (on Feb 3) and T=1 for Patient 2 (Mar 19).

[image: example survival function line graph]
Using this line graph, we can ask questions like: "How long does it take before Q% of the patients die?"

Translating that into our problem, we have to identify a start date (an equivalent to "diagnosis day") for each feature, and an equivalent to the date of death. I picked the "keystone release date" as the start date. For the date of "death," we're measuring the duration until a good event, the date that the feature achieves a certain threshold of market share.

Of course we can argue about whether the threshold should be 90% market share or 95% market share, so I computed six survival functions, for 80% market share, 90% market share, 95% market share, 97% market share, 98% market share, and 99% market share.

With those six survival functions, I could ask, "how long after the keystone release date does it take for 80% of features to achieve 95% market share?"
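
To make that concrete, here's a rough sketch of that computation in JavaScript, using made-up rows; the field names (id, keystoneDate, reachedThreshold) are illustrative, not the actual structure baseline-calculator uses:

```js
// Made-up feature rows: keystone release date plus a nullable date when
// the feature first reached the chosen market-share threshold.
const features = [
  { id: "feature-x", keystoneDate: "2020-07-16", reachedThreshold: "2021-06-05" },
  { id: "feature-y", keystoneDate: "2021-11-01", reachedThreshold: null },
  { id: "feature-z", keystoneDate: "2019-01-15", reachedThreshold: "2022-11-20" },
];

const monthsBetween = (a, b) =>
  (new Date(b) - new Date(a)) / (1000 * 60 * 60 * 24 * 30.44);

// "How long after the keystone release date does it take for 80% of
// features to achieve the threshold?" Take the 80th-percentile duration;
// if fewer than 80% of features ever get there, the answer is "never".
function monthsUntilShareOfFeatures(features, shareOfFeatures) {
  const durations = features
    .filter((f) => f.reachedThreshold)
    .map((f) => monthsBetween(f.keystoneDate, f.reachedThreshold))
    .sort((a, b) => a - b);
  const index = Math.ceil(shareOfFeatures * features.length) - 1;
  return index < durations.length ? Math.round(durations[index]) : "never";
}

console.log(monthsUntilShareOfFeatures(features, 0.8));
// "never" with these rows: only 2 of the 3 fake features ever reach the threshold.
```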

If so that might approximate what I'm looking for, but it seems fundamentally like an unnecessary indirection to consider real features when trying to answer the question of how long it takes new features to reach a certain threshold in the worst case.

OK, so, it sounds like you have another analysis in mind, but I still don't understand what it is. Could you explain the analysis you'd like to run, perhaps using fake data?

Perhaps you could write something like this:

With these three fake data points:

  • Feature X which has keystone release date July 2020, and achieved 95% share in Jan 2023
  • Feature Y which has keystone release date Nov 2021, and has not yet achieved 95% share
  • Feature Z which has keystone release date Jan 2019, and achieved 95% share in Nov 2022

We would take X, Y, and Z, and blah blah blah, and so the result of the computation would be "QQ months."

But I think, in fact, your criticism might be the very idea of "keystone release date" as a start date. Instead of treating each feature as a patient and treating their diagnosis date as the keystone release date, you might prefer to treat a tuple of <Browser, Feature> as the unit of measure.

So, for CSS revert value, instead of the patient being "CSS revert value," we might have four patients:

  1. Chrome, CSS revert value, released 2020
  2. Edge, CSS revert value, released 2020
  3. Firefox, CSS revert value, released 2019
  4. Safari, CSS revert value, released 2016

And then, T=0 would be the actual release date of the feature on each browser.

But the question then is: what's the date of "death"? For my survival analysis, the date of death was "X% market share across all browsers, including IE, UC, Opera Mini, etc." Is that what you'd use, too?

CSS revert value has never reached 95% market share, but it achieved 90% market share on June 5, 2021.

What's the date of "death" for Safari/revert? Is it June 5, 2021? Is the date of death the same for all browsers? If so, aren't we, effectively, just detecting the slowest browser to upgrade?

If not, what are we measuring here, and why?

@foolip
Contributor

foolip commented Jun 1, 2023

Thanks for the detailed response, @dfabulich. To be honest, I am not spoiled with good READMEs, so I always go directly to the code to understand what something is doing, and couldn't work out what the keystone date was. But your README is excellent, and I've read it all now.

Thinking about this as a survival analysis makes sense, I think; it's roughly equivalent to looking at an S-curve of new-version adoption, or an exponential decay of old versions, and finding the 90/95/98% point.

your criticism might be the very idea of "keystone release date" as a start date

No, I think your definition is spot on, it's exactly what I hoped it was, and the README says as much :)

OK, so, it sounds like you have another analysis in mind, but I still don't understand what it is. Could you explain the analysis you'd like to run, perhaps using fake data?

Yes, the analysis I'm after is of a single fake feature that is enabled at the same time in all browser engines and is available in all subsequent browser releases. I think this is the most straightforward way to approach the question "How many months/years does it take for a feature to reach widespread availability?" Using real features with a spread of release dates and counting features in the caniuse.com database can only add noise to the analysis:

  • When release dates are far apart, the time from keystone release to 95% availability is necessarily shorter due to the "head start" for some browsers. By assuming a single date, the analysis is simpler and the answer is the worst-case delay, i.e., more conservative.
  • The features in caniuse.com are not a random sample; they're features that are more likely to be problematic, which is why they were added.

The analysis I have in mind would not need the caniuse feature data at all, only browser+version market share over time. It looks like the data in historical-feature-data.json is availability per feature, but what's needed is a breakdown in the style of web-platform-dx/web-features#190 (comment).

Given that data, we could pick a date and assume that our hypothetical feature is in every subsequent browser release, because it was enabled in all browser engines at that date. Stepping forward in time, we should eventually pass the 95% threshold.

Note that while I like your definition of keystone release date, it's not quite the same as the date I'm suggesting here. The starting point here is the date when the feature is enabled, so closer to the time it lands in the first browser than the keystone release date when it lands in the last browser. This is OK in this analysis because we're assuming it's enabled at the same time in all browser engines, and the question is how fast that change propagates in the ecosystem.
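
Roughly, the computation I have in mind would look like this sketch (the version numbers, shares, and dates here are made up or approximate, just to illustrate the shape; in practice the data would come from something like statcounter, the RUM Archive, or caniuse):

```js
// Made-up inputs: release dates per browser version, and a market-share
// snapshot per browser version as measured on one later date.
const releaseDates = {
  "chrome 80": "2020-02-04",
  "chrome 110": "2023-02-07",
  "firefox 72": "2020-01-07",
  "firefox 109": "2023-01-17",
  "safari 13.1": "2020-03-24",
  "safari 16.3": "2023-01-23",
};

const shareByVersion = {
  "chrome 80": 0.3,
  "chrome 110": 20.1,
  "firefox 72": 0.1,
  "firefox 109": 1.9,
  "safari 13.1": 0.2,
  "safari 16.3": 4.3,
  // ...plus every other tracked version
};

// Availability of a hypothetical feature enabled in all engines on
// enabledDate: the share held by versions released on or after that date,
// as a fraction of the total share of all tracked versions.
function availability(enabledDate, shareByVersion, releaseDates) {
  let supported = 0;
  let total = 0;
  for (const [version, share] of Object.entries(shareByVersion)) {
    total += share;
    if (releaseDates[version] >= enabledDate) supported += share;
  }
  return (100 * supported) / total;
}

// Step forward through monthly share snapshots and report the first month
// where availability(enabledDate, snapshot, releaseDates) crosses 95%.
```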

Does that make sense?

@dfabulich changed the title from "Slowest browser analysis" to "Hypothetical feature analysis" on Jun 1, 2023
@dfabulich
Owner Author

We talked for a while on chat, and the main question I still have emerged during that conversation in the following form.

In order to begin a survival analysis, there has to be a table of data, where each row has an ID, a start date, and a nullable end date.

I don't yet understand how, in a "hypothetical feature" analysis, I'd construct a table like that. What would the IDs be? What would the start and end dates be?

We've toyed around with a couple ideas:

  • The IDs could be a tuple of <Browser, Feature>, e.g. "Safari / focus-visible." The start date would be the release date of the first Safari version to support focus visible. But what would be the end date?
  • The IDs could be browser versions, e.g. "Safari 9.1." The start date would be the release date of the version, and the end date would be the "death" of that version. But when is that? When do browser versions "die"?

If you were able to give me an answer of that form, I'd know exactly what to do with it!

@dfabulich
Owner Author

For your convenience, I've pushed https://github.com/dfabulich/baseline-calculator/blob/main/historical-browser-data.json which just tracks the global market share for each browser version for the entirety of the git history of caniuse/data.json, without tracking the history of any particular feature.

In chat, you suggested that you'd be able to use this to "write two for loops" based on this data. I'm looking forward to seeing what you come up with. May the fors be with you!

@foolip
Contributor

foolip commented Jun 2, 2023

Thanks @dfabulich, I will give it a shot! I was just kidding about the for loops, because I find that so much of the code I write ends up being two nested for loops 😆

@foolip
Contributor

foolip commented Jun 2, 2023

Release dates for each browser version are also needed for this analysis, but oddly I can't find those in https://github.com/Fyrd/caniuse. There are dates on caniuse.com, so I don't know why.

I'll get the release dates from BCD for now.

@dfabulich
Owner Author

The release dates are in caniuse/fulldata-json/data-2.0.json, not in caniuse/data.json.
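
Something like this should pull them out (a rough sketch; I'm going from memory on the version_list / release_date field names and assuming a local checkout of the Fyrd/caniuse repo, so double-check against the actual file):

```js
// Sketch: read per-version release dates from caniuse's
// fulldata-json/data-2.0.json. Field names are from memory; verify them
// against the real file before relying on this.
import { readFile } from "node:fs/promises";

const data = JSON.parse(
  await readFile("caniuse/fulldata-json/data-2.0.json", "utf8")
);

const releaseDates = {};
for (const [browser, agent] of Object.entries(data.agents)) {
  for (const v of agent.version_list ?? []) {
    if (v.release_date) {
      // release_date appears to be a Unix timestamp in seconds.
      releaseDates[`${browser} ${v.version}`] = new Date(v.release_date * 1000);
    }
  }
}

console.log(releaseDates["edge 84"]); // e.g. a date in July 2020
```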

@foolip
Contributor

foolip commented Jun 2, 2023

OK, I've written some code. Turns out it was 4 nested for loops, not 2 :)

The code is in 07d5407 and I can send a PR if there's a license, see #4.

I don't think the results seem sensible, but here's the output. Inlining the most interesting data, the computed availability for a feature released on Jan 1, 2020 as of June 1, 2023:

  -> 2023-06-01: chrome availability 95.67% (23.15% / 24.20%)
  -> 2023-06-01: edge availability 99.23% (4.89% / 4.93%)
  -> 2023-06-01: firefox availability 83.08% (2.34% / 2.81%)
  -> 2023-06-01: safari availability 96.31% (5.01% / 5.20%)
  -> 2023-06-01: TOTAL availability 95.28% (35.39% / 37.14%) (of the included browsers)

The Firefox data seems very suspect: 83.08% after 3.5 years is very different from the 95% after 1 year I found in web-platform-dx/web-features#190 (comment) based on rumarchive.com data.

Poking at the Firefox data is the next step.

@foolip
Contributor

foolip commented Jun 2, 2023

https://gist.github.com/foolip/87088161970e9cc152cc8b425f8474ca is a dump of the caniuse data for Firefox usage from 2023-05-22T09:14:47.000Z, which is what ends up being used to compute the 83.08% number in the previous comment.

Edit: I've filed #7 about this.

@foolip linked a pull request on Jun 3, 2023 that will close this issue
@foolip
Contributor

foolip commented Jun 3, 2023

I've sent #5 with my for-loops.
