
Should we archive this? #82

Open
marcoscaceres opened this issue Apr 15, 2020 · 21 comments

Comments

@marcoscaceres
Contributor

I think this incubation might have run its course, as it hasn't gained the cross-browser support we had hoped for in the last 7 years. Thus, I'd like to propose archiving this incubation.

Mozilla remains fairly opposed to this work (with the rationale being that it is "harmful"):
mozilla/standards-positions#117

I'm not sure WebKit folks have taken a stand on it. @othermaciej?

cc @WICG/chairs

@othermaciej

On initial read, the Mozilla standards position makes a persuasive case that this is harmful. It creates fingerprinting risk and yet provides info that is not sensibly actionable for a webpage. @hober can you help collect feedback internally so we can come to an official position?

@yoavweiss
Contributor

yoavweiss commented Apr 16, 2020

<chair-hat>
Generally, I don't think we should be archiving repos for specifications of shipped features, even if they are only shipped in a single engine.

Software's never "done". Web developers need a place to file issues against such specifications. Those specifications can evolve over time, even if the pace of change can be slowed down once the feature is shipped.
We need a place to maintain those specifications. I think it'd be great if it can be here.

At the same time, you're correct that such specifications are no longer being actively incubated. Maybe we can label them as such ("Shipped in a single engine") without archiving?
</chair-hat>

<spec-proponent-hat>
More specifically, reading through the Mozilla position, it seems that Mozilla finds value in at least some of the use-cases that this specification is solving, but disagrees about the way in which it solves them.
I'd have preferred if this feedback came in the form of issues against the spec.

FWIW, I'm more than happy to discuss with Mozilla and others ways in which we can slim down the specification to its core value (IMO - the low-entropy EffectiveConnectionType signal) and potentially add back a signal for metered connections, if we have line of sight to implementing it in a user-meaningful way.
</spec-proponent-hat>
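
For readers unfamiliar with the signal under discussion: in engines where this has shipped, the low-entropy EffectiveConnectionType value is exposed as `navigator.connection.effectiveType` (one of `'slow-2g'`, `'2g'`, `'3g'`, `'4g'`). A minimal sketch of client-side adaptation on top of it; the `experienceTier` helper and its tier names are illustrative, not part of the spec:

```javascript
// Minimal sketch: map the low-entropy effectiveType values to an
// experience tier. The tier names and cutoffs here are hypothetical.
function experienceTier(effectiveType) {
  switch (effectiveType) {
    case 'slow-2g':
    case '2g':
      return 'lite';     // e.g. text-only, no media autoplay
    case '3g':
      return 'standard'; // e.g. compressed images
    default:
      return 'full';     // '4g' and any unknown future value
  }
}

// In a browser this would be driven by the live signal, e.g.:
//   const tier = experienceTier(navigator.connection?.effectiveType ?? '4g');
console.log(experienceTier('2g')); // → 'lite'
```

Note the default branch: a new value added "once every decade" would fall through to `'full'` rather than breaking callers.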

@marcoscaceres
Contributor Author

We can certainly try, but we seriously need to scale this back. Can we maybe just try metered (#84)?

@yoavweiss
Contributor

metered and effectiveConnectionType? :)

(I wrote a long comment on the Mozilla position thread that can be summed up to that)

@marcoscaceres
Contributor Author

I don't know... effectiveConnectionType seems like something we will constantly need to keep adding to... maybe if it were just "slow", "average", and "fast", and we could update what those mean from time to time.

@yoavweiss
Contributor

Adding new values once every decade (when new cellular technology is introduced) seems easier than changing the semantics of existing values. But let's discuss in #85

@hober

hober commented Apr 16, 2020

@marcoscaceres wrote:

[This incubation] hasn't successfully gained the cross browser support we had hoped for in the last 7 years. Thus, I'd like to propose archiving this incubation.

I agree.

@othermaciej wrote:

On initial read, the Mozilla standards position makes a persuasive case that this is harmful.

Indeed.

It creates fingerprinting risk and yet provides info that is not sensibly actionable for a webpage. @hober can you help collect feedback internally so we can come to an official position?

Our position is that this is harmful as specced, due to privacy and fingerprinting concerns.

Also, this API is probably not what developers want most of the time anyway. Developers want to understand what effective bandwidth is available, and that's best determined by measuring actual bandwidth instead of asking the browser to guess. I suppose developers might also want to know if the connection is or isn't metered.

@yoavweiss wrote:

Generally, I don't think we should be archiving repos for specifications of shipped features, even if they are only shipped in a single engine.

At the very minimum, specifications of single-engine features should be clearly marked as such. Ideally with a big red modal like WHATWG Review Drafts, stating that the spec documents the implementation of a feature that is implemented by only one engine, that it's unlikely to be implemented elsewhere, and that developers should refrain from using the features defined in it.

Maybe I should file a followup to WICG/admin#64 that suggests this approach more generally.

@yoavweiss
Contributor

Developers want to understand what effective bandwidth is available

That's effectiveType

that's best determined by measuring actual bandwidth instead of asking the browser to guess

If you're suggesting active bandwidth measurements, that's typically harmful both for performance and for users' bandwidth costs.

@jyasskin
Member

@hober, are you suggesting a server-side bandwidth measurement? i.e. "how fast am I pushing traffic to this particular client?"

@hober

hober commented Apr 16, 2020

@hober, are you suggesting a server-side bandwidth measurement? i.e. "how fast am I pushing traffic to this particular client?"

Yes.

@yoavweiss
Contributor

yoavweiss commented Apr 22, 2020

@hober, are you suggesting a server-side bandwidth measurement? i.e. "how fast am I pushing traffic to this particular client?"

Yes.

Server side bandwidth measurements are impractical for various reasons:

  • The application server is often decoupled from the actual server (e.g. think "Django" vs. "nginx")
  • The actual server can't always read the speeds at which its network stack is sending data down to a particular socket. Sockets buffer, and it's not always possible to distinguish userland "sends" that went into a buffer from ones that actually made it over the wire.
  • TCP terminators along the way can similarly buffer packets, resulting in bufferbloat. That means that even the network stack of the sending server cannot know the effective bandwidth. Those TCP terminators can include the origin's internal equipment (e.g. load balancers) or equipment in the operator network along the path.
  • CDNs complicate this even further, as they can proxy the responses, or even serve them entirely from cache. How can the origin measure bandwidth on requests it never sees?

There's no reporting infrastructure among all those different components that can send the "bandwidth measurements" to a single point. Beyond that, some of the use cases call for client-side decisions based on that information. How do you expect to do that with server-side measurement?

In summary, it's extremely hard, if not impossible, to correctly measure bandwidth from the sender(s) on today's Internet. It's significantly easier and cleaner to measure it from the (single) receiver, and act on that measurement from there.
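
To make the receiver-side alternative concrete: a browser (or page) can derive throughput from its own fetches via Resource Timing, with no extra probe traffic. A sketch, assuming only the `transferSize` (bytes) and `duration` (milliseconds) fields that a `PerformanceResourceTiming` entry records:

```javascript
// Receiver-side throughput estimate from Resource Timing fields.
// transferSizeBytes and durationMs come from a PerformanceResourceTiming
// entry; the function itself is an illustrative sketch.
function estimateMbps(transferSizeBytes, durationMs) {
  if (transferSizeBytes <= 0 || durationMs <= 0) return 0;
  const bits = transferSizeBytes * 8;
  const seconds = durationMs / 1000;
  return bits / seconds / 1e6; // megabits per second
}

// In a browser, the inputs would come from entries such as:
//   const [entry] = performance.getEntriesByType('resource');
//   estimateMbps(entry.transferSize, entry.duration);
console.log(estimateMbps(1_250_000, 1000)); // 1.25 MB in 1 s → 10 Mbps
```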

@cwilso
Member

cwilso commented Apr 23, 2020

I do not think we should archive repos from WICG that have shipped in browsers - primarily because archiving a repo locks it down as read-only. Issues cannot be created or commented on - and for example, we couldn't have this kind of conversation about what might make this a better, cross-platform-capable API.

It's unfortunate that GitHub doesn't allow labeling (tagging) of repos, because I think that would work pretty well here. We could:

  1. put a standard template header in the README.md of all WICG repos that states current status (possibly vendor signals and references?)
  2. revamp the WICG.io index to include signals

I am opposed to archiving as a matter of course.

@othermaciej

Maybe there could be a separate WICG-Attic org for specs that are no longer in incubation (and thus not really in scope for WICG any more), but also with no path to get on the standards track? That plus a prominent message along the lines suggested by @hober would allow continued evolution with less potential for creating confusion.

@othermaciej

(For this specific spec, if we can change it to something likely to see broader implementation, then of course that would be even better.)

@othermaciej

Reading closer, it seems that connectionType and effectiveConnectionType are not likely to be useful to apps, or at least won't achieve a positive user outcome on net.

Regarding connectionType, the cellular type spans everything from EDGE to 5G, which is such a wide span of bandwidth and latency characteristics that it's basically useless. Likewise wifi and ethernet, which could cover a very wide range depending on the specific tech and characteristics of the uplink.

The spec suggests that effectiveConnectionType should be "determined using a combination of recently observed rtt and downlink values". If it's meant to be rtt and downlink to the same server, then it's not clear why the server couldn't measure it. Most of the issues raised by @yoavweiss would impact the client's measurement too. If it's meant to be rtt and downlink to any server, then there's a risk it creates an unacceptable side channel. The spec doesn't address side channel risk.

downlinkMax doesn't seem useful as written, no decision should be made based on only theoretical characteristics of the first hop.

Overall, these things don't seem worth the privacy cost, and the Privacy Considerations section is dismissive about the relevant privacy issues.

Overall, it does not seem like a good idea for web apps (or native apps) to make decisions based on guesses about the network path between client and server. For example, adaptive streaming works without the need for APIs like this because it observes the actual bandwidth/latency and adapts.

It does seem like knowing if the connection is metered or not can help a web app make informed decisions that benefit the user. (Assuming the underlying platform reliably knows this info and can share it with the browser.) It's wrong to assume that all cellular connections are metered or that all wifi connections aren't, so that feature isn't actually provided by the spec as it stands.

@yoavweiss
Contributor

Reading closer, it seems that connectionType and effectiveConnectionType are not likely to be useful to apps, or at least won't achieve a positive user outcome on net.

Can you elaborate on why you think effectiveConnectionType is not likely to be useful?
Evidence suggests otherwise.

Regarding connectionType, the cellular type spans everything from EDGE to 5G, which is such a wide span of bandwidth and latency characteristics that it's basically useless. Likewise wifi and ethernet, which could cover a very wide range depending on the specific tech and characteristics of the uplink.

I don't disagree, and I'm willing to work on removing that from the spec.

The spec suggests that effectiveConnectionType should be "determined using a combination of recently observed rtt and downlink values". If it's meant to be rtt and downlink to the same server, then it's not clear why the server couldn't measure it. Most of the issues raised by @yoavweiss would impact the client's measurement too.

That's not true. It's significantly easier to measure effective throughput on the receiver side than on the sender side.
There are multiple layers of "senders": application server, web server, load balancer, network "traffic shapers", CDN edge servers. On top of that, the decoupling of userland code from the TCP stack in the kernel means extra buffering happens at every one of those nodes along the way.
Contrary to that, there is only one meaningful "receiver" we want to measure - the browser.

If it's meant to be rtt and downlink to any server, then there's a risk it creates an unacceptable side channel. The spec doesn't address side channel risk.

Currently it is meant as an aggregate of past visited origins, and the risk of cross-origin leaks is mitigated. Do you see risks that those mitigations do not cover?

Aside: We really should outline those mitigations as part of the spec. Apologies for that.

downlinkMax doesn't seem useful as written, no decision should be made based on only theoretical characteristics of the first hop.

I don't disagree.

Overall, these things don't seem worth the privacy cost, and the Privacy Considerations section is dismissive about the relevant privacy issues.

I believe we can lower the "cost" (by removing the less useful parts).
Agree that the Privacy Considerations section can be improved.

Overall, it does not seem like a good idea for web apps (or native apps) to make decisions based on guesses about the network path between client and server. For example, adaptive streaming works without the need for APIs like this because it observes the actual bandwidth/latency and adapts.

Adaptive streaming is indeed an excellent example, as the client is responsible for requesting the adapted stream, based on bandwidth measurements.
Its adoption shows that it is useful to adapt the content to available network conditions.

ECT enables something extremely similar for the very different medium of websites: the browser performs bandwidth measurements, and those can inform client- or server-side logic as to what "bitrate level" of experience the user should be provided with.
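
For the server-side half of that logic, shipping implementations also expose ECT as a client hint: a server that sends `Accept-CH: ECT` receives the current value in an `ECT` request header on subsequent requests. A hedged sketch of picking a "bitrate level" of content from it; the variant names are made up for illustration:

```javascript
// Illustrative server-side mapping from the ECT request header (sent by
// browsers that support the client hint, after an Accept-CH: ECT opt-in)
// to a response variant. Variant names are hypothetical.
function variantForEct(ectHeader) {
  switch (ectHeader) {
    case 'slow-2g':
    case '2g':
      return 'minimal';  // smallest payload
    case '3g':
      return 'medium';
    default:
      return 'rich';     // '4g', or hint absent/unknown
  }
}

console.log(variantForEct('3g')); // → 'medium'
```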

It does seem like knowing if the connection is metered or not can help a web app make informed decisions that benefit the user. (Assuming the underlying platform reliably knows this info and can share it with the browser.) It's wrong to assume that all cellular connections are metered or that all wifi connections aren't, so that feature isn't actually provided by the spec as it stands.

I agree. This is being discussed in #84

@othermaciej

Can you elaborate on why you think effectiveConnectionType is not likely to be useful?
Evidence suggests otherwise.

I may be underestimating how accurately it can be computed. But it seems like the computation involves looking at communication with multiple servers (so side channel risk) and is thus potentially inaccurate if the network paths to different servers have different characteristics. A website saying that they want to use it is not very good evidence that it's good for the job.

Currently it is meant as an aggregate of past visited origins, and the risk of cross-origin leaks is mitigated. Do you see risks that those mitigations do not cover?

Aside: We really should outline those mitigations as part of the spec. Apologies for that.

Yeah, I don't think you can expect readers of the spec to know that an open issue with no PR describes crucial mitigations for a privacy problem with the spec. If you're saying that Chrome implemented this and considers it essential, then it should definitely go in the spec.

Unfortunately, the issue itself dives right into describing some mitigations without describing what problem it is trying to solve or how those mitigations address it, which makes it hard to evaluate whether they are enough.

@yoavweiss
Contributor

I may be underestimating how accurately it can be computed. But it seems like the computation involves looking at communication to multiple servers (so side channel risk) and thus is potentially inaccurate if the network path to different servers have different characteristics. A website saying that they want to use it is not very good evidence that it's good for the job.

What we have is web developers saying that they are using it (in browsers where this is shipping) to provide improved analytics and differential content serving and experiences based on it. So I think it's safe to assume it's doing a reasonable job.

Regarding measurements from different servers, that's definitely prone to be skewed (e.g. if bandwidth to one server is significantly lower than to others), but the underlying assumption is that the last-mile is typically the bottleneck, at least in the cases we care about (slow networks).

I'd love to dive into the side-channel risk you mentioned and better understand it. Are you concerned that:

  • An origin will pretend to be slow in order to communicate a bit of information to following origins?
  • Origins can "correlate" users by using identical network speeds as a signal that it's the same user?
  • Something else?

Outlining the threat model would help us assess if current mitigations are sufficient.

@marcoscaceres
Contributor Author

Mozilla has pref'ed Netinfo off on Android:
https://groups.google.com/a/mozilla.org/g/dev-platform/c/u1QiOGUIUfk/m/B1MnwUyuCAAJ

@astearns

astearns commented Jul 23, 2021

I opened issue #91 to push for one or more of the non-archiving, document-in-place options mentioned here to actually be done for this repo.

But after thinking about it a while, I am warming to the suggestion in #82 (comment) of having a separate org for WICG specs that are no longer in incubation.

I think it dilutes the incubation intent of WICG to have it host single-implementation things that no longer have a path to standardization. In the same way that WICG specs that successfully incubate get handed off to a standards group, WICG specs that do not gather multi-vendor interest should be handed off somewhere else to document the implementation. WICG should have a narrow focus on specs that still have a chance to succeed as standards.

@tomayac

tomayac commented Jul 23, 2021

Looks like we have at least a tendency toward agreement on proceeding with limiting the spec to just effectiveConnectionType (with modifications to how it works today, see #85) and the introduction of a metered flag (#84).

The Save-Data header and the prefers-reduced-data CSS user preference media feature seem to solve for the other use cases for both client and server.

I see no harm in leaving saveData in the spec. It is different from metered in that you may just want to be a good citizen and not tax a shared Wi-Fi too much, e.g., in a shared apartment.
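
Putting the two preference-style signals together: saveData is readable today where the API ships (`navigator.connection.saveData` on the client, the Save-Data request header on the server), while metered is only proposed in #84. A sketch of how a page might combine them; the helper name and the simple OR policy are assumptions, not anything the spec defines:

```javascript
// Hypothetical policy: defer heavy downloads when either the user has
// asked to save data or the connection is (per the #84 proposal) metered.
function shouldDeferHeavyAssets(saveData, metered) {
  return Boolean(saveData) || Boolean(metered);
}

// Client side, today:
//   shouldDeferHeavyAssets(navigator.connection?.saveData ?? false, false);
// Server side, the same preference arrives as the Save-Data request header.
console.log(shouldDeferHeavyAssets(true, false)); // → true
```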
