Purge mechanisms #376
I think there's a tension here: the more work put on repudiation (effectively what this is), the less useful this becomes for actually enabling a distributed, decentralized, or more robust Internet, which is at least where some of the interesting use cases are. Similarly, this might be mistaken as enabling a DRM-like solution, which is not a goal. Considering that these same issues already exist if, for example, a site sets a long cache lifetime, I think it may be worthwhile to explore the "what if we didn't offer a purge mechanism" option further, as offering one seems to come with very, very high tradeoffs.
In some sense, repudiation is already possible, by signing […]
One reason I pointed out the risks above is that information and misinformation are currently being weaponized on the Internet, and for information there are two important parts: the data and the source. If you can replay a historic (even if only one day old) mistake or transient data as current information from a reputable source, that can be used to either mislead people or damage the reputation of a source that is trying to be accurate. Or both. Client caches don't have this particular weakness, since you cannot transfer a cache to someone you want to mislead.
The proposal would compromise the anti-censorship use cases if the censor can compel the origin to respond to […].

As @twifkak says, publishers can build a repudiation mechanism using JavaScript even if we don't build this into the browser, so the main question is whether we want to build it in by default. A goal of being able to repudiate inaccurate content conflicts, although maybe not fatally, with @ericlaw1979's desire to make this work for Archival. I agree with @bratell-at-opera that SXGs are more of a risk than just long cache lifetimes, because they can be used maliciously, and if #300 goes toward trusting the SXG over the cache, that's not even persistently fixed by hitting reload, unlike the cache itself.
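To make the JavaScript option above concrete, here is a minimal sketch of what a publisher-built repudiation check could look like if shipped inside the page itself. The `/sxg-validity/` endpoint, its `{ retracted }` response, and `checkValidity` are hypothetical names for illustration, not part of the SXG spec or any existing API:

```ts
// Hypothetical publisher-built repudiation check, shipped inside the page.
// The /sxg-validity/ endpoint and its { retracted: boolean } response are
// illustrative assumptions, not an existing API.
async function checkValidity(articleId: string): Promise<void> {
  try {
    const res = await fetch(`https://publisher.example/sxg-validity/${articleId}`, {
      cache: "no-store", // always ask the origin, never a cached answer
    });
    if (!res.ok) return; // fail open: don't block reading if the check fails
    const { retracted } = await res.json();
    if (retracted) {
      // Replace the stale SXG copy with a fresh load from the origin.
      window.location.replace(`https://publisher.example/articles/${articleId}`);
    }
  } catch {
    // Offline or the check was blocked: nothing more to do client-side.
  }
}

checkValidity("2018-06-03-report");
```

Note that this carries the same trade-off discussed in the thread: the fetch reveals to the publisher (or to anyone who can observe or compel that endpoint) that someone is reading the page.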
Fetching the […]
Good points regarding the tension of such validity mechanisms between keeping access to the content secret and making sure the content is still valid.

From the publisher's side: […]

From the user's side: […]
Are there other use cases besides supporting censorship and geoblocking? It seems developers already have the tools to accomplish that without introducing additional SXG features, so I'm wondering what's missing?
With SXG I can put on someone else's face and replay something they have said or published. The problem I'm thinking of isn't the replaying operation itself (I think it's perfectly fine to archive and show old web pages), but that the content will be attributed to the source even if the source has changed their mind.

There have been numerous cases in the last few years where an initial news report turned out to be incorrect and was later changed, updated or corrected. As news sources have come under attack from decision makers, it has become that much more important that information is up to date. If a decision maker or hostile power can replay a news source's mistakes or early best-effort reporting as "current reporting", that can be used to undermine the credibility of reputable news organizations, which is dangerous in so many ways.

It might be that this feature would be too dangerous for a news source to use and they should stay away from it, but I don't trust everyone to fully understand the implications of using SXG, especially if its use will be heavily promoted and encouraged by portal sites (like those run by Google and Microsoft). Maybe there are ways to handle this scenario that I'm missing, but from what I understand, it seems like SXG comes packaged with a big foot gun.
It seems like that threat model is "Something was published without including subsequent corrections" - but that use case already seems accounted for in the design. That is, as has been pointed out, "responsible" organizations (which seems to be the presumption, given they subsequently issue corrections) can, for example, use JS to ping to see if there have been corrections. This seems to share the same root cause as publishing a vulnerable (e.g. XSS) SXG.

However, I think the tension here - between privacy and censorship resistance on one hand, and having the "latest" edition and being able to repudiate SXGs on the other - is somewhat intentional. SXG opts for more privacy and censorship resistance, and that admittedly does come with trade-offs.
An alternative to not displaying the resource (what you categorize as censorship) would be to stop labeling it with the original source. So instead of "foonews.com" it would be "foonews.com 3 June, as per portalsite.com", but that seems to be the opposite of what "portalsite.com" wants, and I don't really see what UI design would want or allow such a complicated explanation. Or an interstitial: "You are about to visit a revoked page, do you want to load the new version instead?" That might also not be the choice of a UI designer, but I don't see "refuse to display the page" (what you call censorship) as the only way to handle a page a site no longer wants to spread.

Then, if the goal is that only portalsite.com should know who reads the pages (and have it be secret from foonews.com), maybe portalsite.com could offer some kind of proxy for checking whether a page has been revoked. Again, I'm not saying that would be the solution, but that there may be ways to fix the foot gun without losing any features.
I think there's a first step, which is agreeing whether or not it represents a problem. We should be very careful in designing functionality that is inherently anti-privacy (functionally, the scheme just reinvented online revocation checking, which most UAs find problematic). We've also identified multiple alternatives that don't require a new primitive and achieve the same result. The argument for the new primitive seems to be that it will be a "foot gun" unless it can be actively blocked at will, and I'm not sure that's a shared perspective. Have I missed why the multiple options publishers have, including not publishing an SXG, aren't sufficient?
To build on this thought a little bit: in a part of the world where all of foonews.com's traffic is routed through a central point, SXG gives portalsite.com the ability to selectively downgrade part of foonews.com without disrupting access to the rest of it. So /sport might be downgraded, but /weather stays fresh. Since some pages are fresh, users may not notice that /sport is actually outdated. (This "attack" is most effective if the CDN delivers SXGs in which all links go back to the CDN itself (not the origin), but the CDN could require this for performance or availability reasons.) Without SXG, a central point cannot disrupt access to just a part of foonews.com--it can only block all of it, since it can't see what's being requested, or the content of individual URLs. (Similarly, portalsite.com could provide access to all of foonews.com with the exception of all articles that mention "bananas", which get a 404.)

Cautious origins will probably need to either not publish SXGs, or only publish to CDNs they trust. Allowing any CDN to cache content seems risky (even though allowing anyone to copy content is usually a helpful anti-censorship technique).
@ithinkihaveacat Note that there's no way for the CDN to deliver an SXG such that all links go back to the CDN itself. Even after bundling ships, the CDN would have to convince users to manually fetch content via the CDN, perhaps by blocking direct connections.
@jyasskin There's not? I was thinking it would be possible, though via the mechanism of CDN policy rather than a technical fix. For example, a CDN will only cache content if all sub-resources are delivered from the CDN and all links go back to the CDN. This would be faster for users, but would also lead users to think they're navigating around https://foonews.com, when they're actually on https://portalsite.com/s/foonews.com.
The WebKit team has similar concerns. |
It seems that there is no definitively correct behavior for this. There is a significant number of users who would prefer one behavior over the other to fulfill their needs (e.g. stronger privacy). This suggests that: […]

(edit: I removed the reference to Issue 388 since the root concern is still being explored).
Yesterday, it was announced that Google Chrome will be shipping this: https://webmasters.googleblog.com/2019/04/instant-loading-amp-pages-from-your-own.html. AFAICT this has not yet been addressed. On the same day, Cloudflare announced support for Signed Exchanges: https://blog.cloudflare.com/announcing-amp-real-url/.
Related to #324
While I was talking to @bratell-at-opera about signed exchanges, he raised concerns about invalid content continuing to circulate after its publisher has realized it is invalid.
Thinking about that problem, it's very similar to the one for which CDNs offer purge mechanisms: […]

Talking to @jyasskin, the `validityUrl` can be used in order to verify that the content is still valid from the publisher's perspective. Although it was meant for intermediaries, we can use the same value for browser-side validation, to make sure the browser doesn't display invalid content. The browser would fetch the validity information when navigating to an SXG page (assuming the browser is online).

If the SXG content is invalid, the browser would force its reload. Assuming that caches validate content regularly, very few users would actually witness those reloads, in the already rare case that content needed to be purged.

This won't solve purge for offline access, but that's similar to any offline content (e.g. in a PWA or a native app).
Thoughts?
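For concreteness, here is a rough sketch of the flow described above, assuming the browser has already parsed the exchange. `parseSignedExchange` and the `{ stillValid }` JSON shape are placeholders for illustration only; the draft's actual validity data is a binary structure aimed at intermediaries, modeled here as a simple flag:

```ts
// Rough sketch of the proposed navigation-time check. parseSignedExchange and
// the { stillValid } response are placeholders, not an existing browser API.
declare function parseSignedExchange(bytes: ArrayBuffer): {
  requestUrl: string;   // the URL the exchange claims to represent
  validityUrl: string;  // where fresh validity data can be fetched
  response: Response;   // the signed content itself
};

async function navigateToSignedExchange(sxgBytes: ArrayBuffer): Promise<Response> {
  const exchange = parseSignedExchange(sxgBytes);

  if (navigator.onLine) {
    try {
      const res = await fetch(exchange.validityUrl, { cache: "no-store" });
      const { stillValid } = await res.json();
      if (!stillValid) {
        // Publisher has purged this content: force a reload from the origin.
        return fetch(exchange.requestUrl, { cache: "reload" });
      }
    } catch {
      // Validity fetch failed (offline, blocked, etc.): fall through and show
      // the signed content rather than breaking offline use.
    }
  }

  return exchange.response;
}
```

Failing open when the validity fetch is unreachable preserves the offline and anti-censorship properties, while an origin that answers "no longer valid" gets the purge behavior described above.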