Is it OK to let the publishers know the SXG distributor's URL? #433

horo-t · 2019-05-13T02:14:41Z

I have stopped working on Signed Exchange Reporting for publishers. w3c/network-error-logging#99 (comment)
This is mainly because it is not clear whether it is OK to let the publisher know the distributor's URL.

If my understanding is correct, the publisher can't learn the URLs of intermediate redirects.
In this case, the publisher can only learn "aggregator.example/feed", via the referrer:

aggregator.example/feed -> redirect.example/publisher.example/article (returns 301 redirect) -> publisher.example/article
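A toy model of the chain above, assuming the simplified rule that the referrer is fixed by the page that starts the navigation and is carried unchanged across redirects. Referrer policy, which can trim or drop the value, is ignored here, and the `follow_redirects` helper is purely illustrative.

```python
from typing import Dict


def follow_redirects(initiator: str, start_url: str, redirects: Dict[str, str]) -> dict:
    """Resolve a redirect chain and report what the final server observes.

    Simplified model (an assumption): the referrer is set once, from the
    initiating page, and is not rewritten by intermediate 3xx hops.
    """
    url = start_url
    chain = [url]
    while url in redirects:
        url = redirects[url]  # follow the Location of a 301/302 response
        if url in chain:
            raise ValueError("redirect loop")
        chain.append(url)
    return {"final_url": url, "referrer": initiator, "chain": chain}


seen = follow_redirects(
    initiator="https://aggregator.example/feed",
    start_url="https://redirect.example/publisher.example/article",
    redirects={
        "https://redirect.example/publisher.example/article": "https://publisher.example/article",
    },
)
print(seen["referrer"])  # https://aggregator.example/feed
```

The intermediate `redirect.example` hop appears only in `chain`, which the browser knows but the publisher's server never receives.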

So I think it is reasonable not to let the publisher know the SXG distributor's URL.

Also, if publishers can learn the distributor's URL, it can be used for user tracking.
Example:
https://TRACKING_ID.distributor.example/publisher.example/article.html.sxg
https://distributor.example/TRACKING_ID/publisher.example/article.html.sxg
I think we should avoid adding new features which can be used for user tracking.
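To make the concern concrete, here is a sketch of how a publisher could pull a per-user ID out of either URL shape above. `distributor.example`, the `u123` ID, and the dot-based heuristic for the path form are illustrative assumptions, not anything a real distributor does.

```python
from typing import Optional
from urllib.parse import urlsplit

DISTRIBUTOR_HOST = "distributor.example"  # hypothetical distributor from the example


def extract_tracking_id(sxg_url: str) -> Optional[str]:
    """Heuristically pull a tracking ID out of either URL shape."""
    parts = urlsplit(sxg_url)
    host = parts.hostname or ""  # note: hostname is lower-cased
    suffix = "." + DISTRIBUTOR_HOST
    # Shape 1: https://TRACKING_ID.distributor.example/publisher.example/...
    if host.endswith(suffix):
        return host[: -len(suffix)]
    # Shape 2: https://distributor.example/TRACKING_ID/publisher.example/...
    if host == DISTRIBUTOR_HOST:
        segments = [s for s in parts.path.split("/") if s]
        # Assumption: the publisher host contains a dot and the ID does not.
        if len(segments) >= 2 and "." not in segments[0] and "." in segments[1]:
            return segments[0]
    return None


for u in [
    "https://u123.distributor.example/publisher.example/article.html.sxg",
    "https://distributor.example/u123/publisher.example/article.html.sxg",
    "https://distributor.example/publisher.example/article.html.sxg",
]:
    print(extract_tracking_id(u))  # prints u123, then u123, then None
```

The parsing is trivial; the point is that any subdomain or path segment the distributor controls can carry per-user data once the URL is handed to the publisher.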

yoavweiss · 2019-05-13T05:52:29Z

Is the above different from the distributor adding an <a ping="publisher.example/collection_endpoint/TRACKING_ID"> to the SXG link?

horo-t · 2019-05-13T13:04:50Z

From the user-tracking point of view, it is almost the same.

yoavweiss · 2019-05-13T13:50:58Z

Well, since we enable ping attributes, onclick events, redirections and multiple other ways of communicating that information out-of-band, I don't see why we'd want to block this particular one.

sleevi · 2019-05-13T13:54:39Z

I’m not sure those other mechanisms are substantively similar. They seem very different from what I’ve seen discussed or proposed. For example, `ping` is set up by whoever provides the outgoing link, but the proposal here does not establish any relationship between the distributor and the link provider. Further, `ping` lets you notify that a link is being followed, but not the final target of that link. I think it’s unlikely to sway anyone to simply state that there are parallels, because that neither establishes the parallels nor shows that those current behaviours are desired. A more thorough analysis would be useful, if that is meant to be the justification for this.

yoavweiss · 2019-05-13T14:40:23Z

Maybe I misunderstood the threat model. Are we worried that publishers will know about tracking IDs without the distributor actively communicating them?

sleevi · 2019-05-13T14:52:16Z

Fair point. I don't think we've articulated a particular threat model or set of threat models. The question is, as I understood it, trying to articulate the potential risks.

We've got at least four parties in play:

- The Publisher, the provider of the SXG
- The Distributor, the host of the SXG resource
- The "Sender", the one who provides the outbound link or initiates loading of the SXG (I hesitate to use the term aggregator, as I think it implies a particular type of Sender)
- The User, for whom we are the agent.

One potential risk is, as @horo-t noted, about active collusion between Publisher and Distributor in a way that allows them to exchange information.

As you note, if the Sender is colluding, then there's no need for the Publisher to have special capabilities / adding special capabilities does not change this, since they can always get the information out-of-band from the Sender.
However, if the Sender is not colluding or party to the threat, then I think this changes the calculus.

Another potential risk is where the User does not wish the Publisher to learn about the Distributor they're using, especially if it may reveal the User's activity with the Sender. This is one of the tensions we noted early on with reporting the Distributor to the Publisher - whether or not it aligns with the user's interests, or whether it allows the Publisher to learn information about the User based on the Distributor.

I'm sure there are other scenarios being overlooked here, but I think it's a fair point that we're likely talking about different things, which collectively determine whether or not it's OK.

jyasskin · 2019-05-13T21:01:21Z

#424 has an attempt to build an anti-tracking threat model, although I'm sure it's incomplete. In particular, there's only a tracking risk if the same request or JS environment exposes both the Distributor's tracking ID and the Publisher's tracking ID. Simply sending https://DISTRIBUTOR_TRACKING_ID.distributor.example/publisher.example/article.html.sxg to the publisher with no other information isn't enough, since it doesn't include the publisher's tracking ID.

<a ping> is specified to include credentials, so anyone trying to block tracking will need to change that along with any restrictions on reporting the distributor URL to the publisher. I think we should design for uncertainty in how far that effort will go, rather than assuming we need to pre-emptively block all communication via this one channel.

#424 doesn't cover the worry about the Publisher learning things about the User's interaction with the Sender because that's a more immediate risk than anti-tracking (i.e. fixing it doesn't assume a pile of other changes to the web platform). We should also write that more immediate threat model. If someone other than me can volunteer, it'll get done faster.

sleevi · 2019-05-13T22:11:27Z

> Simply sending https://DISTRIBUTOR_TRACKING_ID.distributor.example/publisher.example/article.html.sxg to the publisher with no other information isn't enough, since it doesn't include the publisher's tracking ID.

I'm not sure that analysis is correct. When the SXG is loaded, the Publisher's JS will have access to localStorage and other storage mechanisms, so if the Distributor ID is exposed, the Publisher can link it with the Publisher ID stored in localStorage, right?
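A sketch of the join described above, with a plain dict standing in for the Publisher's localStorage. The subdomain-style distributor URL, the `publisher_id` key, and the `link_ids` helper are all hypothetical.

```python
import uuid
from typing import Optional, Tuple
from urllib.parse import urlsplit


def link_ids(
    distributor_url: str,
    storage: dict,
    distributor_suffix: str = ".distributor.example",
) -> Optional[Tuple[str, str]]:
    """Pair the Publisher's first-party ID with a Distributor ID from the URL."""
    # First-party ID the Publisher stored on an earlier visit (or mints now).
    publisher_id = storage.setdefault("publisher_id", uuid.uuid4().hex)
    host = urlsplit(distributor_url).hostname or ""
    if not host.endswith(distributor_suffix):
        return None  # no Distributor ID exposed, nothing to join
    distributor_id = host[: -len(distributor_suffix)]
    # Once both IDs sit in the same JS environment, one beacon links them.
    return publisher_id, distributor_id


local_storage = {"publisher_id": "p42"}
print(link_ids("https://u123.distributor.example/publisher.example/article.html.sxg", local_storage))
# ('p42', 'u123')
```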

We talked a little about this in #347, in the context of what information about the content the Distributor can infer, as well as what effects the Distributor can have on the Publisher and how it loads its content.

jyasskin · 2019-05-13T22:27:22Z

@sleevi I think having the DISTRIBUTOR_TRACKING_ID in the publisher's javascript context is "with other information", but I want us to be clear that it's the combination that enables tracking. Maybe browsers that want to prevent identifier correlation can block JS access to the full URL but still send reports to the publisher via credential-less HTTP requests, for example.
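One way to picture that suggestion, as a hypothetical browser-side split rather than any specified behavior: the page sees only the Publisher URL, while the Distributor URL travels in a separate report sent with credentials omitted (the `credentials` value mirrors Fetch's credentials modes). The `Report` shape, the endpoint, and `split_exposure` are illustrative assumptions.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Report:
    endpoint: str
    body: bytes
    headers: Dict[str, str] = field(default_factory=dict)
    credentials: str = "omit"  # no cookies attached, as in Fetch's "omit" mode


def split_exposure(
    publisher_url: str, distributor_url: str, endpoint: str
) -> Tuple[dict, Report]:
    # What Publisher JS can read: no Distributor URL at all.
    page_view = {"document_url": publisher_url}
    # What the Publisher's report endpoint receives: the Distributor URL,
    # but no Cookie header, so it cannot be tied to the Publisher's own ID.
    report = Report(
        endpoint=endpoint,
        body=json.dumps({"distributor_url": distributor_url}).encode(),
        headers={"Content-Type": "application/json"},
    )
    return page_view, report
```

Under this split, each side holds only one of the two IDs, which is the combination the #424 threat model cares about.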

igrigorik · 2019-05-17T20:35:07Z

Re: linking + tracking: we briefly explored this in the Navigation Timing thread, but I'll reiterate it here as well. We don't necessarily need to expose the full distributor URL to satisfy some of the core use cases from a publisher's perspective: providing the origin, or even just the eTLD+1 if we're concerned about a tracking ID being embedded in the subdomain, is likely good enough.
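A sketch of the two coarser signals, assuming a tiny hard-coded stand-in for the Public Suffix List (a real implementation must use the full list). The helper names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit

# Tiny stand-in for the Public Suffix List; real code must use the full list.
PUBLIC_SUFFIXES = {"example", "com", "co.uk"}
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> str:
    p = urlsplit(url)
    port = "" if p.port in (None, DEFAULT_PORTS.get(p.scheme)) else f":{p.port}"
    return f"{p.scheme}://{p.hostname}{port}"


def etld_plus_one(url: str) -> Optional[str]:
    labels = (urlsplit(url).hostname or "").split(".")
    # Scan from the longest candidate suffix down, so "co.uk" beats "uk".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None  # no known suffix; the real PSL falls back to the last label


sxg = "https://u123.distributor.example/publisher.example/article.html.sxg"
print(origin(sxg))         # https://u123.distributor.example
print(etld_plus_one(sxg))  # distributor.example
```

Reporting the origin already drops a path-embedded ID but keeps a subdomain one; the eTLD+1 drops both, at the cost of no longer distinguishing distributors that share a registrable domain.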

In the absence of any signals about the distributor, my fear is that publishers would simply default to whitelisting a small set of distributors that "they trust", e.g., distributors they trust not to hurt the user experience, or from which they expect insights about such page loads in return.

sleevi · 2019-05-17T21:22:59Z

If #430 was resolved in such a way that Publishers had to explicitly allowlist Distributors, would that obviate the need for exposing this information?

Alternatively, if #430 is not accepted, that is, if neither an allowlist nor a blocklist is pursued, it would seem to remove the Publisher’s ability to place such restrictions, thus reducing the risk of such ossification of Distributors.

It’s not clear to me, though, which of these outcomes was being imagined with the remark about lacking signals.
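If #430 went the allowlist route, the check could in principle run in the browser, so the Publisher would not need to see the Distributor URL at all. A minimal sketch, assuming a hypothetical Publisher-declared set of allowed distributor origins; how such a list would be declared is not specified here.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist a Publisher might declare for its SXGs.
ALLOWED_DISTRIBUTOR_ORIGINS = {"https://distributor.example"}


def distributor_origin(url: str) -> str:
    p = urlsplit(url)
    # Keep it simple: scheme + host, plus any explicitly written port.
    return f"{p.scheme}://{p.hostname}" + (f":{p.port}" if p.port else "")


def may_use_sxg(distributor_url: str) -> bool:
    """Browser-side decision; nothing about the Distributor is reported."""
    return distributor_origin(distributor_url) in ALLOWED_DISTRIBUTOR_ORIGINS
```

An origin-level allowlist like this would also reject per-user subdomains, which interacts with the tracking concern raised at the start of the thread.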

twifkak · 2019-05-19T05:25:54Z

> However, if the Sender is not colluding / party to the threat, then I think this changes the calculus

How does this change the calculus of distributor/publisher being able to correlate tracking IDs? The sender (aka embedder) cannot read or intercept the bytes that the distributor sends to the user. ISTM the distributor could send an unsigned response containing any of the usual tricks, followed by a redirect to the SXG, and the sender couldn't prevent this with static analysis. (Perhaps there's another way the sender could prevent this that I'm not seeing...)

sleevi · 2019-05-20T05:46:55Z

> How does this change the calculus of distributor/publisher being able to correlate tracking IDs?

Apologies, I suppose my example could have been clearer. I was trying to highlight that there are more risks than 'just' tracking IDs, and this adds to the security and privacy complexity.

The Sender may not wish for the Publisher to learn about what the user is doing on the Sender. This is similar to, but distinct from, the User Privacy case; I think that many of their mitigations end up looking similar, but there are cases where the User may be fine with the information leakage, but the Sender not, and vice-versa, which is why I tried to enumerate them as separate in the considerations.

Basically, I'm trying to treat all of this through the lens of side channels. Functionally, the Distributor gets to be the 'last mile' to the user for the Publisher's content. This can be seen in some of the motivations for the RUM use case. However, SXGs allow both the Sender and the Distributor to look into what the Publisher is sending, and further, to influence how that information is loaded. This can lead to accidental or intentional side channels (like those in #347) that allow the Sender/Distributor to learn not just about the Publisher's content, but about the User's state at the Publisher.

From the perspective of URLs, I'm trying to work through cases where the User or Sender may not want the Publisher to know about the URL they're viewing (or that of the Distributor), and then see if and how we can balance those cases against the desire for the Publisher to control the Distributor (as in #430).

jyasskin mentioned this issue May 13, 2019 in "Q: referrer from signed http exchanged page" #206 (closed)

jyasskin added the "architecture" (Big design questions that need to be figured out.) and "feature request" labels May 13, 2019

jyasskin added the "Pending Security + Privacy Review" label May 13, 2019

igrigorik mentioned this issue May 14, 2019 in "Navigation Timing behavior for HTTP Exchange (SxG) loading" w3c/navigation-timing#107 (open)
