Is it OK to let the publishers know the SXG distributor's URL? #433

Open
horo-t opened this issue May 13, 2019 · 13 comments
Labels
architecture (Big design questions that need to be figured out.), feature request, Pending Security + Privacy Review

Comments

@horo-t
Collaborator

horo-t commented May 13, 2019

I have stopped working on Signed Exchange Reporting for publishers. w3c/network-error-logging#99 (comment)
This is mainly because it is not clear whether it is OK to let the publisher know the distributor's URL.

If my understanding is correct, the publisher currently cannot learn the URLs of intermediate redirects.
In this case, the publisher can only learn "aggregator.example/feed", from the referrer:

  • aggregator.example/feed -> redirect.example/publisher.example/article (returns 301 redirect) -> publisher.example/article

So I think it sounds reasonable not to let the publisher know the SXG distributor's URL.
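
The redirect case above can be sketched as a small simulation. This is only an illustration, assuming the browser keeps the initiating page as the referrer across the 301; the `Request` type, `follow_redirects`, and the redirect table are hypothetical names, not part of any browser or SXG API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    url: str
    referrer: str


def follow_redirects(initiator_url, first_url, redirects, max_hops=10):
    """Follow a hypothetical redirect table (url -> Location target).

    The referrer is copied unchanged on every hop, so the final
    destination never sees the intermediate redirector's URL.
    """
    request = Request(url=first_url, referrer=initiator_url)
    hops = 0
    while request.url in redirects and hops < max_hops:
        request = Request(url=redirects[request.url], referrer=request.referrer)
        hops += 1
    return request


# Hypothetical 301 table matching the example in the comment above.
REDIRECTS = {
    "https://redirect.example/publisher.example/article":
        "https://publisher.example/article",
}

final = follow_redirects(
    "https://aggregator.example/feed",
    "https://redirect.example/publisher.example/article",
    REDIRECTS,
)
print(final.url, "sees referrer", final.referrer)
```

In this model the publisher learns the aggregator's URL but nothing about redirect.example, which is the analogy being drawn to the SXG distributor.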

And also if the publishers can know the distributor's URL, this can be used for user tracking.
Example:
https://TRACKING_ID.distributor.example/publisher.example/article.html.sxg
https://distributor.example/TRACKING_ID/publisher.example/article.html.sxg
I think we should avoid adding new features which can be used for user tracking.
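
To make the concern concrete, here is a minimal sketch of how a publisher could pull an identifier out of either URL shape if the distributor's URL were reported to it. The host name, the `extract_tracking_id` helper, and the rule that a first path segment without a dot is an ID are all assumptions for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical distributor host, taken from the example URLs above.
DISTRIBUTOR_HOST = "distributor.example"


def extract_tracking_id(sxg_url):
    """Return the ID embedded in the subdomain or first path segment, if any."""
    parts = urlsplit(sxg_url)
    host = parts.hostname or ""  # note: urlsplit lowercases the host
    suffix = "." + DISTRIBUTOR_HOST
    if host.endswith(suffix):
        # Shape 1: https://TRACKING_ID.distributor.example/...
        return host[: -len(suffix)]
    if host == DISTRIBUTOR_HOST:
        # Shape 2: https://distributor.example/TRACKING_ID/...
        segments = [s for s in parts.path.split("/") if s]
        # Assumption: a publisher host contains a dot, an ID does not.
        if segments and "." not in segments[0]:
            return segments[0]
    return None


print(extract_tracking_id(
    "https://u12345.distributor.example/publisher.example/article.html.sxg"))
print(extract_tracking_id(
    "https://distributor.example/u12345/publisher.example/article.html.sxg"))
```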

@yoavweiss
Collaborator

Is the above different from the distributor adding an <a ping="publisher.example/collection_endpoint/TRACKING_ID"> to the SXG link?

@horo-t
Collaborator Author

horo-t commented May 13, 2019

From the user tracking point of view, it is almost the same.

@yoavweiss
Collaborator

Well, since we enable ping attributes, onclick events, redirections and multiple other ways of communicating that information out-of-band, I don't see why we'd want to block this particular one.

@sleevi

sleevi commented May 13, 2019 via email

@yoavweiss
Collaborator

Maybe I misunderstood the threat model. Are we worried that publishers will know about tracking IDs without the distributor actively communicating them?

@sleevi

sleevi commented May 13, 2019

Fair point. I don't think we've articulated a particular threat model or set of threat models. The question is, as I understood it, trying to articulate the potential risks.

We've got at least four parties in play:

  • The Publisher, the provider of the SXG
  • The Distributor, the hoster of the SXG resource
  • The "Sender", the one who provides the outbound link or initiates loading of SXG (I hesitate to use the term aggregator, as I think it implies a particular type of Sender)
  • The User, for whom we are the agent.

One potential risk is, as @horo-t noted, about active collusion between Publisher and Distributor in a way that allows them to exchange information.

  • As you note, if the Sender is colluding, then there's no need for the Publisher to have special capabilities / adding special capabilities does not change this, since they can always get the information out-of-band from the Sender.
  • However, if the Sender is not colluding / party to the threat, then I think this changes the calculus

Another potential risk is where the User does not wish the Publisher to learn about the Distributor they're using, especially if it may reveal the User's activity with the Sender. This is one of the tensions we noted early on with reporting the Distributor to the Publisher - whether or not it aligns with the user's interests, or whether it allows the Publisher to learn information about the User based on the Distributor.

I'm sure there are other scenarios being overlooked here, but I think it's a fair point that we're likely talking about different things, which collectively determine whether or not it's OK.

@jyasskin added the architecture and feature request labels May 13, 2019
@jyasskin
Member

jyasskin commented May 13, 2019

#424 has an attempt to build an anti-tracking threat model, although I'm sure it's incomplete. In particular, there's only a tracking risk if the same request or JS environment exposes both the Distributor's tracking ID and the Publisher's tracking ID. Simply sending https://DISTRIBUTOR_TRACKING_ID.distributor.example/publisher.example/article.html.sxg to the publisher with no other information isn't enough, since it doesn't include the publisher's tracking ID.

<a ping> is specified to include credentials, so anyone trying to block tracking will need to change that along with any restrictions on reporting the distributor URL to the publisher. I think we should design for uncertainty in how far that effort will go, rather than assuming we need to pre-emptively block all communication via this one channel.

#424 doesn't cover the worry about the Publisher learning things about the User's interaction with the Sender because that's a more immediate risk than anti-tracking (i.e. fixing it doesn't assume a pile of other changes to the web platform). We should also write that more immediate threat model. If someone other than me can volunteer, it'll get done faster.

@sleevi

sleevi commented May 13, 2019

Simply sending https://DISTRIBUTOR_TRACKING_ID.distributor.example/publisher.example/article.html.sxg to the publisher with no other information isn't enough, since it doesn't include the publisher's tracking ID.

I'm not sure that analysis is correct. When the SXG is loaded, the Publisher's JS will have access to localStorage and other storage mechanisms, and thus, if the Distributor ID is exposed, can link it with the Publisher ID stored in localStorage, right?

We talked a little about this in #347, in the context of what information about the content the Distributor can infer, as well as what effects the Distributor can have on the Publisher and how it loads its content.

@jyasskin
Member

@sleevi I think having the DISTRIBUTOR_TRACKING_ID in the publisher's javascript context is "with other information", but I want us to be clear that it's the combination that enables tracking. Maybe browsers that want to prevent identifier correlation can block JS access to the full URL but still send reports to the publisher via credential-less HTTP requests, for example.
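
A toy model of "it's the combination that enables tracking": an identifier pair can only be joined when a single observation carries both IDs. The `Observation` type, the channel names, and the ID values are hypothetical, and the model deliberately ignores other correlation signals such as timing or IP address.

```python
from typing import NamedTuple, Optional


class Observation(NamedTuple):
    channel: str
    distributor_id: Optional[str]
    publisher_id: Optional[str]


def linkable_pairs(observations):
    """Return the (distributor_id, publisher_id) pairs that co-occur."""
    return {
        (o.distributor_id, o.publisher_id)
        for o in observations
        if o.distributor_id is not None and o.publisher_id is not None
    }


observations = [
    # Credential-less report: distributor URL, but no publisher cookie/storage.
    Observation("credential-less report", "d-42", None),
    # Ordinary credentialed request that does not reveal the distributor URL.
    Observation("credentialed request", None, "p-7"),
    # Publisher JS that can read both the full URL and localStorage.
    Observation("JS context with full URL", "d-42", "p-7"),
]

print(linkable_pairs(observations[:2]))  # nothing to join
print(linkable_pairs(observations))      # the JS context links the two IDs
```

Under these assumptions, the first two channels on their own leak no joinable pair; only the third does, which is the case a browser would need to block.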

@igrigorik
Member

Re linking+tracking: we briefly explored this in the Navigation Timing thread, but I'll reiterate it here as well. We don't necessarily need to expose the full distributor URL to satisfy some of the core use cases from a publisher's perspective: providing the origin, or even just the eTLD+1 if we're concerned about a tracking ID being embedded in the subdomain, is likely good enough.
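
As a rough sketch of the two reductions mentioned above: the origin drops a path-embedded ID, and the eTLD+1 additionally drops a subdomain-embedded one. A real implementation would consult the Public Suffix List; the tiny `public_suffixes` set and both helper names here are assumptions made only to keep the example self-contained.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url):
    """Reduce a URL to scheme://host[:port], dropping path and query."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    port = parts.port
    if port is None or DEFAULT_PORTS.get(parts.scheme) == port:
        return f"{parts.scheme}://{host}"
    return f"{parts.scheme}://{host}:{port}"


def naive_registrable_domain(url, public_suffixes=frozenset({"example"})):
    """Approximate eTLD+1 using a stand-in suffix set (NOT the real PSL)."""
    labels = (urlsplit(url).hostname or "").split(".")
    for i in range(len(labels)):  # longest candidate suffix first
        if ".".join(labels[i:]) in public_suffixes:
            # Keep exactly one label in front of the public suffix.
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None


by_subdomain = "https://u12345.distributor.example/publisher.example/article.html.sxg"
by_path = "https://distributor.example/u12345/publisher.example/article.html.sxg"

print(origin_of(by_path))                      # path ID removed
print(origin_of(by_subdomain))                 # subdomain ID still present
print(naive_registrable_domain(by_subdomain))  # subdomain ID removed
```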

In absence of any signals about the distributor, my fear is that the publishers would simply default to whitelisting a small set of distributors that "they trust" — e.g. trust not to have negative impact on user experience, expect insights from distributors about such page loads in return, etc.

@sleevi

sleevi commented May 17, 2019

If #430 was resolved in such a way that Publishers had to explicitly allowlist Distributors, would that obviate the need for exposing this information?

Alternatively, if #430 is not accepted (that is, neither an allowlist nor a blocklist is pursued), that would seem to remove the Publisher’s ability to place such restrictions, thus reducing the risk of such ossification of Distributors.

It’s not clear to me, though, which of these outcomes was being imagined with the remark about lacking signals.

@twifkak
Collaborator

twifkak commented May 19, 2019

However, if the Sender is not colluding / party to the threat, then I think this changes the calculus

How does this change the calculus of distributor/publisher being able to correlate tracking IDs? The sender (aka embedder) cannot read or intercept the bytes that the distributor sends to the user. ISTM the distributor could send an unsigned response containing any of the usual tricks, followed by a redirect to the SXG, and the sender couldn't prevent this with static analysis. (Perhaps there's another way the sender could prevent this that I'm not seeing...)

@sleevi

sleevi commented May 20, 2019

How does this change the calculus of distributor/publisher being able to correlate tracking IDs?

Apologies, as I suppose my example could have been clearer. I was trying to highlight that there are more risks than 'just' tracking IDs, and this adds to the security and privacy complexity.

The Sender may not wish for the Publisher to learn about what the user is doing on the Sender. This is similar to, but distinct from, the User Privacy case; I think that many of their mitigations end up looking similar, but there are cases where the User may be fine with the information leakage, but the Sender not, and vice-versa, which is why I tried to enumerate them as separate in the considerations.

Basically, I'm trying to treat all of this from the lens of side-channels. Functionally, the Distributor gets to be the 'last mile' to the user for Publisher's content. This can be seen in some of the motivations for the RUM use-case. However, SXGs allow both the Sender and Distributor to look into what Publisher is sending, and further, be able to influence how that information is loaded. This can lead to accidental or intentional side channels (like those in #347 ) that allow the Sender/Distributor to learn not just about Publisher's content, but about the User's state at Publisher.

From the perspective of URLs, I'm trying to work through cases where the User or Sender may not want the Publisher to know about the URL they're viewing (or that of the Distributor's), and then see if and how we can balance those cases against the desire for the Publisher to control the Distributor (as in #430 )

6 participants