Intent to implement: Share tracking in AMP #3135

Open
rudygalfi opened this Issue May 6, 2016 · 11 comments

@rudygalfi
Contributor
rudygalfi commented May 6, 2016 edited

The following doc proposes a mechanism for introducing share referral identifiers in AMP.

One relatively recent, and apparently very effective, development in “social media optimization” of content is to use URL fragments to tie shared URLs to a particular page view. This allows tracking of viral shares and the resulting page impressions back to the original share across multiple social networks, instant messaging applications, etc.

Supporting this in AMP requires:

  • Expose new amp-analytics vars for “incoming share tracking identifier” and “outgoing share tracking identifier”.
  • Implement URL updating, including update of URL in viewer (e.g. search results page).
  • [Not in v1, but possibly future version] Expose a method by which a publisher can specify the “outgoing share tracking identifier” that should be used and set in the URL fragment after the page has loaded and the incoming share tracking identifier, if any, has been recorded.

For details see https://docs.google.com/document/d/1TSUhvA7O1fDiiI3ZznmHU4SY8iAC0kT6W0-ezMDyweQ/edit?usp=sharing

/cc @cramforce

@rudygalfi rudygalfi added this to the Pending Sprint Slotting milestone May 6, 2016
@cramforce
Member

LGTM from my side.

To clarify: this is a feature to enable the mechanism demonstrated in this URL: https://medium.com/@cramforce/why-amp-is-fast-7d2ff1f48597#.bqzkwtl6u
where the fragment starts with a ".".

@akellehe

@rudygalfi When I originally started the pound project at Buzzfeed I used radix-36 encoded ids. We later updated to the hashed version of the ID to provide more IDs for each user for a number of reasons. In order to tell the new hashed ids apart from the radix-36 IDs I added a preceding ".".
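A minimal sketch of the two ID styles described above, for illustration only. The radix-36 encoding is standard; the hashing scheme (SHA-256 over a hypothetical `salt` and `user_id`) is purely an assumption, since the thread does not say how Pound derives its hashed IDs. Only the leading "." marker comes from the comment.

```python
import hashlib
import string

# 0-9 followed by a-z: the 36 symbols of a radix-36 encoding.
ALPHABET = string.digits + string.ascii_lowercase


def to_base36(n: int) -> str:
    """Encode a non-negative integer in radix 36."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n:
        n, rem = divmod(n, 36)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def hashed_id(user_id: int, salt: str) -> str:
    """Hypothetical hashed ID: a "." marker plus a radix-36 encoded hash prefix."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return "." + to_base36(int(digest[:12], 16))


def is_hashed(fragment_id: str) -> bool:
    """The leading "." is what tells a hashed ID apart from a plain radix-36 ID."""
    return fragment_id.startswith(".")
```

Varying the `salt` per visit would be one way to get several IDs for the same user, but that is a guess at the motivation, not something the thread confirms.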

I've noticed several implementations across the web apparently reverse engineering this aspect of the design. It shouldn't be necessary, though, for a new implementation. I can't really think of a great reason (besides confirming the id is in fact a hashed id) to keep that around.

A couple of nitpicks on your doc: "User A" visiting foo.html#abc123 implies "User A" received the link foo.html#abc123 from another user (the one represented by #abc123), so I think it is misleading to say #abc123 is attributed to "User A". "User A"'s arrival on foo.html should be attributed to that previous user.

It's also important to note the fragment is updated to reflect the id assigned to "User A" at the time "User A" arrives on the page foo.html. That could be made more explicit.
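The attribution described in this comment can be sketched as follows. This is an illustration under assumptions, not the AMP implementation: `rotate_share_id` and `visitor_id` are hypothetical names, and the whole fragment is treated as the ID, as in the foo.html#abc123 example.

```python
from typing import Optional, Tuple
from urllib.parse import urldefrag


def rotate_share_id(url: str, visitor_id: str) -> Tuple[Optional[str], str]:
    """Return (referrer_id, updated_url).

    The incoming fragment identifies the user who shared the link, so the
    page view is attributed to that referrer. The fragment is then replaced
    with the ID assigned to the current visitor, ready for the next share.
    """
    base, fragment = urldefrag(url)
    referrer_id = fragment or None  # no fragment means a direct visit
    return referrer_id, f"{base}#{visitor_id}"
```

For example, `rotate_share_id("https://example.com/foo.html#abc123", "def456")` reports `abc123` as the referrer and produces a URL ending in `#def456` for "User A" to share.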

@cramforce
Member

We're thinking about picking up the pattern, but more to say "If the fragment starts with a ., then it is likely meant as a tracking fragment as opposed to used for e.g. deep linking into content". With that the . itself would not be part of the ID and if the ID started with a ., there would be 2 dots in the URL :)
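A rough sketch of that rule, assuming nothing beyond what the comment states: a fragment that starts with "." is treated as a tracking fragment, the "." is not part of the ID, and anything else is left alone as a possible deep link. The function name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def tracking_id_from_url(url: str) -> Optional[str]:
    """Return the share tracking ID, or None if the fragment is not a tracking fragment."""
    fragment = urlsplit(url).fragment
    if not fragment.startswith("."):
        return None  # likely a deep link (or no fragment at all)
    # Drop only the marker dot; an ID that itself starts with "." keeps its dot.
    return fragment[1:] or None
```

With this rule, `#.bqzkwtl6u` yields `bqzkwtl6u`, `#section-2` yields `None`, and `#..abc` yields the ID `.abc`.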

@akellehe
akellehe commented May 10, 2016 edited

@cramforce @rudygalfi that's pretty rad. I should note; you'll probably get better reliability across networks through URL interpolation if that's possible. We'll have more information on that soon. Basically it looks like

http://www.example.com/p/[hashed id]/username/title-slug

The downside is that third parties can no longer track a single canonical URL the way they do today without some modification on their side (which is sometimes not possible). This probably applies, for example, to Google Plus +1 counts. The effect would be that the +1 count is distributed across all the unique IDs interpolated into the URL by various users.

The up-side is you can look for the pattern in the path such as /\/p\/[a-zA-Z0-9]+\// to test for an encoded id. As a nice side-effect this prevents ambiguity in handling the # parameter. Further it likely lowers the occurrence of "skip nodes" as we call them. (A->B->C appearing as A->C)

On your end you would handle that parameter in/on the client and pass the URL along with the interpolated params stripped out.
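A minimal sketch of the client-side handling described above, using the pattern from the comment. What the stripped URL should look like is an assumption here (the whole `/p/[hashed id]/` segment is removed); the thread does not specify it, and `strip_interpolated_id` is a hypothetical name.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

# Same pattern as in the comment, with a capture group around the ID.
ID_SEGMENT = re.compile(r"/p/([a-zA-Z0-9]+)/")


def strip_interpolated_id(url: str) -> Tuple[Optional[str], str]:
    """Return (encoded_id, url_without_id); the URL is unchanged if no ID is found."""
    parts = urlsplit(url)
    match = ID_SEGMENT.search(parts.path)
    if match is None:
        return None, url
    # Assumption: the canonical path is the original path minus "/p/<id>".
    clean_path = parts.path[:match.start()] + "/" + parts.path[match.end():]
    return match.group(1), urlunsplit(parts._replace(path=clean_path))
```

Applied to `http://www.example.com/p/abc123/username/title-slug`, this returns the ID `abc123` and the URL `http://www.example.com/username/title-slug`, which is what would be passed along to the cache or host.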

@cramforce
Member

I might be misunderstanding your proposal, but we will definitely use a URL fragment for this, because we do not want a change of ID to change the URL the server observes. This is for both privacy and caching efficiency reasons.

@akellehe

@cramforce do you control the servers? The parameters can pretty easily be ignored, and it's straightforward in varnish to canonicalize the cache key.

@cramforce
Member

We are talking about basically all of the servers and the clients. Since the URL changes on page load, a reload of the current page does not hit the cache. This can be really bad on mobile where closing a browser and returning to it might have evicted the page in the meantime.

Secondly, we need to be sensitive with exposing the ID to the server. In AMP we do not want to log any ids in server logs, unless the publisher asked for it.

Those reasons, together with relying on canonicalization on the side of share targets, make it not a good idea to encode in the path.

@akellehe
akellehe commented May 11, 2016 edited

> Since the URL changes on page load, a reload of the current page does not hit the cache.

This is avoided by the canonicalization I mentioned, i.e. URLs are rewritten by the client before fetching from the cache or hitting the host.

> Secondly, we need to be sensitive with exposing the ID to the server. In AMP we do not want to log any ids in server logs, unless the publisher asked for it.

The server would never see the ID, given the above.

> Those reasons, together with relying on canonicalization on the side of share targets, make it not a good idea to encode in the path.

I can't address this since I don't know what you mean by "share targets". Are you referring to a piece of content that is shared? Or is this in reference to the user who receives a link from another user?

The implementation details are important here. I'm not familiar with the way AMP routes traffic to content. Is it going through a redirect (in which case you'll have plenty of control)? Or are you hitting publisher's hosts/CDNs directly at the user's demand?

@cramforce
Member
cramforce commented May 12, 2016 edited

I don't think your canonicalization example is correct. When the browser has evicted a tab, it will reload whatever was in the URL bar. The same goes for the "server never seeing it" claim: that is true on the initial rewrite, but the URL can then be shared and loaded.

This entire proposal is only about sharing. The current URLs are shared, and the values from them are read on incoming traffic. This has to work on the AMP Cache and on every publisher's own servers (hence essentially all servers and all CDNs).

@rudygalfi
Contributor

@akellehe Thanks for the comments on the doc. Cleaned it up a bit; PTAL.

@muxin muxin was assigned by cramforce May 16, 2016
@rudygalfi rudygalfi modified the milestone: Next, Current Jul 29, 2016
@rudygalfi
Contributor

Update on this: the first release won't include "Expose a method by which a publisher can specify the “outgoing share tracking identifier” that should be used and set in the URL fragment after the page has loaded and the incoming share tracking identifier, if any, has been recorded."
