Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Analytics: gtag domain linker adds _gl query parameter even when navigating AMP internal links on origin #30176

Closed
westonruter opened this issue Sep 9, 2020 · 19 comments · Fixed by #32100

Comments

@westonruter
Copy link
Member

westonruter commented Sep 9, 2020

I'm not entirely sure this is a bug, but it feels like it.

On my WordPress site which is AMP-first, I have the Google Site Kit plugin running with Google Analytics configured like this (as provided by Site Kit):

<amp-analytics type="gtag" data-credentials="include">
    <script type="application/json">
        {"vars":{"gtag_id":"UA-123456-1","config":{"UA-123456-1":{"groups":"default","linker":{"domains":["weston.ruter.net"]}}}},"optoutElementId":"__gaOptOutExtension"}
    </script>
</amp-analytics>

From the homepage at https://weston.ruter.net/ if I then click on post for “Integrating with AMP Dev Mode in WordPress” which has the URL of https://weston.ruter.net/2019/09/24/integrating-with-amp-dev-mode-in-wordpress/ I am actually taken to a URL like:

https://weston.ruter.net/2019/09/24/integrating-with-amp-dev-mode-in-wordpress/?_gl=1~abcde5~

This happens for each internal link I click on my site. When I navigate around my site, each request includes a _gl query parameter, and then it gets stripped out with history.replaceState() when the page initializes (as far as I can tell).

As I understand, this is in order to measure customer journeys across domains. Nevertheless, I am not navigating across domains. I am navigating on my own domain.

Shouldn't this _gl query parameter only be added links for pages that are served on the AMP Cache over to the origin domain? The injection of this query parameter when navigating around on the origin server is problematic for a couple reasons:

  1. It is distracting for users who care about what is in the Location bar. I see I am navigated to some URL with a long random string, and then it disappears. That's somewhat disconcerting.
  2. It can interfere with full-page caching since no two page requests will have the same URL.
  3. It also breaks service worker caching of navigation requests, unless you figure out logic to explicitly strip out the _gl parameter when caching a response or looking up a previously-cached response.

This issue can also be seen on https://amp.dev/ when clicking on the “Blog” link in the nav menu. Instead of landing on https://blog.amp.dev/ I am actually taken to:

https://blog.amp.dev/?_gl=1~abcde5~

Here, however, the _gl query parameter is not stripped out because the domain linker functionality is not currently enabled. This was reported at ampproject/amp.dev#4292

In any case, should this _gl parameter be added to such links in the first place? Shouldn't it only happen if blog.amp.dev is included among the domains configuration for gtag?

@dero
Copy link

dero commented Sep 10, 2020

It can interfere with full-page caching since no two page requests will have the same URL.

This is causing major performance issues on an AMP standard site we've recently deployed running with Sitekit. I would suggest treating this issue as a high priority one.

@westonruter westonruter changed the title Analytics: GTag domain linker adds _gl query parameter even when navigating AMP internal links on origin Analytics: gtag domain linker adds _gl query parameter even when navigating AMP internal links on origin Sep 10, 2020
@westonruter
Copy link
Member Author

cc @ampproject/wg-analytics

@micajuine-ho micajuine-ho self-assigned this Sep 10, 2020
@zhouyx
Copy link
Contributor

zhouyx commented Sep 10, 2020

Hi @westonruter @dero

I can see two issues here.

1, the destinationDomain is set to weston.ruter.net. By default, AMP won't decorate a url with the exact same domain. However the domains value override that. This instructs AMP to always to decorate urls to this domain.
2. The proxyOnly value returned is false. As you mentioned, by default AMP doesn't decorate url if the page is served from the origin. However the proxyOnly: false config override that behavior, and instructs AMP to always decorate url no matter what.

The unexpected behavior should be fixed by changing the config. Thanks.

@westonruter
Copy link
Member Author

Thank you. I'm raising with the Site Kit team since the configuration comes from that plugin.

@westonruter
Copy link
Member Author

See google/site-kit-wp#2010.

@westonruter
Copy link
Member Author

westonruter commented Sep 11, 2020

@dero This WP filter code should address point 1 above:

add_filter(
	'googlesitekit_amp_gtag_opt',
	function ( $gtag_opt ) {
		foreach ( $gtag_opt['vars']['config'] as &$config ) {
			unset( $config['linker']['domains'] );
		}
		return $gtag_opt;
	}
);

The second point regarding proxyOnly:false I'm not sure about, as that seems to be coming from a response from https://www.googletagmanager.com/gtag/amp.

@adamsilverstein
Copy link
Contributor

1, the destinationDomain is set to weston.ruter.net. By default, AMP won't decorate a url with the exact same domain. However the domains value override that. This instructs AMP to always to decorate urls to this domain.

@zhouyx Are you sure this is the correct behavior? When testing and reviewing gtag.js linker functionality, I noticed that gtag will never decorate a link to the exact same domain (it does this check first). domains does not override that, unlike in AMP.

I feel like the AMP auto-linker should match this logic, what do you think?

@adamsilverstein
Copy link
Contributor

Moving up the check in isDomainMatch_ would fix this: #32100

@westonruter westonruter reopened this Jan 21, 2021
@adamsilverstein
Copy link
Contributor

@micajuine-ho can you please review this issue/PR when you have a chance?

@micajuine-ho
Copy link
Contributor

micajuine-ho commented Jan 21, 2021

Hi @adamsilverstein @westonruter

Thanks for bringing this up again.

Moving up the check in isDomainMatch_ would fix this: #32100

This would fix this specific case, however I am hesitant to make this fix as it would cause problems for other users since it would be breaking functionality.

The problem here is that the gtag analytics config is using a linker without proxyOnly: true.

We could bring this up to the gtag team and see if there is a way to customize the config (specifically the proxyOnly field) on their end.

Or we could make the proxyOnly feature a top level attribute in the config, which would allow publishers to have the final say on link decoration.

/cc @kristoferbaxter Do you know who a contact is on the gtag team?

@adamsilverstein
Copy link
Contributor

adamsilverstein commented Jan 21, 2021

@micajuine-ho Thanks for the reply...

This would fix this specific case, however I am hesitant to make this fix as it would cause problems for other users since it would be breaking functionality.

I can understand why you wouldn't want to risk breaking existing functionality. I'm still not sure I understand why not adding the linker to local links would be problematic. Is that related to a cache serve case? Do you have an example of what this would break? Is the existing behavior you described documented anywhere (ie "AMP won't decorate a url with the exact same domain. However the domains value overrides that.")?

Is there a use case where you would want to include link decoration to the same domain?

The problem here is that the gtag analytics config is using a linker without proxyOnly: true.

I'm not sure this will work, for example in the Newspack case, they need to include the linker decorations on origin because they use a 3p subscription flow and want to keep track of users who leave and then return to the site (two way linking). So they can't use proxyOnly: true, right?

@micajuine-ho
Copy link
Contributor

micajuine-ho commented Jan 22, 2021

Is the existing behavior you described documented anywhere?

Yes it is documented here

I'm still not sure I understand why not adding the linker to local links would be problematic. Is that related to a cache serve case?

I don't have too much context around why we don't automatically allow the same hostnames to be automatically decorated, maybe @calebcordry can chimes in here. But I imagine it's because we have the proxyOnly option that we expected to be used.

I'm not sure this will work, for example in the Newspack case, they need to include the linker decorations on origin because they use a 3p subscription flow and want to keep track of users who leave and then return to the site (two way linking).

I see. This idea of linking between cache to origin and origin to 3p, but not origin to origin seems to be a new use case for our Linker. I'd be happy to explore some options that would allow for this sort of behavior.

However, I am a little confused still on how gtag plays a roll in this. Will publishers be using the newspack / wordpress plugins to generate amp pages that have <amp-analytics type=gtag>?

@adamsilverstein
Copy link
Contributor

adamsilverstein commented Jan 22, 2021

However, I am a little confused still on how gtag plays a roll in this. Will publishers be using the newspack / wordpress plugins to generate amp pages that have <amp-analytics type=gtag>?

Yes. @westonruter gave some sample Site Kit output above where you can see the full configuration. WordPress users might use the official AMP plugin and Google's Site Kit for AMP compatible Analytics integration.

In addition, gtag comes in because I am comparing the linker behavior between AMP and non AMP pages; the non-AMP versions use the standard gtag.js script and does not decorate links to the origin domain at all, even if added to the linker->domains array (as in the Newspack case). The AMP version using the same linker configuration does decorate links to the local domain.

@micajuine-ho
Copy link
Contributor

To reiterate our offline conversation:

We needed the proxyOnly along with the domain name allowlisted in destinationDomains:[] for fragment links from cache to canonical.

However, adding the origin to the destinationDomains also means that internal navigation will always be linker decorated.

A possible solution here is to add a feature that disallows linker decoration for internal navigation, when not navigating from the cache:

linkers: {
  noInternalLinking: true, // boolean, defaults to false
  ...
}

noInternalLinking set to true will not decorate links for internal navigation, even if the domain is in destinationDomains: []. noInternalLinking will not apply to proxyOrigins and so should not interfere with linker decoration when navigating from cache to canonical.

Vendors, such as gtag, can change their checked in json or change their configRewritter endpoint to add this in, and publishers can inline this feature.

@calebcordry
Copy link
Member

I vaguely remember something about some pubs wanting same domain decoration, but when I looked through old issues I couldn't find anything concrete. I agree with Mica that Adam's PR would be a breaking change if there is a valid use case, but again, I don't have a real world example. I like the new flag idea, but it would require coordination with the vendors. I will defer to Mica and team as to which solution they would like to move forward with.

@adamsilverstein
Copy link
Contributor

Thanks for the feedback @calebcordry, appreciate the consideration.

@micajuine-ho
Copy link
Contributor

micajuine-ho commented Feb 1, 2021

After getting results back form cookbook, I'm seeing a good amount of URLs that use destinationDomains w/ same origin and proxyOnly: false (including a few notable pubs). I also see that localhost is allow-listed in destinationDomains in some of these pages--alluding to the fact that devs need the current setup to test locally.

With these findings, I think we should move ahead with the introduction of a new feature to disable to same domain linking while having proxyOnly: false.

Edit: After, talking to @kristoferbaxter, we agreed, that since adding this change would still require analytics vendors to change their code, we should just change this behavior within amp-analytics. We should also add documentation describing how to locally test Linker (i.e. no internal navigation testing via localhost).

@adamsilverstein
Copy link
Contributor

Hey @micajuine-ho - I have one follow up question about the AMP linker vs. using the non-AMP Analytics linker:

We noticed that on non-AMP pages the tracking query parameter decorating links is _ga and on AMP it is _gl. Is this expected/correct?

@micajuine-ho
Copy link
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants