Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

config: Check doc head for custom URLs #26829

Merged
merged 7 commits into from
Mar 27, 2020
Merged

Conversation

mdmower
Copy link
Contributor

@mdmower mdmower commented Feb 16, 2020

Check for a custom URL definition in special <meta> tags. Note that
this does not allow for distinct custom URLs in AmpDocShadow instances.
The shell is allowed to define one set of custom URLs via AMP_CONFIG
(recommended) or by including <meta> tags in the shell <head>. Those
custom URLs then apply to all AMP documents loaded in the shell.

Check for a custom URL definition in special <meta> tags. Note that
this does not allow for distinct custom URLs in AmpDocShadow instances.
The shell is allowed to define one set of custom URLs via AMP_CONFIG
(recommended) or by including <meta> tags in the shell <head>. Those
custom URLs then apply to all AMP documents loaded in the shell.
@mdmower
Copy link
Contributor Author

mdmower commented Feb 16, 2020

Corresponding amp-toolbox PR: ampproject/amp-toolbox#622
Self-hosted runtime sample project: https://github.com/mdmower/amp-self-host-demo/

src/config.js Outdated

/** @type {!Object<string, ?string>} */
const metaUrls = {
'runtime-host': getMetaUrl('runtime-host'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before merging, we need to confirm that AMP Cache transforms will remove this meta. @Gregable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a change to only allow getMetaUrl() on non-proxy origins. Let me know if you think it is unnecessary and I can back it out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a separate topic, the metaUrls object defined here is not needed. getMetaUrl('self-host') can be used directly in the definition of urls.cdn:

cdn: env['cdnUrl'] || getMetaUrl('runtime-host') || 'https://cdn.ampproject.org',

return null;
}

const metaEl = self.document.head./*OK*/ querySelector(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will immediately invoke the meta lookup, which creates a race between head parsing and JS loading. We need to delay this as much as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. I see your concern. I'll hunt around for a suitable event on which to listen or a flag that has been set. Notably, though, I'm trying to avoid adding dependencies to config.js as they quickly lead to circular dependencies when this file is imported elsewhere. I'll post back when I have a solution.

Copy link
Contributor Author

@mdmower mdmower Feb 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jridgewell I'm not sure how to accomplish the delay without turning urls.cdn into a promise or letting urls.cdn contain an incorrect value for a short time until <head> is fully available, and then re-evaluating its value. I'm not thrilled about either solution. The former requires refactoring all consumers of urls.cdn and the later is... sloppy. Could we simply make it a stipulation that <meta name="runtime-host"> must appear earlier than all <script> tags? It's not much different than what is required for PWAs and viewers that define AMP_CONFIG={cdnUrl:"..."} before any AMP scripts load.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we simply make it a stipulation that <meta name="runtime-host"> must appear earlier than alll <script> tags?

That seems fine to me.

@ampproject/wg-caching: Is it possible to specify that this meta tag must come before the <script src=v0.js>?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, validator doesn't currently support ordering constraints, though it seems possible to change that. I'll defer to @Gregable and @honeybadgerdontcare on whether we should.

A half-way solution is to change the reorder head transformer to ensure this is correct on AMP cache, at least. We could also update AMP optimizer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ordering of meta tags by optimizer and head transformer is being implemented/discussed here: ampproject/amp-toolbox#636 . It makes sense to split out the reordering into a separate PR from the geoapi change, so I'll do that next.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the Google AMP Cache and the AMP Packager, we do put meta tags before all script tags in the reorder head transformer. This is a requirement for several extensions. We also only keep an explicit subset of meta tags, see internal or external documentation.

It looks like the AMP optimizer does deviate from what we do, with putting the runtime before other meta tags. That may be okay but guidance I've received is all meta tags should go before any script tag. @sebastianbenz

As to restricting the ordering @twifkak is right that we don't have a general way to do that but we have some rudimentary way to do it, see ChildTagSpec. That won't work here. We can file a feature request but it would need to be implemented first to avoid making pages already implementing this meta tag as invalid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually want Caches to strip the meta entirely.

@jridgewell what's the reason for this? The cache doesn't strip all meta tags either. Not sure it's a good idea for Optimizer to remove meta tags as it might break publishers on the origin.

I've filed ampproject/amp-toolbox#641 for updating optimizer to correctly place meta tags.

A half-way solution is to change the reorder head transformer to ensure this is correct on AMP cache, at least. We could also update AMP optimizer.

This sounds like a valid approach as long as runtime self-hosting is only supported by transformed AMP.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimizer shouldn't, only cache. But actually, the code now only respects the meta if we're on Origin, so it won't be bad either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as long as runtime self-hosting is only supported by transformed AMP.

I would prefer this is not made a stipulation.

It's worth pointing out, the code change made by this PR ignores the meta tags under discussion if the origin is a cache origin.

@mdmower
Copy link
Contributor Author

mdmower commented Mar 12, 2020

Is there anything that needs to be changed here or dependencies that should be noted in the PR message (e.g. validator)?

@honeybadgerdontcare
Copy link
Contributor

Is there anything that needs to be changed here or dependencies that should be noted in the PR message (e.g. validator)?

This code doesn't seem to affect validation, so I'm not sure what you're asking. We have a whitelist of meta tags see starting here. Since the validator is currently unaware of the context it's running in it would give an error if it's not part of the rules. E.g. the outcome would be the same if it's a NPM package, on dev console via #development or running in an AMP cache. I would like to think that consistency is good and that a static page run through any of those scenarios still produces the same result.

@mdmower
Copy link
Contributor Author

mdmower commented Mar 26, 2020

Bump to keep this on folks' radars. As far as I can tell, no changes are needed in this PR. amp-toolbox/optimizer modifications (ampproject/amp-toolbox#636) and documentation (#27100) are in draft status until a decision is make here. Thanks, and let me know if there are any questions that I can clear-up.

Copy link
Contributor

@jridgewell jridgewell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't realize I was blocking.

@mdmower
Copy link
Contributor Author

mdmower commented Mar 26, 2020

Thanks @jridgewell.

@sebastianbenz, it's your I2I that this partially satisfies. Green light?

@sebastianbenz
Copy link
Contributor

Thanks a lot @mdmower! Looks good from my side!

@mdmower mdmower merged commit c0bff85 into ampproject:master Mar 27, 2020
@mdmower mdmower deleted the pr-selfhost2 branch March 27, 2020 22:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants