Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I2I: AMP Runtime Self-Hosting #25873

Open
sebastianbenz opened this issue Dec 4, 2019 · 54 comments
Open

I2I: AMP Runtime Self-Hosting #25873

sebastianbenz opened this issue Dec 4, 2019 · 54 comments
Labels
INTENT TO IMPLEMENT Proposes implementation of a significant new feature. https://bit.ly/amp-contribute-code WG: Caching: Triaged Initial triage from wg-caching complete. Remove label if new input required. WG: caching WG: runtime

Comments

@sebastianbenz
Copy link
Contributor

sebastianbenz commented Dec 4, 2019

Summary

Make it possible for publishers to self-host the AMP runtime, without breaking AMP validity.

Design document

Motivation

The benefits are:

  • Independence: AMP runtime and components no longer needs to be loaded from Google servers for AMP First sites.
  • Control: AMP First sites can align the roll-out of a new AMP version with their own QA (this won’t help them if pages are served in SERPs). They can also rollback a broken runtime version themselves.
  • Performance: faster page load and TTI by avoiding an additional HTTPs connection.

Launch tracker

/cc @ampproject/wg-approvers @mattwomple @honeybadgerdontcare

@sebastianbenz sebastianbenz added the INTENT TO IMPLEMENT Proposes implementation of a significant new feature. https://bit.ly/amp-contribute-code label Dec 4, 2019
@ssantosms
Copy link
Contributor

I can't add comments to the design doc.
I would like to comment that Bing likes this and I wanted to add our section with the Google cache that we will do the same rewrite to bing-amp.com that we do today. We already rewrite from ampproject, so, we would have to add that re-write.

@ssantosms
Copy link
Contributor

Another question is around build dist. Will it change? We currently override the cache and 3p addresses so we can host ourselves. Is there any plan to change that?

@honeybadgerdontcare
Copy link
Contributor

I can't add comments to the design doc.
I would like to comment that Bing likes this and I wanted to add our section with the Google cache that we will do the same rewrite to bing-amp.com that we do today. We already rewrite from ampproject, so, we would have to add that re-write.

That is great to hear! I've changed permissions so anyone can comment on the design doc.

@dvoytenko
Copy link
Contributor

Couple of notes:

  1. With the OpenJS transition, will we actually be serving JS resources from non-Google infra?
  2. For non-CDN scripts, I'd strongly suggest we require resource integrity. This way we can always fully validate this in our validator and optimize in Cache. And we could always implement the version locking of sorts this way.

@honeybadgerdontcare
Copy link
Contributor

In regards to 1: The plan is for caches to rewrite origin URLs to serve their own version of the Runtime. So yes and no. If it's Google's cache then the JavaScript will come from Google infra. Otherwise if the Runtime is self hosted then it may not be what Google has served in the past.

In regards to 2: Wouldn't this necessitate publishers to republish ALL their AMP documents with each version change?

@kristoferbaxter
Copy link
Contributor

kristoferbaxter commented Dec 6, 2019

On 1 -- What does the transition change about this?
For 2 -- Agree about requiring resource integrity.

@kristoferbaxter
Copy link
Contributor

In regards to 2: Wouldn't this necessitate publishers to republish ALL their AMP documents with each version change?

It wouldn't if the url used in the document contained the version number of AMP used on the document.

@honeybadgerdontcare
Copy link
Contributor

But then if they want to move forward to another version, all documents still need to be updated. It's not sufficient for them to just update whatever was at https://example.com/amp/v0.js.

I guess I don't understand the "and optimize in Cache". The cache can always rewrite it.

If we're worried about a version being too old, there may be other options. One that was suggested was for each Runtime release to have an expiry function. If it's passed a certain time it fetches from cdn.ampproject.org or some other "failure."

TBH, I'm conflicted. Publishers should be able to use whatever they want on origin. If we're concerned that these are served by caches with a newer version then maybe there's another solution. Caches could choose not to accept origins that do this, inspect the runtime that is being served and decide not to accept them, or require some kind of assurance from the domain that it isn't X old.

@kristoferbaxter
Copy link
Contributor

The resource integrity suggestion is one way to ensure the AMP runtime on the page doesn't meaningfully change from the cache's (and thus the document wouldn't be guaranteed to execute on the cache correctly).

I also like your idea of Caches choosing to not support versions beyond a threshold (and using the content of the self-hosted files to ensure a document is running a compliant version).

@dvoytenko – What do you think about @honeybadgerdontcare's suggestion here?

@dvoytenko
Copy link
Contributor

The caches should definitely reject the "too old" versions. But somehow the self-hosted scripts should indicate what RTV they belong to. Resource integrity can do that, or even a simple attribute amp-rtv="12345678" can do that too. The only exception of this is when we expect the self-hosted scripts to be most recent version, but that partially defeats the reason why someone would want to self-host.

Let's look at a basic premise: a publisher wants to self-host to have their own upgrade cycle. I.e. to avoid any regressions on their site when AMP goes from V to V+1. If they don't specify V anywhere, our cache would simply "optimize" the pages to use V+1, which defeats the publisher's reasoning. So we need them to tell us the V. I don't think updating pages to go from version to version is a big deal.

One note aside: subresource integrity would not work with amp-geo extension since it's a dynamic extension.

@kristoferbaxter
Copy link
Contributor

@dvoytenko This i2i could also improve performance on self hosted documents.

  1. For browsers who employ double or triple key caching, caching is invalidated across document domains.
  2. If the canonical domain has already established a H2/H3 connection for the document, it can reuse this connection for script payloads.

@dvoytenko
Copy link
Contributor

@kristoferbaxter yeah, I was wondering if somehow we could improve the case for the self-hosting for the purpose of only avoiding CDN. Maybe the scripts can state that (as much as possible) they are relying on the most recent releases of AMP. We could even alerts web masters when we notice that the actual served RTV is too far away from AMP's PROD?

@kristoferbaxter
Copy link
Contributor

I agree we should do something here, it would be a shame for documents to not get served by caches because their version of components get too outdated.

@sebastianbenz and @honeybadgerdontcare do you have some ideas about how we could address this? Perhaps as a second step?

@dvoytenko
Copy link
Contributor

But I think we're so far finding two clear use cases, right?

  1. Want to serve a RTV of my choice to interop better with my QA cycle. I.e. the desire is to lock to a specific RTV.
  2. Want to serve the as-much-as-possible latest RTV (WEEKLY or LTS) to avoid 3rd-party CDN when possible. I.e. the desire is to follow AMP's schedule, just not the CDN.

@sebastianbenz
Copy link
Contributor Author

I like the idea of encoding the AMP runtime version into the markup. This way we could use Search Console to automatically warn publishers if:

  • the AMP runtime version Is out-of-date
  • the current version has a critical bug

This would make hosting slightly more complicated for publishers though, because they not only need to update the JS, but also all their HTML.

+1 to @dvoytenko's summary.

@dreamofabear
Copy link

@Gregable suggested adding some runtime code that checks (current_time - RTV) <= max_skew and reloads from AMP CDN if false. Basically, version locking for v0.js.

Pubs could recompile and remove, though most probably won't go through the trouble. One concern is breaking everything if their CSP is misconfigured i.e. doesn't allow cdn.ampproject.org.

@honeybadgerdontcare
Copy link
Contributor

honeybadgerdontcare commented Dec 9, 2019

In a previous design review there was a discussion around what RTVs are "green" and what RTVS may be "bad". Do we still have plans on publishing thatj?

I agree we should have some mechanism to say this "RTV" is bad and not supported including saying we are no longer going to rewrite certain RTVs to green. That said, the Validator does no know what RTV a host is serving, nor does Search Console. We'd have to add that ability if we want to restrict what is served/rewritten.

@mdmower
Copy link
Contributor

mdmower commented Jan 17, 2020

@sebastianbenz I have an idea of how we could handle the difficulties amp-geo presents when self hosting a runtime.

If a cache is not able to support dynamically generated amp-geo, then fallback to fetching the country when amp-geo loads. Given that /v0/amp-geo-0.1.js is an asynchronously loaded script that appends a body class late in the page load process, this fallback isn't terrible and will only be utilized by caches that can't support the faster option of dynamically altering amp-geo. The clear advantage to this feature is that there would be no need to disable version locking or to create a "stable" version of amp-geo-0.1.js that is many-version compatible.

To support this feature, there needs to be an API from which the country code could be fetched. The AMP project could offer this feature at the CDN level (with cache modification). I suggest this be made part of the release process (not part of the build process), perhaps something like https://cdn.ampproject.org/geo-api-0.1.json returns:

{
  "country": "de"
}

Then, when amp-geo detects unpatched value {{AMP_ISO_COUNTRY_HOTPATCH}}, it queries the API instead of defaulting to "unknown".

@sebastianbenz
Copy link
Contributor Author

@mdmower I like this idea!

@Gregable would speak anything against providing such an API endpoint?

@Gregable
Copy link
Member

Gregable commented Jan 17, 2020

@jridgewell and @zhouyx might have more state than me.

@mdmower I'm not sure I understand what returning JSON from https://cdn.ampproject.org/geo-api-0.1.json improves over returning JS from https://cdn.ampproject.org/v0/amp-geo-0.1.js. It does not appear to change anything with regards to @sebastianbenz 's comments about independence, control, or performance.

@mdmower
Copy link
Contributor

mdmower commented Jan 18, 2020

@Gregable From the start of my involvement in trying to find a solution for self-hosting, I've been advocating for publishers who would like to self-host the AMP runtime but do not have fancy CDNs that can perform modifications to components based on country (see Guidelines: Adding a new cache to the AMP ecosystem). The solution to this problem so far has been to self-host the entire runtime except for amp-geo, which could be fetched from cdn.ampproject.org. This raises plenty of concern: "can we still use version locking?" and "do we need to offer an 'API stable' version of amp-geo outside of the normal release process?" See sections Version Locking and Dynamic components: amp-geo in the design doc.

The suggestion I am making lets publishers self-host the entire runtime, including amp-geo, and if they do not have the capability to modify amp-geo-0.1.js at the CDN level, the component will fall-back to querying an API to determine the country. The advantage to this method is that it is far easier to make a small JSON file like { "country": "de" } API stable across many runtime versions than it is to hope that amp-geo-0.1.js from any single runtime version will work with many other runtime versions. The level of effort is also very low for the AMP project to help these publishers. The capability to modify a cached resource by country is already implemented for amp-geo-0.1.js and it simply needs to be extended to a new JSON file.

I argue that this hits both of the independence and control categories. The reason is that it broadens the self-hosting audience to include small publishers who can then modify the runtime locally (control) or host with archiving in mind [in case certain rtvs disappear from cdn.ampproject.org] (independence).

@mdmower
Copy link
Contributor

mdmower commented Jan 18, 2020

My suggested changes to amp-geo are in draft PR #26407.

@mdmower
Copy link
Contributor

mdmower commented Jan 18, 2020

@Gregable suggested adding some runtime code that checks (current_time - RTV) <= max_skew and reloads from AMP CDN if false. Basically, version locking for v0.js.

Pubs could recompile and remove, though most probably won't go through the trouble. One concern is breaking everything if their CSP is misconfigured i.e. doesn't allow cdn.ampproject.org.

If I understand this suggestion correctly, I think it should only be implemented for AMP pages served from the Google AMP cache. Otherwise, you're telling publishers they are not allowed to take snapshots in history to record the way in which a page was rendered by a specific version of the runtime.

@Gregable
Copy link
Member

@mdmower Hopefully @jridgewell or someone else on this thread can address version locking and the possible need to upgrade the client-side implementation over time.

From my read of the cache code, I don't see why patching the country into the .json document would be any different than patching it into the .js document. The cache implementation would probably be straightforward.

Otherwise, I'm not sure I have enough context to compare the options.

@mdmower
Copy link
Contributor

mdmower commented Feb 1, 2020

This means simple static sites cannot self-host AMP

I'm not sure how you're reaching this conclusion. A publisher can opt to use RTV specific URLs, in which case they only need to host the runtime at example.com/rtv/123/, or they can use unversioned URLs, in which case they need to host identical runtimes at example.com/ and example.com/rtv/123/.

@sebastianbenz
Copy link
Contributor Author

Got it - thanks for the clarification. In this case, I'd actually prefer to enforce RTV specific URLs via the validator to keep things simple.

@mdmower
Copy link
Contributor

mdmower commented Feb 6, 2020

Got it - thanks for the clarification. In this case, I'd actually prefer to enforce RTV specific URLs via the validator to keep things simple.

Sorry to drag this out, but I'm confused. Why? If a publisher would like to follow the same model as the AMP project where AMP pages reference a rolling AMP release, shouldn't they be able to do so?

@sebastianbenz
Copy link
Contributor Author

Maybe I'm missing something. What are the advantages for a publisher in having a rolling AMP release if they need to provide an RTV'd endpoint as well?

The only reason I can see is that publishers can encode the import scripts in their templates and don't need to update it with every release. However, as self-hosting is only supported for transformed AMP, this should be done in the transformer anyway.

@mdmower
Copy link
Contributor

mdmower commented Feb 7, 2020

@sebastianbenz The RTV paths are still necessary because dynamically loaded components use the RTV paths when version locking is applied.

I don't think there's any reason to stipulate transformation by amp-toolbox's optimizer for self-hosting to be allowed. If one of the goals of this I2I is "independence", then a publisher ought to be able to simply swap out cdn.ampproject.org with runtime.example.org in their AMP documents, add a couple of meta tags, and the runtime does the rest. None of this requires any extra work by the runtime than the changes we've discussed to read the meta tags. I don't even think the validator requires additional work to support rolling release self-hosting URLs. This all matches the AMP project so closely that support is essentially the same.

@ithinkihaveacat
Copy link
Contributor

ithinkihaveacat commented Feb 7, 2020

The self-hosting experience seems a bit bumpy, especially if the developer was expecting a self-hosted "hello, world" to involve unzip and a text editor, or minor search-and-replace surgery of an existing AMP:

  • Even if I'm building a simple static site, I can't deploy the AMP JS wherever I want; I need to generate RTV-aware HTML and make the AMP JS available at an RTV-dependent location (at the root).
  • Even though I've had to make my HTML and/or server RTV-aware, AMP caches will (surprisingly) ignore the RTV.

I assume it's a non-goal that self-hosting is easy, perhaps make this explicit?

@mdmower
Copy link
Contributor

mdmower commented Feb 7, 2020

@ithinkihaveacat I think that's a fair assessment. Version locking is pretty much the reason for the complex JS hosting requirements. I think a simpler hello world example could be possible without version locking, but I'm not sure anyone is onboard with that idea. I largely don't care anymore as long as we can get some self hosting support out the door in the near future. Long term, sure, I would like to see the AMP project loosen it's grip on how folks host the runtime.

@mdmower
Copy link
Contributor

mdmower commented Feb 16, 2020

amphtml PR: #26829
amp-toolbox PR: ampproject/amp-toolbox#622
Self-hosted runtime sample project: https://github.com/mdmower/amp-self-host-demo/

@jpettitt
Copy link
Contributor

Re the amp-geo API - when you write the docs it's probably worth pointing out that there are accuracy/performance cache tradeoff considerations on the endpoint. Longer cache window means faster 2nd+ page loads but increases the risk of a wrong country. amp-geo uses 30 min if I recall correctly.

@mdmower
Copy link
Contributor

mdmower commented Mar 4, 2020

While it might benefit from some reorganization, I've been preparing a "hosting howto": https://gist.github.com/mdmower/b56e94f0dc36beafb825b0c5e31fceef . Is this something that should be included in #26829 or should it have its own PR where it can be independently scrutinized?

@sebastianbenz
Copy link
Contributor Author

@mdmower that guide is amazing! Thanks a lot for starting this. Please create a different PR for this and assign @CrystalOnScript as a reviewer.

@mdmower
Copy link
Contributor

mdmower commented Mar 5, 2020

PR for AMP framework hosting guide: #27100
(in draft status until #26829 is approved)

@mdmower
Copy link
Contributor

mdmower commented Mar 28, 2020

One of the open questions at the end of the original design doc is "Do we want to ship a zip containing all AMP runtime assets with every AMP release, e.g. via NPM?" I would like to see this happen but I don't think there's any way I can contribute (pretty sure only Googlers with access to internal release code/scripts can work on this). Does this task need a new Issue created?

@sebastianbenz
Copy link
Contributor Author

New issue would be great! Thanks!

@westonruter
Copy link
Member

Can we allow for cache-busting query parameters to be added to the versioned scripts? This would make it much easier to juggle writing the scripts to the filesystem and not having to worry about keeping old versions around. In other words, instead of:

<script async src="https://example.com/amp/rtv/012002290616360/v0.js"></script>

Could we support:

<script async src="https://example.com/amp/v0.js?rtv=012002290616360"></script>

For the WordPress plugin, we'd probably be writing the scripts to the filesystem in a directory like wp-content/amp-scripts, so that would look like:

<script async src="https://example.com/wp-content/amp-scripts/v0.js?rtv=012002290616360"></script>

And whenever our scheduled tasks runs to update the AMP scripts, we'd just have to update that one wp-content/amp-scripts/v0.js file as opposed to adding a new wp-content/amp-scripts/rtv/012002290616360, writing v0.js into it, and then deciding which previous rtv directory can be safely deleted based on page cache times.

@jridgewell
Copy link
Contributor

We can, but it requires two changes:

  1. Extension parsing now needs to handle ?rtv= param:
    export function parseExtensionUrl(scriptUrl) {
    if (!scriptUrl) {
    return null;
    }
    // Note that the "(\.max)?" group only applies to local dev.
    const matches = scriptUrl.match(
    /^(.*)\/(.*)-([0-9.]+|latest)(\.max)?\.(?:js|mjs)$/i
    );
    const extensionId = matches ? matches[2] : undefined;
    const extensionVersion = matches ? matches[3] : undefined;
    if (!extensionId || !extensionVersion) {
    return null;
    }
    return {extensionId, extensionVersion};
    }
  2. Extension insertion needs to request with ?rtv=:
    export function calculateExtensionScriptUrl(
    location,
    extensionId,
    version,
    opt_isLocalDev
    ) {
    const fileExtension = getMode().esm ? '.mjs' : '.js';
    const base = calculateScriptBaseUrl(location, opt_isLocalDev);
    const rtv = getMode().rtvVersion;
    const extensionVersion = version ? '-' + version : '';
    return `${base}/rtv/${rtv}/v0/${extensionId}${extensionVersion}${fileExtension}`;
    }

@westonruter
Copy link
Member

And then this would also need to be supported in the RewriteAmpUrls optimizer transformer. Namely, there would need to be a way to prevent the ampRuntimeVersion from being appended to the path in calculateHost(), and then the _replaceUrl() transformer method would need to take a rtv argument to conditionally add the query param if configured to do so.

How would that configuration be supplied? In addition to the ampUrlPrefix should there be a boolean like rtvAsQueryParam that defaults to false?

@jridgewell
Copy link
Contributor

How would that configuration be supplied? In addition to the ampUrlPrefix should there be a boolean like rtvAsQueryParam that defaults to false?

I'm not sure we need both params. Presence of ampUrlPrefix could imply query params.

@twifkak
Copy link
Member

twifkak commented Apr 14, 2021

Hi, haven't been following closely, but just wanted to include https://github.com/ampproject/amppackager/blob/releases/transformer/transformers/urlrewrite.go in the list of things that might need changing.

@samuelspagl
Copy link

Hey there,

as this issue is still open since 2021, I wanted to ask what the current status of this is.
I would love to self host part of the AMP runtime for the "Google Webstories" Wordpress plugin.

Thanks a lot in advance!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
INTENT TO IMPLEMENT Proposes implementation of a significant new feature. https://bit.ly/amp-contribute-code WG: Caching: Triaged Initial triage from wg-caching complete. Remove label if new input required. WG: caching WG: runtime
Projects
None yet
Development

No branches or pull requests