Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I2I: AMP Runtime Self-Hosting #25873

Open
sebastianbenz opened this issue Dec 4, 2019 · 42 comments
Open

I2I: AMP Runtime Self-Hosting #25873

sebastianbenz opened this issue Dec 4, 2019 · 42 comments

Comments

@sebastianbenz
Copy link
Contributor

@sebastianbenz sebastianbenz commented Dec 4, 2019

Summary

Make it possible for publishers to self-host the AMP runtime, without breaking AMP validity.

Design document

Motivation

The benefits are:

  • Independence: AMP runtime and components no longer needs to be loaded from Google servers for AMP First sites.
  • Control: AMP First sites can align the roll-out of a new AMP version with their own QA (this won’t help them if pages are served in SERPs). They can also rollback a broken runtime version themselves.
  • Performance: faster page load and TTI by avoiding an additional HTTPs connection.

Launch tracker

/cc @ampproject/wg-approvers @mattwomple @honeybadgerdontcare

@ssantosms

This comment has been minimized.

Copy link
Contributor

@ssantosms ssantosms commented Dec 4, 2019

I can't add comments to the design doc.
I would like to comment that Bing likes this and I wanted to add our section with the Google cache that we will do the same rewrite to bing-amp.com that we do today. We already rewrite from ampproject, so, we would have to add that re-write.

@ssantosms

This comment has been minimized.

Copy link
Contributor

@ssantosms ssantosms commented Dec 4, 2019

Another question is around build dist. Will it change? We currently override the cache and 3p addresses so we can host ourselves. Is there any plan to change that?

@honeybadgerdontcare

This comment has been minimized.

Copy link
Contributor

@honeybadgerdontcare honeybadgerdontcare commented Dec 4, 2019

I can't add comments to the design doc.
I would like to comment that Bing likes this and I wanted to add our section with the Google cache that we will do the same rewrite to bing-amp.com that we do today. We already rewrite from ampproject, so, we would have to add that re-write.

That is great to hear! I've changed permissions so anyone can comment on the design doc.

@dvoytenko

This comment has been minimized.

Copy link
Collaborator

@dvoytenko dvoytenko commented Dec 6, 2019

Couple of notes:

  1. With the OpenJS transition, will we actually be serving JS resources from non-Google infra?
  2. For non-CDN scripts, I'd strongly suggest we require resource integrity. This way we can always fully validate this in our validator and optimize in Cache. And we could always implement the version locking of sorts this way.
@honeybadgerdontcare

This comment has been minimized.

Copy link
Contributor

@honeybadgerdontcare honeybadgerdontcare commented Dec 6, 2019

In regards to 1: The plan is for caches to rewrite origin URLs to serve their own version of the Runtime. So yes and no. If it's Google's cache then the JavaScript will come from Google infra. Otherwise if the Runtime is self hosted then it may not be what Google has served in the past.

In regards to 2: Wouldn't this necessitate publishers to republish ALL their AMP documents with each version change?

@kristoferbaxter

This comment has been minimized.

Copy link
Contributor

@kristoferbaxter kristoferbaxter commented Dec 6, 2019

On 1 -- What does the transition change about this?
For 2 -- Agree about requiring resource integrity.

@kristoferbaxter

This comment has been minimized.

Copy link
Contributor

@kristoferbaxter kristoferbaxter commented Dec 6, 2019

In regards to 2: Wouldn't this necessitate publishers to republish ALL their AMP documents with each version change?

It wouldn't if the url used in the document contained the version number of AMP used on the document.

@honeybadgerdontcare

This comment has been minimized.

Copy link
Contributor

@honeybadgerdontcare honeybadgerdontcare commented Dec 6, 2019

But then if they want to move forward to another version, all documents still need to be updated. It's not sufficient for them to just update whatever was at https://example.com/amp/v0.js.

I guess I don't understand the "and optimize in Cache". The cache can always rewrite it.

If we're worried about a version being too old, there may be other options. One that was suggested was for each Runtime release to have an expiry function. If it's passed a certain time it fetches from cdn.ampproject.org or some other "failure."

TBH, I'm conflicted. Publishers should be able to use whatever they want on origin. If we're concerned that these are served by caches with a newer version then maybe there's another solution. Caches could choose not to accept origins that do this, inspect the runtime that is being served and decide not to accept them, or require some kind of assurance from the domain that it isn't X old.

@kristoferbaxter

This comment has been minimized.

Copy link
Contributor

@kristoferbaxter kristoferbaxter commented Dec 6, 2019

The resource integrity suggestion is one way to ensure the AMP runtime on the page doesn't meaningfully change from the cache's (and thus the document wouldn't be guaranteed to execute on the cache correctly).

I also like your idea of Caches choosing to not support versions beyond a threshold (and using the content of the self-hosted files to ensure a document is running a compliant version).

@dvoytenko – What do you think about @honeybadgerdontcare's suggestion here?

@dvoytenko

This comment has been minimized.

Copy link
Collaborator

@dvoytenko dvoytenko commented Dec 6, 2019

The caches should definitely reject the "too old" versions. But somehow the self-hosted scripts should indicate what RTV they belong to. Resource integrity can do that, or even a simple attribute amp-rtv="12345678" can do that too. The only exception of this is when we expect the self-hosted scripts to be most recent version, but that partially defeats the reason why someone would want to self-host.

Let's look at a basic premise: a publisher wants to self-host to have their own upgrade cycle. I.e. to avoid any regressions on their site when AMP goes from V to V+1. If they don't specify V anywhere, our cache would simply "optimize" the pages to use V+1, which defeats the publisher's reasoning. So we need them to tell us the V. I don't think updating pages to go from version to version is a big deal.

One note aside: subresource integrity would not work with amp-geo extension since it's a dynamic extension.

@kristoferbaxter

This comment has been minimized.

Copy link
Contributor

@kristoferbaxter kristoferbaxter commented Dec 7, 2019

@dvoytenko This i2i could also improve performance on self hosted documents.

  1. For browsers who employ double or triple key caching, caching is invalidated across document domains.
  2. If the canonical domain has already established a H2/H3 connection for the document, it can reuse this connection for script payloads.
@dvoytenko

This comment has been minimized.

Copy link
Collaborator

@dvoytenko dvoytenko commented Dec 9, 2019

@kristoferbaxter yeah, I was wondering if somehow we could improve the case for the self-hosting for the purpose of only avoiding CDN. Maybe the scripts can state that (as much as possible) they are relying on the most recent releases of AMP. We could even alerts web masters when we notice that the actual served RTV is too far away from AMP's PROD?

@kristoferbaxter

This comment has been minimized.

Copy link
Contributor

@kristoferbaxter kristoferbaxter commented Dec 9, 2019

I agree we should do something here, it would be a shame for documents to not get served by caches because their version of components get too outdated.

@sebastianbenz and @honeybadgerdontcare do you have some ideas about how we could address this? Perhaps as a second step?

@dvoytenko

This comment has been minimized.

Copy link
Collaborator

@dvoytenko dvoytenko commented Dec 9, 2019

But I think we're so far finding two clear use cases, right?

  1. Want to serve a RTV of my choice to interop better with my QA cycle. I.e. the desire is to lock to a specific RTV.
  2. Want to serve the as-much-as-possible latest RTV (WEEKLY or LTS) to avoid 3rd-party CDN when possible. I.e. the desire is to follow AMP's schedule, just not the CDN.
@sebastianbenz

This comment has been minimized.

Copy link
Contributor Author

@sebastianbenz sebastianbenz commented Dec 9, 2019

I like the idea of encoding the AMP runtime version into the markup. This way we could use Search Console to automatically warn publishers if:

  • the AMP runtime version Is out-of-date
  • the current version has a critical bug

This would make hosting slightly more complicated for publishers though, because they not only need to update the JS, but also all their HTML.

+1 to @dvoytenko's summary.

@choumx

This comment has been minimized.

Copy link
Contributor

@choumx choumx commented Dec 9, 2019

@Gregable suggested adding some runtime code that checks (current_time - RTV) <= max_skew and reloads from AMP CDN if false. Basically, version locking for v0.js.

Pubs could recompile and remove, though most probably won't go through the trouble. One concern is breaking everything if their CSP is misconfigured i.e. doesn't allow cdn.ampproject.org.

@honeybadgerdontcare

This comment has been minimized.

Copy link
Contributor

@honeybadgerdontcare honeybadgerdontcare commented Dec 9, 2019

In a previous design review there was a discussion around what RTVs are "green" and what RTVS may be "bad". Do we still have plans on publishing thatj?

I agree we should have some mechanism to say this "RTV" is bad and not supported including saying we are no longer going to rewrite certain RTVs to green. That said, the Validator does no know what RTV a host is serving, nor does Search Console. We'd have to add that ability if we want to restrict what is served/rewritten.

@mdmower

This comment has been minimized.

Copy link
Contributor

@mdmower mdmower commented Jan 17, 2020

@sebastianbenz I have an idea of how we could handle the difficulties amp-geo presents when self hosting a runtime.

If a cache is not able to support dynamically generated amp-geo, then fallback to fetching the country when amp-geo loads. Given that /v0/amp-geo-0.1.js is an asynchronously loaded script that appends a body class late in the page load process, this fallback isn't terrible and will only be utilized by caches that can't support the faster option of dynamically altering amp-geo. The clear advantage to this feature is that there would be no need to disable version locking or to create a "stable" version of amp-geo-0.1.js that is many-version compatible.

To support this feature, there needs to be an API from which the country code could be fetched. The AMP project could offer this feature at the CDN level (with cache modification). I suggest this be made part of the release process (not part of the build process), perhaps something like https://cdn.ampproject.org/geo-api-0.1.json returns:

{
  "country": "de"
}

Then, when amp-geo detects unpatched value {{AMP_ISO_COUNTRY_HOTPATCH}}, it queries the API instead of defaulting to "unknown".

@sebastianbenz

This comment has been minimized.

Copy link
Contributor Author

@sebastianbenz sebastianbenz commented Jan 17, 2020

@mdmower I like this idea!

@Gregable would speak anything against providing such an API endpoint?

@Gregable

This comment has been minimized.

Copy link
Member

@Gregable Gregable commented Jan 17, 2020

@jridgewell and @zhouyx might have more state than me.

@mdmower I'm not sure I understand what returning JSON from https://cdn.ampproject.org/geo-api-0.1.json improves over returning JS from https://cdn.ampproject.org/v0/amp-geo-0.1.js. It does not appear to change anything with regards to @sebastianbenz 's comments about independence, control, or performance.

@mdmower

This comment has been minimized.

Copy link
Contributor

@mdmower mdmower commented Jan 18, 2020

@Gregable From the start of my involvement in trying to find a solution for self-hosting, I've been advocating for publishers who would like to self-host the AMP runtime but do not have fancy CDNs that can perform modifications to components based on country (see Guidelines: Adding a new cache to the AMP ecosystem). The solution to this problem so far has been to self-host the entire runtime except for amp-geo, which could be fetched from cdn.ampproject.org. This raises plenty of concern: "can we still use version locking?" and "do we need to offer an 'API stable' version of amp-geo outside of the normal release process?" See sections Version Locking and Dynamic components: amp-geo in the design doc.

The suggestion I am making lets publishers self-host the entire runtime, including amp-geo, and if they do not have the capability to modify amp-geo-0.1.js at the CDN level, the component will fall-back to querying an API to determine the country. The advantage to this method is that it is far easier to make a small JSON file like { "country": "de" } API stable across many runtime versions than it is to hope that amp-geo-0.1.js from any single runtime version will work with many other runtime versions. The level of effort is also very low for the AMP project to help these publishers. The capability to modify a cached resource by country is already implemented for amp-geo-0.1.js and it simply needs to be extended to a new JSON file.

I argue that this hits both of the independence and control categories. The reason is that it broadens the self-hosting audience to include small publishers who can then modify the runtime locally (control) or host with archiving in mind [in case certain rtvs disappear from cdn.ampproject.org] (independence).

@mdmower

This comment has been minimized.

Copy link
Contributor

@mdmower mdmower commented Jan 18, 2020

My suggested changes to amp-geo are in draft PR #26407.

@mdmower

This comment has been minimized.

Copy link
Contributor

@mdmower mdmower commented Jan 18, 2020

@Gregable suggested adding some runtime code that checks (current_time - RTV) <= max_skew and reloads from AMP CDN if false. Basically, version locking for v0.js.

Pubs could recompile and remove, though most probably won't go through the trouble. One concern is breaking everything if their CSP is misconfigured i.e. doesn't allow cdn.ampproject.org.

If I understand this suggestion correctly, I think it should only be implemented for AMP pages served from the Google AMP cache. Otherwise, you're telling publishers they are not allowed to take snapshots in history to record the way in which a page was rendered by a specific version of the runtime.

@Gregable

This comment has been minimized.

Copy link
Member

@Gregable Gregable commented Jan 21, 2020

@mdmower Hopefully @jridgewell or someone else on this thread can address version locking and the possible need to upgrade the client-side implementation over time.

From my read of the cache code, I don't see why patching the country into the .json document would be any different than patching it into the .js document. The cache implementation would probably be straightforward.

Otherwise, I'm not sure I have enough context to compare the options.

@cramforce

This comment has been minimized.

Copy link
Member

@cramforce cramforce commented Jan 21, 2020

Just very generally: We'd like to avoid providing an easily usable general purpose geo location API that folks would easily starting to rely on outside of amp-geo. This is for numerous reasons.

So, I'd strongly prefer a solution that is directly tried to amp-geo in some fashion.

@mattwomple

This comment has been minimized.

Copy link
Member

@mattwomple mattwomple commented Jan 21, 2020

Just very generally: We'd like to avoid providing an easily usable general purpose geo location API that folks would easily starting to rely on outside of amp-geo. This is for numerous reasons.

So, I'd strongly prefer a solution that is directly tried to amp-geo in some fashion.

To play devil's advocate, doesn't the API already exist for outsiders?

(await fetch('https://cdn.ampproject.org/v0/amp-geo-0.1.js').then(r => r.text()).then(t => /"([a-z]{2}) {26}"/.exec(t)))[1]
@cramforce

This comment has been minimized.

Copy link
Member

@cramforce cramforce commented Jan 21, 2020

@mattwomple I think this is a good example of the line. If we break that regex and somebody files a bug here, then I would not in any way feel bad about just closing it.

@jridgewell

This comment has been minimized.

Copy link
Member

@jridgewell jridgewell commented Jan 21, 2020

I cannot make a decision on providing a amp-geo JSON API. That's way above my authority.

someone else on this thread can address version locking

There's a very real risk that the locally compiled v0.js will not be compatible with the amp-geo-0.1.js file served by the AMP JS CDN. That's one of the reasons we version lock to begin with.

But what are the chances the locally compiled version will match one of the official versions (say, /rtv/123)? Is the locally compiled version going to have the RTV "123"? Unless you're compiling from the exact same SHA, then no. So the only way you can get Runtime to request /rtv/123/amp-geo-0.1.js is if you're compiling the same SHA.

And if you have any uncommitted local modifications, then you'll get a different local RTV number. And there's no guarantee the property mangling will match (we've observed this before).

If you request the generic /v0/amp-geo-0.1.js, then we're getting really risky.

The only way local serving works without any issues is if you are capable of serving every extension. Providing a a amp–geo JSON API would solve that, but again, I can't make that decision.

@mdmower

This comment has been minimized.

Copy link
Contributor

@mdmower mdmower commented Jan 22, 2020

Separate topic from amp-geo:

I have been revisiting my original suggestion to use a <meta> tag to update config.urls.cdn when an AMP document is loaded in singleDoc mode. See PR #25028 and section "Full self-hosting" in the design doc. This may not be the best path forward. Consider this...

Several AMP components import urls, e.g. import {urls} from '../../../src/config';. Since imports are written at compile time, these components have the urls object hardcoded into them. As #25028 current exists, these components do not recognize changes to urls.cdn when an ampdoc instance is created.

I can see a few options to move forward, but I'm not sure which is preferred:

  1. AMP pages include the following script which is automatically stripped by Google AMP Cache optimizer:
    <script id="amp-config">
        window.AMP_CONFIG = window.AMP_CONFIG || {};
        window.AMP_CONFIG.cdnUrl = 'https://example.com/myruntime';
    </script>
    
    This is by far the easiest solution, as config.js already has support for initializing certain urls members via environment (AMP_CONFIG) variables. The question is whether the AMP page could still be considered valid. Is it possible to ensure the script only writes to AMP_CONFIG and does nothing else?
  2. config.js gets new methods to propagate urls.cdn changes when they are detected. It's important to note that this propagation could occur later than some components need, or would require significant rewrites to change the order of operations in all affected components to "wait for urls promise, then proceed".
  3. AMP pages include the following JSON which is automatically stripped by the Google AMP Cache optimizer:
    <script id="amp-config" type="application/json">
      {
        "urls": {
          "cdn": "https://example.com/myruntime"
        }
      }
    </script>
    
    config.js gets a new method to check for document and then for #amp-config. If detected, then when urls is defined, JSON properties in #amp-config are merged into the final object. The caveat is that every file that includes import {urls} from 'path/to/src/config would get a bit bulkier and would be running the exact same check (check for element, decode JSON, write to urls object).
@sebastianbenz

This comment has been minimized.

Copy link
Contributor Author

@sebastianbenz sebastianbenz commented Jan 30, 2020

@Gregable @mdmower @zhouyx @jridgewell and I just met and discussed the the proposal. Here is what we came up with:

  1. Publishers need to serve the runtime using the same RTV pattern used by AMP Caches (unversioned URLs will not be supported by the validator).

  2. The runtime host needs to be encoded in the document via meta tag (as described in the original design doc):

    <meta name="runtime-host" content="https://example.com/optional/path/prefix">
    
  3. amp-geo: we will provide an unpatched version of amp-geo.js that supports calling a publisher hosted geo endpoint to retrieve the current country. The AMP Cache will not provide such an endpoint. Instead, publishers need to configure a geo endpoint via meta tag. amp-geo.js then calls the endpoint on load to retrieve the location:

    <meta name="amp-geo-api" content="https://example.com/location.json">
    

I'll update the design doc with more details in the next days. Feedback is welcome :-)

@zhouyx

This comment has been minimized.

Copy link
Contributor

@zhouyx zhouyx commented Jan 30, 2020

To add to the amp-geo approach

  1. The amp-geo.js will be updated to make another roundtrip request when the country code is not presented. There will still only be one version of this component.
  2. We are aware of the delay introduced by the roundtrip. Including content further blocked by <amp-consent> waiting for the geolocation, element with amp-geo-pending further blocked, and higher chances of UI shifting caused by appending the CSS with further delay.
  3. Subresourse integrity will not be applied, so that publisher can choose to patch the country code to the amp-geo.js if possible.
  4. New features added to amp-geo.js may require changes to the json response format as well. We will notice publishers in advance if such changes are required.
@mdmower

This comment has been minimized.

Copy link
Contributor

@mdmower mdmower commented Jan 30, 2020

  1. Publishers need to serve the runtime using the same RTV pattern used by AMP Caches (unversioned URLs will not be supported by the validator).

Is this correct? I agree with with the rtv pattern requirement, but can't publishers still use unversioned URLs? I think they just need to follow the same logic that cdn.ampproject.org uses to fetch versioned components, e.g. if the AMP page includes script https://example.com/v0.js, and the runtime determines its own version is 123, then version locking results in components downloading from https://example.com/rtv/123/v0/amp-component-0.1.js. Briefly, the requirements are:

  • Non versioned URLs to latest runtime: https://example.com/
  • Versioned URLs to runtime: https://example.com/rtv/{rtv}/
  • Runtime metadata: https://example.com/rtv/metadata
@jridgewell

This comment has been minimized.

Copy link
Member

@jridgewell jridgewell commented Jan 31, 2020

I think @mdmower is correct. I don't think it's necessary for validator to require versioned URLs in the <script> srcs. It's only required that the publisher respond to https://example.com/rtv/${RTV}/* requests so that runtime may dynamically requests scripts (either due to version locking rejecting a mismatched version, or a missing script, etc).

@sebastianbenz

This comment has been minimized.

Copy link
Contributor Author

@sebastianbenz sebastianbenz commented Jan 31, 2020

It's only required that the publisher respond to https://example.com/rtv/${RTV}/*

This means simple static sites cannot self-host AMP, which is one of the use cases I'd like to support. Is there no way around this?

Also: doesn't the runtime check the version again after downloading or does it assume that it has the correct version if downloading from an RTV endpoint?

@mdmower

This comment has been minimized.

Copy link
Contributor

@mdmower mdmower commented Feb 1, 2020

This means simple static sites cannot self-host AMP

I'm not sure how you're reaching this conclusion. A publisher can opt to use RTV specific URLs, in which case they only need to host the runtime at example.com/rtv/123/, or they can use unversioned URLs, in which case they need to host identical runtimes at example.com/ and example.com/rtv/123/.

@sebastianbenz

This comment has been minimized.

Copy link
Contributor Author

@sebastianbenz sebastianbenz commented Feb 6, 2020

Got it - thanks for the clarification. In this case, I'd actually prefer to enforce RTV specific URLs via the validator to keep things simple.

@mdmower

This comment has been minimized.

Copy link
Contributor

@mdmower mdmower commented Feb 6, 2020

Got it - thanks for the clarification. In this case, I'd actually prefer to enforce RTV specific URLs via the validator to keep things simple.

Sorry to drag this out, but I'm confused. Why? If a publisher would like to follow the same model as the AMP project where AMP pages reference a rolling AMP release, shouldn't they be able to do so?

@sebastianbenz

This comment has been minimized.

Copy link
Contributor Author

@sebastianbenz sebastianbenz commented Feb 6, 2020

Maybe I'm missing something. What are the advantages for a publisher in having a rolling AMP release if they need to provide an RTV'd endpoint as well?

The only reason I can see is that publishers can encode the import scripts in their templates and don't need to update it with every release. However, as self-hosting is only supported for transformed AMP, this should be done in the transformer anyway.

@mdmower

This comment has been minimized.

Copy link
Contributor

@mdmower mdmower commented Feb 7, 2020

@sebastianbenz The RTV paths are still necessary because dynamically loaded components use the RTV paths when version locking is applied.

I don't think there's any reason to stipulate transformation by amp-toolbox's optimizer for self-hosting to be allowed. If one of the goals of this I2I is "independence", then a publisher ought to be able to simply swap out cdn.ampproject.org with runtime.example.org in their AMP documents, add a couple of meta tags, and the runtime does the rest. None of this requires any extra work by the runtime than the changes we've discussed to read the meta tags. I don't even think the validator requires additional work to support rolling release self-hosting URLs. This all matches the AMP project so closely that support is essentially the same.

@ithinkihaveacat

This comment has been minimized.

Copy link
Contributor

@ithinkihaveacat ithinkihaveacat commented Feb 7, 2020

The self-hosting experience seems a bit bumpy, especially if the developer was expecting a self-hosted "hello, world" to involve unzip and a text editor, or minor search-and-replace surgery of an existing AMP:

  • Even if I'm building a simple static site, I can't deploy the AMP JS wherever I want; I need to generate RTV-aware HTML and make the AMP JS available at an RTV-dependent location (at the root).
  • Even though I've had to make my HTML and/or server RTV-aware, AMP caches will (surprisingly) ignore the RTV.

I assume it's a non-goal that self-hosting is easy, perhaps make this explicit?

@mdmower

This comment has been minimized.

Copy link
Contributor

@mdmower mdmower commented Feb 7, 2020

@ithinkihaveacat I think that's a fair assessment. Version locking is pretty much the reason for the complex JS hosting requirements. I think a simpler hello world example could be possible without version locking, but I'm not sure anyone is onboard with that idea. I largely don't care anymore as long as we can get some self hosting support out the door in the near future. Long term, sure, I would like to see the AMP project loosen it's grip on how folks host the runtime.

@mdmower

This comment has been minimized.

Copy link
Contributor

@mdmower mdmower commented Feb 16, 2020

amphtml PR: #26829
amp-toolbox PR: ampproject/amp-toolbox#622
Self-hosted runtime sample project: https://github.com/mdmower/amp-self-host-demo/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
You can’t perform that action at this time.