Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache SW: Support Precent Diversions #7914

Merged
merged 15 commits into from Mar 16, 2017

Conversation

jridgewell
Copy link
Contributor

This introduces the concept of RTV environments and the /diversions endpoint to the Cache SW.

The Cache SW now understands the "production" environment, and arbitrary diversion environments. The production env is whatever the env of the SW's version (the RTV is part of the Sw's bundled code). Any other env is considered a diversion.

Diversions are treated specially inside the stale-while-revalidate logic. Firstly, a diversion will never be served in place of (a "stale" serve) a request to a production env. Only cached scripts in the production environment can be stale served for a production request. Secondly, any request for a diversion will always be served with that version (hopefully from cache). That is, diversion requests are never stale served an older version. This is because stale serving a 3 week old diversion is useless for our data collection.

Now for the /diversions endpoint. The SW will maintain a cached request to the endpoint, and the endpoint returns a JSON encoded array of diversion RTVs (note, they must not be production ENV). Whenever the SW "revalidates" (grabs a newer copy of a production script in the background), we will issue requests for the current diversions of the script as well. This allows us to eagerly cache our diversions, making sure they will be served as speedily as our normal production scripts (and we don't see any false-negative stats).

The /diversions endpoint will be live soon. 🤞

@erwinmombay erwinmombay self-requested a review March 2, 2017 00:13
Copy link
Member

@cramforce cramforce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!


return fetchPromises[url] = cache.match(request)
.then(response => {
if (response && !expired(response)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only relevant for the diversions endpoint, right? Because rtv URLs practically never expire.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup.


// Did we receive a invalid response?
if (!response.ok) {
throw new Error(`failed fetching ${url}`);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No more info here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a statusCode check. I'll add the statusCode it fails with.

date = headers.get('expires');
}

return Date.now() >= Number(new Date(date)) + age;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scary :)

I'd consider adding a shortcut for "reasonably close to infinity".

Wouldn't this consider everything expired if the machine's clock is in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd consider adding a shortcut for "reasonably close to infinity".

What do you mean?

Wouldn't this consider everything expired if the machine's clock is in the future?

Hmm, I can override the header on the Date header on our side, ensuring the response uses the user's clock time.

// Fetch all diversions of this file.
// This intentionally does not block the request resolution to speed
// things up.
diversions(cache).then(diversions => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to debounce this with like setTimeout(…, 10000) to get out of the way of the critical path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, and thought maybe (slightly) delaying the initial requests to batch them together would be a better approach. It would help this case, and would allow us to be smarter about which cached version to serve.

But, this isn't in the critical path. Notice this line isn't returned, the line two under this statement is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in the critical path, but I'm seeing us bandwidth constraint downloading JS a lot!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say, you are on 2G and only get .5 MBit/s. Then it makes a big difference to use the bandwidth for concurrent non-critical downloads.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

describe('with cache-control and date', () => {
describe('date in the past', () => {
beforeEach(() => {
response.headers.set('date', new Date(Date.now() - 1000)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add test case with actual date copy pasted from a real SFFE request.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what SFFE generates.

// From SFFE Request
// date:Thu, 02 Mar 2017 00:36:19 GMT
// cache-control:public, max-age=31536000

new Date().toUTCString();
// => "Thu, 02 Mar 2017 00:37:11 GMT"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always better not to generate stuff in tests. Makes failures more obvious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I can do this one, since it will start failing in a year. Maybe just a comment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a fake clock.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

});
});

describe('with expires', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider deleting this code. Our responses will always have max-age

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@dknecht
Copy link
Contributor

dknecht commented Mar 2, 2017

Anything changes we need to make to Cloudflare cache?

Returns a struct of all the data, instead of issuing multiple `#exec`
calls to grab a particular sub group.
This includes fetching JS from cache or network, purging old versions,
and a new expired response helper.
@jridgewell
Copy link
Contributor Author

jridgewell commented Mar 2, 2017

Anything changes we need to make to Cloudflare cache?

Not necessarily. If you plan on supporting 1% diversions of the AMP runtime, than you'll need to return a JSON array of the diversion's RTV. If not, then you don't have to do anything.

However, if you don't return a valid (200 status code) response with a cache time header, you'll end up getting a lot of /diversions requests. Just returning an empty JSON array response, with a 1 day cache control should save your servers the headache and speed up the SW.

@erwinmombay erwinmombay removed their request for review March 3, 2017 01:29
// Fetch all diversions of this file.
// This intentionally does not block the request resolution to speed
// things up.
diversions(cache).then(diversions => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say, you are on 2G and only get .5 MBit/s. Then it makes a big difference to use the bandwidth for concurrent non-critical downloads.

describe('with cache-control and date', () => {
describe('date in the past', () => {
beforeEach(() => {
response.headers.set('date', new Date(Date.now() - 1000)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a fake clock.

@jridgewell
Copy link
Contributor Author

Ping @cramforce

@jridgewell
Copy link
Contributor Author

It's live!

@jridgewell jridgewell merged commit c860a24 into ampproject:master Mar 16, 2017
@jridgewell jridgewell deleted the cache-sw-diversions branch March 16, 2017 19:03
mrjoro pushed a commit to mrjoro/amphtml that referenced this pull request Apr 28, 2017
* Reduce Regex#exec calls

Returns a struct of all the data, instead of issuing multiple `#exec`
calls to grab a particular sub group.

* Split helpers into separate methods

This includes fetching JS from cache or network, purging old versions,
and a new expired response helper.

* Support the /diversions endpoint

* Linting

* Update tests

* No need for global regex

* Remove expires header support

* Ensure diversions are not prod env

* Output statusCode in failed requests

* Avoid server/client time skew

* Fix tests

* Fix bug with batching fetch requests

They must each get there own clone of the response.

* Fetch diversions after 10 seconds

* Use exact date header strings from amp cache

* lint
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants