Improving SPA index.html detection #4988

Open
EatonZ opened this issue Apr 26, 2020 · 4 comments
Labels
enhancement New feature or request kv-asset-handler Relating to @cloudflare/kv-asset-handler

Comments

@EatonZ
Contributor

EatonZ commented Apr 26, 2020

I have an Angular app. A standard Angular app consists of an index.html and several js bundle files.
In my Workers Sites app, the general Cache-Control setting I want to use for all assets is max-age=86400. The one exception is index.html, for which I want to use no-cache. However, differentiating index.html from the other assets has proven rather tricky.

Here is my code:

let isIndex = false;
response = await getAssetFromKV(event, { mapRequestToAsset: request =>
{
    //KV is case-sensitive. We will only publish files with names in lower-case, so force the request URL path to be lower-case.
    //https://community.cloudflare.com/t/workers-sites-missing-images/121633/9
    const defaultAssetKey = serveSinglePageApp(request);
    const defaultAssetKeyUrl = new URL(defaultAssetKey.url);
    defaultAssetKeyUrl.pathname = defaultAssetKeyUrl.pathname.toLowerCase();

    isIndex = defaultAssetKeyUrl.pathname.endsWith("index.html");

    return new Request(defaultAssetKeyUrl.toString(), defaultAssetKey);
}, cacheControl : { edgeTTL: 31536000 } }); //We cannot set browserTTL in here at this time because we don't know at this point if the requested asset will end up being index.html or something else.

if (isIndex) response.headers.set("Cache-Control", "no-cache");
else response.headers.set("Cache-Control", "max-age=86400");

As-is, that works and does what I need, but I feel it could be better.

Here's what I mean. I am aware that options.cacheControl has browserTTL. However, at the point where the options need to be specified, I cannot determine whether the asset being processed will end up being index.html or something else. That's the job of the serveSinglePageApp function. As you can see in my code, I came up with a workaround.

Ultimately, I would like to be able to set browserTTL in the options to keep things clean, but I'm not sure how that is possible given the way this is currently designed.
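One way to tidy the workaround is to compute the lowercased URL and the header value together in a single pure helper, instead of mutating an isIndex flag from inside the callback. This is only a minimal sketch: describeAsset is a hypothetical name, not part of @cloudflare/kv-asset-handler, and it assumes the URL passed in has already been resolved by serveSinglePageApp.

```javascript
// Hypothetical helper; not part of @cloudflare/kv-asset-handler.
// Takes the URL that serveSinglePageApp already resolved and returns
// the lowercased asset URL plus the Cache-Control value to send.
function describeAsset(mappedUrl) {
  const url = new URL(mappedUrl);
  // KV keys are case-sensitive and the files are published in lower-case.
  url.pathname = url.pathname.toLowerCase();
  // Match on "/index.html" so a file such as "/myindex.html" is not treated as the index.
  const isIndex = url.pathname.endsWith('/index.html');
  return {
    url: url.toString(),
    cacheControl: isIndex ? 'no-cache' : 'max-age=86400',
  };
}
```

Inside mapRequestToAsset, the result of describeAsset(serveSinglePageApp(request).url) could be kept in a local variable, with its cacheControl value applied to the response once getAssetFromKV resolves.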

Thoughts?

@EatonZ
Contributor Author

EatonZ commented Jul 2, 2020

Any thoughts on this?

@ashleymichal
Contributor

Perhaps we need to separate the browser TTL option for HTML files from the one for other assets?

@ashleymichal ashleymichal added the enhancement New feature or request label Jul 7, 2020
@Cherry
Contributor

Cherry commented May 17, 2021

I've had to tackle something like this in the past for someone, with very specific caching rules for different files in a project, set by Gatsby. For example:

/* Cache-Control templates used for caching long-term, or instructing browsers not to cache */
// We use `immutable` for supported browsers so this is cached perpetually, since URLs are cache-busted
// Because of this, we can't just use the `browserTTL` option in kv-asset-handler, since it doesn't support this
const CACHE_CONTROL_FOREVER = 'public, max-age=31536000, immutable';

// We use `max-age=0` and `must-revalidate` instead of a blanket `no-cache` so the client can be smart about ETag/Last-Modified validation
const CACHE_CONTROL_NEVER = 'public, max-age=0, must-revalidate';

// ... more bootstrap, regex, etc.

async function handleEvent(event) {
  const url = new URL(event.request.url);

  // Setup default kv-asset-handler options for browser and edge-based caching
  // Cache everything on the edge for 2 days by default, because these will be cleared on deploy
  // But _don't_ cache on the browser at all - browserTTL not set, `cache-control` header not sent
  // These defaults are safe and won't cause any issues with caching or new deploys. We'll override specific headers below where we can
  let options = {
    cacheControl: {
      edgeTTL: 2 * 60 * 60 * 24, // 2 days
      browserTTL: null,
    },
  };

  try {
    if (DEBUG) {
      // customize caching
      options.cacheControl = {
        bypassCache: true,
      };
    }

    const originalResponse = await getAssetFromKV(event, options);
    const response = new Response(originalResponse.body, originalResponse);

    // Gatsby has pretty strict documentation on the best way to cache aspects of the site
    // If these aren't followed, weird side-effects with caching can occur, especially with a service worker
    // Reference: http://gatsbyjs.org/docs/caching
    if (
      url.pathname.startsWith('/static') ||
      (url.pathname.match(MATCH_JS_AND_CSS) && url.pathname !== '/sw.js') ||
      url.pathname === '/manifest.webmanifest'
    ) {
      // Set longer cache time for any file in /static, and any JS/CSS assets. These filenames are always cache-busted
      // The only real exception to this is the `sw.js` file, since this file's contents can change without the filename itself changing
      // We should also not cache the `manifest.webmanifest` file to prevent any changes to this file (favicons, etc.) being cached
      // Reference: https://www.gatsbyjs.org/docs/caching/#javascript-and-css and https://www.gatsbyjs.org/docs/caching/#static-files
      response.headers.set('cache-control', CACHE_CONTROL_FOREVER);
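    } else if (url.pathname.match(MATCH_FEED)) {
      // Set small browser cache for the RSS feed
      // Set the header directly: `options` was already consumed by getAssetFromKV above,
      // so changing it here would have no effect on this response
      response.headers.set('cache-control', 'max-age=3600'); // 1 hour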
    } else if (
      url.pathname.startsWith('/page-data') ||
      url.pathname === '/sw.js'
    ) {
      // Add Cache-Control header for page data and app data to instruct browsers to never cache, and always revalidate with the server
      // Also add this header for the `sw.js` as mentioned above
      // Reference: https://www.gatsbyjs.org/docs/caching/#page-data and https://www.gatsbyjs.org/docs/caching/#app-data
      response.headers.set('cache-control', CACHE_CONTROL_NEVER);
    } else if ((response.headers.get('content-type') || '').includes('text/html')) {
      // Add CSP header on HTML pages. This header isn't necessary on assets
      response.headers.set(
        // 'Content-Security-Policy-Report-Only',
        'Content-Security-Policy',
        url.hostname === 'staging.example.com'
          ? CSP_HEADERS_STAGE
          : CSP_HEADERS_PROD
      );
      // Add Cache-Control header for HTML pages to instruct browsers to never cache, and always revalidate with the server
      // Reference: https://www.gatsbyjs.org/docs/caching/#html
      response.headers.set('cache-control', CACHE_CONTROL_NEVER);
    }

    return response;
  } catch (e) {
    // if an error is thrown try to serve the asset at 404.html
    if (!DEBUG) {
      try {
        let notFoundResponse = await getAssetFromKV(event, {
          mapRequestToAsset: (req) =>
            new Request(`${new URL(req.url).origin}/404.html`, req),
        });

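        // Copy the headers explicitly; object spread does not reliably copy Response properties
        return new Response(notFoundResponse.body, {
          headers: notFoundResponse.headers,
          status: 404,
        });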
        // eslint-disable-next-line no-empty
      } catch (e) {}
    }

    return new Response(e.message || e.toString(), { status: 500 });
  }
}

It's not exactly the same as your use-case, but is similar in the sense of having very specific caching rules for very specific pages. I'm honestly not sure what the best solution here is, since it's so specific.

Netlify seems to offload this entirely to the user with a _headers file, whereas a lot of cache headers are abstracted by kv-asset-handler to suit the vast majority of static site use-cases. Perhaps per-MIME-type cache rules? Or perhaps just some extended documentation detailing caching and how best to handle custom caching with your own cache-control headers if you need to do something complex.


To solve your immediate use-case, you could do something similar and check the parts of the URL, or the content-type response for text/html, and then override the cache-control header there.
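As a rough illustration of the content-type check, and of what per-MIME-type rules might look like in user code, here is a minimal sketch. CACHE_RULES and cacheControlForContentType are hypothetical names, not kv-asset-handler APIs, and the two header values are simply the ones from the original question.

```javascript
// Hypothetical rule table; the first matching rule wins.
const CACHE_RULES = [
  // HTML (including the SPA index.html) must always be revalidated.
  { matches: (type) => type.includes('text/html'), cacheControl: 'no-cache' },
  // Fallback for every other asset.
  { matches: () => true, cacheControl: 'max-age=86400' },
];

// Returns the Cache-Control value for a Content-Type header value.
// A missing header (null or undefined) falls through to the fallback rule.
function cacheControlForContentType(contentType) {
  const type = (contentType || '').toLowerCase();
  return CACHE_RULES.find((rule) => rule.matches(type)).cacheControl;
}
```

In the worker, the result would be applied after getAssetFromKV resolves, for example with response.headers.set('cache-control', cacheControlForContentType(response.headers.get('content-type'))).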

@EatonZ
Contributor Author

EatonZ commented Jul 10, 2021

Hi @Cherry, thank you for responding, and for your great code sample! It was really helpful.

First, regarding your comment on the CACHE_CONTROL_NEVER line, you can actually use no-cache to achieve the same result. Apparently Gatsby's docs were incorrect (see gatsbyjs/gatsby#18763). I believe no-cache is essentially a shorthand for public, max-age=0, must-revalidate. Feel free to correct me.

In relation to this issue, I see you are checking the asset response's Content-Type for HTML, instead of the file name like I am doing. That's a good idea, and cleans up my code nicely. browserTTL is basically unnecessary for me since I have more fine-grained Cache-Control values now.

My issue is ultimately solved, but I think there is room for improvement regarding Cache-Control. It seems like you could evolve browserTTL into something else to make it easier to construct a Cache-Control header. If you would consider improvements there, feel free to leave this issue open. Otherwise, if you're content with the way things work right now, you can close this issue.
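To make the idea of a richer replacement for browserTTL a little more concrete, here is one possible shape for it: a small builder that turns a few named options into a Cache-Control value. This is purely a sketch; buildCacheControl and its option names are hypothetical and nothing like it exists in kv-asset-handler today.

```javascript
// Hypothetical helper; kv-asset-handler has no such function today.
// Builds a Cache-Control header value from a few named options.
function buildCacheControl({ visibility, maxAge, noCache, mustRevalidate, immutable } = {}) {
  const directives = [];
  if (visibility) directives.push(visibility); // 'public' or 'private'
  if (noCache) directives.push('no-cache');
  if (typeof maxAge === 'number') directives.push(`max-age=${maxAge}`);
  if (mustRevalidate) directives.push('must-revalidate');
  if (immutable) directives.push('immutable');
  return directives.join(', ');
}
```

With this, buildCacheControl({ visibility: 'public', maxAge: 31536000, immutable: true }) yields the same string as CACHE_CONTROL_FOREVER in the sample above, and buildCacheControl({ noCache: true }) yields no-cache.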

@GregBrimble GregBrimble transferred this issue from cloudflare/kv-asset-handler Feb 9, 2024
@GregBrimble GregBrimble added the kv-asset-handler Relating to @cloudflare/kv-asset-handler label Feb 9, 2024