New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Asset Manager should not include timestamp in published path #2500

Closed
brandonkelly opened this Issue Feb 26, 2018 · 18 comments

Comments

Projects
None yet
4 participants
@brandonkelly
Copy link
Member

brandonkelly commented Feb 26, 2018

Description

Files published into web/cpresources/ by Asset Manager will include a hash of the timestamp in the published path, which can lead to complications if running Craft in a load-balanced environment.

See yiisoft/yii2#8154

@brandonkelly brandonkelly added the bug label Feb 26, 2018

@khalwat

This comment has been minimized.

Copy link
Contributor

khalwat commented Feb 26, 2018

My imperfect but functional solution: https://gist.github.com/khalwat/a0687a8a8f0bf3ef5db7c371d88d3e04

@brandonkelly brandonkelly added enhancement and removed bug labels Mar 7, 2018

@ostark

This comment has been minimized.

Copy link
Contributor

ostark commented Mar 13, 2018

@khalwat's solution solves a possible timestamp issue. Without the timestamp, the paths are consistent across all servers, even if the date of the source directory differs (which is possible).

However, it does not fully mitigate the problem in load balanced environments.

Think of this scenario:

  • 2 http nodes (server one & server two)
  • initial state: /web/cpresources does not exist
  • 1st request goes to /admin on server one
  • required AssetBundle files get generated on that server and html is rendered
  • from the html links point to static files
  • some static css/js files are requested from server one (that's okay)
  • other static files are requested from server two

It takes a few reloads (dynamic requests) until all static files are published on all servers.

IMO, there are two ways to solve this issue:

  1. Generate the file when it is requested (on-the-fly) if it does not exist
  2. Generate all files in a "build process" and deploy these files across all servers

Here is my approach for "Generate all files": https://github.com/fortrabbit/craft-asset-bundler

@ostark

This comment has been minimized.

Copy link
Contributor

ostark commented Mar 14, 2018

@brandonkelly Is this something for Core or should we invest in this plugin?

@brandonkelly

This comment has been minimized.

Copy link
Member

brandonkelly commented Mar 20, 2018

OK ended up coming up with (I think) a good solution to this:

  • Asset folder/file modify times are no longer a factor in their published paths

  • To ensure that source directories are re-published when something changes, craft\web\AssetManager::publishDirectory() now sets the forceCopy option to true if it detects any changes with the help of our new FileHelper::hasAnythingChanged() method:

    if (!is_dir($dstDir) || FileHelper::hasAnythingChanged($src, $dstDir) === true) {
    $options['forceCopy'] = true;
    }

  • To ensure that browsers don’t keep loading old versions of any images/fonts/etc. that are referenced by CSS files, after re-publishing a directory, CSS files within the directory are now parsed for url() functions, and timestamps are added to their contained URLs. (Asset URLs that are embedded in the HTML already have timestamps.)

@ostark

This comment has been minimized.

Copy link
Contributor

ostark commented Mar 20, 2018

Thanks for paying attention.
I'll review the changes later, but I think this does not fully solve the whole issue.

@khalwat

This comment has been minimized.

Copy link
Contributor

khalwat commented Mar 20, 2018

I'll defer to @ostark on this obviously, but to me the new solution feels like it's trying to be too tricky. Going in and modifying URL references in published asset bundles just... it seems like things could go wrong.

The issue is caused by the fact that the timestamp that is being used is coming from time() -- in other words, whatever time it is on the server that happens to be publishing the directory. This can obviously vary from server to server.

What if instead, you got the timestamp from the mtime of the asset bundle's published directory? This should be in sync on all of the servers, and accomplishes what we want: the timestamp only changes when the files that are being served changed.

If the mtime for the directory alone isn't enough, you could just make a hash out of the mtime for any (recursive) files that are inside of the published directory.

This would keep the servers in sync, and it'd work similar to how webpack and other tools will generate a manifest file at the time of publishing that reflects the hashed contents of the files being published. In our case, we're making a hashed manifest from the mtime of the files, but that should suffice here. https://www.npmjs.com/package/webpack-manifest-plugin

The old way of doing things was probably busting the cache too often/too much. Better to just bust the cache when the files have actually changed, not the time that it presently is when the bundle is being published, I'd think?

@ostark

This comment has been minimized.

Copy link
Contributor

ostark commented Mar 21, 2018

@brandonkelly Now I do understand the reason for the timestamp in the path :-)

I agree with @khalwat, modifying URLs in css files to break the browser cache feels hacky. It does not solve the multi server scenario I described above. The issue is not only a possible wrong path, moreover, static files simply do not exist until a dynamic request copies them to the expected destination.

I would love to see something like a manifest.json - a single source of truth that is generated once and used on every server. This is not possible as we can add files on the fly.

I don't expect this to be solved before 4/4 - but we should be aware of the issue - it's not solved.

@ostark

This comment has been minimized.

Copy link
Contributor

ostark commented Mar 21, 2018

With a global (cp)resource revision, cache busting will happen too often, but it does not overcomplicate things.

function hash($path) {	
	return $this->getRevision() . DS . md5($path);
}

function getRevision() {
	// TBD how it is generated and stored
}
@khalwat

This comment has been minimized.

Copy link
Contributor

khalwat commented Mar 21, 2018

Yeah @ostark that's a system I've used for websites, too. Rather than trying to keep a manifest of every single thing and give it its own hash, just have a global one, and when something changes (which honestly shouldn't be that often), the whole cache is busted.

That's another route that could be taken.

As for the global manifest that keeps track of a hash for each bundle, if it's calculated in a deterministic way from the contents of the files, you don't need to then sync that manifest. It's going to be the same on every server, since it's a hash derived from the file contents.

Related/looks interesting: https://github.com/pgaultier/yii2-webpack

@brandonkelly

This comment has been minimized.

Copy link
Member

brandonkelly commented Mar 22, 2018

It does not solve the multi server scenario I described above

@ostark adding timestamps to CSS URLs is for preventing browsers to load old cached versions of referenced images, fonts, etc., which is only an issue since we took a timestamp aspect out of the published directory path. But it’s not meant to help solve the original issue; that is only solved by the first point I made.

Have you actually tested the change to confirm? I’m pretty sure this fix does in fact solve the multi-server issue. Happy to debate over what timestamp to add to the CSS URLs, or whether there is a better approach to solving that problem, but that is a separate issue.

@brandonkelly

This comment has been minimized.

Copy link
Member

brandonkelly commented Apr 17, 2018

Took another stab at this today (3924a74). As of 3.0.3:

  • Resource hashes once again take the date modified into account.
  • Hash/path mappings are stored in a new resourcepaths DB table
  • Craft now intercepts 404-ing resource requests, and checks to see if the hash has been saved in the resourcepaths table. If so, it will publish the directory on the fly, and respond with the requested resource’s contents.
  • Resources are generally not eager-published anymore (asset bundles being an exception here).

Resource flow:

  1. Browser requests HTML page
  2. HTML response includes resource URLs
  3. Browser requests resource URL (possibly to a different web server; doesn’t matter anymore)
  4. If resource already exists, Nginx/Apache serves the resource immediately. Otherwise Craft intercepts the request, publishes the resource, and returns its contents

If there are multiple web servers, the worst case here is that a single resource might need to get published multiple times, if its date modified differs between two servers. Which I can live with, as it simplifies everything else. (No more hacking the URLs in CSS files, etc.)

@ostark

This comment has been minimized.

Copy link
Contributor

ostark commented Apr 17, 2018

@brandonkelly
Thanks for another attempt. Haven't tried it out yet, but when I read the code I'm missing the part where the files get published "on the fly".

What happens if the hash is found in the DB, but not in the filesystem?

What I don't like is the DB overhead: the hash() method gets called 15-30 times. The queries are fairly simple but ...

@brandonkelly

This comment has been minimized.

Copy link
Member

brandonkelly commented Apr 17, 2018

I should have mentioned, that the problem with the last attempt was that it still required the resources to be published on both servers before they could be served correctly, if the initial page request hits Server A, and the resource request hits Server B. So we were still seeing a fair amount of 404s on load balanced environments.

I'm missing the part where the files get published "on the fly".

In craft\web\Application. We now call a _processResourceRequest() method at the beginning of processRequest():

cms/src/web/Application.php

Lines 144 to 147 in 3924a74

public function handleRequest($request): Response
{
// Process resource requests before anything else
$this->_processResourceRequest($request);

which checks the requested URI and compares it to the resourceBaseUrl config setting. If it looks like a resource request, then it will find the requested resource (using the resourcepaths DB table to resolve the folder hash) and publish it, and then return the file’s contents. Once the file has been published, future requests for the same file will be served directly by Ngnix/Apache, and Craft won’t be loaded at all.

What happens if the hash is found in the DB, but not in the filesystem?

As described above. The request goes to Craft instead of the file, Craft publishes the file, and returns its contents.

What I don't like is the DB overhead: the hash() method gets called 15-30 times. The queries are fairly simple but ...

Brad and I spent a couple hours yesterday mulling over this stuff, and this is really the only decent option we could think of, short of requiring all load balanced web servers to share the same cpresources/ folder, which we realize isn’t an option in Fortrabbit’s case.

@ostark

This comment has been minimized.

Copy link
Contributor

ostark commented Apr 18, 2018

Unfortunately, the issue is not fully solved.

It seems that AssetsThumbs (S3 Volume) are only generated on one server. I think it is the server that generates the HTML response. The next request that goes to cpresources/{hash}/thumb-60x60.png is handled by _processResourceRequest() , but the files does not exist in storage/runtime/assets.

2018-04-18 11:02:44 [62.158.214.152][1][3dc111f6df71d44d30412489de5402f8][error][yii\base\InvalidArgumentException] yii\base\InvalidArgumentException: The file or directory to be published does not exist: /srv/app/proprod/htdocs/storage/runtime/assets/thumbs/383 in /srv/app/proprod/htdocs/vendor/yiisoft/yii2/web/AssetManager.php:456
Stack trace:
#0 /srv/app/proprod/htdocs/vendor/craftcms/cms/src/web/AssetManager.php(34): yii\web\AssetManager->publish('/srv/app/propro...')
#1 /srv/app/proprod/htdocs/vendor/craftcms/cms/src/web/Application.php(445): craft\web\AssetManager->getPublishedPath('/srv/app/propro...', true)
#2 /srv/app/proprod/htdocs/vendor/craftcms/cms/src/web/Application.php(147): craft\web\Application->_processResourceRequest(Object(craft\web\Request))
#3 /srv/app/proprod/htdocs/vendor/yiisoft/yii2/base/Application.php(386): craft\web\Application->handleRequest(Object(craft\web\Request))
#4 /srv/app/proprod/htdocs/web/index.php(25): yii\base\Application->run()
#5 {main}
2018-04-18 11:02:44 [62.158.214.152][1][3dc111f6df71d44d30412489de5402f8][info][application] $_GET = [
    'p' => 'cpresources/34a5ea06/thumb-60x60.png'
    'v' => '123456789'
]

@brandonkelly
I solved this here fortrabbit/craft-asset-bundler@c8662ce

@brandonkelly

This comment has been minimized.

Copy link
Member

brandonkelly commented Apr 18, 2018

Gah yes you’re right, we’d still need to deal with remote images. Hm…

@timkelty

This comment has been minimized.

Copy link
Contributor

timkelty commented Apr 24, 2018

Should this be reopened then, or is there a separate issue to track remote image thumbs?

@brandonkelly

This comment has been minimized.

Copy link
Member

brandonkelly commented Apr 24, 2018

@timkelty yeah at this point a new issue would probably be better. The original issue reported here has been resolved.

@timkelty

This comment has been minimized.

Copy link
Contributor

timkelty commented Jun 28, 2018

So I'm pretty sure this was working at some point with @brandonkelly's updates, but I just updated (3.0.13.2) and now I'm getting all kinds of cpresources 404s in the control panel in my load balanced environment. Did something get reverted?

Update: scratch that - I had some nginx rules for expires headers that were overriding try_files. 😛

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment