Skip to content

Added support for rolling back assets, and removing expired assets #281

Merged
merged 1 commit into from Nov 8, 2012
@ndbroadbent

Solution for assets rollback and invalidation

Following on from the discussions at #227 and #210, I decided it would be fun to have a proper go at solving this problem. Here are the changes I've made:

The deploy:assets:symlink task ensures the presence of a asset_manifests directory. This directory now holds all of the manifest.yml files for each release, under the names {release_name}.yml. At the end of the deploy:assets:precompile task, we move #{shared_path}/assets/manifest.yml to asset_manifests/#{release_name}.yml, and symlink it back to #{shared_path}/assets/manifest.yml. Before precompile, we make sure to replace the symlink with a copy of the manifest, so that the task doesn't write to the previous manifest through the symlink.

We chain a deploy:assets:update_previous_manifest task after deploy:create_symlink, so that after each deploy, the previous release's manifest.yml is updated with a new :retired_at time:

:retired_at: #{Time.now.to_s}

This records the time when the assets were retired from deployment, which is how we determine when they should be invalidated.

Cleaning up expired assets

The deploy:assets:cleanup_expired task is chained after deploy:cleanup.

It processes all the manifests in asset_manifests/*.yml, and marks each set of assets as expired if the corresponding release has been deleted (by deploy:cleanup), and the manifest's retired_at time is more than :expire_assets_after seconds old (defaults to 1 week.) Otherwise, we mark that set of assets to be preserved. We then remove the set of preserved assets from the set of expired assets, and end up with a set of assets that are safe to delete. These steps are necessary because the same assets will be shared between releases if their fingerprints haven't changed.

To clean up the manifest files in asset_manifests, we delete a manifest file only if it's release has been deleted, and it doesn't contain any unique, unexpired assets. This means that manifests are now decoupled from releases, and will only be removed once their assets have expired.

Rollbacks

There is now a deploy:assets:rollback task, which is chained after deploy:rollback:revision.
It symlinks asset_manifests/#{previous_release}.yml to shared/assets/manifest.yml, and then runs rake assets:precompile:nondigest in order to recompile the nondigest assets.
Note that if you are using my turbo-sprockets-rails3 gem, this will only take a couple of seconds since nondigest assets will be generated from the precompiled digest assets.

Please let me know if this all sounds reasonable. I haven't written tests yet, but can definitely add some before it's merged.

@jrochkind

nice. One detail, but I still think it should always be keeping assets from the immediately previous release, even if that release was older than 1 week or any other time. I am thinking there are cases where you want the assets which were there only 5 seconds ago (right when you did the new release) to be servable, to handle clients that may still be referencing them. Am I wrong?

That's a detail which would be easy to fix with your approach. Still thinking over all the implications of your general approach, but it for sure sounds good initially.

@ndbroadbent

Yep, it will always keep assets from all releases until they are cleared by the cleanup task - so the default is that it will always keep assets for at least the last 5 releases.

@jrochkind

Ah, wait, last_deployed_at is the point it was retired, not the point at which it was initially deployed! Okay, now i get that's handling what I was worried at. The key 'last_deployed_at' may be confusing here, as it's potentially confusing that it means the last time the manifest was operative, not the point it was 'deployed'. Is there a better name for this key? last_retired_at? I see that "last deployed at" can mean what it means, but I think it can be easily misread.

Yep, it will always keep assets from all releases until they are cleared by the cleanup task - so the default is that it will always keep assets for at least the last 5 releases.

Wait, what's last_deployed_at used for again then? I'm getting confused over the interactions between last_deployed_at, expired_assets, and preserved_assets. Can you explain it in terms of semantics rather than implementation? The goal is to preserve assets that.... what? Are in a non-cleaned-up release and (or or? I'm not sure) are not marked as expired? And what are the semantics of being marked as expried?

@jrochkind

In order to run this clean up task efficiently, I've written a Ruby script that is gsubbed with params, uploaded to the server's shared directory, and then executed remotely. (Wasn't sure if there was a way to reliably add rake tasks to the parent app from the capistrano gem, especially if developers only add the gem to the :development group.)

I'm a bit worried about the complexity of this too, and the possibility of it being fragile to rvm/rbenv/new versions of ruby/new versions of capistrano/other things. I think there's definitely no good way to add rake tasks to the parent app from the capistrano gem, that doesn't sound like a good idea either. But why can't this be done with ordinary capistrano techniques for executing commands on a remote server, instead of an uploaded ruby script? Hmm, because too much of it would have to be inconveniently written in bash instead of ruby to do that?

@ndbroadbent

Great point about the name, I've renamed that to :retired_at, which makes a lot more sense.

Ok, so the goal is to preserve assets that either belong to a release in the releases/ folder, or have been deployed less than a week ago (for example). This is why the manifest needs to be decoupled from the release, because the assets might need to stay around longer.

@ndbroadbent

The ruby script should run fine on whatever ruby is available. (There must be a ruby available somewhere or rake wouldn't run either.) It's a very minimal script which doesn't have any external dependencies.

The only other would be to download all of the manifests to the local machine, and process them there.. Which I didn't think was a very good idea. And I definitely wouldn't want to touch bash for this :)

@jrochkind
@ndbroadbent

Ok, I've changed it so that it downloads the manifests to a temporary directory, does all the processing on the local machine, then runs a command to deleted any expired files. I agree, that's much better!

@jrochkind

Doh, now I feel worried about -- what happens when there's more than one app server assets need to be compiled on? I have to say I'm feeling a bit overwhelmed trying to figure out everything that's going on and think through how robust it is -- this is complicated stuff. It's also clearly vitally needed stuff, something along the lines of what you are doing seems the only reasonable way to handle assets, thanks for working on it.

I think we need some more brains to review it and and some more testers.

@jrochkind

I wonder if it might make sense to first release this as a seperate gem, so people can try it before it's in cap -- and crucially, so even with latest versions of cap they still have the option to not use the seperate gem and go back to the 'old' cap behavior if the new func is causing problems. On the other hand, I'm not sure how much luck we'd have getting anyone to use such an extra gem. Nobody wants to think about deployment, heh.

(Note again, I'm still not a committer to cap. I think @leehambley might be the only active one. I'm just trying to help review and suggest, because this issue, and cap, concerns me!)

@ndbroadbent

Great point about multiple app servers, that's something I hadn't considered. I will set up a multistage environment to test that.

I also think it's wrong that the same assets should need to be compiled multiple times for each server. I wonder if it would be possible to compile them once, download them to your local machine, and then upload them to each of the other servers. For now, I'll keep it simple and just run the compile on each server.

It might be a good idea to release this as a gem. However, like you say, there might not be any point if it's only used by a few people, and It's easy enough to just add my fork of capistrano to your Gemfile:

gem 'capistrano', :github => 'ndbroadbent/capistrano', :branch => 'assets_rollback_and_expiry'

Thanks a lot for all your help and feedback so far! I really appreciate it

@jrochkind

Yeah, while it's not optimal to compile the assets multiple times on multiple servers -- I think it's safest to just continue to do that (as cap has always done) at this point, and not try to fix that too. The goal here (a very important and welcome one, and a very tricky one to handle all by itself, and it's awesome you are working on it) is just to make cap asset handling actually semantically do the right thing, handling edge cases properly -- for the first time in cap's history actually. So I would suggest not trying to do the asset compilation only once, which has it's own pitfalls, but rather just to make sure things work if there are multiple servers.

(If one DID want to do the app compilation only once, I think it would probably make sense to do THAT part on the local server. Maybe even do it on the local server and check assets (and manifest file revisions for this new func) into your source control repo, and just let them deploy naturally with the scm checkout! But that's a big difference from existing workflow, that would require you to have proper version of ruby/rails on your local server. Right now, it's not even technically a requirement to have a source checkout on your lcoal server, just the capfile pointing to it! Something with that large change of requirements should probably be an alternate option, rather than the default.)

@leehambley
Capistrano member

I'm the only committer of lately @carsomyr has been helping a lot, but I'm starting my own company at the moment and there's not a lot of time for open source in 75 hour working weeks. Sorry gents. ( @carsomyr also has Rubygems access, and has pushed the last couple of releases out )

FWIW I'm in favor of a separate Gem, I believe these stupid problems with the asset pipeline should be solved with the asset pipeline. I foolishly accepted the asset support into the core, although tightly binding Cap to Rails was one of the worst decisions in the project's history.

The asset pipeline is badly thought out, and doesn't perform well at all - Github and 37s both work on a "pull-fetch" deploy model, with their own tools not related to the Cap style.

I believe going forwards it should be assumed:

  1. Asset directory persists between releases, hang yourself if you have to deal with multiple servers.
  2. If you want to clean up your old assets, do it with rake or something.
  3. If you want to do that, hook into Rake's assets:precompile task and expire them there, not in a deploy tool.
  4. No solution in Cap (or belongs in Cap) for asset sync between servers.
@jrochkind

Oh, I must clarify. By multiple app servers, I did NOT mean multi-stage. Multi-stage is a different thing. I just mean more than one server specified with the 'app' and 'web' roles. In a single stage. Maybe that's what you mean too. Multi-stage should not be an issue.

@jrochkind

@leehambley I think there's no way to get the semantics of asset deploy right just being directly in the Rails rake tasks. To do it right, you need to pay attention to release history, which cap can know about, but a Rails app by itself does not.

However, of course it could be in an extra gem for use with cap rather than cap core.

Politically, I recognize the problem here: With current versions of rails, esp because of bundler and assets, there's pretty much no way to reasonably deploy without cap. Rails mission to have sensible defaults that just work and dont make the deveoper think for expected use cases -- falls down at deployment. Unless you have a cap that is maintained to do the right thing for Rails.

Yet the cap development team (just you!) does not have the resources, expertise, or interest, to support the gigantic rails community, and Rails shifting and under-documented deploy needs. Perhaps all rails-specific stuff needs to be extracted into a seperate "capistrano-rails" gem (with a cap dependency of course), and then time needs to be spent crying out for help from Rails devs and community -- this 'capistrano-rails' thing has no maintainers, Rails peeps if you want Rails to have 'sensible deploy defaults that just work' via cap, somebody better step up and take it over.

@jrochkind

@ndbroadbent okay,now maybe I'll contradict myself. especially based on lee's statement that " Github and 37s both work on a "pull-fetch" deploy model, with their own tools not related to the Cap style." I'm not sure what "pull-fetch deploy model" means here, but I wonder if it's what I was trying to think about before:

I wonder if the 'right' way to do this is actually:

Have cap run "bundle exec rake assets:precompile", as well as all the calculations you are doing with expired/preserved on the local server, then check the compiled assets into your source control. Then the deploy (on possibly multiple servers) gets the assets just as part of regular deploy.

The problem here is a bunch of extra requirements and the need for cap to do things that are tricky to do agnostically toward environment: You possibly need to make sure to have the right version of ruby on your local server, so that the 'rake' will actually work right. More difficultly, you need cap to have the ability to actually modify code in the source repo (add, change and delete asset files!), and do a 'git push' (if it's git), and be doing this in the right branch, and do it all agnostic to scm supporting any scm cap supports! Ugh, and even if you could pull that off, it would break the ability to do "cap deploy -S a_specific_tag", because cap would want to change the repo before deploying (first fixing it's assets), so would never want to deploy a tag as-is in the repo.

Ugh, nevermind, back to first suggestion, don't even try to do this, it's too hard/weird, just try to compile assets remotely on each server.

Man, assets make things a mess.

@ndbroadbent

@jrochkind, sorry, I did mean multi-server, not sure why I said multistage.

@leehambley, I'm also in favor of a separate gem, but not just for the asset pipeline - I think a capistrano-rails gem might be a good way to remove the Rails stuff from core. My asset expiry solution currently does need access to a list of previous releases and manifests, and it's working well so far. But maybe there's a better approach.

I have to be honest, my intention was to just fix capistrano's assets support, so I'm not sure if I can commit to extracting and maintaining a capistrano-rails gem. I don't really use capistrano any more, since I'm hosting most of my projects on Heroku. However, let me know if you think that's the best way forward, and I'll see what I can do.

@leehambley
Capistrano member

I have to be honest, my intention was to just fix capistrano's assets support, so I'm not sure if I can commit to extracting and maintaining a capistrano-rails gem. I don't really use capistrano any more, since I'm hosting most of my projects on Heroku. However, let me know if you think that's the best way forward, and I'll see what I can do.

That's always the problem with open source @ndbroadbent, Back in 2005/6 I just started updating the manual, here I am years later still doing my best to maintain things!

Open source is a trap

I'm also in favor of a separate gem, but not just for the asset pipeline - I think a capistrano-rails gem might be a good way to remove the Rails stuff from core. My asset expiry solution currently does need access to a list of previous releases and manifests, and it's working well so far. But maybe there's a better approach.

I'd love to see this too, but alas it's a question of time, and I simply don't have any. Unless someone can step up to maintain it, it won't happen.

I do imagine that situation could change, my biggest problem with Capistrano is that I truly believe it needs a rewrite, and I'm not motivated to work on the existing codebase, I have a WIP framework to replace Cap, with a much cleaner architecture where I would rather spend my effort, that project also has some support from a handful of others, more so than Capistrano seems to.

Fortunately for me, I'm in a position to push people towards something newer, and better if it turns out to be newer and better when I'm done working on it. (Which, again is not likely to happen this year, now)

@ndbroadbent

Haha, I've fallen into that trap myself a few times. I won't be able to be a very active maintainer, but I guess I can still help out whenever possible. I've still got friends using capistrano at my old job, and I'm sure I'll use it again one day.

As for the current problems with assets in capistrano, I think this pull request does a pretty good job at solving them. So what do you think are the chances of merging now, and splitting all the Rails stuff into a separate gem later? I'd be happy to stick around and solve any issues that people might have, so I won't be just dumping code and running away :)

@jrochkind

@leehambley and @ndbroadbent , I feel like maybe we should put out a call to rubyrailsland saying that capistrano needs help. Perhaps just for a rails-capistrano gem, as Lee doesn't neccesarily want more cooks on capistrano core? But Rails has gotten to the point where it's "sensible defaults that just work" philosophy fails when it comes to deployment; with bundler and assets, you really need capistrano or something like it (such as heroku's own stuff) to deploy sanely. Which requires maintenance on capistrano's rails integration, which it isn't getting.

(Although it kind of seems like you might need some help on capistrano in general too, I realize you might not want it. :) )

@DavidEGrayson

I think I have a way to solve this that is a lot simpler than the current pull request we are discussing.

Here's one idea: For seamless deployment, all you really need is to have the assets from one previous deployment available; you don't need all the assets you ever deployed (but my idea can be easily extended if you want that). So every release will have its own assets and a few symlinks back to the previous release for any assets that changed.

Implementation: Remove shared/assets from the design entirely. Add a task that runs after assets:deploy:precompile. For every regular file in the previous release's assets directory, if an asset with the same name does NOT exist in the new release's assets directory, symlink it into there.

Is there anything wrong with this?

I think this solution could be implemented with a relatively small amount of code. I'd be happy to make a patch if people are interested.

@ndbroadbent

Hi @DavidEGrayson, yes it does handle the rollback of non-digest assets by running assets:precompile:nondigest after symlinking the previous release's manifest.yml. You can use my turbo-sprockets-rails3 gem to make this task generate the non-digest assets in only a few seconds. Otherwise, it could take a minute or two to recompile the non-digest assets.

It sounds like your approach might just be a variation of what I'm doing in this pull request. Linking the individual assets instead of the shared/assets folder doesn't seem to have any additional benefits. Also, it doesn't really matter if you use hard links or symlinks, they'll both work fine in this case.

The asset precompilation will definitely write through a symlinked or hard-linked manifest.yml, so I handle this by removing the symlink before running the task.

A lot of the code in this pull request is related to asset expiry, so the rollback fix does only take a small amount of code.

@ndbroadbent

Also note that we're definitely going to get rid of non-digest assets in Rails 4+, so this rollback problem will only apply to Rails 3.2.x deploys.

@DavidEGrayson

Sorry, @nbroadbent, I was editing that comment extensively as I thought about it more. Please read the latest version if you haven't already. I promise I'll stop now!

@jrochkind
@DavidEGrayson

@jrochkind, please see my latest revision of that comment which doesn't use hard links. Sorry!

@ndbroadbent My understanding is that all of the code in the pull request that is used for asset expiry wouldn't be needed if we do things the way I suggested. The assets will be deleted from the system when the release they were in is deleted.

I guess you already know what's in the pipeline for Rails 4, so you are probably designing something here that will work nicely for Rails 3 and Rails 4. My idea as it stands wouldn't work so great in Rails 4 because it would force you to recompile every asset when you deploy, whereas it is hard to avoid that in Rails 3.

@ndbroadbent

Just re-read your comment. That was also my initial thought, but unfortunately, we do need to keep the old assets around for a while, and not just for the last couple of releases. The problem is to do with caching mechanisms that are outside our control, such as browsers. A cached page in a browser might still be requesting older assets for a while.

Only relying on the last couple of releases doesn't work because that doesn't guarantee anything about the times between releases. You might deploy 5 times in an hour, but someone could still be depending on assets that are more than an hour old.

This is the reason for the relatively complex expiry code. It expires assets only after their release has been removed, and only if they haven't been deployed for at least one week (default).

@DavidEGrayson

Ah, so you want the assets to sometimes live on longer than the release they were in?

@jrochkind
@ndbroadbent

@DavidEGrayson - Yep, that's the idea. It's certainly an edge case that isn't immediately obvious, but it's one that's important to support for those 1% of users that would see a broken website otherwise.

@jrochkind - I'm also on the fence about that, but see the following comment from Josh Peek: sstephenson/sprockets#367 (comment)

@jrochkind
@DavidEGrayson

OK, I understand now. The deploy:cleanup task removes all releases except the latest 5, and you are worried that someone could be using a cached HTML page somewhere that relies on assets from 6 releases ago, so really the assets should be removed based on retirement time.

Clearly I haven't thought about all the details as much as @nbroadbent has. It just seems that the current pull request we are discussing injects so much specific knowledge of the asset pipeline and manifest.yml into Capistrano, and a solution that involves less of that would be nice (if possible!). But it's probably not possible.

@carsomyr

@ndbroadbent While the retirement time calculation is nifty, it does introduce quite a bit of machinery. What do you say to consolidating retirement policy into Capistrano's default behavior of cleaning up old releases? Although this is not as powerful, I think it's a nice compromise because:

  1. People are already getting the ability to rollback, whereas previously there was none.
  2. People already know about Capistrano's n-releases back policy, and so learning about asset rollbacks will be easy.
  3. If they're mindful of point 2, users might not need the retirement time for practical purposes.
  4. The Pull Request becomes easier to merge and less risky.
@jrochkind
@carsomyr

@jrochkind Thanks for the reply. After thinking about it some more, I've identified a single point of control, keep_releases. Thus, I would recommend taking out the expiry date calculations for now. Afterwards, if anyone is motivated enough, they can make "teach" keep_releases how to accept a date.

@ndbroadbent

That's a great idea. If releases could be kept for a given calendar window then we wouldn't need to worry about assets.

If we go down this road, then assets should not be compiled to the shared/assets directory. Rather, each release should compile assets in their releases/***/public/assets directory, after assets from the previous release are copied in. After the compile, any missing assets from previous releases will be symlinked into the current release.
This way, each public/assets folder will contain a manifest.yml file, and each release will have it's own non-digest assets, so rollback will be extremely simple to support.

We will also need to add another task that runs before deploy:cleanup, and ensures that the old release's asset symlinks are removed from the more recent releases, before that old release is 'cleaned up'. Otherwise you'll get broken symlinks when old releases are deleted.

This pretty much voids the current pull request, but I can submit a new pull request with these ideas.

@DavidEGrayson

Hooray! My 2 cents is that everything @carsomyr and @ndbroadbent said in the last two messages sounds great. I don't think some broken symlinks matter that much, but I suppose it will be easy to clean them up. The broken symlinks could actually be useful as part of the historical record of deployments.

@jrochkind
@carsomyr

All, here's my understanding of the situation so far. Please correct if I'm wrong.

  1. The issue at hand (#210) is that Capistrano has trouble managing state in the event of a rollback, since it only has a shared manifest.yml.
  2. @ndbroadbent's proposed fix doesn't change existing deployment directory structures; instead, it adds an asset_manifests directory for storing old manifests.
  3. In the event of a rollback, the old manifest.yml is retrieved, and some math is done to "normalize" the shared asset state.
  4. @ndbroadbent's new proposal is to "carry" assets from release i to release i + 1. Calendar-based accounting of deployments then ensures that everything is nice from a policy standpoint.

If the above is true so far, then I am all for it. My only request is that we use deploy count-based accounting for now and alert users about the new policy. Afterwards, if anyone is motivated enough, they can modify keep_releases to accept a natural language integer or date (i.e., "2 days"): If an integer is given, the accounting will be deploy count-based; on the other hand, if a date is given, the accounting will be calendar-based.

@jrochkind
@carsomyr

@jrochkind More moving parts make me hesitant. They're fun at first, but things stop being fun when an obscure component breaks and you have to answer for it. @ndbroadbent's changes have two new moving parts, namely the asset_manifests directory and calendar-based accounting. Why not change the way Capistrano deploys? No more remembering the name of a new directory, and no breakage from the accounting mechanism, because we co-opt an existing one.

@jrochkind
@carsomyr

@jrochkind I think you misunderstand me. I believe that the calendar-based accounting mechanism belongs in a more important place. If I were to merge the PR right now, it would be relegated to asset manifest management duty. I believe that it should be integrated as part of Capistrano's main deployment policy. @ndbroadbent himself proposed the new deployment scheme of not having a shared assets directory. If he were to submit a patch for that, it could easily be split in two: the first part would key off of keep_releases in its current form, and the optional second part would modify the functionality of keep_releases to do calendar-based accounting, if the user so desires.

@leehambley
Capistrano member

@ndbroadbent I really like your solution of retired_at. I'd like to suggest that we try and get that into Sprockets directly, that when a manifest.yml exists, and a resource is replaced that the old one stays in the manifest and is retired. Do you think that'd be a sane place to take this?

I always get nervous about including workarounds for common problems with Rails into Capistrano, since they probably ought to be fixed by Rails/. I recognise that getting patches into Rails is like pulling teeth, and they move really slowly, but you've come up with a really neat solution that I'd like to see more widely adopted.

I also second @carsomyr on the notion of splitting the pull request into two. But more based on the fact that I'd like to see this feature go upstream to Sprockets.

Thoughts everyone?

Also, thank you all for your time, and some really worth-while conversation here, it's great to see the community running like a well oiled cooperation machine! :+1:

@jrochkind
@jrochkind
@ndbroadbent

Hi everyone,

I'm also working on an asset expiry solution for the Heroku platform, and during that process I've come up with a slightly different approach. The solution is baked into my turbo-sprockets-rails3 gem in the form of a rake task, and the only capistrano change will be a hook after deploy:assets:precompile.

Before assets:precompile is run, a file is created at public/assets/expire_assets_after.yml, containing the modification time of the current manifest.yml.
The assets:clean_expired task looks up this timestamp, and will only expire assets older than that baseline time. This supports Heroku, where you only need the concept of the current deploy, and the previous deploy. (For more details, please see the Removing Expired Assets section in the README: https://github.com/ndbroadbent/turbo-sprockets-rails3#removing-expired-assets)

In order to support Capistrano's multiple releases, we just need to set the time in expire_assets_after.yml to the date of the oldest release. That way, you will never expire assets that were created during or after your oldest release, and you will still keep older assets that are referenced in your current manifest.yml.

Here's how I see it working, in terms of support:

  • If someone is deploying a version of Rails <= 3.1.x, then they are out of luck.

  • If someone is deploying a Rails 3.2.x app, then they must use my turbo-sprockets-rails3 gem to provide the cleanup task. They must also manually register the deploy:assets:precompile after hook in their own config/deploy, which will replace the public/assets/expire_assets_after.yml timestamp with the time of their oldest release.

  • When Rails 4 is released, the assets:clean task will remove expired assets. This will be a built-in Rails feature. (In case you were wondering, assets:clobber will be the task that deletes all assets.)

Please let me know your thoughts on this approach. Cheers!

@jrochkind
@ndbroadbent
@jrochkind
@ndbroadbent
@jrochkind
@ndbroadbent

Hi again, everyone. Sorry if you're all getting sick of this discussion by now, but here's a final (hopefully) attempt at this pull request, with a slightly different approach. I think this is the best solution I've come up with so far. It basically provides the same behavior as my previous pull request, while removing some of the complexity and relying on Linux tools instead of Ruby to do most of the work.

Here's the general idea of my new approach: I'm leveraging mtime to show when the asset was last 'active'.

After the assets:precompile task is run, /public/assets/manifest.yml is copied to /assets_manifest.yml. This makes sure that a copy of the manifest is stored for each release, since the shared manifest.yml will be overwritten each time. The rollback task just copies this manifest back into shared/assets/, and recompiles the non-digest assets.

Before new assets are precompiled, we run a task called deploy:assets:update_asset_mtimes. This task fetches all the assets from the current manifest, and uses the touch command to update their mtimes to the current time. This effectively marks the time where the assets were last active.

After deploy:cleanup, we run deploy:assets:clean_expired. It parses all the manifests from each release folder, and then writes the list of all required assets to the server, at #{deploy_to}/REQUIRED_ASSETS. [1]
We then use the Linux find command to find all assets older than the expire_assets_after time, and delete them if their file-name isn't contained in the REQUIRED_ASSETS file.

I've used the same mtime approach in my turbo-sprockets-rails3 gem, which handles assets expiry for the Heroku buildpack. However, it has no concept of multiple releases, so Capistrano will need to handle this itself.

Anyway, you'll notice that the pull request is now quite a bit smaller, so I hope that will do. The release-based method we discussed above simply doesn't handle all the cases properly, and I really think this is the best solution.

[1] I'm doing this because 'grep'ping a YAML file can be unpredictable with rogue string encodings.

@carsomyr

@ndbroadbent Some questions/comments for you.

  1. What corner case does the release-based method not cover?
  2. You don't seem to modify anything in line 111.
  3. Is it right to look for just *.gz file patterns in the asset manifests?
  4. How does the new code interact with keep_releases?
  5. Don't use a while loop in lines 119-128; use comm (yes, it's standard issue).
@ndbroadbent
  1. The release-based method doesn't properly handle the case when you deploy 5 times in quick succession, and use up all of your 'release buffer' at once. This means that you could be deleting assets that were deployed only 5 minutes ago. My current approach handles that case, as well as the case where you choose to keep only 1 release.
  2. Good catch, fixed!
  3. It was previously stripping .gz from the filename, but now I've added the gzipped filenames to REQUIRED_ASSETS in order to use the comm command.
  4. As soon as a release is deleted, the asset is allowed to be deleted, but only after a certain period of time.
  5. Great idea about using the comm command! Have changed the pull request to use that.
@carsomyr
carsomyr commented Nov 5, 2012

@ndbroadbent Sorry for the delay. I have reviewed the changes, and barring some minor nitpicks, they look fine. I still wonder about keep_releases, however. What if the asset_manifest.yml logic were separated from the date math? While I understand that the former is necessary for rollbacks, what do you say to rewriting the deploy:cleanup task to accept a pure integer as well as a date parseable by a gem like Chronic? That would alleviate the need for the update_asset_mtimes task. Instead, the user would have a choice between a release-based strategy and a date-based strategy, depending on the value of keep_releases. I am proposing this because having the asset accounting mechanism and the deploy:cleanup mechanism introduces extra complexity: The user would have two variables to keep track of (expire_assets_after and keep_releases), as opposed to one (just keep_releases).

What do you think? Just throwing this idea out there so that others can chime in.

@jrochkind
@ndbroadbent

@jrochkind, thanks for your reply, and I agree with all your points.

RE: Easier date support: - I think Chronic would be overkill, and a better solution would be to use ActiveSupport::CoreExtensions::Numeric::Time. Then you could write set :expire_assets_after, 2.weeks.

RE: extra complexity - The user doesn't actually need to worry about setting expire_assets_after (or even know about it) if they're happy with the default value of 7 days, which is more than enough for most applications. No-one will need to change anything in their config/deploy.rb in order to start using this new assets life cycle, as long as they are running load 'deploy/assets'.

@carsomyr
carsomyr commented Nov 8, 2012

@ndbroadbent @jrochkind Some comments below.

  1. Good points on the use of a date library. I am fine with the expiration in seconds, although if there's something that's well known and standard issue, we could use that.
  2. I am still worried about the extra complexity. As a Capistrano user myself, it's quite unpleasant to have to remember the myriad of configuration variables, and I like to be in control. From my point of view, it comes down to the story we can tell the users. Story 1: Capistrano will keep old releases up to the number that you specify in the keep_releases variable. Assets will be expired based on the expire_assets_after variable. Story 2: Capistrano will keep old releases up to the number or a window of time (given as a string like "10 days ago") in the keep_releases variable. The current patch would be Story 1. My proposal would be Story 2. While I would prefer Story 2 because of its simplicity, Story 1 would be fine if you think it's more appropriate.
  3. Some minor cleanups of the code are necessary.
    1. Shell escape every variable.
    2. Add -- to separate positional arguments from optional arguments.
    3. In your use of find and comm, have find pipe into comm and use the - argument to denote STDIN, thus getting rid of the temporary file OLD_ASSETS.

Other than that, I think we're close. I tested the patch a few days back and everything seemed to be OK.

@ndbroadbent

@carsomyr - I've made all of the changes in point 3, and I think there is now adequate escaping and separation of positional arguments.

I do feel that Story 1 is "The Right Way™", and more appropriate. I think the current pull request implements it in a pretty clean way, and I especially like your suggestions to use the comm command, etc. That's some pretty slick bash scripting :)

As for point 1, I would still say that ActiveSupport::CoreExtensions::Numeric::Time is the best time library for that sort of configuration.

require 'active_support/core_ext/numeric/time'
set :expire_assets_after, 1.week

Capistrano could try to require it on behalf of the user, since ActiveSupport will be present for any Rails app.

Anyway, I've tested the patch and confirmed that everything is working as expected. Let me know if there's anything else that needs to be done.

@carsomyr
carsomyr commented Nov 8, 2012

@ndbroadbent We could go with ActiveSupport::CoreExtensions::Numeric::Time, but that's a discussion for a later time. It could even be required by the deploy recipe and not the Capistrano core. As for final preparations, the run and capture seem to lack escaping and --'s. Other than that, looks good. I'll try to merge this by the end of night tomorrow.

@ndbroadbent ndbroadbent commented on an outdated diff Nov 8, 2012
lib/capistrano/recipes/deploy/assets.rb
@@ -39,7 +43,29 @@
set :asset_env, "RAILS_GROUPS=assets"
DESC
task :precompile, :roles => assets_role, :except => { :no_release => true } do
- run "cd #{latest_release} && #{rake} RAILS_ENV=#{rails_env} #{asset_env} assets:precompile"
+ run <<-CMD.compact
+ cd #{latest_release} &&
+ #{rake} RAILS_ENV=#{rails_env} #{asset_env} assets:precompile &&
+ cp #{shared_path}/assets/manifest.yml #{current_release}/assets_manifest.yml
+ CMD
+ end
+
+ desc <<-DESC
+ [internal] Updates the mtimes for assets that are required by the current release.
+ This task runs before assets:precompile.
+ DESC
+ task :update_asset_mtimes, :roles => assets_role, :except => { :no_release => true } do
+ # Fetch assets/manifest.yml contents.
+ manifest_path = "#{shared_path}/assets/manifest.yml".shellescape
@ndbroadbent
ndbroadbent added a note Nov 8, 2012

manifest_path is escaped here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@ndbroadbent
@carsomyr carsomyr commented on an outdated diff Nov 8, 2012
lib/capistrano/recipes/deploy/assets.rb
- run "cd #{latest_release} && #{rake} RAILS_ENV=#{rails_env} #{asset_env} assets:precompile"
+ run <<-CMD.compact
+ cd #{latest_release} &&
+ #{rake} RAILS_ENV=#{rails_env} #{asset_env} assets:precompile &&
+ cp #{shared_path}/assets/manifest.yml #{current_release}/assets_manifest.yml
+ CMD
+ end
+
+ desc <<-DESC
+ [internal] Updates the mtimes for assets that are required by the current release.
+ This task runs before assets:precompile.
+ DESC
+ task :update_asset_mtimes, :roles => assets_role, :except => { :no_release => true } do
+ # Fetch assets/manifest.yml contents.
+ manifest_path = "#{shared_path}/assets/manifest.yml".shellescape
+ manifest_yml = capture("[ -e #{manifest_path} ] && cat '#{manifest_path}' || echo").strip
@carsomyr
carsomyr added a note Nov 8, 2012

If you escape manifest_path, then the single quotes are actually incorrect. Let's escape things as they are being interpolated to avoid confusion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@carsomyr carsomyr commented on an outdated diff Nov 8, 2012
lib/capistrano/recipes/deploy/assets.rb
@@ -39,7 +43,29 @@
set :asset_env, "RAILS_GROUPS=assets"
DESC
task :precompile, :roles => assets_role, :except => { :no_release => true } do
- run "cd #{latest_release} && #{rake} RAILS_ENV=#{rails_env} #{asset_env} assets:precompile"
+ run <<-CMD.compact
+ cd #{latest_release} &&
+ #{rake} RAILS_ENV=#{rails_env} #{asset_env} assets:precompile &&
+ cp #{shared_path}/assets/manifest.yml #{current_release}/assets_manifest.yml
@carsomyr
carsomyr added a note Nov 8, 2012

These lines probably need escaping and some --'s, just in case...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@carsomyr carsomyr commented on an outdated diff Nov 8, 2012
lib/capistrano/recipes/deploy/assets.rb
+ logger.info "Writing required assets to #{deploy_to}/REQUIRED_ASSETS..."
+ escaped_assets = current_assets.to_a.join("\\n").shellescape
+ run %Q{printf -- "#{escaped_assets}" | sort > "#{deploy_to}/REQUIRED_ASSETS"}, :silent => true
+
+ # Finds all files older than X minutes, then removes them if they are not referenced
+ # in REQUIRED_ASSETS.
+ expire_after_mins = (expire_assets_after.to_f / 60.0).to_i
+ logger.info "Removing assets that haven't been deployed for #{expire_after_mins} minutes..."
+ run <<-CMD.compact
+ cd #{shared_path}/assets/ &&
+ for f in $(
+ find * -mmin +#{expire_after_mins.to_s.shellescape} -type f | sort |
+ comm -23 -- - #{deploy_to}/REQUIRED_ASSETS
+ ); do
+ echo "Removing unneeded asset: $f";
+ rm -f -- $f;
@carsomyr
carsomyr added a note Nov 8, 2012

$f probably needs to be in quotes here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@carsomyr carsomyr commented on an outdated diff Nov 8, 2012
lib/capistrano/recipes/deploy/assets.rb
+ else
+ logger.info "Fetched #{manifests.count} manifests from #{releases_path}/*/assets_manifest.yml"
+ current_assets = Set.new
+ manifests.each do |yaml|
+ manifest = YAML.load(yaml)
+ current_assets += manifest.to_a.flatten.flat_map do |file|
+ [file, "#{file}.gz"]
+ end
+ end
+ current_assets += %w(manifest.yml sources_manifest.yml)
+
+ # Write the list of required assets to server.
+ # Files must be sorted in dictionary order using Linux sort
+ logger.info "Writing required assets to #{deploy_to}/REQUIRED_ASSETS..."
+ escaped_assets = current_assets.to_a.join("\\n").shellescape
+ run %Q{printf -- "#{escaped_assets}" | sort > "#{deploy_to}/REQUIRED_ASSETS"}, :silent => true
@carsomyr
carsomyr added a note Nov 8, 2012

I see what you are trying to do here with the quoting. I believe that current_assets.join("\n").gsub("\"", "\\\"") is more correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@ndbroadbent

OK, everything pushed

@carsomyr carsomyr merged commit 23452d6 into capistrano:master Nov 8, 2012

1 check passed

Details default The Travis build passed
@carsomyr
carsomyr commented Nov 8, 2012

Thanks to @ndbroadbent and everyone else who provided feedback!

@ndbroadbent
@carsomyr
carsomyr commented Nov 8, 2012

What do you guys say to putting together a Capistrano v3 requirements list? If we were to do this, where would we put the wiki? I'm cc'ing @leehambley on this.

@jeremy
jeremy commented Dec 5, 2012
The asset pipeline is badly thought out, and doesn't perform well at all -
Github and 37s both work on a "pull-fetch" deploy model, with their own
tools not related to the Cap style.

We use cap for asset precompiles. What is a "pull-fetch" deploy model?

@leehambley
Capistrano member

We use cap for asset precompiles. What is a "pull-fetch" deploy model?

That should have been "pull/fetch", I meant to say that as far as I was aware (2nd hand information, obviously) - but that 37s simply used the SCM's pull model and tags to control releases, rather than having one directory per release.

@jeremy
jeremy commented Dec 7, 2012

Ah, right - gotcha! We do directory per release, using remote_cache. Never used pull + tags :grin:

I'm ashamed to admit I didn't know cap got asset support built-in! We've always handled that with a deploy:update_code task (even before the asset pipeline rolled around).

@leehambley
Capistrano member

I'm ashamed to admit I didn't know cap got asset support built-in!

Don't be, I do a terrible job of release announcements!

@andrew

This doesn't work if shared/assets/manifest.yml doesn't exist, which is the case the first time its ran after updating capistrano.

Thanks, will add a condition to check that the file exists first.

Actually, it should work fine... the previous line should have run assets:precompile, which generates the manifest file into the shared directory. Are you sure your assets directory is shared and symlinked properly?

It could be a problem with my app, I upgraded a different app today and didn't have the same problem, I wouldn't worry about it.

:-1: It fails when you doing first release on empty server. There is no latest_release in such case.

:-1: this has broken my deployment... oh fun times...

I think it will also fail if you use a non default assets prefix

Still broken, making it a pain in the ass to deploy anything without any assets (a blank testing app.)

@jlecour

Enumerable#flat_map is a Ruby 1.9 only method.

Did Capistrano become a Ruby 1.9 only gem?

It didn't, https://github.com/capistrano/capistrano still lists Ruby ≥ 1.8.7

This is an bug I think.

Other people reported the same problem:
#356

The solution seems to be to downgrade to 2.13.5 for now
https://gist.github.com/4505518

Capistrano tests with 1.8.7 on Travis https://github.com/capistrano/capistrano/blob/master/.travis.yml

But that didn't catch this error, the tests are green https://travis-ci.org/capistrano/capistrano/jobs/4296207

Hey, sorry about this. That flat_map was my fault. I've opened a pull request to fix this. (#367)

Unfortunately, none of the deploy recipes are tested on Travis.

No problem, thank you for the fix. I wouldn't know how to easily test the recipes.

@jshafton

Dumb question for the group, or maybe at least @ndbroadbent -- what is the reason that Capistrano is now touching all assets after deployment? The end result of this is in my setup is invalidation of browser caching --

  1. Browser sends a request for application.js with an If-Modified-Since header set to the time of the previous deployment
  2. The web server looks at the asset, sees the modification time of the file is now set to the current deployment (despite no change), and returns 200 OK with a Last-Modified header set to the time of the current deployment

My expectation is that it would return 304 Not Modified with a Last-Modified header set to the time of the previous deployment.

I'm sure I'm missing something here and there's a good reason for doing this that I just haven't thought through; or I possibly have something else misconfigured that could be causing this. Can anyone provide some insight?

@ndbroadbent

Hi @jshafton, sorry about my delayed response. My reason for touching all assets after deployment was to keep track of which assets were currently deployed. The mtimes for unneeded assets would get older and older, until they were pruned by the expire task.

Unfortunately, I didn't realize that this would come at the cost of invalidating browser caching.

I do have a solution in mind. Instead of using mtimes to record the 'last deployed at' time, I will use an extra YAML file to record these timestamps, such as public/assets/last_deployed_at.yml.

This will make browser caching work properly again, as well as removing the need to touch lots of files.

@jshafton

@ndbroadbent - great, works for me. I'll keep an eye out for updates. Let me know if I can be of any assistance with testing.

@jrochkind
@mauriciopasquier

I think this task just deleted all the attachments I had in shared/assets. Maybe I was wrong putting them there in the first place. It caught me a bit by surprise, because I noticed the task cleaning old assets and thought Great!, but then a few deploys later...

Now I keep the attachements in another dir and link it after deploy. Is this the common/recommended practice? I'm kinda new to capistrano

@jrochkind
@mauriciopasquier

Hhm.. I hadn't thought about this earlier. I should have saved stuff in public instead of public/assets. Or, as recommended by Paperclip's README, in public/system. I don't know why I decided that storage path.

Well, thanks for the reply

@ScotterC ScotterC referenced this pull request in ndbroadbent/turbo-sprockets-rails3 Mar 20, 2013
Open

Asset rollback? #54

@alperkokmen alperkokmen added a commit to PagerDuty/pd-cap-recipes that referenced this pull request Jul 21, 2014
@alperkokmen alperkokmen add capistrano ~> 2.15 to gemspec as a dependency
pd-cap-recipes without capistrano doesn't make sense. thus, we are
adding it to gemspe.

2.15 because we would like to start using smarter asset compilation and
rollback features that were released in recent releases of capistrano.

capistrano/capistrano#281
b05d91b
@alperkokmen alperkokmen added a commit to PagerDuty/pd-cap-recipes that referenced this pull request Jul 21, 2014
@alperkokmen alperkokmen add capistrano ~> 2.15 to gemspec as a dependency
pd-cap-recipes without capistrano doesn't make sense. thus, we are
adding it to gemspe.

2.15 because we would like to start using smarter asset compilation and
rollback features that were released in recent releases of capistrano.

capistrano/capistrano#281
d1fd33f
@alperkokmen alperkokmen added a commit to PagerDuty/pd-cap-recipes that referenced this pull request Jul 22, 2014
@alperkokmen alperkokmen add capistrano ~> 2.15 to gemspec as a dependency
pd-cap-recipes without capistrano doesn't make sense. thus, we are
adding it to gemspe.

2.15 because we would like to start using smarter asset compilation and
rollback features that were released in recent releases of capistrano.

capistrano/capistrano#281
c59fe87
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.