Performance: Sync only new files? #116

Open
ChrisHughes opened this Issue Oct 17, 2012 · 7 comments


@ChrisHughes

On our Rails project, which has about 3000 images and 3000 SCSS files, syncing assets up to Amazon S3 takes over 20 minutes.

We currently only upload the md5 tagged files.

I'd really like a feature that uploads only the files whose md5-tagged names have changed (i.e. filenames not present in the last sync). For example, every time a file is successfully uploaded, its filename would be logged, so that we don't have to keep checking the server for the presence of each file or re-upload the same file. If we could skip the directory listing and avoid uploading unnecessary files, this process would take a matter of minutes.

@davidjrice
Contributor

@ChrisHughes hey, this feature is accomplished by #110

#110

Check it out using the asset_sync#turbosprockets branch and the turbosprockets-rails-3 gem!

I would class this as experimental as it is a backport of upcoming Rails 4 functionality.

@davidjrice davidjrice closed this Oct 22, 2012
@ndbroadbent

@davidjrice - I'm not sure that turbo-sprockets-rails3 solves this problem. It definitely speeds up asset compilation, but it won't change any file uploading logic for asset_sync.

@ChrisHughes was talking about maintaining a local index of uploaded files, so that you don't need to keep requesting the directory listing from the server. It could be done with a new file at public/assets/uploaded_assets.yml. Non-digest assets would also need to be supported, so the file would need to contain their digests.
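A rough sketch of that idea, using the file location from the comment above (the helper names and structure are my own guesses, not asset_sync code): the index maps each asset path to its content digest, so both digest and non-digest assets can be checked for changes.

```ruby
# Hypothetical sketch of the proposed public/assets/uploaded_assets.yml index.
# Method names here are illustrative, not part of asset_sync.
require "yaml"
require "digest/md5"

MANIFEST = "public/assets/uploaded_assets.yml"

# Load the index of previously uploaded assets (path => content digest).
def load_manifest
  File.exist?(MANIFEST) ? YAML.load_file(MANIFEST) : {}
end

# Record a successful upload, storing the digest so that non-digest
# assets can be checked for content changes too.
def record_upload(manifest, path)
  manifest[path] = Digest::MD5.file(path).hexdigest
  File.write(MANIFEST, manifest.to_yaml)
end

# A file needs uploading if it is new or its content has changed.
def needs_upload?(manifest, path)
  manifest[path] != Digest::MD5.file(path).hexdigest
end
```

With something like this in place, a sync run would only touch files where `needs_upload?` is true, and would never need to list the remote bucket at all.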

I'm not too familiar with S3 or asset_sync, but isn't the directory listing only fetched once per run? If so, that shouldn't take much time, and is nothing to worry about since the compilation bottleneck is fixed by the turbo-sprockets-rails3 gem.

@ndbroadbent

P.S. It turns out that sprockets and sprockets-rails are heading in a different direction (while also speeding up assets), so my gem is only going to be relevant for Rails 3.2.x.

@davidjrice
Contributor

Okay, reopening.

@ndbroadbent yes, listing is only fetched once per run.

@ChrisHughes did you try turbosprockets? Did it give an improvement?

Also, @ChrisHughes, how come you have so many assets? Understanding the source of the problem might help us find a better solution.

Regarding keeping track of which files have been uploaded by asset_sync: I get it! However, it's not something I would be interested in implementing, as we don't have that problem. Maintaining a list of uploaded files would imply that the assets must be generated locally and uploaded, whereas asset_sync uploads files generated during the deploy.

The benefit of asset_sync is that we do not have to version control bundled assets, or any other files relating to such. We just git push. There is no other step.

@davidjrice davidjrice reopened this Oct 25, 2012
@ChrisHughes

@davidjrice I did try turbosprockets; it gave only a very small improvement in speed. Compile time isn't the issue, as I have timed it against the syncing part. The reason we have so many files is that our app has > 150 themes, each compiled at ~20 mobile screen resolutions to make sure the designs look correct on every iOS / Android phone out there. Unfortunately there's not much I can do about the number of files; it's already fully optimized.

@dashbitla

We are in the same situation with 500+ themes, using the themes_for_rails gem. We'd like to know how you solved the issue!

The number of files is even larger in our case, and using the turbosprockets-rails-3 gem gives a "stack level too deep" error. Even increasing ulimit didn't help.

Is there any way of reducing the precompile time and syncing only the changed files to S3?

@ChrisHughes

We could not do much to improve it overall, but we found that the directory listing (i.e. reading which files already exist in the S3 bucket) was very slow, and that re-uploading files that already exist can be avoided. So we extended the AssetSync::Storage upload_files method to cache the directory listing (i.e. the list of synced files) on a local drive. Then, since the filenames are hashed based on content, we only upload the files that have been modified. Here is a working example: https://gist.github.com/ChrisHughes/f89a48743c99e135f285
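The approach described above can be sketched roughly like this (the real code is in the linked gist; the cache file name and the `fetcher` callable standing in for the S3 listing call are illustrative, assumed names):

```ruby
# Hypothetical sketch: cache the remote listing locally, and upload only
# files whose fingerprinted names are not already on the server.
require "yaml"

CACHE_FILE = ".synced_files.yml"

# Fetch the remote keys once (e.g. via a fog directory listing) and cache
# them on disk; later runs reuse the cache instead of re-listing the bucket.
def remote_keys(fetcher)
  return YAML.load_file(CACHE_FILE) if File.exist?(CACHE_FILE)
  keys = fetcher.call
  File.write(CACHE_FILE, keys.to_yaml)
  keys
end

# Fingerprinted filenames encode their content hash, so any name already
# present in the cached listing never needs to be uploaded again.
def files_to_upload(local_files, fetcher)
  synced = remote_keys(fetcher)
  local_files.reject { |f| synced.include?(f) }
end
```

Because the asset pipeline changes a file's fingerprinted name whenever its content changes, a simple name comparison against the cached listing is enough to detect modified files.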
