
Performance: Sync only new files? #116

Open · ChrisHughes opened this issue Oct 17, 2012 · 10 comments

@ChrisHughes

On our Rails project, which has about 3000 images and 3000 SCSS files, syncing assets to Amazon S3 takes over 20 minutes.

We currently upload only the MD5-tagged files.

I'd really like a feature that uploads only the files whose MD5 tags have changed (i.e. whose filenames were not present in the last sync). For example, every time a file is successfully uploaded, it would log that filename, so that it doesn't have to keep checking the server for the file's presence or re-upload the same file. If we could skip the directory listing and avoid uploading unnecessary files, this would cut the process down to a matter of minutes.

@davidjrice
Contributor

@ChrisHughes hey, this feature is accomplished by #110.

Check it out using the asset_sync#turbosprockets branch and the turbo-sprockets-rails3 gem!

I would class this as experimental, as it is a backport of upcoming Rails 4 functionality.

@ndbroadbent

@davidjrice - I'm not sure that turbo-sprockets-rails3 solves this problem. It definitely speeds up asset compilation, but it won't change any file uploading logic for asset_sync.

@ChrisHughes was talking about maintaining a local index of uploaded files, so that you don't need to keep requesting the directory listing from the server. It could be done with a new file at public/assets/uploaded_assets.yml. Non-digest assets would also need to be supported, so the file would need to contain their digests.
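
To make that concrete, here's a minimal sketch of the idea (the manifest path, method names, and YAML layout are all hypothetical, not part of asset_sync):

```ruby
require 'yaml'
require 'digest/md5'

# Hypothetical local manifest recording what has already been uploaded.
MANIFEST_PATH = 'public/assets/uploaded_assets.yml'

def load_manifest
  File.exist?(MANIFEST_PATH) ? YAML.load_file(MANIFEST_PATH) : {}
end

# Keep only files whose content digest differs from the last recorded
# upload; storing digests also covers non-digest assets, as noted above.
def files_needing_upload(local_files)
  manifest = load_manifest
  local_files.reject do |path|
    manifest[path] == Digest::MD5.file(path).hexdigest
  end
end

# After a successful sync, record the digests for the next run.
def record_uploads(paths)
  manifest = load_manifest
  paths.each { |path| manifest[path] = Digest::MD5.file(path).hexdigest }
  File.write(MANIFEST_PATH, manifest.to_yaml)
end
```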

I'm not too familiar with S3 or asset_sync, but isn't the directory listing only fetched once per run? If so, that shouldn't take much time, and is nothing to worry about since the compilation bottleneck is fixed by the turbo-sprockets-rails3 gem.

@ndbroadbent

P.S. It turns out that sprockets and sprockets-rails are heading in a different direction (while also speeding up assets), so my gem is only going to be relevant for Rails 3.2.x.

@davidjrice
Contributor

Okay, reopening.

@ndbroadbent yes, listing is only fetched once per run.

@ChrisHughes did you try turbosprockets? Any improvement?

Also, @ChrisHughes, how come you have so many assets? Understanding the source of the problem might help us find a better solution.

Regarding keeping track of which files have been uploaded by asset_sync, I get it! However, it's not something I would be interested in implementing, as we don't have that problem. Maintaining a list of uploaded files would imply that the assets are generated locally and the list committed, whereas asset_sync uploads files that are generated at deploy time.

The benefit of asset_sync is that we do not have to version control bundled assets, or any other files related to them. We just git push. There is no other step.

davidjrice reopened this Oct 25, 2012
@ChrisHughes
Author

@davidjrice I did try turbosprockets; it gave a very small improvement in speed. Compile time isn't the issue, as I have timed it against the syncing part. The reason we have so many files is that we have an app with >150 themes, compiled at ~20 mobile screen resolutions, to ensure the designs look correct on every iOS/Android phone out there. Unfortunately there's not much I can do about the number of files; it's already fully optimized.

@dashbitla

We are in the same situation, with 500+ themes using the themes_for_rails gem.
We'd like to know how you solved the issue!

The number of files is even higher in our case, and using the turbo-sprockets-rails3 gem gives a "stack level too deep" error. Even increasing ulimit didn't help.

Is there any way of reducing the precompile time and syncing only the changed files to S3?

@ChrisHughes
Author

We could not do much to improve it, but we found that the directory listing (i.e. reading which files already exist in the S3 bucket) was very slow, and that uploading already-existing files can be avoided. So we extended the AssetSync::Storage.upload_files method to cache the directory listing (i.e. the list of synced files) on a local drive. Then, since each filename is hashed based on content, we only upload the files that have been modified. Here is a working example: https://gist.github.com/ChrisHughes/f89a48743c99e135f285
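
In rough outline, the idea looks like this (simplified; the gist hooks into upload_files itself, while this sketch just wraps the listing call, assuming a get_remote_files method that returns the bucket listing):

```ruby
require 'json'

# Illustrative cache location; anywhere that survives between runs works.
CACHE_PATH = '/tmp/asset_sync_remote_files.json'

module RemoteListingCache
  def get_remote_files
    if File.exist?(CACHE_PATH)
      # Reuse the cached listing instead of the slow S3 bucket scan.
      JSON.parse(File.read(CACHE_PATH))
    else
      # Fetch once, then persist the result for subsequent deploys.
      super.tap { |files| File.write(CACHE_PATH, JSON.generate(files)) }
    end
  end
end

AssetSync::Storage.prepend(RemoteListingCache)
```

Because the compiled filenames contain content hashes, any file already present in the cached listing can safely be skipped.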

@h0jeZvgoxFepBQ2C

This is still an issue in 2020... If you add webpacker, it uploads all files again, even when they already exist.

@PikachuEXE
Member

The new remote_file_list_cache_file_path option, released a few versions ago, might be helpful.
Try it!
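
For example, in the asset_sync initializer (the cache path here is just an illustration; the file needs to persist between builds, e.g. via a CI cache, for it to help):

```ruby
# config/initializers/asset_sync.rb
AssetSync.configure do |config|
  # ... existing provider/bucket settings ...

  # Persist the remote file listing between runs so that only new or
  # changed files are uploaded on the next sync.
  config.remote_file_list_cache_file_path = './.asset_sync_remote_file_list_cache.json'
end
```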

@hopewise

We have a similar case, where we build using AWS CodeBuild; precompilation takes about 20 minutes each time. Any advice is appreciated.
