Optimize the handling of binary items #1321
b444824 to 9d505ca
Hi, thanks for the PR! I’ve thought about this optimisation in the past, too. In the end, I decided against using hard links for this purpose, for two reasons:
For the reasons listed above, I’ll close the PR now, but I’m definitely keen on keeping the conversation going! |
Hi, thanks for your feedback. Indeed, the change in semantics breaks a few assumptions that we usually make with Nanoc.
My use case is this: I maintain a podcast with Nanoc, in which the audio files are published together with the site itself. Each audio file is usually between 50 and 100 MB; overall, I have ~2.4 GB of audio files so far. I am concerned with both the speed of the builds and the space they use. The builds for publication are done on the server side, so space efficiency is a little more relevant to me.
I admit I hadn't tried Nanoc 4.9 before, and I just did. The speedup is indeed considerable, and that is very nice. So IMO speed is fixed.
Now, my only remaining concern is the duplicated space. Since the audio files are just copied bit-by-bit with no processing in Nanoc, it would be nice not to duplicate that space. If everything goes right, at some point this podcast will have tens of GBs of audio files. I am already considering removing the files from the Nanoc website, but maybe there is some low-hanging fruit in Nanoc that we could explore so that keeping the files in the Nanoc website itself could be viable. |
Thanks for the reply! Your reasoning makes sense. I imagine that most writes are atomic anyway (meaning that file modification equals create file + rename, rather than truncate and write), which means that updating a file in I’ll reopen the PR. If you want, can you add a test to ensure that the files are hardlinked? (If not, I can do that as well.) |
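To illustrate the distinction between the two write styles mentioned above, here is a minimal sketch in plain Ruby. It is not Nanoc code; the helper names `same_file?` and `demo_write_styles` and the file names are made up for the example. It assumes a filesystem that supports hard links inside the temporary directory.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: true when both paths refer to the same inode on the same device.
def same_file?(path_a, path_b)
  stat_a = File.stat(path_a)
  stat_b = File.stat(path_b)
  stat_a.dev == stat_b.dev && stat_a.ino == stat_b.ino
end

# Hypothetical demo: compares an in-place write with an atomic (write + rename) update.
def demo_write_styles
  Dir.mktmpdir do |dir|
    source = File.join(dir, 'source.dat')
    output = File.join(dir, 'output.dat')

    File.write(source, 'v1')
    FileUtils.ln(source, output) # output is now a second name for the same data

    # In-place write (truncate + write): the inode is reused, so the change
    # is also visible through the hard-linked output file.
    File.write(source, 'v2')
    in_place = File.read(output)

    # Atomic write (new file + rename): source now points at a new inode,
    # while output keeps pointing at the old data.
    tmp = File.join(dir, 'source.dat.tmp')
    File.write(tmp, 'v3')
    File.rename(tmp, source)
    after_rename = File.read(output)

    { in_place: in_place, after_rename: after_rename, still_linked: same_file?(source, output) }
  end
end
```

With an atomic update, the hard-linked output file is left untouched until the next build links it again; only the in-place style modifies the output behind the build's back.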
I also wonder what the implications will be on Windows where ... hold your breath ... {hard,sym}links can by default only be created by administrators. |
As of Fall Creators Update 2017, when developer mode is enabled you don't need privileges to create links |
Is there a way to make hardlinks only if an item is subject to a |
9d505ca to 17eb486
hey @ddfreyne, thanks for reconsidering. I have updated this PR with a test; however, I am a bit suspicious of it, because 1) it will definitely fail if executed on a system where /tmp is a tmpfs and 2) it will probably fail on Windows as well, based on the input from @agross.
@agross could you please test this on Windows? I don't have access to a Windows system. |
@terceiro As I said, Ruby 2.5.0 on Windows just copies the file, I guess it won't pose a problem. |
yes, I understand that it will _work_. my point is that the test that I wrote, as I wrote it, will probably fail when running on Windows.
|
@terceiro Alright, I missed that part. The problem is that I can't even install Nanoc's dependencies on Windows. Perhaps you can just skip the test if
Here's what I tested manually in irb:
    require 'fileutils'
    => true
    IO.write('test', 'foo')
    => 3
    FileUtils.ln('test', 'linked')
    => 0
    File.stat('test').ino
    => 1125899906918010
    File.stat('linked').ino
    => 1125899906918010
    FileUtils.ln('test', 'linked-f', force: true)
    => 0
    File.stat('linked').ino
    => 1125899906918010
|
well, I don't know enough about Windows to say that that is a hardlink, but it looks a lot like a hard link ;-) (so the test will pass) |
I think this PR needs a bit of (manual) testing. I’ll take a stab at it this weekend. (I’m paranoid about something breaking that we haven’t thought of.) I’ve been meaning to find a way to automatically test Nanoc on Windows (Travis CI doesn’t have Ruby on Windows). Not sure whether that exists as a SaaS!
A passthrough rule is a syntactic shorthand, so it wouldn’t be easy to make an exception for |
https://github.com/agross/nanoc/blob/appveyor/appveyor.yml
|
@agross Ugh, I missed your comment before I started working on setting up AppVeyor! Anyway, not much work lost — and I’ll definitely take some stuff from your setup! |
note that given the tests that @agross did on Windows, hardlinks seem to
behave pretty much like on GNU/Linux, or at least similarly enough that
the test that I wrote will for sure pass on Windows.
|
@terceiro Can you rebase your branch on top of |
When writing large binary items (e.g. audio or video files), creating a hard link pointing to the original avoids duplicating the data on disk. Hard linking will fail when the source file is not on the same device as the output directory (e.g. when /tmp is a tmpfs). In that case, fall back to the previous behavior of copying, which does duplicate the space but should always work.
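The link-then-fall-back behaviour described in this commit message can be sketched roughly as follows. This is an illustrative approximation, not the code from the PR: the helper name `link_or_copy` and its return values are assumptions made for the example.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: tries to hard-link source to destination and falls
# back to a plain copy when linking is not possible (for example when the two
# paths are on different devices, which raises Errno::EXDEV).
# Returns :linked or :copied so callers can tell which path was taken.
def link_or_copy(source, destination)
  FileUtils.mkdir_p(File.dirname(destination))
  FileUtils.rm_f(destination) # File.link refuses to overwrite an existing file
  begin
    File.link(source, destination)
    :linked
  rescue SystemCallError, NotImplementedError
    # Copying duplicates the data on disk, but works across devices.
    FileUtils.cp(source, destination)
    :copied
  end
end
```

Rescuing the broad `SystemCallError` is a deliberate simplification in this sketch; a real implementation might rescue only specific errors such as `Errno::EXDEV`.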
17eb486 to 017dd43
@ddfreyne rebased |
Huh, strangely the AppVeyor build is not running. Will take a look at it later! |
Everything checks out. Merging! Thanks for your contribution! |