
Nanoc (almost) always crashing at build #1338

Closed
Doshirae opened this issue Apr 23, 2018 · 33 comments

Comments

@Doshirae

Doshirae commented Apr 23, 2018

Well, the title says it all: whenever I build my Nanoc site, I have to build it several times before it eventually works.

Steps to reproduce

I really don't know what is causing it, so I can't help with this part…

Expected behavior

I expect the site to build properly

Actual behavior

First build attempt:
Errno::ENOENT: No such file or directory @ rb_sysopen - output/2018/04/le-post-rock/index.html

Second build attempt:
Errno::ENOENT: No such file or directory @ rb_sysopen - output/2018/04/le-post-rock/index.html

Third build attempt:
Errno::ENOENT: No such file or directory @ rb_sysopen - output/2018/01/le-blog-est-lisible-sur-mobile/index.html

Fourth build attempt:
Now it builds

Details

Obviously, all the files that Nanoc thinks are missing are there on my filesystem.

I also sometimes get errors like Errno::ENOENT: No such file or directory @ rb_file_s_link - (/tmp/nanoc20180423-21843-10r5xok/text_items/0, output/2018/04/le-post-rock/index.html) with this crash log.

And sometimes, it just builds normally.

Crash log

Sorry, I'm using my own paste thingy, because Gist just deletes the newlines in the crash log, and I don't have the time or the patience to add them back.

First crash log : https://p.patate-douce.me/4c93ff
Second crash log : https://p.patate-douce.me/e40cd0
Third crash log : https://p.patate-douce.me/045d3f

@denisdefreyne
Member

Yikes!

Can you reproduce this problem with Nanoc 4.9.1 too? (Rather than the current, 4.9.2.)

@Doshirae
Author

OK, I can definitely reproduce it with 4.9.1, and sometimes even with 4.9.0 (though it is quite rare in that version).

Could Ruby 2.5 be what messed it up? Or am I one of the only ones affected?

@denisdefreyne
Member

Hmm — there are two possibly relevant changes in Nanoc recently that could be related:

  • In 4.9.0, Nanoc started using asynchronous writes.
  • In 4.9.2, Nanoc started using hardlinks instead of copying files.

But from the looks of it, even asynchronous writes shouldn’t be causing this particular kind of error.

Could you test with Nanoc 4.8.9? That might not be easy because of the new features introduced in 4.9.0, so no worries if it’s not simple.

Some questions:

  • Does your filesystem have a setup where your Nanoc directory is spread over multiple filesystems? (Unlikely, I suppose…)

  • Might there be some process running that automatically removes empty directories from output/?

@Doshirae
Author

After a bit of testing, it seems that Nanoc 4.8.9 is OK: no bug.

Also, I had a preprocess rule that removed unpublished files, but having it or not didn't change anything :/

@denisdefreyne
Member

Huh, this bug has me stumped. Some observations:

  • The exception happens in either

    • item_rep_writer.rb:48 in write_single
    • item_rep_writer.rb:55 in rescue in write_single

    write_single: Here it fails in FileUtils.identical?, because raw_path (or a component of the path) does not exist. But the code around it:

    is_created || !FileUtils.identical?(raw_path, temp_path)

    … means that FileUtils.identical? is only called when is_created is false, and

    is_created = !File.file?(raw_path)

    Put all together, FileUtils.identical? is only called when raw_path exists, yet the exception is raised when it does not exist. I’m confused.


    rescue in write_single: This exception happens further down the line, which means that FileUtils.identical?(raw_path, temp_path) sometimes works. The rescue happens because of Errno::EXDEV, which means /tmp is on a separate partition — which should be handled just fine.

    The exception is raised in FileUtils.cp(temp_path, raw_path), because raw_path supposedly cannot be written to, but before that line, there’s

    FileUtils.mkdir_p(File.dirname(raw_path))

    … which succeeds (otherwise it’d have thrown an exception).
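To make the two failure points concrete, here is a minimal sketch of the write path as described above: the existence check guarding FileUtils.identical?, then mkdir_p and a hardlink (the 4.9.2 change) that falls back to FileUtils.cp on Errno::EXDEV. It is a hypothetical reconstruction using the same standard-library calls, not Nanoc's item_rep_writer.rb, and the helper names needs_write? and place_output are made up. One thing it makes visible: each call resolves a relative raw_path against the current working directory at the moment it runs, so File.file? succeeding only guarantees what the next call sees if nothing changes in between.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical reconstruction of the two steps discussed above (not Nanoc's
# source): same calls, same order.

# Step 1: decide whether raw_path must be (re)written.
def needs_write?(raw_path, temp_path)
  is_created = !File.file?(raw_path)
  # `||` short-circuits, so FileUtils.identical? (an alias of compare_file)
  # only runs if raw_path was a regular file when File.file? looked at it.
  is_created || !FileUtils.identical?(raw_path, temp_path)
end

# Step 2: hardlink the temp file into place, copying across filesystems.
def place_output(temp_path, raw_path)
  FileUtils.mkdir_p(File.dirname(raw_path))
  FileUtils.rm_f(raw_path)         # File.link raises Errno::EEXIST otherwise
  File.link(temp_path, raw_path)   # only possible within one filesystem
  :linked
rescue Errno::EXDEV
  # temp_path lives on another filesystem (e.g. a separate /tmp partition)
  FileUtils.cp(temp_path, raw_path)
  :copied
end

results = Dir.mktmpdir do |dir|
  temp = File.join(dir, 'tmp', 'item')
  raw  = File.join(dir, 'output', '2018', '04', 'index.html')
  FileUtils.mkdir_p(File.dirname(temp))
  File.write(temp, 'new')

  first = needs_write?(raw, temp)  # true: raw does not exist yet
  how   = place_output(temp, raw)  # :linked when both are on one filesystem
  again = needs_write?(raw, temp)  # false: contents are now identical
  [first, how, again, File.read(raw)]
end
```

With absolute paths, as here, both steps behave as the analysis above expects; the surprising ENOENTs only become possible when the paths are relative and something changes between calls.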


@Doshirae A question for you: can you double-check that no other Nanoc process is running that could interfere?

@denisdefreyne
Member

@Doshirae Is this still an issue for you?

@Doshirae
Author

Doshirae commented Jun 8, 2018

Yes, nothing has changed.
I had downgraded Nanoc to 4.8.9, where the problem doesn't happen.
I upgraded again to check, and the problem still seems to be there.
I tried removing the output/ folder to recompile everything, but it kept on crashing.
I made a simple loop to see if it would end at some point: while [[ $? -ne 0 ]]; do nanoc; done
It does end.
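The loop above only starts if the command run just before it failed, since while [[ $? -ne 0 ]] checks the previous exit status before the first iteration. To measure how flaky a build is, a sketch like the following always runs the command and returns how many attempts it took. attempts_until_success is a hypothetical helper, not part of Nanoc, and the bundle exec nanoc invocation assumes a Bundler setup.

```ruby
require 'rbconfig'

# Hypothetical helper (not part of Nanoc): run a command repeatedly until it
# exits successfully and report how many attempts that took.
def attempts_until_success(cmd, max_attempts: 20)
  (1..max_attempts).each do |attempt|
    # Kernel#system returns true only when the command exits with status 0.
    return attempt if system(*cmd)
  end
  nil # never succeeded within max_attempts
end

ruby = RbConfig.ruby
ok_attempts   = attempts_until_success([ruby, '-e', 'exit 0'])
fail_attempts = attempts_until_success([ruby, '-e', 'exit 1'], max_attempts: 3)

# For a real site: attempts_until_success(%w[bundle exec nanoc])
```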

I guess I'm gonna try to patch the code to stop that from happening :/

@denisdefreyne
Member

@Doshirae Is 4.8.9 the most recent version where this problem does not happen? (In other words, is 4.8.10 the first version where it happens?)

Narrowing the problem down to a specific version would help me pinpoint the cause; I’ve not had much luck identifying it so far.

@Doshirae
Author

Doshirae commented Jun 8, 2018

I retried with 4.8.9
I removed the output/ directory to see if Nanoc still crashes when compiling (Errno::ENOENT each time, but not always the same error message).
Examples:
No such file or directory @ rb_sysopen - output/archive/index.html
No such file or directory @ rb_file_s_link - (/tmp/nanoc20180608-15201-185jvuf/text_items/4, output/2018/02/le-blog-est-desormais-w3c-valide/index.html)

It crashes on the first attempt every time, and it seems to be uncorrelated with the release I'm using.
I think there may just be a problem with my filesystem.
My first guess would be something related to my /tmp directory.

@denisdefreyne
Member

@Doshirae This is still quite odd, because Nanoc supposedly creates the directory before creating files inside the directory. Which filesystem(s) are you using?

@Doshirae
Author

I'm on a good ol' ext4, on an SSD drive

@denisdefreyne
Member

@Doshirae Can you run the following script to see whether it crashes?

require 'fileutils'

dirs = [
  '/tmp',
  File.expand_path('~'),
]

TMP = 'jeOsTZxRlHRXwmJG6soAMM3LB2705b'

dirs.each do |dir|
  Dir.chdir(dir) do
    print "Testing in #{dir} (reusing dir)… "
    20.times do
      FileUtils.mkdir_p(TMP)
      File.write(TMP + '/a.txt', 'stuff')
      FileUtils.rm_rf(TMP)
    end
    puts "ok"

    print "Testing in #{dir} (new dir)… "
    20.times do |i|
      path = "#{TMP}#{i}"
      FileUtils.mkdir_p(path)
      File.write(path + '/a.txt', 'stuff')
      FileUtils.rm_rf(path)
    end
    puts "ok"
  end
end

puts "done"

It repeatedly creates directories and files in them, in quick succession. None of it should fail, and it doesn’t on my machine, but I’m wondering whether there’s any configuration of your filesystem that makes it not work.

@Doshirae
Author

Doshirae commented Jul 7, 2018

It works fine and doesn't crash :/

@denisdefreyne
Member

@Doshirae Could I get a copy of your site so that I can investigate by myself? I’m not making progress on this issue.

@Doshirae
Author

Sure, I've made the files available here: https://files.doshi.re/Blog-files/
It's just a wget -np -r -nH https://files.doshi.re/Blog-files/ away (you really want those -np and -nH options, trust me).

@denisdefreyne
Member

@Doshirae wget works — although I had to remove the index.htmls that wget creates.

I can build the site properly on my machine (macOS 10.13.6). I’ve also tested it on Debian 9 with ext4, where it also works properly. In both setups, I’ve run the compilation a few dozen times in quick succession, with no failures.

Can you share your /etc/fstab with me?

@denisdefreyne
Member

Unrelated: I recommend Bundler for specifying the dependencies you use; it makes it easy to install dependencies on a new system. The Gemfile that I’ve used for your site is the following:

source 'https://rubygems.org'

gem 'builder'
gem 'kramdown'
gem 'nanoc'
gem 'nokogiri'
gem 'rouge'
gem 'sass'
gem 'stringex'

@Doshirae
Author

  1. My fstab:
# Static information about the filesystems.
# See fstab(5) for details.

# <file system> <dir> <type> <options> <dump> <pass>
# /dev/mapper/vg-Root
UUID=92f29d6b-0d2d-4111-a473-496f81c06323	/         	ext4      	rw,relatime,data=ordered,discard	0 1

# /dev/mapper/vg-Home
UUID=8970b07e-e57d-429b-8999-0d76e901bd00	/home     	ext4      	rw,relatime,data=ordered,discard	0 2

# /dev/sdb1 LABEL=Boot
UUID=63DC-D712      	                        /boot/efi       vfat      	rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro	0 2

# /dev/sda1
UUID=20a88671-980e-475b-a224-2f5c315edabd	/var      	ext4      	rw,relatime,data=ordered	0 2

# /dev/sda3
UUID=08d7f6ee-a19f-4ac4-8c8c-c0a2df2cbf67	/media    	ext4      	rw,relatime,data=ordered	0 2

# /dev/sda2
UUID=484454a6-f95d-459b-8c45-c4f685f886a3	none      	swap      	defaults  	0 0
  2. Why Nokogiri and Builder?

@denisdefreyne
Member

The colorize_syntax filter uses Nokogiri, and the #atom_feed helper uses Builder.
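For context, this is roughly where those two gems come into play in a Nanoc 4 site. It is a config fragment, not a complete Rules file: the item pattern and the layout name are hypothetical examples, not taken from this site.

```ruby
# Rules: hypothetical compile rule for Markdown items
compile '/**/*.md' do
  filter :kramdown
  # colorize_syntax parses the generated HTML with Nokogiri
  filter :colorize_syntax, default_colorizer: :rouge
  layout '/default.*'
end

# lib/default.rb: makes #atom_feed (which generates XML with Builder) available
include Nanoc::Helpers::Blogging
```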

Going to take another look at this soon!

@denisdefreyne
Member

With the same ext4 options that you’re using, I also still can’t reproduce the issue :(

@Doshirae
Author

What a nightmare ><

Maybe it's because I'm on an encrypted partition?

Anyway, my system is quite bloated and has some issues; I was planning on reinstalling it (but laziness held me back).
Do you want to try to resolve the issue as is, or do you want me to reinstall to see if the bug persists ?

@denisdefreyne
Member

@Doshirae I’m out of ideas as to what could cause this. If you feel like digging into the code of item_rep_writer.rb and doing some debugging yourself, that would be quite useful.

Though if the issue no longer happens after a system reinstall, it might be worth closing it, and reopening it when/if the issue reappears.

@Doshirae
Author

I have new clues from my computer!
I haven't reinstalled yet, but I found out that Nanoc creates output directories in ../ and ../../ relative to the root folder of my blog, and I don't understand why it would do that.
Maybe it has to do with the Rake tasks I use to manage my blog? I don't think so, but that's the only lead I have right now.
Here is the Rakefile.
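Stray output/ directories in ../ and ../../ are what you would expect if a relative path such as output/2018/04 were resolved after the current working directory had moved up. The sketch below shows the effect with FileUtils.mkdir_p; the Dir.chdir('..') is only a stand-in for whatever moved the working directory (at this point in the thread, the culprit was still unknown).

```ruby
require 'fileutils'
require 'tmpdir'

# Illustration only: the same relative mkdir_p lands in different places
# depending on the current working directory at the moment of the call.
placed = Dir.mktmpdir do |root|
  site = File.join(root, 'blog')
  FileUtils.mkdir_p(site)
  original = Dir.pwd
  begin
    Dir.chdir(site)
    FileUtils.mkdir_p('output/2018/04') # creates blog/output/2018/04
    Dir.chdir('..')                     # stand-in for "something moved the cwd"
    FileUtils.mkdir_p('output/2018/04') # creates output/2018/04 next to blog/
  ensure
    Dir.chdir(original)                 # restore the cwd before cleanup
  end
  [File.directory?(File.join(site, 'output/2018/04')),
   File.directory?(File.join(root, 'output/2018/04'))]
end

p placed # prints [true, true]
```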

@ftemmerm

For what it's worth, I have been having exactly the same issue for some time (same crashes, and output folders in the parent folders). The issue occurs on macOS 10.13.6 with Nanoc 4.9.0 and Ruby 2.5. I don't have the issue compiling exactly the same site on my older MacBook installation with macOS 10.13.6, Nanoc 4.8.11, and Ruby 2.4.2.

@ftemmerm

ftemmerm commented Sep 6, 2018

Downgraded to 4.8 for now, will follow this discussion to see when I can upgrade again.

@denisdefreyne
Member

Thanks for the extra information! I’m on holiday at the moment, but I will take a closer look at this afterwards. The output problem definitely seems related.

@Doshirae The link to https://doshi.re/Rakefile.txt is broken, I’m afraid.

@denisdefreyne
Member

Nanoc at the moment has an implicit dependency on the current working directory. This might be the cause of this problem. I’m working on removing that dependency now; this should hopefully fix this issue.
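A minimal sketch of removing that kind of dependency, assuming the output directory is known up front: resolve it to an absolute path once, and build every output path from that. OutputWriter is a hypothetical class for illustration, not the change that went into Nanoc.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical writer (not Nanoc's actual fix): the output directory becomes
# an absolute path once, so later changes to the cwd cannot redirect writes.
class OutputWriter
  def initialize(output_dir)
    @root = File.expand_path(output_dir) # resolved against the cwd right now
  end

  def write(relative_path, content)
    path = File.join(@root, relative_path)
    FileUtils.mkdir_p(File.dirname(path))
    File.write(path, content)
    path
  end
end

rel = '2018/04/le-post-rock/index.html'
written = Dir.mktmpdir do |root|
  site = File.join(root, 'blog')
  FileUtils.mkdir_p(site)
  # Construct the writer while the cwd is the site root...
  writer = Dir.chdir(site) { OutputWriter.new('output') }
  # ...then write after the cwd has changed back; the stored root is absolute.
  writer.write(rel, 'hello')
  File.read(File.join(site, 'output', rel))
end

puts written # prints hello
```

Capturing the root in the constructor is the design choice that matters: every later call is independent of the process-wide working directory.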

@denisdefreyne
Member

@Doshirae @ftemmerm Can you check whether the problem has disappeared in master?

You can use Nanoc from master by changing the Gemfile to say

gem 'nanoc', github: 'nanoc/nanoc'

… then run bundle update nanoc; after that, bundle exec nanoc will run Nanoc from master.

@Doshirae
Author

From what I've tested, my issue seems solved.
Great job, Denis!

@denisdefreyne
Member

Excellent… and sorry for taking so long to identify the source of the problem!

@denisdefreyne
Member

The fix is released in Nanoc 4.9.5.

@Fjan
Contributor

Fjan commented Sep 20, 2018

Just adding a little bit of info to this thread in case anyone comes here from Google: in Ruby, the current working directory is shared by all threads of a process. So if any code depends on the current working directory, code running in another thread of the same Ruby process can break it by changing directory.

In my case, I had a Rake task running a second task in parallel with Nanoc, and that second task crashed it, so the cause was completely outside Nanoc.

There has been some discussion on the Ruby mailing list about this, but apparently it can't be changed, because the underlying OS defines this behaviour: the working directory belongs to the process, not to individual threads.
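A short demonstration of the point above, using only the standard library: a non-block Dir.chdir in a second thread changes the working directory seen by the main thread as well. The scratch directory is a throwaway created just for the demo.

```ruby
require 'tmpdir'

original = Dir.pwd
other    = Dir.mktmpdir # an empty scratch directory
begin
  # A second thread changes directory without a block...
  Thread.new { Dir.chdir(other) }.join
  # ...and the main thread, which never called chdir, is now there too.
  shared = File.identical?(Dir.pwd, other)
ensure
  Dir.chdir(original)   # restore the process-wide cwd
  Dir.rmdir(other)
end

puts shared # prints true
```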

@denisdefreyne
Member

@Fjan Good follow-up!

The consensus on global mutable state is that it’s bad, but global mutable state creeps up in non-obvious ways. Global mutable state isn’t just global variables ($foo in Ruby) — there’s more:

  • the current working directory
  • the environment
  • the filesystem
  • etc.

I was aware that depending on the current working directory is problematic, but it slipped my mind, which is why I didn’t fix it sooner.
