Bug: elasticsearch PID file writing fails #8771

Closed
t-lo opened this issue Dec 3, 2014 · 6 comments · Fixed by #8775

Comments

@t-lo

t-lo commented Dec 3, 2014

Prerequisites

  • elasticsearch master branch source tree (bug exists since a6510f9)

Steps to reproduce

  • build elasticsearch
    mvn -DskipTests=true package
  • extract the tgz:
    cd target/releases; tar xzf elasticsearch-2.0.0-SNAPSHOT.tar.gz
  • start elasticsearch, request a PID file:
    elasticsearch-2.0.0-SNAPSHOT/bin/elasticsearch -p /var/run/es-fail.pid

Expected result

  • elasticsearch starts up and writes its PID to the PID file.

Actual result
elasticsearch instantly produces a stack trace and crashes:

{2.0.0-SNAPSHOT}: pid Failed ...
- FileAlreadyExistsException[/var/run]
java.nio.file.FileAlreadyExistsException: /var/run
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:88)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
        at java.nio.file.Files.createDirectory(Files.java:674)
        at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
        at java.nio.file.Files.createDirectories(Files.java:727)
        at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:155)
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32)

Further information
Checking Bootstrap.java yields:

$  git blame -L 154,156 src/main/java/org/elasticsearch/bootstrap/Bootstrap.java
a6510f92 (Simon Willnauer 2014-12-02 21:28:51 +0100 154)                 Path fPidFile = Paths.get(pidFile);
a6510f92 (Simon Willnauer 2014-12-02 21:28:51 +0100 155)                 Files.createDirectories(fPidFile.getParent());
a6510f92 (Simon Willnauer 2014-12-02 21:28:51 +0100 156)                 OutputStream outputStream = Files.newOutputStream(fPidFile, StandardOpenOption.DELETE_ON_CLOSE);

Ping @s1monw :)

@s1monw
Contributor

s1monw commented Dec 3, 2014

is it possible that /var/run already exists and it's not a directory?

@t-lo
Author

t-lo commented Dec 3, 2014

/var/run already exists, and it is a directory (otherwise I'd meet all kinds of weirdness with my Linux). I tested this in a Debian 7 VM, too, with the same result.

I got two more failure modes:
1. If I use a path I have write access to (say, /tmp/bla), I get

$ bin/elasticsearch -p /tmp/bla
{2.0.0-SNAPSHOT}: pid Failed ...
- NoSuchFileException[/tmp/bla]
java.nio.file.NoSuchFileException: /tmp/bla
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
        at java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434)
        at java.nio.file.Files.newOutputStream(Files.java:216)
        at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:156)
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32)

2. When I use no directory part at all (i.e., a PID file in the current working directory), I get

bin/elasticsearch -p bla
{2.0.0-SNAPSHOT}: pid Failed ...
- NullPointerException[null]
java.lang.NullPointerException
        at java.nio.file.Files.provider(Files.java:97)
        at java.nio.file.Files.createDirectory(Files.java:674)
        at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
        at java.nio.file.Files.createDirectories(Files.java:727)
        at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:155)
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32)
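Both failure modes can be reproduced outside Elasticsearch with plain NIO. Below is a minimal sketch (class and method names are made up for illustration) that mirrors lines 155 and 156 of Bootstrap.java. A bare file name like "bla" has no parent, so Files.createDirectories receives null. And per its Javadoc, Files.newOutputStream only behaves as if CREATE, TRUNCATE_EXISTING and WRITE were given when no options are passed at all, so opening a missing file with DELETE_ON_CLOSE alone fails.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class PidFailureModes {

    // Failure mode 2: a bare name such as "bla" has no parent component,
    // so getParent() returns null ...
    static boolean bareNameHasNullParent() {
        return Paths.get("bla").getParent() == null;
    }

    // ... and handing that null to Files.createDirectories throws a NullPointerException.
    static boolean createDirectoriesOnNullParentThrowsNpe() throws IOException {
        Path parent = Paths.get("bla").getParent();
        try {
            Files.createDirectories(parent);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Failure mode 1: with DELETE_ON_CLOSE as the only option, CREATE is not implied,
    // so opening a file that does not exist yet fails with NoSuchFileException.
    static boolean missingFileWithoutCreateFails(Path dir) throws IOException {
        Path pidFile = dir.resolve("missing.pid");
        try (OutputStream out = Files.newOutputStream(pidFile, StandardOpenOption.DELETE_ON_CLOSE)) {
            return false;
        } catch (NoSuchFileException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("pid-demo");
        System.out.println("bare name has null parent:          " + bareNameHasNullParent());
        System.out.println("createDirectories(null parent) NPE: " + createDirectoriesOnNullParentThrowsNpe());
        System.out.println("missing file without CREATE fails:  " + missingFileWithoutCreateFails(tmp));
    }
}
```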

@s1monw were you able to reproduce the issue?

t-lo pushed a commit to t-lo/elasticsearch that referenced this issue Dec 4, 2014
This change fixes various failure modes on PID file creation which have been
introduced by using nio:
- check if PID file has a directory part
- allow PID directory creation to fail with FileAlreadyExistsException
  (thrown if the directory exists and is a soft link)
- create PID file if it doesn't exist, truncate it if it does

fixes elastic#8771

Signed-off-by: Thilo Fromm <github@thilo-fromm.de>
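A rough sketch of the three points in this commit message as a standalone helper (writePid is a hypothetical name; the actual patch may be structured differently). It skips directory creation when there is no parent, accepts FileAlreadyExistsException only if the existing parent resolves to a directory when following links, and opens the file with explicit CREATE, TRUNCATE_EXISTING and WRITE.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class PidFileWriterSketch {

    // Hypothetical helper following the three bullet points of the commit message.
    static void writePid(Path pidFile, long pid) throws IOException {
        // check if the PID file has a directory part ("bla" has none)
        Path parent = pidFile.getParent();
        if (parent != null) {
            try {
                Files.createDirectories(parent);
            } catch (FileAlreadyExistsException e) {
                // accept an existing entry that resolves to a directory, e.g. a soft link
                if (!Files.isDirectory(parent)) {
                    throw e;
                }
            }
        }
        // create the PID file if it doesn't exist, truncate it if it does
        try (OutputStream out = Files.newOutputStream(pidFile,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            out.write(Long.toString(pid).getBytes(StandardCharsets.US_ASCII));
        }
    }

    public static void main(String[] args) throws IOException {
        // 12345 is a placeholder value, not a real process id
        Path pidFile = Paths.get(args.length > 0 ? args[0] : "es-demo.pid");
        writePid(pidFile, 12345L);
        System.out.println(new String(Files.readAllBytes(pidFile), StandardCharsets.US_ASCII));
    }
}
```

Passing the options explicitly restores the "create or truncate" behaviour that newOutputStream only applies by default when it is called without any options.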
@t-lo
Author

t-lo commented Dec 4, 2014

@s1monw wrote

is it possible that /var/run already exists and it's not a directory?

Looking at this again, actually, yes, it exists and is a soft link to a directory. D'uh.
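Why a soft link trips createDirectories: assuming the JDK 7/8 implementation, after createDirectory throws FileAlreadyExistsException, Files.createDirectories re-checks the path with isDirectory(dir, LinkOption.NOFOLLOW_LINKS). For a symlink that is false even when the target is a directory, so the exception is rethrown, which matches the trace in the first comment. Treat those JDK internals as an assumption; newer JDKs may behave differently. The sketch below (hypothetical class name, requires a file system with symlink support such as Linux) only demonstrates the difference between the two isDirectory checks.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

public class SymlinkDirCheck {

    // Creates base/real-run and a symlink base/run -> base/real-run, then returns
    // {isDirectory following links, isDirectory with NOFOLLOW_LINKS} for the link.
    static boolean[] checkLinkToDirectory(Path base) throws IOException {
        Path realDir = Files.createDirectory(base.resolve("real-run"));
        Path link = Files.createSymbolicLink(base.resolve("run"), realDir);
        boolean following = Files.isDirectory(link);                           // resolves the link
        boolean noFollow = Files.isDirectory(link, LinkOption.NOFOLLOW_LINKS); // inspects the link itself
        return new boolean[] {following, noFollow};
    }

    public static void main(String[] args) throws IOException {
        boolean[] r = checkLinkToDirectory(Files.createTempDirectory("symlink-demo"));
        System.out.println("isDirectory(link)                 = " + r[0]);
        System.out.println("isDirectory(link, NOFOLLOW_LINKS) = " + r[1]);
    }
}
```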

The other failure modes' causes were

  • a PID file with no directory part (getParent() returns null, so creating the directory fails with a NullPointerException)
  • the file did not exist (opening it fails because the CREATE flag was not used)

All gone if you accept my pull request.

@s1monw
Contributor

s1monw commented Dec 4, 2014

Looking at this again, actually, yes, it exists and is a soft link to a directory. D'uh.

Yes, that is what I figured... yet there are more problems with this, as you also found out. The real problem is that there are no tests for this at all, which is a pain and needs to be fixed. I didn't see your PR until now, and I fixed it this morning as well: I factored the PID file handling out and added unit tests so this doesn't happen again. I also found out that DELETE_ON_CLOSE doesn't work on all OSes, so I went the shutdown hook route. Can you maybe try this PR too? #8775
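A minimal sketch of the shutdown-hook approach (class and method names are hypothetical, and the demo main uses ProcessHandle, so it needs Java 9+). It writes the PID with explicit CREATE, TRUNCATE_EXISTING and WRITE, then registers a hook that deletes the file when the JVM exits normally. This is best effort only: shutdown hooks do not run on SIGKILL or a JVM crash.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PidFileShutdownHookSketch {

    // Writes the PID and, instead of relying on DELETE_ON_CLOSE, registers a
    // shutdown hook that removes the file. Returns the hook so callers can unregister it.
    static Thread writeAndScheduleDelete(Path pidFile, long pid) throws IOException {
        Files.write(pidFile, Long.toString(pid).getBytes(StandardCharsets.US_ASCII),
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
        Thread hook = new Thread(() -> {
            try {
                Files.deleteIfExists(pidFile);
            } catch (IOException ignored) {
                // nothing sensible to do this late during shutdown
            }
        }, "pid-file-cleanup");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) throws IOException {
        Path pidFile = Files.createTempDirectory("pid-hook").resolve("es.pid");
        writeAndScheduleDelete(pidFile, ProcessHandle.current().pid());
        System.out.println("PID file exists while running: " + Files.exists(pidFile));
        // once main returns and the JVM exits normally, the hook deletes the file
    }
}
```

Unlike DELETE_ON_CLOSE, this keeps the file on disk for the whole lifetime of the process, which is what external tools reading the PID file expect.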

@t-lo
Copy link
Author

t-lo commented Dec 4, 2014

@s1monw I just verified #8775: Your pull request fixes all three failure modes.

Concerning testing, I think this class of problems is best addressed (in the long term) by automated system/deployment tests; you can't catch everything with unit testing.

@s1monw
Contributor

s1monw commented Dec 4, 2014

Concerning testing, I think this class of problems is best addressed (in the long term) by automated system/deployment tests; you can't catch everything with unit testing.

I disagree: integration tests for this exist, but they run too late and don't have enough variation. We need tests that fail while you develop; everything else is error-prone and, IMO, from the last decade. A problem can happen once, but never more than once, because we need to add tests for it.

s1monw added a commit to s1monw/elasticsearch that referenced this issue Dec 4, 2014
This commit factors out the PID file creation from bootstrap and adds
tests for error conditions etc. We also can't rely on DELETE_ON_CLOSE
since it might not even write the file depending on the OS and JVM implementation.
This impl uses a shutdown hook to best-effort remove the pid file if it was written.

Closes elastic#8771