New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resiliency: Throw exception if the JVM will corrupt data. #7580
Conversation
Detect the worst-offender, known hotspot versions that can cause index corruption, and fail on startup.
@@ -195,11 +198,19 @@ public static void main(String[] args) { | |||
} | |||
|
|||
String stage = "Initialization"; | |||
try { | |||
try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed change :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these changes are unavoidable and auto-created by the eclipse configuration in elasticsearch :(
I left two comments - looks good in general |
"Please upgrade" is the last message the user gets in the exception, without having any clue where to upgrade to and which other versions are affected (maybe even the latest one is). Can we point the user somewhere, where he gets help instead of being unspecific here? A documentation page which contains unsafe versions we could link to or something? |
+1 for @spinscale comment. I really like when projects log errors with links to documentation. |
Who will make and maintain such a page and ensure it doesnt go 404? |
@rmuir the docs build process checks all links within the docs before pushing any changes live. it also has hooks for custom checks, eg we also check the links in the config.yml file to ensure that they are still correct. We could add a custom hook for this page too. |
I'm not really a webpage designer. I think this is blowing up the scope of the issue with unnecessary requirements. |
I think we should think of something somehow generic. Such as what Twitter4J does. The log says:
I think we should think about something similar but in another issue as we somehow hijacked this one :) |
Well, we need to document good versions anyway. It doesn't require any design, just a page of docs. This is versioned along with the code. You tell me what info needs to be on the page and I'll write it. |
Crashing with an empty SIGSEGV is better than today. It prevents data corruption. We don't even need an error message at all, a webpage is over the top. Its stupid to block this issue on such things. |
Rather than adding a link to the error message, which may well change in the future, let's just add "Please see the documentation" to the error message. Searching for "java version" or "jvm version" will bring up the docs section added in #7591 |
@clintongormley watch out for how personalized google can be - that probably would work for you or me but it might not work for someone who's searched for elasticsearch fewer times. Besides, if you move the page it should get a redirect. Or you could go all out and point to a version on the internet archive or something.... |
Ah! Cool. Can you link to a page that searches the Elasticsearch docs? Something like [this](http://www.elasticsearch.org/guide/?s=java version)? |
@rmuir I left some comments - I think it's close though! |
@rmuir can we move forward here it is marked as a blocker an I'd really like to move forward here |
I don't think the change will make it in. Please understand my point of view: I didn't just take a few minutes and whip up a quick patch here, i tested extensively with all impacted hotspot versions (both Oracle JDK and openjdk/IcedTea), including testing the workaround bypass, and testing that "good versions" are ok, etc. This took me a good bit of time. This is an exceptionally dangerous piece of functionality. With all the scope creep of this issue ("extra features" and "nice to haves" etc), comes the need to retest. There is no way I would touch this logic in a functional way, such as renaming parameters, without retesting. So we should decide what is really a priority, and what can be a followup issue. Currently I am fighting other fires, and I just dont have enough time to do it right, sorry. |
moved out to 1.5 |
I really think we should use what we have and expand in other issues. This PR is a good improvement and it's been sitting here for way too long. @rmuir can you rebase this branch and add maybe missing JVMs? all the other requests can be added later. |
+1 to get this in as is. |
Conflicts: src/main/java/org/elasticsearch/bootstrap/Bootstrap.java
I updated the check since i had to re-test all JVMs anyway. I refactored this out of bootstrap, made messaging more verbose, added support for compiler workaround flags, and added logic for IBM (because users have hit this on the mailing list). Example output with different versions: 1.7.0: $ java -cp elasticsearch-2.0.0-SNAPSHOT.jar:lib/lucene-core-5.1.0-snapshot-1662607.jar org.elasticsearch.bootstrap.JVMCheck Exception in thread "main" java.lang.RuntimeException: Java version: 1.7.0 suffers from critical bug https://bugs.openjdk.java.net/browse/JDK-7070134 which can cause data corruption. $ java -cp elasticsearch-2.0.0-SNAPSHOT.jar:lib/lucene-core-5.1.0-snapshot-1662607.jar -XX:-UseLoopPredicate org.elasticsearch.bootstrap.JVMCheck Mar 21, 2015 1:48:58 AM org.elasticsearch.bootstrap.JVMCheck check $ java -cp elasticsearch-2.0.0-SNAPSHOT.jar:lib/lucene-core-5.1.0-snapshot-1662607.jar -Des.bypass.vm.check=true org.elasticsearch.bootstrap.JVMCheck Mar 21, 2015 1:59:57 AM org.elasticsearch.bootstrap.JVMCheck check 1.7.0_40: $ java -cp elasticsearch-2.0.0-SNAPSHOT.jar:lib/lucene-core-5.1.0-snapshot-1662607.jar org.elasticsearch.bootstrap.JVMCheck Exception in thread "main" java.lang.RuntimeException: Java version: 1.7.0_40 suffers from critical bug https://bugs.openjdk.java.net/browse/JDK-8024830 which can cause data corruption. $ java -cp elasticsearch-2.0.0-SNAPSHOT.jar:lib/lucene-core-5.1.0-snapshot-1662607.jar -XX:-UseSuperWord org.elasticsearch.bootstrap.JVMCheck Mar 21, 2015 1:51:03 AM org.elasticsearch.bootstrap.JVMCheck check $ java -cp elasticsearch-2.0.0-SNAPSHOT.jar:lib/lucene-core-5.1.0-snapshot-1662607.jar -Des.bypass.vm.check=true org.elasticsearch.bootstrap.JVMCheck Mar 21, 2015 1:58:52 AM org.elasticsearch.bootstrap.JVMCheck check IBM: $ java -cp elasticsearch-2.0.0-SNAPSHOT.jar:lib/lucene-core-5.1.0-snapshot-1662607.jar org.elasticsearch.bootstrap.JVMCheck Exception in thread "main" java.lang.RuntimeException: IBM runtimes suffer from several bugs which can cause data corruption. $ java -cp elasticsearch-2.0.0-SNAPSHOT.jar:lib/lucene-core-5.1.0-snapshot-1662607.jar -Des.bypass.vm.check=true org.elasticsearch.bootstrap.JVMCheck Mar 21, 2015 1:57:16 AM org.elasticsearch.bootstrap.JVMCheck check |
LGTM |
Detect the worst-offenders, all IBM versions and several known hotspot versions that can cause index corruption, and fail on startup. Provide/detect compiler workarounds when they exist, but warn about performance degradation. In all cases the check can be bypassed completely with a safety switch via undocumented system property (es.bypass.vm.check=true) Closes #7580
Detect the worst-offenders, all IBM versions and several known hotspot versions that can cause index corruption, and fail on startup. Provide/detect compiler workarounds when they exist, but warn about performance degradation. In all cases the check can be bypassed completely with a safety switch via undocumented system property (es.bypass.vm.check=true) Closes #7580
Detect the worst-offenders, all IBM versions and several known hotspot versions that can cause index corruption, and fail on startup. Provide/detect compiler workarounds when they exist, but warn about performance degradation. In all cases the check can be bypassed completely with a safety switch via undocumented system property (es.bypass.vm.check=true) Closes #7580
w00t! |
Detect the worst-offenders, all IBM versions and several known hotspot versions that can cause index corruption, and fail on startup. Provide/detect compiler workarounds when they exist, but warn about performance degradation. In all cases the check can be bypassed completely with a safety switch via undocumented system property (es.bypass.vm.check=true) Closes elastic#7580
Detect the worst-offenders, all IBM versions and several known hotspot versions that can cause index corruption, and fail on startup. Provide/detect compiler workarounds when they exist, but warn about performance degradation. In all cases the check can be bypassed completely with a safety switch via undocumented system property (es.bypass.vm.check=true) Closes elastic#7580
Detect the worst-offender, known hotspot versions that can
cause index corruption, and fail on startup.