Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resiliency: Throw exception if the JVM will corrupt data. #7580

Closed
wants to merge 3 commits into from

Conversation

rmuir
Copy link
Contributor

@rmuir rmuir commented Sep 4, 2014

Detect the worst-offender, known hotspot versions that can
cause index corruption, and fail on startup.

Detect the worst-offender, known hotspot versions that can
cause index corruption, and fail on startup.
@@ -195,11 +198,19 @@ public static void main(String[] args) {
}

String stage = "Initialization";
try {
try {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed change :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these changes are unavoidable and auto-created by the eclipse configuration in elasticsearch :(

@s1monw
Copy link
Contributor

s1monw commented Sep 4, 2014

I left two comments - looks good in general

@spinscale
Copy link
Contributor

"Please upgrade" is the last message the user gets in the exception, without having any clue where to upgrade to and which other versions are affected (maybe even the latest one is).

Can we point the user somewhere, where he gets help instead of being unspecific here? A documentation page which contains unsafe versions we could link to or something?

@dadoonet
Copy link
Member

dadoonet commented Sep 4, 2014

+1 for @spinscale comment. I really like when projects log errors with links to documentation.

@rmuir
Copy link
Contributor Author

rmuir commented Sep 4, 2014

Who will make and maintain such a page and ensure it doesnt go 404?

@clintongormley
Copy link

@rmuir the docs build process checks all links within the docs before pushing any changes live. it also has hooks for custom checks, eg we also check the links in the config.yml file to ensure that they are still correct. We could add a custom hook for this page too.

@rmuir
Copy link
Contributor Author

rmuir commented Sep 4, 2014

I'm not really a webpage designer. I think this is blowing up the scope of the issue with unnecessary requirements.

@rmuir rmuir closed this Sep 4, 2014
@dadoonet
Copy link
Member

dadoonet commented Sep 4, 2014

I think we should think of something somehow generic. Such as what Twitter4J does.
For each error you get, you have an associated Error code which is unique.

The log says:

Relevant discussions can be found on the Internet at:
    http://www.google.co.jp/search?q=93f2523c or
    http://www.google.co.jp/search?q=8b74a4e3
TwitterException{exceptionCode=[93f2523c-8b74a4e3], statusCode=401, message=Invalid or expired token, code=89, retryAfter=-1, rateLimitStatus=null, version=3.0.3}

I think we should think about something similar but in another issue as we somehow hijacked this one :)

@clintongormley
Copy link

Well, we need to document good versions anyway. It doesn't require any design, just a page of docs. This is versioned along with the code. You tell me what info needs to be on the page and I'll write it.

@rmuir
Copy link
Contributor Author

rmuir commented Sep 4, 2014

Crashing with an empty SIGSEGV is better than today. It prevents data corruption. We don't even need an error message at all, a webpage is over the top. Its stupid to block this issue on such things.

@clintongormley
Copy link

Rather than adding a link to the error message, which may well change in the future, let's just add "Please see the documentation" to the error message. Searching for "java version" or "jvm version" will bring up the docs section added in #7591

@nik9000
Copy link
Member

nik9000 commented Sep 4, 2014

@clintongormley watch out for how personalized google can be - that probably would work for you or me but it might not work for someone who's searched for elasticsearch fewer times. Besides, if you move the page it should get a redirect. Or you could go all out and point to a version on the internet archive or something....

@clintongormley
Copy link

@nik9000 i meant the search in the ES docs :)

won't work now, but will when #7591 is merged

@nik9000
Copy link
Member

nik9000 commented Sep 4, 2014

@nik9000 i meant the search in the ES docs :)

Ah! Cool. Can you link to a page that searches the Elasticsearch docs? Something like [this](http://www.elasticsearch.org/guide/?s=java version)?

@s1monw s1monw removed the review label Sep 5, 2014
@s1monw
Copy link
Contributor

s1monw commented Sep 5, 2014

@rmuir I left some comments - I think it's close though!

@clintongormley clintongormley changed the title [Bootstrap] Throw exception if the JVM will corrupt data. Resiliency: Throw exception if the JVM will corrupt data. Sep 8, 2014
@s1monw
Copy link
Contributor

s1monw commented Sep 10, 2014

@rmuir can we move forward here it is marked as a blocker an I'd really like to move forward here

@rmuir
Copy link
Contributor Author

rmuir commented Sep 10, 2014

I don't think the change will make it in. Please understand my point of view:

I didn't just take a few minutes and whip up a quick patch here, i tested extensively with all impacted hotspot versions (both Oracle JDK and openjdk/IcedTea), including testing the workaround bypass, and testing that "good versions" are ok, etc. This took me a good bit of time.

This is an exceptionally dangerous piece of functionality. With all the scope creep of this issue ("extra features" and "nice to haves" etc), comes the need to retest. There is no way I would touch this logic in a functional way, such as renaming parameters, without retesting.

So we should decide what is really a priority, and what can be a followup issue.

Currently I am fighting other fires, and I just dont have enough time to do it right, sorry.

@s1monw
Copy link
Contributor

s1monw commented Sep 11, 2014

moved out to 1.5

@s1monw
Copy link
Contributor

s1monw commented Mar 20, 2015

@rmuir @rjernst should we revisit this or should I just close it for now?

@s1monw
Copy link
Contributor

s1monw commented Mar 20, 2015

I really think we should use what we have and expand in other issues. This PR is a good improvement and it's been sitting here for way too long. @rmuir can you rebase this branch and add maybe missing JVMs? all the other requests can be added later.

@rjernst
Copy link
Member

rjernst commented Mar 20, 2015

+1 to get this in as is.

@rmuir
Copy link
Contributor Author

rmuir commented Mar 21, 2015

I updated the check since i had to re-test all JVMs anyway.

I refactored this out of bootstrap, made messaging more verbose, added support for compiler workaround flags, and added logic for IBM (because users have hit this on the mailing list).

Example output with different versions:

1.7.0:

$ java -cp elasticsearch-2.0.0-SNAPSHOT.jar:lib/lucene-core-5.1.0-snapshot-1662607.jar org.elasticsearch.bootstrap.JVMCheck

Exception in thread "main" java.lang.RuntimeException: Java version: 1.7.0 suffers from critical bug https://bugs.openjdk.java.net/browse/JDK-7070134 which can cause data corruption.
Please upgrade the JVM, see http://www.elastic.co/guide/en/elasticsearch/reference/current/_installation.html for current recommendations.
If you absolutely cannot upgrade, please add -XX:-UseLoopPredicate to the JVM_OPTS environment variable.

$ java -cp elasticsearch-2.0.0-SNAPSHOT.jar:lib/lucene-core-5.1.0-snapshot-1662607.jar -XX:-UseLoopPredicate org.elasticsearch.bootstrap.JVMCheck

Mar 21, 2015 1:48:58 AM org.elasticsearch.bootstrap.JVMCheck check
WARNING: Workaround flag -XX:-UseLoopPredicate for bug https://bugs.openjdk.java.net/browse/JDK-7070134 found.
This will result in degraded performance!
Upgrading is preferred, see http://www.elastic.co/guide/en/elasticsearch/reference/current/_installation.html for current recommendations.

$ java -cp elasticsearch-2.0.0-SNAPSHOT.jar:lib/lucene-core-5.1.0-snapshot-1662607.jar -Des.bypass.vm.check=true org.elasticsearch.bootstrap.JVMCheck

Mar 21, 2015 1:59:57 AM org.elasticsearch.bootstrap.JVMCheck check
WARNING: bypassing jvm version check for version [1.7.0], this can result in data corruption!

1.7.0_40:

$ java -cp elasticsearch-2.0.0-SNAPSHOT.jar:lib/lucene-core-5.1.0-snapshot-1662607.jar org.elasticsearch.bootstrap.JVMCheck

Exception in thread "main" java.lang.RuntimeException: Java version: 1.7.0_40 suffers from critical bug https://bugs.openjdk.java.net/browse/JDK-8024830 which can cause data corruption.
Please upgrade the JVM, see http://www.elastic.co/guide/en/elasticsearch/reference/current/_installation.html for current recommendations.
If you absolutely cannot upgrade, please add -XX:-UseSuperWord to the JVM_OPTS environment variable.
Upgrading is preferred, this workaround will result in degraded performance.

$ java -cp elasticsearch-2.0.0-SNAPSHOT.jar:lib/lucene-core-5.1.0-snapshot-1662607.jar -XX:-UseSuperWord org.elasticsearch.bootstrap.JVMCheck

Mar 21, 2015 1:51:03 AM org.elasticsearch.bootstrap.JVMCheck check
WARNING: Workaround flag -XX:-UseSuperWord for bug https://bugs.openjdk.java.net/browse/JDK-8024830 found.
This will result in degraded performance!
Upgrading is preferred, see http://www.elastic.co/guide/en/elasticsearch/reference/current/_installation.html for current recommendations.

$ java -cp elasticsearch-2.0.0-SNAPSHOT.jar:lib/lucene-core-5.1.0-snapshot-1662607.jar -Des.bypass.vm.check=true org.elasticsearch.bootstrap.JVMCheck

Mar 21, 2015 1:58:52 AM org.elasticsearch.bootstrap.JVMCheck check
WARNING: bypassing jvm version check for version [1.7.0_40], this can result in data corruption!

IBM:

$ java -cp elasticsearch-2.0.0-SNAPSHOT.jar:lib/lucene-core-5.1.0-snapshot-1662607.jar org.elasticsearch.bootstrap.JVMCheck

Exception in thread "main" java.lang.RuntimeException: IBM runtimes suffer from several bugs which can cause data corruption.
Please upgrade the JVM, see http://www.elastic.co/guide/en/elasticsearch/reference/current/_installation.html for current recommendations.

$ java -cp elasticsearch-2.0.0-SNAPSHOT.jar:lib/lucene-core-5.1.0-snapshot-1662607.jar -Des.bypass.vm.check=true org.elasticsearch.bootstrap.JVMCheck

Mar 21, 2015 1:57:16 AM org.elasticsearch.bootstrap.JVMCheck check
WARNING: bypassing jvm version check for version [1.7.0], this can result in data corruption!

@kimchy
Copy link
Member

kimchy commented Mar 21, 2015

LGTM

@rmuir rmuir closed this in 6da99b3 Mar 21, 2015
rmuir added a commit that referenced this pull request Mar 21, 2015
Detect the worst-offenders, all IBM versions and several known hotspot
versions that can cause index corruption, and fail on startup.

Provide/detect compiler workarounds when they exist, but warn about
performance degradation.

In all cases the check can be bypassed completely with a safety
switch via undocumented system property (es.bypass.vm.check=true)

Closes #7580
rmuir added a commit that referenced this pull request Mar 21, 2015
Detect the worst-offenders, all IBM versions and several known hotspot
versions that can cause index corruption, and fail on startup.

Provide/detect compiler workarounds when they exist, but warn about
performance degradation.

In all cases the check can be bypassed completely with a safety
switch via undocumented system property (es.bypass.vm.check=true)

Closes #7580
rmuir added a commit that referenced this pull request Mar 21, 2015
Detect the worst-offenders, all IBM versions and several known hotspot
versions that can cause index corruption, and fail on startup.

Provide/detect compiler workarounds when they exist, but warn about
performance degradation.

In all cases the check can be bypassed completely with a safety
switch via undocumented system property (es.bypass.vm.check=true)

Closes #7580
@clintongormley
Copy link

w00t!

@clintongormley clintongormley added the :Core/Infra/Core Core issues without another label label Apr 11, 2015
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
Detect the worst-offenders, all IBM versions and several known hotspot
versions that can cause index corruption, and fail on startup.

Provide/detect compiler workarounds when they exist, but warn about
performance degradation.

In all cases the check can be bypassed completely with a safety
switch via undocumented system property (es.bypass.vm.check=true)

Closes elastic#7580
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
Detect the worst-offenders, all IBM versions and several known hotspot
versions that can cause index corruption, and fail on startup.

Provide/detect compiler workarounds when they exist, but warn about
performance degradation.

In all cases the check can be bypassed completely with a safety
switch via undocumented system property (es.bypass.vm.check=true)

Closes elastic#7580
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants