Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull failure #11755

Closed
rmuir opened this issue Sep 6, 2022 · 4 comments · Fixed by #11757
Closed

TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull failure #11755

rmuir opened this issue Sep 6, 2022 · 4 comments · Fixed by #11757
Milestone

Comments

@rmuir
Copy link
Member

rmuir commented Sep 6, 2022

Description

From jenkins:

> Task :lucene:core:test

org.apache.lucene.index.TestIndexWriterOnDiskFull > testAddDocumentOnDiskFull FAILED
    java.lang.IllegalStateException: this writer hit an unrecoverable error; cannot commit
        at __randomizedtesting.SeedInfo.seed([8E19B38FADF450D3:2018BF050618617]:0)
        at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:5441)
        at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3716)
        at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:4044)
        at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:4006)
        at org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull(TestIndexWriterOnDiskFull.java:80)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
        at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
        at java.base/java.lang.Thread.run(Thread.java:833)

        Caused by:
        java.io.IOException: fake disk full at 7762 bytes when writing _2.fnm (file length=0)
            at org.apache.lucene.tests.store.MockIndexOutputWrapper.checkDiskFull(MockIndexOutputWrapper.java:93)
            at org.apache.lucene.tests.store.MockIndexOutputWrapper.writeBytes(MockIndexOutputWrapper.java:139)
            at org.apache.lucene.tests.store.MockIndexOutputWrapper.writeByte(MockIndexOutputWrapper.java:132)
            at org.apache.lucene.store.RateLimitedIndexOutput.writeByte(RateLimitedIndexOutput.java:66)
            at org.apache.lucene.codecs.CodecUtil.writeBEInt(CodecUtil.java:653)
            at org.apache.lucene.codecs.CodecUtil.writeHeader(CodecUtil.java:82)
            at org.apache.lucene.codecs.CodecUtil.writeIndexHeader(CodecUtil.java:125)
            at org.apache.lucene.codecs.lucene94.Lucene94FieldInfosFormat.write(Lucene94FieldInfosFormat.java:333)
            at org.apache.lucene.index.SegmentMerger.mergeFieldInfos(SegmentMerger.java:171)
            at org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:293)
            at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:161)
            at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5141)
            at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4681)
            at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6430)
            at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:638)
            at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:699)

Gradle command to reproduce

NOTE: this seed reproduces on branch_9x (commit 2895887) only about half of the time.. there may be a race involved. if it does not fail, try it again.

/gradlew :lucene:core:test --tests "org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull" -Ptests.jvms=5 -Ptests.haltonfailure=false "-Ptests.jvmargs=-XX:-UseCompressedOops -XX:+UseSerialGC" -Ptests.seed=8E19B38FADF450D3 -Ptests.badapples=false -Ptests.gui=true -Ptests.file.encoding=UTF-8

@rmuir rmuir added the type:test label Sep 6, 2022
@rmuir
Copy link
Member Author

rmuir commented Sep 8, 2022

The issue is caused by an exception-handling inconsistency in IndexWriter: if a fatal error (tragedy) happens to IndexWriter, you may receive exception in one of two ways:

  • IOException from ensureOpen(), this is the most common case for calls to indexwriter such as addDocument()
  • IllegalStateException from methods such as commit(), this is the uncommon, inconsistent case that happens here.

Test has code like this (summarizing):

try {
  for (int i = 0; i < 200; i++) {
    addDoc(writer);
  }
  // CMS hits disk full below, while we are in `startCommit()`. It is a bit racy and doesn't always reproduce due to threads.
  // IllegalStateException, not IOException is thrown: "this writer hit an unrecoverable error; cannot commit"
  // Test logic does not expect this exc class, so the test fails.
  writer.commit();
} catch (IOException e) {
  // test logic
}

I'm not sure what to do yet. Obviously if we also catch IllegalStateException, we can fix the test. But I don't like that we deliver the exception two different ways, sometimes as IOException, other times as IllegalStateException, so I don't want to shove it under the rug quite yet.

@rmuir
Copy link
Member Author

rmuir commented Sep 8, 2022

The failure doesn't reproduce/debug very easily because CMS needs to hit disk full after prepareCommit() (which calls ensureOpen and delivers IOException for a tragedy), and instead trip the check in startCommit() that throws IllegalStateException for the tragedy. So if you add verbose or even try to manually just enable too many prints, reproducing it is impossible :)

@rmuir
Copy link
Member Author

rmuir commented Sep 8, 2022

Maybe i'm being overly picky, if you look at the rest of the tests.

This test is named testAddDocumentOnDiskFull and looks for an ACE (IOException) on addDocument(). But rarely here, you can get IllegalStateException when it calls commit().

testAddIndexOnDiskFull() has catch (IllegalStateException | IOException | MergePolicy.MergeException e) {, so its ok with any of these varieties.

We can fix the test to stop failing to reflect reality at least as a small step.
Happy to open a followup-issue to see if we can improve the exception-handling to be more consistent.

rmuir added a commit to rmuir/lucene that referenced this issue Sep 8, 2022
…egalStateException from startCommit()

If ConcurrentMergeScheduler is used, and the merge hits fatal exception (such as disk full) after prepareCommit()'s ensureOpen() check, then startCommit() will throw IllegalStateException instead of AlreadyClosedException.

The test is currently not prepared to handle this: the logic is only geared around exceptions coming from addDocument()

Closes apache#11755
@rmuir
Copy link
Member Author

rmuir commented Sep 8, 2022

I made a PR just fixing this test to handle this around the commit() call so it won't randomly fail in CI again.

rmuir added a commit that referenced this issue Sep 8, 2022
…egalStateException from startCommit() (#11757)

If ConcurrentMergeScheduler is used, and the merge hits fatal exception (such as disk full) after prepareCommit()'s ensureOpen() check, then startCommit() will throw IllegalStateException instead of AlreadyClosedException.

The test is currently not prepared to handle this: the logic is only geared around exceptions coming from addDocument()

Closes #11755
@rmuir rmuir added this to the 9.5.0 milestone Sep 8, 2022
asfgit pushed a commit that referenced this issue Sep 8, 2022
…egalStateException from startCommit() (#11757)

If ConcurrentMergeScheduler is used, and the merge hits fatal exception (such as disk full) after prepareCommit()'s ensureOpen() check, then startCommit() will throw IllegalStateException instead of AlreadyClosedException.

The test is currently not prepared to handle this: the logic is only geared around exceptions coming from addDocument()

Closes #11755
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
1 participant