Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TestIndexWriterOnDiskFull.testAddIndexOnDiskFull reproducible test failure #13116

Closed
mikemccand opened this issue Feb 19, 2024 · 11 comments · Fixed by #13172
Closed

TestIndexWriterOnDiskFull.testAddIndexOnDiskFull reproducible test failure #13116

mikemccand opened this issue Feb 19, 2024 · 11 comments · Fixed by #13172
Labels

Comments

@mikemccand
Copy link
Member

Description

I was smoke testing 9.10.0 and my first attempt hit this test failure, which repros on 695c0ac:

org.apache.lucene.index.TestIndexWriterOnDiskFull > test suite's output saved to /s1/l/99x/lucene/core/build/test-results/test/outputs/OUTPUT-org.apache.lucene.index.TestIndexWriterOnDiskFull.txt, copied below:
   >     java.io.UncheckedIOException: java.io.IOException: fake disk full at 47681 bytes when writing _1.kdi (file length=78; wrote 27 of 155 bytes)
   >         at __randomizedtesting.SeedInfo.seed([D7C2553CB8A4ADB5:AB1DE403F5150502]:0)
   >         at org.apache.lucene.util.bkd.BKDWriter.lambda$makeWriter$0(BKDWriter.java:1013)
   >         at org.apache.lucene.tests.index.RandomCodec$1$1.writeField(RandomCodec.java:155)
   >         at org.apache.lucene.codecs.PointsWriter.mergeOneField(PointsWriter.java:57)
   >         at org.apache.lucene.codecs.PointsWriter.merge(PointsWriter.java:231)
   >         at org.apache.lucene.codecs.lucene90.Lucene90PointsWriter.merge(Lucene90PointsWriter.java:187)
   >         at org.apache.lucene.tests.codecs.asserting.AssertingPointsFormat$AssertingPointsWriter.merge(AssertingPointsFormat.java:133)
   >         at org.apache.lucene.index.SegmentMerger.mergePoints(SegmentMerger.java:185)
   >         at org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:298)
   >         at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:145)
   >         at org.apache.lucene.index.IndexWriter.addIndexesReaderMerge(IndexWriter.java:3454)
   >         at org.apache.lucene.index.IndexWriter$AddIndexesMergeSource.merge(IndexWriter.java:3350)
   >         at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:639)
   >         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:700)
   >
   >         Caused by:
   >         java.io.IOException: fake disk full at 47681 bytes when writing _1.kdi (file length=78; wrote 27 of 155 bytes)
   >             at org.apache.lucene.tests.store.MockIndexOutputWrapper.checkDiskFull(MockIndexOutputWrapper.java:92)
   >             at org.apache.lucene.tests.store.MockIndexOutputWrapper.writeBytes(MockIndexOutputWrapper.java:138)
   >             at org.apache.lucene.store.RateLimitedIndexOutput.writeBytes(RateLimitedIndexOutput.java:60)
   >             at org.apache.lucene.util.bkd.BKDWriter.writeIndex(BKDWriter.java:1266)
   >             at org.apache.lucene.util.bkd.BKDWriter.writeIndex(BKDWriter.java:1236)
   >             at org.apache.lucene.util.bkd.BKDWriter.lambda$makeWriter$0(BKDWriter.java:1011)
   >             ... 12 more
  2> NOTE: reproduce with: gradlew test --tests TestIndexWriterOnDiskFull.testAddIndexOnDiskFull -Dtests.seed=D7C2553CB8A4ADB5 -Dtests.nightly=true -Dtests.locale=fr-MQ -Dtests.timezone=Indian/Chagos -Dtests.asser\
ts=true -Dtests.file.encoding=UTF-8
  2> NOTE: leaving temporary files on disk at: /s1/l/99x/lucene/core/build/tmp/tests-tmp/lucene.index.TestIndexWriterOnDiskFull_D7C2553CB8A4ADB5-001
  2> NOTE: test params are: codec=Asserting(Lucene99): {id=FST50, content=BlockTreeOrds(blocksize=128)}, docValues:{numericdv=DocValuesFormat(name=Asserting)}, maxPointsInLeafNode=21, maxMBSortInHeap=7.62463929475\
6359, sim=Asserting(RandomSimilarity(queryNorm=true): {id=IB LL-D2, content=IB SPL-L3(800.0)}), locale=fr-MQ, timezone=Indian/Chagos
  2> NOTE: Linux 6.6.9-arch1-1 amd64/Oracle Corporation 11.0.21 (64-bit)/cpus=1,threads=1,free=177536496,total=303562752
  2> NOTE: All tests run in this JVM: [TestIndexWriterOnDiskFull]

:lucene:core:test (FAILURE): 1 test(s), 1 failure(s)

Version and environment details

No response

@mikemccand
Copy link
Member Author

This was on Java 11:

raptorlake:99x[branch_9_10]$ java -fullversion
openjdk full version "11.0.21+9"

@easyice
Copy link
Contributor

easyice commented Feb 19, 2024

bisect shows it related to : 9ab84f4

@mikemccand
Copy link
Member Author

Oooh thanks for digging @easyice!

Hmm, that may just be a seed-shifting change, and not e.g. bringing the root cause for this exception? I.e. 9ab84f4 just changed the interpretation of the random seed ("seed shifting").

The exception looks like a test bug to me: disk full is arriving to a merge thread (ConcurrentMergeScheduler), which is expected (this is a disk full test), so maybe the test is failing to properly recognize that this is "normal" for it?

@easyice
Copy link
Contributor

easyice commented Feb 19, 2024

Thanks for explaining @mikemccand , I haven't fully understand the related between this exception and 9ab84f4, I agree with you they may not have a direct related.

In general, the "fake disk full" exception may be due to the absence of some exception type in the catch, such as UncheckedIOException , But if revert 9ab84f4 it will doesn't have this exception.

@easyice
Copy link
Contributor

easyice commented Feb 23, 2024

This is indeed a test bug, 9ab84f4 just slightly changes the index size. There was a similar issue: #11755, the UncheckedIOException might be throw by BKDWriter but not catch in this test case. This should likely be fixed e.g.

@@ -364,7 +365,7 @@ public class TestIndexWriterOnDiskFull extends LuceneTestCase {
               done = true;
             }

-          } catch (IllegalStateException | IOException | MergePolicy.MergeException e) {
+          } catch (IllegalStateException | IOException | MergePolicy.MergeException | UncheckedIOException e) {
             success = false;
             err = e;
             if (VERBOSE) {

The testAddDocumentOnDiskFull should also have the same issue.

@mikemccand
Copy link
Member Author

I like that proposed solution @easyice! Exception handling is hard. Likely many Lucene tests are missing/failing to catch UncheckedIOException now...

@easyice
Copy link
Contributor

easyice commented Feb 29, 2024

Likely many Lucene tests are missing/failing to catch UncheckedIOException now...

emmm... yeah, should we open another issue about this?

@dweiss
Copy link
Contributor

dweiss commented Feb 29, 2024

A much better solution would be to try to track down why UncheckedIOException is thrown in the stack - something is likely missing a proper IOException signature (or something higher up the stack should catch uncheckedIOE and rethrow the original cause?).

Those unchecked exceptions are hell. If they can indeed happen under certain circumstances, it would be better to handle these cases as plan (predictable) IOE. At least that's my opinion - the discussion concerning checked/ unchecked exceptions is as old as Java (my opinion being it was a design mistake to introduce checked exceptions; it's particularly brutal when you're trying to use lambdas with I/O).

@easyice
Copy link
Contributor

easyice commented Mar 6, 2024

In BKDWriter, it seems we not pass a class that implements Runnable into the Thread framework, it just call run method directly, so maybe we can consider introducing CheckedRunnable to replace Runnable, like:

@FunctionalInterface
public interface CheckedRunnable<E extends Exception> {
  void run() throws E;
}

In this way, IOException can be thrown in CheckedRunnable.

@dweiss
Copy link
Contributor

dweiss commented Mar 6, 2024

I am not familiar with this code but there are many options to choose from. A Callable (this throws an Exception), Lucene's IOConsumer, adding an IORunnable similar to IOConsumer, IOFunction or IOSupplier we already have.. I would opt for an IORunnable or IOCallable, which could look like this:

interface IOCallable<T> extends Callable<T> {
      @Override
      public T call() throws IOException;
  }

narrowing the type of exceptions thrown to only IOException and keeping an option to return something from the inside, at the same time allowing submission of such blocks to plain Java concurrent infrastructure? Just a thought.

@easyice
Copy link
Contributor

easyice commented Mar 6, 2024

This is also a good idea! If we use CheckedRunnable, we also need to narrow the type of exceptions thrown to IOException only. like:

- Runnable finalizer = writer.writeField(...)
+ CheckedRunnable<IOException> finalizer = writer.writeField(...)

IORunnable will makes the code more concise, the use of Runnable outside of the thread framework seems not widespread.

@mikemccand , what do you think of this way of avoiding throwing an UncheckedIOException rather than catching it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants