@rahulgoswami
Contributor

@rahulgoswami rahulgoswami commented Nov 18, 2025

Description

Backport #14607 to branch_10x

@rahulgoswami
Contributor Author

rahulgoswami commented Nov 18, 2025

gradlew check passes fine, but the nightly tests in TestBinaryBackwardsCompatibility are failing. Specifically TestBinaryBackwardsCompatibility.testReadNMinusTwoCommit and TestBinaryBackwardsCompatibility.testReadNMinusTwoSegmentInfos.

These tests open indexes from the N-2 major version (8.x on this branch) and expect success. This used to work with VERSION_74 in checkHeaderNoMagic(), but now fails for indexes older than 8.6.0 since we moved to VERSION_86. Staying with VERSION_74 instead produces a complaint about a missing Lucene_70 codec (same as main). "main" doesn't have this problem because there N-2 = 9.x.
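
To make the failure mode concrete, here is a minimal, self-contained sketch of a min/max format-version check of the kind described above. It is not the real CodecUtil code: the class, the exception type, and the numeric values of VERSION_74 and VERSION_86 are assumptions for illustration, as is the premise that a segments file written before 8.6.0 carries format VERSION_74.

```java
/** Hypothetical model of a min/max format-version check; not Lucene's CodecUtil. */
final class HeaderVersionCheck {
  // Assumed numeric values for the SegmentInfos format constants.
  static final int VERSION_74 = 9;
  static final int VERSION_86 = 10;
  static final int VERSION_CURRENT = VERSION_86;

  /** Stand-in for IndexFormatTooOldException. */
  static final class TooOldException extends RuntimeException {
    TooOldException(String message) {
      super(message);
    }
  }

  /** Returns the version if it lies in [min, max]; otherwise throws. */
  static int checkVersion(int actual, int min, int max) {
    if (actual < min) {
      throw new TooOldException("format " + actual + " is below minimum " + min);
    }
    if (actual > max) {
      throw new IllegalStateException("format " + actual + " is above maximum " + max);
    }
    return actual;
  }

  /** True if a segments file with the given format passes the check for this minimum. */
  static boolean canRead(int actual, int min) {
    try {
      checkVersion(actual, min, VERSION_CURRENT);
      return true;
    } catch (TooOldException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // A pre-8.6.0 segments file (assumed format VERSION_74) under each minimum.
    System.out.println("min=VERSION_74: " + canRead(VERSION_74, VERSION_74));
    System.out.println("min=VERSION_86: " + canRead(VERSION_74, VERSION_86));
  }
}
```

Under these assumptions, raising the minimum from VERSION_74 to VERSION_86 is enough to reject every segments file written before 8.6.0, regardless of which codecs are on the classpath.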

I have not yet understood why we test N-2 when we don't support such indexes anyway, so I am still contemplating the right way forward on these failures.

To reproduce:
./gradlew test --tests TestBinaryBackwardsCompatibility.testReadNMinusTwoSegmentInfos -Dtests.seed=738894B1606DB252 -Dtests.nightly=true -Dtests.locale=en-IM -Dtests.timezone=Australia/Queensland -Dtests.asserts=true -Dtests.file.encoding=UTF-8

@msokolov
Contributor

I'm trying to understand what's going on here. One thing that confuses me is why we have:

 Version.MIN_SUPPORTED_MAJOR = Version.LATEST.major - 1;

and also

 TestBinaryBackwardsCompatibility.MIN_BINARY_SUPPORTED_MAJOR = Version.MIN_SUPPORTED_MAJOR - 1;

what is the difference between "supported" and "binary supported"?
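
For reference, the arithmetic behind the two constants can be sketched as follows. The class and method names are hypothetical, and the values of LATEST.major (10 on branch_10x, 11 on main) are an assumption inferred from the N-2 values mentioned in this thread (8.x and 9.x).

```java
/** Hypothetical stand-in for the two constants quoted above; not Lucene code. */
final class SupportedMajors {
  /** Mirrors Version.MIN_SUPPORTED_MAJOR = Version.LATEST.major - 1. */
  static int minSupportedMajor(int latestMajor) {
    return latestMajor - 1;
  }

  /** Mirrors MIN_BINARY_SUPPORTED_MAJOR = Version.MIN_SUPPORTED_MAJOR - 1. */
  static int minBinarySupportedMajor(int latestMajor) {
    return minSupportedMajor(latestMajor) - 1;
  }

  public static void main(String[] args) {
    // Assumption: branch_10x has LATEST.major == 10 and main has LATEST.major == 11.
    for (int latest : new int[] {10, 11}) {
      System.out.println(
          "latest=" + latest
              + " minSupported=" + minSupportedMajor(latest)
              + " minBinarySupported=" + minBinarySupportedMajor(latest));
    }
  }
}
```

So "supported" is N-1 and "binary supported" is N-2: on branch_10x the N-2 tests reach back to 8.x, while on main they stop at 9.x.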

@msokolov
Contributor

msokolov commented Nov 21, 2025

I think what happened is that CheckIndex is now able to read some more of the back-compat indexes that we previously said were incompatible. But this doesn't really make sense since the 10x branch does not include any additional backwards codecs that were removed from main. Some of these indexes cannot be opened, but CheckIndex is able to check them and reports they are clean.

I was able to get tests passing by relaxing a few version numbers and by changing the exception type thrown when we are unable to read the segments file from IllegalArgumentException to IndexFormatTooOldException, to match the expectations of the test. Maybe that was bad, but it seems pretty harmless to me? I'm not sure if this change is safe or not.

diff --git a/lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestAncientIndicesCompatibility.java b/lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestAncientIndicesCompatibility.java
index a06a96b2ed5..56608b0b506 100644
--- a/lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestAncientIndicesCompatibility.java
+++ b/lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestAncientIndicesCompatibility.java
@@ -199,7 +199,7 @@ public class TestAncientIndicesCompatibility extends LuceneTestCase {
       checker.setInfoStream(new PrintStream(bos, false, UTF_8));
       checker.setLevel(CheckIndex.Level.MIN_LEVEL_FOR_INTEGRITY_CHECKS);
       CheckIndex.Status indexStatus = checker.checkIndex();
-      if (getVersion(version).onOrAfter(Version.fromBits(8, 6, 0))) {
+      if (getVersion(version).onOrAfter(Version.fromBits(8, 0, 0))) {
         assertTrue(indexStatus.clean);
       } else {
         assertFalse(indexStatus.clean);
@@ -209,10 +209,9 @@ public class TestAncientIndicesCompatibility extends LuceneTestCase {
         boolean formatTooOld =
             bos.toString(UTF_8).contains(IndexFormatTooOldException.class.getName());
         boolean missingCodec = bos.toString(UTF_8).contains("Could not load codec");
-        assertTrue(formatTooOld || missingCodec);
+        assertTrue("version=" + version, formatTooOld || missingCodec);
       }
       checker.close();
-
       dir.close();
     }
   }
diff --git a/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java b/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java
index 131518983a8..621ecb0b529 100644
--- a/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java
+++ b/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java
@@ -328,7 +328,7 @@ public final class SegmentInfos implements Cloneable, Iterable<SegmentCommitInfo
         throw new IndexFormatTooOldException(
             input, magic, CodecUtil.CODEC_MAGIC, CodecUtil.CODEC_MAGIC);
       }
-      format = CodecUtil.checkHeaderNoMagic(input, "segments", VERSION_86, VERSION_CURRENT);
+      format = CodecUtil.checkHeaderNoMagic(input, "segments", VERSION_74, VERSION_CURRENT);
       byte[] id = new byte[StringHelper.ID_LENGTH];
       input.readBytes(id, 0, id.length);
       CodecUtil.checkIndexHeaderSuffix(input, Long.toString(generation, Character.MAX_RADIX));
@@ -529,11 +529,13 @@ public final class SegmentInfos implements Cloneable, Iterable<SegmentCommitInfo
     } catch (IllegalArgumentException e) {
       // maybe it's an old default codec that moved
       if (name.startsWith("Lucene")) {
-        throw new IllegalArgumentException(
+        throw new IndexFormatTooOldException(
+            input,
             "Could not load codec '"
                 + name
-                + "'. Did you forget to add lucene-backward-codecs.jar?",
-            e);
+                + "'. "
+                + e.getMessage()
+                + " Did you forget to add lucene-backward-codecs.jar?");
       }
       throw e;
     }

@rahulgoswami
Contributor Author

rahulgoswami commented Nov 25, 2025

Thanks for taking a look and for your thoughts on this, @msokolov.

> I think what happened is that CheckIndex is now able to read some more of the back-compat indexes that we previously said were incompatible. But this doesn't really make sense since the 10x branch does not include any additional backwards codecs that were removed from main.

Both main and branch_10x include backward codecs starting from 8.x. The tests in TestBinaryBackwardsCompatibility check binary compatibility of indexes for version N-2. The list of versions tested is in https://github.com/apache/lucene/blob/main/lucene/backward-codecs/src/test/org/apache/lucene/backward_index/versions.txt. It begins with 9.x for main and 8.x for branch_10x, which explains why we don't hit this issue on main but encounter it on 10x the moment we bump the min version of SegmentInfos up to VERSION_86 in checkHeaderNoMagic().

> Some of these indexes cannot be opened, but CheckIndex is able to check them and reports they are clean.

I think this is because CheckIndex seems to test whether it can actually read the individual segments while ignoring MIN_SUPPORTED_VERSION, and it is able to do so since the codecs are present (https://github.com/apache/lucene/blob/branch_10x/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java#L637)

…odec() throws IndexFormatTooOldException if a default Lucene codec is not found
@rahulgoswami
Contributor Author

rahulgoswami commented Nov 25, 2025

I think I agree with your solution here. Since certain tests check the N-2 version, it might be prudent not to break that compatibility in a minor version. So I reverted the header check on SegmentInfos to VERSION_74. It looks like a lot of these N-2 tests were introduced in 9.10 in #13046. I get that we probably want to see whether we can read such unsupported indexes with the supplied backward-codecs, but the motivation behind that is still not super clear to me.

Also, one could argue that if a default backward Lucene codec is not found on the classpath, the probability of the index being too old is much higher than that of a missing backward-codecs.jar. So throwing an IndexFormatTooOldException in readCodec() should be acceptable. I have re-worked this patch. I may or may not have spent an unhealthy amount of time reasoning about this problem 😁
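
A minimal sketch of the resulting behavior, seen from the caller's side, is below. Everything in it is a hypothetical stand-in rather than Lucene code: the registry contents, the lookup() helper, and FormatTooOldException (standing in for IndexFormatTooOldException). Only the shape of the try/catch follows the diff above.

```java
import java.util.Set;

/** Hypothetical model of the readCodec() fallback; not the real SegmentInfos. */
final class CodecLookupSketch {
  /** Stand-in for IndexFormatTooOldException. */
  static final class FormatTooOldException extends RuntimeException {
    FormatTooOldException(String message) {
      super(message);
    }
  }

  // Hypothetical set of codec names available on the classpath.
  static final Set<String> AVAILABLE = Set.of("Lucene87", "Lucene90");

  /** Stand-in for a name-based codec lookup that fails with IllegalArgumentException. */
  static String lookup(String name) {
    if (!AVAILABLE.contains(name)) {
      throw new IllegalArgumentException("No codec named '" + name + "' is available.");
    }
    return name;
  }

  /** Missing default ("Lucene*") codecs are reported as a too-old index. */
  static String readCodec(String name) {
    try {
      return lookup(name);
    } catch (IllegalArgumentException e) {
      // Maybe it's an old default codec that moved or was removed.
      if (name.startsWith("Lucene")) {
        throw new FormatTooOldException(
            "Could not load codec '"
                + name
                + "'. "
                + e.getMessage()
                + " Did you forget to add lucene-backward-codecs.jar?");
      }
      // Non-default codecs keep the original exception.
      throw e;
    }
  }

  public static void main(String[] args) {
    try {
      readCodec("Lucene70");
    } catch (FormatTooOldException e) {
      System.out.println("too old: " + e.getMessage());
    }
  }
}
```

In this sketch a custom codec name that cannot be found still surfaces as IllegalArgumentException; only names starting with "Lucene" are translated, which is what lets the N-2 tests assert on a single exception type.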

@msokolov msokolov merged commit fe59fde into apache:branch_10x Nov 25, 2025
5 checks passed
@rahulgoswami
Contributor Author

Thanks for merging this, Mike. I was thinking I could create another PR for main that pushes the SegmentInfos min version in checkHeaderNoMagic() back to VERSION_74 and throws the same IndexFormatTooOldException in readCodec() as here.

That would help continue the support for 8.x indexes since the backward codec is present anyway?

@msokolov
Contributor

I think it makes sense; keeping the code similar between the two branches will also help people in the future!

@rahulgoswami
Contributor Author

Created #15454
