Implement VectorUtilProvider with Java 21 Project Panama Vector API #12363

ChrisHegarty · 2023-06-12T20:24:55Z

This commit enables the Panama Vector API for Java 21. The version of VectorUtilPanamaProvider for Java 21 is identical to that of Java 20. As such, there is no specific 21 version - the Java 20 version will be loaded from the MRJAR.
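A Java 21 runtime can reuse the Java 20 class because of multi-release JAR lookup (JEP 238). For a given runtime, the JDK uses the highest `META-INF/versions/N` entry whose N is no greater than the runtime's feature version, and otherwise falls back to the root entry. The sketch below is a minimal model of that rule, not a call into `java.util.jar.JarFile`. The class and method names are hypothetical, and it assumes that only a version-20 entry exists for `VectorUtilPanamaProvider`.

```java
import java.util.Set;

/** Hypothetical model of multi-release JAR lookup (JEP 238); not Lucene or JDK code. */
public final class MrJarResolution {

  /** Marker meaning "no versioned entry applies; use the root entry, if any". */
  static final int BASE = 0;

  /**
   * Returns the versioned directory a runtime with the given feature version would use:
   * the highest N in versionDirs with 9 <= N <= runtimeFeature, or BASE if there is none.
   */
  static int resolve(Set<Integer> versionDirs, int runtimeFeature) {
    int best = BASE;
    for (int v : versionDirs) {
      if (v >= 9 && v <= runtimeFeature && v > best) {
        best = v;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Assumption: only a java20 variant of the class is packaged.
    Set<Integer> dirs = Set.of(20);
    System.out.println(resolve(dirs, 17)); // 0: no versioned entry applies
    System.out.println(resolve(dirs, 20)); // 20
    System.out.println(resolve(dirs, 21)); // 20: the Java 20 class is reused on Java 21
  }
}
```

This fallback is why no separate `java21` copy of the class is needed as long as the source is identical.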

Initial outdated approach (not merged):

Cut'n'paste VectorUtilPanamaProvider: there are opportunities to eventually remove some workarounds, but this is OK for now.
Updated jdk21.apijar by running ./gradlew :lucene:core:regenerate -Porg.gradle.java.installations.paths=/Users/chegar/binaries/jdk-21.jdk/Contents/Home/
Widen the version checks to include exactly 20 and 21

$ /Users/chegar/binaries/jdk-21.jdk/Contents/Home/bin/java -version
openjdk version "21-ea" 2023-09-19
OpenJDK Runtime Environment (build 21-ea+26-2328)
OpenJDK 64-Bit Server VM (build 21-ea+26-2328, mixed mode, sharing)

ChrisHegarty · 2023-06-12T20:41:17Z

I verified this locally by running the tests (pretending to be the "CI" so as to enable the Panama code at runtime):

$ JENKINS_XX=true ./gradlew :lucene:core:test --tests "org.apache.lucene.util.TestVectorUtil**" \
   -Pvalidation.git.failOnModified=false --info

$ JENKINS_XX=true ./gradlew check -Pvalidation.git.failOnModified=false

I can see from the logs that the JDK 21 version is picked up.

lucene/CHANGES.txt

uschindler · 2023-06-12T20:52:32Z

That was fast. 😍🐇

I will check tomorrow morning but this looks great. @rmuir wanted to run the benchmark with 21, too.

Did you use the latest OpenJDK 21-ea build to extract the apijar?

uschindler · 2023-06-12T20:55:12Z

lucene/core/src/java21/org/apache/lucene/util/VectorUtilPanamaProvider.java

+          "Vector bit size is less than 128: " + INT_SPECIES_PREF_BIT_SIZE);
+    }
+
+    // hack to work around for JDK-8309727:


We can remove this check if you are confident that your change makes it into the release.
On the other hand this check does not hurt.

If we remove it, the doPrivileged forbidden wrapper can go away, too.

The JDK change was merged; it’ll be in the next JDK 21 EA build.

Great, so we can remove it for 21. It is not urgent!
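The diff excerpt above builds a message for the case where the preferred vector bit size is less than 128, meaning the Panama provider is not used on such hardware. Below is a minimal sketch of that decision. It assumes the preferred width is passed in as a plain `int`. In the real provider the value (`INT_SPECIES_PREF_BIT_SIZE`) comes from the incubating `jdk.incubator.vector` API, which this sketch deliberately avoids so it compiles without `--add-modules`. The class and method names are hypothetical.

```java
/** Hypothetical sketch of the "at least 128-bit vectors" guard; not Lucene's code. */
public final class VectorWidthGuard {

  static final int MIN_BIT_SIZE = 128;

  /** Returns true if a preferred vector width of this many bits is wide enough to vectorize. */
  static boolean isUsable(int preferredBitSize) {
    return preferredBitSize >= MIN_BIT_SIZE;
  }

  /** Number of 32-bit float lanes that fit in a vector of the given width. */
  static int floatLanes(int bitSize) {
    return bitSize / Float.SIZE;
  }

  public static void main(String[] args) {
    for (int bits : new int[] {64, 128, 256, 512}) {
      System.out.println(
          bits + " bits: usable=" + isUsable(bits) + ", float lanes=" + floatLanes(bits));
    }
  }
}
```

At the 128-bit minimum (for example, NEON or SSE registers), a vector holds 4 float lanes. Anything narrower leaves too little parallelism to beat the scalar loop.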

uschindler · 2023-06-12T20:59:11Z

lucene/CHANGES.txt

@@ -139,8 +139,8 @@ New Features
 * GITHUB#12257: Create OnHeapHnswGraphSearcher to let OnHeapHnswGraph to be searched in a thread-safety manner. (Patrick Zhai)

 * GITHUB#12302, GITHUB#12311: Add vectorized implementations of VectorUtil.dotProduct(),


Add this PR number here.

uschindler · 2023-06-12T21:01:35Z

Oh, you pasted it. Same as mine.

uschindler

In general this looks fine. Observations:

It looks like the vector API did not change at all; the old code just compiles. So theoretically we could remove the Java 21 implementation completely and just enable Java 21 in the if/else checks of the provider lookup. The JDK will then load the Java 20 impl from the MR-JAR. In that case we would not even need the apijar with the vector classes :-) I am not sure if this is a good idea, but it may work fine.
I regenerated the apijars; there was no change on my machine.

So this looks fine. If we stay with the Java 20 impl and do not remove the checks/hacks, we could revert most of this PR and just use the Java 20 class file for Java 21 as well, via the already-working MRJAR mechanism.

uschindler · 2023-06-12T21:30:59Z

I tried it out: I reverted the APIJAR changes and left in only the following changes:

VectorUtilProvider.java to enable 21
vectorIncubatorJavaVersions = [ JavaVersion.VERSION_20, JavaVersion.VERSION_21 ] as Set

Everything else was reverted and the tests worked fine. So to me it looks like we can drop the separate implementation, as it is 100% identical. The hack does not hurt.
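Enabling 21 in the provider lookup amounts to widening a runtime-version check so that both feature releases 20 and 21 select the Panama provider. The MR-JAR then supplies the Java 20 class file for both. Below is a minimal sketch under stated assumptions. The class name, the method, and the `"panama"`/`"default"` labels are hypothetical. Only `Runtime.version().feature()` and `ModuleLayer.boot().findModule(...)` are real JDK APIs.

```java
import java.util.Set;

/** Hypothetical sketch of version-gated provider selection; not Lucene's actual code. */
public final class ProviderSelection {

  // Exact feature releases the Panama provider is known to work with (assumption: 20 and 21).
  static final Set<Integer> SUPPORTED_FEATURE_VERSIONS = Set.of(20, 21);

  static String select(int runtimeFeature, boolean vectorModulePresent) {
    if (!SUPPORTED_FEATURE_VERSIONS.contains(runtimeFeature)) {
      return "default"; // older or not-yet-verified JDK: keep the scalar implementation
    }
    if (!vectorModulePresent) {
      return "default"; // jdk.incubator.vector was not added to the module graph
    }
    return "panama";
  }

  public static void main(String[] args) {
    int feature = Runtime.version().feature();
    boolean present = ModuleLayer.boot().findModule("jdk.incubator.vector").isPresent();
    System.out.println("Java " + feature + " -> " + select(feature, present));
  }
}
```

The sketch uses an explicit set rather than `feature >= 20` because an incubating API may change incompatibly in a later release. This matches the "exactly 20 and 21" wording earlier in the thread.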

ChrisHegarty · 2023-06-12T22:07:16Z

That’s great - I had a similar thought. 👍

uschindler · 2023-06-12T22:25:17Z

Maybe just leave a readme file in the folder of the Java file stating that the impl is identical to Java 20.

uschindler · 2023-06-12T22:51:14Z

I checked the commits: https://github.com/openjdk/jdk21/commits/master/src/jdk.incubator.vector/share/classes

There were some changes, but nothing that affects us. They are mostly the addition of VectorMask.XOR and improvements in VectorShuffle. This aligns with the JEP: https://openjdk.org/jeps/448

And the Turkish locale bug detection is in the provider lookup, so we handle it already.
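Locale-sensitive case conversion is the usual source of Turkish-locale bugs: under `tr`, lowercase `i` uppercases to the dotted capital `İ` (U+0130) rather than ASCII `I`. Below is a minimal sketch of how a lookup could detect such a default locale, and how `Locale.ROOT` avoids the problem. The class and method names are hypothetical. It is also an assumption, not something stated in this thread, that the incubator bug stems from a default-locale `toUpperCase()` call.

```java
import java.util.Locale;

/** Hypothetical sketch: detect a locale whose case mapping is not plain ASCII. */
public final class LocaleCheck {

  /** True if upper-casing "i" in the given locale does not yield the ASCII letter "I". */
  static boolean hasNonAsciiCaseMapping(Locale locale) {
    return !"i".toUpperCase(locale).equals("I");
  }

  public static void main(String[] args) {
    Locale turkish = Locale.forLanguageTag("tr-TR");
    System.out.println(hasNonAsciiCaseMapping(Locale.ROOT)); // false
    System.out.println(hasNonAsciiCaseMapping(turkish)); // true
    System.out.println("int".toUpperCase(turkish).equals("INT")); // false: yields "\u0130NT"
    System.out.println("int".toUpperCase(Locale.ROOT).equals("INT")); // true
  }
}
```

A provider lookup could call a check like this on `Locale.getDefault()` and fall back to the scalar implementation when it returns true.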

ChrisHegarty · 2023-06-13T07:52:07Z

@uschindler I just refreshed this PR as you suggested - the changes are now minimal and it tests fine.

While we're carrying two small workarounds for JDK bugs that are fixed in yet-to-be-released versions, I think they are worth their small cost, as they minimise code duplication, etc. 👍

uschindler

Looks fine. Maybe rename the readme file to VectorUtilPanamaProvider.txt to make clear which one was left out.

I think this is ready.

uschindler · 2023-06-13T08:11:36Z

BTW, much shorter and faster to type is:

$ CI=true ./gradlew :lucene:core:test ...
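Both `JENKINS_XX=true` and `CI=true` appear to work because the check inspects environment variable names rather than their values. Below is a minimal sketch of such a name-based check. The regular expression `(?i)((JENKINS|HUDSON)(_\w+)?|CI)`, the class, and the method are assumptions for illustration, not copied from Lucene's build.

```java
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of name-based CI detection; the pattern is an assumption. */
public final class CiDetection {

  // Whole-name match, case-insensitive: CI, JENKINS, HUDSON, or JENKINS_*/HUDSON_* names.
  private static final Pattern CI_VAR = Pattern.compile("(?i)((JENKINS|HUDSON)(_\\w+)?|CI)");

  static boolean isCiBuild(Map<String, String> env) {
    // Only variable names are inspected; values such as "true" are ignored.
    return env.keySet().stream().anyMatch(name -> CI_VAR.matcher(name).matches());
  }

  public static void main(String[] args) {
    System.out.println(isCiBuild(System.getenv()));
  }
}
```

Because `matches()` requires the whole name to match, unrelated variables such as `CIRCLE` are not mistaken for a CI marker.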

ChrisHegarty · 2023-06-13T08:36:25Z

Once the PR checks complete, I'll merge this PR and then port the merged commit to the 9x branch.

…pache#12363) This commit enables the Panama Vector API for Java 21. The version of VectorUtilPanamaProvider for Java 21 is identical to that of Java 20. As such, there is no specific 21 version - the Java 20 version will be loaded from the MRJAR.

…12363) (#12365) This commit enables the Panama Vector API for Java 21. The version of VectorUtilPanamaProvider for Java 21 is identical to that of Java 20. As such, there is no specific 21 version - the Java 20 version will be loaded from the MRJAR.

ChrisHegarty added 2 commits June 12, 2023 21:14

initial panama vector 21 port changes

b9a3b6b

update changes (since all in same minor release)

5ad93e1

ChrisHegarty requested a review from uschindler June 12, 2023 20:45

ChrisHegarty commented Jun 12, 2023

lucene/CHANGES.txt

uschindler reviewed Jun 12, 2023

ChrisHegarty added 3 commits June 13, 2023 08:43

changelog

a4f47a8

reverts

22f2ec8

add readme

4eafc9f

uschindler approved these changes Jun 13, 2023

uschindler added this to the 9.7.0 milestone Jun 13, 2023

uschindler added the type:enhancement label Jun 13, 2023

rename txt file

b25c8b0

ChrisHegarty merged commit 1090928 into apache:main Jun 13, 2023
4 checks passed

ChrisHegarty deleted the panama_vector_21 branch June 13, 2023 08:45

ChrisHegarty mentioned this pull request Jun 13, 2023

Implement VectorUtilProvider with Java 21 Project Panama Vector API #12365

Merged

ChrisHegarty changed the title ~~Implement VectorUtilProvider with Java 21 Project Pamana Vector API~~ Implement VectorUtilProvider with Java 21 Project Panama Vector API Jun 13, 2023

alessandrobenedetti added the vector-based-search label Jun 15, 2023

ChrisHegarty mentioned this pull request Jun 20, 2023

ThirdPartyAuditTask - Add vector module when building with JDK 21 elastic/elasticsearch#96949

Merged
