Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative version: Implement MMapDirectory with Java 19/20 Project Panama Preview API #12188

Merged
merged 31 commits into from
Mar 9, 2023

Conversation

uschindler
Copy link
Contributor

@uschindler uschindler commented Mar 6, 2023

This is an altenative version to PR #12042. The problem with the previous PR (and also the Java 19 version) is mainly the following issues:

  • We need to add a toolkit entry for each Java version we want to support. As Panama-Foreign is not yet finished, any developer who wants to build Lucene needs to either download Java 19+20+21 to actually compile the JARs or the autodownloader of Gradle will do it.
  • We need to patch the class files to remove the preview bits after compilation

The approach here is a new idea that came to me when I studied, how the --release flag is implemented in javac. It works like that: Java has basically a copy of all class files of previous Java versions in some specially designed JAR file. When you request --release 11 from Java compiler, it will overlay the compile classpath wit this JAR file that contains all classes (only signatures). Of course this will spend a lot of space, so there are 2 optimizations: it removes all bytecode from those JAR files, so all class files in it have empty method bodies. In addition, it will only add modified class files. After each relaese of Java they build this special signatures JAR file and store it in the JDK folder. FYI, preview API are excluded there, this is why we can't compile.

The PR here implements the following:

  • It stores a similar (but simplified) version of the relevant classes in a "panama-foreign-jdk19.apijar" and "panama-foreign-jdk20.apijar"
  • The apijar files are regenerated by the "regenerate task" or by calling gradlew :lucene:core:generatePanamaForeignApiJar19 and/or gradlew :lucene:core:generatePanamaForeignApiJar20. This task uses Gradle's toolkit API to launch the correct Java version. So only people who regenerate files need to have Java 19, 20, 21,... installed. For Java 20 at moment you need it really installed locally, as it is impossible to autodownload it.
  • When generating the APIJAR (a zip file in standard JAR format) it uses ASM to load the class files from the current JVM using the small tool ExtractForeignAPI.java which is executed using the toolkit API. It will then strip all bytecode and also strips the @PreviewFeature annotation (this allows to compile against the class file without Java adding the preview flags to compilation result. The outputted class files also get the version number for Java 11 class files. It also removes the option for nesthosts (not needed).
  • When we compile our Java 19, Java 20, Java 21 MR-JAR parts, it will compile our sourceSet with the normal Java 11 or Java 17 compiler, but supplies --patch-module java.base=panama-foreign-jdk20.apijar. It also open the new foreign package, so it gets visible. For Javac compiler it is just a JAR file with the API signatures, so it can compile our classes against it. So it is like a JAR file on Maven central without any bytecode (theoretically we could push those apijar files to Maven central... so we would not have them in our checkout). As those class files in our signatures JAR have no preview bits, the compiler output is not marked as "preview-only".

This PR allows anybody to compile the MR-JAR classes without the JDK installed. We can also merge this PR now, as the JDK 20 release is already in RC phase, so the API is finalized. The APIJAR files don't get outdated anymore. We can also release Lucene now with JDK 20 support!

Downsides:

  • We have 2 or 3 binary files in out Git checkout (they are each approx 20 Kilobytes). An alternative would be to dump a txt-only signatures file, but that complicates it really. We can do that for licensing purposes.
  • Licensing is hard, this is the main reason why I asked for comments: Is it ok to have a JAR file with method signatures in our source checkout? There's no code included (bytecode stripped)! The license is the GNU GPL with Classpath extension, but we compile against that already-

If anybody has an idea, speak up. If we agree, this would be the easiest to support MR-JARs of MMapDirectory in future (also with Java 21 where it is not yet fully sure that the API gets to its final stage! ....unfortunately.... damn!

The same approach could be used for vector APIs in future. Please add a comment about if you agree to do it like that!

@rmuir
Copy link
Member

rmuir commented Mar 6, 2023

The same approach could be used for vector APIs in future. Please add a comment about if you agree to do it like that!

when the vector api finally comes out, don't interrupt my shuffleboard match to tell me about it!

@dweiss
Copy link
Contributor

dweiss commented Mar 6, 2023

image

@uschindler
Copy link
Contributor Author

The same approach could be used for vector APIs in future. Please add a comment about if you agree to do it like that!

when the vector api finally comes out, don't interrupt my shuffleboard match to tell me about it!

It has not even reached preview phase, so you can game without stopping for the next 10 years, for sure!

@uschindler
Copy link
Contributor Author

As you see this PR has passed all Github precommit CI checks (it does not need any Java version installed except the compilation one, not even 20).

As you see from the commit I started this approach already around new year's eve. I regenerated the APIJAR files today with the RC candidate. No changes (I only had to fix a bug later regarding the allowedSubclasses attribute that is now stripped).

@uschindler
Copy link
Contributor Author

I moved the signatures files to src/generated/jdk folder. The lib folder was just a quick hack at beginning.

@msokolov
Copy link
Contributor

msokolov commented Mar 6, 2023

In this case, would it make sense to remove the logging message warning us that we don't have JDK19 so we're missing out on the goodness? (Currently I have hacked my logger to suppress - wonder if we need to encourage people to upgrade their JDK though? Maybe I'm misunderstanding the purpose).

@uschindler
Copy link
Contributor Author

uschindler commented Mar 6, 2023

In this case, would it make sense to remove the logging message warning us that we don't have JDK19 so we're missing out on the goodness? (Currently I have hacked my logger to suppress - wonder if we need to encourage people to upgrade their JDK though? Maybe I'm misunderstanding the purpose).

This is about how to compile the classes with Gradle without having JDK 19 or 20 installed.

You mean the runtime message in Lucene 9.5 on Java 20 or 21 at moment? Yes, we can think about removing that log message on startup, but thats unrelated to this issue. This is about building Lucene.

Copy link
Contributor

@dweiss dweiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it, Uwe. I have no idea about legal implications (whether interfaces alone are something that can be copied over).

The only thing I've been wondering about is this: if these apis are still shaping and something changes upstream in the JDK, this change would go unnoticed. Do we rely on a periodic refresh of those API jars only? Or should we actually verify (in check) that the APIs are identical if the given JDK is available?

gradle/generation/panama-foreign.gradle Show resolved Hide resolved
@uschindler
Copy link
Contributor Author

uschindler commented Mar 7, 2023

I like it, Uwe. I have no idea about legal implications (whether interfaces alone are something that can be copied over).

The only thing I've been wondering about is this: if these apis are still shaping and something changes upstream in the JDK, this change would go unnoticed. Do we rely on a periodic refresh of those API jars only? Or should we actually verify (in check) that the APIs are identical if the given JDK is available?

For JDK 19 and JDK 20 the "API changes" phase is over. The final release candidate is out and if it would be rebuilt after a bug fix and we get a new release, the public API won't change anymore. Of course we can execute "regenerate" for safety on March 21, but the above files were generated with the version to be released on that day already. As said before I created the apijar files before new year for the first time (after the Panama API v20 was released), and regenerated yesterday with final release candidate. The resulting apijar file was 100% identical (git showed no changes).

I would only add Java 21 once we are at a place when the API is finalized, which is not yet the sagte.

Chronological checks are not needed, as once JDK 19 / 20 are/were released the APIs are final.

@uschindler
Copy link
Contributor Author

I like it, Uwe. I have no idea about legal implications (whether interfaces alone are something that can be copied over).

I had the same mental problem already when I released forbiddenapis in 2012. The deprectaed signatures were generated by scanning through rt.jar/module system and adding the signatures to file, e.g, for Java 17: https://github.com/policeman-tools/forbidden-apis/blob/main/src/main/resources/de/thetaphi/forbiddenapis/signatures/jdk-deprecated-17.txt

The only difference here is that the file is binary. As said before I have a code path here that uses ASM's Textifier to create a text file with all signatures in a similar way like forbiddenapis. We could also add a README.txt in the generated files folder.

@uschindler
Copy link
Contributor Author

uschindler commented Mar 7, 2023

FYI, here is the possible way to handle those licensing issues. A local patch that I have at hand may add a panama-foreign-jdk20.txt file next to the panama-foreigen-jdk20.apijar with the following content, so there is a plain text about what the JAR file contains and we can add a better "license header" stating OpenJDK.. Both file are each like 70 Kilobytes vs 20 Kilobytes of apijar:

// These API signatures of Panama Foreign API were extracted from JDK 20
// Do not modify this file or the corresponding .apijar file which is used for MR-JAR compilation!

// class version 55.0 (55)
// access flags 0x601
public abstract interface java/lang/foreign/Arena implements java/lang/foreign/SegmentAllocator java/lang/AutoCloseable {


  // access flags 0x1
  public default allocate(JJ)Ljava/lang/foreign/MemorySegment;

  // access flags 0x401
  public abstract scope()Ljava/lang/foreign/SegmentScope;

  // access flags 0x401
  public abstract close()V

  // access flags 0x401
  public abstract isCloseableBy(Ljava/lang/Thread;)Z

  // access flags 0x9
  public static openConfined()Ljava/lang/foreign/Arena;

  // access flags 0x9
  public static openShared()Ljava/lang/foreign/Arena;
}

// class version 55.0 (55)
// access flags 0x601
public abstract interface java/lang/foreign/FunctionDescriptor {


  // access flags 0x401
  // signature ()Ljava/util/Optional<Ljava/lang/foreign/MemoryLayout;>;
  // declaration: java.util.Optional<java.lang.foreign.MemoryLayout> returnLayout()
  public abstract returnLayout()Ljava/util/Optional;
[...]

@uschindler
Copy link
Contributor Author

uschindler commented Mar 7, 2023

This is my local stashed patch (I used it to review the output):

 gradle/generation/panama-foreign.gradle            |  6 ++++-
 .../panama-foreign/ExtractForeignAPI.java          | 30 ++++++++++++++--------
 2 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/gradle/generation/panama-foreign.gradle b/gradle/generation/panama-foreign.gradle
index 694c4656e2f..a4038961ea0 100644
--- a/gradle/generation/panama-foreign.gradle
+++ b/gradle/generation/panama-foreign.gradle
@@ -28,7 +28,7 @@ configure(project(":lucene:core")) {
   }
 
   dependencies {
-    apiextractor "org.ow2.asm:asm:${scriptDepVersions['asm']}"
+    apiextractor "org.ow2.asm:asm-util:${scriptDepVersions['asm']}"
   }
 
   for (jdkVersion : panamaJavaVersions) {
@@ -54,9 +54,13 @@ configure(project(":lucene:core")) {
       
       classpath = configurations.apiextractor
       mainClass = file("${resources}/ExtractForeignAPI.java") as String
+      systemProperties = [
+        'line.separator': "\n",
+      ]
       args = [
         jdkVersion,
         new File(apijars, "panama-foreign-jdk${jdkVersion}.apijar"),
+        new File(apijars, "panama-foreign-jdk${jdkVersion}.txt"),
       ]
     }
 
diff --git a/gradle/generation/panama-foreign/ExtractForeignAPI.java b/gradle/generation/panama-foreign/ExtractForeignAPI.java
index 3eecbe56149..2f01c6d5a55 100644
--- a/gradle/generation/panama-foreign/ExtractForeignAPI.java
+++ b/gradle/generation/panama-foreign/ExtractForeignAPI.java
@@ -15,6 +15,7 @@
  * limitations under the License.
  */
 import java.io.IOException;
+import java.io.PrintWriter;
 import java.net.URI;
 import java.nio.file.Files;
 import java.nio.file.Path;
@@ -34,32 +35,39 @@ import org.objectweb.asm.FieldVisitor;
 import org.objectweb.asm.MethodVisitor;
 import org.objectweb.asm.Opcodes;
 import org.objectweb.asm.Type;
+import org.objectweb.asm.util.TraceClassVisitor;
 
 public final class ExtractForeignAPI {
   
   private static final FileTime FIXED_FILEDATE = FileTime.from(Instant.parse("2022-01-01T00:00:00Z"));
   
   public static void main(String... args) throws IOException {
-    if (args.length != 2) {
-      throw new IllegalArgumentException("Need two parameters: java version, output file");
+    if (args.length != 3) {
+      throw new IllegalArgumentException("Need three parameters: java version, apijar output file, text output file");
     }
     if (Integer.parseInt(args[0]) != Runtime.version().feature()) {
       throw new IllegalStateException("Incorrect java version: " + Runtime.version().feature());
     }
-    var outputPath = Paths.get(args[1]);
+    var apijarPath = Paths.get(args[1]);
+    var signaturesPath = Paths.get(args[2]);
     var javaBaseModule = Paths.get(URI.create("jrt:/")).resolve("java.base").toRealPath();
     var fileMatcher = javaBaseModule.getFileSystem().getPathMatcher("glob:java/{lang/foreign/*,nio/channels/FileChannel}.class");
-    try (var out = new ZipOutputStream(Files.newOutputStream(outputPath)); var stream = Files.walk(javaBaseModule)) {
-      var filesToExtract = stream.map(javaBaseModule::relativize).filter(fileMatcher::matches).sorted().collect(Collectors.toList());
+    try (var apijar = new ZipOutputStream(Files.newOutputStream(apijarPath));
+        var signatures = new PrintWriter(Files.newBufferedWriter(signaturesPath));
+        var classes = Files.walk(javaBaseModule)) {
+      signatures.println("// These API signatures of Panama Foreign API were extracted from JDK " + Runtime.version().feature());
+      signatures.println("// Do not modify this file or the corresponding .apijar file which is used for MR-JAR compilation!");
+      var filesToExtract = classes.map(javaBaseModule::relativize).filter(fileMatcher::matches).sorted().collect(Collectors.toList());
       for (Path relative : filesToExtract) {
+        signatures.println();
         System.out.println("Processing class file: " + relative);
         try (var in = Files.newInputStream(javaBaseModule.resolve(relative))) {
           final var reader = new ClassReader(in);
           final var cw = new ClassWriter(0);
-          reader.accept(new Cleaner(cw), ClassReader.SKIP_CODE | ClassReader.SKIP_DEBUG | ClassReader.SKIP_FRAMES);
-          out.putNextEntry(new ZipEntry(relative.toString()).setLastModifiedTime(FIXED_FILEDATE));
-          out.write(cw.toByteArray());
-          out.closeEntry();
+          reader.accept(new Cleaner(cw, signatures), ClassReader.SKIP_CODE | ClassReader.SKIP_DEBUG | ClassReader.SKIP_FRAMES);
+          apijar.putNextEntry(new ZipEntry(relative.toString()).setLastModifiedTime(FIXED_FILEDATE));
+          apijar.write(cw.toByteArray());
+          apijar.closeEntry();
         }
       }
     }
@@ -69,8 +77,8 @@ public final class ExtractForeignAPI {
     private static final String PREVIEW_ANN = "jdk/internal/javac/PreviewFeature";
     private static final String PREVIEW_ANN_DESCR = Type.getObjectType(PREVIEW_ANN).getDescriptor();
     
-    Cleaner(ClassWriter out) {
-      super(Opcodes.ASM9, out);
+    Cleaner(ClassWriter out, PrintWriter writer) {
+      super(Opcodes.ASM9, new TraceClassVisitor(out, writer));
     }
     
     private static boolean isHidden(int access) {

@uschindler
Copy link
Contributor Author

I don't think this verbosity is needed, as the API is already available as Javadocs (there's no additional information that is not also available as Javadocs). FYI, the package private classes need to be part of the apijar, as they are referred to. An alternative would be to remove those from the implements/superclasses, but thats to risky.

I think we should place a note in our NOTICE.txt with Licensing and/or leave a README in the generated files folder.

What do you think? @dweiss

@uschindler
Copy link
Contributor Author

I further minimized file sizes by removing contents of hidden classes

@dweiss
Copy link
Contributor

dweiss commented Mar 7, 2023

I'm not a lawyer but I think a note in README next to those apijars is sufficient. There's no wrongdoing here and it's not a critical part of Lucene. I don't see how we could infringe anybody's rights here by extracting those interfaces just to ease the pain of compilation/ building Lucene.

@uschindler
Copy link
Contributor Author

uschindler commented Mar 7, 2023

Hi @dweiss,
I added a README.md: https://github.com/apache/lucene/blob/a4de92b25c976ad97aada554da2cd1ccfd396cd3/lucene/core/src/generated/jdk/README.md

@uschindler
Copy link
Contributor Author

FYI,
I also checked this on the branch_9x with Java 11. It works the same way, the only issue is that the apijar files need to contain also signatures of java.util.Objects, because checkIndex() with long arguments is missing in Java 11, so to actually compile with long indexes we need that class in the module patch, too. But the compiler complains easy, so it was easy to fix.

So I think this is ready to go into main and 9.x. I will merge this till end of week.

@uschindler uschindler merged commit e4d8a5c into apache:main Mar 9, 2023
@uschindler uschindler deleted the dev/java20-panama-mmap-apijar branch March 9, 2023 20:27
asfgit pushed a commit that referenced this pull request Mar 9, 2023
…apDirectory with Java 20 Project Panama Preview API (#12188)
@uschindler uschindler added this to the 9.6.0 milestone Mar 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants