Possible issue with AOT compilation. #8

Closed
jszakmeister opened this issue Oct 14, 2016 · 13 comments

Comments

@jszakmeister

While writing an application, I ran into an issue where the following exception was thrown:

Exception in thread "main" java.lang.ClassCastException: clojure.pprint.proxy$java.io.Writer$IDeref$PrettyFlush$4923d848 cannot be cast to clojure.pprint.PrettyFlush

I've been able to reproduce this with a much smaller example. I created two versions of the project, a Gradle version and a Leiningen version. The code is the same in both; the only difference is the build tool. You can find them here:

https://github.com/jszakmeister/aot-pprint-gradle
https://github.com/jszakmeister/aot-pprint-lein

I'm using the Shadow plugin to create an uberjar for Gradle. To see the problem, check out aot-pprint-gradle, run gradle shadowJar, and then run java -jar build/libs/aot-pprint-gradle-0.1.0-SNAPSHOT-all.jar. For the Leiningen version, check it out, run lein uberjar, and then run java -jar target/uberjar/aot-pprint-0.1.0-SNAPSHOT-standalone.jar. You'll see that the Gradle version fails, but the Leiningen version succeeds.

It's not clear what's going on here, or why the two builds differ. I don't know enough about the under-the-hood differences between the implementations to point you in the right direction, or I'd supply a patch (sorry!). Hopefully, though, this helps you identify the problem.

@jszakmeister
Author

I did a little more digging, but I'm still not sure what the difference actually is. The two projects generate essentially the same binary output. A couple of generated classes have slightly different names between the two (I'm not sure why, but javap shows that everything except the names appears to be the same). The only other file really in question is core__init.class: diff reports the binaries as different, but javap shows the same bytecode in both. I'm at a loss for why one works and the other doesn't. :-(

Here's the result of diffing the two unpacked uberjars:

Only in aot-pprint-lein/target/uberjar/tmp/aot_pprint: core$fn__38.class
Only in aot-pprint-lein/target/uberjar/tmp/aot_pprint: core$fn__38.class.bc
Only in aot-pprint-gradle/build/libs/tmp/aot_pprint: core$fn__5.class
Only in aot-pprint-gradle/build/libs/tmp/aot_pprint: core$fn__5.class.bc
Binary files aot-pprint-lein/target/uberjar/tmp/aot_pprint/core__init.class and aot-pprint-gradle/build/libs/tmp/aot_pprint/core__init.class differ
Only in aot-pprint-lein/target/uberjar/tmp/aot_pprint: core$loading__5569__auto____36.class
Only in aot-pprint-lein/target/uberjar/tmp/aot_pprint: core$loading__5569__auto____36.class.bc
Only in aot-pprint-gradle/build/libs/tmp/aot_pprint: core$loading__5569__auto____3.class
Only in aot-pprint-gradle/build/libs/tmp/aot_pprint: core$loading__5569__auto____3.class.bc
Only in aot-pprint-lein/target/uberjar/tmp/META-INF: leiningen
diff -ur aot-pprint-lein/target/uberjar/tmp/META-INF/MANIFEST.MF aot-pprint-gradle/build/libs/tmp/META-INF/MANIFEST.MF
--- aot-pprint-lein/target/uberjar/tmp/META-INF/MANIFEST.MF 2016-10-20 12:22:00.000000000 -0400
+++ aot-pprint-gradle/build/libs/tmp/META-INF/MANIFEST.MF   2016-10-20 10:38:56.000000000 -0400
@@ -1,6 +1,5 @@
 Manifest-Version: 1.0
-Built-By: jszakmeister
-Created-By: Leiningen 2.7.1
-Build-Jdk: 1.8.0_65
+Implementation-Title: maven-mirror
+Implementation-Version: 0.1.0-SNAPSHOT
 Main-Class: aot_pprint.core

Only in aot-pprint-lein/target/uberjar/tmp/META-INF/maven: aot-pprint
Only in aot-pprint-lein/target/uberjar/tmp: project.clj

@jszakmeister
Author

I should add that I turned on copying the source files into the jar too, just so that the output of both versions would be a little closer. But it's not necessary to do so.

@jszakmeister
Author

There is definitely a difference in class loading between the two. In the Gradle version, when running with java -verbose:class -jar JARFILE, I see this in the output:

[Loaded clojure.pprint.PrettyFlush from __JVM_DefineClass__]
...
[Loaded clojure.pprint.PrettyFlush from file:/home/jszakmeister/projects/aot-pprint-gradle/build/libs/aot-pprint-gradle-0.1.0-SNAPSHOT-all.jar]

With the Leiningen version, I see only:

[Loaded clojure.pprint.PrettyFlush from file:/home/jszakmeister/projects/aot-pprint-lein/target/uberjar/aot-pprint-0.1.0-SNAPSHOT-standalone.jar]
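The red flag in the Gradle output is the same class name arriving from two different sources. As a minimal sketch (a hypothetical helper, not part of either build tool), the Python below scans Java 8-style -verbose:class lines of the form "[Loaded NAME from SOURCE]" and reports every class that was loaded more than once. It assumes the Java 8 log format shown above; newer JDKs use a different unified-logging format that this pattern will not match.

```python
import re
import sys
from collections import defaultdict

# Matches the Java 8 -verbose:class format, e.g.
# [Loaded clojure.pprint.PrettyFlush from __JVM_DefineClass__]
LOADED = re.compile(r"^\[Loaded (\S+) from (.+)\]$")


def duplicate_loads(lines):
    """Return {class name: [sources]} for classes that were loaded more than once."""
    sources = defaultdict(list)
    for line in lines:
        match = LOADED.match(line.strip())
        if match:
            sources[match.group(1)].append(match.group(2))
    return {name: srcs for name, srcs in sources.items() if len(srcs) > 1}


if __name__ == "__main__":
    # Hypothetical usage: java -verbose:class -jar app.jar | python find_dupes.py
    for name, srcs in sorted(duplicate_loads(sys.stdin).items()):
        print(name)
        for src in srcs:
            print("    from", src)
```

A class listed with both __JVM_DefineClass__ and a jar URL, as PrettyFlush is above, means two distinct runtime types now share one name.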

@jszakmeister
Author

I should also add that when running with -XX:+TraceClassLoading, I can see clojure.core__init being loaded in the Leiningen version, but not the Gradle one. I'm not sure why that would be.

@jszakmeister
Author

I ran into this again while trying to use component (stuartsierra/component#54). In this case, the Lifecycle protocol was loaded again later, which caused my components to no longer be recognized as implementing that protocol. So the components silently failed to start. :-(

For the life of me, I can't see any difference between the jars produced by Leiningen and by the gradle-clojure plugin. :-(

@jszakmeister
Author

I suppose it could also be how things are packaged. But I've tried repackaging the jar file in several different ways and haven't been able to get it to work correctly, short of leaving out all the source files present in the dependencies.

@jszakmeister
Author

So I've run a number of experiments here:

  • I monkeyed with the file order in the zip archive:
    • I tried repackaging the jar in the same order Leiningen had used.
    • I tried making sure class files appeared before .clj files.
    • None of these experiments helped.
  • I passed a few more options to javap, disassembled even more of the class files, and compared them. Two files end up with slightly different names between the Leiningen and Gradle builds, but their core content is the same.
  • I experimented with enabling direct linking to see if that made any difference. It did not.

Finally, I took the Clojure class files and replaced them with the ones from the original jar file. That worked. Why it works, I still don't know. :-( But it appears things are unhappy when we recompile the core Clojure library.

@jszakmeister
Author

jszakmeister commented Oct 26, 2016

I've got a little more information. I compared the clojure/ directories from the shadow jar and the actual 1.8.0 Clojure release, and they have identical content. I think it's a timestamp-related issue. Here are the steps:

  1. unpack the shadow jar
  2. run java -cp . aot_pprint.core and see it fail
  3. run touch clojure/pprint/utilities__init.class
  4. run java -cp . aot_pprint.core and watch it work

I'm not quite sure why things are this way, but I guess the steps that copy the dependencies into the shadow jar aren't preserving the original timestamps. So it may actually be the Shadow plugin that's broken here. :-( It probably also means that we need to be careful in gradle-clojure to make sure things happen in the right order and that timestamps are preserved.
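For anyone repackaging a jar by hand, the key requirement is that every copied entry keeps its original modification time. Here is a minimal Python sketch of that idea; it is not the Shadow plugin's code or its fix, and the function name is hypothetical. Passing the source ZipInfo object to writestr reuses its date_time and compression type, whereas passing only a file name would stamp the entry with the current time and could make a .clj source look newer than its AOT class.

```python
import zipfile


def repack_preserving_timestamps(src, dst):
    """Copy every entry from archive src into a new archive dst, keeping entry timestamps.

    src and dst may be paths or binary file objects.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            # Reusing the original ZipInfo keeps date_time (and compress_type);
            # writestr(info.filename, ...) would use the current time instead.
            zout.writestr(info, zin.read(info.filename))
```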

@jszakmeister
Author

I've put up a proposed fix for the shadow plugin in GradleUp/shadow#260. Stuart Halloway also left an interesting comment in CLJ-1544 saying:

The person who runs a particular Clojure app can solve this problem by making sure their own consumption of AOT compilation is "infectious", i.e. if you want to AOT-compile library A which uses library B, then you need to AOT-compile library B as well.

I'm not sure how well we can implement that kind of strategy in Gradle, so I wonder if it's worth mentioning that it's problematic to take that approach.

At any rate, hopefully some form of my patch will be accepted for the shadow plugin and this can be resolved.

@cursive-ide
Owner

Thanks for all the great investigation @jszakmeister, I'm sorry I've been busy.

Just so that I'm clear from reading all that: the theory is that in the shadow jar, the timestamps were not preserved, so Clojure thought the source files were more recent than the AOT'ed class files. It loaded and compiled the source files, but then the original AOT'ed classes cannot see the resulting JIT-compiled classes because the JITed classes are loaded into a DynamicClassLoader which is not visible to the AOT'ed code. Is that right? I think that sounds plausible.
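That theory can be checked directly against a jar. Our reading of clojure.lang.RT.load in Clojure 1.8 is that it loads a compiled NAME__init.class only when its last-modified time is strictly greater than that of the matching .clj source (falling back to .cljc); otherwise it loads the source. Assuming that rule, the Python sketch below (the function name is hypothetical) lists namespaces in a jar whose source entry is not older than its class entry. Zip entry times have two-second resolution and no time zone, so treat the results as candidates rather than proof.

```python
import zipfile

INIT_SUFFIX = "__init.class"


def stale_namespaces(jar):
    """List namespace paths whose AOT __init class is not strictly newer than its source.

    `jar` may be a path or a binary file object. The rule mirrors our reading of
    clojure.lang.RT.load: prefer the .clj source, fall back to .cljc, and use the
    compiled class only if its timestamp is greater than the source's.
    """
    with zipfile.ZipFile(jar) as archive:
        # date_time is (year, month, day, hour, minute, second), so tuples compare chronologically.
        times = {info.filename: info.date_time for info in archive.infolist()}

    stale = []
    for name, class_time in times.items():
        if not name.endswith(INIT_SUFFIX):
            continue
        base = name[: -len(INIT_SUFFIX)]
        source_time = times.get(base + ".clj") or times.get(base + ".cljc")
        if source_time is not None and not class_time > source_time:
            stale.append(base)
    return sorted(stale)
```

Running it against both uberjars and comparing the lists is one way to spot which namespaces Clojure would recompile from source at startup.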

@jszakmeister
Author

No worries @cursive-ide. Yes. Clojure itself is compiled with direct linking, and I think what was happening is that some parts of Clojure were essentially recompiled because some of the source files had newer timestamps. But other parts had already referenced the AOT-compiled classes, and when the two parts interacted, things fell apart. It's a bit like working at the REPL and reloading namespaces that contain protocols.
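The failure mode (same type name, different identity) can be reproduced outside the JVM. The following is a Python analogy, not Clojure or JVM code: it evaluates the same type definition twice, much like reloading a namespace that defines a protocol. An object created against the first definition is not an instance of the second, even though both types have the same name. The name Lifecycle is only an illustrative stand-in for component's protocol.

```python
import types

# Source text for a tiny "library" that defines one type.
LIBRARY_SOURCE = '''
class Lifecycle:
    """Illustrative stand-in for a protocol or interface."""
'''


def load_library(source):
    """Evaluate the source into a fresh module object, like a (re)load."""
    module = types.ModuleType("library")
    exec(source, module.__dict__)
    return module


def demo():
    first = load_library(LIBRARY_SOURCE)
    component = first.Lifecycle()           # created against the first load
    second = load_library(LIBRARY_SOURCE)   # same source, loaded again
    same_name = first.Lifecycle.__qualname__ == second.Lifecycle.__qualname__
    return (
        same_name,
        isinstance(component, first.Lifecycle),
        isinstance(component, second.Lifecycle),
    )


if __name__ == "__main__":
    print(demo())  # (True, True, False)
```

On the JVM the identity of a class is its name plus its defining class loader, which is why the AOT'ed code and the DynamicClassLoader-defined classes disagree in the same way.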

FWIW, my fix to the shadow plugin was merged, though I don't know when a release containing the fix will be cut.

@jszakmeister
Author

Version 1.2.4 of the shadow plugin contains my fix for the timestamps, and things now work as expected. Whew! :-)

@cursive-ide
Owner

Whew indeed! :-)
