Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix LTW with a parallel-capable class loader and generated classes #278

Merged

Conversation

urisimchoni
Copy link
Contributor

This PR fixes the situation where AspectJ LTW Java agent is invoked by the instrumentation interface simultaneously in two threads, for the same class in the same class loader, and the weaving generates an inner class.

Parallel-capable class loaders have been around since Java 7, and they might be loading the same class twice in two threads.

AspectJ LTW weaver sometimes generates classes to handle the closure of around advices. With the same class being loaded twice, AspectJ wants to avoid generating the class twice. It consults a generateClasses map to see if a class by the same name with associated generated classes has already been woven by this weaver.

However, if a match is found in the map, AspectJ would return the original bytes, and that causes the thread that "lost the race" to use an unwoven class. This was discovered with WildFly when defining multiple DataSources with the same driver - it looks like WildFly is using some parallel executor to create the DataSource instances, and when trying to weave the DataSource it would sometimes not work.

The fix here returns the woven bytes of the previous weaving operation.

The PR also includes a simplification of the code, of not storing the names of generated classes in the generateClasses map, only names of woven classes with associated generated classes. This was done when I realized that this map is never consulted for the generated class key, only for the woven class name.

@kriegaex
Copy link
Contributor

kriegaex commented Feb 6, 2024

Hi @urisimchoni. Thanks for this PR. Your explanation sounds reasonable. I hope I can look into this in more detail soon. It would be nice to have one or more regression tests reproducing the problem without this patch. A best effort reproducer would also help already. I can take care of incorporating it into the AspectJ test suite with its strange XML configuration mechanics, if that is what stops you from providing a reproducer.

@urisimchoni
Copy link
Contributor Author

Thanks for looking into this @kriegaex. I compiled a reproducer here - https://gitlab.com/urisimchoni/aspectj-parallel-issue

If you can point me to a close-enough test I can have a go at it with adapting it for AspectJ test suite. Because this is fundamentally a race condition, the reproduce doesn't reproduce 100% of the time, but I've seen it happen after a few runs (less than 10), whereas with the fix it does not reproduce in 1000 runs.

One thing I discovered while putting this together was that the Java documentation WRT parallel class loading (https://docs.oracle.com/javase/8/docs/technotes/guides/lang/cl-mt.html) is a bit vague about whether it's possible to be loading the same class into the same class loader in two threads simultaneously. One paragraph says "...a class by a parallel capable class loader now synchronizes on the pair consisting of the class loader and the class name", Whereas another paragraph is saying that when constructing a parallel class loader one should "Decide upon an internal locking scheme", without specifying what this scheme is.

Obviously, if parallel loading of same class was forbidden, there would not be an issue, but the WildFly loader seems to avoid locking altogether (instead it defines classes and catches the LinkageError exception that occurs if the class already exists), which brings us to this issue.

@urisimchoni
Copy link
Contributor Author

I opened issue #279 so that we can associate the test code with an issue.

@kriegaex kriegaex self-assigned this Feb 7, 2024
@kriegaex kriegaex added bug Something isn't working enhancement New feature or request labels Feb 7, 2024
@kriegaex kriegaex added this to the 1.9.21.1 milestone Feb 7, 2024
@kriegaex
Copy link
Contributor

kriegaex commented Feb 7, 2024

@urisimchoni, sorry for the inconvenience, but practically simultaneously with your PR, another issue came in, #277, which I just fixed with #280. Unfortunately, I touched the same classes your PR does. Even if it merges without conflicts, I kindly request you to rebase this one and verify that your change still does what it is supposed to in the new configuration.

FYI, my change was about returning null from the weaver when it is sure that there were no changes. This is according to the contract of java.lang.instrument.ClassFileTransformer::transform and, while not strictly necessary, helps to avoid certain CDS problems and might result in an instrumentation speed-up in general (untested), possibly avoiding unnecessary redefinitions (or retransformations, if we use the latter in the future).

Thanks in advance. Your change is next in line, if I have free cycles next time.

@urisimchoni
Copy link
Contributor Author

I've rebased the PR. According to my analysis, the recent change is not affected-by and does not affect this PR:

  • The early return with already woven bytes could be checked against unwoven bytes, but this is not really necessary as the woven bytes represent addition of a generated inner class, and this necessarily modifies the class.
  • Returning of null in various places instead of unmodified bytes also does not affect this fix, because this returning-of-null happens at an outer "layer" than the one that populates the generatedClasses cache.

I also re-tested with the reproducer.

Regarding turning this reproducer to a test - maybe I'll have cycles for this in a few days, but a few pointers might come in handy:

  • How to run a single test instead of the entire suite
  • Is there a specific test I should be looking into for reference
    Thanks,
    Uri.

@kriegaex
Copy link
Contributor

kriegaex commented Feb 8, 2024

Thanks for the reproducer. I already converted a modified version of it with renamed classes and a loop doing the parallel weaving several times into an integration test which reliably reproduces the problem before your fix.

I then rebased your (slightly reworded) commits on the commit with the test and verified, that the problem is really fixed by them. I also reviewed your code, but did not fully immerse myself into it due to missing cycles. I hope not to regret that later, committing something I do not fully understand, but the build is green, and so I am confident that nothing important is broken.

As soon as the CI build passes, I will merge.

kriegaex and others added 6 commits February 8, 2024 12:03
Test "same class woven concurrently in parallel-capable classloader"
reproduces eclipse-aspectj#279. Originally taken and modified from:
https://gitlab.com/urisimchoni/aspectj-parallel-issue

Co-authored-by: Uri Simchoni <urisimchoni@gmail.com>
Signed-off-by: Alexander Kriegisch <Alexander@Kriegisch.name>
Store the woven class and not a generated (inner) class as value of the
woven class name. This has no effect in this commit, because the value
is not consulted at all, but will be used later on.

Relates to eclipse-aspectj#279.
A parallel-capable class loader might load the same class multiple times
in multiple threads, see
https://docs.oracle.com/javase/7/docs/technotes/guides/lang/cl-mt.html.

In this scenario, the JVM instrumentation mechanism invokes the AspectJ
weaver twice on the same class. In simple weaving scenarios, this would
just cause weaving logic to run multiple times and produce the same
result for each thread, but if the weaver is generating new classes,
it is undesirable to generate those classes multiple times.

To avoid this, the generatedClasses map in WeavingAdaptor keeps track of
classes for which inner classes were generated and avoids re-weaving
them. However, before this change, if a class was found in the generated
map, the weaver would return the un-woven bytes, which would cause
AspectJ not to work for some of the threads.

This change returns the woven bytes instead of re-running weaving.

Fixes eclipse-aspectj#279.
WeavingAdaptor.couldWeave() tested two things: whether this specific
class (by its name) should be excluded from weaving, and whether the
class has generated classes associated with it (in which case we avoid
weaving to avoid re-creating those generated classes). However, if the
class has generated classes associated, couldWeave() is not called at
all because of the new wovenWithGeneratedClass() test, so we have
only the name test which is now called directly.

Relates to eclipse-aspectj#279.
The generatedClasses map contained both keys of woven classes that had
generated classes associated with them, and the keys of the generated
classes themselves. It seems like this map is never consulted for the
generated class key - the generated class is generated from within the
context of woven class loading / retransformation, and thus no transform
event is generated for it by the JVM. Because of that, we do not need to
guard against re-weaving it. Other uses of generatedClasses map are for
full/empty tests, where the woven class key is sufficient. This change
simplifies deletion of a class because we do not have to look for its
associated generated classes.

Relates to eclipse-aspectj#279.
Drive-by cosmetics in the context of eclipse-aspectj#279.

Signed-off-by: Alexander Kriegisch <Alexander@Kriegisch.name>
@kriegaex kriegaex merged commit dbce4ca into eclipse-aspectj:master Feb 8, 2024
1 of 3 checks passed
@kriegaex
Copy link
Contributor

kriegaex commented Feb 8, 2024

Thank you so much for your exemplary PR and the very nice test template, which I could convert into an IT with reasonable effort. Thanks for helping to make AspectJ better.

@urisimchoni
Copy link
Contributor Author

Thank you for landing this so quickly and for turning the reproducer into a test - realizing that you can loop instead of forking a new JVM per iteration because you can create a new class loader instance per iteration was brilliant!

@kriegaex
Copy link
Contributor

kriegaex commented Apr 8, 2024

@urisimchoni, OSGI weaving in Eclipse is broken in AJDT due to this PR. I was looking for the root cause for days and thought it was to be found elsewhere in other changes I made in the last few months. Just now, I experimentally reverted this PR, and suddenly a weaving test in AJDT that was failing before is passing again. I have yet to analyse why that is, but maybe you can enlighten me. Are you an AJDT user, or do you work with an IDE other than Eclipse?

Correction: For some reason, the revert fixes the test, but not the whole problem when using OSGi weaving in isolation. Maybe, it was a mix of your changes and mine, or maybe it is about changes in Eclipse JDT or Equinox. It is quite difficult to debug, basically I am using trial and error, because I know next to nothing about Eclipse.

@urisimchoni
Copy link
Contributor Author

@kriegaex Alas, I'm not an AJDT user. If there's some way for me to reproduce I can have a look - I didn't spot any recent issue.
Perhaps my change broke some reflective access or generated-code access to the generateClasses map? I did change what's stored in that map, based on the premise that current code (before change) is only concerned with keys of that map. I recall that this struck me as odd that none is looking at the map values (why not use a set), but I figured this must be some left-over or preparation for something that never materialized. Still, it's hard to believe that this map is accessed directly from generated code without some accessor function.

@kriegaex
Copy link
Contributor

kriegaex commented Apr 8, 2024

@urisimchoni, it is accessed from child class OSGiWeavingAdaptor, hence the protected field. I inherited all that stuff, I did not implement it myself. See also eclipse-aspectj/ajdt#57.

kriegaex added a commit that referenced this pull request Apr 8, 2024
This was introduced in commit 8a4aa03 of PR #278 contribution as part
of the #279 fix. The contributor thought that the generated closure
class entries were never used, but in fact AJDT class OSGiWeavingAdaptor
relies on the presence of those entries.

To the best of my present knowledge, it looks as if this change was the
root cause of eclipse-aspectj/ajdt#57.
Therefore, I reverted it, simultaneously refactoring Iterator::remove
usage to delete entries from the map to Collection::removeIf.

Signed-off-by: Alexander Kriegisch <Alexander@Kriegisch.name>
kriegaex added a commit that referenced this pull request Apr 8, 2024
This was introduced in commit 8a4aa03 of PR #278 contribution as part
of the #279 fix. The contributor thought that the generated closure
class entries were never used, but in fact AJDT class OSGiWeavingAdaptor
relies on the presence of those entries.

To the best of my present knowledge, it looks as if this change was the
root cause of eclipse-aspectj/ajdt#57.
Therefore, I reverted it, simultaneously refactoring Iterator::remove
usage to delete entries from the map to Collection::removeIf.

Signed-off-by: Alexander Kriegisch <Alexander@Kriegisch.name>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants