Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Constant References in Compiler IL #16616

Open
jdmpapin opened this issue Jan 27, 2023 · 5 comments
Open

Constant References in Compiler IL #16616

jdmpapin opened this issue Jan 27, 2023 · 5 comments
Assignees

Comments

@jdmpapin
Copy link
Contributor

Motivation

First, for the purpose of the following examples, let's assume that in principle it's possible to constant-fold anything that's considered final. The reality is more complex, but this will be a useful shorthand. With that out of the way...

When the JIT compiler finds a constant integral expression, it simply replaces it with the constant value, just as one would expect, e.g.

iload NUM_LUFTBALLONS I (static final)

becomes

iconst 99

However, for a constant reference-typed expression, assuming the value is non-null, the compiler has (so far) no constant operation with which to replace it. Instead, the original expression is left in-place and it gets annotated with additional information telling us which object is the result, e.g.

aload Unsafe.theUnsafe LUnsafe; (static final)

remains, but gets a known object index attached to it:

aload Unsafe.theUnsafe LUnsafe; (static final) (obj1)

Consider a larger expression that has been annotated in this way:

n5n     aloadi Quux.xyzzy LXyzzy; (final) (obj5)
n4n       aloadi Baz.quux LQuux; (final) (obj4)
n3n         aloadi Bar.baz LBaz; (final) (obj3)
n2n           aload Foo.BAR LBar; (static final) (obj2)

If the only use of these objects is to feed into further loads that get (actually) constant-folded, then everything is fine. For example, if the following is the only use,

        iloadi Xyzzy.answer I (final)
n5n       ==>aloadi (obj5)

then it will be folded, at which point the loads can be eliminated, resulting in just the integral constant:

iconst 42

On the other hand, if for some reason one of the references is actually needed at runtime - for example, to mutate it, to synchronize on it, to store it somewhere, etc. - there are some problems with this approach. For the sake of argument, let's assume that obj5 is needed at runtime, and that each of the others is only needed for this chain of loads leading to obj5.

The most obvious problem is with performance. We only need obj5, but to get it at runtime we have to run a series of four dependent loads, even though we already know which object we want. Ideally, we would be able to get hold of obj5 directly.

It's likely that HotSpot is not affected by this same inefficiency, in which case it will be necessary to address it in order to close performance gaps in some scenarios.

I have hacked together a preliminary prototype just for the purpose of gauging the performance benefit. So far I've tried it in a troublesome JavaScript microbenchmark, where it provided roughly a 15% reduction in time per operation.

The other problem is that if any of the fields are mutable, and if they are later modified, then by repeating the loads at runtime we open ourselves to the possibility that the result of the load isn't the expected reference. This has two ramifications:

  1. The current approach cannot be used for reference-typed fields that we know to be mutable, but for which true constant folding would nonetheless be allowed. This affects at least MethodHandle.form and LambdaForm.vmentry, which I have had to work around with Make the JIT use a single consistent vmentry for each MethodHandle #14172.
  2. Such mutations effectively result in undefined behaviour (UB), since the JIT compiler assumes they won't happen, and therefore their consequences can be arbitrary. For example, if the compiler sees that obj5 is a particular type, e.g. FrobnicatingXyzzy, then it can generate code that accesses the reference it finds at runtime as though it's a FrobnicatingXyzzy even if it isn't one, say because a checkcast was removed, or because a particular method implementation was inlined.

This last point about UB is especially troublesome considering that the rules for what can and cannot be treated as constant by the compiler are not clearly specified. Instead, we try to suss out what we can get away with in a way that is part motivated reasoning and part trial and error. Getting it wrong means that something we thought was constant can actually be mutated, and whenever that happens, we get UB.

Solution

Create known object constants in the IL, e.g. aknown obj5, the implementation of which is guaranteed to produce the expected object reference at runtime.

Because aknown obj5 is guaranteed to produce obj5, constant folding something that turns out to be mutated later does not in itself cause undefined behaviour. Instead it causes the program to misbehave in a more easily understandable way: a load will sometimes remember an earlier value and ignore subsequent updates. This is probably the failure mode that compiler developers would intuitively expect to occur when something has been incorrectly constant-folded. And while this can still cause mayhem, that mayhem can mostly be conceptualized at source/bytecode level. There is no way for it to (directly) cause out of bounds accesses to occur, for example.

Furthermore, in cases where we would like to intentionally fold despite mutability, aknown would allow us to do so straightforwardly. The aforementioned workaround would no longer be necessary, since MethodHandle.form and LambdaForm.vmentry (and anything of the sort) would be handled naturally.

Finally, for performance, aknown obj5 would remove the chain of dependent loads in the motivating example above by directly producing a reference to obj5 and leaving the other references unused. At runtime, only one load is required to obtain the reference.

IL Representation

The JIT compiler already has aconst, which is different in that its constant value is a fixed pointer to a non-movable object (or null). It might not be straightforward to treat a new opcode aknown as a constant in quite the same way as other constant opcodes (i.e. satisfying isLoadConst()), since parts of the compiler likely assume that isAddress() && isLoadConst() implies that the opcode is aconst.

Additionally, if a new opcode aknown does not satisfy isLoadConst(), then parts of the compiler that need to identify common subexpressions (e.g. CSE, PRE, loop versioner) may incorrectly ignore the known object index and incorrectly treat different constant references as equivalent. Those would need to be corrected.

So while a new aknown opcode might be the most natural expression of this concept in the IL - and as such it's the way that the idea is presented above - it may be less disruptive to represent these constants in the IL as loads of special known object symbol references.

A load-based representation would also have the advantage that nearly every part of the compiler that already consumes known object information gets it from the symbol reference and would already know how to interpret these constants. The only differences would be:

  • Instead of "improving" the symbol reference of a node and otherwise leaving it as-is, we would replace it with or transmute it into one of these special loads; and
  • The known object information obtained from the symbol reference cannot ever turn out later to have been incorrect.

Code Generation

To generate code for a constant known object node, it suffices to:

  1. Create a fresh reference with the same lifetime as the JIT body,
  2. Initialize the reference to point to the appropriate known object, and
  3. Generate a load from the reference.

Because the fresh reference is not used for any other purpose, nothing will ever mutate it. Of course, only one such reference is needed for each known object in a given compilation, regardless of how many times it occurs.

Ideally the reference should be located somewhere where the JIT-compiled code can most easily load it, e.g. without needing to materialize a full arbitrary 64-bit address. Placing it as data within the JIT body itself (as in data snippets) should generally allow for PC-relative addressing. Assuming that PC-relative addressing is used, the load would naturally be position-independent code that therefore does not need a relocation if/when AOT supports known objects in the future. (Though a relocation would be still be necessary to initialize the reference itself on load.)

GC and Unloading

If a JIT body holds on to a reference because it was constant-folded, the referent may directly or indirectly prevent other memory from being reclaimed, including by preventing code from being unloaded. For example, consider a JIT body B for a method M defined by a class loader L, and suppose that B has a constant reference that refers to an object x. Suppose also that there is a path from x to a different class loader L', e.g. because x is an instance of a class defined by L', or because it has a reference to another such object, etc. And finally, suppose that every path showing that L' is live goes through the reference from B to x, which I will write as Bx. In this situation, it would be possible to unload L if not for Bx.

(More generally, for a given L', there could be any number of loaders LL', methods M, JIT bodies B, and constant referents x satisfying the above, with one caveat: it's sufficient for each path to L' to go through some such Bx. They need not necessarily all go through a single common such Bx. Regardless, for the sake of exposition it's more straightforward to think of the scenario of a single combination of L, M, B, and x described above.)

As long as it remains possible to run the body B, we need to hold x, and therefore also L', live. Often, this should be fine. We've found x based on the code of M, and if we have a good excuse for holding on to x, then that's also a good excuse for holding onto L'. So far so good.

However, things are more complex if cross-loader inlining has occurred. For instance, what if B has inlined another method N defined by a loader other than L, and x was found based on the code of N rather than that of M? Let's consider the case where N was defined by L' specifically. We know that it's impossible to newly enter into the inlined body of N, since otherwise there would be something other than Bx preventing L' from being unloaded. Since execution won't enter into the inlined body of N, it won't make use of Bx, so L' is held live just because it's currently loaded, which is to say, for no good reason. And we must not prevent unloading without a good reason, since that can cause out-of-memory conditions.

So that places constraints on the reference Bx:

  • It must not be a strong root, since that would prevent unloading L'.
  • It must not be a weak root, since x must remain live as long as it's still possible to run B.
  • Therefore, it must not be a root at all. In particular, this means that we can't simply have the JIT compiler create JNI refs for constant references (though doing so may be a useful step in the implementation). Some GC integration is necessary so that Bx will be scanned only when B is live.

Furthermore, today (without the reference Bx existing) we can unload L' and still keep making use of B afterward despite the fact that N was inlined. That is, barring invalidation for some other reason, B lives just as long as L. If with constant references B would continue to live just as long as L, then so would Bx, once again preventing unloading of L'. As such, I propose that we no longer allow B to remain in use after L' is unloaded, so the GC can treat it as dead unless both L and L' are live.

This results in the following design:

  • For each JIT body, the JIT compiler produces some metadata to describe its (constant) references to the GC. (Details TBD.)
  • The GC scans a JIT body for outgoing references if and only if it determines that the JIT body is live.
  • A JIT body is live if and only if either:
    • It's currently on the stack, or
    • Both:
      • It hasn't been invalidated, and
      • The class loaders of the outermost method and the inlined methods are all live.
  • At the end of the GC cycle, before threads are allowed to continue, we invalidate all JIT bodies belonging to live loaders that contain code inlined from a loader that has just been unloaded. (There is no need for any special handling of JIT bodies whose outermost methods have been unloaded.)
  • JIT bodies that have been invalidated in this way should be queued for recompilation, similar to preexistence invalidation.

Note that (as far as I have been able to tell) a JIT body on the stack already prevents unloading of all of the loaders of the methods inlined into it. And just in case I'm mistaken, it should already do that even though we don't yet have constant references. It's important for the correctness of the generated code in cases where the compiler has been able to propagate a constant class pointer forward out of an inlined body and into the surrounding code, since that surrounding code might still be running.

An alternative to the above design might be to go on allowing B to be used after unloading L', but to have the JIT compiler tell the GC which inlined methods use which constant references. Then the outgoing references could be scanned at a finer grain, so that e.g. if only the inlined body of N needs Bx, then Bx is scanned only when L' is live, and even if L' is dead, other references outgoing from B may be scanned. However, tracking and describing the provenance of all of the constant references would be complex and error-prone, and on top of that there would be complications affecting code motion.

Possible gencon Optimization

Since JIT bodies are generally long-lived, and since their outgoing references are immutable, the referents are also long-lived. Furthermore, in a minor GC cycle no classes are unloaded, so all JIT bodies must be treated as live (except those that have been previously invalidated). To avoid needing to scan all JIT bodies in every minor GC cycle, it may make sense to treat them in a way similar to tenured objects. That is, expect them usually not to refer to nursery objects, and keep a remembered list of ones that do. (For regular tenured objects, we remember those that might refer to nursery objects, but because these references are immutable, we can know more precisely for a JIT body.) Any JIT body that is found to refer to a nursery object at time of creation would need to be remembered immediately. During GC, any JIT body that was scanned and that still refers to a nursery object would need to be remembered again.

Going a bit further, because the referents are long-lived, it might also make sense to have the GC automatically tenure them. Then it wouldn't be necessary to remember the same JIT body again, and no particular JIT body would ever need to be scanned in more than one minor GC cycle.

Other GC Policies

Other GC policies present challenges that I have not yet thought through, e.g.

  • Does this pose a problem for read barriers in concurrent scavenge?
  • Can we hope to eliminate many of those read barriers?
  • How do we adjust for GC policies that allow unloading during any GC cycle, such as balanced?

I expect (and hope) that these aren't too hard to figure out.

Preexisting Unloading-Related Problems

Over the years the JIT compiler has had (and to a smaller degree continues to have) some recurring problems related to class unloading, in particular when the application repeatedly loads and unloads very many classes:

  • There could be very long pause times as the JIT compiler deleted runtime assumptions that had been registered within defunct JIT bodies; and
  • Because we can't move JIT-compiled code, and therefore we can't compact it, there could be high levels of fragmentation wasting much of the code cache, which could therefore fill up and prevent newly loaded code from being compiled.

There are mitigations in place for both of these problems, but because the proposed high-level design w.r.t. GC causes additional JIT body invalidation when unloading, we want to be cognizant and try to avoid stressing the mitigations too much.

My unsubstantiated expectation is that most JIT bodies will not inline code that could be unloaded independently from the caller. It isn't really possible for us to determine whether this is true of applications that have encountered one or both of the above problems, so the best we can do is proceed with caution unless we have good reason to expect otherwise. So, supposing that it's true enough...

The main concern will be that some methods may repeatedly get invalidated and recompiled, each time inlining code from a different loader that is later unloaded. To prevent one or a few such methods from causing an outsize problem, we should limit the number of invalidations and recompilations due to unloading. We can count the invalidations, and when a small limit is reached (perhaps even just 1 or 2), forbid later compilations of the same method from inlining code from loaders that may be unloaded independently.

Note that it's possible to repeatedly load and unload methods that are effectively the same each time, but which from the perspective of the VM are distinct. This could allow a sort of "circumvention" of the invalidation/recompilation limit, allowing for an arbitrary number of compilations of what is more or less the same thing. However, that situation is already possible now. Further, the proposed limit would still apply to each "instance" of the method individually, so if it is a problem method that gets invalidated, the limit would still ensure that the number of compilations of it does not grow by more than a small constant factor.

@jdmpapin
Copy link
Contributor Author

@dsouzai
Copy link
Contributor

dsouzai commented Jan 27, 2023

An alternative to the above design might be to go on allowing B to be used after unloading L', but to have the JIT compiler tell the GC which inlined methods use which constant references. Then the outgoing references could be scanned at a finer grain, so that e.g. if only the inlined body of N needs Bx, then Bx is scanned only when L' is live, and even if L' is dead, other references outgoing from B may be scanned. However, tracking and describing the provenance of all of the constant references would be complex and error-prone, and on top of that there would be complications affecting code motion.

A couple of thoughts on this alternative as this was the first thing I thought of:

In terms of keeping track of the provenance, it seems like it should be relatively straightforward considering one could just have a (yet another) table in the J9JITExceptionTable, where each entry contains the inlined site index and then the constant reference. However, I haven't thought deeply on this so I won't make any claims about the complexity.

In terms of the complications affected code motion, why wouldn't the guard for the inlined method be sufficient? That is, shouldn't a guard entirely protect the region it covers? At the moment (where we don't have constant references), how are we able to deal with code motion when an inlined method is invalidated due to its class getting unloaded; for example, how do we deal with the fact that a constant class pointer was propagated forward out of an inlined body that was invalidated due to class unloading (let's say the method was not on the stack when the unloading occurred)?

@jdmpapin
Copy link
Contributor Author

In terms of keeping track of the provenance, it seems like it should be relatively straightforward considering one could just have a (yet another) table in the J9JITExceptionTable, where each entry contains the inlined site index and then the constant reference. However, I haven't thought deeply on this so I won't make any claims about the complexity.

Yeah, we could produce a description of the provenance sort of like that, but I think it would be a bit more complex. I doubt we'd want to tie this to particular inlined methods (or worse, particular inlined sites), when it only matters at class loader granularity. Furthermore, the loader/reference relationship is many-to-many. (The same applies to the inlined method/reference or inlined site/reference relationships, if we were to insist on those.)

The real trouble though is not so much describing the provenance as tracking it accurately at compile time. We'd need to be certain to ascribe the correct provenance whenever we discover a known object from scratch (i.e. from a static), and then we'd need to propagate that provenance in the appropriate way as we discover new known objects based on earlier ones. We'd have to properly take care of those many-to-many relationships as we propagate, and sometimes a newly found known object will have actually been seen before, so we'd have to be sure to always combine the old and new provenance correctly.

That strategy would also further complicate GC, since we couldn't simply conceptualize of JIT bodies being live or dead, with live JIT bodies needing to be scanned. Instead we'd have to think of (JIT body, loader) pairs as being live or dead. Live JIT bodies would need to be scanned multiple times - up to once per loader - and for each body the GC would have to track w.r.t. which loaders it has already been scanned.

So those are the two aspects that would really be error-prone IMO.

In terms of the complications affected code motion, why wouldn't the guard for the inlined method be sufficient? That is, shouldn't a guard entirely protect the region it covers? At the moment (where we don't have constant references), how are we able to deal with code motion when an inlined method is invalidated due to its class getting unloaded; for example, how do we deal with the fact that a constant class pointer was propagated forward out of an inlined body that was invalidated due to class unloading (let's say the method was not on the stack when the unloading occurred)?

Sorry, code motion usually means speculative evaluation, not constant propagation. I'll answer this first and then talk about the problem with speculative evaluation.

In order to propagate a constant class pointer forward out of an inlined body, the part of the surrounding code into which we propagate it has to be dominated by the inlined body. This arrangement can happen in a few ways, e.g. OSR on guard failure, block splitter, virtual guard tail splitter, loop replicator, guard versioning. So let's assume there was an opportune GC cycle during which the JIT body was not on the stack, and that we were therefore able to unload the inlined method in question. Clearly the inlined body needs to be unreachable. Then since the JIT body wasn't on the stack, after the GC cycle that did the unloading, the only way for execution to reach any part of the JIT body is through the entry point. So then the point of occurrence of the propagated constant class pointer is also unreachable, because it's dominated by the unreachable inlined body.

(Maybe technically I should back off this part a bit: "has to be dominated by the inlined body." In full generality, propagating a constant only requires that no matter where we got a value, we can tell that it's always the same constant value. For example, for an auto that would mean that all reaching definitions provide the same constant value. Regardless, it's the same kind of story. If the class is unloaded, the original points of occurrence will all have to be unreachable, and if we need to go through one of those to reach the new point of occurrence, then it's unreachable too.)

As for the code motion that I meant, consider loop versioner (which is the code motion pass I'm most familiar with). It sees some invariant expression in the loop, and it hoists it out before the loop, where it will be evaluated speculatively. That is, it will sometimes be evaluated ahead of the loop even if the loop itself won't evaluate the original expression. If the expression is within an inlined method, this might move the evaluation out of the part of the code protected by the guard. And then in particular, if the inlined method is unloaded, we might still evaluate the hoisted expression even though the guard will necessarily fail.

One example of something we can version is a profiled guard with VFT test, which oddly enough, in order to protect the inlined body, itself needs access to a class that might in general be unloaded. We deal with this by registering a runtime assumption that will patch the code to replace the class pointer with -1. Since no actual receiver will have a class pointer equal to -1, the guard will fail and send execution to the cold loop.

We can also version checkcast by generating an instanceof. I don't know that we create the runtime assumption in this case, since instanceof would need to be able to deal with a -1. Loop versioner tries to avoid versioning checkcast if the class could get unloaded because at one point class unloading caused a crash within such an instanceof. However, I doubt that versioner is therefore immune to this kind of problem today. For one, it can generate instanceof for other reasons, e.g. to check that a hoisted field load is safe. It's also possible that a hoisted expression could contain a loop-invariant but not necessarily constant load relative to a constant class, e.g. aloadi <javaLangClassFromClass> (loadaddr C) would be such a load today due to our lack of constant references.

So I think that even without constant references, my proposal to invalidate JIT bodies that have inlined unloaded methods would still have benefits in the compiler today when it comes to code motion involving constant class pointers. It might fix some current bugs. It would simplify the conditions under which such code motion can be done, since the possibility of unloading would no longer be a concern at all. It would also let us optimize more aggressively since there's no need to back off of due to unloading-related correctness constraints. And we could get rid of the runtime assumptions I mentioned in profiled guards.

@jdmpapin
Copy link
Contributor Author

Once this is complete, we should undo any workarounds that were necessitated by the previous approach:

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Mar 7, 2024

From @amicic on implementation as pertaining to GC:

I believe you should be able to do most of GC changes (which I don't think should be very large - most of it should be outside of GC, and GC simply calling it) with some guidance from me or perhaps Dmitri.

The part about discovering those const references, should be easy (again from GC perspective) - we just need to add another step to class scanning (like there is a step for const pool - look for constantPoolObjectSlotIterator) that will do iteration of JIT bodies and associated reference consts. But that iteration is actually not a real GC work (same as for iteration of const pool, what is core-VM's specific container/iterator). I assume you should be doing this work anyway (even if a GC person ends up hooking this iterator to class scanning).

(Side note: The design I've laid out so far makes scanning JIT bodies a bit more complex, since some JIT bodies belonging to live classes could turn out to be dead. I'm wondering now whether as a simplification we could afford to avoid inlining code that could be unloaded independently from the caller.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants