Allow CodeInstance in Expr(:invoke) #52797

Keno · 2024-01-08T00:36:52Z

This is a quick experiment to see what it would look like to switch Expr(:invoke) to use CodeInstance rather than MethodInstance. There is some unresolved semantic questions here about whether this is a good idea or if there's some other representation that's better, but discussion might be easier with an implementation.

The larger context here is the question of whether and how to do more general method specialization. I had written some thoughts in [1], but as mentioned there remains ongoing discussion of whether this is the correct direction or not.

[1] https://hackmd.io/@Og2_pcUySm6R_RPqbZ06JA/S1bqP1D_6

I was reading this code as part of looking into #52797. To recap, when we serialize objects for package images, we classify each method as either internal or external, depending on whether the method extends a function in the ambient set of modules ( system image, other previously loaded package images, etc) or whether it is private to the module we're currently serializing (in which case we call it internal). One of my primary confusions in reading this code was that the `new_specializations` object holds only external code instances in most of the code, but towards the end of loading, we used to also add any internal code instances that had dependencies on other external method instances in order to share the code instance validation code between the two cases. I don't think this design is that great. Conceptually, internal CodeInstances are already in the CI cache at the point that validation happens (their max_world is just set to 1, so they're not eligible to run). We do guard the cache insertion by a check whether a code instance already exists, which also catches this case, but I think it's cleaner to just not add any internal code instances to the list and instead simply re-activate the code instance in situ. Another issue with the old code that I didn't observe, but I think is possible in theory is that any CodeInstances that are not part of the cache hierarchy (e.g. because they were created by external abstract interpreters) should not accidentally get added to the cache hierarchy by this code path. I think this was possible in the old code, but I didn't observe it. With this change, `new_specializations` now always only holds external cis, so rename it appropriately. While we're at it, also do some other clarity changes.

Keno · 2024-01-12T02:39:16Z

Random crashes on this branch appear to be fixed by #52852.

I was reading this code as part of looking into #52797. To recap, when we serialize objects for package images, we classify each method as either internal or external, depending on whether the method extends a function in the ambient set of modules ( system image, other previously loaded package images, etc) or whether it is private to the module we're currently serializing (in which case we call it internal). One of my primary confusions in reading this code was that the `new_specializations` object holds only external code instances in most of the code, but towards the end of loading, we used to also add any internal code instances that had dependencies on other external method instances in order to share the code instance validation code between the two cases. I don't think this design is that great. Conceptually, internal CodeInstances are already in the CI cache at the point that validation happens (their max_world is just set to 1, so they're not eligible to run). We do guard the cache insertion by a check whether a code instance already exists, which also catches this case, but I think it's cleaner to just not add any internal code instances to the list and instead simply re-activate the code instance in situ. Another issue with the old code that I didn't observe, but I think is possible in theory is that any CodeInstances that are not part of the cache hierarchy (e.g. because they were created by external abstract interpreters) should not accidentally get added to the cache hierarchy by this code path. I think this was possible in the old code, but I didn't observe it. With this change, `new_specializations` now always only holds external cis, so rename it appropriately. While we're at it, also do some other clarity changes.

Keno · 2024-01-13T23:02:05Z

My general plan here now that this is working to revert the part of this that makes it use CodeInstance by default (so the capability to use CodeInstance is there, but Base doesn't use it yet), experiment with this in external absint and then come back to the decision of what to do in Base later.

Keno · 2024-02-11T23:08:43Z

Oh, huh, I thought I had merged this. In any case, I'll plan to get this in after #53219, since that is a lot more useful as a separation if this is also in.

vtjnash · 2024-02-13T02:26:53Z

src/interpreter.c

+            jl_value_t *result = invoke(argv[1], &argv[2], nargs - 2, codeinst);
+            JL_GC_POP();
+            return result;
+        }


fall-through is illegal here

Good catch, thanks. The else branch went missing somehow.

This is a quick experiment to see what it would look like to switch Expr(:invoke) to use CodeInstance rather than MethodInstance. There is some unresolved semantic questions here about whether this is a good idea or if there's some other representation that's better, but discussion might be easier with an implementation. The larger context here is the question of whether and how to do more general method specialization. I had written some thoughts in [1], but as mentioned there remains ongoing discussion of whether this is the correct direction or not. [1] https://hackmd.io/@Og2_pcUySm6R_RPqbZ06JA/S1bqP1D_6

vtjnash · 2024-02-19T20:14:46Z

base/compiler/ssair/inlining.jl

+        return compileable_specialization(ci.def, effects, et, info; compilesig_invokes)
+    end
+    add_inlining_backedge!(et, ci.def) # to the dispatch lookup
+    push!(et.edges, ci.def.def.sig, ci.def) # add_inlining_backedge to the invoke call


I think this may need fixing now, to match the fix to the other case

Since this code path never changes the mi, should this call just be deleted then?

Unclear, since the inlining code seems to be required to add an edge here to ci itself, but such an edge is not currently valid to create, mainly due to the unresolved semantic questions mentioned in https://hackmd.io/@Og2_pcUySm6R_RPqbZ06JA/S1bqP1D_6

So yeah, maybe just drop it, since this PR is just adding the functionality, but not defining the behavior yet?

vtjnash · 2024-02-19T20:25:31Z

Can you add a test to make sure that any use of this feature will triggers setting ci.relocatable = 0 when stored into a containing CodeInstance? Otherwise these seem likely to trigger problems, since this PR tries to add these to the Module.roots array, and that can trigger issues when those get copied during that serialization process.

This changes the canonical source of truth for va handling from `Method` to `CodeInfo`. There are multiple goals for this change: 1. This addresses a longstanding complaint about the way that CodeInfo-returning generated functions work. Previously, the va-ness or not of the returned CodeInfo always had to match that of the generator. For Cassette-like transforms that generally have one big generator function that is varargs (while then looking up lowered code that is not varargs), this could become quite annoying. It's possible to workaround, but there is really no good reason to tie the two together. As we observed when we implemented OpaqueClosures, the vararg-ness of the signature and the `vararg arguments`->`tuple` transformation are mostly independent concepts. With this PR, generated functions can return CodeInfos with whatever combination of nargs/isva is convenient. 2. This change requires clarifying where the va processing boundary is in inference. #54076 was already moving in that direction for irinterp, and this essentially does much of the same for regular inference. As a consequence the constprop cache is now using non-va-cooked signatures, which I think is preferable. 3. This further decouples codegen from the presence of a `Method` (which is already not assumed, since the code being generated could be a toplevel thunk, but some codegen features are only available to things that come from Methods). There are a number of upcoming features that will require codegen of things that are not quite method specializations (See design doc linked in #52797 and things like #50641). This helps pave the road for that. 4. I've previously considered expanding the kinds of vararg signatures that can be described (see e.g. #53851), which also requires a decoupling of the signature and ast notions of vararg. This again lays the groundwork for that, although I have no immediate plans to implement this change. Impact wise, this adds an internal field, which is not too breaking, but downstream clients vary in how they construct their `CodeInfo`s and the current way they're doing it will likely be incorrect after this change, so they will require a small two-line adjustment. We should perhaps consider pulling out some of the more common patterns into a more stable package, since interface in most of the last few releases, but that's a separate issue.

Keno · 2024-05-07T19:47:29Z

Closing this in favor of #54373

This changes the canonical source of truth for va handling from `Method` to `CodeInfo`. There are multiple goals for this change: 1. This addresses a longstanding complaint about the way that CodeInfo-returning generated functions work. Previously, the va-ness or not of the returned CodeInfo always had to match that of the generator. For Cassette-like transforms that generally have one big generator function that is varargs (while then looking up lowered code that is not varargs), this could become quite annoying. It's possible to workaround, but there is really no good reason to tie the two together. As we observed when we implemented OpaqueClosures, the vararg-ness of the signature and the `vararg arguments`->`tuple` transformation are mostly independent concepts. With this PR, generated functions can return CodeInfos with whatever combination of nargs/isva is convenient. 2. This change requires clarifying where the va processing boundary is in inference. #54076 was already moving in that direction for irinterp, and this essentially does much of the same for regular inference. As a consequence the constprop cache is now using non-va-cooked signatures, which I think is preferable. 3. This further decouples codegen from the presence of a `Method` (which is already not assumed, since the code being generated could be a toplevel thunk, but some codegen features are only available to things that come from Methods). There are a number of upcoming features that will require codegen of things that are not quite method specializations (See design doc linked in #52797 and things like #50641). This helps pave the road for that. 4. I've previously considered expanding the kinds of vararg signatures that can be described (see e.g. #53851), which also requires a decoupling of the signature and ast notions of vararg. This again lays the groundwork for that, although I have no immediate plans to implement this change. Impact wise, this adds an internal field, which is not too breaking, but downstream clients vary in how they construct their `CodeInfo`s and the current way they're doing it will likely be incorrect after this change, so they will require a small two-line adjustment. We should perhaps consider pulling out some of the more common patterns into a more stable package, since interface in most of the last few releases, but that's a separate issue.

Keno marked this pull request as draft January 8, 2024 00:37

Keno mentioned this pull request Jan 10, 2024

staticdata: Some refactors for clarity #52852

Merged

Keno force-pushed the kf/codeinstwip branch from a93074c to 05a1661 Compare January 13, 2024 23:01

Keno force-pushed the kf/codeinstwip branch from 05a1661 to a2b365e Compare January 13, 2024 23:06

Keno force-pushed the kf/codeinstwip branch from a2b365e to 9751fe0 Compare January 24, 2024 01:02

Keno changed the title ~~WIP: Allow CodeInstance in Expr(:invoke)~~ Allow CodeInstance in Expr(:invoke) Jan 24, 2024

Keno marked this pull request as ready for review January 24, 2024 01:02

oscardssmith requested review from aviatesk and vtjnash February 12, 2024 18:42

vtjnash reviewed Feb 13, 2024

View reviewed changes

Keno force-pushed the kf/codeinstwip branch from 9751fe0 to 8924193 Compare February 19, 2024 20:09

vtjnash reviewed Feb 19, 2024

View reviewed changes

vchuravy mentioned this pull request Feb 22, 2024

Use pkgimages for coverage & malloc tracking by ignoring native code in tracked packages #52123

Merged

1 task

Keno mentioned this pull request May 3, 2024

Move nargs/isva to CodeInfo #54341

Merged

Keno mentioned this pull request May 6, 2024

RFC: Refactor MethodInstance to allow for more general specialization #54373

Draft

Keno closed this May 7, 2024

aviatesk deleted the kf/codeinstwip branch May 8, 2024 10:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow CodeInstance in Expr(:invoke) #52797

Allow CodeInstance in Expr(:invoke) #52797

Keno commented Jan 8, 2024

Keno commented Jan 12, 2024

Keno commented Jan 13, 2024

Keno commented Feb 11, 2024

vtjnash Feb 13, 2024

Keno Feb 19, 2024

vtjnash Feb 19, 2024

Keno Feb 19, 2024

vtjnash Feb 19, 2024

vtjnash commented Feb 19, 2024

Keno commented May 7, 2024

Allow CodeInstance in Expr(:invoke) #52797

Allow CodeInstance in Expr(:invoke) #52797

Conversation

Keno commented Jan 8, 2024

Keno commented Jan 12, 2024

Keno commented Jan 13, 2024

Keno commented Feb 11, 2024

vtjnash Feb 13, 2024

Choose a reason for hiding this comment

Keno Feb 19, 2024

Choose a reason for hiding this comment

vtjnash Feb 19, 2024

Choose a reason for hiding this comment

Keno Feb 19, 2024

Choose a reason for hiding this comment

vtjnash Feb 19, 2024

Choose a reason for hiding this comment

vtjnash commented Feb 19, 2024

Keno commented May 7, 2024