Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow CodeInstance in Expr(:invoke) #52797

Closed
wants to merge 1 commit into from
Closed

Allow CodeInstance in Expr(:invoke) #52797

wants to merge 1 commit into from

Conversation

Keno
Copy link
Member

@Keno Keno commented Jan 8, 2024

This is a quick experiment to see what it would look like to switch Expr(:invoke) to use CodeInstance rather than MethodInstance. There is some unresolved semantic questions here about whether this is a good idea or if there's some other representation that's better, but discussion might be easier with an implementation.

The larger context here is the question of whether and how to do more general method specialization. I had written some thoughts in [1], but as mentioned there remains ongoing discussion of whether this is the correct direction or not.

[1] https://hackmd.io/@Og2_pcUySm6R_RPqbZ06JA/S1bqP1D_6

@Keno Keno marked this pull request as draft January 8, 2024 00:37
Keno added a commit that referenced this pull request Jan 10, 2024
I was reading this code as part of looking into #52797. To recap, when we serialize
objects for package images, we classify each method as either internal or external,
depending on whether the method extends a function in the ambient set of modules (
system image, other previously loaded package images, etc) or whether it is private
to the module we're currently serializing (in which case we call it internal).

One of my primary confusions in reading this code was that the `new_specializations`
object holds only external code instances in most of the code, but towards the end
of loading, we used to also add any internal code instances that had dependencies on
other external method instances in order to share the code instance validation code
between the two cases.

I don't think this design is that great. Conceptually, internal CodeInstances are
already in the CI cache at the point that validation happens (their max_world is just
set to 1, so they're not eligible to run). We do guard the cache insertion by a check
whether a code instance already exists, which also catches this case, but I think
it's cleaner to just not add any internal code instances to the list and instead
simply re-activate the code instance in situ.

Another issue with the old code that I didn't observe, but I think is possible in theory
is that any CodeInstances that are not part of the cache hierarchy (e.g. because
they were created by external abstract interpreters) should not accidentally get added
to the cache hierarchy by this code path. I think this was possible in the old code,
but I didn't observe it.

With this change, `new_specializations` now always only holds external cis, so rename
it appropriately. While we're at it, also do some other clarity changes.
Keno added a commit that referenced this pull request Jan 11, 2024
I was reading this code as part of looking into #52797. To recap, when we serialize
objects for package images, we classify each method as either internal or external,
depending on whether the method extends a function in the ambient set of modules (
system image, other previously loaded package images, etc) or whether it is private
to the module we're currently serializing (in which case we call it internal).

One of my primary confusions in reading this code was that the `new_specializations`
object holds only external code instances in most of the code, but towards the end
of loading, we used to also add any internal code instances that had dependencies on
other external method instances in order to share the code instance validation code
between the two cases.

I don't think this design is that great. Conceptually, internal CodeInstances are
already in the CI cache at the point that validation happens (their max_world is just
set to 1, so they're not eligible to run). We do guard the cache insertion by a check
whether a code instance already exists, which also catches this case, but I think
it's cleaner to just not add any internal code instances to the list and instead
simply re-activate the code instance in situ.

Another issue with the old code that I didn't observe, but I think is possible in theory
is that any CodeInstances that are not part of the cache hierarchy (e.g. because
they were created by external abstract interpreters) should not accidentally get added
to the cache hierarchy by this code path. I think this was possible in the old code,
but I didn't observe it.

With this change, `new_specializations` now always only holds external cis, so rename
it appropriately. While we're at it, also do some other clarity changes.
@Keno
Copy link
Member Author

Keno commented Jan 12, 2024

Random crashes on this branch appear to be fixed by #52852.

Keno added a commit that referenced this pull request Jan 12, 2024
I was reading this code as part of looking into #52797. To recap, when
we serialize objects for package images, we classify each method as
either internal or external, depending on whether the method extends a
function in the ambient set of modules ( system image, other previously
loaded package images, etc) or whether it is private to the module we're
currently serializing (in which case we call it internal).

One of my primary confusions in reading this code was that the
`new_specializations` object holds only external code instances in most
of the code, but towards the end of loading, we used to also add any
internal code instances that had dependencies on other external method
instances in order to share the code instance validation code between
the two cases.

I don't think this design is that great. Conceptually, internal
CodeInstances are already in the CI cache at the point that validation
happens (their max_world is just set to 1, so they're not eligible to
run). We do guard the cache insertion by a check whether a code instance
already exists, which also catches this case, but I think it's cleaner
to just not add any internal code instances to the list and instead
simply re-activate the code instance in situ.

Another issue with the old code that I didn't observe, but I think is
possible in theory is that any CodeInstances that are not part of the
cache hierarchy (e.g. because they were created by external abstract
interpreters) should not accidentally get added to the cache hierarchy
by this code path. I think this was possible in the old code, but I
didn't observe it.

With this change, `new_specializations` now always only holds external
cis, so rename it appropriately. While we're at it, also do some other
clarity changes.
@Keno
Copy link
Member Author

Keno commented Jan 13, 2024

My general plan here now that this is working to revert the part of this that makes it use CodeInstance by default (so the capability to use CodeInstance is there, but Base doesn't use it yet), experiment with this in external absint and then come back to the decision of what to do in Base later.

@Keno Keno changed the title WIP: Allow CodeInstance in Expr(:invoke) Allow CodeInstance in Expr(:invoke) Jan 24, 2024
@Keno Keno marked this pull request as ready for review January 24, 2024 01:02
@Keno
Copy link
Member Author

Keno commented Feb 11, 2024

Oh, huh, I thought I had merged this. In any case, I'll plan to get this in after #53219, since that is a lot more useful as a separation if this is also in.

jl_value_t *result = invoke(argv[1], &argv[2], nargs - 2, codeinst);
JL_GC_POP();
return result;
}
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fall-through is illegal here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks. The else branch went missing somehow.

This is a quick experiment to see what it would look like to switch
Expr(:invoke) to use CodeInstance rather than MethodInstance. There
is some unresolved semantic questions here about whether this is
a good idea or if there's some other representation that's better,
but discussion might be easier with an implementation.

The larger context here is the question of whether and how to do
more general method specialization. I had written some thoughts
in [1], but as mentioned there remains ongoing discussion of whether
this is the correct direction or not.

[1] https://hackmd.io/@Og2_pcUySm6R_RPqbZ06JA/S1bqP1D_6
return compileable_specialization(ci.def, effects, et, info; compilesig_invokes)
end
add_inlining_backedge!(et, ci.def) # to the dispatch lookup
push!(et.edges, ci.def.def.sig, ci.def) # add_inlining_backedge to the invoke call
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may need fixing now, to match the fix to the other case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this code path never changes the mi, should this call just be deleted then?

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unclear, since the inlining code seems to be required to add an edge here to ci itself, but such an edge is not currently valid to create, mainly due to the unresolved semantic questions mentioned in https://hackmd.io/@Og2_pcUySm6R_RPqbZ06JA/S1bqP1D_6

So yeah, maybe just drop it, since this PR is just adding the functionality, but not defining the behavior yet?

@vtjnash
Copy link
Sponsor Member

vtjnash commented Feb 19, 2024

Can you add a test to make sure that any use of this feature will triggers setting ci.relocatable = 0 when stored into a containing CodeInstance? Otherwise these seem likely to trigger problems, since this PR tries to add these to the Module.roots array, and that can trigger issues when those get copied during that serialization process.

Keno added a commit that referenced this pull request May 3, 2024
This changes the canonical source of truth for va handling from
`Method` to `CodeInfo`. There are multiple goals for this change:

1. This addresses a longstanding complaint about the way that
   CodeInfo-returning generated functions work. Previously, the
   va-ness or not of the returned CodeInfo always had to match that
   of the generator. For Cassette-like transforms that generally have
   one big generator function that is varargs (while then looking up
   lowered code that is not varargs), this could become quite annoying.
   It's possible to workaround, but there is really no good reason to tie
   the two together. As we observed when we implemented OpaqueClosures, the
   vararg-ness of the signature and the `vararg arguments`->`tuple`
   transformation are mostly independent concepts. With this PR, generated
   functions can return CodeInfos with whatever combination of nargs/isva
   is convenient.

2. This change requires clarifying where the va processing boundary is
   in inference. #54076 was already moving in that direction for irinterp,
   and this essentially does much of the same for regular inference. As a
   consequence the constprop cache is now using non-va-cooked signatures,
   which I think is preferable.

3. This further decouples codegen from the presence of a `Method` (which
   is already not assumed, since the code being generated could be a
   toplevel thunk, but some codegen features are only available to things
   that come from Methods). There are a number of upcoming features that
   will require codegen of things that are not quite method specializations
   (See design doc linked in #52797 and things like #50641). This helps
   pave the road for that.

4. I've previously considered expanding the kinds of vararg signatures that
   can be described (see e.g. #53851), which also requires a decoupling of
   the signature and ast notions of vararg. This again lays the groundwork
   for that, although I have no immediate plans to implement this change.

Impact wise, this adds an internal field, which is not too breaking,
but downstream clients vary in how they construct their `CodeInfo`s and
the current way they're doing it will likely be incorrect after this change,
so they will require a small two-line adjustment. We should perhaps consider
pulling out some of the more common patterns into a more stable package, since
interface in most of the last few releases, but that's a separate issue.
Keno added a commit that referenced this pull request May 3, 2024
This changes the canonical source of truth for va handling from
`Method` to `CodeInfo`. There are multiple goals for this change:

1. This addresses a longstanding complaint about the way that
   CodeInfo-returning generated functions work. Previously, the
   va-ness or not of the returned CodeInfo always had to match that
   of the generator. For Cassette-like transforms that generally have
   one big generator function that is varargs (while then looking up
   lowered code that is not varargs), this could become quite annoying.
   It's possible to workaround, but there is really no good reason to tie
   the two together. As we observed when we implemented OpaqueClosures, the
   vararg-ness of the signature and the `vararg arguments`->`tuple`
   transformation are mostly independent concepts. With this PR, generated
   functions can return CodeInfos with whatever combination of nargs/isva
   is convenient.

2. This change requires clarifying where the va processing boundary is
   in inference. #54076 was already moving in that direction for irinterp,
   and this essentially does much of the same for regular inference. As a
   consequence the constprop cache is now using non-va-cooked signatures,
   which I think is preferable.

3. This further decouples codegen from the presence of a `Method` (which
   is already not assumed, since the code being generated could be a
   toplevel thunk, but some codegen features are only available to things
   that come from Methods). There are a number of upcoming features that
   will require codegen of things that are not quite method specializations
   (See design doc linked in #52797 and things like #50641). This helps
   pave the road for that.

4. I've previously considered expanding the kinds of vararg signatures that
   can be described (see e.g. #53851), which also requires a decoupling of
   the signature and ast notions of vararg. This again lays the groundwork
   for that, although I have no immediate plans to implement this change.

Impact wise, this adds an internal field, which is not too breaking,
but downstream clients vary in how they construct their `CodeInfo`s and
the current way they're doing it will likely be incorrect after this change,
so they will require a small two-line adjustment. We should perhaps consider
pulling out some of the more common patterns into a more stable package, since
interface in most of the last few releases, but that's a separate issue.
Keno added a commit that referenced this pull request May 5, 2024
This changes the canonical source of truth for va handling from
`Method` to `CodeInfo`. There are multiple goals for this change:

1. This addresses a longstanding complaint about the way that
   CodeInfo-returning generated functions work. Previously, the
   va-ness or not of the returned CodeInfo always had to match that
   of the generator. For Cassette-like transforms that generally have
   one big generator function that is varargs (while then looking up
   lowered code that is not varargs), this could become quite annoying.
   It's possible to workaround, but there is really no good reason to tie
   the two together. As we observed when we implemented OpaqueClosures, the
   vararg-ness of the signature and the `vararg arguments`->`tuple`
   transformation are mostly independent concepts. With this PR, generated
   functions can return CodeInfos with whatever combination of nargs/isva
   is convenient.

2. This change requires clarifying where the va processing boundary is
   in inference. #54076 was already moving in that direction for irinterp,
   and this essentially does much of the same for regular inference. As a
   consequence the constprop cache is now using non-va-cooked signatures,
   which I think is preferable.

3. This further decouples codegen from the presence of a `Method` (which
   is already not assumed, since the code being generated could be a
   toplevel thunk, but some codegen features are only available to things
   that come from Methods). There are a number of upcoming features that
   will require codegen of things that are not quite method specializations
   (See design doc linked in #52797 and things like #50641). This helps
   pave the road for that.

4. I've previously considered expanding the kinds of vararg signatures that
   can be described (see e.g. #53851), which also requires a decoupling of
   the signature and ast notions of vararg. This again lays the groundwork
   for that, although I have no immediate plans to implement this change.

Impact wise, this adds an internal field, which is not too breaking,
but downstream clients vary in how they construct their `CodeInfo`s and
the current way they're doing it will likely be incorrect after this change,
so they will require a small two-line adjustment. We should perhaps consider
pulling out some of the more common patterns into a more stable package, since
interface in most of the last few releases, but that's a separate issue.
Keno added a commit that referenced this pull request May 5, 2024
This changes the canonical source of truth for va handling from
`Method` to `CodeInfo`. There are multiple goals for this change:

1. This addresses a longstanding complaint about the way that
   CodeInfo-returning generated functions work. Previously, the
   va-ness or not of the returned CodeInfo always had to match that
   of the generator. For Cassette-like transforms that generally have
   one big generator function that is varargs (while then looking up
   lowered code that is not varargs), this could become quite annoying.
   It's possible to workaround, but there is really no good reason to tie
   the two together. As we observed when we implemented OpaqueClosures, the
   vararg-ness of the signature and the `vararg arguments`->`tuple`
   transformation are mostly independent concepts. With this PR, generated
   functions can return CodeInfos with whatever combination of nargs/isva
   is convenient.

2. This change requires clarifying where the va processing boundary is
   in inference. #54076 was already moving in that direction for irinterp,
   and this essentially does much of the same for regular inference. As a
   consequence the constprop cache is now using non-va-cooked signatures,
   which I think is preferable.

3. This further decouples codegen from the presence of a `Method` (which
   is already not assumed, since the code being generated could be a
   toplevel thunk, but some codegen features are only available to things
   that come from Methods). There are a number of upcoming features that
   will require codegen of things that are not quite method specializations
   (See design doc linked in #52797 and things like #50641). This helps
   pave the road for that.

4. I've previously considered expanding the kinds of vararg signatures that
   can be described (see e.g. #53851), which also requires a decoupling of
   the signature and ast notions of vararg. This again lays the groundwork
   for that, although I have no immediate plans to implement this change.

Impact wise, this adds an internal field, which is not too breaking,
but downstream clients vary in how they construct their `CodeInfo`s and
the current way they're doing it will likely be incorrect after this change,
so they will require a small two-line adjustment. We should perhaps consider
pulling out some of the more common patterns into a more stable package, since
interface in most of the last few releases, but that's a separate issue.
Keno added a commit that referenced this pull request May 6, 2024
This changes the canonical source of truth for va handling from
`Method` to `CodeInfo`. There are multiple goals for this change:

1. This addresses a longstanding complaint about the way that
   CodeInfo-returning generated functions work. Previously, the
   va-ness or not of the returned CodeInfo always had to match that
   of the generator. For Cassette-like transforms that generally have
   one big generator function that is varargs (while then looking up
   lowered code that is not varargs), this could become quite annoying.
   It's possible to workaround, but there is really no good reason to tie
   the two together. As we observed when we implemented OpaqueClosures, the
   vararg-ness of the signature and the `vararg arguments`->`tuple`
   transformation are mostly independent concepts. With this PR, generated
   functions can return CodeInfos with whatever combination of nargs/isva
   is convenient.

2. This change requires clarifying where the va processing boundary is
   in inference. #54076 was already moving in that direction for irinterp,
   and this essentially does much of the same for regular inference. As a
   consequence the constprop cache is now using non-va-cooked signatures,
   which I think is preferable.

3. This further decouples codegen from the presence of a `Method` (which
   is already not assumed, since the code being generated could be a
   toplevel thunk, but some codegen features are only available to things
   that come from Methods). There are a number of upcoming features that
   will require codegen of things that are not quite method specializations
   (See design doc linked in #52797 and things like #50641). This helps
   pave the road for that.

4. I've previously considered expanding the kinds of vararg signatures that
   can be described (see e.g. #53851), which also requires a decoupling of
   the signature and ast notions of vararg. This again lays the groundwork
   for that, although I have no immediate plans to implement this change.

Impact wise, this adds an internal field, which is not too breaking,
but downstream clients vary in how they construct their `CodeInfo`s and
the current way they're doing it will likely be incorrect after this change,
so they will require a small two-line adjustment. We should perhaps consider
pulling out some of the more common patterns into a more stable package, since
interface in most of the last few releases, but that's a separate issue.
Keno added a commit that referenced this pull request May 7, 2024
This changes the canonical source of truth for va handling from
`Method` to `CodeInfo`. There are multiple goals for this change:

1. This addresses a longstanding complaint about the way that
   CodeInfo-returning generated functions work. Previously, the
   va-ness or not of the returned CodeInfo always had to match that
   of the generator. For Cassette-like transforms that generally have
   one big generator function that is varargs (while then looking up
   lowered code that is not varargs), this could become quite annoying.
   It's possible to workaround, but there is really no good reason to tie
   the two together. As we observed when we implemented OpaqueClosures, the
   vararg-ness of the signature and the `vararg arguments`->`tuple`
   transformation are mostly independent concepts. With this PR, generated
   functions can return CodeInfos with whatever combination of nargs/isva
   is convenient.

2. This change requires clarifying where the va processing boundary is
   in inference. #54076 was already moving in that direction for irinterp,
   and this essentially does much of the same for regular inference. As a
   consequence the constprop cache is now using non-va-cooked signatures,
   which I think is preferable.

3. This further decouples codegen from the presence of a `Method` (which
   is already not assumed, since the code being generated could be a
   toplevel thunk, but some codegen features are only available to things
   that come from Methods). There are a number of upcoming features that
   will require codegen of things that are not quite method specializations
   (See design doc linked in #52797 and things like #50641). This helps
   pave the road for that.

4. I've previously considered expanding the kinds of vararg signatures that
   can be described (see e.g. #53851), which also requires a decoupling of
   the signature and ast notions of vararg. This again lays the groundwork
   for that, although I have no immediate plans to implement this change.

Impact wise, this adds an internal field, which is not too breaking,
but downstream clients vary in how they construct their `CodeInfo`s and
the current way they're doing it will likely be incorrect after this change,
so they will require a small two-line adjustment. We should perhaps consider
pulling out some of the more common patterns into a more stable package, since
interface in most of the last few releases, but that's a separate issue.
@Keno
Copy link
Member Author

Keno commented May 7, 2024

Closing this in favor of #54373

@Keno Keno closed this May 7, 2024
Keno added a commit that referenced this pull request May 7, 2024
This changes the canonical source of truth for va handling from
`Method` to `CodeInfo`. There are multiple goals for this change:

1. This addresses a longstanding complaint about the way that
   CodeInfo-returning generated functions work. Previously, the
   va-ness or not of the returned CodeInfo always had to match that
   of the generator. For Cassette-like transforms that generally have
   one big generator function that is varargs (while then looking up
   lowered code that is not varargs), this could become quite annoying.
   It's possible to workaround, but there is really no good reason to tie
   the two together. As we observed when we implemented OpaqueClosures, the
   vararg-ness of the signature and the `vararg arguments`->`tuple`
   transformation are mostly independent concepts. With this PR, generated
   functions can return CodeInfos with whatever combination of nargs/isva
   is convenient.

2. This change requires clarifying where the va processing boundary is
   in inference. #54076 was already moving in that direction for irinterp,
   and this essentially does much of the same for regular inference. As a
   consequence the constprop cache is now using non-va-cooked signatures,
   which I think is preferable.

3. This further decouples codegen from the presence of a `Method` (which
   is already not assumed, since the code being generated could be a
   toplevel thunk, but some codegen features are only available to things
   that come from Methods). There are a number of upcoming features that
   will require codegen of things that are not quite method specializations
   (See design doc linked in #52797 and things like #50641). This helps
   pave the road for that.

4. I've previously considered expanding the kinds of vararg signatures that
   can be described (see e.g. #53851), which also requires a decoupling of
   the signature and ast notions of vararg. This again lays the groundwork
   for that, although I have no immediate plans to implement this change.

Impact wise, this adds an internal field, which is not too breaking,
but downstream clients vary in how they construct their `CodeInfo`s and
the current way they're doing it will likely be incorrect after this change,
so they will require a small two-line adjustment. We should perhaps consider
pulling out some of the more common patterns into a more stable package, since
interface in most of the last few releases, but that's a separate issue.
@aviatesk aviatesk deleted the kf/codeinstwip branch May 8, 2024 10:04
Keno added a commit that referenced this pull request May 11, 2024
This changes the canonical source of truth for va handling from `Method`
to `CodeInfo`. There are multiple goals for this change:

1. This addresses a longstanding complaint about the way that
CodeInfo-returning generated functions work. Previously, the va-ness or
not of the returned CodeInfo always had to match that of the generator.
For Cassette-like transforms that generally have one big generator
function that is varargs (while then looking up lowered code that is not
varargs), this could become quite annoying. It's possible to workaround,
but there is really no good reason to tie the two together. As we
observed when we implemented OpaqueClosures, the vararg-ness of the
signature and the `vararg arguments`->`tuple` transformation are mostly
independent concepts. With this PR, generated functions can return
CodeInfos with whatever combination of nargs/isva is convenient.

2. This change requires clarifying where the va processing boundary is
in inference. #54076 was already moving in that direction for irinterp,
and this essentially does much of the same for regular inference. As a
consequence the constprop cache is now using non-va-cooked signatures,
which I think is preferable.

3. This further decouples codegen from the presence of a `Method` (which
is already not assumed, since the code being generated could be a
toplevel thunk, but some codegen features are only available to things
that come from Methods). There are a number of upcoming features that
will require codegen of things that are not quite method specializations
(See design doc linked in #52797 and things like #50641). This helps
pave the road for that.

4. I've previously considered expanding the kinds of vararg signatures
that can be described (see e.g. #53851), which also requires a
decoupling of the signature and ast notions of vararg. This again lays
the groundwork for that, although I have no immediate plans to implement
this change.

Impact wise, this adds an internal field, which is not too breaking, but
downstream clients vary in how they construct their `CodeInfo`s and the
current way they're doing it will likely be incorrect after this change,
so they will require a small two-line adjustment. We should perhaps
consider pulling out some of the more common patterns into a more stable
package, since interface in most of the last few releases, but that's a
separate issue.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants