GH-132508: Use tagged integers on the evaluation stack for the last instruction offset #132545

Open: wants to merge 11 commits into base: main

markshannon
Member

@markshannon markshannon commented Apr 15, 2025

When reraising in a finally block, the exception needs to look as if it were raised from an earlier point in the code.
To do this we save the earlier instruction offset as an integer on the evaluation stack.
Currently, this requires boxing the integer, which can (extremely rarely) fail.
By using a tagged integer we can avoid that failure mode.
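The user-visible behaviour that this mechanism protects can be sketched in plain Python. This is only an illustration, assuming that the traceback line is the observable effect of restoring the earlier instruction offset; `fails_in_try` and `reported_line_offset` are hypothetical names and are not part of the PR.

```python
import traceback


def fails_in_try():
    try:
        raise ValueError("original failure")  # the line the traceback should report
    finally:
        pass  # cleanup runs here, then the exception is reraised implicitly


def reported_line_offset():
    """Return the traceback line for fails_in_try, relative to its `def` line."""
    try:
        fails_in_try()
    except ValueError as exc:
        # The last traceback entry belongs to the innermost frame, fails_in_try.
        innermost = traceback.extract_tb(exc.__traceback__)[-1]
        return innermost.lineno - fails_in_try.__code__.co_firstlineno
    return None
```

The offset returned points at the `raise` statement (two lines below the `def`), not at the `finally` body that the exception passed through on its way out.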

This might seem like an elaborate fix for a very minor issue, and it is, but we will want tagged integers/pointers for many other things and this is a nice small step to that larger change.
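For readers unfamiliar with the idea, a tagged integer stores the value directly in a machine word and uses a few low bits to mark the word as "not a pointer", so no object has to be allocated. Below is a minimal Python sketch of the encoding. The two-bit tag width and the tag value `0b11` are assumptions made for illustration, and `tag_int`, `is_tagged_int` and `untag_int` are hypothetical helper names, not the C API.

```python
TAG_BITS = 2                     # assumed: two low bits are reserved for the tag
INT_TAG = 0b11                   # assumed: tag value meaning "this word holds an integer"
TAG_MASK = (1 << TAG_BITS) - 1   # mask selecting just the tag bits


def tag_int(value: int) -> int:
    """Pack an integer into a word by shifting it left and setting the tag bits."""
    return (value << TAG_BITS) | INT_TAG


def is_tagged_int(bits: int) -> bool:
    """Check the low bits to see whether the word holds a tagged integer."""
    return (bits & TAG_MASK) == INT_TAG


def untag_int(bits: int) -> int:
    """Recover the original integer; the arithmetic shift also handles negatives."""
    if not is_tagged_int(bits):
        raise ValueError("word does not carry the integer tag")
    return bits >> TAG_BITS
```

Because encoding is just a shift and an OR, it cannot fail the way an allocation can, which is the failure mode the description above wants to remove.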

See #132509

Member

@Fidget-Spinner Fidget-Spinner left a comment

2 comments

@markshannon
Member Author

As expected, performance is neutral.

Member

@brandtbucher brandtbucher left a comment

I believe the magic number needs to be updated (and everything regenerated too) since the exception_unwind label now pushes tagged ints.

@bedevere-app

bedevere-app bot commented Apr 15, 2025

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@markshannon
Member Author

> I believe the magic number needs to be updated (and everything regenerated too) since the exception_unwind label now pushes tagged ints.

Nothing has changed in terms of stack effects. The only change is that we push a tagged int instead of a boxed one.

@markshannon
Member Author

The failures on JIT/ARM64 look like a JIT bug. The assertion errors seem impossible and do not occur on any other platform.
Neither the ARM64 (JIT) interpreter nor the other JIT builds show any of the failures.

@Fidget-Spinner
Member

I think I've seen that JIT failure before on my old PR when I used a different tagging scheme.

@markshannon
Member Author

The JIT ARM64 Windows failure is just a timeout in the int repr test, which happens occasionally.

@brandtbucher
Member

This isn't a JIT problem, it's a tier two problem (I can reproduce on an M2 Mac with --with-pydebug --enable-experimental-jit=interpreter). Digging in deeper...

@brandtbucher
Member

When entering tier two, the current frame's f_executable is 0x0000000000000002. Which is, uh, not a valid stackref. It looks suspiciously like BITS_TO_PTR_MASKED(PyStackRef_TagInt(0)).
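The arithmetic behind that suspicion can be checked with a small sketch. It assumes an integer tag of `0b11` applied after a two-bit shift, and a masking step that clears only the lowest (refcount) bit; both are assumptions chosen because they reproduce the observed value, and the lowercase function names are hypothetical Python stand-ins for the C macros.

```python
INT_TAG = 0b11     # assumed tag bits for a tagged integer
REFCNT_BIT = 0b01  # assumed: the only bit cleared by the masking step


def tag_int(value: int) -> int:
    """Stand-in for PyStackRef_TagInt: shift the value and set the tag bits."""
    return (value << 2) | INT_TAG


def bits_to_ptr_masked(bits: int) -> int:
    """Stand-in for BITS_TO_PTR_MASKED: clear the refcount bit, keep the rest."""
    return bits & ~REFCNT_BIT


# Tagging 0 gives 0b11; clearing the lowest bit leaves 0b10.
suspicious_pointer = bits_to_ptr_masked(tag_int(0))
print(f"0x{suspicious_pointer:016x}")
```

Under these assumptions a tagged zero, read as if it were an object reference, comes out as the same bit pattern that showed up in `f_executable`.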

@brandtbucher
Member

Never mind, it looks like the bug is in the JIT code. I believe this is because Clang is not actually inlining some of the _PyFrame_GetCode calls used for stack_pointer sanity checks on this build. Instead, it’s emitting the code for the function alongside the normal bytecode, and making calls to that. My guess is it’s using some clever calling convention or something and we’re messing up how it’s being called.

@brandtbucher
Member

@markshannon, this fixes it:

diff --git a/Tools/jit/_stencils.py b/Tools/jit/_stencils.py
index 8faa9e8cac2..639e4bcc793 100644
--- a/Tools/jit/_stencils.py
+++ b/Tools/jit/_stencils.py
@@ -291,6 +291,7 @@ def process_relocations(
                 hole.kind
                 in {"R_AARCH64_CALL26", "R_AARCH64_JUMP26", "ARM64_RELOC_BRANCH26"}
                 and hole.value is HoleValue.ZERO
+                and hole.symbol not in self.symbols
             ):
                 hole.func = "patch_aarch64_trampoline"
                 hole.need_state = True

@brandtbucher
Member

For anyone curious: on this particular build, Clang doesn't inline one of the calls to _PyFrame_GetCode, and instead emits it alongside our _JIT_ENTRY function. This is fine, but our handling of jumps on AArch64 macOS (which inserts trampolines for out-of-range jumps) meant that we tried to "link against" the _PyFrame_GetCode function in the main executable, rather than the (slightly different!) version the compiler emitted as part of the stencil.

Though they both should be identical, my guess is that Clang realized that the function was static and only called in one place by the bytecode, and didn't use an ABI-conforming calling convention. If we change the logic to check for duplicate local symbols in the template before trying to link to the main executable (which is what my change does), then we end up calling the intended version of the function.
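The decision described above can be sketched as a standalone function. This is a simplified stand-in rather than the real `_stencils.py` code: `needs_trampoline`, `value_is_zero` and `local_symbols` are hypothetical names, with `value_is_zero` playing the role of the `hole.value is HoleValue.ZERO` check and `local_symbols` the role of `self.symbols` in the diff.

```python
from typing import Optional, Set

# Branch relocation kinds taken from the diff above.
BRANCH_RELOCATIONS = {"R_AARCH64_CALL26", "R_AARCH64_JUMP26", "ARM64_RELOC_BRANCH26"}


def needs_trampoline(
    kind: str,
    value_is_zero: bool,
    symbol: Optional[str],
    local_symbols: Set[str],
) -> bool:
    """Decide whether a branch should be routed through a trampoline.

    A trampoline links the branch against the main executable. If the stencil
    already contains its own copy of the symbol, the branch should stay local.
    """
    return (
        kind in BRANCH_RELOCATIONS
        and value_is_zero
        and symbol not in local_symbols  # the extra check added by the fix
    )


# The compiler emitted its own copy of the helper inside the stencil:
stencil_symbols = {"_PyFrame_GetCode"}
print(needs_trampoline("R_AARCH64_CALL26", True, "_PyFrame_GetCode", stencil_symbols))
```

With the extra membership check, a call to a function that exists inside the stencil is no longer redirected to the same-named function in the main executable.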

(And, in case anyone was wondering: lldb is useless in our JIT code, and printf-debugging didn't help at all in this situation. It was basically just looking at the template we generated alongside the disassembly that led to the "aha" moment.)
