
@tarcieri
Member

On targets where `asm!` is stable but we don't have a backend using architecture-specific predication instructions, this uses `asm!` to insert an optimization barrier, which should hopefully accomplish the intended goal as well as or better than `core::hint::black_box`. It uses a similar approach to the one introduced in `zeroize` in #1252.

Suggested by @newpavlov in #1332.
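For illustration only, here is a minimal sketch of the kind of `asm!` optimization barrier described above. It assumes the zeroize-style trick of an `asm!` block whose template is just an assembler comment naming the operand's register. The function name `asm_barrier`, the widening to `usize`, and the chosen `options(...)` are assumptions for this sketch, not code from this PR. Architectures outside the listed set fall back to `core::hint::black_box`.

```rust
// Hypothetical sketch (not this PR's code): an `asm!`-based optimization
// barrier. The template is only an assembler comment, so no instructions are
// emitted, but LLVM must assume the block may read and rewrite the value.
#[cfg(any(
    target_arch = "x86",
    target_arch = "x86_64",
    target_arch = "arm",
    target_arch = "aarch64",
    target_arch = "riscv32",
    target_arch = "riscv64"
))]
#[inline(always)]
pub fn asm_barrier(value: u8) -> u8 {
    // Widen to `usize` because the x86 `reg` class has no 8-bit type;
    // `usize` is accepted by `reg` on every architecture listed above.
    let mut widened = value as usize;
    // SAFETY: the template emits no instructions and touches no memory,
    // stack, or flags; it only forces `widened` through a register.
    unsafe {
        core::arch::asm!(
            "/* {0} */",
            inout(reg) widened,
            options(nomem, nostack, preserves_flags)
        );
    }
    widened as u8
}

// Fallback for other architectures: rely on the standard library hint.
#[cfg(not(any(
    target_arch = "x86",
    target_arch = "x86_64",
    target_arch = "arm",
    target_arch = "aarch64",
    target_arch = "riscv32",
    target_arch = "riscv64"
)))]
#[inline(always)]
pub fn asm_barrier(value: u8) -> u8 {
    core::hint::black_box(value)
}
```

Because the template emits no instructions, the barrier costs nothing at runtime. It only stops LLVM from reasoning about the value; on its own it does not prevent LLVM from emitting a branch where that value is later consumed.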

@tarcieri
Member Author

@NicsTr can you verify this still accomplishes the intended codegen?

tarcieri requested a review from newpavlov January 14, 2026 22:25
@NicsTr
Contributor

NicsTr commented Jan 14, 2026

When testing `cmovnz` on `thumbv6m-none-eabi`, it introduces a branch (`cmp r3, #0` followed by `bne .LBB0_2`):

```asm
	.section .text.not_ct::test_ct_cmov,"ax",%progbits
	.globl	not_ct::test_ct_cmov
	.p2align	1
	.type	not_ct::test_ct_cmov,%function
	.code	16
	.thumb_func
not_ct::test_ct_cmov:
	.fnstart
	.cfi_sections .debug_frame
	.cfi_startproc
	.save	{r4, r5, r7, lr}
	push {r4, r5, r7, lr}
	.cfi_def_cfa_offset 16
	.cfi_offset lr, -4
	.cfi_offset r7, -8
	.cfi_offset r5, -12
	.cfi_offset r4, -16
	.setfp	r7, sp, #8
	add r7, sp, #8
	.cfi_def_cfa r7, 8
	.pad	#4
	sub sp, #4
	uxtb r3, r2
	subs r2, r3, #1
	mov r4, r3
	sbcs r4, r2
	ldrb r2, [r0]
	mov r5, sp
	strb r4, [r5]
	@APP
	@ r5
	@NO_APP
	cmp r3, #0
	bne .LBB0_2
	mov r1, r2
.LBB0_2:
	strb r1, [r0]
	add sp, #4
	pop {r4, r5, r7, pc}
```

Is this inline-`asm!` approach to `black_box` supposed to be more robust than `core::hint::black_box`?

@tarcieri
Member Author

That was the hypothesis, but perhaps it isn't true.

tarcieri closed this Jan 14, 2026
tarcieri added a commit that referenced this pull request Jan 15, 2026
In #1332 we ran into LLVM inserting branches in this routine for
`thumbv6m-none-eabi` targets. It was "fixed" by fiddling around with
`black_box`, but that seems brittle.

In #1334 we attempted a simple portable `asm!` optimization barrier
approach, but it did not work as expected.

This instead implements one of the fiddliest bits, mask generation,
in ARM assembly. The resulting assembly is actually more efficient
than what rustc/LLVM outputs, and it avoids touching the stack
pointer.
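To make "mask generation" concrete, here is a hedged, portable Rust sketch of what such a routine computes: a condition byte becomes an all-ones or all-zeros mask, which a `cmovnz`-style helper then uses to select between two values with bitwise operations only. `mask_from_nonzero` and `cmovnz_u8` are hypothetical names, not the crate's API. Written in plain Rust like this, nothing prevents LLVM from recognizing the select and lowering it to a compare-and-branch, which is the failure mode that motivated moving mask generation into assembly.

```rust
// Hypothetical, portable sketch of branch-free mask generation and a
// `cmovnz`-style conditional move for `u8`. Names are illustrative only.

/// Returns 0xFF if `condition` is nonzero and 0x00 otherwise.
#[inline]
pub fn mask_from_nonzero(condition: u8) -> u8 {
    let c = condition as u32;
    // For c != 0, `c | c.wrapping_neg()` has bit 31 set; for c == 0 it is 0.
    let bit = (c | c.wrapping_neg()) >> 31;
    // Map 1 -> 0xFF and 0 -> 0x00.
    (bit as u8).wrapping_neg()
}

/// Overwrites `*dst` with `src` when `condition` is nonzero,
/// selecting with the mask instead of an `if`.
#[inline]
pub fn cmovnz_u8(dst: &mut u8, src: u8, condition: u8) {
    let mask = mask_from_nonzero(condition);
    *dst = (*dst & !mask) | (src & mask);
}
```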

It's a simple enough function to implement in assembly on other
platforms with stable `asm!` too, but this is a start.
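As a hypothetical illustration of the closing remark, this is one way mask generation might look with stable `asm!` on x86_64. It is not the ARM assembly from the commit. The idea: `neg` sets the carry flag exactly when its operand is nonzero, and `sbb` of a register with itself then yields `0` or all ones. The function name and the portable fallback for other architectures are assumptions made for this sketch.

```rust
// Hypothetical x86_64 port of "mask generation in asm!" (NOT the commit's
// ARM code). Other architectures fall back to a portable formulation that
// the optimizer can see through.

/// Returns 0xFF if `condition` is nonzero and 0x00 otherwise.
#[cfg(target_arch = "x86_64")]
#[inline]
pub fn mask_from_nonzero_asm(condition: u8) -> u8 {
    let cond = condition as u32;
    let mask: u32;
    // SAFETY: register-only arithmetic; no memory or stack access.
    // Flags are clobbered, so `preserves_flags` is deliberately omitted.
    unsafe {
        core::arch::asm!(
            // `neg` sets CF exactly when its operand was nonzero...
            "neg {c:e}",
            // ...then `sbb m, m` computes m - m - CF = 0 or 0xFFFF_FFFF.
            "sbb {m:e}, {m:e}",
            c = inout(reg) cond => _,
            m = out(reg) mask,
            options(pure, nomem, nostack),
        );
    }
    mask as u8
}

/// Portable fallback for non-x86_64 targets.
#[cfg(not(target_arch = "x86_64"))]
#[inline]
pub fn mask_from_nonzero_asm(condition: u8) -> u8 {
    let c = condition as u32;
    (((c | c.wrapping_neg()) >> 31) as u8).wrapping_neg()
}
```

Keeping the flag-setting and flag-consuming instructions inside one `asm!` block matters here: LLVM never sees the comparison, so it has nothing to turn back into a branch.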
3 participants