Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce size of code emitted for stack loads/stores from Air on ARM #10263

Merged

Conversation

eugeneia
Copy link

@eugeneia eugeneia commented Feb 17, 2023

6d88e26

Reduce size of code emitted for stack loads/stores from Air on ARM
https://bugs.webkit.org/show_bug.cgi?id=252469

Reviewed by Yusuke Suzuki.

This changes callFrameAddr (AirAllocateRegistersAndStackAndGenerateCode.cpp)
and its users to (which ever succeeds first)

 1. generate an address offset from GPRInfo::callFrameRegister
    (unchanged)
 2. generate an address offset from MacroAssembler::stackPointerRegister
 3. load the offset into extendedOffsetAddrRegister and generate
    a BaseIndex [GPRInfo::callFrameRegister, extendedOffsetAddrRegister]
 4. resort to computing the absolute address in extendedOffsetAddrRegister
    (this was the previous fallback after 1.)

For 3. to work reliably we need to change Air::Arg::isValidForm
to reject BaseIndex addressing when it is not supported on ARM.

The result of these changes is a significant reduction in code size
due to avoiding the fallback behavior in many cases (4).

Code generated for the JetStream2 WASM benchmarks is ~30% smaller on ARMv7
and ~40% smaller on ARM64.

* Source/JavaScriptCore/b3/air/AirAllocateRegistersAndStackAndGenerateCode.cpp:
(JSC::B3::Air::callFrameAddr):
(JSC::B3::Air::GenerateAndAllocateRegisters::flush):
(JSC::B3::Air::GenerateAndAllocateRegisters::alloc):
* Source/JavaScriptCore/b3/air/AirArg.h:
(JSC::B3::Air::Arg::isValidAddrForm):
(JSC::B3::Air::Arg::isValidForm const):

Canonical link: https://commits.webkit.org/260680@main

0a55d14

Misc iOS, tvOS & watchOS macOS Linux Windows
βœ… πŸ§ͺ style βœ… πŸ›  ios βœ… πŸ›  mac βœ… πŸ›  wpe βœ… πŸ›  wincairo
βœ… πŸ›  ios-sim βœ… πŸ›  mac-AS-debug βœ… πŸ›  gtk
βœ… πŸ§ͺ webkitperl βœ… πŸ§ͺ ios-wk2 βœ… πŸ§ͺ api-mac βœ… πŸ§ͺ gtk-wk2
βœ… πŸ§ͺ api-ios βœ… πŸ§ͺ mac-wk1 ❌ πŸ§ͺ api-gtk
βœ… πŸ›  πŸ§ͺ jsc βœ… πŸ›  tv βœ… πŸ§ͺ mac-wk2 βœ… πŸ›  jsc-armv7
βœ… πŸ›  πŸ§ͺ jsc-arm64 βœ… πŸ›  tv-sim βœ… πŸ§ͺ mac-AS-debug-wk2   πŸ§ͺ jsc-armv7-tests
βœ… πŸ›  watch βœ… πŸ§ͺ mac-wk2-stress βœ… πŸ›  jsc-mips
βœ… πŸ›  watch-sim βœ… πŸ§ͺ jsc-mips-tests
βœ… πŸ›  πŸ§ͺ unsafe-merge

if (addr.isValidForm(opcode, width))
return addr;

if (isARM64() || isARM_THUMB2()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not have arch condition here. If this is only allowed for these archs, then isValidForm should say false to wrong archs instead.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed those in da11283

GPRReg reg = extendedOffsetAddrRegister();
jit.move(CCallHelpers::TrustedImmPtr(offsetFromFP), reg);
if (isARM64() || isARM_THUMB2()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

ASSERT(m_map[tmp].spillSlot->byteSize() == bytesForWidth(Width64));
m_jit->loadDouble(callFrameAddr(Air::MoveDouble, *m_jit, offset), reg.fpr());
auto src = callFrameAddr(Air::MoveDouble, *m_jit, offset, m_code.frameSize(), Width64);
ASSERT(src.isAddr());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why this is correct?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It used to be an invariant because this patch isValidForm rejected BaseIndex addresses for Air::MoveDouble. I've changed isValidIndexForm in da11283 to only reject Air::MoveDouble and Air::MoveFloat on ARM_THUMB2 (because the MacroAssemblerARMv7 will try to use the addressTempRegister to materialize them.) Hence, I've replaced the ASSERTs with branches to handle BaseIndex addresses for loadDouble/storeDouble/loadVector/storeVector (those can and do occur on ARM64.)

} else {
ASSERT(m_map[tmp].spillSlot->byteSize() == bytesForWidth(Width128));
m_jit->loadVector(callFrameAddr(Air::MoveDouble, *m_jit, offset), reg.fpr());
auto src = callFrameAddr(Air::MoveDouble, *m_jit, offset, m_code.frameSize(), Width128);
ASSERT(src.isAddr());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

@@ -1396,7 +1412,7 @@ class Arg {
case CallArg:
return isValidAddrForm(opcode, offset(), width);
case Index:
return isValidIndexForm(scale(), offset(), width);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isValidIndexForm should also take op ode instead

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in da11283

@Constellation
Copy link
Member

Code generated for the JetStream2 WASM benchmarks is ~30% smaller on ARMv7 and ~40% smaller on ARM64.

This is really great!

@eugeneia
Copy link
Author

Do I have to do something to kick off a new EWS job for da11283? IIRC last time I force-pushed to this branch it just started a new job automatically.

@tadeuzagallo
Copy link
Member

@eugeneia it should start automatically, but if it fails for some reason, you can force push again and normally it does the trick.

Copy link
Member

@Constellation Constellation left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r=me

} else {
ASSERT(m_map[tmp].spillSlot->byteSize() == bytesForWidth(Width128));
m_jit->loadVector(callFrameAddr(Air::MoveDouble, *m_jit, offset), reg.fpr());
auto src = callFrameAddr(Air::MoveDouble, *m_jit, offset, m_code.frameSize(), Width128);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is pre-existing issue, but we should use MoveVector (but it does not matter in terms of correctness I think).

} else {
ASSERT(m_map[tmp].spillSlot->byteSize() == bytesForWidth(Width128));
m_jit->storeVector(reg.fpr(), callFrameAddr(Air::MoveDouble, *m_jit, offset));
auto dest = callFrameAddr(Air::MoveDouble, *m_jit, offset, m_code.frameSize(), Width128);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

return addr;

// Try finding a valid offset from the stack pointer register instead.
// (this trick is also done in Air::lowerStackArgs:126)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this line since this line number etc. is fragile.

@@ -571,7 +571,7 @@ class LowerToAir {
Value* right = address->child(1);

auto tryIndex = [&] (Value* index, Value* base) -> Arg {
std::optional<unsigned> scale = scaleForShl(index, offset, width);
std::optional<unsigned> scale = scaleForShl(Air::Move, index, offset, width);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Move correct?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so? I chose Air::Move because earlier in the function there is a comment:

// This function currently is currently only used for loads/stores, so
// using Air::Move is appropriate.

(and a similar choice).

Not sure though. I find it tricky to do it these mappings from Air opcodes to desired isValidForm behavior.

@@ -598,7 +598,7 @@ class LowerToAir {
// amount is greater than 1, then there isn't really anything smart that we could do here.
// We avoid using baseless indexes because their encoding isn't particularly efficient.
if (m_locked.contains(left) || !address->child(1)->isInt32(1)
|| !Arg::isValidIndexForm(1, offset, width))
|| !Arg::isValidIndexForm(Air::Move, 1, offset, width))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Move correct?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

@@ -613,7 +613,7 @@ class LowerToAir {
case WasmAddress: {
WasmAddressValue* wasmAddress = address->as<WasmAddressValue>();
Value* pointer = wasmAddress->child(0);
if (!Arg::isValidIndexForm(1, offset, width) || m_locked.contains(pointer))
if (!Arg::isValidIndexForm(Air::Move, 1, offset, width) || m_locked.contains(pointer))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Move correct?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

@@ -2325,7 +2325,7 @@ class LowerToAir {
}

auto tryShl = [&] (Value* shl, Value* other) -> bool {
std::optional<unsigned> scale = scaleForShl(shl, offset);
std::optional<unsigned> scale = scaleForShl(leaOpcode, shl, offset);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is leaOpcode correct?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I think so? Not sure what the other candidates would be?

@Constellation Constellation added the unsafe-merge-queue Applied to send a pull request to merge-queue, but skip building and testing label Feb 22, 2023
https://bugs.webkit.org/show_bug.cgi?id=252469

Reviewed by Yusuke Suzuki.

This changes callFrameAddr (AirAllocateRegistersAndStackAndGenerateCode.cpp)
and its users to (which ever succeeds first)

 1. generate an address offset from GPRInfo::callFrameRegister
    (unchanged)
 2. generate an address offset from MacroAssembler::stackPointerRegister
 3. load the offset into extendedOffsetAddrRegister and generate
    a BaseIndex [GPRInfo::callFrameRegister, extendedOffsetAddrRegister]
 4. resort to computing the absolute address in extendedOffsetAddrRegister
    (this was the previous fallback after 1.)

For 3. to work reliably we need to change Air::Arg::isValidForm
to reject BaseIndex addressing when it is not supported on ARM.

The result of these changes is a significant reduction in code size
due to avoiding the fallback behavior in many cases (4).

Code generated for the JetStream2 WASM benchmarks is ~30% smaller on ARMv7
and ~40% smaller on ARM64.

* Source/JavaScriptCore/b3/air/AirAllocateRegistersAndStackAndGenerateCode.cpp:
(JSC::B3::Air::callFrameAddr):
(JSC::B3::Air::GenerateAndAllocateRegisters::flush):
(JSC::B3::Air::GenerateAndAllocateRegisters::alloc):
* Source/JavaScriptCore/b3/air/AirArg.h:
(JSC::B3::Air::Arg::isValidAddrForm):
(JSC::B3::Air::Arg::isValidForm const):

Canonical link: https://commits.webkit.org/260680@main
@webkit-commit-queue
Copy link
Collaborator

Committed 260680@main (6d88e26): https://commits.webkit.org/260680@main

Reviewed commits have been landed. Closing PR #10263 and removing active labels.

@webkit-early-warning-system webkit-early-warning-system merged commit 6d88e26 into WebKit:main Feb 22, 2023
@webkit-commit-queue webkit-commit-queue removed the unsafe-merge-queue Applied to send a pull request to merge-queue, but skip building and testing label Feb 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants