-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make store-to-load forwarding work with FFI Pointers #53124
Comments
It does seem strange that we use |
There are a bunch of test failures which haven't been triaged and addressed: https://dart-ci.firebaseapp.com/cl/305641/3. |
I noticed that a few test failures had to do with |
Interesting, when merging the representations, we merge two non-equal representations:
to
in https://dart-review.googlesource.com/c/sdk/+/231160 cc @alexmarkov I don't believe we use the TypedData data field as int value, but we do use the Pointer data field as int value when doing
Though, that might have some unintended side effects. |
@dcharkes What is the point of having distinct When changing the representation of a slot (such as |
// 'UnboxedFfiIntPtr' should be able to hold a pointer of the target word-size.
// On a 32-bit platform, it's an unsigned 32-bit int because it should be
// zero-extended to 64-bits, not sign-extended (pointers are inherently
// unsigned). Introduced in https://dart-review.googlesource.com/c/sdk/+/94750 and https://dart-review.googlesource.com/c/sdk/+/94860. There is already a TODO as you mention: #36370. I don't remember the motivation for #36370 (comment). @mkustermann do you? Is it that all pointers are always unsigned but we don't have a 64 bit unsigned representation? (e.g. https://dart-review.googlesource.com/c/sdk/+/195920 also defines unboxed-uword as unboxed-word.)
The field is unboxed right? So it doesn't matter whether it's a Smi. It would be a breaking change if we start showing 32 bit To prevent a breaking change, we'd need to insert int-converter instructions at the loads and stores if we use
Which might be less or more labour intensive. then covering all
@blaugold Try adding IntConverter instructions in the places where you've removed |
It's not my PR, I just took a look at the test failures and the code to see what might be going on. :) Hopefully @ds84182 is still interested in working on it. |
I'm still interested in working on it. I thought my latest commit fixed the issue but it seems like it hasn't :c |
@dcharkes then we should diagnose and fix them :-)
I think @blaugold is spot on. Here's a small reproduction for ia32 with the current state of the CL: import 'dart:ffi';
import "package:expect/expect.dart";
import "package:ffi/ffi.dart";
void main() {
while (true) {
testPointerPointerArithmeticSizes();
}
}
Pointer<Int64> int64Array = calloc(2);
void testPointerPointerArithmeticSizes() {
Expect.equals(int64Array.address + 8, int64Array.elementAt(1).address);
}
This happens because once we have a 32-bit address and sign-extend it I think in the end it doesn't matter so much whether we represent C pointer addresses (i.e. |
@dcharkes Did you mean something like that? diff --git a/runtime/vm/compiler/frontend/base_flow_graph_builder.cc b/runtime/vm/compiler/frontend/base_flow_graph_builder.cc
index d152d7b172b..7d69a63d4a9 100644
--- a/runtime/vm/compiler/frontend/base_flow_graph_builder.cc
+++ b/runtime/vm/compiler/frontend/base_flow_graph_builder.cc
@@ -427,6 +427,18 @@ Fragment BaseFlowGraphBuilder::ConvertUnboxedToUntagged(
return Fragment(converted);
}
+Fragment BaseFlowGraphBuilder::ConvertUnboxedIntPtrToFfi() {
+ if (kUnboxedIntPtr == kUnboxedFfiIntPtr) {
+ return Fragment();
+ }
+ Value* value = Pop();
+ auto converted = new (Z) IntConverterInstr(kUnboxedIntPtr, kUnboxedFfiIntPtr,
+ value, DeoptId::kNone);
+ converted->mark_truncating();
+ Push(converted);
+ return Fragment(converted);
+}
+
Fragment BaseFlowGraphBuilder::AddIntptrIntegers() {
Value* right = Pop();
Value* left = Pop();
diff --git a/runtime/vm/compiler/frontend/base_flow_graph_builder.h b/runtime/vm/compiler/frontend/base_flow_graph_builder.h
index e6a13747769..198b4f814e9 100644
--- a/runtime/vm/compiler/frontend/base_flow_graph_builder.h
+++ b/runtime/vm/compiler/frontend/base_flow_graph_builder.h
@@ -185,6 +185,7 @@ class BaseFlowGraphBuilder {
Fragment LoadUntagged(intptr_t offset);
Fragment ConvertUntaggedToUnboxed(Representation to);
Fragment ConvertUnboxedToUntagged(Representation from);
+ Fragment ConvertUnboxedIntPtrToFfi();
Fragment UnboxSmiToIntptr();
Fragment FloatToDouble();
Fragment DoubleToFloat();
diff --git a/runtime/vm/compiler/frontend/kernel_to_il.cc b/runtime/vm/compiler/frontend/kernel_to_il.cc
index 9c31e4d6629..ec9f434862f 100644
--- a/runtime/vm/compiler/frontend/kernel_to_il.cc
+++ b/runtime/vm/compiler/frontend/kernel_to_il.cc
@@ -1381,6 +1381,7 @@ FlowGraph* FlowGraphBuilder::BuildGraphOfRecognizedMethod(
body += LoadLocal(arg_value_not_null);
if (kind == MethodRecognizer::kFfiStorePointer) {
body += LoadNativeField(Slot::PointerBase_data());
+ body += ConvertUnboxedIntPtrToFfi();
} else {
// Avoid any unnecessary (and potentially deoptimizing) int
// conversions by using the representation consumed by StoreIndexed.
@@ -1418,6 +1419,7 @@ FlowGraph* FlowGraphBuilder::BuildGraphOfRecognizedMethod(
body += LoadLocal(parsed_function_->RawParameterVariable(0)); // Pointer.
body += CheckNullOptimized(String::ZoneHandle(Z, function.name()));
body += LoadNativeField(Slot::PointerBase_data());
+ body += ConvertUnboxedIntPtrToFfi();
body += Box(kUnboxedFfiIntPtr);
} break;
case MethodRecognizer::kHas63BitSmis: {
@@ -4712,6 +4714,7 @@ Fragment FlowGraphBuilder::FfiConvertPrimitiveToNative(
Fragment body;
if (marshaller.IsPointer(arg_index)) {
body += LoadNativeField(Slot::PointerBase_data());
+ body += ConvertUnboxedIntPtrToFfi();
} else if (marshaller.IsHandle(arg_index)) {
body += WrapHandle();
} else {
@@ -4848,6 +4851,7 @@ FlowGraph* FlowGraphBuilder::BuildGraphOfFfiNative(const Function& function) {
->context_variables()[0]));
body += LoadNativeField(Slot::PointerBase_data());
+ body += ConvertUnboxedIntPtrToFfi();
if (marshaller.PassTypedData()) {
body += LoadLocal(typed_data); |
Currently, adding
gets turned into
I'll see what I can do to push this PR over the line. |
…es." This reverts commit 06d7a23. Reason for revert: everything crashes on vm-aot-linux-debug-simarm_x64 Original change's description: > [vm/compiler] Change MemoryCopy to also take untagged addresses. > > This CL adds the ability to pass the payload address of the source > and destination directly to the MemoryCopy instruction as an untagged > value. > > The new translation of the _TypedListBase._memMoveN methods use the new > MemoryCopy constructor, retrieving the untagged value of the data field > of both the source and destination. This way, if inlining exposes the > allocation of the object from which the data field is being retrieved, > then allocation sinking can remove the intermediate allocation if there > are no escaping uses of the object. > > Since Pointer.asTypedList allocates such ExternalTypedData objects, > this CL makes that method inlined if at all possible, which removes > the intermediate allocation if the only use of the TypedData object > is to call setRange for memory copying purposes. > > This CL also separates unboxed native slots into two groups: those > that contain untagged addresses and those that do not. The former > group now have the kUntagged representation, which mimics the old > use of LoadUntagged for the PointerBase data field and also ensures > that any arithmetic operations on untagged addresses must first be > explicitly converted to an unboxed integer and then explicitly converted > back to untagged before being stored in a slot that contains untagged > addresses. > > When a unboxed native slot that contains untagged addresses is defined, > the definition also includes a boolean which represents whether > addresses that may be moved by the GC can be stored in this slot or not. > The redundancy eliminator uses this to decide whether it is safe to > eliminate a duplicate load, replace a load with the value originally > stored in the slot, or lift a load out of a loop. > > In particular, the PointerBase data field may contain GC-moveable > addresses, but only for internal TypedData objects and views, not > for external TypedData objects or Pointers. To allow load optimizations > involving the latter, the LoadField and StoreField instructions now > take boolean flags for whether loads or stores from the slot are > guaranteed to not be GC-moveable, to override the information from > the slot argument. > > Notable benchmark changes on x64 (similar for other archs unless noted): > > JIT: > * FfiMemory.PointerPointer: 250.7% > * FfiStructCopy.Copy1Bytes: -26.73% (only x64) > * FfiStructCopy.Copy32Bytes: -25.18% (only x64) > * MemoryCopy.64.setRange.Pointer.Uint8: 19.36% > * MemoryCopy.64.setRange.Pointer.Double: 18.96% > * MemoryCopy.8.setRange.Pointer.Double: 17.59% > * MemoryCopy.8.setRange.Pointer.Uint8: 19.46% > > AOT: > * FfiMemory.PointerPointer: 323.5% > * FfiStruct.FieldLoadStore: 483.3% > * FileIO_readwrite_64kb: 15.39% > * FileIO_readwrite_512kb (Intel Xeon): 46.22% > * MemoryCopy.512.setRange.Pointer.Uint8: 35.20% > * MemoryCopy.64.setRange.Pointer.Uint8: 55.40% > * MemoryCopy.512.setRange.Pointer.Double: 29.45% > * MemoryCopy.64.setRange.Pointer.Double: 60.37% > * MemoryCopy.8.setRange.Pointer.Double: 59.54% > * MemoryCopy.8.setRange.Pointer.Uint8: 55.40% > * FfiStructCopy.Copy32Bytes: 398.3% > * FfiStructCopy.Copy1Bytes: 1233% > > TEST=vm/dart/address_local_pointer, vm/dart/pointer_as_typed_list > > Issue: #42072 > Fixes: #53124 > > Cq-Include-Trybots: luci.dart.try:vm-ffi-qemu-linux-release-arm-try,vm-eager-optimization-linux-release-x64-try,vm-linux-release-x64-try,vm-linux-debug-x64-try,vm-aot-linux-release-x64-try,vm-aot-linux-debug-x64-try > Change-Id: I563e0bfac5b1ac6cf1111649934067c12891b631 > Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/324820 > Reviewed-by: Alexander Markov <alexmarkov@google.com> > Commit-Queue: Tess Strickland <sstrickl@google.com> > Reviewed-by: Martin Kustermann <kustermann@google.com> Issue: #42072 Change-Id: I7c31434e01108487de69a32154bbefd1538c6f0f Cq-Include-Trybots: luci.dart.try:vm-ffi-qemu-linux-release-arm-try,vm-eager-optimization-linux-release-x64-try,vm-linux-release-x64-try,vm-linux-debug-x64-try,vm-aot-linux-release-x64-try,vm-aot-linux-debug-x64-try No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/330523 Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com> Commit-Queue: Alexander Aprelev <aam@google.com> Auto-Submit: Alexander Markov <alexmarkov@google.com> Reviewed-by: Alexander Aprelev <aam@google.com>
…es." This is a reland of commit 06d7a23 This version fixes an issue when a phi node has multiple inputs with different unboxed integer representations. The original CL made a change where only the representations were considered, not the range of values for the Phi calculated by range analysis. The reland goes back to the old behavior for this case. Also fixes the new tests on 32-bit architectures. Original change's description: > [vm/compiler] Change MemoryCopy to also take untagged addresses. > > This CL adds the ability to pass the payload address of the source > and destination directly to the MemoryCopy instruction as an untagged > value. > > The new translation of the _TypedListBase._memMoveN methods use the new > MemoryCopy constructor, retrieving the untagged value of the data field > of both the source and destination. This way, if inlining exposes the > allocation of the object from which the data field is being retrieved, > then allocation sinking can remove the intermediate allocation if there > are no escaping uses of the object. > > Since Pointer.asTypedList allocates such ExternalTypedData objects, > this CL makes that method inlined if at all possible, which removes > the intermediate allocation if the only use of the TypedData object > is to call setRange for memory copying purposes. > > This CL also separates unboxed native slots into two groups: those > that contain untagged addresses and those that do not. The former > group now have the kUntagged representation, which mimics the old > use of LoadUntagged for the PointerBase data field and also ensures > that any arithmetic operations on untagged addresses must first be > explicitly converted to an unboxed integer and then explicitly converted > back to untagged before being stored in a slot that contains untagged > addresses. > > When a unboxed native slot that contains untagged addresses is defined, > the definition also includes a boolean which represents whether > addresses that may be moved by the GC can be stored in this slot or not. > The redundancy eliminator uses this to decide whether it is safe to > eliminate a duplicate load, replace a load with the value originally > stored in the slot, or lift a load out of a loop. > > In particular, the PointerBase data field may contain GC-moveable > addresses, but only for internal TypedData objects and views, not > for external TypedData objects or Pointers. To allow load optimizations > involving the latter, the LoadField and StoreField instructions now > take boolean flags for whether loads or stores from the slot are > guaranteed to not be GC-moveable, to override the information from > the slot argument. > > Notable benchmark changes on x64 (similar for other archs unless noted): > > JIT: > * FfiMemory.PointerPointer: 250.7% > * FfiStructCopy.Copy1Bytes: -26.73% (only x64) > * FfiStructCopy.Copy32Bytes: -25.18% (only x64) > * MemoryCopy.64.setRange.Pointer.Uint8: 19.36% > * MemoryCopy.64.setRange.Pointer.Double: 18.96% > * MemoryCopy.8.setRange.Pointer.Double: 17.59% > * MemoryCopy.8.setRange.Pointer.Uint8: 19.46% > > AOT: > * FfiMemory.PointerPointer: 323.5% > * FfiStruct.FieldLoadStore: 483.3% > * FileIO_readwrite_64kb: 15.39% > * FileIO_readwrite_512kb (Intel Xeon): 46.22% > * MemoryCopy.512.setRange.Pointer.Uint8: 35.20% > * MemoryCopy.64.setRange.Pointer.Uint8: 55.40% > * MemoryCopy.512.setRange.Pointer.Double: 29.45% > * MemoryCopy.64.setRange.Pointer.Double: 60.37% > * MemoryCopy.8.setRange.Pointer.Double: 59.54% > * MemoryCopy.8.setRange.Pointer.Uint8: 55.40% > * FfiStructCopy.Copy32Bytes: 398.3% > * FfiStructCopy.Copy1Bytes: 1233% > > TEST=vm/dart/address_local_pointer, vm/dart/pointer_as_typed_list > > Issue: #42072 > Fixes: #53124 > > Cq-Include-Trybots: luci.dart.try:vm-ffi-qemu-linux-release-arm-try,vm-eager-optimization-linux-release-x64-try,vm-linux-release-x64-try,vm-linux-debug-x64-try,vm-aot-linux-release-x64-try,vm-aot-linux-debug-x64-try > Change-Id: I563e0bfac5b1ac6cf1111649934067c12891b631 > Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/324820 > Reviewed-by: Alexander Markov <alexmarkov@google.com> > Commit-Queue: Tess Strickland <sstrickl@google.com> > Reviewed-by: Martin Kustermann <kustermann@google.com> TEST=vm/dart/address_local_pointer, vm/dart/pointer_as_typed_list Issue: #42072 Fixes: #53124 Change-Id: Iabb0e910f12636d0ff51e711c8c9c98ad40e5811 Cq-Include-Trybots: luci.dart.try:vm-ffi-qemu-linux-release-arm-try,vm-eager-optimization-linux-release-x64-try,vm-linux-release-x64-try,vm-linux-debug-x64-try,vm-aot-linux-release-x64-try,vm-aot-linux-debug-x64-try,vm-aot-linux-release-simarm_x64-try,vm-aot-linux-debug-simarm_x64-try Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/330600 Reviewed-by: Alexander Markov <alexmarkov@google.com> Commit-Queue: Tess Strickland <sstrickl@google.com> Reviewed-by: Martin Kustermann <kustermann@google.com>
Missed a couple of LoadUntaggeds for PointerBase.data in the graph intrinsifier during 4d1bdaa. Creates slots for ExternalOneByteString.external_data and ExternalTwoByteString.external_data and uses LoadField with those slots instead of LoadUntagged. TEST=ci Issue: #42072 Fixes: #53124 Change-Id: I900281644c4c42ad303cc7a6121e4c8bb7852cfe Cq-Include-Trybots: luci.dart.try:vm-aot-linux-release-simarm_x64-try,vm-aot-linux-release-x64-try Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/330787 Reviewed-by: Alexander Markov <alexmarkov@google.com> Commit-Queue: Tess Strickland <sstrickl@google.com>
Once we allow inlining force optimized functions (such as
Pointer.fromAddress()
) we seem to generate for this dart codethis flow graph
Instead it should just return a constant.
/cc @alexmarkov
The text was updated successfully, but these errors were encountered: