-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FFI][RUNTIME] Introduce runtime boxed types for int/float/bool #16183
[FFI][RUNTIME] Introduce runtime boxed types for int/float/bool #16183
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, love the direction we are going with dedicated boxed types. I made some quick comment. The changes wrt to PackedFunc should have benefit from more pairs of eyes to review as they are centeral to most FFI routines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent sometime to go through packed_func.h logic. Given there are some restructuring, this would need some more careful checks (as they impact everything).
include/tvm/runtime/packed_func.h
Outdated
return static_cast<double>(value_.v_int64); | ||
if (auto opt = TryAsBool()) { | ||
return opt.value(); | ||
} else if (auto opt = TryAsInt()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should go in the order of
- TryAsFloat, TryAsInt, TryAsBool
Prioritize the most likely ones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would any bool value also work as an int value? Maybe that's the reason for the ordering given.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this case order does not matter for correctness, mainly as an optimization as the original intended type as the most freq path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I just looked at how they work. Yeah, order doesn't matter then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, and reordered here and following conversions. For each conversion operator, it now checks the expected type first, and only falls back to allowed conversions if the expected type isn't present.
Would any bool value also work as an int value? Maybe that's the reason for the ordering given.
Nope, no specific reason for the ordering. These specific Try*
functions don't apply any conversions, so the order doesn't matter for correctness.
include/tvm/runtime/packed_func.h
Outdated
// Helper function to reduce duplication in the variable integer | ||
// conversions. This is publicly exposed, as it can be useful in | ||
// specializations of PackedFuncValueConverter. | ||
if (auto opt = FromBoxed<double>()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling convention note: it would be simpler to require that TVMArgValue and TVMRetValue always NOT hold Boxed values. This would help to reduce the overall cost of the codegen function handling.
That does mean when we assign Boxed Object into TVMArg and TVMRet, they should be unboxed. I know that we can make an exception for bool for now until we introduce a specific bool POD code.
@@ -2129,6 +2315,42 @@ struct PackedFuncValueConverter<::tvm::runtime::String> { | |||
} | |||
}; | |||
|
|||
template <typename T> | |||
struct PackedFuncValueConverter<Array<T>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel automatic conversion here is a bit overkill. Recursive automatic conversion would be too much overhead whe passing through FFI. The developer should ensure the correct typings of the internal contained values
I know it may bring some convenience, but in this case I think we should not do it.
Suggestion: Remove the conversion logic from this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need these conversions in order to preserve backwards compatibility with existing user-written code. Otherwise, a user-written function that accepts Array<PrimExpr>
could no longer be called with a python list [1,2,3]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding performance, I think we should be good there as well. The conversion is applied instead of the existing recursive check in ObjectTypeChecker<Array<T>>
, so no additional recursive walks. The mutation is implemented in terms of Array::Map
, which is copy-on-write, so no allocations occur if the user has provided the expected types (the common case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should likely check some of those cases, and update quite a few of them to Array<runtime::Int>
. But agree some might be useful especially for the mixed symbolic shape and static shape case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added some comments below on optimizing for freq case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, and going through the comments now. I like the idea of updating the call sites over time, so that eventually the backwards compatibility conversion would only be triggered for externally-defined PackedFunc instances.
include/tvm/runtime/packed_func.h
Outdated
|
||
if constexpr (std::is_base_of_v<ContainerType, NDArray::ContainerType> || | ||
std::is_base_of_v<NDArray::ContainerType, ContainerType>) { | ||
if (ptr && (std::is_base_of_v<NDArray::ContainerType, ContainerType> || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: would be nice to keep the originals tructure of if (ptr != nullptr)
, they helps to keep the structure of the code, lift out a CSE and also makes the diff more readable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mainly restructured it in order to make sure the if constexpr
checks were on the outermost conditional, so that they would be guaranteed to be applied to the entire contents. I agree on the readability and repetition though, and will take a look to see if there's a way to make it more readable overall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does it look in the latest commit? I moved the repeated if (ptr)
structure out to an early return for if (ptr == nullptr)
, which preserves the if constexpr
structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually i have a quick question, can we do the follows?
if (ptr != nullptr) {
if constexpr (condition) {
}
}
I feel it should still allow compiler to eliminate the content and would be equivalent. the main reason to put ptr != nullptr
first is that it is a more likely path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can do, and updated. I mostly pulled it out to avoid having too many levels of nesting for ease of readability.
I also realized that the checks may not be intuitive to future maintainers, and added a comment describing which cases are handled at C++ compile-time and which are handled at runtime.
cc @junrushao |
Just highlight some of the main comments that are high level
|
Thank you!
Definitely agreed, and the more eyes the better. Given that this PR has the potential to break any use of the PackedFunc interface, the surface area is very broad.
I like this readability improvement, and will update accordingly
Agreed. Assuming I have it implemented correctly, this is handled in the The conversions that occur when reading from a |
e844ced
to
c214b76
Compare
include/tvm/runtime/packed_func.h
Outdated
// special handling for unwrapping boxed primitives, | ||
// PackedFunc, runtime::Module, etc, which should be checked | ||
// before delegating to the array element's | ||
// PackedFuncValueConverter implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not quite get what we are intending to handle, maybe state we would like to allow conversion to classes like PrimExpr, so would like to trigger that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I changed the comment to describe the use case that it enables.
|
||
return untyped_array.Map([](ObjectRef item) { | ||
TVMRetValue item_val; | ||
item_val = std::move(item); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have same quick short cut
if (auto* ptr = item->as<T::ContainerType>()) {
return GetRef<T>(ptr);
}
the fallback conversion can remain the same?
@@ -2129,6 +2315,42 @@ struct PackedFuncValueConverter<::tvm::runtime::String> { | |||
} | |||
}; | |||
|
|||
template <typename T> | |||
struct PackedFuncValueConverter<Array<T>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should likely check some of those cases, and update quite a few of them to Array<runtime::Int>
. But agree some might be useful especially for the mixed symbolic shape and static shape case
eaaea52
to
440568e
Compare
cd03dbc
to
9c19c12
Compare
4bda95c
to
b0bc242
Compare
After slowly working through all the unit tests and transforms that needed to be updated for this change, this PR is now ready for review! I've squashed the incremental commits on the PR branch that went into this PR, since the (very) long history of this dev branch wasn't that useful. (This should also prevent a recurrence of #16549, by having a shorter PR branch.) |
Prior to this commit, the `Array::Map` member function could only be applied to nullable object types. This was due to the internal use of `U()` as the default value for initializing the output `ArrayNode`, where `U` is the return type of the mapping function. This default constructor is only available for nullable types, and would result in a compile-time failure for non-nullable types. This commit replaces `U()` with `ObjectRef()` in `Array::Map`, removing this limitation. Since all items in the output array are overwritten before returning to the calling scope, initializing the output array with `ObjectRef()` does not violate type safety.
Prior to this commit, `int`, `float`, and `bool` arguments from Python were converted to `IntImm`, `FloatImm`, and `Bool`. These are subtypes of `PrimExpr`, and should only be used at compile-time. By automatically applying this conversion as part of the FFI, these types are required to be present whenever a primitive is converted to a `tvm::ObjectRef`. This can become especially fragile for an end-user when storing objects into a TVM container. Because TVM containers require all contents to be `ObjectRef` subclasses, an automatic conversion may be applied on storing into a container, resulting in an unexpected type being retrieved from the container. For example, this currently occurs in Relax when extracting a `R.Prim` from a `R.Tuple`. This commit introduces a `Box<T>` type for storage of boxed primitives at runtime, distinct from the IR types. * Primitive arguments provided to a PackedFunc that requires an `ObjectRef` will be converted to the corresponding boxed type. (e.g. Passing a Python `int` to a C++ function accepting `ObjectRef` produces a `Box<int64_t>`. * Boxed primitives provided to a PackedFunc that requires an unboxed primitive will be converted to the corresponding primitive. * PackedFunc return values of `ObjectRef` are converted to the corresponding primitive, if present. (e.g. If a `tuple_getitem` with static return type `ObjectRef` returns a `Box<int64_t>`, it will be unwrapped to a python `int`.) Together, these three rules provide backwards compatibility for existing PackedFunc definitions, while avoiding exposing the user to any container-induced type conversions betweeen primitive types and `ObjectRef`.
8f34004
to
bc7cdcc
Compare
@@ -207,6 +209,7 @@ typedef DLTensor* TVMArrayHandle; | |||
*/ | |||
typedef union { | |||
int64_t v_int64; | |||
bool v_bool; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
likely v_bool is not needed and we can simply store value in v_int64
, this can help to reduce the switch cases, and also make all value types as 64 bit, might help kernel passing handling to reduce cases as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, and implemented in #17240.
This is a follow-up to apache#16183, which added handling of boolean values in the TVM FFI. The initial implementation added both a new type code (`kTVMArgBool`) and a new `TVMValue::v_bool` variant. This commit removes the `TVMValue::v_bool` variant, since the `kTVMArgBool` type code is sufficient to handle boolean arguments. Removing the `TVMValue::v_bool` variant also makes all `TVMValue` variants be 64-bit (assuming a 64-bit CPU). This can simplify debugging in some cases, since it prevents partial values from inactive variants from being present in memory.
This is a follow-up PR to apache#16183, which updated the FFI with explicit integer types. As part of that change, many internal functions were updated to accept non-IR types (e.g. `Array<runtime::Int>` instead of `Array<IntImm>`). For backwards compatibility with callees that provided the IR types, a specialization of `PackedFuncValueConverter` unwrapped the `IntImm` into a `runtime::Int`. This commit removes the backwards-compatibility specialization of `PackedFuncValueConverter`. Breakages that are found in CI as a result will then be updated at the caller side, removing the need for the backwards-compatibility handler altogether.
We just find out some a perf regression introduced by this PR, specifically, during LLM decode function calling overhead(before the first kernel launch) goes up to 1.4 ms. The likely cause is due to increased PackedFunc calling overhead in general. Likely this is due to some of the automatic conversion logic introduced that increased the overhead per PackedFunc call. Such overhead can impact the perf of downstream to optimize for low latency. As a temp measure, I am going to first revert this PR. The changes of the PR(boxed types and bool) are valuable. Given the smart auto conversion might introduce extra overhead and the regression we see. I think it is helpful to isolate the PR into two pieces, with one that introduces the boxed types/bool, and another one that introduce a possiblity more conservative version of automatic conversion. Thanks @Lunderberg for great effort and sorry for the extra trouble. As PackedFunc call gets into the center of our runtime execution, being able to reduce the execution time is now become an important topic, so we need to be more careful balancing the runtime logics |
Thank you, and I'm looking into where the performance overhead comes from. Unfortunately, I'm not sure if the de-coupling of the boxed types from the automatic conversions is possible.
Breaking any one of these three steps would avoid the error, but each comes with downsides:
|
Looking into the origin of the performance difference, since it may just result from an increased number of FFI calls, and (maybe) could be resolved without removing the automatic conversions. |
maybe it is not too bad to do manual conversion to the array of ir types in python wrapper then leave ffi part lightweight (per your sugggsted option 3) the objectref stored target attribute are mostly intended as boxed int rather than Intimm so we should be fine |
Here is an example, the number of weights are large enough to see the difference (1.2ms vs 6.2ms) https://gist.github.com/vinx13/ea7a8c785d8d0d5ae5318e8ace085db2 |
…ol (apache#16183)" This reverts commit 5f22be4.
Thank you for the example, @vinx13, and I can reproduce the difference with and without the FFI changes. My numbers roughly agree with yours (1.1 ms vs 5.2 ms), so I can start narrowing down on the cause. |
FOUND IT! The root cause was in the type validation, but ended up being pretty easily solveable. When converting from the untyped In the test case, this performance overhead largely occurred during the calls to With this addition, the FFI update went from being 5x slower than baseline (increase from 1 ms to 5 ms) to being 30% faster than baseline (decrease from 1ms to 700 us). I'll make a PR shortly that re-applied the FFI changes, with the additional fix. |
Initially introduced in apache#16183, these changes were reverted in apache#17252 due to performance degredation in some Relax models. This could occur when a model contained a large number of calls to `"vm.builtin.tuple_getitem"`, which may occur when model weights are provided as a tuple. This PR re-applies the changes from apache#16183, but with the performance degredation resolved. The root cause was unnecessary type-checking when converting from an untyped `tvm::ArrayNode*` to the typed `tvm::Array<T>`, in the case where `T` is `ObjectRef`.
…ol (apache#16183)" This reverts commit 5f22be4.
* Revert "Revert "[FFI][RUNTIME] Introduce runtime boxed types for int/float/bool" (#17252)" This reverts commit 11be832. * [FFI] Re-introduce the boxed primitive values Initially introduced in #16183, these changes were reverted in #17252 due to performance degredation in some Relax models. This could occur when a model contained a large number of calls to `"vm.builtin.tuple_getitem"`, which may occur when model weights are provided as a tuple. This PR re-applies the changes from #16183, but with the performance degredation resolved. The root cause was unnecessary type-checking when converting from an untyped `tvm::ArrayNode*` to the typed `tvm::Array<T>`, in the case where `T` is `ObjectRef`. * Correct typo from T to U
This is a follow-up to apache#16183, which added handling of boolean values in the TVM FFI. The initial implementation added both a new type code (`kTVMArgBool`) and a new `TVMValue::v_bool` variant. This commit removes the `TVMValue::v_bool` variant, since the `kTVMArgBool` type code is sufficient to handle boolean arguments. Removing the `TVMValue::v_bool` variant also makes all `TVMValue` variants be 64-bit (assuming a 64-bit CPU). This can simplify debugging in some cases, since it prevents partial values from inactive variants from being present in memory.
) * [FFI][Runtime] Use TVMValue::v_int64 to represent boolean values This is a follow-up to #16183, which added handling of boolean values in the TVM FFI. The initial implementation added both a new type code (`kTVMArgBool`) and a new `TVMValue::v_bool` variant. This commit removes the `TVMValue::v_bool` variant, since the `kTVMArgBool` type code is sufficient to handle boolean arguments. Removing the `TVMValue::v_bool` variant also makes all `TVMValue` variants be 64-bit (assuming a 64-bit CPU). This can simplify debugging in some cases, since it prevents partial values from inactive variants from being present in memory. * Update MakePackedAPI, less special handling required for boolean
This is a follow-up PR to apache#16183, which updated the FFI with explicit integer types. As part of that change, many internal functions were updated to accept non-IR types (e.g. `Array<runtime::Int>` instead of `Array<IntImm>`). For backwards compatibility with callees that provided the IR types, a specialization of `PackedFuncValueConverter` unwrapped the `IntImm` into a `runtime::Int`. This commit removes the backwards-compatibility specialization of `PackedFuncValueConverter`. Breakages that are found in CI as a result will then be updated at the caller side, removing the need for the backwards-compatibility handler altogether.
This is a follow-up PR to apache#16183, which updated the FFI with explicit integer types. As part of that change, many internal functions were updated to accept non-IR types (e.g. `Array<runtime::Int>` instead of `Array<IntImm>`). For backwards compatibility with callees that provided the IR types, a specialization of `PackedFuncValueConverter` unwrapped the `IntImm` into a `runtime::Int`. This commit removes the backwards-compatibility specialization of `PackedFuncValueConverter`. Breakages that are found in CI as a result will then be updated at the caller side, removing the need for the backwards-compatibility handler altogether.
Prior to this commit,
int
,float
, andbool
arguments from Python were converted toIntImm
,FloatImm
, andBool
. These are subtypes ofPrimExpr
, and should only be used at compile-time. By automatically applying this conversion as part of the FFI, these types are required to be present whenever a primitive is converted to atvm::ObjectRef
.This can become especially fragile for an end-user when storing objects into a TVM container. Because TVM containers require all contents to be
ObjectRef
subclasses, an automatic conversion may be applied on storing into a container, resulting in an unexpected type being retrieved from the container. For example, this currently occurs in Relax when extracting aR.Prim
from aR.Tuple
.This commit introduces a
Box<T>
type for storage of boxed primitives at runtime, distinct from the IR types. This was based on discussion on #15983, which would have further cemented theIntImm
type inlibtvm_runtime.so
, rather than starting the process of separating it.Primitive arguments provided to a PackedFunc that requires an
ObjectRef
will be converted to the corresponding boxed type. (e.g. Passing a Pythonint
to a C++ function acceptingObjectRef
produces aBox<int64_t>
.Boxed primitives provided to a PackedFunc that requires an unboxed primitive will be converted to the corresponding primitive.
PackedFunc return values of
ObjectRef
are converted to the corresponding primitive, if present. (e.g. If atuple_getitem
with static return typeObjectRef
returns aBox<int64_t>
, it will be unwrapped to a pythonint
.)Together, these three rules provide backwards compatibility for existing PackedFunc definitions, while avoiding exposing the user to any container-induced type conversions betweeen primitive types and
ObjectRef
.