[ROCm] Streamline bazel targets for rocm libraries #803
Merged
hsharsha merged 4 commits on May 12, 2026
316df98 to 8ccfe8f
c6811c5 to 2ecbb02
2ecbb02 to 4de45c7
hsharsha (Collaborator) requested changes on May 12, 2026 and left a comment:
Please add this change; it was needed to build and link the TF shared library.
diff --git a/third_party/xla/third_party/gpus/rocm/build_defs.bzl.tpl b/third_party/xla/third_party/gpus/rocm/build_defs.bzl.tpl
index 3cd2a18e920..59883e60cf7 100644
--- a/third_party/xla/third_party/gpus/rocm/build_defs.bzl.tpl
+++ b/third_party/xla/third_party/gpus/rocm/build_defs.bzl.tpl
@@ -88,8 +88,7 @@ def get_rbe_amdgpu_pool(is_single_gpu = False):
 def rocm_lib_import(name, interface_library, data, deps):
     cc_import(
         name = name + "_interface",
-        interface_library = interface_library,
-        system_provided = True,
+        shared_library = interface_library,
         visibility = ["//visibility:private"],
     )
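For context, here is a minimal sketch of the two `cc_import` shapes the diff above switches between. The target names and the `lib/libexample.so` path are hypothetical and not taken from this PR. The attribute behavior described in the comments follows the general Bazel `cc_import` documentation, not anything verified against this repository. Depending on the Bazel version, `cc_import` may need to be loaded from `rules_cc` first.

```starlark
# Hypothetical BUILD fragment; names and the .so path are illustrative only.

# Before: the library is treated as provided by the system at runtime.
# Bazel links against the interface library but is not given a shared
# library to make available to dependent binaries.
cc_import(
    name = "example_interface_system",
    interface_library = "lib/libexample.so",
    system_provided = True,
)

# After: the same file is declared as the shared library itself, so Bazel
# links against it and makes it available at runtime to dependents.
cc_import(
    name = "example_interface_shared",
    shared_library = "lib/libexample.so",
)
```

This may be why the change mattered for building and linking a shared library: with `system_provided = True`, the runtime library has to be found on the host, whereas `shared_library` makes it part of the build's own outputs.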
github-actions bot pushed a commit that referenced this pull request on May 18, 2026
Commit history for google/XNNPACK (bccfe733 -> ace56b61):
- b3e7d5a1 mohammadmseet-hue: Fix stack buffer overflows in NCHW reduce rewrite and ynnpack shim functions
- 1dfa3da5 mohammadmseet-hue: Address review: return error instead of clamping, use YNN_LOG_ERROR
- 0cd97f2f Ken Unger: add rvv support for f16-vcmul
- 1f8d093a Ken Unger: add rvv support for f16-vcmul
- d4474df4 velonica0: rvv-f16-activation
- e36eb021 Ken Unger: add fp16 rvv kernels for vsin,vcos,vexp
- d3121ea9 velonica0: [RVV] add rvv f32 kernels for velu, vgelu, vapproxgelu
- ae231328 velonica0: Alphabetize RVV elementwise entries in cmake/bzl lists
- 0b6f61af velonica0: fix cmake bug
- 803079cd Gregory Comer: Add AVX512 f32<->bf16 vcvt kernels
- 80303ee9 Gregory Comer: Add native AVX512_BF16 f32->bf16 vcvt kernel
- b0a078ab Volodymyr Kysenko: Add benchmarks for int4x2/int2x4 to int8_t conversions.
- aca142ca Dillon Sharlet: Decide whether to constant fold heuristically
- 172b7f4a Marie White: Add arm_neonbf16 binary kernels
- b7ebadee MarkLee131: Reject qpint8 in xnn_define_dynamically_quantized_tensor_value
- 524c06d2 MarkLee131: Detect size_t overflow in get_tensor_size and reject the tensor
- 0d406ba6 Volodymyr Kysenko: Add 2-bit and 4-bit interleave kernels.
- 3b773ce1 Quentin Khan: Don't produce an op when `Cast` is casting to the same type as the input type.
- 16acef83 Volodymyr Kysenko: Fix typo in the comment.
- c8ec3d63 Misha Gutman: Added dynamic_b support for qdu8-f32-qc8w operator.
- 1c7554a8 Dillon Sharlet: Remove unnecessary assert
- bf188a3d Nicolas Pitre: Add Zephyr RTOS (Generic) platform support
- 55d4036f Frank Barchard: Increase tolerance for SUBGRAPH_FP16.fully_connected_qd8_f16_qc8w test to account for numerical deviation.
- 8e06513a XNNPACK Team: Merge pull request openxla#10060 from npitre:zephyr-support-pr
- 6a9a9c50 Dillon Sharlet: Disable AMX kernels if msan is enabled
- a9cb6cb9 XNNPACK Team: Merge pull request openxla#10023 from GregoryComer:bf16-f32-vcvt-avx512
- cde7d935 Marie White: Improve tile sizes for arm_neonbf16 kernels. Tuned with AI agents.
- fadd2dbe XNNPACK Team: Merge pull request openxla#9986 from velonica0:rvv-f16-elementwise
- 4ca5fb8d Dillon: Merge branch 'master' into f16-unary-trig-rvv
- b2985573 Quentin Khan: Add wrappers for storage type of 2/4 bit int and 16 bit floats.
- 9c70eb91 Quentin Khan: Add reverse data type to native type mapping.
- 2c52c9f Quentin Khan: Add a conversion function to be able to specialize buffer copy from a sequence.
- c817561c Quentin Khan: Move declaration of `NativeStorage` and clarify comment of `StorageImpl`.
- 7b375800 MarkLee131: Clarify qpint8 rejection wording
- 1437d94b MarkLee131: Use xnn_safe_mul/xnn_safe_add in get_tensor_size
- f074438f MarkLee131: Split xnn_safe_mul/xnn_safe_add into separate statements
- 64081049 Dillon Sharlet: Resubmit openxla#10069
- 56496fd6 Dillon Sharlet: Add int32 sum kernels
- bbc68d90 XNNPACK Team: Merge pull request openxla#9963 from velonica0:rvv-elementwise
- 51759bd4 XNNPACK Team: Merge pull request openxla#10102 from MarkLee131:fix/integer-overflow-tensor-size
- 562e5274 Dillon Sharlet: Refactor `make_schedule` to allow building just the loop splits, and not a whole `scheduling_info`
- 8e4e9d5b Dillon Sharlet: Change reduce to make the identity buffer in slinky, instead of in the subgraph
- 64d21ff8 Ken Unger: handle unconfigured f16-vcmul kernel
- 834051a2 XNNPACK Team: Merge pull request openxla#10101 from MarkLee131:fix/qpint8-null-deref
- b3a5d44f Ken Unger: merge master
- 8b3bda45 Ken Unger: update-microkernels
- e2da1edb Frank Barchard: Add f16_wasmrelaxedsimd SIMD headers
- 5aa5d64e Quentin Khan: Add a parallel lib to `utils:matchers` for internal targets that are only compiled with OSS.
- b2f46c0c Quentin Khan: Add a matcher to to check whether two graph are isomorphic.
- f81e3eda Volodymyr Kysenko: Support channelwise zero points in YNNPACK quantized dot products.
- 4a318ee8 Frank Barchard: Add portable SIMD template for f16-vsqrt
- 4780ab70 Frank Barchard: Run generator to create rvv kernels
- 3659dcf2 Ken Unger: merge master
- f589c63c Jonathan Clohessy: Update CMakeLists.txt to match SME defaults from bazel
- 6833e630 Dillon: Merge branch 'master' into f16-unary-trig-rvv
- 26c61a7a XNNPACK Team: Merge pull request openxla#9989 from ken-unger:f16-unary-trig-rvv
- b0328fc2 Frank Barchard: Fix WAsm typo in XNNPACK by renaming to Wasm
- 04b67752 Dillon Sharlet: Refactor tolerance calculations
- 8c2df4d5 Dillon Sharlet: Parallelize reductions in YNNPACK
- e9de2685 Dillon Sharlet: Add reference kernels for fp64 elementwise ops
- a9390e5a Dillon Sharlet: Fix hexagon build
- a3da013b Dillon Sharlet: Add benchmark coverage of reference fp64 elementwise ops
- 8a3902dd Dillon Sharlet: Add optimized kernels for fp64 elementwise ops
- c8c86398 Dillon Sharlet: Add fp64 fma rules to elementwise compiler
- 807d9f9c Alexander Shaposhnikov: Introduce XNN_NO_SANITIZE_FUNCTION macro.
- 8e406b86 Dillon Sharlet: Loosen tolerances for dequantize_dot test
- bb6c6a48 Misha Gutman: Added convert from qint8 to qcint8.
- 689c5c60 Misha Gutman: Removed convert qint8 to qcint8 tests from ynnpack test set.
- a3664b21 Dillon Sharlet: Avoid capturing kernel in reduce ops
- 0fc9e7e7 Volodymyr Kysenko: Disable subgraph_matcher_test when use_ynnpack is enabled.
- b5bc455b Dillon Sharlet: Enable adding and removing dimensions via static_transpose
- 6dfbf304 Frank Barchard: Optimize xnn_round_f32 for Hexagon HVX.
- 52d94589 XNNPACK Team: Merge pull request openxla#9851 from mohammadmseet-hue:fix/nchw-reduce-overflow-and-shim-bounds
- 6bd50499 Misha Gutman: Fixed the crash due to unaligned read.
- 94ce3bb6 Volodymyr Kysenko: Refactor extent handling in YNNPACK subgraph.
- cceae52c Dillon Sharlet: Always constant fold pack_b ops
- e0729a7c Dillon Sharlet: Add assert to catch infinite loop case
- 50b01640 Frank Barchard: Fix Hexagon HVX build failure 'sf type used as qf32' on Clang 19
- b3daaef9 Dillon Sharlet: Enable sum(squared(x)) => sum_squared(x) for fp64
- 8f17e0c0 Dillon Sharlet: Relax tolerances of dequantize_dot more
- a493bbeb Dillon Sharlet: Add missing benchmark
- 95103d5b Frank Barchard: Enable f16 vsqrt wasmrelaxedsimd kernel and scalar fallbacks
- d830cd16 Volodymyr Kysenko: Rewrite reduce(static_transpose(x)) into reduce(x)
- 7b1bde34 Dillon Sharlet: Remove ternary multiply for purely float types
- a571a74b Dillon Sharlet: Add tolerance for quantized int8 operations that may round differently
- 58698bd6 Dillon Sharlet: Add `exp2_round` simd helper
- ece55c6e Dillon Sharlet: Add rewrite for `sum(a*b)` => `dot(a, b)` where appropriate
- fc7f8975 Frederic Rechtenstein: Fix alignment-related crash on AVX512
- 58a233a4 XNNPACK Team: Merge pull request openxla#10167 from JonathanC-ARM:jonclo01/sync_bazel_cmake_defaults
- 778408a8 Dillon Sharlet: Add exp_fp64 kernels
- 16c63a38 Volodymyr Kysenko: Add benchmarks for fully connected with QC4W and QC2W weights.
- 62f1d600 Misha Gutman: Added rewrite `bmm(a:f32, dequant(b:qint8):f32) -> f32` into
- f6cf463c Volodymyr Kysenko: Disable BatchMatrixMultiplyDequantBmmRewrite test under ynnpack.
- 5660b4b0 Dillon Sharlet: Implement `static_expand_dims` using `static_transpose`
- 11e206b8 XNNPACK Team: Implement `static_expand_dims` using `static_transpose`
- 8e4e78fd Quentin Khan: Don't use `graph::Tensor` in the XNNPack lowering interface.
- b12ed13b Quentin Khan: Fix memory outdated planning optimization invalidated by reshapes.
- c3d8c276 Misha Gutman: Disabled bmm rewrite by default as gemma4 fails precision.
- fb152529 Volodymyr Kysenko: Rename QD8F32QC8W benchmark to QD8F32QC8WFullyConnected for consistency.
- d8f5abe9 Dillon Sharlet: Rename svcnt => svcnts
- 445e613a Dillon Sharlet: Fix spurious debug messages about sum(a*b) -> dot(a, b) rewrites
- 48e1d0f0 Dillon Sharlet: Add test coverage of static and dynamic shapes
- be45bb35 Dillon Sharlet: Add more test coverage for reduce operators
- d48bc34c Dillon Sharlet: Add support for rewriting `sum(a*b, init_c)` => `dot(a, b, init_c)`
- 4908d191 Marie White: Fix get_dot_kernel type bug
- 84aa6a95 Dillon Sharlet: Move gemm, conv shapes hardcoded in benchmarks to text files
- c5c413de Richard Townsend: [gn] Update DEPS
- d877e1a1 Dillon Sharlet: Fix warning "unexpected tokens following preprocessor directive - expected a newline"
- dbf04022 Volodymyr Kysenko: Fix handling of sub-byte types in packer.
- 5039d217 Dillon Sharlet: Fix unsimplified slice extents
- 6c8ac561 Frank Barchard: F16-VTANH for avx512, wasm and scalar
- 3245ce20 Frank Barchard: Enable f16 vsin and vcos wasmrelaxedsimd kernel and scalar fallbacks
- 091b9be6 XNNPACK Team: Enable f16 vsin and vcos wasmrelaxedsimd kernel and scalar fallbacks
- 99e4485d Dillon Sharlet: Add `horizontal_sum` for floating point types
- f919d369 Quentin Khan: Don't call optimize in fp16 rewrite tests.
- c723a993 Quentin Khan: Prepare static_reduce test for upcoming fp16 to fp32 rewrite.
- 0a27dcf1 Frank Barchard: Enable f16 vsin and vcos wasmrelaxedsimd kernel and scalar fallbacks
- f43db489 Dillon Sharlet: Fix loss of precision for fp64 constants
- 6e50ae9f Dillon Sharlet: Fix reshape -> slice pattern
- 74daa88a Dillon Sharlet: Use internal define_static_expand_dims in define_dot
- 28ef957f Dillon Sharlet: Disable sum(a*b) => dot(a, b) rewrite if there are no broadcast dimensions on either side
- 016914cb Richard Townsend: [gn] Add pthreadpool for the Chromium config
- 25d15607 Volodymyr Kysenko: Fix store in the tail of transpose kernels for sub-byte types.
- 5d007c4c Volodymyr Kysenko: Make reference int2/int4 convert work with unaligned n.
- 713c3b72 Dillon Sharlet: Require reshape strides to be the shape we need too
- 2e6e343b Dillon Sharlet: Rewrite reduce kernels to optimize for numerical behavior
- 73c5abb5 Marie White: Fix bug in `get_max_concurrency`.
- 1dbb15fc Marie White: Fix fully-connected DynamicB tests to work with QP8.
- cf96f77e Marie White: Fix fully-connected DynamicB tests to work with QP8.
- 7829cd69 Quentin Khan: Move row sum rewrite to after other optimization rewrites.
- 2d16035f Dillon Sharlet: Fix bugs with reduce fusion
- 0b66c9f1 Dillon Sharlet: Fix slice bugs
- 768003bd Marie White: Fix rank pollution in channelwise quantized scales for YNNPACK.
- 26f5c9e1 Marie White: Fix logical extent calculation during constant folding for sub-byte types.
- bb971d4 Dillon Sharlet: Refactor the implementation of `remove_static_broadcast_from_elementwise`
- 860a6421 XNNPACK Team: Fix rank pollution in channelwise quantized scales for YNNPACK.
- f3513194 Frank Barchard: Add rules for updating copyright for new files and removing trailing spaces on blank lines
- 5dba5dad Dillon Sharlet: Improve static_slice test coverage
- f569d17b Ken Unger: merge master
- 12f71cd4 XNNPACK Team: Merge pull request openxla#9971 from ken-unger:f16-vcmul-rvv
- 2466b8c2 Dillon Sharlet: Update deps to get bug fixes
- cc278f5c Dillon Sharlet: Add support for strides to static_slice
- 53007d69 Dillon Sharlet: Add YNN_FLAG_NO_EXCESS_PRECISION
- fe166973 Dillon Sharlet: Disable static_slice test until slinky bug is fixed
- 4fad5b39 Dillon Sharlet: Disable static_slice test until slinky bug is fixed
- d72fa85c Dillon Sharlet: Improve log_fp32 kernels
- 95ee916a Dillon Sharlet: Use a better unroll factor for log2_fp32_sse2
- 9ab80cd6 Volodymyr Kysenko: Allow adding function own loops even if some of its non-trivial loops has been already fused.
- 11fb8859 Dillon Sharlet: Implement round to nearest even for float -> bf16 conversions
- 49e266f7 Volodymyr Kysenko: Add optimized convert int2/int4 to int8 kernels.
- ace56b61 Dillon Sharlet: Improve `exp` kernel accuracy and correctness
- 34c80155 Volodymyr Kysenko: Make sure partial reduction splits match the loop step.
- 7bf9c692 Frank Barchard: Fix ambiguous std::isfinite, std::abs, and std::fpclassify calls for _Float16 in test framework by explicitly casting to float.
- c3ac56a5 Quentin Khan: Add subgraph matcher target to `BUILD.gn`.
- 1c292bfc Richard Townsend: [gn] Test building AVX512
- 8da42ae2 Gerardo Carranza: Add support for log fp16 in XNNPACK.
- 1052f90b Richard Townsend: [gn] Add support for building/testing AArch32
- 01db6e14 Dillon Sharlet: Fix possible infinite recursion in convert
- f1fe9b5c Dillon Sharlet: Only rewrite reduce(convert(x)) if we have a kernel for that reduction type.
- 98c8ded4 Dillon Sharlet: Polynomial approximation improvements for `exp` and `log`

Commit history for dsharlet/slinky (1032be67 -> eb004cb3):
- 63c773f3 Dillon: Simplify `make_buffer` with new broadcast dimensions to `transpose` (#802)
- 66efc5ef Dillon: Fix `can_fuse` for broadcast dimensions (#803)
- 6fcfed78 Dillon: Fix more instances of `fold_factor` that should have been changed to `stride` after #802 (#806)
- 2af0a012 Dillon: Remove unnecessary branches for the rank of buffers when accessing dims (#807)
- 70b443b7 Dillon: Add fast path to `for_each_element` for rank 0 buffers (#805)
- dea32175 Dillon: Remove extent 1 dimensions in `optimize_dims` (#797)
- 7e02995b Dillon: Fix out of bounds vector access when simplifying nested transpose ops (#808)
- 9140d8ac Dillon: Change drop-loops to keep the loop but rewrite the extent (#809)
- f3ab7b63 Dillon: Fix aliases that use buffer bounds before they are defined (#810)
- 7bc45e1f Dillon: Add support for `slice_buffer`, `slice_dim`, and `transpose` in `alias_copies` (#811)
- 0335d87e Dillon: Cast object instead of function pointer (#812)
- 27f5d9d9 Dillon: Fix externally defined fold factors (#813)
- c08ef409 Dillon: Fix copy aliasing for copies that remove dimensions (#814)
- 284794e8 Dillon: Fix bugs uncovered by copying from a rank > 0 buffer to a scalar (#815)
- 56f8638a Dillon: Fix crop simplification bug (#816)
- 0fbea044 Dillon: Fix simplify of nested transposes (#817)
- c01931be Dillon: Fix a straggler usage of `op->dims` => `dims` (#818)
- eb004cb3 Dillon: Fix strided copies (#819)

PiperOrigin-RevId: 917405521
Remove DsoLoader indirection and directly link to rocm libs