merge main into amd-staging #671

ronlieb · 2025-11-24T21:06:51Z

No description provided.

…67885) Replaces the default "Success" std::error_code with a more meaningful one if `Magic != file_magic::pdb`.
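For readers unfamiliar with the pitfall, the following is a minimal sketch of why a default-constructed `std::error_code` is a poor failure value: it compares equal to "no error". The function name `check_magic` and the choice of `std::errc::invalid_argument` are assumptions for illustration only; the error code the patch actually returns is not stated here.

```cpp
#include <system_error>

// Hypothetical stand-in for the format check: returns a "no error" code when
// the magic matches, and a meaningful error otherwise. Returning a
// default-constructed std::error_code on failure would look like success to
// callers that test the code in a boolean context.
inline std::error_code check_magic(bool magic_is_pdb) {
  if (magic_is_pdb)
    return std::error_code(); // value 0: treated as success
  // Assumption: invalid_argument is only an illustrative choice.
  return std::make_error_code(std::errc::invalid_argument);
}
```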

```
---------------------------------------------------
Benchmark                          old          new
---------------------------------------------------
BM_num_get<bool>               86.5 ns      32.3 ns
BM_num_get<long>               82.1 ns      30.3 ns
BM_num_get<long long>          85.2 ns      33.4 ns
BM_num_get<unsigned short>     85.3 ns      31.2 ns
BM_num_get<unsigned int>       84.2 ns      31.1 ns
BM_num_get<unsigned long>      83.6 ns      31.9 ns
BM_num_get<unsigned long long> 87.7 ns      31.5 ns
BM_num_get<float>               116 ns       114 ns
BM_num_get<double>              114 ns       114 ns
BM_num_get<long double>         113 ns       114 ns
BM_num_get<void*>               151 ns       144 ns
```

This patch applies multiple optimizations:

- Stages two and three of do_get are merged and a custom integer parser has been implemented
  This avoids allocations, removes the need for strto{,u}ll and avoids __stage2_int_loop (avoiding extra writes to memory)
- std::find has been replaced with __atoms_offset, which uses vector instructions to look for a character

Fixes llvm#158100
Fixes llvm#158102
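To illustrate the general idea of a single-pass integer parser that needs neither an intermediate buffer nor `strtoull`, here is a rough sketch. It is not the libc++ implementation; the name `parse_unsigned` and its restriction to base-10 unsigned input are assumptions made to keep the example short.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <string_view>

// Hypothetical helper: parse a base-10 unsigned integer in one pass.
// Returns std::nullopt on empty input, a non-digit character, or overflow.
inline std::optional<std::uint64_t> parse_unsigned(std::string_view text) {
  if (text.empty())
    return std::nullopt;
  constexpr std::uint64_t max = std::numeric_limits<std::uint64_t>::max();
  std::uint64_t value = 0;
  for (char c : text) {
    if (c < '0' || c > '9')
      return std::nullopt;
    const std::uint64_t digit = static_cast<std::uint64_t>(c - '0');
    // Reject the digit if value * 10 + digit would exceed the maximum.
    if (value > (max - digit) / 10)
      return std::nullopt;
    value = value * 10 + digit;
  }
  return value;
}
```

Digits are accumulated directly into the result, so no temporary string is built and no second parsing stage is needed.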

…alescing SUBREG_TO_REG"

A SUBREG_TO_REG instruction expresses that the top bits of the result register are set to a certain value (e.g. 0). The example below expresses that the result of %1 will have the top 32 bits zeroed and the lower 32 bits equal to the result of INSTR.

```
%0:gpr32 = INSTR
%1:gpr64 = SUBREG_TO_REG 0, %0, sub32
```

When the RegisterCoalescer tries to remove SUBREG_TO_REG instructions by coalescing %0 into %1, it must keep the same semantics. Currently however, the RegisterCoalescer would emit:

```
%1.sub32:gpr64 = INSTR
```

which no longer expresses that the top 32 bits of the register are defined (zeroed) by INSTR. This may cause issues with e.g. machine copy propagation, where the pass may think it can remove a COPY-like instruction because the MIR says only the bottom 32 bits are defined/used, even though other uses of the register rely on the top 32 bits being zeroed by the COPY-like instruction.

This PR changes the RegisterCoalescer to instead emit:

```
undef %1.sub32:gpr64 = MOVimm32 42, implicit-def %1
```

to express that the entire contents of %1:gpr64 are defined by the instruction.

This tries to reland llvm#134408 which had to be reverted due to a few reported failures.

Reverts llvm#169318 Our builders are back online. I see them picking up existing jobs.

…9350) This is identical to 'copy' and 'copyin', except it uses 'create' and 'copyout' as its entry/exit op. This patch adds the same tests, and similar code for all of it.

…m#169305) The `#warning` causes diagnostics if system headers include deprecated headers. llvm#168041 will add a way to deprecate headers properly, which then also interacts nicely with system header suppression.

…9351) Update getAllocTokenModeFromString() to recognize "default" as a valid mode string, mapping it to `DefaultAllocTokenMode`.
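The shape of such a string-to-mode mapping can be sketched as follows. This is an illustration only: the enumerators, the non-default mode strings, the constant `kDefaultTokenMode`, and the function `parseTokenMode` are all hypothetical stand-ins, not the real LLVM declarations.

```cpp
#include <optional>
#include <string_view>

// Hypothetical mode enum and default; the real names and values may differ.
enum class TokenMode { Increment, Random, TypeHash };
constexpr TokenMode kDefaultTokenMode = TokenMode::TypeHash;

// Maps a mode string to a mode, accepting "default" as an alias for
// whatever the default mode is. Unknown strings yield std::nullopt.
inline std::optional<TokenMode> parseTokenMode(std::string_view name) {
  if (name == "default")
    return kDefaultTokenMode;
  if (name == "increment")
    return TokenMode::Increment;
  if (name == "random")
    return TokenMode::Random;
  if (name == "typehash")
    return TokenMode::TypeHash;
  return std::nullopt;
}
```

Accepting "default" explicitly means callers can pass the string through without special-casing it, and the default can change in one place.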

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>

`[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue. - https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant
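As a small illustration of the guideline, consider a function that returns a modified view without touching its argument. The helper `trim_left` below is hypothetical and not part of libc++; it only shows a case where ignoring the result is almost certainly a bug.

```cpp
#include <string_view>

// Hypothetical helper: returns a view with leading spaces removed.
// The argument is taken by value and never modified in the caller, so a call
// whose result is ignored does nothing useful; [[nodiscard]] asks the
// compiler to warn in that case.
[[nodiscard]] constexpr std::string_view trim_left(std::string_view s) {
  while (!s.empty() && s.front() == ' ')
    s.remove_prefix(1);
  return s;
}
```

Writing `trim_left(input);` as a statement would typically produce a warning, whereas `input = trim_left(input);` is the intended use.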

…69356) This one is another that is effectively identical to copy, copyin, and copyout, except its entry/exit ops pair is create/delete.

llvm#162458) The inliner uses DirectSP to check if a function has instructions that modify the SP. Exceptions are stack Push and Pop instructions. We can also allow pointer signing and authenticating instructions. The inliner removes the Return instructions from the inlined functions. If it is a fused pointer-authentication-and-return (e.g. RETAA), we have to generate a new authentication instruction.

**Problem**

In Rust, checked math functions (like `checked_mul`, `overflowing_mul`, `saturating_mul`) are part of the primitive implementation of integers ([see u64](https://doc.rust-lang.org/std/primitive.u64.html) for instance). The Rust compiler builds the Rust [compiler-builtins](https://github.com/rust-lang/compiler-builtins) crate as a step in the compilation process, since it contains the math builtins to be lowered in the target.

For BPF, however, when using those functions in Rust we hit the following errors:

```
ERROR llvm: <unknown>:0:0: in function func i64 (i64, i64): A call to built-in function '__multi3' is not supported.
ERROR llvm: <unknown>:0:0: in function func i64 (i64, i64): only small returns supported
```

Those errors come from the following code:

```
pub fn func(a: u64, b: u64) -> u64 {
    a.saturating_mul(b)
}
```

Under the hood, those functions invoke the LLVM intrinsic `{ i64, i1 } @llvm.umul.with.overflow.i64(i64, i64)` or its variants.

It is very useful to use safe math operations when writing BPF code in Rust, and I would like to add support for those in the target.

**Changes**

1. Create a target feature `allow-builtin-calls` to enable code generation for builtin functions.
2. Implement `CanLowerReturn` to fix the error `only small returns supported`.
3. Add code to correctly invoke lib functions.
4. Add a test case together with the corresponding C code.
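For comparison on the C/C++ side, a saturating multiply can be sketched with the GCC/Clang builtin `__builtin_mul_overflow`, which reports overflow alongside the wrapped result. This is only an illustration, not the test added by the patch; the function name `saturating_mul` is chosen here to mirror the Rust method, and the builtin is a compiler extension rather than standard C++.

```cpp
#include <cstdint>
#include <limits>

// Saturating unsigned multiply: returns the product, or the maximum
// representable value if the multiplication overflows.
// Assumes a GCC- or Clang-compatible compiler for __builtin_mul_overflow.
inline std::uint64_t saturating_mul(std::uint64_t a, std::uint64_t b) {
  std::uint64_t result = 0;
  if (__builtin_mul_overflow(a, b, &result))
    return std::numeric_limits<std::uint64_t>::max();
  return result;
}
```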

This patch fixes a minor typo in the **Kaleidoscope tutorial (Chapter 2)**. The sentence: “checks to see if **if** is too low” has been corrected to: “checks to see if **it** is too low”. This is a documentation-only change and does not affect any semantic behavior or code generation. Thank you for maintaining the tutorial, and please let me know if any further adjustments are needed.

This adds some trivial handling to force emitting of child decls inside C++ records.

…dialect (llvm#169194) Enables `amdgcn.named.barrier` target extension type as a global variable type in MLIR.

…m#168134) Make sure the CoroSplitter pass correctly handles `#dbg_declare_value` intrinsics. That is, it should identify them and convert them to `#dbg_declares`, so that subsequent passes do not need to be amended to support the `#dbg_declare_value` intrinsic. More information here: https://discourse.llvm.org/t/rfc-introduce-new-llvm-dbg-coroframe-entry-intrinsic/88269 This patch is the second and last in a stack of patches; the one preceding it is llvm#168132.

…lvm#169364) This is to resolve a regression caused by llvm#168534. Now when we have an anonymous object like a struct or union that has a typedef attached, we print the typedef name instead of listing it as anonymous.

This patch adds Mustache HTML tests alongside the legacy HTML backend for namespace output. This way, we can see exactly where the output currently differs before replacing the legacy backend. The same thing will be done for all other tests where the legacy HTML backend is tested.

Changes: Fix a missed update to WidenGEP::usesFirstLaneOnly, and include reduced-case test that was previously hitting the new assert: the underlying reason was that VPWidenGEP::usesScalars was too weak, and the single-scalar WidenGEP was not narrowed by narrowToSingleScalarRecipes. This allows us to strip a special case in VPWidenGEP::execute.

In preparation to extend the work done by dfa665f ([VPlan] Add transformation to narrow interleave groups) to make the narrowing more powerful, pre-commit a test case from llvm#128062.

llvm#169114) The `acc.present` Op as generated by ACCImplicitData does not provide a way to differentiate between `acc.present` ops that are generated implicitly and the ones that are generated as a result of an explicit `default(present)` clause in the source code. This differentiation would allow the compiler to better communicate to the user the decisions it makes while managing data automatically between the host and the device. This commit adds this information as a discardable attribute on the `acc.present` op.

… (llvm#169055) This reverts commit b83e458. Also undo the use of namespace qualifier for `ReducePassList` as that seems to cause build failures.

…9381) Just like the last handful of patches that did copy, copyin, copyout, create, etc, this patch has the exact same behavior, except the entry op is a present, and the exit is delete.

…able downcasting (llvm#169383)

…bute (llvm#169259)

This patch adds a case to `CheckMultiVersionAdditionalDecl()` that detects redeclarations of `target_clones` functions which omit the attribute, and makes sure they are marked as redeclarations. It also updates the comment at the call site of `CheckMultiVersionAdditionalDecl()` to reflect this.

Previously, `target_clones` multiversioned functions that omitted the attribute from subsequent declarations would cause Clang to hit an `llvm_unreachable` and crash.

In the following example, the second declaration (the function definition) should inherit the `target_clones` attribute from the first declaration (the forward declaration):

```
__attribute__((target_clones("arch=atom", "default")))
void foo(void);

void foo(void) { /* ... */ }
```

However, `CheckMultiVersionAdditionalDecl()` was not recognizing the function definition as a redeclaration of the forward declaration, which prevented `Sema::MergeFunctionDecl()` from automatically inheriting the attribute.

A side effect of this fix is that Clang now catches redeclarations of `target_clones` functions that have conflicting types, which previously caused Clang to crash by hitting that same `llvm_unreachable`. The `bad_overload1` case in `clang/test/Sema/attr-target-clones.c` has been updated to reflect this.

Fixes llvm#165517
Fixes llvm#129483

…GEP chains (llvm#168096)

- The DXIL data scalarizer only needs to change vectors into arrays. It does not need to change the types of GEPs to match the pointer type. This PR simplifies the `visitGetElementPtrInst` method to do just that while also accounting for nested GEPs from ConstantExprs. (Before this PR, there were still vector types lingering in nested GEPs with ConstantExprs.)
- The `equivalentArrayTypeFromVector` function was awkwardly placed near the top of the file and away from the other helper functions. The function is now moved next to the other helper functions.
- Removed an unnecessary `||` condition from `isVectorOrArrayOfVectors`

Related tests have also been cleaned up, and the test CHECKs have been modified to account for the new simplified behavior.

…lvm#169389) Just like the last handful of clauses, this is a pretty simple one, doing device_resident (Entry op: declare_device_resident, and exit: delete). This should be the last of the 'local' declare patches.

…into .debug_names (llvm#168513)

Depends on:
* llvm#168895

Note, the last commit is the one with the actual fix. The others are drive-by/test changes.

We've been seeing dsymutil verification failures like:

```
error: Name Index @ 0x0: Entry @ 0x11949d: mismatched Name of DIE @ 0x9c644c: index - apply<(lambda at /some/build/dir/lib/LLVMSupport/include/llvm/Support/Error.h:1070:35)>; debug_info - apply<(lambda at /some/build/dir/lib/LLVMCustom/include/llvm/Support/Error.h:1070:35)> apply, _ZN11custom_llvm18ErrorHandlerTraitsIRFvRNS_13ErrorInfoBaseEEE5applyIZNS_12consumeErrorENS_5ErrorEEUlRKS1_E_EES7_OT_NSt3__110unique_ptrIS1_NSD_14default_deleteIS1_EEEE.
```

Note how the name of the DIE has a different lambda path than the one that was used to insert the DIE into debug_names.

The root cause of the issue is that we have a `DW_TAG_subprogram` definition whose `DW_AT_specification` DIE got deduplicated, but the `DW_AT_name` of the original specification is different from the one it got uniqued to. That’s technically fine because dsymutil uniques by linkage name, which uniquely identifies any function with non-internal linkage.

However, we insert the definition DIE into the debug-names table using the `DW_AT_name` of the original specification (we call `getDIENames(InputDIE…)`). What we really want is to use the name of the adjusted `DW_AT_specification` (i.e., the `DW_AT_specification` of the output DIE). That’s not as simple as it sounds because we can’t just get ahold of the DIE in the output CU. We have to grab the ODR `DeclContext` of the input DIE’s specification. That is the only link back to the canonical specification DIE. For that to be of any use, we have to stash the `DW_AT_name` into `DeclContext` so we can use it in `getDIENames`.

We have to account for the possibility of multiple levels of `DW_AT_specification`/`DW_AT_abstract_origin`. So my proposed solution is to recursively scan the referenced DIEs, grab the canonical DIE for those and get the name from the `DeclContext` (if none exists then use the `DW_AT_name` of the DIE itself).

One remaining question is whether we need to handle the case where a DIE has a `DW_AT_specification` *and* a `DW_AT_abstract_origin`. That complicates the way we locate `DW_AT_name`. We'd have to adjust `getCanonicalDIEName` to handle this. But it's not clear what a `DW_AT_name` would be for such cases. Worst case at the moment we take the wrong path up the specifications and don't find any `DW_AT_name`, and don't end up indexing that DIE. Something to keep an eye out for.

rdar://149239553
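The recursive lookup described above can be modelled with a toy data structure. The sketch below is not dsymutil code: `ToyDIE`, its fields, and `resolve_name` are hypothetical stand-ins, the reference graph is assumed to be acyclic, and the choice to try the specification before the abstract origin is an assumption made only for this example.

```cpp
#include <optional>
#include <string>

// Toy model of a DIE. All names here are hypothetical.
struct ToyDIE {
  std::optional<std::string> name;          // stand-in for the DIE's own DW_AT_name
  std::optional<std::string> context_name;  // name stashed in the (toy) DeclContext
  const ToyDIE *specification = nullptr;    // stand-in for DW_AT_specification
  const ToyDIE *abstract_origin = nullptr;  // stand-in for DW_AT_abstract_origin
};

// Follow references recursively; prefer the name recorded for the canonical
// DIE, and fall back to the DIE's own name if no canonical name exists.
inline std::optional<std::string> resolve_name(const ToyDIE *die) {
  if (die == nullptr)
    return std::nullopt;
  if (auto via_spec = resolve_name(die->specification))
    return via_spec;
  if (auto via_origin = resolve_name(die->abstract_origin))
    return via_origin;
  if (die->context_name)
    return die->context_name;
  return die->name;
}
```

With this model, a definition whose specification was uniqued to a DIE with a different name resolves to the canonical name rather than the original one.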

…llvm#160899)

The `AArch64AsmPrinter::emitPtrauthCheckAuthenticatedValue` method accepts two arguments, `bool ShouldTrap` and `const MCSymbol *OnFailure`, that control the behavior of the emitted instruction sequence when the check fails:

* `ShouldTrap` requests an error to be generated
* `OnFailure` requests branching to the given label after clearing the PAC field

An assertion in `emitPtrauthCheckAuthenticatedValue` ensures that when `ShouldTrap` is true, `OnFailure` must be null. But the opposite holds as well: when `ShouldTrap` is false, `OnFailure` is always non-null, as otherwise the entire sequence following `AUT[ID][AB]` instruction would turn into a very expensive equivalent of XPAC (unless the CPU implements FEAT_FPAC):

    authenticate Xn
    inspect PAC field of Xn
    if PAC field was not cleared:
      clear PAC field

In other words, the value of `ShouldTrap` argument can be computed as `OnFailure == nullptr` at all existing call sites. In fact, at three of four call sites, constant `true` and `nullptr` are passed as the values of these function arguments. `emitPtrauthAuthResign` is the only caller that potentially makes use of checking-but-not-trapping mode of `emitPtrauthCheckAuthenticatedValue`, and it passes a non-null pointer as `OnFailure` when `ShouldTrap` is false.

This commit makes the invariant explicit by omitting the `ShouldTrap` argument and inferring its value from the `OnFailure` argument instead.
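The refactoring pattern, dropping a boolean that is fully determined by whether a pointer argument is null, can be sketched in isolation. The code below is a toy model rather than the AsmPrinter code: `Label`, `emit_check`, and the strings it produces are hypothetical placeholders for the real types and emitted instructions.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a branch target.
struct Label {
  std::string name;
};

// Toy model of the "after" signature: there is no separate should-trap flag.
// A null on_failure means "trap on failure"; a non-null one means
// "clear the PAC field and branch to the label".
inline std::vector<std::string> emit_check(const Label *on_failure) {
  const bool should_trap = (on_failure == nullptr);
  std::vector<std::string> sequence{"check PAC field"};
  if (should_trap) {
    sequence.push_back("trap");
  } else {
    sequence.push_back("clear PAC field");
    sequence.push_back("branch " + on_failure->name);
  }
  return sequence;
}
```

Because the flag is derived inside the function, callers can no longer pass an inconsistent combination such as "do not trap" together with a null label.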

z1-cciauto · 2025-11-24T21:08:33Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/2955

MuellerMP and others added 30 commits November 24, 2025 10:53

[PDB][NativeSession] Use better error code for invalid format (llvm#1…

29cfef1

…67885) Replaces the default "Success" std::error_code with a more meaningful one if `Magic != file_magic::pdb`.

Revert "[libcxx][ci] Temporarily disable ARM jobs" (llvm#169352)

ccd2c3e

Reverts llvm#169318 Our builders are back online. I see them picking up existing jobs.

[clangd] Fix C++20 build failure

e442c67

[OpenACC][CIR] copyout clause lowering on func-local declare (llvm#16…

dc39fa3

…9350) This is identical to 'copy' and 'copyin', except it uses 'create' and 'copyout' as its entry/exit op. This patch adds the same tests, and similar code for all of it.

[Support] Permit "default" string in AllocToken mode parsing (llvm#16…

ab71452

…9351) Update getAllocTokenModeFromString() to recognize "default" as a valid mode string, mapping it to `DefaultAllocTokenMode`.

[libsycl] Add Maintainers.md file (llvm#168550)

f31e1cf

Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>

[lldb] Add const& to InstructionList parameter (llvm#169342)

51fef12

[AMDGPU] Use ListSeparator. NFC. (llvm#169347)

cc0371f

[libc++][string_view] Applied [[nodiscard]] (llvm#169010)

e3d0ac1

`[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue. - https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant

[OpenACC][CIR] 'create' clause lowering on func-local-declare (llvm#1…

78d8298

…69356) This one is another that is effectively identical to copy, copyin, and copyout, except its entry/exit ops pair is create/delete.

[CIR] Add handling for static data members (llvm#169134)

ad1be4a

This adds some trivial handling to force emitting of child decls inside C++ records.

[MLIR][LLVM] Support named barrier as a global variable type in llvm …

76e9834

…dialect (llvm#169194) Enables `amdgcn.named.barrier` target extension type as a global variable type in MLIR.

[windows] improve python3.dll load check (llvm#168864)

c1f24a5

[LV] Pre-commit test for llvm#128062 (llvm#164801)

9688f88

In preparation to extend the work done by dfa665f ([VPlan] Add transformation to narrow interleave groups) to make the narrowing more powerful, pre-commit a test case from llvm#128062.

Reapply "[NFC][bugpoint] Namespace cleanup in bugpoint" (llvm#168961)…

a27bb38

… (llvm#169055) This reverts commit b83e458. Also undo the use of namespace qualifier for `ReducePassList` as that seems to cause build failures.

[OpenACC][CIR] Implement 'present' lowering on local-declare (llvm#16…

1b65752

…9381) Just like the last handful of patches that did copy, copyin, copyout, create, etc, this patch has the exact same behavior, except the entry op is a present, and the exit is delete.

[MLIR][Python] add GetTypeID for llvm.struct_type and llvm.ptr and en…

740d0bd

…able downcasting (llvm#169383)

[gn build] Port 2bdd135

d4cd331

[gn build] Port 3773bbe

0e86510

llvmgnsyncbot and others added 8 commits November 24, 2025 18:39

[gn build] Port 645e0dc

40fb2ca

[clang-doc] Add definition information to class templates (llvm#169109)

4a0d485

merge main into amd-staging

c2de9b2

ronlieb requested review from a team and dpalermo November 24, 2025 21:06

ronlieb requested a review from stellaraccident as a code owner November 24, 2025 21:06

dpalermo approved these changes Nov 24, 2025

z1-cciauto merged commit b92350c into amd-staging Nov 25, 2025
13 checks passed

z1-cciauto deleted the amd/merge/upstream_merge_20251124132223 branch November 25, 2025 00:05


33 participants
