[AutoBump] Merge with ccd3defd (Feb 19) (58) #602

jorickert · 2025-06-18T08:30:49Z

No description provided.

…#127640) CaptureTracking considers insertions into aggregates and vectors as captures. As such, extractions from aggregates and vectors are escape sources. A non-escaping identified local cannot alias with the result of an extractvalue/extractelement. Fixes llvm#126670.

…atization (llvm#127754)

Issue: Compilation terminates abnormally in parallel default(private).

Documentation reference: A threadprivate variable must not appear as the base variable of a list item in any clause except for the copyin and copyprivate clauses.

Explanation: Per the reference, threadprivate symbols cannot be used in the DSA clauses, which in turn means the symbol can be skipped for default privatization.

Fixes llvm#123535

…m#127455) Delete `equivalenceAnalysis`, which has been incorporated into the `getAliasingValues` API. Also add an additional test case to ensure that equivalence is properly propagated across function boundaries.

…patterns. (llvm#127643) Handles both BWI and non-BWI cases (skips PMOV*XBW without BWI). The vector-interleaved-store-i8-stride-8.ll VPTERNLOG diffs are due to better value tracking now recognizing the zero-extension patterns where before it was any-extension

…126762) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR removes all occurrences of gfx940/gfx941 from clang that can be removed without changes in the llvm directory. The target-invalid-cpu-note/amdgcn.c test is not included here since it tests a list of targets that is defined in llvm/lib/TargetParser/TargetParser.cpp. For SWDEV-512631

The standard libcalls for half to float and float to half conversion are __extendhfsf2 and __truncsfhf2. However, LLVM currently uses __gnu_h2f_ieee and __gnu_f2h_ieee instead. As far as I can tell, these libcalls are an ARM-ism and only provided by libgcc on that platform. compiler-rt always provides both libcalls. Use the standard libcalls by default, and only use the __gnu libcalls on ARM.
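For readers unfamiliar with what these libcalls compute: `__extendhfsf2` widens an IEEE binary16 (half) value to binary32, and `__truncsfhf2` narrows in the other direction. The sketch below only emulates that arithmetic with Python's `struct` format `e` (binary16); it does not call compiler-rt or libgcc, the helper names are hypothetical, and the input is a Python double rather than a 32-bit float.

```python
import struct


def trunc_to_half_bits(x: float) -> int:
    """Round x to IEEE binary16 and return its 16-bit pattern.

    Emulates the kind of narrowing a float-to-half libcall performs
    (hypothetical helper, not the libcall itself).
    """
    return struct.unpack("<H", struct.pack("<e", x))[0]


def extend_half_bits(h: int) -> float:
    """Widen a 16-bit binary16 pattern to a Python float (always exact)."""
    return struct.unpack("<e", struct.pack("<H", h))[0]


if __name__ == "__main__":
    bits = trunc_to_half_bits(1.0)
    print(hex(bits), extend_half_bits(bits))
```

Widening is always exact, while narrowing may round, which is why the two directions are separate libcalls.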

gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR removes all non-documentation occurrences of gfx940/gfx941 from the llvm directory, and the remaining occurrences in clang. Documentation changes will follow. For SWDEV-512631

…vm#126887) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR removes all documentation occurrences of gfx940/gfx941 except for the gfx940 ISA description, which will be the subject of a separate PR. For SWDEV-512631

…lvm#126906) gfx940 and gfx941 are no longer supported. This is the last one of a series of PRs to remove them from the code base. The ISA documentation still contains a lot of links and file names with the "gfx940" identifier. Changing them to "gfx942" is probably not worth the cost of breaking all URLs to these pages that users might have saved in the past. For SWDEV-512631

This also includes comparing the two ImpliedDo.

Details
- For ArrayConstructor, check if x and y have the same elements and type
- For ImpliedDo, check if x and y have the same lower, upper, stride and values

Fixes: llvm#104526
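The two comparison rules above can be sketched as a small structural-equality check. This is an illustrative Python model only: the `ArrayConstructor` and `ImpliedDo` classes and the `same_*` helpers are hypothetical stand-ins, not flang's data structures or the functions touched by the patch.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class ImpliedDo:
    lower: int
    upper: int
    stride: int
    values: Tuple["Element", ...]


@dataclass(frozen=True)
class ArrayConstructor:
    type_name: str
    elements: Tuple["Element", ...]


Element = Union[int, ImpliedDo]


def same_element(x, y) -> bool:
    """Compare two elements, recursing into nested implied-do loops."""
    if isinstance(x, ImpliedDo) and isinstance(y, ImpliedDo):
        return same_implied_do(x, y)
    if isinstance(x, ImpliedDo) or isinstance(y, ImpliedDo):
        return False
    return x == y


def same_elements(xs, ys) -> bool:
    return len(xs) == len(ys) and all(same_element(a, b) for a, b in zip(xs, ys))


def same_implied_do(x: ImpliedDo, y: ImpliedDo) -> bool:
    """Same lower bound, upper bound, stride and values."""
    bounds_match = (x.lower, x.upper, x.stride) == (y.lower, y.upper, y.stride)
    return bounds_match and same_elements(x.values, y.values)


def same_array_constructor(x: ArrayConstructor, y: ArrayConstructor) -> bool:
    """Same type and same elements."""
    return x.type_name == y.type_name and same_elements(x.elements, y.elements)
```

In this model, two constructors that differ only in the stride of a nested implied-do compare as different.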

…o-math-errno is set (llvm#121763) This will allow vectorizing these calls (after a few more patches). This should not change the codegen for targets that enable the use of AA during the codegen (in `TargetSubtargetInfo::useAA()`). This includes targets such as AArch64. This notably does not include x86 but can be worked around by passing `-mllvm -combiner-global-alias-analysis=true` to clang. Follow up to llvm#114086.

This commit improves the behaviour of (__clc_)nextafter around zero. Specifically, the nextafter value of very small negative numbers in the positive direction is now negative zero. Previously we'd return positive zero. This behaviour is not required as far as OpenCL is concerned: at least, the CTS isn't testing for it. However, this change does bring our implementation into bit-equivalence with (libstdc++'s implementation of) std::nextafter, tested on all possible values of 32-bit float towards both positive and negative INFINITY. Furthermore, since the implementation of libclc's floating-point 'rtp' and 'rtz' conversions use __clc_nextafter, the previous behaviour was resulting in CTS validation issues. For example, when converting float -0x1.000002p-25 to half, rounding towards zero or positive infinity, nextafter was returning +0.0, whereas the correct conversion requires us to return -0.0. We could work around this issue in the conversion functions, but since the change to nextafter is small enough and the behaviour around zero matches libstdc++, the fix feels at home there. This commit also converts several variables to unsigned types to avoid undefined behaviour surrounding signed underflow on the subtractions. It also converts some variables to be kept in floating-point types, using fabs to get the absolute value rather than by bit-hacking.
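The behaviour around zero described above can be illustrated with a small bit-level sketch for 32-bit floats. This is not libclc's implementation: `next_up_f32` is a hypothetical helper that steps one representable value toward positive infinity and keeps the sign bit when a tiny negative number reaches zero.

```python
import struct


def f32_bits(x: float) -> int:
    """Return the IEEE-754 binary32 bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    """Return the value whose binary32 bit pattern is b."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def next_up_f32(x: float) -> float:
    """Hypothetical helper: next binary32 value after x toward +infinity."""
    b = f32_bits(x)
    if x != x or b == 0x7F800000:
        return x                    # NaN or +inf: nothing larger
    if b in (0x00000000, 0x80000000):
        return bits_f32(0x00000001)  # either zero -> smallest positive subnormal
    if b & 0x80000000:
        # Negative value: moving up shrinks the magnitude.
        # 0x80000001 - 1 == 0x80000000, which is -0.0, not +0.0.
        return bits_f32(b - 1)
    return bits_f32(b + 1)          # positive value: magnitude grows


if __name__ == "__main__":
    smallest_negative = bits_f32(0x80000001)
    print(hex(f32_bits(next_up_f32(smallest_negative))))
```

Stepping the smallest negative subnormal (bit pattern 0x80000001) upward yields 0x80000000, that is, negative zero with the sign bit preserved.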

Part of DECLARE REDUCTION was already supported by the parser, but the semantics to add the reduction identifier weren't implemented. The semantics would not accept the name given by the reduction, so a few lines were added to support that. Some tests were in place but not quite working, so those were fixed up too. This adds new tests for unparsing and the parse tree, as well as checking the symbolic name being generated. Lowering of DECLARE REDUCTION is not supported in this patch; a test that it hits the relevant TODO is included (most of it already existed, but did not actually test the TODO message).

The motivation is llvm#123622 and the fact that it is hard to find the last line entry in a given range. `FindLineEntryByAddress(range_end-1)` is the best we have, but it's not ideal because it has a magic -1 and it relies on there being a line entry at that address. Generally, one should be there, but if for some reason it isn't, we might end up ignoring the entries that are there (or, like my incorrect fix in llvm#123622 did, iterating through the entire line table). What we really want is to get the last entry that exists in the given range. Or, equivalently (and more STL-like), the first entry after that range. This is what these functions do. I've used the STL names since they do pretty much exactly what the standard functions do (the main head-scratcher comes from the fact that our entries represent ranges rather than single values). The functions can also be used to simplify the maze of `if` statements in `FindLineEntryByAddress`, but I'm keeping that as a separate patch. For now, I'm just adding some unit testing for that function to gain more confidence that the patch does not change the function behavior.

Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
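The idea of asking for "the first entry after the range" instead of probing at `range_end-1` can be sketched with a sorted list of entry start addresses. This is a simplified model, not LLDB's line table API: the `starts` list and both helper names are assumptions made for illustration.

```python
from bisect import bisect_left


def first_entry_after(starts, range_end):
    """Index of the first entry starting at or after range_end.

    This is the STL-style 'one past the end' position for the range.
    """
    return bisect_left(starts, range_end)


def last_entry_in(starts, range_begin, range_end):
    """Index of the last entry starting inside [range_begin, range_end), or None."""
    idx = first_entry_after(starts, range_end) - 1
    if idx >= 0 and starts[idx] >= range_begin:
        return idx
    return None


if __name__ == "__main__":
    # Hypothetical line table: each entry covers [starts[i], starts[i + 1]).
    starts = [0x1000, 0x1004, 0x1010, 0x1020]
    print(last_entry_in(starts, 0x1000, 0x1010))
```

No entry has to exist at exactly `range_end - 1` for this to work; the binary search lands on the right neighbour either way.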

During a recent change, the build system accidentally dropped the (theoretical) support for the CLC builtins library to build target-specific builtins from the 'amdgpu' directory, due to a change in variable names. This functionality wasn't being used but was spotted during another code review. This commit takes the opportunity to clean up and better document the code that manages the list of directories to search for builtin implementations. While fixing this, some references to now-removed SOURCES files were discovered which have been cleaned up.

Doing so provides stability when compiling the builtins in a mode in which unqualified pointers may be interpreted as being in the generic address space, such as in OpenCL 3.0. We eventually want to provide 'generic' overloads of the builtins in libclc so this prepares the ground a little better. It could be argued that having the internal CLC helper functions be unqualified is more flexible, in case it's better for a target to have the pointers in the generic address space. This commits to the private address space for more stability across different OpenCL environments.

…7682) This makes GetOutputStreamSP and GetErrorStreamSP protected members of Debugger. Users who want to print to the debugger's stream should use GetAsyncOutputStreamSP and GetAsyncErrorStreamSP instead and the few remaining stragglers have been migrated.

A handful of minor improvements to StreamAsynchronousIO:
- Document the class.
- Use a named enum value to distinguish between stdout and stderr.
- Add missing period to comment.
- Clear the string instead of assigning to it.
- Eliminate color argument.

…llvm#121109) As a follow-up to llvm#121013 (which optimized `ranges::copy`) and llvm#121026 (which optimized `ranges::copy_backward`), this PR enhances the performance of `std::ranges::{move, move_backward}` for `vector<bool>::iterator`, addressing a subtask outlined in issue llvm#64038. The optimizations bring performance improvements analogous to those achieved for the `{copy, copy_backward}` algorithms: up to 2000x for aligned moves and 60x for unaligned moves. Moreover, comprehensive tests covering up to 4 storage words (256 bits) with odd and even bit sizes are provided, which validate the proposed optimizations in this patch.

For example, determine that the address in `obj%p` below cannot alias the address of `v`:

```
module m
  type :: ty
    real, pointer :: p
  end type ty
end module m

subroutine test()
  use m
  real, target :: t
  real :: v
  type(ty) :: obj
  obj%p => t
  v = obj%p
end subroutine test
```

This patch allows using fpfeatures pragmas with __builtin_convertvector:
- added TrailingObjects with FPOptionsOverride and methods for handling it to ConvertVectorExpr
- added support for codegen, node dumping, and serialization of fpfeatures contained in ConvertVectorExpr

Add frontend actions to support emitting assembly, bitcode, and object files when compiling with ClangIR. This change also correctly sets and propagates the target triple in the MLIR and LLVM modules, which was a necessary prerequisite for emitting assembly and object files.

…lvm#120909) This refactor includes the following changes:
- Refactor similar tests using `types::for_each` to remove redundant code;
- Explicitly include the missing header `type_algorithms.h` in some test files;
- Group tests of the same kind that were scattered across test functions with ad-hoc names (e.g., `test5()`, `test6()`) into one function (`test_struct_array()`).

…, replace HasVariableMask bool arg. NFC. (llvm#127826) Minor NFC refactor before making better variable mask combining decisions - isTargetShuffleVariableMask doesn't discriminate between fast (AND, PSHUFB etc.) and slow (VPERMV3 etc.) variable shuffles, so an opaque HasVariableMask is only of limited use.

petrhosek and others added 30 commits February 18, 2025 23:54

[libc] Add strftime_l (llvm#127708)

9072ba7

This is a (no-op) locale version of strftime.

Revert "[libc] Add strftime_l" (llvm#127766)

a2b4d4e

Reverts llvm#127708

[RISCVISel] Compute leading zeros for RISCVISD::VCPOP_VL node (llvm#1…

b9a1e58

…27705) This patch adds handling of the RISCVISD::VCPOP_VL node in RISCVTargetLowering::computeKnownBitsForTargetNode. It eliminates redundant zero-extension instructions.
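The reasoning behind this kind of known-bits result can be sketched in a few lines. A population count over a mask of at most N elements cannot exceed N, so every result bit above N's bit length is known to be zero, which is what makes a later zero-extension redundant. The helpers below are a hypothetical Python illustration, not the code from the patch; how the patch bounds the element count is not shown here, and `max_elements` is an assumed input.

```python
def popcount_known_leading_zeros(max_elements: int, result_bits: int = 64) -> int:
    """Lower bound on the leading zero bits of a popcount result.

    max_elements is an assumed upper bound on the number of mask bits counted.
    """
    if max_elements < 0 or result_bits <= 0:
        raise ValueError("expected max_elements >= 0 and result_bits > 0")
    # The result lies in [0, max_elements], so it needs at most
    # max_elements.bit_length() significant bits.
    return max(result_bits - max_elements.bit_length(), 0)


def zext_is_redundant(max_elements: int, narrow_bits: int, result_bits: int = 64) -> bool:
    """True if every bit above narrow_bits is already known to be zero."""
    known_zeros = popcount_known_leading_zeros(max_elements, result_bits)
    return known_zeros >= result_bits - narrow_bits


if __name__ == "__main__":
    print(popcount_known_leading_zeros(16))  # counting at most 16 mask bits
```

With at most 16 elements the count fits in 5 bits, so 59 of the 64 result bits are known zero and extending from a 32-bit value changes nothing.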

[AMDGPU] Replace gfx940 and gfx941 with gfx942 in offload and libclc (l…

a2f9ae1

…lvm#125826) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. For SWDEV-512631 and SWDEV-512633

[AMDGPU] Add missing gfx architectures to AddFlangOffloadRuntime.cmake (

55fb793

llvm#125827)

[AMDGPU][MLIR] Replace gfx940 and gfx941 with gfx942 in MLIR (llvm#12…

8900e41

…5836) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. For SWDEV-512631

[BitcodeReader] Use poison instead of undef to represent unsuppor…

4af8c53

…ted constexprs in metadata (llvm#127665) Metadata that references unsupported constant expressions can be represented with `poison` metadata instead of `undef` metadata.

[clang][bytecode] Fix three-way unordered non-pointer comparisions (l…

1760289

…lvm#127759) This _can_ happen with non-pointers, but we shouldn't diagnose it in that case.

[AMDGPU] Remove FeatureForceStoreSC0SC1 (llvm#126878)

2260d59

This was only used for gfx940 and gfx941, which have since been removed. For SWDEV-512631

[X86] getFauxShuffleMask - add support for vXi64/vXf64 concat_vectors…

0607f94

… decoding (llvm#127630) Similar to insert_subvector - limit this to vXi64 vector cases to make the most of cross lane shuffles (for now).

[MLIR][Affine] Fix affine.parallel op verifier (llvm#127611)

3c938d0

Fix affine.parallel op verifier for missing check on zero result lower or upper bound maps. lb/ub maps should have at least one result. Fixes: llvm#120186

[AArch64] Add optional extensions enabled on Grace (llvm#127620)

404f94a

Enable optional ISA extensions on Grace when mcpu=grace is used: sve2-sm4, sve2-aes, sve2-sha3. Grace is no longer an alias, but a separate CPU definition.

[NFC][MLIR] Make file-local cl::opt global variables static (llvm#126714

c0a763d

) This is per style-guide: make file-scope symbol static whenever possible. Fix llvm#125983.

AMDGPU: Avoid double attribute lookup for register count attributes (l…

0f472e9

…lvm#127782)

[NVPTX] Add tcgen05.cp/shift intrinsics (llvm#127669)

3ce2e4d

This patch adds intrinsics for tcgen05.cp and tcgen05.shift instructions. lit tests are added and verified with a ptxas-12.8 executable. Docs are updated in the NVPTXUsage.rst file. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>

Sirraide and others added 30 commits February 19, 2025 16:48

[Clang] Add release note for llvm#127623 (llvm#127815)

888c099

While reviewing llvm#127623, I missed that it didn’t have a release note.

[libc++] Avoid code duplication in strings operator+ overloads (llvm#…

3e61c1a

…126048)

[Analysis] Avoid repeated hash lookups (NFC) (llvm#127743)

2f2295c

[AsmPrinter] Avoid repeated hash lookups (NFC) (llvm#127744)

c23256e

[CodeGen] Avoid repeated hash lookups (NFC) (llvm#127745)

af922cf

[Object] Avoid repeated hash lookups (NFC) (llvm#127746)

1bb72f0

[Support] Avoid repeated hash lookups (NFC) (llvm#127747)

bb75a96

[X86] Avoid repeated hash lookups (NFC) (llvm#127748)

fc5849d

Revert "Reapply [CaptureTracking][FunctionAttrs] Add support for Capt…

e2ba1b6

…ureInfo (llvm#125880)" This reverts commit 0fab404. Seems to break LTO builds of clang on Windows, see comments on llvm#125880

[AMDGPU][True16][CodeGen] true16 codegen pattern for fma (llvm#127240)

210036a

The previous PR llvm#122950 was reverted because it hit a buildbot failure. Another patch was merged while that PR was under review, leaving one test out of date. This reopens the PR and fixes the issue.

[AMDGPU] Remove unused variables. NFC

ddf2408

[InstCombine] avoid extra instructions in foldSelectICmpAnd (llvm#127398

8fc03e4

) Disable fold when it will result in more instructions.

[ELF,test] Remove unneeded -o /dev/null

0ffe270

When the script has executed `cd %t`, it is fine to use the output file `a.out`. (We don't want to rely on lit's default PWD to support lit compatible runners. Therefore -o /dev/null is used when PWD has not been changed to a %t derived path.)

Remove header file spuriously added by 9905728.

8ecd788

[InstCombine] handle trunc to i1 in foldSelectICmpAndBinOp (llvm#127390)

aa847ce

For `trunc nuw`, this saves an instruction; otherwise it only produces other instructions without the select, the same behavior as for the bit test before. Proof: https://alive2.llvm.org/ce/z/a6QmyV

[ELF,test] Clean up aarch64-relocs.s

e1d1bb9

[SLP][NFC]Replace undefs by zeroinitializer

3e8db13

Include test folder in the Clang Static Analyzer team mentions (llvm#…

5450954

…127810) See https://discourse.llvm.org/t/taking-ownership-of-clang-test-analysis/84689

[VPlan] Remove dead exit block handling code in HCFGBuilder.

a96444a

The mapping of IR ExitBB to a VPBB isn't used. It also sets an incorrect VPBB for the ExitBB; the region's successor is the middle block, not the exit block. It also unnecessarily triggers an assertion after 38376de.

[clangd] Avoid round-trip from SourceLocation to clangd::Range and ba…

ccd3def

…ck in SymbolCollector::handleMacros() (llvm#127757)

[AutoBump] Merge with ccd3def (Feb 19)

5e85473
