forked from llvm/llvm-project
[AutoBump] Merge with ccd3defd (Feb 19) (58) #602
Open
jorickert wants to merge 77 commits into bump_to_c4f8da94 from bump_to_ccd3defd
This is a (no-op) locale version of strftime.
…#127640) CaptureTracking considers insertions into aggregates and vectors as captures. As such, extractions from aggregates and vectors are escape sources. A non-escaping identified local cannot alias with the result of an extractvalue/extractelement. Fixes llvm#126670.
…27705) This patch adds handling of the RISCVISD::VCPOP_VL node in RISCVTargetLowering::computeKnownBitsForTargetNode. It eliminates redundant zero-extension instructions.
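The reasoning behind the redundant zero-extension can be sketched outside LLVM: a population count over at most N mask bits can never exceed N, so every result bit above the width of N is known to be zero. The snippet below is a minimal illustration under that assumption. `bits_needed` and `known_zero_mask` are hypothetical helper names, not LLVM APIs, and the patch itself may derive its bound differently.

```cpp
#include <cstdint>

// Number of bits needed to represent `value` (0 when value == 0).
unsigned bits_needed(std::uint64_t value) {
  unsigned bits = 0;
  while (value != 0) {
    ++bits;
    value >>= 1;
  }
  return bits;
}

// Hypothetical helper: mask of result bits that are known to be zero when
// counting set bits among at most `max_elements` mask elements.
// The count lies in [0, max_elements], so only the low
// bits_needed(max_elements) bits can ever be set.
std::uint64_t known_zero_mask(std::uint64_t max_elements) {
  unsigned bits = bits_needed(max_elements);
  return bits >= 64 ? 0 : ~std::uint64_t{0} << bits;
}
```

For example, with at most 1024 elements the upper 32 bits of a 64-bit result are all in the known-zero mask, which is the kind of fact that lets a later zero-extension from 32 bits be dropped.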
…atization (llvm#127754)
Issue: Compilation abnormally terminates in parallel default(private).
Documentation reference: A threadprivate variable must not appear as the base variable of a list item in any clause except for the copyin and copyprivate clauses.
Explanation: According to the reference, threadprivate symbols cannot be used in DSA clauses, which in turn means the symbol can be skipped for default privatization.
Fixes llvm#123535
…lvm#125826) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. For SWDEV-512631 and SWDEV-512633
…m#127455) Delete `equivalenceAnalysis`, which has been incorporated into the `getAliasingValues` API. Also add an additional test case to ensure that equivalence is properly propagated across function boundaries.
…5836) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. For SWDEV-512631
…patterns. (llvm#127643) Handles both BWI and non-BWI cases (skips PMOV*XBW without BWI). The vector-interleaved-store-i8-stride-8.ll VPTERNLOG diffs are due to better value tracking now recognizing the zero-extension patterns where before it was any-extension
…126762) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR removes all occurrences of gfx940/gfx941 from clang that can be removed without changes in the llvm directory. The target-invalid-cpu-note/amdgcn.c test is not included here since it tests a list of targets that is defined in llvm/lib/TargetParser/TargetParser.cpp. For SWDEV-512631
…ted constexprs in metadata (llvm#127665) Metadata that references unsupported constant expressions can be represented with `poison` metadata instead of `undef` metadata.
The standard libcalls for half to float and float to half conversion are __extendhfsf2 and __truncsfhf2. However, LLVM currently uses __gnu_h2f_ieee and __gnu_f2h_ieee instead. As far as I can tell, these libcalls are an ARM-ism and only provided by libgcc on that platform. compiler-rt always provides both libcalls. Use the standard libcalls by default, and only use the __gnu libcalls on ARM.
gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR removes all non-documentation occurrences of gfx940/gfx941 from the llvm directory, and the remaining occurrences in clang. Documentation changes will follow. For SWDEV-512631
…lvm#127759) This _can_ happen with non-pointers, but we shouldn't diagnose it in that case.
This was only used for gfx940 and gfx941, which have since been removed. For SWDEV-512631
…vm#126887) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR removes all documentation occurrences of gfx940/gfx941 except for the gfx940 ISA description, which will be the subject of a separate PR. For SWDEV-512631
… decoding (llvm#127630) Similar to insert_subvector - limit this to vXi64 vector cases to make the most of cross lane shuffles (for now).
…lvm#126906) gfx940 and gfx941 are no longer supported. This is the last one of a series of PRs to remove them from the code base. The ISA documentation still contains a lot of links and file names with the "gfx940" identifier. Changing them to "gfx942" is probably not worth the cost of breaking all URLs to these pages that users might have saved in the past. For SWDEV-512631
This also includes comparing the two ImpliedDo.
Details:
- For ArrayConstructor, check if x and y have the same elements and type
- For ImpliedDo, check if x and y have the same lower, upper, stride and values
Fixes: llvm#104526
…o-math-errno is set (llvm#121763) This will allow vectorizing these calls (after a few more patches). This should not change the codegen for targets that enable the use of AA during the codegen (in `TargetSubtargetInfo::useAA()`). This includes targets such as AArch64. This notably does not include x86 but can be worked around by passing `-mllvm -combiner-global-alias-analysis=true` to clang. Follow up to llvm#114086.
This commit improves the behaviour of (__clc_)nextafter around zero. Specifically, the nextafter value of very small negative numbers in the positive direction is now negative zero. Previously we'd return positive zero. This behaviour is not required as far as OpenCL is concerned: at least, the CTS isn't testing for it. However, this change does bring our implementation into bit-equivalence with (libstdc++'s implementation of) std::nextafter, tested on all possible values of 32-bit float towards both positive and negative INFINITY. Furthermore, since the implementation of libclc's floating-point 'rtp' and 'rtz' conversions use __clc_nextafter, the previous behaviour was resulting in CTS validation issues. For example, when converting float -0x1.000002p-25 to half, rounding towards zero or positive infinity, nextafter was returning +0.0, whereas the correct conversion requires us to return -0.0. We could work around this issue in the conversion functions, but since the change to nextafter is small enough and the behaviour around zero matches libstdc++, the fix feels at home there. This commit also converts several variables to unsigned types to avoid undefined behaviour surrounding signed underflow on the subtractions. It also converts some variables to be kept in floating-point types, using fabs to get the absolute value rather than by bit-hacking.
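The behaviour around zero described above can be observed on the host with `std::nextafter`. This is a minimal sketch, assuming an IEEE-754 `float` and a standard library that matches the libstdc++ result the commit refers to. `step_up` and `is_negative_zero` are hypothetical helper names, and none of this is libclc code.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: the next representable float after x, moving
// towards positive infinity.
float step_up(float x) {
  return std::nextafter(x, std::numeric_limits<float>::infinity());
}

// True when x is a zero with its sign bit set, i.e. -0.0f.
// A plain comparison cannot tell the two zeros apart, so the sign bit
// has to be inspected explicitly.
bool is_negative_zero(float x) {
  return x == 0.0f && std::signbit(x);
}
```

On such an implementation, `is_negative_zero(step_up(-std::numeric_limits<float>::denorm_min()))` is expected to hold: stepping up from the smallest negative subnormal lands on negative zero rather than positive zero.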
Fix affine.parallel op verifier for missing check on zero result lower or upper bound maps. lb/ub maps should have at least one result. Fixes: llvm#120186
Part of the DECLARE REDUCTION was already supported by the parser, but the semantics to add the reduction identifier weren't implemented. The semantics would not accept the name given by the reduction, so a few lines were added to support that. Some tests were in place but not quite working, so those were fixed up too. New tests are added for unparsing and the parse tree, as well as for checking the symbolic name being generated. Lowering of DECLARE REDUCTION is not supported in this patch, and a test that it hits the relevant TODO is included (most of this already existed, but did not actually test the TODO message).
Enable optional ISA extensions on Grace when mcpu=grace is used: sve2-sm4, sve2-aes, sve2-sha3. Grace is no longer an alias, but a separate CPU definition.
The motivation is llvm#123622 and the fact that it is hard to find the last line entry in a given range. `FindLineEntryByAddress(range_end-1)` is the best we have, but it's not ideal because it has a magic -1 and because it relies on a line entry existing at that address. Generally, it should be there, but if for some reason it isn't, we might end up ignoring the entries that are there (or, like my incorrect fix in llvm#123622 did, iterating through the entire line table). What we really want is to get the last entry that exists in the given range. Or, equivalently (and more STL-like), the first entry after that range. This is what these functions do. I've used the STL names since they do pretty much exactly what the standard functions do (the main head-scratcher comes from the fact that our entries represent ranges rather than single values). The functions can also be used to simplify the maze of `if` statements in `FindLineEntryByAddress`, but I'm keeping that as a separate patch. For now, I'm just adding some unit testing for that function to gain more confidence that the patch does not change the function behavior.
---------
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
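The "first entry after the range" idea can be sketched with the standard algorithm the commit takes its names from. This is a simplified illustration, not LLDB's line table API. It assumes entries are reduced to their start addresses and kept sorted, and `Table`, `first_entry_after` and `last_entry_in` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a line table: entry start addresses, sorted
// in ascending order.
using Table = std::vector<std::uint64_t>;

// Index of the first entry starting at or after `range_end`, i.e. the first
// entry after a half-open range ending at `range_end`. Returns table.size()
// when no such entry exists.
std::size_t first_entry_after(const Table &table, std::uint64_t range_end) {
  auto it = std::lower_bound(table.begin(), table.end(), range_end);
  return static_cast<std::size_t>(it - table.begin());
}

// Index of the last entry starting inside [range_begin, range_end), or
// table.size() when the range contains no entry. No "range_end - 1" lookup
// is needed, and no entry has to exist at any particular address.
std::size_t last_entry_in(const Table &table, std::uint64_t range_begin,
                          std::uint64_t range_end) {
  std::size_t after = first_entry_after(table, range_end);
  if (after == 0 || table[after - 1] < range_begin)
    return table.size();
  return after - 1;
}
```

With entries at `0x100, 0x108, 0x110, 0x120`, the range `[0x100, 0x118)` yields index 3 as the first entry after the range and index 2 as the last entry inside it, even though no entry starts at `0x117`.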
) This is per style-guide: make file-scope symbol static whenever possible. Fix llvm#125983.
During a recent change, the build system accidentally dropped the (theoretical) support for the CLC builtins library to build target-specific builtins from the 'amdgpu' directory, due to a change in variable names. This functionality wasn't being used but was spotted during another code review. This commit takes the opportunity to clean up and better document the code that manages the list of directories to search for builtin implementations. While fixing this, some references to now-removed SOURCES files were discovered which have been cleaned up.
This patch adds intrinsics for tcgen05.cp and tcgen05.shift instructions. lit tests are added and verified with a ptxas-12.8 executable. Docs are updated in the NVPTXUsage.rst file. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
While reviewing llvm#127623, I missed that it didn’t have a release note.
Doing so provides stability when compiling the builtins in a mode in which unqualified pointers may be interpreted as being in the generic address space, such as in OpenCL 3.0. We eventually want to provide 'generic' overloads of the builtins in libclc so this prepares the ground a little better. It could be argued that having the internal CLC helper functions be unqualified is more flexible, in case it's better for a target to have the pointers in the generic address space. This commits to the private address space for more stability across different OpenCL environments.
…7682) This makes GetOutputStreamSP and GetErrorStreamSP protected members of Debugger. Users who want to print to the debugger's stream should use GetAsyncOutputStreamSP and GetAsyncErrorStreamSP instead and the few remaining stragglers have been migrated.
…ureInfo (llvm#125880)" This reverts commit 0fab404. Seems to break LTO builds of clang on Windows, see comments on llvm#125880
A handful of minor improvements to StreamAsynchronousIO:
- Document the class.
- Use a named enum value to distinguish between stdout and stderr.
- Add missing period to comment.
- Clear the string instead of assigning to it.
- Eliminate color argument.
…llvm#121109) As a follow-up to llvm#121013 (which optimized `ranges::copy`) and llvm#121026 (which optimized `ranges::copy_backward`), this PR enhances the performance of `std::ranges::{move, move_backward}` for `vector<bool>::iterator`, addressing a subtask outlined in issue llvm#64038. The optimizations bring performance improvements analogous to those achieved for the `{copy, copy_backward}` algorithms: up to 2000x for aligned moves and 60x for unaligned moves. Moreover, comprehensive tests covering up to 4 storage words (256 bytes) with odd and even bit sizes are provided, which validate the proposed optimizations in this patch.
The previous PR llvm#122950 was reverted because it caused a buildbot failure. Another patch was merged while that PR was under review, which left one test out of date. This reopens the PR with the issue fixed.
For example, determine that the address in `obj%p` below cannot alias the address of `v`:
```
module m
  type :: ty
    real, pointer :: p
  end type ty
end module m
subroutine test()
  use m
  real, target :: t
  real :: v
  type(ty) :: obj
  obj%p => t
  v = obj%p
end subroutine test
```
This patch allows using fpfeatures pragmas with __builtin_convertvector:
- added TrailingObjects with FPOptionsOverride and methods for handling it to ConvertVectorExpr
- added support for codegen, node dumping, and serialization of fpfeatures contained in ConvertVectorExpr
Add frontend actions to support emitting assembly, bitcode, and object files when compiling with ClangIR. This change also correctly sets and propagates the target triple in the MLIR and LLVM modules, which was a necessary prerequisite for emitting assembly and object files.
When the script has executed `cd %t`, it is fine to use the output file `a.out`. (We don't want to rely on lit's default PWD, in order to support lit-compatible runners. Therefore -o /dev/null is used when PWD has not been changed to a %t-derived path.)
For `trunc nuw`, this saves an instruction; otherwise it only produces other instructions without the select, the same behavior as for the bit test before. Proof: https://alive2.llvm.org/ce/z/a6QmyV
…lvm#120909) This refactor includes the following changes:
- Refactor similar tests using `types::for_each` to remove redundant code;
- Explicitly include the missing header `type_algorithms.h` in some test files;
- Some tests scattered in different test functions with ad-hoc names (e.g., `test5()`, `test6()`) but belonging to the same kind are now grouped into one function (`test_struct_array()`).
The mapping of IR ExitBB to a VPBB isn't used. It also sets an incorrect VPBB for the ExitBB; the region's successor is the middle block, not the exit block. It also unnecessarily triggers an assertion after 38376de.
…, replace HasVariableMask bool arg. NFC. (llvm#127826) Minor NFC refactor before making better variable mask combining decisions - isTargetShuffleVariableMask doesn't discriminate between fast (AND, PSHUFB etc.) and slow (VPERMV3 etc.) variable shuffles, so an opaque HasVariableMask is only of limited use.
…ck in SymbolCollector::handleMacros() (llvm#127757)
No description provided.