merge amd-staging into amd-feature/wave-transform #704
This is noted by the specification, and should save a dynamic instruction. Code size should be no worse than before, as the pairs of moves can usually be turned into two 16-bit moves, but `fmv.d` is always a 32-bit instruction. LLVM can look through a `FSGNJ_D_IN32X` in `RISCVInstrInfo::isCopyInstrImpl`, which helps copy propagation.
Automated with `sed -i 's/\.Value//g' lib/Target/AMDGPU/*.td` plus a tiny bit of manual reformatting.
This upstreams the code to handle member initialization for non-record arrays.
When individual elements of a vector are updated via vector swizzle, it needs to be handled as separate store operations to the individual vector elements. Clang treats vectors as one unit, so if a part of a vector needs to be updated, the whole vector is loaded, some elements modified, and then the whole vector is stored. In HLSL vector elements are handled separately. We need to avoid this load/modify/store sequence to prevent overwriting other vector elements that might be getting updated in parallel. Fixes llvm#152815
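The difference between the two store strategies can be sketched with a small sequential model. This is only an illustration in plain C++, not the Clang codegen itself: `Float4`, `storeViaWholeVector`, and `storePerElement` are hypothetical names, and the "parallel" writer is simulated by passing in a copy of the vector that was loaded before another lane changed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Float4 = std::array<float, 4>;

// Load/modify/store: starts from a previously loaded copy of the whole
// vector, patches the selected lanes, then writes all four lanes back.
inline void storeViaWholeVector(Float4 &dst, Float4 loaded,
                                const std::vector<std::size_t> &lanes,
                                const std::vector<float> &values) {
  for (std::size_t i = 0; i < lanes.size(); ++i)
    loaded[lanes[i]] = values[i];
  dst = loaded; // every lane is written, including untouched ones
}

// Per-element stores: only the lanes named by the swizzle are written.
inline void storePerElement(Float4 &dst,
                            const std::vector<std::size_t> &lanes,
                            const std::vector<float> &values) {
  for (std::size_t i = 0; i < lanes.size(); ++i)
    dst[lanes[i]] = values[i];
}
```

If lane 3 is updated after the whole vector was loaded, `storeViaWholeVector` writes the stale lane 3 back, while `storePerElement` leaves it alone.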
Flags are now passed on construction/cloning. Remove unnecessary transferFlags call, and make code independent of VPRecipeWithIRFlags, to support additional recipes in the future.
A fork could occur while MutexTSDs is held, causing a deadlock in the forked process when something attempts to lock it again. Instead, add it to the enable/disable list of mutexes.
Add test cases for canonicalizing AddRecs that may wrap.
…-vector-to-llvm`." (llvm#169570) Reverts llvm#166204 There was a build issue due to a missing dependency.
libcxx requires at least macOS 11 to build. This patch bumps the minimum OS X target in Fuchsia's CMake cache file to 11.0 to satisfy this requirement.
…69569) Some globals (e.g., fir.global) have initialization regions that may transitively reference other globals or type descriptors. Add getInitRegion() to GlobalVariableOpInterface to retrieve these regions, returning Region* (nullptr if the global uses attributes for initialization, as with memref.global).
…m#166597) Currently, -gsplit-dwarf and -mrelax are incompatible options in Clang. The issue is that .dwo files should not contain any relocations, as they are not processed by the linker. However, relaxable code emits relocations in DWARF for debug ranges that reside in the .dwo file when DWARF fission is enabled. This patch makes DWARF fission compatible with RISC-V relaxations. It uses the StartxEndx DWARF forms in .debug_rnglists.dwo, which allow referencing addresses from .debug_addr instead of using absolute addresses. This approach eliminates relocations from .dwo files.
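To illustrate why index-based range entries need no relocations, here is a minimal sketch of encoding a range list with `DW_RLE_startx_endx` entries. It assumes the DWARF 5 encodings (`DW_RLE_startx_endx` = 0x02, `DW_RLE_end_of_list` = 0x00, ULEB128 operands); `appendULEB128` and `encodeRangeList` are hypothetical helper names, not LLVM APIs. Each range is a pair of indices into `.debug_addr`, so the bytes contain no absolute addresses.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Range list entry kinds, with values as defined by DWARF 5.
constexpr std::uint8_t DW_RLE_end_of_list = 0x00;
constexpr std::uint8_t DW_RLE_startx_endx = 0x02;

// Append `value` as an unsigned LEB128 number.
inline void appendULEB128(std::vector<std::uint8_t> &out, std::uint64_t value) {
  do {
    std::uint8_t byte = static_cast<std::uint8_t>(value & 0x7f);
    value >>= 7;
    if (value != 0)
      byte |= 0x80; // more bytes follow
    out.push_back(byte);
  } while (value != 0);
}

// Encode (startIndex, endIndex) pairs, where both are .debug_addr indices.
inline std::vector<std::uint8_t> encodeRangeList(
    const std::vector<std::pair<std::uint64_t, std::uint64_t>> &ranges) {
  std::vector<std::uint8_t> out;
  for (const auto &range : ranges) {
    out.push_back(DW_RLE_startx_endx);
    appendULEB128(out, range.first);
    appendULEB128(out, range.second);
  }
  out.push_back(DW_RLE_end_of_list);
  return out;
}
```

The addresses themselves live in `.debug_addr`, which stays in the relocatable object, so the linker can still fix them up after relaxation.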
…#682) AMDGPU requires more complex CFI rules. Normally these would be expressed with .cfi_escape; however, this would make the CFI unreadable and make it difficult to update registers in CFI instructions (also something AMDGPU requires). Authored-by: Emma Pilkington <Emma.Pilkington@amd.com> (cherry picked from commit fd94b41)
This PR adds codegen for `cir.await` ready and suspend. One notable difference from the classic codegen is that, in the suspend branch, it emits an `AwaitSuspendWrapper`(`.__await_suspend_wrapper__init`) function that is always inlined. This function wraps the suspend logic inside an internal wrapper that gets inlined. Example here: https://godbolt.org/z/rWYGcaaG4
…" (llvm#169546) This reverts commit f67409c. cc @fiigii Including us, several separate groups are experiencing regressions with this change. This is the smallest reproducer pasted by @akuegel : llvm#162930 (comment)
Allows construction of ErrorAsOutParameters from Error references.
…m#169565) Use `getNestedDoConstruct` from Utils to get the nested DoConstructs. Fixes llvm#169532
Implementation similar to the clang one in `clang/lib/Headers/__clang_cuda_intrinsics.h`
This makes all of the clangd tests work with the internal shell. Modifications needed for each test are as follows:
1. system-include-extractor.test was using variable expansion which is not supported in the internal shell. This patch rewrites it to use the readfile mechanism along with python. This isn't super pretty but is readily understandable and there are only two tests across the monorepo that use this construction, so making it prettier is hard to justify.
2. include-cleaner-batch-fix.test - Was using $'' construction to create new lines in a string. Simply replace it with multiple echo commands to be canonical with the rest of the repository.
3. index-tools.test - Just add IndexBenchmark to the clangd test depends, so the test now just works unconditionally. This should significantly increase test coverage at little cost.
Reviewers: ilovepi, HighCommander4, petrhosek, kadircet
Reviewed By: ilovepi
Pull Request: llvm#169539
Enable it now that all of the tests pass under the internal shell. The internal shell is slightly faster (10-15%) and also provides a better debugging experience. Reviewers: petrhosek, ilovepi, kadircet, HighCommander4 Reviewed By: ilovepi Pull Request: llvm#169540
Fixes smoke-dev mpi-reduce, mpi-allreduce. Fixes errors of the form EmissaryMPI.h:54:3: error: unknown type name 'MPI_Datatype'.
This reverts commit c51c382. This breaks at least one buildbot: 1. https://lab.llvm.org/buildbot/#/builders/134/builds/30460
This reverts commit 9c414c4. This one is causing buildbot failures too at CMake configure time: 1. https://lab.llvm.org/buildbot/#/builders/193/builds/12452
Split from llvm#158900. It adds a PerThreadContainer that can use STL-like indexed containers, based on a slightly refactored PerThreadTable. --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>
…uiltins (llvm#168666) This PR extends __scoped_atomic builtins with inc and dec functions. They map to LLVM IR `atomicrmw uinc_wrap` and `atomicrmw udec_wrap`. These enable implementation of OpenCL-style atomic_inc / atomic_dec with wrap semantics on targets supporting scoped atomics (e.g. GPUs). --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
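The wrap semantics of the two operations can be sketched as a reference model in portable C++. This follows the LLVM LangRef definitions of `atomicrmw uinc_wrap` and `atomicrmw udec_wrap`; it does not use the new builtins, and `uincWrapModel` and `udecWrapModel` are hypothetical names. Both return the old value, as `atomicrmw` does.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Model of `atomicrmw uinc_wrap`: increment, but wrap to 0 when the old
// value is greater than or equal to `limit` (unsigned comparison).
inline std::uint32_t uincWrapModel(std::atomic<std::uint32_t> &obj,
                                   std::uint32_t limit) {
  std::uint32_t old = obj.load(std::memory_order_relaxed);
  std::uint32_t next;
  do {
    next = (old >= limit) ? 0u : old + 1u;
  } while (!obj.compare_exchange_weak(old, next, std::memory_order_seq_cst,
                                      std::memory_order_relaxed));
  return old;
}

// Model of `atomicrmw udec_wrap`: decrement, but wrap to `limit` when the
// old value is 0 or greater than `limit` (unsigned comparison).
inline std::uint32_t udecWrapModel(std::atomic<std::uint32_t> &obj,
                                   std::uint32_t limit) {
  std::uint32_t old = obj.load(std::memory_order_relaxed);
  std::uint32_t next;
  do {
    next = (old == 0u || old > limit) ? limit : old - 1u;
  } while (!obj.compare_exchange_weak(old, next, std::memory_order_seq_cst,
                                      std::memory_order_relaxed));
  return old;
}
```

With a limit of 3, a counter at 3 increments to 0 and a counter at 0 decrements to 3, which matches the OpenCL-style `atomic_inc` / `atomic_dec` behavior the commit mentions.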
…-uses.ll` due to changes to ReplaceConstant (llvm#169848) Fixes an LLVM DirectX codegen test after it broke due to llvm#169141 The CBuffer loads and GEPs are no longer duplicated when there are two or more accesses within the same basic block. This PR removes the duplicate check for CBuffer load and GEP from the original test function `@f` and adds a new test function `@g` which places duplicate CBuffer loads into separate basic blocks.
The indirect call instrumentation snippet uses the x16 register in the exit
handler to branch to the destination target:
__bolt_instr_ind_call_handler_func:
msr nzcv, x1
ldp x0, x1, [sp], #16
ldr x16, [sp], #16
ldp x0, x1, [sp], #16
br x16 <-----
This patch changes the instrumentation snippet to call the instrumentation
runtime library through an indirect call instruction, and adds a wrapper
that stores/loads the target value and the register used by the original indirect instruction.
Example:
mov x16, foo
indirectCall:
adrp x8, Label
add x8, x8, #:lo12:Label
blr x8
Before:
Instrumented indirect call:
stp x0, x1, [sp, #-16]!
mov x0, x8
movk x1, #0x0, lsl #48
movk x1, #0x0, lsl #32
movk x1, #0x0, lsl #16
movk x1, #0x0
stp x0, x1, [sp, #-16]!
adrp x0, __bolt_instr_ind_call_handler_func
add x0, x0, #:lo12:__bolt_instr_ind_call_handler_func
blr x0
__bolt_instr_ind_call_handler: (exit snippet)
msr nzcv, x1
ldp x0, x1, [sp], #16
ldr x16, [sp], #16
ldp x0, x1, [sp], #16
br x16 <- overwrites the original value in X16
__bolt_instr_ind_call_handler_func: (entry snippet)
stp x0, x1, [sp, #-16]!
mrs x1, nzcv
adrp x0, __bolt_instr_ind_call_handler
add x0, x0, #:lo12:__bolt_instr_ind_call_handler
ldr x0, [x0]
cmp x0, #0x0
b.eq __bolt_instr_ind_call_handler
str x30, [sp, #-16]!
blr x0 <--- runtime lib store/load all regs
ldr x30, [sp], #16
b __bolt_instr_ind_call_handler
_________________________________________________________________________
After:
mov x16, foo
indirectCall:
adrp x8, Label
add x8, x8, #:lo12:Label
blr x8
Instrumented indirect call:
stp x0, x1, [sp, #-16]!
mov x0, x8
movk x1, #0x0, lsl #48
movk x1, #0x0, lsl #32
movk x1, #0x0, lsl #16
movk x1, #0x0
stp x0, x30, [sp, #-16]!
adrp x8, __bolt_instr_ind_call_handler_func
add x8, x8, #:lo12:__bolt_instr_ind_call_handler_func
blr x8 <--- call trampoline instr lib
ldp x0, x30, [sp], #16
mov x8, x0 <---- restore original target
ldp x0, x1, [sp], #16
blr x8 <--- original indirect call instruction
// don't touch regs besides x0, x1
__bolt_instr_ind_call_handler: (exit snippet)
ret <---- return to original function with indirect call
__bolt_instr_ind_call_handler_func: (entry snippet)
adrp x0, __bolt_instr_ind_call_handler
add x0, x0, #:lo12:__bolt_instr_ind_call_handler
ldr x0, [x0]
cmp x0, #0x0
b.eq __bolt_instr_ind_call_handler
str x30, [sp, #-16]!
blr x0 <--- runtime lib store/load all regs
ldr x30, [sp], #16
b __bolt_instr_ind_call_handler
Upstream TryCall Op as a prerequisite for Try Catch work. Issue llvm#154992
Add a variant of m_Intrinsic that matches a variable runtime ID.
…lvm#169338) In some cases, VPWidenPointerInductions become only used by scalars after legalizeAndOptimizeInductions was already run, for example due to some VPlan optimizations. Move the code to scalarize VPWidenPointerInductions to a helper and use it if needed. This fixes a crash after llvm#148274 in the added test case. Fixes llvm#169780
…ses.ll` more strict (llvm#169855) Continuation of PR llvm#169848 to address PR comments. This PR makes the test more strict by adding CHECKs to ensure the loads are indeed using the same or different GEPs.
Passed https://compiler-ci.amd.com/blue/organizations/jenkins/compiler-psdb-amd-staging/detail/compiler-psdb-amd-staging/3005/pipeline/722/ (the run failed only when trying to land the PR, since approval was missing).
Proof: https://alive2.llvm.org/ce/z/a5fzlJ Closes llvm#146642 --------- Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
This pull request addresses an issue encountered when building **libcxx** with certain configurations (`-D_LIBCPP_HAS_MUSL_LIBC` & `-D__linux__`) that lack the `_GNU_SOURCE` definition. Specifically, this issue arises if the system **musl libc** is built with `_BSD_SOURCE` instead of `_GNU_SOURCE`. The resultant configuration leads to problems with the "Strtonum functions" in the file [libcxx/include/__locale_dir/support/linux.h](https://github.com/llvm/llvm-project/tree/master/libcxx/include/__locale_dir/support/linux.h), affecting the following functions:

- `__strtof`
- `__strtod`
- `__strtold`

**Error messages displayed include**:

```console
error: no member named 'strtof_l' in the global namespace
```

```console
error: no member named 'strtod_l' in the global namespace
```

```console
error: no member named 'strtold_l' in the global namespace
```

For more insight, relevant code can be accessed [here](https://github.com/llvm/llvm-project/blob/79cd1b7a25cdbf42c7234999ae9bc51db30af1f0/libcxx/include/__locale_dir/support/linux.h#L85-L95).
…69458)
Summary
======
This PR updates the schedule for the online sync-up and updates the link for past meeting slides.
Changes
======
* Remove the Wednesday schedule, since we did not have the meeting for Americas-friendly timezones.
* Use a single folder for past meeting slides instead of individual links.
Related Links
=========
* [Meeting materials for Qualification Working Group](https://llvm.org/docs/QualGroup.html#meeting-materials)
* [Online Sync-Ups](https://llvm.org/docs/GettingInvolved.html#online-sync-ups)
---------
Signed-off-by: ZakyHermawan <zaky.hermawan9615@gmail.com>
…vm#166684) The `arith.cmpf` lowering pattern used to generate invalid IR when an unsupported floating-point type was used.
…ations. (llvm#169273) This is achieved by using some of the bits of RelType to tag vendor namespaces. This change also adds a relocation iterator for RISCV that folds vendor namespaces into the RelType of the following relocation. This patch is extracted from the implementation of RISCV vendor-specific relocations in the CHERIoT LLVM downstream: CHERIoT-Platform/llvm-project@3d6d6f7
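The idea of folding a vendor namespace into the relocation type can be sketched as follows. This is a hypothetical illustration, not the patch's actual encoding: the 8-bit shift, the `RawReloc` struct, the `vendorNamespace` table, and the `EXAMPLE_VENDOR` symbol are all assumptions made for the example. The only external fact relied on is that the RISC-V psABI defines `R_RISCV_VENDOR` (191) as a marker placed immediately before a vendor-specific relocation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using RelType = std::uint32_t;

constexpr RelType R_RISCV_VENDOR = 191; // marker relocation from the psABI
constexpr unsigned kVendorShift = 8;    // hypothetical: tag lives above bit 7

// Hypothetical raw relocation: an 8-bit type plus the referenced symbol name.
struct RawReloc {
  std::uint8_t type;
  std::string symbol;
};

// Hypothetical vendor table: maps a vendor symbol to a small namespace id.
inline RelType vendorNamespace(const std::string &symbol) {
  return symbol == "EXAMPLE_VENDOR" ? 1u : 0u;
}

// Walk the relocations, dropping each R_RISCV_VENDOR marker and tagging the
// relocation that follows it with the vendor namespace in the upper bits.
inline std::vector<RelType> foldVendorRelocs(const std::vector<RawReloc> &relocs) {
  std::vector<RelType> out;
  RelType pending = 0;
  for (const RawReloc &r : relocs) {
    if (r.type == R_RISCV_VENDOR) {
      pending = vendorNamespace(r.symbol);
      continue;
    }
    out.push_back((pending << kVendorShift) | r.type);
    pending = 0; // the tag applies only to the next relocation
  }
  return out;
}
```

With this layout, the same raw type number (for example 192) yields distinct `RelType` values depending on whether a vendor marker preceded it, so later code can switch on a single integer.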
PSDB Build Link: http://mlse-bdc-20dd129:8065/#/builders/10/builds/30
No description provided.