New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add rust_OBJS to fix compiling error #4
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
philberty
pushed a commit
that referenced
this pull request
Mar 3, 2021
This adds implementation for the optabs for complex operations. With this the following C code: void g (float complex a[restrict N], float complex b[restrict N], float complex c[restrict N]) { for (int i=0; i < N; i++) c[i] = a[i] * b[i]; } generates NEON: g: vmov.f32 q11, #0.0 @ v4sf add r3, r2, #1600 .L2: vmov q8, q11 @ v4sf vld1.32 {q10}, [r1]! vld1.32 {q9}, [r0]! vcmla.f32 q8, q9, q10, #0 vcmla.f32 q8, q9, q10, #90 vst1.32 {q8}, [r2]! cmp r3, r2 bne .L2 bx lr MVE: g: push {lr} mov lr, #100 dls lr, lr .L2: vldrw.32 q1, [r1], #16 vldrw.32 q2, [r0], #16 vcmul.f32 q3, q2, q1, #0 vcmla.f32 q3, q2, q1, #90 vstrw.32 q3, [r2], #16 le lr, .L2 ldr pc, [sp], #4 instead of g: add r3, r2, #1600 .L2: vld2.32 {d20-d23}, [r0]! vld2.32 {d16-d19}, [r1]! vmul.f32 q14, q11, q9 vmul.f32 q15, q11, q8 vneg.f32 q14, q14 vfma.f32 q15, q10, q9 vfma.f32 q14, q10, q8 vmov q13, q15 @ v4sf vmov q12, q14 @ v4sf vst2.32 {d24-d27}, [r2]! cmp r3, r2 bne .L2 bx lr and g: add r3, r2, #1600 .L2: vld2.32 {d20-d23}, [r0]! vld2.32 {d16-d19}, [r1]! vmul.f32 q15, q10, q8 vmul.f32 q14, q10, q9 vmls.f32 q15, q11, q9 vmla.f32 q14, q11, q8 vmov q12, q15 @ v4sf vmov q13, q14 @ v4sf vst2.32 {d24-d27}, [r2]! cmp r3, r2 bne .L2 bx lr respectively. gcc/ChangeLog: * config/arm/iterators.md (rotsplit1, rotsplit2, conj_op, fcmac1, VCMLA_OP, VCMUL_OP): New. * config/arm/mve.md (mve_vcmlaq<mve_rot><mode>): Support vec_dup 0. * config/arm/neon.md (cmul<conj_op><mode>3): New. * config/arm/unspecs.md (UNSPEC_VCMLA_CONJ, UNSPEC_VCMLA180_CONJ, UNSPEC_VCMUL_CONJ): New. * config/arm/vec-common.md (cmul<conj_op><mode>3, arm_vcmla<rot><mode>, cml<fcmac1><conj_op><mode>4): New.
philberty
pushed a commit
that referenced
this pull request
Mar 3, 2021
/home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.cpp:77:25: runtime error: left shift of 0x0000000000000000fffffffffffffffb by 96 places cannot be represented in type '__int128' #0 0x7ffff754edfe in __ubsan::Value::getSIntValue() const /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.cpp:77 #1 0x7ffff7548719 in __ubsan::Value::isNegative() const /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.h:190 #2 0x7ffff7542a34 in handleShiftOutOfBoundsImpl /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_handlers.cpp:338 #3 0x7ffff75431b7 in __ubsan_handle_shift_out_of_bounds /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_handlers.cpp:370 #4 0x40067f in main (/home/marxin/Programming/testcases/a.out+0x40067f) #5 0x7ffff72c8b24 in __libc_start_main (/lib64/libc.so.6+0x27b24) #6 0x4005bd in _start (/home/marxin/Programming/testcases/a.out+0x4005bd) Differential Revision: https://reviews.llvm.org/D97263 Cherry-pick from 16ede0956cb1f4b692dfa619ccfa6ab1de28e19b.
bors bot
pushed a commit
that referenced
this pull request
Sep 30, 2021
The current restriction on folding memcpy to a single element of size MOVE_MAX is excessively cautious on most machines and limits some significant further optimizations. So relax the restriction provided the copy size does not exceed MOVE_MAX * MOVE_RATIO and that a SET insn exists for moving the value into machine registers. Note that there were already checks in place for having misaligned move operations when one or more of the operands were unaligned. On Arm this now permits optimizing uint64_t bar64(const uint8_t *rData1) { uint64_t buffer; memcpy(&buffer, rData1, sizeof(buffer)); return buffer; } from ldr r2, [r0] @ unaligned sub sp, sp, #8 ldr r3, [r0, #4] @ unaligned strd r2, [sp] ldrd r0, [sp] add sp, sp, #8 to mov r3, r0 ldr r0, [r0] @ unaligned ldr r1, [r3, #4] @ unaligned PR target/102125 - (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations gcc/ChangeLog: PR target/102125 * gimple-fold.c (gimple_fold_builtin_memory_op): Allow folding memcpy if the size is not more than MOVE_MAX * MOVE_RATIO.
bors bot
pushed a commit
that referenced
this pull request
Jan 25, 2022
…imize or target pragmas [PR103012] The following testcases ICE when an optimize or target pragma is followed by a long line (4096+ chars). This is because on such long lines we can't use columns anymore, but the cpp_define calls performed by c_cpp_builtins_optimize_pragma or from the backend hooks for target pragma are done on temporary buffers and expect to get columns from whatever line they appear on (which happens to be the long line after optimize/target pragma), and we run into: #0 fancy_abort (file=0x3abec67 "../../libcpp/line-map.c", line=502, function=0x3abecfc "linemap_add") at ../../gcc/diagnostic.c:1986 #1 0x0000000002e7c335 in linemap_add (set=0x7ffff7fca000, reason=LC_RENAME, sysp=0, to_file=0x41287a0 "pr103012.i", to_line=3) at ../../libcpp/line-map.c:502 #2 0x0000000002e7cc24 in linemap_line_start (set=0x7ffff7fca000, to_line=3, max_column_hint=128) at ../../libcpp/line-map.c:827 #3 0x0000000002e7ce2b in linemap_position_for_column (set=0x7ffff7fca000, to_column=1) at ../../libcpp/line-map.c:898 #4 0x0000000002e771f9 in _cpp_lex_direct (pfile=0x40c3b60) at ../../libcpp/lex.c:3592 #5 0x0000000002e76c3e in _cpp_lex_token (pfile=0x40c3b60) at ../../libcpp/lex.c:3394 #6 0x0000000002e610ef in lex_macro_node (pfile=0x40c3b60, is_def_or_undef=true) at ../../libcpp/directives.c:601 #7 0x0000000002e61226 in do_define (pfile=0x40c3b60) at ../../libcpp/directives.c:639 #8 0x0000000002e610b2 in run_directive (pfile=0x40c3b60, dir_no=0, buf=0x7fffffffd430 "__OPTIMIZE__ 1\n", count=14) at ../../libcpp/directives.c:589 #9 0x0000000002e650c1 in cpp_define (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2513 #10 0x0000000002e65100 in cpp_define_unused (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2522 #11 0x0000000000f50685 in c_cpp_builtins_optimize_pragma (pfile=0x40c3b60, prev_tree=<optimization_node 0x7fffea042000>, cur_tree=<optimization_node 0x7fffea042020>) at ../../gcc/c-family/c-cppbuiltin.c:600 assertion that LC_RENAME doesn't happen first. I think the right fix is emit those predefined macros upon optimize/target pragmas with BUILTINS_LOCATION, like we already do for those macros at the start of the TU, they don't appear in columns of the next line after it. Another possibility would be to force them at the location of the pragma. 2021-12-30 Jakub Jelinek <jakub@redhat.com> PR c++/103012 gcc/ * config/i386/i386-c.c (ix86_pragma_target_parse): Perform cpp_define/cpp_undef calls with forced token locations BUILTINS_LOCATION. * config/arm/arm-c.c (arm_pragma_target_parse): Likewise. * config/aarch64/aarch64-c.c (aarch64_pragma_target_parse): Likewise. * config/s390/s390-c.c (s390_pragma_target_parse): Likewise. gcc/c-family/ * c-cppbuiltin.c (c_cpp_builtins_optimize_pragma): Perform cpp_define_unused/cpp_undef calls with forced token locations BUILTINS_LOCATION. gcc/testsuite/ PR c++/103012 * g++.dg/cpp/pr103012.C: New test. * g++.target/i386/pr103012.C: New test.
dkm
pushed a commit
that referenced
this pull request
Jan 6, 2023
With many thanks to H.J. for doing all the hard work, this patch resolves two P1 regressions; PR target/106933 and PR target/106959. Although superficially similar, the i386 backend's two scalar-to-vector (STV) passes perform their transformations in importantly different ways. The original pass converting SImode and DImode operations to V4SImode or V2DImode operations is "soft", allowing values to be maintained in both integer and vector hard registers. The newer pass converting TImode operations to V1TImode is "hard" (all or nothing) that converts all uses of a pseudo to vector form. To implement this it invokes powerful ju-ju calling SET_MODE on a reg_rtx, which due to RTL sharing, often updates this pseudo's mode everywhere in the RTL chain. Hence, TImode STV can only be performed when all uses of a pseudo are convertible to V1TImode form. To ensure this the STV passes currently use data-flow analysis to inspect all DEFs and USEs in a chain. This works fine for chains that are in the usual single assignment form, but the occurrence of uninitialized variables, or multiple assignments that split a pseudo's usage into several independent chains (lifetimes) can lead to situations where some but not all of a pseudo's occurrences need to be updated. This is safe for the SImode/DImode pass, but leads to the above bugs during the TImode pass. My one minor tweak to HJ's patch from comment #4 of bugzilla PR106959 is to only perform the new single_def_chain_p check for TImode STV; it turns out that STV of SImode/DImode min/max operates safely on multiple-def chains, and prohibiting this leads to testsuite regressions. We don't (yet) support V1TImode min/max, so this idiom isn't an issue during the TImode STV pass. For the record, the two alternate possible fixes are (i) make the TImode STV pass "soft", by eliminating use of SET_MODE, instead using replace_rtx with a new pseudo, or (ii) merging "chains" so that multiple DFA chains/lifetimes are considered a single STV chain. 2022-12-23 H.J. Lu <hjl.tools@gmail.com> Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR target/106933 PR target/106959 * config/i386/i386-features.cc (single_def_chain_p): New predicate function to check that a pseudo's use-def chain is in SSA form. (timode_scalar_to_vector_candidate_p): Check that TImode regs that are SET_DEST or SET_SRC of an insn match/are single_def_chain_p. gcc/testsuite/ChangeLog PR target/106933 PR target/106959 * gcc.target/i386/pr106933-1.c: New test case. * gcc.target/i386/pr106933-2.c: Likewise. * gcc.target/i386/pr106959-1.c: Likewise. * gcc.target/i386/pr106959-2.c: Likewise. * gcc.target/i386/pr106959-3.c: Likewise.
CohenArthur
pushed a commit
that referenced
this pull request
Jan 31, 2023
The aarch64 ISA specification allows a left shift amount to be applied after extension in the range of 0 to 4 (encoded in the imm3 field). This is true for at least the following instructions: * ADD (extend register) * ADDS (extended register) * SUB (extended register) The result of this patch can be seen, when compiling the following code: uint64_t myadd(uint64_t a, uint64_t b) { return a+(((uint8_t)b)<<4); } Without the patch the following sequence will be generated: 0000000000000000 <myadd>: 0: d37c1c21 ubfiz x1, x1, #4, #8 4: 8b000020 add x0, x1, x0 8: d65f03c0 ret With the patch the ubfiz will be merged into the add instruction: 0000000000000000 <myadd>: 0: 8b211000 add x0, x0, w1, uxtb #4 4: d65f03c0 ret gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_uxt_size): fix an off-by-one in checking the permissible shift-amount.
CohenArthur
pushed a commit
that referenced
this pull request
Nov 9, 2023
This patch is my proposed solution to PR rtl-optimization/91865. Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND to a single ZERO_EXTEND, but as shown in this PR it is possible for combine's make_compound_operation to unintentionally generate a non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be matched by the backend. For the new test case: const int table[2] = {1, 2}; int foo (char i) { return table[i]; } compiling with -O2 -mlarge on msp430 we currently see: Trying 2 -> 7: 2: r25:HI=zero_extend(R12:QI) REG_DEAD R12:QI 7: r28:PSI=sign_extend(r25:HI)#0 REG_DEAD r25:HI Failed to match this instruction: (set (reg:PSI 28 [ iD.1772 ]) (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ])))) which results in the following code: foo: AND #0xff, R12 RLAM.A #4, R12 { RRAM.A #4, R12 RLAM.A #1, R12 MOVX.W table(R12), R12 RETA With this patch, we now see: Trying 2 -> 7: 2: r25:HI=zero_extend(R12:QI) REG_DEAD R12:QI 7: r28:PSI=sign_extend(r25:HI)#0 REG_DEAD r25:HI Successfully matched this instruction: (set (reg:PSI 28 [ iD.1772 ]) (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ]))) allowing combination of insns 2 and 7 original costs 4 + 8 = 12 replacement cost 8 foo: MOV.B R12, R12 RLAM.A #1, R12 MOVX.W table(R12), R12 RETA 2023-10-26 Roger Sayle <roger@nextmovesoftware.com> Richard Biener <rguenther@suse.de> gcc/ChangeLog PR rtl-optimization/91865 * combine.cc (make_compound_operation): Avoid creating a ZERO_EXTEND of a ZERO_EXTEND. gcc/testsuite/ChangeLog PR rtl-optimization/91865 * gcc.target/msp430/pr91865.c: New test case.
CohenArthur
pushed a commit
that referenced
this pull request
Feb 20, 2024
Here we have template<class T> auto is_throwable(T t) -> decltype(throw t, true) { ... } where we didn't properly mark 't' as IMPLICIT_RVALUE_P, which caused the wrong overload to have been chosen. Jason figured out it's because we don't correctly implement [expr.prim.id.unqual]#4.2, which post-P2266 says that an id-expression is move-eligible if "the id-expression (possibly parenthesized) is the operand of a throw-expression, and names an implicitly movable entity that belongs to a scope that does not contain the compound-statement of the innermost lambda-expression, try-block, or function-try-block (if any) whose compound-statement or ctor-initializer contains the throw-expression." I worked out that it's trying to say that given struct X { X(); X(const X&); X(X&&) = delete; }; the following should fail: the scope of the throw is an sk_try, and it's also x's scope S, and S "does not contain the compound-statement of the *try-block" so x is move-eligible, so we move, so we fail. void f () try { X x; throw x; // use of deleted function } catch (...) { } Whereas here: void g (X x) try { throw x; } catch (...) { } the throw is again in an sk_try, but x's scope is an sk_function_parms which *does* contain the {} of the *try-block, so x is not move-eligible, so we don't move, so we use X(const X&), and the code is fine. The current code also doesn't seem to handle void h (X x) { void z (decltype(throw x, true)); } where there's no enclosing lambda or sk_try so we should move. I'm not doing anything about lambdas because we shouldn't reach the code at the end of the function: the DECL_HAS_VALUE_EXPR_P check shouldn't let us go further. PR c++/113789 PR c++/113853 gcc/cp/ChangeLog: * typeck.cc (treat_lvalue_as_rvalue_p): Update code to better reflect [expr.prim.id.unqual]#4.2. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/sfinae69.C: Remove dg-bogus. * g++.dg/cpp0x/sfinae70.C: New test. * g++.dg/cpp0x/sfinae71.C: New test. * g++.dg/cpp0x/sfinae72.C: New test. * g++.dg/cpp2a/implicit-move4.C: New test.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The Makefile of gcc directory will iterate
lang_OBJS
variable for detecting language frontend objects, so the standard way is to define this variable with all related object filenames.Besides, the missing of
lang_OBJS
may cause the unwilling issue, say, you can't runmake -j n
otherwise the compiler complainsinsn_modes.h
is missing. Now we can usemake -j n
.BTW, could you authorize me to push the branch directly? That's better than maintaining forks. And you can protect the master branch.