Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rust_OBJS to fix compiling error #4

Merged
merged 1 commit into from Apr 21, 2020
Merged

Add rust_OBJS to fix compiling error #4

merged 1 commit into from Apr 21, 2020

Conversation

NalaGinrut
Copy link
Contributor

@NalaGinrut NalaGinrut commented Apr 21, 2020

The Makefile of gcc directory will iterate lang_OBJS variable for detecting language frontend objects, so the standard way is to define this variable with all related object filenames.

Besides, the missing of lang_OBJS may cause the unwilling issue, say, you can't run make -j n otherwise the compiler complains insn_modes.h is missing. Now we can use make -j n.

BTW, could you authorize me to push the branch directly? That's better than maintaining forks. And you can protect the master branch.

@philberty philberty merged commit d044423 into Rust-GCC:master Apr 21, 2020
philberty pushed a commit that referenced this pull request Mar 3, 2021
This adds implementation for the optabs for complex operations.  With this the
following C code:

  void g (float complex a[restrict N], float complex b[restrict N],
	  float complex c[restrict N])
  {
    for (int i=0; i < N; i++)
      c[i] =  a[i] * b[i];
  }

generates

NEON:

g:
        vmov.f32        q11, #0.0  @ v4sf
        add     r3, r2, #1600
.L2:
        vmov    q8, q11  @ v4sf
        vld1.32 {q10}, [r1]!
        vld1.32 {q9}, [r0]!
        vcmla.f32       q8, q9, q10, #0
        vcmla.f32       q8, q9, q10, #90
        vst1.32 {q8}, [r2]!
        cmp     r3, r2
        bne     .L2
        bx      lr

MVE:

g:
        push    {lr}
        mov     lr, #100
        dls     lr, lr
.L2:
        vldrw.32        q1, [r1], #16
        vldrw.32        q2, [r0], #16
        vcmul.f32       q3, q2, q1, #0
        vcmla.f32       q3, q2, q1, #90
        vstrw.32        q3, [r2], #16
        le      lr, .L2
        ldr     pc, [sp], #4

instead of

g:
        add     r3, r2, #1600
.L2:
        vld2.32 {d20-d23}, [r0]!
        vld2.32 {d16-d19}, [r1]!
        vmul.f32        q14, q11, q9
        vmul.f32        q15, q11, q8
        vneg.f32        q14, q14
        vfma.f32        q15, q10, q9
        vfma.f32        q14, q10, q8
        vmov    q13, q15  @ v4sf
        vmov    q12, q14  @ v4sf
        vst2.32 {d24-d27}, [r2]!
        cmp     r3, r2
        bne     .L2
        bx      lr

and

g:
        add     r3, r2, #1600
.L2:
        vld2.32 {d20-d23}, [r0]!
        vld2.32 {d16-d19}, [r1]!
        vmul.f32        q15, q10, q8
        vmul.f32        q14, q10, q9
        vmls.f32        q15, q11, q9
        vmla.f32        q14, q11, q8
        vmov    q12, q15  @ v4sf
        vmov    q13, q14  @ v4sf
        vst2.32 {d24-d27}, [r2]!
        cmp     r3, r2
        bne     .L2
        bx      lr

respectively.

gcc/ChangeLog:

	* config/arm/iterators.md (rotsplit1, rotsplit2, conj_op, fcmac1,
	VCMLA_OP, VCMUL_OP): New.
	* config/arm/mve.md (mve_vcmlaq<mve_rot><mode>): Support vec_dup 0.
	* config/arm/neon.md (cmul<conj_op><mode>3): New.
	* config/arm/unspecs.md (UNSPEC_VCMLA_CONJ, UNSPEC_VCMLA180_CONJ,
	UNSPEC_VCMUL_CONJ): New.
	* config/arm/vec-common.md (cmul<conj_op><mode>3, arm_vcmla<rot><mode>,
	cml<fcmac1><conj_op><mode>4): New.
philberty pushed a commit that referenced this pull request Mar 3, 2021
/home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.cpp:77:25: runtime error: left shift of 0x0000000000000000fffffffffffffffb by 96 places cannot be represented in type '__int128'
    #0 0x7ffff754edfe in __ubsan::Value::getSIntValue() const /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.cpp:77
    #1 0x7ffff7548719 in __ubsan::Value::isNegative() const /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.h:190
    #2 0x7ffff7542a34 in handleShiftOutOfBoundsImpl /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_handlers.cpp:338
    #3 0x7ffff75431b7 in __ubsan_handle_shift_out_of_bounds /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_handlers.cpp:370
    #4 0x40067f in main (/home/marxin/Programming/testcases/a.out+0x40067f)
    #5 0x7ffff72c8b24 in __libc_start_main (/lib64/libc.so.6+0x27b24)
    #6 0x4005bd in _start (/home/marxin/Programming/testcases/a.out+0x4005bd)

Differential Revision: https://reviews.llvm.org/D97263

Cherry-pick from 16ede0956cb1f4b692dfa619ccfa6ab1de28e19b.
bors bot pushed a commit that referenced this pull request Sep 30, 2021
The current restriction on folding memcpy to a single element of size
MOVE_MAX is excessively cautious on most machines and limits some
significant further optimizations.  So relax the restriction provided
the copy size does not exceed MOVE_MAX * MOVE_RATIO and that a SET
insn exists for moving the value into machine registers.

Note that there were already checks in place for having misaligned
move operations when one or more of the operands were unaligned.

On Arm this now permits optimizing

uint64_t bar64(const uint8_t *rData1)
{
    uint64_t buffer;
    memcpy(&buffer, rData1, sizeof(buffer));
    return buffer;
}

from
        ldr     r2, [r0]        @ unaligned
        sub     sp, sp, #8
        ldr     r3, [r0, #4]    @ unaligned
        strd    r2, [sp]
        ldrd    r0, [sp]
        add     sp, sp, #8

to
        mov     r3, r0
        ldr     r0, [r0]        @ unaligned
        ldr     r1, [r3, #4]    @ unaligned

PR target/102125 - (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations

gcc/ChangeLog:

	PR target/102125
	* gimple-fold.c (gimple_fold_builtin_memory_op): Allow folding
	memcpy if the size is not more than MOVE_MAX * MOVE_RATIO.
bors bot pushed a commit that referenced this pull request Jan 25, 2022
…imize or target pragmas [PR103012]

The following testcases ICE when an optimize or target pragma
is followed by a long line (4096+ chars).
This is because on such long lines we can't use columns anymore,
but the cpp_define calls performed by c_cpp_builtins_optimize_pragma
or from the backend hooks for target pragma are done on temporary
buffers and expect to get columns from whatever line they appear on
(which happens to be the long line after optimize/target pragma),
and we run into:
 #0  fancy_abort (file=0x3abec67 "../../libcpp/line-map.c", line=502, function=0x3abecfc "linemap_add") at ../../gcc/diagnostic.c:1986
 #1  0x0000000002e7c335 in linemap_add (set=0x7ffff7fca000, reason=LC_RENAME, sysp=0, to_file=0x41287a0 "pr103012.i", to_line=3) at ../../libcpp/line-map.c:502
 #2  0x0000000002e7cc24 in linemap_line_start (set=0x7ffff7fca000, to_line=3, max_column_hint=128) at ../../libcpp/line-map.c:827
 #3  0x0000000002e7ce2b in linemap_position_for_column (set=0x7ffff7fca000, to_column=1) at ../../libcpp/line-map.c:898
 #4  0x0000000002e771f9 in _cpp_lex_direct (pfile=0x40c3b60) at ../../libcpp/lex.c:3592
 #5  0x0000000002e76c3e in _cpp_lex_token (pfile=0x40c3b60) at ../../libcpp/lex.c:3394
 #6  0x0000000002e610ef in lex_macro_node (pfile=0x40c3b60, is_def_or_undef=true) at ../../libcpp/directives.c:601
 #7  0x0000000002e61226 in do_define (pfile=0x40c3b60) at ../../libcpp/directives.c:639
 #8  0x0000000002e610b2 in run_directive (pfile=0x40c3b60, dir_no=0, buf=0x7fffffffd430 "__OPTIMIZE__ 1\n", count=14) at ../../libcpp/directives.c:589
 #9  0x0000000002e650c1 in cpp_define (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2513
 #10 0x0000000002e65100 in cpp_define_unused (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2522
 #11 0x0000000000f50685 in c_cpp_builtins_optimize_pragma (pfile=0x40c3b60, prev_tree=<optimization_node 0x7fffea042000>, cur_tree=<optimization_node 0x7fffea042020>)
     at ../../gcc/c-family/c-cppbuiltin.c:600
assertion that LC_RENAME doesn't happen first.

I think the right fix is emit those predefined macros upon
optimize/target pragmas with BUILTINS_LOCATION, like we already do
for those macros at the start of the TU, they don't appear in columns
of the next line after it.  Another possibility would be to force them
at the location of the pragma.

2021-12-30  Jakub Jelinek  <jakub@redhat.com>

	PR c++/103012
gcc/
	* config/i386/i386-c.c (ix86_pragma_target_parse): Perform
	cpp_define/cpp_undef calls with forced token locations
	BUILTINS_LOCATION.
	* config/arm/arm-c.c (arm_pragma_target_parse): Likewise.
	* config/aarch64/aarch64-c.c (aarch64_pragma_target_parse): Likewise.
	* config/s390/s390-c.c (s390_pragma_target_parse): Likewise.
gcc/c-family/
	* c-cppbuiltin.c (c_cpp_builtins_optimize_pragma): Perform
	cpp_define_unused/cpp_undef calls with forced token locations
	BUILTINS_LOCATION.
gcc/testsuite/
	PR c++/103012
	* g++.dg/cpp/pr103012.C: New test.
	* g++.target/i386/pr103012.C: New test.
dkm pushed a commit that referenced this pull request Jan 6, 2023
With many thanks to H.J. for doing all the hard work, this patch resolves
two P1 regressions; PR target/106933 and PR target/106959.

Although superficially similar, the i386 backend's two scalar-to-vector
(STV) passes perform their transformations in importantly different ways.
The original pass converting SImode and DImode operations to V4SImode
or V2DImode operations is "soft", allowing values to be maintained in
both integer and vector hard registers.  The newer pass converting TImode
operations to V1TImode is "hard" (all or nothing) that converts all uses
of a pseudo to vector form.  To implement this it invokes powerful ju-ju
calling SET_MODE on a reg_rtx, which due to RTL sharing, often updates
this pseudo's mode everywhere in the RTL chain.  Hence, TImode STV can only
be performed when all uses of a pseudo are convertible to V1TImode form.
To ensure this the STV passes currently use data-flow analysis to inspect
all DEFs and USEs in a chain.  This works fine for chains that are in
the usual single assignment form, but the occurrence of uninitialized
variables, or multiple assignments that split a pseudo's usage into
several independent chains (lifetimes) can lead to situations where
some but not all of a pseudo's occurrences need to be updated.  This is
safe for the SImode/DImode pass, but leads to the above bugs during
the TImode pass.

My one minor tweak to HJ's patch from comment #4 of bugzilla PR106959
is to only perform the new single_def_chain_p check for TImode STV; it
turns out that STV of SImode/DImode min/max operates safely on multiple-def
chains, and prohibiting this leads to testsuite regressions.  We don't
(yet) support V1TImode min/max, so this idiom isn't an issue during the
TImode STV pass.

For the record, the two alternate possible fixes are (i) make the TImode
STV pass "soft", by eliminating use of SET_MODE, instead using replace_rtx
with a new pseudo, or (ii) merging "chains" so that multiple DFA
chains/lifetimes are considered a single STV chain.

2022-12-23  H.J. Lu  <hjl.tools@gmail.com>
	    Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR target/106933
	PR target/106959
	* config/i386/i386-features.cc (single_def_chain_p): New predicate
	function to check that a pseudo's use-def chain is in SSA form.
	(timode_scalar_to_vector_candidate_p): Check that TImode regs that
	are SET_DEST or SET_SRC of an insn match/are single_def_chain_p.

gcc/testsuite/ChangeLog
	PR target/106933
	PR target/106959
	* gcc.target/i386/pr106933-1.c: New test case.
	* gcc.target/i386/pr106933-2.c: Likewise.
	* gcc.target/i386/pr106959-1.c: Likewise.
	* gcc.target/i386/pr106959-2.c: Likewise.
	* gcc.target/i386/pr106959-3.c: Likewise.
CohenArthur pushed a commit that referenced this pull request Jan 31, 2023
The aarch64 ISA specification allows a left shift amount to be applied
after extension in the range of 0 to 4 (encoded in the imm3 field).

This is true for at least the following instructions:

 * ADD (extend register)
 * ADDS (extended register)
 * SUB (extended register)

The result of this patch can be seen, when compiling the following code:

uint64_t myadd(uint64_t a, uint64_t b)
{
    return a+(((uint8_t)b)<<4);
}

Without the patch the following sequence will be generated:

0000000000000000 <myadd>:
   0:	d37c1c21 	ubfiz	x1, x1, #4, #8
   4:	8b000020 	add	x0, x1, x0
   8:	d65f03c0 	ret

With the patch the ubfiz will be merged into the add instruction:

0000000000000000 <myadd>:
   0:	8b211000 	add	x0, x0, w1, uxtb #4
   4:	d65f03c0 	ret

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_uxt_size): fix an
	off-by-one in checking the permissible shift-amount.
CohenArthur pushed a commit that referenced this pull request Nov 9, 2023
This patch is my proposed solution to PR rtl-optimization/91865.
Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND
to a single ZERO_EXTEND, but as shown in this PR it is possible for
combine's make_compound_operation to unintentionally generate a
non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be
matched by the backend.

For the new test case:

const int table[2] = {1, 2};
int foo (char i) { return table[i]; }

compiling with -O2 -mlarge on msp430 we currently see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Failed to match this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))

which results in the following code:

foo:	AND     #0xff, R12
        RLAM.A #4, R12 { RRAM.A #4, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

With this patch, we now see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Successfully matched this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
allowing combination of insns 2 and 7
original costs 4 + 8 = 12
replacement cost 8

foo:	MOV.B   R12, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

2023-10-26  Roger Sayle  <roger@nextmovesoftware.com>
	    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
	PR rtl-optimization/91865
	* combine.cc (make_compound_operation): Avoid creating a
	ZERO_EXTEND of a ZERO_EXTEND.

gcc/testsuite/ChangeLog
	PR rtl-optimization/91865
	* gcc.target/msp430/pr91865.c: New test case.
CohenArthur pushed a commit that referenced this pull request Feb 20, 2024
Here we have

  template<class T>
  auto is_throwable(T t) -> decltype(throw t, true) { ... }

where we didn't properly mark 't' as IMPLICIT_RVALUE_P, which caused
the wrong overload to have been chosen.  Jason figured out it's because
we don't correctly implement [expr.prim.id.unqual]#4.2, which post-P2266
says that an id-expression is move-eligible if

"the id-expression (possibly parenthesized) is the operand of
a throw-expression, and names an implicitly movable entity that belongs
to a scope that does not contain the compound-statement of the innermost
lambda-expression, try-block, or function-try-block (if any) whose
compound-statement or ctor-initializer contains the throw-expression."

I worked out that it's trying to say that given

  struct X {
    X();
    X(const X&);
    X(X&&) = delete;
  };

the following should fail: the scope of the throw is an sk_try, and it's
also x's scope S, and S "does not contain the compound-statement of the
*try-block" so x is move-eligible, so we move, so we fail.

  void f ()
  try {
    X x;
    throw x;  // use of deleted function
  } catch (...) {
  }

Whereas here:

  void g (X x)
  try {
    throw x;
  } catch (...) {
  }

the throw is again in an sk_try, but x's scope is an sk_function_parms
which *does* contain the {} of the *try-block, so x is not move-eligible,
so we don't move, so we use X(const X&), and the code is fine.

The current code also doesn't seem to handle

  void h (X x) {
    void z (decltype(throw x, true));
  }

where there's no enclosing lambda or sk_try so we should move.

I'm not doing anything about lambdas because we shouldn't reach the
code at the end of the function: the DECL_HAS_VALUE_EXPR_P check
shouldn't let us go further.

	PR c++/113789
	PR c++/113853

gcc/cp/ChangeLog:

	* typeck.cc (treat_lvalue_as_rvalue_p): Update code to better
	reflect [expr.prim.id.unqual]#4.2.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/sfinae69.C: Remove dg-bogus.
	* g++.dg/cpp0x/sfinae70.C: New test.
	* g++.dg/cpp0x/sfinae71.C: New test.
	* g++.dg/cpp0x/sfinae72.C: New test.
	* g++.dg/cpp2a/implicit-move4.C: New test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants