Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding RTEMS support for GDB #2

Closed
heshamelmatary opened this issue Apr 20, 2015 · 4 comments
Closed

Adding RTEMS support for GDB #2

heshamelmatary opened this issue Apr 20, 2015 · 4 comments

Comments

@heshamelmatary
Copy link
Contributor

Hi,

Can this commit [1] be applied to "epiphany-gdb-7.8" branch (and maybe the default you're working with now)? If not please tell me to create a new pull request for it.

[1] b6c483b

Thanks,
Hesham

@olajep
Copy link
Contributor

olajep commented Apr 22, 2015

Fixed in e1c7583

@olajep olajep closed this as completed Apr 22, 2015
@heshamelmatary
Copy link
Contributor Author

Hi,

Thanks for merging the pull request(s). I am submitting patches to RTEMS Source Builder (RSB) to build the tools from these branches:

  • epiphany-binutils-2.23-software-cache
  • epiphany-gcc-4.9
  • epiphany-gdb-7.8

If these are not the correct (stable) ones kindly let me know. Also, are you going to push your work upstream? Basically RTEMS fetch from there and fetching from external repos (like Adapteva) is not very desirable.

@olajep
Copy link
Contributor

olajep commented Apr 22, 2015

Hi,

On 2015-04-22 14:11, Hesham ALMatary wrote:

Hi,

Thanks for merging the pull request(s). I am submitting patches to RTEMS Source Builder (RSB) to build the tools from these branches:

  • epiphany-binutils-2.23-software-cache
  • epiphany-gcc-4.9
  • epiphany-gdb-7.8

If these are not the correct (stable) ones kindly let me know.

epiphany-binutils-2.23-software-cache
epiphany-gcc-4.8-software-cache
epiphany-gdb-7.6

That configuration is used in both the eSDK 2014.11 and 2015.1 releases. Having said that we have had epiphany-gcc-4.9 on the master branch for a while and I have not seen any problems with it so far. (And we should bump to at least gdb 7.8 and binutils 2.24 when the build issue I mentioned earlier is sorted out)

Also, are you going to push your work upstream? Basically RTEMS fetch from there and fetching from external repos (like Adapteva) is not very desirable.

That is the idea but as much as we would like to it's not going to happen anytime soon. First we need to catch up with upstream for all toolchain components which will require some work. Other things have higher priority right now.

// Ola

@heshamelmatary
Copy link
Contributor Author

Thanks for the details. We will be working from the repos I mentioned above (tested and working fine with RTEMS), I just hope that my gotten-merged pull requests will be populated for any other newly-created branches for all the tools.

Thanks,
Hesham

P.S This is the last pending pull request for building epiphany-rtems4.11-gdb. heshamelmatary@acb8490

olajep pushed a commit that referenced this issue Nov 10, 2016
When I run process-dies-while-detaching.exp with GDBserver, I see many
warnings printed by GDBserver,

ptrace(regsets_fetch_inferior_registers) PID=26183: No such process
ptrace(regsets_fetch_inferior_registers) PID=26183: No such process
ptrace(regsets_fetch_inferior_registers) PID=26184: No such process
ptrace(regsets_fetch_inferior_registers) PID=26184: No such process

regsets_fetch_inferior_registers is called when GDBserver resumes each
lwp.

 #2  0x0000000000428260 in regsets_fetch_inferior_registers (regsets_info=0x4690d0 <aarch64_regsets_info>, regcache=0x31832020)
    at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:5412
 #3  0x00000000004070e8 in get_thread_regcache (thread=0x31832940, fetch=fetch@entry=1) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/regcache.c:58
 #4  0x0000000000429c40 in linux_resume_one_lwp_throw (info=<optimized out>, signal=0, step=0, lwp=0x31832830)
    at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:4463
 #5  linux_resume_one_lwp (lwp=0x31832830, step=<optimized out>, signal=<optimized out>, info=<optimized out>)
    at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:4573

The is the case that threads are disappeared when GDB/GDBserver resumes
them.  We check errno for ESRCH, and don't print error messages, like
what we are doing in regsets_store_inferior_registers.

gdb/gdbserver:

2016-08-04  Yao Qi  <yao.qi@linaro.org>

	* linux-low.c (regsets_fetch_inferior_registers): Check
	errno is ESRCH or not.
olajep pushed a commit that referenced this issue Nov 10, 2016
… out value

With something like:

  struct A { int bitfield:4; } var;

If 'var' ends up wholly-optimized out, printing 'var.bitfield' crashes
gdb here:

 (top-gdb) bt
 #0  0x000000000058b89f in extract_unsigned_integer (addr=0x2 <error: Cannot access memory at address 0x2>, len=2, byte_order=BFD_ENDIAN_LITTLE)
     at /home/pedro/gdb/mygit/src/gdb/findvar.c:109
 #1  0x00000000005a187a in unpack_bits_as_long (field_type=0x16cff70, valaddr=0x0, bitpos=16, bitsize=12) at /home/pedro/gdb/mygit/src/gdb/value.c:3347
 #2  0x00000000005a1b9d in unpack_value_bitfield (dest_val=0x1b5d9d0, bitpos=16, bitsize=12, valaddr=0x0, embedded_offset=0, val=0x1b5d8d0)
     at /home/pedro/gdb/mygit/src/gdb/value.c:3441
 #3  0x00000000005a2a5f in value_fetch_lazy (val=0x1b5d9d0) at /home/pedro/gdb/mygit/src/gdb/value.c:3958
 #4  0x00000000005a10a7 in value_primitive_field (arg1=0x1b5d8d0, offset=0, fieldno=0, arg_type=0x16d04c0) at /home/pedro/gdb/mygit/src/gdb/value.c:3161
 #5  0x00000000005b01e5 in do_search_struct_field (name=0x1727c60 "bitfield", arg1=0x1b5d8d0, offset=0, type=0x16d04c0, looking_for_baseclass=0, result_ptr=0x7fffffffcaf8,
 [...]

unpack_value_bitfield is already optimized-out/unavailable -aware:

   (...) VALADDR points to the contents of VAL.  If the VAL's contents
   required to extract the bitfield from are unavailable/optimized
   out, DEST_VAL is correspondingly marked unavailable/optimized out.

however, it is not considering the case of the value having no
contents buffer at all, as can happen through
allocate_optimized_out_value.

gdb/ChangeLog:
2016-08-09  Pedro Alves  <palves@redhat.com>

	* value.c (unpack_value_bitfield): Skip unpacking if the parent
	has no contents buffer to begin with.

gdb/testsuite/ChangeLog:
2016-08-09  Pedro Alves  <palves@redhat.com>

	* gdb.dwarf2/bitfield-parent-optimized-out.exp: New file.
olajep pushed a commit that referenced this issue Nov 10, 2016
I build GDB with -fsanitize=address, and see the error in tests,

(gdb) PASS: gdb.linespec/ls-errs.exp: lang=C++: break 3 foo
break -line 3 foo^M
=================================================================^M
==4401==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000047487 at pc 0x819d8e bp 0x7fff4e4e6bb0 sp 0x7fff4e4e6ba8^M
READ of size 1 at 0x603000047487 thread T0^[[1m^[[0m^M
    #0 0x819d8d in explicit_location_lex_one /home/yao/SourceCode/gnu/gdb/git/gdb/location.c:502^M
    #1 0x81a185 in string_to_explicit_location(char const**, language_defn const*, int) /home/yao/SourceCode/gnu/gdb/git/gdb/location.c:556^M
    #2 0x81ac10 in string_to_event_location(char**, language_defn const*) /home/yao/SourceCode/gnu/gdb/git/gdb/location.c:687^

the code in question is:

>         /* Special case: C++ operator,.  */
>         if (language->la_language == language_cplus
>             && strncmp (*inp, "operator", 8)  <--- [1]
>             && (*inp)[9] == ',')
>           (*inp) += 9;
>         ++(*inp);

The error is caused by the access to (*inp)[9] if 9 is out of its bounds.
However [1] looks odd to me, because if strncmp returns true (non-zero),
the following check "(*inp)[9] == ','" makes no sense any more.  I
suspect it was a typo in the code we meant to "strncmp () == 0".  Another
problem in the code above is that if *inp is "operator,", we first
increment *inp by 9, and then increment it by one again, which is wrong
to me.  We should only increment *inp by 8 to skip "operator", and go
back to the loop header to decide where we stop.

gdb:

2016-08-15  Yao Qi  <yao.qi@linaro.org>

	* location.c (explicit_location_lex_one): Compare the return
	value of strncmp with zero.  Don't check (*inp)[9].  Increment
	*inp by 8.
olajep pushed a commit that referenced this issue Nov 10, 2016
If I build gdb with -fsanitize=address and run tests, I get error,

malformed linespec error: unexpected colon^M
(gdb) PASS: gdb.linespec/ls-errs.exp: lang=C: break     :
break   :=================================================================^M
==3266==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000051451 at pc 0x2b5797a972a8 bp 0x7fffd8e0f3c0 sp 0x7fffd8e0f398^M
READ of size 2 at 0x602000051451 thread T0
    #0 0x2b5797a972a7 in __interceptor_strlen (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x322a7)^M
    #1 0x7bd004 in compare_filenames_for_search(char const*, char const*) /home/yao/SourceCode/gnu/gdb/git/gdb/symtab.c:316^M
    #2 0x7bd310 in iterate_over_some_symtabs(char const*, char const*, int (*)(symtab*, void*), void*, compunit_symtab*, compunit_symtab*) /home/yao/SourceCode/gnu/gdb/git/gdb/symtab.c:411^M
    #3 0x7bd775 in iterate_over_symtabs(char const*, int (*)(symtab*, void*), void*) /home/yao/SourceCode/gnu/gdb/git/gdb/symtab.c:481^M
    #4 0x7bda15 in lookup_symtab(char const*) /home/yao/SourceCode/gnu/gdb/git/gdb/symtab.c:527^M
    #5 0x7d5e2a in make_file_symbol_completion_list_1 /home/yao/SourceCode/gnu/gdb/git/gdb/symtab.c:5635^M
    #6 0x7d61e1 in make_file_symbol_completion_list(char const*, char const*, char const*) /home/yao/SourceCode/gnu/gdb/git/gdb/symtab.c:5684^M
    #7 0x88dc06 in linespec_location_completer /home/yao/SourceCode/gnu/gdb/git/gdb/completer.c:288
....
0x602000051451 is located 0 bytes to the right of 1-byte region [0x602000051450,0x602000051451)^M
mallocated by thread T0 here:
    #0 0x2b5797ab97ef in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x547ef)^M
    #1 0xbbfb8d in xmalloc /home/yao/SourceCode/gnu/gdb/git/gdb/common/common-utils.c:43^M
    #2 0x88dabd in linespec_location_completer /home/yao/SourceCode/gnu/gdb/git/gdb/completer.c:273^M
    #3 0x88e5ef in location_completer(cmd_list_element*, char const*, char const*) /home/yao/SourceCode/gnu/gdb/git/gdb/completer.c:531^M
    #4 0x8902e7 in complete_line_internal /home/yao/SourceCode/gnu/gdb/git/gdb/completer.c:964^

The code in question is here

       file_to_match = (char *) xmalloc (colon - text + 1);
       strncpy (file_to_match, text, colon - text + 1);

it is likely that file_to_match is not null-terminated.  The patch is
to strncpy 'colon - text' bytes and explicitly set '\0'.

gdb:

2016-08-19  Yao Qi  <yao.qi@linaro.org>

	* completer.c (linespec_location_completer): Make file_to_match
	null-terminated.
olajep pushed a commit that referenced this issue Nov 10, 2016
This test case verifies that GDB will not attempt to invoke a python
unwinder recursively.

At the moment, the behavior exhibited by GDB looks like this:

    (gdb) source py-recurse-unwind.py
    Python script imported
    (gdb) b ccc
    Breakpoint 1 at 0x4004bd: file py-recurse-unwind.c, line 23.
    (gdb) run
    Starting program: py-recurse-unwind
    TestUnwinder: Recursion detected - returning early.
    TestUnwinder: Recursion detected - returning early.
    TestUnwinder: Recursion detected - returning early.
    TestUnwinder: Recursion detected - returning early.

    Breakpoint 1, ccc (arg=<unavailable>) at py-recurse-unwind.c:23
    23      }
    (gdb) bt
    #-1 ccc (arg=<unavailable>) at py-recurse-unwind.c:23
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)

[I've shortened pathnames for easier reading.]

The desired / expected behavior looks like this:

    (gdb) source py-recurse-unwind.py
    Python script imported
    (gdb) b ccc
    Breakpoint 1 at 0x4004bd: file py-recurse-unwind.c, line 23.
    (gdb) run
    Starting program: py-recurse-unwind

    Breakpoint 1, ccc (arg=789) at py-recurse-unwind.c:23
    23      }
    (gdb) bt
    #0  ccc (arg=789) at py-recurse-unwind.c:23
    #1  0x00000000004004d5 in bbb (arg=456) at py-recurse-unwind.c:28
    #2  0x00000000004004ed in aaa (arg=123) at py-recurse-unwind.c:34
    #3  0x00000000004004fe in main () at py-recurse-unwind.c:40

Note that GDB's problems go well beyond the fact that it invokes the
unwinder recursively.  In the process it messes up some internal state
(the frame stash) leading to display of (only) the sentinel frame in
the backtrace.

gdb/testsuite/ChangeLog:

	* gdb.python/py-recurse-unwind.c: New file.
	* gdb.python/py-recurse-unwind.py: New file.
	* gdb.python/py-recurse-unwind.exp: New file.
olajep pushed a commit that referenced this issue Nov 10, 2016
…_eval

This fixes the problem exercised by Kevin's test at:

 https://sourceware.org/ml/gdb-patches/2016-08/msg00216.html

This was originally exposed by the OpenJDK Python-based unwinder.

If an unwinder attempts to call parse_and_eval from within its
sniffing method, GDB's unwinding machinery enters infinite recursion.
However, parse_and_eval is a pretty reasonable thing to call, because
Python/Scheme-based unwinders will often need to read globals out of
inferior memory.  The recursion happens because:

- get_current_frame() is called soon after the target stops.

- current_frame is NULL, and so we unwind it from the sentinel frame
  (which is special and has level == -1).

- We reach get_prev_frame_if_no_cycle, which does cycle detection
  based on frame id, and thus tries to compute the frame id of the new
  frame.

- Frame id computation requires an unwinder, so we go through all
  unwinder sniffers trying to see if one accepts the new frame (the
  current frame).

- the unwinder's sniffer calls parse_and_eval().

- parse_and_eval depends on the selected frame/block, and if not set
  yet, the selected frame is set to the current frame.

- get_current_frame () is called again.  current_frame is still NULL,
  so ...

- recurse forever.


In Kevin's test at:

 https://sourceware.org/ml/gdb-patches/2016-08/msg00216.html

gdb doesn't recurse forever simply because the Python unwinder
contains code to detect and stop the recursion itself.  However, GDB
goes downhill from here, e.g., by showing the sentinel frame as
current frame (note the -1):

    Breakpoint 1, ccc (arg=<unavailable>) at py-recurse-unwind.c:23
    23      }
    (gdb) bt
    #-1 ccc (arg=<unavailable>) at py-recurse-unwind.c:23
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)

That "-1" frame level comes from this:

      if (catch_exceptions (current_uiout, unwind_to_current_frame,
			    sentinel_frame, RETURN_MASK_ERROR) != 0)
	{
	  /* Oops! Fake a current frame?  Is this useful?  It has a PC
             of zero, for instance.  */
	  current_frame = sentinel_frame;
	}

which is bogus.  It's never correct to set the current frame to the
sentinel frame.  The only reason this has survived so long is that
getting here normally indicates something wrong has already happened
before and we fix that.  And this case is no exception -- it doesn't
really matter how precisely we managed to get to that bogus code (it
has to do with the the stash), because anything after recursion
happens is going to be invalid.

So the fix is to avoid the recursion in the first place.

Observations:

 #1 - The recursion happens because we try to do cycle detection from
      within get_prev_frame_if_no_cycle.  That requires computing the
      frame id of the frame being unwound, and that itself requires
      calling into the unwinders.

 #2 - But, the first time we're unwinding from the sentinel frame,
      when we reach get_prev_frame_if_no_cycle, there's no frame chain
      at all yet:

      - current_frame is NULL.
      - the frame stash is empty.

Thus, there's really no need to do cycle detection the first time we
reach get_prev_frame_if_no_cycle, when building the current frame.

So we can break the recursion by making get_current_frame call a
simplified version of get_prev_frame_if_no_cycle that results in
setting the current_frame global _before_ computing the current
frame's id.

But, we can go a little bit further.  As there's really no reason
anymore to compute the current frame's frame id immediately, we can
defer computing it to when some caller of get_current_frame might need
it.  This was actually how the frame id was computed for all frames
before the stash-based cycle detection was added.  So in a way, this
patch reintroduces the lazy frame id computation, but unlike before,
only for the case of the current frame, which turns out to be special.

This lazyness, however, requires adjusting
gdb.python/py-unwind-maint.exp, because that assumes unwinders are
immediately called as side effect of some commands.  I didn't see a
need to preserve the behavior expected by that test (all it would take
is call get_frame_id inside get_current_frame), so I adjusted the
test.

gdb/ChangeLog:
2016-09-05  Pedro Alves  <palves@redhat.com>

	PR backtrace/19927
	* frame.c (get_frame_id): Compute the frame id if not computed
	yet.
	(unwind_to_current_frame): Delete.
	(get_current_frame): Use get_prev_frame_always_1 to get the
	current frame and assert that that always succeeds.
	(get_prev_frame_if_no_cycle): Skip cycle detection if returning
	the current frame.

gdb/testsuite/ChangeLog:
2016-09-05  Pedro Alves  <palves@redhat.com>

	PR backtrace/19927
	* gdb.python/py-unwind-maint.exp: Adjust tests to not expect that
	unwinders are immediately called as side effect of "source" or
	"disable unwinder" commands.
	* gdb.python/py-recurse-unwind.exp: Remove setup_kfail calls.
olajep pushed a commit that referenced this issue Nov 10, 2016
Previously:

        fmov d0, #2

would give an error:

        Operand 2 should be an integer register

whereas the user probably just forgot to add the ".0" to make:

        fmov d0, #2.0

This patch reports an invalid floating point constant unless the
operand is obviously a register.

The FPIMM8 handling is only relevant for SVE.  Without it:

        fmov z0, z1

would try to parse z1 as an integer immediate zero (the res2 path),
whereas it's more likely that the user forgot the predicate.  This is
tested by the final patch.

gas/
	* config/tc-aarch64.c (parse_aarch64_imm_float): Report a specific
	low-severity error for registers.
	(parse_operands): Report an invalid floating point constant for
	if parsing an FPIMM8 fails, and if no better error has been
	recorded.
	* testsuite/gas/aarch64/diagnostic.s,
	testsuite/gas/aarch64/diagnostic.l: Add tests for integer operands
	to FMOV.
olajep pushed a commit that referenced this issue Nov 10, 2016
Some SVE instructions count the number of elements in a given vector
pattern and allow a scale factor of [1, 16] to be applied to the result.
This scale factor is written ", MUL #n", where "MUL" is a new operator.
E.g.:

	UQINCD	X0, POW2, MUL #2

This patch adds support for this kind of operand.

All existing operators were shifts of some kind, so there was a natural
range of [0, 63] regardless of context.  This was then narrowered further
by later checks (e.g. to [0, 31] when used for 32-bit values).

In contrast, MUL doesn't really have a natural context-independent range.
Rather than pick one arbitrarily, it seemed better to make the "shift"
amount a full 64-bit value and leave the range test to the usual
operand-checking code.  I've rearranged the fields of aarch64_opnd_info
so that this doesn't increase the size of the structure (although I don't
think its size is critical anyway).

include/
	* opcode/aarch64.h (AARCH64_OPND_SVE_PATTERN_SCALED): New
	aarch64_opnd.
	(AARCH64_MOD_MUL): New aarch64_modifier_kind.
	(aarch64_opnd_info): Make shifter.amount an int64_t and
	rearrange the fields.

opcodes/
	* aarch64-tbl.h (AARCH64_OPERANDS): Add an entry for
	AARCH64_OPND_SVE_PATTERN_SCALED.
	* aarch64-opc.h (FLD_SVE_imm4): New aarch64_field_kind.
	* aarch64-opc.c (fields): Add a corresponding entry.
	(set_multiplier_out_of_range_error): New function.
	(aarch64_operand_modifiers): Add entry for AARCH64_MOD_MUL.
	(operand_general_constraint_met_p): Handle
	AARCH64_OPND_SVE_PATTERN_SCALED.
	(print_register_offset_address): Use PRIi64 to print the
	shift amount.
	(aarch64_print_operand): Likewise.  Handle
	AARCH64_OPND_SVE_PATTERN_SCALED.
	* aarch64-opc-2.c: Regenerate.
	* aarch64-asm.h (ins_sve_scale): New inserter.
	* aarch64-asm.c (aarch64_ins_sve_scale): New function.
	* aarch64-asm-2.c: Regenerate.
	* aarch64-dis.h (ext_sve_scale): New inserter.
	* aarch64-dis.c (aarch64_ext_sve_scale): New function.
	* aarch64-dis-2.c: Regenerate.

gas/
	* config/tc-aarch64.c (SHIFTED_MUL): New parse_shift_mode.
	(parse_shift): Handle it.  Reject AARCH64_MOD_MUL for all other
	shift modes.  Skip range tests for AARCH64_MOD_MUL.
	(process_omitted_operand): Handle AARCH64_OPND_SVE_PATTERN_SCALED.
	(parse_operands): Likewise.
olajep pushed a commit that referenced this issue Nov 10, 2016
This patch adds support for the new SVE floating-point immediate
operands.  One operand uses the same 8-bit encoding as base AArch64,
but in a different position.  The others use a single bit to select
between two values.

One of the single-bit operands is a choice between 0 and 1, where 0
is not a valid 8-bit encoding.  I think the cleanest way of handling
these single-bit immediates is therefore to use the IEEE float encoding
itself as the immediate value and select between the two possible values
when encoding and decoding.

As described in the covering note for the patch that added F_STRICT,
we get better error messages by accepting unsuffixed vector registers
and leaving the qualifier matching code to report an error.  This means
that we carry on parsing the other operands, and so can try to parse FP
immediates for invalid instructions like:

	fcpy	z0, #2.5

In this case there is no suffix to tell us whether the immediate should
be treated as single or double precision.  Again, we get better error
messages by picking one (arbitrary) immediate size and reporting an error
for the missing suffix later.

include/
	* opcode/aarch64.h (AARCH64_OPND_SVE_FPIMM8): New aarch64_opnd.
	(AARCH64_OPND_SVE_I1_HALF_ONE, AARCH64_OPND_SVE_I1_HALF_TWO)
	(AARCH64_OPND_SVE_I1_ZERO_ONE): Likewise.

opcodes/
	* aarch64-tbl.h (AARCH64_OPERANDS): Add entries for the new SVE FP
	immediate operands.
	* aarch64-opc.h (FLD_SVE_i1): New aarch64_field_kind.
	* aarch64-opc.c (fields): Add corresponding entry.
	(operand_general_constraint_met_p): Handle the new SVE FP immediate
	operands.
	(aarch64_print_operand): Likewise.
	* aarch64-opc-2.c: Regenerate.
	* aarch64-asm.h (ins_sve_float_half_one, ins_sve_float_half_two)
	(ins_sve_float_zero_one): New inserters.
	* aarch64-asm.c (aarch64_ins_sve_float_half_one): New function.
	(aarch64_ins_sve_float_half_two): Likewise.
	(aarch64_ins_sve_float_zero_one): Likewise.
	* aarch64-asm-2.c: Regenerate.
	* aarch64-dis.h (ext_sve_float_half_one, ext_sve_float_half_two)
	(ext_sve_float_zero_one): New extractors.
	* aarch64-dis.c (aarch64_ext_sve_float_half_one): New function.
	(aarch64_ext_sve_float_half_two): Likewise.
	(aarch64_ext_sve_float_zero_one): Likewise.
	* aarch64-dis-2.c: Regenerate.

gas/
	* config/tc-aarch64.c (double_precision_operand_p): New function.
	(parse_operands): Use it to calculate the dp_p input to
	parse_aarch64_imm_float.  Handle the new SVE FP immediate operands.
olajep pushed a commit that referenced this issue Nov 10, 2016
If xmalloc fails allocating memory, usually because something tried a
huge allocation, like xmalloc(-1) or some such, GDB asks the user what
to do:

  .../src/gdb/utils.c:1079: internal-error: virtual memory exhausted.
  A problem internal to GDB has been detected,
  further debugging may prove unreliable.
  Quit this debugging session? (y or n)

If the user says "n", that throws a QUIT exception, which is caught by
one of the multiple CATCH(RETURN_MASK_ALL) blocks somewhere up the
stack.

The default implementations of operator new / operator new[] call
malloc directly, and on memory allocation failure throw
std::bad_alloc.  Currently, if that happens, since nothing catches it,
the exception escapes out of main, and GDB aborts from unhandled
exception.

This patch replaces the default operator new variants with versions
that, just like xmalloc:

 #1 - Raise an internal-error on memory allocation failure.

 #2 - Throw a QUIT gdb_exception, so that the exact same CATCH blocks
      continue handling memory allocation problems.

A minor complication of #2 is that operator new can _only_ throw
std::bad_alloc, or something that extends it:

  void* operator new (std::size_t size) throw (std::bad_alloc);

That means that if we let a gdb QUIT exception escape from within
operator new, the C++ runtime aborts due to unexpected exception
thrown.

So to bridge the gap, this patch adds a new gdb_quit_bad_alloc
exception type that inherits both std::bad_alloc and gdb_exception,
and throws _that_.

If we decide that we should be catching memory allocation errors in
fewer places than all the places we currently catch them (everywhere
we use RETURN_MASK_ALL currently), then we could change operator new
to throw plain std::bad_alloc then.  But I'm considering such a change
as separate matter from this one -- it'd make sense to do the same to
xmalloc at the same time, for instance.

Meanwhile, this allows using new/new[] instead of xmalloc/XNEW/etc.
without losing the "virtual memory exhausted" internal-error
safeguard.

Tested on x86_64 Fedora 23.

gdb/ChangeLog:
2016-09-23  Pedro Alves  <palves@redhat.com>

	* Makefile.in (SFILES): Add common/new-op.c.
	(COMMON_OBS): Add common/new-op.o.
	(new-op.o): New rule.
	* common/common-exceptions.h: Include <new>.
	(struct gdb_quit_bad_alloc): New type.
	* common/new-op.c: New file.

gdb/gdbserver/ChangeLog:
2016-09-23  Pedro Alves  <palves@redhat.com>

	* Makefile.in (SFILES): Add common/new-op.c.
	(OBS): Add common/new-op.o.
	(new-op.o): New rule.
olajep pushed a commit that referenced this issue Nov 10, 2016
gcc-6.2.1-2.fc24.x86_64

(gdb) backtrace 10^M
(gdb) FAIL: gdb.arch/i386-signal.exp: backtrace 10

(gdb) disas/s
Dump of assembler code for function main:
.../gdb/testsuite/gdb.arch/i386-signal.c:
30      {
   0x000000000040057f <+0>:     push   %rbp
   0x0000000000400580 <+1>:     mov    %rsp,%rbp
31        setup ();
   0x0000000000400583 <+4>:     callq  0x400590 <setup>
=> 0x0000000000400588 <+9>:     mov    $0x0,%eax
32      }
   0x000000000040058d <+14>:    pop    %rbp
   0x000000000040058e <+15>:    retq
End of assembler dump.

The .exp patch is an obvious typo fix I think.  The regex was written to
accept "ADDR in main" and I find it OK as checking .debug_line validity is not
the purpose of this testfile.

gcc-4.8.5-11.el7.x86_64 did not put the 'mov $0x0,%eax' instruction there at
all so there was no problem with .debug_line.

gdb/testsuite/ChangeLog
2016-10-05  Jan Kratochvil  <jan.kratochvil@redhat.com>

	* gdb.arch/i386-signal.exp (backtrace 10): Fix #2 typo.
olajep pushed a commit that referenced this issue Nov 10, 2016
Even though this was supposedly in the gdb 7.2 timeframe, the testcase
in PR11094 crashes current GDB with a segfault:

  Program received signal SIGSEGV, Segmentation fault.
  0x00000000005ee894 in event_location_to_string (location=0x0) at
  src/gdb/location.c:412
  412       if (EL_STRING (location) == NULL)
  (top-gdb) bt
  #0  0x00000000005ee894 in event_location_to_string (location=0x0) at
  src/gdb/location.c:412
  #1  0x000000000057411a in print_breakpoint_location (b=0x18288e0, loc=0x0) at
  src/gdb/breakpoint.c:6201
  #2  0x000000000057483f in print_one_breakpoint_location (b=0x18288e0,
  loc=0x182cf10, loc_number=0, last_loc=0x7fffffffd258, allflag=1)
      at src/gdb/breakpoint.c:6473
  #3  0x00000000005751e1 in print_one_breakpoint (b=0x18288e0,
  last_loc=0x7fffffffd258, allflag=1) at
  src/gdb/breakpoint.c:6707
  #4  0x000000000057589c in breakpoint_1 (args=0x0, allflag=1, filter=0x0) at
  src/gdb/breakpoint.c:6947
  #5  0x0000000000575aa8 in maintenance_info_breakpoints (args=0x0, from_tty=0)
  at src/gdb/breakpoint.c:7026
  [...]

This is GDB trying to print the location spec of the JIT event
breakpoint, but that's an internal breakpoint without one.

If I add a NULL check, then we see that the JIT breakpoint is now
pending (because its location has shlib_disabled set):

  (gdb) maint info breakpoints
  Num     Type           Disp Enb Address            What
  [...]
  -8      jit events     keep y   <PENDING>           inf 1
  [...]

But that's incorrect.  GDB should have managed to recreate the JIT
breakpoint's location for the second run.  So the problem is
elsewhere.

The problem is that if the JIT loads at the same address on the second
run, we never recreate the JIT breakpoint, because we hit this early
return:

  static int
  jit_breakpoint_re_set_internal (struct gdbarch *gdbarch,
				  struct jit_program_space_data *ps_data)
  {
    [...]
    if (ps_data->cached_code_address == addr)
      return 0;

    [...]
      delete_breakpoint (ps_data->jit_breakpoint);
    [...]
    ps_data->jit_breakpoint = create_jit_event_breakpoint (gdbarch, addr);

Fix this by deleting the breakpoint and discarding the cached code
address when the objfile where the previous JIT breakpoint was found
is deleted/unloaded in the first place.

The test that was originally added for PR11094 doesn't trip on this
because:

  #1 - It doesn't test the case of the JIT descriptor's address _not_
       changing between reruns.

  #2 - And then it doesn't do "maint info breakpoints", or really
       anything with the JIT at all.

  #3 - and even then, to trigger the problem the JIT descriptor needs
       to be in a separate library, while the current test puts it in
       the main program.

The patch extends the test to cover all combinations of these
scenarios.

gdb/ChangeLog:
2016-10-06  Pedro Alves  <palves@redhat.com>

	* jit.c (free_objfile_data): Delete the JIT breakpoint and clear
	the cached code address.

gdb/testsuite/ChangeLog:
2016-10-06  Pedro Alves  <palves@redhat.com>

	* gdb.base/jit-simple-dl.c: New file.
	* gdb.base/jit-simple-jit.c: New file, factored out from ...
	* gdb.base/jit-simple.c: ... this.
	* gdb.base/jit-simple.exp (jit_run): Delete.
	(build_jit): New proc.
	(jit_test_reread): Recompile either the main program or the shared
	library, depending on what is being tested.  Skip changing address
	if caller wants to.  Compare before/after addresses.  If testing
	standalone, explicitly load the binary.  Test "maint info
	breakpoints".
	(top level): Add "standalone vs shared lib" and "change address"
	vs "same address" axes.
olajep pushed a commit that referenced this issue Nov 10, 2016
Nowadays, if we build GDB with -fsanitize=address, we can get the asan
error below,

(gdb) quit
=================================================================
==9723==ERROR: AddressSanitizer: alloc-dealloc-mismatch (malloc vs operator delete) on 0x60200003bf70
    #0 0x7f88f3837527 in operator delete(void*) (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x55527)
    #1 0xac8e13 in __gnu_cxx::new_allocator<void (*)()>::deallocate(void (**)(), unsigned long) /usr/include/c++/4.9/ext/new_allocator.h:110
    #2 0xac8cc2 in __gnu_cxx::__alloc_traits<std::allocator<void (*)()> >::deallocate(std::allocator<void (*)()>&, void (**)(), unsigned long) /usr/include/c++/4.9/ext/alloc_traits.h:185
....
0x60200003bf70 is located 0 bytes inside of 8-byte region [0x60200003bf70,0x60200003bf78)
allocated by thread T0 here:
    #0 0x7f88f38367ef in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x547ef)
    #1 0xbd2762 in operator new(unsigned long) /home/yao/SourceCode/gnu/gdb/git/gdb/common/new-op.c:42
    #2 0xac8edc in __gnu_cxx::new_allocator<void (*)()>::allocate(unsigned long, void const*) /usr/include/c++/4.9/ext/new_allocator.h:104
    #3 0xac8d81 in __gnu_cxx::__alloc_traits<std::allocator<void (*)()> >::allocate(std::allocator<void (*)()>&, unsigned long) /usr/include/c++/4.9/ext/alloc_traits.h:182

The reason for this is that we override operator new but don't override
operator delete.  This patch does the override if the code is NOT
compiled with asan.

gdb:

2016-10-25  Yao Qi  <yao.qi@linaro.org>

	PR gdb/20716
	* common/new-op.c (__has_feature): New macro.
	Don't override operator new if asan is used.
olajep pushed a commit that referenced this issue Nov 10, 2016
Currently GDB never sends more than one action per vCont packet, when
connected in non-stop mode.  A follow up patch will change that, and
it exposed a gdbserver problem with the vCont handling.

For example, this in non-stop mode:

  => vCont;s:p1.1;c
  <= OK

Should be equivalent to:

  => vCont;s:p1.1
  <= OK
  => vCont;c
  <= OK

But gdbserver currently doesn't handle this.  In the latter case,
"vCont;c" makes gdbserver clobber the previous step request.  This
patch fixes that.

Note the server side must ignore resume actions for the thread that
has a pending %Stopped notification (and any other threads with events
pending), until GDB acks the notification with vStopped.  Otherwise,
e.g., the following case is mishandled:

 #1 => g  (or any other packet)
 #2 <= [registers]
 #3 <= %Stopped T05 thread:p1.2
 #4 => vCont s:p1.1;c
 #5 <= OK

Above, the server must not resume thread p1.2 when it processes the
vCont.  GDB can't know that p1.2 stopped until it acks the %Stopped
notification.  (Otherwise it wouldn't send a default "c" action.)

(The vCont documentation already specifies this.)

Finally, special care must also be given to handling fork/vfork
events.  A (v)fork event actually tells us that two processes stopped
-- the parent and the child.  Until we follow the fork, we must not
resume the child.  Therefore, if we have a pending fork follow, we
must not send a global wildcard resume action (vCont;c).  We can still
send process-wide wildcards though.

(The comments above will be added as code comments to gdb in a follow
up patch.)

gdb/gdbserver/ChangeLog:
2016-10-26  Pedro Alves  <palves@redhat.com>

	* linux-low.c (handle_extended_wait): Link parent/child fork
	threads.
	(linux_wait_1): Unlink them.
	(linux_set_resume_request): Ignore resume requests for
	already-resumed and unhandled fork child threads.
	* linux-low.h (struct lwp_info) <fork_relative>: New field.
	* server.c (in_queued_stop_replies_ptid, in_queued_stop_replies):
	New functions.
	(handle_v_requests) <vCont>: Don't call require_running.
	* server.h (in_queued_stop_replies): New declaration.
olajep pushed a commit that referenced this issue Nov 10, 2016
Most of the time, the trace should be in one piece.  This case is handled fine
by GDB.  In some cases, however, there may be gaps in the trace.  They result
from trace decode errors or from overflows.

A gap in the trace means we lost an unknown amount of trace.  Gaps can be very
small, such as a few instructions in the same function, or they can be rather
big.  We may, for example, lose a few function calls or returns.  The trace may
continue in a different function and we likely don't know how we got there.

Even though we can't say how the program executed across a gap, higher levels
may not be impacted too much by it.  Let's assume we have functions a-e and a
trace that looks roughly like this:

  a
   \
    b                    b
     \                  /
      c   <gap>        c
                      /
                 d   d
                  \ /
                   e

Even though we can't say for sure, it is likely that b and c are the same
function instance before and after the gap.  This patch is trying to connect
the c and b function segments across the gap.

This will add a to the back trace of b on the right hand side.  The changes are
reflected in GDB's internal representation of the trace and will improve:

  - the output of "record function-call-history /c"
  - the output of "backtrace" in replay mode
  - source stepping in replay mode
    will be improved indirectly via the improved back trace

I don't have an automated test for this patch; decode errors will be fixed and
overflows occur sporadically and are quite rare.  I tested it by hacking GDB to
provoke a decode error and on the expected gap in the gdb.btrace/dlopen.exp
test.

The issue is that we can't predict where we will be able to re-sync in case of
errors.  For the expected decode error in gdb.btrace/dlopen.exp, for example, we
may be able to re-sync somewhere in dlclose, in test, in main, or not at all.

Here's one example run of gdb.btrace/dlopen.exp with and without this patch.

    (gdb) info record
    Active record target: record-btrace
    Recording format: Intel Processor Trace.
    Buffer size: 16kB.
    warning: Non-contiguous trace at instruction 66608 (offset = 0xa83, pc = 0xb7fdcc31).
    warning: Non-contiguous trace at instruction 66652 (offset = 0xa9b, pc = 0xb7fdcc31).
    warning: Non-contiguous trace at instruction 66770 (offset = 0xacb, pc = 0xb7fdcc31).
    warning: Non-contiguous trace at instruction 66966 (offset = 0xb60, pc = 0xb7ff5ee4).
    warning: Non-contiguous trace at instruction 66994 (offset = 0xb74, pc = 0xb7ff5f24).
    warning: Non-contiguous trace at instruction 67334 (offset = 0xbac, pc = 0xb7ff5e6d).
    warning: Non-contiguous trace at instruction 69022 (offset = 0xc04, pc = 0xb7ff60b3).
    warning: Non-contiguous trace at instruction 69116 (offset = 0xc1c, pc = 0xb7ff60b3).
    warning: Non-contiguous trace at instruction 69504 (offset = 0xc74, pc = 0xb7ff605d).
    warning: Non-contiguous trace at instruction 83648 (offset = 0xecc, pc = 0xb7ff6134).
    warning: Decode error (-13) at instruction 83876 (offset = 0xf48, pc = 0xb7fd6380): no memory mapped at this address.
    warning: Non-contiguous trace at instruction 83876 (offset = 0x11b7, pc = 0xb7ff1c70).
    Recorded 83948 instructions in 912 functions (12 gaps) for thread 1 (process 12996).
    (gdb) record instruction-history 83876, +2
    83876   => 0xb7fec46f <call_init.part.0+95>:    call   *%eax
    [decode error (-13): no memory mapped at this address]
    [disabled]
    83877      0xb7ff1c70 <_dl_close_worker.part.0+1584>:   nop

Without the patch, the trace is disconnected and the backtrace is short:

    (gdb) record goto 83876
    #0  0xb7fec46f in call_init.part () from /lib/ld-linux.so.2
    (gdb) backtrace
    #0  0xb7fec46f in call_init.part () from /lib/ld-linux.so.2
    #1  0xb7fec5d0 in _dl_init () from /lib/ld-linux.so.2
    #2  0xb7ff0fe3 in dl_open_worker () from /lib/ld-linux.so.2
    Backtrace stopped: not enough registers or memory available to unwind further
    (gdb) record goto 83877
    #0  0xb7ff1c70 in _dl_close_worker.part.0 () from /lib/ld-linux.so.2
    (gdb) backtrace
    #0  0xb7ff1c70 in _dl_close_worker.part.0 () from /lib/ld-linux.so.2
    #1  0xb7ff287a in _dl_close () from /lib/ld-linux.so.2
    #2  0xb7fc3d5d in dlclose_doit () from /lib/libdl.so.2
    #3  0xb7fec354 in _dl_catch_error () from /lib/ld-linux.so.2
    #4  0xb7fc43dd in _dlerror_run () from /lib/libdl.so.2
    #5  0xb7fc3d98 in dlclose () from /lib/libdl.so.2
    #6  0x0804860a in test ()
    #7  0x08048628 in main ()

With the patch, GDB is able to connect the trace pieces and we get a full
backtrace.

    (gdb) record goto 83876
    #0  0xb7fec46f in call_init.part () from /lib/ld-linux.so.2
    (gdb) backtrace
    #0  0xb7fec46f in call_init.part () from /lib/ld-linux.so.2
    #1  0xb7fec5d0 in _dl_init () from /lib/ld-linux.so.2
    #2  0xb7ff0fe3 in dl_open_worker () from /lib/ld-linux.so.2
    #3  0xb7fec354 in _dl_catch_error () from /lib/ld-linux.so.2
    #4  0xb7ff02e2 in _dl_open () from /lib/ld-linux.so.2
    #5  0xb7fc3c65 in dlopen_doit () from /lib/libdl.so.2
    #6  0xb7fec354 in _dl_catch_error () from /lib/ld-linux.so.2
    #7  0xb7fc43dd in _dlerror_run () from /lib/libdl.so.2
    #8  0xb7fc3d0e in dlopen@@GLIBC_2.1 () from /lib/libdl.so.2
    #9  0xb7ff28ee in _dl_runtime_resolve () from /lib/ld-linux.so.2
    #10 0x0804841c in ?? ()
    #11 0x08048470 in dlopen@plt ()
    #12 0x080485a3 in test ()
    #13 0x08048628 in main ()
    (gdb) record goto 83877
    #0  0xb7ff1c70 in _dl_close_worker.part.0 () from /lib/ld-linux.so.2
    (gdb) backtrace
    #0  0xb7ff1c70 in _dl_close_worker.part.0 () from /lib/ld-linux.so.2
    #1  0xb7ff287a in _dl_close () from /lib/ld-linux.so.2
    #2  0xb7fc3d5d in dlclose_doit () from /lib/libdl.so.2
    #3  0xb7fec354 in _dl_catch_error () from /lib/ld-linux.so.2
    #4  0xb7fc43dd in _dlerror_run () from /lib/libdl.so.2
    #5  0xb7fc3d98 in dlclose () from /lib/libdl.so.2
    #6  0x0804860a in test ()
    #7  0x08048628 in main ()

It worked nicely in this case but it may, of course, also lead to weird
connections; it is a heuristic, after all.

It works best when the gap is small and the trace pieces are long.

gdb/
	* btrace.c (bfun_s): New typedef.
	(ftrace_update_caller): Print caller in debug dump.
	(ftrace_get_caller, ftrace_match_backtrace, ftrace_fixup_level)
	(ftrace_compute_global_level_offset, ftrace_connect_bfun)
	(ftrace_connect_backtrace, ftrace_bridge_gap, btrace_bridge_gaps): New.
	(btrace_compute_ftrace_bts): Pass vector of gaps.  Collect gaps.
	(btrace_compute_ftrace_pt): Likewise.
	(btrace_compute_ftrace): Split into this, ...
	(btrace_compute_ftrace_1): ... this, and ...
	(btrace_finalize_ftrace): ... this.  Call btrace_bridge_gaps.
olajep pushed a commit that referenced this issue Mar 25, 2019
Running gdbserver under Valgrind I get:

  ==26925== Conditional jump or move depends on uninitialised value(s)
  ==26925==    at 0x473E7F: i387_cache_to_xsave(regcache*, void*) (i387-fp.c:579)
  ==26925==    by 0x46E3ED: x86_fill_xstateregset(regcache*, void*) (linux-x86-low.c:418)
  ==26925==    by 0x45E747: regsets_store_inferior_registers(regsets_info*, regcache*) (linux-low.c:5456)
  ==26925==    by 0x45EEF8: linux_store_registers(regcache*, int) (linux-low.c:5731)
  ==26925==    by 0x426441: regcache_invalidate_thread(thread_info*) (regcache.c:89)
  ==26925==    by 0x45CCAF: linux_resume_one_lwp_throw(lwp_info*, int, int, siginfo_t*) (linux-low.c:4447)
  ==26925==    by 0x45CE2A: linux_resume_one_lwp(lwp_info*, int, int, siginfo_t*) (linux-low.c:4519)
  ==26925==    by 0x45E17C: proceed_one_lwp(thread_info*, lwp_info*) (linux-low.c:5216)
  ==26925==    by 0x45DC81: linux_resume_one_thread(thread_info*, bool) (linux-low.c:5031)
  ==26925==    by 0x45DD34: linux_resume(thread_resume*, unsigned long)::{lambda(thread_info*)#2}::operator()(thread_info*) const (linux-low.c:5095)
  ==26925==    by 0x462907: void for_each_thread<linux_resume(thread_resume*, unsigned long)::{lambda(thread_info*)#2}>(linux_resume(thread_resume*, unsigned long)::{lambda(thread_info*)#2}) (gdbthread.h:150)
  ==26925==    by 0x45DE62: linux_resume(thread_resume*, unsigned long) (linux-low.c:5093)
  ==26925==
  ==26925== Conditional jump or move depends on uninitialised value(s)
  ==26925==    at 0x473EBD: i387_cache_to_xsave(regcache*, void*) (i387-fp.c:586)
  ==26925==    by 0x46E3ED: x86_fill_xstateregset(regcache*, void*) (linux-x86-low.c:418)
  ==26925==    by 0x45E747: regsets_store_inferior_registers(regsets_info*, regcache*) (linux-low.c:5456)
  ==26925==    by 0x45EEF8: linux_store_registers(regcache*, int) (linux-low.c:5731)
  ==26925==    by 0x426441: regcache_invalidate_thread(thread_info*) (regcache.c:89)
  ==26925==    by 0x45CCAF: linux_resume_one_lwp_throw(lwp_info*, int, int, siginfo_t*) (linux-low.c:4447)
  ==26925==    by 0x45CE2A: linux_resume_one_lwp(lwp_info*, int, int, siginfo_t*) (linux-low.c:4519)
  ==26925==    by 0x45E17C: proceed_one_lwp(thread_info*, lwp_info*) (linux-low.c:5216)
  ==26925==    by 0x45DC81: linux_resume_one_thread(thread_info*, bool) (linux-low.c:5031)
  ==26925==    by 0x45DD34: linux_resume(thread_resume*, unsigned long)::{lambda(thread_info*)#2}::operator()(thread_info*) const (linux-low.c:5095)
  ==26925==    by 0x462907: void for_each_thread<linux_resume(thread_resume*, unsigned long)::{lambda(thread_info*)#2}>(linux_resume(thread_resume*, unsigned long)::{lambda(thread_info*)#2}) (gdbthread.h:150)
  ==26925==    by 0x45DE62: linux_resume(thread_resume*, unsigned long) (linux-low.c:5093)

The problem is a type/width mismatch in code like this, in
gdbserver/i387-fp.c:

  /* Some registers are 16-bit.  */
  collect_register_by_name (regcache, "fctrl", &val);
  fp->fctrl = val;

In the above code:

 #1 - 'val' is a 64-bit unsigned long.

 #2 - "fctrl" is 32-bit in the register cache, thus half of 'val' is
      left uninitialized by collect_register_by_name, which works with
      an untyped raw buffer output (i.e., void*).

 #3 - fp->fctrl is an unsigned short (16-bit).  For some such
      registers we're masking off the uninitialized bits with 0xffff,
      but not in all cases.

We end up in such a fragile situation because
collect_registers_by_name works with an untyped output buffer pointer,
making it easy to pass a pointer to a variable of the wrong size.

Fix this by using regcache_raw_get_unsigned instead (actually a new
regcache_raw_get_unsigned_by_name wrapper), which always returns a
zero-extended ULONGEST register value.  It ends up simplifying the
i387-tdep.c code a bit, even.

gdb/gdbserver/ChangeLog:
2018-07-11  Pedro Alves  <palves@redhat.com>

	* i387-fp.c (i387_cache_to_fsave, cache_to_fxsave)
	(i387_cache_to_xsave): Use regcache_raw_get_unsigned_by_name
	instead of collect_register_by_name.
	* regcache.c (regcache_raw_get_unsigned_by_name): New.
	* regcache.h (regcache_raw_get_unsigned_by_name): New.
olajep pushed a commit that referenced this issue Mar 25, 2019
… gdb/23377)

This fixes a gdb.base/multi-forks.exp regression with GDBserver.

Git commit f2ffa92 ("gdb: Eliminate the 'stop_pc' global") caused
the regression by exposing a latent bug in gdbserver.

The bug is that GDBserver's implementation of the D;PID packet
incorrectly assumes that the selected thread points to the process
being detached.  This happens via the any_persistent_commands call,
which calls current_process:

  (gdb) bt
  #0  0x000000000040a57e in internal_error(char const*, int, char const*, ...)
  (file=0x4a53c0 "src/gdb/gdbserver/inferiors.c", line=212, fmt=0x4a539e "%s:
  Assertion `%s' failed.") at src/gdb/gdbserver/../common/errors.c:54
  #1  0x0000000000420acf in current_process() () at
  src/gdb/gdbserver/inferiors.c:212
  #2  0x00000000004226a0 in any_persistent_commands() () at
  gdb/gdbserver/mem-break.c:308
  #3  0x000000000042cb43 in handle_detach(char*) (own_buf=0x6f0280 "D;62ea") at
  src/gdb/gdbserver/server.c:1210
  #4  0x0000000000433af3 in process_serial_event() () at
  src/gdb/gdbserver/server.c:4055
  #5  0x0000000000434878 in handle_serial_event(int, void*) (err=0,
  client_data=0x0)

The "eliminate stop_pc" commit exposes the problem because before that
commit, GDB's switch_to_thread always read the newly-selected thread's
PC, and that would end up forcing GDBserver's selected thread to
change accordingly as side effect.  After that commit, GDB no longer
reads the thread's PC, and GDBserver does not switch the thread.

Fix this by removing the assumption from GDBserver.

gdb/gdbserver/ChangeLog:
2018-07-11  Pedro Alves  <palves@redhat.com>

	PR gdb/23377
	* mem-break.c (any_persistent_commands): Add process_info
	parameter and use it instead of relying on the current process.
	Change return type to bool.
	* mem-break.h (any_persistent_commands): Add process_info
	parameter and change return type to bool.
	* server.c (handle_detach): Remove require_running_or_return call.
	Look up the process_info for the process we're about to detach.
	If not found, return back error to GDB.  Adjust
	any_persistent_commands call to pass down a process pointer.
olajep pushed a commit that referenced this issue Mar 25, 2019
…b/23379)

This commit fixes a 8.1->8.2 regression exposed by
gdb.python/py-evthreads.exp when testing with
--target_board=native-gdbserver.

gdb.log shows:

  src/gdb/thread.c:93: internal-error: thread_info* inferior_thread(): Assertion `tp' failed.
  A problem internal to GDB has been detected,
  further debugging may prove unreliable.
  Quit this debugging session? (y or n) FAIL: gdb.python/py-evthreads.exp: run to breakpoint 1 (GDB internal error)

A backtrace shows (frames #2 and #10 highlighted) that the assertion
fails when GDB is setting up the connection to the remote target, in
non-stop mode:

  #0  0x0000000000622ff0 in internal_error(char const*, int, char const*, ...) (file=0xc1ad98 "src/gdb/thread.c", line=93, fmt=0xc1ad20 "%s: Assertion `%s' failed.") at src/gdb/common/errors.c:54
  #1  0x000000000089567e in inferior_thread() () at src/gdb/thread.c:93
= #2  0x00000000004da91d in get_event_thread() () at src/gdb/python/py-threadevent.c:38
  #3  0x00000000004da9b7 in create_thread_event_object(_typeobject*, _object*) (py_type=0x11574c0 <continue_event_object_type>, thread=0x0)
      at src/gdb/python/py-threadevent.c:60
  #4  0x00000000004bf6fe in create_continue_event_object() () at src/gdb/python/py-continueevent.c:27
  #5  0x00000000004bf738 in emit_continue_event(ptid_t) (ptid=...) at src/gdb/python/py-continueevent.c:40
  #6  0x00000000004c7d47 in python_on_resume(ptid_t) (ptid=...) at src/gdb/python/py-inferior.c:108
  #7  0x0000000000485bfb in std::_Function_handler<void (ptid_t), void (*)(ptid_t)>::_M_invoke(std::_Any_data const&, ptid_t&&) (__functor=..., __args#0=...) at /usr/include/c++/7/bits/std_function.h:316
  #8  0x000000000089b416 in std::function<void (ptid_t)>::operator()(ptid_t) const (this=0x12aa600, __args#0=...)
      at /usr/include/c++/7/bits/std_function.h:706
  #9  0x000000000089aa0e in gdb::observers::observable<ptid_t>::notify(ptid_t) const (this=0x118a7a0 <gdb::observers::target_resumed>, args#0=...)
      at src/gdb/common/observable.h:106
= #10 0x0000000000896fbe in set_running(ptid_t, int) (ptid=..., running=1) at src/gdb/thread.c:880
  #11 0x00000000007f750f in remote_target::remote_add_thread(ptid_t, bool, bool) (this=0x12c5440, ptid=..., running=true, executing=true) at src/gdb/remote.c:2434
  #12 0x00000000007f779d in remote_target::remote_notice_new_inferior(ptid_t, int) (this=0x12c5440, currthread=..., executing=1)
      at src/gdb/remote.c:2515
  #13 0x00000000007f9c44 in remote_target::update_thread_list() (this=0x12c5440) at src/gdb/remote.c:3831
  #14 0x00000000007fb922 in remote_target::start_remote(int, int) (this=0x12c5440, from_tty=0, extended_p=0)
      at src/gdb/remote.c:4655
  #15 0x00000000007fd102 in remote_target::open_1(char const*, int, int) (name=0x1a4f45e "localhost:2346", from_tty=0, extended_p=0)
      at src/gdb/remote.c:5638
  #16 0x00000000007fbec1 in remote_target::open(char const*, int) (name=0x1a4f45e "localhost:2346", from_tty=0)
      at src/gdb/remote.c:4862

So on frame #10, we're marking a newly-discovered thread as running,
and that causes the Python API to emit a gdb.ContinueEvent.
gdb.ContinueEvent is a gdb.ThreadEvent, and as such includes the event
thread as the "inferior_thread" attribute.  The problem is that when
we get to frame #3/#4, we lost all references to the thread that is
being marked as running.  create_continue_event_object assumes that it
is the current thread, which is not true in this case.

Fix this by passing down the right thread in
create_continue_event_object.  Also remove
create_thread_event_object's default argument and have the only other
caller left pass down the right thread explicitly too.

gdb/ChangeLog:
2018-08-24  Pedro Alves  <palves@redhat.com>
	    Simon Marchi  <simon.marchi@ericsson.com>

	PR gdb/23379
	* python/py-continueevent.c: Include "gdbthread.h".
	(create_continue_event_object): Add intro comment.  Add 'ptid'
	parameter.  Use it to find thread to pass to
	create_thread_event_object.
	(emit_continue_event): Pass PTID down to
	create_continue_event_object.
	* python/py-event.h (py_get_event_thread): Declare.
	(create_thread_event_object): Remove default from 'thread'
	parameter.
	* python/py-stopevent.c (create_stop_event_object): Use
	py_get_event_thread.
	* python/py-threadevent.c (get_event_thread): Rename to ...
	(py_get_event_thread): ... this, make extern, add 'ptid' parameter
	and use it to find the thread.
	(create_thread_event_object): Assert that THREAD isn't null.
	Don't find the event thread here.
olajep pushed a commit that referenced this issue Mar 25, 2019
This change adds an optional output parameter BLOCK to
find_pc_partial_function.  If BLOCK is non-null, then *BLOCK will be
set to the address of the block corresponding to the function symbol
if such a symbol was found during lookup.  Otherwise it's set to the
NULL value.  Callers may wish to use the block information to
determine whether the block contains any non-contiguous ranges.  The
caller may also iterate over or examine those ranges.

When I first started looking at the broken stepping behavior associated
with functions w/ non-contiguous ranges, I found that I could "fix"
the problem by disabling the find_pc_partial_function cache.  It would
sometimes happen that the PC passed in would be between the low and
high cache values, but would be in some other function that happens to
be placed in between the ranges for the cached function.  This caused
incorrect values to be returned.

So dealing with this cache turns out to be very important for fixing
this problem.  I explored three different ways of dealing with the
cache.

My first approach was to clear the cache when a block was encountered
with more than one range.  This would cause the non-cache pathway to
be executed on the next call to find_pc_partial_function.

Another approach, which I suspect is slightly faster, checks to see
whether the PC is within one of the ranges associated with the cached
block.  If so, then the cached values can be used.  It falls back to
the original behavior if there is no cached block.

The current approach, suggested by Simon Marchi, is to restrict the
low/high pc values recorded for the cache to the beginning and end of
the range containing the PC value under consideration.  This allows us
to retain the simple (and fast) test for determining whether the
memoized (cached) values apply to the PC passed to
find_pc_partial_function.

Another choice that had to be made regards setting *ADDRESS and
*ENDADDR.  There are three possibilities which might make sense:

1) *ADDRESS and *ENDADDR represent the lowest and highest address
   of the function.
2) *ADDRESS and *ENDADDR are set to the start and end address of
   the range containing the entry pc.
3) *ADDRESS and *ENDADDR are set to the start and end address of
   the range in which PC is found.

An earlier version of this patch implemented option #1.  I found out
that it's not very useful though and, in fact, returns results that
are incorrect when used in the context of determining the start and
end of the function for doing prologue analysis.  While debugging a
function in which the entry pc was in the second range (of a function
containing two non-contiguous ranges), I noticed that
amd64_skip_prologue called find_pc_partial_function - the returned
start address was set to the beginning of the first range.  This is
incorrect for this function.  What was also interesting was that this
first invocation of find_pc_partial_function correctly set the cache
for the PC on which it had been invoked, but a slightly later call
from skip_prologue_using_sal could not use this cached value because
it was now being used to lookup the very lowest address of the
function - which is in a range not containing the entry pc.

Option #2 is attractive as it would provide a desirable result
when used in the context of prologue analysis.  However, many callers,
including some which do prologue analysis want the condition
*ADDRESS <= PC < *ENDADDR to hold.  This will not be the case when
find_pc_partial_function is called on a PC that's in a non-entry-pc
range.  A later patch to this series adds
find_function_entry_range_from_pc as a wrapper of
find_pc_partial_function.

Option #3 causes the *ADDRESS <= PC < *ENDADDR property to hold.  If
find_pc_partial_function is called with a PC that's within entry pc's
range, then it will correctly return the limits of that range.  So, if
the result of a minsym search is passed to find_pc_partial_function
to find the limits, then correct results will be achieved.  Returned
limits (for prologue analysis) won't be correct when PC is within some
other (non-entry-pc) range.  I don't yet know how big of a problem
this might be; I'm guessing that it won't be a serious problem - if a
compiler generates functions which have non-contiguous ranges, then it
also probably generates DWARF2 CFI which makes a lot of the old
prologue analysis moot.

I've implemented option #3 for this version of the patch.  I don't see
any regressions for x86-64.  Moreover, I don't expect to see
regressions for other targets either simply because
find_pc_partial_function behaves the same as it did before for the
contiguous address range case.  That said, there may be some
adjustments needed if GDB encounters a function requiring prologue
analysis which occupies non-contiguous ranges.

gdb/ChangeLog:

	* symtab.h (find_pc_partial_function): Add new parameter `block'.
	* blockframe.c (cache_pc_function_block): New static global.
	(clear_pc_function_cache): Clear cache_pc_function_block.
	(find_pc_partial_function): Move comment to symtab.h.  Add
	support for non-contiguous blocks.
olajep pushed a commit that referenced this issue Mar 25, 2019
#1- Check that the warning is emitted.

#2- Avoid overriding INTERNAL_GDBFLAGS, as per documentated in
    gdb/testsuite/README:

 ~~~
 The testsuite does not override a value provided by the user.
 ~~~

We don't actually need to tweak INTERNAL_GDBFLAGS, we just need to
append out -data-directory to GDBFLAGS, because each passed
-data-directory option leads to a call to the warning:

 $ ./gdb -data-directory=foo -data-directory=bar
 Warning: foo: No such file or directory.
 Warning: bar: No such file or directory.
 [...]

2018-11-19  Pedro Alves  <palves@redhat.com>

	* gdb.base/warning.exp: Don't override INTERNAL_FLAGS.  Use
	gdb_spawn_with_cmdline_opts instead of gdb_start.  Check that we
	see the expected warning.
olajep pushed a commit that referenced this issue Mar 25, 2019
First of all, I would like to express my gratitude to Keith Seitz, Jan
Kratochvil and Tom Tromey, who were really kind and helped a lot with
this bug.  The patch itself was authored by Jan.

This all began with:

  https://bugzilla.redhat.com/show_bug.cgi?id=1639242
  py-bt is broken, results in exception

In summary, the error reported by the bug above is:

  $ gdb -args python3
  GNU gdb (GDB) Fedora 8.1.1-3.fc28
  (...)
  Reading symbols from python3...Reading symbols from /usr/lib/debug/usr/bin/python3.6-3.6.6-1.fc28.x86_64.debug...done.
  done.
  Dwarf Error: could not find partial DIE containing offset 0x316 [in module /usr/lib/debug/usr/bin/python3.6-3.6.6-1.fc28.x86_64.debug]

After a long investigation, and after thinking that the problem might
actually be on DWZ's side, we were able to determine that there's
something wrong going on when
dwarf2read.c:dwarf2_find_containing_comp_unit performs a binary search
over all of the CUs belonging to an objfile in order to find the CU
which contains a DIE at an specific offset.  The current algorithm is:

  static struct dwarf2_per_cu_data *
  dwarf2_find_containing_comp_unit (sect_offset sect_off,
				    unsigned int offset_in_dwz,
				    struct dwarf2_per_objfile *dwarf2_per_objfile)
  {
    struct dwarf2_per_cu_data *this_cu;
    int low, high;
    const sect_offset *cu_off;

    low = 0;
    high = dwarf2_per_objfile->all_comp_units.size () - 1;
    while (high > low)
      {
	struct dwarf2_per_cu_data *mid_cu;
	int mid = low + (high - low) / 2;

	mid_cu = dwarf2_per_objfile->all_comp_units[mid];
	cu_off = &mid_cu->sect_off;
	if (mid_cu->is_dwz > offset_in_dwz
	    || (mid_cu->is_dwz == offset_in_dwz && *cu_off >= sect_off))
	  high = mid;
	else
	  low = mid + 1;
      }

For the sake of this example, let's consider that "sect_off =
0x7d".

There are a few important things going on here.  First,
"dwarf2_per_objfile->all_comp_units ()" will be sorted first by
whether the CU is a DWZ CU, and then by cu->sect_off.  In this
specific bug, "offset_in_dwz" is false, which means that, for the most
part of the loop, we're going to do "high = mid" (i.e, we'll work with
the lower part of the vector).

In our particular case, when we reach the part where "mid_cu->is_dwz
== offset_in_dwz" (i.e, both are false), we end up with "high = 2" and
"mid = 1".  I.e., there are only 2 elements in the vector who are not
DWZ.  The vector looks like this:

  #0: cu->sect_off = 0;   length = 114;  is_dwz = false  <-- low
  #1: cu->sect_off = 114; length = 7796; is_dwz = false  <-- mid
  #2: cu->sect_off = 0;   length = 28;   is_dwz = true   <-- high
  ...

The CU we want is #1, which is exactly where "mid" is.  Also, #1 is
not DWZ, which is also exactly what we want.  So we perform the second
comparison:

  (mid_cu->is_dwz == offset_in_dwz && *cu_off >= sect_off)
                                      ^^^^^^^^^^^^^^^^^^^

Because "*cu_off = 114" and "sect_off = 0x7d", this evaluates to
false, so we end up with "low = mid + 1 = 2", which actually gives us
the wrong CU (i.e., a CU that is DWZ).  Next in the code, GDB does:

    gdb_assert (low == high);
    this_cu = dwarf2_per_objfile->all_comp_units[low];
    cu_off = &this_cu->sect_off;
    if (this_cu->is_dwz != offset_in_dwz || *cu_off > sect_off)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      {
	if (low == 0 || this_cu->is_dwz != offset_in_dwz)
	  error (_("Dwarf Error: could not find partial DIE containing "
		 "offset %s [in module %s]"),
		 sect_offset_str (sect_off),
		 bfd_get_filename (dwarf2_per_objfile->objfile->obfd));
	...

Triggering the error we saw in the original bug report.

It's important to notice that we see the error message because the
selected CU is a DWZ one, but we're looking for a non-DWZ CU here.
However, even when the selected CU is *not* a DWZ (and we don't see
any error message), we still end up with the wrong CU.  For example,
suppose that the vector had:

  #0: cu->sect_off = 0;    length = 114;  is_dwz = false
  #1: cu->sect_off = 114;  length = 7796; is_dwz = false
  #2: cu->sect_off = 7910; length = 28;   is_dwz = false
  ...

I.e., #2's "is_dwz" is false instead of true.  In this case, we still
want #1, because that's where the DIE is located.  After the loop ends
up in #2, we have "is_dwz" as false, which is what we wanted, so we
compare offsets.  In this case, "7910 >= 0x7d", so we set "mid = high
= 2".  Next iteration, we have "mid = 0 + (2 - 0) / 2 = 1", and thus
we examining #1.  "is_dwz" is still false, but "114 >= 0x7d" also
evaluates to false, so "low = mid + 1 = 2", which makes the loop stop.
Therefore, we end up choosing #2 as our CU, even though #1 is the
right one.

The problem here is happening because we're comparing "sect_off"
directly against "*cu_off", while we should actually be comparing
against "*cu_off + mid_cu->length" (i.e., the end offset):

  ...
  || (mid_cu->is_dwz == offset_in_dwz
      && *cu_off + mid_cu->length >= sect_off))
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ...

And this is what the patch does.  The idea is that if GDB is searching
for an offset that falls above the *end* of the CU being
analyzed (i.e., "mid"), then the next iteration should try a
higher-offset CU next.  The previous algorithm was using
the *beginning* of the CU.

Unfortunately, I could not devise a testcase for this problem, so I am
proposing a fix with this huge explanation attached to it in the hope
that it is sufficient.  After talking a bit to Keith (our testcase
guru), it seems that one would have to create an objfile with both DWZ
and non-DWZ sections, which may prove very hard to do, I think.

I ran this patch on our BuildBot, and no regressions were detected.

gdb/ChangeLog:
2018-11-30  Jan Kratochvil  <jan.kratochvil@redhat.com>
	    Keith Seitz  <keiths@redhat.com>
	    Tom Tromey  <tom@tromey.com>
	    Sergio Durigan Junior  <sergiodj@redhat.com>

	https://bugzilla.redhat.com/show_bug.cgi?id=1613614
	PR gdb/24003
	* dwarf2read.c (dwarf2_find_containing_comp_unit): Add
	'mid_cu->length' to '*cu_off' when checking if 'sect_off' is
	inside the CU.
olajep pushed a commit to olajep/epiphany-binutils-gdb that referenced this issue Oct 16, 2022
On async targets, a synchronous attach is done like this:

   adapteva#1 - target_attach is called (PTRACE_ATTACH is issued)
   adapteva#2 - a continuation is installed
   adapteva#3 - we go back to the event loop
   adapteva#4 - target reports stop (SIGSTOP), event loop wakes up, and
        attach continuation is called
   adapteva#5 - among other things, the continuation calls
        target_terminal_inferior, which removes stdin from the event
        loop

Note that in adapteva#3, GDB is still processing user input.  If the user is
fast enough, e.g., with something like:

  echo -e "attach PID\nset xxx=1" | gdb

... then the "set" command is processed before the attach completes.

We get worse behavior even, if input is a tty and therefore
readline/editing is enabled, with e.g.,:

 (gdb) attach PID\nset xxx=1

we then crash readline/gdb, with:

 Attaching to program: attach-wait-input, process 14537
 readline: readline_callback_read_char() called with no handler!
 Aborted
 $

Fix this by calling target_terminal_inferior before adapteva#3 above.

The test covers both scenarios by running with editing/readline forced
to both on and off.

gdb/
2014-07-09  Pedro Alves  <palves@redhat.com>

	* infcmd.c (attach_command_post_wait): Don't call
	target_terminal_inferior here.
	(attach_command): Call it here instead.

gdb/testsuite/
2014-07-09  Pedro Alves  <palves@redhat.com>

	* gdb.base/attach-wait-input.exp: New file.
	* gdb.base/attach-wait-input.c: New file.
olajep pushed a commit to olajep/epiphany-binutils-gdb that referenced this issue Oct 16, 2022
Here's an example, with the new test:

 gdbserver :9999 gdb.threads/kill
 gdb gdb.threads/kill
 (gdb) b 52
 Breakpoint 1 at 0x4007f4: file kill.c, line 52.
 Continuing.

 Breakpoint 1, main () at kill.c:52
 52        return 0; /* set break here */
 (gdb) k
 Kill the program being debugged? (y or n) y

 gdbserver :9999 gdb.threads/kill
 Process gdb.base/watch_thread_num created; pid = 9719
 Listening on port 1234
 Remote debugging from host 127.0.0.1
 Killing all inferiors
 Segmentation fault (core dumped)

Backtrace:

 (gdb) bt
 #0  0x00000000004068a0 in find_inferior (list=0x66b060 <all_threads>, func=0x427637 <kill_one_lwp_callback>, arg=0x7fffffffd3fc) at src/gdb/gdbserver/inferiors.c:199
 adapteva#1  0x00000000004277b6 in linux_kill (pid=15708) at src/gdb/gdbserver/linux-low.c:966
 adapteva#2  0x000000000041354d in kill_inferior (pid=15708) at src/gdb/gdbserver/target.c:163
 adapteva#3  0x00000000004107e9 in kill_inferior_callback (entry=0x6704f0) at src/gdb/gdbserver/server.c:2934
 adapteva#4  0x0000000000406522 in for_each_inferior (list=0x66b050 <all_processes>, action=0x4107a6 <kill_inferior_callback>) at src/gdb/gdbserver/inferiors.c:57
 adapteva#5  0x0000000000412377 in process_serial_event () at src/gdb/gdbserver/server.c:3767
 adapteva#6  0x000000000041267c in handle_serial_event (err=0, client_data=0x0) at src/gdb/gdbserver/server.c:3880
 adapteva#7  0x00000000004189ff in handle_file_event (event_file_desc=4) at src/gdb/gdbserver/event-loop.c:434
 adapteva#8  0x00000000004181c6 in process_event () at src/gdb/gdbserver/event-loop.c:189
 adapteva#9  0x0000000000418f45 in start_event_loop () at src/gdb/gdbserver/event-loop.c:552
 adapteva#10 0x0000000000411272 in main (argc=3, argv=0x7fffffffd8d8) at src/gdb/gdbserver/server.c:3283

The problem is that linux_wait_for_event deletes lwps that have exited
(even those not passed in as lwps of interest), while the lwp/thread
list is being walked on with find_inferior.  find_inferior can handle
the current iterated inferior being deleted, but not others.

When killing lwps, we don't really care about any of the pending
status handling of linux_wait_for_event.  We can just waitpid the lwps
directly, which is also what GDB does (see
linux-nat.c:kill_wait_callback).  This way the lwps are not deleted
while we're walking the list.  They'll be deleted by linux_mourn
afterwards.

This crash triggers several times when running the testsuite against
GDBserver with the native-gdbserver board (target remote), but as GDB
can't distinguish between GDBserver crashing and "kill" being
sucessful, as in both cases the connection is closed (the 'k' packet
doesn't require a reply), and the inferior is gone, that results in no
FAIL.

The patch adds a generic test that catches the issue with
extended-remote mode (and works fine with native testing too).  Here's
how it fails with the native-extended-gdbserver board without the fix:

 (gdb) info threads
   Id   Target Id         Frame
   6    Thread 15367.15374 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
   5    Thread 15367.15373 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
   4    Thread 15367.15372 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
   3    Thread 15367.15371 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
   2    Thread 15367.15370 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
 * 1    Thread 15367.15367 main () at .../gdb.threads/kill.c:52
 (gdb) kill
 Kill the program being debugged? (y or n) y
 Remote connection closed
 ^^^^^^^^^^^^^^^^^^^^^^^^
 (gdb) FAIL: gdb.threads/kill.exp: kill

Extended remote should remain connected after the kill.

gdb/gdbserver/
2014-07-11  Pedro Alves  <palves@redhat.com>

	* linux-low.c (kill_wait_lwp): New function, based on
	kill_one_lwp_callback, but use my_waitpid directly.
	(kill_one_lwp_callback, linux_kill): Use it.

gdb/testsuite/
2014-07-11  Pedro Alves  <palves@redhat.com>

	* gdb.threads/kill.c: New file.
	* gdb.threads/kill.exp: New file.
olajep pushed a commit to olajep/epiphany-binutils-gdb that referenced this issue Oct 16, 2022
The tests at
<https://sourceware.org/ml/gdb-patches/2014-07/msg00277.html> show
that comparing a fully optimized out value's contents with a value
that has not been optimized out, or is partially optimized out crashes
GDB:

 (gdb) bt
 #0  __memcmp_sse4_1 () at ../sysdeps/x86_64/multiarch/memcmp-sse4.S:816
 adapteva#1  0x00000000005a1914 in memcmp_with_bit_offsets (ptr1=0x202b2f0 "\n", offset1_bits=0, ptr2=0x0, offset2_bits=0, length_bits=32)
     at /home/pedro/gdb/mygit/build/../src/gdb/value.c:678
 adapteva#2  0x00000000005a1a05 in value_available_contents_bits_eq (val1=0x2361ad0, offset1=0, val2=0x23683b0, offset2=0, length=32)
     at /home/pedro/gdb/mygit/build/../src/gdb/value.c:717
 adapteva#3  0x00000000005a1c09 in value_available_contents_eq (val1=0x2361ad0, offset1=0, val2=0x23683b0, offset2=0, length=4)
     at /home/pedro/gdb/mygit/build/../src/gdb/value.c:769
 adapteva#4  0x00000000006033ed in read_frame_arg (sym=0x1b78d20, frame=0x19bca50, argp=0x7fff4aba82b0, entryargp=0x7fff4aba82d0)
     at /home/pedro/gdb/mygit/build/../src/gdb/stack.c:416
 adapteva#5  0x0000000000603abb in print_frame_args (func=0x1b78cb0, frame=0x19bca50, num=-1, stream=0x1aea450) at /home/pedro/gdb/mygit/build/../src/gdb/stack.c:671
 adapteva#6  0x0000000000604ae8 in print_frame (frame=0x19bca50, print_level=0, print_what=SRC_AND_LOC, print_args=1, sal=...)
     at /home/pedro/gdb/mygit/build/../src/gdb/stack.c:1205
 adapteva#7  0x0000000000604050 in print_frame_info (frame=0x19bca50, print_level=0, print_what=SRC_AND_LOC, print_args=1, set_current_sal=1)
     at /home/pedro/gdb/mygit/build/../src/gdb/stack.c:857
 adapteva#8  0x00000000006029b3 in print_stack_frame (frame=0x19bca50, print_level=0, print_what=SRC_AND_LOC, set_current_sal=1)
     at /home/pedro/gdb/mygit/build/../src/gdb/stack.c:169
 adapteva#9  0x00000000005fc4b8 in print_stop_event (ws=0x7fff4aba8790) at /home/pedro/gdb/mygit/build/../src/gdb/infrun.c:6068
 adapteva#10 0x00000000005fc830 in normal_stop () at /home/pedro/gdb/mygit/build/../src/gdb/infrun.c:6214

The 'ptr2=0x0' in frame adapteva#1 is val2->contents, and since git 4f14910:

    gdb/ChangeLog
    2013-11-26  Andrew Burgess  <aburgess@broadcom.com>

        * value.c (allocate_optimized_out_value): Mark value as non-lazy.

... a fully optimized-out value can have it's value contents buffer
NULL.

As a spotgap fix, revert 4f14910, with a comment.  A full fix would
be too invasive for 7.8.

gdb/
2014-07-22  Pedro Alves  <palves@redhat.com>

	* value.c (allocate_optimized_out_value): Don't mark value as
	non-lazy.
olajep pushed a commit to olajep/epiphany-binutils-gdb that referenced this issue Oct 16, 2022
The TUI currently crashes when the user types <return> in response to
a pagination prompt:

  $ gdb --tui ...
  *the TUI is now active*
  (gdb) set height 2
  (gdb) help
  List of classes of commands:

  Program received signal SIGSEGV, Segmentation fault.
  strlen () at ../sysdeps/x86_64/strlen.S:106
  106             movdqu  (%rax), %xmm12

  (top-gdb) bt
  #0  strlen () at ../sysdeps/x86_64/strlen.S:106
  adapteva#1  0x000000000086be5f in xstrdup (s=0x0) at ../src/libiberty/xstrdup.c:33
  adapteva#2  0x00000000005163f9 in tui_prep_terminal (notused1=1) at ../src/gdb/tui/tui-io.c:296
  adapteva#3  0x000000000077a7ee in _rl_callback_newline () at ../src/readline/callback.c:82
  adapteva#4  0x000000000077a853 in rl_callback_handler_install (prompt=0x0, linefunc=0x618b60 <command_line_handler>) at ../src/readline/callback.c:102
  adapteva#5  0x0000000000718a5c in gdb_readline_wrapper_cleanup (arg=0xfd14d0) at ../src/gdb/top.c:788
  adapteva#6  0x0000000000596d08 in do_my_cleanups (pmy_chain=0xcf0b38 <cleanup_chain>, old_chain=0x1043d10) at ../src/gdb/cleanups.c:155
  adapteva#7  0x0000000000596d75 in do_cleanups (old_chain=0x1043d10) at ../src/gdb/cleanups.c:177
  adapteva#8  0x0000000000718bd9 in gdb_readline_wrapper (prompt=0x7fffffffcfa0 "---Type <return> to continue, or q <return> to quit---")
      at ../src/gdb/top.c:835
  adapteva#9  0x000000000071cf74 in prompt_for_continue () at ../src/gdb/utils.c:1894
  adapteva#10 0x000000000071d434 in fputs_maybe_filtered (linebuffer=0x1043db0 "List of classes of commands:\n\n", stream=0xf72e20, filter=1)
      at ../src/gdb/utils.c:2111
  adapteva#11 0x000000000071da0f in vfprintf_maybe_filtered (stream=0xf72e20, format=0x89aef8 "List of classes of %scommands:\n\n", args=0x7fffffffd118, filter=1)
      at ../src/gdb/utils.c:2339
  #12 0x000000000071da4a in vfprintf_filtered (stream=0xf72e20, format=0x89aef8 "List of classes of %scommands:\n\n", args=0x7fffffffd118)
      at ../src/gdb/utils.c:2347
  #13 0x000000000071dc72 in fprintf_filtered (stream=0xf72e20, format=0x89aef8 "List of classes of %scommands:\n\n") at ../src/gdb/utils.c:2399
  #14 0x00000000004f90ab in help_list (list=0xe6d100, cmdtype=0x89ad8c "", class=all_classes, stream=0xf72e20)
      at ../src/gdb/cli/cli-decode.c:1038
  #15 0x00000000004f8dba in help_cmd (arg=0x0, stream=0xf72e20) at ../src/gdb/cli/cli-decode.c:946

Git 0017922 added:

    @@ -776,6 +777,12 @@ gdb_readline_wrapper_cleanup (void *arg)

     gdb_assert (input_handler == gdb_readline_wrapper_line);
     input_handler = cleanup->handler_orig;
  +
  +  /* Reinstall INPUT_HANDLER in readline, without displaying a
  +     prompt.  */
  +  if (async_command_editing_p)
  +    rl_callback_handler_install (NULL, input_handler);

and tui_prep_terminal simply misses handling the case of a NULL
rl_prompt.

I also checked that readline's sources do similar checks.

gdb/
2014-07-24  Pedro Alves  <palves@redhat.com>

	* tui/tui-io.c (tui_prep_terminal): Handle NULL rl_prompt.
olajep pushed a commit to olajep/epiphany-binutils-gdb that referenced this issue Oct 16, 2022
https://bugzilla.redhat.com/show_bug.cgi?id=1126177

ERROR: AddressSanitizer: SEGV on unknown address 0x000000000050 (pc 0x000000992bef sp 0x7ffff9039530 bp 0x7ffff9039540
T0)
    #0 0x992bee in value_type .../gdb/value.c:925
    adapteva#1 0x87c951 in py_print_single_arg python/py-framefilter.c:445
    adapteva#2 0x87cfae in enumerate_args python/py-framefilter.c:596
    adapteva#3 0x87e0b0 in py_print_args python/py-framefilter.c:968

It crashes because frame_arg::val is documented it may contain NULL
(frame_arg::error is then non-NULL) but the code does not handle it.

Another bug is that py_print_single_arg() calls goto out of its TRY_CATCH
which messes up GDB cleanup chain crashing GDB later.

It is probably 7.7 regression (I have not verified it) due to the introduction
of Python frame filters.

gdb/ChangeLog

	PR python/17355
	* python/py-framefilter.c (py_print_single_arg): Handle NULL FA->VAL.
	Fix goto out of TRY_CATCH.

gdb/testsuite/ChangeLog

	PR python/17355
	* gdb.python/amd64-py-framefilter-invalidarg.S: New file.
	* gdb.python/py-framefilter-invalidarg-gdb.py.in: New file.
	* gdb.python/py-framefilter-invalidarg.exp: New file.
	* gdb.python/py-framefilter-invalidarg.py: New file.
olajep pushed a commit to olajep/epiphany-binutils-gdb that referenced this issue Oct 16, 2022
…nd crashes readline/GDB

Jan caught an intermittent GDB crash with the annota1.exp test:

 Starting program: .../gdb/testsuite/gdb.base/annota1 ^M
 [...]
 FAIL: gdb.base/annota1.exp: run until main breakpoint (timeout)
 [...]
 readline: readline_callback_read_char() called with no handler!^M
 ERROR: Process no longer exists

All we need to is to continue the inferior in the foreground, and type
a command while the inferior is running.  E.g.:

 (gdb) set annotate 2

 ▒▒pre-prompt
 (gdb)
 ▒▒prompt
 c

 ▒▒post-prompt
 Continuing.

 ▒▒starting

 ▒▒frames-invalid

 *inferior is running now*

 p 1<ret>

 readline: readline_callback_read_char() called with no handler!
 Aborted (core dumped)
 $


When we run a foreground execution command we call
target_terminal_inferior to stop GDB from processing input, and to put
the inferior's terminal settings in effect.  Then we tell readline to
hide the prompt with display_gdb_prompt, which clears readline's input
callback too.  When the target stops, we call target_terminal_ours,
which re-installs stdin in the event loop, and then we redisplay the
prompt, reinstalling the readline callbacks.

However, when annotations are in effect, the "frames-invalid"
annotation code calls target_terminal_ours after 'resume' had already
called target_terminal_inferior:

 (top-gdb) bt
 #0  0x000000000056b82f in annotate_frames_invalid () at gdb/annotate.c:219
 adapteva#1  0x000000000072e6cc in reinit_frame_cache () at gdb/frame.c:1705
 adapteva#2  0x0000000000594bb9 in registers_changed_ptid (ptid=...) at gdb/regcache.c:612
 adapteva#3  0x000000000064cca1 in target_resume (ptid=..., step=1, signal=GDB_SIGNAL_0) at gdb/target.c:2136
 adapteva#4  0x00000000005f57af in resume (step=1, sig=GDB_SIGNAL_0) at gdb/infrun.c:2263
 adapteva#5  0x00000000005f6051 in proceed (addr=18446744073709551615, siggnal=GDB_SIGNAL_DEFAULT, step=1) at gdb/infrun.c:2613

And then once we hide the prompt and remove readline's input handler
callback, we're in a bad state.  We end up with the target running
supposedly in the foreground, but with stdin still installed on the
event loop.  Any input then calls into readline, which aborts because
no rl_linefunc callback handler is installed:

 Program received signal SIGABRT, Aborted.
 0x0000003b36a35877 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
 56        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);

 (top-gdb) bt
 #0  0x0000003b36a35877 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
 adapteva#1  0x0000003b36a36f68 in __GI_abort () at abort.c:89
 During symbol reading, debug info gives source 9 included from file at zero line 0.
 During symbol reading, debug info gives command-line macro definition with non-zero line 19: _STDC_PREDEF_H 1.
 adapteva#2  0x0000000000784a25 in rl_callback_read_char () at src/readline/callback.c:116
 adapteva#3  0x0000000000619111 in rl_callback_read_char_wrapper (client_data=0x0) at src/gdb/event-top.c:167
 adapteva#4  0x00000000006194e7 in stdin_event_handler (error=0, client_data=0x0) at src/gdb/event-top.c:373
 adapteva#5  0x00000000006180da in handle_file_event (data=...) at src/gdb/event-loop.c:763
 adapteva#6  0x00000000006175c1 in process_event () at src/gdb/event-loop.c:340
 adapteva#7  0x0000000000617688 in gdb_do_one_event () at src/gdb/event-loop.c:404
 adapteva#8  0x00000000006176d8 in start_event_loop () at src/gdb/event-loop.c:429
 adapteva#9  0x0000000000619143 in cli_command_loop (data=0x0) at src/gdb/event-top.c:182
 adapteva#10 0x000000000060f4c8 in current_interp_command_loop () at src/gdb/interps.c:318
 adapteva#11 0x0000000000610691 in captured_command_loop (data=0x0) at src/gdb/main.c:323
 #12 0x000000000060c385 in catch_errors (func=0x610676 <captured_command_loop>, func_args=0x0, errstring=0x900241 "", mask=RETURN_MASK_ALL)
     at src/gdb/exceptions.c:237
 #13 0x0000000000611b8f in captured_main (data=0x7fffffffd7b0) at src/gdb/main.c:1151
 #14 0x000000000060c385 in catch_errors (func=0x610a8e <captured_main>, func_args=0x7fffffffd7b0, errstring=0x900241 "", mask=RETURN_MASK_ALL)
     at src/gdb/exceptions.c:237
 #15 0x0000000000611bb8 in gdb_main (args=0x7fffffffd7b0) at src/gdb/main.c:1159
 #16 0x000000000045ef57 in main (argc=3, argv=0x7fffffffd8b8) at src/gdb/gdb.c:32

The fix is to make the annotation code call target_terminal_inferior
again after printing, if the inferior's settings were in effect.

While at it, when we're doing output only, instead of
target_terminal_ours, we should call target_terminal_ours_for_output.
The latter doesn't actually remove stdin from the event loop, and also
leaves SIGINT forwarded to the target.

New test included.

Tested on x86_64 Fedora 20, native and gdbserver.

gdb/
2014-10-17  Pedro Alves  <palves@redhat.com>

	PR gdb/17472
	* annotate.c (annotate_breakpoints_invalid): Use
	target_terminal_our_for_output instead of target_terminal_ours.
	Give back the terminal to the target.
	(annotate_frames_invalid): Likewise.

gdb/testsuite/
2014-10-17  Pedro Alves  <palves@redhat.com>

	PR gdb/17472
	* gdb.base/annota-input-while-running.c: New file.
	* gdb.base/annota-input-while-running.exp: New file.
olajep pushed a commit to olajep/epiphany-binutils-gdb that referenced this issue Oct 16, 2022
If all threads in the target were already running when the user does
"c -a", nothing puts the inferior's terminal settings in effect and
removes stdin from the event loop, which we must when running a
foreground command.  The result is that user input afterwards crashes
readline/gdb:

 (gdb) start
 Temporary breakpoint 1 at 0x4005d4: file continue-all-already-running.c, line 23.
 Starting program: continue-all-already-running

 Temporary breakpoint 1, main () at continue-all-already-running.c:23
 23        sleep (10);
 (gdb) c -a&
 Continuing.
 (gdb) c -a
 Continuing.
 p 1
 readline: readline_callback_read_char() called with no handler!
 Aborted (core dumped)
 $

Backtrace:

 Program received signal SIGABRT, Aborted.
 0x0000003b36a35877 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
 56        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
 (top-gdb) p 1
 $1 = 1
 (top-gdb) bt
 #0  0x0000003b36a35877 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
 adapteva#1  0x0000003b36a36f68 in __GI_abort () at abort.c:89
 adapteva#2  0x0000000000784aa9 in rl_callback_read_char () at readline/callback.c:116
 adapteva#3  0x0000000000619181 in rl_callback_read_char_wrapper (client_data=0x0) at gdb/event-top.c:167
 adapteva#4  0x0000000000619557 in stdin_event_handler (error=0, client_data=0x0) at gdb/event-top.c:373
 adapteva#5  0x000000000061814a in handle_file_event (data=...) at gdb/event-loop.c:763
 adapteva#6  0x0000000000617631 in process_event () at gdb/event-loop.c:340
 adapteva#7  0x00000000006176f8 in gdb_do_one_event () at gdb/event-loop.c:404
 adapteva#8  0x0000000000617748 in start_event_loop () at gdb/event-loop.c:429
 adapteva#9  0x00000000006191b3 in cli_command_loop (data=0x0) at gdb/event-top.c:182
 adapteva#10 0x000000000060f538 in current_interp_command_loop () at gdb/interps.c:318
 adapteva#11 0x0000000000610701 in captured_command_loop (data=0x0) at gdb/main.c:323
 #12 0x000000000060c3f5 in catch_errors (func=0x6106e6 <captured_command_loop>, func_args=0x0, errstring=0x9002c1 "", mask=RETURN_MASK_ALL)
     at gdb/exceptions.c:237
 #13 0x0000000000611bff in captured_main (data=0x7fffffffd780) at gdb/main.c:1151
 #14 0x000000000060c3f5 in catch_errors (func=0x610afe <captured_main>, func_args=0x7fffffffd780, errstring=0x9002c1 "", mask=RETURN_MASK_ALL)
     at gdb/exceptions.c:237
 #15 0x0000000000611c28 in gdb_main (args=0x7fffffffd780) at gdb/main.c:1159
 #16 0x000000000045ef97 in main (argc=5, argv=0x7fffffffd888) at gdb/gdb.c:32
 (top-gdb)

Tested on x86_64 Fedora 20, native and gdbserver.

gdb/
2014-10-17  Pedro Alves  <palves@redhat.com>

	PR gdb/17300
	* infcmd.c (continue_1): If continuing all threads in the
	foreground, make sure the inferior's terminal settings are put in
	effect.

gdb/testsuite/
2014-10-17  Pedro Alves  <palves@redhat.com>

	PR gdb/17300
	* gdb.base/continue-all-already-running.c: New file.
	* gdb.base/continue-all-already-running.exp: New file.
olajep pushed a commit to olajep/epiphany-binutils-gdb that referenced this issue Oct 16, 2022
Running gdb.threads/fork-plus-threads.exp against gdbserver in
extended-remote mode, even though the test passes, we still see broken
behavior:

 (gdb) PASS: gdb.threads/fork-plus-threads.exp: set detach-on-fork off
 continue &
 Continuing.
 (gdb) PASS: gdb.threads/fork-plus-threads.exp: continue &
 [New Thread 28092.28092]

 [Thread 28092.28092] adapteva#2 stopped.
 [New Thread 28094.28094]
 [Inferior 2 (process 28092) exited normally]
 [New Thread 28094.28105]
 [New Thread 28094.28109]

...

[Thread 28174.28174] #18 stopped.
 [New Thread 28185.28185]
 [Inferior 10 (process 28174) exited normally]
 [New Thread 28185.28196]

 [Thread 28185.28185] #20 stopped.
 Cannot remove breakpoints because program is no longer writable.
 Further execution is probably impossible.
 [Inferior 11 (process 28185) exited normally]
 [Inferior 1 (process 28091) exited normally]
 PASS: gdb.threads/fork-plus-threads.exp: reached breakpoint
 info threads
 No threads.
 (gdb) PASS: gdb.threads/fork-plus-threads.exp: no threads left
 info inferiors
   Num  Description       Executable
 * 1    <null>            /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/fork-plus-threads
 (gdb) PASS: gdb.threads/fork-plus-threads.exp: only inferior 1 left

All the "[Thread FOO] #NN stopped." above are bogus, as well as the
"Cannot remove breakpoints because program is no longer writable.",
which is a consequence.

The problem is that when we intercept a fork event, we should report
the event for the parent, only, and leave the child stopped, but not
report its stop event.  GDB later decides whether to follow the parent
or the child.  But because handle_extended_wait does not set the
child's last_status.kind to TARGET_WAITKIND_STOPPED, a
stop_all_threads/unstop_all_lwps sequence (e.g., from trying to access
memory) by mistake ends up queueing a SIGSTOP on the child, resuming
it, and then when that SIGSTOP is intercepted, because the LWP has
last_resume_kind set to resume_stop, gdbserver reports the stop to
GDB, as GDB_SIGNAL_0:

...
 >>>> entering unstop_all_lwps
 unstopping all lwps
 proceed_one_lwp: lwp 1600
    client wants LWP to remain 1600 stopped
 proceed_one_lwp: lwp 1828
 Client wants LWP 1828 to stop. Making sure it has a SIGSTOP pending
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 Sending sigstop to lwp 1828
 pc is 0x3615ebc7cc
 Resuming lwp 1828 (continue, signal 0, stop expected)
   continue from pc 0x3615ebc7cc
 unstop_all_lwps done
 sigchld_handler
 <<<< exiting unstop_all_lwps
 handling possible target event
 >>>> entering linux_wait_1
 linux_wait_1: [<all threads>]
 my_waitpid (-1, 0x40000001)
 my_waitpid (-1, 0x1): status(137f), 1828
 LWFE: waitpid(-1, ...) returned 1828, ERRNO-OK
 LLW: waitpid 1828 received Stopped (signal) (stopped)
 pc is 0x3615ebc7cc
 Expected stop.
 LLW: resume_stop SIGSTOP caught for LWP 1828.1828.
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
 linux_wait_1 ret = LWP 1828.1828, 1, 0
 <<<< exiting linux_wait_1
 Writing resume reply for LWP 1828.1828:1
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Tested on x86_64 Fedora 20, extended-remote.

gdb/gdbserver/ChangeLog:
2015-07-30  Pedro Alves  <palves@redhat.com>

	* linux-low.c (handle_extended_wait): Set the child's last
	reported status to TARGET_WAITKIND_STOPPED.
olajep pushed a commit to olajep/epiphany-binutils-gdb that referenced this issue Oct 16, 2022
Making all-stop run on top of non-stop caused a small regression
in behavior. This was observed on x86_64-linux. The attached testcase
is in C whereas the investigation was done with an Ada program,
but it's the same scenario, and using a C testcase allows wider testing.
Basically: I am debugging a single-threaded program, and currently
stopped inside a function provided by a shared-library, at a line
calling a subprogram provided by a second shared library, and trying
to "next" over that function call.

Before we changed the default all-stop behavior, we had:

    7             Impl_Initialize;  -- Stop here and try "next" over this line
    (gdb) n
    8             return 5;  <<-- OK

But now, "next" just stops much earlier:

    (gdb) n
    0x00007ffff7bd8560 in impl.initialize@plt () from /[...]/lib/libpck.so

What happens is that next stops at a call instruction, which calls
the function's PLT, and GDB fails to notice that the inferior stepped
into a subroutine, and so decides that we're done. We can see another
symptom of the same issue by looking at the backtrace at the point
GDB stopped:

    (gdb) bt
    #0  0x00007ffff7bd8560 in impl.initialize@plt ()
       from /[...]/lib/libpck.so
    adapteva#1  0x00000000f7bd86f9 in ?? ()
    adapteva#2  0x00007fffffffdf50 in ?? ()
    adapteva#3  0x0000000000401893 in a () at /[...]/a.adb:7
    Backtrace stopped: frame did not save the PC

With a functioning GDB, the backtrace looks like the following instead:

    #0  0x00007ffff7bd8560 in impl.initialize@plt ()
       from /[...]/lib/libpck.so
    adapteva#1  0x00007ffff7bd86f9 in sub () at /[...]/pck.adb:7
    adapteva#2  0x0000000000401893 in a () at /[...]/a.adb:7

Note how, for frame adapteva#1, the address looks quite similar, except
for the high-order bits not being set:

    adapteva#1  0x00007ffff7bd86f9 in sub () at /[...]/pck.adb:7   <<<--  OK
    adapteva#1  0x00000000f7bd86f9 in ?? ()                        <<<--  WRONG
              ^^^^
              ||||
              Wrong

Investigating this further led me to displaced stepping.
As we are "next"-ing from a location where a breakpoint is inserted,
we need to step out of it, and since we're on non-stop mode, we need
to do it using displaced stepping. And looking at
amd64-tdep.c:amd64_displaced_step_fixup, I found the code that handles
the return address:

    regcache_cooked_read_unsigned (regs, AMD64_RSP_REGNUM, &rsp);
    retaddr = read_memory_unsigned_integer (rsp, retaddr_len, byte_order);
    retaddr = (retaddr - insn_offset) & 0xffffffffUL;

The mask used to compute retaddr looks wrong to me, keeping only
4 bytes instead of 8, and explains why the high order bits of
the backtrace are unset. What happens is that, after the displaced
stepping has completed, GDB restores that return address at the location
where the program expects it.  But because the top half bits of
the address have been masked out, the return address is now invalid.
The incorrect behavior of the "next" command and the backtrace at
that location are the first symptoms of that.  Another symptom is
that this actually alters the behavior of the program, where a "cont"
from there soon leads to a SEGV when the inferior tries to jump back
to that incorrect return address:

    (gdb) c
    Continuing.

    Program received signal SIGSEGV, Segmentation fault.
    0x00000000f7bd86f9 in ?? ()
    ^^^^^^^^^^^^^^^^^^

This patch fixes the issue by using a mask that seems more appropriate
for this architecture.

gdb/ChangeLog:

        * amd64-tdep.c (amd64_displaced_step_fixup): Fix the mask used to
        compute RETADDR.

gdb/testsuite/ChangeLog:

        * gdb.base/dso2dso-dso2.c, gdb.base/dso2dso-dso2.h,
        gdb.base/dso2dso-dso1.c, gdb.base/dso2dso-dso1.h, gdb.base/dso2dso.c,
        gdb.base/dso2dso.exp: New files.

Tested on x86_64-linux, no regression.
olajep pushed a commit to olajep/epiphany-binutils-gdb that referenced this issue Oct 16, 2022
-fsanitize=address
gdb.base/attach-pie-noexec.exp

==32586==ERROR: AddressSanitizer: heap-use-after-free on address 0x60200004ed90 at pc 0x48ad50 bp 0x7ffceb3aef50 sp 0x7ffceb3aef20
READ of size 2 at 0x60200004ed90 thread T0
    #0 0x48ad4f in __interceptor_strlen (/home/jkratoch/redhat/gdb-test-asan/gdb/gdb+0x48ad4f)
    adapteva#1 0xeafe5c in xstrdup xstrdup.c:33
    adapteva#2 0x85e024 in attach_command /home/jkratoch/redhat/gdb-test-asan/gdb/infcmd.c:2680

regressed by:

commit 6c4486e
Author: Pedro Alves <palves@redhat.com>
Date:   Fri Oct 17 13:31:26 2014 +0100
    PR gdb/17471: Repeating a background command makes it foreground

gdb/ChangeLog
2015-08-04  Jan Kratochvil  <jan.kratochvil@redhat.com>

	PR gdb/18767
	* infcmd.c (attach_command): Move ARGS_CHAIN cleanup after last ARGS
	use.
olajep pushed a commit to olajep/epiphany-binutils-gdb that referenced this issue Oct 16, 2022
When I run process-dies-while-detaching.exp with GDBserver, I see many
warnings printed by GDBserver,

ptrace(regsets_fetch_inferior_registers) PID=26183: No such process
ptrace(regsets_fetch_inferior_registers) PID=26183: No such process
ptrace(regsets_fetch_inferior_registers) PID=26184: No such process
ptrace(regsets_fetch_inferior_registers) PID=26184: No such process

regsets_fetch_inferior_registers is called when GDBserver resumes each
lwp.

 adapteva#2  0x0000000000428260 in regsets_fetch_inferior_registers (regsets_info=0x4690d0 <aarch64_regsets_info>, regcache=0x31832020)
    at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:5412
 adapteva#3  0x00000000004070e8 in get_thread_regcache (thread=0x31832940, fetch=fetch@entry=1) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/regcache.c:58
 adapteva#4  0x0000000000429c40 in linux_resume_one_lwp_throw (info=<optimized out>, signal=0, step=0, lwp=0x31832830)
    at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:4463
 adapteva#5  linux_resume_one_lwp (lwp=0x31832830, step=<optimized out>, signal=<optimized out>, info=<optimized out>)
    at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:4573

The is the case that threads are disappeared when GDB/GDBserver resumes
them.  We check errno for ESRCH, and don't print error messages, like
what we are doing in regsets_store_inferior_registers.

gdb/gdbserver:

2016-08-04  Yao Qi  <yao.qi@linaro.org>

	* linux-low.c (regsets_fetch_inferior_registers): Check
	errno is ESRCH or not.
olajep pushed a commit to olajep/epiphany-binutils-gdb that referenced this issue Oct 16, 2022
… out value

With something like:

  struct A { int bitfield:4; } var;

If 'var' ends up wholly-optimized out, printing 'var.bitfield' crashes
gdb here:

 (top-gdb) bt
 #0  0x000000000058b89f in extract_unsigned_integer (addr=0x2 <error: Cannot access memory at address 0x2>, len=2, byte_order=BFD_ENDIAN_LITTLE)
     at /home/pedro/gdb/mygit/src/gdb/findvar.c:109
 adapteva#1  0x00000000005a187a in unpack_bits_as_long (field_type=0x16cff70, valaddr=0x0, bitpos=16, bitsize=12) at /home/pedro/gdb/mygit/src/gdb/value.c:3347
 adapteva#2  0x00000000005a1b9d in unpack_value_bitfield (dest_val=0x1b5d9d0, bitpos=16, bitsize=12, valaddr=0x0, embedded_offset=0, val=0x1b5d8d0)
     at /home/pedro/gdb/mygit/src/gdb/value.c:3441
 adapteva#3  0x00000000005a2a5f in value_fetch_lazy (val=0x1b5d9d0) at /home/pedro/gdb/mygit/src/gdb/value.c:3958
 adapteva#4  0x00000000005a10a7 in value_primitive_field (arg1=0x1b5d8d0, offset=0, fieldno=0, arg_type=0x16d04c0) at /home/pedro/gdb/mygit/src/gdb/value.c:3161
 adapteva#5  0x00000000005b01e5 in do_search_struct_field (name=0x1727c60 "bitfield", arg1=0x1b5d8d0, offset=0, type=0x16d04c0, looking_for_baseclass=0, result_ptr=0x7fffffffcaf8,
 [...]

unpack_value_bitfield is already optimized-out/unavailable -aware:

   (...) VALADDR points to the contents of VAL.  If the VAL's contents
   required to extract the bitfield from are unavailable/optimized
   out, DEST_VAL is correspondingly marked unavailable/optimized out.

however, it is not considering the case of the value having no
contents buffer at all, as can happen through
allocate_optimized_out_value.

gdb/ChangeLog:
2016-08-09  Pedro Alves  <palves@redhat.com>

	* value.c (unpack_value_bitfield): Skip unpacking if the parent
	has no contents buffer to begin with.

gdb/testsuite/ChangeLog:
2016-08-09  Pedro Alves  <palves@redhat.com>

	* gdb.dwarf2/bitfield-parent-optimized-out.exp: New file.
olajep pushed a commit to olajep/epiphany-binutils-gdb that referenced this issue Oct 16, 2022
I build GDB with -fsanitize=address, and see the error in tests,

(gdb) PASS: gdb.linespec/ls-errs.exp: lang=C++: break 3 foo
break -line 3 foo^M
=================================================================^M
==4401==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000047487 at pc 0x819d8e bp 0x7fff4e4e6bb0 sp 0x7fff4e4e6ba8^M
READ of size 1 at 0x603000047487 thread T0^[[1m^[[0m^M
    #0 0x819d8d in explicit_location_lex_one /home/yao/SourceCode/gnu/gdb/git/gdb/location.c:502^M
    adapteva#1 0x81a185 in string_to_explicit_location(char const**, language_defn const*, int) /home/yao/SourceCode/gnu/gdb/git/gdb/location.c:556^M
    adapteva#2 0x81ac10 in string_to_event_location(char**, language_defn const*) /home/yao/SourceCode/gnu/gdb/git/gdb/location.c:687^

the code in question is:

>         /* Special case: C++ operator,.  */
>         if (language->la_language == language_cplus
>             && strncmp (*inp, "operator", 8)  <--- [1]
>             && (*inp)[9] == ',')
>           (*inp) += 9;
>         ++(*inp);

The error is caused by the access to (*inp)[9] if 9 is out of its bounds.
However [1] looks odd to me, because if strncmp returns true (non-zero),
the following check "(*inp)[9] == ','" makes no sense any more.  I
suspect it was a typo in the code we meant to "strncmp () == 0".  Another
problem in the code above is that if *inp is "operator,", we first
increment *inp by 9, and then increment it by one again, which is wrong
to me.  We should only increment *inp by 8 to skip "operator", and go
back to the loop header to decide where we stop.

gdb:

2016-08-15  Yao Qi  <yao.qi@linaro.org>

	* location.c (explicit_location_lex_one): Compare the return
	value of strncmp with zero.  Don't check (*inp)[9].  Increment
	*inp by 8.
olajep pushed a commit to olajep/epiphany-binutils-gdb that referenced this issue Oct 16, 2022
If I build gdb with -fsanitize=address and run tests, I get error,

malformed linespec error: unexpected colon^M
(gdb) PASS: gdb.linespec/ls-errs.exp: lang=C: break     :
break   :=================================================================^M
==3266==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000051451 at pc 0x2b5797a972a8 bp 0x7fffd8e0f3c0 sp 0x7fffd8e0f398^M
READ of size 2 at 0x602000051451 thread T0
    #0 0x2b5797a972a7 in __interceptor_strlen (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x322a7)^M
    adapteva#1 0x7bd004 in compare_filenames_for_search(char const*, char const*) /home/yao/SourceCode/gnu/gdb/git/gdb/symtab.c:316^M
    adapteva#2 0x7bd310 in iterate_over_some_symtabs(char const*, char const*, int (*)(symtab*, void*), void*, compunit_symtab*, compunit_symtab*) /home/yao/SourceCode/gnu/gdb/git/gdb/symtab.c:411^M
    adapteva#3 0x7bd775 in iterate_over_symtabs(char const*, int (*)(symtab*, void*), void*) /home/yao/SourceCode/gnu/gdb/git/gdb/symtab.c:481^M
    adapteva#4 0x7bda15 in lookup_symtab(char const*) /home/yao/SourceCode/gnu/gdb/git/gdb/symtab.c:527^M
    adapteva#5 0x7d5e2a in make_file_symbol_completion_list_1 /home/yao/SourceCode/gnu/gdb/git/gdb/symtab.c:5635^M
    adapteva#6 0x7d61e1 in make_file_symbol_completion_list(char const*, char const*, char const*) /home/yao/SourceCode/gnu/gdb/git/gdb/symtab.c:5684^M
    adapteva#7 0x88dc06 in linespec_location_completer /home/yao/SourceCode/gnu/gdb/git/gdb/completer.c:288
....
0x602000051451 is located 0 bytes to the right of 1-byte region [0x602000051450,0x602000051451)^M
mallocated by thread T0 here:
    #0 0x2b5797ab97ef in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x547ef)^M
    adapteva#1 0xbbfb8d in xmalloc /home/yao/SourceCode/gnu/gdb/git/gdb/common/common-utils.c:43^M
    adapteva#2 0x88dabd in linespec_location_completer /home/yao/SourceCode/gnu/gdb/git/gdb/completer.c:273^M
    adapteva#3 0x88e5ef in location_completer(cmd_list_element*, char const*, char const*) /home/yao/SourceCode/gnu/gdb/git/gdb/completer.c:531^M
    adapteva#4 0x8902e7 in complete_line_internal /home/yao/SourceCode/gnu/gdb/git/gdb/completer.c:964^

The code in question is here

       file_to_match = (char *) xmalloc (colon - text + 1);
       strncpy (file_to_match, text, colon - text + 1);

it is likely that file_to_match is not null-terminated.  The patch is
to strncpy 'colon - text' bytes and explicitly set '\0'.

gdb:

2016-08-19  Yao Qi  <yao.qi@linaro.org>

	* completer.c (linespec_location_completer): Make file_to_match
	null-terminated.
olajep pushed a commit to olajep/epiphany-binutils-gdb that referenced this issue Oct 16, 2022
Even though this was supposedly in the gdb 7.2 timeframe, the testcase
in PR11094 crashes current GDB with a segfault:

  Program received signal SIGSEGV, Segmentation fault.
  0x00000000005ee894 in event_location_to_string (location=0x0) at
  src/gdb/location.c:412
  412       if (EL_STRING (location) == NULL)
  (top-gdb) bt
  #0  0x00000000005ee894 in event_location_to_string (location=0x0) at
  src/gdb/location.c:412
  adapteva#1  0x000000000057411a in print_breakpoint_location (b=0x18288e0, loc=0x0) at
  src/gdb/breakpoint.c:6201
  adapteva#2  0x000000000057483f in print_one_breakpoint_location (b=0x18288e0,
  loc=0x182cf10, loc_number=0, last_loc=0x7fffffffd258, allflag=1)
      at src/gdb/breakpoint.c:6473
  adapteva#3  0x00000000005751e1 in print_one_breakpoint (b=0x18288e0,
  last_loc=0x7fffffffd258, allflag=1) at
  src/gdb/breakpoint.c:6707
  adapteva#4  0x000000000057589c in breakpoint_1 (args=0x0, allflag=1, filter=0x0) at
  src/gdb/breakpoint.c:6947
  adapteva#5  0x0000000000575aa8 in maintenance_info_breakpoints (args=0x0, from_tty=0)
  at src/gdb/breakpoint.c:7026
  [...]

This is GDB trying to print the location spec of the JIT event
breakpoint, but that's an internal breakpoint without one.

If I add a NULL check, then we see that the JIT breakpoint is now
pending (because its location has shlib_disabled set):

  (gdb) maint info breakpoints
  Num     Type           Disp Enb Address            What
  [...]
  -8      jit events     keep y   <PENDING>           inf 1
  [...]

But that's incorrect.  GDB should have managed to recreate the JIT
breakpoint's location for the second run.  So the problem is
elsewhere.

The problem is that if the JIT loads at the same address on the second
run, we never recreate the JIT breakpoint, because we hit this early
return:

  static int
  jit_breakpoint_re_set_internal (struct gdbarch *gdbarch,
				  struct jit_program_space_data *ps_data)
  {
    [...]
    if (ps_data->cached_code_address == addr)
      return 0;

    [...]
      delete_breakpoint (ps_data->jit_breakpoint);
    [...]
    ps_data->jit_breakpoint = create_jit_event_breakpoint (gdbarch, addr);

Fix this by deleting the breakpoint and discarding the cached code
address when the objfile where the previous JIT breakpoint was found
is deleted/unloaded in the first place.

The test that was originally added for PR11094 doesn't trip on this
because:

  adapteva#1 - It doesn't test the case of the JIT descriptor's address _not_
       changing between reruns.

  adapteva#2 - And then it doesn't do "maint info breakpoints", or really
       anything with the JIT at all.

  adapteva#3 - and even then, to trigger the problem the JIT descriptor needs
       to be in a separate library, while the current test puts it in
       the main program.

The patch extends the test to cover all combinations of these
scenarios.

gdb/ChangeLog:
2016-10-06  Pedro Alves  <palves@redhat.com>

	* jit.c (free_objfile_data): Delete the JIT breakpoint and clear
	the cached code address.

gdb/testsuite/ChangeLog:
2016-10-06  Pedro Alves  <palves@redhat.com>

	* gdb.base/jit-simple-dl.c: New file.
	* gdb.base/jit-simple-jit.c: New file, factored out from ...
	* gdb.base/jit-simple.c: ... this.
	* gdb.base/jit-simple.exp (jit_run): Delete.
	(build_jit): New proc.
	(jit_test_reread): Recompile either the main program or the shared
	library, depending on what is being tested.  Skip changing address
	if caller wants to.  Compare before/after addresses.  If testing
	standalone, explicitly load the binary.  Test "maint info
	breakpoints".
	(top level): Add "standalone vs shared lib" and "change address"
	vs "same address" axes.
olajep pushed a commit to olajep/epiphany-binutils-gdb that referenced this issue Oct 16, 2022
When loading a file using the file command on macOS, we get:

    $ ./gdb -nx --data-directory=data-directory -q -ex "file ./test"
    Reading symbols from ./test...
    Reading symbols from /Users/smarchi/build/binutils-gdb/gdb/test.dSYM/Contents/Resources/DWARF/test...
    /Users/smarchi/src/binutils-gdb/gdb/thread.c:72: internal-error: struct thread_info *inferior_thread(): Assertion `current_thread_ != nullptr' failed.
    A problem internal to GDB has been detected,
    further debugging may prove unreliable.
    Quit this debugging session? (y or n)

The backtrace is:

    * frame #0: 0x0000000101fcb826 gdb`internal_error(file="/Users/smarchi/src/binutils-gdb/gdb/thread.c", line=72, fmt="%s: Assertion `%s' failed.") at errors.cc:52:3
      frame adapteva#1: 0x00000001018a2584 gdb`inferior_thread() at thread.c:72:3
      frame adapteva#2: 0x0000000101469c09 gdb`get_current_regcache() at regcache.c:421:31
      frame adapteva#3: 0x00000001015f9812 gdb`darwin_solib_get_all_image_info_addr_at_init(info=0x0000603000006d00) at solib-darwin.c:464:34
      frame adapteva#4: 0x00000001015f7a04 gdb`darwin_solib_create_inferior_hook(from_tty=1) at solib-darwin.c:515:5
      frame adapteva#5: 0x000000010161205e gdb`solib_create_inferior_hook(from_tty=1) at solib.c:1200:3
      frame adapteva#6: 0x00000001016d8f76 gdb`symbol_file_command(args="./test", from_tty=1) at symfile.c:1650:7
      frame adapteva#7: 0x0000000100abab17 gdb`file_command(arg="./test", from_tty=1) at exec.c:555:3
      frame adapteva#8: 0x00000001004dc799 gdb`do_const_cfunc(c=0x000061100000c340, args="./test", from_tty=1) at cli-decode.c:102:3
      frame adapteva#9: 0x00000001004ea042 gdb`cmd_func(cmd=0x000061100000c340, args="./test", from_tty=1) at cli-decode.c:2160:7
      frame adapteva#10: 0x00000001018d4f59 gdb`execute_command(p="t", from_tty=1) at top.c:674:2
      frame adapteva#11: 0x0000000100eee430 gdb`catch_command_errors(command=(gdb`execute_command(char const*, int) at top.c:561), arg="file ./test", from_tty=1, do_bp_actions=true)(char const*, int), char const*, int, bool) at main.c:523:7
      frame #12: 0x0000000100eee902 gdb`execute_cmdargs(cmdarg_vec=0x00007ffeefbfeba0 size=1, file_type=CMDARG_FILE, cmd_type=CMDARG_COMMAND, ret=0x00007ffeefbfec20) at main.c:618:9
      frame #13: 0x0000000100eed3a4 gdb`captured_main_1(context=0x00007ffeefbff780) at main.c:1322:3
      frame #14: 0x0000000100ee810d gdb`captured_main(data=0x00007ffeefbff780) at main.c:1343:3
      frame #15: 0x0000000100ee8025 gdb`gdb_main(args=0x00007ffeefbff780) at main.c:1368:7
      frame #16: 0x00000001000044f1 gdb`main(argc=6, argv=0x00007ffeefbff8a0) at gdb.c:32:10
      frame #17: 0x00007fff20558f5d libdyld.dylib`start + 1

The solib_create_inferior_hook call in symbol_file_command was added by
commit ea142fb ("Fix breakpoints on file reloads for PIE
binaries").  It causes solib_create_inferior_hook to be called while
the inferior is not running, which darwin_solib_create_inferior_hook
does not expect.  darwin_solib_get_all_image_info_addr_at_init, in
particular, assumes that there is a current thread, as it tries to get
the current thread's regcache.

Fix it by adding a target_has_execution check and returning early.  Note
that there is a similar check in svr4_solib_create_inferior_hook.

gdb/ChangeLog:

	* solib-darwin.c (darwin_solib_create_inferior_hook): Return
	early if no execution.

Change-Id: Ia11dd983a1e29786e5ce663d0fcaa6846dc611bb
olajep pushed a commit to olajep/epiphany-binutils-gdb that referenced this issue Oct 16, 2022
Commit 408f668 ("detach in all-stop
with threads running") regressed "detach" with "target remote":

 (gdb) detach
 Detaching from program: target:/any/program, process 3671843
 Detaching from process 3671843
 Ending remote debugging.
 [Inferior 1 (process 3671843) detached]
 In main
 terminate called after throwing an instance of 'gdb_exception_error'
 Aborted (core dumped)

Here's the exception above being thrown:

 (top-gdb) bt
 #0  throw_error (error=TARGET_CLOSE_ERROR, fmt=0x555556035588 "Remote connection closed") at src/gdbsupport/common-exceptions.cc:222
 adapteva#1  0x0000555555bbaa46 in remote_target::readchar (this=0x555556a11040, timeout=10000) at src/gdb/remote.c:9440
 adapteva#2  0x0000555555bbb9e5 in remote_target::getpkt_or_notif_sane_1 (this=0x555556a11040, buf=0x555556a11058, forever=0, expecting_notif=0, is_notif=0x0) at src/gdb/remote.c:9928
 adapteva#3  0x0000555555bbbda9 in remote_target::getpkt_sane (this=0x555556a11040, buf=0x555556a11058, forever=0) at src/gdb/remote.c:10030
 adapteva#4  0x0000555555bc0e75 in remote_target::remote_hostio_send_command (this=0x555556a11040, command_bytes=13, which_packet=14, remote_errno=0x7fffffffcfd0, attachment=0x0, attachment_len=0x0) at src/gdb/remote.c:12137
 adapteva#5  0x0000555555bc1b6c in remote_target::remote_hostio_close (this=0x555556a11040, fd=8, remote_errno=0x7fffffffcfd0) at src/gdb/remote.c:12455
 adapteva#6  0x0000555555bc1bb4 in remote_target::fileio_close (During symbol reading: .debug_line address at offset 0x64f417 is 0 [in module build/gdb/gdb]
 this=0x555556a11040, fd=8, remote_errno=0x7fffffffcfd0) at src/gdb/remote.c:12462
 adapteva#7  0x0000555555c9274c in target_fileio_close (fd=3, target_errno=0x7fffffffcfd0) at src/gdb/target.c:3365
 adapteva#8  0x000055555595a19d in gdb_bfd_iovec_fileio_close (abfd=0x555556b9f8a0, stream=0x555556b11530) at src/gdb/gdb_bfd.c:439
 adapteva#9  0x0000555555e09e3f in opncls_bclose (abfd=0x555556b9f8a0) at src/bfd/opncls.c:599
 adapteva#10 0x0000555555e0a2c7 in bfd_close_all_done (abfd=0x555556b9f8a0) at src/bfd/opncls.c:847
 adapteva#11 0x0000555555e0a27a in bfd_close (abfd=0x555556b9f8a0) at src/bfd/opncls.c:814
 #12 0x000055555595a9d3 in gdb_bfd_close_or_warn (abfd=0x555556b9f8a0) at src/gdb/gdb_bfd.c:626
 #13 0x000055555595ad29 in gdb_bfd_unref (abfd=0x555556b9f8a0) at src/gdb/gdb_bfd.c:715
 #14 0x0000555555ae4730 in objfile::~objfile (this=0x555556515540, __in_chrg=<optimized out>) at src/gdb/objfiles.c:573
 #15 0x0000555555ae955a in std::_Sp_counted_ptr<objfile*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x555556c20db0) at /usr/include/c++/9/bits/shared_ptr_base.h:377
 #16 0x000055555572b7c8 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x555556c20db0) at /usr/include/c++/9/bits/shared_ptr_base.h:155
 #17 0x00005555557263c3 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x555556bf0588, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:730
 #18 0x0000555555ae745e in std::__shared_ptr<objfile, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x555556bf0580, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:1169
 #19 0x0000555555ae747e in std::shared_ptr<objfile>::~shared_ptr (this=0x555556bf0580, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr.h:103
 #20 0x0000555555b1c1dc in __gnu_cxx::new_allocator<std::_List_node<std::shared_ptr<objfile> > >::destroy<std::shared_ptr<objfile> > (this=0x5555564cdd60, __p=0x555556bf0580) at /usr/include/c++/9/ext/new_allocator.h:153
 #21 0x0000555555b1bb1d in std::allocator_traits<std::allocator<std::_List_node<std::shared_ptr<objfile> > > >::destroy<std::shared_ptr<objfile> > (__a=..., __p=0x555556bf0580) at /usr/include/c++/9/bits/alloc_traits.h:497
 #22 0x0000555555b1b73e in std::__cxx11::list<std::shared_ptr<objfile>, std::allocator<std::shared_ptr<objfile> > >::_M_erase (this=0x5555564cdd60, __position=std::shared_ptr<objfile> (expired, weak count 1) = {get() = 0x555556515540}) at /usr/include/c++/9/bits/stl_list.h:1921
 #23 0x0000555555b1afeb in std::__cxx11::list<std::shared_ptr<objfile>, std::allocator<std::shared_ptr<objfile> > >::erase (this=0x5555564cdd60, __position=std::shared_ptr<objfile> (expired, weak count 1) = {get() = 0x555556515540}) at /usr/include/c++/9/bits/list.tcc:158
 #24 0x0000555555b19576 in program_space::remove_objfile (this=0x5555564cdd20, objfile=0x555556515540) at src/gdb/progspace.c:210
 #25 0x0000555555ae4502 in objfile::unlink (this=0x555556515540) at src/gdb/objfiles.c:487
 #26 0x0000555555ae5a12 in objfile_purge_solibs () at src/gdb/objfiles.c:875
 #27 0x0000555555c09686 in no_shared_libraries (ignored=0x0, from_tty=1) at src/gdb/solib.c:1236
 #28 0x00005555559e3f5f in detach_command (args=0x0, from_tty=1) at src/gdb/infcmd.c:2769

So frame #28 already detached the remote process, and then we're
purging the shared libraries.  GDB had opened remote shared libraries
via the target: sysroot, so it tries closing them.  GDBserver is
tearing down already, so remote communication breaks down and we close
the remote target and throw TARGET_CLOSE_ERROR.

Note frame #14:

 #14 0x0000555555ae4730 in objfile::~objfile (this=0x555556515540, __in_chrg=<optimized out>) at src/gdb/objfiles.c:573

That's a dtor, thus noexcept.  That's the reason for the
std::terminate.

Stepping back a bit, why do we still have open remote files if we've
managed to detach already, and, we're debugging with "target remote"?
The reason is that commit 408f668
makes detach_command hold a reference to the target, so the remote
target won't be finally closed until frame #28 returns.  It's closing
the target that invalidates target file I/O handles.

This commit fixes the issue by not relying on target_close to
invalidate the target file I/O handles, instead invalidate them
immediately in remote_unpush_target.  So when GDB purges the solibs,
and we end up in target_fileio_close (frame adapteva#7 above), there's nothing
to do, and we don't try to talk with the remote target anymore.

The regression isn't seen when testing with
--target_board=native-gdbserver, because that does "set sysroot" to
disable the "target:" sysroot, for test run speed reasons.  So this
commit adds a testcase that explicitly tests detach with "set sysroot
target:".

gdb/ChangeLog:
yyyy-mm-dd  Pedro Alves  <pedro@palves.net>

	PR gdb/28080
	* remote.c (remote_unpush_target): Invalidate file I/O target
	handles.
	* target.c (fileio_handles_invalidate_target): Make extern.
	* target.h (fileio_handles_invalidate_target): Declare.

gdb/testsuite/ChangeLog:
yyyy-mm-dd  Pedro Alves  <pedro@palves.net>

	PR gdb/28080
	* gdb.base/detach-sysroot-target.exp: New.
	* gdb.base/detach-sysroot-target.c: New.

Reported-By: Jonah Graham <jonah@kichwacoders.com>

Change-Id: I851234910172f42a1b30e731161376c344d2727d
olajep pushed a commit to olajep/epiphany-binutils-gdb that referenced this issue Oct 16, 2022
…080)

Before PR gdb/28080 was fixed by the previous patch, GDB was crashing
like this:

 (gdb) detach
 Detaching from program: target:/any/program, process 3671843
 Detaching from process 3671843
 Ending remote debugging.
 [Inferior 1 (process 3671843) detached]
 In main
 terminate called after throwing an instance of 'gdb_exception_error'
 Aborted (core dumped)

Here's the exception above being thrown:

 (top-gdb) bt
 #0  throw_error (error=TARGET_CLOSE_ERROR, fmt=0x555556035588 "Remote connection closed") at src/gdbsupport/common-exceptions.cc:222
 adapteva#1  0x0000555555bbaa46 in remote_target::readchar (this=0x555556a11040, timeout=10000) at src/gdb/remote.c:9440
 adapteva#2  0x0000555555bbb9e5 in remote_target::getpkt_or_notif_sane_1 (this=0x555556a11040, buf=0x555556a11058, forever=0, expecting_notif=0, is_notif=0x0) at src/gdb/remote.c:9928
 adapteva#3  0x0000555555bbbda9 in remote_target::getpkt_sane (this=0x555556a11040, buf=0x555556a11058, forever=0) at src/gdb/remote.c:10030
 adapteva#4  0x0000555555bc0e75 in remote_target::remote_hostio_send_command (this=0x555556a11040, command_bytes=13, which_packet=14, remote_errno=0x7fffffffcfd0, attachment=0x0, attachment_len=0x0) at src/gdb/remote.c:12137
 adapteva#5  0x0000555555bc1b6c in remote_target::remote_hostio_close (this=0x555556a11040, fd=8, remote_errno=0x7fffffffcfd0) at src/gdb/remote.c:12455
 adapteva#6  0x0000555555bc1bb4 in remote_target::fileio_close (During symbol reading: .debug_line address at offset 0x64f417 is 0 [in module build/gdb/gdb]
 this=0x555556a11040, fd=8, remote_errno=0x7fffffffcfd0) at src/gdb/remote.c:12462
 adapteva#7  0x0000555555c9274c in target_fileio_close (fd=3, target_errno=0x7fffffffcfd0) at src/gdb/target.c:3365
 adapteva#8  0x000055555595a19d in gdb_bfd_iovec_fileio_close (abfd=0x555556b9f8a0, stream=0x555556b11530) at src/gdb/gdb_bfd.c:439
 adapteva#9  0x0000555555e09e3f in opncls_bclose (abfd=0x555556b9f8a0) at src/bfd/opncls.c:599
 adapteva#10 0x0000555555e0a2c7 in bfd_close_all_done (abfd=0x555556b9f8a0) at src/bfd/opncls.c:847
 adapteva#11 0x0000555555e0a27a in bfd_close (abfd=0x555556b9f8a0) at src/bfd/opncls.c:814
 #12 0x000055555595a9d3 in gdb_bfd_close_or_warn (abfd=0x555556b9f8a0) at src/gdb/gdb_bfd.c:626
 #13 0x000055555595ad29 in gdb_bfd_unref (abfd=0x555556b9f8a0) at src/gdb/gdb_bfd.c:715
 #14 0x0000555555ae4730 in objfile::~objfile (this=0x555556515540, __in_chrg=<optimized out>) at src/gdb/objfiles.c:573
 #15 0x0000555555ae955a in std::_Sp_counted_ptr<objfile*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x555556c20db0) at /usr/include/c++/9/bits/shared_ptr_base.h:377
 #16 0x000055555572b7c8 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x555556c20db0) at /usr/include/c++/9/bits/shared_ptr_base.h:155
 #17 0x00005555557263c3 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x555556bf0588, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:730
 #18 0x0000555555ae745e in std::__shared_ptr<objfile, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x555556bf0580, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:1169
 #19 0x0000555555ae747e in std::shared_ptr<objfile>::~shared_ptr (this=0x555556bf0580, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr.h:103
 #20 0x0000555555b1c1dc in __gnu_cxx::new_allocator<std::_List_node<std::shared_ptr<objfile> > >::destroy<std::shared_ptr<objfile> > (this=0x5555564cdd60, __p=0x555556bf0580) at /usr/include/c++/9/ext/new_allocator.h:153
 #21 0x0000555555b1bb1d in std::allocator_traits<std::allocator<std::_List_node<std::shared_ptr<objfile> > > >::destroy<std::shared_ptr<objfile> > (__a=..., __p=0x555556bf0580) at /usr/include/c++/9/bits/alloc_traits.h:497
 #22 0x0000555555b1b73e in std::__cxx11::list<std::shared_ptr<objfile>, std::allocator<std::shared_ptr<objfile> > >::_M_erase (this=0x5555564cdd60, __position=std::shared_ptr<objfile> (expired, weak count 1) = {get() = 0x555556515540}) at /usr/include/c++/9/bits/stl_list.h:1921
 #23 0x0000555555b1afeb in std::__cxx11::list<std::shared_ptr<objfile>, std::allocator<std::shared_ptr<objfile> > >::erase (this=0x5555564cdd60, __position=std::shared_ptr<objfile> (expired, weak count 1) = {get() = 0x555556515540}) at /usr/include/c++/9/bits/list.tcc:158
 #24 0x0000555555b19576 in program_space::remove_objfile (this=0x5555564cdd20, objfile=0x555556515540) at src/gdb/progspace.c:210
 #25 0x0000555555ae4502 in objfile::unlink (this=0x555556515540) at src/gdb/objfiles.c:487
 #26 0x0000555555ae5a12 in objfile_purge_solibs () at src/gdb/objfiles.c:875
 #27 0x0000555555c09686 in no_shared_libraries (ignored=0x0, from_tty=1) at src/gdb/solib.c:1236
 #28 0x00005555559e3f5f in detach_command (args=0x0, from_tty=1) at src/gdb/infcmd.c:2769

Note frame #14:

 #14 0x0000555555ae4730 in objfile::~objfile (this=0x555556515540, __in_chrg=<optimized out>) at src/gdb/objfiles.c:573

That's a dtor, thus noexcept.  That's the reason for the
std::terminate.

The previous patch fixed things such that the exception above isn't
thrown anymore.  However, it's possible that e.g., the remote
connection drops just while a user types "nosharedlibrary", or some
other reason that leads to objfile::~objfile, and then we end up the
same std::terminate problem.

Also notice that frames adapteva#9-adapteva#11 are BFD frames:

 adapteva#9  0x0000555555e09e3f in opncls_bclose (abfd=0x555556bc27e0) at src/bfd/opncls.c:599
 adapteva#10 0x0000555555e0a2c7 in bfd_close_all_done (abfd=0x555556bc27e0) at src/bfd/opncls.c:847
 adapteva#11 0x0000555555e0a27a in bfd_close (abfd=0x555556bc27e0) at src/bfd/opncls.c:814

BFD is written in C and thus throwing exceptions over such frames may
either not clean up properly, or, may abort if bfd is not compiled
with -fasynchronous-unwind-tables (x86-64 defaults that on, but not
all GCC ports do).

Thus frame adapteva#8 seems like a good place to swallow exceptions.  More so
since in this spot we already ignore target_fileio_close return
errors.  That's what this commit does.  Without the previous fix, we'd
see:

 (gdb) detach
 Detaching from program: target:/any/program, process 2197701
 Ending remote debugging.
 [Inferior 1 (process 2197701) detached]
 warning: cannot close "target:/lib64/ld-linux-x86-64.so.2": Remote connection closed

Note it prints a warning, which would still be a regression compared
to GDB 10, if it weren't for the previous fix.

gdb/ChangeLog:
yyyy-mm-dd  Pedro Alves  <pedro@palves.net>

	PR gdb/28080
	* gdb_bfd.c (gdb_bfd_close_warning): New.
	(gdb_bfd_iovec_fileio_close): Wrap target_fileio_close in
	try/catch and print warning on exception.
	(gdb_bfd_close_or_warn): Use gdb_bfd_close_warning.

Change-Id: Ic7a26ddba0a4444e3377b0e7c1c89934a84545d7
olajep pushed a commit to olajep/epiphany-binutils-gdb that referenced this issue Oct 16, 2022
In PR28004 the following warning / Internal error is reported:
...
$ gdb -q -batch \
    -iex "set sysroot $(pwd -P)/repro" \
    ./repro/gdb \
    ./repro/core \
    -ex bt
  ...
 Program terminated with signal SIGABRT, Aborted.
 #0  0x00007ff8fe8e5d22 in raise () from repro/usr/lib/libc.so.6
 [Current thread is 1 (LWP 1762498)]
 adapteva#1  0x00007ff8fe8cf862 in abort () from repro/usr/lib/libc.so.6
 warning: (Internal error: pc 0x7ff8feb2c21d in read in psymtab, \
           but not in symtab.)
 warning: (Internal error: pc 0x7ff8feb2c218 in read in psymtab, \
           but not in symtab.)
  ...
 adapteva#2  0x00007ff8feb2c21e in __gnu_debug::_Error_formatter::_M_error() const \
   [clone .cold] (warning: (Internal error: pc 0x7ff8feb2c21d in read in \
   psymtab, but not in symtab.)

) from repro/usr/lib/libstdc++.so.6
...

The warning is about the following:
- in find_pc_sect_compunit_symtab we try to find the address
  (0x7ff8feb2c218 / 0x7ff8feb2c21d) in the symtabs.
- that fails, so we try again in the partial symtabs.
- we find a matching partial symtab
- however, the partial symtab has a full symtab, so
  we should have found a matching symtab in the first step.

The addresses are:
...
(gdb) info sym 0x7ff8feb2c218
__gnu_debug::_Error_formatter::_M_error() const [clone .cold] in \
  section .text of repro/usr/lib/libstdc++.so.6
(gdb) info sym 0x7ff8feb2c21d
__gnu_debug::_Error_formatter::_M_error() const [clone .cold] + 5 in \
  section .text of repro/usr/lib/libstdc++.so.6
...
which correspond to unrelocated addresses 0x9c218 and 0x9c21d:
...
$ nm -C  repro/usr/lib/libstdc++.so.6.0.29 | grep 000000000009c218
000000000009c218 t __gnu_debug::_Error_formatter::_M_error() const \
  [clone .cold]
...
which belong to function __gnu_debug::_Error_formatter::_M_error() in
/build/gcc/src/gcc/libstdc++-v3/src/c++11/debug.cc.

The partial symtab that is found for the addresses is instead the one for
/build/gcc/src/gcc/libstdc++-v3/src/c++98/bitmap_allocator.cc, which is
incorrect.

This happens as follows.

The bitmap_allocator.cc CU has DW_AT_ranges at .debug_rnglist offset 0x4b50:
...
    00004b50 0000000000000000 0000000000000056
    00004b5a 00000000000a4790 00000000000a479c
    00004b64 00000000000a47a0 00000000000a47ac
...

When reading the first range 0x0..0x56, it doesn't trigger the "start address
of zero" complaint here:
...
      /* A not-uncommon case of bad debug info.
         Don't pollute the addrmap with bad data.  */
      if (range_beginning + baseaddr == 0
          && !per_objfile->per_bfd->has_section_at_zero)
        {
          complaint (_(".debug_rnglists entry has start address of zero"
                       " [in module %s]"), objfile_name (objfile));
          continue;
        }
...
because baseaddr != 0, which seems incorrect given that when loading the
shared library individually in gdb (and consequently baseaddr == 0), we do see
the complaint.

Consequently, we run into this case in dwarf2_get_pc_bounds:
...
  if (low == 0 && !per_objfile->per_bfd->has_section_at_zero)
    return PC_BOUNDS_INVALID;
...
which then results in this code in process_psymtab_comp_unit_reader being
called with cu_bounds_kind == PC_BOUNDS_INVALID, which sets the set_addrmap
argument to 1:
...
      scan_partial_symbols (first_die, &lowpc, &highpc,
                            cu_bounds_kind <= PC_BOUNDS_INVALID, cu);
...
and consequently, the CU addrmap gets build using address info from the
functions.

During that process, addrmap_set_empty is called with a range that includes
0x9c218 and 0x9c21d:
...
(gdb) p /x start
$7 = 0x9989c
(gdb) p /x end_inclusive
$8 = 0xb200d
...
but it's called for a function at DIE 0x54153 with DW_AT_ranges at 0x40ae:
...
    000040ae 00000000000b1ee0 00000000000b200e
    000040b9 000000000009989c 00000000000998c4
    000040c3 <End of list>
...
and neither range includes 0x9c218 and 0x9c21d.

This is caused by this code in partial_die_info::read:
...
            if (dwarf2_ranges_read (ranges_offset, &lowpc, &highpc, cu,
                                    nullptr, tag))
             has_pc_info = 1;
...
which pretends that the function is located at addresses 0x9989c..0xb200d,
which is indeed not the case.

This patch fixes the first problem encountered: fix the "start address of
zero" complaint warning by removing the baseaddr part from the condition.
Same for dwarf2_ranges_process.

The effect is that:
- the complaint is triggered, and
- the warning / Internal error is no longer triggered.

This does not fix the observed problem in partial_die_info::read, which is
filed as PR28200.

Tested on x86_64-linux.

Co-Authored-By: Simon Marchi <simon.marchi@polymtl.ca>

gdb/ChangeLog:

2021-08-06  Simon Marchi  <simon.marchi@polymtl.ca>
	    Tom de Vries  <tdevries@suse.de>

	PR symtab/28004
	* dwarf2/read.c (dwarf2_rnglists_process, dwarf2_ranges_process):
	Fix zero address complaint.

gdb/testsuite/ChangeLog:

2021-08-06  Simon Marchi  <simon.marchi@polymtl.ca>
	    Tom de Vries  <tdevries@suse.de>

	PR symtab/28004
	* gdb.dwarf2/dw2-zero-range-shlib.c: New test.
	* gdb.dwarf2/dw2-zero-range.c: New test.
	* gdb.dwarf2/dw2-zero-range.exp: New file.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants