Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[iTerm2, Bash-4.4] shell becomes unresponsive on resize #48

Closed
killermoehre opened this issue Apr 15, 2020 · 16 comments
Closed

[iTerm2, Bash-4.4] shell becomes unresponsive on resize #48

killermoehre opened this issue Apr 15, 2020 · 16 comments

Comments

@killermoehre
Copy link

killermoehre commented Apr 15, 2020

ble version: 0.4.0-devel2+5327f5d
Bash version: 5.0.16(1)-release,x86_64-apple-darwin19.3.0

If I resize a terminal window without typing at least one character into it, the shell won't accept any commands after the resize. The corresponding bash process runs at 100% CPU usage and is only killable by SIGKILL(9). This is independent of the used terminal emulator.

How to reporoduce:

  1. open a terminal window
  2. resize it
  3. see in another terminal the CPU usage rise

PS1: \[\]\[\033]0;\u@\H:\W\007\]\[$(iterm2_prompt_mark)\]\$ \[\]

@akinomyoga
Copy link
Owner

Thank you for the report! But it doesn't reproduce with my environment. I tried ble-0.4.0-devel2+5327f5d with 5.0.11(1)-release,x86_64-redhat-linux-gnu, 4.4.12(3)-release,x86_64-unknown-cygwin and 5.0.16(2)-release,x86_64-pc-linux-gnu (bash devel branch).

PS1: [][\033]0;\u@\H:\W\007][$(iterm2_prompt_mark)]$ []

Question 1: I don't have macOS, and there is no iterm2_prompt_mark. I would like to have a result after the expansion. Also, I guess there would be some escape sequences between \[\] which are not printed on the terminal. Could you give me a result of the following command?

$ cat -v <<< "${PS1@P}"

Question 2: Is this problem related to your PS1? For example, does this problem reproduce with a simpler prompt setting something like PS1='\$ '?

@killermoehre
Copy link
Author

killermoehre commented Apr 15, 2020

Thank you for the report! But it doesn't reproduce with my environment. I tried ble-0.4.0-devel2+5327f5d with 5.0.11(1)-release,x86_64-redhat-linux-gnu, 4.4.12(3)-release,x86_64-unknown-cygwin and 5.0.16(2)-release,x86_64-pc-linux-gnu (bash devel branch).

PS1: [][\033]0;\u@\H:\W\007][$(iterm2_prompt_mark)]$ []

Question 1: I don't have macOS, and there is no iterm2_prompt_mark. I would like to have a result after the expansion. Also, I guess there would be some escape sequences between \[\] which are not printed on the terminal. Could you give me a result of the following command?

$ cat -v <<< "${PS1@P}"
^A^[]133;D;0^G^B^A^[]0;cXXXXXXX@apfelkuchen:~^G^B^A^[]133;A^G^B$ ^A^[]133;B^G^B

The username is redacted.

Question 2: Is this problem related to your PS1? For example, does this problem reproduce with a simpler prompt setting something like PS1='\$ '?

No, it happens regardless of the used PS1.

@killermoehre
Copy link
Author

killermoehre commented Apr 15, 2020

I was able to reproduce the issue on my linux machine. xfce4-terminal, VTE3-based

$ echo $BLE_VERSION 
0.4.0-devel2+5327f5d
$ echo $BASH_VERSION,$MACHTYPE 
5.0.16(1)-release,x86_64-pc-linux-gnu
$ cat <<< "${PS1@P}"
$

@akinomyoga
Copy link
Owner

akinomyoga commented Apr 15, 2020

Thank you for the information!

^A^[]133;D;0^G^B^A^[]0;cXXXXXXX@apfelkuchen:~^G^B^A^[]133;A^G^B$ ^A^[]133;B^G^B

This corresponds to the following setting. I tried but the problem wasn't reproduced.

PS1='\[\e]133;D;0\a\]\[\e]0;cXXXXXXX@apfelkuchen:~\a\]\[\e]133;A\a\]$ \[\e]133;B\a\]'

No, it happens regardless of the used PS1.

OK. Thank you. Then the problem should be somewhere else.

Question 3: Is the problem reproduced with a pure environment where only ble.sh is loaded in Bash? You can try this by the following steps.

  1. Open terminal
  2. Type the following commands
$ bash --norc
$ source /path/to/ble.sh
  1. Resize before any inputs

@akinomyoga
Copy link
Owner

I was able to reproduce the issue on my linux machine. xfce4-terminal, VTE3-based

Thank you. I newly built bash-5.0.16 now and tried xfce4-terminal in Linux machine, but I still cannot get the behavior.

[murase@chatoyancy 0 ~]$ echo $BLE_VERSION
0.4.0-devel2+5327f5d
[murase@chatoyancy 0 ~]$ echo $BASH_VERSION,$MACHTYPE
5.0.16(1)-release,x86_64-pc-linux-gnu
[murase@chatoyancy 0 ~]$ xfce4-terminal --version
xfce4-terminal 0.8.8 (Xfce 4.14)
```

@killermoehre
Copy link
Author

Question 3: Is the problem reproduced with a pure environment where only ble.sh is loaded in Bash? You can try this by the following steps.

  1. Open terminal
  2. Type the following commands
$ bash --norc
$ source /path/to/ble.sh
  1. Resize before any inputs

I can't reproduce it that way. I also tried xfce4-terminal --disable-server --rcfile ~/.local/share/blesh/ble.sh, but this works as expected, too.
You find my bash config at https://github.com/killermoehre/personal-bash-config, maybe you know better what you are searching for.

@akinomyoga
Copy link
Owner

akinomyoga commented Apr 15, 2020

Thank you! I'll check your config from now. I have another question. In the case that the CPU usage 100% is caused in a shell-script infinite loop, I want to take a shell stack trace of the infinite loop.

Question 4: Can you try the following steps to try to get a stack trace?

  1. Add a line before ble-attach in .bashrc as follows:
# bashrc

[[ $- == *i* ]] && source /path/to/blesh/ble.sh --noattach

# ...

trap 'ble-stackdump >/dev/tty' USR1 # <-- ADD THIS LINE
((_ble_bash)) && ble-attach
  1. Open a terminal (Terminal #1)
  2. Resize Terminal #1
  3. Check the process ID of CPU 100% Bash in another terminal (Terminal #2).
  4. Execute the following command several times in Terminal #2 (Please replace <PID> with the process ID obtained in Step 3).
$ kill -USR1 <PID>
  1. Is something output in Terminal #1? If there is something, it should look like the following. Could you provide me the contents?
stackdump:
  @ /home/murase/.mwg/src/ble.sh/out/ble.sh:1 (ble-stackdump)
  @ something:123 (something)
  @ something:321 (something)
  @ something:123 (something)

Edit: There will be filename paths in the output. If you want to hide your user name, please feel free to replace them!

@killermoehre
Copy link
Author

killermoehre commented Apr 15, 2020

Thank you! I'll check your config from now. I have another question. In the case that the CPU usage 100% is caused in a shell-script infinite loop, I want to take a shell stack trace of the infinite loop.

Thank you very much for your time and effort.

Question 4: Can you try the following steps to try to get a stack trace?

  1. Add a line before ble-attach in .bashrc as follows:
# bashrc

[[ $- == *i* ]] && source /path/to/blesh/ble.sh --noattach

# ...

trap 'ble-stackdump >/dev/tty' USR1 # <-- ADD THIS LINE
((_ble_bash)) && ble-attach

Done in ~/.local/share/bash/zzz-attach-ble.sh.

  1. Open a terminal (Terminal #1)
  2. Resize Terminal #1
  3. Check the process ID of CPU 100% Bash in another terminal (Terminal #2).
  4. Execute the following command several times in Terminal #2 (Please replace <PID> with the process ID obtained in Step 3).
$ kill -USR1 <PID>
  1. Is something output in Terminal #1? If there is something, it should look like the following. Could you provide me the contents?
stackdump:
  @ /home/murase/.mwg/src/ble.sh/out/ble.sh:1 (ble-stackdump)
  @ something:123 (something)
  @ something:321 (something)
  @ something:123 (something)

Edit: There will be filename paths in the output. If you want to hide your user name, please feel free to replace them!

Unfortunately there is no output at all. Just a process running on 100% CPU not reacting to any signals but SIGKILL and SIGSTOP.

@akinomyoga
Copy link
Owner

Unfortunately there is no output at all.

OK... Thank you for trying out the stack trace! You can now remove the added line "trap 'ble-stackdump >/dev/tty' USR1".

I'm now checking your configs. Maybe the treatment of PROMPT_COMMAND in ble.sh does something wrong. I actually recently changed the treatment of PROMPT_COMMAND in ble.sh. You are now using the commit 0.4.0-devel2+5327f5d. I want to check if the problem is reproduced with the commit f1a2818.

Question 5: Can you check if the problem persists after the following steps?

  1. Build and install f1a2818
$ cd /path/to/blesh-repository
$ git checkout f1a2818
$ make
$ make INSDIR=~/.local/share/blesh install
  1. Open a terminal
  2. Resize
  3. Check CPU usage.

Note: if you have finished the check, you can restore the ble.sh repository state by the following commands:

$ cd /path/to/blesh-repository
$ git checkout master
$ make
$ make INSDIR=~/.local/share/blesh install

@akinomyoga
Copy link
Owner

OK, I could reproduce the problem.

@akinomyoga
Copy link
Owner

akinomyoga commented Apr 15, 2020

It turned out that it is caused by a bug of Bash 4.4+. When return without argument is used in the context of signal handlers, the function always succeeds regardless of the exit status of the previous command. This is a terrible bug... (Seems to be an intended change?)

$ test_command='f1() { false; return; }; trap "f1 && echo unexpected" WINCH; kill -WINCH 0'
$ bash-2.05b -c "$test_command"
$ bash-3.1 -c "$test_command"
$ bash-3.2 -c "$test_command"
$ bash-4.0 -c "$test_command"
$ bash-4.1 -c "$test_command"
$ bash-4.2 -c "$test_command"
$ bash-4.3 -c "$test_command"
$ bash-4.4 -c "$test_command"
unexpected
$ bash-5.0 -c "$test_command"
unexpected
$ bash-dev -c "$test_command"
unexpected

Edit: Details on Bash change

I was checking the source code of Bash. The related section is if (list == 0) statement in the function get_exitstat in builtins/common.c. The code has been changed in the following way:

$ git blame 7117c2d22 builtins/common.c
7117c2d22 (Jari Aalto 2002-07-17 14:10:11 +0000 496)   if (list == 0)
7117c2d22 (Jari Aalto 2002-07-17 14:10:11 +0000 497)     return (last_command_exit_value);

$ git blame 939d190e0 builtins/common.c
7117c2d22 (Jari Aalto 2002-07-17 14:10:11 +0000 497)   if (list == 0)
939d190e0 (Chet Ramey 2014-03-21 14:15:01 -0400 498)     {
939d190e0 (Chet Ramey 2014-03-21 14:15:01 -0400 499)       if (this_shell_builtin == return_builtin && running_trap)
939d190e0 (Chet Ramey 2014-03-21 14:15:01 -0400 500)          return (trap_saved_exit_value);
939d190e0 (Chet Ramey 2014-03-21 14:15:01 -0400 501)       return (last_command_exit_value);
939d190e0 (Chet Ramey 2014-03-21 14:15:01 -0400 502)     }

$ git blame e2f12fdf5 builtins/common.c
7117c2d22 (Jari Aalto 2002-07-17 14:10:11 +0000 497)   if (list == 0)
939d190e0 (Chet Ramey 2014-03-21 14:15:01 -0400 498)     {
e2f12fdf5 (Chet Ramey 2014-04-07 10:46:02 -0400 499)       /* If we're not running the DEBUG trap, the return builtin, when not
e2f12fdf5 (Chet Ramey 2014-04-07 10:46:02 -0400 500)           given any arguments, uses the value of $? before the trap ran.  If
e2f12fdf5 (Chet Ramey 2014-04-07 10:46:02 -0400 501)           given an argument, return uses it.  This means that the trap can't
e2f12fdf5 (Chet Ramey 2014-04-07 10:46:02 -0400 502)           change $?.  The DEBUG trap gets to change $?, though, since that is
e2f12fdf5 (Chet Ramey 2014-04-07 10:46:02 -0400 503)           part of its reason for existing, and because the extended debug mode
e2f12fdf5 (Chet Ramey 2014-04-07 10:46:02 -0400 504)           does things with the return value. */
e2f12fdf5 (Chet Ramey 2014-04-07 10:46:02 -0400 505)       if (this_shell_builtin == return_builtin && running_trap > 0 && running_trap != DEBUG_TRAP+1)
939d190e0 (Chet Ramey 2014-03-21 14:15:01 -0400 506)           return (trap_saved_exit_value);
939d190e0 (Chet Ramey 2014-03-21 14:15:01 -0400 507)       return (last_command_exit_value);
939d190e0 (Chet Ramey 2014-03-21 14:15:01 -0400 508)     }

The related change logs of Bash are as follows:

           3/11
           ----

builtins/common.c
  - get_exitstat: when running `return' in a trap action, and it is not
    supplied an argument, use the saved exit status in
    trap_saved_exit_value.  Fixes Posix problem reported by
    Eduardo A. Bustamante L坦pez <dualbus@gmail.com>


           3/18
           ----

builtins/common.c
  - get_exitstat: update fix of 3/11 to allow the DEBUG trap to use the
    current value of $? instead of the value it had before the trap
    action was run.  This is one reason the DEBUG trap exists, and
    extended debug mode uses it.  Might want to do this only in Posix
    mode

The changes have been introduced by the following discussion

In the discussion, the POSIX is quoted:

POSIX XCU 2.14 Special Built-In Utilities - return - EXIT STATUS

The value of the special parameter '?' shall be set to n, an unsigned decimal integer, or to the exit status of the last command executed if n is not specified. If n is not an unsigned decimal integer, or is greater than 255, the results are unspecified. When return is executed in a trap action, the last command is considered to be the command that executed immediately preceding the trap action.

The interpretation of this part "When return is executed in a trap action" is ambiguous. The original reporter Eduardo A. Bustamante López assumed a trap action to be the argument passed to the trap builtin so that returns in the argument is affected. But the Bash maintainer Chet seems to had assumed a trap action to be the entire execution process of the trap handler so that every return in a function call tree is affected. In fact, the interpretation varies among the shells:

$ test_command='f1() { false; return; }; f2() { f1 && echo XXX; }; trap f2 WINCH; kill -WINCH 0'
$ mksh -c "$test_command"
unexpected
$ ksh -c "$test_command"
unexpected
$ yash -c "$test_command"
unexpected
$ ash -c "$test_command"
$ dash -c "$test_command"
$ zsh -c "$test_command"
$ busybox sh -c "$test_command"
$ posh -c "${test_command//WINCH/INT}"
$

@killermoehre
Copy link
Author

Do you know that you are in fact awesome?
But now the question is: what is the correct behavior here and how can it be fixed?

@akinomyoga
Copy link
Owner

what is the correct behavior here and how can it be fixed?

I don't know why the POSIX defines the behavior of return in a trap action in such a way, but I guess POSIX wanted to provide a way to preserve the value of $? on the start of the trap action. If that is the case, the current Bash behavior that every return in the function-call tree is affected is unreasonable. Only return at the top level of the trap action should be affected.

I think I'm going to report it in bug-bash mailing list, but even if the behavior would be changed in Bash 5.1, the current behavior of Bash-4.4 and 5.0 is unlikely to be modified. So as far as ble.sh supports those versions, I have to work around this problem. I decided to specify an explicit argument for every return aa09d15. We cannot use return without arguments in Bash-4.4 and 5.0 for this problem.

@killermoehre
Copy link
Author

aa09d15 works for me as expected. Thank you very much.

@akinomyoga
Copy link
Owner

Thank you very much for all your report and testing!

@akinomyoga
Copy link
Owner

@akinomyoga akinomyoga changed the title shell becomes unresponsive on resize [iterm2, bash-4.4] shell becomes unresponsive on resize May 17, 2020
@akinomyoga akinomyoga changed the title [iterm2, bash-4.4] shell becomes unresponsive on resize [iTerm2, Bash-4.4] shell becomes unresponsive on resize May 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants