Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ctrl+C has unusual behaviour while Python waits on input() #1130

Open
cloudrac3r opened this issue Mar 31, 2024 · 4 comments
Open

Ctrl+C has unusual behaviour while Python waits on input() #1130

cloudrac3r opened this issue Mar 31, 2024 · 4 comments

Comments

@cloudrac3r
Copy link
Contributor

I ran this file in these builds of Python:

input("hello! ")

print("looping")
a = 0
for i in range(200_000_000):
    a += 1

print("goodbye")
Build Ctrl+C in loop Ctrl+C in input()
Debian Python 3.11.2
Expected behaviour
Print ^C and traceback Print ^C and traceback
Cosmopolitan A.P.P. 3.6.14+
make o//third_party/python/python.com
Print ^C and traceback Press once: No effect
Press twice: Traceback
cosmo.zip Python 3.11.2
https://cosmo.zip/pub/cosmos/bin/python
Print ^C and traceback Press once: No effect
Press repeatedly: No effect
Press Ctrl+C and Enter: Print ^C and traceback

I tested on Linux and macOS and it seems to be the same in both.

I would expect Ctrl+C to traceback straight away regardless of whether the program is running or waiting for input. Is it possible to change this behaviour in Cosmopolitan Python?

@cloudrac3r
Copy link
Contributor Author

cloudrac3r commented Apr 2, 2024

I found out how to change this behaviour in A.P.P. Since I am fairly intimidated by the codebase and not sure what to do, I thought I'd document the process I took in this comment to help other newcomers build up enough confidence and learn the skills to make their own changes.

Click here to read the process Here's the thought process I went through:

Since the A.P.P. source tree is different to the Cosmos Python source tree, they probably have different code for Ctrl+C handling. My findings for one build will probably not apply to the other build, so I should pick one build to investigate and stick to it. I'll pick A.P.P. so that I only have to investigate code that's inside the monorepo. This means I can search the entire codebase at once with ag, and if ag shows no results, it's not there.

Running A.P.P with --strace showed me that just before the call write(1, u"hello! ", 7), the signal produced by Ctrl+C SIGINT is connected to the linenoiseOnInt function. This function is not in the A.P.P. source tree. A web search informs me that linenoise is an input line editor. Maybe the function is part of linenoise and linenoise is part of Python? Wait, Cosmopolitan forked Python, maybe they forked linenoise too. Yep, Cosmopolitan has forked linenoise. Yep, ag tells me that linenoise is in the A.P.P. source tree. (It isn't in the upstream Python source tree.)

Well, regardless, since the default SIGINT handler does what I want but the input SIGINT handler doesn't, the affecting code can only be in linenoise, so I've narrowed down my investigation to that.

Upon seeing the big list of changes at the top of Cosmopolitan's linenoise, it's clear that it's been carefully designed to have a lot of features. I start to think that the doubled Ctrl+C behaviour may be deliberately added as a feature. In the Shortcuts section at the top of the file, it says CTRL-C CTRL-C: INTERRUPT PROCESS, so it is deliberate. Investigation done.

If I wanted to be really sure this behaviour isn't anything to do with Python, I could build a short C program that calls Cosmopolitan linenoise for input and I could try Ctrl+C there to see what happens. I decide not to do this because I'm not confident enough in my ability to write C, but the idea was still good.

Now I want to figure out why and how linenoise does it so I can turn it off. I check the blame for that line and find a commit which changed the comment from single Ctrl+C to doubled Ctrl+C. Comments aren't code - let's see if this code also changed in this commit. This code was added:

      case CTRL('C'):
        if (linenoiseRead(l.ifd, seq, sizeof(seq), &l) != 1) break;
        switch (seq[0]) {
          CASE(CTRL('C'), linenoiseEditInterrupt(&l));
          CASE(CTRL('B'), linenoiseEditBarf(&l));
          CASE(CTRL('S'), linenoiseEditSlurp(&l));
          default:
            break;
        }
        break;

It looks like after Ctrl+C it reads a second time and if it sees a second Ctrl+C it calls linenoiseEditInterrupt. (And there's also some shortcuts on Ctrl+C Ctrl+B and Ctrl+C Ctrl+C called "barf" and "slurp" - a web search tells me these are clones of Emacs Paredit features.) In theory I can replace the if body with just linenoiseEditInterrupt(l) and it'll do what I want (and unbind barf and slurp). But I'm viewing an old diff. I need to find out where this code is in the current commit. It's pretty easy to search for it on line 2044:

      // handle (1) emacs keyboard combos
      //        (2) otherwise sigint exit
      if (seq[0] == CTRL('C')) {
        DUFF_ROUTINE_READ(3);
        if (rc == 1) {
          switch (seq[0]) {
            CASE(CTRL('C'), linenoiseEditInterrupt(l));
            CASE(CTRL('B'), linenoiseEditBarf(l));
            CASE(CTRL('S'), linenoiseEditSlurp(l));
            default:
              goto HandleRead;
          }
          continue;
        } else {
          goto HandleRead;
        }
      }

OK, let's try changing the code.

diff --git a/third_party/linenoise/linenoise.c b/third_party/linenoise/linenoise.c
index cf673ba1f..bcd5080fe 100644
--- a/third_party/linenoise/linenoise.c
+++ b/third_party/linenoise/linenoise.c
@@ -2044,19 +2044,7 @@ ssize_t linenoiseEdit(struct linenoiseState *l, const char *prompt, char **obuf,
       // handle (1) emacs keyboard combos
       //        (2) otherwise sigint exit
       if (seq[0] == CTRL('C')) {
-        DUFF_ROUTINE_READ(3);
-        if (rc == 1) {
-          switch (seq[0]) {
-            CASE(CTRL('C'), linenoiseEditInterrupt(l));
-            CASE(CTRL('B'), linenoiseEditBarf(l));
-            CASE(CTRL('S'), linenoiseEditSlurp(l));
-            default:
-              goto HandleRead;
-          }
-          continue;
-        } else {
-          goto HandleRead;
-        }
+        linenoiseEditInterrupt(l);
       }

       // handle (1) unpausing terminal after ctrl-s

Now I have to rebuild Python. The wiki tells me how - make o//third_party/python/python.com. The output looks like it rebuilt some things. I run my program with A.P.P. again. To my astonishment, it worked. A single Ctrl+C now causes KeyboardInterrupt during input().


Conclusion: In A.P.P., this behaviour is deliberate because Python's input reader has been replaced with linenoise, and linenoise has special code that maps Ctrl+C Ctrl+C = SIGINT. Changing the shortcut in linenoise to single Ctrl+C = SIGINT makes it work as I expected.

I'm tempted to open a pull request for this keyboard shortcut change, but I also removed the barf and slurp shortcuts and left them as dead code, so first I'll need to find some new key chords to put those shortcuts on.

Next I will investigate the Cosmos Python behaviour, since as far as I can tell from the source tree, it doesn't have linenoise so the fix might be a bit different.

@cloudrac3r
Copy link
Contributor Author

For Cosmos Python it turns out import readline at the start of a script will then handle Ctrl+C correctly for all future input() calls, but I'm still trying to figure out why.

I think it would be good to have Cosmos Python import readline by default. I think it doesn't do this because readline isn't a supported module on Windows (because it's GNU Readline) but I bet it would work in APE format.

@cloudrac3r
Copy link
Contributor Author

In Cosmos Python, Ctrl+C doesn't work as expected because Cosmopolitan Libc fgets has different behaviour to GNU Libc fgets. The unexpected behaviour applies not only to Python, but also to simple C programs built with Cosmopolitan Libc. Here is a simple example:

#include <stdio.h>
#include <signal.h>

static void handler(int signo) {
  fprintf(stderr, "SIGINT");
  fflush(stderr);
}

int main(void) {
  // Call `handler` when SIGINT received (when Ctrl+C pressed)
  struct sigaction action;
  action.sa_handler = handler;
  sigaction(SIGINT, &action, NULL);

  // Get some input with `fgets`
  char buf[20];
  printf("hello! ");
  fflush(stdout);
  fgets(buf, 20, stdin);

  // End program after getting input
  return 0;
}
> gcc ctrl-c.c -o ctrl-c
> ./ctrl-c
hello! ^CSIGINT                            <-- program ends straight away
> cosmocc ctrl-c.c -o ctrl-c
> ./ctrl-c
hello! ^CSIGINT^CSIGINT^CSIGINT^CSIGINT    <-- program doesn't end

Cosmopolitan's behaviour is easy to visualise with --ftrace:

FUN 678924 678924         18'594'286     512   &fgets
FUN 678924 678924         18'597'720     560     &flockfile
FUN 678924 678924         18'601'106     576       &pthread_mutex_lock
FUN 678924 678924         18'604'560     624         &gettid
FUN 678924 678924         18'607'960     560     &fgets_unlocked
FUN 678924 678924         18'611'433     640       &getc_unlocked
FUN 678924 678924         18'614'926     672         &fread_unlocked
FUN 678924 678924         18'646'693     784           &readv
FUN 678924 678924         18'650'846     832             &sys_readv
^CFUN 678924 678924        340'276'246   2'272               &handler
FUN 678924 678924        340'284'673   2'304                 &fwrite
FUN 678924 678924        340'289'180   2'352                   &flockfile
FUN 678924 678924        340'293'073   2'368                     &pthread_mutex_lock
FUN 678924 678924        340'305'420   2'416                       &gettid
FUN 678924 678924        340'309'420   2'352                   &fwrite_unlocked
FUN 678924 678924        340'313'106   2'448                     &__robust_writev
FUN 678924 678924        340'317'040   2'496                       &writev
FUN 678924 678924        340'322'193   2'544                         &sys_writev
SIGINTFUN 678924 678924        340'329'620   2'352                   &funlockfile
FUN 678924 678924        340'333'053   2'368                     &pthread_mutex_unlock
FUN 678924 678924        340'336'713   2'400                       &gettid
FUN 678924 678924        340'340'220   2'304                 &fflush
FUN 678924 678924        340'343'740   2'336                   &flockfile
FUN 678924 678924        340'347'146   2'352                     &pthread_mutex_lock
FUN 678924 678924        340'350'633   2'400                       &gettid
FUN 678924 678924        340'354'106   2'336                   &fflush_unlocked
FUN 678924 678924        340'357'626   2'384                     &__fflush_impl
FUN 678924 678924        340'361'086   2'336                   &funlockfile
FUN 678924 678924        340'364'506   2'352                     &pthread_mutex_unlock
FUN 678924 678924        340'367'966   2'384                       &gettid
FUN 678924 678924        340'372'360     832             &__errno_location
FUN 678924 678924        340'404'333     640       &ferror_unlocked
FUN 678924 678924        340'409'293     672         &__errno_location
FUN 678924 678924        340'412'866     640       &getc_unlocked
FUN 678924 678924        340'416'380     672         &fread_unlocked
FUN 678924 678924        340'420'086     784           &readv
FUN 678924 678924        340'423'706     832             &sys_readv
^CFUN 678924 678924        659'664'353   2'272               &handler
FUN 678924 678924        659'673'213   2'304                 &fwrite
FUN 678924 678924        659'677'393   2'352                   &flockfile
FUN 678924 678924        659'681'333   2'368                     &pthread_mutex_lock
FUN 678924 678924        659'685'420   2'416                       &gettid
FUN 678924 678924        659'689'486   2'352                   &fwrite_unlocked
FUN 678924 678924        659'703'493   2'448                     &__robust_writev
FUN 678924 678924        659'708'540   2'496                       &writev
FUN 678924 678924        659'713'073   2'544                         &sys_writev
SIGINTFUN 678924 678924        659'721'413   2'352                   &funlockfile
FUN 678924 678924        659'724'886   2'368                     &pthread_mutex_unlock
FUN 678924 678924        659'728'493   2'400                       &gettid
FUN 678924 678924        659'732'006   2'304                 &fflush
FUN 678924 678924        659'735'540   2'336                   &flockfile
FUN 678924 678924        659'738'986   2'352                     &pthread_mutex_lock
FUN 678924 678924        659'742'453   2'400                       &gettid
FUN 678924 678924        659'745'866   2'336                   &fflush_unlocked
FUN 678924 678924        659'749'440   2'384                     &__fflush_impl
FUN 678924 678924        659'752'946   2'336                   &funlockfile
FUN 678924 678924        659'756'386   2'352                     &pthread_mutex_unlock
FUN 678924 678924        659'759'826   2'384                       &gettid
FUN 678924 678924        659'764'620     832             &__errno_location
FUN 678924 678924        659'769'326     640       &ferror_unlocked
FUN 678924 678924        659'772'960     672         &__errno_location
FUN 678924 678924        659'776'440     640       &getc_unlocked
FUN 678924 678924        659'779'926     672         &fread_unlocked
FUN 678924 678924        659'783'620     784           &readv
FUN 678924 678924        659'787'500     832             &sys_readv

I found out that this happens because in fgets when Ctrl+C is pressed and the signal handler is triggered, fgetc_unlocked will return with errno = EINTR. Cosmopolitan has code to stay in fgets and keep calling fgetc_unlocked to read a whole line when it sees EINTR, but GNU Libc doesn't do this.

I edited Cosmopolitan to change continue to break

if (ferror_unlocked(f) == EINTR) {
continue;
} else {

and rebuilt Cosmos Python. Ctrl+C in Python input() in my new build now works like I expected.

I don't know what the C standard says fgets is supposed to do here, but either way, I thought @jart should be informed about this in case this deviation from GNU is undesirable. In my view, I found it really confusing when Python wasn't exiting when I pressed Ctrl+C. I found it so confusing that I went to the trouble of investigating this far. But I also see that this fgets behaviour could be useful when writing C programs, because it would let me easily read an entire line without being interrupted by other non-user-initiated events like alarm()s.

@cloudrac3r
Copy link
Contributor Author

After I reading about SA_RESTART and running cosmopolitan/examples/ctrlc.c it looks like SA_RESTART in read is implemented correctly.

In my previous comment, I wrote:

I also see that this fgets behaviour could be useful when writing C programs, because it would let me easily read an entire line without being interrupted by other non-user-initiated events like alarm()s.

Turns out this is what the SA_RESTART flag is for. read works as expected with/without this flag, but the EINTR checker in the loop in fgets basically makes it act as if SA_RESTART is always set.

I'm pretty sure this Cosmopolitan Libc fgets EINTR behaviour is a bug, so I'm going to create a pull request to change it.

cloudrac3r added a commit to cloudrac3r/cosmopolitan that referenced this issue Apr 23, 2024
See investigation in jart#1130.

fgets internally calls readv. readv is a @restartable function that
understands SA_RESTART. If SA_RESTART is set, readv already handles
restarting the system call and eventually the string is transparently
returned to the fgets caller.

When SA_RESTART is not set, -1 EINTR should bubble up to the fgets
caller. However, until this commit, fgets itself would detect EINTR and
keep retrying until it read an entire line.

This commit fixes this behaviour so that fgets understands SA_RESTART.

I hereby waive copyright to this commit and place it in public domain.

Signed-off-by: Cadence Ember <cadence@disroot.org>
cloudrac3r added a commit to cloudrac3r/cosmopolitan that referenced this issue Apr 26, 2024
See investigation in jart#1130.

fgets internally calls readv. readv is a @restartable function that
understands SA_RESTART. If SA_RESTART is set, readv already handles
restarting the system call and eventually the string is transparently
returned to the fgets caller.

When SA_RESTART is not set, -1 EINTR should bubble up to the fgets
caller. However, until this commit, fgets itself would detect EINTR and
keep retrying until it read an entire line.

This commit fixes this behaviour so that fgets understands SA_RESTART.

I hereby assign copyright for this commit to Justine Tunney.

Signed-off-by: Cadence Ember <cadence@disroot.org>
cloudrac3r added a commit to cloudrac3r/cosmopolitan that referenced this issue Apr 26, 2024
See investigation in jart#1130.

fgets internally calls readv. readv is a @restartable function that
understands SA_RESTART. If SA_RESTART is set, readv already handles
restarting the system call and eventually the string is transparently
returned to the fgets caller.

When SA_RESTART is not set, -1 EINTR should bubble up to the fgets
caller. However, until this commit, fgets itself would detect EINTR and
keep retrying until it read an entire line.

This commit fixes this behaviour so that fgets understands SA_RESTART.

I hereby assign copyright for this commit to Justine Tunney.

Signed-off-by: Cadence Ember <cadence@disroot.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant