ksh93: random behaviour of read -n <nchar> for multi-byte characters. #22
I seem to be getting some rather odd behavior here (test script: read-n_bytes_or_characters.sh).
In this case, ksh seems to get it right:
But if I switch it back to "echo '€€€€€€€€' | read ..." then it fails again.
2017-05-22 09:46:36 -0700, zakukai:
[...]
It seems to go wrong when the input is a socket,
tty, or pipe, but it does OK when the input is a file.
[...]
The fact that we get a different code path when the input comes
from a regular file doesn't surprise me.
When getting input from a seekable file (like a regular file and
some device files), it's easier. To read the requested number of
characters, ksh93 can read a large block, find out how many
bytes those characters occupy and seek back after that.
For a non-seekable file like a pipe/socket/tty device, it
needs to read one byte at a time to make sure it doesn't read
past the requested number of characters.
I suppose that's when it gets confused.
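The seekable-file strategy described above can be sketched in C as follows. This is not ksh93's code (ksh93 does its I/O through the sfio library); it is a minimal illustration that assumes UTF-8 input and decodes sequence lengths by hand instead of calling the locale-aware mbrlen(). The function name read_nchars_seekable is hypothetical.

```c
#include <stdio.h>      /* tmpfile(), fileno(): a convenient seekable fd for trying this out */
#include <sys/types.h>
#include <unistd.h>

/* Length of a UTF-8 sequence judged from its lead byte.
 * Assumption: UTF-8 input; a stray or invalid byte counts as one character. */
static size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Seekable-input strategy: read one large block, count how many bytes the
 * first n characters occupy, then lseek() back over the surplus so the next
 * reader starts right after the n-th character.
 * Returns the number of bytes stored in buf (NUL-terminated), or -1 on error.
 * Simplification: assumes the n characters fit in bufsz - 1 bytes. */
ssize_t read_nchars_seekable(int fd, size_t n, char *buf, size_t bufsz)
{
    if (bufsz == 0)
        return -1;
    ssize_t got = read(fd, buf, bufsz - 1);    /* one large read */
    if (got < 0)
        return -1;

    size_t used = 0, chars = 0;
    while (chars < n && used < (size_t)got) {
        size_t len = utf8_seq_len((unsigned char)buf[used]);
        if (used + len > (size_t)got)
            break;                              /* character cut off by the block end */
        used += len;
        chars++;
    }

    /* Give back the bytes that were read but not consumed. */
    if ((size_t)got > used &&
        lseek(fd, -(off_t)((size_t)got - used), SEEK_CUR) == (off_t)-1)
        return -1;

    buf[used] = '\0';
    return (ssize_t)used;
}
```

Writing four € characters and a newline to a tmpfile(), rewinding, and asking for 3 characters returns 9 bytes and leaves the file offset at 9, so a following read starts at the fourth €. A complete implementation would also keep reading when the first block ends in the middle of a character.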
--
Stephane
I haven't quite nailed it down yet, but it looks like it comes down to this line:
Basically, when using -n (that's N_FLAG, as opposed to -N, which is NN_FLAG), the code loops. On each iteration there are (x) characters remaining, so the implementation reads (x) bytes (since each character must take at least one byte). Then the newly completed multi-byte characters are counted (starting from the pointer (up) and extending to the pointer (cur)) to determine how many characters will be left on the next iteration. Usually some extra bytes are read in that are carried over to the next iteration. But when the end of the last read coincides with a character boundary, it hits an edge case:

I'll try to work out the parts I'm not quite understanding yet and put a patch together.
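The -n loop described above can be modelled as follows. This is a hypothetical sketch of the technique, not the ksh93 source or a patch: the names read_nchars_stream, len, done and left are made up (done and len play roughly the roles of the pointers up and cur), UTF-8 input is assumed, and the boundary case is handled by simply resuming with the next read when no bytes are carried over.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Length of a UTF-8 sequence judged from its lead byte (assumes valid UTF-8). */
static size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Non-seekable input (pipe, socket, tty): nothing can be pushed back, so never
 * read more bytes than could still belong to the requested characters.
 * With `left` characters still wanted, reading `left` bytes is always safe,
 * because every character is at least one byte. Bytes of a character that is
 * still incomplete stay in buf and are completed by the next read.
 * Returns the number of bytes of complete characters in buf (NUL-terminated),
 * or -1 on error. Simplification: the n characters must fit in bufsz - 1 bytes. */
ssize_t read_nchars_stream(int fd, size_t n, char *buf, size_t bufsz)
{
    size_t len = 0;   /* bytes read so far */
    size_t done = 0;  /* bytes that form complete characters */
    size_t left = n;  /* characters still wanted */

    if (bufsz == 0)
        return -1;
    while (left > 0) {
        if (len + left > bufsz - 1)
            return -1;                  /* would overflow buf */
        ssize_t r = read(fd, buf + len, left);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (r == 0)
            break;                      /* EOF: drop any incomplete trailing character */
        len += (size_t)r;

        /* Count the characters completed by this read. If the read ended exactly
         * on a character boundary, done == len afterwards and nothing is carried. */
        while (left > 0 && done < len) {
            size_t clen = utf8_seq_len((unsigned char)buf[done]);
            if (done + clen > len)
                break;                  /* incomplete: carry these bytes into the next read */
            done += clen;
            left--;
        }
    }
    buf[done] = '\0';
    return (ssize_t)done;
}
```

On a pipe holding eight € characters, a call with n = 1 returns 3 bytes after three 1-byte reads, and a following call with n = 3 returns the next 9 bytes. Because a partially read character always still needs at least one more byte, asking for left bytes can never pull in bytes that lie past the requested characters.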
Ksh uses sockets instead of real pipes to implement pipes in the shell. This has caused issues on multiple occasions, for example:
I tried removing this code. It fixes the above issue, but it caused some of the
should always output
Regarding a fix for this bug, you can see my experiments in this branch.
Of course, some of you remember why KSH had to use sockets on some platforms: KSH occasionally needs to read only up to a newline character, so it needs to PEEK into the input byte stream. Although older SysV-type Unix systems allow PEEKs on pipes, many newer systems (Linux?) do not. So KSH has to resort to sockets to get the PEEK capability on platforms that do not support PEEK on pipes. What KSH does nowadays on each platform, exactly, I do not know. I would like to think that it only resorts to sockets for shell "pipes" when needed, but I do not know if this is still the case (for all I know, it might now be using sockets on all platforms).
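For reference, this is what the PEEK capability looks like with a stream socket. It is a hedged sketch, not ksh93's implementation (which goes through sfio): recv() with MSG_PEEK returns queued bytes without removing them, so the reader can look for the newline and then consume exactly up to it. recv() is defined only for sockets, which is why this trick does not carry over to a pipe descriptor. The function name read_upto_newline is hypothetical.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>      /* write(), close(): for trying it out on a socketpair() */

/* Read from a stream socket up to and including the next newline, leaving any
 * bytes after the newline queued for the next reader. recv(..., MSG_PEEK)
 * inspects the queued data without consuming it; a second recv() then consumes
 * exactly the bytes wanted. If no newline is queued yet, the available bytes
 * are consumed and the caller calls again for the rest of the line.
 * Returns bytes stored in buf (NUL-terminated), 0 at EOF, or -1 on error. */
ssize_t read_upto_newline(int fd, char *buf, size_t bufsz)
{
    if (bufsz < 2)
        return -1;
    ssize_t peeked = recv(fd, buf, bufsz - 1, MSG_PEEK);   /* look, don't consume */
    if (peeked <= 0)
        return peeked;                                      /* 0 = EOF, -1 = error */

    size_t want = (size_t)peeked;
    for (size_t i = 0; i < (size_t)peeked; i++) {
        if (buf[i] == '\n') {
            want = i + 1;                                   /* stop right after the newline */
            break;
        }
    }

    ssize_t got = recv(fd, buf, want, 0);                   /* consume exactly `want` bytes */
    if (got < 0)
        return -1;
    buf[got] = '\0';
    return got;
}
```

With "abc\ndef\n" written to one end of a socketpair(), the first call returns "abc\n" and leaves "def\n" queued for the next reader, which is the property a shell needs when a builtin like read must not swallow input meant for the next command.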
See also issue #1186, which is almost certainly the same problem as this issue.
src/cmd/ksh93/sh/name.c:
- Correct the check for when a function is currently running to fix a segmentation fault that occurred when a POSIX function tries to unset itself while it is running. This bug fix was backported from ksh93v-.

src/cmd/ksh93/sh/xec.c:
- If a function tries to unset itself, unset the function with '_nv_unset(np, NV_RDONLY)' to fix a silent failure. This fix was also backported from ksh93v-.

src/cmd/ksh93/tests/functions.sh:
- Add four regression tests for when a function unsets itself.

Resolves att#21
The fix in sh/xec.c, which was backported from the ksh 93v- beta to delay the actual removal of a running function that unsets itself, caused a segfault in the variables.sh regression tests (see att#23).

src/cmd/ksh93/sh/xec.c:
- Comment out the backported code pending a correct fix for att#21. Now both types of functions silently fail to unset themselves (unless they're discipline functions).

src/cmd/ksh93/tests/functions.sh:
- Disable regression tests checking that the function was actually unset, pending a correct fix for att#21.

Resolves: att#23
Reopens: att#21
Applying the fix for 'unset -f' exposed a crashing bug in lookup() in sh/nvdisc.c, which is the function for looking up discipline functions. This is what caused tests/variables.sh to crash.
Ref.: ksh93#23 (comment)

src/cmd/ksh93/sh/nvdisc.c: lookup():
- To avoid a segfault, check that the function pointer nq->nvalue.rp is actually set before checking if nq->nvalue.rp->running==1.

src/cmd/ksh93/sh/xec.c, src/cmd/ksh93/tests/functions.sh:
- Uncomment the 'unset -f' fix from b7932e8.

Resolves att#21 (again).
Reproduced with
version sh (AT&T Research) 93u+ 2012-08-01
and
version sh (AT&T Research) 93v- 2014-12-24
on Debian GNU/Linux amd64.

According to the man page, read -n reads a number of bytes, while read --help says characters.

Tests are inconsistent. Here, testing in a UTF-8 locale with the 3-byte € (EURO SIGN, U+20AC) character:
The 1 case suggests a number of characters, the 3 case a number of bytes, the rest doesn't seem to make any sense.
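The arithmetic behind "the 1 case suggests characters, the 3 case bytes" can be checked with a small helper. € is encoded in UTF-8 as the three bytes E2 82 AC, so eight of them occupy 24 bytes. The hypothetical prefix_bytes() below computes how many bytes read -n would take under each reading of the documentation (assumption: valid UTF-8 input); it does not model ksh93 itself.

```c
#include <stddef.h>

/* Length of a UTF-8 sequence judged from its lead byte (assumes valid UTF-8). */
static size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Bytes that "read -n count" would take from s (slen bytes long) if count
 * meant characters (count_is_chars != 0, as read --help says) or bytes
 * (count_is_chars == 0, as the man page says). Hypothetical helper. */
size_t prefix_bytes(const char *s, size_t slen, size_t count, int count_is_chars)
{
    if (!count_is_chars)
        return count < slen ? count : slen;
    size_t used = 0;
    while (count > 0 && used < slen) {
        size_t len = utf8_seq_len((unsigned char)s[used]);
        if (used + len > slen)
            break;                       /* truncated final character */
        used += len;
        count--;
    }
    return used;
}

/* Nonzero if the first n bytes of s end on a UTF-8 character boundary,
 * i.e. the next byte is not a continuation byte (10xxxxxx). */
int on_char_boundary(const char *s, size_t slen, size_t n)
{
    return n >= slen || ((unsigned char)s[n] & 0xC0) != 0x80;
}
```

Under character semantics, -n 1 takes 3 bytes (one €) and -n 3 takes 9; under byte semantics, -n 1 takes a single byte that splits a € and -n 3 takes exactly one €. So getting a whole € back for -n 1 matches characters, while getting one € for -n 3 matches bytes.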
read -N doesn't have the issue (and seems to take a number of characters):