read builtin command doesn't work as expected in Japanese locale #1186
Comments
The first thing that occurred to me to try was how this behaves in the Where did the sequence of bytes in your perl command come from? What encoding does that stream of bytes utilize? ASCII 0x5C is the backslash character which has special meaning for the I am ambivalent about supporting non-UTF-8 encodings now that Unicode has been a standard for almost three decades. Which means that even though the current behavior is wrong for non-UTF-8 and non-ISO-8859 encodings, it is not obvious we should expend any effort fixing this bug.
This issue is a variation on issue #43.
@sowmya573 I personally do not intend to expend any effort to fix this bug because I only care about Unicode (specifically the UTF-8 encoding), and for those encodings this problem does not occur. But if you, or anyone else, create a change to fix this bug we will be more than happy to merge it.
The use case comes from one of our Japanese customers, who is seeing a difference in behaviour between ksh88 and ksh93. On ksh93, using read -n 2 (the mb_cur_max for Ja_JP) worked, but this cannot be generalised in the application because each locale has a different mb_cur_max.
============================================
============================================
RESULTS:
93.ksh: 948e 67 20 945c 8e67
============================================
So the question is: can the ksh88 code be brought into ksh93? If not, how do we get the same behaviour on both ksh88 and ksh93?
No. Not least because ksh88 was never open sourced so we don't have access to it. But even if we did have the source code it is almost a certainty that it is radically different from the current code. Which would make it impractical to just "bring it into ksh93." It is unlikely your customer actually requires the special-casing of a backslash before a newline. In which case they can simply use The current behavior is definitely broken. The code should be checking for a backslash only on fully formed chars, not individual bytes. Over the past couple of years @siteshwar and I have invested a huge amount of effort to clean up the code, fix unit tests, add interactive unit tests, and switch to a modern build system. We would love to see vendors like IBM contribute fixes for problems like this one. The fix will probably have to come from the CJK community since this doesn't affect UTF-8 or legacy encodings like ISO 8859 which are ASCII compatible.
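The difference between checking individual bytes and checking fully formed characters can be sketched in a few lines of Python. This is only an illustration, not the ksh code: it assumes a Shift_JIS-style encoding in which lead bytes fall in 0x81-0x9F or 0xE0-0xFC and are always followed by one trail byte, and the function names are hypothetical.

```python
def is_sjis_lead(b):
    """Assumed Shift_JIS lead-byte ranges (first byte of a 2-byte char)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def backslashes_bytewise(data):
    """Byte-by-byte scan: reports every 0x5C, even inside a 2-byte char."""
    return [i for i, b in enumerate(data) if b == 0x5C]


def backslashes_charwise(data):
    """Character-aware scan: skips the trail byte of each 2-byte char."""
    hits = []
    i = 0
    while i < len(data):
        if is_sjis_lead(data[i]) and i + 1 < len(data):
            i += 2  # consume lead byte and trail byte together
            continue
        if data[i] == 0x5C:
            hits.append(i)  # a real, standalone backslash
        i += 1
    return hits


# Two double-byte chars (0x945C, 0x8E67), a space, 'a', a real backslash,
# and a newline.
sample = b"\x94\x5c\x8e\x67 a\\\n"

print(backslashes_bytewise(sample))  # [1, 6]
print(backslashes_charwise(sample))  # [6]
```

The byte-wise scan flags offset 1, which is the second half of a two-byte character, while the character-aware scan reports only the genuine backslash at offset 6.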
Description of problem:
The ksh93t builtin 'read' command ignores the byte '0x5c' when it occurs as part of a Japanese character under the locale "Ja_JP", treating it as a backslash ('\').
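To make the byte layout concrete, here is a small Python sketch. It assumes the Ja_JP locale uses a Shift_JIS-family encoding (the report does not name the encoding) and uses Python's built-in "shift_jis" codec. The helper name and the sample characters are chosen for illustration only.

```python
def trail_byte_is_backslash(ch):
    """True if the Shift_JIS encoding of ch is 2 bytes ending in 0x5C."""
    raw = ch.encode("shift_jis")
    return len(raw) == 2 and raw[1] == 0x5C


# A few characters commonly cited as having 0x5C as their second byte,
# plus one ordinary kana for contrast.
for ch in ("能", "表", "ソ", "あ"):
    raw = ch.encode("shift_jis")
    print(ch, raw.hex(), trail_byte_is_backslash(ch))
```

Any character for which the helper returns True contains a byte that a byte-wise backslash check would misinterpret.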
Ksh version:
It exists in all versions of ksh, including the latest ksh93u+:
version sh (AT&T Research) 93u+ 2012-08-01
How reproducible:
Whenever LANG=Ja_JP.
Steps to reproduce:
Actual results:
A 0x5c byte that is part of a Japanese multibyte character is ignored.
Expected results:
The data should be processed correctly.
Additional info:
NA