Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should we still support EBCDIC? #742

Closed
krader1961 opened this issue Jul 30, 2018 · 8 comments

Comments

@krader1961
Copy link
Collaborator

commented Jul 30, 2018

Coverity pointed out a condition that is always true on ASCII using systems. Which lead @siteshwar to open PR #741. That transformation, however, is only valid on ASCII using systems. There are other places in the code which attempt to support both ASCII and EBCDIC. See, for example, the definition of getop() in src/cmd/ksh93/sh/streval.c. But given all the places in the code which have hardcoded chars like \033 (escape) my confidence is low that the current code will function correctly in an EBCDIC environment.

If we continue to support EBCDIC it should be done via build time tests such as #if ('a' == 97). Probably coupled with formal abstractions and preprocessor symbols. Personally I would just as soon drop whatever support there is for EBCDIC and formally state that ksh only supports ASCII and encodings like ISO-8859 and UNICODE which are compatible with ASCII. Which basically means everything but EBCDIC.

@etscrivner

This comment has been minimized.

Copy link

commented Jul 31, 2018

👍 from me - EBCDIC appears to be mostly IBM-specific and I would rather devote any energy we'd all expend towards EBCDIC towards better UTF-8/UNICODE support where possible.

@krader1961

This comment has been minimized.

Copy link
Collaborator Author

commented Jul 31, 2018

There are just a handful of places where something like grep '^#if .*\'' returns matches. And most of those are in src/lib/libast/include/ast_ccode.h. Which seems to do, more or less, what I proposed in my original comment. Assuming we want to retain support for EBCDIC systems. Note that symbols like CC_esc are used elsewhere in the code. Which suggests that the run-time test in src/cmd/ksh93/sh/string.c is a mistake that should have used the symbols from ast_ccode.h.

One option is to replace the problematic run-time test in src/cmd/ksh93/sh/string.c that caused Coverity to report a problem with the CC_esc preprocessor symbol. My preference is to drop support for IBM EBCDIC.

@kdudka

This comment has been minimized.

Copy link
Contributor

commented Jul 31, 2018

I do not think we should support EBCDIC. And, to be honest, I doubt that the EBCDIC support ever worked in the recent releases of ksh.

@dannyweldon

This comment has been minimized.

Copy link
Collaborator

commented Jul 31, 2018

Yes, I don't care for EBCDIC support, either. How do we know that it was ever built successfully on EBCDIC machines? It may have been only partly implemented. If bash runs on mainframes, does it support EBCDIC or just ASCII?

@krader1961

This comment has been minimized.

Copy link
Collaborator Author

commented Jul 31, 2018

How do we know that it was ever built successfully on EBCDIC machines? It may have been only partly implemented.

That's what I'm wondering. There are lots of tricks programmers use that work for ASCII but not EBCDIC. For example, tests like c >= 'a' && c <= 'z'. I wonder how many places in the AST/ksh code rely on those tricks.

Also, per this EE Times article there are 57 variants of EBCDIC. Most of those are the equivalent of ISO-8859-1, ISO-8859-2, etc. (or Windows code pages) to support alphabets in other countries and aren't really relevant to this issue. However, I'm aware of at least two variants that differ with respect to codepoints that might affect ksh.

@siteshwar

This comment has been minimized.

Copy link
Collaborator

commented Jul 31, 2018

I would be surprised if anyone objects to our decision of dropping it. Bad day for EBCDIC!

@krader1961 krader1961 self-assigned this Aug 1, 2018

@krader1961

This comment has been minimized.

Copy link
Collaborator Author

commented Aug 1, 2018

@kdudka's intuition is correct. Not surprisingly some judicious grep'ing does find places in the code which are ipso facto broken on an EBCDIC system. From src/lib/libast/hash/strkey.c:

if (c >= 'a' && c <= 'z')

From src/cmd/ksh93/sh/name.c:

if (*buff >= 'A' && *buff <= 'Z') *buff += 'a' - 'A';

There are others but you get the idea. I'll gin up a PR to remove the non-ASCII getop() implementation and add a clear ASCII or ASCII compatible char set only warning.

krader1961 added a commit to krader1961/ast that referenced this issue Aug 1, 2018
Disclaim support for EBDCIC
This removes all the attempts to accomodate EBCDIC I could find other
than those in src/lib/libast/include/ast_ccode.h. That's because I expect
those to be removed in the future when we eliminate the AST locale code
in favor of the platform implementation.

Resolves att#742
siteshwar added a commit to siteshwar/ast that referenced this issue Aug 1, 2018
siteshwar added a commit that referenced this issue Aug 1, 2018

@krader1961 krader1961 closed this in 8dc36a9 Aug 1, 2018

@krader1961

This comment has been minimized.

Copy link
Collaborator Author

commented Aug 7, 2018

While working on #579 to remove the AST locale subsystem I stumbled upon src/lib/libast/include/ast_ccode.h which contains, among other things, this:

#define CC_ASCII 1    /* ISO-8859-1            */
#define CC_EBCDIC_E 2 /* Xopen dd(1) EBCDIC        */
#define CC_EBCDIC_I 3 /* Xopen dd(1) IBM        */
#define CC_EBCDIC_O 4 /* IBM-1047 mvs OpenEdition    */
#define CC_EBCDIC_S 5 /* Siemens posix-bc        */
#define CC_EBCDIC_H 6 /* IBM-37 AS/400        */
#define CC_EBCDIC_M 7 /* IBM mvs cobol        */
#define CC_EBCDIC_U 8 /* microfocus cobol        */

So the situation is even worse than I remembered vis-a-vis incompatible EBCDIC definitions.

My guess is that early in the life of this code there was a strong effort made to support EBCDIC but over time idioms, like those in my previous comment, which only work with ASCII, crept in. When I'm done with #579 I'll remove the ast_ccode.h and uses of the symbols it defines.

@krader1961 krader1961 reopened this Aug 7, 2018

krader1961 added a commit to krader1961/ast that referenced this issue Aug 8, 2018
Remove remaining AST_LC_* symbols
Note that the big block removed from src/cmd/ksh93/sh/init.c is for
building on EBCDIC systems which we no longer support (see att#742).

Related att#579
krader1961 added a commit to krader1961/ast that referenced this issue Aug 8, 2018
More removal of EBCDIC system support
This only eliminates the uses of CC_NATIVE which clearly are due to
determining if we're not building on an EBCDIC system. Removing that,
and related, symbols and header files will occur in a future change
because I need to explore if the AST iconv implementation can be
eliminated.

Related att#742
krader1961 added a commit to krader1961/ast that referenced this issue Aug 10, 2018
More removal of EBCDIC system support
This only eliminates the uses of CC_NATIVE which clearly are due to
determining if we're not building on an EBCDIC system. Removing CC_NATIVe
in other contexts, and related symbols and header files, will occur in
a future change.

Related att#742
krader1961 added a commit to krader1961/ast that referenced this issue Aug 11, 2018
Remove all support for building on EBCDIC systems
This should remove the remaining code for building on platforms (e.g.,
IBM mainframes) that use EBCDIC as its native encoding.

Resolves att#742
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.