Revert "sim: Specify -fshort-wchar as NuttX wchar_t is 16-bit" #4754

xiaoxiang781216 · 2021-10-31T17:45:10Z

Summary

It's better to apply the default compiler option to improve the compatibility
This reverts commit 3fc06ff.
Here is the output from x64 gcc:

gcc -dM -E - < /dev/null | grep -i wchar
#define __WCHAR_MAX__ 0x7fffffff
#define __WCHAR_MIN__ (-__WCHAR_MAX__ - 1)
#define __WCHAR_WIDTH__ 32
#define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2
#define __WCHAR_TYPE__ int
#define __SIZEOF_WCHAR_T__ 4

And arm gcc:

arm-none-eabi-gcc -dM -E - < /dev/null | grep -i wchar
#define __ARM_SIZEOF_WCHAR_T 4
#define __WCHAR_MAX__ 0xffffffffU
#define __WCHAR_MIN__ 0U
#define __WCHAR_WIDTH__ 32
#define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 1
#define __WCHAR_TYPE__ unsigned int
#define __SIZEOF_WCHAR_T__ 4

Impact

sim

Testing

pass CI and ostest

yamt · 2021-10-31T22:55:14Z

It's better to apply the default compiler option to improve the compatibility

compatibility with what?

xiaoxiang781216 · 2021-11-01T02:41:16Z

It's better to apply the default compiler option to improve the compatibility

compatibility with what?

3rd party application, sorry I can't disclose the name publicly.
The major reason to revert this patch is that it's wrong to make the different size of wchar_t between sim and real device(e.g. arm) as I list in the summary.

xiaoxiang781216 · 2021-11-01T02:53:33Z

From https://pubs.opengroup.org/onlinepubs/007908799/xsh/stddef.h.html:

wchar_t
Integral type whose range of values can represent distinct wide-character codes for all members of the largest character set specified among the locales supported by the compilation environment: the null character has the code value 0 and each member of the Portable Character Set has a code value equal to its value when used as the lone character in an integer character constant.

wchar_t require to save all possible character encoding.
And from https://www.gnu.org/software/libc/manual/html_node/Extended-Char-Intro.html:

Data type: wchar_t
This data type is used as the base type for wide character strings. In other words, arrays of objects of this type are the equivalent of char[] for multibyte character strings. The type is defined in stddef.h.

The ISO C90 standard, where wchar_t was introduced, does not say anything specific about the representation. It only requires that this type is capable of storing all elements of the basic character set. Therefore it would be legitimate to define wchar_t as char, which might make sense for embedded systems.

But in the GNU C Library wchar_t is always 32 bits wide and, therefore, capable of representing all UCS-4 values and, therefore, covering all of ISO 10646. Some Unix systems define wchar_t as a 16-bit type and thereby follow Unicode very strictly. This definition is perfectly fine with the standard, but it also means that to represent all characters from Unicode and ISO 10646 one has to use UTF-16 surrogate characters, which is in fact a multi-wide-character encoding. But resorting to multi-wide-character encoding contradicts the purpose of the wchar_t type.

To cover all possible Unicode encoding, wchar_t require at least 4 bytes.

So, from the standard perspective, this patch should be revert too.

yamt · 2021-11-01T03:07:45Z

It's better to apply the default compiler option to improve the compatibility

compatibility with what?

3rd party application, sorry I can't disclose the name publicly.

what assumption does the app make?

The major reason to revert this patch is that it's wrong to make the different size of wchar_t between sim and real device(e.g. arm) as I list in the summary.

how about using -fshort-wchar for arm then?

yamt · 2021-11-01T03:13:52Z

From https://pubs.opengroup.org/onlinepubs/007908799/xsh/stddef.h.html:

wchar_t
Integral type whose range of values can represent distinct wide-character codes for all members of the largest character set specified among the locales supported by the compilation environment: the null character has the code value 0 and each member of the Portable Character Set has a code value equal to its value when used as the lone character in an integer character constant.

wchar_t require to save all possible character encoding. And from https://www.gnu.org/software/libc/manual/html_node/Extended-Char-Intro.html:

Data type: wchar_t
This data type is used as the base type for wide character strings. In other words, arrays of objects of this type are the equivalent of char[] for multibyte character strings. The type is defined in stddef.h.

The ISO C90 standard, where wchar_t was introduced, does not say anything specific about the representation. It only requires that this type is capable of storing all elements of the basic character set. Therefore it would be legitimate to define wchar_t as char, which might make sense for embedded systems.

But in the GNU C Library wchar_t is always 32 bits wide and, therefore, capable of representing all UCS-4 values and, therefore, covering all of ISO 10646. Some Unix systems define wchar_t as a 16-bit type and thereby follow Unicode very strictly. This definition is perfectly fine with the standard, but it also means that to represent all characters from Unicode and ISO 10646 one has to use UTF-16 surrogate characters, which is in fact a multi-wide-character encoding. But resorting to multi-wide-character encoding contradicts the purpose of the wchar_t type.

To cover all possible Unicode encoding, wchar_t require at least 4 bytes.

So, from the standard perspective, this patch should be revert too.

no standard mandates unicode or how it's implemented as far as i know.
__STDC_ISO_10646__ is optional.
i feel it makes sense only if we are going to implement unicode wchar_t. are we?

yamt · 2021-11-01T03:14:58Z

anyway, if we are going to revert this, don't forget to update this. https://github.com/apache/incubator-nuttx/blob/b58379b7380dcef3346bf9ff1ac76d40dc094095/include/sys/types.h#L170

xiaoxiang781216 · 2021-11-01T03:15:09Z

It's better to apply the default compiler option to improve the compatibility

compatibility with what?

3rd party application, sorry I can't disclose the name publicly.

what assumption does the app make?

It assume wchar_t is four bytes and then has the enough space to encode UCS4 encoding.

The major reason to revert this patch is that it's wrong to make the different size of wchar_t between sim and real device(e.g. arm) as I list in the summary.

how about using -fshort-wchar for arm then?

short will make it's very hard to process surrogate pair(https://en.wikipedia.org/wiki/UTF-16) as I mention in the last reply. So I don't see there is any benefit to add the seldom used flag(-fshort-wchar), could you explain why you stick with this flag?

yamt · 2021-11-01T03:19:06Z

It's better to apply the default compiler option to improve the compatibility

compatibility with what?

3rd party application, sorry I can't disclose the name publicly.

what assumption does the app make?

It assume wchar_t is four bytes and then has the enough space to encode UCS4 encoding.

The major reason to revert this patch is that it's wrong to make the different size of wchar_t between sim and real device(e.g. arm) as I list in the summary.

how about using -fshort-wchar for arm then?

short will make it's very hard to process surrogate pair(https://en.wikipedia.org/wiki/UTF-16) as I mention in the last reply. So I don't see there is any benefit to add the seldom used flag(-fshort-wchar), could you explain why you stick with this flag?

actually i don't insist.
nuttx wchar_t has been 16 bit. (i don't know why.)
i just haven't seen a good reason to change it, especially when the change increases memory usage.

yamt · 2021-11-01T03:20:17Z

It's better to apply the default compiler option to improve the compatibility

compatibility with what?

3rd party application, sorry I can't disclose the name publicly.

what assumption does the app make?

It assume wchar_t is four bytes and then has the enough space to encode UCS4 encoding.

does it assume __STDC_ISO_10646__?

xiaoxiang781216 · 2021-11-01T05:42:31Z

no standard mandates unicode or how it's implemented as far as i know.

unicode is accepted by many countries as a standard to represent the character worldwide. If we want to support language other English, Unicode is the best option.

__STDC_ISO_10646__ is optional. i feel it makes sense only if we are going to implement unicode wchar_t. are we?

But if we want to support wchar_t, it's better to follow __STDC_ISO_10646__.

anyway, if we are going to revert this, don't forget to update this.

https://github.com/apache/incubator-nuttx/blob/b58379b7380dcef3346bf9ff1ac76d40dc094095/include/sys/types.h#L170

Fix here: 339e54b

actually i don't insist. nuttx wchar_t has been 16 bit. (i don't know why.) i just haven't seen a good reason to change it, especially when the change increases memory usage.

The size increasement should be small because:

The code which use wchar_t is very rare.
wchar_t is builtin type in C++ which mean it is already 4 bytes

yamt · 2021-11-01T05:55:00Z

if your app is just abusing wchar_t where it should use uint32_t, maybe this PR is enough.
otherwise, you likely need a proper implementation of mbrtowc etc.
which is the case?

xiaoxiang781216 · 2021-11-01T06:09:50Z

if your app is just abusing wchar_t where it should use uint32_t, maybe this PR is enough. otherwise, you likely need a proper implementation of mbrtowc etc. which is the case?

Yes, we plan to implement mbrtowc for UTF8 encoding.

It's better to apply the default compiler option to improve the compatibility This reverts commit 3fc06ff.

xiaoxiang781216 · 2021-12-08T13:55:47Z

@yamt please merge this PR if you don't have more concern. #4881 depends on it.

include/sys/types.h

Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>

yamt

lgtm

masayuki2009 · 2021-12-09T11:08:29Z

@xiaoxiang781216

Hmm, the following commit has a problem with uSD card on Spresense.
If I revert the commit locally, it works fine.

nsh> df
  Block    Number
  Size     Blocks       Used   Available Mounted on
 16384     971908       5718      966190 /mnt/sd0
  4096       1024          8        1016 /mnt/spif
     0          0          0           0 /proc
nsh> ls -l /mnt/sd0
/mnt/sd0:
nsh: ls: stat failed: 22

commit 6357523
Author: Xiang Xiao xiaoxiang@xiaomi.com
Date: Mon Nov 1 12:40:51 2021 +0800

arch: Add _wchar_t typedef like other basic types

Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>

masayuki2009 · 2021-12-09T11:14:52Z

@xiaoxiang781216

Hmm, the following commit has a problem with uSD card on Spresense.

@xiaoxiang781216
stm32f4discovery with uSD over SPI has the same problem too.

xiaoxiang781216 · 2021-12-09T11:44:00Z

Ok, we will take a look tonight.

masayuki2009 · 2021-12-10T00:23:15Z

Hmm, the following commit has a problem with uSD card on Spresense.
If I revert the commit locally, it works fine.

@xiaoxiang781216
Because the nuttx/fs/fat uses wchar_t inside, the side effect happened with this PR.
If I added -fshort-wchar to the compile options, it worked again.

However, there are many linker warnings.

arm-none-eabi-ld: warning: /opt/gcc-arm-none-eabi-9-2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/thumb/v7e-m+fp/hard/libgcc.a(_udivmoddi4.o) uses 4-byte wchar_t yet the output is to use 2-byte wchar_t; use of wchar_t values across objects may fail
arm-none-eabi-ld: warning: /opt/gcc-arm-none-eabi-9-2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/thumb/v7e-m+fp/hard/libgcc.a(_fixunsdfdi.o) uses 4-byte wchar_t yet the output is to use 2-byte wchar_t; use of wchar_t values across objects may fail
arm-none-eabi-ld: warning: /opt/gcc-arm-none-eabi-9-2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/libm.a(lib_a-w_pow.o) uses 4-byte wchar_t yet the output is to use 2-byte wchar_t; use of wchar_t values across objects may fail
arm-none-eabi-ld: warning: /opt/gcc-arm-none-eabi-9-2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/libm.a(lib_a-e_pow.o) uses 4-byte wchar_t yet the output is to use 2-byte wchar_t; use of wchar_t values across objects may fail
arm-none-eabi-ld: warning: /opt/gcc-arm-none-eabi-9-2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/libm.a(lib_a-e_sqrt.o) uses 4-byte wchar_t yet the output is to use 2-byte wchar_t; use of wchar_t values across objects may fail
arm-none-eabi-ld: warning: /opt/gcc-arm-none-eabi-9-2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/libm.a(lib_a-s_fabs.o) uses 4-byte wchar_t yet the output is to use 2-byte wchar_t; use of wchar_t values across objects may fail
arm-none-eabi-ld: warning: /opt/gcc-arm-none-eabi-9-2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/libm.a(lib_a-s_finite.o) uses 4-byte wchar_t yet the output is to use 2-byte wchar_t; use of wchar_t values across objects may fail
arm-none-eabi-ld: warning: /opt/gcc-arm-none-eabi-9-2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/libm.a(lib_a-s_lib_ver.o) uses 4-byte wchar_t yet the output is to use 2-byte wchar_t; use of wchar_t values across objects may fail
arm-none-eabi-ld: warning: /opt/gcc-arm-none-eabi-9-2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/libm.a(lib_a-s_nan.o) uses 4-byte wchar_t yet the output is to use 2-byte wchar_t; use of wchar_t values across objects may fail
arm-none-eabi-ld: warning: /opt/gcc-arm-none-eabi-9-2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/libm.a(lib_a-s_rint.o) uses 4-byte wchar_t yet the output is to use 2-byte wchar_t; use of wchar_t values across objects may fail
arm-none-eabi-ld: warning: /opt/gcc-arm-none-eabi-9-2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/libm.a(lib_a-s_scalbn.o) uses 4-byte wchar_t yet the output is to use 2-byte wchar_t; use of wchar_t values across objects may fail

yamt · 2021-12-10T01:04:14Z

Hmm, the following commit has a problem with uSD card on Spresense.
If I revert the commit locally, it works fine.

@xiaoxiang781216 Because the nuttx/fs/fat uses wchar_t inside, the side effect happened with this PR. If I added -fshort-wchar to the compile options, it worked again.

However, there are many linker warnings.

arm-none-eabi-ld: warning: /opt/gcc-arm-none-eabi-9-2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/thumb/v7e-m+fp/hard/libgcc.a(_udivmoddi4.o) uses 4-byte wchar_t yet the output is to use 2-byte wchar_t; use of wchar_t values across objects may fail
arm-none-eabi-ld: warning: /opt/gcc-arm-none-eabi-9-2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/thumb/v7e-m+fp/hard/libgcc.a(_fixunsdfdi.o) uses 4-byte wchar_t yet the output is to use 2-byte wchar_t; use of wchar_t values across objects may fail
arm-none-eabi-ld: warning: /opt/gcc-arm-none-eabi-9-2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/libm.a(lib_a-w_pow.o) uses 4-byte wchar_t yet the output is to use 2-byte wchar_t; use of wchar_t values across objects may fail
arm-none-eabi-ld: warning: /opt/gcc-arm-none-eabi-9-2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/libm.a(lib_a-e_pow.o) uses 4-byte wchar_t yet the output is to use 2-byte wchar_t; use of wchar_t values across objects may fail
arm-none-eabi-ld: warning: /opt/gcc-arm-none-eabi-9-2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/libm.a(lib_a-e_sqrt.o) uses 4-byte wchar_t yet the output is to use 2-byte wchar_t; use of wchar_t values across objects may fail
arm-none-eabi-ld: warning: /opt/gcc-arm-none-eabi-9-2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/libm.a(lib_a-s_fabs.o) uses 4-byte wchar_t yet the output is to use 2-byte wchar_t; use of wchar_t values across objects may fail
arm-none-eabi-ld: warning: /opt/gcc-arm-none-eabi-9-2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/libm.a(lib_a-s_finite.o) uses 4-byte wchar_t yet the output is to use 2-byte wchar_t; use of wchar_t values across objects may fail
arm-none-eabi-ld: warning: /opt/gcc-arm-none-eabi-9-2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/libm.a(lib_a-s_lib_ver.o) uses 4-byte wchar_t yet the output is to use 2-byte wchar_t; use of wchar_t values across objects may fail
arm-none-eabi-ld: warning: /opt/gcc-arm-none-eabi-9-2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/libm.a(lib_a-s_nan.o) uses 4-byte wchar_t yet the output is to use 2-byte wchar_t; use of wchar_t values across objects may fail
arm-none-eabi-ld: warning: /opt/gcc-arm-none-eabi-9-2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/libm.a(lib_a-s_rint.o) uses 4-byte wchar_t yet the output is to use 2-byte wchar_t; use of wchar_t values across objects may fail
arm-none-eabi-ld: warning: /opt/gcc-arm-none-eabi-9-2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/libm.a(lib_a-s_scalbn.o) uses 4-byte wchar_t yet the output is to use 2-byte wchar_t; use of wchar_t values across objects may fail

i guess nuttx/fs/fat should use uint16_t instead.
using wchar_t to mean particular encoding (utf16, utf32, or whatever) is a bad practice.

masayuki2009 · 2021-12-10T01:21:30Z

i guess nuttx/fs/fat should use uint16_t instead.
using wchar_t to mean particular encoding (utf16, utf32, or whatever) is a bad practice.

@yamt
Thanks for the comments.
As far as I tried your advice, it worked.
I will create a new PR later.

xiaoxiang781216 requested review from yamt and gustavonihei October 31, 2021 17:47

Revert "sim: Specify -fshort-wchar as NuttX wchar_t is 16-bit"

2903a27

It's better to apply the default compiler option to improve the compatibility This reverts commit 3fc06ff.

xiaoxiang781216 mentioned this pull request Dec 8, 2021

libc: Update include/nuttx/input/x11_keysymdef.h to the latest version #4881

Merged

yamt reviewed Dec 8, 2021

View reviewed changes

include/sys/types.h Outdated Show resolved Hide resolved

yamt approved these changes Dec 8, 2021

View reviewed changes

xiaoxiang781216 force-pushed the sim branch from 339e54b to 9922903 Compare December 9, 2021 06:09

arch: Add _wchar_t typedef like other basic types

4fe9d9d

Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>

xiaoxiang781216 force-pushed the sim branch from 9922903 to 4fe9d9d Compare December 9, 2021 06:11

xiaoxiang781216 requested a review from yamt December 9, 2021 06:12

yamt approved these changes Dec 9, 2021

View reviewed changes

yamt merged commit 6357523 into apache:master Dec 9, 2021

xiaoxiang781216 deleted the sim branch December 9, 2021 09:05

masayuki2009 mentioned this pull request Dec 10, 2021

fs: fat: Use uint16_t instead of wchar_t #4968

Merged

xiaoxiang781216 mentioned this pull request May 31, 2022

arch: Define WCHAR_[MIN|MAX] in arch/include/limits.h #6351

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revert "sim: Specify -fshort-wchar as NuttX wchar_t is 16-bit" #4754

Revert "sim: Specify -fshort-wchar as NuttX wchar_t is 16-bit" #4754

xiaoxiang781216 commented Oct 31, 2021 •

edited

Loading

yamt commented Oct 31, 2021

xiaoxiang781216 commented Nov 1, 2021 •

edited

Loading

xiaoxiang781216 commented Nov 1, 2021 •

edited

Loading

yamt commented Nov 1, 2021

yamt commented Nov 1, 2021

yamt commented Nov 1, 2021

xiaoxiang781216 commented Nov 1, 2021

yamt commented Nov 1, 2021

yamt commented Nov 1, 2021

xiaoxiang781216 commented Nov 1, 2021 •

edited

Loading

yamt commented Nov 1, 2021 •

edited

Loading

xiaoxiang781216 commented Nov 1, 2021

xiaoxiang781216 commented Dec 8, 2021 •

edited

Loading

yamt left a comment

masayuki2009 commented Dec 9, 2021

masayuki2009 commented Dec 9, 2021 •

edited

Loading

xiaoxiang781216 commented Dec 9, 2021

masayuki2009 commented Dec 10, 2021 •

edited

Loading

yamt commented Dec 10, 2021

masayuki2009 commented Dec 10, 2021

Revert "sim: Specify -fshort-wchar as NuttX wchar_t is 16-bit" #4754

Revert "sim: Specify -fshort-wchar as NuttX wchar_t is 16-bit" #4754

Conversation

xiaoxiang781216 commented Oct 31, 2021 • edited Loading

Summary

Impact

Testing

yamt commented Oct 31, 2021

xiaoxiang781216 commented Nov 1, 2021 • edited Loading

xiaoxiang781216 commented Nov 1, 2021 • edited Loading

yamt commented Nov 1, 2021

yamt commented Nov 1, 2021

yamt commented Nov 1, 2021

xiaoxiang781216 commented Nov 1, 2021

yamt commented Nov 1, 2021

yamt commented Nov 1, 2021

xiaoxiang781216 commented Nov 1, 2021 • edited Loading

yamt commented Nov 1, 2021 • edited Loading

xiaoxiang781216 commented Nov 1, 2021

xiaoxiang781216 commented Dec 8, 2021 • edited Loading

yamt left a comment

Choose a reason for hiding this comment

masayuki2009 commented Dec 9, 2021

masayuki2009 commented Dec 9, 2021 • edited Loading

xiaoxiang781216 commented Dec 9, 2021

masayuki2009 commented Dec 10, 2021 • edited Loading

yamt commented Dec 10, 2021

masayuki2009 commented Dec 10, 2021

xiaoxiang781216 commented Oct 31, 2021 •

edited

Loading

xiaoxiang781216 commented Nov 1, 2021 •

edited

Loading

xiaoxiang781216 commented Nov 1, 2021 •

edited

Loading

xiaoxiang781216 commented Nov 1, 2021 •

edited

Loading

yamt commented Nov 1, 2021 •

edited

Loading

xiaoxiang781216 commented Dec 8, 2021 •

edited

Loading

masayuki2009 commented Dec 9, 2021 •

edited

Loading

masayuki2009 commented Dec 10, 2021 •

edited

Loading