Fix utf8 conversions for s390x #128394

Closed
saitama951 wants to merge 2 commits into dotnet:main from saitama951:fix_treatAsLe

Conversation

@saitama951 (Contributor) commented May 20, 2026

While adding EventPipe support for s390x in microsoft/perfview, I came across some bugs where strings converted with utf8_to_utf16 were incorrect in the .nettrace file on s390x.

Here we set the MINIPAL_TREAT_AS_LITTLE_ENDIAN flag:

```c
static
ep_char16_t *
ep_utf8_to_utf16le_string_impl (
    const ep_char8_t *str,
    size_t len)
{
    if (len == 0) {
        // Return an empty string if the length is 0
        ep_char16_t * empty_str = ep_rt_utf16_string_alloc(1);
        if(empty_str == NULL)
            return NULL;
        *empty_str = '\0';
        return empty_str;
    }
    int32_t flags = MINIPAL_MB_NO_REPLACE_INVALID_CHARS | MINIPAL_TREAT_AS_LITTLE_ENDIAN;
    size_t ret = minipal_get_length_utf8_to_utf16 (str, len, flags);
    if (ret <= 0)
        return NULL;
    ep_char16_t * converted_str = ep_rt_utf16_string_alloc(ret + 1);
    if (converted_str == NULL)
        return NULL;
    ret = minipal_convert_utf8_to_utf16 (str, len, (CHAR16_T *)converted_str, ret, flags);
    converted_str[ret] = '\0';
    return converted_str;
}
```

And here the flag is invalidated even when it is set:
https://github.com/dotnet/runtime/blob/main/src/native/minipal/utf8.c#L1095-L1110

@github-actions (bot) added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) on May 20, 2026
@dotnet-policy-service (bot) added the community-contribution label (Indicates that the PR has been added by a community member) on May 20, 2026
@dotnet-policy-service (Contributor)

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@saitama951 (Contributor, Author)

cc: @am11

@saitama951 (Contributor, Author)

@Dotnet-s390x build

@Dotnet-s390x

Build Queued..

To cancel the current build, please comment:

@Dotnet-s390x cancel

@am11 (Member) commented May 20, 2026

@Dotnet-s390x

Build Failed
Please check the build logs: http://148.100.85.217:8080/job/dotnet-builds/77/console.

Build Error Summary
```
Build FAILED.

EXEC : error : Invalid Option: -OPUUT/vT=/lar/jibkien/wnsksorcepaot/dt-neilbu/rdso/eptiarctfaobs/Tej/ILstseAslymbeb/D/nugstetdaan2.rdTe0/ILstseAslymblld. [/var/lib/jenkins/workspace/dotnet-builds/repo/src/libraries/Common/tests/System/TestILAssembly/TestILAssembly.ilproj]
/var/lib/jenkins/.nuget/packages/microsoft.net.sdk.il/11.0.0-preview.5.26257.113/targets/Microsoft.NET.Sdk.IL.targets(139,5): error MSB3073: The command ""/var/lib/jenkins/workspace/dotnet-builds/repo/artifacts/bin/coreclr/linux.s390x.Release/ilasm" -QUIET -NOLOGO -DLL  -OUTPUT="/var/lib/jenkins/workspace/dotnet-builds/repo/artifacts/obj/TestILAssembly/Debug/netstandard2.0/TestILAssembly.dll" -KEY="/var/lib/jenkins/.nuget/packages/microsoft.dotnet.arcade.sdk/11.0.0-beta.26257.113/tools/snk/MSFT.snk" "TestILAssembly.il"" exited with code 1. [/var/lib/jenkins/workspace/dotnet-builds/repo/src/libraries/Common/tests/System/TestILAssembly/TestILAssembly.ilproj]
    0 Warning(s)
    2 Error(s)

Time Elapsed 00:20:42.29
Build failed with exit code 1. Check errors above.
```

@saitama951 (Contributor, Author)

The CI threw the following error:

```
EXEC : error : Invalid Option: -OPUUT/vT=/lar/jibkien/wnsksorcepaot/dt-neilbu/rdso/eptiarctfaobs/Tej/ILstseAslymbeb/D/nugstetdaan2.rdTe0/ILstseAslymblld. [/var/lib/jenkins/workspace/dotnet-builds/repo/src/libraries/Common/tests/System/TestILAssembly/TestILAssembly.ilproj]

/var/lib/jenkins/.nuget/packages/microsoft.net.sdk.il/11.0.0-preview.5.26257.113/targets/Microsoft.NET.Sdk.IL.targets(139,5): error MSB3073: The command ""/var/lib/jenkins/workspace/dotnet-builds/repo/artifacts/bin/coreclr/linux.s390x.Release/ilasm" -QUIET -NOLOGO -DLL  -OUTPUT="/var/lib/jenkins/workspace/dotnet-builds/repo/artifacts/obj/TestILAssembly/Debug/netstandard2.0/TestILAssembly.dll" -KEY="/var/lib/jenkins/.nuget/packages/microsoft.dotnet.arcade.sdk/11.0.0-beta.26257.113/tools/snk/MSFT.snk" "TestILAssembly.il"" exited with code 1. [/var/lib/jenkins/workspace/dotnet-
```

ilasm uses WideCharToMultiByte to convert its command-line arguments into UTF-8 strings. WideCharToMultiByte in turn calls minipal_convert_utf16_to_utf8, which converts in native-endian order, while everything else assumes little-endian order:

```c
WideCharToMultiByte(uCodePage, 0, &argv[i][1], 3, szOpt, sizeof(szOpt), NULL, NULL);
```

https://github.com/dotnet/runtime/blob/main/src/coreclr/pal/src/locale/unicode.cpp#L184
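The mismatch described above can be illustrated with a small, self-contained sketch. It does not call the PAL or minipal; put_utf16le, get_utf16be and count_changed_units are hypothetical helpers that fix the byte order explicitly, so the result is the same on any host. It only shows that a UTF-16 code unit written in one byte order and read back in the other becomes a different code unit; it does not attempt to reproduce the exact scrambling seen in the ilasm log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: serialize UTF-16 code units as little-endian bytes. */
static void put_utf16le(const uint16_t *units, size_t count, uint8_t *bytes)
{
    for (size_t i = 0; i < count; i++) {
        bytes[2 * i]     = (uint8_t)(units[i] & 0xFF); /* low byte first */
        bytes[2 * i + 1] = (uint8_t)(units[i] >> 8);
    }
}

/* Hypothetical helper: read the same bytes back as big-endian code units,
   which is what a native-endian reader does on a big-endian host. */
static void get_utf16be(const uint8_t *bytes, size_t count, uint16_t *units)
{
    for (size_t i = 0; i < count; i++)
        units[i] = (uint16_t)((bytes[2 * i] << 8) | bytes[2 * i + 1]);
}

/* Writes `count` units as LE, reads them back as BE and returns how many
   code units changed. Limited to 16 units to keep the buffers on the stack. */
static size_t count_changed_units(const uint16_t *units, size_t count)
{
    uint8_t bytes[32];
    uint16_t back[16];
    size_t changed = 0;

    assert(count <= 16);
    put_utf16le(units, count, bytes);
    get_utf16be(bytes, count, back);
    for (size_t i = 0; i < count; i++)
        if (back[i] != units[i])
            changed++;
    return changed;
}
```

For example, '-' (0x002D) written little-endian and read big-endian comes back as 0x2D00, which is no longer an ASCII character, so a round trip is only lossless when both sides agree on the byte order.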

@saitama951 (Contributor, Author)

With all the fixes in place, clr.PalTests passes successfully:

```
Finished running PAL tests.

PAL Test Results:
  Passed: 305
  Failed: 0
```

@saitama951 (Contributor, Author)

@Dotnet-s390x build

@Dotnet-s390x

Build Queued..

To cancel the current build, please comment:

@Dotnet-s390x cancel

```c
    IN int cchWideChar)
{
    INT retval =0;
    dwFlags |= MINIPAL_TREAT_AS_LITTLE_ENDIAN;
```

Member

This does not look right. MultiByteToWideChar should use native endian.

Contributor Author

@jkotas, based on the behavior before the PR:

MultiByteToWideChar called UTF8ToUnicode

https://github.com/dotnet/runtime/blob/v7.0.0/src/coreclr/pal/src/locale/unicode.cpp#L265

and UTF8ToUnicode in turn called GetChars

https://github.com/dotnet/runtime/blob/v7.0.0/src/coreclr/pal/src/locale/utf8.cpp#L2880

GetChars already performed the endian conversion by default on big-endian systems.

https://github.com/dotnet/runtime/blob/v7.0.0/src/coreclr/pal/src/locale/utf8.cpp#L1938-L1949

Member

The links you have shared show that MultiByteToWideChar used native endian output. It does not make sense to force little endian output by passing MINIPAL_TREAT_AS_LITTLE_ENDIAN flag.

Looking at the whole change, I think you may be trying to change MINIPAL_TREAT_AS_LITTLE_ENDIAN to mean the opposite of what its name says.
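A minimal sketch of what the flag's name implies, under the assumption that "treat as little endian" describes the byte order of the UTF-16 buffer: with the flag set, each code unit is stored little-endian, which means byte-swapping on a big-endian host and doing nothing on a little-endian one; without it, native order is kept. HYPO_TREAT_AS_LITTLE_ENDIAN, host_is_little_endian and to_target_order are hypothetical names, not the minipal API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag value standing in for MINIPAL_TREAT_AS_LITTLE_ENDIAN. */
#define HYPO_TREAT_AS_LITTLE_ENDIAN 0x1

static bool host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1); /* inspect the lowest-addressed byte */
    return first == 1;
}

/* Returns the value to store in a native uint16_t so that its in-memory
   bytes follow the byte order requested by `flags`. */
static uint16_t to_target_order(uint16_t code_unit, int flags)
{
    if ((flags & HYPO_TREAT_AS_LITTLE_ENDIAN) && !host_is_little_endian())
        return (uint16_t)((code_unit << 8) | (code_unit >> 8)); /* swap bytes */
    return code_unit; /* native order requested, or host is already LE */
}
```

Under this reading, passing the flag to a function that is supposed to produce native-endian output (such as MultiByteToWideChar) would be wrong on big-endian hosts, which matches the review comment.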

Contributor Author

@saitama951, May 22, 2026

I think I'm confused here. Isn't the intent of MINIPAL_TREAT_AS_LITTLE_ENDIAN to treat the data as little-endian?

I agree with what you said: previously, the code in the #if BIGENDIAN block produced output in native endian, while the MINIPAL_TREAT_AS_LITTLE_ENDIAN path should always produce little-endian output.

I was trying to debug why the EventPipe handler writes incorrectly into a .nettrace file (which is little-endian only) on big-endian systems (I had misunderstood this previously).

The raw .nettrace file on a big-endian system looks like this:

```
000005f0  00 00 00 00 00 00 00 2f  00 68 00 6d 00 6f 00 61  |......./.h.m.o.a| <----- marker
00000600  00 73 00 2f 00 65 00 6d  00 61 00 6a 00 6e 00 72  |.s./.e.m.a.j.n.r| <------ marker
00000610  00 6f 00 63 00 2f 00 72  00 6c 00 63 00 65 00 74  |.o.c./.r.l.c.e.t|
00000620  00 6f 00 64 00 2f 00 2d  00 74 00 65 00 6e 00 30  |.o.d./.-.t.e.n.0|
00000630  00 39 00 33 00 73 00 68  00 73 00 2f 00 78 00 64  |.9.3.s.h.s./.x.d|
00000640  00 65 00 72 00 61 00 63  00 69 00 4d 00 2f 00 6f  |.e.r.a.c.i.M./.o|
00000650  00 73 00 6f 00 72 00 4e  00 2e 00 74 00 66 00 6f  |.s.o.r.N...t.f.o|
00000660  00 43 00 54 00 45 00 41  00 2e 00 65 00 72 00 31  |.C.T.E.A...e.r.1|
00000670  00 2f 00 70 00 70 00 2e  00 30 00 2e 00 30 00 65  |./.p.p...0...0.e|
00000680  00 64 00 2d 00 31 00 69  00 6c 00 2f 00 76 00 72  |.d.-.1.i.l./.v.r|
00000690  00 6f 00 63 00 62 00 72  00 6c 00 63 00 65 00 2e  |.o.c.b.r.l.c.e..|
000006a0  00 73 00 6f 00 00 81 03  fc 1c 22 09 00 00 52 00  |.s.o......"...R.|
```
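To make the marked bytes easier to read, here is a small sketch that decodes them (offsets 0x5f6 to 0x60f) as UTF-16 in either byte order and keeps only ASCII code units. The byte array is copied from the dump; decode_ascii is a hypothetical helper for illustration. Read big-endian, the bytes spell "/hmoas/emajnr", i.e. the characters of "/home/sanjam/..." out of order ("e/sa" shows up as "as/e"); read little-endian, which is what a .nettrace reader expects, none of the 13 code units is ASCII.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes copied from the marked lines of the dump (offsets 0x5f6-0x60f). */
static const uint8_t marked[] = {
    0x00, 0x2f, 0x00, 0x68, 0x00, 0x6d, 0x00, 0x6f, 0x00, 0x61,
    0x00, 0x73, 0x00, 0x2f, 0x00, 0x65, 0x00, 0x6d, 0x00, 0x61,
    0x00, 0x6a, 0x00, 0x6e, 0x00, 0x72,
};

/* Decodes byte pairs as UTF-16 code units in the given byte order, copies
   the ASCII ones into `out` (NUL-terminated) and returns how many non-ASCII
   units were skipped. `out` must hold len / 2 + 1 chars. */
static size_t decode_ascii(const uint8_t *bytes, size_t len, int big_endian,
                           char *out)
{
    size_t skipped = 0, n = 0;

    assert(len % 2 == 0);
    for (size_t i = 0; i < len; i += 2) {
        uint16_t unit = big_endian
            ? (uint16_t)((bytes[i] << 8) | bytes[i + 1])
            : (uint16_t)((bytes[i + 1] << 8) | bytes[i]);
        if (unit < 0x80)
            out[n++] = (char)unit;
        else
            skipped++;
    }
    out[n] = '\0';
    return skipped;
}
```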

In the debugger we have this:

```
#0  GetChars (self=0x3ffc9877380, bytes=0x3ffc9877af8 "/home/sanjam/coreclr/dotnet-s390x/shared/Microsoft.NETCore.App/10.0.1-dev/libcoreclr.so", byteCount=87, chars=0x2aa0cbf7d2b, charCount=87)
    at /home/sanjam/runtime/src/native/minipal/utf8.c:1116

#1  0x000003ffacd160fc in minipal_convert_utf8_to_utf16 (source=0x3ffc9877af8 "/home/sanjam/coreclr/dotnet-s390x/shared/Microsoft.NETCore.App/10.0.1-dev/libcoreclr.so", sourceLength=87, destination=0x2aa0cbf7d2b, 
    destinationLength=87, flags=30) at /home/sanjam/runtime/src/native/minipal/utf8.c:2119

#2  0x000003ffacaa16de in g_utf8_to_utf16le_custom_alloc_impl (str=0x3ffc9877af8 "/home/sanjam/coreclr/dotnet-s390x/shared/Microsoft.NETCore.App/10.0.1-dev/libcoreclr.so", len=87, items_read=0x0, items_written=0x0, 
    custom_alloc_func=0x3ffacaa3280 <monoeg_g_fixed_buffer_custom_allocator>, custom_alloc_data=0x3ffc9877628, err=0x0, treatAsLE=true) at /home/sanjam/runtime/src/mono/mono/eglib/giconv.c:260
```

For example, consider the substring "e/sa" in "/home/sanjam/coreclr/dotnet-s390x/shared/Microsoft.NETCore.App/10.0.1-dev/libcoreclr.so"

in this specific code:
https://github.com/dotnet/runtime/blob/main/src/native/minipal/utf8.c#L1081-L1125

where char* pSrc = "e/sa" (truncated for simplicity), i.e. the bytes 0x65 0x2f 0x73 0x61. Now:

```c
int ch = *(int*)pSrc;
// little-endian: ch = 0x61732f65
// big-endian:    ch = 0x652f7361

*pTarget       = (CHAR16_T)(ch & 0x7F);
*(pTarget + 1) = (CHAR16_T)((ch >> 8) & 0x7F);
*(pTarget + 2) = (CHAR16_T)((ch >> 16) & 0x7F);
*(pTarget + 3) = (CHAR16_T)((ch >> 24) & 0x7F);

// Resulting bytes in memory:
//                 little-endian   big-endian
// *pTarget        65 00           00 61
// *(pTarget + 1)  2f 00           00 73
// *(pTarget + 2)  73 00           00 2f
// *(pTarget + 3)  61 00           00 65
```

I think this is incorrect on a big-endian platform (I think the bytes should be in the reverse order to conform to the LE format). We can validate this against the raw hex bytes at the markers above.
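The arithmetic above can be checked on any machine with a small sketch that builds the value the 32-bit load would produce on each kind of host and then applies the same shift sequence. load_as_le, load_as_be and widen_like_fast_path are hypothetical helpers for illustration; they are not part of minipal.

```c
#include <assert.h>
#include <stdint.h>

/* Value a 32-bit load from p[0..3] yields on a little-endian host. */
static uint32_t load_as_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Value the same load yields on a big-endian host such as s390x. */
static uint32_t load_as_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Applies the shift sequence quoted above and writes the resulting
   characters to out[0..3], followed by a NUL. */
static void widen_like_fast_path(uint32_t ch, char out[5])
{
    out[0] = (char)(ch & 0x7F);
    out[1] = (char)((ch >> 8) & 0x7F);
    out[2] = (char)((ch >> 16) & 0x7F);
    out[3] = (char)((ch >> 24) & 0x7F);
    out[4] = '\0';
}
```

With "e/sa" as input, the little-endian value produces "e/sa" while the big-endian value produces "as/e", which is the reordering visible in the dump.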

Member

@jkotas, May 22, 2026

I agree with you that this does not look right.

I think the code should be doing the following on BE systems for your example:

```
*pTarget     = 00 65
*(pTarget+1) = 00 2f
*(pTarget+2) = 00 73
*(pTarget+3) = 00 61
```
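A sketch of that native-endian behavior, assuming the goal is to keep the source character order while storing each code unit in host byte order: it keeps the 32-bit load from the original code (via memcpy, to avoid the aliasing problem of *(int*)pSrc) but picks the shift order from the host's endianness. widen4_native and host_is_little_endian are hypothetical helpers, not the fix that was eventually made in the follow-up PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static bool host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1); /* inspect the lowest-addressed byte */
    return first == 1;
}

/* Widens four ASCII bytes to native-endian UTF-16 code units, keeping the
   source character order on both LE and BE hosts. */
static void widen4_native(const unsigned char *src, uint16_t *dst)
{
    uint32_t ch;
    memcpy(&ch, src, sizeof ch); /* same load as *(int*)pSrc */
    if (host_is_little_endian()) {
        /* First byte in memory is the least significant one. */
        dst[0] = (uint16_t)(ch & 0x7F);
        dst[1] = (uint16_t)((ch >> 8) & 0x7F);
        dst[2] = (uint16_t)((ch >> 16) & 0x7F);
        dst[3] = (uint16_t)((ch >> 24) & 0x7F);
    } else {
        /* First byte in memory is the most significant one. */
        dst[0] = (uint16_t)((ch >> 24) & 0x7F);
        dst[1] = (uint16_t)((ch >> 16) & 0x7F);
        dst[2] = (uint16_t)((ch >> 8) & 0x7F);
        dst[3] = (uint16_t)(ch & 0x7F);
    }
}
```

On a big-endian host the code units 0x0065, 0x002f, 0x0073, 0x0061 are stored as 00 65, 00 2f, 00 73, 00 61, matching the expectation above.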

Contributor Author

@saitama951, May 22, 2026

Shouldn't it be this on big-endian systems?

```
*pTarget     = 65 00
*(pTarget+1) = 2f 00
*(pTarget+2) = 73 00
*(pTarget+3) = 61 00
```

We are trying to write it out in little-endian UTF-16 format; the version above is still big-endian UTF-16.
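A minimal sketch of the little-endian output for the ASCII case, assuming the destination is treated as a byte buffer: writing each code unit's low byte first produces 65 00 2f 00 73 00 61 00 for "e/sa" on any host, without depending on how a 32-bit load orders the bytes. ascii_to_utf16le_bytes is a hypothetical helper, not part of minipal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widens ASCII bytes to UTF-16LE by writing each code unit's bytes
   explicitly (low byte first), so the output is identical on LE and BE
   hosts. Returns the number of bytes written (2 per input byte). */
static size_t ascii_to_utf16le_bytes(const unsigned char *src, size_t count,
                                     uint8_t *dst)
{
    for (size_t i = 0; i < count; i++) {
        assert(src[i] < 0x80);   /* this sketch handles ASCII only */
        dst[2 * i]     = src[i]; /* low byte */
        dst[2 * i + 1] = 0x00;   /* high byte of an ASCII code unit */
    }
    return 2 * count;
}
```

Writing bytes instead of native uint16_t values sidesteps the host byte order entirely, which is why such code needs no #if BIGENDIAN branch.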

Member

Yes, you are right.

Contributor Author

@jkotas Thank you for your comments. I will close this and open a new PR with the fixes.

@Dotnet-s390x

Build Failed
Please check the build logs: http://148.100.85.217:8080/job/dotnet-builds/78/console.

Build Error Summary
```
Build FAILED.

TestILAssembly.il(13): error : Failed to open key file '/var/lib/jenkins/.nuget/packages/microsoft.dotnet.arcade.sdk/11.0.0-beta.26257.113/tools/snk/MSFT.snk': 0x80070002 [/var/lib/jenkins/workspace/dotnet-builds/repo/src/libraries/Common/tests/System/TestILAssembly/TestILAssembly.ilproj]
/var/lib/jenkins/.nuget/packages/microsoft.net.sdk.il/11.0.0-preview.5.26257.113/targets/Microsoft.NET.Sdk.IL.targets(139,5): error MSB3073: The command ""/var/lib/jenkins/workspace/dotnet-builds/repo/artifacts/bin/coreclr/linux.s390x.Release/ilasm" -QUIET -NOLOGO -DLL  -OUTPUT="/var/lib/jenkins/workspace/dotnet-builds/repo/artifacts/obj/TestILAssembly/Debug/netstandard2.0/TestILAssembly.dll" -KEY="/var/lib/jenkins/.nuget/packages/microsoft.dotnet.arcade.sdk/11.0.0-beta.26257.113/tools/snk/MSFT.snk" "TestILAssembly.il"" exited with code 1. [/var/lib/jenkins/workspace/dotnet-builds/repo/src/libraries/Common/tests/System/TestILAssembly/TestILAssembly.ilproj]
    0 Warning(s)
    2 Error(s)

Time Elapsed 00:21:50.51
Build failed with exit code 1. Check errors above.
```

@saitama951 (Contributor, Author)

@Dotnet-s390x build

@Dotnet-s390x

Build Queued..

To cancel the current build, please comment:

@Dotnet-s390x cancel

@Dotnet-s390x

Build Failed
Please check the build logs: http://148.100.85.217:8080/job/dotnet-builds/79/console.

Build Error Summary
```
Build FAILED.

TestILAssembly.il(13): error : Failed to open key file '/var/lib/jenkins/.nuget/packages/microsoft.dotnet.arcade.sdk/11.0.0-beta.26257.113/tools/snk/MSFT.snk': 0x80070002 [/var/lib/jenkins/workspace/dotnet-builds/repo/src/libraries/Common/tests/System/TestILAssembly/TestILAssembly.ilproj]
/var/lib/jenkins/.nuget/packages/microsoft.net.sdk.il/11.0.0-preview.5.26257.113/targets/Microsoft.NET.Sdk.IL.targets(139,5): error MSB3073: The command ""/var/lib/jenkins/workspace/dotnet-builds/repo/artifacts/bin/coreclr/linux.s390x.Release/ilasm" -QUIET -NOLOGO -DLL  -OUTPUT="/var/lib/jenkins/workspace/dotnet-builds/repo/artifacts/obj/TestILAssembly/Debug/netstandard2.0/TestILAssembly.dll" -KEY="/var/lib/jenkins/.nuget/packages/microsoft.dotnet.arcade.sdk/11.0.0-beta.26257.113/tools/snk/MSFT.snk" "TestILAssembly.il"" exited with code 1. [/var/lib/jenkins/workspace/dotnet-builds/repo/src/libraries/Common/tests/System/TestILAssembly/TestILAssembly.ilproj]
    0 Warning(s)
    2 Error(s)

Time Elapsed 00:21:06.39
Build failed with exit code 1. Check errors above.
```


4 participants