Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mono] Use unsigned char when computing UTF8 string hashes #83273

Merged
merged 6 commits into from
Mar 13, 2023

Conversation

lambdageek
Copy link
Member

The C standard does not specify whether char is signed or unsigned,
it is implementation defined.

Apparently Android aarch64 makes a different choice than other
platforms (at least macOS arm64 and Windows x64 give different
results).

Mono uses mono_metadata_str_hash in the AOT compiler and AOT runtime
to optimize class name lookup. As a result, classes whose names
include UTF-8 continuation bytes (with the high bit = 1) will hash
differently in the AOT compiler and on the device.

Fixes #82187
Fixes #78638


Also add DEBUG_AOT_NAME_TABLE ifdef (undef'd by default) for debugging name hashing issues

AOT compiler: Emits a second "class_name_table_debug" symbol that has all the class
names and hashes as strings.

AOT runtime: warns if a class is not found in the name cache

The C standard does not specify whether `char` is signed or unsigned,
it is implementation defined.

Apparently Android aarch64 makes a different choice than other
platforms (at least macOS arm64 and Windows x64 give different
results).

Mono uses `mono_metadata_str_hash` in the AOT compiler and AOT runtime
to optimize class name lookup.  As a result, classes whose names
include UTF-8 continuation bytes (with the high bit = 1) will hash
differently in the AOT compiler and on the device.

Fixes dotnet#82187
Fixes dotnet#78638
AOT compiler: Emits a second "class_name_table_debug" symbol that has all the class
names and hashes as strings.

AOT runtime: warns if a class is not found in the name cache
@lambdageek
Copy link
Member Author

We will need to backport this to net7, net6 and mono 2020-02

@lambdageek
Copy link
Member Author

I want to add a regression test here, but we don't run Android-arm64 device tests with AOT, AFAICT. We run libraries tests with the minijit, but we don't run runtime tests at all.

I'm going to check if linux arm64 repros this issue - that might be good enough for the purposes of adding a regression test

@lewing
Copy link
Member

lewing commented Mar 10, 2023

see the discussion here #82833 (comment)

@lambdageek
Copy link
Member Author

verified that linux-arm64 has char == unsigned char while macos-arm64 has char == signed char. But realized that might not be good enough for regression testing - if I run both the AOT compiler and the AOT runtime on the same machine the answer will always be the same. I really need different platforms. (or hardcode a unit test for mono_metadata_str_hash but we don't really have unit tests)

@lambdageek lambdageek added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Mar 10, 2023
@lambdageek
Copy link
Member Author

Temporarily reverted #59772 and turned on AOT compilation for the runtime tests on Android devices.

@lambdageek
Copy link
Member Author

see the discussion here #82833 (comment)

@lewing I don't totally follow how that's related. In this case the issue is not the project or assembly names, it's the names of classes inside an assembly. They're kind of related problems, but in this case even if all the filesystem names are ascii, there are still problems

@lambdageek
Copy link
Member Author

/azp run runtime-extra-platforms-android

@azure-pipelines
Copy link

No pipelines are associated with this pull request.

@lambdageek
Copy link
Member Author

/azp run runtime-extra-platforms

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@lambdageek
Copy link
Member Author

/azp run runtime-extra-platforms

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@lambdageek lambdageek removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Mar 11, 2023
@lambdageek

This comment was marked as outdated.

@lambdageek

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as resolved.

@github-actions

This comment was marked as resolved.

@github-actions

This comment was marked as resolved.

@github-actions

This comment was marked as resolved.

@lambdageek
Copy link
Member Author

/azp run sync-runtime-to-mono

@azure-pipelines
Copy link

No pipelines are associated with this pull request.

@lambdageek
Copy link
Member Author

mono/mono main backport is mono/mono#21632

@teo-tsirpanis
Copy link
Contributor

(reassigned to area-VM-meta-mono, the only AOT changes are debug code)

akoeplinger pushed a commit to mono/mono that referenced this pull request Mar 13, 2023
Backport of dotnet/runtime#83273 to mono/mono `main`

The C standard does not specify whether `char` is signed or unsigned, it is implementation defined.

Apparently Android aarch64 makes a different choice than other platforms (at least macOS arm64 and Windows x64 give different results).

Mono uses `mono_metadata_str_hash` in the AOT compiler and AOT runtime to optimize class name lookup.  As a result, classes whose names include UTF-8 continuation bytes (with the high bit = 1) will hash differently in the AOT compiler and on the device.

Contributes to dotnet/runtime#82187 
Contributes to dotnet/runtime#78638
akoeplinger pushed a commit to mono/mono that referenced this pull request Mar 13, 2023
Backport of dotnet/runtime#83273 to mono/mono `2020-02`

The C standard does not specify whether `char` is signed or unsigned, it is implementation defined.

Apparently Android aarch64 makes a different choice than other platforms (at least macOS arm64 and Windows x64 give different results).

Mono uses `mono_metadata_str_hash` in the AOT compiler and AOT runtime to optimize class name lookup.  As a result, classes whose names include UTF-8 continuation bytes (with the high bit = 1) will hash differently in the AOT compiler and on the device.

Contributes to dotnet/runtime#82187 
Contributes to dotnet/runtime#78638
@lambdageek lambdageek merged commit d3f7d59 into dotnet:main Mar 13, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Apr 12, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
4 participants