Fix fatal sched_setaffinity EPERM failure during PAL initialization on restricted kernels #125361
Merged
janvorli merged 2 commits into dotnet:main on Mar 10, 2026
Conversation
Contributor
Author
|
cc @janvorli |
Pietrodjaowjao added a commit to All-Of-Us-Mods/runtime that referenced this pull request Mar 10, 2026

…n restricted kernels (dotnet#125361)

**Problem**

On Android devices running vendor-modified kernel 5.10 with strict SELinux policy, `coreclr_initialize` silently fails with `HRESULT 0x8007054F` (`ERROR_INTERNAL_ERROR`). No diagnostic output is produced, making this extremely difficult to debug. The failure affects a broad category of devices from multiple OEMs shipping Android 13 on kernel 5.10 GKI bases.

**Root Cause**

During PAL thread initialization in `thread.cpp`, the runtime calls `sched_getaffinity` to retrieve the current thread's CPU affinity mask, then immediately calls `sched_setaffinity(0, ...)` to re-apply it. On these kernels, `sched_setaffinity` returns `EPERM` even when passed a mask that was just obtained from `sched_getaffinity` on the same thread. The existing error handling unconditionally treats any failure as fatal:

```cpp
st = sched_setaffinity(0, sizeof(cpu_set_t), &cpuSet);
if (st != 0)
{
    ASSERT("sched_setaffinity failed!\n");
    palError = ERROR_INTERNAL_ERROR;
    goto fail;
}
```

This aborts PAL initialization entirely, returning `ERROR_INTERNAL_ERROR` to the caller with no further diagnostics.

**Fix**

Treat `EPERM` and `EACCES` from `sched_setaffinity` as non-fatal, logging a warning and continuing. The thread will run on any available CPU rather than the originally affinitized one, which is acceptable behavior in restricted environments. Any other error code still triggers the existing assert and hard failure.

This is consistent with the snap confinement workarounds already documented and applied in the same file and in `gcenv.unix.cpp`, which handle similar `sched_setaffinity` restrictions under snap's default strict confinement. This PR extends that handling to cover additional restricted environments.

**Testing**

Verified on an Android 13 device with a vendor-modified kernel 5.10, where `coreclr_initialize` previously failed with `0x8007054F`. After this fix, initialization succeeds. Devices running kernel 6.1 (Android 14) are unaffected.

**References**

- Related snap issue: dotnet#1634

(cherry picked from commit b78173c)
Contributor
Author
|
Thanks for merging! Could a .NET 10 backport be considered? |
Contributor
Author
|
/backport to release/10.0 |
Contributor
|
Started backporting to |
Contributor
|
@Pietrodjaowjao an error occurred while backporting to |
**Problem**

On Android devices running a vendor-modified 5.10 kernel with strict SELinux policy, `coreclr_initialize` silently fails with `HRESULT 0x8007054F` (`ERROR_INTERNAL_ERROR`). No diagnostic output is produced, which makes the failure extremely difficult to debug. It affects a broad category of devices from multiple OEMs shipping Android 13 on kernel 5.10 GKI bases.

**Root Cause**

During PAL thread initialization in `thread.cpp`, the runtime calls `sched_getaffinity` to retrieve the current thread's CPU affinity mask, then immediately calls `sched_setaffinity(0, ...)` to re-apply it. On these kernels, `sched_setaffinity` returns `EPERM` even when passed a mask that was just obtained from `sched_getaffinity` on the same thread. The existing error handling unconditionally treats any failure as fatal: it asserts, sets `palError` to `ERROR_INTERNAL_ERROR`, and jumps to the `fail` label.

This aborts PAL initialization entirely, returning `ERROR_INTERNAL_ERROR` to the caller with no further diagnostics.

**Fix**

Treat `EPERM` and `EACCES` from `sched_setaffinity` as non-fatal, logging a warning and continuing. The thread will run on any available CPU rather than the originally affinitized one, which is acceptable behavior in restricted environments. Any other error code still triggers the existing assert and hard failure.

This is consistent with the snap confinement workarounds already documented and applied in the same file and in `gcenv.unix.cpp`, which handle similar `sched_setaffinity` restrictions under snap's default strict confinement. This PR extends that handling to cover additional restricted environments.

**Testing**

Verified on an Android 13 device with a vendor-modified kernel 5.10, where `coreclr_initialize` previously failed with `0x8007054F`. After this fix, initialization succeeds. Devices running kernel 6.1 (Android 14) are unaffected.

**References**

- Related snap issue: dotnet#1634
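
For readers who want to see the shape of the change described under Fix, here is a minimal standalone sketch. It is not the code merged into `thread.cpp`. The helper names `IsTolerableAffinityError` and `ReapplyCurrentAffinity` are hypothetical, plain `fprintf` stands in for the PAL's logging and `ASSERT` macros, and treating a `sched_getaffinity` failure as fatal is an assumption of this sketch.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // exposes sched_getaffinity/sched_setaffinity on glibc
#endif

#include <cerrno>
#include <cstdio>
#include <cstring>
#include <sched.h>

// Hypothetical helper (not a runtime API): returns true for the errno values
// that the PR description says should no longer abort initialization.
static bool IsTolerableAffinityError(int err)
{
    return err == EPERM || err == EACCES;
}

// Hypothetical helper: reads the calling thread's affinity mask and re-applies
// it. Returns false only when the failure should still be treated as fatal.
static bool ReapplyCurrentAffinity()
{
    cpu_set_t cpuSet;
    CPU_ZERO(&cpuSet);

    // Assumption of this sketch: a failure to read the mask stays fatal.
    if (sched_getaffinity(0, sizeof(cpu_set_t), &cpuSet) != 0)
    {
        std::fprintf(stderr, "sched_getaffinity failed: %s\n", std::strerror(errno));
        return false;
    }

    if (sched_setaffinity(0, sizeof(cpu_set_t), &cpuSet) != 0)
    {
        const int err = errno; // capture before any other call can overwrite it

        if (IsTolerableAffinityError(err))
        {
            // Restricted environment: warn and continue without affinity.
            std::fprintf(stderr, "warning: sched_setaffinity failed (%s), continuing\n",
                         std::strerror(err));
            return true;
        }

        // Any other error keeps the original hard-failure behavior.
        std::fprintf(stderr, "sched_setaffinity failed: %s\n", std::strerror(err));
        return false;
    }

    return true;
}
```

A caller would abort initialization only when `ReapplyCurrentAffinity()` returns `false`. Keeping the errno classification in its own function makes the non-fatal set easy to review and to extend if other restricted environments turn up.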