Fix fatal sched_setaffinity EPERM failure during PAL initialization on restricted kernels #125361

Merged
janvorli merged 2 commits into dotnet:main from Pietrodjaowjao:main on Mar 10, 2026

@Pietrodjaowjao
Contributor

Problem

On Android devices running vendor-modified kernel 5.10 with strict SELinux policy, coreclr_initialize silently fails with HRESULT 0x8007054F (ERROR_INTERNAL_ERROR). No diagnostic output is produced, making this extremely difficult to debug. The failure affects a broad category of devices from multiple OEMs shipping Android 13 on kernel 5.10 GKI bases.
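
For readers decoding the failure code: 0x8007054F is what the HRESULT_FROM_WIN32 bit layout produces for ERROR_INTERNAL_ERROR (1359 decimal, 0x54F hex) combined with the Win32 facility (7). A minimal sketch of that arithmetic follows. The helper names are hypothetical and not taken from the runtime sources, and the helper only covers the simple case of a nonzero, positive Win32 code.

```cpp
#include <cstdint>

// Facility code used for HRESULTs that wrap Win32 error codes.
constexpr uint32_t kFacilityWin32 = 7;
// Win32 value of ERROR_INTERNAL_ERROR (1359 decimal, 0x54F hex).
constexpr uint32_t kErrorInternalError = 1359;

// Hypothetical helper following the HRESULT_FROM_WIN32 bit layout:
// failure bit (bit 31) | facility (starting at bit 16) | code (bits 0-15).
constexpr uint32_t HresultFromWin32(uint32_t win32Error)
{
    return win32Error == 0
        ? 0u
        : (0x80000000u | (kFacilityWin32 << 16) | (win32Error & 0xFFFFu));
}

// Hypothetical helper: the low 16 bits carry the original Win32 error code.
constexpr uint32_t Win32CodeFromHresult(uint32_t hr)
{
    return hr & 0xFFFFu;
}

// Checked at compile time: 0x80000000 | 0x00070000 | 0x0000054F.
static_assert(HresultFromWin32(kErrorInternalError) == 0x8007054Fu,
              "ERROR_INTERNAL_ERROR should map to 0x8007054F");
```

Going the other way, masking 0x8007054F with 0xFFFF gives 0x54F, which is how the code reported by coreclr_initialize can be traced back to ERROR_INTERNAL_ERROR.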

Root Cause

During PAL thread initialization in thread.cpp, the runtime calls sched_getaffinity to retrieve the current thread's CPU affinity mask, then immediately calls sched_setaffinity(0, ...) to re-apply it. On these kernels, sched_setaffinity returns EPERM even when passed a mask that was just obtained from sched_getaffinity on the same thread. The existing error handling unconditionally treats any failure as fatal:

st = sched_setaffinity(0, sizeof(cpu_set_t), &cpuSet);
if (st != 0)
{
    ASSERT("sched_setaffinity failed!\n");
    palError = ERROR_INTERNAL_ERROR;
    goto fail;
}

This aborts PAL initialization entirely, returning ERROR_INTERNAL_ERROR to the caller with no further diagnostics.
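
The get-then-set round trip described above can be checked in isolation on a suspect device. The following is a rough diagnostic sketch, not code from thread.cpp: ProbeAffinityRoundTrip and AffinityProbeResult are hypothetical names, and the only assumption is that the Linux sched_getaffinity and sched_setaffinity calls are available.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // needed for cpu_set_t and the affinity calls on some libcs
#endif
#include <sched.h>
#include <cerrno>

// Outcome of reading the calling thread's affinity mask and writing it back unchanged.
struct AffinityProbeResult
{
    int getErrno; // 0 if sched_getaffinity succeeded, otherwise the errno it set
    int setErrno; // 0 if sched_setaffinity succeeded (or was never attempted), otherwise errno
};

// Hypothetical diagnostic helper (not part of the runtime).
inline AffinityProbeResult ProbeAffinityRoundTrip()
{
    AffinityProbeResult result{0, 0};
    cpu_set_t cpuSet;
    CPU_ZERO(&cpuSet);

    // pid 0 means "the calling thread".
    if (sched_getaffinity(0, sizeof(cpu_set_t), &cpuSet) != 0)
    {
        result.getErrno = errno;
        return result; // nothing to write back
    }

    // Re-apply the exact mask that was just read.
    if (sched_setaffinity(0, sizeof(cpu_set_t), &cpuSet) != 0)
    {
        result.setErrno = errno;
    }
    return result;
}
```

If the description above holds for a given device, this probe would be expected to come back with getErrno equal to 0 and setErrno equal to EPERM there, while an unrestricted kernel should report 0 for both.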

Fix

Treat EPERM and EACCES from sched_setaffinity as non-fatal, logging a warning and continuing. The thread will run on any available CPU rather than the originally affinitized one, which is acceptable behavior in restricted environments. Any other error code still triggers the existing assert and hard failure.

This is consistent with the snap confinement workarounds already documented and applied in the same file and in gcenv.unix.cpp, which handle similar sched_setaffinity restrictions under snap's default strict confinement. This PR extends that handling to cover additional restricted environments.
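
A rough sketch of the error classification described above. This is not the diff merged in this PR: AffinityOutcome, IsTolerableAffinityError, ClassifySetAffinityResult and ReapplyCurrentAffinity are hypothetical names, and stderr stands in for whatever logging the runtime actually uses.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // needed for cpu_set_t and the affinity calls on some libcs
#endif
#include <sched.h>
#include <cerrno>
#include <cstdio>
#include <cstring>

// Hypothetical result type for this sketch.
enum class AffinityOutcome
{
    Applied,           // sched_setaffinity succeeded
    SkippedRestricted, // EPERM/EACCES: warn and keep going
    Failed             // any other error: caller should treat as fatal
};

// The two errno values the description treats as non-fatal.
inline bool IsTolerableAffinityError(int err)
{
    return err == EPERM || err == EACCES;
}

// Maps a sched_setaffinity return value and errno to an outcome.
inline AffinityOutcome ClassifySetAffinityResult(int st, int err)
{
    if (st == 0)
    {
        return AffinityOutcome::Applied;
    }
    if (IsTolerableAffinityError(err))
    {
        std::fprintf(stderr, "warning: sched_setaffinity failed (%s), continuing\n",
                     std::strerror(err));
        return AffinityOutcome::SkippedRestricted;
    }
    return AffinityOutcome::Failed;
}

// Reads the calling thread's mask and tries to re-apply it.
inline AffinityOutcome ReapplyCurrentAffinity()
{
    cpu_set_t cpuSet;
    CPU_ZERO(&cpuSet);
    if (sched_getaffinity(0, sizeof(cpu_set_t), &cpuSet) != 0)
    {
        return AffinityOutcome::Failed;
    }
    int st = sched_setaffinity(0, sizeof(cpu_set_t), &cpuSet);
    int err = (st == 0) ? 0 : errno; // capture errno before any other call
    return ClassifySetAffinityResult(st, err);
}
```

Keeping the classification separate from the syscall makes the policy easy to check without a restricted kernel: only a Failed outcome would map to the existing assert and ERROR_INTERNAL_ERROR path.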

Testing

Verified on an Android 13 device with a vendor-modified kernel 5.10, where coreclr_initialize previously failed with 0x8007054F. After this fix, initialization succeeds. Devices running kernel 6.1 (Android 14) are unaffected.

References

Related snap issue: dotnet#1634

@github-actions bot added the area-PAL-coreclr label (only for closed issues) on Mar 10, 2026
@dotnet-policy-service bot added the community-contribution label (indicates that the PR has been added by a community member) on Mar 10, 2026
@Pietrodjaowjao
Contributor Author

cc @janvorli

Member

@janvorli left a comment

LGTM, thank you!

@janvorli merged commit b78173c into dotnet:main on Mar 10, 2026
105 checks passed
Pietrodjaowjao added a commit to All-Of-Us-Mods/runtime that referenced this pull request Mar 10, 2026
…n restricted kernels (dotnet#125361)

**References**

- Related snap issue: dotnet#1634

(cherry picked from commit b78173c)
@Pietrodjaowjao
Contributor Author

Thanks for merging! Could a .NET 10 backport be considered?

@Pietrodjaowjao
Contributor Author

/backport to release/10.0

@github-actions
Contributor

Started backporting to release/10.0 (link to workflow run)

@github-actions
Contributor

@Pietrodjaowjao an error occurred while backporting to release/10.0. See the workflow output for details.

@github-actions bot locked and limited conversation to collaborators on Apr 13, 2026