Fix false-positive assertion in CheckRegDisplaySP due to stale cached stack limit #126740

Closed
Copilot wants to merge 2 commits into main from copilot/fix-assert-failure-alt-stack
Contributor

Copilot AI commented Apr 10, 2026

On Linux, the main thread's stack grows on demand — the kernel commits new pages below the initial boundary as SP moves lower. m_CacheStackLimit is set once at thread init via pthread_attr_getstack() and never refreshed. Under JIT stress modes (jitminopts, tailcallstress, interpmode1) that drive deeper recursion, GC/profiler stack walks encounter frames with SP below the stale cached limit, triggering a false-positive _ASSERTE in CheckRegDisplaySP.

Description

In CheckRegDisplaySP (#ifdef DEBUG_REGDISPLAY, checked/debug builds only), replace the stale GetCachedStackLimit() with GetStackLowerBound(), which queries the current committed stack boundary dynamically:

// Before
_ASSERTE(...|| PTR_VOID(pRD->SP) >= pRD->_pThread->GetCachedStackLimit());

// After — dynamic query avoids stale cached value on grown stacks
#ifndef DACCESS_COMPILE
_ASSERTE(...|| PTR_VOID(pRD->SP) >= pRD->_pThread->GetStackLowerBound());
#else
_ASSERTE(...|| PTR_VOID(pRD->SP) >= pRD->_pThread->GetCachedStackLimit());
#endif

GetStackLowerBound() is not available in DAC builds (#ifndef DACCESS_COMPILE), so DAC retains the original GetCachedStackLimit() path. No release build impact — the entire function is inside #ifdef DEBUG_REGDISPLAY.
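
The predicate inside the assertion can be read in isolation. The following is a minimal sketch, using a hypothetical free function `SpPassesCheck` in place of the runtime's `Thread` and `REGDISPLAY` types, and assuming (as the description does) that the cached and current limits can differ:

```cpp
#include <cstdint>

// Hypothetical stand-in for the condition in CheckRegDisplaySP; not the runtime's code.
// The check passes if the thread is on an alternate stack, or if SP is at or above
// the lower stack limit it is compared against (stacks grow toward lower addresses).
inline bool SpPassesCheck(bool onAltStack, std::uintptr_t sp, std::uintptr_t stackLimit)
{
    return onAltStack || sp >= stackLimit;
}
```

With an SP of 0x5000, the check fails against a limit of 0x6000 and passes against a limit of 0x4000, so the outcome depends entirely on which limit value is supplied.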

Fixes #124839

Use GetStackLowerBound() instead of GetCachedStackLimit() in the
CheckRegDisplaySP debug assertion. On Linux, the main thread's stack
grows on demand, so the cached limit set at thread initialization can
become stale. GetStackLowerBound() queries the current stack boundary
dynamically, preventing false-positive assertion failures.

Since GetStackLowerBound() is only available in non-DAC builds
(#ifndef DACCESS_COMPILE), DAC builds retain the original
GetCachedStackLimit() check.

Fixes #124839, #125760, #117918

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/62ad3d3e-698b-4e2e-bab3-75d91ecb80c6

Co-authored-by: mangod9 <61718172+mangod9@users.noreply.github.com>
Copilot AI requested review from Copilot and removed request for Copilot April 10, 2026 03:37
Copilot AI changed the title from "[WIP] Fix assert failure in CheckRegDisplaySP function" to "Fix false-positive assertion in CheckRegDisplaySP due to stale cached stack limit" Apr 10, 2026
Copilot AI requested a review from mangod9 April 10, 2026 03:38
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @agocke
See info in area-owners.md if you want to be subscribed.

// and the cached limit can be stale if the stack has grown past the initially committed boundary.
// GetStackLowerBound() is only available in non-DAC builds.
#ifndef DACCESS_COMPILE
_ASSERTE(pRD->_pThread->IsExecutingOnAltStack() || PTR_VOID(pRD->SP) >= pRD->_pThread->GetStackLowerBound());
Member

The PAL caches the stack limits as well:

CPalThread::GetCachedStackBase()
{
    _ASSERT_MSG(this == InternalGetCurrentThread(), "CPalThread::GetStackBase called from foreign thread");
    if (m_stackBase == NULL)
    {
        m_stackBase = GetStackBase();
    }
    return m_stackBase;
}

This is still going to use the cached stack limit as far as I can tell.

Member

I think we will need to pass a bool flag all the way to the PAL to fetch non-cached stack limits.

Also, we make sure to use the cached stack limits most of the time to make this check cheap.
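
One way to picture the suggestion above is a limit provider that serves the cached value by default and re-queries only when asked. This is a rough sketch only: `StackLimitProvider`, `GetStackLimit`, `forceRefresh`, and `IsSpAboveLimit` are hypothetical names, and the `query` callback is a placeholder for whatever OS call the PAL would actually make.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical helper for illustration; not the runtime's or the PAL's actual API.
class StackLimitProvider
{
public:
    // `query` stands in for the (comparatively expensive) OS call that returns
    // the current lower stack bound.
    explicit StackLimitProvider(std::function<std::uintptr_t()> query)
        : m_query(std::move(query)) {}

    // forceRefresh == false: cheap path, reuse the cached value once it is populated.
    // forceRefresh == true:  re-query the OS and update the cache.
    std::uintptr_t GetStackLimit(bool forceRefresh = false)
    {
        if (forceRefresh || !m_hasCached)
        {
            m_cached = m_query();
            m_hasCached = true;
        }
        return m_cached;
    }

private:
    std::function<std::uintptr_t()> m_query;
    std::uintptr_t m_cached = 0;
    bool m_hasCached = false;
};

// Compare SP against the cached limit first; fall back to a fresh query only
// when the cached comparison fails, so the common case stays cheap.
inline bool IsSpAboveLimit(StackLimitProvider& provider, std::uintptr_t sp)
{
    if (sp >= provider.GetStackLimit())
        return true;
    return sp >= provider.GetStackLimit(/* forceRefresh */ true);
}
```

In this arrangement the uncached query runs only on the path that would otherwise fail the check, which keeps the usual comparison as cheap as the cached one.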

@@ -5687,7 +5687,14 @@ void CheckRegDisplaySP (REGDISPLAY *pRD)
if (pRD->SP && pRD->_pThread)
{
#ifndef NO_FIXED_STACK_LIMIT
Member

NO_FIXED_STACK_LIMIT is working around the same problem, just for Musl. If we find a better way to fix this, we should delete NO_FIXED_STACK_LIMIT.

@janvorli
Member

It's been a long time since I've implemented the stack limits stuff for Unix, so I don't remember all the details, but I am not sure the copilot analysis is correct. I'll experiment locally to see if it is right or not.

@janvorli
Member

My feeling was right. On non-musl Linux, pthread_attr_getstack returns the hard stack limit. Going over (or, I should say, below) that limit results in a stack overflow. On musl-based Linux, there is no hard limit, and the stack grows until it cannot grow anymore, either because it hits a virtual memory range that is already mapped by something else or because of OOM when trying to get a physical memory page to back the next stack page.
That's why we define NO_FIXED_STACK_LIMIT on musl-based Linux.
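
For anyone who wants to experiment with the reported limits locally, here is a minimal sketch of querying the calling thread's stack range on Linux. It relies on `pthread_getattr_np`, a non-portable extension available in glibc and musl; `StackBounds` and `QueryCurrentThreadStackBounds` are hypothetical names, and this is not how the runtime or the PAL implements the query.

```cpp
#include <pthread.h>   // pthread_getattr_np is a non-portable extension (glibc, musl)
#include <cstddef>
#include <cstdint>

// Hypothetical helper type for illustration.
struct StackBounds
{
    std::uintptr_t low;   // lowest address of the stack range reported by libc
    std::uintptr_t high;  // one past the highest address of that range
};

// Fills `out` with the stack range libc reports for the calling thread.
// Returns false if either pthread call fails.
inline bool QueryCurrentThreadStackBounds(StackBounds* out)
{
    pthread_attr_t attr;
    if (pthread_getattr_np(pthread_self(), &attr) != 0)
        return false;

    void* addr = nullptr;   // receives the lowest addressable byte of the stack
    size_t size = 0;        // receives the stack size in bytes
    int rc = pthread_attr_getstack(&attr, &addr, &size);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return false;

    out->low = reinterpret_cast<std::uintptr_t>(addr);
    out->high = out->low + size;
    return true;
}
```

Comparing the address of a local variable against `low` at different recursion depths, on glibc and on musl, is one way to observe the difference in behavior described above.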

Looking at the failed legs listed in the actual issue, they were all failing when running with the interpreter. So I think the problem is that we somehow ended up passing a wrong SP to CheckRegDisplaySP. That could have been due to some issues in the interpreter that have since been fixed, as I cannot see any recent failures due to this issue.

@mangod9
Member

mangod9 commented Apr 10, 2026

OK, should we close the issue then and wait for a repro? Looks like there was another hit a few weeks ago: #125760 (comment), but it doesn't look like the dumps are still available for it.

@janvorli
Member

I think it would be reasonable to close it and wait for a repro with dump.

Development

Successfully merging this pull request may close these issues.

Assert failure: pRD->_pThread->IsExecutingOnAltStack() || PTR_VOID(pRD->SP) >= pRD->_pThread->GetCachedStackLimit()

4 participants