
Arm32: System.Threading.Tasks.Dataflow.Tests failing with NRE #80857

Closed
jkotas opened this issue Jan 19, 2023 · 4 comments
Labels: arch-arm32, area-System.Threading.Tasks, blocking-clean-ci (Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms'), bug, Known Build Error (Use this to report build issues in the .NET Helix tab)

Comments

jkotas (Member) commented Jan 19, 2023

Build Information

Build: https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=140623
Build error leg or test failing: System.Threading.Tasks.Dataflow.Tests.WorkItemExecution
Pull request: #80323

Error Message

Fill the error message using known issues guidance.

{
  "ErrorMessage": "System.Threading.Tasks.ConcurrentExclusiveSchedulerPair.ProcessConcurrentTasks",
  "BuildRetry": false
}

Report

Summary

24-Hour Hit Count: 0
7-Day Hit Count: 0
1-Month Count: 0
jkotas added the arch-arm32, blocking-clean-ci, and Known Build Error labels Jan 19, 2023
ghost added the untriaged label (New issue has not been triaged by the area owner) Jan 19, 2023
ghost commented Jan 19, 2023

Tagging subscribers to this area: @dotnet/area-system-threading-tasks
See info in area-owners.md if you want to be subscribed.

stephentoub (Member) commented

According to the stack trace, the NRE is occurring here:

for (int i = 0; i < m_maxItemsPerTask; i++)
{
    // Get the next available concurrent task. If we can't find one, bail.
    if (!m_concurrentTaskScheduler.m_tasks.TryDequeue(out Task? concurrentTask)) break;
    // Execute the task. If the scheduler was previously faulted,
    // this task could have been faulted when it was queued; ignore such tasks.
    if (!concurrentTask.IsFaulted) m_concurrentTaskScheduler.ExecuteTask(concurrentTask);
    // Now check to see if exclusive tasks have arrived; if any have, they take priority
    // so we'll bail out here. Note that we could have checked this condition
    // in the for loop's condition, but that could lead to extra overhead
    // in the case where a concurrent task arrives, this task is launched, and then
    // before entering the loop an exclusive task arrives. If we didn't execute at
    // least one task, we would have spent all of the overhead to launch a
    // task but with none of the benefit. There's of course also an inherent
    // race condition here with regards to exclusive tasks arriving, and we're ok with
    // executing one more concurrent task than we should before giving priority to exclusive tasks.
    if (!m_exclusiveTaskScheduler.m_tasks.IsEmpty) break;
}

on the line:

if (!concurrentTask.IsFaulted) m_concurrentTaskScheduler.ExecuteTask(concurrentTask); 

The only state read here is readonly fields initialized in the constructor, plus the concurrentTask dequeued from the queue. The queue is either a ConcurrentQueue<T> or a SingleProducerSingleConsumerQueue<T> (SPSCQ):
https://github.com/dotnet/runtime/blob/main/src/libraries/Common/src/System/Collections/Concurrent/SingleProducerSingleConsumerQueue.cs
It's not clear from the stack trace which test is failing, and thus which queue is in use, but my money is on either a bug in the SPSCQ's use (or lack thereof) of volatile accesses or a codegen bug on arm32. Either would result in the concurrentTask dequeued from the queue being read as null.
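To make the first hypothesis concrete, here is a minimal sketch of the ordering a single-producer/single-consumer queue relies on. `SpscRing<T>` is a hypothetical, simplified bounded ring buffer written only for illustration; it is not the runtime's SingleProducerSingleConsumerQueue<T>, and it does not reproduce the actual failure. The producer stores the element before publishing the new tail index with `Volatile.Write` (release), and the consumer reads the tail with `Volatile.Read` (acquire) before reading the slot. If the tail were published with a plain store, a weakly ordered CPU such as ARM could, in principle, let the consumer observe the advanced index before the element itself and read null, which would match the symptom above.

```csharp
#nullable enable
using System;
using System.Threading;

// Hypothetical, simplified bounded single-producer/single-consumer ring buffer.
// It is NOT the runtime's SingleProducerSingleConsumerQueue<T>; it only
// illustrates the publish/consume ordering such a queue depends on.
// One slot is always left empty, so a ring of capacity N holds at most N - 1 items.
public sealed class SpscRing<T> where T : class
{
    private readonly T?[] _slots;
    private int _head; // next slot to read; written only by the consumer
    private int _tail; // next slot to write; written only by the producer

    public SpscRing(int capacity)
    {
        if (capacity < 2) throw new ArgumentOutOfRangeException(nameof(capacity));
        _slots = new T?[capacity];
    }

    // Producer thread only.
    public bool TryEnqueue(T item)
    {
        int tail = _tail;                       // only the producer writes _tail
        int next = (tail + 1) % _slots.Length;
        if (next == Volatile.Read(ref _head))   // acquire: the consumer is done with that slot
            return false;                       // full

        _slots[tail] = item;                    // 1. store the element
        Volatile.Write(ref _tail, next);        // 2. release: publish the index only after the store above
        return true;
    }

    // Consumer thread only.
    public bool TryDequeue(out T? item)
    {
        int head = _head;                       // only the consumer writes _head
        if (head == Volatile.Read(ref _tail))   // acquire: pairs with the producer's release
        {
            item = null;
            return false;                       // empty
        }

        item = _slots[head];                    // the acquire above makes the producer's store visible
        _slots[head] = null;                    // drop the reference so it can be collected
        Volatile.Write(ref _head, (head + 1) % _slots.Length); // hand the slot back to the producer
        return true;
    }
}
```

Replacing either `Volatile` call with a plain field access removes the guarantee. On x64 such a bug tends to stay hidden because the hardware does not reorder stores with other stores, which is consistent with a failure that shows up only on Arm.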

stephentoub added the bug label and removed the untriaged label Jan 22, 2023
VSadov (Member) commented Apr 4, 2023

Failed in #84151.

ericstj (Member) commented Aug 8, 2023

Quoting stephentoub: "codegen bug on arm32, resulting in the concurrentTask dequeued from the queue being read as null."

Given that we have had zero recent hits, I'm leaning towards this as the cause. Closing this as no-repro. Reactivate if the failures show up again.

ericstj closed this as not planned (won't fix, can't repro, duplicate, or stale) Aug 8, 2023
ghost locked this as resolved and limited conversation to collaborators Sep 7, 2023