Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tests] Flaky failures withApplication orchestrator dependency check returned an error: The operation has timed out #5195

Open
radical opened this issue Aug 6, 2024 · 7 comments
Labels
area-app-model Issues pertaining to the APIs in Aspire.Hosting, e.g. DistributedApplication blocking-clean-ci Blocking a green CI testing ☑️
Milestone

Comments

@radical
Copy link
Member

radical commented Aug 6, 2024

Build Information

Build: https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=765658
Build error leg or test failing: Aspire.Hosting.Testing.Tests.TestingBuilderTests.SetsCorrectContentRoot(genericEntryPoint: False)
Pull request: #5129

Error Message

Fill the error message using step by step known issues guidance.

{
  "ErrorMessage": "Application orchestrator dependency check returned an error: The operation has timed out",
  "ErrorPattern": "",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}

Known issue validation

Build: 🔎 https://dev.azure.com/dnceng-public/public/_build/results?buildId=765658
Error message validated: [Application orchestrator dependency check returned an error: The operation has timed out]
Result validation: ✅ Known issue matched with the provided build.
Validation performed at: 8/6/2024 4:24:19 AM UTC

Report

Build Definition Test Pull Request
854941 dotnet/aspire Aspire.Hosting.Elasticsearch.Tests.ElasticsearchFunctionalTests.Aspire.Hosting.Elasticsearch.Tests.ElasticsearchFunctionalTests.WithDataShouldPersistStateBetweenUsages
852790 dotnet/aspire Aspire.Hosting.Elasticsearch.Tests.ElasticsearchFunctionalTests.Aspire.Hosting.Elasticsearch.Tests.ElasticsearchFunctionalTests.WithDataShouldPersistStateBetweenUsages
852786 dotnet/aspire Aspire.Hosting.Elasticsearch.Tests.ElasticsearchFunctionalTests.Aspire.Hosting.Elasticsearch.Tests.ElasticsearchFunctionalTests.WithDataShouldPersistStateBetweenUsages #6451
851665 dotnet/aspire Aspire.Playground.Tests.AppHostTests.Aspire.Playground.Tests.AppHostTests.TestEndpointsReturnOk #6451
851451 dotnet/aspire Aspire.Hosting.Elasticsearch.Tests.WorkItemExecution #6458

Summary

24-Hour Hit Count 7-Day Hit Count 1-Month Count
0 0 5
@radical
Copy link
Member Author

radical commented Aug 6, 2024

@radical radical added area-app-model Issues pertaining to the APIs in Aspire.Hosting, e.g. DistributedApplication testing ☑️ and removed area-engineering-systems labels Aug 6, 2024
@radical
Copy link
Member Author

radical commented Aug 6, 2024

There doesn't seem to be any output from the dcp-info command. Multiple tests failed in the linked test run.

public async Task<DcpInfo?> GetDcpInfoAsync(CancellationToken cancellationToken = default)
{
await _lock.WaitAsync(cancellationToken).ConfigureAwait(false);

Could this be due to the process exiting quickly causing us to miss some output?
I think it would be useful to add some more info to the exception message like the _dcpOptions.DependencyCheckTimeout. Also, cancellationToken is passed to the method so it would be useful to differentiate a timeout due to the token being cancelled, or the timeout being hit.

@radical
Copy link
Member Author

radical commented Aug 6, 2024

cc @eerhardt @sebastienros @karolz-ms

related - #4640

@mitchdenny mitchdenny added this to the Backlog milestone Aug 6, 2024
@karolz-ms
Copy link
Member

Is the dependency check timeout set to default value (which is 25 seconds) during test runs?

If so, as a first step I would bump it up to 180 i.e. 3 minutes. This can be done via startup parameters, or, probably more conveniently, via an environment variable https://github.com/dotnet/aspire/blob/main/src/Aspire.Hosting/Dcp/DcpOptions.cs#L155 DOTNET_ASPIRE_DEPENDENCY_CHECK_TIMEOUT. The reason is there is probably a lot going on on the machine when the tests are starting; I have seen very long delays that had to do with AV scanning etc and nothing with the test itself.

If that does not help, we can tweak the dependency check logic further.

radical added a commit to radical/aspire that referenced this issue Aug 6, 2024
Prompted by dotnet#5195 showing
frequent dcp-info timeouts. Based on
dotnet#5195 (comment) ,
bump the timeout to 3 minutes from the default 25 seconds.

Issue: dotnet#5195
eerhardt pushed a commit that referenced this issue Aug 6, 2024
…5204)

Prompted by #5195 showing
frequent dcp-info timeouts. Based on
#5195 (comment) ,
bump the timeout to 3 minutes from the default 25 seconds.

Issue: #5195
@radical
Copy link
Member Author

radical commented Aug 20, 2024

The timeout bump seems to have helped this. Closing it for now.

@radical radical closed this as completed Aug 20, 2024
@eerhardt
Copy link
Member

I just saw this on the 8.2 branch:

https://dev.azure.com/dnceng-public/public/_build/results?buildId=791699&view=logs&jobId=acc6f692-388e-5b31-5997-e154da29f5b3&j=acc6f692-388e-5b31-5997-e154da29f5b3&t=59fafa2a-4be0-5f74-c845-0779cc54f4a3

https://helixre107v0xd1eu3ibi6ka.blob.core.windows.net/dotnet-aspire-refs-pull-5477-merge-c9cb6937fcd94fb483/Aspire.Hosting.Elasticsearch.Tests/1/console.80efd9c6.log?helixlogtype=result

info: Aspire.Hosting.DistributedApplication[0]
      Aspire version: 8.2.0
info: Aspire.Hosting.DistributedApplication[0]
      Distributed application starting.
info: Aspire.Hosting.DistributedApplication[0]
      Application host directory is: /mnt/vss/_work/1/s/tests/Aspire.Hosting.Tests
fail: Microsoft.Extensions.Hosting.Internal.Host[11]
      Hosting failed to start
      Aspire.Hosting.DistributedApplicationException: Application orchestrator dependency check returned an error: The operation has timed out. 
      
         at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 116
         at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 130
         at Aspire.Hosting.Dcp.DcpHostService.StartAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpHostService.cs:line 67
         at Microsoft.Extensions.Hosting.Internal.Host.<StartAsync>b__15_1(IHostedService service, CancellationToken token)
         at Microsoft.Extensions.Hosting.Internal.Host.ForeachService[T](IEnumerable`1 services, CancellationToken token, Boolean concurrent, Boolean abortOnFirstException, List`1 exceptions, Func`3 operation)
[xUnit.net 00:08:07.13]     Aspire.Hosting.Elasticsearch.Tests.ElasticsearchFunctionalTests.VerifyElasticsearchResource [FAIL]
[xUnit.net 00:08:07.16]       Aspire.Hosting.DistributedApplicationException : Application orchestrator dependency check returned an error: The operation has timed out. 
[xUnit.net 00:08:07.16]       
[xUnit.net 00:08:07.16]       Stack Trace:
[xUnit.net 00:08:07.18]         /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs(116,0): at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken)
[xUnit.net 00:08:07.18]         /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs(130,0): at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken)
[xUnit.net 00:08:07.18]         /_/src/Aspire.Hosting/Dcp/DcpHostService.cs(67,0): at Aspire.Hosting.Dcp.DcpHostService.StartAsync(CancellationToken cancellationToken)
[xUnit.net 00:08:07.18]            at Microsoft.Extensions.Hosting.Internal.Host.<StartAsync>b__15_1(IHostedService service, CancellationToken token)
[xUnit.net 00:08:07.18]            at Microsoft.Extensions.Hosting.Internal.Host.ForeachService[T](IEnumerable`1 services, CancellationToken token, Boolean concurrent, Boolean abortOnFirstException, List`1 exceptions, Func`3 operation)
[xUnit.net 00:08:07.18]            at Microsoft.Extensions.Hosting.Internal.Host.StartAsync(CancellationToken cancellationToken)
[xUnit.net 00:08:07.18]         /_/src/Aspire.Hosting/DistributedApplication.cs(274,0): at Aspire.Hosting.DistributedApplication.StartAsync(CancellationToken cancellationToken)
[xUnit.net 00:08:07.18]         /_/tests/Aspire.Hosting.Elasticsearch.Tests/ElasticsearchFunctionalTests.cs(40,0): at Aspire.Hosting.Elasticsearch.Tests.ElasticsearchFunctionalTests.VerifyElasticsearchResource()
[xUnit.net 00:08:07.18]         --- End of stack trace from previous location ---
[xUnit.net 00:08:07.18]       Output:
[xUnit.net 00:08:07.18]         | [2024-08-28T22:12:30] Aspire.Hosting.DistributedApplication Information: Aspire version: 8.2.0
[xUnit.net 00:08:07.18]         | [2024-08-28T22:12:30] Aspire.Hosting.DistributedApplication Information: Distributed application starting.
[xUnit.net 00:08:07.18]         | [2024-08-28T22:12:30] Aspire.Hosting.DistributedApplication Information: Application host directory is: /mnt/vss/_work/1/s/tests/Aspire.Hosting.Tests
[xUnit.net 00:08:07.18]         | [2024-08-28T22:12:56] Microsoft.Extensions.Hosting.Internal.Host Error: Hosting failed to start
[xUnit.net 00:08:07.18]         | Aspire.Hosting.DistributedApplicationException: Application orchestrator dependency check returned an error: The operation has timed out. 
[xUnit.net 00:08:07.18]         |    at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 116
[xUnit.net 00:08:07.18]         |    at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 130
[xUnit.net 00:08:07.18]         |    at Aspire.Hosting.Dcp.DcpHostService.StartAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpHostService.cs:line 67
[xUnit.net 00:08:07.18]         |    at Microsoft.Extensions.Hosting.Internal.Host.<StartAsync>b__15_1(IHostedService service, CancellationToken token)
[xUnit.net 00:08:07.18]         |    at Microsoft.Extensions.Hosting.Internal.Host.ForeachService[T](IEnumerable`1 services, CancellationToken token, Boolean concurrent, Boolean abortOnFirstException, List`1 exceptions, Func`3 operation)
[xUnit.net 00:08:07.29]   Finished:    Aspire.Hosting.Elasticsearch.Tests
Data collector 'Blame' message: All tests finished running, Sequence file will not be generated.
  Failed Aspire.Hosting.Elasticsearch.Tests.ElasticsearchFunctionalTests.VerifyElasticsearchResource [30 s]
  Error Message:
   Aspire.Hosting.DistributedApplicationException : Application orchestrator dependency check returned an error: The operation has timed out. 

  Stack Trace:
     at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 116
   at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 130
   at Aspire.Hosting.Dcp.DcpHostService.StartAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpHostService.cs:line 67
   at Microsoft.Extensions.Hosting.Internal.Host.<StartAsync>b__15_1(IHostedService service, CancellationToken token)
   at Microsoft.Extensions.Hosting.Internal.Host.ForeachService[T](IEnumerable`1 services, CancellationToken token, Boolean concurrent, Boolean abortOnFirstException, List`1 exceptions, Func`3 operation)
   at Microsoft.Extensions.Hosting.Internal.Host.StartAsync(CancellationToken cancellationToken)
   at Aspire.Hosting.DistributedApplication.StartAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/DistributedApplication.cs:line 274
   at Aspire.Hosting.Elasticsearch.Tests.ElasticsearchFunctionalTests.VerifyElasticsearchResource() in /_/tests/Aspire.Hosting.Elasticsearch.Tests/ElasticsearchFunctionalTests.cs:line 40
--- End of stack trace from previous location ---
  Standard Output Messages:
 | [2024-08-28T22:12:30] Aspire.Hosting.DistributedApplication Information: Aspire version: 8.2.0
 | [2024-08-28T22:12:30] Aspire.Hosting.DistributedApplication Information: Distributed application starting.
 | [2024-08-28T22:12:30] Aspire.Hosting.DistributedApplication Information: Application host directory is: /mnt/vss/_work/1/s/tests/Aspire.Hosting.Tests
 | [2024-08-28T22:12:56] Microsoft.Extensions.Hosting.Internal.Host Error: Hosting failed to start
 | Aspire.Hosting.DistributedApplicationException: Application orchestrator dependency check returned an error: The operation has timed out. 
 |    at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 116
 |    at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 130
 |    at Aspire.Hosting.Dcp.DcpHostService.StartAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpHostService.cs:line 67
 |    at Microsoft.Extensions.Hosting.Internal.Host.<StartAsync>b__15_1(IHostedService service, CancellationToken token)
 |    at Microsoft.Extensions.Hosting.Internal.Host.ForeachService[T](IEnumerable`1 services, CancellationToken token, Boolean concurrent, Boolean abortOnFirstException, List`1 exceptions, Func`3 operation)


Results File: /datadisks/disk1/work/B77F0977/w/A3C608F8/uploads/logs/TestResults.trx

Test Run Failed.
Total tests: 15
     Passed: 14
     Failed: 1
 Total time: 8.1511 Minutes

@github-actions github-actions bot locked and limited conversation to collaborators Sep 30, 2024
@JamesNK JamesNK reopened this Oct 10, 2024
@JamesNK
Copy link
Member

JamesNK commented Oct 10, 2024

Test on main failed with this error.

Aspire.Hosting.Elasticsearch.Tests.ElasticsearchFunctionalTests.Aspire.Hosting.Elasticsearch.Tests.ElasticsearchFunctionalTests.WithDataShouldPersistStateBetweenUsages(useVolume: False)

Aspire.Hosting.DistributedApplicationException : Application orchestrator dependency check returned an error: The operation has timed out.
   at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 111
   at Aspire.Hosting.Dcp.DcpDependencyCheck.GetDcpInfoAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpDependencyCheck.cs:line 125
   at Aspire.Hosting.Dcp.DcpHostService.StartAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/DcpHostService.cs:line 71
   at Microsoft.Extensions.Hosting.Internal.Host.<StartAsync>b__15_1(IHostedService service, CancellationToken token)
   at Microsoft.Extensions.Hosting.Internal.Host.ForeachService[T](IEnumerable`1 services, CancellationToken token, Boolean concurrent, Boolean abortOnFirstException, List`1 exceptions, Func`3 operation)
   at Microsoft.Extensions.Hosting.Internal.Host.StartAsync(CancellationToken cancellationToken)
   at Aspire.Hosting.DistributedApplication.StartAsync(CancellationToken cancellationToken) in /_/src/Aspire.Hosting/DistributedApplication.cs:line 274
   at Aspire.Hosting.Elasticsearch.Tests.ElasticsearchFunctionalTests.WithDataShouldPersistStateBetweenUsages(Boolean useVolume) in /_/tests/Aspire.Hosting.Elasticsearch.Tests/ElasticsearchFunctionalTests.cs:line 149
--- End of stack trace from previous location ---

https://dev.azure.com/dnceng-public/public/_build/results?buildId=837103&view=results

Aspire.Hosting.Elasticsearch.Tests.zip

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-app-model Issues pertaining to the APIs in Aspire.Hosting, e.g. DistributedApplication blocking-clean-ci Blocking a green CI testing ☑️
Projects
None yet
Development

No branches or pull requests

5 participants