
System.Diagnostics.Tests.EventLogSourceCreationTests failing on PRs #36135

Open
safern opened this issue May 8, 2020 · 28 comments
Labels
area-System.Diagnostics.EventLog, bug, disabled-test
Milestone
Future

Comments

@safern
Member

safern commented May 8, 2020

Console Log Summary

Builds

| Build | Pull Request | Test Failure Count |
| --- | --- | --- |
| #634139 | #35818 | 1 |
| #636227 | #35857 | 1 |
| #636339 | #36102 | 1 |
| #636552 | #35573 | 1 |
| #636741 | #36111 | 1 |
| #636785 | #36115 | 1 |
| #636821 | #35936 | 1 |
| #636842 | #35790 | 1 |
| #636870 | #36116 | 2 |

Configurations

  • net5.0-Windows_NT-Debug-x64-CoreCLR_release-Windows.10.Amd64.Server19H1.ES.Open
  • net5.0-Windows_NT-Debug-x64-Mono_release-Windows.10.Amd64.Server19H1.ES.Open
  • net5.0-Windows_NT-Release-x86-CoreCLR_release-Windows.10.Amd64.Server19H1.ES.Open

Helix Logs

| Build | Pull Request | Console | Core Test Results | Run Client |
| --- | --- | --- | --- | --- |
| #634139 | #35818 | console.log | testResults.xml | run_client.py |
| #636227 | #35857 | console.log | testResults.xml | run_client.py |
| #636339 | #36102 | console.log | testResults.xml | run_client.py |
| #636552 | #35573 | console.log | testResults.xml | run_client.py |
| #636741 | #36111 | console.log | testResults.xml | run_client.py |
| #636785 | #36115 | console.log | testResults.xml | run_client.py |
| #636821 | #35936 | console.log | testResults.xml | run_client.py |
| #636842 | #35790 | console.log | testResults.xml | run_client.py |
| #636870 | #36116 | console.log | testResults.xml | run_client.py |
| #636870 | #36116 | console.log | testResults.xml | run_client.py |

I will put up a PR to disable the test.

cc: @dotnet/runtime-infrastructure

@Dotnet-GitSync-Bot added the area-Infrastructure and untriaged labels May 8, 2020
@ghost

ghost commented May 8, 2020

Tagging subscribers to this area: @ViktorHofer
Notify danmosemsft if you want to be subscribed.

@ghost

ghost commented May 8, 2020

Tagging subscribers to this area: @tommcdon, @krwq
Notify danmosemsft if you want to be subscribed.

@stephentoub
Member

Presumably this was caused by #35911?

@safern
Member Author

safern commented May 8, 2020

Hmm, yeah, looks suspicious.

The intention was for these tests not to run on CI, according to 48a222c.

But given that this is a potential break, should we reconsider that?

@ViktorHofer
Member

But given that this is a potential break, should we reconsider that?

@danmosemsft

@safern
Member Author

safern commented May 8, 2020

Also, another point: 48a222c went in on February 18th, and I don't think the tests that were left enabled had failed since then until now.

@safern
Member Author

safern commented May 8, 2020

cc: @tarekgh

@tarekgh
Member

tarekgh commented May 8, 2020

I'll revert #35911. It is good we have a test that catches such issues.

@stephentoub
Member

It is good we have a test that catches such issues.

Unfortunately that test is being disabled in CI in #36138. How often are these tests run locally?

@tarekgh
Member

tarekgh commented May 8, 2020

Could someone approve #36143 and merge it to unblock?

@tarekgh
Member

tarekgh commented May 8, 2020

How often are these tests run locally?

I have no idea about these tests. @Anipik @maryamariyan may know better.

@tarekgh
Member

tarekgh commented May 8, 2020

How often did this test fail before 48a222c?

@tarekgh added the bug label and removed the untriaged label May 8, 2020
@safern
Member Author

safern commented May 8, 2020

Looking at the Kusto data now; I'll update the issue.

@safern
Member Author

safern commented May 8, 2020

It seems like it has had its hiccups: some failures on 4/26, some on 2/13-2/16, and the ones today. I'll keep looking at the data to make sure the revert fixed it; let's leave the test enabled for now and see how it behaves.

I'll leave the issue open to track the data.

@safern removed the blocking-clean-ci label May 8, 2020
@safern self-assigned this May 8, 2020
@safern
Member Author

safern commented May 8, 2020

Actually, I'm closing the issue; if we see it again we can reopen.

@safern closed this as completed May 8, 2020
@safern
Member Author

safern commented May 11, 2020

I found an interesting pattern.

It seems like there is a test that corrupts a machine into a weird state, and once that happens, this test fails every time it runs on that particular machine. So until that machine is recycled, the test will fail whenever it happens to run there again.

This test has had 4 bad days since February, and every day we have had a failure it shows the same pattern: it always fails on the same machine, and only on that machine.

cc: @Anipik
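For illustration only (the root cause was not confirmed at this point in the thread), here is a sketch of how a run that registers an event source but dies before cleanup could leave a machine in a state where every later run fails; the source and log names below are hypothetical, not the actual test code:

```csharp
// Hypothetical sketch of the "machine left dirty" failure mode.
using System.Diagnostics;

const string source = "SomeTestSource"; // hypothetical names
const string log = "SomeTestLog";

// Throws ArgumentException ("Source ... already exists on the local computer")
// if a previous run registered the source and never cleaned it up.
EventLog.CreateEventSource(source, log);
try
{
    // ... exercise the source/log here ...
}
finally
{
    // If cleanup fails (crash, log file held open, etc.), the registration
    // survives, and every later run on this machine hits the exception above.
    EventLog.DeleteEventSource(source);
    EventLog.Delete(log);
}
```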

@safern removed their assignment May 11, 2020
@safern
Member Author

safern commented May 13, 2020

The test is failing again with the same pattern... it always fails on the same machine, and only on that machine:

Console Log Summary

Builds

| Build | Pull Request | Test Failure Count |
| --- | --- | --- |
| #639597 | #35683 | 1 |
| #640835 | #35573 | 1 |
| #640869 | #35734 | 1 |
| #640943 | #36268 | 2 |
| #641098 | #36277 | 1 |
| #641423 | #36116 | 1 |
| #641432 | #35961 | 1 |
| #641450 | #36141 | 1 |
| #641673 | #36002 | 1 |
| #641676 | #36257 | 2 |
| #642180 | #36315 | 1 |

Configurations

  • net5.0-Windows_NT-Debug-x64-CoreCLR_release-Windows.10.Amd64.Server19H1.ES.Open
  • net5.0-Windows_NT-Debug-x64-Mono_release-Windows.10.Amd64.Server19H1.ES.Open
  • net5.0-Windows_NT-Release-x86-CoreCLR_release-Windows.10.Amd64.Server19H1.ES.Open

Helix Logs

| Build | Pull Request | Console | Core Test Results | Run Client |
| --- | --- | --- | --- | --- |
| #639597 | #35683 | console.log | testResults.xml | run_client.py |
| #640835 | #35573 | console.log | testResults.xml | run_client.py |
| #640869 | #35734 | console.log | testResults.xml | run_client.py |
| #640943 | #36268 | console.log | testResults.xml | run_client.py |
| #640943 | #36268 | console.log | testResults.xml | run_client.py |
| #641098 | #36277 | console.log | testResults.xml | run_client.py |
| #641423 | #36116 | console.log | testResults.xml | run_client.py |
| #641432 | #35961 | console.log | testResults.xml | run_client.py |
| #641450 | #36141 | console.log | testResults.xml | run_client.py |
| #641673 | #36002 | console.log | testResults.xml | run_client.py |
| #641676 | #36257 | console.log | testResults.xml | run_client.py |
| #641676 | #36257 | console.log | testResults.xml | run_client.py |
| #642180 | #36315 | console.log | testResults.xml | run_client.py |

@safern reopened this May 13, 2020
@safern
Member Author

safern commented May 13, 2020

I re-opened: #36138

@safern changed the title from "System.Diagnostics.Tests.EventLogSourceCreationTests.CheckSourceExistenceAndDeletion test failing on PRs" to "System.Diagnostics.Tests.EventLogSourceCreationTests failing on PRs" May 13, 2020
@Anipik self-assigned this May 14, 2020
@safern
Member Author

safern commented Jun 1, 2020

So @Anipik, it seems like your fix didn't mitigate the issue: the machine is somehow still getting busted, and subsequent runs fail on that machine?

@Anipik
Contributor

Anipik commented Jun 2, 2020

@MattGal would it be possible to get hold of this machine?

@MattGal
Member

MattGal commented Jun 2, 2020

@MattGal would it be possible to get hold of this machine?

It's possible, but it's something we generally avoid. I'd prefer to have someone from DDFUN or the First Responders team check it out, to ensure the Helix client isn't affected as part of the investigation. You'd also need access to corpnet, as the only way onto these machines is over a local KVM device.

If you think you know what's wrong with the machine, we should probably just have DDFUN fix that. Otherwise ping me on Teams and we can coordinate something.

@safern
Member Author

safern commented Jun 2, 2020

@Anipik it happened again:

https://helix.dot.net/api/2019-06-17/jobs/5c44ceea-3ad0-4bfc-b0ff-d9a95d342440/workitems/System.Diagnostics.EventLog.Tests/console

I'll go ahead and revert the change that re-enabled the tests.

@safern added the disabled-test label Jun 2, 2020
@tommcdon added this to the 5.0 milestone Jun 8, 2020
@Anipik modified the milestones: 5.0.0, Future Jul 17, 2020
@ericstj
Member

ericstj commented Dec 11, 2020

I don't suppose anyone happened to capture the logged failure? All the links in this issue are now dead.

@MattGal
Member

MattGal commented Dec 11, 2020

I don't suppose anyone happened to capture the logged failure? All the links in this issue are now dead.

Can confirm all the Helix-side stuff is long deleted.

@jaredpar
Member

No failures in the last 21 days. We should probably close this; we can re-open if it happens again.

@ericstj
Member

ericstj commented Dec 11, 2020

No failures because the test is active-issued. :-/
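(For context, "active-issued" means the test is annotated with the ActiveIssue attribute pointing at this issue, so it gets skipped rather than run. A minimal sketch, assuming the attribute from the repo's xunit extensions and its usual placement; not a quote of the actual test file:)

```csharp
// Sketch only: how a test is typically disabled against a tracking issue in dotnet/runtime.
// The attribute comes from Microsoft.DotNet.XUnitExtensions; exact placement here is assumed.
using Xunit;

public class EventLogSourceCreationTests
{
    [ActiveIssue("https://github.com/dotnet/runtime/issues/36135")]
    [Fact]
    public void CheckSourceExistenceAndDeletion()
    {
        // Body unchanged; while the issue stays open the test is reported as skipped in CI runs.
    }
}
```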

@ericstj
Member

ericstj commented Dec 11, 2020

https://github.com/dotnet/runtime/blob/6072e4d3a7a2a1493f514cdf4be75a3d56580e84/src/libraries/System.Diagnostics.EventLog/tests/EventLogTests/EventLogSourceCreationTests.cs

I suspect this is file-in-use on the log file: a delete fails, and then future tests fail to recreate the source. These tests write to a custom log name, which results in a new file. I might see if I can refactor the tests a bit to avoid this (like using a unique log name).
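A rough sketch of that unique-log-name idea (not the actual refactoring; the helper type and names below are hypothetical), using the public System.Diagnostics.EventLog APIs:

```csharp
// Hypothetical sketch: give every run its own log so a stuck handle on a
// previous run's log file can't block source creation or deletion.
using System;
using System.Diagnostics;

public static class EventLogTestHelpers // hypothetical helper, not in the repo
{
    public static void CheckSourceExistenceAndDeletion()
    {
        // Event log names are only distinguished by their first 8 characters,
        // so put the unique part up front.
        string suffix = Guid.NewGuid().ToString("N").Substring(0, 8);
        string log = suffix + "Log";
        string source = "TestSource_" + suffix;

        try
        {
            EventLog.CreateEventSource(source, log);
            if (!EventLog.SourceExists(source))
                throw new Exception($"Source '{source}' was not created.");
        }
        finally
        {
            // Best-effort cleanup.
            if (EventLog.SourceExists(source))
                EventLog.DeleteEventSource(source);
            if (EventLog.Exists(log))
                EventLog.Delete(log);
        }
    }
}
```

Even with a unique name the cleanup is still best-effort; the difference is that a failed delete only orphans that run's log instead of breaking every later run on the machine.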
