[CI] Race between FileUserRolesStore and disk cleanup #44006

Closed
droberts195 opened this issue Jul 5, 2019 · 3 comments
Labels
:Security/Security Security issues without another label Team:Security Meta label for security team >test-failure Triaged test failures from CI

Comments

@droberts195
Contributor

An @After method doing post-test cleanup failed in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=oraclelinux-7/103/console

The relevant part of the log file is this:

  1> [2019-06-30T20:56:25,503][INFO ][o.e.n.Node               ] [suite] closing ...
  1> [2019-06-30T20:56:25,507][INFO ][o.e.n.Node               ] [suite] closed
  1> [2019-06-30T13:56:25,516][INFO ][o.e.x.s.a.f.FileUserRolesStore] [external_8] users roles file [/var/lib/jenkins/workspace/elastic+elasticsearch+master+multijob-unix-compatibility/os/oraclelinux-7/x-pack/plugin/ml/qa/native-multi-node-tests/build/testrun/integTestRunner/temp/org.elasticsearch.xpack.ml.integration.ForecastIT_A5BFFFA50FDD9F61-001/tempDir-003/config/users_roles] changed. updating users roles...
  2> java.io.IOException: Could not remove the following files (in the order of attempts):
       /var/lib/jenkins/workspace/elastic+elasticsearch+master+multijob-unix-compatibility/os/oraclelinux-7/x-pack/plugin/ml/qa/native-multi-node-tests/build/testrun/integTestRunner/temp/org.elasticsearch.xpack.ml.integration.ForecastIT_A5BFFFA50FDD9F61-001/tempDir-003/config: java.io.IOException: access denied: /var/lib/jenkins/workspace/elastic+elasticsearch+master+multijob-unix-compatibility/os/oraclelinux-7/x-pack/plugin/ml/qa/native-multi-node-tests/build/testrun/integTestRunner/temp/org.elasticsearch.xpack.ml.integration.ForecastIT_A5BFFFA50FDD9F61-001/tempDir-003/config
       /var/lib/jenkins/workspace/elastic+elasticsearch+master+multijob-unix-compatibility/os/oraclelinux-7/x-pack/plugin/ml/qa/native-multi-node-tests/build/testrun/integTestRunner/temp/org.elasticsearch.xpack.ml.integration.ForecastIT_A5BFFFA50FDD9F61-001/tempDir-003: java.nio.file.DirectoryNotEmptyException: /var/lib/jenkins/workspace/elastic+elasticsearch+master+multijob-unix-compatibility/os/oraclelinux-7/x-pack/plugin/ml/qa/native-multi-node-tests/build/testrun/integTestRunner/temp/org.elasticsearch.xpack.ml.integration.ForecastIT_A5BFFFA50FDD9F61-001/tempDir-003
  2> REPRODUCE WITH: ./gradlew :x-pack:plugin:ml:qa:native-multi-node-tests:integTestRunner --tests "org.elasticsearch.xpack.ml.integration.ForecastIT" -Dtests.seed=A5BFFFA50FDD9F61 -Dtests.security.manager=true -Dtests.locale=en-US -Dtests.timezone=UTC -Dcompiler.java=12 -Druntime.java=11
  2> NOTE: test params are: codec=Asserting(Lucene80): {}, docValues:{}, maxPointsInLeafNode=799, maxMBSortInHeap=7.242438413406605, sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@39e6b55c), locale=shi-Latn-MA, timezone=Asia/Vientiane
  2> NOTE: Linux 4.14.35-1902.2.0.el7uek.x86_64 amd64/Oracle Corporation 11.0.3 (64-bit)/cpus=16,threads=1,free=90554224,total=536870912
  2> NOTE: All tests run in this JVM: [AutodetectMemoryLimitIT, BasicRenormalizationIT, CategorizationIT, DatafeedJobsIT, DatafeedJobsRestIT, DatafeedWithAggsIT, DelayedDataDetectorIT, DeleteExpiredDataIT, DetectionRulesIT, ForecastIT]

It tried to delete the directory .../tempDir-003/config but could not because the directory wasn't empty.

Presumably it wasn't empty because FileUserRolesStore had just written .../tempDir-003/config/users_roles into it. This is done by writing a temporary file, then renaming it to the eventual path, so the new file can appear after a recursive directory deletion routine running at the same time has already listed and removed the directory's contents. The recursive directory deletion probably did successfully delete the previous version of the users_roles file.
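To make the hypothesized interleaving concrete, here is a minimal, deterministic Java sketch that replays it step by step in a scratch directory. It assumes the writer publishes the file via a temporary sibling followed by `Files.move` with `ATOMIC_MOVE`; the class name, the `.tmp` suffix and the file contents are hypothetical, and this is not the actual `FileUserRolesStore` or test-cleanup code.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical replay of the suspected race; not Elasticsearch code.
class RenameRaceSketch {

    // Writes content to a temporary sibling file, then renames it into place in one step.
    static void writeAtomically(Path target, String content) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.writeString(tmp, content);
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    // Returns true if the final directory delete failed with DirectoryNotEmptyException.
    static boolean replayRace() throws IOException {
        Path root = Files.createTempDirectory("race-demo");
        Path config = Files.createDirectory(root.resolve("config"));
        Path usersRoles = config.resolve("users_roles");
        writeAtomically(usersRoles, "admin:user1\n");

        // 1. Cleanup lists the directory and deletes its children (the old users_roles).
        List<Path> children;
        try (Stream<Path> s = Files.list(config)) {
            children = s.collect(Collectors.toList());
        }
        for (Path child : children) {
            Files.delete(child);
        }

        // 2. Meanwhile the writer publishes a fresh users_roles into the same directory.
        writeAtomically(usersRoles, "admin:user1,user2\n");

        // 3. Cleanup now tries to remove the directory it believes is empty.
        try {
            Files.delete(config);
            return false;
        } catch (DirectoryNotEmptyException e) {
            return true;
        } finally {
            // Tidy up the scratch directory for this demo.
            Files.deleteIfExists(usersRoles);
            Files.deleteIfExists(config);
            Files.deleteIfExists(root);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("DirectoryNotEmptyException: " + replayRace());
    }
}
```

Because the rename makes the new `users_roles` appear in a single step, a deletion routine that listed `config` before the rename never sees the file, and its final directory delete fails even though every file it knew about was removed.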

Maybe FileUserRolesStore needs to register itself with Node so that Node can wait for it to complete during its shutdown sequence?
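One shape that suggestion could take, sketched under assumptions: the store guards each reload with a lock, and the node's shutdown sequence calls `close()`, which waits for any in-flight reload and prevents new ones from starting. `GuardedFileStore`, `onFileChanged` and the sleep standing in for reload work are all hypothetical; this is not Elasticsearch's `Node` or `FileUserRolesStore` API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical store whose reloads can be drained during shutdown.
class GuardedFileStore implements AutoCloseable {
    private final ReentrantLock lock = new ReentrantLock();
    private boolean closed; // guarded by lock

    // Called by the file watcher; runs the reload under the lock unless already closed.
    boolean onFileChanged(Runnable reloadWork) {
        lock.lock();
        try {
            if (closed) {
                return false; // shutting down: do not touch the disk
            }
            reloadWork.run();
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Called from the shutdown sequence; blocks until an in-flight reload releases the lock.
    @Override
    public void close() {
        lock.lock();
        try {
            closed = true;
        } finally {
            lock.unlock();
        }
    }

    // Returns true if close() waited for the running reload and later reloads were rejected.
    static boolean demo() throws InterruptedException {
        GuardedFileStore store = new GuardedFileStore();
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean finished = new AtomicBoolean(false);
        Thread watcher = new Thread(() -> store.onFileChanged(() -> {
            started.countDown();
            try {
                Thread.sleep(200); // stand-in for re-reading the users_roles file
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            finished.set(true);
        }));
        watcher.start();
        started.await(); // the reload is now running and holds the lock
        store.close();   // blocks until the reload finishes
        boolean waited = finished.get();
        boolean rejectedAfterClose = !store.onFileChanged(() -> { });
        watcher.join();
        return waited && rejectedAfterClose;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("close waited and later reloads rejected: " + demo());
    }
}
```

Holding the lock for the whole reload means `close()` only returns once the store has stopped touching the disk, so cleanup that runs after the node has closed would no longer race with the watcher. The trade-off is that shutdown can block for as long as one reload takes.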

This failure has only occurred once as far as I can see, so we don't necessarily need to rush to fix it. But I wanted to log an issue so that if it starts happening frequently there's something to reference with the investigation I've done so far.

@droberts195 droberts195 added >test-failure Triaged test failures from CI :Security/Security Security issues without another label labels Jul 5, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-security

@tvernum
Contributor

tvernum commented Sep 4, 2019

It looks like this failed again for the same reason:

  org.elasticsearch.xpack.ml.integration.ForecastIT > classMethod FAILED
  java.io.IOException: Could not remove the following files (in the order of attempts):
       /var/lib/jenkins/workspace/elastic+elasticsearch+master+matrix-java-periodic/ES_BUILD_JAVA/openjdk12/ES_RUNTIME_JAVA/zulu12/nodes/general-purpose/x-pack/plugin/ml/qa/native-multi-node-tests/build/testrun/integTestRunner/temp/org.elasticsearch.xpack.ml.integration.ForecastIT_49CA3E0C9396C02E-001/tempDir-003/config: java.io.IOException: access denied: /var/lib/jenkins/workspace/elastic+elasticsearch+master+matrix-java-periodic/ES_BUILD_JAVA/openjdk12/ES_RUNTIME_JAVA/zulu12/nodes/general-purpose/x-pack/plugin/ml/qa/native-multi-node-tests/build/testrun/integTestRunner/temp/org.elasticsearch.xpack.ml.integration.ForecastIT_49CA3E0C9396C02E-001/tempDir-003/config
       /var/lib/jenkins/workspace/elastic+elasticsearch+master+matrix-java-periodic/ES_BUILD_JAVA/openjdk12/ES_RUNTIME_JAVA/zulu12/nodes/general-purpose/x-pack/plugin/ml/qa/native-multi-node-tests/build/testrun/integTestRunner/temp/org.elasticsearch.xpack.ml.integration.ForecastIT_49CA3E0C9396C02E-001/tempDir-003: java.nio.file.DirectoryNotEmptyException: /var/lib/jenkins/workspace/elastic+elasticsearch+master+matrix-java-periodic/ES_BUILD_JAVA/openjdk12/ES_RUNTIME_JAVA/zulu12/nodes/general-purpose/x-pack/plugin/ml/qa/native-multi-node-tests/build/testrun/integTestRunner/temp/org.elasticsearch.xpack.ml.integration.ForecastIT_49CA3E0C9396C02E-001/tempDir-003

https://gradle-enterprise.elastic.co/s/mqxsdvj2bcqb6/tests/zcquf3hc3eoda-d2oyncgpb7reu?openStackTraces=WzBd

@jakelandis
Contributor

It's been 3 years without any additional comments, and I cannot find any evidence of this test failing for this reason within the past 30 days (only a couple of unrelated failures caused by general cluster health problems).

Closing this issue. There is nothing to un-mute, since (except for #44609) this test has remained un-muted.
