
Fix too many open files Exception for some tests #13035

Merged
merged 4 commits into from Feb 19, 2024


easyice
Contributor

@easyice easyice commented Jan 26, 2024

./gradlew :lucene:core:test --tests "org.apache.lucene.index.TestConcurrentMergeScheduler.testNoStallMergeThreads" -Ptests.jvms=6 "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC -XX:ActiveProcessorCount=1" -Ptests.seed=13FCF0E4FD5ABF60 -Ptests.nightly=true -Ptests.gui=false -Ptests.file.encoding=US-ASCII -Ptests.vectorsize=128
  2> org.apache.lucene.index.MergePolicy$MergeException: java.nio.file.FileSystemException: /Users/zhangchao-so/Develop/git/fork/lucene/lucene/core/build/tmp/tests-tmp/lucene.index.TestConcurrentMergeScheduler_13FCF0E4FD5ABF60-008/index-NIOFSDirectory-001/_cr_Lucene99_0.tip: Too many open files
  2>    at __randomizedtesting.SeedInfo.seed([13FCF0E4FD5ABF60]:0)
  2>    at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:735)
  2>    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:727)
  2> Caused by: java.nio.file.FileSystemException: /Users/zhangchao-so/Develop/git/fork/lucene/lucene/core/build/tmp/tests-tmp/lucene.index.TestConcurrentMergeScheduler_13FCF0E4FD5ABF60-008/index-NIOFSDirectory-001/_cr_Lucene99_0.tip: Too many open files
  2>    at org.apache.lucene.tests.mockfile.HandleLimitFS.onOpen(HandleLimitFS.java:67)
  2>    at org.apache.lucene.tests.mockfile.HandleTrackingFS.callOpenHook(HandleTrackingFS.java:82)
  2>    at org.apache.lucene.tests.mockfile.HandleTrackingFS.newFileChannel(HandleTrackingFS.java:202)
  2>    at org.apache.lucene.tests.mockfile.FilterFileSystemProvider.newFileChannel(FilterFileSystemProvider.java:206)
  2>    at java.base/java.nio.channels.FileChannel.open(FileChannel.java:298)

git bisect shows this commit as the perpetrator: d6836d3
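For readers unfamiliar with the test framework: the trace above ends in `HandleLimitFS.onOpen`, a mock file system hook that caps the number of simultaneously open handles. The sketch below is a simplified, hypothetical model of that idea using only the JDK. The class name `HandleLimiter` and its methods are made up for illustration and are not Lucene's implementation.

```java
import java.nio.file.FileSystemException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a handle-limiting test file system; NOT Lucene's HandleLimitFS. */
final class HandleLimiter {
  private final int limit;
  private final AtomicInteger open = new AtomicInteger();

  HandleLimiter(int limit) {
    this.limit = limit;
  }

  /** Called when a file is opened; fails once more than {@code limit} handles are open. */
  void onOpen(String path) throws FileSystemException {
    if (open.incrementAndGet() > limit) {
      open.decrementAndGet(); // the rejected open does not hold a handle
      throw new FileSystemException(path, null, "Too many open files");
    }
  }

  /** Called when a previously opened file is closed. */
  void onClose() {
    open.decrementAndGet();
  }

  /** Number of handles currently counted as open. */
  int openCount() {
    return open.get();
  }
}
```

Under this model, a test fails as soon as it holds more handles than the limit at any one moment, regardless of how many files it opens in total.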

@rmuir
Member

rmuir commented Jan 26, 2024

This is not the correct fix: instead the test must be fixed to not use so many files at once.

@easyice
Contributor Author

easyice commented Jan 26, 2024

Thank you! I replaced the suppression with setMaxBufferedDocs; the number of open files in nightly tests dropped from 4000+ to 400+. Does that look okay?
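As a rough illustration of why the buffered-document setting changes the file count by an order of magnitude, here is a back-of-the-envelope model. It assumes a single indexing thread, no merging, one flushed segment per `maxBufferedDocs` documents, and a fixed number of files per segment. The class `OpenFileEstimate` and all figures are hypothetical and not taken from the PR.

```java
/** Hypothetical estimate of open files as a function of flush size; not Lucene code. */
final class OpenFileEstimate {
  private OpenFileEstimate() {}

  /** Segments flushed when each full buffer of documents becomes one segment and nothing merges. */
  static int flushedSegments(int numDocs, int maxBufferedDocs) {
    if (numDocs < 0 || maxBufferedDocs < 1) {
      throw new IllegalArgumentException("numDocs must be >= 0 and maxBufferedDocs >= 1");
    }
    return (numDocs + maxBufferedDocs - 1) / maxBufferedDocs; // ceiling division
  }

  /** Files held open if every segment keeps {@code filesPerSegment} files open at once. */
  static int openFiles(int numDocs, int maxBufferedDocs, int filesPerSegment) {
    return flushedSegments(numDocs, maxBufferedDocs) * filesPerSegment;
  }

  public static void main(String[] args) {
    // Assumed figures: 1000 documents, 8 files per segment.
    System.out.println(openFiles(1000, 2, 8));  // small buffer: many tiny segments
    System.out.println(openFiles(1000, 20, 8)); // larger buffer: ten times fewer segments
  }
}
```

With these assumed inputs the estimate falls from 4000 to 400, the same order of magnitude as the change reported above.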

testDeleteMerging might also create many files, so I added a similar change to limit them.
This failure could not be reproduced:

   >             Caused by:
   >             java.nio.file.FileSystemException: /dev/shm/lucene_candidate/lucene/core/build/tmp/tests-tmp/lucene.index.TestConcurrentMergeScheduler_13FCF0E4FD5ABF60-001/index-NIOFSDirectory-001/_cr_Lucene99_0.tip: Too many open files
   >                 at org.apache.lucene.tests.mockfile.HandleLimitFS.onOpen(HandleLimitFS.java:67)
   >                 at org.apache.lucene.tests.mockfile.HandleTrackingFS.callOpenHook(HandleTrackingFS.java:82)
   >                 at org.apache.lucene.tests.mockfile.HandleTrackingFS.newFileChannel(HandleTrackingFS.java:202)
   >                 at org.apache.lucene.tests.mockfile.FilterFileSystemProvider.newFileChannel(FilterFileSystemProvider.java:206)
   >                 at java.base/java.nio.channels.FileChannel.open(FileChannel.java:309)
   >                 at java.base/java.nio.channels.FileChannel.open(FileChannel.java:369)
   >                 at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78)
   >                 at org.apache.lucene.tests.util.LuceneTestCase.slowFileExists(LuceneTestCase.java:3014)
   >                 at org.apache.lucene.tests.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:800)
   >                 at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsReader.<init>(Lucene90BlockTreeTermsReader.java:146)
   >                 at org.apache.lucene.codecs.lucene99.Lucene99PostingsFormat.fieldsProducer(Lucene99PostingsFormat.java:428)
   >                 at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.<init>(PerFieldPostingsFormat.java:330)
   >                 at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:392)
   >                 at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:98)
   >                 at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:95)
   >                 at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:178)
   >                 at org.apache.lucene.index.ReadersAndUpdates.getReaderForMerge(ReadersAndUpdates.java:784)
   >                 at org.apache.lucene.index.IndexWriter.lambda$mergeMiddle$21(IndexWriter.java:5144)
   >                 at org.apache.lucene.index.MergePolicy$OneMerge.initMergeReaders(MergePolicy.java:469)
   >                 at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5140)
   >                 at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4738)
   >                 at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6539)
   >                 at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:639)
   >                 at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:700)
  2> NOTE: reproduce with: gradlew test --tests TestConcurrentMergeScheduler.testDeleteMerging -Dtests.seed=13FCF0E4FD5ABF60 -Dtests.nightly=true -Dtests.locale=mgh-Latn-MZ -Dtests.timezone=Asia/Dili -Dtests.asserts=true -Dtests.file.encoding=UTF-8

@easyice easyice changed the title Do not use HandleLimitFS for TestConcurrentMergeScheduler Fix too many open files for TestConcurrentMergeScheduler Jan 26, 2024
@easyice easyice changed the title Fix too many open files for TestConcurrentMergeScheduler Fix too many open files Exception for TestConcurrentMergeScheduler Jan 26, 2024
@rmuir
Member

rmuir commented Jan 26, 2024

Thank you @easyice! There is also TestUtil.reduceOpenFiles. I'm not sure if it is appropriate here, but it is something to look into. It is also nice since it makes it obvious from the code why the settings are being changed.

@easyice
Contributor Author

easyice commented Jan 27, 2024

It looks better to use TestUtil.reduceOpenFiles for testDeleteMerging.
testNoStallMergeThreads needs many segments to cover LUCENE-6197 (I tried checking out the old version to reproduce the bug, since it is difficult to reproduce in the current version), so I changed it to use setUseCompoundFile(true) to reduce open files.
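To make the trade-off concrete: compound files leave the number of segments unchanged but pack each segment's data into far fewer physical files. The sketch below is a simplified, hypothetical model. It assumes a reader holds one handle per compound segment and one handle per file otherwise. The class `CompoundFileEstimate` and the figures are illustrative only, not measurements from this PR.

```java
/** Hypothetical model of open handles with and without compound files; not Lucene code. */
final class CompoundFileEstimate {
  private CompoundFileEstimate() {}

  /** Assumption: a compound segment needs one open handle, otherwise one handle per file. */
  static int handlesPerSegment(boolean useCompoundFile, int filesPerSegment) {
    return useCompoundFile ? 1 : filesPerSegment;
  }

  /** Total handles if all segments are open at the same time. */
  static int totalHandles(int segments, boolean useCompoundFile, int filesPerSegment) {
    if (segments < 0 || filesPerSegment < 1) {
      throw new IllegalArgumentException("segments must be >= 0 and filesPerSegment >= 1");
    }
    return segments * handlesPerSegment(useCompoundFile, filesPerSegment);
  }

  public static void main(String[] args) {
    // Assumed figures: 500 segments, 8 files per non-compound segment.
    System.out.println(totalHandles(500, false, 8));
    System.out.println(totalHandles(500, true, 8));
  }
}
```

Under these assumptions the segment count, which the test depends on, stays at 500 while the number of handles shrinks by the per-segment file count.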

@easyice
Contributor Author

easyice commented Jan 30, 2024

Pushed a new fix for the reproducible test failure in TestIndexWriterThreadsToSegments.testManyThreadsClose:

./gradlew test --tests TestIndexWriterThreadsToSegments.testManyThreadsClose -Dtests.seed=DFAE4EC5F20E1CD4 -Dtests.nightly=true -Dtests.locale=zgh-MA -Dtests.timezone=SystemV/EST5EDT -Dtests.asserts=true -Dtests.file.encoding=UTF-8
java.nio.file.FileSystemException: /Users/xxxx/tests-tmp/lucene.index.TestIndexWriterThreadsToSegments_DFAE4EC5F20E1CD4-001/index-NIOFSDirectory-001/_hk_Lucene90FieldsIndexfile_pointers_ua.tmp: Too many open files
   >                 at org.apache.lucene.tests.mockfile.HandleLimitFS.onOpen(HandleLimitFS.java:67)
   >                 at org.apache.lucene.tests.mockfile.HandleTrackingFS.callOpenHook(HandleTrackingFS.java:82)
   >                 at org.apache.lucene.tests.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:163)
   >                 at org.apache.lucene.tests.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:200)
   >                 at java.base/java.nio.file.Files.newOutputStream(Files.java:228)

@easyice easyice changed the title Fix too many open files Exception for TestConcurrentMergeScheduler Fix too many open files Exception for some tests Jan 30, 2024

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Feb 14, 2024
@mikemccand
Member

This change looks great, and it looks like @rmuir's concern was addressed. Thank you for focusing on test reliability @easyice! I'll merge this soon if there are no objections.

@easyice
Contributor Author

easyice commented Feb 14, 2024

@rmuir @mikemccand Thank you for reviewing!

@github-actions github-actions bot removed the Stale label Feb 15, 2024
@mikemccand mikemccand merged commit 55df3e0 into apache:main Feb 19, 2024
4 checks passed
mikemccand pushed a commit that referenced this pull request Feb 19, 2024
* init

* fix review

* fix review

* iter
@mikemccand mikemccand modified the milestones: 9.10.0, 9.11.0 Feb 19, 2024
@mikemccand
Member

Thank you @easyice!
