Subtract deleted file size from the cache size of NRTCachingDirectory. #13206
Thanks for catching this. I'm not sure what the best way is to deal with deletion of files that are open for writing: should we try to make it work properly like you did, or rather fail? @uschindler, do you have an opinion?
Hi,
I don't think that adding synchronization for that is really necessary. It would just mean that NRTCachingDir uses a larger cache for a short amount of time. I would prefer to fix NRTCachingDirectory to track files better overall. Of course, in real filesystems the size of deleted but still-open files is still counted, but that's hard to implement here without refactoring the whole class to use some "inode"-like structure decoupled from the filename. I wonder why this bug was not noticed before; it looks like a serious issue. Maybe it was introduced when we switched to ByteBuffersDirectory.
Actually, deleting a file while it is being written should never happen in Lucene. The bigger problem is when the file is still open in an NRT reader and gets deleted. I am not sure how we should handle that case. It may be serious depending on the size of the cache, because servers like Elasticsearch or Solr often keep files open for reading for a long time while IndexWriter deletes them. Let's think about this a bit more; maybe there's an easy way to deal with it. Maybe all IndexInputs created by the directory should decrement the cache size when they close if the file is "scheduled for delete"?
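A minimal sketch of the "scheduled for delete" idea, using the inode-like structure mentioned above: each cached file's bytes live in an entry decoupled from its name, a delete only unlinks the name, and the bytes are subtracted from the cache size once the entry is both deleted and no longer referenced by an open input. All class and method names here (`DeferredAccountingCache`, `write`, `openInput`, `delete`, `Input`) are hypothetical and do not mirror Lucene's actual NRTCachingDirectory or IndexInput API.

```java
import java.util.HashMap;
import java.util.Map;

// Demo first so `java DeferredDeleteDemo.java` finds main.
class DeferredDeleteDemo {
  public static void main(String[] args) {
    DeferredAccountingCache cache = new DeferredAccountingCache();
    cache.write("_0.cfs", 40);
    DeferredAccountingCache.Input in = cache.openInput("_0.cfs");
    cache.delete("_0.cfs");           // the name is gone, but a reader still holds the bytes
    System.out.println(cache.size()); // 40
    in.close();                       // last open input closed: the bytes are released
    System.out.println(cache.size()); // 0
  }
}

/** Hypothetical sketch: bytes stay counted until the file is deleted AND no input is open. */
class DeferredAccountingCache {
  /** Inode-like entry, decoupled from the file name. */
  private static final class Entry {
    final long length;
    int openInputs;
    boolean deleted;

    Entry(long length) {
      this.length = length;
    }
  }

  /** Hypothetical stand-in for an IndexInput that releases its entry on close. */
  final class Input implements AutoCloseable {
    private final Entry entry;
    private boolean closed;

    private Input(Entry entry) {
      this.entry = entry;
    }

    @Override
    public void close() {
      synchronized (DeferredAccountingCache.this) {
        if (closed) return; // closing twice must not decrement twice
        closed = true;
        entry.openInputs--;
        releaseIfUnreferenced(entry);
      }
    }
  }

  private final Map<String, Entry> byName = new HashMap<>();
  private long cacheSize;

  synchronized void write(String name, long length) {
    if (byName.containsKey(name)) throw new IllegalStateException("file exists: " + name);
    byName.put(name, new Entry(length));
    cacheSize += length;
  }

  synchronized Input openInput(String name) {
    Entry e = byName.get(name);
    if (e == null) throw new IllegalArgumentException("no such file: " + name);
    e.openInputs++;
    return new Input(e);
  }

  /** Unlinks the name; the bytes are only subtracted once no input holds the entry. */
  synchronized void delete(String name) {
    Entry e = byName.remove(name);
    if (e == null) return;
    e.deleted = true;
    releaseIfUnreferenced(e);
  }

  private void releaseIfUnreferenced(Entry e) {
    if (e.deleted && e.openInputs == 0) cacheSize -= e.length;
  }

  synchronized long size() {
    return cacheSize;
  }
}
```

The subtraction happens exactly once per entry: either in `delete` when no input is open, or in the `close` of the last open input.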
Hmm, does it actually happen? I thought index files were ref-counted so that files would only get deleted when the NRT reader gets closed? This is what
Ah, you're right. This is to work around Windows issues.
So I think we are fine then. +1 to merge.
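The reference counting discussed above can be sketched as follows: every holder of a segment (the writer's current segment infos, an open NRT reader) increments a per-file count, and a file is only physically deleted when its count drops to zero. This is a hypothetical illustration (`FileRefCounter`, `incRef`, `decRef` and the file names are assumptions), not Lucene's own file-deletion code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Demo first so `java FileRefCountDemo.java` finds main.
class FileRefCountDemo {
  public static void main(String[] args) {
    List<String> deleted = new ArrayList<>();
    FileRefCounter refs = new FileRefCounter(deleted::add);
    List<String> seg0 = List.of("_0.cfs", "_0.si");
    refs.incRef(seg0);           // referenced by the writer's current segments
    refs.incRef(seg0);           // an NRT reader is opened on the same segment
    refs.decRef(seg0);           // a merge replaces _0; the writer drops its reference
    System.out.println(deleted); // []
    refs.decRef(seg0);           // the NRT reader is closed
    System.out.println(deleted); // [_0.cfs, _0.si]
  }
}

/** Hypothetical per-file reference counter; deletes a file when its last reference goes away. */
class FileRefCounter {
  private final Map<String, Integer> counts = new HashMap<>();
  private final Consumer<String> deleteFile;

  FileRefCounter(Consumer<String> deleteFile) {
    this.deleteFile = deleteFile;
  }

  synchronized void incRef(Iterable<String> files) {
    for (String f : files) counts.merge(f, 1, Integer::sum);
  }

  synchronized void decRef(Iterable<String> files) {
    for (String f : files) {
      Integer c = counts.get(f);
      if (c == null) throw new IllegalStateException("file not referenced: " + f);
      if (c == 1) {
        counts.remove(f);
        deleteFile.accept(f); // last reference gone: now it is safe to delete
      } else {
        counts.put(f, c - 1);
      }
    }
  }
}
```

Under this scheme the directory never sees a delete for a file that an open reader still uses, which is why the thread concludes the NRTCachingDirectory fix does not need to handle that case.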
Wow, this looks like a serious bug indeed. Wouldn't this mean that NRTCachingDirectory only really caches on startup, and then gradually caches fewer and fewer files until it stops caching altogether?
lucene/core/src/java/org/apache/lucene/store/NRTCachingDirectory.java
NRT readers pulled from But in the NRT segment replication case, I think this may still happen? I.e., segments are replicated out to a node that has an open
lucene/core/src/java/org/apache/lucene/store/NRTCachingDirectory.java
Thank you @jfboeuf!
Could you add a CHANGES.txt entry in the 9.11 bug fix section? I will merge this PR tomorrow.
I did not notice that the CHANGES entry was in the wrong section; fixed in the main branch (376ec27). Now backporting.
All backported to the 9.x branch.
The size of deleted files is not subtracted from the cache size of NRTCachingDirectory. As a consequence, the cache eventually appears full and new files can no longer be cached, even though memory is available: deleted files no longer consume memory, but their size is still counted.
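The effect described above can be reproduced with a toy model of a size-capped write cache. The class `SizeTrackingCache`, its `tryCache`/`delete` methods, the 100-byte cap and the 30-byte files are all hypothetical; the model only mirrors the accounting rule (admit a file if `cacheSize + length <= maxCachedBytes`), not Lucene's actual code. Running the same write-then-delete churn with and without subtracting on delete shows the phantom growth.

```java
import java.util.HashMap;
import java.util.Map;

// Demo first so `java CacheAccountingDemo.java` finds main.
class CacheAccountingDemo {
  public static void main(String[] args) {
    for (boolean subtract : new boolean[] {false, true}) {
      SizeTrackingCache cache = new SizeTrackingCache(100, subtract);
      int cached = 0;
      // Churn: write a 30-byte file, then delete it, ten times.
      for (int i = 0; i < 10; i++) {
        String name = "_" + i + ".cfs";
        if (cache.tryCache(name, 30)) cached++;
        cache.delete(name);
      }
      System.out.println("subtractOnDelete=" + subtract + " cached=" + cached + " size=" + cache.size());
    }
    // subtractOnDelete=false cached=3 size=90
    // subtractOnDelete=true cached=10 size=0
  }
}

/** Hypothetical model of a size-capped write cache; not Lucene's NRTCachingDirectory. */
class SizeTrackingCache {
  private final long maxCachedBytes;
  private final boolean subtractOnDelete;
  private final Map<String, Long> files = new HashMap<>();
  private long cacheSize;

  SizeTrackingCache(long maxCachedBytes, boolean subtractOnDelete) {
    this.maxCachedBytes = maxCachedBytes;
    this.subtractOnDelete = subtractOnDelete;
  }

  /** Caches the file if it fits under the cap; false means it would go to the backing directory. */
  boolean tryCache(String name, long length) {
    if (cacheSize + length > maxCachedBytes) return false;
    files.put(name, length);
    cacheSize += length;
    return true;
  }

  void delete(String name) {
    Long length = files.remove(name);
    // Without this subtraction the freed bytes stay counted forever.
    if (length != null && subtractOnDelete) cacheSize -= length;
  }

  long size() {
    return cacheSize;
  }
}
```

Without the subtraction, only the first three files are cached and the accounted size stays at 90 bytes although the cache holds nothing.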
The fix has a weakness if a file is deleted concurrently by another thread while it is being closed, but I don't think this pathological case deserves the additional synchronization it would require.
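For illustration only, here is one way such a close/delete window could be narrowed without a directory-wide lock, assuming a model where each cached file records how many bytes it has contributed to the cache size. `ConcurrentHashMap.computeIfPresent` runs atomically with respect to other updates of the same key, so a concurrent `delete` either happens entirely before the close (and the close becomes a no-op) or after it (and subtracts what the close added). The class and methods (`AtomicCacheAccounting`, `create`, `closeOutput`, `delete`) are hypothetical; this is not what the PR implements, and whether the extra complexity is worth it is exactly the judgment call made above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Demo first so `java RaceDemo.java` finds main.
class RaceDemo {
  public static void main(String[] args) throws InterruptedException {
    AtomicCacheAccounting cache = new AtomicCacheAccounting();
    for (int i = 0; i < 1000; i++) {
      String name = "_" + i + ".cfs";
      cache.create(name);
      // Race a close against a delete of the same file.
      Thread closer = new Thread(() -> cache.closeOutput(name, 100));
      Thread deleter = new Thread(() -> cache.delete(name));
      closer.start();
      deleter.start();
      closer.join();
      deleter.join();
    }
    System.out.println(cache.size()); // 0 regardless of interleaving
  }
}

/** Hypothetical sketch of per-file atomic accounting; not Lucene code. */
class AtomicCacheAccounting {
  // File name -> bytes this file has added to cacheSize so far (0 while still being written).
  private final ConcurrentHashMap<String, Long> countedBytes = new ConcurrentHashMap<>();
  private final AtomicLong cacheSize = new AtomicLong();

  void create(String name) {
    if (countedBytes.putIfAbsent(name, 0L) != null) {
      throw new IllegalStateException("file exists: " + name);
    }
  }

  /** Counts the final length, but only if the file has not been deleted in the meantime. */
  void closeOutput(String name, long length) {
    countedBytes.computeIfPresent(name, (k, counted) -> {
      cacheSize.addAndGet(length - counted);
      return length;
    });
  }

  /** Subtracts exactly what this file contributed, if anything. */
  void delete(String name) {
    Long counted = countedBytes.remove(name);
    if (counted != null) cacheSize.addAndGet(-counted);
  }

  long size() {
    return cacheSize.get();
  }
}
```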