Use more efficient ArrayDeque instead of LinkedList #15682

Merged
uschindler merged 3 commits into apache:main from renatoh:performacne_improvement_for_CompoundWordTokenFilterBase on Feb 9, 2026
Conversation

@renatoh
Contributor

@renatoh renatoh commented Feb 9, 2026

No description provided.

@github-actions github-actions bot added this to the 11.0.0 milestone Feb 9, 2026
Contributor

@uschindler uschindler left a comment

This is very old code. Thanks for improving it!

@uschindler
Contributor

I think targeting 11.0 is fine. If you want it in the next minor release, please move the changes entry. But the window for Lucene 10.4 has already closed, we're in the release process, so the next one could be 10.5.

@uschindler uschindler self-assigned this Feb 9, 2026
@renatoh
Contributor Author

renatoh commented Feb 9, 2026

> I think targeting 11.0 is fine. If you want it in the next minor release, please move the changes entry. But the window for Lucene 10.4 has already closed, we're in the release process, so the next one could be 10.5.

No need to move it to 10.5. I came across this LinkedList by chance while debugging HyphenationCompoundWordTokenFilter.

Looking at where else LinkedList is used, and potentially misused as a Queue, I came across this case:
org.apache.lucene.index.DocumentsWriterFlushControl#flushQueue
It is literally called a queue and is used through the Queue interface, but the instance is a LinkedList.
Do you think it would make sense to scan the code base for that pattern and change these cases to ArrayDeque as well?
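
To illustrate the pattern described above: when a field is only ever used through the Queue interface, swapping the concrete class is a one-line change. This is a minimal sketch, not the Lucene code; the class `QueueSwapSketch`, the helpers `fill` and `drain`, and the `String` element type are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

public class QueueSwapSketch {
  // Hypothetical helper: adds three items using only Queue methods.
  static Queue<String> fill(Queue<String> queue) {
    queue.add("a");
    queue.add("b");
    queue.add("c");
    return queue;
  }

  // Hypothetical helper: drains the queue in FIFO order using only Queue methods.
  static List<String> drain(Queue<String> queue) {
    List<String> out = new ArrayList<>();
    String item;
    while ((item = queue.poll()) != null) {
      out.add(item);
    }
    return out;
  }

  public static void main(String[] args) {
    // Same calling code, different backing implementation.
    System.out.println(drain(fill(new LinkedList<>())));
    System.out.println(drain(fill(new ArrayDeque<>())));
  }
}
```

Because the callers only see `Queue<String>`, both implementations yield the same FIFO order here; only the allocation site changes.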

@uschindler
Contributor

Ok, then I will merge this one as is.

About the other LinkedLists: long ago I already wanted to get rid of them and add LinkedList to the forbiddenapis list. But it's not trivial to replace those, especially if they use the non-Queue legacy APIs (some methods were duplicated in LinkedList to fit the Queue interface).
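
A rough sketch of why such call sites are harder to migrate: LinkedList implements both List and Deque, while ArrayDeque implements only Deque, so code that relies on positional List methods has no direct ArrayDeque equivalent. The class `LegacyApiSketch` and its methods are hypothetical and only illustrate the distinction.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;

public class LegacyApiSketch {
  // Uses only Deque/Iterable methods: accepts LinkedList and ArrayDeque alike.
  static int sum(Deque<Integer> deque) {
    int total = 0;
    for (int value : deque) {
      total += value;
    }
    return total;
  }

  // Uses positional List access: a LinkedList can be passed here,
  // but an ArrayDeque cannot, because ArrayDeque is not a List.
  static int middle(List<Integer> list) {
    return list.get(list.size() / 2);
  }

  public static void main(String[] args) {
    LinkedList<Integer> linked = new LinkedList<>(List.of(1, 2, 3));
    ArrayDeque<Integer> deque = new ArrayDeque<>(List.of(1, 2, 3));
    System.out.println(sum(linked));
    System.out.println(sum(deque));
    System.out.println(middle(linked));
    // middle(deque) would not compile.
  }
}
```

Call sites like `middle` need a different target type (for example ArrayList) or a rewrite, which is what makes a blanket replacement non-trivial.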

We already got rid of all java.util.Stack instances (which are synchronized and were worse as they were partners of Hashtable and Vector from Java 1.0).

If you like you can open a PR to replace all others and possibly add it to ForbiddenApis. :-)

@uschindler uschindler merged commit 44d6638 into apache:main Feb 9, 2026
14 checks passed
@uschindler
Contributor

> Looking at where else LinkedList is used, and potentially misused as a Queue, I came across this case:
> org.apache.lucene.index.DocumentsWriterFlushControl#flushQueue

I don't know that code specifically. LinkedList as a replacement for ArrayDeque is fine if the size of your queue is generally small and you do not add or remove items too often. In that case it uses less memory and performance is not bad. But misusing LinkedList for positional access (like some code does/did) should be fixed in all cases. Thanks for the reminder about this old code.

I am still not sure why DocumentsWriterFlushControl uses LinkedList... That code is not too old. Maybe it really is because of small sizes and memory pressure?

@renatoh
Contributor Author

renatoh commented Feb 9, 2026

> I don't know that code specifically. LinkedList as a replacement for ArrayDeque is fine if the size of your queue is generally small and you do not add or remove items too often. In that case it uses less memory and performance is not bad. But misusing LinkedList for positional access (like some code does/did) should be fixed in all cases. Thanks for the reminder about this old code.
>
> I am still not sure why DocumentsWriterFlushControl uses LinkedList... That code is not too old. Maybe it really is because of small sizes and memory pressure?

It is pretty old:
org.apache.lucene.index.DocumentsWriterFlushControl#blockedFlushes is from its initial implementation back in 2011.
If it is used as a Queue like here, then I do not see any benefit of a LinkedList over an ArrayDeque, even with small lists, or am I missing anything? LinkedList uses more memory due to its node pointers and has, as far as I know, a very niche use case: the one where most of the inserts/deletes happen in the middle. If that is not the case and you do need access via index, ArrayList is better. If you use it as a Queue, ArrayDeque is the better option.

@uschindler
Contributor

Yeah, maybe open a PR and fix all of them :-) I can do a review, no worries. As usual those changes should possibly be done on main only (Lucene 11).

@uschindler
Contributor

I'd really like to get rid of LinkedList globally and make it forbiddenapi :-)

@renatoh
Contributor Author

renatoh commented Feb 9, 2026

forbiddenapi, isn't that a bit too strict? What if the niche case occurs where it makes sense?

A LinkedList does allow null elements, ArrayDeque does not: dq.add(null) throws an NPE. That could introduce some issues. How much do you trust the unit tests to catch these cases?
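
The null-handling difference mentioned above can be checked with a small sketch. The class `NullElementSketch` and the helper `acceptsNull` are hypothetical names used only for illustration.

```java
import java.util.ArrayDeque;
import java.util.LinkedList;
import java.util.Queue;

public class NullElementSketch {
  // Hypothetical helper: reports whether the given queue accepts a null element.
  static boolean acceptsNull(Queue<String> queue) {
    try {
      queue.add(null);
      return true;
    } catch (NullPointerException e) {
      // ArrayDeque rejects null elements with a NullPointerException.
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("LinkedList accepts null: " + acceptsNull(new LinkedList<>()));
    System.out.println("ArrayDeque accepts null: " + acceptsNull(new ArrayDeque<>()));
  }
}
```

One reason for the restriction is that `poll()` and `peek()` return null to signal an empty queue, so a stored null would be indistinguishable from "no element".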

@apache apache deleted a comment from renatoh Feb 9, 2026
@uschindler
Contributor

uschindler commented Feb 9, 2026

That of course also requires review, but we did similar things with moving from unmodifiableXxx to Map/Set/List.of(). We had no null issues so it is unlikely. But you have to check this. If a unit test does not catch this, it's a bug anyways, because at places where we allow null elements we normally have a test, too.

We have forbiddenapis for Stack/Vector already:

java.util.Stack @ Use more modern java.util.ArrayDeque as it is not synchronized
java.util.Vector @ Use more modern java.util.ArrayList as it is not synchronized
# TODO (needs some fix in forbiddenapis): this also hits java.util.Properties:
# java.util.Hashtable @ Use more modern java.util.HashMap as it is not synchronized

Hashtable is also there, but because java.util.Properties extends Hashtable we can't forbid it :-(
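
For reference, the replacement that the java.util.Stack entry above suggests can be sketched as follows: an ArrayDeque used through the Deque interface offers the same push/pop behaviour without synchronization. The class `StackReplacementSketch` and its `reverse` method are hypothetical and not taken from Lucene.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StackReplacementSketch {
  // Hypothetical example: reverses a string using an ArrayDeque as a LIFO stack.
  static String reverse(String input) {
    Deque<Character> stack = new ArrayDeque<>();
    for (char c : input.toCharArray()) {
      stack.push(c); // adds at the head, like Stack.push
    }
    StringBuilder reversed = new StringBuilder();
    while (!stack.isEmpty()) {
      reversed.append(stack.pop()); // removes from the head, like Stack.pop
    }
    return reversed.toString();
  }

  public static void main(String[] args) {
    System.out.println(reverse("abc"));
  }
}
```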

@uschindler
Contributor

To fix the TODO, I opened #15685.

@renatoh
Contributor Author

renatoh commented Feb 9, 2026

@uschindler but there are niche cases for which LL is an appropriate choice. Checking all the usages of LL, I believe I came across two instances where it could make sense to stick with LL:
FieldTermStack -> poll and push of the LL are used, and the LL is sorted at one point. ArrayList will have worse performance for poll and push; ArrayDeque cannot be sorted in place because it is not a List.

SynonymGraphFilter.outputBuffer -> switching this to an ArrayList is a trade-off: outputBuffer.get(pathID).endNode will be faster, while outputBuffer.pollFirst() (removeFirst for an ArrayList) will be slower. Still, it feels like ArrayList is the way to go since access via index is done in a loop.
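
To make the sorting point from the FieldTermStack case concrete: a LinkedList can be sorted directly through the List interface, whereas an ArrayDeque has to be copied out, sorted, and rebuilt. This is only a sketch under that assumption; `SortableDequeSketch`, `sortInPlace`, and `sortedCopy` are hypothetical names, and the code is not FieldTermStack's.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedList;
import java.util.List;

public class SortableDequeSketch {
  // LinkedList is a List, so List.sort is available on the same instance.
  static LinkedList<Integer> sortInPlace(LinkedList<Integer> list) {
    list.sort(Comparator.naturalOrder());
    return list;
  }

  // ArrayDeque is not a List: one possible workaround is copy, sort, rebuild.
  static ArrayDeque<Integer> sortedCopy(ArrayDeque<Integer> deque) {
    List<Integer> tmp = new ArrayList<>(deque);
    tmp.sort(Comparator.naturalOrder());
    return new ArrayDeque<>(tmp);
  }

  public static void main(String[] args) {
    System.out.println(sortInPlace(new LinkedList<>(List.of(3, 1, 2))));
    System.out.println(sortedCopy(new ArrayDeque<>(List.of(3, 1, 2))));
  }
}
```

Whether the extra copy matters depends on how often the sort runs relative to the poll and push calls, which would need to be checked against the real usage.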
