Use non analyzed token stream optimization everywhere #6001
In the string type, we have an optimization to reuse the StringTokenStream on a thread local when a non-analyzed field is used (instead of creating it each time). We should use this across the board in all places where we create a field with a String. Also, move to a specific XStringField so that we can reuse StringTokenStream instead of copying it.
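A minimal sketch of the reuse pattern described above, not the actual patch. `SingleValueStream` and `ReusableStreams` are hypothetical stand-ins (the real classes are Lucene's StringTokenStream and the XStringField proposed here), and the sketch assumes the stream only has to emit the whole value as a single token.

```java
/** Hypothetical stand-in for a single-token stream over a non-analyzed value. */
final class SingleValueStream {
    private String value;
    private boolean used = true;

    void setValue(String value) {
        this.value = value;
    }

    /** Rewinds the stream so the current value can be consumed once. */
    void reset() {
        used = false;
    }

    /** Returns the whole value as the only token, then null. */
    String next() {
        if (used) {
            return null;
        }
        used = true;
        return value;
    }

    /** Drops the reference so the cached instance does not pin the last value. */
    void close() {
        value = null;
    }
}

/** Hypothetical helper: one cached stream per thread instead of one per field. */
final class ReusableStreams {
    private static final ThreadLocal<SingleValueStream> CACHE =
            ThreadLocal.withInitial(SingleValueStream::new);

    private ReusableStreams() {}

    static SingleValueStream forValue(String value) {
        SingleValueStream stream = CACHE.get(); // same instance on every call from this thread
        stream.setValue(value);
        stream.reset();
        return stream;
    }
}
```

Every call to `ReusableStreams.forValue` on the same thread hands back the same object, so indexing many non-analyzed fields allocates one stream per thread rather than one per field. The caller has to finish consuming a stream before asking for the next one.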
I don't know much about Elasticsearch's thread churn, but maybe CloseableThreadLocal is interesting here? It is annoying as it seems to produce ghosts in profilers (https://issues.apache.org/jira/browse/LUCENE-4474) as it calls a native method when it purges, but I played some more and this does seem to be a ghost.
Patch looks good! Maybe add a comment that StringTokenStream comes from Lucene? We are not nervous about StringTokenStream always hanging onto the last value it indexed? I imagine these values are typically small ...
@rmuir the problem with @mikemccand I think its evident that
@kimchy OK it's fine with me to skip the comment ...
@kimchy I think CloseableThreadLocal actually has a confusing name (and documentation). I am not speaking of its close() properties here, but of the fact that it has a built-in garbage collection mechanism: it periodically purges (this is amortized over the get()s, as a multiplier * number of threads enrolled).
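A rough sketch of the amortized purge idea described above, to make the discussion concrete. This is not Lucene's CloseableThreadLocal source: `PurgingThreadLocal`, the multiplier value of 20, and the plain `HashMap` are assumptions chosen for illustration.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical thread local that periodically forgets values owned by dead threads. */
final class PurgingThreadLocal<T> {
    private static final int PURGE_MULTIPLIER = 20; // assumed value, for illustration only

    private final ThreadLocal<WeakReference<T>> local = new ThreadLocal<>();
    // Hard references keep each value alive until its owning thread is purged.
    private final Map<Thread, T> hardRefs = new HashMap<>();
    private int countUntilPurge = PURGE_MULTIPLIER;

    T get() {
        WeakReference<T> ref = local.get();
        T value = (ref == null) ? null : ref.get();
        maybePurge();
        return value;
    }

    void set(T value) {
        local.set(new WeakReference<>(value));
        synchronized (hardRefs) {
            hardRefs.put(Thread.currentThread(), value);
        }
        maybePurge();
    }

    /** Pays for a full purge only once every (multiplier * enrolled threads) calls. */
    private void maybePurge() {
        synchronized (hardRefs) {
            if (--countUntilPurge <= 0) {
                purge();
            }
        }
    }

    /** Removes entries of threads that have exited and re-arms the counter. */
    void purge() {
        synchronized (hardRefs) {
            hardRefs.keySet().removeIf(thread -> !thread.isAlive());
            countUntilPurge = (1 + hardRefs.size()) * PURGE_MULTIPLIER;
        }
    }

    int trackedThreads() {
        synchronized (hardRefs) {
            return hardRefs.size();
        }
    }
}
```

Because the counter is re-armed in proportion to the number of enrolled threads, the scan over all threads is spread across many `get()` and `set()` calls. With a bounded, long-lived thread pool the purge rarely finds anything to remove, so it acts mostly as a safety net.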
@rmuir ahh, yeah, agreed. Is it still not required in ES because of the bounded number of threads that will be allowed to use it? I can easily change it, but the main merit of using it in the context of ES is actually using close where we can (because threads don't come and go).
Yeah, I am unsure if it's appropriate here or not. The only real benefit would be that the "GC" of the thread local here would be consistent with what happens with analyzed fields too (since that path uses CTL).
@rmuir kk, will add it as an additional safety measure
not for the close method (since it's static), but more for its amortized GC of thread locals
added, if all is good, will push it soonish
+1 I will work with Mike to fix this guy in Lucene too.
In the string type, we have an optimization to reuse the StringTokenStream on a thread local when a non-analyzed field is used (instead of creating it each time). We should use this across the board in all places where we create a field with a String. Also, move to a specific XStringField so that we can reuse StringTokenStream instead of copying it. closes #6001
@kimchy Turns out StringTokenStream already sets value=null in its close method ...
ahh, cool!
I opened https://issues.apache.org/jira/browse/LUCENE-5634 to get this into Lucene.