New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable turning on IndexWriter's infoStream #5891
Comments
++, I suggest this setting will be real time setting (changed using index update settings API). I think the interesting question would be into which file will it be logged? I think it should be a separate file, that is placed in the same logging dir location. |
I like the idea of having this be a real-time setting, so you can go into an existing index that's doing something strange and turn on IW.infoStream. The challenge is, this is not a real-time setting in IndexWriter today (you must close/reopen the writer), which is annoying. I think we should just make our own delegating InfoStream impl that we pass to IW on init, so IW always logs messages to it, but then we tell it to turn on/off whether it sends those messages to a log file (I agree, it should be a separate file in the normal logging dir). I'll give this a shot. |
Wondering if a delegate InfoStream will not hurt perf, since the string construction to log it will happen now all the time within Lucene. Potentially, this setting can be realtime, but we will close and open the IndexWriter with it within our Engine (we can do that with some settings that require a fresh IW). |
This can safely be a realtime setting. There is actually always a infoStream set, it is just a NullInfoStream that just always returns false for isEnabled? All checks in IndexWriter look like this:
So its sorta like logger apis in that sense. You can safely have one that flips on and off at will: |
About linking InfoStream to a logger, it should be pretty simple. We added this to solr with this class: http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/update/LoggingInfoStream.java However, there was more going on there (e.g. legacy settings for infostream and so on). Maybe for ES, there should be no option for infostream at all? it should just be set, and it should check if trace or debug level logging is enabled? Basically: the user would be unaware of infostream. They just crank up logging and see more stuff... |
I like that solution; this way the user doesn't need to learn another setting. This is just increased verbosity. |
we could have it under a specific logging name, yea, and then its just a matter of enabling trace logging on it specifically. I missed the enhancement to InfoStream, nice! |
OK, I pushed an initial commit .... I found a LoggerInfoStream which I'm not sure about passing the indexSettings/shardId down, but I I also hit & "fixed" an unrelated NPE when I set logging to trace but I turned on trace logging in the tests (-Des.logger.level=TRACE) and |
…vel=TRACE Lucene's IndexWriter logs many low-level details to its infoStream which can be helpful for diagnosing; with this change, if you enable TRACE logging for the "lucene.iw" logger name then IndexWriter's infoStream output will be captured and can later be scrutinized e.g. using https://code.google.com/a/apache-extras.org/p/luceneutil/source/browse/src/python/iwLogToGraphs.py to generate graphs like http://people.apache.org/~mikemccand/lucenebench/iw.html Closes #5891
For diagnosing low level IndexWriter issues, it's sometimes useful to enable IndexWriter's infoStream output, but one cannot do this from Elasticsearch today.
The text was updated successfully, but these errors were encountered: