NUTCH-2885 Upgrade to Log4j2 #692
I've verified that this patch, with some minor additional configuration, enables me to write logs locally, into logz.io, and also into an enterprise instance of Splunk. The latter two really help with alerts and notifications if something goes wrong, e.g. a ParseException.
I put some documentation together for this as well.
Is anyone able to give this a look? Thank you.
Thanks, @lewismc.
enables me to write logs locally, into logz.io and also into enterprise instance of Splunk
Sounds great!
In (pseudo-)distributed mode using Hadoop 3.3.1, I've found that the change has no effect at all. I don't know why; possibly the fix for HADOOP-12956 is required.
One point: the log4j.properties made only the Nutch tools write to stdout, and most of the Hadoop classes were even set to log at level WARN. Now the console output includes many messages that are hardly useful for locating the usual Nutch issues, e.g.:
2021-08-02 15:27:14,318 INFO o.a.h.m.t.r.MergeManagerImpl [pool-5-thread-1] Merged 2 segments, 162 bytes to disk to satisfy reduce memory limit
2021-08-02 15:27:14,318 INFO o.a.h.m.t.r.MergeManagerImpl [pool-5-thread-1] Merging 1 files, 164 bytes from disk
2021-08-02 15:27:14,319 INFO o.a.h.m.t.r.MergeManagerImpl [pool-5-thread-1] Merging 0 segments, 0 bytes from memory into reduce
2021-08-02 15:27:14,319 INFO o.a.h.m.Merger [pool-5-thread-1] Merging 1 sorted segments
2021-08-02 15:27:14,319 INFO o.a.h.m.Merger [pool-5-thread-1] Down to the last merge-pass, with 1 segments left of total size: 110 bytes
2021-08-02 15:27:14,319 INFO o.a.h.m.LocalJobRunner [pool-5-thread-1] 2 / 2 copied.
2021-08-02 15:27:14,341 INFO o.a.h.i.c.CodecPool [pool-5-thread-1] Got brand-new compressor [.deflate]
Maybe we should (optionally) provide a less verbose logging configuration?
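One way such a quieter configuration could look is a `Loggers` section that raises the threshold for all Hadoop packages while keeping Nutch at INFO, mirroring what the old log4j.properties did. This is a hypothetical fragment, not part of the PR. It assumes an appender named `Console` is defined elsewhere in the same log4j2.xml.

```xml
<!-- Hypothetical fragment: levels and the appender name "Console" are assumptions -->
<Loggers>
  <!-- Covers every Hadoop class in the excerpt above (o.a.h.m.Merger, o.a.h.i.c.CodecPool, ...):
       INFO and DEBUG events from org.apache.hadoop.* are dropped, WARN and above still pass -->
  <Logger name="org.apache.hadoop" level="warn"/>
  <!-- Nutch tools keep their INFO output -->
  <Logger name="org.apache.nutch" level="info"/>
  <Root level="info">
    <AppenderRef ref="Console"/>
  </Root>
</Loggers>
```

Because loggers are additive by default, the two named loggers need no appender references of their own. They inherit the root's `Console` appender, and only their level changes.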
Thanks @sebastian-nagel, I addressed the following two issues.
Regarding your comments on HADOOP-12956, I also note the following entries in STDOUT.
Does HADOOP-12956 stop us from implementing this now and revisiting it once HADOOP-12956 has been included in a forthcoming Hadoop release?
+1 Regarding the warning about missing appenders: I've also seen it, but only in the logs of the ApplicationMaster, and nevertheless the
I do not think so. But it looks like the upgrade has no effect in (pseudo-)distributed mode and the log4j.properties in
Thanks for your thoughts, @sebastian-nagel.
- allow setting the log file and directory via the system properties `hadoop.log.file` and `hadoop.log.dir`
Hi @lewismc, yes: the branch is ready to be merged. I've added the change that allows overriding the log file and folder via Java system properties.
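A minimal sketch of how such an override can be wired up in log4j2.xml, assuming Log4j2's `sys:` lookup with a `:-` default value. The property names `logDir` and `logFile`, the fallback values `./logs` and `hadoop.log`, and the appender name `LogFile` are hypothetical and are not taken from the PR.

```xml
<!-- Hypothetical fragment: names and default values are assumptions -->
<Properties>
  <!-- ${sys:NAME:-DEFAULT} reads a Java system property and falls back to DEFAULT if it is unset -->
  <Property name="logDir">${sys:hadoop.log.dir:-./logs}</Property>
  <Property name="logFile">${sys:hadoop.log.file:-hadoop.log}</Property>
</Properties>
<Appenders>
  <!-- The file appender resolves its path from the two properties above -->
  <File name="LogFile" fileName="${logDir}/${logFile}">
    <PatternLayout pattern="%d %p %c{1.} [%t] %m%n"/>
  </File>
</Appenders>
```

The values could then be supplied on the JVM command line, e.g. `-Dhadoop.log.dir=/tmp/nutch-logs -Dhadoop.log.file=nutch.log`. If neither is set, the defaults apply.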
* NUTCH-2885 Upgrade to Log4j2
PR for https://issues.apache.org/jira/browse/NUTCH-2885 ready for review.
I feel that this simplifies the logging configuration with precise XML syntax: ultimately less code, and code that I find more readable.
The configuration uses a RollingFileAppender with the cron triggering policy configured to trigger every day at midnight. Archives are stored in a directory based on the current year and month. All files under the base directory that match the `*/nutch-*.log.gz` glob and are 60 days old or older are deleted at rollover time. Additionally, we retain the ConsoleAppender configuration, so everything is also written to STDOUT.

I was motivated to work on this issue because I am performing a trade study evaluating logz.io. Specifically, this will allow configuring Nutch to use the logzio-log4j2-appender.
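A sketch of a log4j2.xml implementing the policy just described. It is modeled on the rolling-file example in the Log4j2 manual and is not copied from this PR. The `baseDir` value `logs`, the active file name `nutch.log`, the pattern layout, and the root level are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: baseDir, file names, pattern and levels are assumptions -->
<Configuration status="warn">
  <Properties>
    <Property name="baseDir">logs</Property>
  </Properties>
  <Appenders>
    <!-- Console output is kept, so everything also goes to STDOUT -->
    <Console name="Console" target="SYSTEM_OUT">
      <PatternLayout pattern="%d %p %c{1.} [%t] %m%n"/>
    </Console>
    <!-- $${date:yyyy-MM} is evaluated at rollover time, giving one archive directory per month -->
    <RollingFile name="RollingFile" fileName="${baseDir}/nutch.log"
                 filePattern="${baseDir}/$${date:yyyy-MM}/nutch-%d{yyyy-MM-dd}.log.gz">
      <PatternLayout pattern="%d %p %c{1.} [%t] %m%n"/>
      <!-- Quartz-style cron expression: second 0, minute 0, hour 0, i.e. every day at midnight -->
      <CronTriggeringPolicy schedule="0 0 0 * * ?"/>
      <DefaultRolloverStrategy>
        <!-- At rollover, delete archives under baseDir matching the glob that are 60+ days old -->
        <Delete basePath="${baseDir}" maxDepth="2">
          <IfFileName glob="*/nutch-*.log.gz"/>
          <IfLastModified age="60d"/>
        </Delete>
      </DefaultRolloverStrategy>
    </RollingFile>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="RollingFile"/>
      <AppenderRef ref="Console"/>
    </Root>
  </Loggers>
</Configuration>
```

`maxDepth="2"` is what lets the `Delete` action reach the monthly subdirectories. With `maxDepth="1"` only files directly under `baseDir` would be visited, and the `*/` part of the glob could never match.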
Comments welcome.