NUTCH-2978 -- upgrade to log4j2 throughout #772
If folks could test this out on their workloads, that'd be fantastic! It works on mine, but I'm really hesitant to merge until someone else runs it. Thank you!
# Conflicts:
#   src/plugin/any23/ivy.xml
#   src/plugin/any23/plugin.xml
I'll merge this in a day or so unless anyone has objections.
Give me a few more days; I'll test over the weekend. I'd like to test it at least on a pseudo-distributed Hadoop setup. If that succeeds, a failure on a fully distributed Hadoop cluster is rather unlikely. Hadoop uses reload4j and likely puts its jars at the front of the classpath, so there might be some side effects. Also, Nutch task logs are necessarily created via reload4j.
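To illustrate why jar order matters, here is a minimal Python model of an ordered classpath lookup. It is not Hadoop's or SLF4J's actual code. With SLF4J 1.7.x the binding is the class org/slf4j/impl/StaticLoggerBinder; when several jars provide it, the class loader in practice returns the first one on the classpath (SLF4J's own documentation says the choice should be treated as unspecified). The job file name nutch.job and its contents are hypothetical, as is the assumption that it bundles an SLF4J 1.7-style log4j2 binding.

```python
from typing import Dict, List, Optional, Set


def first_provider(classpath: List[str], contents: Dict[str, Set[str]], resource: str) -> Optional[str]:
    """Return the first jar, in classpath order, that contains `resource`.

    Models the ordered search of a Java URLClassLoader (parent delegation ignored).
    """
    for jar in classpath:
        if resource in contents.get(jar, set()):
            return jar
    return None


# SLF4J 1.7.x looks up its binding through this class.
BINDER = "org/slf4j/impl/StaticLoggerBinder.class"

contents = {
    "slf4j-api-1.7.36.jar": {"org/slf4j/LoggerFactory.class"},
    "slf4j-reload4j-1.7.36.jar": {BINDER},
    # Hypothetical job file; assumed to bundle a log4j2 SLF4J binding.
    "nutch.job": {BINDER, "org/apache/nutch/fetcher/FetcherThread.class"},
}

# In a YARN task, Hadoop's jars come before the job jar.
task_classpath = ["slf4j-api-1.7.36.jar", "slf4j-reload4j-1.7.36.jar", "nutch.job"]
# In the local job client, only the Nutch jars are assumed to be present.
local_classpath = ["nutch.job"]

print(first_provider(task_classpath, contents, BINDER))
print(first_provider(local_classpath, contents, BINDER))
```

Under these assumptions the task resolves the binding from slf4j-reload4j, while the client resolves it from the job jar, which matches the split between task logs and client logs described in this thread.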
Yes, of course. That'd be fantastic. Thank you!
+1
A test with the pseudo-distributed Hadoop setup was successful:
Fantastic! Thank you so much Sebastian!
On Sun, Sep 17, 2023 at 9:02 AM Sebastian Nagel wrote:
+1
A test with the pseudo-distributed Hadoop setup <https://github.com/sebastian-nagel/nutch-test-single-node-cluster/> was successful:
- Nutch tools work properly, no issues
- as expected, Hadoop puts slf4j-api-1.7.36.jar and slf4j-reload4j-1.7.36.jar in the classpath in front of the Nutch job jars
- consequently, task logs are formatted using the format defined in $HADOOP_HOME/etc/hadoop/log4j.properties
- (the good thing) log messages from Nutch classes appear in the task logs, e.g.
  2023-09-17 07:29:21,726 INFO [FetcherThread] org.apache.nutch.fetcher.FetcherThread: FetcherThread 33 fetching https://nutch.apache.org/ (queue crawl delay=5000ms)
- the log format defined in $NUTCH_HOME/conf/log4j2.xml is only applied to the logs of the YARN job client, e.g.
  2023-09-17 07:29:32,432 INFO fetcher.Fetcher: Fetcher: finished at 2023-09-17 07:29:32, elapsed: 00:00:25
- in addition, I've included two PDFs, an XLSX and an ePub document to test the Tika parser: the docs were successfully parsed using Tika 2.3.0; if necessary, I can repeat the test for NUTCH-2959 <https://issues.apache.org/jira/browse/NUTCH-2959>
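The two sample log lines differ only in layout. A minimal Python sketch that renders a record both ways; the log4j pattern strings quoted in the comments are inferred from the sample lines, not copied from the actual log4j.properties or log4j2.xml files. In log4j, %c{2} keeps only the rightmost two components of the logger name, which is why the client line shows fetcher.Fetcher. The thread name "main" for the client record is an assumption.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    timestamp: str  # already rendered, e.g. "2023-09-17 07:29:21,726"
    level: str
    thread: str
    logger: str  # fully qualified class name
    message: str


def abbreviate(logger: str, n: int) -> str:
    """Mimic log4j's %c{n}: keep only the rightmost n dot-separated components."""
    return ".".join(logger.split(".")[-n:])


def task_layout(r: LogRecord) -> str:
    # Inferred from the task log sample: "%d{ISO8601} %p [%t] %c: %m"
    return f"{r.timestamp} {r.level} [{r.thread}] {r.logger}: {r.message}"


def client_layout(r: LogRecord) -> str:
    # Inferred from the job client sample: "%d{ISO8601} %p %c{2}: %m"
    return f"{r.timestamp} {r.level} {abbreviate(r.logger, 2)}: {r.message}"


fetch = LogRecord("2023-09-17 07:29:21,726", "INFO", "FetcherThread",
                  "org.apache.nutch.fetcher.FetcherThread",
                  "FetcherThread 33 fetching https://nutch.apache.org/ (queue crawl delay=5000ms)")
# Thread name "main" is an assumption; the client layout does not print it.
done = LogRecord("2023-09-17 07:29:32,432", "INFO", "main",
                 "org.apache.nutch.fetcher.Fetcher",
                 "Fetcher: finished at 2023-09-17 07:29:32, elapsed: 00:00:25")

print(task_layout(fetch))
print(client_layout(done))
```

Both printed lines reproduce the samples above, so the messages themselves are identical and only the configured layout differs between task and client.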
Thanks for your contribution to Apache Nutch! Your help is appreciated!
Before opening the pull request, please verify that:
- the issue ID (NUTCH-2978) is used in the form [NUTCH-XXXX] Issue or pull request title
- ant clean runtime test runs successfully
- LICENSE-binary and NOTICE-binary are updated accordingly
We will be able to integrate your pull request faster if these conditions are met. If you have any questions about how to fix your problem or about using Nutch in general, please sign up for the Nutch mailing list. Thanks!