-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FLUME-2050 Upgrade to Log4j 2 #181
Conversation
@@ -0,0 +1,9 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @rgoers , thanks for the pull request
I am wondering how this metastore_db folder is necessary to upgrade the log4j version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is some leftover from a failed test run. Could you please remove it so we could review the real changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't modify those files so I'm not sure how they got in the commit. Out of curiosity, is there a reason they aren't copied to the target directory if the tests are going to modify them? Maven convention is that only items in the target directory should be modified by the build so I don't have my gitignore set up to skip these files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will revert those files as soon as I can.
The extra files have been removed. |
@rgoers after: I believe this upgrade should not introduce any change in the default log format. |
The format strings are exactly the same. The difference seems to be how %x is handled. In Log4j 1.x the NDC seems to be returned as null if there is nothing, so I am not sure why it is an empty string. In Log4j 2 the ThreadContext Stack returns an empty list which I believe is why you see "[]". I am curious - Does Flume even use the NDC? Very view applications use the NDC - most use the MDC/ThreadContext Map. A grep of Flume finds the only occurrence of NDC is in the Log4jAppender and that reference only retrieves the NDC from the logging event. Log4j 2 has a way to handle this so I will give it a try and update the PR when I confirm it is working. |
@szaboferee I ran the agent in flume-ng-dist and the results I get don't match yours. With log4j 1.x I got Also, based on what you posted above I modified the log4j 2 configuration so that it doesn't print anything if there is no NDC and there is only one space as that is what I see above. However, when I look at the output I get from Log4j 1 I am seeing 2 spaces before the '-'. I noticed that when I include that string here that the double space is being converted to a single space, so I am guessing I really need to modify the pattern to emit 2 spaces when the NDC isn't present instead of just 1. Are you using the standard conf/log4j.properties in the gzipped tar built by flume-ng-dist? If you are I don't know how you are getting "nov.". |
@@ -1,68 +0,0 @@ | |||
# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering why were you changing log4j.properties to XML.
Are there benefits I am not aware of?
There might be systems where flume was integrated with log4j.properties format. What is the precedence if there are both .properties and .xml?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Log4j 2 doesn't support the log4j 1 properties format. Even if the log4j 2 properties format is used it is still different than the log4j 1 format. I prefer the XML format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if there is a log4j.properties (log4j 2 format) and a log4j.xml file on the classpath at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Log4j uses an @order annotation on the ConfigurationFactories to determine which factories get precedence. Properties have the highest, followed by yaml and then XML. So if a properties file is present it will be used instead of the xml file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks for the information.
@@ -67,6 +67,7 @@ limitations under the License. | |||
<dependency> | |||
<groupId>log4j</groupId> | |||
<artifactId>log4j</artifactId> | |||
<version>1.2.17</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't be better to have this in the parent dependencyManagement as there are transitive dependencies that are still using log4j 1.x?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. All references to log4j 1 have been removed except for this one instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All references in flume however, some dependencies of flume has it as a transitive dependency.
For example hive-shmis, hadoop-common, kafka_2.10.
We should harmonize the versions for these dependencies as well to avoid version mismatches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through and excluded all the transitive dependencies to log4j in the Flume pom.xml files that I could find. They have been replaced with references to log4j-1.2-api.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seeing the output of mvn dependency:tree
there are a few places where it looks like you might have missed it. Could you please double check it to see if I am right?
Wouldn't it be easier to use the dependencyManagement in parent pom to exclude the old version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might. I will take a look again.
@@ -27,7 +27,8 @@ | |||
import org.apache.flume.conf.Configurable; | |||
import org.apache.flume.instrumentation.SinkCounter; | |||
import org.apache.flume.sink.AbstractSink; | |||
import org.apache.log4j.Logger; | |||
import org.apache.logging.log4j.LogManager; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer the slf4j logger here because that way the user could decide to use a different logger implementation.
(despite the fact that the original code already used log4j)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it was using log4j I converted it to use log4j...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flume components are using slf4j. I guess this log4j logger in the HttpSink was overlooked at the review and that is why it is not an slf4j logger. It would be nice if it could be consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no problem if that change is made but it doesn't necessarily have to happen as part of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I will make a different PR about it. Thank you.
I have updated the PR again but it can't be merged. It requires Log4j 2.10, which hasn't been released yet, to resolve a ClassNotFound error in Hadoop. The release process for Log4j 2.10 will be starting in the next few days. I will update this PR when it has completed. |
The patch has been upgraded to use Log4j 2.10.0. That version includes a version of AppenderSkeleton from Log4j 1 which should eliminate problems with Hadoop. From what I can tell this PR should be ready to go now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the dependency tree, the only log4j 1.x is in the appender, tests ran successfully.
The change looks good to me.
This closes apache#181 Reviewers: Ferenc Szabo (Ralph Goers via Denes Arvay)
This closes apache#181 Reviewers: Ferenc Szabo (Ralph Goers via Denes Arvay) Change-Id: Ifd2175c384f1456d082a054af2b042c7147ed446
pili-stream_oqos(1.0.0):add spark streaming task for the ola of inter…
This PR removes the use of log4j 1.x from Flume.