[FLINK-23393] Fix logging setup #21
Conversation
zentol commented on Jul 15, 2021
- remove log4j1 dependencies
- rename log4j.properties files
- rename the production log4j2.properties file to log4j2-test.properties file, because log4j first searches for test configuration files, and the test-jar dependencies put theirs on the classpath (and would thus receive priority)
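The lookup-order point above is why the rename matters: Log4j 2 searches the classpath for `log4j2-test.properties` before `log4j2.properties`, so a test-scoped file always shadows a production one. As a hedged sketch (contents assumed for illustration, not taken from this PR), a minimal test-scoped configuration could look like:

```properties
# Minimal log4j2-test.properties (illustrative sketch).
# Log4j 2 checks for log4j2-test.properties on the classpath before
# log4j2.properties, so this file takes priority during tests.
rootLogger.level = WARN
rootLogger.appenderRef.console.ref = ConsoleAppender

appender.console.name = ConsoleAppender
appender.console.type = Console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{HH:mm:ss,SSS} %-5p %-60c %x - %m%n
```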
```diff
@@ -16,8 +16,13 @@
 # limitations under the License.
 ################################################################################

-log4j.rootLogger=WARN, console
+rootLogger.level = ERROR
```
Suggested change:

```diff
-rootLogger.level = ERROR
+rootLogger.level = WARN
```
We could also set it to OFF if we want to exclude logging overhead from the benchmark (which we supposedly did so far). But this seems reasonable to have, because in practice users at least use ERROR/WARN, I suppose.
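For reference, excluding logging overhead entirely would be a one-line change in the log4j2 properties format (a sketch, not part of this PR):

```properties
# Disables all logging output, removing logging overhead from the benchmark.
rootLogger.level = OFF
```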
Have you checked how many error or warn messages are currently being produced? Having warn/error as the default would be great, as long as it doesn't skew the benchmark results. If it does, we would either need to set the level to OFF or invest time in getting rid of those errors/warnings.
The test run from the CI suggests that there were no messages printed at all? Could that be true? I've also tried throwing an exception in the submitted job code, but it looks like it's still not logging anything?
```java
@Benchmark
public void asyncWait(FlinkEnvironmentContext context) throws Exception {
    StreamExecutionEnvironment env = context.env;
    env.enableCheckpointing(CHECKPOINT_INTERVAL_MS);
    env.setParallelism(1);

    DataStreamSource<Long> source = env.addSource(new LongSource(RECORDS_PER_INVOCATION));
    DataStream<Long> result = createAsyncOperator(source);
    result.addSink(
            new SinkFunction<Long>() {
                @Override
                public void invoke(Long value) throws Exception {
                    if (value == RECORDS_PER_INVOCATION / 2) {
                        throw new UnsupportedOperationException();
                    }
                }
            });
    env.execute();
}
```
I mean, it's not logging anything more than it logs when I set the level to OFF?
I didn't check how it works during tests; let me do that now.
Can you set the log level to info and try again with AWOB?
> Have you checked how many error or warn messages are being currently produced?
From what I can tell on warn/error nothing is shown; so far I only ever saw anything on info.
The test log4j2-test.properties wasn't actually used because the production one has the same name.
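This kind of classpath shadowing can be diagnosed by listing every copy of a resource name that the classloader sees; the first hit is the one that wins. The following is a hypothetical standalone diagnostic sketch (not part of the PR; the class name and default resource name are my own choices):

```java
// DuplicateConfigCheck.java -- hypothetical diagnostic sketch.
// Lists every copy of a given resource name visible on the classpath;
// when two artifacts ship a file with the same name, only the first
// entry returned here is the one a library such as Log4j 2 will load.
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class DuplicateConfigCheck {
    public static void main(String[] args) throws IOException {
        String name = args.length > 0 ? args[0] : "log4j2-test.properties";
        // Collect all classpath entries that contain a resource with this name.
        List<URL> copies =
                Collections.list(ClassLoader.getSystemClassLoader().getResources(name));
        System.out.println("found " + copies.size() + " copies of " + name);
        for (URL url : copies) {
            // The first URL printed is the copy that shadows the others.
            System.out.println("  " + url);
        }
    }
}
```

Running this inside the benchmark project (with all test-jar dependencies on the classpath) would show how many duplicate configuration files compete for the same name.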
I tried logging an error message in the UDF and it works as expected. But it's also true that if an exception is thrown in the UDF then nothing gets logged, which is a bit surprising?
Hmm, with the INFO level I can see some things being logged. But indeed, the exception is not. I'm only getting this:
```
12:29:30,861 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Disconnect job manager a01911f835cbdd195148423ad40f4070@akka://flink/user/rpc/jobmanager_3 for job 0d151cff076a2fc1cb273c9d9d61236d from the resource manager.
<failure>
org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
	at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
	at org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:137)
	(...)
Caused by: java.lang.UnsupportedOperationException
	at org.apache.flink.benchmark.AsyncWaitOperatorBenchmark$1.invoke(AsyncWaitOperatorBenchmark.java:89)
	at org.apache.flink.benchmark.AsyncWaitOperatorBenchmark$1.invoke(AsyncWaitOperatorBenchmark.java:85)
	at org.apache.flink.streaming.api.functions.sink.SinkFunction.invoke(SinkFunction.java:50)
	at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:54)
	at org.apache.flink.streaming.runtime.tasks.ChainingOutput.pushToOperator(ChainingOutput.java:99)
	(...)
```
And if that happens, the benchmark doesn't shut down:
```
12:29:30,952 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Stopped Akka RPC service.
<JMH had finished, but forked VM did not exit, are there stray running threads? Waiting 24 seconds more...>
Non-finished threads:
Thread[pool-2-thread-6,5,main]
	at sun.misc.Unsafe.park(Native Method)
	(...)
```
But that looks like a pre-existing problem unrelated to this PR.
Thanks a lot for trying to fix that!
Since the log4j2 properties files have the same name in production and in tests, only the production one was actually used in all cases. Logging during tests therefore wasn't working as expected, because tests were picking up the production properties file instead of the test one. I have removed the test version of the file.
I've run a benchmark request and everything seems fine here.