Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-23393] Fix logging setup #21

Merged
merged 2 commits into from Jul 26, 2021
Merged

[FLINK-23393] Fix logging setup #21

merged 2 commits into from Jul 26, 2021

Conversation

zentol
Copy link
Contributor

@zentol zentol commented Jul 15, 2021

  • remove log4j1 dependencies
  • rename log4j.properties files
  • rename the production log4j2.properties file to log4j2-test.properties file, because log4j first searches for test configuration files, and the test-jar dependencies put theirs on the classpath (and would thus receive priority)

- remove log4j1 dependencies
- rename log4j.properties files
- rename the production log4j2.properties file to log4j2-test.properties file, because log4j first searches for test configuration files, and the test-jar dependencies put theirs on the classpath
@@ -16,8 +16,13 @@
# limitations under the License.
################################################################################

log4j.rootLogger=WARN, console
rootLogger.level = ERROR
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
rootLogger.level = ERROR
rootLogger.level = WARN

Copy link
Contributor Author

@zentol zentol Jul 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also set it to OFF if we want to exclude logging overhead from the benchmark (which we supposedly did so far). But this seems reasonably to have because in practice users at least use ERROR/WARN I suppose.

Copy link
Contributor

@pnowojski pnowojski Jul 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you checked how many error or warn messages are being currently produced? Having default warn/error would be great, as long as it doesn't skew the benchmark results. If it does, we would either need to set the level to OFF or invest a time into getting rid of those errors/warnings.

The test run from the CI suggests that there were no messages printed at all? Could that be true? I've also tried throwing some exception in the submitted job code, but it looks like it's still not logging anything?

    @Benchmark
    public void asyncWait(FlinkEnvironmentContext context) throws Exception {

        StreamExecutionEnvironment env = context.env;
        env.enableCheckpointing(CHECKPOINT_INTERVAL_MS);
        env.setParallelism(1);

        DataStreamSource<Long> source = env.addSource(new LongSource(RECORDS_PER_INVOCATION));
        DataStream<Long> result = createAsyncOperator(source);
        result.addSink(
                new SinkFunction<Long>() {
                    @Override
                    public void invoke(Long value) throws Exception {
                        if (value == RECORDS_PER_INVOCATION / 2) {
                            throw new UnsupportedOperationException();
                        }
                    }
                });

        env.execute();
    }

I mean it's not logging anything more compared to what it's logging if I set level to OFF?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't check how it works during tests; let me do that now.

Can you set the log level to info and try again with AWOB?

Copy link
Contributor Author

@zentol zentol Jul 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you checked how many error or warn messages are being currently produced?

From what I can tell on warn/error nothing is shown; so far I only ever saw anything on info.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test log4j2-test.properties wasn't actually used because the production one has the same name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried logging an error message in the UDF and it works as expected. But it's also true that if an exception is thrown in the UDF then nothings gets logged, which is a bit surprising?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, with INFO level I can see somethings that are being logged. But indeed exception is not. I'm only getting this:

12:29:30,861 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Disconnect job manager a01911f835cbdd195148423ad40f4070@akka://flink/user/rpc/jobmanager_3 for job 0d151cff076a2fc1cb273c9d9d61236d from the resource manager.
<failure>

org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
	at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
	at org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:137)
	(...)
Caused by: java.lang.UnsupportedOperationException
	at org.apache.flink.benchmark.AsyncWaitOperatorBenchmark$1.invoke(AsyncWaitOperatorBenchmark.java:89)
	at org.apache.flink.benchmark.AsyncWaitOperatorBenchmark$1.invoke(AsyncWaitOperatorBenchmark.java:85)
	at org.apache.flink.streaming.api.functions.sink.SinkFunction.invoke(SinkFunction.java:50)
	at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:54)
	at org.apache.flink.streaming.runtime.tasks.ChainingOutput.pushToOperator(ChainingOutput.java:99)
	(...)

And if that happens benchmark doesn't shut down:

12:29:30,952 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService             [] - Stopped Akka RPC service.
<JMH had finished, but forked VM did not exit, are there stray running threads? Waiting 24 seconds more...>

Non-finished threads:

Thread[pool-2-thread-6,5,main]
  at sun.misc.Unsafe.park(Native Method)
  (...)

But that looks like a pre-existing problem unrelated to this PR.

@zentol zentol requested a review from pnowojski July 15, 2021 08:08
Copy link
Contributor

@pnowojski pnowojski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for trying to fix that!

@@ -16,8 +16,13 @@
# limitations under the License.
################################################################################

log4j.rootLogger=WARN, console
rootLogger.level = ERROR
Copy link
Contributor

@pnowojski pnowojski Jul 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you checked how many error or warn messages are being currently produced? Having default warn/error would be great, as long as it doesn't skew the benchmark results. If it does, we would either need to set the level to OFF or invest a time into getting rid of those errors/warnings.

The test run from the CI suggests that there were no messages printed at all? Could that be true? I've also tried throwing some exception in the submitted job code, but it looks like it's still not logging anything?

    @Benchmark
    public void asyncWait(FlinkEnvironmentContext context) throws Exception {

        StreamExecutionEnvironment env = context.env;
        env.enableCheckpointing(CHECKPOINT_INTERVAL_MS);
        env.setParallelism(1);

        DataStreamSource<Long> source = env.addSource(new LongSource(RECORDS_PER_INVOCATION));
        DataStream<Long> result = createAsyncOperator(source);
        result.addSink(
                new SinkFunction<Long>() {
                    @Override
                    public void invoke(Long value) throws Exception {
                        if (value == RECORDS_PER_INVOCATION / 2) {
                            throw new UnsupportedOperationException();
                        }
                    }
                });

        env.execute();
    }

I mean it's not logging anything more compared to what it's logging if I set level to OFF?

since the log4j2 properties files have the same name in production/tests only the production one was actually used in all cases.
@zentol
Copy link
Contributor Author

zentol commented Jul 15, 2021

Logging wasn't working during tests as expected because it was using the properties file from production, not the test one, because they share the same name. I have removed the test version of the file.

Copy link
Contributor

@pnowojski pnowojski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've run a benchmark-request and everything seems fine here

@zentol zentol merged commit 8e9bec8 into master Jul 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants