
Commit c6696cd

gengliangwang authored and dongjoon-hyun committed
[SPARK-47671][CORE] Enable structured logging in log4j2.properties.template and update docs
### What changes were proposed in this pull request?

- Rename the current log4j2.properties.template as log4j2.properties.pattern-layout-template
- Enable structured logging in log4j2.properties.template
- Update `configuration.md` on how to configure logging

### Why are the changes needed?

Provide a structured logging template and document how to configure logging in Spark 4.0.0.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manual test

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #46349 from gengliangwang/logTemplate.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
1 parent d9d79a5 commit c6696cd

File tree

3 files changed: +80 −14 lines changed
conf/log4j2.properties.pattern-layout-template

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Set everything to be logged to the console
+rootLogger.level = info
+rootLogger.appenderRef.stdout.ref = console
+
+# In the pattern layout configuration below, we specify an explicit `%ex` conversion
+# pattern for logging Throwables. If this was omitted, then (by default) Log4J would
+# implicitly add an `%xEx` conversion pattern which logs stacktraces with additional
+# class packaging information. That extra information can sometimes add a substantial
+# performance overhead, so we disable it in our default logging config.
+# For more information, see SPARK-39361.
+appender.console.type = Console
+appender.console.name = console
+appender.console.target = SYSTEM_ERR
+appender.console.layout.type = PatternLayout
+appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n%ex
+
+# Set the default spark-shell/spark-sql log level to WARN. When running the
+# spark-shell/spark-sql, the log level for these classes is used to overwrite
+# the root logger's log level, so that the user can have different defaults
+# for the shell and regular Spark apps.
+logger.repl.name = org.apache.spark.repl.Main
+logger.repl.level = warn
+
+logger.thriftserver.name = org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver
+logger.thriftserver.level = warn
+
+# Settings to quiet third party logs that are too verbose
+logger.jetty1.name = org.sparkproject.jetty
+logger.jetty1.level = warn
+logger.jetty2.name = org.sparkproject.jetty.util.component.AbstractLifeCycle
+logger.jetty2.level = error
+logger.replexprTyper.name = org.apache.spark.repl.SparkIMain$exprTyper
+logger.replexprTyper.level = info
+logger.replSparkILoopInterpreter.name = org.apache.spark.repl.SparkILoop$SparkILoopInterpreter
+logger.replSparkILoopInterpreter.level = info
+logger.parquet1.name = org.apache.parquet
+logger.parquet1.level = error
+logger.parquet2.name = parquet
+logger.parquet2.level = error
+
+# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
+logger.RetryingHMSHandler.name = org.apache.hadoop.hive.metastore.RetryingHMSHandler
+logger.RetryingHMSHandler.level = fatal
+logger.FunctionRegistry.name = org.apache.hadoop.hive.ql.exec.FunctionRegistry
+logger.FunctionRegistry.level = error
+
+# For deploying Spark ThriftServer
+# SPARK-34128: Suppress undesirable TTransportException warnings involved in THRIFT-4805
+appender.console.filter.1.type = RegexFilter
+appender.console.filter.1.regex = .*Thrift error occurred during processing of message.*
+appender.console.filter.1.onMatch = deny
+appender.console.filter.1.onMismatch = neutral
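For orientation, the `PatternLayout` pattern above renders one plain-text line per event: `%d{yy/MM/dd HH:mm:ss}` is the timestamp, `%p` the level, `%c{1}` the last component of the logger name, and `%m%n` the message plus a newline. A rough Python sketch of that formatting (the sample event values are made up, and the `%ex` throwable part is omitted):

```python
from datetime import datetime

# Hypothetical log event; in Log4j these fields come from the LogEvent.
event = {
    "when": datetime(2024, 5, 2, 10, 0, 0),
    "level": "INFO",
    "logger": "org.apache.spark.SparkContext",
    "message": "Running Spark version 4.0.0",
}

# Mimics: %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m
line = "{} {} {}: {}".format(
    event["when"].strftime("%y/%m/%d %H:%M:%S"),
    event["level"],
    event["logger"].rsplit(".", 1)[-1],  # %c{1}: last name component only
    event["message"],
)
print(line)  # → 24/05/02 10:00:00 INFO SparkContext: Running Spark version 4.0.0
```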

conf/log4j2.properties.template

Lines changed: 2 additions & 8 deletions
@@ -19,17 +19,11 @@
 rootLogger.level = info
 rootLogger.appenderRef.stdout.ref = console

-# In the pattern layout configuration below, we specify an explicit `%ex` conversion
-# pattern for logging Throwables. If this was omitted, then (by default) Log4J would
-# implicitly add an `%xEx` conversion pattern which logs stacktraces with additional
-# class packaging information. That extra information can sometimes add a substantial
-# performance overhead, so we disable it in our default logging config.
-# For more information, see SPARK-39361.
 appender.console.type = Console
 appender.console.name = console
 appender.console.target = SYSTEM_ERR
-appender.console.layout.type = PatternLayout
-appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n%ex
+appender.console.layout.type = JsonTemplateLayout
+appender.console.layout.eventTemplateUri = classpath:org/apache/spark/SparkLayout.json

 # Set the default spark-shell/spark-sql log level to WARN. When running the
 # spark-shell/spark-sql, the log level for these classes is used to overwrite
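With the `JsonTemplateLayout` configured above, each log event is emitted as one JSON object per line instead of a pattern-formatted string. A minimal sketch of consuming such a line with Python's standard library (the field names `ts`, `level`, and `msg` are illustrative assumptions, not necessarily the exact keys defined by `SparkLayout.json`):

```python
import json

# Hypothetical structured log line; the actual keys are defined by SparkLayout.json.
line = '{"ts": "24/05/02 10:00:00", "level": "INFO", "msg": "Registered executor"}'

event = json.loads(line)
print(event["level"], event["msg"])  # → INFO Registered executor
```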

docs/configuration.md

Lines changed: 9 additions & 6 deletions
@@ -3670,14 +3670,17 @@ Note: When running Spark on YARN in `cluster` mode, environment variables need t
 # Configuring Logging

 Spark uses [log4j](http://logging.apache.org/log4j/) for logging. You can configure it by adding a
-`log4j2.properties` file in the `conf` directory. One way to start is to copy the existing
-`log4j2.properties.template` located there.
+`log4j2.properties` file in the `conf` directory. One way to start is to copy one of the existing templates, `log4j2.properties.template` or `log4j2.properties.pattern-layout-template`, located there.

-By default, Spark adds 1 record to the MDC (Mapped Diagnostic Context): `mdc.taskName`, which shows something
-like `task 1.0 in stage 0.0`. You can add `%X{mdc.taskName}` to your patternLayout in
-order to print it in the logs.
+## Structured Logging
+Starting from version 4.0.0, Spark has adopted the [JSON Template Layout](https://logging.apache.org/log4j/2.x/manual/json-template-layout.html) for logging, which outputs logs in JSON format. This format facilitates querying logs with Spark SQL using the JSON data source. Additionally, the logs include all Mapped Diagnostic Context (MDC) information for search and debugging purposes.
+
+To use structured logging, start with the `log4j2.properties.template` file.
+
+## Plain Text Logging
+If you prefer plain text logging, you can use the `log4j2.properties.pattern-layout-template` file as a starting point. This is the default configuration Spark used before the 4.0.0 release. It uses [PatternLayout](https://logging.apache.org/log4j/2.x/manual/layouts.html#PatternLayout) to write all logs in plain text. MDC information is not included by default; to print it in the logs, update the pattern layout in the file. For example, you can add `%X{mdc.taskName}` to print the task name.
 Moreover, you can use `spark.sparkContext.setLocalProperty(s"mdc.$name", "value")` to add user specific data into MDC.
-The key in MDC will be the string of "mdc.$name".
+The key in MDC will be the string of `mdc.$name`.

 # Overriding configuration directory
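The docs above note that JSON logs can be queried with the Spark SQL JSON data source and that MDC entries such as `mdc.taskName` are carried in each record. As a dependency-free sketch of that filtering idea over JSON-lines records (the field layout here is hypothetical, since the real schema comes from `SparkLayout.json`):

```python
import json

# Hypothetical JSON-lines log records carrying MDC data; treat the key
# layout as illustrative, since SparkLayout.json defines the real schema.
log_lines = [
    '{"level": "INFO", "msg": "Task started", "mdc": {"mdc.taskName": "task 1.0 in stage 0.0"}}',
    '{"level": "INFO", "msg": "Task started", "mdc": {"mdc.taskName": "task 2.0 in stage 0.0"}}',
]

# The Spark SQL equivalent would be roughly:
#   spark.read.json("path/to/logs").where(...)
hits = [json.loads(line) for line in log_lines
        if json.loads(line)["mdc"]["mdc.taskName"] == "task 1.0 in stage 0.0"]
print(len(hits))  # → 1
```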

0 commit comments