Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-11929] [core] Make the repl log4j configuration override the root logger. #9816

Closed
wants to merge 2 commits into from

Conversation

vanzin
Copy link
Contributor

@vanzin vanzin commented Nov 18, 2015

In the default Spark distribution, there are currently two separate
log4j config files, with different default values for the root logger,
so that when running the shell you have a different default log level.
This makes the shell more usable, since the logs don't overwhelm the
output.

But if you install a custom log4j.properties, you lose that, because
then it's going to be used no matter whether you're running a regular
app or the shell.

With this change, the overriding of the log level is done differently;
the log level repl's main class (org.apache.spark.repl.Main) is used
to define the root logger's level when running the shell, defaulting
to WARN if it's not set explicitly.

On a somewhat related change, the shell output about the "sc" variable
was changed a bit to contain a little more useful information about
the application, since when the root logger's log level is WARN, that
information is never shown to the user.

In the default Spark distribution, there are currently two separate
log4j config files, with different default values for the root logger,
so that when running the shell you have a different default log level.
This makes the shell more usable, since the logs don't overwhelm the
output.

But if you install a custom log4j.properties, you lose that, because
then it's going to be used no matter whether you're running a regular
app or the shell.

With this change, the overriding of the log level is done differently;
the log level repl's main class (org.apache.spark.repl.Main) is used
to define the root logger's level when running the shell, defaulting
to WARN if it's not set explicitly.

On a somewhat related change, the shell output about the "sc" variable
was changed a bit to contain a little more useful information about
the application, since when the root logger's log level is WARN, that
information is never shown to the user.
@vanzin
Copy link
Contributor Author

vanzin commented Nov 18, 2015

I wanted to run this idea by a few people before opening a bug. Not sure who generally looks at repl changes... @andrewor14 ? @davies ? @JoshRosen ?

@vanzin
Copy link
Contributor Author

vanzin commented Nov 18, 2015

retest this please

@SparkQA
Copy link

SparkQA commented Nov 18, 2015

Test build #46239 has finished for PR 9816 at commit 1df1ccd.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor Author

vanzin commented Nov 20, 2015

ping? any comments on this? if I don't hear back I'll file a bug and go ahead with the patch.

@vanzin vanzin changed the title [RFC] [core] Make the repl log4j configuration override the root logger. [SPARK-11929] [core] Make the repl log4j configuration override the root logger. Nov 23, 2015
// overriding the root logger's config if they're different.
val rootLogger = LogManager.getRootLogger()
val replLogger = LogManager.getLogger("org.apache.spark.repl.Main")
val replLevel = if (replLogger.getLevel() != null) replLogger.getLevel() else Level.WARN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Option(replLogger.getLevel()).getOrElse(Level.WARN)

@squito
Copy link
Contributor

squito commented Nov 23, 2015

lgtm, just a tiny comment

@SparkQA
Copy link

SparkQA commented Nov 23, 2015

Test build #46553 has finished for PR 9816 at commit 7e00e39.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in e6dd237 Nov 24, 2015
@squito
Copy link
Contributor

squito commented Nov 24, 2015

thanks @vanzin, merged to master

@vanzin vanzin deleted the shell-logging branch November 30, 2016 23:01
vanzin pushed a commit that referenced this pull request Dec 13, 2019
…oot logger properly

### What changes were proposed in this pull request?

In the current implementation of `SparkShellLoggingFilter`, if the log level of the root logger and the log level of a message are different, whether a message should logged is decided based on log4j's configuration but whether the message should be output to the REPL's console is not cared.
So, if the log level of the root logger is `DEBUG`, the log level of REPL's logger is `WARN` and the log level of a message is `INFO`, the message will output to the REPL's console even though `INFO < WARN`.
https://github.com/apache/spark/pull/26798/files#diff-bfd5810d8aa78ad90150e806d830bb78L237

The ideal behavior should be like as follows and implemented them in this change.

1. If the log level of a message is greater than or equal to the log level of the root logger, the message should be logged but whether the message is output to the REPL's console should be decided based on whether the log level of the message is greater than or equal to the log level of the REPL's logger.

2. If a log level or custom appenders are explicitly defined for a category, whether a log message via the logger corresponding to the category is logged and output to the REPL's console should be decided baed on the log level of the category.
We can confirm whether a log level or appenders are explicitly set to a logger for a category by `Logger#getLevel` and `Logger#getAllAppenders.hasMoreElements`.

### Why are the changes needed?

This is a bug breaking a compatibility.

#9816 enabled REPL's log4j configuration to override root logger but #23675 seems to have broken the feature.
You can see one example when you modifies the default log4j configuration like as follows.
```
# Change the log level for rootCategory to DEBUG
log4j.rootCategory=DEBUG, console

...
# The log level for repl.Main remains WARN
log4j.logger.org.apache.spark.repl.Main=WARN
```
If you launch REPL with the configuration, INFO level logs appear even though the log level for REPL is WARN.
```
・・・

19/12/08 23:31:38 INFO Utils: Successfully started service 'sparkDriver' on port 33083.
19/12/08 23:31:38 INFO SparkEnv: Registering MapOutputTracker
19/12/08 23:31:38 INFO SparkEnv: Registering BlockManagerMaster
19/12/08 23:31:38 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/12/08 23:31:38 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/12/08 23:31:38 INFO SparkEnv: Registering BlockManagerMasterHeartbeat

・・・
```
Before #23675 was applied, those INFO level logs are not shown with the same log4j.properties.

### Does this PR introduce any user-facing change?

Yes. The logging behavior for REPL is fixed.

### How was this patch tested?

Manual test and newly added unit test.

Closes #26798 from sarutak/fix-spark-shell-loglevel.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants