Skip to content

[SPARK-48209][CORE] Common (java side): Migrate error/warn/info with variables to structured logging framework#46493

Closed
panbingkun wants to merge 17 commits intoapache:masterfrom
panbingkun:common_java_sl
Closed

[SPARK-48209][CORE] Common (java side): Migrate error/warn/info with variables to structured logging framework#46493
panbingkun wants to merge 17 commits intoapache:masterfrom
panbingkun:common_java_sl

Conversation

@panbingkun
Copy link
Contributor

@panbingkun panbingkun commented May 9, 2024

What changes were proposed in this pull request?

The pr aims to
1.migrate error/warn/info in module common with variables to structured logging framework for java side.
2.convert all dependencies on org.slf4j.Logger & org.slf4j.LoggerFactory to org.apache.spark.internal.Logger & org.apache.spark.internal.LoggerFactory, in order to completely prohibit importing org.slf4j.Logger & org.slf4j.LoggerFactory in java code later.

Why are the changes needed?

To enhance Apache Spark's logging system by implementing structured logging.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Pass GA.

Was this patch authored or co-authored using generative AI tooling?

No.

try {
manager.destroy();
} catch (InterruptedException ex) {
logger.info("Interrupted while destroying trust manager: " + ex.toString(), ex);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove redundant ex.toString().

listener.getTransferType(), blockIdsToTransfer.length,
numRetries > 0 ? "(after " + numRetries + " retries)" : ""), e);

if (numRetries > 0) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Print different log message according to the value of numRetries

if (time.isDefined) {
logWarning(
log"Non-Fatal error during RPC execution: ${MDC(ERROR, lastException)}, " +
log"retrying (wait=${MDC(WAIT_TIME, time.get.toMillis)} ms, " +
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unify WAIT_TIME into RETRY_WAIT_TIME

import org.apache.spark.{SecurityManager, SparkConf}
import org.apache.spark.ExecutorDeadException
import org.apache.spark.internal.config
import org.apache.spark.internal.{config, LogKeys, MDC}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because NettyBlockTransferService extends BlockTransferService
BlockTransferService is java code, and this change triggered it.


public class LoggerFactory {

public static Logger getLogger(String name) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YarnShuffleService will use it:
image

numRemovedBlocks++;
} else {
logger.warn("Failed to delete block: " + file.getAbsolutePath());
logger.warn("Failed to delete block: {}",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's appropriate to call this LogKeys.PATH$.MODULE$?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, path is fine

AppExecId id = parseDbAppExecKey(key);
logger.info("Reloading registered executors: " + id.toString());
logger.info("Reloading registered executors: {}",
MDC.of(LogKeys.APP_EXECUTOR_ID$.MODULE$, id.toString()));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove redundancy .toString()

this.reloadCount += 1;
} catch (Exception ex) {
logger.warn(
"Could not load truststore (keep using existing one) : " + ex.toString(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove redundancy ex.toString()

});
} catch (Exception e) {
logger.error("Error while invoking receiveMergeBlockMetaReq() for appId {} shuffleId {} "
+ "reduceId {}", req.appId, req.shuffleId, req.appId, e);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix typo
reduceId {req.appId} -> reduceId {req.reduceId}

} catch (Exception e) {
defaultLogger.error("Unable to create an instance of {}", mergeManagerImplClassName);
defaultLogger.error("Unable to create an instance of {}",
MDC.of(LogKeys.CLASS_NAME$.MODULE$, mergeManagerImplClassName));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or call MERGED_SHUFFLE_FILE_MANAGER ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLASS_NAME looks good.

logger.info("Updated active local dirs {} and sub dirs {} for application {}",
Arrays.toString(activeLocalDirs),subDirsPerLocalDir, appId);
MDC.of(LogKeys.LOCAL_DIRS$.MODULE$, Arrays.toString(activeLocalDirs)),
MDC.of(LogKeys.NUM_SUB_DIRS$.MODULE$, subDirsPerLocalDir),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The subDirsPerLocalDir is actually a number (the number of sub directories), not a sub directory path.

@panbingkun
Copy link
Contributor Author

cc @gengliangwang

@panbingkun panbingkun marked this pull request as ready for review May 10, 2024 03:04
logger.info("Application {} removed, cleanupLocalDirs = {}", appId, cleanupLocalDirs);
logger.info("Application {} removed, cleanupLocalDirs = {}",
MDC.of(LogKeys.APP_ID$.MODULE$, appId),
MDC.of(LogKeys.LOCAL_DIRS$.MODULE$, cleanupLocalDirs));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleanupLocalDirs is a boolean. let's just use CLEANUP_LOCAL_DIRS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@gengliangwang
Copy link
Member

LGTM overall. Thanks for the work!

@panbingkun
Copy link
Contributor Author

LGTM overall. Thanks for the work!

Updated, thanks! ❤️

case object LOADED_VERSION extends LogKey
case object LOAD_FACTOR extends LogKey
case object LOAD_TIME extends LogKey
case object LOCAL_DIRS extends LogKey
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need this anymore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm ..., the following uses it, do we should merge it to PATH?
image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, merge into PATHS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@gengliangwang
Copy link
Member

Thanks, merging to master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants