[SPARK-3718] FsHistoryProvider should consider spark.eventLog.dir not only spark.history.fs.logDirectory #2573
sarutak wants to merge 1 commit into apache:master from
Conversation
QA tests have started for PR 2573 at commit

QA tests have finished for PR 2573 at commit
Basically, I think it's a good idea to separate the configuration between the driver side and the HistoryServer side, but if we use HDFS as storage for event logs, in most cases spark.history.fs.logDirectory and spark.eventLog.dir are set to the same value. So I think it's good to choose spark.eventLog.dir as a second candidate for the event log directory.
Actually, the HistoryServer can read application logs generated by Spark apps on another node. The

Hey @sarutak, these two are distinct in that
Hello! I was reading the explanation for not merging these two and I'm still not quite sure I understand the reasoning. I spent a bit too long trying to figure out how to configure the executors to log to the correct HDFS directory. How exactly does a Spark application connect directly to a Spark history server? It's my understanding (correct me if I'm wrong) that the application logs to a directory and the history server reads that directory. So even if you had two history servers, they'd presumably both have only one log directory configuration parameter, no? Clearly, the docs should at least be cleared up on the monitoring page. https://spark.apache.org/docs/latest/monitoring.html has no mention of spark.eventLog.dir (although it does mention spark.eventLog.enabled). It seems intuitive that these would be the same property. /cc @andrewor14
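To make the setup described above concrete: in practice both properties are typically pointed at the same shared directory so that applications write where the history server reads. A minimal spark-defaults.conf sketch (the HDFS path is a placeholder, not from this thread):

```
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://namenode:8020/shared/spark-logs
spark.history.fs.logDirectory    hdfs://namenode:8020/shared/spark-logs
```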
It's a minor improvement.
FsHistoryProvider reads event logs from the directory given by spark.history.fs.logDirectory, but I think that directory is nearly always the same as the one given by spark.eventLog.dir, so we should consider spark.eventLog.dir too.
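The fallback the description proposes can be sketched as follows. This is not the actual FsHistoryProvider code, just a minimal illustration of the lookup order; the property names are real Spark settings, but `resolveLogDirectory` is a hypothetical helper:

```scala
// Hypothetical sketch of the proposed lookup order:
// prefer spark.history.fs.logDirectory, fall back to spark.eventLog.dir.
def resolveLogDirectory(conf: Map[String, String]): Option[String] =
  conf.get("spark.history.fs.logDirectory")
    .orElse(conf.get("spark.eventLog.dir"))

// Example: only the driver-side setting is present,
// so the history server would fall back to spark.eventLog.dir.
val conf = Map("spark.eventLog.dir" -> "hdfs://namenode:8020/spark-logs")
println(resolveLogDirectory(conf))
```

If neither property is set, the helper returns None, which is where the real provider would raise a configuration error.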