
[SPARK-3718] FsHistoryProvider should consider spark.eventLog.dir not only spark.history.fs.logDirectory #2573

Closed
sarutak wants to merge 1 commit into apache:master from sarutak:SPARK-3718


sarutak (Member) commented Sep 29, 2014

It's a minor improvement.

FsHistoryProvider reads event logs from the directory specified by spark.history.fs.logDirectory, but I think that directory is nearly always the same as the one specified by spark.eventLog.dir, so we should consider spark.eventLog.dir as well.
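The proposed lookup order can be sketched as follows. This is a hypothetical illustration, not the code from this PR: Spark's configuration is modeled as a plain Python dict, and `resolve_log_dir` is an invented helper name.

```python
from typing import Mapping, Optional

# Property names as discussed in this PR.
HISTORY_LOG_DIR = "spark.history.fs.logDirectory"
EVENT_LOG_DIR = "spark.eventLog.dir"


def resolve_log_dir(conf: Mapping[str, str]) -> Optional[str]:
    """Hypothetical helper: prefer the HistoryServer's own setting and fall
    back to the driver-side event log directory, as proposed here.
    An empty string is treated the same as an unset property."""
    return conf.get(HISTORY_LOG_DIR) or conf.get(EVENT_LOG_DIR)


if __name__ == "__main__":
    # Only the driver-side property is set, so the fallback is used.
    print(resolve_log_dir({EVENT_LOG_DIR: "hdfs:///spark-logs"}))
    # Both are set, so the HistoryServer's own property wins.
    print(resolve_log_dir({
        HISTORY_LOG_DIR: "hdfs:///history",
        EVENT_LOG_DIR: "hdfs:///spark-logs",
    }))
```

The fallback only helps when the HistoryServer process happens to see the same spark.eventLog.dir value the applications used, which is the assumption the replies below question.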

SparkQA commented Sep 29, 2014

QA tests have started for PR 2573 at commit 2de89b4.

  • This patch merges cleanly.

SparkQA commented Sep 29, 2014

QA tests have finished for PR 2573 at commit 2de89b4.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

WangTaoTheTonic (Contributor) commented:

It looks like spark.history.fs.logDirectory and spark.eventLog.dir are the same configuration item on different sides (the driver side and the HistoryServer side). I think keeping them distinct is better, because it keeps the HistoryServer independent.

sarutak (Member, Author) commented Sep 29, 2014

Basically, I think it's a good idea to separate the configuration between the driver side and the HistoryServer side, but if we use HDFS as the storage for event logs, in most cases spark.history.fs.logDirectory and spark.eventLog.dir are set to the same value. So I think it's good to use spark.eventLog.dir as a second candidate for the event log directory.

WangTaoTheTonic (Contributor) commented:

Actually, the HistoryServer can read application logs generated by Spark apps on other nodes, and spark.eventLog.dir could differ between those nodes and this one. So in my opinion it is more flexible to keep the two configs separate.
Also, spark.eventLog.dir takes effect only if spark.eventLog.enabled is true. If the HistoryServer loads data from spark.eventLog.dir, does it also need to check the value of spark.eventLog.enabled?
In short, the current solution is simple and loosely coupled.
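The extra check raised in the second question would look roughly like this. It is a hypothetical sketch, not Spark code: configuration is modeled as a dict, and `resolve_log_dir_checked` is an invented name.

```python
from typing import Mapping, Optional


def resolve_log_dir_checked(conf: Mapping[str, str]) -> Optional[str]:
    """Hypothetical variant: fall back to spark.eventLog.dir only when
    spark.eventLog.enabled is "true", since the directory is otherwise unused."""
    explicit = conf.get("spark.history.fs.logDirectory")
    if explicit:
        return explicit
    # The fallback only makes sense if event logging is actually switched on.
    enabled = conf.get("spark.eventLog.enabled", "false").strip().lower() == "true"
    if enabled:
        return conf.get("spark.eventLog.dir")
    return None


if __name__ == "__main__":
    # Directory set but event logging disabled: no fallback.
    print(resolve_log_dir_checked({"spark.eventLog.dir": "hdfs:///spark-logs"}))
    # Directory set and event logging enabled: fallback applies.
    print(resolve_log_dir_checked({
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": "hdfs:///spark-logs",
    }))
```

In this sketch the HistoryServer has to interpret a second driver-side flag just to decide where to read, which illustrates the coupling the comment argues against.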

andrewor14 (Contributor) commented:

Hey @sarutak, these two are distinct in that spark.eventLog.dir is application-specific, while spark.history.fs.logDirectory is not. You may have two history servers for instance, and some applications want to connect to the first and others the second. I don't really see a benefit in conflating these two in any way. Would you mind closing this?
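The two-server scenario can be modeled as below. This is a simplified, hypothetical sketch: the server names, application names, and HDFS paths are invented, and matching is plain string equality with no path normalization. Each server reads one directory via spark.history.fs.logDirectory, and each application picks a server only by choosing where its event logs go via spark.eventLog.dir.

```python
from typing import Dict, List

# Hypothetical deployment: two HistoryServers, each reading its own directory
# (its spark.history.fs.logDirectory value).
SERVERS: Dict[str, str] = {
    "history-a": "hdfs:///spark-logs/team-a",
    "history-b": "hdfs:///spark-logs/team-b",
}

# Hypothetical per-application settings: each app "chooses" a server simply
# by choosing which directory its event logs are written to.
APPS: Dict[str, Dict[str, str]] = {
    "etl-nightly": {"spark.eventLog.enabled": "true",
                    "spark.eventLog.dir": "hdfs:///spark-logs/team-a"},
    "ml-training": {"spark.eventLog.enabled": "true",
                    "spark.eventLog.dir": "hdfs:///spark-logs/team-b"},
    "adhoc-shell": {"spark.eventLog.enabled": "false"},
}


def apps_visible_to(server: str) -> List[str]:
    """Names of apps whose event logs land in the given server's directory."""
    log_dir = SERVERS[server]
    return sorted(
        name for name, conf in APPS.items()
        if conf.get("spark.eventLog.enabled") == "true"
        and conf.get("spark.eventLog.dir") == log_dir
    )


if __name__ == "__main__":
    for server in SERVERS:
        print(server, apps_visible_to(server))
```

In this model no single spark.eventLog.dir value on either server would describe both applications, since the value differs per application, while each server still needs exactly one directory of its own.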

sarutak closed this Oct 1, 2014
sarutak deleted the SPARK-3718 branch April 11, 2015 05:21
abraithwaite commented:

Hello!

I was reading the explanation for not merging these two, and I'm still not sure I understand the reasoning. I spent a bit too long trying to figure out how to configure the executors to log to the correct HDFS directory.

How exactly does a Spark application connect directly to a Spark history server? My understanding (correct me if I'm wrong) is that the application logs to a directory and the history server reads that directory. So even if you had two history servers, each would presumably have only one log directory configuration parameter, no?

At the very least, the docs on the monitoring page should be clarified. https://spark.apache.org/docs/latest/monitoring.html makes no mention of spark.eventLog.dir (although it does mention spark.eventLog.enabled). It seems intuitive that these would be the same property.

/cc @andrewor14


5 participants