[YARN] SPARK-2668: Add variable of yarn log directory for reference from the log4j configuration #1573
QA tests have started for PR 1573. This patch merges cleanly.
QA results for PR 1573:
@renozhang sorry for the delay on this. Could you upmerge to the latest?
0b4028a to f70f581
@tgravescs I've updated to the latest, thanks for the review.
Can you rename this to spark.yarn.app.container.log.dir?
Can you also update the documentation in docs/running-on-yarn.md to include this config with a good description?
Sorry, I missed writing a description in the PR; I'll fill it in later. As mentioned in the description of JIRA SPARK-2668, this variable is for users who define a custom log4j.properties, e.g.: log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log
From: andrewor14 <notifications@github.com>
Where is this config being consumed? Am I missing something obvious?
Yup, thanks for answering my deleted question. I realized this after reading the JIRA.
@renozhang can you address my comments?
Sorry @tgravescs, I've been very busy these days. I'll address them this weekend.
f70f581 to c56aba6
QA tests have started for PR 1573 at commit
@tgravescs patch updated, thanks for your review.
QA tests have finished for PR 1573 at commit
docs/running-on-yarn.md
Outdated
Since this could be used in more than just streaming applications, perhaps we should reword this a little. Perhaps put the information about ${spark.yarn.app.container.log.dir} first and then give the example using RollingFileAppender with streaming.
Something more like the following (feel free to change the exact wording):
If you need a reference to the proper location to put log files in YARN so that YARN can properly display and aggregate them, use "${spark.yarn.app.container.log.dir}" in your log4j.properties. For example... (then explain the streaming example).
Thanks @renozhang, minor request about the documentation, otherwise looks good.
c56aba6 to 16c5cb8
QA tests have started for PR 1573 at commit
QA tests have finished for PR 1573 at commit
Test FAILed.
The test failure is unrelated to this PR.
+1, thanks @renozhang!
…1573) We were recently hit by an issue where a Hive upgrade did not work with Iceberg. Because Apple Spark is heavily used with Iceberg in production, any change to Spark risks affecting Iceberg functionality, yet we do not run any tests against Iceberg at the moment. To prevent similar issues on the Iceberg side, it would be nice to run the Iceberg unit tests in the Apple Spark Rio pipeline.
Assign the YARN container log directory to the Java option "spark.yarn.app.container.log.dir", so that a user-defined log4j.properties can reference this value and write logs to the YARN container's log directory.
Otherwise, a user-defined file appender will only write to the container's CWD, and log files in the CWD will not be displayed on the YARN UI, nor can they be aggregated to the HDFS log directory after the job finishes.
Example reference in a user-defined log4j.properties:
log4j.appender.rolling_file.File = ${spark.yarn.app.container.log.dir}/spark.log
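For context, the single line above could sit in a fuller log4j 1.x configuration along these lines. This is only a sketch: the appender name rolling_file comes from the example above, while the root logger level, file size limit, backup count, and pattern are illustrative assumptions, not values taken from this PR.

```properties
# Sketch of a user-defined log4j.properties (log4j 1.x syntax).
# Assumed values: INFO level, 50MB size limit, 5 backups, and the pattern below.
log4j.rootLogger=INFO, rolling_file

# Roll the log file by size so a long-running job does not fill the disk.
log4j.appender.rolling_file=org.apache.log4j.RollingFileAppender

# Write into the YARN container log directory so the file can be shown
# on the YARN UI and picked up by log aggregation.
log4j.appender.rolling_file.File=${spark.yarn.app.container.log.dir}/spark.log
log4j.appender.rolling_file.MaxFileSize=50MB
log4j.appender.rolling_file.MaxBackupIndex=5

log4j.appender.rolling_file.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling_file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

The property is resolved by log4j from the JVM system properties, which is why the patch passes the directory in through the container's Java options.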