[SPARK-40747][CORE] Support setting driver log url using env vars on other resource managers #38205

Closed
wants to merge 1 commit into from

pan3793
Member

@pan3793 pan3793 commented Oct 11, 2022

What changes were proposed in this pull request?

This PR moves the getDriverLogUrls implementation from StandaloneSchedulerBackend up to its parent trait SchedulerBackend, so that driver log URLs can be set through env vars with the prefix SPARK_DRIVER_LOG_URL_ on other resource managers, especially K8s.

Why are the changes needed?

Executors already have this ability in K8s mode, so we want to bring the driver in line with them.

The related code in CoarseGrainedExecutorBackend:

  def extractLogUrls: Map[String, String] = {
    val prefix = "SPARK_LOG_URL_"
    sys.env.filterKeys(_.startsWith(prefix))
      .map(e => (e._1.substring(prefix.length).toLowerCase(Locale.ROOT), e._2)).toMap
  }
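By analogy, the driver-side lookup could be sketched roughly as below. This is a minimal sketch and not the code in this PR: the object name DriverLogUrls, the env parameter, and returning None when nothing matches are all assumptions made for illustration.

```scala
import java.util.Locale

// Hypothetical helper (not the PR's actual code): mirrors extractLogUrls,
// but uses the SPARK_DRIVER_LOG_URL_ prefix described above.
object DriverLogUrls {
  val Prefix = "SPARK_DRIVER_LOG_URL_"

  // `env` defaults to the process environment; it is a parameter only so the
  // function can be exercised without setting real environment variables.
  def extract(env: Map[String, String] = sys.env): Option[Map[String, String]] = {
    val urls = env.collect {
      case (key, url) if key.startsWith(Prefix) =>
        // Strip the prefix and lower-case the rest to get the log name.
        key.substring(Prefix.length).toLowerCase(Locale.ROOT) -> url
    }
    // Assumption: an empty result is reported as None rather than Some(Map()).
    if (urls.nonEmpty) Some(urls) else None
  }
}
```

With the two env vars from the manual test in this description, the resulting log names would be kibana and s3_archive.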

Does this PR introduce any user-facing change?

Yes, users can set the driver log URLs through env vars.

How was this patch tested?

Existing unit tests. If they are not sufficient and the approach is accepted, I will add more unit tests later.

Also tested manually in local mode:

export SPARK_DRIVER_LOG_URL_kibana=https://kibana.svc:8080
export SPARK_DRIVER_LOG_URL_s3_archive=http://log-archive:80
bin/spark-shell --master=local

@pan3793 pan3793 changed the title from "[SPARK-40747] Support setting driver log url using env vars other than standalone mode" to "[SPARK-40747] Support setting driver log url using env vars on other resource managers" Oct 11, 2022
@github-actions github-actions bot added the CORE label Oct 11, 2022
@pan3793
Member Author

pan3793 commented Oct 11, 2022

cc @HeartSaVioR, would you please take a look? BTW, it does not work on YARN, since YarnClusterSchedulerBackend/YarnCoarseGrainedExecutorBackend override the methods getDriverLogUrls/getExecutorLogUrls.

@pan3793 pan3793 changed the title from "[SPARK-40747] Support setting driver log url using env vars on other resource managers" to "[SPARK-40747][CORE] Support setting driver log url using env vars on other resource managers" Oct 11, 2022
@pan3793
Member Author

pan3793 commented Oct 11, 2022

To make this more flexible, I'm planning to add variable substitution support to the log URL, just as SPARK-26311 does (SHS only), and to let the cluster manager expose some attributes.

For example, if K8s exposes APP_ID and KUBERNETES_POD_NAME, users could easily integrate with an external log service:

spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --deploy-mode cluster \
  --conf 'spark.kubernetes.driverEnv.SPARK_DRIVER_LOG_URL_log=https://spark-log-svc:8080/?app_id={{APP_ID}}&pod_name={{KUBERNETES_POD_NAME}}' \
  --conf 'spark.executorEnv.SPARK_LOG_URL_log=https://spark-log-svc:8080/?app_id={{APP_ID}}&pod_name={{KUBERNETES_POD_NAME}}' \
  ...
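The substitution step itself could be sketched as follows. This is only an illustration under assumptions, not the implementation from SPARK-26311 or from any PR: the object name LogUrlTemplate, the allowed placeholder characters, and the choice to leave unknown placeholders untouched are all hypothetical.

```scala
import scala.util.matching.Regex

// Hypothetical helper (illustration only): replaces {{NAME}} placeholders
// in a log URL template with attributes exposed by the cluster manager.
object LogUrlTemplate {
  // Matches {{NAME}}, where NAME consists of letters, digits and underscores.
  private val Placeholder: Regex = """\{\{([A-Za-z0-9_]+)\}\}""".r

  def substitute(template: String, attributes: Map[String, String]): String =
    Placeholder.replaceAllIn(template, m =>
      // Assumption: a placeholder with no matching attribute is left as-is.
      // quoteReplacement keeps '$' and '\' in values from being interpreted.
      Regex.quoteReplacement(attributes.getOrElse(m.group(1), m.matched)))
}
```

For instance, with the made-up attributes APP_ID -> app-1 and KUBERNETES_POD_NAME -> driver-pod, the URL template above would resolve to https://spark-log-svc:8080/?app_id=app-1&pod_name=driver-pod.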

@AmplabJenkins

Can one of the admins verify this patch?

@pan3793
Member Author

pan3793 commented Oct 12, 2022

@dongjoon-hyun @holdenk what do you think about the plan? Is it the right direction for supporting an external log service on K8s?

@@ -73,7 +75,12 @@ private[spark] trait SchedulerBackend {
* Executors tab for the driver.
* @return Map containing the log names and their respective URLs
*/
-  def getDriverLogUrls: Option[Map[String, String]] = None
+  def getDriverLogUrls: Option[Map[String, String]] = {
Member

Shall we explicitly name the target resource managers instead of saying "Support setting driver log url using env vars on other resource managers", given that YarnClusterSchedulerBackend will not use this implementation?

override def getDriverLogUrls: Option[Map[String, String]] = {
  YarnContainerInfoHelper.getLogUrls(sc.hadoopConfiguration, container = None)
}

Member Author

@pan3793 pan3793 Oct 12, 2022

That makes sense if we want to keep YARN as-is. Another direction is to let YARN support it too, so that it works on all resource managers.

The pseudo-code would look like:

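override def getDriverLogUrls: Option[Map[String, String]] = {
  // getLogUrls returns an Option, so unwrap both sides before merging.
  val yarnUrls = YarnContainerInfoHelper.getLogUrls(sc.hadoopConfiguration, container = None)
  val merged = yarnUrls.getOrElse(Map.empty[String, String]) ++
    super.getDriverLogUrls.getOrElse(Map.empty[String, String])
  if (merged.nonEmpty) Some(merged) else None
}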
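To make the merge behavior concrete, here is a self-contained sketch with stand-in classes. EnvBackend and YarnLikeBackend are hypothetical names, not Spark classes, and the URL maps are passed in directly instead of being read from YARN or the environment.

```scala
object MergeSketch {
  // Stand-in for a parent backend whose default comes from env-var URLs.
  class EnvBackend(envUrls: Map[String, String]) {
    def getDriverLogUrls: Option[Map[String, String]] =
      if (envUrls.nonEmpty) Some(envUrls) else None
  }

  // Stand-in for a YARN-like backend that keeps its own URLs and also
  // merges in whatever the parent returns.
  class YarnLikeBackend(
      yarnUrls: Option[Map[String, String]],
      envUrls: Map[String, String]) extends EnvBackend(envUrls) {
    override def getDriverLogUrls: Option[Map[String, String]] = {
      // With ++, entries from the right-hand map win when a key appears in both.
      val merged = yarnUrls.getOrElse(Map.empty[String, String]) ++
        super.getDriverLogUrls.getOrElse(Map.empty[String, String])
      if (merged.nonEmpty) Some(merged) else None
    }
  }
}
```

One consequence of this ordering is that an env-var URL would override a YARN-provided URL with the same log name. Whether that precedence is desirable is a design choice that would need to be settled.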

Member

Do you happen to know of any production YARN clusters that use external log services?

Member Author

It was mentioned in SPARK-26311.

Member

@dongjoon-hyun dongjoon-hyun left a comment

This PR seems to be aimed only at K8s, because YARN does not use this and Mesos is not used these days. May I ask why we don't have this in KubernetesClusterSchedulerBackend only?

@pan3793
Member Author

pan3793 commented Oct 12, 2022

> This PR seems to be aimed only at K8s, because YARN does not use this and Mesos is not used these days. May I ask why we don't have this in KubernetesClusterSchedulerBackend only?

I think it could be a generic mechanism for all cluster managers, but if that's overkill, it's fine to support this feature only in KubernetesClusterSchedulerBackend.

@pan3793
Member Author

pan3793 commented Oct 23, 2022

Hi @dongjoon-hyun, I implemented the proposed idea in #38357. Would you please take a look when you have time?

PS: design doc https://docs.google.com/document/d/1MfB39LD4B4Rp7MDRxZbMKMbdNSe6V6mBmMQ-gkCnM-0/edit?usp=sharing

Contributor

@holdenk holdenk left a comment

LGTM but needs consensus from @dongjoon-hyun

@mridulm
Contributor

mridulm commented Oct 24, 2022

I tagged @tgravescs on #38357, assuming that is the proposed version. Or is this the one we are looking at?

@pan3793
Member Author

pan3793 commented Oct 25, 2022

@mridulm Yes, SPARK-40887 (#38357) is the proposed version.

I opened this PR mostly to collect feedback, in case the community has another idea.

@tgravescs
Contributor

Please link the two PRs. I would prefer to see this one in draft if it isn't meant to be the real solution and is just looking for feedback. The JIRAs should ideally be linked as well.

@pan3793
Member Author

pan3793 commented Nov 23, 2022

Closing in favor of #38357.

@pan3793 pan3793 closed this Nov 23, 2022