Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-29720][CORE] Add linux condition to make ProcfsMetricsGetter more complete #26365

Closed
wants to merge 2 commits into from

Conversation

ulysses-you
Copy link
Contributor

@ulysses-you ulysses-you commented Nov 2, 2019

What changes were proposed in this pull request?

Add Shell.LINUX condition . Related pr 22612 .

Why are the changes needed?

The proc is only support for linux. We can just mkdir /proc on mac os and spark cannot recognize what is . So it should add a linux condition .

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Exists UT.

@ulysses-you
Copy link
Contributor Author

cc @squito @rezasafi

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@ulysses-you
Copy link
Contributor Author

Also cc @maropu @dongjoon-hyun

@maropu
Copy link
Member

maropu commented Nov 2, 2019

away from a keyboard now, so will check tonight. cc: @viirya

@maropu
Copy link
Member

maropu commented Nov 2, 2019

This is not my area though, is this any harm without this pr? It seems these metrics are turned off when exception caught:

@ulysses-you
Copy link
Contributor Author

This catch just confirm echo $PPID is support for current os . But not sure whether current os support porc. E.g. mac os, echo $PPID can return the right pid.
Add Linux condition is just to avoid that other os has the same dir like /proc/$pid/stat but in different use case.

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -26,6 +26,8 @@ import scala.collection.mutable
import scala.collection.mutable.ArrayBuffer
import scala.util.Try

import org.apache.hadoop.util.Shell
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about checking os.name instead of using hadoop-commons?

@@ -62,13 +64,14 @@ private[spark] class ProcfsMetricsGetter(procfsDir: String = "/proc/") extends L
SparkEnv.get.conf.get(config.EVENT_LOG_STAGE_EXECUTOR_METRICS)
val shouldLogStageExecutorProcessTreeMetrics =
SparkEnv.get.conf.get(config.EVENT_LOG_PROCESS_TREE_METRICS)
procDirExists.get && shouldLogStageExecutorProcessTreeMetrics && shouldLogStageExecutorMetrics
procDirExists.get && shouldLogStageExecutorProcessTreeMetrics &&
shouldLogStageExecutorMetrics && Shell.LINUX
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move the os check in the first condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we make a /proc?

@@ -62,13 +64,14 @@ private[spark] class ProcfsMetricsGetter(procfsDir: String = "/proc/") extends L
SparkEnv.get.conf.get(config.EVENT_LOG_STAGE_EXECUTOR_METRICS)
val shouldLogStageExecutorProcessTreeMetrics =
SparkEnv.get.conf.get(config.EVENT_LOG_PROCESS_TREE_METRICS)
procDirExists.get && shouldLogStageExecutorProcessTreeMetrics && shouldLogStageExecutorMetrics
procDirExists.get && shouldLogStageExecutorProcessTreeMetrics &&
shouldLogStageExecutorMetrics && Shell.LINUX
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure to support only Linux? Other Unix-like systems which have /proc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In hadoop ProcfsMetrics, it checks this ProcfsBasedProcessTree and it already exists long time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like in isProcfsAvailable, we check /proc existence instead of checking Linux like hadoop. Isn't it enough? And for other Unit-like systems which have /proc? We do not need to follow ProcfsBasedProcessTree.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIC the procfs is the linux feature, other unix-like os may be try to compatible with it. Check the /proc dir is not a better idea than check linux os. One point is that, someone can change the /proc information which os not support procfs native and it will be a vulnerability.

Copy link
Member

@viirya viirya Nov 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If /proc is in other Unix-like systems, this change actually removes the support on these systems. Unless we are sure current code does not work for such systems, I think we should not simply remove their support.

@srowen
Copy link
Member

srowen commented Nov 4, 2019

Yeah, I don't see the point here. If you've tried to fake /proc then that's the problem. I'm slightly concerned this will make it stop working on an OS that is UNIX-like but without "Linux" in the name.

@rezasafi
Copy link
Contributor

rezasafi commented Nov 4, 2019

I agree with others here. prosfs based systems are not just linux systems ( I have a vague memory that this was discussed when the pr for prosfs metrics was opened) and also this feature isn't enabled by default. But I think we need to add documentation for the config saying that the feature is only working on prosfs based systems. We missed that doc at the time of adding the feature. I mean adding a ".doc" for the config in package.scala

@rezasafi
Copy link
Contributor

rezasafi commented Nov 4, 2019

Actually I am not sure whether we will report the correct metrics in other procfsbased systems since our computation is based on http://man7.org/linux/man-pages/man5/proc.5.html which is linux dependent. I haven't tested this feature on non-linux based systems. So I think it make sense to check whether the OS is actually linux. Thanks for the PR.

@ulysses-you
Copy link
Contributor Author

Thanks for review.
I agree your points otherwise. But If we not sure what will happen in other os, we should forbid the behavior. Isn't it ?

@rezasafi
Copy link
Contributor

rezasafi commented Nov 5, 2019

@ulysses-you In my last comment I said this PR make sense. Although I juts checked the Solaris. There is no stat file there so the prosfcs metrics will raise an exception and return all zero because of the check in this line. It won't give back wrong info to the user in case of solaris. The same may be correct for other OSes. Still no harm in having your change if others also agree.

@ulysses-you
Copy link
Contributor Author

@rezasafi Oh I see... pending the last review.

@srowen
Copy link
Member

srowen commented Nov 6, 2019

The problem isn't Solaris, but rather, Linux distros that for whatever reason don't begin with "Linux" in os.name. I think it's not worth possibly breaking this.

@squito
Copy link
Contributor

squito commented Nov 6, 2019

I agree with Sean, I'm not really sure what the value is of adding this, and it'll potentially prevent the feature from being used in cases it should be fine.

Suppose somebody does create /proc on macos, and then turns on these metrics without this change -- how bad is it? It seems like one time you'll try to grab the metrics and it'll fail. It will then log an exception and set isAvailable = false. Then it won't ever try to grab those metrics again, so its won't spam those exceptions endlessly. (if somebody has manually created a /proc that actually looks like the real /proc -- well, they'll get bogus metrics, but that's their problem.)

@ulysses-you
Copy link
Contributor Author

Thanks for review, I will close this.

@ulysses-you ulysses-you closed this Nov 7, 2019
@ghost
Copy link

ghost commented Sep 25, 2020

Is this why I keep getting the following warning on windows?

WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants