[SPARK-20435][CORE] More thorough redaction of sensitive information #17725
…from logs/UI, more unit tests
Test build #76051 has finished for PR 17725 at commit
because otherwise it gets redacted too
A few decisions that I made here:
Thanks in advance for reviewing.
with a sensitive value
Test build #76055 has started for PR 17725 at commit
Test build #76054 has finished for PR 17725 at commit
Jenkins, retest this please.
Test build #76114 has finished for PR 17725 at commit
// ...
// where jvmInformation, sparkProperties, etc. are sequence of tuples.
// We go through the various of properties and redact sensitive information from them.
val redactedProps = event.environmentDetails.map{
.map { case (name, props) =>
// This does mean we may be accounting more false positives - for example, if the value of an
// arbitrary property contained the term 'password', we may redact the value from the UI and
// logs. In order to work around it, user would have to make the spark.redaction.regex property
// more specific.
kvs.map { kv =>
Since you're looking at values now...
.map { case (key, value) =>
runSparkSubmit(args)
val listStatuses = fileSystem.listStatus(testDirPath)
val logData = EventLoggingListener.openEventLog(listStatuses.last.getPath, fileSystem)
Source.fromInputStream(logData).getLines().foreach {
.foreach { line =>
"--conf", "spark.hadoop.fs.defaultFS=unsupported://example.com",
unusedJar.toString)
runSparkSubmit(args)
val listStatuses = fileSystem.listStatus(testDirPath)
s/listStatuses/something else.
Use list, statuses, statusList, but "listStatuses" doesn't parse for me.
Test build #76143 has finished for PR 17725 at commit
Hmm, this is weird, it ran for 4h10 mins and got killed due to a timeout. Looking...
Jenkins, retest this please.
Kicking off another run here while I am running a run locally.
Test build #76156 has finished for PR 17725 at commit
This time it finished in the regular time (2h 29m) but failed a test. I ran that test locally (org.apache.spark.streaming.ReceiverSuite's 'receiver life cycle') and it passed for me, so, mostly out of laziness, I am going to issue another run here.
Jenkins, retest this please.
Test build #76194 has finished for PR 17725 at commit
Finally it's passing!
Merging to master / 2.2.
This change does a more thorough redaction of sensitive information from logs and UI. It also adds unit tests to ensure that no regressions leak sensitive information to the logs.

The motivation for this change was the appearance of a password in `SparkListenerEnvironmentUpdate` in event logs under some JVM configurations, like so:
`"sun.java.command":"org.apache.spark.deploy.SparkSubmit ... --conf spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password ..."`

Previously, the redaction logic only checked whether the key matched the secret regex pattern and, if so, redacted its value. That worked for most cases. However, in the above case the key (sun.java.command) says little about its contents, so the value needs to be searched as well. This PR expands the check to cover values too.

## How was this patch tested?

New unit tests were added to ensure that no sensitive information is present in the event logs or the YARN logs. An old unit test in UtilsSuite was modified because it asserted that a non-sensitive property's value would not be redacted. However, that non-sensitive value contained the literal "secret", which now causes it to be redacted. Updating the non-sensitive property's value to another arbitrary value (one without "secret" in it) fixed the test.

Author: Mark Grover <mark@apache.org>

Closes #17725 from markgrover/spark-20435.

(cherry picked from commit 66636ef)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
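For readers skimming the thread, the key-or-value check described in the commit message can be sketched roughly as follows. This is a standalone illustration, not the code merged in this PR: the object name `RedactionSketch`, the `Replacement` text, and the hard-coded `DefaultPattern` are assumptions made for the example (in Spark the pattern is taken from the `spark.redaction.regex` property).

```scala
import scala.util.matching.Regex

// Illustrative sketch only; names and constants here are hypothetical.
object RedactionSketch {
  // Placeholder shown instead of a sensitive value (assumed text).
  val Replacement = "<redacted>"

  // Assumed pattern for the example; configurable in Spark via spark.redaction.regex.
  val DefaultPattern: Regex = "(?i)secret|password".r

  // Redact a value when either its key or the value itself matches the pattern.
  def redact(pattern: Regex, kvs: Seq[(String, String)]): Seq[(String, String)] =
    kvs.map { case (key, value) =>
      val sensitive =
        pattern.findFirstIn(key).isDefined || pattern.findFirstIn(value).isDefined
      if (sensitive) (key, Replacement) else (key, value)
    }

  def main(args: Array[String]): Unit = {
    val props = Seq(
      // Key matches the pattern, so the value is redacted (old and new behavior).
      "spark.executorEnv.HADOOP_CREDSTORE_PASSWORD" -> "secret_password",
      // Key is innocuous, but the value embeds the secret (the case this PR targets).
      "sun.java.command" ->
        "org.apache.spark.deploy.SparkSubmit --conf spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password",
      // Neither key nor value matches, so this pair is left alone.
      "spark.app.name" -> "my-app")

    val redacted = redact(DefaultPattern, props)
    redacted.foreach(println)

    // Same spirit as the new unit tests: no rendered line may contain the secret.
    assert(!redacted.exists { case (_, v) => v.contains("secret_password") })
  }
}
```

Because values are searched too, a harmless property whose value happens to contain "password" or "secret" is redacted as well; that is the false-positive trade-off noted in the review, and the workaround is a more specific `spark.redaction.regex`.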
Thanks Marcelo!