-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-18577. log/probes of HADOOP-18546 presence. #5205
HADOOP-18577. log/probes of HADOOP-18546 presence. #5205
Conversation
* ReadBufferManager logs at trace. * AbfsInputStream.toString() declares fix. * abfs path capabilities also returns true for the probe "HADOOP-18546". This allows for programmatic probes in tests. The cloudstore pathcapabilities command can probe for this; we should move this into the hadoop cli. Change-Id: I01805a41f8db973cc559b6d85657cb94b8266bba
how to use this // ********************************************************************************************
// probe a FS for the bug using path capabilities checks
// ********************************************************************************************
def isVulnerable(fspath: String): Boolean = {
val p = path(fspath)
if (!Set("abfs", "abfss").contains(p.toUri.getScheme)) {
println("not an azure filesystem")
return false;
}
val targetFS = bind(p);
if (!targetFS.hasPathCapability(p, "fs.capability.paths.acls")) {
println("The release predates the bugs addition to the codebase")
return false;
}
if (targetFS.hasPathCapability(p, "HADOOP-18546")) {
println("The release has the HADOOP-18546 fix")
return false;
}
println("The release is recent enough to contain the bug, and does not have the fix")
val fsconf = targetFS.getConf
if (fsconf.getBoolean("fs.azure.enable.readahead", false)) {
println("readahead disabled (cloudera releases and hadoop 3.3.5+)")
return false;
}
val depth = "fs.azure.readaheadqueue.depth"
val queue = fsconf.getInt(depth, -1)
if (queue == 0) {
println(s"${depth} set to 0: safe")
return false;
}
queue match {
case -1 =>
println(s"${depth} will be the default: unsafe")
true;
case 0 =>
println(s"${depth} set to zero: safe")
false
case _ =>
println(s"${depth} set to ${queue}: unsafe")
false
}
} |
* need to make the probe key lower case for the matching * add a test for it * use eventually() to wait for the asserts to become valid, to eliminate test race conditions Change-Id: I6520af5bb8c41d48db978efa3234c005f3cf2a1a
💔 -1 overall
This message was automatically generated. |
@steveloughran When I completed YARN-related PR, I also encountered related issues, and I submitted the repair pr for each moudle. We can refer to #5182 I added a comment before the code to solve related issues. |
@slfan1989 wow, big problem...going to need a lot of changes in the code, even if just needed the package-info.java files. |
Ignoring the javadocs, I believe this is ready. Please can I get reviews as I consider this a blocker for the 3.3.5 release -I need that api probe |
@snvijaya @mukund-thakur @mehakmeet can I get a review of this -i want this in so there is a programmatic check for the presence of the fix. I'm adding a "safeprefetch" command to cloudstore which will identify when an abfs release has the bug (everything with etag_aware), has the fix (the new probe) and if vulnerable review the options, printing out the correct settings in xml and spark conf. we need this probe for it to see when things are good |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, small nits
public static final int TIMEOUT_OFFSET = 5 * 60_000; | ||
|
||
/** | ||
* Interval between eventually preobes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: "probes"
@@ -1636,6 +1636,11 @@ public boolean hasPathCapability(final Path path, final String capability) | |||
new TracingContext(clientCorrelationId, fileSystemId, | |||
FSOperationType.HAS_PATH_CAPABILITY, tracingHeaderFormat, | |||
listener)); | |||
|
|||
// probe for presence of HADOOP-18546 fix. | |||
case "hadoop-18546": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming the probe on a Hadoop Jira makes it a little difficult to understand it from the code directly. Should we have a general name for the probe related to the prefetch inconsistent reads and have the Hadoop jira mentioned in the comments only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was trying to think of one but it is a "is this fix in". that way, if we need to add another we can ask for that as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but having a probe for the fix "fs.azure.readahead.safe" would let us move to a different fix while retaining the same probe...
if (streamStatistics != null) { | ||
sb.append("AbfsInputStream@(").append(this.hashCode()).append("){"); | ||
sb.append(streamStatistics.toString()); | ||
sb.append("}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The closing bracket of the log should be outside the statistics if block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1.
append(","), should also move inside if block ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, didn't understand at first, but yes, you are absolutely correct!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had just a minor comment around toString() of getIOStatistics.
+1.
if (streamStatistics != null) { | ||
sb.append("AbfsInputStream@(").append(this.hashCode()).append("){"); | ||
sb.append(streamStatistics.toString()); | ||
sb.append("}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1.
append(","), should also move inside if block ?
I agree with you that the javadoc issue is not blocking. |
Change-Id: I955ed8f5c4e0637a9f2c1eea625278ad4c9e6604
💔 -1 overall
This message was automatically generated. |
|
||
public class ITestReadBufferManager extends AbstractAbfsIntegrationTest { | ||
|
||
/** | ||
* Time before the JUnit test times out for eventually() clauses |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: causes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no. eventually takes a lambda expression, which is what i was referring to
if (streamStatistics != null) { | ||
sb.append("AbfsInputStream@(").append(this.hashCode()).append("){"); | ||
sb.append(streamStatistics.toString()); | ||
sb.append(", ").append(streamStatistics); | ||
sb.append("}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mehakmeet's comment is still vaild I guess.
Shouldn't sb.append("}"); L835 go outside of if block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, fixing
Use "fs.capability.etags.available" as the probe name. Had to create a new file for this as none of the existing ones were appropriate. org.apache.hadoop.fs.azurebfs.constants.InternalConstants Print this value in the toString() of abfs and AbfsInputStream. Tests for this, including a new one in ITestFileSystemInitialization which includes tests for other expected capabilities. Change-Id: I31cbb58ef23ce78b32572c7f86a8acf269040e55
972076f
to
06c0fed
Compare
|
timeout in two tests running with threads=8
|
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM +1
merged; created an explict new jira for this, HADOOP-18577, referring to original patch and overall issue in commit message for easy location |
Followup patch to HADOOP-18456 as part of HADOOP-18521, ABFS ReadBufferManager buffer sharing across concurrent HTTP requests Add probes of readahead fix aid in checking safety of hadoop ABFS client across different releases. * ReadBufferManager constructor logs the fact it is safe at TRACE * AbfsInputStream declares it is fixed in toString() by including fs.azure.capability.readahead.safe" in the result. The ABFS FileSystem hasPathCapability("fs.azure.capability.readahead.safe") probe returns true to indicate the client's readahead manager has been fixed to be safe when prefetching. All Hadoop releases for which probe this returns false and for which the probe "fs.capability.etags.available" returns true at risk of returning invalid data when reading ADLS Gen2/Azure storage data. Contributed by Steve Loughran.
Followup patch to HADOOP-18456 as part of HADOOP-18521, ABFS ReadBufferManager buffer sharing across concurrent HTTP requests Add probes of readahead fix aid in checking safety of hadoop ABFS client across different releases. * ReadBufferManager constructor logs the fact it is safe at TRACE * AbfsInputStream declares it is fixed in toString() by including fs.azure.capability.readahead.safe" in the result. The ABFS FileSystem hasPathCapability("fs.azure.capability.readahead.safe") probe returns true to indicate the client's readahead manager has been fixed to be safe when prefetching. All Hadoop releases for which probe this returns false and for which the probe "fs.capability.etags.available" returns true at risk of returning invalid data when reading ADLS Gen2/Azure storage data. Contributed by Steve Loughran.
Fixes a javadoc error which came with HADOOP-18577. ABFS: Add probes of readahead fix (#5205) Part of the HADOOP-18521 ABFS readahead fix; MUST be included. Contributed by Steve Loughran
Fixes a javadoc error which came with HADOOP-18577. ABFS: Add probes of readahead fix (#5205) Part of the HADOOP-18521 ABFS readahead fix; MUST be included. Contributed by Steve Loughran
Fixes a javadoc error which came with HADOOP-18577. ABFS: Add probes of readahead fix (#5205) Part of the HADOOP-18521 ABFS readahead fix; MUST be included. Contributed by Steve Loughran
Followup patch to HADOOP-18456 as part of HADOOP-18521, ABFS ReadBufferManager buffer sharing across concurrent HTTP requests Add probes of readahead fix aid in checking safety of hadoop ABFS client across different releases. * ReadBufferManager constructor logs the fact it is safe at TRACE * AbfsInputStream declares it is fixed in toString() by including fs.azure.capability.readahead.safe" in the result. The ABFS FileSystem hasPathCapability("fs.azure.capability.readahead.safe") probe returns true to indicate the client's readahead manager has been fixed to be safe when prefetching. All Hadoop releases for which probe this returns false and for which the probe "fs.capability.etags.available" returns true at risk of returning invalid data when reading ADLS Gen2/Azure storage data. Contributed by Steve Loughran.
Fixes a javadoc error which came with HADOOP-18577. ABFS: Add probes of readahead fix (apache#5205) Part of the HADOOP-18521 ABFS readahead fix; MUST be included. Contributed by Steve Loughran
Followup patch to HADOOP-18456 as part of HADOOP-18521, ABFS ReadBufferManager buffer sharing across concurrent HTTP requests Add probes of readahead fix aid in checking safety of hadoop ABFS client across different releases. * ReadBufferManager constructor logs the fact it is safe at TRACE * AbfsInputStream declares it is fixed in toString() by including fs.azure.capability.readahead.safe" in the result. The ABFS FileSystem hasPathCapability("fs.azure.capability.readahead.safe") probe returns true to indicate the client's readahead manager has been fixed to be safe when prefetching. All Hadoop releases for which probe this returns false and for which the probe "fs.capability.etags.available" returns true at risk of returning invalid data when reading ADLS Gen2/Azure storage data. Contributed by Steve Loughran.
Followup patch to HADOOP-18456 as part of HADOOP-18521, ABFS ReadBufferManager buffer sharing across concurrent HTTP requests Add probes of readahead fix aid in checking safety of hadoop ABFS client across different releases. * ReadBufferManager constructor logs the fact it is safe at TRACE * AbfsInputStream declares it is fixed in toString() by including fs.azure.capability.readahead.safe" in the result. The ABFS FileSystem hasPathCapability("fs.azure.capability.readahead.safe") probe returns true to indicate the client's readahead manager has been fixed to be safe when prefetching. All Hadoop releases for which probe this returns false and for which the probe "fs.capability.etags.available" returns true at risk of returning invalid data when reading ADLS Gen2/Azure storage data. Contributed by Steve Loughran.
Followup patch to HADOOP-18456 as part of HADOOP-18521, ABFS ReadBufferManager buffer sharing across concurrent HTTP requests Add probes of readahead fix aid in checking safety of hadoop ABFS client across different releases. * ReadBufferManager constructor logs the fact it is safe at TRACE * AbfsInputStream declares it is fixed in toString() by including fs.azure.capability.readahead.safe" in the result. The ABFS FileSystem hasPathCapability("fs.azure.capability.readahead.safe") probe returns true to indicate the client's readahead manager has been fixed to be safe when prefetching. All Hadoop releases for which probe this returns false and for which the probe "fs.capability.etags.available" returns true at risk of returning invalid data when reading ADLS Gen2/Azure storage data. Contributed by Steve Loughran.
Followup patch to HADOOP-18456 as part of HADOOP-18521, ABFS ReadBufferManager buffer sharing across concurrent HTTP requests Add probes of readahead fix aid in checking safety of hadoop ABFS client across different releases. * ReadBufferManager constructor logs the fact it is safe at TRACE * AbfsInputStream declares it is fixed in toString() by including fs.azure.capability.readahead.safe" in the result. The ABFS FileSystem hasPathCapability("fs.azure.capability.readahead.safe") probe returns true to indicate the client's readahead manager has been fixed to be safe when prefetching. All Hadoop releases for which probe this returns false and for which the probe "fs.capability.etags.available" returns true at risk of returning invalid data when reading ADLS Gen2/Azure storage data. Contributed by Steve Loughran.
Followup patch to HADOOP-18456 as part of HADOOP-18521, ABFS ReadBufferManager buffer sharing across concurrent HTTP requests Add probes of readahead fix aid in checking safety of hadoop ABFS client across different releases. * ReadBufferManager constructor logs the fact it is safe at TRACE * AbfsInputStream declares it is fixed in toString() by including fs.azure.capability.readahead.safe" in the result. The ABFS FileSystem hasPathCapability("fs.azure.capability.readahead.safe") probe returns true to indicate the client's readahead manager has been fixed to be safe when prefetching. All Hadoop releases for which probe this returns false and for which the probe "fs.capability.etags.available" returns true at risk of returning invalid data when reading ADLS Gen2/Azure storage data. Contributed by Steve Loughran.
Fixes a javadoc error which came with HADOOP-18577. ABFS: Add probes of readahead fix (apache#5205) Part of the HADOOP-18521 ABFS readahead fix; MUST be included. Contributed by Steve Loughran
Fixes a javadoc error which came with HADOOP-18577. ABFS: Add probes of readahead fix (apache#5205) Part of the HADOOP-18521 ABFS readahead fix; MUST be included. Contributed by Steve Loughran
Fixes a javadoc error which came with HADOOP-18577. ABFS: Add probes of readahead fix (apache#5205) Part of the HADOOP-18521 ABFS readahead fix; MUST be included. Contributed by Steve Loughran
One more followup JIRA based on the experience of trying to work out if a patched
JAR was being picked up through spark builds.
The cloudstore pathcapabilities command can probe for this; we should move this into the hadoop cli.
How was this patch tested?
Currently got spark shell doing repeated validation of 10MB rows; will leave it running
all night.
Not yet run the cloud suites as they stamp on the same container.
I should fix that, really.
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?