New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-17272. ABFS Streams to support IOStatistics API #2353
Conversation
Change-Id: Ic0cdd0c3707066fff8b21f9cca6e7265a547267c
This is to allow hadoop-aws to compile without needing to move to the moved interfaces; it will also allow anything external which used those methods (unlikely) to keep working. Also: throw synchronized at methods to get findbugs to STFU. Tempting just to turn it off on the basis that it is overkill, but it could maybe be correct. Making all of MeanStatistics synchronized is irritating, as it will hurt performance on what should be a lightweight class. But it is needed to ensure samples and sum are consistently set. Change-Id: I4c3e2726e1e97d705c55dbb43a507ea4d0e81e28
Change-Id: I5f64704a82a196fd8ff66cbde10d4970722e1fd7
Change-Id: I3e9bb83bc32eddc5c7d84c1df7be77cea3e60f5d
- failures are registered and reported differently - helper in statistics.impl to wrap functions and callable with handling for this Move BufferedIOStatistics streams from impl to fs.statistics, so its declared OK for others to use counter increment to ignore negative values; returns existing value. successful increment returns the latest value, as documented. Tests for the counter and gauge behaviour Change-Id: Ic851240ebe94033bf93e0a11abefb7adda534191
AbfsOutputStream and AbfsInputStream to support IOStatistics(#2323). AbfsOutputStream is the first commit and AbfsInputStream integration with IOStatistics would be a different commit. |
on hasNext/Next, close() is called, so as to cleanup any open state/handles etc. Avoids having to tell developers downstream to look for an iterator being closeable, and call it if so. + minor improvements in iterator test suite + fix typo found by Mehakmeet Change-Id: Ibd103941278b8c0bc6c93a09ea469be4c60a58b1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to see how you use the API, good to get the feedback
both the input and output streams need to declare implements IOStatisticsSource
, and getIOStatistics to return their value.
...adoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAbfsOutputStreamStatistics.java
Outdated
Show resolved
Hide resolved
...ure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStreamStatisticsImpl.java
Outdated
Show resolved
Hide resolved
...ure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStreamStatisticsImpl.java
Outdated
Show resolved
Hide resolved
...ools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStream.java
Outdated
Show resolved
Hide resolved
Have to force push since am rebasing Steve's branch on my commits. Also, Don't know why Yetus isn't running. |
+ add class which can be used to store duration results, primarily for testing. Made public for others to play with too Change-Id: I0a7f274f5a2a180e1800102be177b308050725c0
yetus is sort of running; latest build reports differently. I think it's updating the Hadoop JIRA. If you look at the "all checks have failed" bit below, you get the logs and can click through to the test results instead. Nice integration and avoids PRs filling up with Yetus messages. But: checkstyle and deprecation messages will also trigger the failure notification, so the reporting is overreacting. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM;
- I'm going to tweak trackDuration to make it easier for you
- that's ~it for this patch
we do need to get that main IOStats one in. Add your comments there as someone who has been using it...
...tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java
Outdated
Show resolved
Hide resolved
...zure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStreamStatisticsImpl.java
Outdated
Show resolved
Hide resolved
@@ -160,7 +159,7 @@ public void testSeekStatistics() throws IOException { | |||
assertEquals("Mismatch in forwardSeekOps value", OPERATIONS, | |||
stats.getForwardSeekOperations()); | |||
assertEquals("Mismatch in bytesBackwardsOnSeek value", | |||
-1 * OPERATIONS * ONE_MB, stats.getBytesBackwardsOnSeek()); | |||
OPERATIONS * ONE_MB, stats.getNegativeBytesBackwardsOnSeek()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the name change? What does this statistic mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bytesBackwardsOnSeek had negative increments and thus had a negative value but, since we have a restriction that counters can't increment negative values, I thought to have a negative of bytesBackwardsOnSeek counter so that we can have a counter to represent this value just in positive. Looks confusing I guess, should've come up with better naming :)
* move the methods in fs.impl.FutureIOSupport which turn out to be needed when working with the async code to the public utils.functional package * duration tracking invocation in IOStatisticsBinding tuning based on the ABFS coding experience. Change-Id: Ide9467fa67a8a5d3d8daec94a7a39ff87f106a47
Fix checkstyle in RemoteIOException unwrapping of exception looks for Error and rethrows Change-Id: I673b1e487cae2ec3204ce6ba1103cdae14cfe48c
💔 -1 overall
This message was automatically generated. |
* the trackDuration(DurationTrackerFactory,...) methods support null factory and switch to a stub tracker * there's an overloaded logIOStatisticsAtDebug() method to use the IOStatisticsLogging own logger Change-Id: Ie786dc7aa8920a3fc8b22b55916f4b811810dab3
* PairedDurationTrackerFactory allows listing code to update both FS and instance level stats. * test assertions to become verifyStatistic* assertStatistic* to make clear these are statistic assertions; also helps code completion * stream_read_unbuffered stream key Change-Id: I25272fffeb3e66b5cec90ae6c6af74808c139b26
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All your additions LGTM; note the comment about grabbing the AtomicLong reference for frequently-updated counters; I will do that in the the S3A input stream too
"Total number of times a write operation is throttled."), | ||
|
||
//OutputStream statistics. | ||
BYTES_TO_UPLOAD("bytes_upload", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these should all be in hadoop-common where possible, so that we have consistent names everywhere for aggregation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I think apart from readAhead counter we can have all the others in common.
...tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java
Outdated
Show resolved
Hide resolved
@@ -73,6 +78,8 @@ | |||
private long bytesFromReadAhead; // bytes read from readAhead; for testing | |||
private long bytesFromRemoteRead; // bytes read remotely; for testing | |||
|
|||
private IOStatistics ioStatistics; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be null too, depending on the stream statistics instance(Which we can either be null or not, using the AbfsInputStreamContext class).
if (bytes > 0) { | ||
bytesRead += bytes; | ||
} | ||
ioStatisticsStore.incrementCounter(STREAM_READ_BYTES, bytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing to consider here is the cost of the map lookup on every IOP. You can ask the IOStatisticsStore for a reference to the atomic counter, and use that direct. I'm doing that for the output stream, reviewing this patch makes me realise I should be doing it for read as well. at least the read byte counters which are incremented on every read.
bytesUploaded = store.getCounterReference(
STREAM_WRITE_TOTAL_DATA.getSymbol());
bytesWritten = store.getCounterReference(
StreamStatisticNames.STREAM_WRITE_BYTES);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the tip. I've added the frequently updated counters as counter ref.
...zure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStreamStatisticsImpl.java
Show resolved
Hide resolved
.trackDurationOfCallable((IOStatisticsStore) ioStatistics, | ||
AbfsStatistic.TIME_SPENT_ON_PUT_REQUEST.getStatName(), | ||
() -> { | ||
AbfsPerfTracker tracker = client.getAbfsPerfTracker(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given we are wrapping callables with callables, maybe the Abfs Perf tracker could join in. Not needed for this patch, but later...
...ure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStreamStatisticsImpl.java
Outdated
Show resolved
Hide resolved
ad96f39
to
a57cbe8
Compare
💔 -1 overall
This message was automatically generated. |
Closing this one, and opening as #2604. |
Tested using: mvn -T 1C -Dparallel-tests=abfs clean verify
Region: East US