-
Notifications
You must be signed in to change notification settings - Fork 9.2k
HADOOP-19729. [ABFS][Perf] Network Profiling for Tailing Requests and Killing Bad Connections Proactively #8043
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
| } | ||
|
|
||
| public boolean isTailLatencyRequestTimeoutEnabled() { | ||
| return isTailLatencyRequestTimeoutEnabled && isTailLatencyTrackerEnabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first check should be for isTailLatencyTrackerEnabled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense.
Taken
| return tailLatencyAnalysisWindowInMillis; | ||
| } | ||
|
|
||
| public int getTailLatencyPercentileComputationIntervalInMillis() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name should be shortened
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken
|
|
||
| public static final boolean DEFAULT_FS_AZURE_ENABLE_CREATE_BLOB_IDEMPOTENCY = true; | ||
|
|
||
| public static final boolean DEFAULT_FS_AZURE_ENABLE_TAIL_LATENCY_TRACKER = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we not want this feature to be enabled by default ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no value add currently to just enable profling as we are not consuming it anywhere.
| } catch (TimeoutException e) { | ||
| /* Deadline exceeded, abort the request. | ||
| * This will also kill the underlying socket exception in the HttpClient. | ||
| * Connection will be marker stale and won't be returned back to KAC for reuse. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: typo marked
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken
| /** | ||
| * Constant for Static Retry Policy Abbreviation. {@value} | ||
| */ | ||
| public static final String TAIL_LATENCY_TIMEOUT_RETRY_POLICY_ABBREVIATION = "T"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make it TL ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other Retry policy abbreviations are already single character. Keeping it likewise
| int significantFigures, | ||
| final AbfsRestOperationType operationType) { | ||
| if (windowSizeMillis <= 0) throw new IllegalArgumentException("windowSizeMillis > 0"); | ||
| if (numberOfSegments <= 0) throw new IllegalArgumentException("bucketDurationMillis > 0"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be numberOfSegments in exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken
| /** Get any percentile over the current sliding window. */ | ||
| public void computeLatency() { | ||
| if (getCurrentTotalCount() < minSampleSize) { | ||
| LOG.debug("[{}] Not enough data to report percentiles. Current total count: {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can return here itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken
| p50 = tmpForMerge.getValueAtPercentile(50); | ||
| p90 = tmpForMerge.getValueAtPercentile(90); | ||
| p99 = tmpForMerge.getValueAtPercentile(99); | ||
| deviation = (int) ((tailLatency - p50)/p50 * 100); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chances of division by zero error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch.
Taken
| activeSegmentRecorder.getIntervalHistogramInto(tmpForDelta); | ||
| currentSegmentAccumulation.add(tmpForDelta); | ||
|
|
||
| if (currentSegmentAccumulation.getTotalCount() <= 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is less than 0 possible for total count? It is incremented always right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is primarily for equal to 0
| // Next slot is now going to be eradicated. Remove its count from total. | ||
| currentTotalCount.set(currentTotalCount.get() - (completedSegments[currentIdx] == null ? 0 : completedSegments[currentIdx].getTotalCount())); | ||
| // Store an immutable snapshot (make sure we don't mutate the instance after storing) | ||
| completedSegments[currentIdx] = currentSegmentAccumulation; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how are we making sure that this is immutable after this point ? completedSegments[currentIdx] = currentSegmentAccumulation.copy(); ideally we should create a deep copy of the histogram data so that even if currentSegmentAccumulation is reused or reset for the next segment, the data in completedSegments remains unchanged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is happening by reference. The reference earlier held by currentSegmentAccumulation is now saved into completedSegments[currentIdx]. And a new reference is created and saved into currentSegmentAccumulation
| /** Ensure active bucket is aligned to current time; rotate if we've crossed a boundary. */ | ||
| public void rotateIfNeeded() { | ||
| LOG.debug("[{}] Triggering Histogram Rotation", operationType); | ||
| long now = System.currentTimeMillis(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the use of the variable now ? We can directly use System.currentTimeMillis(); in expectedStart as we are doing later in line 195
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken
| */ | ||
| private String failureReason; | ||
| private AbfsRetryPolicy retryPolicy; | ||
| private boolean shouldTailLatencyTimeout = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be renamed to enableTailLatencyTimeout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might be misleading.
This variable is not a flag for this feature. Even when feature is enabled, we might have this as false.
This is to indicate that all the retried due to TailLatencyTimeout are exhausted and even though the feature is still enabled, for the next retry we should not Timeout due to tail latency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added javadoc
| } | ||
|
|
||
| // Update Tail Latency Tracker only for successful requests. | ||
| if (tailLatencyTracker != null && statusCode < HttpURLConnection.HTTP_MULT_CHOICE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will get updated for -1 status code as well, should be checked between 200 to 300
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken
|
|
||
| @Test | ||
| public void testSlidingWindowHdrHistogram() throws Exception { | ||
| SlidingWindowHdrHistogram histogram = new SlidingWindowHdrHistogram( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add comment for which value represents what
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken
| // Verify that analysis window is full after full rotation. | ||
| assertThat(histogram.isAnalysisWindowFilled()).isTrue(); | ||
|
|
||
| // Verify that percentiles are not computed due to low deviation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should be computed ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken
|
|
||
| public static final boolean DEFAULT_FS_AZURE_ENABLE_TAIL_LATENCY_TRACKER = false; | ||
| public static final boolean DEFAULT_FS_AZURE_ENABLE_TAIL_LATENCY_REQUEST_TIMEOUT = false; | ||
| public static final int DEFAULT_FS_AZURE_TAIL_LATENCY_PERCENTILE = 99; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't it be float/double instead of int? Tomorrow we can change default percentile to 99.9 or 99.99.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice suggestion. Will take it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not finding an easy way to make this change. Will add a work item for this improvement and take it up in follow up items.
| /** | ||
| * Constructs a TailLatencyRequestTimeoutException with TimeoutException as the cause. | ||
| */ | ||
| public TailLatencyRequestTimeoutException(TimeoutException innerException) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@param missing in the java doc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
| super(ERR_TAIL_LATENCY_REQUEST_TIMEOUT, innerException); | ||
| } | ||
|
|
||
| public TailLatencyRequestTimeoutException() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Java doc missing for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
| * @return HTTP response. | ||
| * @throws IOException network error. | ||
| */ | ||
| public HttpResponse executeWithoutDeadline(HttpRequestBase httpRequest, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
executeWithoutDeadline and executeWithDeadline can be private methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken
| return ZERO; | ||
| } | ||
|
|
||
| boolean isTailLatencyTimeoutEnabled() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Java doc missing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
|
|
||
| private static final Logger LOG = LoggerFactory.getLogger( | ||
| AbfsTailLatencyTracker.class); | ||
| private static AbfsTailLatencyTracker singleton; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can rename this variable to something which is more relevant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken
| rotationInterval, rotationInterval, TimeUnit.MILLISECONDS); | ||
|
|
||
|
|
||
| ScheduledExecutorService tailLatencyComputationThread = Executors.newSingleThreadScheduledExecutor( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should close this thread pool and one below once the use is done or at least during filesystem close.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latency tracker are per account basis. They are shared across all filesystem in a single JVM. These are daemon threads and will be killed when JVM gets killed
| t.setDaemon(true); | ||
| return t; | ||
| }); | ||
| long rotationInterval = configuration.getTailLatencyAnalysisWindowInMillis() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the division could be by 0 if someone sets window granularity as 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken
| private int tailLatencyAnalysisWindowInMillis; | ||
|
|
||
| @IntegerConfigurationValidatorAnnotation(ConfigurationKey = FS_AZURE_TAIL_LATENCY_ANALYSIS_WINDOW_GRANULARITY, | ||
| DefaultValue = DEFAULT_FS_AZURE_TAIL_LATENCY_ANALYSIS_WINDOW_GRANULARITY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we have a min, max value for window size above and window granularity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added min value for granularity.
| LOCK.lock(); | ||
| try { | ||
| if (singleton == null) { | ||
| singleton = new AbfsTailLatencyTracker(abfsConfiguration); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could log the initialization with the granularity etc configs here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already in SlidingWindowHdrHistogram class
|
|
||
| // Update Tail Latency Tracker only for successful requests. | ||
| if (tailLatencyTracker != null && statusCode < HttpURLConnection.HTTP_MULT_CHOICE) { | ||
| tailLatencyTracker.updateLatency(operationType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can 2 threads call updateLatency() for the same operation type simultaneously and create histograms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch.
Added Lock while creation.
|
|
||
| /** | ||
| * Gets the tail latency for a specific operation type. | ||
| * @param operationType Only applicable for read and write operations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why only for read, write operations?
we are not making the operationType check inside the method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
|
💔 -1 overall
This message was automatically generated. |
| } | ||
|
|
||
| @Test | ||
| public void testTailLatencyTimeoutEnabled() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Java doc missing. Please add it to all newly added test cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken
| @VisibleForTesting | ||
| void updateBackoffMetrics(int retryCount, int statusCode) { | ||
| if (abfsBackoffMetrics != null) { | ||
| if (statusCode < HttpURLConnection.HTTP_OK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change can we reverted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken
| } | ||
| incrementCounter(AbfsStatistic.GET_RESPONSES, 1); | ||
| //Only increment bytesReceived counter when the status code is 2XX. | ||
| if (httpOperation.getStatusCode() >= HttpURLConnection.HTTP_OK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken
|
💔 -1 overall
This message was automatically generated. |
| @IntegerConfigurationValidatorAnnotation(ConfigurationKey = FS_AZURE_WRITE_CPU_MONITORING_INTERVAL_MILLIS, | ||
| MinValue = MIN_WRITE_CPU_MONITORING_INTERVAL_MILLIS, | ||
| MaxValue = MAX_WRITE_CPU_MONITORING_INTERVAL_MILLIS, | ||
| DefaultValue = DEFAULT_WRITE_CPU_MONITORING_INTERVAL_MILLIS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
javadocs for warnings can be added
| configuration.getTailLatencyMinSampleSize(), | ||
| configuration.getTailLatencyPercentile(), | ||
| configuration.getTailLatencyMinDeviation(), | ||
| talLatencyAnalysisWindowInMillis, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: spelling of tail
This comment was marked as outdated.
This comment was marked as outdated.
|
💔 -1 overall
This message was automatically generated. |
|
:::: AGGREGATED TEST RESULT :::: ============================================================
|
Description of PR
Jira: https://issues.apache.org/jira/browse/HADOOP-19729
It has been observed that certain requests taking more time than expected to complete hinders the performance of whole workload. Such requests are known as tailing requests. They can be taking more time due to a number of reasons and the prominent among them is a bad network connection. In Abfs driver we cache network connections and keeping such bad connections in cache and reusing them can be bad for perf.
In this effort we try to identify such connections and close them so that new good connetions can be established and perf can be improved. There are two parts of this effort.
Identifying Tailing Requests: This involves profiling all the network calls and getting percentiles value optimally. By default we consider p99 as the tail latency and all the future requests taking more than tail latency will be considere as Tailing requests.
Proactively Killing Socket Connections: With Apache client, we can now kill the socket connection and fail the tailing request. Such failures will not be thrown back to user and retried immediately without any sleep but from another socket connection.
How was this patch tested?
New tests around both profiling and connection killing added.