HADOOP-16612 Track Azure Blob File System client-perceived latency #1611

Closed
wants to merge 13 commits into from

jeeteshm

@jeeteshm jeeteshm commented Oct 7, 2019

Add instrumentation code to measure the ADLS Gen 2 API performance
Add a feature switch to optionally enable this feature
Add unit tests for correctness and performance

@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| 0 | reexec | 40 | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 | dupname | 0 | No case conflicting files found. |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| +1 | test4tests | 0 | The patch appears to include 3 new or modified test files. |
| | | | _ trunk Compile Tests _ |
| +1 | mvninstall | 1077 | trunk passed |
| +1 | compile | 33 | trunk passed |
| +1 | checkstyle | 25 | trunk passed |
| +1 | mvnsite | 35 | trunk passed |
| +1 | shadedclient | 781 | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 28 | trunk passed |
| 0 | spotbugs | 52 | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 50 | trunk passed |
| -0 | patch | 77 | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | _ Patch Compile Tests _ |
| +1 | mvninstall | 30 | the patch passed |
| +1 | compile | 26 | the patch passed |
| +1 | javac | 26 | the patch passed |
| -0 | checkstyle | 18 | hadoop-tools/hadoop-azure: The patch generated 8 new + 5 unchanged - 0 fixed = 13 total (was 5) |
| +1 | mvnsite | 28 | the patch passed |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | xml | 2 | The patch has no ill-formed XML file. |
| +1 | shadedclient | 777 | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 24 | the patch passed |
| +1 | findbugs | 57 | the patch passed |
| | | | _ Other Tests _ |
| +1 | unit | 85 | hadoop-azure in the patch passed. |
| +1 | asflicense | 30 | The patch does not generate ASF License warnings. |
| | | 3240 | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | Client=19.03.3 Server=19.03.3 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1611/2/artifact/out/Dockerfile |
| GITHUB PR | #1611 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux aac8856df1c2 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / c561a70 |
| Default Java | 1.8.0_222 |
| checkstyle | https://builds.apache.org/job/hadoop-multibranch/job/PR-1611/2/artifact/out/diff-checkstyle-hadoop-tools_hadoop-azure.txt |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1611/2/testReport/ |
| Max. process+thread count | 438 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1611/2/console |
| versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |

This message was automatically generated.

@jeeteshm jeeteshm marked this pull request as ready for review October 14, 2019 02:17
@jeeteshm jeeteshm changed the title Hadoop 16612 Track Azure Blob File System client-perceived latency HADOOP-16612 Track Azure Blob File System client-perceived latency Oct 18, 2019
@steveloughran
Contributor

Production code

This is making code fairly verbose. I think you can do more here.

  • add some private methods to invoke the latency tracker
  • Maybe use try-with-resources to manage the lifetime of a specific operation, as we do with DurationInfo
  • it would be good to have some documentation, including a reference to the "ABFS API logging service"
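
A minimal sketch of the first two suggestions, assuming nothing about the actual patch: `OperationTimer`, `OperationTimerDemo`, and `timed` are hypothetical names, and the sink stands in for whatever the real latency tracker records. It follows the same pattern as DurationInfo, where an `AutoCloseable` captures the start time when constructed and reports the elapsed time in `close()`.

```java
import java.util.function.LongConsumer;
import java.util.function.Supplier;

/** Hypothetical helper: times one operation and reports elapsed nanoseconds on close(). */
final class OperationTimer implements AutoCloseable {
  private final long startNanos = System.nanoTime();
  private final LongConsumer sink;

  OperationTimer(LongConsumer sink) {
    this.sink = sink;
  }

  @Override
  public void close() {
    // Runs when the try block exits, whether normally or via an exception.
    sink.accept(System.nanoTime() - startNanos);
  }
}

final class OperationTimerDemo {
  /** Hypothetical private-style wrapper that keeps each call site to one statement. */
  static <T> T timed(LongConsumer sink, Supplier<T> operation) {
    try (OperationTimer ignored = new OperationTimer(sink)) {
      return operation.get();
    }
  }

  public static void main(String[] args) {
    // The lambda stands in for a real REST call; the sink just prints the latency.
    String result = timed(
        nanos -> System.out.println("operation took " + nanos + " ns"),
        () -> "ok");
    System.out.println(result);
  }
}
```

Because the timer is closed by the language rather than by hand-written `finally` blocks, each instrumented call needs only the wrapper, which is one way to cut the verbosity mentioned above.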

Without the documentation, features like this only get used by the few people who know about them. More insidiously, they only get maintained by people who care about them, which of course they don't unless they or somebody they know is using the feature. It is in your interests to make sure we all use it. And if we can use it to debug things, that is very, very useful.

Tests

I worry that this is going to be very very brittle towards latency, overloaded machines, network settings, etc. There's already an Azure test which is pretty unreliable for this reason (ITestAzureFileSystemInstrumentation). I don't want this to have the same problems. Have you run any tests against the long-haul store, or used parallel test runs to slow down the test?

@apache apache deleted a comment from hadoop-yetus Oct 25, 2019
Author

@jeeteshm jeeteshm left a comment

> Production code
>
> This is making code fairly verbose. I think you can do more here.
>
> * add some private methods to invoke the latency tracker
> * Maybe use try-with-resources to manage the lifetime of a specific operation, as we do with DurationInfo
> * it would be good to have some documentation, including a reference to the "ABFS API logging service"
>
> Without the documentation, features like this only get used by the few people who know about them. More insidiously, they only get maintained by people who care about them, which of course they don't unless they or somebody they know is using the feature. It is in your interests to make sure we all use it. And if we can use it to debug things, that is very, very useful.
>
> Tests
>
> I worry that this is going to be very very brittle towards latency, overloaded machines, network settings, etc. There's already an Azure test which is pretty unreliable for this reason (ITestAzureFileSystemInstrumentation). I don't want this to have the same problems. Have you run any tests against the long-haul store, or used parallel test runs to slow down the test?

I am using try-with-resources now -- thanks for suggesting this. I never found it when I was searching for the Java equivalent of C#'s using.

I have also added some details on the ABFS logs, particularly what the logs look like and how to enable and obtain them. Would you also like me to elaborate on how Azure's internal subsystems handle these logs?

The tests here have no network IO -- all they test is the correctness and performance of an isolated unit that uses a ConcurrentQueue at its core. Would you still consider them very very brittle? Passing these tests also ensures that the tracking code doesn't add much cost to the existing ABFS code.
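
To make the shape of such a test concrete, here is a rough sketch under stated assumptions; it is not the code in this PR. `LatencyTrackerSketch` and its methods are hypothetical, and since `java.util.concurrent` has no class named `ConcurrentQueue`, the sketch assumes `ConcurrentLinkedQueue` as the backing structure. It shows a feature switch guarding the queue and a check driven from several threads with no network IO.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical tracker: a feature switch in front of a lock-free queue of records. */
final class LatencyTrackerSketch {
  private final boolean enabled;
  private final ConcurrentLinkedQueue<String> records = new ConcurrentLinkedQueue<>();

  LatencyTrackerSketch(boolean enabled) {
    this.enabled = enabled;
  }

  /** Records one operation; does nothing when the feature switch is off. */
  void record(String operation, long startNanos) {
    if (!enabled) {
      return;
    }
    long micros = (System.nanoTime() - startNanos) / 1_000;
    records.offer(operation + "=" + micros + "us");
  }

  int recordCount() {
    return records.size();
  }

  /** Drives the tracker from several threads and returns how many records it holds. */
  static int recordConcurrently(LatencyTrackerSketch tracker, int threads, int perThread) {
    List<Thread> workers = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
      Thread worker = new Thread(() -> {
        for (int i = 0; i < perThread; i++) {
          tracker.record("op", System.nanoTime());
        }
      });
      workers.add(worker);
      worker.start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for workers", e);
      }
    }
    return tracker.recordCount();
  }

  public static void main(String[] args) {
    System.out.println(recordConcurrently(new LatencyTrackerSketch(true), 4, 1_000));
    System.out.println(recordConcurrently(new LatencyTrackerSketch(false), 4, 1_000));
  }
}
```

A check like this asserts on counts rather than on wall-clock timings, so it does not depend on machine load; any assertion on elapsed time would bring back the sensitivity raised in the review.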

@jeeteshm
Author

jeeteshm commented Nov 7, 2019

@steveloughran do you know how one can force @hadoop-yetus to run checkstyle scan/validation on these changes?

@jeeteshm jeeteshm changed the title HADOOP-16612 Track Azure Blob File System client-perceived latency Track Azure Blob File System client-perceived latency Nov 7, 2019
@jeeteshm jeeteshm changed the title Track Azure Blob File System client-perceived latency HADOOP-16612 Track Azure Blob File System client-perceived latency Nov 7, 2019
@jeeteshm
Author

jeeteshm commented Nov 8, 2019

> Looks good. Please fix the format issues and provide the test results.

@DadanielZ: I have fixed the format/checkstyle issues. Test results are mentioned in the JIRA here: https://issues.apache.org/jira/browse/HADOOP-16612

@jeeteshm jeeteshm requested a review from goiri November 11, 2019 22:02
Contributor

@DadanielZ DadanielZ left a comment

LGTM +1, I will wait for another +1 before committing it.

@steveloughran
Contributor

OK, it's good to merge; I'll let @DadanielZ do the work. One follow-up: should we add a policy doc on adding new REST calls? "Tracks latency" should now be a checklist item, shouldn't it? For example, #1711 needs it.

@steveloughran
Contributor

Da Zhou seems to be away right now, so I'll have a go at merging. However, before I do that: I have just run mvn javadoc:javadoc and it found a new error

[WARNING] ^
[WARNING] /Users/stevel/Hadoop/commit/apache-hadoop/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AbfsConfiguration.java:481: warning: no @return
[WARNING] public boolean shouldTrackLatency() {
[WARNING] ^

Can you fix that?
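
For reference, the warning goes away once the method's Javadoc has a `@return` tag. A minimal sketch follows; `LatencyConfigSketch` and its field are hypothetical stand-ins, and only the method name `shouldTrackLatency()` comes from the warning above.

```java
/** Hypothetical stand-in for a configuration class; only the Javadoc shape matters here. */
final class LatencyConfigSketch {
  private final boolean trackLatency;

  LatencyConfigSketch(boolean trackLatency) {
    this.trackLatency = trackLatency;
  }

  /**
   * Whether client-perceived latency tracking is enabled.
   *
   * @return {@code true} if latency should be tracked, {@code false} otherwise
   */
  public boolean shouldTrackLatency() {
    return trackLatency;
  }
}
```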

@bgaborg bgaborg self-requested a review November 14, 2019 17:13
@jeeteshm
Author

Thank you @steveloughran for your attention! I have fixed all the javadoc warnings arising out of this PR. The test results are mentioned in the JIRA here: https://issues.apache.org/jira/browse/HADOOP-16612
Patch URL: https://issues.apache.org/jira/secure/attachment/12985886/HADOOP-16612.004.patch

@DadanielZ
Contributor

Hi @bgaborg, I see you self-requested a review and it is still pending. If there are no other comments, I will commit this PR.

@bgaborg

bgaborg commented Nov 19, 2019

@DadanielZ sure, go ahead. I just wanted to add myself so that this shows up on my GitHub start page, since I'm learning this module's codebase.

@DadanielZ
Contributor

@bgaborg thanks for the confirmation.

@DadanielZ
Contributor

Committed

@DadanielZ DadanielZ closed this Nov 19, 2019
Labels
fs/azure changes related to azure; submitter must declare test endpoint
7 participants