-
Notifications
You must be signed in to change notification settings - Fork 9.2k
HADOOP-17864. ABFS: Make provision for adding additional connections type #3335
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
:::: AGGREGATED TEST RESULT :::: HNS-OAuth[INFO] Results: HNS-SharedKey[INFO] Results: NonHNS-SharedKey[INFO] Results: 3 tests that failed with timeout is set for more than 100MB buffer size choking the dev VM network when tests are run in parallel over 8 processes. When run individually they pass. Test failure observed in NonHNS-SharedKey: ITestAzureBlobFileSystemCheckAccess.testCheckAccessForAccountWithoutNS |
|
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to jackson 2 and com.fasterxml.jackson.core;
(ideally abfs should cut all use of jackson1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ABFS Driver dependency on com.fasterxml.jackson.core was replaced with org.codehaus.jackson in HADOOP-15659. @DadanielZ , Could you please help with the reason for this ?
...ls/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsHttpConnection.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- why this?
- if bytes are read and discarded: increment an IOStatistic on this because its a sign of waste
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldnt find the cause from the commit history, however I noticed many forums that suggest issues with data still being stream even though the stream was closed at client side. Hence believe this is to avoid any issues for that issues.
Have addressed the comment to add a new metric into IOStatistic to track bytes that are being discarded in HADOOP-17890 PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do JSON parse errors report as?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It returns either an IOException or JsonParseException (which is a child exception type of IOException).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check content type before trying to parse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have addressed these in a separate PR HADOOP-17890 PR that refactors the Http handling code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is really complex code with a lot of branches. Is there any way it could be cleaned up/refactored for easier reading and testing? For example, if the JSON parsing was independent & took an input stream, you could add a unit test which passed in invalid or incomplete JSON to see what happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have created a separate PR to address these comments to refactor the Http request handling HADOOP-17890 PR . The JSON parsing code is in a method parseListFilesResponse which takes input stream as a parameter to it.
|
🎊 +1 overall
This message was automatically generated. |
raymondlam12
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on creating an abstract class called AbfsOperation and creating AbfsHttpOperation / AbfsFastpathOperation subclasses?
The naming convention is a bit confusing to me right now with AbfsHttpOperation, AbfsHttpConnection & AbfsFastpathConnection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree that child class naming deviates from parent, but retained existing class name helped to avoid any pain in backporting these changes to repos maintained specifically by various distros. Renaming will lead to many files being changed. Will check with Steve on this and get back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are users expected to call this or is this just internal helper for the constructor?
If so, could this be a private method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, needs to be only private. Updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a cut + paste of the code in AbfsHttpOperation.java .
Is that right?
And if not, could you point me to the code blocks that have changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, all implementation code from the older AbfsHttpOperation class (now abstract) is moved to this class. There is no change.
There were few concerns raised to refactor existing Http handling code, have addressed them in different PR to keep them out of scope for this review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following on the previous comment of these properties, is this expectedAppendPos property used in all AbfsHttpOperation, namely Http and Fastpath connections?
If not, this should be bubbled up to the AbfsHttpConnection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, expect this to be a field to be updated when on Fastpath later on. Hence retained on the abstract class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should all these properties reside in the base class?
Are they used in the AbfsFastpathConnection?
If not, I think the property should be bubbled up into the AbfsHttpConnection class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, even when on Fastpath, driver will receive HttpStatus code and other relevant details for the corresponding request, this enables existing logic for retry/throttling handling to continue to work. Has to be on base class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to throw an exception here?
By default, when this field is unset, we would just return null.
Should we follow similar behaviour?
Do we expect this exception to ever be thrown?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Result here is the response from store backend and List calls will only be supported HttpConnection. Is more of a safe guard check, though it can never get into else case.
If it does for any reason, we will want the workload to fail immediately highlighting the reason explicitly for easy debugging.
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
what is the status of this? can we get it into branch-3.3 for the next release? |
Hi @steveloughran, We couldnt get to completion on workload runs and currently the decision was taken to pause this feature work. Can you please guide me on the right way to update the JIRA to deferred status for next release. Thanks |
|
We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
|
@snvijaya what's the status with this? want the PR re-opened? start from scratch? |
This commit aims to facilitate upcoming work as part of adding an alternate connection to store backend - HADOOP-17853
The scope of the change is to make AbfsHttpOperation an abstract class and create a child class AbfsHttpConnection. Future connection types will be added as child of AbfsHttpOperation. Retaining the abstract class name to reduce any backport pain.
ABFS driver tests were run with HNS and non-HNS storage accounts over combinations of authentication types - OAuth and SharedKey. Tests results will be updated in conversation tab with each PR iteration.