-
Notifications
You must be signed in to change notification settings - Fork 8.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-15691 Add PathCapabilities to FS and FC to complement StreamCapabilities #568
HADOOP-15691 Add PathCapabilities to FS and FC to complement StreamCapabilities #568
Conversation
8a2c9f4
to
2a1fd85
Compare
javac are about a deprecated method, cannot be fixed. Checkstyles can be
|
2a1fd85
to
3632f66
Compare
checkstyle
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the patch @steveloughran, it looks outstanding. I had some minor comments and a few questions.
General comments
- We probably have to overwrite the function
hasPathCapability
in the following subclasses:- AliyunOSS
- FTPFileSystem
- SwiftNativeFileSystem (from
FileSystem
's descendants) and maybe also - ChRootedFs (from
AbstractFileSystem
's descendants)
- I am not familiar with FsLinkResolution part, but seemed ok to me.
- We can mention in the javadoc of
StreamCapabilities
that it's the Stream counterpart ofPathCapabilities
to make them linking to each other. - It seems to me that all the filesystem contract xml's are duplicating this feature. How would you elaborate this? Should we get rid of the contract xml's and rewrite the tests using the
PathCapabilities
method? If this is the case, that could go in a follow-up a jira as well.
For e.g. inAbstractFSContractTestBase$skipIfUnsupported
we query for thefs.contract.append
config, which could also be returned by thePathCapabilities
.
Questions
- Could you provide some examples how a capability can be supported under a path but not under another at the same time? Also it would make sense to add this example into
filesystem.md
- Can the user have the assumption that if a path supports a feature, then it is also supported for all its children? This could also be added to the markdown file.
...-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonPathCapabilities.java
Outdated
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/PathCapabilities.java
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/PathCapabilities.java
Outdated
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/PathCapabilities.java
Show resolved
Hide resolved
...n-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/PathCapabilitiesSupport.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
Show resolved
Hide resolved
case CommonPathCapabilities.FS_READ_ONLY_CONNECTOR: | ||
return true; | ||
default: | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we use super.hasPathCapability(path, capability)
here?
It would delegate to FileSystem's call, so effectively no difference (including the supportAcls()
call there), but would be cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the way it makes clear that this is R/O, but am happy to change it.
...p-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java
Show resolved
Hide resolved
@@ -2017,6 +2027,36 @@ public void setTestProvider(KeyProvider kp) { | |||
testProvider = kp; | |||
} | |||
|
|||
/** | |||
* This filesystem's capabilities must be in sync with that of HDFS. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't we enforce that two function must be in sync? I consider this a bit dangerous.
I'm thinking of moving the switch part (which is basically a static function) to a helper class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. a bit convoluted as we need to indicate when to invoke the superclass.
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FilterFs.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 (non-binding) (pending on newest Jenkins run).
Thanks for taking care my suggestions.
It was a good idea to move this feature under the fs.capabilities configs, and I also found the DfsPathCapabilities
to be handy, so I gave an approval.
...on-project/hadoop-common/src/main/java/org/apache/hadoop/fs/http/AbstractHttpFileSystem.java
Show resolved
Hide resolved
...p-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java
Show resolved
Hide resolved
d200274
to
d42d778
Compare
rebased to trunk to show this is still a live PR; not done the testing on the stores though. |
💔 -1 overall
This message was automatically generated. |
all checksums are legit, will fix
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 LGTM. A couple of minor comments inline.
@@ -1371,4 +1373,16 @@ public boolean equals(Object other) { | |||
new CompletableFuture<>(), () -> open(path, bufferSize)); | |||
} | |||
|
|||
public boolean hasPathCapability(final Path path, | |||
final String capability) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Been talking about something like this for years, thanks for working on this. Taking a Path
instead of a scheme
seems right: it is flexible (general design, specific implementation applies here). Was curious about examples where different paths in the same FS would have different capabilities. I suppose S3A could have different buckets in different regions or with different configurations (e.g. permissions, or S3Guard enabled/disabled, etc.) Just noticed you mentioned ViewFS as well. Another good case for per-path capabilities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was curious about examples where different paths in the same FS would have different capabilities.
Files HDFS encryption zones behave differently; viewfs relays things, and any DFS whose mount points may have different semantics can do it. Oh, and WASB has an option for special path where leases need to be acquired before renames -HBase needs that
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/pathcapabilities.md
Outdated
Show resolved
Hide resolved
|
||
* Define and stabilize new cross-filesystem capability flags (preferred), | ||
and so formally add a new `fs.capability` value. | ||
* Use the schema of the filesystem to as a prefix for their own options, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/schema/scheme/? I often mix these up.
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/pathcapabilities.md
Show resolved
Hide resolved
|
||
Similarly, there is no checking that the caller has the permission to | ||
perform a specific operation: just because a feature is available on that | ||
path does not mean that the caller can execute the operation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making this distinction. We don't want perm checks in the capability query implementations unless necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not just that -it means that we can often avoid any round trip at all
…pabilities. This contains all the work to date, reapplied to trunk so that Yetus will take up the PR again. Change-Id: Icfab71d177c88c75fd651d7eea56d3d1e8618f61
* Cut the Delegation Token probe as it was too complicated to get right; that can be added later. * By having validatePathCapability(path, string) return the string of the locale-english-lower-case capability, it can be used directlly in the switch() statement, reduce complexity/duplication in all uses, and avoid uses forgetting to do the conversion. * Javadocs for CommonPathCapabilities remove backlinks to FileSystem APIs: they're generic to FS and FC after all. * ChRootedFileSystem calls the capability test on the full path. * ContractTestUtils adds assertLacksPathCapabilities() assertion. Change-Id: Ie5fdec9bb1e64d14880bbe8704ec0417d9073e2b Testing: none yet, waiting for Yetus. ADLs, Azure, S3A to follow
also move capabilties off schemas into fs.schema.capability. why? -consistent with config options. -may make it easier to pass these around (e.g in job confs) -allows for more generic use than just FS/FC Change-Id: Idef1162006aee664b16cefdb99e303fa2f3407dd
Change-Id: I30918995e3eee6b9a7c5da8d90c19f9c9aed2496
thanks for the vote; just rebasing and retesting after s/schema/r/scheme in the markdown |
f17bbdb
to
958c402
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Test failure is https://issues.apache.org/jira/browse/HADOOP-16605; going ahead with commit |
committed to trunk -thanks everyone! |
…for TestFramework Adding utilities and corresponding test for low and high level api Author: Sanil Jain <snjain@linkedin.com> Reviewers: Shanthoosh Venkataraman <spvenkat@usc.edu> Closes apache#568 from Sanil15/SAMZA-1759 and squashes the following commits: a4861089 [Sanil Jain] Reverting back travis increase for wait time 876a3a58 [Sanil Jain] Increase travis timeout 9e6482b1 [Sanil Jain] Fixing travis build, removing unused imports 526244e8 [Sanil Jain] Merge branch 'master' into SAMZA-1759 9f489acf [Sanil Jain] Moving tests that use MessageStreamAssert to same package name in test folder to use package private a93e5a14 [Sanil Jain] Marking collection transient to ensure newer api changes work 5e6d3ed1 [Sanil Jain] Making MessageStreamAssert package private a5a521cc [Sanil Jain] Splitting operator assertions outside StreamAssert to MessageStreamAssert, addressing review, renaming utils d1e64180 [Sanil Jain] Cleaning unused imports ff218ff7 [Sanil Jain] Removing contains method for operator level assertios for high level api c5768772 [Sanil Jain] Merge branch 'SAMZA-1759' of https://github.com/Sanil15/samza into SAMZA-1759 c69d1bbb [Sanil Jain] StreamAssert Utilities for Low level and High Level Api, Adding More Test for Low Level api for testing multiple partitions and in mulithreaded mode e3c8e2a5 [Sanil Jain] StreamAssert Utilities for Low level and High Level Api, Adding More Test for Low Level api for testing multiple partitions and in mulithreaded mode
This contains all the work to date, reapplied to trunk so that Yetus will take up the PR again.
Change-Id: Icfab71d177c88c75fd651d7eea56d3d1e8618f61