New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-27833: Hive Acid Replication Support for Dell Powerscale #4841
Conversation
catch (AccessControlException e) { | ||
LOG.warn(e.getMessage()); | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't catching this wrong? Other systems like HDFS throws ACE for permission issues, & here we will be catching them as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Isilon, while calling fs.getXAttrs
it throws org.apache.hadoop.security.AccessControlException
with message Isilon only supports getXAttrs on /.reserved/raw paths
should I add additional check also in catch? please suggest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function just checks for xAttr support, if it has access issues then it will fail at a much earlier stage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it won't fail, it won't preserve xAttr and replication will go on with DistCp not preserving xAttr, this method was added to check if the FS supports xAttrs or not, if it does support, the xAttrs were always preserved.
The FileSystem description of getXAttr
* @throws UnsupportedOperationException if the operation is unsupported
* (default outcome).
*/
public byte[] getXAttr(Path path, String name) throws IOException {
So, Ideally every HCFS should abide by this, so ignoring ACE ain't an option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@harshal-16, if Isilon only supports getXAttrs on /.reserved/raw paths
then this method returns false is not correct. We need to check for fs.getXAttrs(new Path(Path.SEPARATOR))
at those supported locations only for Isolon then to check whether whether Xattrs is supported. At this stage do we have knowledge about the underlying file system?
We also need to test for extended attributes and relevant replication. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly playing with ACE & this logic doesn't look very safe to me. This method is used by other components apart from replication as well.
As far as I remember, the use of raw paths is governed by this conf (Can check DirCopyTask)
REPL_ADD_RAW_RESERVED_NAMESPACE("hive.repl.add.raw.reserved.namespace", false,
"For TDE with same encryption keys on source and target, allow Distcp super user to access \n"
+ "the raw bytes from filesystem without decrypting on source and then encrypting on target."),
So, maybe we should a coin a logic around this. Like the path to try can be root, if this conf is set we can use a reserved path & remove this ACE logic itself.
diff --git a/common/src/java/org/apache/hadoop/hive/common/FileUtils.java b/common/src/java/org/apache/hadoop/hive/common/FileUtils.java
index 18efe167a6..4f583e2595 100644
--- a/common/src/java/org/apache/hadoop/hive/common/FileUtils.java
+++ b/common/src/java/org/apache/hadoop/hive/common/FileUtils.java
@@ -75,6 +75,8 @@
import javax.security.auth.login.LoginException;
+import static org.apache.hadoop.hive.conf.HiveConf.ConfVars.REPL_ADD_RAW_RESERVED_NAMESPACE;
+
/**
* Collection of file manipulation utilities common across Hive.
*/
@@ -896,7 +898,8 @@ private static void checkDependencies(FileSystem srcFS, Path src, FileSystem dst
}
public static boolean shouldPreserveXAttrs(HiveConf conf, FileSystem srcFS, FileSystem dstFS) throws IOException {
- if (!Utils.checkFileSystemXAttrSupport(srcFS) || !Utils.checkFileSystemXAttrSupport(dstFS)){
+ Path path = conf.getBoolVar(REPL_ADD_RAW_RESERVED_NAMESPACE) ? new Path("/.reserved/raw/") : new Path("/");
+ if (!Utils.checkFileSystemXAttrSupport(srcFS, path) || !Utils.checkFileSystemXAttrSupport(dstFS, path)) {
return false;
}
for (Map.Entry<String,String> entry : conf.getPropsWithPrefix(Utils.DISTCP_OPTIONS_PREFIX).entrySet()) {
diff --git a/shims/common/src/main/java/org/apache/hadoop/hive/shims/Utils.java b/shims/common/src/main/java/org/apache/hadoop/hive/shims/Utils.java
index 339f0b5e9c..2513eee837 100644
--- a/shims/common/src/main/java/org/apache/hadoop/hive/shims/Utils.java
+++ b/shims/common/src/main/java/org/apache/hadoop/hive/shims/Utils.java
@@ -164,9 +164,9 @@ public static Filter getXSRFFilter() {
return Utils.constructXSRFFilter();
}
- public static boolean checkFileSystemXAttrSupport(FileSystem fs) throws IOException {
+ public static boolean checkFileSystemXAttrSupport(FileSystem fs, Path path) throws IOException {
try {
- fs.getXAttrs(new Path(Path.SEPARATOR));
+ fs.getXAttrs(path);
return true;
} catch (UnsupportedOperationException e) {
LOG.warn("XAttr won't be preserved since it is not supported for file system: " + fs.getUri());
This might hack your use case, if you are using reserved paths, but when not using reserved paths, it would throw you out with ACE, which your current code will also do.
For your current code->
For isilion, if not using reserve paths, you will set preserve Xattrs but during distcp it will throw you an ACE, because it doesn't support XAttrs in generic case, so your replication will anyway fail with your current code as well
--> Thoughts from gallery not from field, It has been long time since I checked the replication code, so I might have old info or wrong info, If other folks are convinced, please proceed no objections from my side
@@ -172,6 +173,11 @@ public static boolean checkFileSystemXAttrSupport(FileSystem fs) throws IOExcept | |||
LOG.warn("XAttr won't be preserved since it is not supported for file system: " + fs.getUri()); | |||
return false; | |||
} | |||
catch (AccessControlException e) { | |||
LOG.warn("Checking for XAttrSupport at {} Location", RAW_RESERVED_VIRTUAL_PATH); | |||
fs.getXAttrs(new Path(RAW_RESERVED_VIRTUAL_PATH)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you know you are gonna shoot a distcp on reserved path? Say you have Isilon system, but you aren't using the reserved path. In that case this will return true, but when you shoot an actual distcp, you will fail because you were using the normal paths
Details: * Isilon supports xAttr only on /.reserved/raw path. Because of that checkFileSystemXAttrSupport throws AccessControlException * In case of HDFS encryption zone, a virtual path /.reserved/raw gives superusers direct access to underlying block data in filesystem. This allows superusers to DistCp data without requiring access to encryption keys, and avoids the overhead of decrypting and re-encrypting data. Isilon system only supports getXAttrs on /.reserved/raw paths Code Change: * Added Additional property dfs.xattr.supported.only.on.reserved.namespace which will be true in case of isilon * Modified logic of checkFileSystemXAttrSupport to return false in case of isilon if path doesn't start with /.reserved/raw Testing: * Tested on Cluster
Kudos, SonarCloud Quality Gate passed! 0 Bugs No Coverage information The version of Java (11.0.8) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17. |
Looks good to me. Because the default value and behavior is kept same. Only if the configuration is enabled, it works. However, please fill the fields of the default PR template for reviewers. |
Filled the default PR template, please have a look @pudidic |
…#4841) (Harshal Patel, reviewed by Teddy Choi)
Details:
This allows superusers to DistCp data without requiring access to encryption keys, and avoids the overhead of decrypting and re-encrypting data.
Isilon system only supports getXAttrs on /.reserved/raw paths
What changes were proposed in this pull request?
Why are the changes needed?
Does this PR introduce any user-facing change?
Is the change a dependency upgrade?
How was this patch tested?