Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-27833: Hive Acid Replication Support for Dell Powerscale #4841

Merged
merged 1 commit into from Nov 22, 2023

Conversation

harshal-16
Copy link
Contributor

@harshal-16 harshal-16 commented Oct 31, 2023

Details:

  • Isilon supports xAttr only on /.reserved/raw path. Because of that checkFileSystemXAttrSupport throws AccessControlException
  • In case of HDFS encryption zone, a virtual path /.reserved/raw gives superusers direct access to underlying block data in filesystem.
    This allows superusers to DistCp data without requiring access to encryption keys, and avoids the overhead of decrypting and re-encrypting data.
    Isilon system only supports getXAttrs on /.reserved/raw paths

What changes were proposed in this pull request?

  • Added Additional property dfs.xattr.supported.only.on.reserved.namespace which will be true in case of isilon
  • Modified logic of checkFileSystemXAttrSupport to return false in case of isilon if path doesn't start with /.reserved/raw

Why are the changes needed?

  • Isilon only supports xattr on /.reserved/raw path else it throws AccessControlException

Does this PR introduce any user-facing change?

  • From User perspective flow will be same

Is the change a dependency upgrade?

How was this patch tested?

  • Tested on Cluster
  • Added Test case

Comment on lines 176 to 180
catch (AccessControlException e) {
LOG.warn(e.getMessage());
return false;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't catching this wrong? Other systems like HDFS throws ACE for permission issues, & here we will be catching them as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Isilon, while calling fs.getXAttrs it throws org.apache.hadoop.security.AccessControlException with message Isilon only supports getXAttrs on /.reserved/raw paths

should I add additional check also in catch? please suggest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function just checks for xAttr support, if it has access issues then it will fail at a much earlier stage

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it won't fail, it won't preserve xAttr and replication will go on with DistCp not preserving xAttr, this method was added to check if the FS supports xAttrs or not, if it does support, the xAttrs were always preserved.

The FileSystem description of getXAttr

   * @throws UnsupportedOperationException if the operation is unsupported
   *         (default outcome).
   */
  public byte[] getXAttr(Path path, String name) throws IOException {

https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L3280-L3283

So, Ideally every HCFS should abide by this, so ignoring ACE ain't an option

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harshal-16, if Isilon only supports getXAttrs on /.reserved/raw paths then this method returns false is not correct. We need to check for fs.getXAttrs(new Path(Path.SEPARATOR)) at those supported locations only for Isolon then to check whether whether Xattrs is supported. At this stage do we have knowledge about the underlying file system?
We also need to test for extended attributes and relevant replication. Thanks.

Copy link
Member

@ayushtkn ayushtkn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly playing with ACE & this logic doesn't look very safe to me. This method is used by other components apart from replication as well.

As far as I remember, the use of raw paths is governed by this conf (Can check DirCopyTask)

    REPL_ADD_RAW_RESERVED_NAMESPACE("hive.repl.add.raw.reserved.namespace", false,
        "For TDE with same encryption keys on source and target, allow Distcp super user to access \n"
            + "the raw bytes from filesystem without decrypting on source and then encrypting on target."),

So, maybe we should a coin a logic around this. Like the path to try can be root, if this conf is set we can use a reserved path & remove this ACE logic itself.

diff --git a/common/src/java/org/apache/hadoop/hive/common/FileUtils.java b/common/src/java/org/apache/hadoop/hive/common/FileUtils.java
index 18efe167a6..4f583e2595 100644
--- a/common/src/java/org/apache/hadoop/hive/common/FileUtils.java
+++ b/common/src/java/org/apache/hadoop/hive/common/FileUtils.java
@@ -75,6 +75,8 @@
 
 import javax.security.auth.login.LoginException;
 
+import static org.apache.hadoop.hive.conf.HiveConf.ConfVars.REPL_ADD_RAW_RESERVED_NAMESPACE;
+
 /**
  * Collection of file manipulation utilities common across Hive.
  */
@@ -896,7 +898,8 @@ private static void checkDependencies(FileSystem srcFS, Path src, FileSystem dst
   }
 
   public static boolean shouldPreserveXAttrs(HiveConf conf, FileSystem srcFS, FileSystem dstFS) throws IOException {
-    if (!Utils.checkFileSystemXAttrSupport(srcFS) || !Utils.checkFileSystemXAttrSupport(dstFS)){
+    Path path = conf.getBoolVar(REPL_ADD_RAW_RESERVED_NAMESPACE) ? new Path("/.reserved/raw/") : new Path("/");
+    if (!Utils.checkFileSystemXAttrSupport(srcFS, path) || !Utils.checkFileSystemXAttrSupport(dstFS, path)) {
       return false;
     }
     for (Map.Entry<String,String> entry : conf.getPropsWithPrefix(Utils.DISTCP_OPTIONS_PREFIX).entrySet()) {
diff --git a/shims/common/src/main/java/org/apache/hadoop/hive/shims/Utils.java b/shims/common/src/main/java/org/apache/hadoop/hive/shims/Utils.java
index 339f0b5e9c..2513eee837 100644
--- a/shims/common/src/main/java/org/apache/hadoop/hive/shims/Utils.java
+++ b/shims/common/src/main/java/org/apache/hadoop/hive/shims/Utils.java
@@ -164,9 +164,9 @@ public static Filter getXSRFFilter() {
     return Utils.constructXSRFFilter();
   }
 
-  public static boolean checkFileSystemXAttrSupport(FileSystem fs) throws IOException {
+  public static boolean checkFileSystemXAttrSupport(FileSystem fs, Path path) throws IOException {
     try {
-      fs.getXAttrs(new Path(Path.SEPARATOR));
+      fs.getXAttrs(path);
       return true;
     } catch (UnsupportedOperationException e) {
       LOG.warn("XAttr won't be preserved since it is not supported for file system: " + fs.getUri());

This might hack your use case, if you are using reserved paths, but when not using reserved paths, it would throw you out with ACE, which your current code will also do.

For your current code->
For isilion, if not using reserve paths, you will set preserve Xattrs but during distcp it will throw you an ACE, because it doesn't support XAttrs in generic case, so your replication will anyway fail with your current code as well

--> Thoughts from gallery not from field, It has been long time since I checked the replication code, so I might have old info or wrong info, If other folks are convinced, please proceed no objections from my side

@@ -172,6 +173,11 @@ public static boolean checkFileSystemXAttrSupport(FileSystem fs) throws IOExcept
LOG.warn("XAttr won't be preserved since it is not supported for file system: " + fs.getUri());
return false;
}
catch (AccessControlException e) {
LOG.warn("Checking for XAttrSupport at {} Location", RAW_RESERVED_VIRTUAL_PATH);
fs.getXAttrs(new Path(RAW_RESERVED_VIRTUAL_PATH));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you know you are gonna shoot a distcp on reserved path? Say you have Isilon system, but you aren't using the reserved path. In that case this will return true, but when you shoot an actual distcp, you will fail because you were using the normal paths

Details:
* Isilon supports xAttr only on /.reserved/raw path. Because of that checkFileSystemXAttrSupport throws AccessControlException
* In case of HDFS encryption zone, a virtual path /.reserved/raw gives superusers direct access to underlying block data in filesystem.
This allows superusers to DistCp data without requiring access to encryption keys, and avoids the overhead of decrypting and re-encrypting data.
Isilon system only supports getXAttrs on /.reserved/raw paths

Code Change:
* Added Additional property dfs.xattr.supported.only.on.reserved.namespace which will be true in case of isilon
* Modified logic of checkFileSystemXAttrSupport to return false in case of isilon if path doesn't start with /.reserved/raw
Testing:
* Tested on Cluster
Copy link

sonarcloud bot commented Nov 17, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 2 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

warning The version of Java (11.0.8) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
Read more here

@pudidic
Copy link
Contributor

pudidic commented Nov 21, 2023

Looks good to me. Because the default value and behavior is kept same. Only if the configuration is enabled, it works. However, please fill the fields of the default PR template for reviewers.

@harshal-16
Copy link
Contributor Author

Filled the default PR template, please have a look @pudidic

@pudidic pudidic merged commit ae64e6d into apache:master Nov 22, 2023
5 checks passed
tarak271 pushed a commit to tarak271/hive-1 that referenced this pull request Dec 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants