New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-24705: Create/Alter/Drop tables based on storage handlers in HS2… #1960
HIVE-24705: Create/Alter/Drop tables based on storage handlers in HS2… #1960
Conversation
979434d
to
95e9846
Compare
* should return uri with table properties. | ||
*/ | ||
public URI getURIForAuth(Map<String, String> tableProperties) throws URISyntaxException; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Add newline character on the last line.
*/ | ||
@InterfaceAudience.Public | ||
@InterfaceStability.Stable | ||
public interface HiveStorageAuthorizationHandler{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the name "HiveStorageAuthorizationHandler" does not seem to fit the purpose of the interface. I am just throwing out names now. WOuld something like these make more sense? HiveURIBasedAuthorization or HiveUriBasedStorageHandler or something akin to this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially, I was using HiveURIBasedAuthorization as class name but Thejas suggested me to use HiveStorageAuthorizationHandler as this class may be evolved into the future to handle other methods of authorization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats ok then. Thank you
@@ -65,13 +68,16 @@ | |||
/** | |||
* Hive Kafka storage handler to allow user to read and write from/to Kafka message bus. | |||
*/ | |||
@SuppressWarnings("ALL") public class KafkaStorageHandler extends DefaultHiveMetaHook implements HiveStorageHandler { | |||
@SuppressWarnings("ALL") public class KafkaStorageHandler extends DefaultHiveMetaHook implements HiveStorageHandler, HiveStorageAuthorizationHandler { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think the StorageHandlers are serializable classes. Can you confirm? If they are, then we should stamp these classes with serialVersionUID to avoid execution engines using the wrong jars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, All these storage handlers are not serializable.
|
||
private static final Logger LOG = LoggerFactory.getLogger(KafkaStorageHandler.class); | ||
private static final String KAFKA_STORAGE_HANDLER = "org.apache.hadoop.hive.kafka.KafkaStorageHandler"; | ||
|
||
private Configuration configuration; | ||
|
||
/** Kafka prefix to form the URI for authentication */ | ||
private static final String KAFKA_PREFIX = "kafka:"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Should we call this constant the same across all handlers instead it being specific to each handler? Something like URI_PREFIX instead of KAFKA_PREFIX, HBASE_PREFIX etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This prefix is used by Ranger to determine which storage handler class does the URI corresponds to.
HiveStorageHandler hiveStorageHandler = null; | ||
Configuration conf = new Configuration(); | ||
Map<String, String> tableProperties = new HashMap<>(); | ||
tableProperties.putAll(newTable.getSd().getSerdeInfo().getParameters()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand this logic. Why are the SerDe.getParameters() being added to Table.getParameters()? Is this what we always do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tableProperties is just a variable. I'm not setting this to the table parameters. The reason why I'm combining table parameters and serde parameters are, the user can give table properties like table name, column name or connection information in TBLPropperties or SERDE. So I need to put all the properties in a map and send the map to respective storage handler and extract table properties in it.
tableProperties.putAll(table.getParameters()); | ||
try { | ||
Method methodIsImplemented = table.getStorageHandler().getClass().getMethod("getURIForAuth", Map.class); | ||
if(methodIsImplemented != null && table.getStorageHandler() instanceof DefaultStorageHandler) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so DefaultStorageHandler implements the "HiveStorageAuthorizationHandler" interface right? what is the difference between these if else conditions.
Feels like the code should do something like this
if (table.getStorageHandler() instancefo HiveStorageAuthorizationHandler) {
storageuri = ((HiveStorageAuthorizationHandler) table.getStorageHandler()).getURIForAuth(tableProperties)'
}
throw nullExp; | ||
}catch(Exception ex){ | ||
//Custom storage handler that has not implemented the getURIForAuth() | ||
storageuri = table.getStorageHandler().getClass().getName()+"://"+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the above change makes sense, then this code should go into the else block instead of the exception handler.
} | ||
}catch(Exception ex){ | ||
//Custom storage handler that has not implemented the getURIForAuth() | ||
storageUri = hiveStorageHandler.getClass().getName()+"://"+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if above comment makes sense, then this should go into the else block of the condition instead of the exception handler.
@@ -116,4 +157,13 @@ private String buildCommandString(String cmdStr, Table tbl) { | |||
} | |||
return ret; | |||
} | |||
|
|||
private static String getTablePropsForCustomStorageHandler(Map<String, String> tableProperties){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we consolidate these 2 implementations of this method into some utility class?
System.out.println("Class not found. Storage handler will be set to null: "+ex); | ||
} | ||
t.setStorageHandler(storageHandler); | ||
for(Map.Entry<String,String> serdeMap : storageFormat.getSerdeProps().entrySet()){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason we are setting serde params on the table params? can you give me an example of why we want to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, in the compilation phase we are not sending serde information for ranger auth. This storageFormat's serde has table mappings(eg: hbase table name, column family) of external that needs to be set to table serde.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me. +1
…STORAGEHANDLER_URI. Also added Owner information to these objects.
…n storage handlers
89bd108
to
ac368ad
Compare
fix has been merged to master. Please close this PR. |
… should be authorized by Ranger/Sentry
What changes were proposed in this pull request?
Built-in hive storage handlers like HbaseStorageHandler, KafkaStorageHandler e.t.c should implement a method getURIForAuthentication() which returns a URI that is formed from table properties.
Why are the changes needed?
With doAs=false in Hive3.x, whenever a user is trying to create a table based on storage handlers on external storage for ex: HBase table, the end user we are seeing is hive so we cannot really enforce the condition in Apache Ranger/Sentry on the end-user. So, we need to enforce this condition in the hive in the event of create/alter/drop tables based on storage handlers.
Does this PR introduce any user-facing change?
By enforcing authorization, only the users granted with create/alter/drop privileges in ranger/sentry can only do these operations on tables based on storage handlers.
How was this patch tested?
Local machine, Remote cluster.