HIVE-24705: Create/Alter/Drop tables based on storage handlers in HS2 should be authorized by Ranger/Sentry #1960

Closed

Conversation

saihemanth-cloudera
Contributor


What changes were proposed in this pull request?

Built-in Hive storage handlers such as HBaseStorageHandler, KafkaStorageHandler, etc., should implement a getURIForAuth() method that returns a URI formed from the table properties.

Why are the changes needed?

With doAs=false in Hive 3.x, whenever a user creates a table backed by a storage handler over external storage (for example, an HBase table), the end user that the external system sees is hive, so we cannot enforce the condition on the real end user in Apache Ranger/Sentry. Therefore, we need to enforce this condition in Hive itself when tables based on storage handlers are created/altered/dropped.

Does this PR introduce any user-facing change?

With authorization enforced, only users who have been granted create/alter/drop privileges in Ranger/Sentry can perform these operations on tables based on storage handlers.

How was this patch tested?

Tested on a local machine and on a remote cluster.

* should return uri with table properties.
*/
public URI getURIForAuth(Map<String, String> tableProperties) throws URISyntaxException;
}
Contributor

Nit: add a newline character at the end of the file.

*/
@InterfaceAudience.Public
@InterfaceStability.Stable
public interface HiveStorageAuthorizationHandler{
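
Combining this declaration with the getURIForAuth fragment quoted above, the new interface is roughly the following (a reconstruction from the diff hunks shown here, with the classification-annotation imports omitted; the exact Javadoc wording in the patch may differ):

import java.net.URI;
import java.net.URISyntaxException;
import java.util.Map;

/**
 * Interface for storage handlers that can expose a URI for authorization,
 * so Ranger/Sentry can authorize create/alter/drop of tables backed by them.
 */
@InterfaceAudience.Public
@InterfaceStability.Stable
public interface HiveStorageAuthorizationHandler {

  /**
   * Should return a URI built from the table properties.
   *
   * @param tableProperties merged table and SerDe properties of the table
   * @return URI identifying the underlying storage for authorization
   */
  URI getURIForAuth(Map<String, String> tableProperties) throws URISyntaxException;
}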
Contributor

Nit: the name "HiveStorageAuthorizationHandler" does not seem to fit the purpose of the interface. I am just throwing out names here. Would something like HiveURIBasedAuthorization or HiveUriBasedStorageHandler, or something akin to these, make more sense?

Contributor Author

Initially, I was using HiveURIBasedAuthorization as the class name, but Thejas suggested using HiveStorageAuthorizationHandler, since this class may evolve in the future to handle other methods of authorization.

Contributor

That's OK then. Thank you.

@@ -65,13 +68,16 @@
/**
* Hive Kafka storage handler to allow user to read and write from/to Kafka message bus.
*/
@SuppressWarnings("ALL") public class KafkaStorageHandler extends DefaultHiveMetaHook implements HiveStorageHandler {
@SuppressWarnings("ALL") public class KafkaStorageHandler extends DefaultHiveMetaHook implements HiveStorageHandler, HiveStorageAuthorizationHandler {
Contributor

I don't think the StorageHandlers are serializable classes. Can you confirm? If they are, then we should stamp these classes with a serialVersionUID to avoid execution engines using the wrong jars.

Contributor Author

No, none of these storage handlers are serializable.


private static final Logger LOG = LoggerFactory.getLogger(KafkaStorageHandler.class);
private static final String KAFKA_STORAGE_HANDLER = "org.apache.hadoop.hive.kafka.KafkaStorageHandler";

private Configuration configuration;

/** Kafka prefix to form the URI for authentication */
private static final String KAFKA_PREFIX = "kafka:";
Contributor

Nit: should we name this constant the same across all handlers instead of making it specific to each handler? Something like URI_PREFIX instead of KAFKA_PREFIX, HBASE_PREFIX, etc.

Contributor Author

This prefix is used by Ranger to determine which storage handler class the URI corresponds to.
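
For illustration, a minimal sketch of how KafkaStorageHandler could build such a prefixed URI from the table properties; the property keys kafka.bootstrap.servers and kafka.topic and the exact URI layout are assumptions here, not necessarily what the patch does:

@Override
public URI getURIForAuth(Map<String, String> tableProperties) throws URISyntaxException {
  // Assumed property keys; the actual patch may read different keys.
  String brokers = tableProperties.get("kafka.bootstrap.servers");
  String topic = tableProperties.get("kafka.topic");
  // KAFKA_PREFIX ("kafka:") lets Ranger map the URI back to this storage handler.
  return new URI(KAFKA_PREFIX + "//" + brokers + "/" + topic);
}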

HiveStorageHandler hiveStorageHandler = null;
Configuration conf = new Configuration();
Map<String, String> tableProperties = new HashMap<>();
tableProperties.putAll(newTable.getSd().getSerdeInfo().getParameters());
Contributor

I am not sure I understand this logic. Why are the SerDe.getParameters() being added to Table.getParameters()? Is this what we always do?

Contributor Author

tableProperties is just a local variable; I'm not setting it on the table's parameters. The reason I'm combining the table parameters and the SerDe parameters is that the user can provide properties such as the table name, column names, or connection information either in TBLPROPERTIES or in the SERDE. So I need to put all the properties into one map, send that map to the respective storage handler, and extract the table properties there.
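
In other words, mirroring the snippet above, the merged map is only an argument to the authorization call, roughly like this (the property names are hypothetical examples for an HBase-backed table, and hiveStorageHandler is assumed to have been instantiated from the table's storage handler class elsewhere):

Map<String, String> tableProperties = new HashMap<>();
// Properties the user supplied via SERDEPROPERTIES, e.g. hbase.columns.mapping
tableProperties.putAll(newTable.getSd().getSerdeInfo().getParameters());
// Properties the user supplied via TBLPROPERTIES, e.g. hbase.table.name
tableProperties.putAll(newTable.getParameters());
// The storage handler extracts whatever it needs to build the auth URI.
URI storageUri = ((HiveStorageAuthorizationHandler) hiveStorageHandler)
    .getURIForAuth(tableProperties);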

tableProperties.putAll(table.getParameters());
try {
Method methodIsImplemented = table.getStorageHandler().getClass().getMethod("getURIForAuth", Map.class);
if(methodIsImplemented != null && table.getStorageHandler() instanceof DefaultStorageHandler) {
Contributor

So DefaultStorageHandler implements the "HiveStorageAuthorizationHandler" interface, right? What is the difference between these if/else conditions?
Feels like the code should do something like this:
if (table.getStorageHandler() instanceof HiveStorageAuthorizationHandler) {
    storageUri = ((HiveStorageAuthorizationHandler) table.getStorageHandler()).getURIForAuth(tableProperties);
}

throw nullExp;
}catch(Exception ex){
//Custom storage handler that has not implemented the getURIForAuth()
storageuri = table.getStorageHandler().getClass().getName()+"://"+
Contributor

If the above change makes sense, then this code should go into the else block instead of the exception handler.
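
To make the suggestion concrete, the proposed shape would be roughly the following sketch (the helper name buildStorageUriForAuth is hypothetical; the fallback URI format follows the existing catch block, and getTablePropsForCustomStorageHandler is the helper shown further down in the diff):

private URI buildStorageUriForAuth(Table table, Map<String, String> tableProperties)
    throws URISyntaxException {
  if (table.getStorageHandler() instanceof HiveStorageAuthorizationHandler) {
    // Handlers that implement the interface build their own URI from the table properties.
    return ((HiveStorageAuthorizationHandler) table.getStorageHandler())
        .getURIForAuth(tableProperties);
  } else {
    // Custom storage handler that has not implemented getURIForAuth():
    // fall back to a generic URI based on the handler class name and the table properties.
    return new URI(table.getStorageHandler().getClass().getName() + "://"
        + getTablePropsForCustomStorageHandler(tableProperties));
  }
}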

}
}catch(Exception ex){
//Custom storage handler that has not implemented the getURIForAuth()
storageUri = hiveStorageHandler.getClass().getName()+"://"+
Contributor

If the above comment makes sense, then this should go into the else block of the condition instead of the exception handler.

@@ -116,4 +157,13 @@ private String buildCommandString(String cmdStr, Table tbl) {
}
return ret;
}

private static String getTablePropsForCustomStorageHandler(Map<String, String> tableProperties){
Contributor

Should we consolidate these two implementations of this method into some utility class?
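
For example, the shared helper could live in a small utility class along these lines (a sketch only; the class name and the exact property-string format are assumptions, since the method body is not shown in this hunk):

import java.util.Map;

public final class StorageHandlerAuthUtils {

  private StorageHandlerAuthUtils() {
    // static utility, no instances
  }

  /**
   * Flattens the merged table/SerDe properties into a single string that is
   * appended to the fallback authorization URI for custom storage handlers.
   */
  public static String getTablePropsForCustomStorageHandler(Map<String, String> tableProperties) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> entry : tableProperties.entrySet()) {
      sb.append(entry.getKey()).append('=').append(entry.getValue()).append('#');
    }
    return sb.toString();
  }
}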

System.out.println("Class not found. Storage handler will be set to null: "+ex);
}
t.setStorageHandler(storageHandler);
for(Map.Entry<String,String> serdeMap : storageFormat.getSerdeProps().entrySet()){
Contributor

Is there a reason we are setting the SerDe params on the table params? Can you give me an example of why we want to do this?

Contributor Author

Currently, in the compilation phase we are not sending SerDe information for Ranger authorization. This storageFormat's SerDe has the table mappings (e.g., HBase table name, column family) of the external table, which need to be set on the table's SerDe.
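
As an illustration, for an HBase-backed table the loop shown in the hunk above would copy mappings like hbase.columns.mapping from the STORED BY ... WITH SERDEPROPERTIES clause onto the table's SerDe; the loop body is not visible in the hunk, so the exact setter below is an assumption:

for (Map.Entry<String, String> serdeMap : storageFormat.getSerdeProps().entrySet()) {
  // e.g. "hbase.columns.mapping" -> ":key,cf1:val"
  // Copying the SerDe property onto the table makes the external table mapping
  // visible to the compilation-phase Ranger/Sentry authorization.
  t.getTTable().getSd().getSerdeInfo().getParameters()
      .put(serdeMap.getKey(), serdeMap.getValue());
}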

Contributor

@nrg4878 left a comment

Changes look good to me. +1

@nrg4878
Contributor

nrg4878 commented Aug 4, 2021

The fix has been merged to master. Please close this PR.
