This repository has been archived by the owner on Dec 16, 2021. It is now read-only.

[thrift-audit-hook] Adding auditing for Thrift hooks #4

Merged 1 commit on May 10, 2016

Conversation

john-bodley
Collaborator

This PR adds audit logging hooks for metastore commands issued via the Thrift interface. Some refactoring of the existing audit log hook was necessary: it introduces a SessionStateLite object to support Hive operations that are not defined in the org.apache.hadoop.hive.ql.plan.HiveOperation enum.

The metastore Thrift hooks are triggered via registered listeners, which coerce the various events into ReadEntity and WriteEntity objects.

For reasons I don't understand, the AlterPartitionEvent provides only the old and new partitions without specifying a table, which is somewhat perplexing as partitions are associated with tables. In the onAlterPartition callback, a dummy metadata.Table instance with the bare necessities is therefore created in order to instantiate a metadata.Partition object. This approach might not be the most elegant, but it seems to capture the relevant information of the serialized object.
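As a rough illustration of the dummy-table idea described above, here is a minimal sketch. It deliberately uses stand-in types (`PartitionStub`, `TableStub`, `DummyTableSketch` are hypothetical names, not Hive classes) so that it runs without Hive on the classpath. It assumes, as the PR does, that the old and new partitions name the same database and table, so either one can supply the identifying fields.

```java
import java.util.Objects;

/** Stand-in for a partition as delivered by the event: it names its table but carries no table object. */
final class PartitionStub {
  final String dbName;
  final String tableName;

  PartitionStub(String dbName, String tableName) {
    this.dbName = Objects.requireNonNull(dbName);
    this.tableName = Objects.requireNonNull(tableName);
  }
}

/** Stand-in for a table holding only the bare necessities (database and table name). */
final class TableStub {
  final String dbName;
  final String tableName;

  TableStub(String dbName, String tableName) {
    this.dbName = Objects.requireNonNull(dbName);
    this.tableName = Objects.requireNonNull(tableName);
  }
}

final class DummyTableSketch {
  /**
   * Builds a minimal table from the partitions of an alter-partition event.
   * The table is assumed to be invariant across the alteration, so the
   * choice between the old and new partition is arbitrary.
   */
  static TableStub minimalTableFor(PartitionStub oldPartition, PartitionStub newPartition) {
    boolean sameTable = oldPartition.dbName.equals(newPartition.dbName)
        && oldPartition.tableName.equals(newPartition.tableName);
    if (!sameTable) {
      throw new IllegalArgumentException("old and new partitions name different tables");
    }
    return new TableStub(oldPartition.dbName, oldPartition.tableName);
  }
}
```

In the real hook the equivalent step would populate a Hive table object instead of `TableStub`; the explicit same-table check is an addition in this sketch, not something the PR is known to do.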

Finally, there's some additional cleanup of the HiveOperation enum, and the AuditLogHookTest tests now leverage getDbConnectionFactory.

to: @plypaul

Tested by running the following unit tests:

  • AuditLogHookTest
  • MetastoreAuditLogHookTest

@john-bodley john-bodley force-pushed the johnbodley-thrift-audit-hook branch from 911d0aa to c5d956f Compare May 2, 2016 16:38
// Will wait BASE_SLEEP * 2 ^ (attempt no.) between attempts.
private static final int BASE_SLEEP = 1;

public static String DB_USERNAME = "airbnb.reair.audit_log.db.username";
Collaborator

For the rollout, we might want to log to a separate set of tables, so could you change these to a separate set of keys?

Collaborator Author

Sure. Could you provide me with the temporary keys we should use?

Collaborator

airbnb.reair.metastore.audit_log.db.username and likewise.
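As an aside on the diff snippet that opens this thread: its comment describes an exponential backoff of BASE_SLEEP * 2 ^ (attempt no.). The sketch below shows one way that delay could be computed. The class and method names (`BackoffSketch`, `sleepFor`) are hypothetical, the unit of `BASE_SLEEP` is not stated in the snippet and is left unspecified here, and the cap on `attempt` is an assumption added only to avoid overflow.

```java
final class BackoffSketch {
  // Mirrors the constant in the snippet; its unit is not stated there.
  static final int BASE_SLEEP = 1;

  /**
   * Returns BASE_SLEEP * 2^attempt, in the same (unspecified) unit as BASE_SLEEP.
   * The exponent is computed with a shift because Java's ^ operator is
   * bitwise XOR, not exponentiation.
   */
  static long sleepFor(int attempt) {
    if (attempt < 0 || attempt > 30) {
      throw new IllegalArgumentException("attempt out of range: " + attempt);
    }
    return BASE_SLEEP * (1L << attempt);
  }

  public static void main(String[] args) {
    for (int attempt = 0; attempt < 4; attempt++) {
      System.out.println("attempt " + attempt + " -> wait " + sleepFor(attempt));
    }
  }
}
```

The shift matters because writing the comment's formula literally in Java, as `BASE_SLEEP * 2 ^ attempt`, would apply XOR to the product rather than raise 2 to a power.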

@plypaul
Collaborator

plypaul commented May 3, 2016

Great work! It looks good overall, but there may be some checkstyle issues - can you try running ./gradlew jar?

try {

// Table is invariant and thus arbitrary choice between old and new.
Table table = new Table(
Collaborator

event doesn't have the whole table object?

Collaborator Author

@john-bodley john-bodley May 3, 2016

Sadly no. I'm somewhat perplexed as to why.

https://hive.apache.org/javadocs/r0.13.1/api/metastore/org/apache/hadoop/hive/metastore/events/AlterPartitionEvent.html

I'll update the comment to explain this.

@plypaul
Collaborator

plypaul commented May 3, 2016

Also might be good to rename AuditLogHook to CliAuditLogHook to differentiate it from the MetastoreAuditLogHook.

* Audit logging for the metastore Thrift server. Comprises a series of
* event listeners that add auditing to the various Thrift events.
*/
public class MetastoreAuditLogHook extends MetaStoreEventListener {
Collaborator

Maybe this is better named a *Listener instead of a "*Hook"?

@john-bodley
Collaborator Author

I ran ./gradlew jar and saw no checkstyle issues.

@plypaul
Collaborator

plypaul commented May 3, 2016

Sorry, meant to say ./gradlew build.

private static final int BASE_SLEEP = 1;

public static String DB_USERNAME =
"airbnb.reair.metastoreaudit_log.db.username";
Collaborator

Missing period

@john-bodley john-bodley force-pushed the johnbodley-thrift-audit-hook branch from 04afcb7 to 716d809 Compare May 4, 2016 05:40
try {
Set<ReadEntity> readEntities = new HashSet<>();
readEntities.add(new ReadEntity(new Table(event.getTable())));
Set<WriteEntity> writeEntities = new HashSet<>();
Collaborator

Current convention is that the dropped table is in outputs.
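To make the convention concrete, here is a minimal sketch of a drop-table callback that records the dropped table as an output (write entity) and leaves the inputs empty. The types are stand-ins (`AuditRecord` and `DropTableAuditSketch` are hypothetical and use plain strings rather than Hive's ReadEntity and WriteEntity), and the `DROP_TABLE` label is an assumption chosen for illustration.

```java
import java.util.HashSet;
import java.util.Set;

/** Stand-in for an audit log entry: an operation plus its inputs and outputs. */
final class AuditRecord {
  final String operation;
  final Set<String> inputs = new HashSet<>();   // plays the role of the read entities
  final Set<String> outputs = new HashSet<>();  // plays the role of the write entities

  AuditRecord(String operation) {
    this.operation = operation;
  }
}

final class DropTableAuditSketch {
  /** Records a dropped table under outputs, following the convention noted in the review. */
  static AuditRecord onDropTable(String dbName, String tableName) {
    AuditRecord record = new AuditRecord("DROP_TABLE");
    // The drop modifies (removes) the table, so it is treated as written, not read.
    record.outputs.add(dbName + "." + tableName);
    return record;
  }
}
```

Compared with the snippet under review, the only change in spirit is which set receives the table: the write set instead of the read set.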

@plypaul
Collaborator

plypaul commented May 9, 2016

Chatted about comments offline - LGTM. Once you rebase and squash, I'll merge it in.

@plypaul
Collaborator

plypaul commented May 9, 2016

Can you update the class name in hive-hooks/src/main/resources/hook_configuration_template.xml?

@john-bodley john-bodley force-pushed the johnbodley-thrift-audit-hook branch from e02e49f to 1d00ddb Compare May 10, 2016 01:42
@plypaul plypaul merged commit 2a68be2 into master May 10, 2016
@plypaul plypaul deleted the johnbodley-thrift-audit-hook branch May 10, 2016 02:00