Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add logging implementation for AuditManager and audit more endpoints #15480

Merged
merged 31 commits into from
Dec 19, 2023

Conversation

kfaraz
Copy link
Contributor

@kfaraz kfaraz commented Dec 4, 2023

Description

This PR adds an implementation of AuditManager that simply logs the audit events instead of persisting them to metadata store.

Sample audit log

Setup

Authentication: none
druid.audit.manager.type=log

2023-12-08T14:20:07,403 WARN [qtp972079939-170] org.apache.druid.server.audit.AuditLogger - 
User[console], identity[allowAll], IP[127.0.0.1] performed action[rules]
 on key[inline_data] with comment[Load 2 replicas in the default tier].
 Request[null],
 payload[[{"tieredReplicants":{"_default_tier":2},"useDefaultTierForNull":true,"type":"loadForever"}]].

Changes

  • Add log implementation for AuditManager alongwith SQLAuditManager
  • LoggingAuditManager simply logs the audit event. Thus, it returns empty for all fetchAuditHistory calls
  • Add new config druid.audit.manager.type which can take values log, sql (default)
  • Add new config druid.audit.manager.logLevel which can take values DEBUG, INFO, WARN. This gets activated only if type is log.
  • Remove usage of ConfigSerde from AuditManager as audit is not just limited to configs
  • Add AuditSerdeHelper for a single implementation of serialization/deserialization of audit payload that can be used by different AuditManager implementations

How is audit performed

All existing and new audited REST API endpoints accept two headers X-Druid-Author and X-Druid-Comment which are being used to populate the audit info. The web-console currently passes console as the value for X-Druid-Author.

An alternative to this can be to extract the authentication result set as an attribute in the HttpServletRequest and invoke authenticationResult.getIdentity().

New audited endpoints

  • Post a task type=ingestion.batch
  • Mark segments as unused: type=markSegmentsAsUnused
  • Kill segments: type=killSegments
  • Create user in basic authenticator: type=basicAuth.createUser
  • Delete user in basic authenticator: type=basicAuth.deleteUser
  • Update user in basic authenticator: type=basicAuth.updateUserCreds

Release note

  • Add new config druid.audit.manager.type which can take values log, sql(default). This allows audited events to either be logged or persisted in metadata store (default behaviour).
  • Add new config druid.audit.manager.logLevel which allows users to set the log level of audit events and can take values DEBUG, INFO(default), WARN.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

try {
Response response =
dataSourcesResource.markAsUnusedAllSegmentsOrKillUnusedSegmentsInInterval("datasource", "true", "???");
dataSourcesResource.markAsUnusedAllSegmentsOrKillUnusedSegmentsInInterval("datasource", "true", "???", request);

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note test

Invoking
DataSourcesResource.markAsUnusedAllSegmentsOrKillUnusedSegmentsInInterval
should be avoided because it has been deprecated.
@@ -900,7 +900,7 @@ public void testTaskPostDeniesDatasourceReadUser()
Task task = NoopTask.forDatasource(Datasources.WIKIPEDIA);
expectedException.expect(ForbiddenException.class);
expectedException.expect(ForbiddenException.class);
overlordResource.taskPost(task, req);
overlordResource.taskPost(task, "", "", req);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a test with non-empty author to see that audit event does go through?

auditManager.doAudit(
AuditEvent.builder()
.key(dataSourceName)
.type("segments.markUnused")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can the type name be standardized? E.g. serviceName.api-path

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense to me.

So would something like the following work?

  • coordinator.markSegmentsUnused
  • coordinator.basicAuth.createUser
  • overlord.postTask
  • overlord.killSegments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a field request in AuditEntry to contain the details of path, uri, method and service.

@@ -225,18 +222,29 @@ public Response taskPost(final Task task, @Context final HttpServletRequest req)
taskQueue -> {
try {
taskQueue.add(task);

auditManager.doAudit(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we filter task submissions that were initiated by the system itself? E.g. compaction duty, MSQ controller task, etc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is already happening inside the respective audit impls, such as SQLAuditManager and LoggingAuditManager. We check if an audited event was started by an internal service (by checking the author field) and then audit that event only if system event auditing is enabled via config (false by default).

This is better than checking at each call site if the request was initiated by an internal service.

<T> void doAudit(String key, String type, AuditInfo auditInfo, T payload, ConfigSerde<T> configSerde);
default boolean isSystemRequest(AuditInfo auditInfo)
{
return AUTHOR_DRUID_SYSTEM.equals(auditInfo.getAuthor());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this be valid for any auth extension?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because we would explicitly set this as the header value when using internal clients.

Currently, we are setting it only in OverlordClientImpl.runTask but we can choose do set it in other audited endpoints too, or better yet just always set it by default in any service client. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So its possible for a user to bypass audit system if they set this header?

Comment on lines 150 to 155
final MapBinder<String, AuditManager> auditManagerBinder
= PolyBind.optionBinder(binder, Key.get(AuditManager.class));
auditManagerBinder
.addBinding("log")
.to(LoggingAuditManager.class)
.in(LazySingleton.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems off that the log choice is added in this module. should this be moved to some other module?

Copy link
Contributor Author

@kfaraz kfaraz Dec 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let me put this in some other module.

@@ -96,7 +97,8 @@ public ListenableFuture<Void> runTask(final String taskId, final Object taskObje
return FutureUtils.transform(
client.asyncRequest(
new RequestBuilder(HttpMethod.POST, "/druid/indexer/v1/task")
.jsonContent(jsonMapper, taskObject),
.jsonContent(jsonMapper, taskObject)
.header(AuditManager.X_DRUID_AUTHOR, AuditManager.AUTHOR_DRUID_SYSTEM),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this needed? can we not rely on authenticated identity instead?

Copy link
Contributor Author

@kfaraz kfaraz Dec 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seemed much easier to implement.

I encountered some issues with the other approach:

  • The authenticated identity would depend on both auth impl and configuration. I couldn't find a definite way to determine the internal service identity. Basic auth creates the druid_system user but I don't see it being used in internal communication. (On second thought, let me see if I can pull the identity from the escalated client that is being injected)
  • When there is no auth, all requests get the identity allowAll.
  • When auth is enabled, a user could still use the internal druid service username/password to invoke an API and bypass the audit (This is technically still possible by setting the author header to this specific value but that is less likely to happen. This is one of the reasons I chose to include a config for auditing system requests too).

Please let me know your thoughts on this.

Comment on lines +136 to 137
"KillCompactionConfig",
"KillCompactionConfig",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

identity and author are KillCompactionConfig?

Copy link
Contributor Author

@kfaraz kfaraz Dec 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this. Since we are revisiting auditing in this PR, I guess it would be better to use druid_system or something as the author instead.

@kfaraz
Copy link
Contributor Author

kfaraz commented Dec 19, 2023

Thanks for the review, @abhishekagarwal87 !
Merging since the IT failure is not caused by this set of change.

@kfaraz kfaraz merged commit 9f56885 into apache:master Dec 19, 2023
82 of 83 checks passed
@kfaraz kfaraz deleted the add_audit_logger branch December 19, 2023 07:44
@xvrl
Copy link
Member

xvrl commented Dec 19, 2023

@kfaraz the master build started failing for some middle-manager integration tests after merging your PR, could you take a look?

@kfaraz
Copy link
Contributor Author

kfaraz commented Dec 20, 2023

Yes, @xvrl , I am trying to debug that issue here #15561. It seems like a flaky test as it does seem to pass sometimes. But the flakiness has become pronounced after the changes in this auditing PR somehow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants