Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce a new configuration that skip storing audit payload if payload size exceed limit and skip storing null fields for audit payload #11078

Merged
merged 18 commits into from
Apr 14, 2021

Conversation

maytasm
Copy link
Contributor

@maytasm maytasm commented Apr 7, 2021

Introduce a new configuration that skip storing audit payload if payload size exceed limit and skip storing null fields for audit payload

Description

Audit log such as coordinator.compaction.config can get very large as datasources and their configurations churn. This can cause problems such as

  • Metadata database taking up too much space / running out of storage space
  • Configuration change (such as compaction configuration) failing as we are not able to insert new row into audit table due to exceeding the database size limit (max_allowed_packet, etc)

This PR introduce a new configuration that skip storing audit payload if payload size exceed configured limit. The default value is set to -1 which disables this check, meaning Druid will always store audit payload regardless of it's size. This PR also skip storing null fields for audit payload in the effort of saving storage space as those are unnecessary.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Perhaps a more readable name for the config would be druid.audit.manager.maxPayloadSizeBytes? I don't think the exact behaviour (skipping the store in this case) needs to be a part of the config name.

@maytasm
Copy link
Contributor Author

maytasm commented Apr 7, 2021

Nit: Perhaps a more readable name for the config would be druid.audit.manager.maxPayloadSizeBytes? I don't think the exact behaviour (skipping the store in this case) needs to be a part of the config name.

I like that name better. Done

@maytasm maytasm changed the title Introduce a new configuration that skip storing audit payload if payload size exceed limit Introduce a new configuration that skip storing audit payload if payload size exceed limit and skip storing null fields for audit payload Apr 7, 2021
@jihoonson
Copy link
Contributor

Added 'Design Review' as it introduces a new user-facing configuration.

Copy link
Contributor

@suneet-s suneet-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach LGTM.

The defaults seem good as well 👍

@@ -338,6 +338,8 @@ Coordinator and Overlord log changes to lookups, segment load/drop rules, dynami
|--------|-----------|-------|
|`druid.audit.manager.auditHistoryMillis`|Default duration for querying audit history.|1 week|
|`druid.audit.manager.includePayloadAsDimensionInMetric`|Boolean flag on whether to add `payload` column in service metric.|false|
|`druid.audit.manager.maxPayloadSizeBytes`|The maximum size of audit payload, in bytes, to store in Druid's metadata store audit table. If the size of audit payload exceed this value, the audit log would be stored with a messaging indicating that the payload was omitted instead. Setting maxPayloadSizeBytes to -1 (default value) disables this check, meaning Druid will always store audit payload regardless of it's size.|-1|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This way we can avoid the update to the .spelling file

Suggested change
|`druid.audit.manager.maxPayloadSizeBytes`|The maximum size of audit payload, in bytes, to store in Druid's metadata store audit table. If the size of audit payload exceed this value, the audit log would be stored with a messaging indicating that the payload was omitted instead. Setting maxPayloadSizeBytes to -1 (default value) disables this check, meaning Druid will always store audit payload regardless of it's size.|-1|
|`druid.audit.manager.maxPayloadSizeBytes`|The maximum size of audit payload, in bytes, to store in Druid's metadata store audit table. If the size of audit payload exceed this value, the audit log would be stored with a messaging indicating that the payload was omitted instead. Setting `maxPayloadSizeBytes` to -1 (default value) disables this check, meaning Druid will always store audit payload regardless of it's size.|-1|

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -28,6 +28,8 @@

public interface AuditManager
{
String PAYLOAD_SKIP_MESSAGE = "Payload was not stored as the payload size exceed limit configured by druid.audit.manager.maxPayloadSizeBytes";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

javadocs please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -25,5 +25,6 @@
{
byte[] serialize(T obj);
String serializeToString(T obj);
String serializeSkipNullToString(T obj);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this part of the code so well, could you add a javadoc and talk about what the expectations of this function are and how they're different from serializeToString above. Maybe javadocs will help clarify this for me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -50,6 +52,7 @@ public JacksonConfigManager(
this.configManager = configManager;
this.jsonMapper = jsonMapper;
this.auditManager = auditManager;
this.jsonMapperSkipNull = jsonMapper.copy().setSerializationInclusion(JsonInclude.Include.NON_NULL);
Copy link
Contributor

@suneet-s suneet-s Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this object should be bound in the JacksonModule as a singleton.

Even though JacksonConfigManager is a singleton, if this binding changes in the future, we could inadvertently create an ObjectMapper per JacksonConfigManager instead of a singleton that can be shared across the process.

Otherwise, each time we create a new JacksonConfigMapper, we will create a new ObjectMapper, even though, jsonMapper is a singleton.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -31,6 +31,9 @@
@JsonProperty
private boolean includePayloadAsDimensionInMetric = false;

@JsonProperty
private long maxPayloadSizeBytes = -1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this property declared as type of HumanReadableBytes ?

Copy link
Contributor Author

@maytasm maytasm Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HumanReadableBytes won't work here. I am using the value -1 to disabled this check / unlimited max payload size. This is the same as config like maxBytesInMemory where setting maxBytesInMemory to -1 disables the check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does HumanReadableBytes not work with -1? I thought it does. @FrankChen021 I noticed we have no unit tests for HumanReadableBytes. It would be nice to have some 🙂.

This is the same as config like maxBytesInMemory where setting maxBytesInMemory to -1 disables the check.

To me, the design of maxBytesInMemory is not the best because what -1 or 0 mean is not intuitive which means you have to learn those by reading the doc. Similar idea for this, I think it's better to be HumanReadableBytes. You can add another flag to enable/disable the check if you want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does HumanReadableBytes not work with -1?

HumanReadableBytes does not work with negative value from config since I think there's no such scenario that a size could be negative. But it can accept negative value from code.

Here I think there're some ways:

  1. if -1 is allowed in configuration, we could remove that constraint in HumanReadableBytes, and apply HumanReadableBytesRange annotation to make sure its value is in valid range
  2. or, if we leave that constraint, user must delete this configuration to disable the check
  3. or, provide an independent flag to enable or disable the check as suggested by @jihoonson

IMO, 3rd is better.

I noticed we have no unit tests for HumanReadableBytes. It would be nice to have some 🙂.

HumanReadableBytes has been covered by UT in HumanReadableBytesTest file.

Copy link
Contributor Author

@maytasm maytasm Apr 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jihoonson Ah, I was wrong. HumanReadableBytes does allow -1. For example, CaffeineCacheConfig's sizeInBytes also uses HumanReadableBytes and set default to -1. Changed this to use HumanReadableBytes. However, I kept the default value as -1 which is use for disabling the check. This is consistent with a few other configs we have in Druid right now. We can discuss on changing all of the configs and standardize disable/enable in a separate PR.

@FrankChen021 Negative values should not be set by the user. We can set it to -1 in the code for disabling the check and if user set it (which is to enable it), then it must be set to a positive number.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FrankChen021 oh you are right. There is a class for unit tests. Sorry, a filter that excludes all test classes might be on in my intellij when I searched last time 😓

@FrankChen021
Copy link
Member

I have a question. Users have no idea of the size of payload, are there any suggestions that tell users how to set this configuration ? Can we skip storing payload based on value of type ?

@maytasm
Copy link
Contributor Author

maytasm commented Apr 8, 2021

I have a question. Users have no idea of the size of payload, are there any suggestions that tell users how to set this configuration ? Can we skip storing payload based on value of type ?

In another PR, I am adding an auto cleanup based on size limit for the druid_audit metadata table. This work similar to druid.indexer.logs.kill.* which periodically clean up task entries in the druid_tasks table. Basically in my other PR, user will be able to configured the max number of entries they want to store in the druid_audit metadata table, and Druid will automatically remove from oldest entry (FIFO) when the table entries exceed this size limit. Using both of these configuration (this PR and the other PR), Druid operator will be able to control the size of the audit metadata table. For example, if you want to limit your audit table metadata to 20GB (i.e. that is how much storage you have on your machine), and say you want keep x number of last audit entries, you can then set maxPayloadSizeBytes to 20GB / x
This make sure that the audit table will not grow uncontrollably and bringing down / effecting your whole Druid cluster.

* This String is the default message stored instead of the actual audit payload if the audit payload size
* exceeded the maximum size limit configuration
*/
String PAYLOAD_SKIP_MESSAGE = "Payload was not stored as the payload size exceed limit configured by druid.audit.manager.maxPayloadSizeBytes";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be better to include what was the limit when payload was skipped. Maybe something like Payload was not stored as its size exceeds the limit [%d] configured by druid.audit.manager.maxPayloadSizeBytes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* @param obj to be serialize
* @param skipNull if true, then skip serialization of any field with null value.
* This can be used to reduce the size of the resulting String.
* @return String serialization of the input without fields with null values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this true only when skipNull is true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. Updated.

import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Javadoc would be nice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -31,6 +31,9 @@
@JsonProperty
private boolean includePayloadAsDimensionInMetric = false;

@JsonProperty
private long maxPayloadSizeBytes = -1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does HumanReadableBytes not work with -1? I thought it does. @FrankChen021 I noticed we have no unit tests for HumanReadableBytes. It would be nice to have some 🙂.

This is the same as config like maxBytesInMemory where setting maxBytesInMemory to -1 disables the check.

To me, the design of maxBytesInMemory is not the best because what -1 or 0 mean is not intuitive which means you have to learn those by reading the doc. Similar idea for this, I think it's better to be HumanReadableBytes. You can add another flag to enable/disable the check if you want.

@@ -77,13 +83,14 @@ public JacksonConfigManager(
.key(key)
.type(key)
.auditInfo(auditInfo)
.payload(configSerde.serializeToString(val))
.payload(configSerde.serializeToString(val, true))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a feature flag for this too. Otherwise, it will break existing user applications that work based on audit logs if any.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@Retention(RetentionPolicy.RUNTIME)
@BindingAnnotation
@PublicApi
public @interface JsonOnlyNonNullValueSerialization
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name seems quite too long to me 😅 How about JsonNonNull?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

@FrankChen021
Copy link
Member

I have a question. Users have no idea of the size of payload, are there any suggestions that tell users how to set this configuration ? Can we skip storing payload based on value of type ?

In another PR, I am adding an auto cleanup based on size limit for the druid_audit metadata table. This work similar to druid.indexer.logs.kill.* which periodically clean up task entries in the druid_tasks table. Basically in my other PR, user will be able to configured the max number of entries they want to store in the druid_audit metadata table, and Druid will automatically remove from oldest entry (FIFO) when the table entries exceed this size limit. Using both of these configuration (this PR and the other PR), Druid operator will be able to control the size of the audit metadata table. For example, if you want to limit your audit table metadata to 20GB (i.e. that is how much storage you have on your machine), and say you want keep x number of last audit entries, you can then set maxPayloadSizeBytes to 20GB / x
This make sure that the audit table will not grow uncontrollably and bringing down / effecting your whole Druid cluster.

That makes sense. Do you have any plan to put these explanation in doc by your another PR ?

@maytasm
Copy link
Contributor Author

maytasm commented Apr 9, 2021

I have a question. Users have no idea of the size of payload, are there any suggestions that tell users how to set this configuration ? Can we skip storing payload based on value of type ?

In another PR, I am adding an auto cleanup based on size limit for the druid_audit metadata table. This work similar to druid.indexer.logs.kill.* which periodically clean up task entries in the druid_tasks table. Basically in my other PR, user will be able to configured the max number of entries they want to store in the druid_audit metadata table, and Druid will automatically remove from oldest entry (FIFO) when the table entries exceed this size limit. Using both of these configuration (this PR and the other PR), Druid operator will be able to control the size of the audit metadata table. For example, if you want to limit your audit table metadata to 20GB (i.e. that is how much storage you have on your machine), and say you want keep x number of last audit entries, you can then set maxPayloadSizeBytes to 20GB / x
This make sure that the audit table will not grow uncontrollably and bringing down / effecting your whole Druid cluster.

That makes sense. Do you have any plan to put these explanation in doc by your another PR ?

Yep. This is the other PR #11084. Right now it only has the doc change and the code implementation is coming soon. Please let me know if you think any docs is missing. I will add a section on how both configs can be use together to control the size of the metadata tables.

@@ -32,17 +33,21 @@
* This String is the default message stored instead of the actual audit payload if the audit payload size
* exceeded the maximum size limit configuration
*/
String PAYLOAD_SKIP_MESSAGE = "Payload was not stored as the payload size exceed limit configured by druid.audit.manager.maxPayloadSizeBytes";
String PAYLOAD_SKIP_MESSAGE = "Payload was not stored as its size exceeds the limit [%d] configured by druid.audit.manager.maxPayloadSizeBytes";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest PAYLOAD_SKIP_MSG_FORMAT.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -32,7 +33,10 @@
private boolean includePayloadAsDimensionInMetric = false;

@JsonProperty
private long maxPayloadSizeBytes = -1;
private HumanReadableBytes maxPayloadSizeBytes = HumanReadableBytes.valueOf(-1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maxPayloadSizeBytes should not be a negative number than -1. It would be nice to have a sanity check that checks the lower and upper bounds (to avoid overflow) using @HumanReadableBytesRange as @FrankChen021 suggested in #11078 (comment).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Also updated the docs.

|`druid.audit.manager.maxPayloadSizeBytes`|The maximum size of audit payload, in bytes, to store in Druid's metadata store audit table. If the size of audit payload exceed this value, the audit log would be stored with a messaging indicating that the payload was omitted instead. Setting `maxPayloadSizeBytes` to -1 (default value) disables this check, meaning Druid will always store audit payload regardless of it's size.|-1|

|`druid.audit.manager.maxPayloadSizeBytes`|The maximum size of audit payload, in bytes, to store in Druid's metadata store audit table. If the size of audit payload exceed this value, the audit log would be stored with a messaging indicating that the payload was omitted instead. Setting `maxPayloadSizeBytes` to -1 (default value) disables this check, meaning Druid will always store audit payload regardless of it's size. Human-readable format is supported, see [here](human-readable-byte.md). |-1|
|`druid.audit.manager.skipNullField`|If true, the audit payload stored in metadata store will be exclude any field with null value. |false|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will be exclude -> will exclude?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -31,6 +31,9 @@
@JsonProperty
private boolean includePayloadAsDimensionInMetric = false;

@JsonProperty
private long maxPayloadSizeBytes = -1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FrankChen021 oh you are right. There is a class for unit tests. Sorry, a filter that excludes all test classes might be on in my intellij when I searched last time 😓

Copy link
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick update. I left a couple of trivial comments.

Copy link
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I left a suggestion on typos in docs.

docs/configuration/index.md Outdated Show resolved Hide resolved
@maytasm
Copy link
Contributor Author

maytasm commented Apr 12, 2021

@FrankChen021 I have changed the config to HumanReadableBytes. Do you have any other concern or blocker before merging this PR? Thanks!

Copy link
Contributor

@suneet-s suneet-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The configs, defaults and approach LGTM

I have not dug into tests, but since this design review, another committer can confirm this.

Co-authored-by: Jihoon Son <jihoonson@apache.org>
Copy link
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! +1 after CI.

@maytasm
Copy link
Contributor Author

maytasm commented Apr 14, 2021

@jihoonson @suneet-s @FrankChen021 Thanks all for the review.

@FrankChen021 Let me know if you have any followup comments or concerns.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants