-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce a new configuration that skip storing audit payload if payload size exceed limit and skip storing null fields for audit payload #11078
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Perhaps a more readable name for the config would be druid.audit.manager.maxPayloadSizeBytes
? I don't think the exact behaviour (skipping the store in this case) needs to be a part of the config name.
I like that name better. Done |
Added 'Design Review' as it introduces a new user-facing configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach LGTM.
The defaults seem good as well 👍
docs/configuration/index.md
Outdated
@@ -338,6 +338,8 @@ Coordinator and Overlord log changes to lookups, segment load/drop rules, dynami | |||
|--------|-----------|-------| | |||
|`druid.audit.manager.auditHistoryMillis`|Default duration for querying audit history.|1 week| | |||
|`druid.audit.manager.includePayloadAsDimensionInMetric`|Boolean flag on whether to add `payload` column in service metric.|false| | |||
|`druid.audit.manager.maxPayloadSizeBytes`|The maximum size of audit payload, in bytes, to store in Druid's metadata store audit table. If the size of audit payload exceed this value, the audit log would be stored with a messaging indicating that the payload was omitted instead. Setting maxPayloadSizeBytes to -1 (default value) disables this check, meaning Druid will always store audit payload regardless of it's size.|-1| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This way we can avoid the update to the .spelling
file
|`druid.audit.manager.maxPayloadSizeBytes`|The maximum size of audit payload, in bytes, to store in Druid's metadata store audit table. If the size of audit payload exceed this value, the audit log would be stored with a messaging indicating that the payload was omitted instead. Setting maxPayloadSizeBytes to -1 (default value) disables this check, meaning Druid will always store audit payload regardless of it's size.|-1| | |
|`druid.audit.manager.maxPayloadSizeBytes`|The maximum size of audit payload, in bytes, to store in Druid's metadata store audit table. If the size of audit payload exceed this value, the audit log would be stored with a messaging indicating that the payload was omitted instead. Setting `maxPayloadSizeBytes` to -1 (default value) disables this check, meaning Druid will always store audit payload regardless of it's size.|-1| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -28,6 +28,8 @@ | |||
|
|||
public interface AuditManager | |||
{ | |||
String PAYLOAD_SKIP_MESSAGE = "Payload was not stored as the payload size exceed limit configured by druid.audit.manager.maxPayloadSizeBytes"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
javadocs please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -25,5 +25,6 @@ | |||
{ | |||
byte[] serialize(T obj); | |||
String serializeToString(T obj); | |||
String serializeSkipNullToString(T obj); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this part of the code so well, could you add a javadoc and talk about what the expectations of this function are and how they're different from serializeToString
above. Maybe javadocs will help clarify this for me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -50,6 +52,7 @@ public JacksonConfigManager( | |||
this.configManager = configManager; | |||
this.jsonMapper = jsonMapper; | |||
this.auditManager = auditManager; | |||
this.jsonMapperSkipNull = jsonMapper.copy().setSerializationInclusion(JsonInclude.Include.NON_NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this object should be bound in the JacksonModule
as a singleton.
Even though JacksonConfigManager is a singleton, if this binding changes in the future, we could inadvertently create an ObjectMapper per JacksonConfigManager instead of a singleton that can be shared across the process.
Otherwise, each time we create a new JacksonConfigMapper, we will create a new ObjectMapper, even though, jsonMapper is a singleton.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -31,6 +31,9 @@ | |||
@JsonProperty | |||
private boolean includePayloadAsDimensionInMetric = false; | |||
|
|||
@JsonProperty | |||
private long maxPayloadSizeBytes = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this property declared as type of HumanReadableBytes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HumanReadableBytes
won't work here. I am using the value -1
to disabled this check / unlimited max payload size. This is the same as config like maxBytesInMemory where setting maxBytesInMemory to -1 disables the check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does HumanReadableBytes
not work with -1
? I thought it does. @FrankChen021 I noticed we have no unit tests for HumanReadableBytes
. It would be nice to have some 🙂.
This is the same as config like maxBytesInMemory where setting maxBytesInMemory to -1 disables the check.
To me, the design of maxBytesInMemory
is not the best because what -1
or 0
mean is not intuitive which means you have to learn those by reading the doc. Similar idea for this, I think it's better to be HumanReadableBytes
. You can add another flag to enable/disable the check if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does HumanReadableBytes not work with -1?
HumanReadableBytes
does not work with negative value from config since I think there's no such scenario that a size could be negative. But it can accept negative value from code.
Here I think there're some ways:
- if -1 is allowed in configuration, we could remove that constraint in
HumanReadableBytes
, and applyHumanReadableBytesRange
annotation to make sure its value is in valid range - or, if we leave that constraint, user must delete this configuration to disable the check
- or, provide an independent flag to enable or disable the check as suggested by @jihoonson
IMO, 3rd is better.
I noticed we have no unit tests for HumanReadableBytes. It would be nice to have some 🙂.
HumanReadableBytes
has been covered by UT in HumanReadableBytesTest file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jihoonson Ah, I was wrong. HumanReadableBytes
does allow -1
. For example, CaffeineCacheConfig's sizeInBytes also uses HumanReadableBytes
and set default to -1
. Changed this to use HumanReadableBytes
. However, I kept the default value as -1
which is use for disabling the check. This is consistent with a few other configs we have in Druid right now. We can discuss on changing all of the configs and standardize disable/enable in a separate PR.
@FrankChen021 Negative values should not be set by the user. We can set it to -1
in the code for disabling the check and if user set it (which is to enable it), then it must be set to a positive number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@FrankChen021 oh you are right. There is a class for unit tests. Sorry, a filter that excludes all test classes might be on in my intellij when I searched last time 😓
I have a question. Users have no idea of the size of payload, are there any suggestions that tell users how to set this configuration ? Can we skip storing payload based on value of |
In another PR, I am adding an auto cleanup based on size limit for the druid_audit metadata table. This work similar to druid.indexer.logs.kill.* which periodically clean up task entries in the druid_tasks table. Basically in my other PR, user will be able to configured the max number of entries they want to store in the druid_audit metadata table, and Druid will automatically remove from oldest entry (FIFO) when the table entries exceed this size limit. Using both of these configuration (this PR and the other PR), Druid operator will be able to control the size of the audit metadata table. For example, if you want to limit your audit table metadata to 20GB (i.e. that is how much storage you have on your machine), and say you want keep |
* This String is the default message stored instead of the actual audit payload if the audit payload size | ||
* exceeded the maximum size limit configuration | ||
*/ | ||
String PAYLOAD_SKIP_MESSAGE = "Payload was not stored as the payload size exceed limit configured by druid.audit.manager.maxPayloadSizeBytes"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be better to include what was the limit when payload was skipped. Maybe something like Payload was not stored as its size exceeds the limit [%d] configured by druid.audit.manager.maxPayloadSizeBytes
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
* @param obj to be serialize | ||
* @param skipNull if true, then skip serialization of any field with null value. | ||
* This can be used to reduce the size of the resulting String. | ||
* @return String serialization of the input without fields with null values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this true only when skipNull
is true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. Updated.
import java.lang.annotation.RetentionPolicy; | ||
import java.lang.annotation.Target; | ||
|
||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Javadoc would be nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -31,6 +31,9 @@ | |||
@JsonProperty | |||
private boolean includePayloadAsDimensionInMetric = false; | |||
|
|||
@JsonProperty | |||
private long maxPayloadSizeBytes = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does HumanReadableBytes
not work with -1
? I thought it does. @FrankChen021 I noticed we have no unit tests for HumanReadableBytes
. It would be nice to have some 🙂.
This is the same as config like maxBytesInMemory where setting maxBytesInMemory to -1 disables the check.
To me, the design of maxBytesInMemory
is not the best because what -1
or 0
mean is not intuitive which means you have to learn those by reading the doc. Similar idea for this, I think it's better to be HumanReadableBytes
. You can add another flag to enable/disable the check if you want.
@@ -77,13 +83,14 @@ public JacksonConfigManager( | |||
.key(key) | |||
.type(key) | |||
.auditInfo(auditInfo) | |||
.payload(configSerde.serializeToString(val)) | |||
.payload(configSerde.serializeToString(val, true)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need a feature flag for this too. Otherwise, it will break existing user applications that work based on audit logs if any.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@Retention(RetentionPolicy.RUNTIME) | ||
@BindingAnnotation | ||
@PublicApi | ||
public @interface JsonOnlyNonNullValueSerialization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name seems quite too long to me 😅 How about JsonNonNull
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 👍
That makes sense. Do you have any plan to put these explanation in doc by your another PR ? |
Yep. This is the other PR #11084. Right now it only has the doc change and the code implementation is coming soon. Please let me know if you think any docs is missing. I will add a section on how both configs can be use together to control the size of the metadata tables. |
@@ -32,17 +33,21 @@ | |||
* This String is the default message stored instead of the actual audit payload if the audit payload size | |||
* exceeded the maximum size limit configuration | |||
*/ | |||
String PAYLOAD_SKIP_MESSAGE = "Payload was not stored as the payload size exceed limit configured by druid.audit.manager.maxPayloadSizeBytes"; | |||
String PAYLOAD_SKIP_MESSAGE = "Payload was not stored as its size exceeds the limit [%d] configured by druid.audit.manager.maxPayloadSizeBytes"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest PAYLOAD_SKIP_MSG_FORMAT
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -32,7 +33,10 @@ | |||
private boolean includePayloadAsDimensionInMetric = false; | |||
|
|||
@JsonProperty | |||
private long maxPayloadSizeBytes = -1; | |||
private HumanReadableBytes maxPayloadSizeBytes = HumanReadableBytes.valueOf(-1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maxPayloadSizeBytes
should not be a negative number than -1. It would be nice to have a sanity check that checks the lower and upper bounds (to avoid overflow) using @HumanReadableBytesRange
as @FrankChen021 suggested in #11078 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Also updated the docs.
docs/configuration/index.md
Outdated
|`druid.audit.manager.maxPayloadSizeBytes`|The maximum size of audit payload, in bytes, to store in Druid's metadata store audit table. If the size of audit payload exceed this value, the audit log would be stored with a messaging indicating that the payload was omitted instead. Setting `maxPayloadSizeBytes` to -1 (default value) disables this check, meaning Druid will always store audit payload regardless of it's size.|-1| | ||
|
||
|`druid.audit.manager.maxPayloadSizeBytes`|The maximum size of audit payload, in bytes, to store in Druid's metadata store audit table. If the size of audit payload exceed this value, the audit log would be stored with a messaging indicating that the payload was omitted instead. Setting `maxPayloadSizeBytes` to -1 (default value) disables this check, meaning Druid will always store audit payload regardless of it's size. Human-readable format is supported, see [here](human-readable-byte.md). |-1| | ||
|`druid.audit.manager.skipNullField`|If true, the audit payload stored in metadata store will be exclude any field with null value. |false| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will be exclude
-> will exclude
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -31,6 +31,9 @@ | |||
@JsonProperty | |||
private boolean includePayloadAsDimensionInMetric = false; | |||
|
|||
@JsonProperty | |||
private long maxPayloadSizeBytes = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@FrankChen021 oh you are right. There is a class for unit tests. Sorry, a filter that excludes all test classes might be on in my intellij when I searched last time 😓
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick update. I left a couple of trivial comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I left a suggestion on typos in docs.
@FrankChen021 I have changed the config to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The configs, defaults and approach LGTM
I have not dug into tests, but since this design review, another committer can confirm this.
Co-authored-by: Jihoon Son <jihoonson@apache.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! +1 after CI.
@jihoonson @suneet-s @FrankChen021 Thanks all for the review. @FrankChen021 Let me know if you have any followup comments or concerns. |
Introduce a new configuration that skip storing audit payload if payload size exceed limit and skip storing null fields for audit payload
Description
Audit log such as coordinator.compaction.config can get very large as datasources and their configurations churn. This can cause problems such as
This PR introduce a new configuration that skip storing audit payload if payload size exceed configured limit. The default value is set to -1 which disables this check, meaning Druid will always store audit payload regardless of it's size. This PR also skip storing null fields for audit payload in the effort of saving storage space as those are unnecessary.
This PR has: