minor: change maxStringLength default#19198
Conversation
abhishekrb19
left a comment
There was a problem hiding this comment.
Thanks for the follow up @jaykanakiya! One comment on the validation.
| ) | ||
| { | ||
| super(name, multiValueHandling, createBitmapIndex == null ? DEFAULT_CREATE_BITMAP_INDEX : createBitmapIndex); | ||
| this.maxStringLength = maxStringLength != null && maxStringLength > 0 ? maxStringLength : getDefaultMaxStringLength(); |
There was a problem hiding this comment.
Should this validation be updated to allow 0 values now: if (maxStringLength != null && maxStringLength < 0)
There was a problem hiding this comment.
Updated it to allow 0.
docs/configuration/index.md
Outdated
| |`druid.indexer.task.tmpStorageBytesPerTask`|Maximum number of bytes per task to be used to store temporary files on disk. This config is generally intended for internal usage. Attempts to set it are very likely to be overwritten by the TaskRunner that executes the task, so be sure of what you expect to happen before directly adjusting this configuration parameter. The config is documented here primarily to provide an understanding of what it means if/when someone sees that it has been set. A value of -1 disables this limit. |-1| | ||
| |`druid.indexer.server.maxChatRequests`|Maximum number of concurrent requests served by a task's chat handler. Set to 0 to disable limiting.|0| | ||
| |`druid.indexing.formats.maxStringLength`|Maximum number of characters to store per string dimension value. Longer values are truncated during ingestion. Does not apply to multi-value string dimensions. Set to 0 to disable. Can be overridden per-dimension using `maxStringLength` in the [dimension object](../ingestion/ingestion-spec.md#dimension-objects).|0 (no truncation)| | ||
| |`druid.indexing.formats.maxStringLength`|Maximum number of characters to store per string dimension value. Longer values are truncated during ingestion. Does not apply to multi-value string dimensions. Can be overridden per-dimension using `maxStringLength` in the [dimension object](../ingestion/ingestion-spec.md#dimension-objects).|`null` (no truncation)| |
There was a problem hiding this comment.
| |`druid.indexing.formats.maxStringLength`|Maximum number of characters to store per string dimension value. Longer values are truncated during ingestion. Does not apply to multi-value string dimensions. Can be overridden per-dimension using `maxStringLength` in the [dimension object](../ingestion/ingestion-spec.md#dimension-objects).|`null` (no truncation)| | |
| |`druid.indexing.formats.maxStringLength`|Maximum number of characters to store per string dimension value. Longer values are truncated during ingestion. Does not apply to multi-value string dimensions. Can be overridden per-dimension using `maxStringLength` in the [dimension object](../ingestion/ingestion-spec.md#dimension-objects). Value must be >= 0. |`null` (no truncation)| |
docs/ingestion/ingestion-spec.md
Outdated
| | createBitmapIndex | For `string` typed dimensions, whether or not bitmap indexes should be created for the column in generated segments. Creating a bitmap index requires more storage, but speeds up certain kinds of filtering (especially equality and prefix filtering). Only supported for `string` typed dimensions. | `true` | | ||
| | multiValueHandling | For `string` typed dimensions, specifies the type of handling for [multi-value fields](../querying/multi-value-dimensions.md). Possible values are `array` (ingest string arrays as-is), `sorted_array` (sort string arrays during ingestion), and `sorted_set` (sort and de-duplicate string arrays during ingestion). This parameter is ignored for types other than `string`. | `sorted_array` | | ||
| | maxStringLength | For `string` typed dimensions, the maximum number of characters to store per value. Longer values are truncated during ingestion. Does not apply to multi-value string dimensions. Set to 0 to disable. Overrides the global [`druid.indexing.formats.maxStringLength`](../configuration/index.md#additional-peon-configuration) property. | `0` (no truncation) | | ||
| | maxStringLength | For `string` typed dimensions, the maximum number of characters to store per value. Longer values are truncated during ingestion. Does not apply to multi-value string dimensions. Overrides the global [`druid.indexing.formats.maxStringLength`](../configuration/index.md#additional-peon-configuration) property. | `null` (no truncation) | |
There was a problem hiding this comment.
| | maxStringLength | For `string` typed dimensions, the maximum number of characters to store per value. Longer values are truncated during ingestion. Does not apply to multi-value string dimensions. Overrides the global [`druid.indexing.formats.maxStringLength`](../configuration/index.md#additional-peon-configuration) property. | `null` (no truncation) | | |
| | maxStringLength | For `string` typed dimensions, the maximum number of characters to store per value. Longer values are truncated during ingestion. Does not apply to multi-value string dimensions. Overrides the global [`druid.indexing.formats.maxStringLength`](../configuration/index.md#additional-peon-configuration) property. Value must be >= 0.| `null` (no truncation) | |
There was a problem hiding this comment.
Updated the docs.
processing/src/main/java/org/apache/druid/segment/StringDimensionIndexer.java
Outdated
Show resolved
Hide resolved
| { | ||
| super(name, multiValueHandling, createBitmapIndex == null ? DEFAULT_CREATE_BITMAP_INDEX : createBitmapIndex); | ||
| this.maxStringLength = maxStringLength != null && maxStringLength > 0 ? maxStringLength : getDefaultMaxStringLength(); | ||
| this.maxStringLength = maxStringLength != null ? maxStringLength : getDefaultMaxStringLength(); |
There was a problem hiding this comment.
It will be good to call the same validation function for the per-dimension override maxStringLength as well. I think negative values will be accepted, but cause undefined behavior when truncation is applied: value.substring(0, maxStringLength)?
Consider adding a test in StringDimensionSchemaTest for this case
There was a problem hiding this comment.
Good point, added some validation and a test for this.
There was a problem hiding this comment.
Since 0 is supported, the error message should be "non-negative integer" or "Value must be >= 0"
There was a problem hiding this comment.
Changed it to non negative integer.
| * Truncates the value to the first {@link #maxStringLength} characters if configured, otherwise returns it as-is. | ||
| */ | ||
| @Nullable | ||
| private String truncateIfNeeded(String value) |
There was a problem hiding this comment.
Intellij is flagging this for me:
| private String truncateIfNeeded(String value) | |
| private String truncateIfNeeded(@Nullable String value) |
| private static Integer validateMaxStringLength(@Nullable Integer maxStringLength) | ||
| { | ||
| if (maxStringLength != null && maxStringLength < 0) { | ||
| throw new IllegalArgumentException("maxStringLength must be >= 0, got " + maxStringLength); |
There was a problem hiding this comment.
Please use DruidException and include column name in the error message.
There was a problem hiding this comment.
Updated to use DruidException and also log column name.
Follow up to #19146
Description
Changes default value of maxStringLength to
null.Release note
Updated the default value for druid.indexing.formats.maxStringLength to
nullfrom 0. This change also included documentation update for the same.Key changed/added classes in this PR
StringDimensionIndexerStringDimensionSchemaBuiltInTypesModule.javaThis PR has: