Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change arrayIngestMode default to array #16789

Merged

Conversation

clintropolis
Copy link
Member

@clintropolis clintropolis commented Jul 24, 2024

Description

The validation added in #15920 should make this relatively safe to do. Some ingestions that are based on the old 'mvd' mode syntax which implicitly changed ARRAY<STRING> types to STRING might fail, but there is a descriptive error message and the fix is pretty easy.

Release note

MSQ ingestion context flag arrayIngestMode now defaults to array instead of mvd. This means that SQL VARCHAR ARRAY types will no longer be implicitly translated and stored into VARCHAR columns, but will instead be stored as VARCHAR ARRAY. Additionally it permits other array types such as BIGINT ARRAY and DOUBLE ARRAY to be inserted with MSQ into their respective array column types (instead of failing as they do in mvd mode). To continue to store multi-value strings modify any insert/replace queries to wrap the array types with the ARRAY_TO_MV operator. Validation is in place that will prevent mixing VARCHAR and VARCHAR ARRAY columns in the same table, so any ingestions affected by this change will fail and provide a descriptive error message instead of silently doing something unexpected. See https://druid.apache.org/docs/latest/querying/multi-value-dimensions#sql-based-ingestion for additional information about how to ingest multi-value strings, and https://druid.apache.org/docs/latest/querying/arrays#sql-based-ingestion for additional information about arrays.

Additionally arrayIngestMode option of none has been removed. It was introduced prior to the table validation logic as a means for cluster operators to force query writers to explicitly set array or mvd on their query contexts, but provides little utility in Druid 31.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

@clintropolis clintropolis added Release Notes Area - Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Jul 24, 2024
as regular scalar strings. SQL `BIGINT ARRAY` and `DOUBLE ARRAY` cannot be loaded under `arrayIngestMode: mvd`. This
mode is not recommended and will be removed in a future release, but provided for backwards compatibility.

When `arrayIngestMode` is `none`, Druid throws an exception when trying to store any type of arrays. This mode is most
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have validations to prevent user error associated with this transition, I wonder if we should remove arrayIngestMode = none.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense to me to drop it since I can't think of a very strong use case to keep it around.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

validation unless the column is named under the `skipTypeVerification` context parameter. This parameter can be either
a comma-separated list of column names, or a JSON array in string form. This validation is done to prevent accidentally
mixing arrays and multi-value strings in the same column.
Arrays can be inserted with [SQL-based ingestion](../multi-stage-query/index.md) .
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extraneous space

@asdf2014 asdf2014 merged commit 5da69a0 into apache:master Jul 25, 2024
88 checks passed
@clintropolis clintropolis deleted the change-array-ingest-mode-default-to-array branch July 25, 2024 08:02
sreemanamala pushed a commit to sreemanamala/druid that referenced this pull request Aug 6, 2024
* change arrayIngestMode default to array

* remove arrayIngestMode flag option none

* fix space

* fix test
@kfaraz kfaraz added this to the 31.0.0 milestone Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area - Batch Ingestion Area - Documentation Area - Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Release Notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants