Skip to content

support schemaless ingestion for transformed dimensions#9546

Closed
xvrl wants to merge 5 commits intoapache:masterfrom
xvrl:transformspec-add-dims
Closed

support schemaless ingestion for transformed dimensions#9546
xvrl wants to merge 5 commits intoapache:masterfrom
xvrl:transformspec-add-dims

Conversation

@xvrl
Copy link
Member

@xvrl xvrl commented Mar 20, 2020

I might be missing some subtleties or edge cases, but I'm wondering if this is all it takes to support adding schemaless ingestion for dimensions created by transformspecs?

Of course this doesn't address the backwards incompatibility concerns. We can discuss whether we feel it is important to maintain the old behavior or not.

I updated TransformSpec to take a new addDimensions field, to indicate which dimensions created by the transforms should be added to the schema. This new field should only be used for schema-less ingestion, since the added dimensions cannot be excluded by dimensionExclusions.

Taking suggestions for a better name than addDimensions :)

fixes #7952

Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this seems like a reasonable to change the defaults so that transforms are automatically used in schema-less mode. I'm uncertain though, does this change currently have the implication that if a column is transformed and only meant to be used for an aggregator input then it will still be ingested as a dimension, even if the dimensions are explicitly defined? I guess if the aggregator has the same name as the transform it would overwrite it, but it might be unexpected if the column shows up when the user has explicitly defined a dimension list and a differently named aggregator. I wonder if the dimension schema being in 'auto discover' mode should be propagated into this so that it can apply this behavior only when schema discovery is active. That seems like a more involved fix though.

We should document this behavior somewhere, but not sure where is most appropriate. Perhaps https://github.com/apache/druid/blob/master/docs/ingestion/index.md#transformspec? I guess https://github.com/apache/druid/blob/master/docs/ingestion/schema-design.md#schema-less-dimensions could also work maybe?

@xvrl
Copy link
Member Author

xvrl commented Mar 24, 2020

@clintropolis agree there are some quirks, it's a bit unfortunate that dimensionExclusions don't get applied after transforms, otherwise we could use that as a workaround. One option would be to pass in an additional field to the transform spec to indicate which field should be added as dimensions.

The behavior might still be odd since those new dimensions would still be added even if they are not part of the schema when one is explicitly defined. Maybe we can add this as an experimental/undocumented feature for now and see if we find a better solution long-term.

@xvrl xvrl removed the Incompatible label Mar 25, 2020
@JsonProperty("filter") final DimFilter filter,
@JsonProperty("transforms") final List<Transform> transforms
@JsonProperty("transforms") final List<Transform> transforms,
@JsonProperty("addDimensions") final List<String> addDimensions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it could work, without the downside of always including them as dimensions. The alternative I guess would be adding a isDimension to the Transform interface? I'm not sure which is better, though the way you've done it here is definitely less intrusive in terms of modifications.

@jihoonson
Copy link
Contributor

The discussion in #3599 looks related.

@stale
Copy link

stale bot commented May 30, 2020

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label May 30, 2020
@stale
Copy link

stale bot commented Jun 27, 2020

This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@stale stale bot closed this Jun 27, 2020
@jihoonson jihoonson reopened this Jun 27, 2020
@stale
Copy link

stale bot commented Jun 27, 2020

This pull request/issue is no longer marked as stale.

@stale stale bot removed the stale label Jun 27, 2020
@stale
Copy link

stale bot commented Aug 29, 2020

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Aug 29, 2020
@jihoonson jihoonson removed the stale label Aug 30, 2020
@stale
Copy link

stale bot commented Nov 5, 2020

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Nov 5, 2020
@stale
Copy link

stale bot commented Dec 25, 2020

This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@stale stale bot closed this Dec 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Transform specs are ignored if dimension auto detection is used

3 participants