support schemaless ingestion for transformed dimensions#9546
support schemaless ingestion for transformed dimensions#9546xvrl wants to merge 5 commits intoapache:masterfrom
Conversation
clintropolis
left a comment
There was a problem hiding this comment.
I think this seems like a reasonable to change the defaults so that transforms are automatically used in schema-less mode. I'm uncertain though, does this change currently have the implication that if a column is transformed and only meant to be used for an aggregator input then it will still be ingested as a dimension, even if the dimensions are explicitly defined? I guess if the aggregator has the same name as the transform it would overwrite it, but it might be unexpected if the column shows up when the user has explicitly defined a dimension list and a differently named aggregator. I wonder if the dimension schema being in 'auto discover' mode should be propagated into this so that it can apply this behavior only when schema discovery is active. That seems like a more involved fix though.
We should document this behavior somewhere, but not sure where is most appropriate. Perhaps https://github.com/apache/druid/blob/master/docs/ingestion/index.md#transformspec? I guess https://github.com/apache/druid/blob/master/docs/ingestion/schema-design.md#schema-less-dimensions could also work maybe?
|
@clintropolis agree there are some quirks, it's a bit unfortunate that dimensionExclusions don't get applied after transforms, otherwise we could use that as a workaround. One option would be to pass in an additional field to the transform spec to indicate which field should be added as dimensions. The behavior might still be odd since those new dimensions would still be added even if they are not part of the schema when one is explicitly defined. Maybe we can add this as an experimental/undocumented feature for now and see if we find a better solution long-term. |
| @JsonProperty("filter") final DimFilter filter, | ||
| @JsonProperty("transforms") final List<Transform> transforms | ||
| @JsonProperty("transforms") final List<Transform> transforms, | ||
| @JsonProperty("addDimensions") final List<String> addDimensions |
There was a problem hiding this comment.
This seems like it could work, without the downside of always including them as dimensions. The alternative I guess would be adding a isDimension to the Transform interface? I'm not sure which is better, though the way you've done it here is definitely less intrusive in terms of modifications.
|
The discussion in #3599 looks related. |
|
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions. |
|
This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
|
This pull request/issue is no longer marked as stale. |
|
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions. |
|
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions. |
|
This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
I might be missing some subtleties or edge cases, but I'm wondering if this is all it takes to support adding schemaless ingestion for dimensions created by transformspecs?
Of course this doesn't address the backwards incompatibility concerns. We can discuss whether we feel it is important to maintain the old behavior or not.I updated
TransformSpecto take a newaddDimensionsfield, to indicate which dimensions created by the transforms should be added to the schema. This new field should only be used for schema-less ingestion, since the added dimensions cannot be excluded bydimensionExclusions.Taking suggestions for a better name than
addDimensions:)fixes #7952