Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How Druid should handle situation when string dimension column is queried as numeric? #4888

Closed
leventov opened this issue Oct 2, 2017 · 5 comments · Fixed by #8428
Closed

Comments

@leventov
Copy link
Member

leventov commented Oct 2, 2017

  1. Consider values are zeros, do not try to parse string values. Seems this is the current behaviour
  2. Try to parse string values, fail with error if they are not parseable, or null, or multiple-value
  3. Try to parse string values, and consider values are zero if they are not parseable, or null, or multiple-value
  4. Do not try to parse, always fail with error

How this behaviour is chosen?

  1. Not chosen, always the same (current)
  2. Chosen on the query level
  3. Could be set in the node configurations
@gianm
Copy link
Contributor

gianm commented Oct 2, 2017

One current inconsistency is that with expression-based column selectors (anything that goes through Parser/Expr) the behavior is (3). See IdentifierExpr + how it handles strings that are treated as numbers. But with direct column selectors the behavior is (1). In particular this means that e.g. a longSum aggregator behaves differently if it's "fieldName" : "x" vs. "expression" : "x" even though you might think they should behave the same.

Although, on the third hand, dimensions behave differently from aggregators. They act more like (3). There's some code that sets up the "proper" column selector that matches the type in the segment, and groups using that seelctor on a per-segment basis, then uses the user-specified "outputType" only for merging results. So you can group on a string column with "outputType" : "long" and the strings will be parsed as numbers during the merging phase. This is mostly done through ColumnSelectorStrategyFactory impls in the groupBy and topN engines.

IMO, making (3) consistent across the board would be good. I don't think (2) or (4) are good, it's not in the spirit of Druid being a generally "loose schema" data store. IMO (3) is preferable to (1) since it enables easy schema changes from string -> numeric type and vice versa. This was the motivator for making groupBY/topN behave the way they do: it was introduced along with the numeric dimension feature.

@himanshug
Copy link
Contributor

consistently doing (3) makes sense to me too specially to behave well with schema migration.

@stale
Copy link

stale bot commented Jun 21, 2019

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Jun 21, 2019
@stale
Copy link

stale bot commented Jul 5, 2019

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.

@stale stale bot closed this as completed Jul 5, 2019
@leventov leventov removed the stale label Jul 30, 2019
@leventov leventov reopened this Jul 30, 2019
@himanshug
Copy link
Contributor

thanks to stalebot ,I had forgotten about this.

It turns out I have now gotten use cases where at least the aggregators need to have behavior (3) . I had created an issue to describe my thoughts in #8148 and I think I'm gonna make a PR soonish to make that change as using expressions with multi value columns is becoming tricky for the service that generates queries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants