-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support order-by on BYTES column #5213
Conversation
In order to support order-by on BYTES column everywhere, inside the system we should always use ByteArray (comparable) to store the BYTES value. Currently BYTES value are stored as byte[], ByteArray or String in different places, which is very confusing and could cause casting errors. Changes: - For DisctinctCount, fix the casting issue when ordering on BYTES column - For selection order-by, order BYTES column using ByteArray instead of String for performance improvement - Inside Record, always store BYTES as ByteArray for clarity and also performance improvement (avoid expensive deepEquals and deepHashCode) - On broker side, store BYTES column using ByteArray instead of String for performance improvement - On broker side, support type compatible merges for all selection queries No format change on the query results. TODO: We are still returning String for BYTES column when preserving the type. Consider changing it to byte[].
6999a95
to
b5e867a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM otherwise
break; | ||
default: | ||
case OBJECT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is needed for intermediate aggregated object like HLL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
switch (dataType) { | ||
// Single-value column | ||
case INT: | ||
return ((Number) value).intValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From IntermediateResultsBlock
class, you kind of directly casted types to an object instead of casting to Number
. Can we use the same approach here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are trying to handle compatible types here (commented on line 514). This is required for cases where different segments have different schema (same column name, but different data types).
The long term solution should be let the query engine use the table schema instead of the schema inside each segments.
In order to support order-by on BYTES column everywhere, inside the system we should always use ByteArray (comparable) to store the BYTES value.
Currently BYTES value are stored as byte[], ByteArray or String in different places, which is very confusing and could cause casting errors.
Changes:
No format change on the query results.
TODO: We are still returning String for BYTES column when preserving the type. Consider changing it to byte[].