Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support order-by on BYTES column #5213

Merged
merged 1 commit into from
Apr 8, 2020

Conversation

Jackie-Jiang
Copy link
Contributor

In order to support order-by on BYTES column everywhere, inside the system we should always use ByteArray (comparable) to store the BYTES value.
Currently BYTES value are stored as byte[], ByteArray or String in different places, which is very confusing and could cause casting errors.

Changes:

  • For DisctinctCount, fix the casting issue when ordering on BYTES column
  • For selection order-by, order BYTES column using ByteArray instead of String for performance improvement
  • Inside Record, always store BYTES as ByteArray for clarity and also performance improvement (avoid expensive deepEquals and deepHashCode)
  • On broker side, store BYTES column using ByteArray instead of String for performance improvement
  • On broker side, support type compatible merges for all selection queries

No format change on the query results.
TODO: We are still returning String for BYTES column when preserving the type. Consider changing it to byte[].

In order to support order-by on BYTES column everywhere, inside the system we should always use ByteArray (comparable) to store the BYTES value.
Currently BYTES value are stored as byte[], ByteArray or String in different places, which is very confusing and could cause casting errors.

Changes:
- For DisctinctCount, fix the casting issue when ordering on BYTES column
- For selection order-by, order BYTES column using ByteArray instead of String for performance improvement
- Inside Record, always store BYTES as ByteArray for clarity and also performance improvement (avoid expensive deepEquals and deepHashCode)
- On broker side, store BYTES column using ByteArray instead of String for performance improvement
- On broker side, support type compatible merges for all selection queries

No format change on the query results.
TODO: We are still returning String for BYTES column when preserving the type. Consider changing it to byte[].
Copy link
Contributor

@snleee snleee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise

break;
default:
case OBJECT:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is needed for intermediate aggregated object like HLL?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

switch (dataType) {
// Single-value column
case INT:
return ((Number) value).intValue();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From IntermediateResultsBlock class, you kind of directly casted types to an object instead of casting to Number. Can we use the same approach here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are trying to handle compatible types here (commented on line 514). This is required for cases where different segments have different schema (same column name, but different data types).
The long term solution should be let the query engine use the table schema instead of the schema inside each segments.

@Jackie-Jiang Jackie-Jiang merged commit e84a3af into apache:master Apr 8, 2020
@Jackie-Jiang Jackie-Jiang deleted the order_by_bytes branch April 8, 2020 03:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants