Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harmonization and bug-fixing for selector and filter behavior on unknown types. #9484

Merged
merged 4 commits into from
Mar 10, 2020

Conversation

gianm
Copy link
Contributor

@gianm gianm commented Mar 9, 2020

  • Migrate ValueMatcherColumnSelectorStrategy to newer ColumnProcessorFactory
    system, and set defaultType COMPLEX so unknown types can be dynamically matched.
  • Remove ValueGetters in favor of ColumnComparisonFilter doing its own thing.
  • Switch various methods to use convertObjectToX when casting to numbers, rather
    than ad-hoc and inconsistent logic.
  • Fix bug in RowBasedExpressionColumnValueSelector: isBindingArray should return
    true even for 0- or 1- element arrays.
  • Adjust various javadocs.

…own types.

- Migrate ValueMatcherColumnSelectorStrategy to newer ColumnProcessorFactory
  system, and set defaultType COMPLEX so unknown types can be dynamically matched.
- Remove ValueGetters in favor of ColumnComparisonFilter doing its own thing.
- Switch various methods to use convertObjectToX when casting to numbers, rather
  than ad-hoc and inconsistent logic.
- Fix bug in RowBasedExpressionColumnValueSelector: isBindingArray should return
  true even for 0- or 1- element arrays.
- Adjust various javadocs.
@gianm
Copy link
Contributor Author

gianm commented Mar 9, 2020

Note to reviewers — some of the bugs fixed here aren't tested by existing tests, but I plan to add tests for them in a future patch that also adds a RowBasedStorageAdapter. That's because the simplest & best way to test them is to add a row-based cursor to BaseFilterTest, which won't exist until the future patch.

Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤘

@@ -95,7 +95,7 @@ private boolean isBindingArray(String x)
{
Object binding = bindings.get(x);
if (binding != null) {
if (binding instanceof String[] && ((String[]) binding).length > 1) {
if (binding instanceof String[]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

};
}

public static ValueMatcher makeLongValueMatcher(final BaseLongColumnValueSelector selector, final String value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: missing javadoc (since almost all the others have it)

};
}

public static ValueMatcher makeLongValueMatcher(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same nit re javadoc

@gianm
Copy link
Contributor Author

gianm commented Mar 9, 2020

Pushed some updates to address test failures in InputRowSerdeTest. I had to add a throwParseExceptions option to the RowBasedColumnSelectorFactory, since it turns out some users want that behavior and some don't.

@clintropolis
Copy link
Member

clintropolis commented Mar 10, 2020

Tagged release notes because this PR changes the behavior of complex metric aggregation at ingestion time when SQL compatible null handling is disabled (the default mode) to now aggregate the default 0 values for rows instead of skipping them. This change is for the better imo since it makes things symmetrical to as if you ingested the raw data and built the sketch at query time, but it is different so worth calling out, and you can see the effects in some of the test changes in this PR.

@gianm
Copy link
Contributor Author

gianm commented Mar 10, 2020

Tagged release notes because this PR changes the behavior of complex metric aggregation at ingestion time when SQL compatible null handling is disabled (the default mode) to now aggregate the default 0 values for rows instead of skipping them. This change is for the better imo since it makes things symmetrical to as if you ingested the raw data and built the sketch at query time, but it is different so worth calling out, and you can see the effects in some of the test changes in this PR.

Thanks for pointing that out. Yes, I agree, it is for the better since it makes the ingest-time behavior and query-time behavior the same. This is part of the promise of Druid rollup in the first place (you can move aggregations to ingest time if you want).

Btw, this patch also ends up making ingest-time transforms and filters behave more consistently with query-time ones.

The reason is that all this ingest-time stuff runs in unknown-type mode, which til now had various inconsistencies with known-type mode (which is used at query time).

@gianm gianm merged commit c6c2282 into apache:master Mar 10, 2020
@gianm gianm deleted the harmonize-untyped branch March 10, 2020 14:16
@jihoonson jihoonson added this to the 0.18.0 milestone Mar 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants