Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

input type validation for datasketches hll "build" aggregator factory #12131

Merged
merged 2 commits into from
Jan 11, 2022

Conversation

somu-imply
Copy link
Contributor

@somu-imply somu-imply commented Jan 7, 2022

…rect values

Say we are ingesting the following dimensions for a complex data type, say hyperUnique

"metricsSpec" : [
        { "type": "count", "name": "count" },
        { "type": "HLLSketchBuild", "name": "ipaddr_distinct_count", "fieldName": "ipaddr", "lgK": 4, "tgtHllType": "HLL_4", "round": true }

The ingestion will throw this error message

org.apache.druid.java.util.common.ISE: Invalid input type [COMPLEX<hyperUnique>] in ingestion for metric type HLLSketchBuild, in aggregate ipaddr_distinct_count for field name ipaddr

This PR has:

  • [ x] been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • [ x] been tested in a test Druid cluster.

@@ -71,6 +74,7 @@ protected byte getCacheTypeId()
public Aggregator factorize(final ColumnSelectorFactory columnSelectorFactory)
{
final ColumnValueSelector<Object> selector = columnSelectorFactory.makeColumnValueSelector(getFieldName());
validateInputs(columnSelectorFactory.getColumnCapabilities(getFieldName()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

factorizeBuffer and factorizeVector should also validate their inputs

also please add tests for this stuff

Comment on lines 125 to 131
throw new ISE(
"Invalid input type [%s] in ingestion for metric type %s, in aggregate %s for field name %s",
capabilities.asTypeString(),
HllSketchModule.BUILD_TYPE_NAME,
getName(),
getFieldName()
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method is used at both query and ingestion time, and shouldn't mention ingestion

Suggested change
throw new ISE(
"Invalid input type [%s] in ingestion for metric type %s, in aggregate %s for field name %s",
capabilities.asTypeString(),
HllSketchModule.BUILD_TYPE_NAME,
getName(),
getFieldName()
);
throw new ISE(
"Invalid input [%s] of type [%s] for [%s] aggregator [%s]",
getFieldName(),
capabilities.asTypeString(),
HllSketchModule.BUILD_TYPE_NAME,
getName()
);

@@ -114,4 +118,19 @@ public int getMaxIntermediateSize()
return HllSketch.getMaxUpdatableSerializationBytes(getLgK(), TgtHllType.valueOf(getTgtHllType()));
}

private void validateInputs(@Nullable ColumnCapabilities capabilities)
{
if (capabilities != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can use Types.is(capabilities, ValueType.COMPLEX) to get rid of outer if statement since it null checks

@somu-imply
Copy link
Contributor Author

Addressed the changes and added junit test case

@clintropolis clintropolis changed the title Ingestion will fail for HLLSketchBuild instead of creating with incor… input type validation for datasketches hll "build" aggregator factory Jan 11, 2022
@clintropolis clintropolis merged commit 08fea7a into apache:master Jan 11, 2022
@abhishekagarwal87 abhishekagarwal87 added this to the 0.23.0 milestone May 11, 2022
sachinsagare pushed a commit to pinterest/druid that referenced this pull request Oct 28, 2022
…apache#12131)

* Ingestion will fail for HLLSketchBuild instead of creating with incorrect values

* Addressing review comments for HLL< updated error message introduced test case

(cherry picked from commit 08fea7a)
sachinsagare pushed a commit to sachinsagare/druid that referenced this pull request Nov 2, 2022
…apache#12131)

* Ingestion will fail for HLLSketchBuild instead of creating with incorrect values

* Addressing review comments for HLL< updated error message introduced test case

(cherry picked from commit 08fea7a)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants