-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Column name in parse exceptions #16529
Column name in parse exceptions #16529
Conversation
processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me overall. I left a few comments related to error messages and code formatting, and a question related to Nullable
annotations.
Thanks for the QoL improvement @TSFenwick!
processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
Outdated
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
Outdated
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
Outdated
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
Outdated
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
Outdated
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
Outdated
Show resolved
Hide resolved
if (reportParseExceptions && ret == null) { | ||
throw new ParseException((String) valObj, "could not convert value [%s] to float", valObj); | ||
@Nullable | ||
public static Float convertObjectToFloat(@Nullable Object valObj, @Nullable String fieldName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question on the @Nullable
annotation: this new overloaded function appears to be called only from SqlResults
similar to convertObjectToLong()
where the fieldName doesn't have a @Nullable
annotation. I don't think fieldName
can be nullable in this context, right? If that's correct, please remove the annotation here; otherwise, update convertObjectToLong
's signature above to reflect that
processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
Outdated
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 after CI!
@@ -52,6 +52,16 @@ public EncodedKeyComponent<Float> processRowValsToUnsortedEncodedKeyComponent(@N | |||
return new EncodedKeyComponent<>(f, Float.BYTES); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this method override, just return processRowValsToUnsortedEncodedKeyComponent(dimValues, reportParseExceptions, null)
to avoid duplication of code
@Override | ||
public EncodedKeyComponent<Float> processRowValsToUnsortedEncodedKeyComponent(@Nullable Object dimValues, boolean reportParseExceptions, @Nullable String dimension) | ||
{ | ||
Float l = DimensionHandlerUtils.convertObjectToFloat(dimValues, reportParseExceptions, dimension); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit: variable rename for float: Float l
-> Float f
@@ -137,6 +137,15 @@ EncodedKeyComponent<EncodedKeyComponentType> processRowValsToUnsortedEncodedKeyC | |||
boolean reportParseExceptions | |||
); | |||
|
|||
default EncodedKeyComponent<EncodedKeyComponentType> processRowValsToUnsortedEncodedKeyComponent( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is an interface method, it could use a brief javadoc:
default EncodedKeyComponent<EncodedKeyComponentType> processRowValsToUnsortedEncodedKeyComponent( | |
/** | |
* Similar to {@link #processRowValsToUnsortedEncodedKeyComponent}, but with an optional | |
* dimension parameter to provide more context for error handling where applicable. | |
*/ | |
default EncodedKeyComponent<EncodedKeyComponentType> processRowValsToUnsortedEncodedKeyComponent( |
@@ -1634,8 +1634,8 @@ public void testMultipleParseExceptionsSuccess() throws Exception | |||
|
|||
List<String> expectedMessages = Arrays.asList( | |||
"Unable to parse value[notanumber] for field[met1]", | |||
"could not convert value [notanumber] to float", | |||
"could not convert value [notanumber] to long", | |||
"could not convert value [notanumber] to float for dimension [dimFloat]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think CI is failing because these test expectations need to be updated?
@@ -608,7 +608,7 @@ IncrementalIndexRowResult toIncrementalIndexRow(InputRow row) | |||
Object dimsKey = null; | |||
try { | |||
final EncodedKeyComponent<?> encodedKeyComponent | |||
= indexer.processRowValsToUnsortedEncodedKeyComponent(row.getRaw(dimension), true); | |||
= indexer.processRowValsToUnsortedEncodedKeyComponent(row.getRaw(dimension), true, dimension); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, do we really need to pass this as an argument for every value of every row? maybe it would be better to just initialize the dimension indexers with their name instead.
Alternatively, we are already catching ParseException
, why not add a way to decorate it with an additional column name.
I guess i'm just being cautious about small overhead of additional arguments and the default method impl adding up since this is a pretty hot function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clintropolis I think i could do that. Lemme get this current PR passing. then i'll attempt that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clintropolis i was able to initialize the dimensionIndexers with the name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, that seems better.
thinking a bit more on it i think having a way to decorate an existing ParseException
with a column name might actually have been a little cleaner and even less disruptive since we are already catching them in a place with the column name, https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java#L616 without being dependent on dimension indexer implementation, but this way seems ok too
case STRING: | ||
return convertObjectToString(obj); | ||
case ARRAY: | ||
return coerceToObjectArrayWithElementCoercionFunction( | ||
obj, | ||
x -> DimensionHandlerUtils.convertObjectToType(x, type.getElementType()) | ||
x -> DimensionHandlerUtils.convertObjectToType(x, type.getElementType(), false, fieldName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should probably pass reportParseExceptions
instead of false
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah. I set it to false
to preserve how it was working before. but it wouldn't hurt to have it pass reportParseExceptions
instead
@@ -608,7 +608,7 @@ IncrementalIndexRowResult toIncrementalIndexRow(InputRow row) | |||
Object dimsKey = null; | |||
try { | |||
final EncodedKeyComponent<?> encodedKeyComponent | |||
= indexer.processRowValsToUnsortedEncodedKeyComponent(row.getRaw(dimension), true); | |||
= indexer.processRowValsToUnsortedEncodedKeyComponent(row.getRaw(dimension), true, dimension); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, that seems better.
thinking a bit more on it i think having a way to decorate an existing ParseException
with a column name might actually have been a little cleaner and even less disruptive since we are already catching them in a place with the column name, https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java#L616 without being dependent on dimension indexer implementation, but this way seems ok too
@@ -155,7 +155,7 @@ public EncodedKeyComponent<int[]> processRowValsToUnsortedEncodedKeyComponent(@N | |||
/** | |||
* Estimates size of the given key component. | |||
* <p> | |||
* Deprecated method. Use {@link #processRowValsToUnsortedEncodedKeyComponent(Object, boolean)} | |||
* Deprecated method. Use {@link #processRowValsToUnsortedEncodedKeyComponent(Object, boolean, String)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs reverted i think
final String message; | ||
if (fieldName != null) { | ||
message = StringUtils.nonStrictFormat( | ||
"Could not convert value [%s] to float for dimension [%s].", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The exception message in the if/else block seem to differ by for dimension [%s]
. Instead of duplicating the message does it make sense to have a single template?
8f6d8f9
to
bfecd95
Compare
Makes the column name or dimension name be more apparent in parse exceptions
Description
Often times parse exception occur that end up being user visible, but they don't offer any information on how to remediate them. This change adds more information to allow a user to fix the actual issue.
For example in the report of a streaming job the
details
changes fromto
One thing i thought of while doing this is I think it would make sense to make the ParseException actually contain another parameter to have the dimension or key that failed on parsing rather than just the input. But i think that can be left to another PR.
Release note
Key changed/added classes in this PR
processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
This PR has: