Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Column name in parse exceptions #16529

Merged
merged 25 commits into from
Jun 25, 2024

Conversation

TSFenwick
Copy link
Contributor

@TSFenwick TSFenwick commented May 31, 2024

Makes the column name or dimension name be more apparent in parse exceptions

Description

Often times parse exception occur that end up being user visible, but they don't offer any information on how to remediate them. This change adds more information to allow a user to fix the actual issue.

For example in the report of a streaming job the details changes from

      {
        "input": "{isRobot=false, channel=#sv.wikipedia, timestamp=2024-05-30T21:28:18.343Z, flags=M, isUnpatrolled=false, page=Fredrik Johan Leth, diffUrl=https://sv.wikipedia.org/w/index.php?diff=36127609&oldid=34977802, added=41, comment=tog bort [[Kategori:KDDO1gr]]; lade till [[Kategori:Kommendörer av första graden av Dannebrogorden]] med [[Wikipedia:Hotcat|Hotcat]], commentLength=132, isNew=false, isMinor=true, delta=, isAnonymous=false, user=Portunes, deltaBucket=0.0, deleted=0, namespace=Main}",
        "errorType": "processedWithError",
        "details": [
          "could not convert value [] to long"
        ],
        "timeOfExceptionMillis": 1717181630058
      }

to

      {
        "input": "{isRobot=false, channel=#sv.wikipedia, timestamp=2024-05-30T21:28:18.343Z, flags=M, isUnpatrolled=false, page=Fredrik Johan Leth, diffUrl=https://sv.wikipedia.org/w/index.php?diff=36127609&oldid=34977802, added=41, comment=tog bort [[Kategori:KDDO1gr]]; lade till [[Kategori:Kommendörer av första graden av Dannebrogorden]] med [[Wikipedia:Hotcat|Hotcat]], commentLength=132, isNew=false, isMinor=true, delta=, isAnonymous=false, user=Portunes, deltaBucket=0.0, deleted=0, namespace=Main}",
        "errorType": "processedWithError",
        "details": [
          "could not convert value [] to long for dimension [delta]"
        ],
        "timeOfExceptionMillis": 1717181630058
      }

One thing i thought of while doing this is I think it would make sense to make the ParseException actually contain another parameter to have the dimension or key that failed on parsing rather than just the input. But i think that can be left to another PR.

Release note


Key changed/added classes in this PR
  • processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@TSFenwick TSFenwick marked this pull request as ready for review June 5, 2024 22:20
Copy link
Contributor

@abhishekrb19 abhishekrb19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall. I left a few comments related to error messages and code formatting, and a question related to Nullable annotations.

Thanks for the QoL improvement @TSFenwick!

if (reportParseExceptions && ret == null) {
throw new ParseException((String) valObj, "could not convert value [%s] to float", valObj);
@Nullable
public static Float convertObjectToFloat(@Nullable Object valObj, @Nullable String fieldName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question on the @Nullable annotation: this new overloaded function appears to be called only from SqlResults similar to convertObjectToLong() where the fieldName doesn't have a @Nullable annotation. I don't think fieldName can be nullable in this context, right? If that's correct, please remove the annotation here; otherwise, update convertObjectToLong's signature above to reflect that

Copy link
Contributor

@abhishekrb19 abhishekrb19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 after CI!

@@ -52,6 +52,16 @@ public EncodedKeyComponent<Float> processRowValsToUnsortedEncodedKeyComponent(@N
return new EncodedKeyComponent<>(f, Float.BYTES);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this method override, just return processRowValsToUnsortedEncodedKeyComponent(dimValues, reportParseExceptions, null) to avoid duplication of code

@Override
public EncodedKeyComponent<Float> processRowValsToUnsortedEncodedKeyComponent(@Nullable Object dimValues, boolean reportParseExceptions, @Nullable String dimension)
{
Float l = DimensionHandlerUtils.convertObjectToFloat(dimValues, reportParseExceptions, dimension);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: variable rename for float: Float l -> Float f

@@ -137,6 +137,15 @@ EncodedKeyComponent<EncodedKeyComponentType> processRowValsToUnsortedEncodedKeyC
boolean reportParseExceptions
);

default EncodedKeyComponent<EncodedKeyComponentType> processRowValsToUnsortedEncodedKeyComponent(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is an interface method, it could use a brief javadoc:

Suggested change
default EncodedKeyComponent<EncodedKeyComponentType> processRowValsToUnsortedEncodedKeyComponent(
/**
* Similar to {@link #processRowValsToUnsortedEncodedKeyComponent}, but with an optional
* dimension parameter to provide more context for error handling where applicable.
*/
default EncodedKeyComponent<EncodedKeyComponentType> processRowValsToUnsortedEncodedKeyComponent(

@@ -1634,8 +1634,8 @@ public void testMultipleParseExceptionsSuccess() throws Exception

List<String> expectedMessages = Arrays.asList(
"Unable to parse value[notanumber] for field[met1]",
"could not convert value [notanumber] to float",
"could not convert value [notanumber] to long",
"could not convert value [notanumber] to float for dimension [dimFloat]",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think CI is failing because these test expectations need to be updated?

@@ -608,7 +608,7 @@ IncrementalIndexRowResult toIncrementalIndexRow(InputRow row)
Object dimsKey = null;
try {
final EncodedKeyComponent<?> encodedKeyComponent
= indexer.processRowValsToUnsortedEncodedKeyComponent(row.getRaw(dimension), true);
= indexer.processRowValsToUnsortedEncodedKeyComponent(row.getRaw(dimension), true, dimension);
Copy link
Member

@clintropolis clintropolis Jun 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, do we really need to pass this as an argument for every value of every row? maybe it would be better to just initialize the dimension indexers with their name instead.

Alternatively, we are already catching ParseException, why not add a way to decorate it with an additional column name.

I guess i'm just being cautious about small overhead of additional arguments and the default method impl adding up since this is a pretty hot function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clintropolis I think i could do that. Lemme get this current PR passing. then i'll attempt that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clintropolis i was able to initialize the dimensionIndexers with the name

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, that seems better.

thinking a bit more on it i think having a way to decorate an existing ParseException with a column name might actually have been a little cleaner and even less disruptive since we are already catching them in a place with the column name, https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java#L616 without being dependent on dimension indexer implementation, but this way seems ok too

case STRING:
return convertObjectToString(obj);
case ARRAY:
return coerceToObjectArrayWithElementCoercionFunction(
obj,
x -> DimensionHandlerUtils.convertObjectToType(x, type.getElementType())
x -> DimensionHandlerUtils.convertObjectToType(x, type.getElementType(), false, fieldName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should probably pass reportParseExceptions instead of false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. I set it to false to preserve how it was working before. but it wouldn't hurt to have it pass reportParseExceptions instead

@@ -608,7 +608,7 @@ IncrementalIndexRowResult toIncrementalIndexRow(InputRow row)
Object dimsKey = null;
try {
final EncodedKeyComponent<?> encodedKeyComponent
= indexer.processRowValsToUnsortedEncodedKeyComponent(row.getRaw(dimension), true);
= indexer.processRowValsToUnsortedEncodedKeyComponent(row.getRaw(dimension), true, dimension);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, that seems better.

thinking a bit more on it i think having a way to decorate an existing ParseException with a column name might actually have been a little cleaner and even less disruptive since we are already catching them in a place with the column name, https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java#L616 without being dependent on dimension indexer implementation, but this way seems ok too

@@ -155,7 +155,7 @@ public EncodedKeyComponent<int[]> processRowValsToUnsortedEncodedKeyComponent(@N
/**
* Estimates size of the given key component.
* <p>
* Deprecated method. Use {@link #processRowValsToUnsortedEncodedKeyComponent(Object, boolean)}
* Deprecated method. Use {@link #processRowValsToUnsortedEncodedKeyComponent(Object, boolean, String)}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs reverted i think

final String message;
if (fieldName != null) {
message = StringUtils.nonStrictFormat(
"Could not convert value [%s] to float for dimension [%s].",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The exception message in the if/else block seem to differ by for dimension [%s]. Instead of duplicating the message does it make sense to have a single template?

@TSFenwick TSFenwick force-pushed the column-name-in-parse-exceptions branch from 8f6d8f9 to bfecd95 Compare June 25, 2024 17:01
@abhishekrb19 abhishekrb19 merged commit 52c9929 into apache:master Jun 25, 2024
88 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants