New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NIFI-4857: Support String<->byte[] conversion #2570
Conversation
a1f9d46
to
206f069
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mattyb149 this looks good overall. I think we can simply things a bit, though, by avoiding support for byte[] and ByteBuffer in the Record and instead just support use of byte[]. I also think we need to reconsider how we handle some of the RuntimeExceptions and conversion failures. I commented inline on specific areas that I noticed.
// Either an object array or a String to be converted to byte[] or a ByteBuffer (from Avro, e.g.) | ||
&& (value instanceof Object[] | ||
|| (value instanceof String && RecordFieldType.BYTE.getDataType().equals(elementDataType)) | ||
|| value instanceof ByteBuffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be supporting ByteBuffer here, just byte[]. The more we allow for, the more complex this gets and the more error-prone and less consistent it will become. While Avro may use ByteBuffers, when we use an Avro Reader to create a Record, we should be doing the conversion there from ByteBuffer to byte[].
return new String((byte[])value, charset); | ||
} | ||
|
||
if (value instanceof ByteBuffer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, I think we should avoid the use of ByteBuffer here
if (value instanceof byte[]) { | ||
return new String((byte[]) value, charset); | ||
} | ||
if (value instanceof ByteBuffer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
} else { | ||
try { | ||
return Charset.forName(charsetName); | ||
} catch(Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If given an invalid character set, I think I would prefer to just throw the Exception. If there is a typo somewhere, this can lead to some very unexpected results that are difficult to track down.
stringValue = ""; // Empty array = empty string | ||
} | ||
} else if (!(fv.getValue() instanceof byte[])) { | ||
return fv; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably warrants throwing an Exception. It seems wrong to me to have the user explicitly indicating that they want a conversion to a String and then return something different, like an Integer.
try { | ||
stringValue = DataTypeUtils.toString(fv.getValue(), (String) null, charset); | ||
} catch (final Exception e) { | ||
return fv; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If any RuntimeException is thrown here, I don't think we want to silently ignore it. Should probably let it fly.
.map(fv -> { | ||
|
||
if (!(fv.getValue() instanceof String)) { | ||
return fv; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably be throwing an Exception in this case? The user in this case is attempting to coerce a type that is not valid to coerce. Or otherwise filter it out from the results. It seems wrong to me to just ignore the conversion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me, but I copied that from ToDate, seemed like since it's a Stream, the top-level caller is expecting a new object back rather than catching an exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, the top-level caller is expecting that... but in this case, we cannot give the caller the correct type of data. So I think it's best to just throw instead of giving the caller the wrong data... if that's happening in ToDate then it's either a bug there as well, or perhaps there's some undocumented assumption being made about what else the type could be??
bytesValue[i] = src[i]; | ||
} | ||
} catch (final Exception e) { | ||
return fv; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably let the Exception fly - as-is, it is just silently swallowing the Exception and then returning an object of the wrong type.
@@ -609,6 +623,9 @@ private static Object convertToAvroObject(final Object rawValue, final Schema fi | |||
if (rawValue instanceof byte[]) { | |||
return ByteBuffer.wrap((byte[]) rawValue); | |||
} | |||
if (rawValue instanceof String) { | |||
return ByteBuffer.wrap(((String) rawValue).getBytes(charset)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to avoid ByteBuffer here and instead use just byte[]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the clause above, a byte[] is wrapped in a ByteBuffer as well (that's where I got the code from), won't we be returning two different objects in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops - my bad on this one. This is #convertToAvroObject, and I was thinking of #normalizeValue. In this case, we are converting into the object that Avro wants, so a ByteBuffer is the correct thing to do.
@mattyb149 thanks for the update! Sorry about the delay in getting back to this. All looks good now from my POV. There was a checkstyle violation (unused import) but I addressed that and all else looks good so merged to master. |
Thank you for submitting a contribution to Apache NiFi.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes:
Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically master)?
Is your initial contribution a single, squashed commit?
For code changes:
For documentation related changes:
Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.