Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NIFI-4857: Support String<->byte[] conversion #2570

Closed
wants to merge 2 commits into from

Conversation

mattyb149
Copy link
Contributor

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically master)?

  • Is your initial contribution a single, squashed commit?

For code changes:

  • Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
  • Have you written or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
  • If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

@mattyb149 mattyb149 force-pushed the NIFI-4857_2 branch 2 times, most recently from a1f9d46 to 206f069 Compare March 21, 2018 02:03
Copy link
Contributor

@markap14 markap14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattyb149 this looks good overall. I think we can simply things a bit, though, by avoiding support for byte[] and ByteBuffer in the Record and instead just support use of byte[]. I also think we need to reconsider how we handle some of the RuntimeExceptions and conversion failures. I commented inline on specific areas that I noticed.

// Either an object array or a String to be converted to byte[] or a ByteBuffer (from Avro, e.g.)
&& (value instanceof Object[]
|| (value instanceof String && RecordFieldType.BYTE.getDataType().equals(elementDataType))
|| value instanceof ByteBuffer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be supporting ByteBuffer here, just byte[]. The more we allow for, the more complex this gets and the more error-prone and less consistent it will become. While Avro may use ByteBuffers, when we use an Avro Reader to create a Record, we should be doing the conversion there from ByteBuffer to byte[].

return new String((byte[])value, charset);
}

if (value instanceof ByteBuffer) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, I think we should avoid the use of ByteBuffer here

if (value instanceof byte[]) {
return new String((byte[]) value, charset);
}
if (value instanceof ByteBuffer) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

} else {
try {
return Charset.forName(charsetName);
} catch(Exception e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If given an invalid character set, I think I would prefer to just throw the Exception. If there is a typo somewhere, this can lead to some very unexpected results that are difficult to track down.

stringValue = ""; // Empty array = empty string
}
} else if (!(fv.getValue() instanceof byte[])) {
return fv;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably warrants throwing an Exception. It seems wrong to me to have the user explicitly indicating that they want a conversion to a String and then return something different, like an Integer.

try {
stringValue = DataTypeUtils.toString(fv.getValue(), (String) null, charset);
} catch (final Exception e) {
return fv;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If any RuntimeException is thrown here, I don't think we want to silently ignore it. Should probably let it fly.

.map(fv -> {

if (!(fv.getValue() instanceof String)) {
return fv;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably be throwing an Exception in this case? The user in this case is attempting to coerce a type that is not valid to coerce. Or otherwise filter it out from the results. It seems wrong to me to just ignore the conversion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me, but I copied that from ToDate, seemed like since it's a Stream, the top-level caller is expecting a new object back rather than catching an exception?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, the top-level caller is expecting that... but in this case, we cannot give the caller the correct type of data. So I think it's best to just throw instead of giving the caller the wrong data... if that's happening in ToDate then it's either a bug there as well, or perhaps there's some undocumented assumption being made about what else the type could be??

bytesValue[i] = src[i];
}
} catch (final Exception e) {
return fv;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably let the Exception fly - as-is, it is just silently swallowing the Exception and then returning an object of the wrong type.

@@ -609,6 +623,9 @@ private static Object convertToAvroObject(final Object rawValue, final Schema fi
if (rawValue instanceof byte[]) {
return ByteBuffer.wrap((byte[]) rawValue);
}
if (rawValue instanceof String) {
return ByteBuffer.wrap(((String) rawValue).getBytes(charset));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to avoid ByteBuffer here and instead use just byte[]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the clause above, a byte[] is wrapped in a ByteBuffer as well (that's where I got the code from), won't we be returning two different objects in that case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops - my bad on this one. This is #convertToAvroObject, and I was thinking of #normalizeValue. In this case, we are converting into the object that Avro wants, so a ByteBuffer is the correct thing to do.

@markap14
Copy link
Contributor

markap14 commented Apr 9, 2018

@mattyb149 thanks for the update! Sorry about the delay in getting back to this. All looks good now from my POV. There was a checkstyle violation (unused import) but I addressed that and all else looks good so merged to master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants