New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-43117][CONNECT] Make ProtoUtils.abbreviate
support repeated fields
#45056
Conversation
builder.setField(field, abbreviate(msg, thresholds)) | ||
|
||
case (field: FieldDescriptor, msgs: java.lang.Iterable[_]) | ||
if field.getJavaType == FieldDescriptor.JavaType.MESSAGE && field.isRepeated | ||
&& msgs != null => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Do we need to consider the scenario where strings.iterator().hasNext
is false
?
&& strings != null => | ||
val threshold = thresholds.getOrElse(STRING, MAX_STRING_SIZE) | ||
strings.iterator().asScala.zipWithIndex.foreach { | ||
case (string: String, i) if string != null && string.length > threshold => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can string
really be null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess so, but not very sure.
So I added string != null
to protect from NPE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, for repeated fields, null checks exist whether adding a single element or adding a collection of elements, for example:
public Builder addColumnNames(
java.lang.String value) {
if (value == null) { throw new NullPointerException(); }
ensureColumnNamesIsMutable();
columnNames_.add(value);
bitField0_ |= 0x00000004;
onChanged();
return this;
}
// com.google.protobuf.AbstractMessageLite.Builder#addAll(java.lang.Iterable<T>, java.util.List<? super T>)
protected static <T> void addAll(final Iterable<T> values, final List<? super T> list) {
checkNotNull(values);
if (values instanceof LazyStringList) {
// For StringOrByteStringLists, check the underlying elements to avoid
// forcing conversions of ByteStrings to Strings.
// TODO: Could we just prohibit nulls in all protobuf lists and get rid of this? Is
// if even possible to hit this condition as all protobuf methods check for null first,
// right?
List<?> lazyValues = ((LazyStringList) values).getUnderlyingElements();
LazyStringList lazyList = (LazyStringList) list;
....
also, I have performed the following checks:
- adding a collection of elements with null
val names = Seq.range(0, 10).map(i => if (i == 3) null else i.toString * 1024)
val drop = proto.Drop.newBuilder().setInput(sql).addAllColumnNames(names.asJava).build()
[info] - truncate repeated strings with nulls *** FAILED *** (3 milliseconds)
[info] java.lang.NullPointerException: Element at index 3 is null.
[info] at com.google.protobuf.AbstractMessageLite$Builder.addAllCheckingNulls(AbstractMessageLite.java:359)
[info] at com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:414)
[info] at org.apache.spark.connect.proto.Drop$Builder.addAllColumnNames(Drop.java:1240)
- add a null element
val drop = proto.Drop.newBuilder().setInput(sql).addColumnNames(null).build()
[info] - truncate repeated strings with nulls *** FAILED *** (4 milliseconds)
[info] java.lang.NullPointerException:
[info] at org.apache.spark.connect.proto.Drop$Builder.addColumnNames(Drop.java:1221)
As you can see, under normal circumstances, it is impossible to add a null element to the repeated type. So personally, I don't think this null check is necessary, but I don't object to adding it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for checking this!
thank you guys, merged to master |
What changes were proposed in this pull request?
Make
ProtoUtils.abbreviate
support repeated fieldsWhy are the changes needed?
existing implementation does not work for repeated fields (strings/messages)
we don't have
repeated bytes
in Spark Connect for now, so let it aloneDoes this PR introduce any user-facing change?
no
How was this patch tested?
added UTs
Was this patch authored or co-authored using generative AI tooling?
no