Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-7274] Implement the Protobuf schema provider for compiled protocol buffers #10449

Merged
merged 4 commits into from Dec 25, 2019

Conversation

reuvenlax
Copy link
Contributor

Since compiled protocol buffers have efficient generated getters and builders, we reuse the ByteBuddy framework to create lazy rows that delegate to those methods.

Support for DynamicMessage is in a followon PR.

R: @alexvanboxel

@@ -94,7 +94,6 @@
new ForLoadedType(ReadableInstant.class);
private static final ForLoadedType READABLE_PARTIAL_TYPE =
new ForLoadedType(ReadablePartial.class);
private static final ForLoadedType OBJECT_TYPE = new ForLoadedType(Object.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably intensional, but where is OBJECT_TYPE?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was unused.

import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Multimap;

@Experimental(Kind.SCHEMAS)
public class ProtoRecordSchema extends GetterBasedSchemaProvider {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funky naming, looks like spilled over from Avro implementation. Wouldn't ProtoMessageSchema be better?

@@ -122,6 +123,15 @@ private Iterable getIterableValue(FieldType elementType, Object fieldValue) {
cacheKey, i -> getMapValue(type.getMapKeyType(), type.getMapValueType(), map))
: (T) getMapValue(type.getMapKeyType(), type.getMapValueType(), map);
} else {
if (type.isLogicalType(OneOfType.IDENTIFIER)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we handle the oneOf (or UnionType) separately from all other LogicalTypes. I think we decided to leave Enums also out for now. I think it's best that we handle this separate from the Proto PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that originally, however it turns out that the only way to make OneOf work with the getters framework is to actually handle it explicitly, otherwise recursive translation fails to work. Happy to pull this out into a separate PR if you think that's better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My gut feeling says that we should take it separately. Ok, union types will fail, but for now it's only proto specific right? I've added this on my todo list when testing the implementation with our use cases.

OneOfType.Value oneOfValue = (OneOfType.Value) fieldValue;
Object convertedOneOfField =
getValue(oneOfValue.getFieldType(), oneOfValue.getValue(), null);
return (T)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this not generate a different type for each message. If conversion happens here as a user of the getValue you don't have a way to see what the effective type and field was. Why not return the LogicalTypes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because the current semantics of Row.getValue() is to return the underlying type and you're expected to call Row.getLogicalTypeValue() to get the LogicalType value. It's debatable whether that's the best solution for Row, but there are arguments both ways and I didn't want to change those semantics here.

@@ -127,6 +123,22 @@ public T apply(Row row) {
valueType,
typeFactory);
} else {
if (type.getTypeName().isLogicalType()
&& OneOfType.IDENTIFIER.equals(type.getLogicalType().getIdentifier())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as with RowWithGetters... should we handle the OneOf logical type as a special case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same response as before.

.getDeclaredMethods()
.filter(ElementMatchers.named("toDuration"))
.getOnly()));
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to handle the wrapper types here as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! that's why tests are failing, and I'll add that support.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! that's why tests are failing, and I'll add that support.


@Override
public List<FieldValueGetter> fieldValueGetters(Class<?> targetClass, Schema schema) {
return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would throw RuntimeException with "Not Implemented" for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I meant to remove this file from this PR - it belongs in the next PR. I'll remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I meant to remove this file from this PR - it belongs in the next PR. I'll remove it.

import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Multimap;

@Experimental(Kind.SCHEMAS)
public class ProtoRecordSchema extends GetterBasedSchemaProvider {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ProtoRecordSchema seems like a funky name, it's neither Beam nor Proto. Seems like Avro naming.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

Copy link
Contributor

@alexvanboxel alexvanboxel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe remove the OneOf in the Row (and let's mark it as a known issue). If test are green + spotless I'll give a LGTM.

@reuvenlax
Copy link
Contributor Author

reuvenlax commented Dec 23, 2019 via email

@alexvanboxel
Copy link
Contributor

If I remove the OneOf then tests will start failing, because we won't be able to properly support the codegen for OneOf (basically we would then be removing OneOf support in protobuf as well).

OK, let's keep it in then. Better to tackle it later then. Spotless and Java tests are failing though.

@reuvenlax
Copy link
Contributor Author

@alexvanboxel support for wrapper types added, and all tests are now green.

Copy link
Contributor

@alexvanboxel alexvanboxel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, only thing I think needs revision later on is the OneOf. I'll need to play with it to make a good assessment.

@reuvenlax reuvenlax merged commit bb3c63b into apache:master Dec 25, 2019
JozoVilcek pushed a commit to JozoVilcek/beam that referenced this pull request Feb 21, 2020
…chema provider for compiled protocol buffers
vmarquez pushed a commit to vmarquez/beam that referenced this pull request Apr 1, 2020
…chema provider for compiled protocol buffers
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants