Make metadata allFields ordered consistently with the valueSchema. #787
Conversation
@confluentinc It looks like @norganna just signed our Contributor License Agreement. 👍 Always at your service, clabot
@gharris1727 This is a more complete and accurate replacement for #763.
I think your previous strategy of just replacing allFields
was more robust.
@@ -129,7 +130,15 @@ public static FieldsMetadata extract(
      );
    }

    return new FieldsMetadata(keyFieldNames, nonKeyFieldNames, allFields);
    final Map<String, SinkRecordField> allFieldsOrdered = new LinkedHashMap<>();
By separating allFields and allFieldsOrdered, you're removing the PK-first ordering of the columns in favor of whatever order the record specifies them in. I think this might violate some assumptions that we have downstream.
I'm also concerned about what happens when allFields contains a key that's not in the record (such as topic/partition/offset), since these are added in a previous step to allFields, but never appear in allFieldsOrdered.
I think the build failures may be linked to this change in semantics.
I don't think there are any PK-first dependencies downstream, because the current HashMap iteration order is indeterminate anyway.
OK, then perhaps this is not the right approach, since the previous method would not preserve the original column ordering in the destination table (PKs would appear first).
Perhaps the approach needed is for the query builder to not use the order from allFields :/
I'll take a look at the additional fields that are added to the allFields and see if I can fix this PR up.
I have committed additional code to make sure all fields are added to the allFieldsOrdered map in a determinate order.
Looks like tests are completing now.
The default Kafka fields are added at the beginning of the fields, and the remainder of non-valueSchema fields are added at the end in sorted order for a repeatably determinate ordering.
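The three-phase ordering described above can be sketched roughly as follows. This is a simplified illustration outside the actual connector code: the `orderFields` helper, the `String` value type, and the PK field names used here are hypothetical stand-ins, not the connector's real API.

```java
import java.util.*;

public class OrderedFieldsSketch {
    // Hypothetical stand-in for the connector's default Kafka PK field names.
    static final List<String> DEFAULT_KAFKA_PK_NAMES =
            Arrays.asList("__connect_topic", "__connect_partition", "__connect_offset");

    // Builds a LinkedHashMap with a repeatable order:
    // default Kafka PK fields, then valueSchema order, then the rest sorted by name.
    static Map<String, String> orderFields(
            Map<String, String> allFields, List<String> valueSchemaFieldOrder) {
        Map<String, String> ordered = new LinkedHashMap<>();
        // 1. Default Kafka fields first (fixed list, so always the same order).
        for (String name : DEFAULT_KAFKA_PK_NAMES) {
            if (allFields.containsKey(name)) {
                ordered.put(name, allFields.get(name));
            }
        }
        // 2. Fields in the order the valueSchema declares them.
        for (String name : valueSchemaFieldOrder) {
            if (allFields.containsKey(name)) {
                ordered.put(name, allFields.get(name));
            }
        }
        // 3. Any remaining fields, sorted by name for determinism.
        allFields.keySet().stream().sorted()
                .filter(n -> !ordered.containsKey(n))
                .forEach(n -> ordered.put(n, allFields.get(n)));
        return ordered;
    }

    public static void main(String[] args) {
        Map<String, String> all = new HashMap<>();
        all.put("b", "B");
        all.put("a", "A");
        all.put("__connect_topic", "T");
        System.out.println(orderFields(all, Arrays.asList("a")).keySet());
    }
}
```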
  if (allFields.containsKey(fieldName)) {
    allFieldsOrdered.put(fieldName, allFields.get(fieldName));
  }
}
This will add the default PK fields first; the ordering is determinate because DEFAULT_KAFKA_PK_NAMES is always the same.
      allFieldsOrdered.put(fieldName, allFields.get(fieldName));
    }
  }
}
This will add the fields from the source DB in the order of the source table, and is thus determinate.
      allFieldsOrdered.put(fieldName, allFields.get(fieldName));
    }
  }
}
If there are fields missing (possible when pkMode != RECORD_VALUE and valueSchema is null), the remaining fields are added sorted by name for deterministic ordering.
LGTM, thanks for the improvement @norganna !
Currently the use of a HashMap to store the columns loses the ordering of the columns from the schema.
When the CREATE TABLE statement is generated, the order of the columns is unpredictable.
This change uses a LinkedHashMap, which preserves the original column order from the valueSchema, and adds tests to check that the ordering works correctly.
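The difference is easy to demonstrate in isolation (this is a generic illustration, not the connector's code): HashMap gives no iteration-order guarantee, while LinkedHashMap iterates in insertion order, so columns come out in the order the schema declared them.

```java
import java.util.*;

public class OrderDemo {
    public static void main(String[] args) {
        // HashMap: iteration order is unspecified and may vary across JVMs/runs.
        Map<String, Integer> hashed = new HashMap<>();
        // LinkedHashMap: iteration follows insertion order, so a generated
        // CREATE TABLE statement lists columns in the schema's declared order.
        Map<String, Integer> linked = new LinkedHashMap<>();
        for (String col : Arrays.asList("id", "name", "email", "created_at")) {
            hashed.put(col, 0);
            linked.put(col, 0);
        }
        // Only the LinkedHashMap is guaranteed to print the insertion order.
        System.out.println(new ArrayList<>(linked.keySet()));
    }
}
```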