Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make metadata allFields ordered consistently with the valueSchema. #787

Merged
merged 4 commits into from
Feb 18, 2020

Conversation

norganna
Copy link
Contributor

@norganna norganna commented Feb 6, 2020

Currently the use of a HashMap to store the columns loses the ordering of the columns from the schema.

When the create table is generated, the order of the columns is unpredictable.

This change creates a LinkedHashMap which preserves the original column order from the valueSchema and tests to check ordering is functioning correctly.

@ghost
Copy link

ghost commented Feb 6, 2020

@confluentinc It looks like @norganna just signed our Contributor License Agreement. 👍

Always at your service,

clabot

@norganna
Copy link
Contributor Author

norganna commented Feb 6, 2020

@gharris1727 This is a more complete and accurate replacement for #763.

Copy link
Contributor

@gharris1727 gharris1727 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your previous strategy of just replacing allFields was more robust.

@@ -129,7 +130,15 @@ public static FieldsMetadata extract(
);
}

return new FieldsMetadata(keyFieldNames, nonKeyFieldNames, allFields);
final Map<String, SinkRecordField> allFieldsOrdered = new LinkedHashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By separating allFields and allFieldsOrdered, you're removing the PK-first ordering of the columns in favor of whatever order the record specifies them in. I think this might violate some assumptions that we have downstream.

I'm also concerned about what happens when allFields contains a key that's not in the record (such as topic/partition/offset), since these are added in a previous step to allFields, but never appear in allFieldsOrdered.

I think the build failures may be linked to this change in semantics.

Copy link
Contributor Author

@norganna norganna Feb 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's any PK-first dependancies down-stream because currently the HashMap is indeterminately ordered.

Ok, then perhaps this is not the right approach, since the previous method will not preserve original column ordering in the destination table (as PK's will appear first).

Perhaps the approach needed is for the query builder to not use the order from allFields :/

I'll take a look at the additional fields that are added to the allFields and see if I can fix this PR up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have committed additional code to make sure all fields are added in a determinate order to the allFieldsOrdered map.

Looks like tests are completing now.

The default Kafka fields are added at the beginning of the fields, and the remainder of non-valueSchema fields are added at the end in sorted order for a repeatably determinate ordering.
if (allFields.containsKey(fieldName)) {
allFieldsOrdered.put(fieldName, allFields.get(fieldName));
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will add the default PK fields first, the ordering is determinate because the DEFAULT_KAFKA_PK_NAMES will always be the same.

allFieldsOrdered.put(fieldName, allFields.get(fieldName));
}
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will add the fields from the source DB in the order from the source table and is thus determinate.

allFieldsOrdered.put(fieldName, allFields.get(fieldName));
}
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's fields missing (possible if pkMode != RECORD_VALUE and valueSchema = NULL) then add the remaining fields sorted by name for deterministic ordering.

Copy link
Contributor

@gharris1727 gharris1727 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the improvement @norganna !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants