[INLONG-7958][Sort] Fix MongoDB's schema becomes unordered after extracting the row data #7960

Merged: 4 commits, May 7, 2023

Contributor

@e-mhui e-mhui commented May 6, 2023

Prepare a Pull Request

[INLONG-7958][Sort] Fix MongoDB's schema becomes unordered after extracting the row data

Motivation

  1. After the row data is extracted, MongoDB's schema loses its field order. The sink (e.g. Iceberg) creates its table automatically, so a schema that is shuffled during synchronization leaves the sink's column order inconsistent with the source.


  2. For whole-database migration, MongoDB CDC does not specify a primary key, so upsert cannot be implemented. As described in the MongoDB CDC connector documentation (https://ververica.github.io/flink-cdc-connectors/master/content/connectors/mongodb-cdc.html), we should specify _id as the default primary key:

MongoDB’s change event record doesn’t have updated before message. So, we can only convert it to Flink’s UPSERT changelog stream. An upsert stream requires a unique key, so we must declare _id as primary key. We can’t declare other column as primary key, because delete operation does not contain the key and value besides _id and sharding key.
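The upsert behaviour described above can be illustrated with a minimal plain-Java sketch. It does not use Flink or the InLong code base; the class `UpsertByIdDemo` and its methods are hypothetical names chosen for illustration, and the sketch assumes that insert and update events carry the current document while delete events carry only the _id.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of an upsert changelog keyed by _id.
class UpsertByIdDemo {

    // Materialized view of the collection, keyed by _id.
    private final Map<String, Map<String, Object>> rows = new LinkedHashMap<>();

    // Inserts and updates arrive without a "before" image, so both are
    // applied the same way: overwrite whatever is stored under this _id.
    void upsert(Map<String, Object> document) {
        rows.put(String.valueOf(document.get("_id")), document);
    }

    // Assumption: a delete event carries only the _id, so no other column
    // could be used to locate the row that must be removed.
    void delete(String id) {
        rows.remove(id);
    }

    Map<String, Map<String, Object>> snapshot() {
        return rows;
    }

    // Small helper that builds a two-field document.
    static Map<String, Object> doc(String id, String name) {
        Map<String, Object> document = new LinkedHashMap<>();
        document.put("_id", id);
        document.put("name", name);
        return document;
    }

    public static void main(String[] args) {
        UpsertByIdDemo view = new UpsertByIdDemo();
        view.upsert(doc("1", "alice"));          // insert
        view.upsert(doc("2", "bob"));            // insert
        view.upsert(doc("1", "alice-renamed"));  // update, applied as upsert
        view.delete("2");                        // delete, identified by _id only
        System.out.println(view.snapshot());
    }
}
```

If a column other than _id were declared as the key, the `delete` step would have nothing to match on, which is the constraint the quoted documentation describes.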

Modifications

  1. Use LinkedHashMap instead of HashMap to make the schema ordered.
  2. Use _id as the default primary key
  3. Refactor the code
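The first modification relies on a documented difference between the two JDK map types: `LinkedHashMap` iterates in insertion order, while `HashMap` makes no ordering guarantee. The sketch below is not the code from this PR; `SchemaOrderDemo`, `fieldOrder`, and the sample field names are hypothetical and only show the effect on a schema held as a name-to-type map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of why the map type affects schema order.
class SchemaOrderDemo {

    // Copies (fieldName, typeName) pairs into the given map and returns
    // the field names in that map's iteration order.
    static List<String> fieldOrder(Map<String, String> schema, String[][] fields) {
        for (String[] field : fields) {
            schema.put(field[0], field[1]);
        }
        return new ArrayList<>(schema.keySet());
    }

    public static void main(String[] args) {
        String[][] fields = {
            {"_id", "STRING"}, {"name", "STRING"}, {"age", "INT"}, {"email", "STRING"}
        };
        // LinkedHashMap: iteration follows insertion order, so the extracted
        // schema keeps the order of the source fields.
        System.out.println(fieldOrder(new LinkedHashMap<>(), fields));
        // HashMap: iteration follows hash buckets and may differ from the
        // insertion order, which is how a schema can end up shuffled.
        System.out.println(fieldOrder(new HashMap<>(), fields));
    }
}
```

With the `LinkedHashMap`, the first line printed is `[_id, name, age, email]`; the order printed for the `HashMap` depends on the hash codes of the keys and should not be relied on.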

@gong
Contributor

gong commented May 6, 2023

@e-mhui Why keep the schema order? I suggest adding the reason.

@e-mhui
Contributor Author

e-mhui commented May 6, 2023

@e-mhui Why keep the schema order? I suggest adding the reason.

The sink (e.g. Iceberg) creates its table automatically, so a source schema that is shuffled during synchronization leaves the sink's column order inconsistent with the source.

@e-mhui
Contributor Author

e-mhui commented May 6, 2023

@e-mhui Why keep the schema order? I suggest adding the reason.

The reason has been added to the description.

@gong
Contributor

gong commented May 6, 2023

@e-mhui Why keep the schema order? I suggest adding the reason.

The sink (e.g. Iceberg) creates its table automatically, so a source schema that is shuffled during synchronization leaves the sink's column order inconsistent with the source.

Looks good to me.

@dockerzhang dockerzhang merged commit bb45659 into apache:master May 7, 2023
GanfengTan pushed a commit to GanfengTan/incubator-inlong that referenced this pull request May 10, 2023
Development

Successfully merging this pull request may close these issues.

[Bug][Sort] MongoDB's schema becomes unordered after extracting the row data