MaterializedMySQL: Introduce MySQL Binlog Client #57323
Should it be an Improvement instead of Feature?
Trying to understand why it could not find the files. Could it be because they are not copied/bound?
`test_storage_kafka/test.py::test_system_kafka_consumers_rebalance_mv` ERROR
All `test_materialized_mysql_database` tests passed.
One binlog connection for many databases.

Suggesting to disable this feature by default for now. It should be explicitly enabled by `SETTINGS use_binlog_client=1`. But if it were permanently enabled in `MaterializedMySQLSettings`, it should keep the old behavior and all tests should pass too.

1. Introduced `IBinlog` and its implementations to read binlog events from a socket (`BinlogFromSocket`) or from a file (`BinlogFromFile`). Based on the previous implementation of `EventBase` and the same old binlog parsers. It fully keeps backward compatibility with the old version. Fixed `./check-mysql-binlog` to test the new implementation.
2. Introduced `BinlogEventsDispatcher`, which reads events from the source `IBinlog` and sends them to the currently attached `IBinlog` instances.
3. Introduced `BinlogClient`, which groups a list of `BinlogEventsDispatcher` instances by MySQL binlog connection, defined by `user:password@host:port`. All dispatchers with the same binlog position should be merged into one.
4. Introduced `BinlogClientFactory`, a singleton used to track all binlogs created over the instance.
5. Introduced the `use_binlog_client` setting for `MaterializedMySQL`, which forces reuse of a `BinlogClient` if it already exists in `BinlogClientFactory`, or creates a new one. By default, it is disabled.
6. Introduced the `max_bytes_in_binlog_queue` setting to define the limit of bytes in the binlog's queue of events. If the bytes in the queue exceed this limit, `BinlogEventsDispatcher` stops reading new events from the source `IBinlog` until space for new events is freed.
7. Introduced the `max_milliseconds_to_wait_in_binlog_queue` setting to define the maximum number of milliseconds to wait when the byte limit is exceeded.
8. Introduced the `max_bytes_in_binlog_dispatcher_buffer` setting to define the maximum number of bytes in the binlog dispatcher's buffer before it is flushed to attached binlogs.
9. Introduced the `max_flush_milliseconds_in_binlog_dispatcher` setting to define the maximum number of milliseconds the binlog dispatcher's buffer waits before it is flushed to attached binlogs.
10. Introduced the `system.mysql_binlogs` system table, which shows a list of active binlogs.
11. Introduced `UnparsedRowsEvent` and `MYSQL_UNPARSED_ROWS_EVENT`, which mark an event as not parsed so that it is explicitly parsed later.
12. Fixed a bug that occurred when a DDL could not be applied because of a syntax error or unsupported SQL.

@larspars is the author of the following:

- `GTIDSets::contains()`
- `ReplicationHelper`
- `shouldReconnectOnException()`
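The dispatcher and queue-limit items (2 and 6) can be pictured with a small model. This is a hypothetical Python sketch, not the PR's C++ code: `AttachedBinlog`, `Dispatcher`, and their methods are illustrative names, and only the setting name `max_bytes_in_binlog_queue` comes from the list. It assumes the dispatcher stops pulling from the source while any attached queue is over its byte limit, and it does not model the wait timeout or the dispatcher's own buffer.

```python
from collections import deque


class AttachedBinlog:
    """Hypothetical consumer-side queue, bounded by the total bytes of queued events."""

    def __init__(self, max_bytes_in_binlog_queue):
        self.max_bytes = max_bytes_in_binlog_queue
        self.queue = deque()
        self.queued_bytes = 0

    def has_room(self, event):
        # An empty queue always accepts one event, so a single oversized
        # event cannot block the dispatcher forever (an assumption of this sketch).
        return not self.queue or self.queued_bytes + len(event) <= self.max_bytes

    def push(self, event):
        self.queue.append(event)
        self.queued_bytes += len(event)

    def pop(self):
        event = self.queue.popleft()
        self.queued_bytes -= len(event)
        return event


class Dispatcher:
    """Hypothetical fan-out: reads each source event once and copies it to every attached binlog."""

    def __init__(self, source_events):
        self.source = iter(source_events)
        self.attached = []
        self.pending = None  # event read from the source but not yet delivered

    def attach(self, binlog):
        self.attached.append(binlog)

    def step(self):
        """Try to deliver one event; return False if blocked or the source is exhausted."""
        if self.pending is None:
            self.pending = next(self.source, None)
            if self.pending is None:
                return False
        if not all(b.has_room(self.pending) for b in self.attached):
            return False  # back-pressure: wait until the full queue frees space
        for b in self.attached:
            b.push(self.pending)
        self.pending = None
        return True
```

In this model the source is read once no matter how many binlogs are attached, and a single slow consumer is enough to pause reading until it pops events from its queue.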
Sorry for the delay - I was on vacation.
Thanks
@valbok, Hi! A test has failed:
https://s3.amazonaws.com/clickhouse-test-reports/59166/383ae86ebb0da8a962521f55b08186331eb0f676/integration_tests__asan__analyzer__[4_6].html
Let's fix this test or revert this PR.
Thanks, will fix that asap. - #59370
The MySQL Binlog Client provides a mechanism in ClickHouse to share the binlog from a MySQL instance among multiple MaterializedMySQL databases. This avoids consuming unnecessary bandwidth and CPU when replicating more than one schema/database.
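A minimal sketch of that sharing, assuming connections are keyed by `user:password@host:port` as the PR describes. The Python classes and the `get_client` method are hypothetical stand-ins for illustration (the PR itself is C++), and the sketch ignores binlog positions and dispatchers entirely.

```python
class BinlogClient:
    """Hypothetical stand-in for one shared MySQL binlog connection."""

    def __init__(self, key):
        self.key = key


class BinlogClientFactory:
    """Hypothetical registry that hands out one client per connection key."""

    def __init__(self):
        self._clients = {}

    def get_client(self, user, password, host, port):
        # Databases replicating from the same MySQL endpoint with the same
        # credentials produce the same key and therefore share one client.
        key = f"{user}:{password}@{host}:{port}"
        client = self._clients.get(key)
        if client is None:
            client = self._clients[key] = BinlogClient(key)
        return client
```

Two `MaterializedMySQL` databases pointing at the same endpoint would receive the same object from `get_client`, while a database pointing at a different host gets its own client.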
Suggesting to disable this feature by default for now. It should be explicitly enabled by `SETTINGS use_binlog_client=1`. But if it were permanently enabled in `MaterializedMySQLSettings`, it should keep the old behavior and all tests should pass too.

Details:

- `IBinlog` and its implementations to read binlog events from a socket (`BinlogFromSocket`) or from a file (`BinlogFromFile`). Based on the previous implementation of `EventBase` and the same old binlog parsers. It fully keeps backward compatibility with the old version. Fixed `./check-mysql-binlog` to test the new implementation.
- `BinlogEventsDispatcher`, which reads events from the source `IBinlog` and sends them to the currently attached `IBinlog` instances.
- `BinlogClient`, which groups a list of `BinlogEventsDispatcher` instances by MySQL binlog connection, defined by `user:password@host:port`. All dispatchers with the same binlog position should be merged into one.
- `BinlogClientFactory`, a singleton used to track all binlogs created over the instance.
- `use_binlog_client` setting for `MaterializedMySQL`, which forces reuse of a `BinlogClient` if it already exists in `BinlogClientFactory`, or creates a new one. By default, it is disabled.
- `max_bytes_in_binlog_queue` setting to define the limit of bytes in the binlog's queue of events. If the bytes in the queue exceed this limit, `BinlogEventsDispatcher` stops reading new events from the source `IBinlog` until space for new events is freed.
- `max_milliseconds_to_wait_in_binlog_queue` setting to define the maximum number of milliseconds to wait when the byte limit is exceeded.
- `max_bytes_in_binlog_dispatcher_buffer` setting to define the maximum number of bytes in the binlog dispatcher's buffer before it is flushed to attached binlogs.
- `max_flush_milliseconds_in_binlog_dispatcher` setting to define the maximum number of milliseconds the binlog dispatcher's buffer waits before it is flushed to attached binlogs.
- `system.mysql_binlogs` system table, which shows a list of active binlogs.
- `UnparsedRowsEvent` and `MYSQL_UNPARSED_ROWS_EVENT`, which mark an event as not parsed so that it is explicitly parsed later.

There are some additional improvements:

@larspars is the author of the following:

- `GTIDSets::contains()` - allows checking whether the GTID sets contain another set.
- `ReplicationHelper` - a helper to be used in integration tests.
- `shouldReconnectOnException()` - improved the logic to retry on failure; allows reconnecting if needed.

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Documentation entry for user-facing changes
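For readers unfamiliar with the containment check that `GTIDSets::contains()` is described as performing, here is a rough Python sketch of the idea. It is not the PR's implementation: the function names and the data layout (a dict from server UUID to a list of inclusive `(first, last)` transaction-number intervals) are assumptions made for illustration.

```python
def merge_intervals(intervals):
    """Sort inclusive (start, end) intervals and merge overlapping or adjacent ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def gtid_set_contains(outer, inner):
    """Return True if every transaction in `inner` is also in `outer`.

    Both arguments map a server UUID string to a list of inclusive
    (first, last) transaction-number intervals (an assumed representation).
    """
    for uuid, intervals in inner.items():
        covering = merge_intervals(outer.get(uuid, []))
        for start, end in merge_intervals(intervals):
            # After merging, a contiguous inner range is covered only if it
            # fits entirely inside a single merged outer interval.
            if not any(lo <= start and end <= hi for lo, hi in covering):
                return False
    return True
```

Merging first means that an inner range spanning two adjacent outer intervals, such as `(2, 8)` against `(1, 5)` and `(6, 10)`, is still recognized as contained.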