[improve](streaming-job) support SSL and align MySQL CDC source with PG#62700
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
There was a problem hiding this comment.
Findings
regression-test/...: the PR adds three new MySQL CDC suites withqt_...assertions, but it does not add the matching generated.outfiles. Normal regression runs will fail withMissing outputFilebefore the new coverage can execute.fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/source/reader/mysql/MySqlSourceReader.java: MySQLssl_modeis forwarded without MySQL-specific validation or normalization. A PG-style value such asverify-cais still accepted by FE, but Debezium MySQL only acceptsverify_ca/verify_identity, so the reader can fail at startup instead of rejecting the job at create time.fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/utils/SmallFileMgr.java:PKCS12_CACHEcaches only the generated path, not the file's continued existence. If the.p12is removed after the first successful conversion, later jobs in the same process get a stale path even though the source PEM is still available.
Critical Checkpoints
- Goal / correctness: The goal is to add MySQL SSL parity plus test coverage. The main wiring is present, but the invalid-
ssl_modepath and missing regression outputs mean the change is not fully ready. - Scope / clarity: The patch is reasonably focused on MySQL CDC SSL, small-file conversion, and tests.
- Concurrency:
SmallFileMgruses per-key synchronization consistently and I did not see a deadlock risk, but the new PKCS12 cache has a lifecycle hole because cached paths are returned without rechecking the on-disk file. - Lifecycle / initialization: No cross-TU/static-init concern applies here. The main lifecycle risk is the in-process PKCS12 cache retaining stale paths.
- Config items: No new FE/BE config items were added. The new MySQL job property handling is incomplete because
ssl_modevalues are not source-specifically validated. - Compatibility: No storage-format or protocol compatibility issue identified.
- Parallel code paths: PostgreSQL is the existing parallel path. MySQL now mirrors it functionally, but the accepted
ssl_modespellings differ and need explicit handling. - Special condition checks: No problematic conditional logic stood out beyond the points above.
- Test coverage: Positive-path regression and unit coverage were added, but the regression artifacts are incomplete. I also could not run
mvn -f fs_brokers/cdc_client/pom.xml -Dtest=SmallFileMgrTest testin this checkout because the module currently fails to compile due to missingorg.apache.doris.job.cdc.*classes in the workspace setup. - Test result files: The new MySQL regression
.outfiles are missing. - Observability: Existing logging is sufficient for this change.
- Transaction / persistence / data writes: Not applicable to this patch.
- FE-BE variable passing: Not applicable.
- Performance: The PEM -> PKCS12 conversion is cached and not on a hot path; no separate performance blocker found.
- Other issues: No additional blocking issue beyond the inline comments.
There was a problem hiding this comment.
Pull request overview
This PR brings the MySQL CDC streaming-job path up to feature parity with the PostgreSQL implementation by adding SSL support and mirroring existing PG behaviors around table mapping and column exclusion, along with corresponding test and docker-environment updates.
Changes:
- Add MySQL CDC SSL configuration wiring (ssl_mode / ssl_rootcert) into the Debezium/Flink CDC connector config, including PEM CA → PKCS12 truststore conversion support in
SmallFileMgr. - Add MySQL regression suites for SSL, target table mapping, and excluded-column filtering.
- Update the MySQL 5.7 docker-compose template to enable server-side SSL using mounted certs.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| regression-test/suites/job_p0/streaming_job/cdc/test_streaming_mysql_job_table_mapping.groovy | New regression coverage for MySQL target_table mapping + multi-table merge into one Doris table. |
| regression-test/suites/job_p0/streaming_job/cdc/test_streaming_mysql_job_ssl.groovy | New regression coverage for MySQL SSL job configuration and incremental correctness. |
| regression-test/suites/job_p0/streaming_job/cdc/test_streaming_mysql_job_col_filter.groovy | New regression coverage for MySQL exclude_columns validation and behavior. |
| fs_brokers/cdc_client/src/test/java/org/apache/doris/cdcclient/utils/SmallFileMgrTest.java | Unit tests for PEM → PKCS12 conversion and caching behaviors. |
| fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/utils/SmallFileMgr.java | Add PKCS12 truststore generation/caching derived from downloaded PEM CA files. |
| fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/source/reader/mysql/MySqlSourceReader.java | Wire MySQL SSL settings into Debezium properties and resolve CA file via SmallFileMgr. |
| docker/thirdparties/docker-compose/mysql/mysql-5.7.yaml.tpl | Enable MySQL server SSL by mounting certs and passing --ssl-* flags at startup. |
| docker/thirdparties/docker-compose/mysql/certs/server.key | Add MySQL server private key used by docker-based test env. |
| docker/thirdparties/docker-compose/mysql/certs/server.crt | Add MySQL server certificate used by docker-based test env. |
| docker/thirdparties/docker-compose/mysql/certs/root.crt | Add CA certificate used by docker-based test env and client verification. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
/review |
|
run buildall |
1 similar comment
|
run buildall |
|
OpenCode automated review failed and did not complete. Error: Review step was skipped (possibly timeout or cancelled) Please inspect the workflow logs and rerun the review after the underlying issue is resolved. |
FE Regression Coverage ReportIncrement line coverage |
|
/review |
There was a problem hiding this comment.
Requesting changes for two blocking FE validation regressions introduced by the shared DataSourceConfigValidator updates.
Critical Checkpoints
- Goal of this PR: add MySQL SSL support and parity features (
ssl_mode,ssl_rootcert, table mapping, exclude columns). The MySQL reader and tests are mostly in place, but the FE validation changes currently break valid existing flows. - Small / focused change: mostly yes, but the new SSL validation is shared across all streaming sources, so MySQL-specific assumptions now leak into PostgreSQL and
ALTER JOBpaths. - Concurrency: the new
SmallFileMgrPKCS12 cache path looks thread-safe and the lock scope is small; no blocking concurrency issue found. - Lifecycle / static init: no lifecycle or static-init issue found in the touched code.
- Config changes: none.
- Compatibility: regressed. FE now rejects PostgreSQL
ssl_modevalues that the PG connector still supports, and it can reject validALTER JOBupdates because validation runs on partial source-property maps. - Parallel code paths: not handled consistently. PostgreSQL remains a parallel source path, but the new validator narrows values globally.
- Special checks: the new cross-field
verify-cacheck does not account for merged state duringALTER JOB. - Test coverage: MySQL unit/regression coverage was added, but there is no coverage for PostgreSQL SSL-mode compatibility or
ALTER JOBpartial updates, which is why these regressions slipped through. - Modified test results: the new
.outfiles are present now. - Observability: sufficient for the new MySQL SSL path.
- Transaction / persistence / FE-BE variable passing: not applicable here.
- Performance: truststore conversion/caching looks reasonable; no blocking performance issue found.
- Other issues: no additional blocking issue found beyond the two comments below.
User focus: no additional focus points were provided.
- Wire ssl_mode / ssl_rootcert from job properties into Debezium's
database.ssl.{mode,truststore,truststore.password} in MySqlSourceReader.
- SmallFileMgr gains getPkcs12TruststorePath() which converts the uploaded
PEM CA cert to a PKCS12 truststore (MySQL JDBC rejects raw PEM),
reusing the same content-addressed md5 cache as PG.
- mysql-5.7 compose template now mounts shared-with-PG root CA / server
cert / key and starts mysqld with --ssl-*. require_secure_transport is
NOT set, so existing useSSL=false tests keep working.
- Regression tests mirror the PG ones for ssl / table_mapping / col_filter.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add 4 JUnit cases in SmallFileMgrTest covering single cert, chain alias safety, memory cache hit and invalid PEM rejection. - Refactor SmallFileMgr.getPkcs12TruststorePath to expose a 4-arg package-private overload (mirrors getFilePath) so tests can inject a local directory. - Replace the 60s sleep in test_streaming_mysql_job_ssl with an Awaitility poll on the expected incremental state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…pshot JDBC verify
- FE validates ssl_mode against {disable, require, verify-ca} and requires
ssl_rootcert when ssl_mode=verify-ca is set.
- cdc_client normalizes PG-style ssl_mode to Debezium MySQL's underscore
spelling, and mirrors SSL config into jdbcProperties using Connector/J
native names (sslMode / trustCertificateKeyStoreUrl / Type / Password)
so the snapshot JDBC path actually performs CA verification. Flink CDC's
forked MySqlConnection drops Debezium SSL props from the snapshot URL,
so without this mirror snapshot JDBC silently skips CA verification.
- SmallFileMgr's PKCS12 cache fast-path now re-checks File.exists() so a
deleted truststore is regenerated on next use instead of returning a
stale path.
- Tests: add DataSourceConfigValidatorTest and MySqlSourceReaderTest;
split SmallFileMgrTest's cache-hit case into file-present and
regenerate-when-missing; fix lexicographic '2' <= ... comparison in
the MySQL SSL regression suite; add the three missing .out files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… rootcert requirement verify-ca needs a companion ssl_rootcert after the cross-field check, so exercising it in testSslModeLegalValues (which passes only ssl_mode) now fails. Leave that case to testVerifyCaWithRootcertPasses which already supplies ssl_rootcert. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eaks The new docker/.../mysql/certs/server.key is a self-signed test-only private key (same style as the existing postgresql fixture). Extend the gitleaks private-key rule with an allowlist scoped strictly to that path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ecc49bd to
cb87814
Compare
|
run buildall |
FE Regression Coverage ReportIncrement line coverage |
Resolve add/add conflict on DataSourceConfigValidatorTest.java: keep both PR's SSL validation tests and master's slot/publication identifier tests (apache#62526), and update PR's tests to call the new two-argument validateSource(input, dataSourceType) signature introduced by master in apache#62490. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
run buildall |
|
run feut |
FE UT Coverage ReportIncrement line coverage |
|
PR approved by at least one committer and no changes requested. |
|
run p0 |
|
run buildall |
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #60988
Problem Summary:
The MySQL CDC streaming source currently lacks several configuration knobs that the PostgreSQL path already supports (PG SSL shipped in #60988 and follow-ups for
target_table/exclude_columns). This PR brings MySQL up to parity.ssl_modeandssl_rootcertproperties are now wired from job config into Debezium'sdatabase.ssl.{mode,truststore,truststore.password}. Because MySQL JDBC does not accept raw PEM,SmallFileMgrgains agetPkcs12TruststorePath()helper that converts the uploaded PEM CA cert into a PKCS12 truststore on first use, reusing the same content-addressed md5 cache as the PG path. The truststore password is a fixedchangeitplaceholder (public-CA-only truststore; the password is not a security boundary, just a JCA API requirement).mysql-5.7.yaml.tplnow mounts the same root CA / server cert / key as PG and startsmysqldwith--ssl-ca/cert/key.require_secure_transportis intentionally NOT set so existinguseSSL=falsetests keep working.test_streaming_mysql_job_ssl,test_streaming_mysql_job_table_mapping,test_streaming_mysql_job_col_filter.Release note
Support SSL (`ssl_mode`, `ssl_rootcert`) for MySQL CDC streaming insert jobs.
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)