Skip to content

[improve](streaming-job) support SSL and align MySQL CDC source with PG#62700

Merged
JNSimba merged 7 commits into
apache:masterfrom
JNSimba:feat/mysql-cdc-ssl-align-pg
Apr 30, 2026
Merged

[improve](streaming-job) support SSL and align MySQL CDC source with PG#62700
JNSimba merged 7 commits into
apache:masterfrom
JNSimba:feat/mysql-cdc-ssl-align-pg

Conversation

@JNSimba
Copy link
Copy Markdown
Member

@JNSimba JNSimba commented Apr 22, 2026

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #60988

Problem Summary:

The MySQL CDC streaming source currently lacks several configuration knobs that the PostgreSQL path already supports (PG SSL shipped in #60988 and follow-ups for target_table / exclude_columns). This PR brings MySQL up to parity.

  1. SSL encryption with CA verificationssl_mode and ssl_rootcert properties are now wired from job config into Debezium's database.ssl.{mode,truststore,truststore.password}. Because MySQL JDBC does not accept raw PEM, SmallFileMgr gains a getPkcs12TruststorePath() helper that converts the uploaded PEM CA cert into a PKCS12 truststore on first use, reusing the same content-addressed md5 cache as the PG path. The truststore password is a fixed changeit placeholder (public-CA-only truststore; the password is not a security boundary, just a JCA API requirement).
  2. MySQL docker test environmentmysql-5.7.yaml.tpl now mounts the same root CA / server cert / key as PG and starts mysqld with --ssl-ca/cert/key. require_secure_transport is intentionally NOT set so existing useSSL=false tests keep working.
  3. Regression tests — mirror PG's suites for MySQL: test_streaming_mysql_job_ssl, test_streaming_mysql_job_table_mapping, test_streaming_mysql_job_col_filter.
  4. Unit tests — `SmallFileMgrTest` covers the new PEM → PKCS12 conversion: single cert, chain alias safety (so `ca0/ca1` don't overwrite each other), memory cache hit, invalid PEM rejection.

Release note

Support SSL (`ssl_mode`, `ssl_rootcert`) for MySQL CDC streaming insert jobs.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes. MySQL CDC streaming jobs now accept `ssl_mode` and `ssl_rootcert` properties. Existing jobs without these properties are unaffected.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented Apr 22, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  • regression-test/...: the PR adds three new MySQL CDC suites with qt_... assertions, but it does not add the matching generated .out files. Normal regression runs will fail with Missing outputFile before the new coverage can execute.
  • fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/source/reader/mysql/MySqlSourceReader.java: MySQL ssl_mode is forwarded without MySQL-specific validation or normalization. A PG-style value such as verify-ca is still accepted by FE, but Debezium MySQL only accepts verify_ca/verify_identity, so the reader can fail at startup instead of rejecting the job at create time.
  • fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/utils/SmallFileMgr.java: PKCS12_CACHE caches only the generated path, not the file's continued existence. If the .p12 is removed after the first successful conversion, later jobs in the same process get a stale path even though the source PEM is still available.

Critical Checkpoints

  • Goal / correctness: The goal is to add MySQL SSL parity plus test coverage. The main wiring is present, but the invalid-ssl_mode path and missing regression outputs mean the change is not fully ready.
  • Scope / clarity: The patch is reasonably focused on MySQL CDC SSL, small-file conversion, and tests.
  • Concurrency: SmallFileMgr uses per-key synchronization consistently and I did not see a deadlock risk, but the new PKCS12 cache has a lifecycle hole because cached paths are returned without rechecking the on-disk file.
  • Lifecycle / initialization: No cross-TU/static-init concern applies here. The main lifecycle risk is the in-process PKCS12 cache retaining stale paths.
  • Config items: No new FE/BE config items were added. The new MySQL job property handling is incomplete because ssl_mode values are not source-specifically validated.
  • Compatibility: No storage-format or protocol compatibility issue identified.
  • Parallel code paths: PostgreSQL is the existing parallel path. MySQL now mirrors it functionally, but the accepted ssl_mode spellings differ and need explicit handling.
  • Special condition checks: No problematic conditional logic stood out beyond the points above.
  • Test coverage: Positive-path regression and unit coverage were added, but the regression artifacts are incomplete. I also could not run mvn -f fs_brokers/cdc_client/pom.xml -Dtest=SmallFileMgrTest test in this checkout because the module currently fails to compile due to missing org.apache.doris.job.cdc.* classes in the workspace setup.
  • Test result files: The new MySQL regression .out files are missing.
  • Observability: Existing logging is sufficient for this change.
  • Transaction / persistence / data writes: Not applicable to this patch.
  • FE-BE variable passing: Not applicable.
  • Performance: The PEM -> PKCS12 conversion is cached and not on a hot path; no separate performance blocker found.
  • Other issues: No additional blocking issue beyond the inline comments.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR brings the MySQL CDC streaming-job path up to feature parity with the PostgreSQL implementation by adding SSL support and mirroring existing PG behaviors around table mapping and column exclusion, along with corresponding test and docker-environment updates.

Changes:

  • Add MySQL CDC SSL configuration wiring (ssl_mode / ssl_rootcert) into the Debezium/Flink CDC connector config, including PEM CA → PKCS12 truststore conversion support in SmallFileMgr.
  • Add MySQL regression suites for SSL, target table mapping, and excluded-column filtering.
  • Update the MySQL 5.7 docker-compose template to enable server-side SSL using mounted certs.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
regression-test/suites/job_p0/streaming_job/cdc/test_streaming_mysql_job_table_mapping.groovy New regression coverage for MySQL target_table mapping + multi-table merge into one Doris table.
regression-test/suites/job_p0/streaming_job/cdc/test_streaming_mysql_job_ssl.groovy New regression coverage for MySQL SSL job configuration and incremental correctness.
regression-test/suites/job_p0/streaming_job/cdc/test_streaming_mysql_job_col_filter.groovy New regression coverage for MySQL exclude_columns validation and behavior.
fs_brokers/cdc_client/src/test/java/org/apache/doris/cdcclient/utils/SmallFileMgrTest.java Unit tests for PEM → PKCS12 conversion and caching behaviors.
fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/utils/SmallFileMgr.java Add PKCS12 truststore generation/caching derived from downloaded PEM CA files.
fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/source/reader/mysql/MySqlSourceReader.java Wire MySQL SSL settings into Debezium properties and resolve CA file via SmallFileMgr.
docker/thirdparties/docker-compose/mysql/mysql-5.7.yaml.tpl Enable MySQL server SSL by mounting certs and passing --ssl-* flags at startup.
docker/thirdparties/docker-compose/mysql/certs/server.key Add MySQL server private key used by docker-based test env.
docker/thirdparties/docker-compose/mysql/certs/server.crt Add MySQL server certificate used by docker-based test env.
docker/thirdparties/docker-compose/mysql/certs/root.crt Add CA certificate used by docker-based test env and client verification.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented Apr 22, 2026

/review

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented Apr 22, 2026

run buildall

1 similar comment
@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented Apr 22, 2026

run buildall

@github-actions
Copy link
Copy Markdown
Contributor

OpenCode automated review failed and did not complete.

Error: Review step was skipped (possibly timeout or cancelled)
Workflow run: https://github.com/apache/doris/actions/runs/24772066381

Please inspect the workflow logs and rerun the review after the underlying issue is resolved.

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/7) 🎉
Increment coverage report
Complete coverage report

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented Apr 23, 2026

/review

@JNSimba JNSimba closed this Apr 23, 2026
@JNSimba JNSimba reopened this Apr 23, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes for two blocking FE validation regressions introduced by the shared DataSourceConfigValidator updates.

Critical Checkpoints

  • Goal of this PR: add MySQL SSL support and parity features (ssl_mode, ssl_rootcert, table mapping, exclude columns). The MySQL reader and tests are mostly in place, but the FE validation changes currently break valid existing flows.
  • Small / focused change: mostly yes, but the new SSL validation is shared across all streaming sources, so MySQL-specific assumptions now leak into PostgreSQL and ALTER JOB paths.
  • Concurrency: the new SmallFileMgr PKCS12 cache path looks thread-safe and the lock scope is small; no blocking concurrency issue found.
  • Lifecycle / static init: no lifecycle or static-init issue found in the touched code.
  • Config changes: none.
  • Compatibility: regressed. FE now rejects PostgreSQL ssl_mode values that the PG connector still supports, and it can reject valid ALTER JOB updates because validation runs on partial source-property maps.
  • Parallel code paths: not handled consistently. PostgreSQL remains a parallel source path, but the new validator narrows values globally.
  • Special checks: the new cross-field verify-ca check does not account for merged state during ALTER JOB.
  • Test coverage: MySQL unit/regression coverage was added, but there is no coverage for PostgreSQL SSL-mode compatibility or ALTER JOB partial updates, which is why these regressions slipped through.
  • Modified test results: the new .out files are present now.
  • Observability: sufficient for the new MySQL SSL path.
  • Transaction / persistence / FE-BE variable passing: not applicable here.
  • Performance: truststore conversion/caching looks reasonable; no blocking performance issue found.
  • Other issues: no additional blocking issue found beyond the two comments below.

User focus: no additional focus points were provided.

JNSimba and others added 5 commits April 23, 2026 10:48
- Wire ssl_mode / ssl_rootcert from job properties into Debezium's
  database.ssl.{mode,truststore,truststore.password} in MySqlSourceReader.
- SmallFileMgr gains getPkcs12TruststorePath() which converts the uploaded
  PEM CA cert to a PKCS12 truststore (MySQL JDBC rejects raw PEM),
  reusing the same content-addressed md5 cache as PG.
- mysql-5.7 compose template now mounts shared-with-PG root CA / server
  cert / key and starts mysqld with --ssl-*. require_secure_transport is
  NOT set, so existing useSSL=false tests keep working.
- Regression tests mirror the PG ones for ssl / table_mapping / col_filter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add 4 JUnit cases in SmallFileMgrTest covering single cert, chain
  alias safety, memory cache hit and invalid PEM rejection.
- Refactor SmallFileMgr.getPkcs12TruststorePath to expose a 4-arg
  package-private overload (mirrors getFilePath) so tests can inject
  a local directory.
- Replace the 60s sleep in test_streaming_mysql_job_ssl with an
  Awaitility poll on the expected incremental state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…pshot JDBC verify

- FE validates ssl_mode against {disable, require, verify-ca} and requires
  ssl_rootcert when ssl_mode=verify-ca is set.
- cdc_client normalizes PG-style ssl_mode to Debezium MySQL's underscore
  spelling, and mirrors SSL config into jdbcProperties using Connector/J
  native names (sslMode / trustCertificateKeyStoreUrl / Type / Password)
  so the snapshot JDBC path actually performs CA verification. Flink CDC's
  forked MySqlConnection drops Debezium SSL props from the snapshot URL,
  so without this mirror snapshot JDBC silently skips CA verification.
- SmallFileMgr's PKCS12 cache fast-path now re-checks File.exists() so a
  deleted truststore is regenerated on next use instead of returning a
  stale path.
- Tests: add DataSourceConfigValidatorTest and MySqlSourceReaderTest;
  split SmallFileMgrTest's cache-hit case into file-present and
  regenerate-when-missing; fix lexicographic '2' <= ... comparison in
  the MySQL SSL regression suite; add the three missing .out files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… rootcert requirement

verify-ca needs a companion ssl_rootcert after the cross-field check, so
exercising it in testSslModeLegalValues (which passes only ssl_mode) now
fails. Leave that case to testVerifyCaWithRootcertPasses which already
supplies ssl_rootcert.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eaks

The new docker/.../mysql/certs/server.key is a self-signed test-only
private key (same style as the existing postgresql fixture). Extend the
gitleaks private-key rule with an allowlist scoped strictly to that path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JNSimba JNSimba force-pushed the feat/mysql-cdc-ssl-align-pg branch from ecc49bd to cb87814 Compare April 23, 2026 02:48
@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented Apr 23, 2026

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/7) 🎉
Increment coverage report
Complete coverage report

Resolve add/add conflict on DataSourceConfigValidatorTest.java:
keep both PR's SSL validation tests and master's slot/publication
identifier tests (apache#62526), and update PR's tests to call the new
two-argument validateSource(input, dataSourceType) signature
introduced by master in apache#62490.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented Apr 27, 2026

run buildall

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented Apr 27, 2026

run feut

@hello-stephen
Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 85.71% (6/7) 🎉
Increment coverage report
Complete coverage report

Copy link
Copy Markdown
Contributor

@liaoxin01 liaoxin01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Apr 29, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented Apr 29, 2026

run p0

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented Apr 30, 2026

run buildall

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/4.1.1-merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants