@ParkMyCar ParkMyCar commented Apr 7, 2025

This PR implements "purification" in the Adapter for a SQL Server source. For folks unfamiliar, purification is a step during object creation where we query external systems for any necessary information, with the goal of making our persisted SQL "pure".

During purification of a SQL Server source we do the following:

  1. Ensure CDC is enabled for the current database
  2. Ensure snapshot isolation is enabled for the current database
  3. List all tables that currently have CDC enabled, and their capture instances
    • The specified capture instance for each subsource is then persisted in `PurifiedExportDetails`.
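
For readers who want a concrete picture of the three steps above, here is a minimal sketch of the checks, not the code in this PR. It assumes the upstream state has already been fetched; `UpstreamState`, `PurificationError`, `purify`, and the three query constants are hypothetical names chosen for illustration, and the SQL strings are only one plausible way to read this information from the SQL Server catalog views (`sys.databases`, `cdc.change_tables`).

```rust
use std::collections::BTreeMap;

// Hypothetical catalog queries; the SQL actually issued by the PR may differ.
#[allow(dead_code)]
pub const CDC_ENABLED_QUERY: &str =
    "SELECT is_cdc_enabled FROM sys.databases WHERE name = DB_NAME();";
#[allow(dead_code)]
pub const SNAPSHOT_ISOLATION_QUERY: &str =
    "SELECT snapshot_isolation_state FROM sys.databases WHERE name = DB_NAME();";
#[allow(dead_code)]
pub const CAPTURE_INSTANCES_QUERY: &str = "SELECT s.name, t.name, ct.capture_instance \
     FROM cdc.change_tables ct \
     JOIN sys.tables t ON ct.source_object_id = t.object_id \
     JOIN sys.schemas s ON t.schema_id = s.schema_id;";

/// Results of the three queries above, already decoded (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UpstreamState {
    pub cdc_enabled: bool,
    /// `sys.databases.snapshot_isolation_state`; 1 means ON.
    pub snapshot_isolation_state: u8,
    /// (schema, table, capture instance) rows.
    pub capture_instances: Vec<(String, String, String)>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PurificationError {
    CdcNotEnabled,
    SnapshotIsolationNotEnabled,
    NoCaptureInstance { schema: String, table: String },
}

/// Runs the checks in order and returns the capture instance chosen for
/// each requested (schema, table) pair.
pub fn purify(
    state: &UpstreamState,
    requested: &[(&str, &str)],
) -> Result<BTreeMap<(String, String), String>, PurificationError> {
    // Step 1: CDC must be enabled for the database.
    if !state.cdc_enabled {
        return Err(PurificationError::CdcNotEnabled);
    }
    // Step 2: snapshot isolation must be ON.
    if state.snapshot_isolation_state != 1 {
        return Err(PurificationError::SnapshotIsolationNotEnabled);
    }
    // Step 3: index the CDC-enabled tables by (schema, table). A table can
    // have more than one capture instance; this sketch keeps the first seen.
    let mut available: BTreeMap<(String, String), String> = BTreeMap::new();
    for (schema, table, instance) in &state.capture_instances {
        available
            .entry((schema.clone(), table.clone()))
            .or_insert_with(|| instance.clone());
    }
    // Record the capture instance for every requested table.
    let mut chosen = BTreeMap::new();
    for (schema, table) in requested {
        let key = (schema.to_string(), table.to_string());
        match available.get(&key) {
            Some(instance) => {
                chosen.insert(key, instance.clone());
            }
            None => {
                return Err(PurificationError::NoCaptureInstance {
                    schema: schema.to_string(),
                    table: table.to_string(),
                })
            }
        }
    }
    Ok(chosen)
}
```

In this sketch the returned map stands in for the capture instances that get persisted per subsource. Failing fast on the database-level checks keeps the table listing from running against a database that cannot be replicated anyway.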

Support for CREATE TABLE ... FROM SQL SERVER is left as a TODO. Nothing blocks implementing it, but because the feature isn't released yet I left it out to keep the PR small.

Motivation

Progress towards https://github.com/MaterializeInc/database-issues/issues/8762

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@ParkMyCar ParkMyCar force-pushed the sql_server/purification branch from bc87fc0 to a4c2f00 Compare April 9, 2025 20:49
@ParkMyCar ParkMyCar marked this pull request as ready for review April 9, 2025 20:58
@ParkMyCar ParkMyCar requested a review from a team as a code owner April 9, 2025 20:58
@ParkMyCar ParkMyCar requested a review from aljoscha April 9, 2025 20:58
@ParkMyCar ParkMyCar force-pushed the sql_server/purification branch from a4c2f00 to 84c2d90 Compare April 9, 2025 21:00
* implement purification for SQL Server sources, lists tables from upstream that have CDC enabled and their capture instances
* returns an error for CREATE TABLE ... FROM SOURCE for SQL Server
@ParkMyCar ParkMyCar force-pushed the sql_server/purification branch from 84c2d90 to 04528d9 Compare April 10, 2025 16:25
Contributor

@martykulma martykulma left a comment

minor notes/questions, nothing blocking 🚢🚢🚢

name,
capture_instance,
columns: columns.into(),
is_cdc_enabled: true,
Contributor

Is is_cdc_enabled implicitly true here because the table has a capture instance associated with it?

There's an `ensure_table_cdc_enabled` above, but it doesn't look like anything calls it. I wasn't sure whether it's needed by an upcoming change or is vestigial, since this query already tells us that CDC is enabled for a table.

Contributor Author

> Is is_cdc_enabled implicitly true here because the table has a capture instance associated with it?

It was! But this field was also unused, so I got rid of it. Same with the `ensure_table_cdc_enabled` query: a previous iteration used it, but nothing does anymore.

* remove is_cdc_enabled field which was unused
* fix Cargo.toml deps to get WASM build working
* remove unused ensure_table_cdc_enabled function
* refactor a bit of code
@ParkMyCar
Contributor Author

@martykulma TFTR!

@ParkMyCar ParkMyCar enabled auto-merge (squash) April 10, 2025 21:23
@ParkMyCar ParkMyCar merged commit 49c017b into MaterializeInc:main Apr 10, 2025
82 checks passed
@ParkMyCar ParkMyCar deleted the sql_server/purification branch April 12, 2025 15:50