feat(connectors): Delta Lake Sink Connector #2889

Open
kriti-sc wants to merge 4 commits into apache:master from kriti-sc:delta-sink

@kriti-sc (Contributor) commented Mar 7, 2026

Which issue does this PR close?

Closes #1852

Rationale

Delta Lake is an open-source table format and storage layer, and it is very popular in modern streaming analytics architectures.

What changed?

Introduces a Delta Lake Sink Connector that enables writing data from Iggy to Delta Lake.

The Delta Lake writing logic is heavily inspired by the kafka-delta-ingest project, which gives the connector a proven starting point for writing to Delta Lake.

Local Execution

  1. Produced 32632 messages with the schema user_id: String, user_type: u8, email: String, source: String, state: String, created_at: DateTime<Utc>, message: String using the sample data producer.
  2. Consumed the messages using the Delta Lake sink and created a Delta table on the local filesystem.
  3. Verified the number of rows in the Delta table and its schema.
  4. Added unit tests and e2e tests, both passing.
[Image] Left: messages produced; Right (top): messages consumed by the Delta sink; Right (bottom): inspecting the Delta table in Python

AI Usage

If AI tools were used, please answer:

  1. Which tools? Claude Code
  2. Scope of usage? generated functions
  3. How did you verify the generated code works correctly? Manual testing by producing data into Iggy and then running the sink and verifying local Delta Lake creation, unit tests and e2e tests for local Delta Lake and Delta Lake on S3.
  4. Can you explain every line of the code if asked? Yes

codecov bot commented Mar 7, 2026

Codecov Report

❌ Patch coverage is 95.72271% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.59%. Comparing base (ecd7709) to head (39db882).
⚠️ Report is 1 commits behind head on master.

Files with missing lines                             Patch %   Lines
core/connectors/sinks/delta_sink/src/sink.rs         79.48%    24 Missing ⚠️
core/connectors/sinks/delta_sink/src/coercions.rs    98.59%    5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2889      +/-   ##
============================================
+ Coverage     68.36%   68.59%   +0.22%     
  Complexity      739      739              
============================================
  Files          1053     1057       +4     
  Lines         84763    85441     +678     
  Branches      61297    61985     +688     
============================================
+ Hits          57948    58605     +657     
- Misses        24448    24462      +14     
- Partials       2367     2374       +7     
Flag Coverage Δ
csharp 67.43% <ø> (-0.19%) ⬇️
go 6.27% <ø> (ø)
java 54.83% <ø> (ø)
node 92.26% <ø> (-0.15%) ⬇️
python 81.57% <ø> (ø)
rust 70.37% <95.72%> (+0.31%) ⬆️

Files with missing lines Coverage Δ
core/connectors/sinks/delta_sink/src/lib.rs 100.00% <100.00%> (ø)
core/connectors/sinks/delta_sink/src/storage.rs 100.00% <100.00%> (ø)
core/connectors/sinks/delta_sink/src/coercions.rs 98.59% <98.59%> (ø)
core/connectors/sinks/delta_sink/src/sink.rs 79.48% <79.48%> (ø)

... and 19 files with indirect coverage changes

Comment on lines +39 to +43
let endpoint_url = config
    .aws_s3_endpoint_url
    .as_ref()
    .ok_or(Error::InvalidConfig)?;
let allow_http = config.aws_s3_allow_http.ok_or(Error::InvalidConfig)?;
Contributor

aws_s3_endpoint_url and aws_s3_allow_http are hard-required via .ok_or(Error::InvalidConfig)?, but both are optional in delta-rs/object_store. AWS_ENDPOINT_URL defaults to the standard AWS S3 regional endpoint when omitted, and AWS_ALLOW_HTTP defaults to false. This means users connecting to real AWS S3 (not MinIO/LocalStack) are forced to provide values that shouldn't be necessary. These two fields should be added to the options map only when present, not treated as required.
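
A minimal sketch of what this could look like, assuming a storage options map of string keys and values. The `S3Config` struct and the `s3_storage_options` function are hypothetical names for illustration; only the two field names and the `AWS_ENDPOINT_URL` / `AWS_ALLOW_HTTP` keys come from the code and comment above.

```rust
use std::collections::HashMap;

/// Hypothetical subset of the sink's S3 settings. The field names follow the
/// snippet under review; the struct itself is an assumption.
#[derive(Default)]
struct S3Config {
    aws_s3_endpoint_url: Option<String>,
    aws_s3_allow_http: Option<bool>,
}

/// Builds the storage options map, adding the optional keys only when the
/// user set them. Omitted keys are left to the underlying library's defaults.
fn s3_storage_options(config: &S3Config) -> HashMap<String, String> {
    let mut options = HashMap::new();
    if let Some(url) = &config.aws_s3_endpoint_url {
        options.insert("AWS_ENDPOINT_URL".to_string(), url.clone());
    }
    if let Some(allow_http) = config.aws_s3_allow_http {
        options.insert("AWS_ALLOW_HTTP".to_string(), allow_http.to_string());
    }
    options
}
```

With this shape, a config that targets real AWS S3 can leave both fields unset and still pass validation, while MinIO or LocalStack users keep setting them explicitly.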

Comment on lines +57 to +64
let account_key = config
    .azure_storage_account_key
    .as_ref()
    .ok_or(Error::InvalidConfig)?;
let sas_token = config
    .azure_storage_sas_token
    .as_ref()
    .ok_or(Error::InvalidConfig)?;
Contributor

Both azure_storage_account_key and azure_storage_sas_token are required simultaneously via .ok_or(Error::InvalidConfig)?. In Azure, these are alternative authentication methods: you use either an account key or a SAS token, not both. This blocks users who only have a SAS token (common in restricted-access scenarios). The code should require account_name and at least one of account_key or sas_token, and only insert whichever credential is provided.
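
One possible shape for that validation, as a sketch only. The `AzureConfig` struct, the `azure_storage_account_name` field, the local `Error` enum, and the option key strings are assumptions for illustration and should be checked against the connector's config and the keys delta-rs expects.

```rust
use std::collections::HashMap;

/// Stand-in for the connector's error type (assumed).
#[derive(Debug, PartialEq)]
enum Error {
    InvalidConfig,
}

/// Hypothetical subset of the sink's Azure settings.
#[derive(Default)]
struct AzureConfig {
    azure_storage_account_name: Option<String>,
    azure_storage_account_key: Option<String>,
    azure_storage_sas_token: Option<String>,
}

/// Requires the account name plus at least one credential, and inserts only
/// the credentials that were actually provided.
fn azure_storage_options(config: &AzureConfig) -> Result<HashMap<String, String>, Error> {
    let account_name = config
        .azure_storage_account_name
        .as_ref()
        .ok_or(Error::InvalidConfig)?;
    if config.azure_storage_account_key.is_none() && config.azure_storage_sas_token.is_none() {
        return Err(Error::InvalidConfig);
    }

    let mut options = HashMap::new();
    // Key names below are assumed, not taken from the PR.
    options.insert("AZURE_STORAGE_ACCOUNT_NAME".to_string(), account_name.clone());
    if let Some(key) = &config.azure_storage_account_key {
        options.insert("AZURE_STORAGE_ACCOUNT_KEY".to_string(), key.clone());
    }
    if let Some(token) = &config.azure_storage_sas_token {
        options.insert("AZURE_STORAGE_SAS_TOKEN".to_string(), token.clone());
    }
    Ok(options)
}
```

A config with only a SAS token now passes, and a config with neither credential is still rejected with Error::InvalidConfig.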


fn apply_coercion(value: &mut Value, node: &CoercionNode) {
    match node {
        CoercionNode::Coercion(Coercion::ToString) => {
Contributor

SQL semantics treat NULL as a special value, and in most cases it is never coerced to any other type.

Coercing it here can lead to subtle, hard-to-debug bugs.
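
To illustrate, here is a rough sketch of a null-preserving string coercion. It assumes the concern is that a null would otherwise be turned into a string such as "null". The `Value` enum is a simplified stand-in for the JSON value type used by the sink, and `coerce_to_string` is a hypothetical helper, not the function in this PR.

```rust
/// Simplified stand-in for a JSON value type (assumed for illustration).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
}

/// Converts scalar values to their string form in place.
/// Null is left untouched so it is still written as a null, and values that
/// are already strings are not modified.
fn coerce_to_string(value: &mut Value) {
    let text = match &*value {
        Value::Null | Value::String(_) => return,
        Value::Bool(b) => b.to_string(),
        Value::Number(n) => n.to_string(),
    };
    *value = Value::String(text);
}
```

Returning early for Null keeps a nullable string column distinguishable from one that contains the literal text "null".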

@kriti-sc (Contributor, Author) commented Mar 10, 2026

You are right. Thank you for catching this; I will fix it in the next commit. I had misunderstood your point earlier.

Development

Successfully merging this pull request may close these issues.

Implement Delta Lake connectors

3 participants