
Documentation to write data to S3 #166

Closed
mayanksingh2298 opened this issue Jan 24, 2024 · 7 comments
@mayanksingh2298

Where / how do I set the access keys and secret keys to enable writing data to S3?

How do I partition data by date? Is there any documentation for this project?

@alberttwong

I couldn't find anything. The easiest way is to use the Hudi or Iceberg Kafka sink to write into S3 and then use Apache XTable to convert it to Delta Lake.

It's very roundabout, but it seems like Delta isn't investing in this area.

@mightyshazam
Collaborator

The documentation for writing to s3 is in the Writing to S3 section of the README.
The short answer: set the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, and additionally set AWS_S3_LOCKING_PROVIDER to dynamodb. Then it should work. There are more environment variables for enabling AWS connectivity; those are not covered in the README, but you can find them in the object store code.
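As a sketch, the environment described above might look like the following (all values are placeholders, not real credentials; the region variable is a common additional setting, not confirmed by this thread):

```shell
# Credentials for the S3 object store (placeholders -- substitute your own).
export AWS_ACCESS_KEY_ID="EXAMPLEKEYID"
export AWS_SECRET_ACCESS_KEY="example-secret-key"
export AWS_REGION="us-east-1"

# Use DynamoDB to coordinate concurrent writers, since S3 has no
# atomic rename to protect the Delta transaction log.
export AWS_S3_LOCKING_PROVIDER="dynamodb"
```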

@mightyshazam
Collaborator

Partitioning by date is configured when the table is created. This project has no mechanism for creating a new Delta table; however, it will respect any settings supported by delta-rs.

@alberttwong

alberttwong commented Apr 13, 2024

Not being able to create tables is unfortunate.

The Iceberg and Hudi Kafka sinks can create tables, insert, upsert, and do full CRUD operations.

@rtyler closed this as not planned Apr 13, 2024
@mightyshazam
Collaborator

> Not being able to create tables is unfortunate.
>
> Iceberg and hudi's Kafka sink can create tables, insert, upsert and do full crud operations.

Delta Lake has a Kafka connector, and Spark is also an option. This project is neither of those things. Nonetheless, delta-rs is a building block that makes this possible.

@alberttwong

I have only seen a closed-source version of the Delta Lake Kafka sink from Confluent (https://docs.confluent.io/kafka-connectors/databricks-delta-lake-sink/current/overview.html#databricks-delta-lake-sink-connector-cp); is that what you're referring to?

@mightyshazam
Collaborator

That is what I was thinking of. Given that, there is probably space for an open-source connector. I have experimented with a Kubernetes operator to do what we're talking about. I recommend bringing up the discussion in the Delta users Slack, because there may be more interest in the topic there. It's not a bad idea; it just wasn't the original intention of this particular project.
