
Documentation to write data to S3 #166

Closed
mayanksingh2298 opened this issue Jan 24, 2024 · 7 comments
@mayanksingh2298

Where / how do I set the access keys and secret keys to enable writing data to S3?

How do I partition data by date? Is there any documentation for this project?

@alberttwong

I couldn't find anything. The easiest way is to use the Hudi or Iceberg Kafka sink to write into S3 and then use Apache XTable to convert it to Delta Lake.

It's very roundabout, but it seems like Delta isn't investing in this area.

@mightyshazam
Collaborator

The documentation for writing to s3 is in the Writing to S3 section of the README.
The short answer: set the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, and additionally set AWS_S3_LOCKING_PROVIDER to dynamodb. Then it should work. There are more environment variables for enabling AWS connectivity; those are not covered in the README, but you can find them in the object store code.
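As a sketch, the environment described above might look like the following (all values are placeholders, not real credentials; the region variable is a common additional setting, not confirmed by this thread):

```shell
# Credentials for the S3 object store (placeholders -- substitute your own).
export AWS_ACCESS_KEY_ID="EXAMPLEKEYID"
export AWS_SECRET_ACCESS_KEY="example-secret-key"
export AWS_REGION="us-east-1"

# Use DynamoDB to coordinate concurrent writers, since S3 has no
# atomic rename to protect the Delta transaction log.
export AWS_S3_LOCKING_PROVIDER="dynamodb"
```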

@mightyshazam
Collaborator

Partitioning by date is configured when the table is created. This project has no mechanism for creating a new Delta table; however, it will respect any settings supported by delta-rs.

@alberttwong

alberttwong commented Apr 13, 2024

Not being able to create tables is unfortunate.

The Iceberg and Hudi Kafka sinks can create tables, insert, upsert, and do full CRUD operations.

@rtyler closed this as not planned Apr 13, 2024
@mightyshazam
Collaborator

> Not being able to create tables is unfortunate.
>
> Iceberg and hudi's Kafka sink can create tables, insert, upsert and do full crud operations.

Delta Lake has a Kafka connector, and Spark is also an option. This project is neither of those things. Nonetheless, delta-rs is a building block that makes this possible.

@alberttwong

I have only seen a closed-source version of the Delta Lake Kafka sink from Confluent (https://docs.confluent.io/kafka-connectors/databricks-delta-lake-sink/current/overview.html#databricks-delta-lake-sink-connector-cp); is that what you're referring to?

@mightyshazam
Collaborator

That is what I was thinking of. Given that, there is probably space for an open-source connector. I have experimented with a Kubernetes operator to do what we're talking about. I recommend bringing up the discussion in the Delta users Slack, because there may be more interest in the topic there. It's not a bad idea; it just wasn't the original intention of this particular project.
