Releases: awslabs/amazon-s3-tagging-spark-util

Release v2.0 of Apache Spark based Amazon S3 object tagging (Latest)

16 Oct 18:49
f9c3da3

This is the second release of Amazon S3 tagging Spark Util, a library built on top of Apache Spark for tagging Amazon S3 objects. It lets you tag the objects that Spark writes at either the table level or the partition level. This release adds the following:

  • Added Spark 3.x support
  • Added Glue 3.0 and Glue 4.0 support
  • Added EMR support
  • Added Scala Spark and PySpark support, with usage documented in README.md

For details, see README.

Release v1.0 of Apache Spark based Amazon S3 object tagging

20 Nov 09:44

This is the first release of Amazon S3 tagging Spark Util, a library built on top of Apache Spark for tagging Amazon S3 objects. It lets you tag the objects that Spark writes at either the table level or the partition level.

With this library, you can write a Spark DataFrame to Amazon S3 and tag the resulting objects in the same call:

// df is an existing Spark DataFrame
df.write
    .format("s3.parquet") // one of: s3.csv, s3.json, s3.parquet, s3.orc, s3.text, s3.avro
    .mode(...)
    // tags are passed as a JSON object of key/value pairs
    .option("tags", "{\"FirstTag\": \"TagValue\", \"FileType\":\"parquet\"}")
    .save("s3://<bucket>/sample1/parquet")
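Hand-escaping the JSON passed to the "tags" option is easy to get wrong. A minimal sketch of building that string from key/value pairs follows. The `TagJson` object and its `render` method are hypothetical helpers for illustration and are not part of the library; the size checks reflect Amazon S3's documented object-tag limits (at most 10 tags per object, keys up to 128 characters, values up to 256 characters), and whether the library performs its own validation is not stated here.

```scala
// Hypothetical helper (NOT part of amazon-s3-tagging-spark-util):
// renders tag key/value pairs as the JSON object string used by the "tags" option.
object TagJson {
  // Escape backslashes and double quotes so the result is a valid JSON string literal.
  // Control characters are not handled; this sketch assumes plain-text tags.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  def render(tags: Seq[(String, String)]): String = {
    // Amazon S3 object-tag limits: 10 tags, 128-char keys, 256-char values.
    require(tags.size <= 10, "S3 allows at most 10 tags per object")
    tags.foreach { case (k, v) =>
      require(k.nonEmpty && k.length <= 128, s"invalid tag key length: $k")
      require(v.length <= 256, s"tag value too long for key: $k")
    }
    tags.map { case (k, v) => s"${quote(k)}: ${quote(v)}" }.mkString("{", ", ", "}")
  }
}
```

The rendered string could then replace the hand-written literal, for example `.option("tags", TagJson.render(Seq("FirstTag" -> "TagValue", "FileType" -> "parquet")))`.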

A binary distribution of the library can be found here.

For details, see README.