Releases: awslabs/amazon-s3-tagging-spark-util
Release v2.0 of Apache Spark based Amazon S3 object tagging (Latest)
This is the second release of Amazon S3 tagging Spark Util, a library built on top of Apache Spark for tagging Amazon S3 objects. It lets you tag objects at the table level or the partition level.
- Added Spark 3.x Support
- Added Glue 3.0 and Glue 4.0 Support
- Added EMR Support
- Added Support and Documentation for Scala Spark and PySpark Code in README.md
For details, see the README.
Release v1.0 of Apache Spark based Amazon S3 object tagging
This is the first release of Amazon S3 tagging Spark Util, a library built on top of Apache Spark for tagging Amazon S3 objects. It lets you tag objects at the table level or the partition level.
With this library, you can write a Spark DataFrame to Amazon S3 and tag the resulting objects like this:
df.write
  .format("s3.parquet") // s3.csv, s3.json, s3.parquet, s3.orc, s3.text, s3.avro
  .mode(...)
  .option("tags", "{\"FirstTag\": \"TagValue\", \"FileType\":\"parquet\"}")
  .save("s3://<bucket>/sample1/parquet")
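Hand-escaping the JSON passed to the tags option is error-prone, so the following PySpark sketch builds that string from a dictionary instead. It assumes the PySpark writer accepts the same `s3.parquet` format and `tags` option as the Scala snippet above; `build_tags_option` and `write_with_tags` are hypothetical helper names, not part of the library. The length and count checks reflect the documented Amazon S3 object tagging limits (10 tags per object, keys up to 128 characters, values up to 256 characters).

```python
import json

# Amazon S3 object tagging limits, per the AWS documentation.
MAX_TAGS_PER_OBJECT = 10
MAX_KEY_LENGTH = 128
MAX_VALUE_LENGTH = 256


def build_tags_option(tags):
    """Serialize a dict of tags into the JSON string passed to .option("tags", ...).

    Hypothetical helper: validates against the S3 limits before serializing.
    """
    if len(tags) > MAX_TAGS_PER_OBJECT:
        raise ValueError(f"S3 allows at most {MAX_TAGS_PER_OBJECT} tags per object")
    for key, value in tags.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("tag keys and values must be strings")
        if not key or len(key) > MAX_KEY_LENGTH:
            raise ValueError(f"tag key must be 1-{MAX_KEY_LENGTH} characters: {key!r}")
        if len(value) > MAX_VALUE_LENGTH:
            raise ValueError(f"tag value exceeds {MAX_VALUE_LENGTH} characters: {key!r}")
    return json.dumps(tags)


def write_with_tags(df, path, tags, mode, fmt="s3.parquet"):
    """Hypothetical wrapper around the writer chain shown above.

    `df` is a pyspark.sql.DataFrame; `mode` is supplied by the caller
    because the original snippet leaves it unspecified.
    """
    (df.write
       .format(fmt)
       .mode(mode)
       .option("tags", build_tags_option(tags))
       .save(path))
```

Calling `build_tags_option({"FirstTag": "TagValue", "FileType": "parquet"})` yields the same JSON object as the escaped string in the Scala example, and invalid tag sets fail before any Spark job starts.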
A binary distribution of the library can be found here.
For details, see the README.