0.16.1

davidrabinowitz released this 11 Jun 18:15
· 675 commits to master since this release

New Features

  • Apache Arrow is now the default read format. Based on our benchmarking, Arrow reads are 40% faster than Avro reads. (PR #180)
  • Apache Avro has been added as an intermediate format for writes. Based on our testing, it improves performance when the DataFrame is larger than 50GB. (PR #163)
  • Usage simplification: instead of setting the mandatory table option, users can now pass the table name through the built-in path parameter of load() and save(). A read becomes df = spark.read.format("bigquery").load("source_table") and a write becomes df.write.format("bigquery").save("target_table"). (PR #176)
  • An experimental implementation of the DataSource v2 API has been added. It is not ready for production use.
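
The read and write changes above can be sketched in PySpark as follows. This is a minimal illustration, not the connector's reference usage: the helper names read_table and write_table are hypothetical, and the option names readDataFormat, intermediateFormat and temporaryGcsBucket are assumptions drawn from the connector's documented options rather than from these notes, so check them against the README for your version.

```python
def read_table(spark, table, read_format=None):
    """Read a BigQuery table using the path-style load() call.

    read_format is optional; leaving it as None keeps the connector default
    (Arrow as of this release). Passing "AVRO" is assumed to select the
    previous format through the readDataFormat option.
    """
    reader = spark.read.format("bigquery")
    if read_format is not None:
        # Assumed option name; not stated in these release notes.
        reader = reader.option("readDataFormat", read_format)
    return reader.load(table)


def write_table(df, table, temp_bucket, intermediate_format=None):
    """Write a DataFrame to BigQuery using the path-style save() call.

    temp_bucket is a GCS bucket for staging the intermediate files
    (assumed option name: temporaryGcsBucket). intermediate_format may be
    set to "avro" to try the new intermediate format (assumed option name:
    intermediateFormat); None keeps the connector default.
    """
    writer = df.write.format("bigquery").option("temporaryGcsBucket", temp_bucket)
    if intermediate_format is not None:
        writer = writer.option("intermediateFormat", intermediate_format)
    writer.save(table)
```

With an active SparkSession named spark, a call such as read_table(spark, "source_table") followed by write_table(df, "target_table", "my-staging-bucket", intermediate_format="avro") would exercise both paths; the bucket name here is a placeholder. Whether Avro actually helps depends on the DataFrame size, per the note above.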

Dependency Updates

  • BigQuery API has been upgraded to version 1.116.1
  • BigQuery Storage API has been upgraded to version 0.133.2-beta
  • gRPC has been upgraded to version 1.29.0
  • Guava has been upgraded to version 29.0-jre