Releases: GoogleCloudDataproc/spark-bigquery-connector

0.19.0

25 Feb 22:20

New Features

  • Issue #247: Added support for loading the results of an arbitrary SELECT query from BigQuery (see the sketch after this list).
  • Issue #310: Added the ability to configure the expiration time of materialized data.
  • PR #283: Implemented DataSource v2 write support.
  • Improved Spark 3 compatibility.
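
The following PySpark sketch combines both read features above. The option names (viewsEnabled, materializationDataset, materializationExpirationTimeInMinutes, query) follow the connector README; the dataset name and the query itself are illustrative assumptions.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Issue #247: load the results of an arbitrary SELECT query.
    # Issue #310: limit how long the materialized results are kept (minutes).
    df = (spark.read.format("bigquery")
          .option("viewsEnabled", "true")                  # required for query loads
          .option("materializationDataset", "my_dataset")  # where results are staged
          .option("materializationExpirationTimeInMinutes", "30")
          .option("query", "SELECT word, SUM(word_count) AS wc "
                           "FROM `bigquery-public-data.samples.shakespeare` "
                           "GROUP BY word")
          .load())
    df.show()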

Dependency Updates

  • BigQuery API has been upgraded to version 1.127.4
  • BigQuery Storage API has been upgraded to version 1.10.0
  • Guava has been upgraded to version 30.1-jre
  • Netty has been upgraded to version 4.1.52.Final

0.18.1

22 Jan 00:04

New Features

  • PR #276: Added the useAvroLogicalTypes option for writing data to BigQuery (see the sketch below).
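
A minimal write sketch for the new option, assuming the Avro intermediate format; temporaryGcsBucket and intermediateFormat are pre-existing connector options, and the bucket and table names are illustrative.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.getOrCreate()
    df = (spark.createDataFrame([("2021-01-22", 1)], ["day", "value"])
          .withColumn("day", col("day").cast("date")))

    # PR #276: have BigQuery interpret Avro logical types (e.g. DATE,
    # TIMESTAMP) as the matching BigQuery types when loading.
    (df.write.format("bigquery")
       .option("temporaryGcsBucket", "my-staging-bucket")  # illustrative
       .option("intermediateFormat", "avro")
       .option("useAvroLogicalTypes", "true")
       .save("my_dataset.my_table"))                       # illustrative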

Bug Fixes

  • Issue #248: Reduced the size of the URI list when writing to BigQuery, so that larger DataFrames (>10,000 partitions) can be written safely.
  • Issue #296: Removed the redundantly packaged slf4j-api.

0.18.0

13 Nov 00:18

New Features

  • Issue #226: Added support for HOUR, MONTH, and DAY time partitions (see the sketch after this list).
  • Issue #260: Increased the connection timeout to the BigQuery service and made the request retry settings configurable.
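
A minimal write sketch for the new partition granularities; partitionField and partitionType follow the connector README, while the bucket, table, and column names are illustrative.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.getOrCreate()
    df = (spark.createDataFrame([("2020-11-13 10:00:00", 1)], ["event_ts", "value"])
          .withColumn("event_ts", col("event_ts").cast("timestamp")))

    # Issue #226: partitionType now accepts HOUR and MONTH in addition to DAY.
    (df.write.format("bigquery")
       .option("temporaryGcsBucket", "my-staging-bucket")
       .option("partitionField", "event_ts")
       .option("partitionType", "HOUR")
       .save("my_dataset.events"))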

Bug Fixes

  • Issue #263: Fixed a SELECT * error when ColumnarBatch is used (DataSource v2).
  • Issue #266: Fixed a regression, introduced in version 0.17.2, where external configuration was not applied.
  • PR #262: Filters on BigQuery DATE and TIMESTAMP columns now use the correct type.

Dependency Updates

  • BigQuery API has been upgraded to version 1.123.2
  • BigQuery Storage API has been upgraded to version 1.6.0
  • Guava has been upgraded to version 30.0-jre
  • Netty has been upgraded to version 4.1.51.Final
  • netty-tcnative has been upgraded to version 2.0.34.Final

0.17.3

06 Oct 21:46

Bug Fixes

  • PR #242, #243: Fixed Spark 3 compatibility and added a Spark 3 acceptance test.
  • Issue #249: Fixed credentials creation from a key.

0.17.2

10 Sep 22:18

Bug Fixes

  • PR #239: Ensured that the BigQuery client uses the proper project ID.

0.17.1

06 Aug 19:15

New Features

  • PR #229: Added support for Spark ML Vector and Matrix data types (see the sketch below).
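
A minimal sketch of writing Spark ML vectors through the connector; the bucket and table names are illustrative.

    from pyspark.sql import SparkSession
    from pyspark.ml.linalg import Vectors

    spark = SparkSession.builder.getOrCreate()

    # PR #229: DataFrames containing ML Vector (and Matrix) columns can now
    # be written to BigQuery.
    df = spark.createDataFrame(
        [(1, Vectors.dense([0.1, 0.2])), (2, Vectors.sparse(2, [0], [1.0]))],
        ["id", "features"])

    (df.write.format("bigquery")
       .option("temporaryGcsBucket", "my-staging-bucket")  # illustrative
       .save("my_dataset.ml_features"))                    # illustrative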

Bug Fixes

  • Issue #216: Removed the redundant ALPN dependency.
  • Issue #219: Fixed the LessThanOrEqual filter SQL compilation in the DataSource v2 implementation.
  • Issue #221: Fixed ProtobufUtilsTest.java to work with newer BigQuery dependencies.

Dependency Updates

  • BigQuery API has been upgraded to version 1.116.8
  • BigQuery Storage API has been upgraded to version 1.3.1

0.17.0

22 Jul 00:59

New Features

  • Structured streaming write is now supported (PR #201, thanks @varundhussa; see the sketch after this list).
  • Users now have the option to keep the data on GCS after writing to BigQuery (PR #202, thanks @leoneuwald).
  • Data of a single date partition can now be overwritten (PR #211).
  • MATERIALIZED_VIEW is now supported as a table type (PR #192).
  • Columnar batch reads from Spark are now supported in the DataSource v2 implementation (PR #198). This is not yet ready for production use.
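
A minimal sketch of the new structured streaming write, using Spark's built-in rate source; the bucket, checkpoint path, and table names are illustrative.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # PR #201: stream into BigQuery. The rate source emits
    # (timestamp, value) rows, handy for a quick end-to-end test.
    stream_df = (spark.readStream.format("rate")
                 .option("rowsPerSecond", "10")
                 .load())

    query = (stream_df.writeStream.format("bigquery")
             .option("temporaryGcsBucket", "my-staging-bucket")
             .option("checkpointLocation", "gs://my-staging-bucket/checkpoints/rate")
             .option("table", "my_dataset.rate_events")
             .start())
    query.awaitTermination()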

Bug Fixes

  • Conditions on StructType fields are now handled by Spark rather than the connector, fixing Issue #197.

Dependency Updates

  • BigQuery API has been upgraded to version 1.116.3
  • BigQuery Storage API has been upgraded to version 1.0.0
  • Netty has been upgraded to version 4.1.48.Final (fixing Issue #200)

0.16.1

11 Jun 18:15

New Features

  • Apache Arrow is now the default read format. In our benchmarks, Arrow reads were roughly 40% faster than Avro reads. (PR #180) A combined read/write sketch follows this list.
  • Apache Avro has been added as an intermediate write format. In our testing it shows performance improvements when the DataFrame is larger than 50 GB. (PR #163)
  • Usage simplification: instead of the mandatory table option, users can now use the built-in path parameter of load() and save(), so that a read becomes df = spark.read.format("bigquery").load("source_table") and a write becomes df.write.format("bigquery").save("target_table"). (PR #176)
  • An experimental implementation of the DataSource v2 API has been added. It is not ready for production use.
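
A combined sketch of the new read and write format knobs; readDataFormat and intermediateFormat follow the connector README, and the bucket and target table names are illustrative.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # PR #180: Arrow is now the default read format; opt back into Avro
    # explicitly if needed.
    df = (spark.read.format("bigquery")
          .option("readDataFormat", "AVRO")
          .load("bigquery-public-data.samples.shakespeare"))

    # PR #163: route the write through the new Avro intermediate format.
    (df.write.format("bigquery")
       .option("intermediateFormat", "avro")
       .option("temporaryGcsBucket", "my-staging-bucket")
       .save("my_dataset.shakespeare_copy"))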

Dependency Updates

  • BigQuery API has been upgraded to version 1.116.1
  • BigQuery Storage API has been upgraded to version 0.133.2-beta
  • gRPC has been upgraded to version 1.29.0
  • Guava has been upgraded to version 29.0-jre

0.15.1-beta

27 Apr 17:39

A bug fix release:

  • PR #158: Users can now add the spark.datasource.bigquery prefix to configuration options in order to support Spark's --conf command-line flag (see the sketch after this list).
  • PR #160: View materialization is now performed only on action, fixing a bug where views were materialized too early.
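
A sketch of the new --conf usage; the option, dataset, and file names are illustrative.

    # PR #158: connector options can be passed globally on the command line
    # with the spark.datasource.bigquery prefix, e.g.:
    #
    #   spark-submit \
    #     --conf spark.datasource.bigquery.viewsEnabled=true \
    #     --conf spark.datasource.bigquery.materializationDataset=my_dataset \
    #     my_job.py
    #
    # Inside my_job.py the prefixed options then apply to reads without
    # being repeated per call:
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.read.format("bigquery").load("my_dataset.my_view")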

0.15.0-beta

21 Apr 01:56

  • PR #150: Reading DataFrames should be quicker, especially in interactive usage such as in notebooks.
  • PR #154: Upgraded to the BigQuery Storage v1 API
  • PR #146: Authentication can now be done using an access token, in addition to a credentials file, credentials, and the GOOGLE_APPLICATION_CREDENTIALS environment variable (see the sketch below).
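
A minimal sketch of access-token authentication; the gcpAccessToken option name follows the connector README, and the token value is a placeholder.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # PR #146: authenticate with an OAuth access token, e.g. one obtained
    # from `gcloud auth print-access-token` (placeholder value below).
    token = "ya29.PLACEHOLDER"
    df = (spark.read.format("bigquery")
          .option("gcpAccessToken", token)
          .load("bigquery-public-data.samples.shakespeare"))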