Releases · GoogleCloudDataproc/spark-bigquery-connector
0.19.0
New Features
- Issue #247: The connector can now load the results of an arbitrary SELECT query from BigQuery.
- Issue #310: The expiration time of materialized data is now configurable.
- PR #283: Implemented Datasource v2 write support.
- Improved Spark 3 compatibility.
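A minimal PySpark sketch of how the two new read features might be combined. The dataset, table, and query below are placeholders; the option names (`viewsEnabled`, `materializationDataset`, `materializationExpirationTimeInMinutes`, `query`) follow the connector's documented read options, and query-based reads assume views are enabled:

```python
# Sketch: reading the result of an arbitrary SELECT query (connector >= 0.19.0).
# All names are placeholders; option names follow the connector's read options.
read_options = {
    "viewsEnabled": "true",                          # prerequisite for query-based reads
    "materializationDataset": "tmp_dataset",         # where the query result is materialized
    "materializationExpirationTimeInMinutes": "30",  # Issue #310: expire temp tables after 30 min
    "query": "SELECT name, SUM(number) AS total FROM `dataset.table` GROUP BY name",
}

# With an active SparkSession the read would look like:
# df = spark.read.format("bigquery").options(**read_options).load()
```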
Dependency Updates
- BigQuery API has been upgraded to version 1.127.4
- BigQuery Storage API has been upgraded to version 1.10.0
- Guava has been upgraded to version 30.1-jre
- Netty has been upgraded to version 4.1.52.Final
0.18.1
New Features
- PR #276: Added the option to enable `useAvroLogicalTypes` when writing data to BigQuery.
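A sketch of what enabling this option on a write might look like. The table and bucket names are placeholders, and the surrounding write options are assumptions based on the connector's documented write path; `useAvroLogicalTypes` tells the BigQuery load job to honor Avro logical types:

```python
# Sketch: enabling Avro logical types on write (connector >= 0.18.1, PR #276).
# Destination table and staging bucket are placeholders.
write_options = {
    "table": "dataset.target_table",
    "temporaryGcsBucket": "my-bucket",
    "useAvroLogicalTypes": "true",   # honor Avro logical types in the load job
}

# df.write.format("bigquery").options(**write_options).mode("append").save()
```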
Bug Fixes
0.18.0
New Features
- Issue #226: Added support for HOUR, MONTH, and DAY time partitions
- Issue #260: Increased the connection timeout to the BigQuery service and made the request retry settings configurable.
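A sketch of writing to an hourly-partitioned table using the new partition types. Table, bucket, and column names are placeholders; the option names (`partitionField`, `partitionType`) are taken from the connector's documented write options:

```python
# Sketch: writing to a time-partitioned table (connector >= 0.18.0, Issue #226).
# All names are placeholders.
write_options = {
    "table": "dataset.events",
    "temporaryGcsBucket": "my-bucket",
    "partitionField": "event_ts",   # TIMESTAMP column to partition on
    "partitionType": "HOUR",        # newly supported: HOUR, DAY, MONTH
}

# df.write.format("bigquery").options(**write_options).save()
```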
Bug Fixes
- Issue #263: Fixed a `select *` error when ColumnarBatch is used (DataSource v2)
- Issue #266: Fixed a regression, introduced in version 0.17.2, in which external configuration was not applied
- PR #262: Filters on BigQuery DATE and TIMESTAMP now use the right type.
Dependency Updates
- BigQuery API has been upgraded to version 1.123.2
- BigQuery Storage API has been upgraded to version 1.6.0
- Guava has been upgraded to version 30.0-jre
- Netty has been upgraded to version 4.1.51.Final
- netty-tcnative has been upgraded to version 4.1.34.Final
0.17.3
0.17.2
0.17.1
New Features
- PR #229: Added support for Spark ML Vector and Matrix data types
Bug Fixes
- Issue #216: Removed a redundant ALPN dependency
- Issue #219: Fixed the LessThanOrEqual filter SQL compilation in the DataSource v2 implementation
- Issue #221: Fixed ProtobufUtilsTest.java with newer BigQuery dependencies
Dependency Updates
- BigQuery API has been upgraded to version 1.116.8
- BigQuery Storage API has been upgraded to version 1.3.1
0.17.0
New Features
- Structured streaming write is now supported (PR #201, thanks @varundhussa)
- Users now have the option to keep the data on GCS after writing to BigQuery (PR #202, thanks @leoneuwald)
- Data in a single date partition can now be overwritten (PR #211)
- `MATERIALIZED_VIEW` is now supported as a table type (PR #192)
- Columnar batch reads from Spark are supported in the DataSource v2 implementation (PR #198). It is not ready for production use.
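A sketch of overwriting a single date partition, as introduced in PR #211. The partition date, table, and bucket are placeholders; the `datePartition` option name is taken from the connector's documented write options:

```python
# Sketch: overwriting one date partition (connector >= 0.17.0, PR #211).
# All names and the partition date are placeholders.
overwrite_options = {
    "datePartition": "20200315",        # the single partition to replace
    "temporaryGcsBucket": "my-bucket",  # staging bucket for the write
}

# df.write.format("bigquery").options(**overwrite_options) \
#     .mode("overwrite").save("dataset.events")
```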
Bug Fixes
- Conditions on StructType fields are now handled by Spark rather than the connector, fixing Issue #197
Dependency Updates
- BigQuery API has been upgraded to version 1.116.3
- BigQuery Storage API has been upgraded to version 1.0.0
- Netty has been upgraded to version 4.1.48.Final (Fixing issue #200)
0.16.1
New Features
- Apache Arrow is now the default read format. Based on our benchmarking, Arrow provides read performance roughly 40% faster than Avro. (PR #180)
- Apache Avro has been added as an intermediate write format. Based on our testing, it shows performance improvements when the DataFrame is larger than 50GB (PR #163)
- Usage simplification: instead of the mandatory `table` option, users can now use the built-in `path` parameter of `load()` and `save()`, so that reading becomes `df = spark.read.format("bigquery").load("source_table")` and writing becomes `df.write.format("bigquery").save("target_table")` (PR #176)
- An experimental implementation of the DataSource v2 API has been added. It is not ready for production use.
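The format changes in this release can be sketched as follows. Arrow is already the default read format, so setting it explicitly is only for illustration; table and bucket names are placeholders, and the option names (`readDataFormat`, `intermediateFormat`) follow the connector's documentation:

```python
# Sketch: format options introduced or changed in 0.16.1. Names are placeholders.
read_options = {"readDataFormat": "ARROW"}  # PR #180: Arrow is now the default
write_options = {
    "intermediateFormat": "avro",           # PR #163: can help for DataFrames > 50GB
    "temporaryGcsBucket": "my-bucket",
}

# Using the built-in path parameter (PR #176):
# df = spark.read.format("bigquery").options(**read_options).load("source_table")
# df.write.format("bigquery").options(**write_options).save("target_table")
```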
Dependency Updates
- BigQuery API has been upgraded to version 1.116.1
- BigQuery Storage API has been upgraded to version 0.133.2-beta
- gRPC has been upgraded to version 1.29.0
- Guava has been upgraded to version 29.0-jre
0.15.1-beta
0.15.0-beta
- PR #150: Reading DataFrames should be quicker, especially in interactive usage such as in notebooks
- PR #154: Upgraded to the BigQuery Storage v1 API
- PR #146: Authentication can now be done with an access token, in addition to a credentials file, credentials, and the `GOOGLE_APPLICATION_CREDENTIALS` environment variable.
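A sketch of access-token authentication as added in PR #146. The `gcpAccessToken` option name is an assumption based on the connector's documentation, and the token value and table name are placeholders:

```python
# Sketch: authenticating with an OAuth2 access token (PR #146).
# The token value is a placeholder, e.g. from `gcloud auth print-access-token`.
auth_options = {
    "gcpAccessToken": "<oauth2-access-token>",  # assumed option name from connector docs
}

# df = spark.read.format("bigquery").options(**auth_options).load("dataset.table")
```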