From 3572f9186c5cc2842bd283d78bbff578724e8e31 Mon Sep 17 00:00:00 2001
From: David Rabinowitz
Date: Thu, 11 Jun 2020 08:06:22 -0700
Subject: [PATCH] prepare release 0.16.1

---
 CHANGES.md | 15 +++++++++++----
 README.md  | 17 +++++++++--------
 build.sbt  |  2 +-
 3 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/CHANGES.md b/CHANGES.md
index cf5f06f77..ed30915fc 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -1,12 +1,19 @@
 # Release Notes
 
+## 0.16.1 - 2020-06-11
+* PR #186: Fixed SparkBigQueryConnectorUserAgentProvider initialization bug
+
 ## 0.16.0 - 2020-06-09
-* Apache Arrow is not the default read format. Based on our benchmarking, Arrow provides read
-  performance faster by 40% then Avro. (PR #180)
-* Usage simplification: Now instead of using the `table` mandatory option, user can use the built
+**Please don't use this version, use 0.16.1 instead**
+
+* PR #180: Apache Arrow is now the default read format. Based on our benchmarking, Arrow provides read
+  performance 40% faster than Avro.
+* PR #163: Apache Avro was added as a write intermediate format. It shows better performance than parquet
+  on large (>50GB) datasets. The spark-avro package must be added at runtime in order to use this format.
+* PR #176: Usage simplification: Now instead of using the `table` mandatory option, user can use the built
   in `path` parameter of `load()` and `save()`, so that read becomes
   `df = spark.read.format("bigquery").load("source_table")` and write becomes
-  `df.write.format("bigquery").save("target_table")` (PR #176)
+  `df.write.format("bigquery").save("target_table")`
 * An experimental implementation of the DataSource v2 API has been added. **It is not ready for
   production use.**
 * BigQuery API has been upgraded to version 1.116.1
diff --git a/README.md b/README.md
index df2029d7d..1cfa2309e 100644
--- a/README.md
+++ b/README.md
@@ -76,8 +76,8 @@ repository. It can be used using the `--packages` option or the
 
 | Scala version | Connector Artifact |
 | --- | --- |
-| Scala 2.11 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.16.0` |
-| Scala 2.12 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.16.0` |
+| Scala 2.11 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.16.1` |
+| Scala 2.12 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.16.1` |
 
 ## Hello World Example
 
@@ -278,7 +278,8 @@ The API Supports a number of options to configure the read
 <tr>
   <td><code>intermediateFormat</code></td>
   <td>The format of the data before it is loaded to BigQuery, values can be
-      either "parquet" or "orc".
+      either "parquet", "orc" or "avro". In order to use the Avro format, the
+      spark-avro package must be added at runtime.
       <br/>(Optional. Defaults to <code>parquet</code>). On write only.
   </td>
   <td>Write</td>
@@ -536,7 +537,7 @@ using the following code:
 ```python
 from pyspark.sql import SparkSession
 spark = SparkSession.builder\
-  .config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.16.0")\
+  .config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.16.1")\
   .getOrCreate()
 df = spark.read.format("bigquery")\
   .load("dataset.table")
@@ -545,7 +546,7 @@ df = spark.read.format("bigquery")\
 **Scala:**
 ```python
 val spark = SparkSession.builder
-  .config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.16.0")
+  .config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.16.1")
   .getOrCreate()
 val df = spark.read.format("bigquery")
   .load("dataset.table")
@@ -553,7 +554,7 @@ val df = spark.read.format("bigquery")
 
 In case Spark cluster is using Scala 2.12 (it's optional for Spark 2.4.x,
 mandatory in 3.0.x), then the relevant package is
-com.google.cloud.spark:spark-bigquery-with-dependencies_**2.12**:0.16.0. In
+com.google.cloud.spark:spark-bigquery-with-dependencies_**2.12**:0.16.1. In
 order to know which Scala version is used, please run the following code:
 
 **Python:**
@@ -577,14 +578,14 @@ To include the connector in your project:
 <dependency>
   <groupId>com.google.cloud.spark</groupId>
   <artifactId>spark-bigquery-with-dependencies_${scala.version}</artifactId>
-  <version>0.16.0</version>
+  <version>0.16.1</version>
 </dependency>
 ```
 
 ### SBT
 
 ```sbt
-libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery-with-dependencies" % "0.16.0"
+libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery-with-dependencies" % "0.16.1"
 ```
 
 ## Building the Connector
diff --git a/build.sbt b/build.sbt
index 313eb98b1..89b57c54a 100644
--- a/build.sbt
+++ b/build.sbt
@@ -4,7 +4,7 @@ lazy val sparkVersion = "2.4.0"
 
 lazy val commonSettings = Seq(
   organization := "com.google.cloud.spark",
-  version := "0.16.1-SNAPSHOT",
+  version := "0.16.1",
   scalaVersion := scala211Version,
   crossScalaVersions := Seq(scala211Version, scala212Version)
 )
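
For illustration, here is a minimal PySpark sketch (not part of the patch itself) of the Avro write path that PR #163 introduces. It assumes Spark 2.4.x with Scala 2.11, that the spark-avro package is supplied at runtime alongside the connector, and that the table names and the `temporaryGcsBucket` value are placeholders.

```python
from pyspark.sql import SparkSession

# Both packages supplied at runtime, e.g. via spark-submit:
#   --packages org.apache.spark:spark-avro_2.11:2.4.0,\
#              com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.16.1
spark = SparkSession.builder.getOrCreate()

# Read uses the path parameter of load(), per PR #176.
df = spark.read.format("bigquery").load("dataset.source_table")

# "avro" joins "parquet" and "orc" as a valid intermediateFormat value;
# the bucket name below is a placeholder for staging the intermediate files.
df.write.format("bigquery") \
    .option("intermediateFormat", "avro") \
    .option("temporaryGcsBucket", "some-staging-bucket") \
    .save("dataset.target_table")
```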