prepare release 0.17.1
davidrabinowitz committed Aug 6, 2020
1 parent 211b0c0 commit 3c6e978
Showing 3 changed files with 26 additions and 8 deletions.
8 changes: 8 additions & 0 deletions CHANGES.md
@@ -1,5 +1,13 @@
# Release Notes

## 0.17.1 - 2020-08-xx
* Issue #216: removed redundant ALPN dependency
* Issue #219: Fixed the LessThanOrEqual filter SQL compilation in the DataSource v2 implementation
* Issue #221: Fixed ProtobufUtilsTest.java with newer BigQuery dependencies
* PR #229: Added support for Spark ML Vector and Matrix data types
* BigQuery API has been upgraded to version 1.116.8
* BigQuery Storage API has been upgraded to version 1.3.1

## 0.17.0 - 2020-07-15
* PR #201: [Structured streaming write](http://spark.apache.org/docs/2.4.5/structured-streaming-programming-guide.html#starting-streaming-queries)
is now supported (thanks @varundhussa)
24 changes: 17 additions & 7 deletions README.md
@@ -76,8 +76,8 @@ repository. It can be used via the `--packages` option or the

| Scala version | Connector Artifact |
| --- | --- |
| Scala 2.11 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.17.0` |
| Scala 2.12 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.17.0` |
| Scala 2.11 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.17.1` |
| Scala 2.12 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.17.1` |

## Hello World Example
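
As a minimal sketch of a basic read with the connector (the public `bigquery-public-data.samples.shakespeare` table is used purely for illustration, and it is assumed that the connector jar and Google Cloud credentials are available to the cluster):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("spark-bigquery-hello-world")
  .getOrCreate()

// Read a public BigQuery sample table into a DataFrame.
val df = spark.read
  .format("bigquery")
  .load("bigquery-public-data.samples.shakespeare")

df.printSchema()
df.show(10)
```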

@@ -510,6 +510,16 @@ When casting to Timestamp, TIME has the same TimeZone issues as DATETIME
</tr>
</table>

#### Spark ML Data Types Support

The Spark ML [Vector](https://spark.apache.org/docs/2.4.5/api/python/pyspark.ml.html#pyspark.ml.linalg.Vector) and
[Matrix](https://spark.apache.org/docs/2.4.5/api/python/pyspark.ml.html#pyspark.ml.linalg.Matrix) types are supported,
including their dense and sparse versions. The data is saved as a BigQuery RECORD. Note that a suffix containing the
Spark type of the field is added to the field's description.

In order to write these types to BigQuery, use the ORC or Avro intermediate format, and have them as a top-level
column of the Row (i.e. not a field nested in a struct), as in the sketch below.
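
A minimal write sketch, assuming the `temporaryGcsBucket` and `intermediateFormat` write options described elsewhere in this README; the dataset, table, and bucket names are placeholders:

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

// Vector columns are kept as top-level columns of the Row,
// not as fields nested inside a struct.
val df = Seq(
  (1L, Vectors.dense(1.0, 0.0, 3.0)),
  (2L, Vectors.sparse(3, Array(1), Array(2.5)))
).toDF("id", "features")

df.write
  .format("bigquery")
  // "my-temp-bucket" and "my_dataset.ml_features" are placeholders.
  .option("temporaryGcsBucket", "my-temp-bucket")
  .option("intermediateFormat", "avro") // or "orc", per the note above
  .save("my_dataset.ml_features")
```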

### Filtering

The connector automatically computes column pruning and pushdown filters from the DataFrame's `SELECT` statement, e.g.
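
A minimal sketch of what this looks like in practice (the public `shakespeare` sample table is used for illustration only):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Only the selected columns are read, and the filter is pushed down to
// BigQuery rather than being evaluated in Spark after a full scan.
val popularWords = spark.read
  .format("bigquery")
  .load("bigquery-public-data.samples.shakespeare")
  .select("word", "word_count")
  .where("word_count > 100")

popularWords.show()
```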
@@ -585,7 +595,7 @@ using the following code:
```python
from pyspark.sql import SparkSession
spark = SparkSession.builder\
.config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.17.0")\
.config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.17.1")\
.getOrCreate()
df = spark.read.format("bigquery")\
.load("dataset.table")
@@ -594,15 +604,15 @@
**Scala:**
```scala
val spark = SparkSession.builder
.config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.17.0")
.config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.17.1")
.getOrCreate()
val df = spark.read.format("bigquery")
.load("dataset.table")
```

If the Spark cluster is using Scala 2.12 (optional for Spark 2.4.x,
mandatory in 3.0.x), then the relevant package is
com.google.cloud.spark:spark-bigquery-with-dependencies_**2.12**:0.17.0. In
com.google.cloud.spark:spark-bigquery-with-dependencies_**2.12**:0.17.1. In
order to know which Scala version is used, please run the following code:

**Python:**
@@ -626,14 +636,14 @@ To include the connector in your project:
<dependency>
<groupId>com.google.cloud.spark</groupId>
<artifactId>spark-bigquery-with-dependencies_${scala.version}</artifactId>
<version>0.17.0</version>
<version>0.17.1</version>
</dependency>
```

### SBT

```sbt
libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery-with-dependencies" % "0.17.0"
libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery-with-dependencies" % "0.17.1"
```

## Building the Connector
2 changes: 1 addition & 1 deletion build.sbt
@@ -24,7 +24,7 @@ lazy val nettyTcnativeVersion = "2.0.29.Final"

lazy val commonSettings = Seq(
organization := "com.google.cloud.spark",
version := "0.17.1-SNAPSHOT",
version := "0.17.1",
scalaVersion := scala211Version,
crossScalaVersions := Seq(scala211Version, scala212Version)
)
