Upgrade to spark 2.4 (#40)
* Upgrade to Spark 2.4.
* Provide Databricks instructions.
* Prepare 0.4.0 release.
mengxr committed Nov 15, 2018
1 parent 4f4020b commit 2abcff2
Showing 3 changed files with 15 additions and 5 deletions.
9 changes: 7 additions & 2 deletions README.md
@@ -54,9 +54,14 @@ output.show(truncate = false)
+----------------------------------------------+------------------------------------------------------+--------------------------------------------------+---------+
~~~

+### Databricks
+
+If you are a Databricks user, please follow the instructions in this
+[example notebook](https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1233855/1962483213436895/588180/latest.html).
+
### Dependencies

-Because CoreNLP depends on `protobuf-java` 3.x but Spark 2.3 depends on `protobuf-java` 2.x,
+Because CoreNLP depends on `protobuf-java` 3.x but Spark 2.4 depends on `protobuf-java` 2.x,
we release `spark-corenlp` as an assembly jar that includes CoreNLP as well as its transitive dependencies,
except `protobuf-java` being shaded.
This might cause issues if you have CoreNLP or its dependencies on the classpath.
@@ -67,7 +72,7 @@ To use `spark-corenlp`, you need one of the CoreNLP language models:
# Download one of the language models.
wget http://repo1.maven.org/maven2/edu/stanford/nlp/stanford-corenlp/3.9.1/stanford-corenlp-3.9.1-models.jar
# Run spark-shell
-spark-shell --packages databricks/spark-corenlp:0.3.1-s_2.11 --jars stanford-corenlp-3.9.1-models.jar
+spark-shell --packages databricks/spark-corenlp:0.4.0-spark_2.4-scala_2.11 --jars stanford-corenlp-3.9.1-models.jar
~~~

### Acknowledgements
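The README text says the assembly jar shades `protobuf-java`, so that CoreNLP's 3.x classes cannot clash with the 2.x version Spark 2.4 depends on. This commit does not show how the shading is configured. As a hedged sketch, one common way to do it is a rename rule with the sbt-assembly plugin; the use of that plugin and the target package `shadedproto` are assumptions for illustration, not taken from this repository.

```scala
// build.sbt fragment (sketch): assumes the sbt-assembly plugin is enabled,
// e.g. via project/plugins.sbt. "shadedproto" is a hypothetical package name.
assemblyShadeRules in assembly := Seq(
  // Relocate every class under com.google.protobuf into a private package
  // so the bundled protobuf-java 3.x cannot collide with Spark's 2.x copy.
  ShadeRule.rename("com.google.protobuf.**" -> "shadedproto.@1").inAll
)
```

`.inAll` applies the rename both to the protobuf classes themselves and to every reference to them in the other bundled jars, so the packaged CoreNLP calls the relocated copy rather than whatever `protobuf-java` Spark puts on the classpath.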
9 changes: 7 additions & 2 deletions build.sbt
@@ -1,13 +1,18 @@
import ReleaseTransformations._

+def majorVersion(version: String) = version.split('.').slice(0, 2).mkString(".")
+
lazy val commonSettings = Seq(
organization := "databricks",
name := "spark-corenlp",
spName := "databricks/spark-corenlp",
licenses := Seq("GPL-3.0" -> url("http://opensource.org/licenses/GPL-3.0")),
// dependency settings //
scalaVersion := "2.11.8",
-sparkVersion := "2.3.1",
+sparkVersion := "2.4.0",
+version := (version in ThisBuild).value +
+s"-spark_${majorVersion(sparkVersion.value)}" +
+s"-scala_${majorVersion(scalaVersion.value)}",
initialize := {
val _ = initialize.value
// require Java 8+
@@ -21,7 +26,7 @@ lazy val commonSettings = Seq(
fork in Test := true,
javaOptions in Test ++= Seq("-Xmx6g"),
// release settings //
-spAppendScalaVersion := true,
+spAppendScalaVersion := false,
// We only use sbt-release to update version numbers for now.
releaseProcess := Seq[ReleaseStep](
inquireVersions,
2 changes: 1 addition & 1 deletion version.sbt
@@ -1 +1 @@
-version in ThisBuild := "0.3.2-SNAPSHOT"
+version in ThisBuild := "0.4.0-SNAPSHOT"
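In this commit, build.sbt builds the published version from three settings: the base version in version.sbt, `sparkVersion`, and `scalaVersion`. The sketch below re-implements that string logic in plain Scala so it can be run outside sbt. `majorVersion` copies the expression from build.sbt; `VersionSketch` and `artifactVersion` are hypothetical names introduced only for this illustration.

```scala
// A minimal sketch (not the project's build code) mirroring the
// version-string logic in build.sbt, using plain strings instead of sbt keys.
object VersionSketch {

  // Same expression as in build.sbt: keep the first two dot-separated
  // components, e.g. "2.4.0" -> "2.4" and "2.11.8" -> "2.11".
  def majorVersion(version: String): String =
    version.split('.').slice(0, 2).mkString(".")

  // Hypothetical helper: concatenates the pieces the way the
  // `version := ...` setting does.
  def artifactVersion(base: String, sparkVersion: String, scalaVersion: String): String =
    base +
      s"-spark_${majorVersion(sparkVersion)}" +
      s"-scala_${majorVersion(scalaVersion)}"

  def main(args: Array[String]): Unit = {
    // Values from this commit's version.sbt and build.sbt.
    println(artifactVersion("0.4.0-SNAPSHOT", "2.4.0", "2.11.8"))
    // Base version with the -SNAPSHOT suffix dropped, as for a release.
    println(artifactVersion("0.4.0", "2.4.0", "2.11.8"))
  }
}
```

Running `main` prints `0.4.0-SNAPSHOT-spark_2.4-scala_2.11` and then `0.4.0-spark_2.4-scala_2.11`. Assuming the release step strips `-SNAPSHOT` (the usual sbt-release behavior for `inquireVersions`), the second string matches the coordinate in the updated README `spark-shell` command. Since the Scala version is now part of the version string, this is presumably why the commit also switches `spAppendScalaVersion` to `false`; the old coordinate `0.3.1-s_2.11` shows the suffix that setting used to append.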
