John Snow Labs Spark-NLP 2.5.4: Supporting Apache Spark 2.3, 43 new models and 26 new languages, new RegexTokenizer, lots of new notebooks, and more!

@maziyarpanahi released this 20 Jul 14:47

Overview

We are excited to release Spark NLP 2.5.4 with full support for Apache Spark 2.3.x, 43 new pre-trained models for stop words cleaning, support for 26 new languages, a new RegexTokenizer annotator, and more!

As always, we would like to thank our community for their feedback, questions, and feature requests.


New Features

  • Add support for Apache Spark 2.3.x, including new Maven artifacts and full support for all pre-trained models/pipelines
  • Add 43 new pre-trained models in 43 languages to the StopWordsCleaner annotator
  • Introduce a new RegexTokenizer annotator to split text by a regex pattern
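
To illustrate what splitting text by a regex pattern means, here is a minimal plain-Python sketch. It is not Spark NLP's RegexTokenizer implementation: the function name regex_tokenize and its pattern and to_lowercase arguments are hypothetical and exist only for this example.

```python
import re


def regex_tokenize(text, pattern=r"\s+", to_lowercase=False):
    """Split text on every match of pattern and drop empty tokens.

    Hypothetical helper for illustration only; it does not call Spark NLP.
    """
    if to_lowercase:
        text = text.lower()
    # re.split returns empty strings when the text starts or ends with a
    # match, so those are filtered out.
    return [token for token in re.split(pattern, text) if token]


if __name__ == "__main__":
    # Split on runs of whitespace and commas.
    print(regex_tokenize("Spark NLP 2.5.4,released  in July", pattern=r"[\s,]+"))
    # ['Spark', 'NLP', '2.5.4', 'released', 'in', 'July']
```

The pattern describes the separators rather than the tokens, so widening the character class (for example adding a semicolon) changes where the text is cut without touching the rest of the code.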

Enhancements

  • Retrain 6 new BioBERT and ClinicalBERT models
  • Add a new spark23 parameter to the start() function to start a session for Apache Spark 2.3.x

Bugfixes

  • Add missing library for SentencePiece used by AlbertEmbeddings and XlnetEmbeddings on Windows
  • Fix ModuleNotFoundError in LanguageDetectorDL pipelines in Python

Models

  • We have added 43 new pre-trained models in 43 languages for StopWordsCleaner. Some selected models:

Afrikaans - Models

Model | Name | Build | Lang | Offline
StopWordsCleaner | stopwords_af | 2.5.4 | af | Download

Arabic - Models

Model | Name | Build | Lang | Offline
StopWordsCleaner | stopwords_ar | 2.5.4 | ar | Download

Armenian - Models

Model | Name | Build | Lang | Offline
StopWordsCleaner | stopwords_hy | 2.5.4 | hy | Download

Basque - Models

Model | Name | Build | Lang | Offline
StopWordsCleaner | stopwords_eu | 2.5.4 | eu | Download

Bengali - Models

Model | Name | Build | Lang | Offline
StopWordsCleaner | stopwords_bn | 2.5.4 | bn | Download

Breton - Models

Model | Name | Build | Lang | Offline
StopWordsCleaner | stopwords_br | 2.5.4 | br | Download
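
The model names listed above all follow the pattern stopwords_<lang>. The sketch below derives the name and language code for each listed language. The helper names are hypothetical, and the comment about passing the result to StopWordsCleaner.pretrained is an assumption about the Python API that should be checked against the Spark NLP documentation.

```python
# Language name -> language code, copied from the model tables above.
LISTED_LANGUAGES = {
    "Afrikaans": "af",
    "Arabic": "ar",
    "Armenian": "hy",
    "Basque": "eu",
    "Bengali": "bn",
    "Breton": "br",
}


def stopwords_model_name(lang_code):
    """Build the model name used in the tables, e.g. 'stopwords_af'."""
    return f"stopwords_{lang_code}"


def pretrained_args(language):
    """Return the name/lang pair for one of the listed languages.

    Hypothetical helper. With Spark NLP installed, the pair would presumably
    be passed on as StopWordsCleaner.pretrained(name, lang); that call is an
    assumption here and is deliberately not executed.
    """
    code = LISTED_LANGUAGES[language]
    return {"name": stopwords_model_name(code), "lang": code}


if __name__ == "__main__":
    for language in LISTED_LANGUAGES:
        print(language, pretrained_args(language))
```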

Documentation and Notebooks


Installation

Python

#PyPI

pip install spark-nlp==2.5.4

#Conda

conda install -c johnsnowlabs spark-nlp==2.5.4

Spark

spark-nlp on Apache Spark 2.4.x:

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.4

pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.4

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.11:2.5.4

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.11:2.5.4

spark-nlp on Apache Spark 2.3.x:

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:2.5.4

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:2.5.4

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark23_2.11:2.5.4

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark23_2.11:2.5.4

Maven

spark-nlp on Apache Spark 2.4.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.11</artifactId>
    <version>2.5.4</version>
</dependency>

spark-nlp-gpu on Apache Spark 2.4.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.11</artifactId>
    <version>2.5.4</version>
</dependency>

spark-nlp on Apache Spark 2.3.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-spark23_2.11</artifactId>
    <version>2.5.4</version>
</dependency>

spark-nlp-gpu on Apache Spark 2.3.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu-spark23_2.11</artifactId>
    <version>2.5.4</version>
</dependency>
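
The four Maven artifact IDs above follow a simple naming pattern: the base name spark-nlp, an optional -gpu suffix, an optional -spark23 suffix, and the Scala version. The sketch below builds the matching --packages coordinate from two flags. The function names are hypothetical; the spark23 flag echoes the start() parameter mentioned under Enhancements, and the gpu flag is an assumption made for this example, not a description of how start() is implemented.

```python
GROUP_ID = "com.johnsnowlabs.nlp"


def spark_nlp_artifact(gpu=False, spark23=False, scala="2.11"):
    """Return the artifact ID matching the Maven entries listed above."""
    name = "spark-nlp"
    if gpu:
        name += "-gpu"
    if spark23:
        name += "-spark23"
    return f"{name}_{scala}"


def spark_nlp_coordinate(version="2.5.4", gpu=False, spark23=False):
    """Return a group:artifact:version string usable with --packages."""
    return f"{GROUP_ID}:{spark_nlp_artifact(gpu=gpu, spark23=spark23)}:{version}"


if __name__ == "__main__":
    # Print the coordinate for each combination of flags.
    for gpu in (False, True):
        for spark23 in (False, True):
            print(spark_nlp_coordinate(gpu=gpu, spark23=spark23))
```

Keeping the choice in one function avoids hand-editing the artifact name when switching between Apache Spark 2.3.x and 2.4.x or between CPU and GPU builds.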

FAT JARs