John Snow Labs Spark-NLP 2.5.4: Supporting Apache Spark 2.3, 43 new models and 26 new languages, new RegexTokenizer, lots of new notebooks, and more!
Overview
We are excited to release Spark NLP 2.5.4 with full support for Apache Spark 2.3.x, 43 new pre-trained models for stop word cleaning, support for 26 new languages, a new RegexTokenizer annotator, and more!
As always, we would like to thank our community for their feedback, questions, and feature requests.
New Features
- Add support for Apache Spark 2.3.x, including new Maven artifacts and full support for all pre-trained models/pipelines
- Add 43 new pre-trained models in 43 languages to StopWordsCleaner annotator
- Introduce a new RegexTokenizer to split text by regex pattern
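A minimal sketch of how the new annotator might be wired into a pipeline. It assumes the Python RegexTokenizer reads a document column and takes its pattern through setPattern; the column names and both function names are illustrative, not part of the release. The preview_split helper is plain Python rather than Spark NLP code. It only shows the kind of tokens a split pattern yields, and Python and Java regex syntax can differ for advanced patterns.

```python
import re


def preview_split(text, pattern=r"\s+"):
    """Plain-Python preview: split `text` on `pattern` and drop empty pieces."""
    return [piece for piece in re.split(pattern, text) if piece]


def build_regex_tokenizer_pipeline(pattern=r"\s+"):
    """Assemble DocumentAssembler -> RegexTokenizer.

    Requires pyspark and spark-nlp, plus an already started Spark session.
    """
    # Imported lazily so preview_split() can be used without Spark installed.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import RegexTokenizer

    # Assumed column names: raw text in "text", tokens written to "token".
    document = DocumentAssembler().setInputCol("text").setOutputCol("document")
    tokenizer = (
        RegexTokenizer()
        .setInputCols(["document"])
        .setOutputCol("token")
        .setPattern(pattern)
    )
    return Pipeline(stages=[document, tokenizer])
```

Calling preview_split("a,b;;c", r"[,;]") returns ['a', 'b', 'c'], which is a quick way to sanity-check a pattern before fitting the pipeline on a DataFrame.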
Enhancements
- Retrained 6 new BioBERT and ClinicalBERT models
- Add a new param spark23 to the start() function to start the session for Apache Spark 2.3.x
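The new spark23 param can be set from the installed PySpark version instead of being hard-coded. The sketch below assumes spark23 is a boolean keyword argument of sparknlp.start(); needs_spark23 and start_session are hypothetical helper names introduced here for illustration.

```python
def needs_spark23(spark_version):
    """Return True when `spark_version` (e.g. "2.3.4") is an Apache Spark 2.3.x release."""
    major, minor = spark_version.split(".")[:2]
    return (int(major), int(minor)) == (2, 3)


def start_session():
    """Start a Spark NLP session that matches the installed PySpark version."""
    # Imported lazily: both packages must be installed for this call to work.
    import pyspark
    import sparknlp

    # Assumption: spark23 is a boolean flag on sparknlp.start().
    return sparknlp.start(spark23=needs_spark23(pyspark.__version__))
```

On a 2.4.x installation the helper returns False, so start() falls back to its default behavior.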
Bugfixes
- Add missing library for SentencePiece used by AlbertEmbeddings and XlnetEmbeddings on Windows
- Fix ModuleNotFoundError in LanguageDetectorDL pipelines in Python
Models
- We have added 43 new pre-trained models in 43 languages for StopWordsCleaner. Some selected models:
Afrikaans - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
StopWordsCleaner | stopwords_af | 2.5.4 | af | Download |
Arabic - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
StopWordsCleaner | stopwords_ar | 2.5.4 | ar | Download |
Armenian - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
StopWordsCleaner | stopwords_hy | 2.5.4 | hy | Download |
Basque - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
StopWordsCleaner | stopwords_eu | 2.5.4 | eu | Download |
Bengali - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
StopWordsCleaner | stopwords_bn | 2.5.4 | bn | Download |
Breton - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
StopWordsCleaner | stopwords_br | 2.5.4 | br | Download |
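The models listed above can be loaded by name and language code. This is a rough sketch, assuming the stopwords_<lang> naming seen in these tables holds for the language you need, and that the upstream tokenizer writes to a column named token. The helper names and the cleanTokens output column are hypothetical.

```python
def stopwords_model_name(lang):
    """Build the model name from a language code, following the tables above."""
    # Assumption: every stop words model is named "stopwords_<lang>".
    return "stopwords_" + lang


def load_stopwords_cleaner(lang):
    """Download a pre-trained StopWordsCleaner for `lang` (requires spark-nlp and a running session)."""
    # Imported lazily so the naming helper works without Spark installed.
    from sparknlp.annotator import StopWordsCleaner

    return (
        StopWordsCleaner.pretrained(stopwords_model_name(lang), lang)
        .setInputCols(["token"])
        .setOutputCol("cleanTokens")
    )
```

For example, load_stopwords_cleaner("af") would request the stopwords_af model for Afrikaans.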
Documentation and Notebooks
- New notebook for Language detection and identification
- New notebook for Classify text according to TREC classes
- New notebook for Detect Spam messages
- New notebook for Detect fake news
- New notebook for Find sentiment in text
- New notebook for Detect bullying in tweets
- New notebook for Detect Emotions in text
- New notebook for Detect Sarcasm in text
- Update the entire spark-nlp-models repository with new pre-trained models and pipelines
- Update the entire spark-nlp-workshop notebooks for Spark NLP 2.5.x
- Update documentation for release of Spark NLP 2.5.x
Installation
Python
#PyPI
pip install spark-nlp==2.5.4
#Conda
conda install -c johnsnowlabs spark-nlp==2.5.4
Spark
spark-nlp on Apache Spark 2.4.x:
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.4
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.4
GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.11:2.5.4
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.11:2.5.4
spark-nlp on Apache Spark 2.3.x:
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:2.5.4
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:2.5.4
GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark23-gpu_2.11:2.5.4
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark23-gpu_2.11:2.5.4
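When the session is created from a Python script rather than through spark-shell or pyspark, the same coordinates can be passed via Spark's spark.jars.packages setting. The sketch below copies the four coordinates from the commands above. The helper names, app name, and local master are illustrative assumptions, and a real deployment will likely need further settings such as memory and serializer options.

```python
# Package coordinates copied from the spark-shell / pyspark commands above,
# keyed by (spark23, gpu).
PACKAGES = {
    (False, False): "com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.4",
    (False, True): "com.johnsnowlabs.nlp:spark-nlp-gpu_2.11:2.5.4",
    (True, False): "com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:2.5.4",
    (True, True): "com.johnsnowlabs.nlp:spark-nlp-spark23-gpu_2.11:2.5.4",
}


def package_for(spark23=False, gpu=False):
    """Pick the coordinate for the target Spark line and hardware."""
    return PACKAGES[(bool(spark23), bool(gpu))]


def build_session(spark23=False, gpu=False):
    """Create a local SparkSession that pulls the matching Spark NLP package."""
    # Imported lazily so package_for() can be used without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName("Spark NLP")   # illustrative name
        .master("local[*]")     # illustrative: run locally on all cores
        .config("spark.jars.packages", package_for(spark23, gpu))
        .getOrCreate()
    )
```

Keeping the coordinates in one table makes it harder to mix a 2.3.x artifact with a 2.4.x cluster by accident.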
Maven
spark-nlp on Apache Spark 2.4.x:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp_2.11</artifactId>
<version>2.5.4</version>
</dependency>
spark-nlp-gpu:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu_2.11</artifactId>
<version>2.5.4</version>
</dependency>
spark-nlp on Apache Spark 2.3.x:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-spark23_2.11</artifactId>
<version>2.5.4</version>
</dependency>
spark-nlp-gpu:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu-spark23_2.11</artifactId>
<version>2.5.4</version>
</dependency>
FAT JARs
- CPU on Apache Spark 2.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-2.5.4.jar
- GPU on Apache Spark 2.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-2.5.4.jar
- CPU on Apache Spark 2.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-spark23-assembly-2.5.4.jar
- GPU on Apache Spark 2.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-spark23-gpu-assembly-2.5.4.jar