John Snow Labs Spark-NLP 2.6.2: New SentenceDetectorDL, improved BioBERT models, new Models Hub, and other improvements!
Overview
We are glad to release Spark NLP 2.6.2! This release comes with a brand new SentenceDetectorDL (SDDL), based on a general-purpose neural network model for sentence boundary detection with higher accuracy. In addition, we are releasing 12 new and improved BioBERT models for BertEmbeddings and BertSentenceEmbeddings, used for sequence and text classification.
Spark NLP has a new and improved website for its documentation and models. We have been moving our 330+ pretrained models and pipelines into the Models Hub and we would appreciate your feedback! :)
As always, we would like to thank our community for their feedback, questions, and feature requests.
New Features
- Introducing a new SentenceDetectorDL (trainable) for sentence boundary detection
- Dedicated Models Hub for all pretrained models & pipelines
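To make the new annotator concrete, here is a minimal Python sketch of a pipeline that splits raw text into sentences with SentenceDetectorDL. It assumes `spark-nlp==2.6.2` and a matching `pyspark` are installed, and that a pretrained English model is published under the name `sentence_detector_dl`; that name, the column names, and the two helper functions are illustrative assumptions, so check the Models Hub for the exact model name.

```python
# A sketch only: assumes spark-nlp==2.6.2 and pyspark 2.4.x are installed, and
# that a pretrained English model named "sentence_detector_dl" is available
# (an assumption; check the Models Hub for the exact name).


def build_sentence_pipeline(model_name="sentence_detector_dl", lang="en"):
    """Build a two-stage pipeline: raw text -> document -> sentences."""
    # Imports are deferred so this file can be imported without Spark present.
    from pyspark.ml import Pipeline
    from sparknlp.annotator import SentenceDetectorDLModel
    from sparknlp.base import DocumentAssembler

    # Wrap the raw "text" column into Spark NLP document annotations.
    document = DocumentAssembler() \
        .setInputCol("text") \
        .setOutputCol("document")

    # Download the pretrained model and detect sentence boundaries.
    sentences = SentenceDetectorDLModel.pretrained(model_name, lang) \
        .setInputCols(["document"]) \
        .setOutputCol("sentences")

    return Pipeline(stages=[document, sentences])


def split_sentences(text):
    """Start a local session, run the pipeline on one string, return sentences."""
    import sparknlp

    spark = sparknlp.start()
    data = spark.createDataFrame([[text]]).toDF("text")
    result = build_sentence_pipeline().fit(data).transform(data)
    # "sentences.result" selects the sentence strings from the annotations.
    row = result.select("sentences.result").first()
    return list(row[0])
```

Calling `split_sentences("First sentence. Second one?")` starts a local Spark session and downloads the model on first use, so it needs network access.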
Enhancements
- Improved quality of BioBERT models for BertEmbeddings (higher accuracy in sequence classification)
- Improved quality of Sentence BioBERT models for BertSentenceEmbeddings (higher accuracy in text classification)
- Improve loadSavedModel in BertEmbeddings and BertSentenceEmbeddings
- Add unit test to MultiClassifierDL annotator
- Better error handling in SentimentDLApproach
Bugfixes
- Fix BERT LaBSE model for BertSentenceEmbeddings
- Fix loadSavedModel for BertSentenceEmbeddings in Python
Deprecations
- DeepSentenceDetector is deprecated in favor of SentenceDetectorDL
Models
Model | Name | Build | Lang |
---|---|---|---|
BertEmbeddings | biobert_pubmed_base_cased | 2.6.2 | en |
BertEmbeddings | biobert_pubmed_large_cased | 2.6.2 | en |
BertEmbeddings | biobert_pmc_base_cased | 2.6.2 | en |
BertEmbeddings | biobert_pubmed_pmc_base_cased | 2.6.2 | en |
BertEmbeddings | biobert_clinical_base_cased | 2.6.2 | en |
BertEmbeddings | biobert_discharge_base_cased | 2.6.2 | en |
BertSentenceEmbeddings | sent_biobert_pubmed_base_cased | 2.6.2 | en |
BertSentenceEmbeddings | sent_biobert_pubmed_large_cased | 2.6.2 | en |
BertSentenceEmbeddings | sent_biobert_pmc_base_cased | 2.6.2 | en |
BertSentenceEmbeddings | sent_biobert_pubmed_pmc_base_cased | 2.6.2 | en |
BertSentenceEmbeddings | sent_biobert_clinical_base_cased | 2.6.2 | en |
BertSentenceEmbeddings | sent_biobert_discharge_base_cased | 2.6.2 | en |
The complete list of all 330+ models & pipelines in 46+ languages is available here.
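As an illustration of how the names in the table are used, the following Python sketch loads one of the 12 BioBERT models by name. The helper `load_biobert` and the column names are hypothetical and chosen for this example; `pretrained` downloads the model, so it needs an active Spark session and network access.

```python
# Model names copied from the table in these release notes.
BIOBERT_MODELS = [
    "biobert_pubmed_base_cased",
    "biobert_pubmed_large_cased",
    "biobert_pmc_base_cased",
    "biobert_pubmed_pmc_base_cased",
    "biobert_clinical_base_cased",
    "biobert_discharge_base_cased",
]
# Each sentence-level model uses the same name with a "sent_" prefix.
SENT_BIOBERT_MODELS = ["sent_" + name for name in BIOBERT_MODELS]


def load_biobert(name="biobert_pubmed_base_cased", lang="en"):
    """Return a configured annotator for one of the 12 models listed above."""
    if name not in BIOBERT_MODELS + SENT_BIOBERT_MODELS:
        raise ValueError(f"{name!r} is not one of the BioBERT models in this release")

    # Deferred import so the name lists can be used without Spark NLP installed.
    from sparknlp.annotator import BertEmbeddings, BertSentenceEmbeddings

    if name in SENT_BIOBERT_MODELS:
        # Sentence embeddings: one vector per input document/sentence annotation.
        return BertSentenceEmbeddings.pretrained(name, lang) \
            .setInputCols(["document"]) \
            .setOutputCol("sentence_embeddings")

    # Token embeddings: one vector per token, so a tokenizer stage must run first.
    return BertEmbeddings.pretrained(name, lang) \
        .setInputCols(["document", "token"]) \
        .setOutputCol("embeddings")
```

The returned annotator can be placed in a `Pipeline` after a `DocumentAssembler` (and a `Tokenizer` for the token-level models).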
Documentation and Notebooks
- New notebook to use SentenceDetectorDL
- Update Models Hub with new models in Spark NLP 2.6.2
- Update documentation for release of Spark NLP 2.6.2
- Update all spark-nlp-workshop notebooks for Spark NLP 2.6.2
Installation
Python
#PyPI
pip install spark-nlp==2.6.2
#Conda
conda install -c johnsnowlabs spark-nlp==2.6.2
Spark
spark-nlp on Apache Spark 2.4.x:
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.6.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.6.2
GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.11:2.6.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.11:2.6.2
spark-nlp on Apache Spark 2.3.x:
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:2.6.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:2.6.2
GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark23-gpu_2.11:2.6.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark23-gpu_2.11:2.6.2
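The four coordinates above follow a regular pattern, so the right one for a given Spark line and hardware target can be derived instead of copied by hand. The Python sketch below shows that pattern; `spark_nlp_coordinate` is a hypothetical helper, not part of Spark NLP, and it only reproduces the coordinates listed in these notes.

```python
# Hypothetical helper: not part of Spark NLP. It only reproduces the four
# Maven coordinates listed in the installation notes for release 2.6.2.
def spark_nlp_coordinate(spark_version: str = "2.4", gpu: bool = False,
                         release: str = "2.6.2") -> str:
    """Return the --packages coordinate for the given Spark line and target."""
    if spark_version.startswith("2.4"):
        artifact = "spark-nlp"
    elif spark_version.startswith("2.3"):
        artifact = "spark-nlp-spark23"
    else:
        raise ValueError(
            f"these notes only list Spark 2.3.x and 2.4.x, got {spark_version!r}"
        )
    if gpu:
        artifact += "-gpu"
    # All artifacts in this release are built for Scala 2.11.
    return f"com.johnsnowlabs.nlp:{artifact}_2.11:{release}"


if __name__ == "__main__":
    # Print the pyspark command line for every listed combination.
    for spark in ("2.4", "2.3"):
        for use_gpu in (False, True):
            print(f"pyspark --packages {spark_nlp_coordinate(spark, use_gpu)}")
```

The returned string can also be passed as the `spark.jars.packages` setting when building a SparkSession in code.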
Maven
spark-nlp on Apache Spark 2.4.x:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp_2.11</artifactId>
<version>2.6.2</version>
</dependency>
spark-nlp-gpu:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu_2.11</artifactId>
<version>2.6.2</version>
</dependency>
spark-nlp on Apache Spark 2.3.x:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-spark23_2.11</artifactId>
<version>2.6.2</version>
</dependency>
spark-nlp-gpu:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-spark23-gpu_2.11</artifactId>
<version>2.6.2</version>
</dependency>
FAT JARs
- CPU on Apache Spark 2.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-2.6.2.jar
- GPU on Apache Spark 2.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-2.6.2.jar
- CPU on Apache Spark 2.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-spark23-assembly-2.6.2.jar
- GPU on Apache Spark 2.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-spark23-gpu-assembly-2.6.2.jar