John Snow Labs Spark-NLP 2.6.2: New SentenceDetectorDL, improved BioBERT models, new Models Hub, and other improvements!

@maziyarpanahi maziyarpanahi released this 01 Oct 19:17

Overview

We are glad to release Spark NLP 2.6.2! This release comes with a brand new SentenceDetectorDL (SDDL), a trainable annotator built on a general-purpose neural network model that detects sentence boundaries with higher accuracy. In addition, we are releasing 12 new and improved BioBERT models for BertEmbeddings and BertSentenceEmbeddings, used for sequence and text classification.

Spark NLP has a new and improved website for its documentation and models. We have been moving our 330+ pretrained models and pipelines into the Models Hub, and we would appreciate your feedback! :)

As always, we would like to thank our community for their feedback, questions, and feature requests.


New Features

  • Introducing a new SentenceDetectorDL (trainable) for sentence boundary detection
  • Dedicated Models Hub for all pretrained models & pipelines
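
To illustrate the new annotator, here is a minimal Python sketch of a pipeline built around a pretrained SentenceDetectorDL model. It assumes the Python class is exposed as SentenceDetectorDLModel and that a pretrained English model is published under the name sentence_detector_dl; neither name appears in these notes, so please check the Models Hub before relying on them. The column names (text, document, sentences) and both helper functions are hypothetical and exist only for this example.

```python
def build_sentence_pipeline():
    """Return an unfitted Spark ML Pipeline that splits raw text into sentences.

    Call this only after a Spark session exists (e.g. via sparknlp.start()),
    because the annotators are created on the JVM side.
    """
    # Imported lazily so this module can be loaded without pyspark / spark-nlp.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import SentenceDetectorDLModel

    # Wrap the raw "text" column into Spark NLP's document annotation.
    document = DocumentAssembler().setInputCol("text").setOutputCol("document")

    # Assumption: the pretrained model name is "sentence_detector_dl".
    sentences = (
        SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")
        .setInputCols(["document"])
        .setOutputCol("sentences")
    )
    return Pipeline(stages=[document, sentences])


def split_sentences(spark, texts):
    """Run the pipeline over a list of strings; one detected sentence per row."""
    data = spark.createDataFrame([(t,) for t in texts], ["text"])
    model = build_sentence_pipeline().fit(data)
    return model.transform(data).selectExpr("explode(sentences.result) AS sentence")
```

A typical call would be spark = sparknlp.start() followed by split_sentences(spark, ["First sentence. Second one."]).show(truncate=False). The first run downloads the pretrained model, so it needs network access.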

Enhancements

  • Improved quality of the BioBERT models for BertEmbeddings (higher accuracy in sequence classification)
  • Improved quality of the Sentence BioBERT models for BertSentenceEmbeddings (higher accuracy in text classification)
  • Improve loadSavedModel in BertEmbeddings and BertSentenceEmbeddings
  • Add unit test to MultiClassifierDL annotator
  • Better error handling in SentimentDLApproach

Bugfixes

  • Fix BERT LaBSE model for BertSentenceEmbeddings
  • Fix loadSavedModel for BertSentenceEmbeddings in Python

Deprecations

  • DeepSentenceDetector is deprecated in favor of SentenceDetectorDL

Models

Model                   Name                                Build  Lang
BertEmbeddings          biobert_pubmed_base_cased           2.6.2  en
BertEmbeddings          biobert_pubmed_large_cased          2.6.2  en
BertEmbeddings          biobert_pmc_base_cased              2.6.2  en
BertEmbeddings          biobert_pubmed_pmc_base_cased       2.6.2  en
BertEmbeddings          biobert_clinical_base_cased         2.6.2  en
BertEmbeddings          biobert_discharge_base_cased        2.6.2  en
BertSentenceEmbeddings  sent_biobert_pubmed_base_cased      2.6.2  en
BertSentenceEmbeddings  sent_biobert_pubmed_large_cased     2.6.2  en
BertSentenceEmbeddings  sent_biobert_pmc_base_cased         2.6.2  en
BertSentenceEmbeddings  sent_biobert_pubmed_pmc_base_cased  2.6.2  en
BertSentenceEmbeddings  sent_biobert_clinical_base_cased    2.6.2  en
BertSentenceEmbeddings  sent_biobert_discharge_base_cased   2.6.2  en

The complete list of all 330+ models & pipelines in 46+ languages is available here.
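
As a rough sketch of how the BioBERT models listed above might be loaded from Python, the helper below picks the annotator class from the model name (names starting with sent_ go to BertSentenceEmbeddings, the rest to BertEmbeddings) and then calls pretrained(). The model names come from the table; the helper names, the default output column, and the input column names in the usage note are assumptions made for this example.

```python
# Word-level BioBERT models from the table above.
WORD_MODELS = {
    "biobert_pubmed_base_cased",
    "biobert_pubmed_large_cased",
    "biobert_pmc_base_cased",
    "biobert_pubmed_pmc_base_cased",
    "biobert_clinical_base_cased",
    "biobert_discharge_base_cased",
}
# The sentence-level models use the same names with a "sent_" prefix.
SENTENCE_MODELS = {"sent_" + name for name in WORD_MODELS}


def annotator_class_name(model_name):
    """Map a BioBERT model name from this release to its annotator class name."""
    if model_name in WORD_MODELS:
        return "BertEmbeddings"
    if model_name in SENTENCE_MODELS:
        return "BertSentenceEmbeddings"
    raise ValueError(f"not a BioBERT model from the 2.6.2 release: {model_name!r}")


def load_biobert(model_name, input_cols, output_col="embeddings"):
    """Download a pretrained BioBERT annotator and wire up its columns.

    Requires spark-nlp and an active Spark session; the name is validated
    first so that a typo fails before any download starts.
    """
    class_name = annotator_class_name(model_name)
    # Imported lazily so the name check above works without spark-nlp installed.
    import sparknlp.annotator as annotator

    cls = getattr(annotator, class_name)
    return cls.pretrained(model_name, "en").setInputCols(input_cols).setOutputCol(output_col)
```

For example, load_biobert("biobert_pubmed_base_cased", ["sentence", "token"]) would return a word-level annotator, while load_biobert("sent_biobert_pubmed_base_cased", ["sentence"], "sentence_embeddings") would return the sentence-level one. Both assume upstream stages that produce the sentence and token columns.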


Documentation and Notebooks


Installation

Python

#PyPI

pip install spark-nlp==2.6.2

#Conda

conda install -c johnsnowlabs spark-nlp==2.6.2

Spark

spark-nlp on Apache Spark 2.4.x:

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.6.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.6.2

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.11:2.6.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.11:2.6.2

spark-nlp on Apache Spark 2.3.x:

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:2.6.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:2.6.2

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark23_2.11:2.6.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark23_2.11:2.6.2

Maven

spark-nlp on Apache Spark 2.4.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.11</artifactId>
    <version>2.6.2</version>
</dependency>

spark-nlp-gpu:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.11</artifactId>
    <version>2.6.2</version>
</dependency>

spark-nlp on Apache Spark 2.3.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-spark23_2.11</artifactId>
    <version>2.6.2</version>
</dependency>

spark-nlp-gpu:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu-spark23_2.11</artifactId>
    <version>2.6.2</version>
</dependency>
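
The four Maven artifacts above differ only in their artifact ID, so the right coordinate can be chosen programmatically. The following is a small Python sketch, not part of Spark NLP itself: the function name maven_coordinate and the lookup table are hypothetical, while the artifact IDs, group ID, Scala suffix, and version are copied from the snippets above.

```python
GROUP_ID = "com.johnsnowlabs.nlp"
SCALA_VERSION = "2.11"

# Artifact IDs as listed in the Maven snippets, keyed by (Spark minor line, GPU build).
ARTIFACT_IDS = {
    ("2.4", False): "spark-nlp",
    ("2.4", True): "spark-nlp-gpu",
    ("2.3", False): "spark-nlp-spark23",
    ("2.3", True): "spark-nlp-gpu-spark23",
}


def maven_coordinate(spark_version, gpu=False, version="2.6.2"):
    """Return 'group:artifact_scala:version' for a given Apache Spark version."""
    # Reduce e.g. "2.4.7" to its minor line "2.4".
    spark_line = ".".join(spark_version.split(".")[:2])
    try:
        artifact = ARTIFACT_IDS[(spark_line, gpu)]
    except KeyError:
        raise ValueError(f"no Spark NLP {version} build for Spark {spark_version}") from None
    return f"{GROUP_ID}:{artifact}_{SCALA_VERSION}:{version}"


print(maven_coordinate("2.4.7"))
print(maven_coordinate("2.3.4", gpu=True))
```

The returned string can be passed to --packages on the command line or set as the spark.jars.packages configuration value when building a Spark session.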

FAT JARs