John Snow Labs Spark-NLP 3.3.2: New BERT for Sequence Classification, Comet.ml logging integration, new state-of-the-art BERT topic and sentiment detection models, and bug fixes!

@maziyarpanahi maziyarpanahi released this 03 Nov 15:35

Overview

We are pleased to release Spark NLP 🚀 3.3.2! This release comes with a new BertForSequenceClassification annotator for existing or fine-tuned models on HuggingFace, a new feature for logging to Comet.ml during training, new state-of-the-art fine-tuned BERT models for sequence classification, and bug fixes!

As always, we would like to thank our community for their feedback, questions, and feature requests.


New Features

  • Introducing the BertForSequenceClassification annotator. It can load BERT models with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for multi-class document classification tasks. This annotator is compatible with all models trained or fine-tuned using BertForSequenceClassification (PyTorch) or TFBertForSequenceClassification (TensorFlow) in HuggingFace 🤗
  • New support for Comet.ml in Spark NLP to build better models faster.

Comet enables data scientists and teams to track, compare, explain, and optimize experiments and models across the model’s entire lifecycle, from training to production. With just two lines of code, you can start building better models today.

Comet SparkNLP Integration Notebook
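As a rough illustration of the BertForSequenceClassification annotator introduced above, the following is a minimal sketch of a classification pipeline. It assumes `spark-nlp==3.3.2` and `pyspark` are installed and that the pretrained model can be downloaded. The column names (`text`, `document`, `token`, `class`), the sample sentence, and the helper names `build_classification_pipeline` and `run_demo` are illustrative choices, not part of the release. Imports are deferred into the functions so that defining them does not require a Spark installation.

```python
# Model name taken from the "Featured Pretrained Models" list in these notes.
MODEL_NAME = "bert_base_sequence_classifier_imdb"
MODEL_LANG = "en"


def build_classification_pipeline(model_name=MODEL_NAME, lang=MODEL_LANG):
    """Assemble document -> token -> BERT sequence classifier stages.

    Hypothetical helper: the column names below are assumptions made
    for this sketch.
    """
    # Deferred imports: Spark and Spark NLP are only needed when called.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import Tokenizer, BertForSequenceClassification

    document_assembler = (
        DocumentAssembler()
        .setInputCol("text")
        .setOutputCol("document")
    )
    tokenizer = (
        Tokenizer()
        .setInputCols(["document"])
        .setOutputCol("token")
    )
    # Downloads the pretrained model on first use.
    classifier = (
        BertForSequenceClassification.pretrained(model_name, lang)
        .setInputCols(["document", "token"])
        .setOutputCol("class")
    )
    return Pipeline(stages=[document_assembler, tokenizer, classifier])


def run_demo():
    """Start a Spark NLP session and classify one sample sentence."""
    import sparknlp

    spark = sparknlp.start()
    data = spark.createDataFrame([["I really liked that movie!"]]).toDF("text")
    model = build_classification_pipeline().fit(data)
    model.transform(data).select("class.result").show(truncate=False)
```

Calling `run_demo()` in an environment with Spark NLP available should print the predicted label for the sample sentence; swapping `MODEL_NAME` and `MODEL_LANG` for another entry from the models list should work the same way, assuming that model is published under the listed name.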


Bug Fixes and Enhancements

  • Fix a missing batchSize param in NerDLModel that degraded GPU performance by not allowing users to change the default batchSize
  • Fix NerDLApproach logs format on Databricks
  • Fix EntityRulerApproach name from import
  • Fix missing EntityRulerModel in ResourceDownloader
  • Faster Colab setup script for pyspark 3.0.x and 3.1.x on Java 11

Models

New state-of-the-art fine-tuned BERT models for sequence classification in English, French, German, Spanish, Japanese, Turkish, and Russian, as well as multilingual models.

Featured Pretrained Models

| Model | Name | Build | Lang |
|---|---|---|---|
| BertForSequenceClassification | bert_multilingual_sequence_classifier_allocine | 3.3.2 | fr |
| BertForSequenceClassification | bert_large_sequence_classifier_imdb | 3.3.2 | en |
| BertForSequenceClassification | bert_base_sequence_classifier_imdb | 3.3.2 | en |
| BertForSequenceClassification | bert_base_sequence_classifier_ag_news | 3.3.2 | en |
| BertForSequenceClassification | bert_base_sequence_classifier_dbpedia_14 | 3.3.2 | en |
| BertForSequenceClassification | bert_sequence_classifier_turkish_sentiment | 3.3.2 | tr |
| BertForSequenceClassification | bert_sequence_classifier_sentiment | 3.3.2 | de |
| BertForSequenceClassification | bert_sequence_classifier_rubert_sentiment | 3.3.2 | ru |
| BertForSequenceClassification | bert_sequence_classifier_multilingual_sentiment | 3.3.2 | xx |
| BertForSequenceClassification | bert_sequence_classifier_japanese_sentiment | 3.3.2 | ja |
| BertForSequenceClassification | bert_sequence_classifier_finbert | 3.3.2 | en |
| BertForSequenceClassification | bert_sequence_classifier_dehatebert_mono | 3.3.2 | en |
| BertForSequenceClassification | bert_sequence_classifier_beto_sentiment_analysis | 3.3.2 | es |
| BertForSequenceClassification | bert_sequence_classifier_beto_emotion_analysis | 3.3.2 | es |

The complete list of all 4000+ models & pipelines in 200+ languages is available on Models Hub.

New Notebooks

| Spark NLP | Notebooks | Colab |
|---|---|---|
| BertForSequenceClassification | HuggingFace in Spark NLP - BertForSequenceClassification | Open In Colab |
| Comet.ml | Comet SparkNLP Integration Notebook | Open In Colab |

Documentation


Installation

Python

#PyPI

pip install spark-nlp==3.3.2

Spark Packages

spark-nlp on Apache Spark 3.0.x and 3.1.x (Scala 2.12 only):

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:3.3.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:3.3.2

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:3.3.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:3.3.2

spark-nlp on Apache Spark 2.4.x (Scala 2.11 only):

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark24_2.11:3.3.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark24_2.11:3.3.2

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark24_2.11:3.3.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark24_2.11:3.3.2

spark-nlp on Apache Spark 2.3.x (Scala 2.11 only):

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:3.3.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:3.3.2

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark23_2.11:3.3.2

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark23_2.11:3.3.2
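The package coordinates above follow a regular pattern: an optional `-gpu` part, an optional Spark-version suffix (`-spark24` or `-spark23`), and the Scala version. The sketch below encodes that pattern in a small helper for building the `--packages` argument. The function name `spark_nlp_coordinate` is hypothetical and not part of Spark NLP; it covers only the Spark versions listed in these notes.

```python
SPARK_NLP_VERSION = "3.3.2"


def spark_nlp_coordinate(spark_version: str, gpu: bool = False) -> str:
    """Return the Maven coordinate of Spark NLP 3.3.2 for a Spark version.

    Hypothetical helper: it mirrors the coordinates listed in the
    installation notes and rejects any other Spark version.
    """
    # Reduce e.g. "3.1.2" to "3.1" before matching.
    major_minor = ".".join(spark_version.split(".")[:2])
    if major_minor in ("3.0", "3.1"):
        suffix, scala = "", "2.12"
    elif major_minor == "2.4":
        suffix, scala = "-spark24", "2.11"
    elif major_minor == "2.3":
        suffix, scala = "-spark23", "2.11"
    else:
        raise ValueError(f"Unsupported Spark version: {spark_version}")

    # The "-gpu" part comes before the Spark suffix in the artifact name.
    artifact = "spark-nlp" + ("-gpu" if gpu else "") + suffix + "_" + scala
    return f"com.johnsnowlabs.nlp:{artifact}:{SPARK_NLP_VERSION}"


if __name__ == "__main__":
    # Print the coordinate to pass to spark-shell or pyspark via --packages.
    print(spark_nlp_coordinate("2.4.8", gpu=True))
```

The result can be passed directly on the command line, for example `pyspark --packages <coordinate>`.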

Maven

spark-nlp on Apache Spark 3.0.x and 3.1.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.12</artifactId>
    <version>3.3.2</version>
</dependency>

spark-nlp-gpu:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.12</artifactId>
    <version>3.3.2</version>
</dependency>

spark-nlp on Apache Spark 2.4.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-spark24_2.11</artifactId>
    <version>3.3.2</version>
</dependency>

spark-nlp-gpu:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu-spark24_2.11</artifactId>
    <version>3.3.2</version>
</dependency>

spark-nlp on Apache Spark 2.3.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-spark23_2.11</artifactId>
    <version>3.3.2</version>
</dependency>

spark-nlp-gpu:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu-spark23_2.11</artifactId>
    <version>3.3.2</version>
</dependency>

FAT JARs