John Snow Labs Spark-NLP 3.3.2: New BERT for Sequence Classification, Comet.ml logging integration, new state-of-the-art BERT topic and sentiment detection models, and bug fixes!
Overview
We are pleased to release Spark NLP 🚀 3.3.2! This release comes with a new BertForSequenceClassification annotator for existing or fine-tuned models on HuggingFace, a new Comet.ml logging feature for training, new state-of-the-art fine-tuned BERT models for sequence classification, and bug fixes!
As always, we would like to thank our community for their feedback, questions, and feature requests.
New Features
- Introducing the BertForSequenceClassification annotator. BertForSequenceClassification can load BERT models with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for multi-class document classification tasks. This annotator is compatible with all models trained or fine-tuned with BertForSequenceClassification (PyTorch) or TFBertForSequenceClassification (TensorFlow) in HuggingFace 🤗
- New support for Comet.ml in Spark NLP to build better models faster.
Comet enables data scientists and teams to track, compare, explain, and optimize experiments and models across the model’s entire lifecycle, from training to production. With just two lines of code, you can start building better models today.
Comet SparkNLP Integration Notebook
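As a rough illustration of how the new annotator slots into a pipeline, here is a minimal Python sketch. It assumes `spark-nlp==3.3.2` and a matching PySpark are installed, and it uses the `bert_base_sequence_classifier_imdb` model listed in this release. The helper names `build_classification_pipeline` and `classify`, the column names, and the lazy imports are illustrative choices, not part of the release itself.

```python
# Sketch only: the model name and language code come from the model table in
# this release; helper names and column names are illustrative assumptions.
MODEL_NAME = "bert_base_sequence_classifier_imdb"
MODEL_LANG = "en"


def build_classification_pipeline(model_name=MODEL_NAME, lang=MODEL_LANG):
    """Return an unfitted Spark ML Pipeline ending in BertForSequenceClassification."""
    # Imported lazily so this module can be loaded before Spark NLP is installed.
    from pyspark.ml import Pipeline
    from sparknlp.annotator import BertForSequenceClassification, Tokenizer
    from sparknlp.base import DocumentAssembler

    document_assembler = (
        DocumentAssembler().setInputCol("text").setOutputCol("document")
    )
    tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")
    # pretrained() downloads the model on first use.
    classifier = (
        BertForSequenceClassification.pretrained(model_name, lang)
        .setInputCols(["token", "document"])
        .setOutputCol("class")
    )
    return Pipeline(stages=[document_assembler, tokenizer, classifier])


def classify(texts):
    """Start a Spark NLP session, run the pipeline, and return the predicted labels."""
    import sparknlp

    spark = sparknlp.start()
    data = spark.createDataFrame([[text] for text in texts]).toDF("text")
    model = build_classification_pipeline().fit(data)
    rows = model.transform(data).select("class.result").collect()
    # Each row holds a list of labels for one input text.
    return [row["result"] for row in rows]
```

Calling `classify(["I really liked that movie!"])` should return one list of labels per input text; the label set depends on the model that is loaded.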
Bug Fixes and Enhancements
- Fix a missing batchSize param in NerDLModel that degraded GPU performance by not allowing users to change the default batchSize
- Fix NerDLApproach logs format on Databricks
- Fix EntityRulerApproach name from import
- Fix missing EntityRulerModel in ResourceDownloader
- Faster Colab setup script for pyspark 3.0.x and 3.1.x on Java 11
Models
New state-of-the-art fine-tuned BERT models for sequence classification in English, French, German, Spanish, Japanese, Turkish, and Russian, plus multilingual models.
Featured Pretrained Models
Model | Name | Build | Lang |
---|---|---|---|
BertForSequenceClassification | bert_multilingual_sequence_classifier_allocine | 3.3.2 | fr |
BertForSequenceClassification | bert_large_sequence_classifier_imdb | 3.3.2 | en |
BertForSequenceClassification | bert_base_sequence_classifier_imdb | 3.3.2 | en |
BertForSequenceClassification | bert_base_sequence_classifier_ag_news | 3.3.2 | en |
BertForSequenceClassification | bert_base_sequence_classifier_dbpedia_14 | 3.3.2 | en |
BertForSequenceClassification | bert_sequence_classifier_turkish_sentiment | 3.3.2 | tr |
BertForSequenceClassification | bert_sequence_classifier_sentiment | 3.3.2 | de |
BertForSequenceClassification | bert_sequence_classifier_rubert_sentiment | 3.3.2 | ru |
BertForSequenceClassification | bert_sequence_classifier_multilingual_sentiment | 3.3.2 | xx |
BertForSequenceClassification | bert_sequence_classifier_japanese_sentiment | 3.3.2 | ja |
BertForSequenceClassification | bert_sequence_classifier_finbert | 3.3.2 | en |
BertForSequenceClassification | bert_sequence_classifier_dehatebert_mono | 3.3.2 | en |
BertForSequenceClassification | bert_sequence_classifier_beto_sentiment_analysis | 3.3.2 | es |
BertForSequenceClassification | bert_sequence_classifier_beto_emotion_analysis | 3.3.2 | es |
The complete list of all 4000+ models & pipelines in 200+ languages is available on Models Hub.
New Notebooks
Spark NLP | Notebooks |
---|---|
BertForSequenceClassification | HuggingFace in Spark NLP - BertForSequenceClassification |
Comet.ml | Comet SparkNLP Integration Notebook |
Documentation
- TF Hub & HuggingFace to Spark NLP
- Models Hub with new models
- Spark NLP documentation
- Spark NLP Scala APIs
- Spark NLP Python APIs
- Spark NLP Workshop notebooks
- Spark NLP publications
- Spark NLP in Action
- Spark NLP training certification notebooks for Google Colab and Databricks
- Spark NLP Display for visualization of different types of annotations
- Discussions: engage with other community members, share ideas, and show off how you use Spark NLP!
Installation
Python
#PyPI
pip install spark-nlp==3.3.2
Spark Packages
spark-nlp on Apache Spark 3.0.x and 3.1.x (Scala 2.12 only):
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:3.3.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:3.3.2
GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:3.3.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:3.3.2
spark-nlp on Apache Spark 2.4.x (Scala 2.11 only):
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark24_2.11:3.3.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark24_2.11:3.3.2
GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark24_2.11:3.3.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark24_2.11:3.3.2
spark-nlp on Apache Spark 2.3.x (Scala 2.11 only):
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:3.3.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:3.3.2
GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark23_2.11:3.3.2
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark23_2.11:3.3.2
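The package coordinates above follow a regular pattern: an optional `-gpu` part, an optional `-spark24` or `-spark23` suffix, and the Scala version. The following Python sketch derives the coordinate from a Spark version string. The function name `spark_nlp_coordinate` and the lookup table are hypothetical helpers written for illustration and are not part of Spark NLP; the sketch also assumes the GPU artifact for Spark 2.3.x is named `spark-nlp-gpu-spark23_2.11`, as in the `pyspark` command and the Maven section.

```python
VERSION = "3.3.2"
GROUP_ID = "com.johnsnowlabs.nlp"

# Artifact suffix and Scala version for each Spark line listed in this release.
_SPARK_VARIANTS = {
    "3.0": ("", "2.12"),
    "3.1": ("", "2.12"),
    "2.4": ("-spark24", "2.11"),
    "2.3": ("-spark23", "2.11"),
}


def spark_nlp_coordinate(spark_version, gpu=False, version=VERSION):
    """Build the --packages coordinate for a Spark version such as '3.1.2'."""
    major_minor = ".".join(spark_version.split(".")[:2])
    if major_minor not in _SPARK_VARIANTS:
        raise ValueError(f"Spark {spark_version} is not covered by this release")
    suffix, scala = _SPARK_VARIANTS[major_minor]
    gpu_part = "-gpu" if gpu else ""
    return f"{GROUP_ID}:spark-nlp{gpu_part}{suffix}_{scala}:{version}"


print(spark_nlp_coordinate("3.1.2"))
# com.johnsnowlabs.nlp:spark-nlp_2.12:3.3.2
print(spark_nlp_coordinate("2.4.8", gpu=True))
# com.johnsnowlabs.nlp:spark-nlp-gpu-spark24_2.11:3.3.2
```

The returned string can be passed to `spark-shell --packages` or `pyspark --packages` as shown above.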
Maven
spark-nlp on Apache Spark 3.0.x and 3.1.x:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp_2.12</artifactId>
<version>3.3.2</version>
</dependency>
spark-nlp-gpu:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu_2.12</artifactId>
<version>3.3.2</version>
</dependency>
spark-nlp on Apache Spark 2.4.x:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-spark24_2.11</artifactId>
<version>3.3.2</version>
</dependency>
spark-nlp-gpu:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu-spark24_2.11</artifactId>
<version>3.3.2</version>
</dependency>
spark-nlp on Apache Spark 2.3.x:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-spark23_2.11</artifactId>
<version>3.3.2</version>
</dependency>
spark-nlp-gpu:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu-spark23_2.11</artifactId>
<version>3.3.2</version>
</dependency>
FAT JARs
- CPU on Apache Spark 3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-3.3.2.jar
- GPU on Apache Spark 3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-3.3.2.jar
- CPU on Apache Spark 2.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-spark24-assembly-3.3.2.jar
- GPU on Apache Spark 2.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-spark24-assembly-3.3.2.jar
- CPU on Apache Spark 2.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-spark23-assembly-3.3.2.jar
- GPU on Apache Spark 2.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-spark23-assembly-3.3.2.jar