Release John Snow Labs Spark-NLP 2.5.5: 28 new Lemma and POS models in 14 languages, bug fixes, and lots of new notebooks! · JohnSnowLabs/spark-nlp

Overview

We are excited to release Spark NLP 2.5.5 with 28 new pretrained models for Lemma and POS in 14 languages, bug fixes, new notebooks, and more!

As always, we would like to thank our community for their feedback, questions, and feature requests.

New Features

Add getClasses() function to NerDLModel
Add getClasses() function to ClassifierDLModel
Add getClasses() function to SentimentDLModel

Example:

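import sparknlp
from sparknlp.annotator import NerDLModel
spark = sparknlp.start()  # pretrained() needs an active Spark session
ner_model = NerDLModel.pretrained('onto_100')
print(ner_model.getClasses())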
#['O', 'B-CARDINAL', 'B-EVENT', 'I-EVENT', 'B-WORK_OF_ART', 'I-WORK_OF_ART', 'B-ORG', 'B-DATE', 'I-DATE', 'I-ORG', 'B-GPE', 'B-PERSON', 'B-PRODUCT', 'B-NORP', 'B-ORDINAL', 'I-PERSON', 'B-MONEY', 'I-MONEY', 'I-GPE', 'B-LOC', 'I-LOC', 'I-CARDINAL', 'B-FAC', 'I-FAC', 'B-LAW', 'I-LAW', 'B-TIME', 'I-TIME', 'B-PERCENT', 'I-PERCENT', 'I-NORP', 'I-PRODUCT', 'B-QUANTITY', 'I-QUANTITY', 'B-LANGUAGE', 'I-ORDINAL', 'I-LANGUAGE', 'X']
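
Because the classes come back as IOB-style tags, a few lines of plain Python are enough to recover the distinct entity types. The following is a minimal sketch, not part of Spark NLP: `entity_types` is a hypothetical helper, and the `classes` list is copied from the output above so the snippet runs without a Spark session.

```python
# Hypothetical helper (not part of Spark NLP): collapse IOB tags into entity types.
def entity_types(classes):
    """Return the sorted entity types found in a list of IOB tags."""
    types = set()
    for tag in classes:
        # Tags look like 'B-ORG' or 'I-ORG'; 'O' and 'X' carry no entity type.
        prefix, sep, label = tag.partition('-')
        if sep and prefix in ('B', 'I'):
            types.add(label)
    return sorted(types)

# Class list copied from the getClasses() output shown above.
classes = ['O', 'B-CARDINAL', 'B-EVENT', 'I-EVENT', 'B-WORK_OF_ART',
           'I-WORK_OF_ART', 'B-ORG', 'B-DATE', 'I-DATE', 'I-ORG', 'B-GPE',
           'B-PERSON', 'B-PRODUCT', 'B-NORP', 'B-ORDINAL', 'I-PERSON',
           'B-MONEY', 'I-MONEY', 'I-GPE', 'B-LOC', 'I-LOC', 'I-CARDINAL',
           'B-FAC', 'I-FAC', 'B-LAW', 'I-LAW', 'B-TIME', 'I-TIME',
           'B-PERCENT', 'I-PERCENT', 'I-NORP', 'I-PRODUCT', 'B-QUANTITY',
           'I-QUANTITY', 'B-LANGUAGE', 'I-ORDINAL', 'I-LANGUAGE', 'X']

types = entity_types(classes)
print(len(types), types)
```

The same helper should apply to any model whose classes follow the `B-`/`I-` convention; for ClassifierDLModel and SentimentDLModel the returned classes are plain labels, so no such post-processing is expected to be needed.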

Enhancements

Improve max sequence length calculation in BertEmbeddings and XlnetEmbeddings

Bugfixes

Fix a bug in RegexTokenizer in Python
Fix StopWordsCleaner exception in Python when pretrained() is used
Fix max sequence length issue in AlbertEmbeddings and SentencePiece generation
Fix HDFS support for setGraphFolder param in NerDLApproach

Models

We have added 28 new pretrained models for Lemma and POS in 14 languages:

Model	Name	Build	Lang
LemmatizerModel (Lemmatizer)	`lemma`	2.5.5	`br`
LemmatizerModel (Lemmatizer)	`lemma`	2.5.5	`ca`
LemmatizerModel (Lemmatizer)	`lemma`	2.5.5	`da`
LemmatizerModel (Lemmatizer)	`lemma`	2.5.5	`ga`
LemmatizerModel (Lemmatizer)	`lemma`	2.5.5	`hi`
LemmatizerModel (Lemmatizer)	`lemma`	2.5.5	`hy`
LemmatizerModel (Lemmatizer)	`lemma`	2.5.5	`eu`
LemmatizerModel (Lemmatizer)	`lemma`	2.5.5	`mr`
LemmatizerModel (Lemmatizer)	`lemma`	2.5.5	`yo`
LemmatizerModel (Lemmatizer)	`lemma`	2.5.5	`la`
LemmatizerModel (Lemmatizer)	`lemma`	2.5.5	`lv`
LemmatizerModel (Lemmatizer)	`lemma`	2.5.5	`sl`
LemmatizerModel (Lemmatizer)	`lemma`	2.5.5	`gl`
LemmatizerModel (Lemmatizer)	`lemma`	2.5.5	`id`
PerceptronModel (POS UD)	`pos_ud_keb`	2.5.5	`br`
PerceptronModel (POS UD)	`pos_ud_ancora`	2.5.5	`ca`
PerceptronModel (POS UD)	`pos_ud_ddt`	2.5.5	`da`
PerceptronModel (POS UD)	`pos_ud_idt`	2.5.5	`ga`
PerceptronModel (POS UD)	`pos_ud_hdtb`	2.5.5	`hi`
PerceptronModel (POS UD)	`pos_ud_armtdp`	2.5.5	`hy`
PerceptronModel (POS UD)	`pos_ud_bdt`	2.5.5	`eu`
PerceptronModel (POS UD)	`pos_ud_ufal`	2.5.5	`mr`
PerceptronModel (POS UD)	`pos_ud_ytb`	2.5.5	`yo`
PerceptronModel (POS UD)	`pos_ud_llct`	2.5.5	`la`
PerceptronModel (POS UD)	`pos_ud_lvtb`	2.5.5	`lv`
PerceptronModel (POS UD)	`pos_ud_ssj`	2.5.5	`sl`
PerceptronModel (POS UD)	`pos_ud_treegal`	2.5.5	`gl`
PerceptronModel (POS UD)	`pos_ud_gsd`	2.5.5	`id`

Languages: Armenian, Basque, Breton, Catalan, Danish, Galician, Hindi, Indonesian, Irish, Latin, Latvian, Marathi, Slovenian, Yoruba
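
The table above can be read as a lookup from language code to model names. The sketch below encodes it as a plain Python dictionary; `POS_MODELS_2_5_5` and `models_for` are hypothetical names introduced here for illustration. The commented lines at the end assume the usual `pretrained(name, lang)` call on `LemmatizerModel` and `PerceptronModel` and an already started Spark NLP session.

```python
# Hypothetical lookup built from the table above: language code -> POS model name.
# Every language in this release uses the same lemmatizer name, 'lemma'.
POS_MODELS_2_5_5 = {
    'br': 'pos_ud_keb',
    'ca': 'pos_ud_ancora',
    'da': 'pos_ud_ddt',
    'ga': 'pos_ud_idt',
    'hi': 'pos_ud_hdtb',
    'hy': 'pos_ud_armtdp',
    'eu': 'pos_ud_bdt',
    'mr': 'pos_ud_ufal',
    'yo': 'pos_ud_ytb',
    'la': 'pos_ud_llct',
    'lv': 'pos_ud_lvtb',
    'sl': 'pos_ud_ssj',
    'gl': 'pos_ud_treegal',
    'id': 'pos_ud_gsd',
}

def models_for(lang):
    """Return (lemma_model_name, pos_model_name) for a language code."""
    if lang not in POS_MODELS_2_5_5:
        raise ValueError('no 2.5.5 Lemma/POS model listed for %r' % lang)
    return 'lemma', POS_MODELS_2_5_5[lang]

lemma_name, pos_name = models_for('da')
print(lemma_name, pos_name)

# With Spark NLP installed and a session started, the names would be used as:
#   LemmatizerModel.pretrained(lemma_name, lang='da')
#   PerceptronModel.pretrained(pos_name, lang='da')
```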

Documentation and Notebooks

New notebook for pretrained StopWordsCleaner
New notebook to Detect entities in German language
New notebook to Detect entities in English language
New notebook to Detect entities in Spanish language
New notebook to Detect entities in French language
New notebook to Detect entities in Italian language
New notebook to Detect entities in Norwegian language
New notebook to Detect entities in Polish language
New notebook to Detect entities in Portuguese language
New notebook to Detect entities in Russian language
Update documentation for release of Spark NLP 2.5.x
Update the entire spark-nlp-models repository with new pre-trained models and pipelines
Update the entire spark-nlp-workshop notebooks for Spark NLP 2.5.x

Installation

Python

#PyPI

pip install spark-nlp==2.5.5

#Conda

conda install -c johnsnowlabs spark-nlp==2.5.5

Spark

spark-nlp on Apache Spark 2.4.x:

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.5

pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.5

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.11:2.5.5

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.11:2.5.5

spark-nlp on Apache Spark 2.3.x:

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:2.5.5

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:2.5.5

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark23-gpu_2.11:2.5.5

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark23-gpu_2.11:2.5.5
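
The commands above differ only in the artifact name, which depends on the Spark line and on whether the GPU build is wanted. As a sketch of that naming pattern, the hypothetical helper `package_coordinate` below assembles the `--packages` value. It covers only the combinations listed in this release and is not part of Spark NLP.

```python
# Hypothetical helper: build the --packages coordinate for Spark NLP 2.5.5.
def package_coordinate(spark_line='2.4', gpu=False, version='2.5.5'):
    """Return the Maven coordinate for the given Spark line and CPU/GPU build."""
    if spark_line not in ('2.3', '2.4'):
        raise ValueError('this release lists packages for Spark 2.3.x and 2.4.x only')
    artifact = 'spark-nlp'
    if spark_line == '2.3':
        artifact += '-spark23'
    if gpu:
        artifact += '-gpu'
    # The _2.11 suffix is the Scala version shared by all artifacts listed here.
    return 'com.johnsnowlabs.nlp:%s_2.11:%s' % (artifact, version)

# Print the pyspark command for each combination listed above.
for line in ('2.4', '2.3'):
    for gpu in (False, True):
        print('pyspark --packages ' + package_coordinate(line, gpu))
```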

Maven

spark-nlp on Apache Spark 2.4.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.11</artifactId>
    <version>2.5.5</version>
</dependency>

spark-nlp-gpu:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.11</artifactId>
    <version>2.5.5</version>
</dependency>

spark-nlp on Apache Spark 2.3.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-spark23_2.11</artifactId>
    <version>2.5.5</version>
</dependency>

spark-nlp-gpu:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-spark23-gpu_2.11</artifactId>
    <version>2.5.5</version>
</dependency>

FAT JARs

CPU on Apache Spark 2.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-2.5.5.jar
GPU on Apache Spark 2.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-2.5.5.jar
CPU on Apache Spark 2.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-spark23-assembly-2.5.5.jar
GPU on Apache Spark 2.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-spark23-gpu-assembly-2.5.5.jar
