John Snow Labs Spark-NLP 2.5.5: 28 new Lemma and POS models in 14 languages, bug fixes, and lots of new notebooks!

@maziyarpanahi released this 04 Aug 13:44

Overview

We are excited to release Spark NLP 2.5.5 with 28 new pretrained models for Lemma and POS in 14 languages, bug fixes, new notebooks, and more!

As always, we would like to thank our community for their feedback, questions, and feature requests.


New Features

  • Add getClasses() function to NerDLModel
  • Add getClasses() function to ClassifierDLModel
  • Add getClasses() function to SentimentDLModel

Example:

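import sparknlp
from sparknlp.annotator import NerDLModel
spark = sparknlp.start()  # pretrained() needs an active Spark NLP session
ner_model = NerDLModel.pretrained('onto_100')
print(ner_model.getClasses())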
#['O', 'B-CARDINAL', 'B-EVENT', 'I-EVENT', 'B-WORK_OF_ART', 'I-WORK_OF_ART', 'B-ORG', 'B-DATE', 'I-DATE', 'I-ORG', 'B-GPE', 'B-PERSON', 'B-PRODUCT', 'B-NORP', 'B-ORDINAL', 'I-PERSON', 'B-MONEY', 'I-MONEY', 'I-GPE', 'B-LOC', 'I-LOC', 'I-CARDINAL', 'B-FAC', 'I-FAC', 'B-LAW', 'I-LAW', 'B-TIME', 'I-TIME', 'B-PERCENT', 'I-PERCENT', 'I-NORP', 'I-PRODUCT', 'B-QUANTITY', 'I-QUANTITY', 'B-LANGUAGE', 'I-ORDINAL', 'I-LANGUAGE', 'X']
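
Since getClasses() returns IOB-style tags, the list can be collapsed into distinct entity types with a few lines of plain Python. The following is a minimal sketch that works on a shortened copy of the output above; `entity_types` is a hypothetical helper name and is not part of Spark NLP.

```python
def entity_types(classes):
    """Collapse IOB tags such as 'B-ORG' / 'I-ORG' into unique entity types."""
    types = set()
    for tag in classes:
        # Only tags with a B- or I- prefix name an entity; 'O' and 'X' do not.
        if tag.startswith(("B-", "I-")):
            types.add(tag[2:])
    return sorted(types)


# Shortened copy of the onto_100 output shown above (illustration only).
sample_classes = ["O", "B-ORG", "I-ORG", "B-PERSON", "I-PERSON", "B-GPE", "X"]
print(entity_types(sample_classes))  # ['GPE', 'ORG', 'PERSON']
```

The same helper should apply to any list returned by getClasses() on a NerDLModel, assuming the model uses B-/I- prefixes as onto_100 does.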

Enhancements

  • Improve max sequence length calculation in BertEmbeddings and XlnetEmbeddings

Bugfixes

  • Fix a bug in RegexTokenizer in Python
  • Fix StopWordsCleaner exception in Python when pretrained() is used
  • Fix max sequence length issue in AlbertEmbeddings and SentencePiece generation
  • Fix HDFS support for setGraphFolder param in NerDLApproach

Models

  • We have added 28 new pretrained models for Lemma and POS in 14 languages:
Model                         Name            Build  Lang
LemmatizerModel (Lemmatizer)  lemma           2.5.5  br
LemmatizerModel (Lemmatizer)  lemma           2.5.5  ca
LemmatizerModel (Lemmatizer)  lemma           2.5.5  da
LemmatizerModel (Lemmatizer)  lemma           2.5.5  ga
LemmatizerModel (Lemmatizer)  lemma           2.5.5  hi
LemmatizerModel (Lemmatizer)  lemma           2.5.5  hy
LemmatizerModel (Lemmatizer)  lemma           2.5.5  eu
LemmatizerModel (Lemmatizer)  lemma           2.5.5  mr
LemmatizerModel (Lemmatizer)  lemma           2.5.5  yo
LemmatizerModel (Lemmatizer)  lemma           2.5.5  la
LemmatizerModel (Lemmatizer)  lemma           2.5.5  lv
LemmatizerModel (Lemmatizer)  lemma           2.5.5  sl
LemmatizerModel (Lemmatizer)  lemma           2.5.5  gl
LemmatizerModel (Lemmatizer)  lemma           2.5.5  id
PerceptronModel (POS UD)      pos_ud_keb      2.5.5  br
PerceptronModel (POS UD)      pos_ud_ancora   2.5.5  ca
PerceptronModel (POS UD)      pos_ud_ddt      2.5.5  da
PerceptronModel (POS UD)      pos_ud_idt      2.5.5  ga
PerceptronModel (POS UD)      pos_ud_hdtb     2.5.5  hi
PerceptronModel (POS UD)      pos_ud_armtdp   2.5.5  hy
PerceptronModel (POS UD)      pos_ud_bdt      2.5.5  eu
PerceptronModel (POS UD)      pos_ud_ufal     2.5.5  mr
PerceptronModel (POS UD)      pos_ud_ytb      2.5.5  yo
PerceptronModel (POS UD)      pos_ud_llct     2.5.5  la
PerceptronModel (POS UD)      pos_ud_lvtb     2.5.5  lv
PerceptronModel (POS UD)      pos_ud_ssj      2.5.5  sl
PerceptronModel (POS UD)      pos_ud_treegal  2.5.5  gl
PerceptronModel (POS UD)      pos_ud_gsd      2.5.5  id

Languages: Armenian, Basque, Breton, Catalan, Danish, Galician, Hindi, Indonesian, Irish, Latin, Latvian, Marathi, Slovenian, Yoruba
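
As a rough sketch of how the models above might be picked up by language code, the snippet below copies the POS names from the table into a lookup and passes them to pretrained(name, lang=...). `POS_MODELS`, `model_names`, and `load_models` are hypothetical names introduced for illustration; `load_models` assumes Spark NLP 2.5.5 is installed and a session was started with sparknlp.start(), and it leaves out the input/output column wiring a real pipeline would need.

```python
# POS model names per language code, copied from the table above.
# Every language listed there uses the same lemmatizer name, "lemma".
POS_MODELS = {
    "br": "pos_ud_keb", "ca": "pos_ud_ancora", "da": "pos_ud_ddt",
    "ga": "pos_ud_idt", "hi": "pos_ud_hdtb", "hy": "pos_ud_armtdp",
    "eu": "pos_ud_bdt", "mr": "pos_ud_ufal", "yo": "pos_ud_ytb",
    "la": "pos_ud_llct", "lv": "pos_ud_lvtb", "sl": "pos_ud_ssj",
    "gl": "pos_ud_treegal", "id": "pos_ud_gsd",
}


def model_names(lang):
    """Return (lemma_name, pos_name) for one of the 14 language codes."""
    if lang not in POS_MODELS:
        raise ValueError(f"no 2.5.5 Lemma/POS model listed for {lang!r}")
    return "lemma", POS_MODELS[lang]


def load_models(lang):
    """Download both models; assumes an active Spark NLP session."""
    # Imported lazily so the lookup above works without Spark NLP installed.
    from sparknlp.annotator import LemmatizerModel, PerceptronModel

    lemma_name, pos_name = model_names(lang)
    lemmatizer = LemmatizerModel.pretrained(lemma_name, lang=lang)
    pos_tagger = PerceptronModel.pretrained(pos_name, lang=lang)
    return lemmatizer, pos_tagger


print(model_names("ca"))  # ('lemma', 'pos_ud_ancora')
```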


Documentation and Notebooks


Installation

Python

#PyPI

pip install spark-nlp==2.5.5

#Conda

conda install -c johnsnowlabs spark-nlp==2.5.5

Spark

spark-nlp on Apache Spark 2.4.x:

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.5

pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.5

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.11:2.5.5

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.11:2.5.5

spark-nlp on Apache Spark 2.3.x:

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:2.5.5

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:2.5.5

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark23_2.11:2.5.5

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark23_2.11:2.5.5

Maven

spark-nlp on Apache Spark 2.4.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.11</artifactId>
    <version>2.5.5</version>
</dependency>

spark-nlp-gpu on Apache Spark 2.4.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.11</artifactId>
    <version>2.5.5</version>
</dependency>

spark-nlp on Apache Spark 2.3.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-spark23_2.11</artifactId>
    <version>2.5.5</version>
</dependency>

spark-nlp-gpu on Apache Spark 2.3.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu-spark23_2.11</artifactId>
    <version>2.5.5</version>
</dependency>
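
For Python users who prefer to configure the package in code rather than on the command line, the four Maven artifacts above can be turned into a `spark.jars.packages` coordinate. This is only a sketch: `ARTIFACTS`, `package_coordinate`, and `start_session` are hypothetical names, the artifact IDs are copied from the Maven section, and `start_session` assumes pyspark is installed.

```python
VERSION = "2.5.5"
GROUP = "com.johnsnowlabs.nlp"

# Artifact IDs as listed in the Maven section, keyed by (Spark line, GPU?).
ARTIFACTS = {
    ("2.4", False): "spark-nlp_2.11",
    ("2.4", True): "spark-nlp-gpu_2.11",
    ("2.3", False): "spark-nlp-spark23_2.11",
    ("2.3", True): "spark-nlp-gpu-spark23_2.11",
}


def package_coordinate(spark_line="2.4", gpu=False):
    """Build the group:artifact:version string for --packages."""
    key = (spark_line, gpu)
    if key not in ARTIFACTS:
        raise ValueError(f"unsupported Spark line: {spark_line!r}")
    return f"{GROUP}:{ARTIFACTS[key]}:{VERSION}"


def start_session(spark_line="2.4", gpu=False):
    """Create a SparkSession that pulls Spark NLP from Maven at startup."""
    # Imported lazily so the coordinate helper works without pyspark.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName("spark-nlp-2.5.5")
        .config("spark.jars.packages", package_coordinate(spark_line, gpu))
        .getOrCreate()
    )


print(package_coordinate())  # com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.5
```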

FAT JARs