John Snow Labs Spark-NLP 2.5.5: 28 new Lemma and POS models in 14 languages, bug fixes, and lots of new notebooks!
Overview
We are excited to release Spark NLP 2.5.5 with 28 new pretrained models for Lemma and POS in 14 languages, bug fixes, new notebooks, and more!
As always, we would like to thank our community for their feedback, questions, and feature requests.
New Features
- Add getClasses() function to NerDLModel
- Add getClasses() function to ClassifierDLModel
- Add getClasses() function to SentimentDLModel
Example:
import sparknlp
from sparknlp.annotator import NerDLModel

# pretrained() needs an active Spark session
spark = sparknlp.start()
ner_model = NerDLModel.pretrained('onto_100')
print(ner_model.getClasses())
# ['O', 'B-CARDINAL', 'B-EVENT', 'I-EVENT', 'B-WORK_OF_ART', 'I-WORK_OF_ART', 'B-ORG', 'B-DATE', 'I-DATE', 'I-ORG', 'B-GPE', 'B-PERSON', 'B-PRODUCT', 'B-NORP', 'B-ORDINAL', 'I-PERSON', 'B-MONEY', 'I-MONEY', 'I-GPE', 'B-LOC', 'I-LOC', 'I-CARDINAL', 'B-FAC', 'I-FAC', 'B-LAW', 'I-LAW', 'B-TIME', 'I-TIME', 'B-PERCENT', 'I-PERCENT', 'I-NORP', 'I-PRODUCT', 'B-QUANTITY', 'I-QUANTITY', 'B-LANGUAGE', 'I-ORDINAL', 'I-LANGUAGE', 'X']
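The labels returned by getClasses() in the example above use `B-`/`I-` prefixes, so the distinct entity types can be recovered by stripping them. A minimal sketch in plain Python, assuming a label list of that shape; `entity_types` is a hypothetical helper (not part of Spark NLP), and `sample` is a short excerpt of the output shown above:

```python
def entity_types(classes):
    """Return the sorted distinct entity types from B-/I- prefixed labels."""
    types = set()
    for label in classes:
        prefix, sep, name = label.partition("-")
        # Keep only labels shaped like B-<TYPE> or I-<TYPE>; skip 'O', 'X', etc.
        if sep and prefix in ("B", "I"):
            types.add(name)
    return sorted(types)


# Excerpt of the getClasses() output above (not the full list).
sample = ['O', 'B-CARDINAL', 'B-EVENT', 'I-EVENT',
          'B-WORK_OF_ART', 'I-WORK_OF_ART', 'X']
print(entity_types(sample))
# ['CARDINAL', 'EVENT', 'WORK_OF_ART']
```

In practice the list passed in would be the result of `ner_model.getClasses()`.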
Enhancements
- Improve max sequence length calculation in BertEmbeddings and XlnetEmbeddings
Bugfixes
- Fix a bug in RegexTokenizer in Python
- Fix StopWordsCleaner exception in Python when pretrained() is used
- Fix max sequence length issue in AlbertEmbeddings and SentencePiece generation
- Fix HDFS support for setGraphFolder param in NerDLApproach
Models
- We have added 28 new pretrained models for Lemma and POS in 14 languages:
| Model | Name | Build | Lang |
|---|---|---|---|
| LemmatizerModel (Lemmatizer) | lemma | 2.5.5 | br |
| LemmatizerModel (Lemmatizer) | lemma | 2.5.5 | ca |
| LemmatizerModel (Lemmatizer) | lemma | 2.5.5 | da |
| LemmatizerModel (Lemmatizer) | lemma | 2.5.5 | ga |
| LemmatizerModel (Lemmatizer) | lemma | 2.5.5 | hi |
| LemmatizerModel (Lemmatizer) | lemma | 2.5.5 | hy |
| LemmatizerModel (Lemmatizer) | lemma | 2.5.5 | eu |
| LemmatizerModel (Lemmatizer) | lemma | 2.5.5 | mr |
| LemmatizerModel (Lemmatizer) | lemma | 2.5.5 | yo |
| LemmatizerModel (Lemmatizer) | lemma | 2.5.5 | la |
| LemmatizerModel (Lemmatizer) | lemma | 2.5.5 | lv |
| LemmatizerModel (Lemmatizer) | lemma | 2.5.5 | sl |
| LemmatizerModel (Lemmatizer) | lemma | 2.5.5 | gl |
| LemmatizerModel (Lemmatizer) | lemma | 2.5.5 | id |
| PerceptronModel (POS UD) | pos_ud_keb | 2.5.5 | br |
| PerceptronModel (POS UD) | pos_ud_ancora | 2.5.5 | ca |
| PerceptronModel (POS UD) | pos_ud_ddt | 2.5.5 | da |
| PerceptronModel (POS UD) | pos_ud_idt | 2.5.5 | ga |
| PerceptronModel (POS UD) | pos_ud_hdtb | 2.5.5 | hi |
| PerceptronModel (POS UD) | pos_ud_armtdp | 2.5.5 | hy |
| PerceptronModel (POS UD) | pos_ud_bdt | 2.5.5 | eu |
| PerceptronModel (POS UD) | pos_ud_ufal | 2.5.5 | mr |
| PerceptronModel (POS UD) | pos_ud_ytb | 2.5.5 | yo |
| PerceptronModel (POS UD) | pos_ud_llct | 2.5.5 | la |
| PerceptronModel (POS UD) | pos_ud_lvtb | 2.5.5 | lv |
| PerceptronModel (POS UD) | pos_ud_ssj | 2.5.5 | sl |
| PerceptronModel (POS UD) | pos_ud_treegal | 2.5.5 | gl |
| PerceptronModel (POS UD) | pos_ud_gsd | 2.5.5 | id |
Languages: Armenian, Basque, Breton, Catalan, Danish, Galician, Hindi, Indonesian, Irish, Latin, Latvian, Marathi, Slovenian, Yoruba
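To make the table above easier to use from Python, the sketch below maps each language code to its model names and shows one way the two models might be combined in a pipeline. The helper names (`POS_MODELS`, `model_names`, `build_pipeline`) and the column names are illustrative assumptions, not part of Spark NLP. `build_pipeline` imports Spark NLP lazily, so it only needs spark-nlp installed and a running Spark session when it is actually called.

```python
# Language code -> POS model name, copied from the table above.
# The lemmatizer is named "lemma" for every listed language.
POS_MODELS = {
    "br": "pos_ud_keb", "ca": "pos_ud_ancora", "da": "pos_ud_ddt",
    "ga": "pos_ud_idt", "hi": "pos_ud_hdtb", "hy": "pos_ud_armtdp",
    "eu": "pos_ud_bdt", "mr": "pos_ud_ufal", "yo": "pos_ud_ytb",
    "la": "pos_ud_llct", "lv": "pos_ud_lvtb", "sl": "pos_ud_ssj",
    "gl": "pos_ud_treegal", "id": "pos_ud_gsd",
}


def model_names(lang):
    """Return (lemma_model_name, pos_model_name) for a language code."""
    if lang not in POS_MODELS:
        raise ValueError(f"no 2.5.5 Lemma/POS model listed for {lang!r}")
    return "lemma", POS_MODELS[lang]


def build_pipeline(lang):
    """Assemble an unfitted Lemma + POS pipeline (hypothetical helper)."""
    # Imported here so the lookup above works without Spark installed.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import Tokenizer, LemmatizerModel, PerceptronModel

    lemma_name, pos_name = model_names(lang)
    document = DocumentAssembler().setInputCol("text").setOutputCol("document")
    token = Tokenizer().setInputCols(["document"]).setOutputCol("token")
    lemma = (LemmatizerModel.pretrained(lemma_name, lang)
             .setInputCols(["token"]).setOutputCol("lemma"))
    pos = (PerceptronModel.pretrained(pos_name, lang)
           .setInputCols(["document", "token"]).setOutputCol("pos"))
    return Pipeline(stages=[document, token, lemma, pos])


print(model_names("da"))
# ('lemma', 'pos_ud_ddt')
```

The returned pipeline would then be fitted and applied to a DataFrame with a `text` column, as in other Spark NLP pipelines.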
Documentation and Notebooks
- New notebook for pretrained StopWordsCleaner
- New notebook to Detect entities in German language
- New notebook to Detect entities in English language
- New notebook to Detect entities in Spanish language
- New notebook to Detect entities in French language
- New notebook to Detect entities in Italian language
- New notebook to Detect entities in Norwegian language
- New notebook to Detect entities in Polish language
- New notebook to Detect entities in Portuguese language
- New notebook to Detect entities in Russian language
- Update documentation for release of Spark NLP 2.5.x
- Update the entire spark-nlp-models repository with new pre-trained models and pipelines
- Update the entire spark-nlp-workshop notebooks for Spark NLP 2.5.x
Installation
Python
#PyPI
pip install spark-nlp==2.5.5
#Conda
conda install -c johnsnowlabs spark-nlp==2.5.5
Spark
spark-nlp on Apache Spark 2.4.x:
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.5
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.5
GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.11:2.5.5
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.11:2.5.5
spark-nlp on Apache Spark 2.3.x:
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:2.5.5
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:2.5.5
GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark23-gpu_2.11:2.5.5
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark23-gpu_2.11:2.5.5
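The same Spark 2.4.x coordinates can also be passed from a Python script instead of the command line. This is a minimal sketch assuming pyspark 2.4.x is installed; `spark_nlp_package` and `start_session` are hypothetical helpers, and the app name and `local[*]` master are placeholder choices. `spark.jars.packages` is the standard Spark setting behind `--packages`.

```python
SPARK_NLP_VERSION = "2.5.5"


def spark_nlp_package(gpu=False):
    """Return the Maven coordinate for Spark NLP on Apache Spark 2.4.x."""
    artifact = "spark-nlp-gpu_2.11" if gpu else "spark-nlp_2.11"
    return f"com.johnsnowlabs.nlp:{artifact}:{SPARK_NLP_VERSION}"


def start_session(gpu=False):
    """Create a local Spark session that pulls in the Spark NLP package."""
    # Imported lazily so the coordinate helper works without pyspark.
    from pyspark.sql import SparkSession

    return (SparkSession.builder
            .appName("spark-nlp-example")  # placeholder name
            .master("local[*]")            # placeholder master
            .config("spark.jars.packages", spark_nlp_package(gpu))
            .getOrCreate())


print(spark_nlp_package())
# com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.5
print(spark_nlp_package(gpu=True))
# com.johnsnowlabs.nlp:spark-nlp-gpu_2.11:2.5.5
```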
Maven
spark-nlp on Apache Spark 2.4.x:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp_2.11</artifactId>
<version>2.5.5</version>
</dependency>
spark-nlp-gpu:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu_2.11</artifactId>
<version>2.5.5</version>
</dependency>
spark-nlp on Apache Spark 2.3.x:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-spark23_2.11</artifactId>
<version>2.5.5</version>
</dependency>
spark-nlp-gpu:
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-gpu-spark23_2.11</artifactId>
<version>2.5.5</version>
</dependency>
FAT JARs
- CPU on Apache Spark 2.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-2.5.5.jar
- GPU on Apache Spark 2.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-2.5.5.jar
- CPU on Apache Spark 2.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-spark23-assembly-2.5.5.jar
- GPU on Apache Spark 2.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-spark23-gpu-assembly-2.5.5.jar