From 6fb47c094f51ed93a2e3a1b2c1da9d3ab32dda17 Mon Sep 17 00:00:00 2001
From: Maziyar Panahi
Date: Mon, 10 Apr 2023 17:00:05 +0200
Subject: [PATCH] Add new features and update docs[run doc]

---
 README.md                  | 17 ++++++++++-------
 docs/_layouts/landing.html | 38 ++++++++++++++++++++------------------
 docs/en/install.md         | 22 +++++++++++-----------
 python/README.md           | 17 ++++++++++-------
 4 files changed, 51 insertions(+), 43 deletions(-)

diff --git a/README.md b/README.md
index fa5682b3092c..82746009c1b8 100644
--- a/README.md
+++ b/README.md
@@ -19,10 +19,10 @@
 Spark NLP is a state-of-the-art Natural Language Processing library built on top of Apache Spark. It provides **simple**, **performant** & **accurate** NLP annotations for machine learning pipelines that **scale** easily in a distributed environment.
-Spark NLP comes with **11000+** pretrained **pipelines** and **models** in more than **200+** languages.
-It also offers tasks such as **Tokenization**, **Word Segmentation**, **Part-of-Speech Tagging**, Word and Sentence **Embeddings**, **Named Entity Recognition**, **Dependency Parsing**, **Spell Checking**, **Text Classification**, **Sentiment Analysis**, **Token Classification**, **Machine Translation** (+180 languages), **Summarization**, **Question Answering**, **Table Question Answering**, **Text Generation**, **Image Classification**, **Automatic Speech Recognition**, and many more [NLP tasks](#features).
+Spark NLP comes with **17000+** pretrained **pipelines** and **models** in more than **200+** languages.
+It also offers tasks such as **Tokenization**, **Word Segmentation**, **Part-of-Speech Tagging**, Word and Sentence **Embeddings**, **Named Entity Recognition**, **Dependency Parsing**, **Spell Checking**, **Text Classification**, **Sentiment Analysis**, **Token Classification**, **Machine Translation** (+180 languages), **Summarization**, **Question Answering**, **Table Question Answering**, **Text Generation**, **Image Classification**, **Automatic Speech Recognition**, **Zero-Shot Learning**, and many more [NLP tasks](#features).
-**Spark NLP** is the only open-source NLP library in **production** that offers state-of-the-art transformers such as **BERT**, **CamemBERT**, **ALBERT**, **ELECTRA**, **XLNet**, **DistilBERT**, **RoBERTa**, **DeBERTa**, **XLM-RoBERTa**, **Longformer**, **ELMO**, **Universal Sentence Encoder**, **Google T5**, **MarianMT**, **GPT2**, and **Vision Transformers (ViT)** not only to **Python** and **R**, but also to **JVM** ecosystem (**Java**, **Scala**, and **Kotlin**) at **scale** by extending **Apache Spark** natively.
+**Spark NLP** is the only open-source NLP library in **production** that offers state-of-the-art transformers such as **BERT**, **CamemBERT**, **ALBERT**, **ELECTRA**, **XLNet**, **DistilBERT**, **RoBERTa**, **DeBERTa**, **XLM-RoBERTa**, **Longformer**, **ELMO**, **Universal Sentence Encoder**, **Facebook BART**, **Google T5**, **MarianMT**, **OpenAI GPT2**, and **Vision Transformers (ViT)** not only to **Python** and **R**, but also to **JVM** ecosystem (**Java**, **Scala**, and **Kotlin**) at **scale** by extending **Apache Spark** natively.
 ## Project's website
@@ -137,19 +137,22 @@ documentation and examples
 - Longformer for Question Answering
 - Table Question Answering (TAPAS)
 - Zero-Shot NER Model
+- Zero Shot Text Classification by BERT (ZSL)
 - Neural Machine Translation (MarianMT)
 - Text-To-Text Transfer Transformer (Google T5)
 - Generative Pre-trained Transformer 2 (OpenAI GPT2)
-- Vision Transformer (ViT)
-- Swin Image Classification
+- Seq2Seq for NLG, Translation, and Comprehension (Facebook BART)
+- Vision Transformer (Google ViT)
+- Swin Image Classification (Microsoft Swin Transformer)
+- ConvNext Image Classification (Facebook ConvNext)
 - Automatic Speech Recognition (Wav2Vec2)
 - Automatic Speech Recognition (HuBERT)
 - Named entity recognition (Deep learning)
 - Easy TensorFlow integration
 - GPU Support
 - Full integration with Spark ML functions
-- +9400 pre-trained models in +200 languages!
-- +3200 pre-trained pipelines in +200 languages!
+- +12000 pre-trained models in +200 languages!
+- +5000 pre-trained pipelines in +200 languages!
 - Multi-lingual NER models: Arabic, Bengali, Chinese, Danish, Dutch, English, Finnish, French, German, Hebrew, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Urdu, and more.
diff --git a/docs/_layouts/landing.html b/docs/_layouts/landing.html
index 4ffd3f28d5b8..2c33f6187ec6 100755
--- a/docs/_layouts/landing.html
+++ b/docs/_layouts/landing.html
@@ -201,22 +201,22 @@

{{ _section.title }}

  {% highlight bash %}
  # Install Spark NLP from PyPI
- $ pip install spark-nlp==4.3.2 pyspark==3.3.0
+ $ pip install spark-nlp==4.4.0 pyspark==3.3.0
  # Install Spark NLP from Anaconda/Conda
  $ conda install -c johnsnowlabs spark-nlp
  # Load Spark NLP with Spark Shell
- $ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2
+ $ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0
  # Load Spark NLP with PySpark
- $ pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2
+ $ pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0
  # Load Spark NLP with Spark Submit
- $ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2
+ $ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0
  # Load Spark NLP as an external Fat JAR
- $ spark-shell --jars spark-nlp-assembly-4.3.2.jar
+ $ spark-shell --jars spark-nlp-assembly-4.4.0.jar
  {% endhighlight %}
@@ -236,8 +236,8 @@

Transformers at Scale

  Spark NLP is the only open-source NLP library in production that offers state-of-the-art transformers such as BERT, CamemBERT, ALBERT, ELECTRA, XLNet, DistilBERT, RoBERTa, DeBERTa,
- XLM-RoBERTa, Longformer, ELMO, Universal Sentence Encoder, Google T5, MarianMT, and OpenAI GPT2 not only to Python, and R
- but also to JVM ecosystem (Java, Scala, and Kotlin) at scale by extending Apache Spark natively
+ XLM-RoBERTa, Longformer, ELMO, Universal Sentence Encoder, Facebook BART, Google T5, MarianMT, OpenAI GPT2,
+ Google ViT, ASR Wav2Vec2 and many more not only to Python, and R but also to JVM ecosystem (Java, Scala, and Kotlin) at scale by extending Apache Spark natively
@@ -313,18 +313,11 @@

NLP Features

  • ALBERT Embeddings
  • XLNet Embeddings
  • ELMO Embeddings
• - - - + {% highlight python %}
diff --git a/docs/en/install.md b/docs/en/install.md
index 0a777d03bef0..eb5744580b1e 100644
--- a/docs/en/install.md
+++ b/docs/en/install.md
@@ -106,13 +106,13 @@ spark = SparkSession.builder \
 ```
-**spark-nlp-m1:**
+**spark-nlp-silicon:**
 ```xml
-<!-- https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-m1 -->
+<!-- https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-silicon -->
 <dependency>
     <groupId>com.johnsnowlabs.nlp</groupId>
-    <artifactId>spark-nlp-m1_2.12</artifactId>
+    <artifactId>spark-nlp-silicon_2.12</artifactId>
     <version>4.4.0</version>
 </dependency>
 ```
@@ -144,11 +144,11 @@ libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "4.4.0"
 libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "4.4.0"
 ```
-**spark-nlp-m1:**
+**spark-nlp-silicon:**
 ```scala
-// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-m1
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-m1" % "4.4.0"
+// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-silicon
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "4.4.0"
 ```
 **spark-nlp-aarch64:**
@@ -220,7 +220,7 @@ as expected.
 Adding Spark NLP to your Scala or Java project is easy:
-Simply change to dependency coordinates to `spark-nlp-m1` and add the dependency to your
+Simply change to dependency coordinates to `spark-nlp-silicon` and add the dependency to your
 project.
 How to do this is mentioned above: [Scala And Java](#scala-and-java)
@@ -229,10 +229,10 @@ So for example for Spark NLP with Apache Spark 3.0.x and 3.1.x you will end up w
 maven coordinates like these:
 ```xml
-<!-- https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-m1 -->
+<!-- https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-silicon -->
 <dependency>
     <groupId>com.johnsnowlabs.nlp</groupId>
-    <artifactId>spark-nlp-m1_2.12</artifactId>
+    <artifactId>spark-nlp-silicon_2.12</artifactId>
     <version>4.4.0</version>
 </dependency>
 ```
@@ -241,7 +241,7 @@ or in case of sbt:
 ```scala
 // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-m1" % "4.4.0"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "4.4.0"
 ```
 If everything went well, you can now start Spark NLP with the `m1` flag set to `true`:
@@ -269,7 +269,7 @@ If everything went well, you can now start Spark NLP with the `m1` flag set to `
 ```python
 import sparknlp
-spark = sparknlp.start(m1=True)
+spark = sparknlp.start(apple_silicon=True)
 ```
 ## Installation for Linux Aarch64 Systems
diff --git a/python/README.md b/python/README.md
index fa5682b3092c..82746009c1b8 100644
--- a/python/README.md
+++ b/python/README.md
@@ -19,10 +19,10 @@
 Spark NLP is a state-of-the-art Natural Language Processing library built on top of Apache Spark. It provides **simple**, **performant** & **accurate** NLP annotations for machine learning pipelines that **scale** easily in a distributed environment.
-Spark NLP comes with **11000+** pretrained **pipelines** and **models** in more than **200+** languages.
-It also offers tasks such as **Tokenization**, **Word Segmentation**, **Part-of-Speech Tagging**, Word and Sentence **Embeddings**, **Named Entity Recognition**, **Dependency Parsing**, **Spell Checking**, **Text Classification**, **Sentiment Analysis**, **Token Classification**, **Machine Translation** (+180 languages), **Summarization**, **Question Answering**, **Table Question Answering**, **Text Generation**, **Image Classification**, **Automatic Speech Recognition**, and many more [NLP tasks](#features).
+Spark NLP comes with **17000+** pretrained **pipelines** and **models** in more than **200+** languages.
+It also offers tasks such as **Tokenization**, **Word Segmentation**, **Part-of-Speech Tagging**, Word and Sentence **Embeddings**, **Named Entity Recognition**, **Dependency Parsing**, **Spell Checking**, **Text Classification**, **Sentiment Analysis**, **Token Classification**, **Machine Translation** (+180 languages), **Summarization**, **Question Answering**, **Table Question Answering**, **Text Generation**, **Image Classification**, **Automatic Speech Recognition**, **Zero-Shot Learning**, and many more [NLP tasks](#features).
-**Spark NLP** is the only open-source NLP library in **production** that offers state-of-the-art transformers such as **BERT**, **CamemBERT**, **ALBERT**, **ELECTRA**, **XLNet**, **DistilBERT**, **RoBERTa**, **DeBERTa**, **XLM-RoBERTa**, **Longformer**, **ELMO**, **Universal Sentence Encoder**, **Google T5**, **MarianMT**, **GPT2**, and **Vision Transformers (ViT)** not only to **Python** and **R**, but also to **JVM** ecosystem (**Java**, **Scala**, and **Kotlin**) at **scale** by extending **Apache Spark** natively.
+**Spark NLP** is the only open-source NLP library in **production** that offers state-of-the-art transformers such as **BERT**, **CamemBERT**, **ALBERT**, **ELECTRA**, **XLNet**, **DistilBERT**, **RoBERTa**, **DeBERTa**, **XLM-RoBERTa**, **Longformer**, **ELMO**, **Universal Sentence Encoder**, **Facebook BART**, **Google T5**, **MarianMT**, **OpenAI GPT2**, and **Vision Transformers (ViT)** not only to **Python** and **R**, but also to **JVM** ecosystem (**Java**, **Scala**, and **Kotlin**) at **scale** by extending **Apache Spark** natively.
 ## Project's website
@@ -137,19 +137,22 @@ documentation and examples
 - Longformer for Question Answering
 - Table Question Answering (TAPAS)
 - Zero-Shot NER Model
+- Zero Shot Text Classification by BERT (ZSL)
 - Neural Machine Translation (MarianMT)
 - Text-To-Text Transfer Transformer (Google T5)
 - Generative Pre-trained Transformer 2 (OpenAI GPT2)
-- Vision Transformer (ViT)
-- Swin Image Classification
+- Seq2Seq for NLG, Translation, and Comprehension (Facebook BART)
+- Vision Transformer (Google ViT)
+- Swin Image Classification (Microsoft Swin Transformer)
+- ConvNext Image Classification (Facebook ConvNext)
 - Automatic Speech Recognition (Wav2Vec2)
 - Automatic Speech Recognition (HuBERT)
 - Named entity recognition (Deep learning)
 - Easy TensorFlow integration
 - GPU Support
 - Full integration with Spark ML functions
-- +9400 pre-trained models in +200 languages!
-- +3200 pre-trained pipelines in +200 languages!
+- +12000 pre-trained models in +200 languages!
+- +5000 pre-trained pipelines in +200 languages!
 - Multi-lingual NER models: Arabic, Bengali, Chinese, Danish, Dutch, English, Finnish, French, German, Hebrew, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Urdu, and more.
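
Note on the artifact rename in the `docs/en/install.md` hunks: the coordinates the patch touches all follow the pattern `com.johnsnowlabs.nlp:<artifact>_<scala>:<version>`, with `spark-nlp-m1` becoming `spark-nlp-silicon`. A minimal sketch of how a `--packages` coordinate could be assembled from those parts is shown below. The helper `spark_nlp_coordinate` and its parameter names are hypothetical illustrations, not part of Spark NLP; only the group ID, artifact names, Scala suffix, and version are taken from the patch.

```python
from typing import Optional

# Group ID as it appears in the patch's Maven, sbt, and --packages examples.
GROUP_ID = "com.johnsnowlabs.nlp"


def spark_nlp_coordinate(
    version: str,
    variant: Optional[str] = None,
    scala_version: str = "2.12",
) -> str:
    """Build a Maven coordinate string (hypothetical helper, not a Spark NLP API).

    variant is the artifact suffix used in the patch, e.g. "gpu", "silicon",
    or "aarch64"; None selects the default "spark-nlp" artifact.
    """
    artifact = "spark-nlp" if variant is None else f"spark-nlp-{variant}"
    return f"{GROUP_ID}:{artifact}_{scala_version}:{version}"


if __name__ == "__main__":
    # Default artifact, as used in the spark-shell / pyspark examples.
    print(spark_nlp_coordinate("4.4.0"))
    # Apple Silicon artifact after the rename introduced by this patch.
    print(spark_nlp_coordinate("4.4.0", variant="silicon"))
```

The resulting string is the kind of value passed to `--packages` in the landing-page snippets; whether a given variant is published for a given version should be checked against Maven Central rather than assumed from this sketch.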