Add new features and update docs[run doc]

JohnSnowLabs · Apr 10, 2023 · 6fb47c0
1 parent 46d4cb4

Showing 4 changed files with 51 additions and 43 deletions.
diff --git a/README.md b/README.md
@@ -19,10 +19,10 @@
 
 Spark NLP is a state-of-the-art Natural Language Processing library built on top of Apache Spark. It provides **simple**, **performant** & **accurate** NLP annotations for machine learning pipelines that **scale** easily in a distributed
 environment.
-Spark NLP comes with **11000+** pretrained **pipelines** and **models** in more than **200+** languages.
-It also offers tasks such as **Tokenization**, **Word Segmentation**, **Part-of-Speech Tagging**, Word and Sentence **Embeddings**, **Named Entity Recognition**, **Dependency Parsing**, **Spell Checking**, **Text Classification**, **Sentiment Analysis**, **Token Classification**, **Machine Translation** (+180 languages), **Summarization**, **Question Answering**, **Table Question Answering**, **Text Generation**, **Image Classification**, **Automatic Speech Recognition**, and many more [NLP tasks](#features).
+Spark NLP comes with **17000+** pretrained **pipelines** and **models** in more than **200+** languages.
+It also offers tasks such as **Tokenization**, **Word Segmentation**, **Part-of-Speech Tagging**, Word and Sentence **Embeddings**, **Named Entity Recognition**, **Dependency Parsing**, **Spell Checking**, **Text Classification**, **Sentiment Analysis**, **Token Classification**, **Machine Translation** (+180 languages), **Summarization**, **Question Answering**, **Table Question Answering**, **Text Generation**, **Image Classification**, **Automatic Speech Recognition**, **Zero-Shot Learning**, and many more [NLP tasks](#features).
 
-**Spark NLP** is the only open-source NLP library in **production** that offers state-of-the-art transformers such as **BERT**, **CamemBERT**, **ALBERT**, **ELECTRA**, **XLNet**, **DistilBERT**, **RoBERTa**, **DeBERTa**, **XLM-RoBERTa**, **Longformer**, **ELMO**, **Universal Sentence Encoder**, **Google T5**, **MarianMT**, **GPT2**, and **Vision Transformers (ViT)** not only to **Python** and **R**, but also to **JVM** ecosystem (**Java**, **Scala**, and **Kotlin**) at **scale** by extending **Apache Spark** natively.
+**Spark NLP** is the only open-source NLP library in **production** that offers state-of-the-art transformers such as **BERT**, **CamemBERT**, **ALBERT**, **ELECTRA**, **XLNet**, **DistilBERT**, **RoBERTa**, **DeBERTa**, **XLM-RoBERTa**, **Longformer**, **ELMO**, **Universal Sentence Encoder**, **Facebook BART**, **Google T5**, **MarianMT**, **OpenAI GPT2**, and **Vision Transformers (ViT)** not only to **Python** and **R**, but also to **JVM** ecosystem (**Java**, **Scala**, and **Kotlin**) at **scale** by extending **Apache Spark** natively.
 
 ## Project's website
 
@@ -137,19 +137,22 @@ documentation and examples
 - Longformer for Question Answering
 - Table Question Answering (TAPAS)
 - Zero-Shot NER Model
+- Zero Shot Text Classification by BERT (ZSL)
 - Neural Machine Translation (MarianMT)
 - Text-To-Text Transfer Transformer (Google T5)
 - Generative Pre-trained Transformer 2 (OpenAI GPT2)
-- Vision Transformer (ViT)
-- Swin Image Classification
+- Seq2Seq for NLG, Translation, and Comprehension (Facebook BART)
+- Vision Transformer (Google ViT)
+- Swin Image Classification (Microsoft Swin Transformer)
+- ConvNext Image Classification (Facebook ConvNext)
 - Automatic Speech Recognition (Wav2Vec2)
 - Automatic Speech Recognition (HuBERT)
 - Named entity recognition (Deep learning)
 - Easy TensorFlow integration
 - GPU Support
 - Full integration with Spark ML functions
-- +9400 pre-trained models in +200 languages!
-- +3200 pre-trained pipelines in +200 languages!
+- +12000 pre-trained models in +200 languages!
+- +5000 pre-trained pipelines in +200 languages!
 - Multi-lingual NER models: Arabic, Bengali, Chinese, Danish, Dutch, English, Finnish, French, German, Hebrew, Italian,
   Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Urdu, and more.
 

diff --git a/docs/_layouts/landing.html b/docs/_layouts/landing.html
@@ -201,22 +201,22 @@ <h3 class="grey h3_title">{{ _section.title }}</h3>
                   <div class="highlight-box">
     {% highlight bash %}
     # Install Spark NLP from PyPI
-    $ pip install spark-nlp==4.3.2 pyspark==3.3.0
+    $ pip install spark-nlp==4.4.0 pyspark==3.3.0
 
     # Install Spark NLP from Anaconda/Conda
     $ conda install -c johnsnowlabs spark-nlp
 
     # Load Spark NLP with Spark Shell
-    $ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2
+    $ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0
 
     # Load Spark NLP with PySpark
-    $ pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2
+    $ pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0
 
     # Load Spark NLP with Spark Submit
-    $ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2
+    $ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0
 
     # Load Spark NLP as an external Fat JAR
-    $ spark-shell --jars spark-nlp-assembly-4.3.2.jar
+    $ spark-shell --jars spark-nlp-assembly-4.4.0.jar
     {% endhighlight %}
                     </div>
                 </div>
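The hunk above updates the same version string by hand in five commands (the conda line is not pinned). Here is a minimal sketch that generates the snippet from a single version so the commands stay in sync. The `launch_commands` helper is hypothetical and not part of Spark NLP. Its Scala 2.12 and PySpark 3.3.0 defaults are copied from the snippet and may not hold for other releases.

```python
from typing import List


def launch_commands(version: str, scala: str = "2.12", pyspark: str = "3.3.0") -> List[str]:
    """Hypothetical helper: rebuild the install/launch commands for one Spark NLP version.

    The command shapes mirror the landing-page snippet; the scala and pyspark
    defaults are assumptions taken from that snippet.
    """
    # Maven coordinate shared by spark-shell, pyspark and spark-submit
    coord = f"com.johnsnowlabs.nlp:spark-nlp_{scala}:{version}"
    return [
        f"pip install spark-nlp=={version} pyspark=={pyspark}",
        "conda install -c johnsnowlabs spark-nlp",  # unpinned, as in the original snippet
        f"spark-shell --packages {coord}",
        f"pyspark --packages {coord}",
        f"spark-submit --packages {coord}",
        f"spark-shell --jars spark-nlp-assembly-{version}.jar",
    ]


if __name__ == "__main__":
    for cmd in launch_commands("4.4.0"):
        print("$", cmd)
```

With this approach, a release bump means changing one argument instead of editing five lines.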
@@ -236,8 +236,8 @@ <h2 class="h2_title grey">Transformers at Scale</h2>
                   <div class="transformer-descr">
                     <b>Spark NLP</b> is the only open-source NLP library in <b>production</b> that offers state-of-the-art transformers such as
                     <b>BERT</b>, <b>CamemBERT</b>, <b>ALBERT</b>, <b>ELECTRA</b>, <b>XLNet</b>, <b>DistilBERT</b>, <b>RoBERTa</b>, <b>DeBERTa</b>,
-                    <b>XLM-RoBERTa</b>, <b>Longformer</b>, <b>ELMO</b>, <b>Universal Sentence Encoder</b>, <b>Google T5</b>, <b>MarianMT</b>, and <b>OpenAI GPT2</b> not only to <b>Python</b>, and <b>R</b>
-                    but also to <b>JVM</b> ecosystem (<b>Java</b>, <b>Scala</b>, and <b>Kotlin</b>) at <b>scale</b> by extending <b>Apache Spark</b> natively
+                    <b>XLM-RoBERTa</b>, <b>Longformer</b>, <b>ELMO</b>, <b>Universal Sentence Encoder</b>, <b>Facebook BART</b>, <b>Google T5</b>, <b>MarianMT</b>, <b>OpenAI GPT2</b>,
+                    <b>Google ViT</b>, <b>ASR Wav2Vec2</b> and many more not only to <b>Python</b>, and <b>R</b> but also to <b>JVM</b> ecosystem (<b>Java</b>, <b>Scala</b>, and <b>Kotlin</b>) at <b>scale</b> by extending <b>Apache Spark</b> natively
                   </div>
                 </div>
               </div>
@@ -313,18 +313,11 @@ <h4 class="blue h4_title">NLP Features</h4>
                     <li><strong>ALBERT</strong> Embeddings</li>
                     <li><strong>XLNet</strong> Embeddings</li>
                     <li><strong>ELMO</strong> Embeddings</li>
-
-                  </ul>
-                  <ul class="list1">
-
                     <li><strong>Universal Sentence</strong> Encoder</li>
                     <li><strong>Sentence</strong> Embeddings</li>
                     <li><strong>Chunk</strong> Embeddings</li>
-                    <li>Neural <strong>Machine Translation</strong> (MarianMT)</li>
-                    <li><strong>Text-To-Text</strong> Transfer Transformer <strong>(Google T5)</strong></li>
-                    <li><strong>Generative Pre-trained</strong> Transformer 2 <strong>(OpenAI GPT-2)</strong></li>
-                    <li>Vision Transformer (ViT) <strong>Image Classification</strong></li>
-                    <li>Automatic Speech Recognition <strong>(Wav2Vec2)</strong></li>
+                  </ul>
+                  <ul class="list1">
                     <li>Table Question Answering <strong>(TAPAS)</strong></li>
                     <li>Unsupervised <strong>keywords extraction</strong></li>
                     <li>Language <strong>Detection</strong> & <strong>Identification</strong> (up to 375 languages)</li>
@@ -342,11 +335,20 @@ <h4 class="blue h4_title">NLP Features</h4>
                     <li>Longformer for <strong>Token & Sequence Classification</strong></li>
                     <li>Transformer-based <strong>Question Answering</strong></li>
                     <li><strong>Named entity</strong> recognition (DL model)</li>
+                    <li>Facebook BART <strong>NLG, Translation, and Comprehension</strong></li>
+                    <li>Zero-Shot <strong>NER & Text</strong> Classification (ZSL)</li>
+                    <li>Neural <strong>Machine Translation</strong> (MarianMT)</li>
+                    <li><strong>Text-To-Text</strong> Transfer Transformer <strong>(Google T5)</strong></li>
+                    <li><strong>Generative Pre-trained</strong> Transformer 2 <strong>(OpenAI GPT-2)</strong></li>
+                    <li>Vision Transformer (Google ViT) <strong>Image Classification</strong></li>
+                    <li>Microsoft Swin Transformer <strong>Image Classification</strong></li>
+                    <li>Facebook ConvNext <strong>Image Classification</strong></li>
+                    <li>Automatic Speech Recognition <strong>(Wav2Vec2 & HuBERT)</strong></li>
                     <li>Easy <strong>TensorFlow</strong> integration</li>
                     <li><strong>GPU</strong> Support</li>
                     <li>Full integration with <strong>Spark ML</strong> functions</li>
-                    <li><strong>9400+</strong> pre-trained <strong>models </strong> in <strong>200+ languages! </strong>
-                    <li><strong>3200+</strong> pre-trained <strong>pipelines </strong> in <strong>200+ languages! </strong>
+                    <li><strong>12000+</strong> pre-trained <strong>models </strong> in <strong>200+ languages! </strong>
+                    <li><strong>5000+</strong> pre-trained <strong>pipelines </strong> in <strong>200+ languages! </strong>
                   </ul>
                 </div>
 {% highlight python %}

diff --git a/docs/en/install.md b/docs/en/install.md
@@ -106,13 +106,13 @@ spark = SparkSession.builder \
 </dependency>
 ```
 
-**spark-nlp-m1:**
+**spark-nlp-silicon:**
 
 ```xml
-<!-- https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-m1 -->
+<!-- https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-silicon -->
 <dependency>
     <groupId>com.johnsnowlabs.nlp</groupId>
-    <artifactId>spark-nlp-m1_2.12</artifactId>
+    <artifactId>spark-nlp-silicon_2.12</artifactId>
     <version>4.4.0</version>
 </dependency>
 ```
@@ -144,11 +144,11 @@ libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "4.4.0"
 libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "4.4.0"
 ```
 
-**spark-nlp-m1:**
+**spark-nlp-silicon:**
 
 ```scala
-// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-m1
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-m1" % "4.4.0"
+// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-silicon
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "4.4.0"
 ```
 
 **spark-nlp-aarch64:**
@@ -220,7 +220,7 @@ as expected.
 
 Adding Spark NLP to your Scala or Java project is easy:
 
-Simply change to dependency coordinates to `spark-nlp-m1` and add the dependency to your
+Simply change the dependency coordinates to `spark-nlp-silicon` and add the dependency to your
 project.
 
 How to do this is mentioned above: [Scala And Java](#scala-and-java)
@@ -229,10 +229,10 @@ So for example for Spark NLP with Apache Spark 3.0.x and 3.1.x you will end up w
 maven coordinates like these:
 
 ```xml
-<!-- https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-m1 -->
+<!-- https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-silicon -->
 <dependency>
     <groupId>com.johnsnowlabs.nlp</groupId>
-    <artifactId>spark-nlp-m1_2.12</artifactId>
+    <artifactId>spark-nlp-silicon_2.12</artifactId>
     <version>4.4.0</version>
 </dependency>
 ```
@@ -241,7 +241,7 @@ or in case of sbt:
 
 ```scala
 // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-m1" % "4.4.0"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "4.4.0"
 ```
 
 If everything went well, you can now start Spark NLP with the `m1` flag set to `true`:
@@ -269,7 +269,7 @@ If everything went well, you can now start Spark NLP with the `m1` flag set to `
 ```python
 import sparknlp
 
-spark = sparknlp.start(m1=True)
+spark = sparknlp.start(apple_silicon=True)
 ```
 
 ## Installation for Linux Aarch64 Systems
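This file renames the Apple Silicon artifact from `spark-nlp-m1` to `spark-nlp-silicon` and the `sparknlp.start` flag from `m1` to `apple_silicon`. The sketch below chooses an artifact and a start flag for each platform. It is an illustration under several assumptions. The helper names are hypothetical. The OS and architecture checks rely on Python's `platform` module: a native arm64 interpreter on macOS reports `arm64`, but one running under Rosetta reports `x86_64`. The `gpu` and `aarch64` keyword names for `sparknlp.start` do not appear in this diff and are assumptions.

```python
import platform
from typing import Dict, Optional

# Artifact names per target, as listed in install.md after this commit.
ARTIFACTS: Dict[str, str] = {
    "cpu": "spark-nlp",
    "gpu": "spark-nlp-gpu",
    "silicon": "spark-nlp-silicon",  # formerly spark-nlp-m1
    "aarch64": "spark-nlp-aarch64",
}


def detect_target(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Hypothetical helper: guess the target from OS and CPU (GPU is never auto-detected)."""
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Darwin" and machine == "arm64":
        return "silicon"
    if system == "Linux" and machine in ("aarch64", "arm64"):
        return "aarch64"
    return "cpu"


def maven_artifact_id(target: str, scala: str = "2.12") -> str:
    """Maven artifactId, e.g. spark-nlp-silicon_2.12."""
    return f"{ARTIFACTS[target]}_{scala}"


def sbt_line(target: str, version: str = "4.4.0") -> str:
    """sbt dependency line; %% lets sbt append the Scala suffix itself."""
    return f'libraryDependencies += "com.johnsnowlabs.nlp" %% "{ARTIFACTS[target]}" % "{version}"'


def start_kwargs(target: str) -> Dict[str, bool]:
    """Keyword arguments for sparknlp.start(); apple_silicon replaces the old m1 flag.

    The gpu and aarch64 keyword names are assumptions, not shown in this diff.
    """
    flags = {"gpu": "gpu", "silicon": "apple_silicon", "aarch64": "aarch64"}
    return {flags[target]: True} if target in flags else {}
```

If Spark NLP is installed, you could then call `spark = sparknlp.start(**start_kwargs(detect_target()))`. Because a GPU cannot be inferred from the OS and architecture, pass `"gpu"` explicitly when you need it.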

diff --git a/python/README.md b/python/README.md
@@ -19,10 +19,10 @@
 
 Spark NLP is a state-of-the-art Natural Language Processing library built on top of Apache Spark. It provides **simple**, **performant** & **accurate** NLP annotations for machine learning pipelines that **scale** easily in a distributed
 environment.
-Spark NLP comes with **11000+** pretrained **pipelines** and **models** in more than **200+** languages.
-It also offers tasks such as **Tokenization**, **Word Segmentation**, **Part-of-Speech Tagging**, Word and Sentence **Embeddings**, **Named Entity Recognition**, **Dependency Parsing**, **Spell Checking**, **Text Classification**, **Sentiment Analysis**, **Token Classification**, **Machine Translation** (+180 languages), **Summarization**, **Question Answering**, **Table Question Answering**, **Text Generation**, **Image Classification**, **Automatic Speech Recognition**, and many more [NLP tasks](#features).
+Spark NLP comes with **17000+** pretrained **pipelines** and **models** in more than **200+** languages.
+It also offers tasks such as **Tokenization**, **Word Segmentation**, **Part-of-Speech Tagging**, Word and Sentence **Embeddings**, **Named Entity Recognition**, **Dependency Parsing**, **Spell Checking**, **Text Classification**, **Sentiment Analysis**, **Token Classification**, **Machine Translation** (+180 languages), **Summarization**, **Question Answering**, **Table Question Answering**, **Text Generation**, **Image Classification**, **Automatic Speech Recognition**, **Zero-Shot Learning**, and many more [NLP tasks](#features).
 
-**Spark NLP** is the only open-source NLP library in **production** that offers state-of-the-art transformers such as **BERT**, **CamemBERT**, **ALBERT**, **ELECTRA**, **XLNet**, **DistilBERT**, **RoBERTa**, **DeBERTa**, **XLM-RoBERTa**, **Longformer**, **ELMO**, **Universal Sentence Encoder**, **Google T5**, **MarianMT**, **GPT2**, and **Vision Transformers (ViT)** not only to **Python** and **R**, but also to **JVM** ecosystem (**Java**, **Scala**, and **Kotlin**) at **scale** by extending **Apache Spark** natively.
+**Spark NLP** is the only open-source NLP library in **production** that offers state-of-the-art transformers such as **BERT**, **CamemBERT**, **ALBERT**, **ELECTRA**, **XLNet**, **DistilBERT**, **RoBERTa**, **DeBERTa**, **XLM-RoBERTa**, **Longformer**, **ELMO**, **Universal Sentence Encoder**, **Facebook BART**, **Google T5**, **MarianMT**, **OpenAI GPT2**, and **Vision Transformers (ViT)** not only to **Python** and **R**, but also to **JVM** ecosystem (**Java**, **Scala**, and **Kotlin**) at **scale** by extending **Apache Spark** natively.
 
 ## Project's website
 
@@ -137,19 +137,22 @@ documentation and examples
 - Longformer for Question Answering
 - Table Question Answering (TAPAS)
 - Zero-Shot NER Model
+- Zero Shot Text Classification by BERT (ZSL)
 - Neural Machine Translation (MarianMT)
 - Text-To-Text Transfer Transformer (Google T5)
 - Generative Pre-trained Transformer 2 (OpenAI GPT2)
-- Vision Transformer (ViT)
-- Swin Image Classification
+- Seq2Seq for NLG, Translation, and Comprehension (Facebook BART)
+- Vision Transformer (Google ViT)
+- Swin Image Classification (Microsoft Swin Transformer)
+- ConvNext Image Classification (Facebook ConvNext)
 - Automatic Speech Recognition (Wav2Vec2)
 - Automatic Speech Recognition (HuBERT)
 - Named entity recognition (Deep learning)
 - Easy TensorFlow integration
 - GPU Support
 - Full integration with Spark ML functions
-- +9400 pre-trained models in +200 languages!
-- +3200 pre-trained pipelines in +200 languages!
+- +12000 pre-trained models in +200 languages!
+- +5000 pre-trained pipelines in +200 languages!
 - Multi-lingual NER models: Arabic, Bengali, Chinese, Danish, Dutch, English, Finnish, French, German, Hebrew, Italian,
   Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Urdu, and more.