
Commit

Add new features and update docs[run doc]
maziyarpanahi committed Apr 10, 2023
1 parent 46d4cb4 commit 6fb47c0
Showing 4 changed files with 51 additions and 43 deletions.
17 changes: 10 additions & 7 deletions README.md
@@ -19,10 +19,10 @@

Spark NLP is a state-of-the-art Natural Language Processing library built on top of Apache Spark. It provides **simple**, **performant** & **accurate** NLP annotations for machine learning pipelines that **scale** easily in a distributed
environment.
Spark NLP comes with **11000+** pretrained **pipelines** and **models** in more than **200+** languages.
It also offers tasks such as **Tokenization**, **Word Segmentation**, **Part-of-Speech Tagging**, Word and Sentence **Embeddings**, **Named Entity Recognition**, **Dependency Parsing**, **Spell Checking**, **Text Classification**, **Sentiment Analysis**, **Token Classification**, **Machine Translation** (+180 languages), **Summarization**, **Question Answering**, **Table Question Answering**, **Text Generation**, **Image Classification**, **Automatic Speech Recognition**, and many more [NLP tasks](#features).
Spark NLP comes with **17000+** pretrained **pipelines** and **models** in more than **200+** languages.
It also offers tasks such as **Tokenization**, **Word Segmentation**, **Part-of-Speech Tagging**, Word and Sentence **Embeddings**, **Named Entity Recognition**, **Dependency Parsing**, **Spell Checking**, **Text Classification**, **Sentiment Analysis**, **Token Classification**, **Machine Translation** (+180 languages), **Summarization**, **Question Answering**, **Table Question Answering**, **Text Generation**, **Image Classification**, **Automatic Speech Recognition**, **Zero-Shot Learning**, and many more [NLP tasks](#features).

**Spark NLP** is the only open-source NLP library in **production** that offers state-of-the-art transformers such as **BERT**, **CamemBERT**, **ALBERT**, **ELECTRA**, **XLNet**, **DistilBERT**, **RoBERTa**, **DeBERTa**, **XLM-RoBERTa**, **Longformer**, **ELMO**, **Universal Sentence Encoder**, **Google T5**, **MarianMT**, **GPT2**, and **Vision Transformers (ViT)** not only to **Python** and **R**, but also to **JVM** ecosystem (**Java**, **Scala**, and **Kotlin**) at **scale** by extending **Apache Spark** natively.
**Spark NLP** is the only open-source NLP library in **production** that offers state-of-the-art transformers such as **BERT**, **CamemBERT**, **ALBERT**, **ELECTRA**, **XLNet**, **DistilBERT**, **RoBERTa**, **DeBERTa**, **XLM-RoBERTa**, **Longformer**, **ELMO**, **Universal Sentence Encoder**, **Facebook BART**, **Google T5**, **MarianMT**, **OpenAI GPT2**, and **Vision Transformers (ViT)** not only to **Python** and **R**, but also to **JVM** ecosystem (**Java**, **Scala**, and **Kotlin**) at **scale** by extending **Apache Spark** natively.

## Project's website

@@ -137,19 +137,22 @@ documentation and examples
- Longformer for Question Answering
- Table Question Answering (TAPAS)
- Zero-Shot NER Model
- Zero Shot Text Classification by BERT (ZSL)
- Neural Machine Translation (MarianMT)
- Text-To-Text Transfer Transformer (Google T5)
- Generative Pre-trained Transformer 2 (OpenAI GPT2)
- Vision Transformer (ViT)
- Swin Image Classification
- Seq2Seq for NLG, Translation, and Comprehension (Facebook BART)
- Vision Transformer (Google ViT)
- Swin Image Classification (Microsoft Swin Transformer)
- ConvNext Image Classification (Facebook ConvNext)
- Automatic Speech Recognition (Wav2Vec2)
- Automatic Speech Recognition (HuBERT)
- Named entity recognition (Deep learning)
- Easy TensorFlow integration
- GPU Support
- Full integration with Spark ML functions
- +9400 pre-trained models in +200 languages!
- +3200 pre-trained pipelines in +200 languages!
- +12000 pre-trained models in +200 languages!
- +5000 pre-trained pipelines in +200 languages!
- Multi-lingual NER models: Arabic, Bengali, Chinese, Danish, Dutch, English, Finnish, French, German, Hebrew, Italian,
Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Urdu, and more.

38 changes: 20 additions & 18 deletions docs/_layouts/landing.html
@@ -201,22 +201,22 @@ <h3 class="grey h3_title">{{ _section.title }}</h3>
<div class="highlight-box">
{% highlight bash %}
# Install Spark NLP from PyPI
$ pip install spark-nlp==4.3.2 pyspark==3.3.0
$ pip install spark-nlp==4.4.0 pyspark==3.3.0

# Install Spark NLP from Anaconda/Conda
$ conda install -c johnsnowlabs spark-nlp

# Load Spark NLP with Spark Shell
$ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2
$ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0

# Load Spark NLP with PySpark
$ pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2
$ pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0

# Load Spark NLP with Spark Submit
$ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2
$ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0

# Load Spark NLP as an external Fat JAR
$ spark-shell --jars spark-nlp-assembly-4.3.2.jar
$ spark-shell --jars spark-nlp-assembly-4.4.0.jar
{% endhighlight %}
</div>
</div>
@@ -236,8 +236,8 @@ <h2 class="h2_title grey">Transformers at Scale</h2>
<div class="transformer-descr">
<b>Spark NLP</b> is the only open-source NLP library in <b>production</b> that offers state-of-the-art transformers such as
<b>BERT</b>, <b>CamemBERT</b>, <b>ALBERT</b>, <b>ELECTRA</b>, <b>XLNet</b>, <b>DistilBERT</b>, <b>RoBERTa</b>, <b>DeBERTa</b>,
<b>XLM-RoBERTa</b>, <b>Longformer</b>, <b>ELMO</b>, <b>Universal Sentence Encoder</b>, <b>Google T5</b>, <b>MarianMT</b>, and <b>OpenAI GPT2</b> not only to <b>Python</b>, and <b>R</b>
but also to <b>JVM</b> ecosystem (<b>Java</b>, <b>Scala</b>, and <b>Kotlin</b>) at <b>scale</b> by extending <b>Apache Spark</b> natively
<b>XLM-RoBERTa</b>, <b>Longformer</b>, <b>ELMO</b>, <b>Universal Sentence Encoder</b>, <b>Facebook BART</b>, <b>Google T5</b>, <b>MarianMT</b>, <b>OpenAI GPT2</b>,
<b>Google ViT</b>, <b>ASR Wav2Vec2</b> and many more not only to <b>Python</b>, and <b>R</b> but also to <b>JVM</b> ecosystem (<b>Java</b>, <b>Scala</b>, and <b>Kotlin</b>) at <b>scale</b> by extending <b>Apache Spark</b> natively
</div>
</div>
</div>
@@ -313,18 +313,11 @@ <h4 class="blue h4_title">NLP Features</h4>
<li><strong>ALBERT</strong> Embeddings</li>
<li><strong>XLNet</strong> Embeddings</li>
<li><strong>ELMO</strong> Embeddings</li>

</ul>
<ul class="list1">

<li><strong>Universal Sentence</strong> Encoder</li>
<li><strong>Sentence</strong> Embeddings</li>
<li><strong>Chunk</strong> Embeddings</li>
<li>Neural <strong>Machine Translation</strong> (MarianMT)</li>
<li><strong>Text-To-Text</strong> Transfer Transformer <strong>(Google T5)</strong></li>
<li><strong>Generative Pre-trained</strong> Transformer 2 <strong>(OpenAI GPT-2)</strong></li>
<li>Vision Transformer (ViT) <strong>Image Classification</strong></li>
<li>Automatic Speech Recognition <strong>(Wav2Vec2)</strong></li>
</ul>
<ul class="list1">
<li>Table Question Answering <strong>(TAPAS)</strong></li>
<li>Unsupervised <strong>keywords extraction</strong></li>
<li>Language <strong>Detection</strong> & <strong>Identification</strong> (up to 375 languages)</li>
@@ -342,11 +335,20 @@ <h4 class="blue h4_title">NLP Features</h4>
<li>Longformer for <strong>Token & Sequence Classification</strong></li>
<li>Transformer-based <strong>Question Answering</strong></li>
<li><strong>Named entity</strong> recognition (DL model)</li>
<li>Facebook BART <strong>NLG, Translation, and Comprehension</strong></li>
<li>Zero-Shot <strong>NER & Text</strong> Classification (ZSL)</li>
<li>Neural <strong>Machine Translation</strong> (MarianMT)</li>
<li><strong>Text-To-Text</strong> Transfer Transformer <strong>(Google T5)</strong></li>
<li><strong>Generative Pre-trained</strong> Transformer 2 <strong>(OpenAI GPT-2)</strong></li>
<li>Vision Transformer (Google ViT) <strong>Image Classification</strong></li>
<li>Microsoft Swin Transformer <strong>Image Classification</strong></li>
<li>Facebook ConvNext <strong>Image Classification</strong></li>
<li>Automatic Speech Recognition <strong>(Wav2Vec2 & HuBERT)</strong></li>
<li>Easy <strong>TensorFlow</strong> integration</li>
<li><strong>GPU</strong> Support</li>
<li>Full integration with <strong>Spark ML</strong> functions</li>
<li><strong>9400+</strong> pre-trained <strong>models </strong> in <strong>200+ languages! </strong>
<li><strong>3200+</strong> pre-trained <strong>pipelines </strong> in <strong>200+ languages! </strong>
<li><strong>12000+</strong> pre-trained <strong>models </strong> in <strong>200+ languages! </strong>
<li><strong>5000+</strong> pre-trained <strong>pipelines </strong> in <strong>200+ languages! </strong>
</ul>
</div>
{% highlight python %}
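The landing-page hunk bumps every install command from 4.3.2 to 4.4.0, and the `--packages` commands all share one coordinate pattern, `groupId:artifactId_scalaVersion:version`. A minimal sketch, assuming that pattern and using hypothetical helper names that are not part of Spark NLP, shows how the commands could be generated from a single version string so that a future bump touches one value:

```python
from typing import List

# Hypothetical helpers (not part of Spark NLP) that rebuild the install
# commands shown in the landing-page hunk from one version string.
GROUP_ID = "com.johnsnowlabs.nlp"


def packages_coordinate(version: str, artifact: str = "spark-nlp", scala: str = "2.12") -> str:
    """Return a groupId:artifactId_scala:version string for --packages."""
    return f"{GROUP_ID}:{artifact}_{scala}:{version}"


def assembly_jar(version: str) -> str:
    """Return the fat JAR file name in the form used by the --jars example."""
    return f"spark-nlp-assembly-{version}.jar"


def install_commands(version: str, pyspark_version: str = "3.3.0") -> List[str]:
    """Return the pip, spark-shell, pyspark, spark-submit and fat-JAR commands."""
    coord = packages_coordinate(version)
    return [
        f"pip install spark-nlp=={version} pyspark=={pyspark_version}",
        f"spark-shell --packages {coord}",
        f"pyspark --packages {coord}",
        f"spark-submit --packages {coord}",
        f"spark-shell --jars {assembly_jar(version)}",
    ]


if __name__ == "__main__":
    for cmd in install_commands("4.4.0"):
        print(cmd)
```

The Conda command is left out because it does not pin a version in the original snippet.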
22 changes: 11 additions & 11 deletions docs/en/install.md
@@ -106,13 +106,13 @@ spark = SparkSession.builder \
</dependency>
```

**spark-nlp-m1:**
**spark-nlp-silicon:**

```xml
<!-- https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-m1 -->
<!-- https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-silicon -->
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-m1_2.12</artifactId>
<artifactId>spark-nlp-silicon_2.12</artifactId>
<version>4.4.0</version>
</dependency>
```
@@ -144,11 +144,11 @@ libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "4.4.0"
libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "4.4.0"
```

**spark-nlp-m1:**
**spark-nlp-silicon:**

```scala
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-m1
libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-m1" % "4.4.0"
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-silicon
libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "4.4.0"
```

**spark-nlp-aarch64:**
@@ -220,7 +220,7 @@ as expected.

Adding Spark NLP to your Scala or Java project is easy:

Simply change to dependency coordinates to `spark-nlp-m1` and add the dependency to your
Simply change to dependency coordinates to `spark-nlp-silicon` and add the dependency to your
project.

How to do this is mentioned above: [Scala And Java](#scala-and-java)
@@ -229,10 +229,10 @@ So for example for Spark NLP with Apache Spark 3.0.x and 3.1.x you will end up w
maven coordinates like these:

```xml
<!-- https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-m1 -->
<!-- https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-silicon -->
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp-m1_2.12</artifactId>
<artifactId>spark-nlp-silicon_2.12</artifactId>
<version>4.4.0</version>
</dependency>
```
@@ -241,7 +241,7 @@ or in case of sbt:

```scala
// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-m1" % "4.4.0"
libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "4.4.0"
```

If everything went well, you can now start Spark NLP with the `m1` flag set to `true`:
@@ -269,7 +269,7 @@ If everything went well, you can now start Spark NLP with the `m1` flag set to `
```python
import sparknlp

spark = sparknlp.start(m1=True)
spark = sparknlp.start(apple_silicon=True)
```

## Installation for Linux Aarch64 Systems
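The install.md hunks rename the Apple silicon artifact from `spark-nlp-m1` to `spark-nlp-silicon` and the `sparknlp.start()` keyword from `m1` to `apple_silicon`. The following is a minimal sketch of how a build script might translate the old names. `migrate`, `sbt_line`, `maven_dependency`, and `migrate_start_kwargs` are hypothetical helpers, not Spark NLP APIs, and the rename tables cover only what this diff shows:

```python
from typing import Any, Dict

# Hypothetical migration helpers (not part of Spark NLP), based only on the
# renames visible in this diff.
GROUP_ID = "com.johnsnowlabs.nlp"
RENAMED_ARTIFACTS: Dict[str, str] = {"spark-nlp-m1": "spark-nlp-silicon"}
RENAMED_START_KWARGS: Dict[str, str] = {"m1": "apple_silicon"}


def migrate(artifact: str) -> str:
    """Return the new artifact name, or the input unchanged if it was not renamed."""
    return RENAMED_ARTIFACTS.get(artifact, artifact)


def sbt_line(artifact: str, version: str) -> str:
    """Render an sbt dependency; %% lets sbt append the Scala suffix itself."""
    return f'libraryDependencies += "{GROUP_ID}" %% "{migrate(artifact)}" % "{version}"'


def maven_dependency(artifact: str, version: str, scala: str = "2.12") -> str:
    """Render a Maven <dependency>; the Scala suffix must be written out."""
    return (
        "<dependency>\n"
        f"    <groupId>{GROUP_ID}</groupId>\n"
        f"    <artifactId>{migrate(artifact)}_{scala}</artifactId>\n"
        f"    <version>{version}</version>\n"
        "</dependency>"
    )


def migrate_start_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Rename old sparknlp.start() keyword arguments (m1 -> apple_silicon)."""
    return {RENAMED_START_KWARGS.get(k, k): v for k, v in kwargs.items()}


if __name__ == "__main__":
    print(sbt_line("spark-nlp-m1", "4.4.0"))
    print(maven_dependency("spark-nlp-m1", "4.4.0"))
    print(migrate_start_kwargs({"m1": True}))
```

In sbt the `%%` operator appends the project's Scala binary version to the artifact name, which is why the Maven form has to spell out `_2.12` explicitly.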
17 changes: 10 additions & 7 deletions python/README.md
@@ -19,10 +19,10 @@

Spark NLP is a state-of-the-art Natural Language Processing library built on top of Apache Spark. It provides **simple**, **performant** & **accurate** NLP annotations for machine learning pipelines that **scale** easily in a distributed
environment.
Spark NLP comes with **11000+** pretrained **pipelines** and **models** in more than **200+** languages.
It also offers tasks such as **Tokenization**, **Word Segmentation**, **Part-of-Speech Tagging**, Word and Sentence **Embeddings**, **Named Entity Recognition**, **Dependency Parsing**, **Spell Checking**, **Text Classification**, **Sentiment Analysis**, **Token Classification**, **Machine Translation** (+180 languages), **Summarization**, **Question Answering**, **Table Question Answering**, **Text Generation**, **Image Classification**, **Automatic Speech Recognition**, and many more [NLP tasks](#features).
Spark NLP comes with **17000+** pretrained **pipelines** and **models** in more than **200+** languages.
It also offers tasks such as **Tokenization**, **Word Segmentation**, **Part-of-Speech Tagging**, Word and Sentence **Embeddings**, **Named Entity Recognition**, **Dependency Parsing**, **Spell Checking**, **Text Classification**, **Sentiment Analysis**, **Token Classification**, **Machine Translation** (+180 languages), **Summarization**, **Question Answering**, **Table Question Answering**, **Text Generation**, **Image Classification**, **Automatic Speech Recognition**, **Zero-Shot Learning**, and many more [NLP tasks](#features).

**Spark NLP** is the only open-source NLP library in **production** that offers state-of-the-art transformers such as **BERT**, **CamemBERT**, **ALBERT**, **ELECTRA**, **XLNet**, **DistilBERT**, **RoBERTa**, **DeBERTa**, **XLM-RoBERTa**, **Longformer**, **ELMO**, **Universal Sentence Encoder**, **Google T5**, **MarianMT**, **GPT2**, and **Vision Transformers (ViT)** not only to **Python** and **R**, but also to **JVM** ecosystem (**Java**, **Scala**, and **Kotlin**) at **scale** by extending **Apache Spark** natively.
**Spark NLP** is the only open-source NLP library in **production** that offers state-of-the-art transformers such as **BERT**, **CamemBERT**, **ALBERT**, **ELECTRA**, **XLNet**, **DistilBERT**, **RoBERTa**, **DeBERTa**, **XLM-RoBERTa**, **Longformer**, **ELMO**, **Universal Sentence Encoder**, **Facebook BART**, **Google T5**, **MarianMT**, **OpenAI GPT2**, and **Vision Transformers (ViT)** not only to **Python** and **R**, but also to **JVM** ecosystem (**Java**, **Scala**, and **Kotlin**) at **scale** by extending **Apache Spark** natively.

## Project's website

@@ -137,19 +137,22 @@ documentation and examples
- Longformer for Question Answering
- Table Question Answering (TAPAS)
- Zero-Shot NER Model
- Zero Shot Text Classification by BERT (ZSL)
- Neural Machine Translation (MarianMT)
- Text-To-Text Transfer Transformer (Google T5)
- Generative Pre-trained Transformer 2 (OpenAI GPT2)
- Vision Transformer (ViT)
- Swin Image Classification
- Seq2Seq for NLG, Translation, and Comprehension (Facebook BART)
- Vision Transformer (Google ViT)
- Swin Image Classification (Microsoft Swin Transformer)
- ConvNext Image Classification (Facebook ConvNext)
- Automatic Speech Recognition (Wav2Vec2)
- Automatic Speech Recognition (HuBERT)
- Named entity recognition (Deep learning)
- Easy TensorFlow integration
- GPU Support
- Full integration with Spark ML functions
- +9400 pre-trained models in +200 languages!
- +3200 pre-trained pipelines in +200 languages!
- +12000 pre-trained models in +200 languages!
- +5000 pre-trained pipelines in +200 languages!
- Multi-lingual NER models: Arabic, Bengali, Chinese, Danish, Dutch, English, Finnish, French, German, Hebrew, Italian,
Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Urdu, and more.
