Merged
156 commits
8deb7da
Models hub (#14490)
maziyarpanahi Dec 20, 2024
1ea76c3
[SPARKNLP-1105] Introducing AlbertForMultipleChoice
danilojsl Dec 27, 2024
9e67d89
[SPARKNLP-1105] Addiong test tags
danilojsl Dec 27, 2024
3b86715
[SPARKNLP-1106] Introducing DistilBertForMultipleChoice
danilojsl Dec 31, 2024
191c78b
[SPARKNLP-1106] Adding notebook examples for DistilBertForMultipleChoice
danilojsl Jan 2, 2025
52e4cd4
[SPARKNLP-1107] Introducing RoBertaForMultipleChoice
danilojsl Jan 3, 2025
4d2c06e
[SPARKNLP-1107] Adding example notebooks for RobertaForMultipleChoice
danilojsl Jan 3, 2025
0898d8c
Fix the bug generating wrong embeddings
maziyarpanahi Jan 6, 2025
9ac93ba
fixing attention mask in bge, e5, mxbai, nomic, snowflake, and uae
maziyarpanahi Jan 6, 2025
873e224
[SPARKNLP-1108] Introducing XlmRoBertaForMultipleChoice
danilojsl Jan 6, 2025
6c3e9cc
[SPARKNLP-1108] Adding notebooks example for XlmRoBertaForMultipleChoice
danilojsl Jan 8, 2025
4cd472d
[SPARKNLP-1098] Adding PDF reader support
danilojsl Jan 8, 2025
f3583d1
[SPARKNLP-1098] Adding docs and notebook example for PDF reader
danilojsl Jan 15, 2025
a6a6fca
fix issue with attention mask calculation
maziyarpanahi Jan 16, 2025
d7be7f0
Merge release/600
DevinTDHa Jan 18, 2025
208bb75
Refactor automatic gpu support
DevinTDHa Oct 25, 2024
e5c24f5
[SPARKNLP-1079] AutoGGUFVisionModel Scala Side
DevinTDHa Dec 14, 2024
544f722
[SPARKNLP-1079] AutoGGUFVisionModel Python Side
DevinTDHa Jan 18, 2025
ac80d3f
[SPARKNLP-1079] AutoGGUFVisionModel documentation and end-to-end example
DevinTDHa Jan 18, 2025
0f5d073
[SPARKNLP-1079] Bump jsl-llamacpp version
DevinTDHa Jan 18, 2025
998d1f5
[SPARKNLP-1079] AutoGGUFVisionModel pretrained model
DevinTDHa Jan 24, 2025
73cd3ad
fixing typo in MXBAI notebook
ahmedlone127 Jan 29, 2025
d0da98d
Add a new CLSToken feature to BGE annotator [skip test]
maziyarpanahi Jan 29, 2025
aefce9b
code reformat [skip test]
maziyarpanahi Jan 29, 2025
7a9f845
LLama3 download fix (#14508)
C-K-Loan Jan 29, 2025
14daf85
Merge pull request #14496 from JohnSnowLabs/bugfix/mpnet-attentionmas…
maziyarpanahi Jan 29, 2025
d64fd69
Models hub (#14513)
maziyarpanahi Jan 30, 2025
69de5c2
Bump to 5.5.3 [run doc]
maziyarpanahi Jan 30, 2025
9948696
Fix duplicate docstring causing docs not to build [run doc]
maziyarpanahi Jan 30, 2025
7137acb
Update Scala and Python APIs
actions-user Jan 30, 2025
7d2bed7
Merge pull request #14511 from JohnSnowLabs/release/553-release-candi…
maziyarpanahi Jan 30, 2025
cdab6bb
Janus Scala API
prabod Feb 6, 2025
195c097
Janus Scala Documentation
prabod Feb 6, 2025
082db05
Janus Python API
prabod Feb 6, 2025
4d8bf47
[SPARKNLP-1079] AutoGGUFVisionModel pretrained model
DevinTDHa Jan 24, 2025
deb3952
[SPARKNLP-1079] Fix loadImagesAsBytes path creation
DevinTDHa Jan 24, 2025
f2be057
[SPARKNLP-1079] Fix batch inference for AutoGGUFVisionModel
DevinTDHa Feb 9, 2025
3d31759
[SPARKNLP-1079] Add note that only CLIP models are supported
DevinTDHa Feb 9, 2025
c3dca2d
update config values on the instance
prabod Feb 12, 2025
f9bd02d
added OLMo scala api
prabod Apr 22, 2024
32635d2
added OLMo scala api
prabod Apr 22, 2024
d52d4e0
added OLMo python API and tests
prabod Apr 24, 2024
2eedcb3
OlMo Notebook and bug fixes
prabod Feb 12, 2025
05cbe8b
update default name and documentation
prabod Feb 12, 2025
408958a
update default name
prabod Feb 12, 2025
a369ce9
Phi3V preprocessing utils
prabod Sep 19, 2024
6896a02
added phi3v
prabod Oct 23, 2024
621411b
add phi3v scala API
prabod Oct 28, 2024
c047188
Added tests
prabod Oct 29, 2024
59e596c
Phi3V python api and tests
prabod Oct 29, 2024
d0ad585
added byte fallback
prabod Oct 29, 2024
c12713d
changed to pretrained
prabod Oct 29, 2024
d12ae3d
export notebook
prabod Oct 30, 2024
5040468
updated testes
prabod Oct 30, 2024
27140d7
update default name and documentation
prabod Feb 13, 2025
9752516
update documentation and resource downloader entry
prabod Feb 13, 2025
1af39be
LLAVA Scala API and Tests
prabod Nov 1, 2024
b5872e7
LLAVA Test
prabod Nov 1, 2024
6f7c4d6
LLAVA python api
prabod Nov 6, 2024
6f2f3a9
LLAVA notebook
prabod Nov 7, 2024
64b6b20
Add custom model requirements
prabod Nov 8, 2024
2661bfb
update documentation and resource downloader entry
prabod Feb 13, 2025
af28bf2
cohere scala and python api
prabod Nov 13, 2024
133b326
Cohere Notebook
prabod Nov 14, 2024
028ca67
update documentation and resource downloader entry
prabod Feb 13, 2025
db48372
update documentation and resource downloader entry
prabod Feb 13, 2025
b967682
Qwen2VL scala API
prabod Dec 9, 2024
e5017e2
QWEN2VL python api
prabod Dec 10, 2024
16c9716
QWEN2VL Notebook
prabod Dec 10, 2024
934af90
update default_model and resource downloader entry
prabod Feb 14, 2025
d19b9f7
update documentation
prabod Feb 14, 2025
2cd2cae
update model
prabod Feb 14, 2025
7052361
added preprocessing utils for MLLama
prabod Dec 25, 2024
89c1803
MLLama tokenizers and utils
prabod Jan 8, 2025
58e309b
MLLama scala api
prabod Jan 20, 2025
46fe907
MLLama scala api changes
prabod Jan 21, 2025
c19f4eb
MLLama python api
prabod Jan 23, 2025
1e0500f
update default model, notebook and documentation
prabod Feb 14, 2025
ac203f2
Update create_search_index.yml (#14526)
agsfer Feb 20, 2025
0f9d4d9
[SPARKNLP-1098] Enabling getStoreSplittedPdf parameter to PDF reader
danilojsl Feb 21, 2025
3815c20
Merge branch 'master' of github.com:JohnSnowLabs/spark-nlp into featu…
danilojsl Feb 21, 2025
638d26b
[SPARKNLP-1098] Adding PdfToText notebook example
danilojsl Feb 24, 2025
ffe4e21
added image generation scala API
prabod Feb 26, 2025
dd2a400
Update HuggingFace_OpenVINO_in_Spark_NLP_MLLama.ipynb
prabod Feb 26, 2025
b4dc462
Update HuggingFace_OpenVINO_in_Spark_NLP_MLLama.ipynb
prabod Feb 26, 2025
c92a544
Update HuggingFace_OpenVINO_in_Spark_NLP_MLLama.ipynb
prabod Feb 26, 2025
f7d5893
Update HuggingFace_OpenVINO_in_Spark_NLP_MLLama.ipynb
prabod Feb 26, 2025
afd7338
added image generation python API and tests
prabod Mar 4, 2025
8eeccf7
[SPARKNLP-1117] Adding storeContent to HTML, Word and Email readers
danilojsl Mar 6, 2025
9386b44
[SPARKNLP-1117] Refactoring documentation for readers
danilojsl Mar 6, 2025
649c862
[SPARKNLP-1102] Adding support to read Excel files
danilojsl Dec 17, 2024
60a8521
[SPARKNLP-1102] Adding notebook example to read Excel files
danilojsl Dec 19, 2024
4bb3ad5
[SPARKNLP-1102] Refactoring documentation for excel reader
danilojsl Mar 6, 2025
af7b13e
[SPARKNLP-1117] Adding storeContent param
danilojsl Mar 6, 2025
1999ae5
[SPARKNLP-1103] Adding support to read PowerPoint files and adds loca…
danilojsl Dec 24, 2024
6dc756e
[SPARKNLP-1103] Adding documentation and notebook example for PowerPo…
danilojsl Dec 24, 2024
26e023e
[SPARKNLP-1117] Adding storeContent param
danilojsl Mar 6, 2025
8710011
[SPARKNLP-1113] Adding Text Reader
danilojsl Feb 17, 2025
30502cc
[SPARKNLP-1113] Adding txt reader notebook example
danilojsl Feb 17, 2025
cdb8f36
[SPARKNLP-1117] Adding storeContent param
danilojsl Mar 7, 2025
19d2dd4
added notebook
prabod Mar 11, 2025
d7bbb51
Improved Error Handling for AutoGGUF models
DevinTDHa Mar 14, 2025
489af7e
Add setNParallel for AutoGGUF models on python side
DevinTDHa Mar 14, 2025
9247f0d
Merge branch 'bug/gguf-embeddings-context' into feature/SPARKNLP-1079…
DevinTDHa Mar 14, 2025
9df68a7
Improved Error Handling and setNParallel alias for batch size
DevinTDHa Mar 14, 2025
f3d353c
Fix notebook error format
DevinTDHa Mar 14, 2025
05000ab
Merge pull request #14242 from JohnSnowLabs/SPARKNLP-1006-Implement-OLMo
maziyarpanahi Mar 16, 2025
6d71770
Merge branch 'release/600-release-candidate' into SPARKNLP-1060-Imple…
maziyarpanahi Mar 16, 2025
44fb92a
Merge pull request #14444 from JohnSnowLabs/SPARKNLP-1060-Implement-P…
maziyarpanahi Mar 16, 2025
f33ce00
Merge branch 'release/600-release-candidate' into SPARKNLP-1033-Imple…
maziyarpanahi Mar 16, 2025
7b65030
Merge pull request #14450 from JohnSnowLabs/SPARKNLP-1033-Implement-L…
maziyarpanahi Mar 16, 2025
92c7e12
Merge branch 'release/600-release-candidate' into SPARKNLP-1032-CoHere
maziyarpanahi Mar 16, 2025
2c867de
Merge pull request #14457 from JohnSnowLabs/SPARKNLP-1032-CoHere
maziyarpanahi Mar 16, 2025
39ed5e7
Merge branch 'release/600-release-candidate' into SPARKNLP-1077-Imple…
maziyarpanahi Mar 16, 2025
c31306f
Merge pull request #14474 from JohnSnowLabs/SPARKNLP-1077-Implementin…
maziyarpanahi Mar 16, 2025
5417d91
updating python and scala model names (#14488)
ahmedlone127 Mar 16, 2025
b35c90c
Merge pull request #14489 from JohnSnowLabs/feature/SPARKNLP-1102-Add…
maziyarpanahi Mar 16, 2025
e7a79fb
Merge pull request #14491 from JohnSnowLabs/feature/SPARKNLP-1103-Add…
maziyarpanahi Mar 16, 2025
dd57c97
Merge pull request #14492 from JohnSnowLabs/feature/SPARKNLP-1105-Imp…
maziyarpanahi Mar 16, 2025
d7e2851
Merge branch 'release/600-release-candidate' into feature/SPARKNLP-11…
maziyarpanahi Mar 16, 2025
cbbca68
Merge pull request #14493 from JohnSnowLabs/feature/SPARKNLP-1106-Imp…
maziyarpanahi Mar 16, 2025
06ef557
Merge branch 'release/600-release-candidate' into feature/SPARKNLP-11…
maziyarpanahi Mar 16, 2025
7bd3ca0
Merge pull request #14495 from JohnSnowLabs/feature/SPARKNLP-1107-Imp…
maziyarpanahi Mar 16, 2025
c737a27
Merge branch 'release/600-release-candidate' into feature/SPARKNLP-11…
maziyarpanahi Mar 16, 2025
0420d04
Merge pull request #14497 from JohnSnowLabs/feature/SPARKNLP-1108-Imp…
maziyarpanahi Mar 16, 2025
2b363e2
Merge branch 'release/600-release-candidate' into feature/SPARKNLP-10…
maziyarpanahi Mar 16, 2025
3999409
Merge pull request #14499 from JohnSnowLabs/feature/SPARKNLP-1098-Add…
maziyarpanahi Mar 16, 2025
7673843
Merge branch 'release/600-release-candidate' into SPARKNLP-1078-Imple…
maziyarpanahi Mar 16, 2025
6283a8f
Merge pull request #14502 from JohnSnowLabs/SPARKNLP-1078-Implement-L…
maziyarpanahi Mar 16, 2025
6194f03
Merge branch 'release/600-release-candidate' into feature/SPARKNLP-10…
maziyarpanahi Mar 16, 2025
3def1a0
Merge pull request #14505 from DevinTDHa/feature/SPARKNLP-1079-AutoGG…
maziyarpanahi Mar 16, 2025
e4f1961
Merge pull request #14510 from JohnSnowLabs/Fixing-MXBAI-Embedding-no…
maziyarpanahi Mar 16, 2025
a9d7980
SPARKNLP-1109 Adding Extractor to Sparknlp (#14519)
danilojsl Mar 16, 2025
94290cc
Merge pull request #14524 from JohnSnowLabs/feature/SPARKNLP-1113-Add…
maziyarpanahi Mar 16, 2025
d807b49
Merge branch 'release/600-release-candidate' into SPARKNLP-1088-Imple…
maziyarpanahi Mar 16, 2025
6b80b40
Merge pull request #14532 from JohnSnowLabs/SPARKNLP-1088-Implement-D…
maziyarpanahi Mar 16, 2025
39e60e3
Merge pull request #14533 from DevinTDHa/bug/gguf-embeddings-context
maziyarpanahi Mar 16, 2025
0dc22eb
Adding missing bracket in SparkNLPReader and formatting some files
danilojsl Mar 17, 2025
311f988
Adding misssing return dataframe for PDF reader in Python
danilojsl Mar 18, 2025
9e7c2fc
Updating reader notebooks
danilojsl Mar 19, 2025
8ba1d4c
add janus to resourcedownloader
prabod Mar 27, 2025
3ca997b
update to use pretrained model
prabod Mar 27, 2025
7d48e2f
Bump version [run doc]
maziyarpanahi Apr 24, 2025
77809b8
Update VisionEncoderDecoder.scala (#14553)
ahmedlone127 Apr 24, 2025
57c3211
use ubuntu-latest for docs [run doc]
maziyarpanahi Apr 24, 2025
b005a7d
Merge branch 'release/600-release-candidate' of https://github.com/Jo…
maziyarpanahi Apr 24, 2025
fe66723
fixing name (#14554)
ahmedlone127 Apr 24, 2025
c3baa51
run doc [run doc]
maziyarpanahi Apr 24, 2025
518019c
update build yaml [run doc]
maziyarpanahi Apr 27, 2025
f914289
add missing sbt install [run doc]
maziyarpanahi Apr 27, 2025
308dc87
add sbt deps [run doc]
maziyarpanahi Apr 27, 2025
5c888c0
fix bad docstring [run doc]
maziyarpanahi Apr 27, 2025
17223a7
Update Scala and Python APIs
actions-user Apr 27, 2025
a95c2b6
Models hub (#14557)
maziyarpanahi Apr 28, 2025
7f160f3
release conda 6.0.0 [skip test]
maziyarpanahi Apr 28, 2025
3fce83f
Merge pull request #14534 from JohnSnowLabs/release/600-release-candi…
maziyarpanahi Apr 28, 2025
The diff you're trying to view is too large. We only load the first 3000 changed files.
2 changes: 1 addition & 1 deletion .github/workflows/create_search_index.yml
@@ -94,7 +94,7 @@ jobs:
./docs/backup-benchmarking.json \
./docs/backup-references.json
- name: Upload artifacts
-uses: actions/upload-artifact@v3
+uses: actions/upload-artifact@v4
with:
name: jekyll-build
path: |
113 changes: 65 additions & 48 deletions .github/workflows/publish_docs.yaml
@@ -3,60 +3,77 @@ name: Publish APIs
 on:
   push:
     branches:
-      - '*release*'
-      - 'release/**'
+      - "*release*"
+      - "release/**"
   pull_request:
     branches:
-      - 'main'
-      - 'master'
-      - '*release*'
-      - 'release/**'
+      - "main"
+      - "master"
+      - "*release*"
+      - "release/**"

 env:
   GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

 jobs:
   build:
     if: "contains(toJSON(github.event.commits.*.message), '[run doc]')"
-    runs-on: ubuntu-20.04
+    runs-on: ubuntu-22.04
     steps:
-      - name: checkout repo
-        uses: actions/checkout@v2
-      - name: Set up JDK 8
-        uses: actions/setup-java@v1
-        with:
-          java-version: 1.8
-      - name: Install Python 3.7
-        uses: actions/setup-python@v2
-        with:
-          python-version: 3.7.7
-          architecture: x64
-      - name: Build Scala APIs
-        run: |
-          sbt doc
-      - name: Install PyPI dependencies
-        run: |
-          python -m pip install --upgrade pip
-          cd ./python/docs && pip install -r requirements_doc.txt
-      - name: Build Python APIs
-        run: |
-          cd ./python/docs
-          make html
-      - name: Commit changes
-        id: commit
-        run: |
-          git config --local user.email "action@github.com"
-          git config --local user.name "github-actions"
-          git add --all
-          if [-z "$(git status --porcelain)"]; then
-            echo "::set-output name=push::false"
-          else
-            git commit -m "Update Scala and Python APIs" -a
-            echo "::set-output name=push::true"
-          fi
-        shell: bash
-      - name: Push changes
-        if: steps.commit.outputs.push == 'true'
-        uses: ad-m/github-push-action@master
-        with:
-          github_token: ${{ secrets.GITHUB_TOKEN }}
-          branch: ${{ github.ref }}
+      - name: Checkout repo
+        uses: actions/checkout@v2
+
+      - name: Set up JDK 8
+        uses: actions/setup-java@v1
+        with:
+          java-version: 1.8
+
+      - name: Install Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.8"
+          architecture: "x64"
+
+      - name: Install SBT
+        run: |
+          echo "deb https://repo.scala-sbt.org/scalasbt/debian all main" | sudo tee /etc/apt/sources.list.d/sbt.list
+          echo "deb https://repo.scala-sbt.org/scalasbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
+          curl -sL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0x99E82A75642AC823" | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/sbt.gpg > /dev/null
+          sudo apt-get update
+          sudo apt-get install -y sbt
+
+      - name: Build Scala APIs
+        run: sbt doc
+
+      - name: Install PyPI dependencies
+        run: |
+          python -m pip install --upgrade pip
+          cd ./python/docs && pip install -r requirements_doc.txt
+
+      - name: Build Python APIs
+        run: |
+          cd ./python/docs
+          # Run with verbose output to debug any issues
+          SPHINX_APIDOC_OPTIONS=members,undoc-members,show-inheritance sphinx-apidoc -e -f -o ./_api ../sparknlp ../sparknlp/tests
+          make html SPHINXOPTS="-v"
+
+      - name: Commit changes
+        id: commit
+        run: |
+          git config --local user.email "action@github.com"
+          git config --local user.name "github-actions"
+          git add --all
+          if [ -z "$(git status --porcelain)" ]; then
+            echo "push=false" >> $GITHUB_OUTPUT
+          else
+            git commit -m "Update Scala and Python APIs" -a
+            echo "push=true" >> $GITHUB_OUTPUT
+          fi
+        shell: bash
+
+      - name: Push changes
+        if: ${{ steps.commit.outputs.push == 'true' }}
+        uses: ad-m/github-push-action@master
+        with:
+          github_token: ${{ secrets.GITHUB_TOKEN }}
+          branch: ${{ github.ref }}
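Note on the `Commit changes` step: the new version fixes two things. The test needs spaces inside the brackets (`[ -z ... ]`), because `[-z` is parsed as a command name rather than as the `[` test command. Step outputs are now written as `name=value` lines appended to the file named by `$GITHUB_OUTPUT`, replacing the `::set-output` workflow command. A minimal sketch of that pattern, runnable outside GitHub Actions; the `record_push_flag` helper and the temporary file standing in for the runner-provided `$GITHUB_OUTPUT` are assumptions for illustration and are not part of the workflow:

```shell
set -eu

# Stand-in for the file the Actions runner normally provides via $GITHUB_OUTPUT.
GITHUB_OUTPUT="$(mktemp)"

# Hypothetical helper: takes the text printed by `git status --porcelain`
# and records whether there is anything to push.
record_push_flag() {
  # The spaces after "[" and before "]" are required.
  if [ -z "$1" ]; then
    echo "push=false" >> "$GITHUB_OUTPUT"
  else
    echo "push=true" >> "$GITHUB_OUTPUT"
  fi
}

record_push_flag ""                     # clean working tree
record_push_flag " M docs/index.html"   # one modified file

cat "$GITHUB_OUTPUT"
```

Run as written, this prints `push=false` followed by `push=true`. In the workflow, a later step reads the value as `steps.commit.outputs.push`, which is what the `Push changes` step does.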
54 changes: 54 additions & 0 deletions CHANGELOG
@@ -1,3 +1,57 @@
+=======
+6.0.0
+=======
+----------------
+New Features & Enhancements
+----------------
+* Introducing new large language models:
+  * OLMo model support (SPARKNLP-1006)
+  * Phi 3.5 Vision model support (SPARKNLP-1060)
+  * LLAVA model support (SPARKNLP-1033)
+  * CoHere model support (SPARKNLP-1032)
+  * Qwen2-VL model support (SPARKNLP-1077)
+  * Llama 3.2 Vision models (SPARKNLP-1078)
+  * Deepseek Janus model support (SPARKNLP-1088)
+  * Added LLAVA v1.5 7b quantized model
+  * Added StarCoder2 3b int8 model
+
+* New MultipleChoice Transformers:
+  * AlbertForMultipleChoice (SPARKNLP-1105)
+  * DistilBertForMultipleChoice (SPARKNLP-1106)
+  * RoBertaForMultipleChoice (SPARKNLP-1107)
+  * XlmRoBertaForMultipleChoice (SPARKNLP-1108)
+
+* New file format support:
+  * Excel files reader (SPARKNLP-1102)
+  * PowerPoint files reader (SPARKNLP-1103)
+  * PDF reader (SPARKNLP-1098)
+  * Text reader (SPARKNLP-1113)
+
+* Other improvements:
+  * AutoGGUFVisionModel for vision model support (SPARKNLP-1079)
+  * Added Extractor to SparkNLP (SPARKNLP-1109)
+  * Updated Python and Scala model names
+  * Improved error handling for AutoGGUF models
+
+----------------
+Bug Fixes
+----------------
+* Fixed typo in MXBAI notebook
+
+========
+5.5.3
+========
+----------------
+Bug Fixes & Enhancements
+----------------
+* BGEEmbeddings: The default pretrained model for BGEEmbeddings has been changed from "bge_base" to "bge_small_en_v1.5". Users relying on the old default will need to explicitly specify "bge_base" in the pretrained method.
+* Added useCLSToken parameter to allow users to choose between CLS token pooling and attention-based average pooling for sentence embeddings.
+* BGEEmbeddings: BGEEmbeddings now supports a `useCLSToken` parameter, which defaults to True. This affects the embedding calculation strategy. Existing users should verify their usage and potentially set this parameter explicitly.
+* Added HasClsTokenProperties in Scala: Introduced the HasClsTokenProperties trait in Scala providing useCLSToken parameter functionality for relevant annotators.
+* Fixing wrong padding in attention mask in `MPNet`, `BGE`, `E5`, `Mxbai`, `Nomic`, `SnowFlake`, and `UAE`. This resulted in wrong inference results in some cases and not equal to the ONNX version in transformers/sentence-transformers.
+* Various Performance Optimizations: Multiple changes across different models (Albert, Bart, CLIP, CamemBert, ConvNextClassifier, DeBerta, DistilBert, E5, Instructor, MPNet, Mxbai, Nomic, RoBerta, SnowFlake, UAE, ViTClassifier, VisionEncoderDecoder, Wav2Vec2, XlmRoBertaClassification, XlmRoberta) appear to focus on performance improvements and code cleanup, especially related to OpenVINO and ONNX inference. These may lead to faster inference times.
+* Fixing Llama3 download issue in Python.
+
========
5.5.2
========
25 changes: 17 additions & 8 deletions README.md
@@ -19,7 +19,7 @@

Spark NLP is a state-of-the-art Natural Language Processing library built on top of Apache Spark. It provides **simple**, **performant** & **accurate** NLP annotations for machine learning pipelines that **scale** easily in a distributed environment.

-Spark NLP comes with **83000+** pretrained **pipelines** and **models** in more than **200+** languages.
+Spark NLP comes with **100000+** pretrained **pipelines** and **models** in more than **200+** languages.
It also offers tasks such as **Tokenization**, **Word Segmentation**, **Part-of-Speech Tagging**, Word and Sentence **Embeddings**, **Named Entity Recognition**, **Dependency Parsing**, **Spell Checking**, **Text Classification**, **Sentiment Analysis**, **Token Classification**, **Machine Translation** (+180 languages), **Summarization**, **Question Answering**, **Table Question Answering**, **Text Generation**, **Image Classification**, **Image to Text (captioning)**, **Automatic Speech Recognition**, **Zero-Shot Learning**, and many more [NLP tasks](#features).

**Spark NLP** is the only open-source NLP library in **production** that offers state-of-the-art transformers such as **BERT**, **CamemBERT**, **ALBERT**, **ELECTRA**, **XLNet**, **DistilBERT**, **RoBERTa**, **DeBERTa**, **XLM-RoBERTa**, **Longformer**, **ELMO**, **Universal Sentence Encoder**, **Llama-2**, **M2M100**, **BART**, **Instructor**, **E5**, **Google T5**, **MarianMT**, **OpenAI GPT2**, **Vision Transformers (ViT)**, **OpenAI Whisper**, **Llama**, **Mistral**, **Phi**, **Qwen2**, and many more not only to **Python** and **R**, but also to **JVM** ecosystem (**Java**, **Scala**, and **Kotlin**) at **scale** by extending **Apache Spark** natively.
@@ -63,7 +63,7 @@ $ java -version
$ conda create -n sparknlp python=3.7 -y
$ conda activate sparknlp
# spark-nlp by default is based on pyspark 3.x
-$ pip install spark-nlp==5.5.2 pyspark==3.3.1
+$ pip install spark-nlp==6.0.0 pyspark==3.3.1
```

In Python console or Jupyter `Python3` kernel:
@@ -129,10 +129,11 @@ For a quick example of using pipelines and models take a look at our official [d

### Apache Spark Support

-Spark NLP *5.5.2* has been built on top of Apache Spark 3.4 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x
+Spark NLP *6.0.0* has been built on top of Apache Spark 3.4 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x

| Spark NLP | Apache Spark 3.5.x | Apache Spark 3.4.x | Apache Spark 3.3.x | Apache Spark 3.2.x | Apache Spark 3.1.x | Apache Spark 3.0.x | Apache Spark 2.4.x | Apache Spark 2.3.x |
|-----------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
+| 6.0.x | YES | YES | YES | YES | YES | YES | NO | NO |
| 5.5.x | YES | YES | YES | YES | YES | YES | NO | NO |
| 5.4.x | YES | YES | YES | YES | YES | YES | NO | NO |
| 5.3.x | YES | YES | YES | YES | YES | YES | NO | NO |
@@ -146,6 +147,7 @@ Find out more about `Spark NLP` versions from our [release notes](https://github

| Spark NLP | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 | Python 3.10| Scala 2.11 | Scala 2.12 |
|-----------|------------|------------|------------|------------|------------|------------|------------|
+| 6.0.x | NO | YES | YES | YES | YES | NO | YES |
| 5.5.x | NO | YES | YES | YES | YES | NO | YES |
| 5.4.x | NO | YES | YES | YES | YES | NO | YES |
| 5.3.x | NO | YES | YES | YES | YES | NO | YES |
@@ -157,7 +159,7 @@ Find out more about 4.x `SparkNLP` versions in our official [documentation](http

### Databricks Support

-Spark NLP 5.5.2 has been tested and is compatible with the following runtimes:
+Spark NLP 6.0.0 has been tested and is compatible with the following runtimes:

| **CPU** | **GPU** |
|--------------------|--------------------|
@@ -174,7 +176,7 @@ We are compatible with older runtimes. For a full list check databricks support

### EMR Support

-Spark NLP 5.5.2 has been tested and is compatible with the following EMR releases:
+Spark NLP 6.0.0 has been tested and is compatible with the following EMR releases:

| **EMR Release** |
|--------------------|
@@ -184,6 +186,13 @@ Spark NLP 5.5.2 has been tested and is compatible with the following EMR release
| emr-7.0.0 |
| emr-7.1.0 |
| emr-7.2.0 |
+| emr-7.3.0 |
+| emr-7.4.0 |
+| emr-7.5.0 |
+| emr-7.6.0 |
+| emr-7.7.0 |
+| emr-7.8.0 |
+

We are compatible with older EMR releases. For a full list check EMR support in our official [documentation](https://sparknlp.org/docs/en/install#emr-support)

@@ -205,7 +214,7 @@ deployed to Maven central. To add any of our packages as a dependency in your ap
from our official documentation.

If you are interested, there is a simple SBT project for Spark NLP to guide you on how to use it in your
-projects [Spark NLP SBT S5.5.2r](https://github.com/maziyarpanahi/spark-nlp-starter)
+projects [Spark NLP SBT S6.0.0r](https://github.com/maziyarpanahi/spark-nlp-starter)

### Python

@@ -250,7 +259,7 @@ In Spark NLP we can define S3 locations to:

Please check [these instructions](https://sparknlp.org/docs/en/install#s3-integration) from our official documentation.

-## Document5.5.2
+## Documentation

### Examples

@@ -283,7 +292,7 @@ the Spark NLP library:
keywords = {Spark, Natural language processing, Deep learning, Tensorflow, Cluster},
abstract = {Spark NLP is a Natural Language Processing (NLP) library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines that can scale easily in a distributed environment. Spark NLP comes with 1100+ pretrained pipelines and models in more than 192+ languages. It supports nearly all the NLP tasks and modules that can be used seamlessly in a cluster. Downloaded more than 2.7 million times and experiencing 9x growth since January 2020, Spark NLP is used by 54% of healthcare organizations as the world’s most widely used NLP library in the enterprise.}
}
-}5.5.2
+}
```

## Community support
5 changes: 3 additions & 2 deletions build.sbt
@@ -6,7 +6,7 @@ name := getPackageName(is_silicon, is_gpu, is_aarch64)

organization := "com.johnsnowlabs.nlp"

-version := "5.5.2"
+version := "6.0.0"

(ThisBuild / scalaVersion) := scalaVer

@@ -163,7 +163,8 @@ lazy val utilDependencies = Seq(
poiDocx
exclude ("org.apache.logging.log4j", "log4j-api"),
scratchpad
-exclude ("org.apache.logging.log4j", "log4j-api")
+exclude ("org.apache.logging.log4j", "log4j-api"),
+pdfBox
)

lazy val typedDependencyParserDependencies = Seq(junit)
4 changes: 2 additions & 2 deletions conda/meta.yaml
@@ -1,13 +1,13 @@
{% set name = "spark-nlp" %}
-{% set version = "5.5.2" %}
+{% set version = "6.0.0" %}

package:
name: {{ name|lower }}
version: {{ version }}

source:
url: https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/spark-nlp-{{ version }}.tar.gz
-sha256: b620487092256d02bf8d277374c564cd22384d437c97a4bb5b3b0f1fdfc696e8
+sha256: 58f4f530105d5c5522fc37ce4d3b63af1e2463b43e000cf69838e0854b468365

build:
noarch: python
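Note on the `sha256` change: the digest in `conda/meta.yaml` pins the source tarball named by `url`, so it has to be updated with every version bump. A minimal sketch of checking a downloaded archive against the recipe value before publishing. It assumes GNU coreutils `sha256sum` is available; the `verify_sha256` helper and the temporary stand-in file are hypothetical and not part of the repository:

```shell
set -eu

# Hypothetical helper: succeeds only if the SHA-256 digest of the file ($1)
# equals the expected hex string ($2), e.g. the value from conda/meta.yaml.
verify_sha256() {
  actual="$(sha256sum "$1" | cut -d ' ' -f 1)"
  [ "$actual" = "$2" ]
}

# Stand-in for the downloaded spark-nlp-<version>.tar.gz.
archive="$(mktemp)"
printf 'example archive contents\n' > "$archive"

# In real use this value is copied from the recipe; here it is computed
# from the stand-in file so the sketch is self-contained.
expected="$(sha256sum "$archive" | cut -d ' ' -f 1)"

if verify_sha256 "$archive" "$expected"; then
  echo "checksum OK"
else
  echo "checksum mismatch" >&2
fi
```

Comparing the whole hex string, rather than a prefix, is what catches a recipe that still carries the previous release's digest.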
2 changes: 1 addition & 1 deletion docs/_layouts/landing.html
@@ -201,7 +201,7 @@ <h3 class="grey h3_title">{{ _section.title }}</h3>
<div class="highlight-box">
{% highlight bash %}
# Using PyPI
-$ pip install spark-nlp==5.5.2
+$ pip install spark-nlp==6.0.0

# Using Anaconda/Conda
$ conda install -c johnsnowlabs spark-nlp