diff --git a/README.md b/README.md
index 418e62ea9fa7..14da5a95cec1 100644
--- a/README.md
+++ b/README.md
@@ -162,7 +162,7 @@ To use Spark NLP you need the following requirements:
 
 **GPU (optional):**
 
-Spark NLP 4.3.2 is built with TensorFlow 2.7.1 and the following NVIDIA® software are only required for GPU support:
+Spark NLP 4.4.0 is built with TensorFlow 2.7.1 and the following NVIDIA® software is only required for GPU support:
 
 - NVIDIA® GPU drivers version 450.80.02 or higher
 - CUDA® Toolkit 11.2
@@ -178,7 +178,7 @@ $ java -version
 $ conda create -n sparknlp python=3.7 -y
 $ conda activate sparknlp
 # spark-nlp by default is based on pyspark 3.x
-$ pip install spark-nlp==4.3.2 pyspark==3.3.1
+$ pip install spark-nlp==4.4.0 pyspark==3.3.1
 ```
 
 In Python console or Jupyter `Python3` kernel:
@@ -223,11 +223,12 @@ For more examples, you can visit our dedicated [examples](https://github.com/Joh
 
 ## Apache Spark Support
 
-Spark NLP *4.3.2* has been built on top of Apache Spark 3.2 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, and
+Spark NLP *4.4.0* has been built on top of Apache Spark 3.2 while fully supporting Apache Spark 3.0.x, 3.1.x, 3.2.x, and
 3.3.x:
 
 | Spark NLP | Apache Spark 2.3.x | Apache Spark 2.4.x | Apache Spark 3.0.x | Apache Spark 3.1.x | Apache Spark 3.2.x | Apache Spark 3.3.x |
 |-----------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
+| 4.4.x | NO | NO | YES | YES | YES | YES |
 | 4.3.x | NO | NO | YES | YES | YES | YES |
 | 4.2.x | NO | NO | YES | YES | YES | YES |
 | 4.1.x | NO | NO | YES | YES | YES | YES |
@@ -246,22 +247,23 @@ Find out more about `Spark NLP` versions from our [release notes](https://github
 
 ## Scala and Python Support
 
-| Spark NLP | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 | Scala 2.11 | Scala 2.12 |
-|-----------|------------|------------|------------|------------|------------|------------|
-| 4.3.x | YES | YES | YES | YES | NO | YES |
-| 4.2.x | YES | YES | YES | YES | NO | YES |
-| 4.1.x | YES | YES | YES | YES | NO | YES |
-| 4.0.x | YES | YES | YES | YES | NO | YES |
-| 3.4.x | YES | YES | YES | YES | YES | YES |
-| 3.3.x | YES | YES | YES | NO | YES | YES |
-| 3.2.x | YES | YES | YES | NO | YES | YES |
-| 3.1.x | YES | YES | YES | NO | YES | YES |
-| 3.0.x | YES | YES | YES | NO | YES | YES |
-| 2.7.x | YES | YES | NO | NO | YES | NO |
+| Spark NLP | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 | Python 3.10| Scala 2.11 | Scala 2.12 |
+|-----------|------------|------------|------------|------------|------------|------------|------------|
+| 4.4.x | NO | YES | YES | YES | YES | NO | YES |
+| 4.3.x | YES | YES | YES | YES | YES | NO | YES |
+| 4.2.x | YES | YES | YES | YES | YES | NO | YES |
+| 4.1.x | YES | YES | YES | YES | NO | NO | YES |
+| 4.0.x | YES | YES | YES | YES | NO | NO | YES |
+| 3.4.x | YES | YES | YES | YES | NO | YES | YES |
+| 3.3.x | YES | YES | YES | NO | NO | YES | YES |
+| 3.2.x | YES | YES | YES | NO | NO | YES | YES |
+| 3.1.x | YES | YES | YES | NO | NO | YES | YES |
+| 3.0.x | YES | YES | YES | NO | NO | YES | YES |
+| 2.7.x | YES | YES | NO | NO | NO | YES | NO |
 
 ## Databricks Support
 
-Spark NLP 4.3.2 has been tested and is compatible with the following runtimes:
+Spark NLP 4.4.0 has been tested and is compatible with the following runtimes:
 
 **CPU:**
 
@@ -315,7 +317,7 @@ runtimes supporting CUDA 11 are 9.x and above as listed under GPU.
## EMR Support -Spark NLP 4.3.2 has been tested and is compatible with the following EMR releases: +Spark NLP 4.4.0 has been tested and is compatible with the following EMR releases: - emr-6.2.0 - emr-6.3.0 @@ -359,11 +361,11 @@ Spark NLP supports all major releases of Apache Spark 3.0.x, Apache Spark 3.1.x, ```sh # CPU -spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2 +spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0 -pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2 +pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0 -spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2 +spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0 ``` The `spark-nlp` has been published to @@ -372,11 +374,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s ```sh # GPU -spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.3.2 +spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.4.0 -pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.3.2 +pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.4.0 -spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.3.2 +spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.4.0 ``` @@ -386,11 +388,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s ```sh # AArch64 -spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.3.2 +spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.4.0 -pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.3.2 +pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.4.0 -spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.3.2 +spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.4.0 ``` @@ -400,11 +402,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s ```sh # M1/M2 (Apple Silicon) -spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:4.3.2 +spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:4.4.0 -pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:4.3.2 +pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:4.4.0 -spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:4.3.2 +spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:4.4.0 ``` @@ -418,7 +420,7 @@ set in your SparkSession: spark-shell \ --driver-memory 16g \ --conf spark.kryoserializer.buffer.max=2000M \ - --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2 + --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0 ``` ## Scala @@ -436,7 +438,7 @@ coordinates: com.johnsnowlabs.nlp spark-nlp_2.12 - 4.3.2 + 4.4.0 ``` @@ -447,7 +449,7 @@ coordinates: com.johnsnowlabs.nlp spark-nlp-gpu_2.12 - 4.3.2 + 4.4.0 ``` @@ -458,7 +460,7 @@ coordinates: com.johnsnowlabs.nlp spark-nlp-aarch64_2.12 - 4.3.2 + 4.4.0 ``` @@ -469,7 +471,7 @@ coordinates: com.johnsnowlabs.nlp spark-nlp-silicon_2.12 - 4.3.2 + 4.4.0 ``` @@ -479,28 +481,28 @@ coordinates: ```sbtshell // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp -libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "4.3.2" +libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "4.4.0" ``` **spark-nlp-gpu:** ```sbtshell // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu -libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "4.3.2" +libraryDependencies += "com.johnsnowlabs.nlp" %% 
"spark-nlp-gpu" % "4.4.0" ``` **spark-nlp-aarch64:** ```sbtshell // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-aarch64 -libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "4.3.2" +libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "4.4.0" ``` **spark-nlp-silicon:** ```sbtshell // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-silicon -libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "4.3.2" +libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "4.4.0" ``` Maven @@ -522,7 +524,7 @@ If you installed pyspark through pip/conda, you can install `spark-nlp` through Pip: ```bash -pip install spark-nlp==4.3.2 +pip install spark-nlp==4.4.0 ``` Conda: @@ -551,7 +553,7 @@ spark = SparkSession.builder .config("spark.driver.memory", "16G") .config("spark.driver.maxResultSize", "0") .config("spark.kryoserializer.buffer.max", "2000M") - .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2") + .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0") .getOrCreate() ``` @@ -622,7 +624,7 @@ Use either one of the following options - Add the following Maven Coordinates to the interpreter's library list ```bash -com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2 +com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0 ``` - Add a path to pre-built jar from [here](#compiled-jars) in the interpreter's library list making sure the jar is @@ -633,7 +635,7 @@ com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2 Apart from the previous step, install the python module through pip ```bash -pip install spark-nlp==4.3.2 +pip install spark-nlp==4.4.0 ``` Or you can install `spark-nlp` from inside Zeppelin by using Conda: @@ -661,7 +663,7 @@ launch the Jupyter from the same Python environment: $ conda create -n sparknlp python=3.8 -y $ conda activate sparknlp # spark-nlp by default is based on pyspark 3.x -$ pip install spark-nlp==4.3.2 pyspark==3.3.1 jupyter +$ pip install spark-nlp==4.4.0 pyspark==3.3.1 jupyter $ jupyter notebook ``` @@ -678,7 +680,7 @@ export PYSPARK_PYTHON=python3 export PYSPARK_DRIVER_PYTHON=jupyter export PYSPARK_DRIVER_PYTHON_OPTS=notebook -pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2 +pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0 ``` Alternatively, you can mix in using `--jars` option for pyspark + `pip install spark-nlp` @@ -705,7 +707,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi # -s is for spark-nlp # -g will enable upgrading libcudnn8 to 8.1.0 on Google Colab for GPU usage # by default they are set to the latest -!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 4.3.2 +!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 4.4.0 ``` [Spark NLP quick start on Google Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/quick_start_google_colab.ipynb) @@ -728,7 +730,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi # -s is for spark-nlp # -g will enable upgrading libcudnn8 to 8.1.0 on Kaggle for GPU usage # by default they are set to the latest -!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 4.3.2 +!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 4.4.0 ``` [Spark NLP quick start on Kaggle Kernel](https://www.kaggle.com/mozzie/spark-nlp-named-entity-recognition) is a live @@ -747,9 +749,9 @@ demo on 
Kaggle Kernel that performs named entity recognitions by using Spark NLP 3. In `Libraries` tab inside your cluster you need to follow these steps: - 3.1. Install New -> PyPI -> `spark-nlp==4.3.2` -> Install + 3.1. Install New -> PyPI -> `spark-nlp==4.4.0` -> Install - 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2` -> Install + 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0` -> Install 4. Now you can attach your notebook to the cluster and use Spark NLP! @@ -800,7 +802,7 @@ A sample of your software configuration in JSON on S3 (must be public access): "spark.kryoserializer.buffer.max": "2000M", "spark.serializer": "org.apache.spark.serializer.KryoSerializer", "spark.driver.maxResultSize": "0", - "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2" + "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0" } }] ``` @@ -809,7 +811,7 @@ A sample of AWS CLI to launch EMR cluster: ```.sh aws emr create-cluster \ ---name "Spark NLP 4.3.2" \ +--name "Spark NLP 4.4.0" \ --release-label emr-6.2.0 \ --applications Name=Hadoop Name=Spark Name=Hive \ --instance-type m4.4xlarge \ @@ -873,7 +875,7 @@ gcloud dataproc clusters create ${CLUSTER_NAME} \ --enable-component-gateway \ --metadata 'PIP_PACKAGES=spark-nlp spark-nlp-display google-cloud-bigquery google-cloud-storage' \ --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/python/pip-install.sh \ - --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2 + --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0 ``` 2. On an existing one, you need to install spark-nlp and spark-nlp-display packages from PyPI. 
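For step 2 above, the PyPI install on an existing cluster is a one-liner per node; a minimal sketch (pinning `spark-nlp` to the version introduced in this PR is an assumption, and `spark-nlp-display` follows its own release cadence):

```bash
# Hypothetical install for an existing Dataproc cluster; assumes pip
# resolves to the same Python interpreter the Spark workers use.
pip install spark-nlp==4.4.0 spark-nlp-display
```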
@@ -912,7 +914,7 @@ spark = SparkSession.builder .config("spark.kryoserializer.buffer.max", "2000m") .config("spark.jsl.settings.pretrained.cache_folder", "sample_data/pretrained") .config("spark.jsl.settings.storage.cluster_tmp_dir", "sample_data/storage") - .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2") + .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0") .getOrCreate() ``` @@ -926,7 +928,7 @@ spark-shell \ --conf spark.kryoserializer.buffer.max=2000M \ --conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \ --conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \ - --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2 + --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0 ``` **pyspark:** @@ -939,7 +941,7 @@ pyspark \ --conf spark.kryoserializer.buffer.max=2000M \ --conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \ --conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \ - --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2 + --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0 ``` **Databricks:** @@ -1211,7 +1213,7 @@ spark = SparkSession.builder .config("spark.driver.memory", "16G") .config("spark.driver.maxResultSize", "0") .config("spark.kryoserializer.buffer.max", "2000M") - .config("spark.jars", "/tmp/spark-nlp-assembly-4.3.2.jar") + .config("spark.jars", "/tmp/spark-nlp-assembly-4.4.0.jar") .getOrCreate() ``` @@ -1220,7 +1222,7 @@ spark = SparkSession.builder version (3.0.x, 3.1.x, 3.2.x, and 3.3.x) - If you are local, you can load the Fat JAR from your local FileSystem, however, if you are in a cluster setup you need to put the Fat JAR on a distributed FileSystem such as HDFS, DBFS, S3, etc. ( - i.e., `hdfs:///tmp/spark-nlp-assembly-4.3.2.jar`) + i.e., `hdfs:///tmp/spark-nlp-assembly-4.4.0.jar`) Example of using pretrained Models and Pipelines in offline: diff --git a/build.sbt b/build.sbt index 4a62fb8c806b..3e164402d0b8 100644 --- a/build.sbt +++ b/build.sbt @@ -6,7 +6,7 @@ name := getPackageName(is_silicon, is_gpu, is_aarch64) organization := "com.johnsnowlabs.nlp" -version := "4.3.2" +version := "4.4.0" (ThisBuild / scalaVersion) := scalaVer diff --git a/docs/demos/mental_health.md b/docs/demos/mental_health.md deleted file mode 100644 index 68f2d9af04d7..000000000000 --- a/docs/demos/mental_health.md +++ /dev/null @@ -1,71 +0,0 @@ ---- -layout: demopagenew -title: Mental Health - Clinical NLP Demos & Notebooks -seotitle: 'Clinical NLP: Mental Health - John Snow Labs' -full_width: true -permalink: /mental_health -key: demo -nav_key: demo -article_header: - type: demo -license: false -mode: immersivebg -show_edit_on_github: false -show_date: false -data: - sections: - - secheader: yes - secheader: - - subtitle: Mental Health - Live Demos & Notebooks - activemenu: mental_health - source: yes - source: - - title: Identify Depression for Patient Posts - id: depression_classifier_tweets - image: - src: /assets/images/Depression_Classifier_for_Tweets.svg - excerpt: This demo shows a classifier that can classify whether tweets contain depressive text. 
- actions: - - text: Live Demo - type: normal - url: https://demo.johnsnowlabs.com/healthcare/MENTAL_HEALTH_DEPRESSION/ - - text: Colab - type: blue_btn - url: https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare/MENTAL_HEALTH.ipynb - - title: Identify Intimate Partner Violence from Patient Posts - id: classify_intimate_partner_violence_tweet - image: - src: /assets/images/Classify_Intimate_Partner_Violence_Tweet.svg - excerpt: This model involves the detection the potential IPV victims on social media platforms (in English tweets). - actions: - - text: Live Demo - type: normal - url: https://demo.johnsnowlabs.com/healthcare/PUBLIC_HEALTH_PARTNER_VIOLENCE/ - - text: Colab - type: blue_btn - url: https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare/PUBLIC_HEALTH_MB4SC.ipynb - - title: Identify Stress from Patient Posts - id: classify_stress_tweet - image: - src: /assets/images/Classify_Stress_Tweet.svg - excerpt: This model can identify stress in social media (Twitter) posts in the self-disclosure category. The model finds whether a person claims he/she is stressed or not. - actions: - - text: Live Demo - type: normal - url: https://demo.johnsnowlabs.com/healthcare/PUBLIC_HEALTH_STRESS/ - - text: Colab - type: blue_btn - url: https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare/PUBLIC_HEALTH_MB4SC.ipynb - - title: Identify the Source of Stress from Patient Posts - id: identify_source_stress_patient_posts - image: - src: /assets/images/Identify_Source_Stress_Patient.svg - excerpt: This demo shows how to classify source of emotional stress in text. - actions: - - text: Live Demo - type: normal - url: https://demo.johnsnowlabs.com/healthcare/PUBLIC_HEALTH_SOURCE_OF_STRESS/ - - text: Colab - type: blue_btn - url: https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare/PUBLIC_HEALTH_MB4SC.ipynb ---- diff --git a/docs/demos/understand_financial_entities_context.md b/docs/demos/understand_financial_entities_context.md deleted file mode 100644 index ae6704a2931f..000000000000 --- a/docs/demos/understand_financial_entities_context.md +++ /dev/null @@ -1,84 +0,0 @@ ---- -layout: demopagenew -title: Understand Entities in Context - Finance NLP Demos & Notebooks -seotitle: 'Finance NLP: Understand Entities in Context - John Snow Labs' -subtitle: Run 300+ live demos and notebooks -full_width: true -permalink: /understand_financial_entities_context -key: demo -nav_key: demo -article_header: - type: demo -license: false -mode: immersivebg -show_edit_on_github: false -show_date: false -data: - sections: - - secheader: yes - secheader: - - subtitle: Understand Entities in Context - Live Demos & Notebooks - activemenu: understand_financial_entities_context - source: yes - source: - - title: Identify Competitors in a text - id: identify_competitors_text - image: - src: /assets/images/Identify_Competitors_in_a_text.svg - excerpt: This model uses Assertion Status to identify if a PRODUCT or an ORG is mentioned to be a competitor. 
- actions: - - text: Live Demo - type: normal - url: https://demo.johnsnowlabs.com/finance/ASSERTIONDL_COMPETITORS - - text: Colab - type: blue_btn - url: - - title: Identify Past Work Experience - id: identify_past_work_experience - image: - src: /assets/images/Identify_Competitors_in_a_text.svg - excerpt: This model uses Assertion Status to identify if a mention to an Organization, Job Title or Date is about the past. - actions: - - text: Live Demo - type: normal - url: https://demo.johnsnowlabs.com/finance/ASSERTIONDL_PAST_ROLES/ - - text: Colab - type: blue_btn - url: - - title: Detect Temporality and Certainty in Financial texts - id: detect_temporality_certainty_financial_texts - image: - src: /assets/images/Detect_Temporality_and_Certainty.svg - excerpt: This demo shows how to use Assertion Status to identify if financial information is described to happen in present, past, future or it’s just possible. - actions: - - text: Live Demo - type: normal - url: https://demo.johnsnowlabs.com/finance/FINASSERTION_TEMPORALITY/ - - text: Colab - type: blue_btn - url: - - title: Financial Assertion Status (Negation) - id: financial_assertion_status_negation - image: - src: /assets/images/Financial_Assertion_Status_Negation.svg - excerpt: This is a Financial Negation model, aimed to identify if an NER entity is mentioned in the context to be negated or not. - actions: - - text: Live Demo - type: normal - url: https://demo.johnsnowlabs.com/finance/NEGATION_DETECTION_IN_FINANCIAL_TEXTS/ - - text: Colab - type: blue_btn - url: - - title: Understand Increased or Decreased Amounts and Percentages in Context - id: understand_increased_decreased_amounts_percentages_context - image: - src: /assets/images/Understand_Increased_or_Decreased_Amounts_and_Percentages_in_Context.svg - excerpt: This demo shows how to use the Assertion Status model to identify if a mentioned amount or percentage is stated to be increased or decreased in context. - actions: - - text: Live Demo - type: normal - url: https://demo.johnsnowlabs.com/finance/FINASSERTION_INCREASE_DECREASE/ - - text: Colab - type: blue_btn - url: ---- \ No newline at end of file diff --git a/docs/demos/understand_legal_entities_context.md b/docs/demos/understand_legal_entities_context.md deleted file mode 100644 index 1553748a4207..000000000000 --- a/docs/demos/understand_legal_entities_context.md +++ /dev/null @@ -1,48 +0,0 @@ ---- -layout: demopagenew -title: Understand Entities in Context - Spark NLP Demos & Notebooks -seotitle: 'Spark NLP: Understand Entities in Context - John Snow Labs' -subtitle: Run 300+ live demos and notebooks -full_width: true -permalink: /understand_legal_entities_context -key: demo -nav_key: demo -article_header: - type: demo -license: false -mode: immersivebg -show_edit_on_github: false -show_date: false -data: - sections: - - secheader: yes - secheader: - - subtitle: Understand Entities in Context - Live Demos & Notebooks - activemenu: understand_legal_entities_context - source: yes - source: - - title: Detect Temporality and Certainty in Legal texts - id: detect_temporality_certainty_legal_texts - image: - src: /assets/images/Detect_Temporality.svg - excerpt: This demo shows how to use Assertion Status to identify if legal information is described to happen in the present, past, future or if it’s just possible. 
- actions: - - text: Live Demo - type: normal - url: https://demo.johnsnowlabs.com/legal/LEGASSERTION_TEMPORALITY/ - - text: Colab - type: blue_btn - url: - - title: Legal Assertion Status (Negation) - id: legal_assertion_status_negation - image: - src: /assets/images/Legal_Assertion_Status_Negation.svg - excerpt: This is a Legal Negation model, aimed to identify if an NER entity is mentioned in the context to be negated or not. - actions: - - text: Live Demo - type: normal - url: https://demo.johnsnowlabs.com/finance/NEGATION_DETECTION_IN_FINANCIAL_TEXTS/ - - text: Colab - type: blue_btn - url: ---- diff --git a/docs/en/concepts.md b/docs/en/concepts.md index e8eb559c62f4..35a12232444c 100644 --- a/docs/en/concepts.md +++ b/docs/en/concepts.md @@ -62,7 +62,7 @@ $ java -version $ conda create -n sparknlp python=3.7 -y $ conda activate sparknlp # spark-nlp by default is based on pyspark 3.x -$ pip install spark-nlp==4.3.2 pyspark==3.3.1 jupyter +$ pip install spark-nlp==4.4.0 pyspark==3.3.1 jupyter $ jupyter notebook ``` diff --git a/docs/en/examples.md b/docs/en/examples.md index 881729ac9b69..3923b36c66bc 100644 --- a/docs/en/examples.md +++ b/docs/en/examples.md @@ -16,7 +16,7 @@ $ java -version # should be Java 8 (Oracle or OpenJDK) $ conda create -n sparknlp python=3.7 -y $ conda activate sparknlp -$ pip install spark-nlp==4.3.2 pyspark==3.3.1 +$ pip install spark-nlp==4.4.0 pyspark==3.3.1 ``` ## Google Colab Notebook @@ -36,7 +36,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi # -p is for pyspark # -s is for spark-nlp # by default they are set to the latest -!bash colab.sh -p 3.2.3 -s 4.3.2 +!bash colab.sh -p 3.2.3 -s 4.4.0 ``` [Spark NLP quick start on Google Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/quick_start_google_colab.ipynb) is a live demo on Google Colab that performs named entity recognitions and sentiment analysis by using Spark NLP pretrained pipelines. 
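For readers who want to try the same pretrained pipelines outside Colab, a minimal local sketch (the `explain_document_dl` pipeline name is taken from the quick-start notebooks; treat it as one example among the English pipelines on the Models Hub):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start a Spark session preconfigured for Spark NLP.
spark = sparknlp.start()

# Download and cache a pretrained pipeline, then annotate a sentence.
# The result is a dict keyed by output column, e.g. "entities".
pipeline = PretrainedPipeline("explain_document_dl", lang="en")
annotations = pipeline.annotate("John Snow Labs released Spark NLP 4.4.0.")
print(annotations["entities"])
```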
diff --git a/docs/en/hardware_acceleration.md b/docs/en/hardware_acceleration.md
index 3bdf36b0faae..4a93b374e2b5 100644
--- a/docs/en/hardware_acceleration.md
+++ b/docs/en/hardware_acceleration.md
@@ -49,7 +49,7 @@ Since the new Transformer models such as BERT for Word and Sentence embeddings a
 | DeBERTa Large | +477%(5.8x) |
 | Longformer Base | +52%(1.5x) |
 
-Spark NLP 4.3.2 is built with TensorFlow 2.7.1 and the following NVIDIA® software are only required for GPU support:
+Spark NLP 4.4.0 is built with TensorFlow 2.7.1 and the following NVIDIA® software is only required for GPU support:
 
 - NVIDIA® GPU drivers version 450.80.02 or higher
 - CUDA® Toolkit 11.2
diff --git a/docs/en/install.md b/docs/en/install.md
index 629db45c8e73..4de942cecae4 100644
--- a/docs/en/install.md
+++ b/docs/en/install.md
@@ -15,22 +15,22 @@ sidebar:
 ```bash
 # Install Spark NLP from PyPI
-pip install spark-nlp==4.3.2
+pip install spark-nlp==4.4.0
 
 # Install Spark NLP from Anaconda/Conda
 conda install -c johnsnowlabs spark-nlp
 
 # Load Spark NLP with Spark Shell
-spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2
+spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0
 
 # Load Spark NLP with PySpark
-pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2
+pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0
 
 # Load Spark NLP with Spark Submit
-spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2
+spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0
 
 # Load Spark NLP as external JAR after compiling and building Spark NLP by `sbt assembly`
-spark-shell --jars spark-nlp-assembly-4.3.2.jar
+spark-shell --jars spark-nlp-assembly-4.4.0.jar
 ```
 
 ## Python
@@ -49,7 +49,7 @@ $ java -version
 # should be Java 8 (Oracle or OpenJDK)
 $ conda create -n sparknlp python=3.8 -y
 $ conda activate sparknlp
-$ pip install spark-nlp==4.3.2 pyspark==3.3.1
+$ pip install spark-nlp==4.4.0 pyspark==3.3.1
 ```
 
 Of course you will need to have jupyter installed in your system:
@@ -76,7 +76,7 @@ spark = SparkSession.builder \
     .config("spark.driver.memory","16G")\
     .config("spark.driver.maxResultSize", "0") \
     .config("spark.kryoserializer.buffer.max", "2000M")\
-    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2")\
+    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0")\
     .getOrCreate()
 ```
 
@@ -91,7 +91,7 @@ spark = SparkSession.builder \
 <dependency>
     <groupId>com.johnsnowlabs.nlp</groupId>
     <artifactId>spark-nlp_2.12</artifactId>
-    <version>4.3.2</version>
+    <version>4.4.0</version>
 </dependency>
 ```
 
@@ -102,7 +102,7 @@ spark = SparkSession.builder \
 <dependency>
     <groupId>com.johnsnowlabs.nlp</groupId>
     <artifactId>spark-nlp-gpu_2.12</artifactId>
-    <version>4.3.2</version>
+    <version>4.4.0</version>
 </dependency>
 ```
 
@@ -113,7 +113,7 @@ spark = SparkSession.builder \
 <dependency>
     <groupId>com.johnsnowlabs.nlp</groupId>
     <artifactId>spark-nlp-m1_2.12</artifactId>
-    <version>4.3.2</version>
+    <version>4.4.0</version>
 </dependency>
 ```
 
@@ -124,7 +124,7 @@ spark = SparkSession.builder \
 <dependency>
     <groupId>com.johnsnowlabs.nlp</groupId>
     <artifactId>spark-nlp-aarch64_2.12</artifactId>
-    <version>4.3.2</version>
+    <version>4.4.0</version>
 </dependency>
 ```
 
@@ -134,28 +134,28 @@ spark = SparkSession.builder \
 ```scala
 // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "4.3.2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "4.4.0"
 ```
 
 **spark-nlp-gpu:**
 
 ```scala
 // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "4.3.2"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "4.4.0"
 ```
 
 **spark-nlp-m1:**
 
 ```scala
 // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-m1
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-m1" % "4.3.2"
+libraryDependencies +=
"com.johnsnowlabs.nlp" %% "spark-nlp-m1" % "4.4.0" ``` **spark-nlp-aarch64:** ```scala // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-aarch64 -libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "4.3.2" +libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "4.4.0" ``` Maven Central: [https://mvnrepository.com/artifact/com.johnsnowlabs.nlp](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp) @@ -233,7 +233,7 @@ maven coordinates like these: com.johnsnowlabs.nlp spark-nlp-m1_2.12 - 4.3.2 + 4.4.0 ``` @@ -241,7 +241,7 @@ or in case of sbt: ```scala // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp -libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-m1" % "4.3.2" +libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-m1" % "4.4.0" ``` If everything went well, you can now start Spark NLP with the `m1` flag set to `true`: @@ -274,7 +274,7 @@ spark = sparknlp.start(m1=True) ## Installation for Linux Aarch64 Systems -Starting from version 4.3.2, Spark NLP supports Linux systems running on an aarch64 +Starting from version 4.4.0, Spark NLP supports Linux systems running on an aarch64 processor architecture. The necessary dependencies have been built on Ubuntu 16.04, so a recent system with an environment of at least that will be needed. @@ -318,7 +318,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi # -p is for pyspark # -s is for spark-nlp # by default they are set to the latest -!wget http://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 4.3.2 +!wget http://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 4.4.0 ``` [Spark NLP quick start on Google Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/jupyter/quick_start_google_colab.ipynb) is a live demo on Google Colab that performs named entity recognitions and sentiment analysis by using Spark NLP pretrained pipelines. @@ -337,7 +337,7 @@ Run the following code in Kaggle Kernel and start using spark-nlp right away. ## Databricks Support -Spark NLP 4.3.2 has been tested and is compatible with the following runtimes: +Spark NLP 4.4.0 has been tested and is compatible with the following runtimes: **CPU:** @@ -403,7 +403,7 @@ NOTE: Spark NLP 4.0.x is based on TensorFlow 2.7.x which is compatible with CUDA 3.1. Install New -> PyPI -> `spark-nlp` -> Install - 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2` -> Install + 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0` -> Install 4. Now you can attach your notebook to the cluster and use Spark NLP! @@ -419,7 +419,7 @@ Note: You can import these notebooks by using their URLs. 
## EMR Support -Spark NLP 4.3.2 has been tested and is compatible with the following EMR releases: +Spark NLP 4.4.0 has been tested and is compatible with the following EMR releases: - emr-6.2.0 - emr-6.3.0 @@ -477,7 +477,7 @@ A sample of your software configuration in JSON on S3 (must be public access): "spark.kryoserializer.buffer.max": "2000M", "spark.serializer": "org.apache.spark.serializer.KryoSerializer", "spark.driver.maxResultSize": "0", - "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2" + "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0" } } ] @@ -487,7 +487,7 @@ A sample of AWS CLI to launch EMR cluster: ```sh aws emr create-cluster \ ---name "Spark NLP 4.3.2" \ +--name "Spark NLP 4.4.0" \ --release-label emr-6.2.0 \ --applications Name=Hadoop Name=Spark Name=Hive \ --instance-type m4.4xlarge \ @@ -741,7 +741,7 @@ We recommend using `conda` to manage your Python environment on Windows. Now you can use the downloaded binary by navigating to `%SPARK_HOME%\bin` and running -Either create a conda env for python 3.6, install *pyspark==3.3.1 spark-nlp numpy* and use Jupyter/python console, or in the same conda env you can go to spark bin for *pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2*. +Either create a conda env for python 3.6, install *pyspark==3.3.1 spark-nlp numpy* and use Jupyter/python console, or in the same conda env you can go to spark bin for *pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0*. @@ -767,12 +767,12 @@ spark = SparkSession.builder \ .config("spark.driver.memory","16G")\ .config("spark.driver.maxResultSize", "0") \ .config("spark.kryoserializer.buffer.max", "2000M")\ - .config("spark.jars", "/tmp/spark-nlp-assembly-4.3.2.jar")\ + .config("spark.jars", "/tmp/spark-nlp-assembly-4.4.0.jar")\ .getOrCreate() ``` - You can download provided Fat JARs from each [release notes](https://github.com/JohnSnowLabs/spark-nlp/releases), please pay attention to pick the one that suits your environment depending on the device (CPU/GPU) and Apache Spark version (3.x) -- If you are local, you can load the Fat JAR from your local FileSystem, however, if you are in a cluster setup you need to put the Fat JAR on a distributed FileSystem such as HDFS, DBFS, S3, etc. (i.e., `hdfs:///tmp/spark-nlp-assembly-4.3.2.jar`) +- If you are local, you can load the Fat JAR from your local FileSystem, however, if you are in a cluster setup you need to put the Fat JAR on a distributed FileSystem such as HDFS, DBFS, S3, etc. (i.e., `hdfs:///tmp/spark-nlp-assembly-4.4.0.jar`) Example of using pretrained Models and Pipelines in offline: diff --git a/docs/en/spark_nlp.md b/docs/en/spark_nlp.md index e60d4f75ea13..b308fc05b843 100644 --- a/docs/en/spark_nlp.md +++ b/docs/en/spark_nlp.md @@ -25,7 +25,7 @@ Spark NLP is built on top of **Apache Spark 3.x**. 
For using Spark NLP you need: **GPU (optional):** -Spark NLP 4.3.2 is built with TensorFlow 2.7.1 and the following NVIDIA® software are only required for GPU support: +Spark NLP 4.4.0 is built with TensorFlow 2.7.1 and the following NVIDIA® software are only required for GPU support: - NVIDIA® GPU drivers version 450.80.02 or higher - CUDA® Toolkit 11.2 diff --git a/examples/docker/README.md b/examples/docker/README.md index a94cd58239d5..758a10266956 100644 --- a/examples/docker/README.md +++ b/examples/docker/README.md @@ -73,7 +73,7 @@ docker run -it --name sparknlp-container \ --conf "spark.serializer"="org.apache.spark.serializer.KryoSerializer" \ --conf "spark.kryoserializer.buffer.max"="2000M" \ --conf "spark.driver.maxResultSize"="0" \ - --packages "com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2" + --packages "com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0" ``` To run the shell with GPU support, we use the image from [Jupyter Notebook with GPU @@ -91,5 +91,5 @@ docker run -it --name sparknlp-container \ --conf "spark.serializer"="org.apache.spark.serializer.KryoSerializer" \ --conf "spark.kryoserializer.buffer.max"="2000M" \ --conf "spark.driver.maxResultSize"="0" \ - --packages "com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.3.2" + --packages "com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.4.0" ``` diff --git a/examples/util/Training_Helpers.ipynb b/examples/util/Training_Helpers.ipynb index d2eb474ed15e..03c9bc06d0a4 100644 --- a/examples/util/Training_Helpers.ipynb +++ b/examples/util/Training_Helpers.ipynb @@ -129,7 +129,7 @@ "id": "Jn3axMFZTaxV" }, "source": [ - "Starting at spark-nlp 4.3.2, you can also set an S3 URI. To configure this, it is necessary to set up the Spark session with the appropriate settings for both Spark NLP and Spark ML." + "Starting at spark-nlp 4.4.0, you can also set an S3 URI. To configure this, it is necessary to set up the Spark session with the appropriate settings for both Spark NLP and Spark ML." 
] }, { @@ -246,7 +246,7 @@ " .config(\"spark.jsl.settings.aws.credentials.secret_access_key\", MY_SECRET_KEY) \\\n", " .config(\"spark.jsl.settings.aws.credentials.session_token\", MY_SESSION_KEY) \\\n", " .config(\"spark.jsl.settings.aws.region\", \"us-east-1\") \\\n", - " .config(\"spark.jars.packages\", \"com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2,org.apache.hadoop:hadoop-aws:3.3.1,com.amazonaws:aws-java-sdk:1.11.901\") \\\n", + " .config(\"spark.jars.packages\", \"com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0,org.apache.hadoop:hadoop-aws:3.3.1,com.amazonaws:aws-java-sdk:1.11.901\") \\\n", " .config(\"spark.hadoop.fs.s3a.impl\", \"org.apache.hadoop.fs.s3a.S3AFileSystem\") \\\n", " .config(\"spark.hadoop.fs.s3a.path.style.access\", \"true\") \\\n", " .getOrCreate()" diff --git a/python/README.md b/python/README.md index 418e62ea9fa7..14da5a95cec1 100644 --- a/python/README.md +++ b/python/README.md @@ -162,7 +162,7 @@ To use Spark NLP you need the following requirements: **GPU (optional):** -Spark NLP 4.3.2 is built with TensorFlow 2.7.1 and the following NVIDIA® software are only required for GPU support: +Spark NLP 4.4.0 is built with TensorFlow 2.7.1 and the following NVIDIA® software are only required for GPU support: - NVIDIA® GPU drivers version 450.80.02 or higher - CUDA® Toolkit 11.2 @@ -178,7 +178,7 @@ $ java -version $ conda create -n sparknlp python=3.7 -y $ conda activate sparknlp # spark-nlp by default is based on pyspark 3.x -$ pip install spark-nlp==4.3.2 pyspark==3.3.1 +$ pip install spark-nlp==4.4.0 pyspark==3.3.1 ``` In Python console or Jupyter `Python3` kernel: @@ -223,11 +223,12 @@ For more examples, you can visit our dedicated [examples](https://github.com/Joh ## Apache Spark Support -Spark NLP *4.3.2* has been built on top of Apache Spark 3.2 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, and +Spark NLP *4.4.0* has been built on top of Apache Spark 3.2 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, and 3.3.x: | Spark NLP | Apache Spark 2.3.x | Apache Spark 2.4.x | Apache Spark 3.0.x | Apache Spark 3.1.x | Apache Spark 3.2.x | Apache Spark 3.3.x | |-----------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------| +| 4.4.x | NO | NO | YES | YES | YES | YES | | 4.3.x | NO | NO | YES | YES | YES | YES | | 4.2.x | NO | NO | YES | YES | YES | YES | | 4.1.x | NO | NO | YES | YES | YES | YES | @@ -246,22 +247,23 @@ Find out more about `Spark NLP` versions from our [release notes](https://github ## Scala and Python Support -| Spark NLP | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 | Scala 2.11 | Scala 2.12 | -|-----------|------------|------------|------------|------------|------------|------------| -| 4.3.x | YES | YES | YES | YES | NO | YES | -| 4.2.x | YES | YES | YES | YES | NO | YES | -| 4.1.x | YES | YES | YES | YES | NO | YES | -| 4.0.x | YES | YES | YES | YES | NO | YES | -| 3.4.x | YES | YES | YES | YES | YES | YES | -| 3.3.x | YES | YES | YES | NO | YES | YES | -| 3.2.x | YES | YES | YES | NO | YES | YES | -| 3.1.x | YES | YES | YES | NO | YES | YES | -| 3.0.x | YES | YES | YES | NO | YES | YES | -| 2.7.x | YES | YES | NO | NO | YES | NO | +| Spark NLP | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 | Python 3.10| Scala 2.11 | Scala 2.12 | +|-----------|------------|------------|------------|------------|------------|------------|------------| +| 4.4.x | NO | YES | YES | YES | YES | NO | YES | +| 4.3.x | YES | YES | YES | YES | YES | NO | YES | +| 4.2.x | YES | YES | YES 
| YES | YES | NO | YES | +| 4.1.x | YES | YES | YES | YES | NO | NO | YES | +| 4.0.x | YES | YES | YES | YES | NO | NO | YES | +| 3.4.x | YES | YES | YES | YES | NO | YES | YES | +| 3.3.x | YES | YES | YES | NO | NO | YES | YES | +| 3.2.x | YES | YES | YES | NO | NO | YES | YES | +| 3.1.x | YES | YES | YES | NO | NO | YES | YES | +| 3.0.x | YES | YES | YES | NO | NO | YES | YES | +| 2.7.x | YES | YES | NO | NO | NO | YES | NO | ## Databricks Support -Spark NLP 4.3.2 has been tested and is compatible with the following runtimes: +Spark NLP 4.4.0 has been tested and is compatible with the following runtimes: **CPU:** @@ -315,7 +317,7 @@ runtimes supporting CUDA 11 are 9.x and above as listed under GPU. ## EMR Support -Spark NLP 4.3.2 has been tested and is compatible with the following EMR releases: +Spark NLP 4.4.0 has been tested and is compatible with the following EMR releases: - emr-6.2.0 - emr-6.3.0 @@ -359,11 +361,11 @@ Spark NLP supports all major releases of Apache Spark 3.0.x, Apache Spark 3.1.x, ```sh # CPU -spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2 +spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0 -pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2 +pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0 -spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2 +spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0 ``` The `spark-nlp` has been published to @@ -372,11 +374,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s ```sh # GPU -spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.3.2 +spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.4.0 -pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.3.2 +pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.4.0 -spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.3.2 +spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.4.0 ``` @@ -386,11 +388,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s ```sh # AArch64 -spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.3.2 +spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.4.0 -pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.3.2 +pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.4.0 -spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.3.2 +spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:4.4.0 ``` @@ -400,11 +402,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s ```sh # M1/M2 (Apple Silicon) -spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:4.3.2 +spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:4.4.0 -pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:4.3.2 +pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:4.4.0 -spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:4.3.2 +spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:4.4.0 ``` @@ -418,7 +420,7 @@ set in your SparkSession: spark-shell \ --driver-memory 16g \ --conf spark.kryoserializer.buffer.max=2000M \ - --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2 + --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0 ``` ## Scala @@ -436,7 +438,7 @@ coordinates: com.johnsnowlabs.nlp spark-nlp_2.12 - 4.3.2 + 4.4.0 ``` @@ -447,7 +449,7 @@ coordinates: com.johnsnowlabs.nlp spark-nlp-gpu_2.12 - 4.3.2 
+ 4.4.0 ``` @@ -458,7 +460,7 @@ coordinates: com.johnsnowlabs.nlp spark-nlp-aarch64_2.12 - 4.3.2 + 4.4.0 ``` @@ -469,7 +471,7 @@ coordinates: com.johnsnowlabs.nlp spark-nlp-silicon_2.12 - 4.3.2 + 4.4.0 ``` @@ -479,28 +481,28 @@ coordinates: ```sbtshell // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp -libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "4.3.2" +libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "4.4.0" ``` **spark-nlp-gpu:** ```sbtshell // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu -libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "4.3.2" +libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "4.4.0" ``` **spark-nlp-aarch64:** ```sbtshell // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-aarch64 -libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "4.3.2" +libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "4.4.0" ``` **spark-nlp-silicon:** ```sbtshell // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-silicon -libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "4.3.2" +libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "4.4.0" ``` Maven @@ -522,7 +524,7 @@ If you installed pyspark through pip/conda, you can install `spark-nlp` through Pip: ```bash -pip install spark-nlp==4.3.2 +pip install spark-nlp==4.4.0 ``` Conda: @@ -551,7 +553,7 @@ spark = SparkSession.builder .config("spark.driver.memory", "16G") .config("spark.driver.maxResultSize", "0") .config("spark.kryoserializer.buffer.max", "2000M") - .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2") + .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0") .getOrCreate() ``` @@ -622,7 +624,7 @@ Use either one of the following options - Add the following Maven Coordinates to the interpreter's library list ```bash -com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2 +com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0 ``` - Add a path to pre-built jar from [here](#compiled-jars) in the interpreter's library list making sure the jar is @@ -633,7 +635,7 @@ com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2 Apart from the previous step, install the python module through pip ```bash -pip install spark-nlp==4.3.2 +pip install spark-nlp==4.4.0 ``` Or you can install `spark-nlp` from inside Zeppelin by using Conda: @@ -661,7 +663,7 @@ launch the Jupyter from the same Python environment: $ conda create -n sparknlp python=3.8 -y $ conda activate sparknlp # spark-nlp by default is based on pyspark 3.x -$ pip install spark-nlp==4.3.2 pyspark==3.3.1 jupyter +$ pip install spark-nlp==4.4.0 pyspark==3.3.1 jupyter $ jupyter notebook ``` @@ -678,7 +680,7 @@ export PYSPARK_PYTHON=python3 export PYSPARK_DRIVER_PYTHON=jupyter export PYSPARK_DRIVER_PYTHON_OPTS=notebook -pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2 +pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0 ``` Alternatively, you can mix in using `--jars` option for pyspark + `pip install spark-nlp` @@ -705,7 +707,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi # -s is for spark-nlp # -g will enable upgrading libcudnn8 to 8.1.0 on Google Colab for GPU usage # by default they are set to the latest -!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 4.3.2 +!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 4.4.0 ``` [Spark NLP quick start 
on Google Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/quick_start_google_colab.ipynb) @@ -728,7 +730,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi # -s is for spark-nlp # -g will enable upgrading libcudnn8 to 8.1.0 on Kaggle for GPU usage # by default they are set to the latest -!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 4.3.2 +!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 4.4.0 ``` [Spark NLP quick start on Kaggle Kernel](https://www.kaggle.com/mozzie/spark-nlp-named-entity-recognition) is a live @@ -747,9 +749,9 @@ demo on Kaggle Kernel that performs named entity recognitions by using Spark NLP 3. In `Libraries` tab inside your cluster you need to follow these steps: - 3.1. Install New -> PyPI -> `spark-nlp==4.3.2` -> Install + 3.1. Install New -> PyPI -> `spark-nlp==4.4.0` -> Install - 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2` -> Install + 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0` -> Install 4. Now you can attach your notebook to the cluster and use Spark NLP! @@ -800,7 +802,7 @@ A sample of your software configuration in JSON on S3 (must be public access): "spark.kryoserializer.buffer.max": "2000M", "spark.serializer": "org.apache.spark.serializer.KryoSerializer", "spark.driver.maxResultSize": "0", - "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2" + "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0" } }] ``` @@ -809,7 +811,7 @@ A sample of AWS CLI to launch EMR cluster: ```.sh aws emr create-cluster \ ---name "Spark NLP 4.3.2" \ +--name "Spark NLP 4.4.0" \ --release-label emr-6.2.0 \ --applications Name=Hadoop Name=Spark Name=Hive \ --instance-type m4.4xlarge \ @@ -873,7 +875,7 @@ gcloud dataproc clusters create ${CLUSTER_NAME} \ --enable-component-gateway \ --metadata 'PIP_PACKAGES=spark-nlp spark-nlp-display google-cloud-bigquery google-cloud-storage' \ --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/python/pip-install.sh \ - --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2 + --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0 ``` 2. On an existing one, you need to install spark-nlp and spark-nlp-display packages from PyPI. 
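Once those packages are installed, jobs submitted to that existing cluster still need the matching Spark NLP jar on the classpath; a hedged sketch using `gcloud` (the job file, cluster, and region names are placeholders):

```bash
# Hypothetical job submission; --properties mirrors the spark.jars.packages
# setting used in the cluster-creation example above.
gcloud dataproc jobs submit pyspark my_pipeline.py \
  --cluster=${CLUSTER_NAME} \
  --region=${REGION} \
  --properties=spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0
```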
@@ -912,7 +914,7 @@ spark = SparkSession.builder .config("spark.kryoserializer.buffer.max", "2000m") .config("spark.jsl.settings.pretrained.cache_folder", "sample_data/pretrained") .config("spark.jsl.settings.storage.cluster_tmp_dir", "sample_data/storage") - .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2") + .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0") .getOrCreate() ``` @@ -926,7 +928,7 @@ spark-shell \ --conf spark.kryoserializer.buffer.max=2000M \ --conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \ --conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \ - --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2 + --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0 ``` **pyspark:** @@ -939,7 +941,7 @@ pyspark \ --conf spark.kryoserializer.buffer.max=2000M \ --conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \ --conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \ - --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.3.2 + --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0 ``` **Databricks:** @@ -1211,7 +1213,7 @@ spark = SparkSession.builder .config("spark.driver.memory", "16G") .config("spark.driver.maxResultSize", "0") .config("spark.kryoserializer.buffer.max", "2000M") - .config("spark.jars", "/tmp/spark-nlp-assembly-4.3.2.jar") + .config("spark.jars", "/tmp/spark-nlp-assembly-4.4.0.jar") .getOrCreate() ``` @@ -1220,7 +1222,7 @@ spark = SparkSession.builder version (3.0.x, 3.1.x, 3.2.x, and 3.3.x) - If you are local, you can load the Fat JAR from your local FileSystem, however, if you are in a cluster setup you need to put the Fat JAR on a distributed FileSystem such as HDFS, DBFS, S3, etc. ( - i.e., `hdfs:///tmp/spark-nlp-assembly-4.3.2.jar`) + i.e., `hdfs:///tmp/spark-nlp-assembly-4.4.0.jar`) Example of using pretrained Models and Pipelines in offline: diff --git a/python/docs/conf.py b/python/docs/conf.py index 028615dbe872..06ebf3cdb6a8 100644 --- a/python/docs/conf.py +++ b/python/docs/conf.py @@ -23,7 +23,7 @@ author = "John Snow Labs" # The full version, including alpha/beta/rc tags -release = "4.3.2" +release = "4.4.0" pyspark_version = "3.2.3" # -- General configuration --------------------------------------------------- diff --git a/python/setup.py b/python/setup.py index 31dca66a73cf..b0bb5695f537 100644 --- a/python/setup.py +++ b/python/setup.py @@ -41,7 +41,7 @@ # project code, see # https://packaging.python.org/en/latest/single_source_version.html - version='4.3.2', # Required + version='4.4.0', # Required # This is a one-line description or tagline of what your project does. This # corresponds to the 'Summary' metadata field: diff --git a/python/sparknlp/__init__.py b/python/sparknlp/__init__.py index ede23d2fd859..562f6b8eb9c5 100644 --- a/python/sparknlp/__init__.py +++ b/python/sparknlp/__init__.py @@ -128,7 +128,7 @@ def start(gpu=False, The initiated Spark session. """ - current_version = "4.3.2" + current_version = "4.4.0" if params is None: params = {} @@ -298,4 +298,4 @@ def version(): str The current Spark NLP version. 
""" - return '4.3.2' + return '4.4.0' diff --git a/scripts/colab_setup.sh b/scripts/colab_setup.sh index a23017c2df5e..bd91295c8dd4 100644 --- a/scripts/colab_setup.sh +++ b/scripts/colab_setup.sh @@ -1,7 +1,7 @@ #!/bin/bash #default values for pyspark, spark-nlp, and SPARK_HOME -SPARKNLP="4.3.2" +SPARKNLP="4.4.0" PYSPARK="3.2.3" while getopts s:p:g option diff --git a/scripts/kaggle_setup.sh b/scripts/kaggle_setup.sh index 492365c31d2a..33b7fb3c7cad 100644 --- a/scripts/kaggle_setup.sh +++ b/scripts/kaggle_setup.sh @@ -1,7 +1,7 @@ #!/bin/bash #default values for pyspark, spark-nlp, and SPARK_HOME -SPARKNLP="4.3.2" +SPARKNLP="4.4.0" PYSPARK="3.2.3" while getopts s:p:g option diff --git a/scripts/sagemaker_setup.sh b/scripts/sagemaker_setup.sh index fba7d9f80a6a..29828875db87 100644 --- a/scripts/sagemaker_setup.sh +++ b/scripts/sagemaker_setup.sh @@ -1,7 +1,7 @@ #!/bin/bash # Default values for pyspark, spark-nlp, and SPARK_HOME -SPARKNLP="4.3.2" +SPARKNLP="4.4.0" PYSPARK="3.2.3" echo "Setup SageMaker for PySpark $PYSPARK and Spark NLP $SPARKNLP" diff --git a/src/main/scala/com/johnsnowlabs/nlp/SparkNLP.scala b/src/main/scala/com/johnsnowlabs/nlp/SparkNLP.scala index 98f4b6e6d9e9..608a5003b87c 100644 --- a/src/main/scala/com/johnsnowlabs/nlp/SparkNLP.scala +++ b/src/main/scala/com/johnsnowlabs/nlp/SparkNLP.scala @@ -20,7 +20,7 @@ import org.apache.spark.sql.SparkSession object SparkNLP { - val currentVersion = "4.3.2" + val currentVersion = "4.4.0" val MavenSpark3 = s"com.johnsnowlabs.nlp:spark-nlp_2.12:$currentVersion" val MavenGpuSpark3 = s"com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:$currentVersion" val MavenSparkSilicon = s"com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:$currentVersion" diff --git a/src/main/scala/com/johnsnowlabs/util/Build.scala b/src/main/scala/com/johnsnowlabs/util/Build.scala index aa56594894b4..23922a8c8bb0 100644 --- a/src/main/scala/com/johnsnowlabs/util/Build.scala +++ b/src/main/scala/com/johnsnowlabs/util/Build.scala @@ -17,5 +17,5 @@ package com.johnsnowlabs.util object Build { - val version: String = "4.3.2" + val version: String = "4.4.0" }