JohnSnowLabs · KshitizGIT · Jul 8, 2024 · Jul 8, 2024
diff --git a/docs/_data/navigation.yml b/docs/_data/navigation.yml
@@ -229,7 +229,7 @@ sparknlp-healthcare:
         url: /licensed/api/python
       - title: Wiki
         url: /docs/en/wiki
-      - title: Speed Benchmarks
+      - title: Benchmarks
         url: /docs/en/benchmark      
       - title: Best Practices Using Pretrained Models Together
         url: /docs/en/best_practices_pretrained_models

diff --git a/docs/_includes/docs-sparckocr-pagination.html b/docs/_includes/docs-sparckocr-pagination.html
@@ -1,3 +1,14 @@
+<ul class="pagination">
+  <li>
+      <a href="#">Version <strong id="previosver"></strong></a>
+  </li>
+  <li>
+      <strong>Version <strong id="currversion"></strong></strong>
+  </li>
+  <li>
+      <a href="#">Version <strong id="nextver"></strong></a>
+  </li>
+</ul>
 <ul class="pagination owl-carousel pagination_big">
    <li><a href="release_notes_5_3_1">5.3.1</a></li>
     <li><a href="release_notes_5_3_0">5.3.0</a></li>

diff --git a/docs/_includes/scripts/article.js b/docs/_includes/scripts/article.js
@@ -30,8 +30,6 @@ $(document).ready(function () {
       $('.pagination_big').owlCarousel({
         margin:10,
         nav:true,
-        center: true,
-        loop: true,
         dots:false,
         responsive:{
             0:{

diff --git a/docs/en/legal_release_notes.md b/docs/en/legal_release_notes.md
@@ -15,7 +15,7 @@ sidebar:
 
 ## Releases log
 
-
+{:.table-model-big}
 | 	                                                                                                            |                                                                                                                                                      |                                                                                                             |                                                                                                        |
 |--------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------|
 | [1.0.0](https://medium.com/spark-nlp/spark-nlp-for-legal-1-0-0-over-300-new-state-of-the-art-models-in-multiple-languages-f3bae55c32e1)                         | [1.1.0](https://medium.com/@muhendisbp/legal-nlp-1-1-0-for-spark-nlp-has-been-released-89de7f099bdc)                                                 | [1.2.0](https://medium.com/spark-nlp/legal-nlp-1-2-0-for-spark-nlp-has-been-released-%EF%B8%8F-8d060b3391ef) | [1.3.0](https://gaddesaishailesh.medium.com/spark-nlp-for-legal-1-3-0-over-100-new-state-of-the-art-models-%EF%B8%8F-b069207ce77f) |

diff --git a/docs/en/licensed_version_compatibility.md b/docs/en/licensed_version_compatibility.md
@@ -13,7 +13,7 @@ sidebar:
 
 <div class="h3-box" markdown="1">
 
-
+{:.table-model-big}
 | Spark NLP for Healthcare	| Spark NLP (Public) |
 |---------------------------|--------------------|
 | 5.3.3                     | 5.3.2              |
@@ -95,7 +95,7 @@ sidebar:
 | 2.3.4   			        | 2.3.4              |
 
 
-
+{:.table-model-big}
 | Spark NLP for Healthcare	| Spark OCR          |
 |---------------------------|--------------------|
 | 4.3.0                     | 4.3.1              |

diff --git a/docs/en/ocr_benchmark.md b/docs/en/ocr_benchmark.md
@@ -33,7 +33,7 @@ sidebar:
 
 #### Benchmark Table
 
-{:.table-model-big.db}
+{:.table-model-big}
 | Instance      | memory | cores | input\_data\_pages| partition     | second per page | timing  |
 | ------------- | ------ | ----- | ----------------- | ------------- | --------------- | ------- |
 | m5n.4xlarge   | 64 GB  | 16    | 1000              | 10            | 0.24            | 4 mins  |

diff --git a/docs/en/ocr_structures.md b/docs/en/ocr_structures.md
@@ -367,6 +367,7 @@ Show single image with metadata in Jupyter notebook.
 
 {:.table-model-big}
 | Param name | Type | Default | Description |
+|------------|------|---------|-------------|
 | width | string | "600" | width of image |
 | show_meta | boolean | true | enable/disable displaying metadata of image |
 

diff --git a/docs/en/spark_nlp_healthcare_versions/release_notes_2_7_1.md b/docs/en/spark_nlp_healthcare_versions/release_notes_2_7_1.md
@@ -79,6 +79,7 @@ Output:
 
 {:.table-model-big}
 |    | chunks                      |   begin |   end |      code | resolutions
+|----|-----------------------------|---------|-------|-----------|------------|
 |  2 | COPD  				 |     113 |   116 |  13645005 | copd - chronic obstructive pulmonary disease
 |  8 | PTCA                        |     324 |   327 | 373108000 | post percutaneous transluminal coronary angioplasty (finding)
 | 16 | close monitoring            |     519 |   534 | 417014005 | on examination - vigilance

diff --git a/docs/en/spark_nlp_healthcare_versions/release_notes_4_4_0.md b/docs/en/spark_nlp_healthcare_versions/release_notes_4_4_0.md
@@ -88,7 +88,7 @@ Our clinical summarizer models with only 250M parameters perform 30-35% better t
 
 🔎 Benchmark on MtSamples Summarization Dataset
 
-{:.table-model-big}
+{:.table-model-big.db}
 | model_name | model_size | Rouge | Bleu | bertscore_precision | bertscore_recall: | bertscore_f1 |
 |--|--|--|--|--|--|--|
 philschmid/flan-t5-base-samsum | 250M | 0.1919 | 0.1124 | 0.8409 | 0.8964 | 0.8678 |
@@ -100,7 +100,7 @@ transformersbook/pegasus-samsum | 500M | 0.1924 | 0.0965 | 0.8920 | 0.8149 | 0.8
 
 🔎 Benchmark on MIMIC Summarization Dataset
 
-{:.table-model-big}
+{:.table-model-big.db}
 | model_name | model_size | Rouge | Bleu | bertscore_precision | bertscore_recall: | bertscore_f1 |
 |--|--|--|--|--|--|--|
 philschmid/flan-t5-base-samsum | 250M | 0.1910 | 0.1037 | 0.8708 | 0.9056 | 0.8879 |
@@ -110,7 +110,7 @@ transformersbook/pegasus-samsum | 570M | 0.1425 | 0.0582 | 0.9171 | 0.8682 | 0.8
 **summarizer_clinical_jsl** | **250M** | **0.395** | **0.2962** | **0.895** | **0.9316** | **0.913** |
 **summarizer_clinical_jsl_augmented** | **250M** | **0.3964** | **0.307** | **0.9109** | **0.9452** | **0.9227** |
 
-![image](https://user-images.githubusercontent.com/64752006/230899745-3a67d142-1bdf-4f4b-83cb-d012953b1e09.png)
+![Benchmark on MIMIC Summarization Dataset](https://user-images.githubusercontent.com/64752006/230899745-3a67d142-1bdf-4f4b-83cb-d012953b1e09.png)
 
 </div><div class="h3-box" markdown="1">
 

diff --git a/docs/en/spark_ocr_versions/ocr_release_notes.md b/docs/en/spark_ocr_versions/ocr_release_notes.md
@@ -39,6 +39,7 @@ Release date: 11-04-2024
 ## Improved table extraction capabilities in HocrToTextTable
 Many issues related to column detection in our Table Extraction pipelines are addressed in this release, compared to previous Visual NLP version the metrics have improved. Table below shows F1-score(CAR or Cell Adjacency Relationship) performances on ICDAR 19 Track B dataset for different IoU values of our two versions in comparison with [other results](https://paperswithcode.com/paper/multi-type-td-tsr-extracting-tables-from/review/).
 
+{:.table-model-big}
 | Model  | 0.6 | 0.7 | 0.8 | 0.9 |
 | ------------- | ------------- |------------- |------------- |------------- |
 | CascadeTabNet	  | 0.438  | 0.354 | 0.19 | 0.036 |
@@ -88,7 +89,7 @@ ocr = ImageToTextV2.pretrained("ocr_base_printed_v2_opt", "en", "clinical/ocr")
     .setIncludeConfidence(True)
 ```
 
-![image](/assets/images/ocr/confidence_score.png)
+![Confidence scores in ImageToTextV2](/assets/images/ocr/confidence_score.png)
 
 Check this [updated notebook](https://github.com/JohnSnowLabs/spark-ocr-workshop/blob/master/jupyter/TextRecognition/SparkOcrImageToTextV2.ipynb) for an end-to-end example.
 
@@ -131,9 +132,6 @@ ImageDrawRegions is the annotator used for rendering regions into images so we c
 ### Bug Fixes
 + PdfToImage resetting page information when used in the same pipeline as PdfToText: When the sequence {PdfToText, PdfToImage} was used the original pages computed at PdfToText where resetted to zero by PdfToImage.
 
-
-
-
 </div><div class="prev_ver h3-box" markdown="1">
 
 ## Previous versions

diff --git a/docs/en/spark_ocr_versions/release_notes_1_10_0.md b/docs/en/spark_ocr_versions/release_notes_1_10_0.md
@@ -21,6 +21,8 @@ Release date: 20-01-2021
 
 Support Microsoft Docx documents.
 
+</div><div class="h3-box" markdown="1">
+
 #### New Features
 
 * Added [DocToText](/docs/en/ocr_pipeline_components#doctotext) transformer for extract text
@@ -30,6 +32,8 @@ table data from DOCX documents.
 * Added [DocToPdf](/docs/en/ocr_pipeline_components#doctopdf) transformer for convert DOCX
  documents to PDF format.
 
+</div><div class="h3-box" markdown="1">
+
 #### Bugfixes
 
 * Fixed issue with loading model data on some cluster configurations

diff --git a/docs/en/spark_ocr_versions/release_notes_1_11_0.md b/docs/en/spark_ocr_versions/release_notes_1_11_0.md
@@ -22,22 +22,30 @@ Release date: 25-02-2021
 Support German, French, Spanish and Russian languages.
 Improving [PositionsFinder](/docs/en/ocr_pipeline_components#positionsfinder) and ImageToText for better support de-identification.
 
+</div><div class="h3-box" markdown="1">
+
 #### New Features
 
 * Loading model data from S3 in [ImageToText](/docs/en/ocr_pipeline_components#imagetotext).
 * Added support German, French, Spanish, Russian languages in [ImageToText](/docs/en/ocr_pipeline_components#imagetotext).
 * Added different OCR model types: Base, Best, Fast in [ImageToText](/docs/en/ocr_pipeline_components#imagetotext).
 
+</div><div class="h3-box" markdown="1">
+
 #### Enhancements
 
 * Added spaces symbols to the output positions in the [ImageToText](/docs/en/ocr_pipeline_components#imagetotext) transformer.
 * Eliminate python-levensthein from dependencies for simplify installation.
 
+</div><div class="h3-box" markdown="1">
+
 #### Bugfixes
 
 * Fixed issue with extracting coordinates in  in [ImageToText](/docs/en/ocr_pipeline_components#imagetotext).
 * Fixed loading model data on cluster in yarn mode.
 
+</div><div class="h3-box" markdown="1">
+
 #### New notebooks
 
 * [Languages Support](https://github.com/JohnSnowLabs/spark-ocr-workshop/blob/1.11.0/jupyter/SparkOcrLanguagesSupport.ipynb)

diff --git a/docs/en/spark_ocr_versions/release_notes_1_2_0.md b/docs/en/spark_ocr_versions/release_notes_1_2_0.md
@@ -22,13 +22,17 @@ Release date: 08-04-2020
 
 Improved support Databricks and processing selectable pdfs.
 
+</div><div class="h3-box" markdown="1">
+
 #### Enhancements
 
 * Adapted Spark OCR for run on Databricks.
 * Added rewriting positions in [ImageToText](/docs/en/ocr_pipeline_components#imagetotext) when run together with PdfToText.
 * Added 'positionsCol' param to [ImageToText](/docs/en/ocr_pipeline_components#imagetotext).
 * Improved support Spark NLP. Changed [start](/docs/en/ocr_install#using-start-function) function.
 
+</div><div class="h3-box" markdown="1">
+
 #### New Features
 
 * Added [showImage](/docs/en/ocr_structures#showimages) implicit to Dataframe for display images in Scala Databricks notebooks.

diff --git a/docs/en/spark_ocr_versions/release_notes_1_3_0.md b/docs/en/spark_ocr_versions/release_notes_1_3_0.md
@@ -21,12 +21,16 @@ Release date: 22-05-2020
 
 New functionality for de-identification problem.
 
+</div><div class="h3-box" markdown="1">
+
 #### Enhancements
 
 * Renamed TesseractOCR to ImageToText. 
 * Simplified installation.
 * Added check license from `SPARK_NLP_LICENSE` env varibale.
 
+</div><div class="h3-box" markdown="1">
+
 #### New Features
 
 * Support storing for binaryFormat. Added support storing Image and PDF files.

diff --git a/docs/en/spark_ocr_versions/release_notes_1_4_0.md b/docs/en/spark_ocr_versions/release_notes_1_4_0.md
@@ -21,19 +21,25 @@ Release date: 23-06-2020
 
 Added support Dicom format and improved support image morphological operations.
 
+</div><div class="h3-box" markdown="1">
+
 #### Enhancements
 
 * Updated [start](/docs/en/ocr_install#using-start-function) function. Improved support Spark NLP internal.
 * `ImageMorphologyOpening` and `ImageErosion` are removed.
 * Improved existing transformers for support de-identification Dicom documents.
 * Added possibility to draw filled rectangles to [ImageDrawRegions](/docs/en/ocr_pipeline_components#imagedrawregions).
 
+</div><div class="h3-box" markdown="1">
+
 #### New Features
 
 * Support reading and writing Dicom documents.
 * Added [ImageMorphologyOperation](/docs/en/ocr_pipeline_components#imagemorphologyoperation) transformer which support:
  erosion, dilation, opening and closing operations.
-
+
+</div><div class="h3-box" markdown="1">
+
 #### Bugfixes
 
 * Fixed issue in [ImageToText](/docs/en/ocr_pipeline_components#imagetotext) related to extraction coordinates.

diff --git a/docs/en/spark_ocr_versions/release_notes_1_6_0.md b/docs/en/spark_ocr_versions/release_notes_1_6_0.md
@@ -21,6 +21,7 @@ Release date: 05-09-2020
 
 Support parsing data from tables for selectable PDFs.
 
+</div><div class="h3-box" markdown="1">
 
 #### New Features
 

diff --git a/docs/en/spark_ocr_versions/release_notes_1_8_0.md b/docs/en/spark_ocr_versions/release_notes_1_8_0.md
@@ -22,6 +22,8 @@ Release date: 20-11-2020
 Optimisation performance for processing multipage PDF documents.
 Support up to 10k pages per document.
 
+</div><div class="h3-box" markdown="1">
+
 #### New Features
 
 * Added [ImageAdaptiveBinarizer](/docs/en/ocr_pipeline_components#imageadaptivebinarizer) Scala transformer with support:
@@ -30,6 +32,7 @@ Support up to 10k pages per document.
     - Sauvola local thresholding
 * Added possibility to split pdf to small documents for optimize processing in [PdfToImage](/docs/en/ocr_pipeline_components#pdftoimage).
 
+</div><div class="h3-box" markdown="1">
 
 #### Enhancements
 

diff --git a/docs/en/spark_ocr_versions/release_notes_1_9_0.md b/docs/en/spark_ocr_versions/release_notes_1_9_0.md
@@ -21,6 +21,8 @@ Release date: 11-12-2020
 
 Extension of  FoundationOne report parser and support HOCR output format.
 
+</div><div class="h3-box" markdown="1">
+
 #### New Features
 
 * Added [ImageToHocr](/docs/en/ocr_pipeline_components#imagetohocr) transformer for recognize text from image and store it to HOCR format.

diff --git a/docs/en/spark_ocr_versions/release_notes_3_0_0.md b/docs/en/spark_ocr_versions/release_notes_3_0_0.md
@@ -25,6 +25,8 @@ Spark OCR 3.0.0 extends the support for Apache Spark 3.0.x and 3.1.x major relea
 
 Spark OCR started to support Tensorflow models. First model is [VisualDocumentClassifier](/docs/en/ocr_pipeline_components#visualdocumentclassifier).
 
+</div><div class="h3-box" markdown="1">
+
 #### New Features
 
 * Support for Apache Spark and PySpark 3.0.x on Scala 2.12
@@ -47,6 +49,8 @@ Spark OCR started to support Tensorflow models. First model is [VisualDocumentCl
 * [VisualDocumentClassifier](/docs/en/ocr_pipeline_components#visualdocumentclassifier) model for classification documents using text and layout data.
 * Added support Vietnamese language.
 
+</div><div class="h3-box" markdown="1">
+
 #### New notebooks
 
 * [Visual Document Classifier](https://github.com/JohnSnowLabs/spark-ocr-workshop/blob/master/jupyter/SparkOCRVisualDocumentClassifier.ipynb)

diff --git a/docs/en/spark_ocr_versions/release_notes_3_10_0.md b/docs/en/spark_ocr_versions/release_notes_3_10_0.md
@@ -22,6 +22,7 @@ Release date: 10-01-2022
 
 Form recognition using LayoutLMv2 and text detection.
 
+</div><div class="h3-box" markdown="1">
 
 #### New Features
 
@@ -30,12 +31,14 @@ Form recognition using LayoutLMv2 and text detection.
 * Support rotated regions in [ImageSplitRegions](/docs/en/ocr_pipeline_components#imagesplitregions)
 * Support rotated regions in [ImageDrawRegions](/docs/en/ocr_pipeline_components#imagedrawregions)
 
+</div><div class="h3-box" markdown="1">
 
 #### New Models
 
 * LayoutLMv2 fine-tuned on FUNSD dataset
 * Text detection model based on CRAFT architecture
 
+</div><div class="h3-box" markdown="1">
 
 #### New notebooks
 

diff --git a/docs/en/spark_ocr_versions/release_notes_3_11_0.md b/docs/en/spark_ocr_versions/release_notes_3_11_0.md
@@ -20,9 +20,11 @@ Release date: 28-02-2022
 
 #### Overview
 
-We are glad to announce that Spark OCR 3.11.0 has been released!.
+We are glad to announce that Spark OCR 3.11.0 has been released!
 This release comes with new models, new features, bug fixes, and notebook examples.
 
+</div><div class="h3-box" markdown="1">
+
 #### New Features
 
 * Added [ImageTextDetectorV2](/docs/en/ocr_object_detection#imagetextdetectorv2) Python Spark-OCR Transformer for detecting printed and handwritten text
@@ -32,11 +34,15 @@ This release comes with new models, new features, bug fixes, and notebook exampl
 * Added [FormRelationExtractor](/docs/en/ocr_visual_document_understanding#formrelationextractor) for detecting relations between key and value entities in forms.
 * Added the capability of fine tuning VisualDocumentNerV2 models for key-value pairs extraction.
 
+</div><div class="h3-box" markdown="1">
+
 #### New Models
 
 * ImageTextDetectorV2: this extends the ImageTextDetectorV1 character level text detection model with a refiner net architecture.
 * ImageTextRecognizerV2: Text recognition for printed text based on the Deep Learning Transformer Architecture.
 
+</div><div class="h3-box" markdown="1">
+
 #### New notebooks
 
 * [SparkOcrImageToTextV2](https://github.com/JohnSnowLabs/spark-ocr-workshop/blob/3110-release-candidate/jupyter/TextRecognition/SparkOcrImageToTextV2.ipynb)

diff --git a/docs/en/spark_ocr_versions/release_notes_3_12_0.md b/docs/en/spark_ocr_versions/release_notes_3_12_0.md
@@ -21,6 +21,8 @@ Release date: 14-04-2022
 We're glad to announce that Spark OCR 3.12.0 has been released!
 This release comes with new models for Handwritten Text Recognition, Spark 3.2 support, bug fixes, and notebook examples.
 
+</div><div class="h3-box" markdown="1">
+
 #### New Features
 
 * Added to the ImageTextDetectorV2:
@@ -57,16 +59,22 @@ spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", False)
 
 * Improved documentation on the website.
 
+</div><div class="h3-box" markdown="1">
+
 #### New Models
 
 ocr_small_printed: Text recognition small model for printed text based on ImageToTextV2
 ocr_small_handwritten: Text recognition small model for handwritten text based on ImageToTextV2
 ocr_base_handwritten: Text recognition base model for handwritten text based on ImageToTextV2
 
+</div><div class="h3-box" markdown="1">
+
 #### Bug Fixes
 
 * display_table() function failing to display tables coming from digital PDFs.
 
+</div><div class="h3-box" markdown="1">
+
 #### New notebooks
 
 * [SparkOcrImageToTextV2OutputFormats.ipynb](https://github.com/JohnSnowLabs/spark-ocr-workshop/blob/3120-release-candidate/jupyter/TextRecognition/SparkOcrImageToTextV2OutputFormats.ipynb), different output formats for ImageToTextV2.