update fixes (#208)

JohnSnowLabs · May 11, 2023 · d82d881
1 parent 44de430
commit d82d881

Showing 24 changed files with 80 additions and 77 deletions.
diff --git a/docs/_includes/demomenu.html b/docs/_includes/demomenu.html
@@ -89,6 +89,9 @@
           <li {% if _section.secheader %} {% for child in _section.secheader %} {% if child.activemenu == "risk_factors" %} class="active"  {% endif %} {% endfor %} {% endif %} >
             <a href="/risk_factors">Risk Factors</a>
           </li>
+          <li>
+            <a target="_blank" href="https://demo.johnsnowlabs.com/healthcare/MODELS/">Explore Healthcare NLP Models</a>
+          </li>
         </ul> 
         <ul class="acc-top"><li class="acc-top-item">Voice of Patients<i class="fas fa-chevron-down"></i></li></ul>
         <ul class="acc-body">          

diff --git a/docs/_includes/head.html b/docs/_includes/head.html
@@ -11,7 +11,7 @@
 <!-- End Google Tag Manager -->
 <!-- Google tag (gtag.js) Additional adding by Andrew -->
 
-<script async src=https://www.googletagmanager.com/gtag/js?id=UA-70312582-1></script>
+<!-- <script async src=https://www.googletagmanager.com/gtag/js?id=UA-70312582-1></script>
 
 <script>
 
@@ -25,7 +25,7 @@
 
   gtag('config', 'UA-70312582-1');
 
-</script>
+</script> -->
 
 {%- include snippets/get-article-pagetitle.html article=page -%}
 {%- assign _article_pagetitle = __return -%}

diff --git a/docs/en/spark_ocr_versions/release_notes_1_11_0.md b/docs/en/spark_ocr_versions/release_notes_1_11_0.md
@@ -20,22 +20,22 @@ Release date: 25-02-2021
 #### Overview
 
 Support German, French, Spanish and Russian languages.
-Improving [PositionsFinder](ocr_pipeline_components#positionsfinder) and ImageToText for better support de-identification.
+Improving [PositionsFinder](/docs/en/ocr_pipeline_components#positionsfinder) and ImageToText for better support de-identification.
 
 #### New Features
 
-* Loading model data from S3 in [ImageToText](ocr_pipeline_components#imagetotext).
-* Added support German, French, Spanish, Russian languages in [ImageToText](ocr_pipeline_components#imagetotext).
-* Added different OCR model types: Base, Best, Fast in [ImageToText](ocr_pipeline_components#imagetotext).
+* Loading model data from S3 in [ImageToText](/docs/en/ocr_pipeline_components#imagetotext).
+* Added support German, French, Spanish, Russian languages in [ImageToText](/docs/en/ocr_pipeline_components#imagetotext).
+* Added different OCR model types: Base, Best, Fast in [ImageToText](/docs/en/ocr_pipeline_components#imagetotext).
 
 #### Enhancements
 
-* Added spaces symbols to the output positions in the [ImageToText](ocr_pipeline_components#imagetotext) transformer.
+* Added spaces symbols to the output positions in the [ImageToText](/docs/en/ocr_pipeline_components#imagetotext) transformer.
 * Eliminate python-levensthein from dependencies for simplify installation.
 
 #### Bugfixes
 
-* Fixed issue with extracting coordinates in  in [ImageToText](ocr_pipeline_components#imagetotext).
+* Fixed issue with extracting coordinates in  in [ImageToText](/docs/en/ocr_pipeline_components#imagetotext).
 * Fixed loading model data on cluster in yarn mode.
 
 #### New notebooks

diff --git a/docs/en/spark_ocr_versions/release_notes_1_2_0.md b/docs/en/spark_ocr_versions/release_notes_1_2_0.md
@@ -25,15 +25,15 @@ Improved support Databricks and processing selectable pdfs.
 #### Enhancements
 
 * Adapted Spark OCR for run on Databricks.
-* Added rewriting positions in [ImageToText](../ocr_pipeline_components#imagetotext) when run together with PdfToText.
-* Added 'positionsCol' param to [ImageToText](../ocr_pipeline_components#imagetotext).
-* Improved support Spark NLP. Changed [start](../ocr_install#using-start-function) function.
+* Added rewriting positions in [ImageToText](/docs/en/ocr_pipeline_components#imagetotext) when run together with PdfToText.
+* Added 'positionsCol' param to [ImageToText](/docs/en/ocr_pipeline_components#imagetotext).
+* Improved support Spark NLP. Changed [start](/docs/en/ocr_install#using-start-function) function.
 
 #### New Features
 
-* Added [showImage](../ocr_structures#showimages) implicit to Dataframe for display images in Scala Databricks notebooks.
-* Added [display_images](../ocr_structures#display_images) function for display images in Python Databricks notebooks.
-* Added propagation selectable pdf file in [TextToPdf](../ocr_pipeline_components#texttopdf). Added 'inputContent' param to 'TextToPdf'.
+* Added [showImage](/docs/en/ocr_structures#showimages) implicit to Dataframe for display images in Scala Databricks notebooks.
+* Added [display_images](/docs/en/ocr_structures#display_images) function for display images in Python Databricks notebooks.
+* Added propagation selectable pdf file in [TextToPdf](/docs/en/ocr_pipeline_components#texttopdf). Added 'inputContent' param to 'TextToPdf'.
 
 
 </div><div class="prev_ver h3-box" markdown="1">

diff --git a/docs/en/spark_ocr_versions/release_notes_1_3_0.md b/docs/en/spark_ocr_versions/release_notes_1_3_0.md
@@ -30,8 +30,8 @@ New functionality for de-identification problem.
 #### New Features
 
 * Support storing for binaryFormat. Added support storing Image and PDF files.
-* Support selectable pdf for [TextToPdf](ocr_pipeline_components#texttopdf) transformer.
-* Added [UpdateTextPosition](ocr_pipeline_components#updatetextposition) transformer.
+* Support selectable pdf for [TextToPdf](/docs/en/ocr_pipeline_components#texttopdf) transformer.
+* Added [UpdateTextPosition](/docs/en/ocr_pipeline_components#updatetextposition) transformer.
 
 
 </div><div class="prev_ver h3-box" markdown="1">

diff --git a/docs/en/spark_ocr_versions/release_notes_1_4_0.md b/docs/en/spark_ocr_versions/release_notes_1_4_0.md
@@ -23,20 +23,20 @@ Added support Dicom format and improved support image morphological operations.
 
 #### Enhancements
 
-* Updated [start](ocr_install#using-start-function) function. Improved support Spark NLP internal.
+* Updated [start](/docs/en/ocr_install#using-start-function) function. Improved support Spark NLP internal.
 * `ImageMorphologyOpening` and `ImageErosion` are removed.
 * Improved existing transformers for support de-identification Dicom documents.
-* Added possibility to draw filled rectangles to [ImageDrawRegions](ocr_pipeline_components#imagedrawregions).
+* Added possibility to draw filled rectangles to [ImageDrawRegions](/docs/en/ocr_pipeline_components#imagedrawregions).
 
 #### New Features
 
 * Support reading and writing Dicom documents.
-* Added [ImageMorphologyOperation](ocr_pipeline_components#imagemorphologyoperation) transformer which support:
+* Added [ImageMorphologyOperation](/docs/en/ocr_pipeline_components#imagemorphologyoperation) transformer which support:
  erosion, dilation, opening and closing operations.
 
 #### Bugfixes
 
-* Fixed issue in [ImageToText](ocr_pipeline_components#imagetotext) related to extraction coordinates.
+* Fixed issue in [ImageToText](/docs/en/ocr_pipeline_components#imagetotext) related to extraction coordinates.
 
 
 </div><div class="prev_ver h3-box" markdown="1">

diff --git a/docs/en/spark_ocr_versions/release_notes_1_5_0.md b/docs/en/spark_ocr_versions/release_notes_1_5_0.md
@@ -28,7 +28,7 @@ FoundationOne report parsing support.
 
 #### New Features
 
-* Added [FoundationOneReportParser](ocr_pipeline_components#foundationonereportparser) which support parsing patient info,
+* Added [FoundationOneReportParser](/docs/en/ocr_pipeline_components#foundationonereportparser) which support parsing patient info,
 genomic and biomarker findings.
 
 

diff --git a/docs/en/spark_ocr_versions/release_notes_1_6_0.md b/docs/en/spark_ocr_versions/release_notes_1_6_0.md
@@ -24,9 +24,9 @@ Support parsing data from tables for selectable PDFs.
 
 #### New Features
 
-* Added [PdfToTextTable](ocr_pipeline_components#pdftotexttable) transformer for extract tables from Pdf document per each page.
-* Added [ImageCropper](ocr_pipeline_components#imagecropper) transformer for crop images.
-* Added [ImageBrandsToText](ocr_pipeline_components#imagebrandstotext) transformer for detect text in defined areas.
+* Added [PdfToTextTable](/docs/en/ocr_pipeline_components#pdftotexttable) transformer for extract tables from Pdf document per each page.
+* Added [ImageCropper](/docs/en/ocr_pipeline_components#imagecropper) transformer for crop images.
+* Added [ImageBrandsToText](/docs/en/ocr_pipeline_components#imagebrandstotext) transformer for detect text in defined areas.
 
 
 </div><div class="prev_ver h3-box" markdown="1">

diff --git a/docs/en/spark_ocr_versions/release_notes_1_8_0.md b/docs/en/spark_ocr_versions/release_notes_1_8_0.md
@@ -24,22 +24,22 @@ Support up to 10k pages per document.
 
 #### New Features
 
-* Added [ImageAdaptiveBinarizer](ocr_pipeline_components#imageadaptivebinarizer) Scala transformer with support:
+* Added [ImageAdaptiveBinarizer](/docs/en/ocr_pipeline_components#imageadaptivebinarizer) Scala transformer with support:
     - Gaussian local thresholding
     - Otsu thresholding
     - Sauvola local thresholding
-* Added possibility to split pdf to small documents for optimize processing in [PdfToImage](ocr_pipeline_components#pdftoimage).
+* Added possibility to split pdf to small documents for optimize processing in [PdfToImage](/docs/en/ocr_pipeline_components#pdftoimage).
 
 
 #### Enhancements
 
-* Added applying binarization in [PdfToImage](ocr_pipeline_components#pdftoimage) for optimize memory usage.
-* Added `pdfCoordinates` param to the [ImageToText](ocr_pipeline_components#imagetotext) transformer.
-* Added 'total_pages' field to the [PdfToImage](ocr_pipeline_components#pdftoimage) transformer.
-* Added different splitting strategies to the [PdfToImage](ocr_pipeline_components#pdftoimage) transformer.
-* Simplified paging [PdfToImage](ocr_pipeline_components#pdftoimage) when run it with splitting to small PDF.
-* Added params to the [PdfToText](ocr_pipeline_components#pdftotext) for disable extra functionality.
-* Added `master_url` param to the python [start](ocr_install#using-start-function) function.
+* Added applying binarization in [PdfToImage](/docs/en/ocr_pipeline_components#pdftoimage) for optimize memory usage.
+* Added `pdfCoordinates` param to the [ImageToText](/docs/en/ocr_pipeline_components#imagetotext) transformer.
+* Added 'total_pages' field to the [PdfToImage](/docs/en/ocr_pipeline_components#pdftoimage) transformer.
+* Added different splitting strategies to the [PdfToImage](/docs/en/ocr_pipeline_components#pdftoimage) transformer.
+* Simplified paging [PdfToImage](/docs/en/ocr_pipeline_components#pdftoimage) when run it with splitting to small PDF.
+* Added params to the [PdfToText](/docs/en/ocr_pipeline_components#pdftotext) for disable extra functionality.
+* Added `master_url` param to the python [start](/docs/en/ocr_install#using-start-function) function.
 
 
 </div><div class="prev_ver h3-box" markdown="1">

diff --git a/docs/en/spark_ocr_versions/release_notes_1_9_0.md b/docs/en/spark_ocr_versions/release_notes_1_9_0.md
@@ -23,8 +23,8 @@ Extension of  FoundationOne report parser and support HOCR output format.
 
 #### New Features
 
-* Added [ImageToHocr](ocr_pipeline_components#imagetohocr) transformer for recognize text from image and store it to HOCR format.
-* Added parsing gene lists from 'Appendix' in [FoundationOneReportParser](ocr_pipeline_components#foundationonereportparser) transformer.
+* Added [ImageToHocr](/docs/en/ocr_pipeline_components#imagetohocr) transformer for recognize text from image and store it to HOCR format.
+* Added parsing gene lists from 'Appendix' in [FoundationOneReportParser](/docs/en/ocr_pipeline_components#foundationonereportparser) transformer.
 
 
 </div><div class="prev_ver h3-box" markdown="1">

diff --git a/docs/en/spark_ocr_versions/release_notes_3_0_0.md b/docs/en/spark_ocr_versions/release_notes_3_0_0.md
@@ -23,7 +23,7 @@ We are very excited to release Spark OCR 3.0.0!
 
 Spark OCR 3.0.0 extends the support for Apache Spark 3.0.x and 3.1.x major releases on Scala 2.12 with both Hadoop 2.7. and 3.2. We will support all 4 major Apache Spark and PySpark releases of 2.3.x, 2.4.x, 3.0.x, and 3.1.x.
 
-Spark OCR started to support Tensorflow models. First model is [VisualDocumentClassifier](ocr_pipeline_components#visualdocumentclassifier).
+Spark OCR started to support Tensorflow models. First model is [VisualDocumentClassifier](/docs/en/ocr_pipeline_components#visualdocumentclassifier).
 
 #### New Features
 
@@ -44,7 +44,7 @@ Spark OCR started to support Tensorflow models. First model is [VisualDocumentCl
 * Support 2x new EMR 6.x: 
   * EMR 6.1.0 (Apache Spark 3.0.0 / Hadoop 3.2.1)
   * EMR 6.2.0 (Apache Spark 3.0.1 / Hadoop 3.2.1)
-* [VisualDocumentClassifier](ocr_pipeline_components#visualdocumentclassifier) model for classification documents using text and layout data.
+* [VisualDocumentClassifier](/docs/en/ocr_pipeline_components#visualdocumentclassifier) model for classification documents using text and layout data.
 * Added support Vietnamese language.
 
 #### New notebooks

diff --git a/docs/en/spark_ocr_versions/release_notes_3_10_0.md b/docs/en/spark_ocr_versions/release_notes_3_10_0.md
@@ -25,10 +25,10 @@ Form recognition using LayoutLMv2 and text detection.
 
 #### New Features
 
-* Added [VisualDocumentNERv2](ocr_visual_document_understanding#visualdocumentnerv2) transformer
-* Added DL based [ImageTextDetector](ocr_object_detection#imagetextdetector) transformer
-* Support rotated regions in [ImageSplitRegions](ocr_pipeline_components#imagesplitregions)
-* Support rotated regions in [ImageDrawRegions](ocr_pipeline_components#imagedrawregions)
+* Added [VisualDocumentNERv2](/docs/en/ocr_visual_document_understanding#visualdocumentnerv2) transformer
+* Added DL based [ImageTextDetector](/docs/en/ocr_object_detection#imagetextdetector) transformer
+* Support rotated regions in [ImageSplitRegions](/docs/en/ocr_pipeline_components#imagesplitregions)
+* Support rotated regions in [ImageDrawRegions](/docs/en/ocr_pipeline_components#imagedrawregions)
 
 
 #### New Models

diff --git a/docs/en/spark_ocr_versions/release_notes_3_11_0.md b/docs/en/spark_ocr_versions/release_notes_3_11_0.md
@@ -25,11 +25,11 @@ This release comes with new models, new features, bug fixes, and notebook exampl
 
 #### New Features
 
-* Added [ImageTextDetectorV2](ocr_object_detection#imagetextdetectorv2) Python Spark-OCR Transformer for detecting printed and handwritten text
+* Added [ImageTextDetectorV2](/docs/en/ocr_object_detection#imagetextdetectorv2) Python Spark-OCR Transformer for detecting printed and handwritten text
  using CRAFT architecture with Refiner Net.
-* Added [ImageTextRecognizerV2](ocr_pipeline_components#imagetotextv2) Python Spark-OCR Transformer for recognizing
+* Added [ImageTextRecognizerV2](/docs/en/ocr_pipeline_components#imagetotextv2) Python Spark-OCR Transformer for recognizing
  printed and handwritten text based on Deep Learning Transformer Architecture.
-* Added [FormRelationExtractor](ocr_visual_document_understanding#formrelationextractor) for detecting relations between key and value entities in forms.
+* Added [FormRelationExtractor](/docs/en/ocr_visual_document_understanding#formrelationextractor) for detecting relations between key and value entities in forms.
 * Added the capability of fine tuning VisualDocumentNerV2 models for key-value pairs extraction.
 
 #### New Models

diff --git a/docs/en/spark_ocr_versions/release_notes_3_1_0.md b/docs/en/spark_ocr_versions/release_notes_3_1_0.md
@@ -26,16 +26,16 @@ More details please read in [GPU image preprocessing in Spark OCR](https://mediu
 
 #### New Features
 
-* [GPUImageTransformer](ocr_pipeline_components#gpuimagetransformer) with support: scaling, erosion, delation, Otsu and Huang thresholding.
-* Added [display_images](ocr_structures#displayimages) util function for displaying images from Spark DataFrame in Jupyter notebooks.
+* [GPUImageTransformer](/docs/en/ocr_pipeline_components#gpuimagetransformer) with support: scaling, erosion, delation, Otsu and Huang thresholding.
+* Added [display_images](/docs/en/ocr_structures#displayimages) util function for displaying images from Spark DataFrame in Jupyter notebooks.
 
 #### Enhancements
 
-* Improve [display_image](ocr_structures#displayimage) util function.
+* Improve [display_image](/docs/en/ocr_structures#displayimage) util function.
 
 #### Bug fixes
 
-* Fixed issue with extra dependencies in [start](ocr_install#using-start-function) function
+* Fixed issue with extra dependencies in [start](/docs/en/ocr_install#using-start-function) function
 
 #### New notebooks
 

diff --git a/docs/en/spark_ocr_versions/release_notes_3_2_0.md b/docs/en/spark_ocr_versions/release_notes_3_2_0.md
@@ -26,7 +26,7 @@ including form understanding and receipt understanding.
 
 #### New Features
 
-* [VisualDocumentNER](ocr_pipeline_components#visualdocumentner) is a DL model for NER problem using text and layout data.
+* [VisualDocumentNER](/docs/en/ocr_pipeline_components#visualdocumentner) is a DL model for NER problem using text and layout data.
   Currently available pre-trained model on the SROIE dataset.
 
 

diff --git a/docs/en/spark_ocr_versions/release_notes_3_3_0.md b/docs/en/spark_ocr_versions/release_notes_3_3_0.md
@@ -29,9 +29,9 @@ More details please read in [Table Detection & Extraction in Spark OCR](https://
 
 #### New Features
 
-* [ImageTableDetector](ocr_table_recognition#imagetabledetector) is a DL model for detect tables on the image.
-* [ImageTableCellDetector](ocr_table_recognition#imagetablecelldetector) is a transformer for detect regions of cells in the table image.
-* [ImageCellsToTextTable](ocr_table_recognition#imagecellstotexttable) is a transformer for extract text from the detected cells.
+* [ImageTableDetector](/docs/en/ocr_table_recognition#imagetabledetector) is a DL model for detect tables on the image.
+* [ImageTableCellDetector](/docs/en/ocr_table_recognition#imagetablecelldetector) is a transformer for detect regions of cells in the table image.
+* [ImageCellsToTextTable](/docs/en/ocr_table_recognition#imagecellstotexttable) is a transformer for extract text from the detected cells.
 
 #### New notebooks
 

diff --git a/docs/en/spark_ocr_versions/release_notes_3_4_0.md b/docs/en/spark_ocr_versions/release_notes_3_4_0.md
@@ -25,7 +25,7 @@ More details please read in [Signature Detection in Spark OCR](https://medium.co
 
 #### New Features
 
-* [ImageSignatureDetector](ocr_object_detection#imagehandwrittendetector) is a DL model for detecting signature on the image.
+* [ImageSignatureDetector](/docs/en/ocr_object_detection#imagehandwrittendetector) is a DL model for detecting signature on the image.
 
 
 #### New notebooks

diff --git a/docs/en/spark_ocr_versions/release_notes_3_5_0.md b/docs/en/spark_ocr_versions/release_notes_3_5_0.md
@@ -26,15 +26,15 @@ More details please read in [Extract Tabular Data from PDF in Spark OCR](https:/
 
 #### New Features
 
-* Added new method to [ImageTableCellDetector](ocr_table_recognition#imagetablecelldetector) which support 
+* Added new method to [ImageTableCellDetector](/docs/en/ocr_table_recognition#imagetablecelldetector) which support 
 borderless tables and combined tables.
 * Added __Wolf__ and __Singh__ adaptive binarization methods to the [ImageAdaptiveThresholding](ocr_pipeline_components#imageadaptivethresholding).
 
 
 #### Enhancements
 
-* Added possibility to use different type of images as input for [ImageTableDetector](ocr_table_recognition#imagetabledetector).
-* Added [display_pdf](ocr_structures#displaypdf) and [display_images_horizontal](ocr_structures#displayimageshorizontal) util functions.
+* Added possibility to use different type of images as input for [ImageTableDetector](/docs/en/ocr_table_recognition#imagetabledetector).
+* Added [display_pdf](/docs/en/ocr_structures#displaypdf) and [display_images_horizontal](/docs/en/ocr_structures#displayimageshorizontal) util functions.
 
 #### New notebooks
 

diff --git a/docs/en/spark_ocr_versions/release_notes_3_6_0.md b/docs/en/spark_ocr_versions/release_notes_3_6_0.md
@@ -24,10 +24,10 @@ Handwritten detection and visualization improvement.
 
 #### New Features
 
-* Added [ImageHandwrittenDetector](ocr_object_detection#imagehandwrittendetector) for detecting 'signature', 'date', 'name',
+* Added [ImageHandwrittenDetector](/docs/en/ocr_object_detection#imagehandwrittendetector) for detecting 'signature', 'date', 'name',
  'title', 'address' and others handwritten text.
-* Added rendering labels and scores in [ImageDrawRegions](ocr_pipeline_components#imagedrawregions).
-* Added possibility to scale image to fixed size in [ImageScaler](ocr_pipeline_components#imagescaler)
+* Added rendering labels and scores in [ImageDrawRegions](/docs/en/ocr_pipeline_components#imagedrawregions).
+* Added possibility to scale image to fixed size in [ImageScaler](/docs/en/ocr_pipeline_components#imagescaler)
  with keeping original ratio.