Commit

update fixes (#208)
agsfer committed May 11, 2023
1 parent 44de430 commit d82d881
Showing 24 changed files with 80 additions and 77 deletions.
3 changes: 3 additions & 0 deletions docs/_includes/demomenu.html
@@ -89,6 +89,9 @@
<li {% if _section.secheader %} {% for child in _section.secheader %} {% if child.activemenu == "risk_factors" %} class="active" {% endif %} {% endfor %} {% endif %} >
<a href="/risk_factors">Risk Factors</a>
</li>
+<li>
+<a target="_blank" href="https://demo.johnsnowlabs.com/healthcare/MODELS/">Explore Healthcare NLP Models</a>
+</li>
</ul>
<ul class="acc-top"><li class="acc-top-item">Voice of Patients<i class="fas fa-chevron-down"></i></li></ul>
<ul class="acc-body">
4 changes: 2 additions & 2 deletions docs/_includes/head.html
@@ -11,7 +11,7 @@
<!-- End Google Tag Manager -->
<!-- Google tag (gtag.js) Additional adding by Andrew -->

-<script async src=https://www.googletagmanager.com/gtag/js?id=UA-70312582-1></script>
+<!-- <script async src=https://www.googletagmanager.com/gtag/js?id=UA-70312582-1></script>
<script>
@@ -25,7 +25,7 @@
gtag('config', 'UA-70312582-1');
-</script>
+</script> -->

{%- include snippets/get-article-pagetitle.html article=page -%}
{%- assign _article_pagetitle = __return -%}
12 changes: 6 additions & 6 deletions docs/en/spark_ocr_versions/release_notes_1_11_0.md
@@ -20,22 +20,22 @@ Release date: 25-02-2021
#### Overview

Support German, French, Spanish and Russian languages.
-Improving [PositionsFinder](ocr_pipeline_components#positionsfinder) and ImageToText for better support de-identification.
+Improving [PositionsFinder](/docs/en/ocr_pipeline_components#positionsfinder) and ImageToText for better support de-identification.

#### New Features

-* Loading model data from S3 in [ImageToText](ocr_pipeline_components#imagetotext).
-* Added support German, French, Spanish, Russian languages in [ImageToText](ocr_pipeline_components#imagetotext).
-* Added different OCR model types: Base, Best, Fast in [ImageToText](ocr_pipeline_components#imagetotext).
+* Loading model data from S3 in [ImageToText](/docs/en/ocr_pipeline_components#imagetotext).
+* Added support German, French, Spanish, Russian languages in [ImageToText](/docs/en/ocr_pipeline_components#imagetotext).
+* Added different OCR model types: Base, Best, Fast in [ImageToText](/docs/en/ocr_pipeline_components#imagetotext).

#### Enhancements

-* Added spaces symbols to the output positions in the [ImageToText](ocr_pipeline_components#imagetotext) transformer.
+* Added spaces symbols to the output positions in the [ImageToText](/docs/en/ocr_pipeline_components#imagetotext) transformer.
* Eliminate python-levensthein from dependencies for simplify installation.

#### Bugfixes

-* Fixed issue with extracting coordinates in in [ImageToText](ocr_pipeline_components#imagetotext).
+* Fixed issue with extracting coordinates in in [ImageToText](/docs/en/ocr_pipeline_components#imagetotext).
* Fixed loading model data on cluster in yarn mode.

#### New notebooks
12 changes: 6 additions & 6 deletions docs/en/spark_ocr_versions/release_notes_1_2_0.md
@@ -25,15 +25,15 @@ Improved support Databricks and processing selectable pdfs.
#### Enhancements

* Adapted Spark OCR for run on Databricks.
-* Added rewriting positions in [ImageToText](../ocr_pipeline_components#imagetotext) when run together with PdfToText.
-* Added 'positionsCol' param to [ImageToText](../ocr_pipeline_components#imagetotext).
-* Improved support Spark NLP. Changed [start](../ocr_install#using-start-function) function.
+* Added rewriting positions in [ImageToText](/docs/en/ocr_pipeline_components#imagetotext) when run together with PdfToText.
+* Added 'positionsCol' param to [ImageToText](/docs/en/ocr_pipeline_components#imagetotext).
+* Improved support Spark NLP. Changed [start](/docs/en/ocr_install#using-start-function) function.

#### New Features

-* Added [showImage](../ocr_structures#showimages) implicit to Dataframe for display images in Scala Databricks notebooks.
-* Added [display_images](../ocr_structures#display_images) function for display images in Python Databricks notebooks.
-* Added propagation selectable pdf file in [TextToPdf](../ocr_pipeline_components#texttopdf). Added 'inputContent' param to 'TextToPdf'.
+* Added [showImage](/docs/en/ocr_structures#showimages) implicit to Dataframe for display images in Scala Databricks notebooks.
+* Added [display_images](/docs/en/ocr_structures#display_images) function for display images in Python Databricks notebooks.
+* Added propagation selectable pdf file in [TextToPdf](/docs/en/ocr_pipeline_components#texttopdf). Added 'inputContent' param to 'TextToPdf'.


</div><div class="prev_ver h3-box" markdown="1">
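Every hunk in this commit makes the same kind of change: a relative link target such as `ocr_pipeline_components#imagetotext` or `../ocr_install#using-start-function` becomes a root-relative `/docs/en/...` target. The commit does not say how the edits were made. The following is a minimal Python sketch of that rewrite, assuming the `/docs/en/` prefix seen in the updated lines and treating any anchored target that starts with `ocr_` or `../ocr_` as relative. The name `absolutize_doc_links` and the regex are hypothetical, not part of the repository.

```python
import re

# Hypothetical prefix, taken from the updated links in this commit.
DOCS_PREFIX = "/docs/en/"

# Matches a Markdown link target that is a bare or "../"-prefixed ocr_* page
# with an anchor, e.g. "](ocr_pipeline_components#imagetotext)" or
# "](../ocr_install#using-start-function)". Already-absolute targets
# ("](/docs/en/...") do not match because they start with "/".
RELATIVE_DOC_LINK = re.compile(r"\]\((?:\.\./)?(ocr_[a-z_]+#[\w-]+)\)")


def absolutize_doc_links(markdown: str) -> str:
    """Rewrite relative ocr_* doc links to root-relative /docs/en/ links (sketch)."""
    return RELATIVE_DOC_LINK.sub(lambda m: f"]({DOCS_PREFIX}{m.group(1)})", markdown)


if __name__ == "__main__":
    old = "* Added 'positionsCol' param to [ImageToText](../ocr_pipeline_components#imagetotext)."
    print(absolutize_doc_links(old))
    # * Added 'positionsCol' param to [ImageToText](/docs/en/ocr_pipeline_components#imagetotext).
```

Because the pattern only matches targets that begin with `ocr_` or `../ocr_`, running it a second time over already-updated lines leaves them unchanged.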
4 changes: 2 additions & 2 deletions docs/en/spark_ocr_versions/release_notes_1_3_0.md
@@ -30,8 +30,8 @@ New functionality for de-identification problem.
#### New Features

* Support storing for binaryFormat. Added support storing Image and PDF files.
-* Support selectable pdf for [TextToPdf](ocr_pipeline_components#texttopdf) transformer.
-* Added [UpdateTextPosition](ocr_pipeline_components#updatetextposition) transformer.
+* Support selectable pdf for [TextToPdf](/docs/en/ocr_pipeline_components#texttopdf) transformer.
+* Added [UpdateTextPosition](/docs/en/ocr_pipeline_components#updatetextposition) transformer.


</div><div class="prev_ver h3-box" markdown="1">
8 changes: 4 additions & 4 deletions docs/en/spark_ocr_versions/release_notes_1_4_0.md
@@ -23,20 +23,20 @@ Added support Dicom format and improved support image morphological operations.

#### Enhancements

-* Updated [start](ocr_install#using-start-function) function. Improved support Spark NLP internal.
+* Updated [start](/docs/en/ocr_install#using-start-function) function. Improved support Spark NLP internal.
* `ImageMorphologyOpening` and `ImageErosion` are removed.
* Improved existing transformers for support de-identification Dicom documents.
-* Added possibility to draw filled rectangles to [ImageDrawRegions](ocr_pipeline_components#imagedrawregions).
+* Added possibility to draw filled rectangles to [ImageDrawRegions](/docs/en/ocr_pipeline_components#imagedrawregions).

#### New Features

* Support reading and writing Dicom documents.
-* Added [ImageMorphologyOperation](ocr_pipeline_components#imagemorphologyoperation) transformer which support:
+* Added [ImageMorphologyOperation](/docs/en/ocr_pipeline_components#imagemorphologyoperation) transformer which support:
erosion, dilation, opening and closing operations.

#### Bugfixes

-* Fixed issue in [ImageToText](ocr_pipeline_components#imagetotext) related to extraction coordinates.
+* Fixed issue in [ImageToText](/docs/en/ocr_pipeline_components#imagetotext) related to extraction coordinates.


</div><div class="prev_ver h3-box" markdown="1">
2 changes: 1 addition & 1 deletion docs/en/spark_ocr_versions/release_notes_1_5_0.md
@@ -28,7 +28,7 @@ FoundationOne report parsing support.

#### New Features

-* Added [FoundationOneReportParser](ocr_pipeline_components#foundationonereportparser) which support parsing patient info,
+* Added [FoundationOneReportParser](/docs/en/ocr_pipeline_components#foundationonereportparser) which support parsing patient info,
genomic and biomarker findings.


6 changes: 3 additions & 3 deletions docs/en/spark_ocr_versions/release_notes_1_6_0.md
@@ -24,9 +24,9 @@ Support parsing data from tables for selectable PDFs.

#### New Features

-* Added [PdfToTextTable](ocr_pipeline_components#pdftotexttable) transformer for extract tables from Pdf document per each page.
-* Added [ImageCropper](ocr_pipeline_components#imagecropper) transformer for crop images.
-* Added [ImageBrandsToText](ocr_pipeline_components#imagebrandstotext) transformer for detect text in defined areas.
+* Added [PdfToTextTable](/docs/en/ocr_pipeline_components#pdftotexttable) transformer for extract tables from Pdf document per each page.
+* Added [ImageCropper](/docs/en/ocr_pipeline_components#imagecropper) transformer for crop images.
+* Added [ImageBrandsToText](/docs/en/ocr_pipeline_components#imagebrandstotext) transformer for detect text in defined areas.


</div><div class="prev_ver h3-box" markdown="1">
18 changes: 9 additions & 9 deletions docs/en/spark_ocr_versions/release_notes_1_8_0.md
@@ -24,22 +24,22 @@ Support up to 10k pages per document.

#### New Features

-* Added [ImageAdaptiveBinarizer](ocr_pipeline_components#imageadaptivebinarizer) Scala transformer with support:
+* Added [ImageAdaptiveBinarizer](/docs/en/ocr_pipeline_components#imageadaptivebinarizer) Scala transformer with support:
- Gaussian local thresholding
- Otsu thresholding
- Sauvola local thresholding
-* Added possibility to split pdf to small documents for optimize processing in [PdfToImage](ocr_pipeline_components#pdftoimage).
+* Added possibility to split pdf to small documents for optimize processing in [PdfToImage](/docs/en/ocr_pipeline_components#pdftoimage).


#### Enhancements

-* Added applying binarization in [PdfToImage](ocr_pipeline_components#pdftoimage) for optimize memory usage.
-* Added `pdfCoordinates` param to the [ImageToText](ocr_pipeline_components#imagetotext) transformer.
-* Added 'total_pages' field to the [PdfToImage](ocr_pipeline_components#pdftoimage) transformer.
-* Added different splitting strategies to the [PdfToImage](ocr_pipeline_components#pdftoimage) transformer.
-* Simplified paging [PdfToImage](ocr_pipeline_components#pdftoimage) when run it with splitting to small PDF.
-* Added params to the [PdfToText](ocr_pipeline_components#pdftotext) for disable extra functionality.
-* Added `master_url` param to the python [start](ocr_install#using-start-function) function.
+* Added applying binarization in [PdfToImage](/docs/en/ocr_pipeline_components#pdftoimage) for optimize memory usage.
+* Added `pdfCoordinates` param to the [ImageToText](/docs/en/ocr_pipeline_components#imagetotext) transformer.
+* Added 'total_pages' field to the [PdfToImage](/docs/en/ocr_pipeline_components#pdftoimage) transformer.
+* Added different splitting strategies to the [PdfToImage](/docs/en/ocr_pipeline_components#pdftoimage) transformer.
+* Simplified paging [PdfToImage](/docs/en/ocr_pipeline_components#pdftoimage) when run it with splitting to small PDF.
+* Added params to the [PdfToText](/docs/en/ocr_pipeline_components#pdftotext) for disable extra functionality.
+* Added `master_url` param to the python [start](/docs/en/ocr_install#using-start-function) function.


</div><div class="prev_ver h3-box" markdown="1">
4 changes: 2 additions & 2 deletions docs/en/spark_ocr_versions/release_notes_1_9_0.md
@@ -23,8 +23,8 @@ Extension of FoundationOne report parser and support HOCR output format.

#### New Features

-* Added [ImageToHocr](ocr_pipeline_components#imagetohocr) transformer for recognize text from image and store it to HOCR format.
-* Added parsing gene lists from 'Appendix' in [FoundationOneReportParser](ocr_pipeline_components#foundationonereportparser) transformer.
+* Added [ImageToHocr](/docs/en/ocr_pipeline_components#imagetohocr) transformer for recognize text from image and store it to HOCR format.
+* Added parsing gene lists from 'Appendix' in [FoundationOneReportParser](/docs/en/ocr_pipeline_components#foundationonereportparser) transformer.


</div><div class="prev_ver h3-box" markdown="1">
4 changes: 2 additions & 2 deletions docs/en/spark_ocr_versions/release_notes_3_0_0.md
@@ -23,7 +23,7 @@ We are very excited to release Spark OCR 3.0.0!

Spark OCR 3.0.0 extends the support for Apache Spark 3.0.x and 3.1.x major releases on Scala 2.12 with both Hadoop 2.7. and 3.2. We will support all 4 major Apache Spark and PySpark releases of 2.3.x, 2.4.x, 3.0.x, and 3.1.x.

-Spark OCR started to support Tensorflow models. First model is [VisualDocumentClassifier](ocr_pipeline_components#visualdocumentclassifier).
+Spark OCR started to support Tensorflow models. First model is [VisualDocumentClassifier](/docs/en/ocr_pipeline_components#visualdocumentclassifier).

#### New Features

@@ -44,7 +44,7 @@ Spark OCR started to support Tensorflow models. First model is [VisualDocumentCl
* Support 2x new EMR 6.x:
* EMR 6.1.0 (Apache Spark 3.0.0 / Hadoop 3.2.1)
* EMR 6.2.0 (Apache Spark 3.0.1 / Hadoop 3.2.1)
-* [VisualDocumentClassifier](ocr_pipeline_components#visualdocumentclassifier) model for classification documents using text and layout data.
+* [VisualDocumentClassifier](/docs/en/ocr_pipeline_components#visualdocumentclassifier) model for classification documents using text and layout data.
* Added support Vietnamese language.

#### New notebooks
8 changes: 4 additions & 4 deletions docs/en/spark_ocr_versions/release_notes_3_10_0.md
@@ -25,10 +25,10 @@ Form recognition using LayoutLMv2 and text detection.

#### New Features

-* Added [VisualDocumentNERv2](ocr_visual_document_understanding#visualdocumentnerv2) transformer
-* Added DL based [ImageTextDetector](ocr_object_detection#imagetextdetector) transformer
-* Support rotated regions in [ImageSplitRegions](ocr_pipeline_components#imagesplitregions)
-* Support rotated regions in [ImageDrawRegions](ocr_pipeline_components#imagedrawregions)
+* Added [VisualDocumentNERv2](/docs/en/ocr_visual_document_understanding#visualdocumentnerv2) transformer
+* Added DL based [ImageTextDetector](/docs/en/ocr_object_detection#imagetextdetector) transformer
+* Support rotated regions in [ImageSplitRegions](/docs/en/ocr_pipeline_components#imagesplitregions)
+* Support rotated regions in [ImageDrawRegions](/docs/en/ocr_pipeline_components#imagedrawregions)


#### New Models
6 changes: 3 additions & 3 deletions docs/en/spark_ocr_versions/release_notes_3_11_0.md
@@ -25,11 +25,11 @@ This release comes with new models, new features, bug fixes, and notebook exampl

#### New Features

-* Added [ImageTextDetectorV2](ocr_object_detection#imagetextdetectorv2) Python Spark-OCR Transformer for detecting printed and handwritten text
+* Added [ImageTextDetectorV2](/docs/en/ocr_object_detection#imagetextdetectorv2) Python Spark-OCR Transformer for detecting printed and handwritten text
using CRAFT architecture with Refiner Net.
-* Added [ImageTextRecognizerV2](ocr_pipeline_components#imagetotextv2) Python Spark-OCR Transformer for recognizing
+* Added [ImageTextRecognizerV2](/docs/en/ocr_pipeline_components#imagetotextv2) Python Spark-OCR Transformer for recognizing
printed and handwritten text based on Deep Learning Transformer Architecture.
-* Added [FormRelationExtractor](ocr_visual_document_understanding#formrelationextractor) for detecting relations between key and value entities in forms.
+* Added [FormRelationExtractor](/docs/en/ocr_visual_document_understanding#formrelationextractor) for detecting relations between key and value entities in forms.
* Added the capability of fine tuning VisualDocumentNerV2 models for key-value pairs extraction.

#### New Models
8 changes: 4 additions & 4 deletions docs/en/spark_ocr_versions/release_notes_3_1_0.md
@@ -26,16 +26,16 @@ More details please read in [GPU image preprocessing in Spark OCR](https://mediu

#### New Features

-* [GPUImageTransformer](ocr_pipeline_components#gpuimagetransformer) with support: scaling, erosion, delation, Otsu and Huang thresholding.
-* Added [display_images](ocr_structures#displayimages) util function for displaying images from Spark DataFrame in Jupyter notebooks.
+* [GPUImageTransformer](/docs/en/ocr_pipeline_components#gpuimagetransformer) with support: scaling, erosion, delation, Otsu and Huang thresholding.
+* Added [display_images](/docs/en/ocr_structures#displayimages) util function for displaying images from Spark DataFrame in Jupyter notebooks.

#### Enhancements

-* Improve [display_image](ocr_structures#displayimage) util function.
+* Improve [display_image](/docs/en/ocr_structures#displayimage) util function.

#### Bug fixes

-* Fixed issue with extra dependencies in [start](ocr_install#using-start-function) function
+* Fixed issue with extra dependencies in [start](/docs/en/ocr_install#using-start-function) function

#### New notebooks

2 changes: 1 addition & 1 deletion docs/en/spark_ocr_versions/release_notes_3_2_0.md
@@ -26,7 +26,7 @@ including form understanding and receipt understanding.

#### New Features

-* [VisualDocumentNER](ocr_pipeline_components#visualdocumentner) is a DL model for NER problem using text and layout data.
+* [VisualDocumentNER](/docs/en/ocr_pipeline_components#visualdocumentner) is a DL model for NER problem using text and layout data.
Currently available pre-trained model on the SROIE dataset.


6 changes: 3 additions & 3 deletions docs/en/spark_ocr_versions/release_notes_3_3_0.md
@@ -29,9 +29,9 @@ More details please read in [Table Detection & Extraction in Spark OCR](https://

#### New Features

-* [ImageTableDetector](ocr_table_recognition#imagetabledetector) is a DL model for detect tables on the image.
-* [ImageTableCellDetector](ocr_table_recognition#imagetablecelldetector) is a transformer for detect regions of cells in the table image.
-* [ImageCellsToTextTable](ocr_table_recognition#imagecellstotexttable) is a transformer for extract text from the detected cells.
+* [ImageTableDetector](/docs/en/ocr_table_recognition#imagetabledetector) is a DL model for detect tables on the image.
+* [ImageTableCellDetector](/docs/en/ocr_table_recognition#imagetablecelldetector) is a transformer for detect regions of cells in the table image.
+* [ImageCellsToTextTable](/docs/en/ocr_table_recognition#imagecellstotexttable) is a transformer for extract text from the detected cells.

#### New notebooks

2 changes: 1 addition & 1 deletion docs/en/spark_ocr_versions/release_notes_3_4_0.md
@@ -25,7 +25,7 @@ More details please read in [Signature Detection in Spark OCR](https://medium.co

#### New Features

-* [ImageSignatureDetector](ocr_object_detection#imagehandwrittendetector) is a DL model for detecting signature on the image.
+* [ImageSignatureDetector](/docs/en/ocr_object_detection#imagehandwrittendetector) is a DL model for detecting signature on the image.


#### New notebooks
6 changes: 3 additions & 3 deletions docs/en/spark_ocr_versions/release_notes_3_5_0.md
@@ -26,15 +26,15 @@ More details please read in [Extract Tabular Data from PDF in Spark OCR](https:/

#### New Features

-* Added new method to [ImageTableCellDetector](ocr_table_recognition#imagetablecelldetector) which support
+* Added new method to [ImageTableCellDetector](/docs/en/ocr_table_recognition#imagetablecelldetector) which support
borderless tables and combined tables.
* Added __Wolf__ and __Singh__ adaptive binarization methods to the [ImageAdaptiveThresholding](ocr_pipeline_components#imageadaptivethresholding).


#### Enhancements

-* Added possibility to use different type of images as input for [ImageTableDetector](ocr_table_recognition#imagetabledetector).
-* Added [display_pdf](ocr_structures#displaypdf) and [display_images_horizontal](ocr_structures#displayimageshorizontal) util functions.
+* Added possibility to use different type of images as input for [ImageTableDetector](/docs/en/ocr_table_recognition#imagetabledetector).
+* Added [display_pdf](/docs/en/ocr_structures#displaypdf) and [display_images_horizontal](/docs/en/ocr_structures#displayimageshorizontal) util functions.

#### New notebooks

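In the `release_notes_3_5_0.md` hunk, the line for `ImageAdaptiveThresholding` is shown as unchanged and still points at the relative target `ocr_pipeline_components#imageadaptivethresholding`. A scan like the following could list such leftovers across the release notes. It is a hypothetical sketch: the default directory comes from the file paths in this commit, the function names are illustrative, and any anchored target starting with `ocr_` or `../ocr_` is assumed to be relative.

```python
import re
from pathlib import Path

# Assumed definition of "relative": an ocr_* page (optionally behind "../")
# followed by an anchor. Root-relative "/docs/en/..." targets do not match.
RELATIVE_DOC_LINK = re.compile(r"\]\(((?:\.\./)?ocr_[a-z_]+#[\w-]+)\)")


def find_relative_doc_links(text: str) -> list[tuple[int, str]]:
    """Return (1-based line number, link target) for each relative ocr_* link."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in RELATIVE_DOC_LINK.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


def scan_release_notes(root: str = "docs/en/spark_ocr_versions") -> None:
    """Print every relative doc link found in the release-note Markdown files."""
    for path in sorted(Path(root).glob("release_notes_*.md")):
        for lineno, target in find_relative_doc_links(path.read_text(encoding="utf-8")):
            print(f"{path}:{lineno}: {target}")


if __name__ == "__main__":
    line = ("* Added __Wolf__ and __Singh__ adaptive binarization methods to the "
            "[ImageAdaptiveThresholding](ocr_pipeline_components#imageadaptivethresholding).")
    print(find_relative_doc_links(line))
    # [(1, 'ocr_pipeline_components#imageadaptivethresholding')]
```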
6 changes: 3 additions & 3 deletions docs/en/spark_ocr_versions/release_notes_3_6_0.md
@@ -24,10 +24,10 @@ Handwritten detection and visualization improvement.

#### New Features

-* Added [ImageHandwrittenDetector](ocr_object_detection#imagehandwrittendetector) for detecting 'signature', 'date', 'name',
+* Added [ImageHandwrittenDetector](/docs/en/ocr_object_detection#imagehandwrittendetector) for detecting 'signature', 'date', 'name',
'title', 'address' and others handwritten text.
-* Added rendering labels and scores in [ImageDrawRegions](ocr_pipeline_components#imagedrawregions).
-* Added possibility to scale image to fixed size in [ImageScaler](ocr_pipeline_components#imagescaler)
+* Added rendering labels and scores in [ImageDrawRegions](/docs/en/ocr_pipeline_components#imagedrawregions).
+* Added possibility to scale image to fixed size in [ImageScaler](/docs/en/ocr_pipeline_components#imagescaler)
with keeping original ratio.

