From b7caeecc9ef8bdaaa5bae4aa0a38777f2ddcb0cf Mon Sep 17 00:00:00 2001
From: nitink
Date: Tue, 6 Feb 2024 15:19:48 +0530
Subject: [PATCH 1/3] Updated ImageToTextV2 documentation

---
 docs/en/ocr_pipeline_components.md | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/docs/en/ocr_pipeline_components.md b/docs/en/ocr_pipeline_components.md
index 9277022fde..db8a90a548 100644
--- a/docs/en/ocr_pipeline_components.md
+++ b/docs/en/ocr_pipeline_components.md
@@ -3211,6 +3211,8 @@ others. One could almost say they feed on and grow on ideas.
 `ImageToTextV2` can work on CPU, but GPU is preferred in order to achieve acceptable performance.
+`ImageToTextV2` can be used with caching enabled.
+
 `ImageToTextV2` can receive regions representing single line texts, or regions coming from a text detection model.
@@ -3221,6 +3223,7 @@ others. One could almost say they feed on and grow on ideas.
 | Param name | Type | Default | Column Data Description |
 | --- | --- | --- | --- |
 | inputCols | Array[string] | [image] | Can use as input image struct ([Image schema](ocr_structures#image-schema)) and regions. |
+| regionsColumn | string | regions | Input column containing regions to be processed. |
@@ -3232,6 +3235,14 @@ others. One could almost say they feed on and grow on ideas.
 | lineTolerance | integer | 15 | Line tolerance in pixels. It's used for grouping text regions by lines. |
 | borderWidth | integer | 5 | A value of more than 0 enables to border text regions with width equal to the value of the parameter. |
 | spaceWidth | integer | 10 | A value of more than 0 enables to add white spaces between words on the image. |
+| limitMultiplier | float | 1.5 | Used to control the length of the final output text ,a higher value will result in longer text sequence if available. Defaults to 1.5 |
+| maxImageRatio | float | 11.25 | Value for the width/height ratio of images that are fed to the model. Large values reduce inference time, but may cause the model to diverge. Defaults to 11.25. |
+| groupImages | boolean | True | Whether to group images to maximize detection quality or not. |
+| batchSize | integer | 3 | Number of text patches to feed the model at the same time. |
+| taskParallelism | integer | 8 | How many threads to use when processing a single region. |
+| useGPU | boolean | False | Enable to use GPU. |
+| useCaching | boolean | True | Enable to use caching. |
+| keepInput | boolean | True | Enable to preserve input column. |
@@ -3240,7 +3251,9 @@ others. One could almost say they feed on and grow on ideas.
 {:.table-model-big}
 | Param name | Type | Default | Column Data Description |
 | --- | --- | --- | --- |
-| outputCol | string | text | Recognized text |
+| outputCol | string | text | Recognized text. |
+| positionsCol | string | positions | Position Col. |
+| outputFormat | Enum | OcrOutputFormat.TEXT | Return output type. |
 **Example:**
@@ -3251,6 +3264,7 @@ others. One could almost say they feed on and grow on ideas.
 ```python
 from pyspark.ml import PipelineModel
 from sparkocr.transformers import *
+from sparkocr.enums import *
 imagePath = "path to image"
@@ -3271,7 +3285,11 @@ text_detector = ImageTextDetectorV2 \
     .setSizeThreshold(20)
 ocr = ImageToTextV2.pretrained("ocr_base_printed", "en", "clinical/ocr") \
-    .setInputCols(["image", "text_regions"]) \
+    .setInputCols(["image"]) \
+    .setRegionsColumn("text_regions") \
+    .setUseGPU(True) \
+    .setUseCaching(True) \
+    .setOutputFormat(OcrOutputFormat.TEXT) \
     .setOutputCol("text")
 # Define pipeline
From cd0a52685914108c4276063f86ddab1333bfda6c Mon Sep 17 00:00:00 2001
From: nitink
Date: Tue, 6 Feb 2024 15:28:57 +0530
Subject: [PATCH 2/3] updates ImageToTextV2

---
 docs/en/ocr_pipeline_components.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/en/ocr_pipeline_components.md b/docs/en/ocr_pipeline_components.md
index db8a90a548..ea19f6cd4f 100644
--- a/docs/en/ocr_pipeline_components.md
+++ b/docs/en/ocr_pipeline_components.md
@@ -3235,8 +3235,8 @@ others. One could almost say they feed on and grow on ideas.
 | lineTolerance | integer | 15 | Line tolerance in pixels. It's used for grouping text regions by lines. |
 | borderWidth | integer | 5 | A value of more than 0 enables to border text regions with width equal to the value of the parameter. |
 | spaceWidth | integer | 10 | A value of more than 0 enables to add white spaces between words on the image. |
-| limitMultiplier | float | 1.5 | Used to control the length of the final output text ,a higher value will result in longer text sequence if available. Defaults to 1.5 |
-| maxImageRatio | float | 11.25 | Value for the width/height ratio of images that are fed to the model. Large values reduce inference time, but may cause the model to diverge. Defaults to 11.25. |
+| limitMultiplier | float | 1.5 | Used to control the length of the final output text ,a higher value will result in longer text sequence if available. |
+| maxImageRatio | float | 11.25 | Value for the width/height ratio of images that are fed to the model. Large values reduce inference time, but may cause the model to diverge. |
 | groupImages | boolean | True | Whether to group images to maximize detection quality or not. |
 | batchSize | integer | 3 | Number of text patches to feed the model at the same time. |
 | taskParallelism | integer | 8 | How many threads to use when processing a single region. |
From a29fe9d0ee4e0fdf96402c9ef77a7b6878591c9a Mon Sep 17 00:00:00 2001
From: Nitin Kumar <72322393+nogifeet@users.noreply.github.com>
Date: Wed, 14 Feb 2024 17:00:35 +0530
Subject: [PATCH 3/3] Update ocr_pipeline_components.md

Remove edge case param limitMultiplier
---
 docs/en/ocr_pipeline_components.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/docs/en/ocr_pipeline_components.md b/docs/en/ocr_pipeline_components.md
index ea19f6cd4f..873bf9dcd9 100644
--- a/docs/en/ocr_pipeline_components.md
+++ b/docs/en/ocr_pipeline_components.md
@@ -3235,7 +3235,6 @@ others. One could almost say they feed on and grow on ideas.
 | lineTolerance | integer | 15 | Line tolerance in pixels. It's used for grouping text regions by lines. |
 | borderWidth | integer | 5 | A value of more than 0 enables to border text regions with width equal to the value of the parameter. |
 | spaceWidth | integer | 10 | A value of more than 0 enables to add white spaces between words on the image. |
-| limitMultiplier | float | 1.5 | Used to control the length of the final output text ,a higher value will result in longer text sequence if available. |
 | maxImageRatio | float | 11.25 | Value for the width/height ratio of images that are fed to the model. Large values reduce inference time, but may cause the model to diverge. |
 | groupImages | boolean | True | Whether to group images to maximize detection quality or not. |
 | batchSize | integer | 3 | Number of text patches to feed the model at the same time. |
@@ -4409,4 +4408,4 @@ Output:
 ```
-
\ No newline at end of file
+