2024 02 06 image to text v2 updates #943

23 changes: 20 additions & 3 deletions docs/en/ocr_pipeline_components.md
@@ -3211,6 +3211,8 @@ others. One could almost say they feed on and grow on ideas.

`ImageToTextV2` can run on CPU, but a GPU is recommended to achieve acceptable performance.

`ImageToTextV2` supports caching, controlled by the `useCaching` parameter.

`ImageToTextV2` accepts either regions that each contain a single line of text, or regions produced by a text detection model.

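When regions come from a text detection model, a single box may cover only a word or fragment, so boxes on the same line need to be grouped before they are read as a line; the `lineTolerance` parameter (default 15 pixels) is described as controlling this grouping. The sketch below is a minimal, hypothetical illustration in plain Python of one way such grouping can work, assuming axis-aligned boxes in pixel coordinates and comparing vertical centers. `Region` and `group_into_lines` are illustrative names, not part of the Spark OCR API, and `ImageToTextV2` may group regions differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """Hypothetical axis-aligned text box in pixel coordinates (not a Spark OCR type)."""
    x: int
    y: int
    width: int
    height: int

    @property
    def center_y(self) -> float:
        return self.y + self.height / 2


def group_into_lines(regions: List[Region], line_tolerance: int = 15) -> List[List[Region]]:
    """Hypothetical grouping: a box joins the current line when its vertical center
    is within line_tolerance pixels of the center of that line's first box.
    Each resulting line is sorted left to right."""
    lines: List[List[Region]] = []
    # Visit boxes from top to bottom by vertical center.
    for region in sorted(regions, key=lambda r: r.center_y):
        if lines and abs(region.center_y - lines[-1][0].center_y) <= line_tolerance:
            lines[-1].append(region)
        else:
            lines.append([region])
    return [sorted(line, key=lambda r: r.x) for line in lines]


if __name__ == "__main__":
    boxes = [
        Region(x=120, y=12, width=80, height=20),
        Region(x=10, y=10, width=100, height=22),
        Region(x=12, y=60, width=150, height=21),
    ]
    for line in group_into_lines(boxes, line_tolerance=15):
        print([(r.x, r.y) for r in line])
```

With the sample boxes, the first two end up on one line (ordered by `x`) and the third on a second line. Comparing against the first box of each line keeps the grouping simple; a real implementation might also need to handle skewed or overlapping lines.
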
</div><div class="h3-box" markdown="1">
@@ -3221,6 +3223,7 @@ others. One could almost say they feed on and grow on ideas.
| Param name | Type | Default | Column Data Description |
| --- | --- | --- | --- |
| inputCols | Array[string] | [image] | Input columns: the image struct ([Image schema](ocr_structures#image-schema)) and, optionally, regions. |
| regionsColumn | string | regions | Input column containing regions to be processed. |

</div><div class="h3-box" markdown="1">

@@ -3232,6 +3235,13 @@ others. One could almost say they feed on and grow on ideas.
| lineTolerance | integer | 15 | Tolerance, in pixels, used when grouping text regions into lines. |
| borderWidth | integer | 5 | Values greater than 0 add a border of this width around each text region. |
| spaceWidth | integer | 10 | Values greater than 0 enable adding white space between words on the image. |
| maxImageRatio | float | 11.25 | Maximum width/height ratio of the images fed to the model. Larger values reduce inference time but may cause the model to diverge. |
| groupImages | boolean | True | Whether to group images to maximize detection quality. |
| batchSize | integer | 3 | Number of text patches fed to the model at the same time. |
| taskParallelism | integer | 8 | Number of threads used to process a single region. |
| useGPU | boolean | False | Whether to run the model on GPU. |
| useCaching | boolean | True | Whether to enable caching. |
| keepInput | boolean | True | Whether to keep the input column in the output. |

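`maxImageRatio` and `batchSize` both concern how text patches are prepared before they reach the model. As a rough, hypothetical illustration of the arithmetic only, the sketch below assumes that a patch wider than the allowed ratio is cut into equal-width chunks and that chunks are then fed in fixed-size batches; the table above does not say that `ImageToTextV2` splits patches this way. For example, a line patch 1000 px wide and 32 px high has a width/height ratio of 31.25, above the 11.25 default, so under this assumption it would be cut into chunks at most 360 px wide. `split_by_ratio` and `make_batches` are illustrative names, not Spark OCR functions.

```python
import math
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_by_ratio(width: int, height: int, max_ratio: float = 11.25) -> List[Tuple[int, int]]:
    """Hypothetical: return (start_x, end_x) column ranges that cut a width x height
    patch into equal-width chunks whose width/height ratio is at most max_ratio."""
    if width <= 0:
        return []
    max_width = max(1, math.floor(max_ratio * height))  # widest chunk allowed
    n_chunks = math.ceil(width / max_width)              # fewest chunks that fit
    chunk = math.ceil(width / n_chunks)                  # even chunk width, <= max_width
    return [(start, min(start + chunk, width)) for start in range(0, width, chunk)]


def make_batches(items: Sequence[T], batch_size: int = 3) -> List[List[T]]:
    """Group items into consecutive batches of at most batch_size elements."""
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


if __name__ == "__main__":
    chunks = split_by_ratio(width=1000, height=32)
    print(chunks)
    print(make_batches(chunks, batch_size=3))
```

Here the 1000 x 32 patch yields three chunks of 334, 334 and 332 px, which fit into a single batch of size 3. A larger ratio limit produces fewer, wider chunks, which is consistent with the table's note that large values reduce inference time.
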
</div><div class="h3-box" markdown="1">

@@ -3240,7 +3250,9 @@ others. One could almost say they feed on and grow on ideas.
{:.table-model-big}
| Param name | Type | Default | Column Data Description |
| --- | --- | --- | --- |
| outputCol | string | text | Recognized text. |
| positionsCol | string | positions | Positions (coordinates) of the recognized text. |
| outputFormat | Enum | OcrOutputFormat.TEXT | Format of the returned output, given as an `OcrOutputFormat` value. |

**Example:**

@@ -3251,6 +3263,7 @@ others. One could almost say they feed on and grow on ideas.
```python
from pyspark.ml import PipelineModel
from sparkocr.transformers import *
from sparkocr.enums import *

imagePath = "path to image"

@@ -3271,7 +3284,11 @@ text_detector = ImageTextDetectorV2 \
    .setSizeThreshold(20)

# Recognize text inside the regions stored in the "text_regions" column
ocr = ImageToTextV2.pretrained("ocr_base_printed", "en", "clinical/ocr") \
    .setInputCols(["image"]) \
    .setRegionsColumn("text_regions") \
    .setUseGPU(True) \
    .setUseCaching(True) \
    .setOutputFormat(OcrOutputFormat.TEXT) \
    .setOutputCol("text")

# Define pipeline
@@ -4391,4 +4408,4 @@ Output:

```

</div>
</div>