[20.09] nixos/test-driver: use a variety of different Tesseract settings for OCR #120452

lukegb · 2021-04-23T20:10:08Z

Motivation for this change

Backport of #120349, to unblock 20.09 channel.

When performing OCR, different Tesseract settings perform better on
different workloads, and running each of them mostly adds ~negligible
time compared to the overhead of running the ImageMagick filters.

After this commit, we try using all three of the current Tesseract
models (classic, LSTM, and classic+LSTM) to generate output text. This
fixes chromium-90's tests at release-20.09, and should improve matching
in cases where a test is looking for specific text, with the tradeoff of
running Tesseract multiple times.

To keep this sensible to cherry-pick into release-20.09, it doesn't
change the existing API surface of the test driver. In particular,
get_screen_text continues to have the existing behaviour.
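
The idea described above can be sketched roughly as follows. This is an illustration only, not the test driver's actual code: the names `tesseract_command`, `screen_text_variants`, and `any_variant_contains` are hypothetical, the ImageMagick preprocessing step is left out, and the `--psm 11` choice is an assumption. It relies on Tesseract's `--oem` option, where 0 selects the legacy (classic) engine, 1 the LSTM engine, and 2 both combined.

```python
import subprocess
from typing import Callable, Iterable, List

# Tesseract --oem values: 0 = legacy ("classic"), 1 = LSTM, 2 = legacy + LSTM.
OCR_ENGINE_MODES = (0, 1, 2)


def tesseract_command(image_path: str, oem: int) -> List[str]:
    """Build a Tesseract invocation that prints recognised text to stdout.

    The page segmentation mode (--psm 11, sparse text) is an assumption
    made for this sketch.
    """
    return ["tesseract", image_path, "stdout", "--oem", str(oem), "--psm", "11"]


def run_command(cmd: List[str]) -> str:
    """Run a command and return its standard output as text."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def screen_text_variants(
    image_path: str,
    modes: Iterable[int] = OCR_ENGINE_MODES,
    runner: Callable[[List[str]], str] = run_command,
) -> List[str]:
    """Return one OCR result per engine mode, in the order given by `modes`."""
    return [runner(tesseract_command(image_path, oem)) for oem in modes]


def any_variant_contains(variants: Iterable[str], needle: str) -> bool:
    """True if at least one OCR result contains the text being looked for."""
    return any(needle in text for text in variants)
```

A caller looking for specific text would check every variant, for example `any_variant_contains(screen_text_variants("screenshot.png"), "Chromium")`, while a single-result helper in the style of get_screen_text can stay unchanged by continuing to use only one engine mode.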

(cherry picked from commit 4de343c)

Things done


lukegb · 2021-04-23T20:10:16Z

@ofborg test chromium

lukegb requested a review from tfc as a code owner April 23, 2021 20:10

github-actions bot added 6.topic: nixos 8.has: documentation labels Apr 23, 2021

ofborg bot added 10.rebuild-darwin: 0 10.rebuild-linux: 1-10 labels Apr 23, 2021

lukegb merged commit fe6c229 into NixOS:release-20.09 Apr 23, 2021

lukegb deleted the debug-release-2009 branch April 23, 2021 20:52
