test_ocr tests fails #8931

Closed
2 tasks done
mcepl opened this issue Mar 15, 2023 · 14 comments
Labels
enhancement Adding or requesting a new feature.

@mcepl

mcepl commented Mar 15, 2023

Describe the issue

When packaging Weblate 4.14.2 for openSUSE, the test_ocr test fails almost always (more often on the build server than on my workstation, but close to 100% in both places):

[ 9019s] ======================================================================
[ 9019s] FAIL: test_ocr (weblate.screenshots.tests.ViewTest)
[ 9019s] ----------------------------------------------------------------------
[ 9019s] Traceback (most recent call last):
[ 9019s]   File "/home/abuild/rpmbuild/BUILD/Weblate-4.14.2/weblate/screenshots/tests.py", line 170, in test_ocr
[ 9019s]     self.assertIn('<a class="add-string', data["results"])
[ 9019s] AssertionError: '<a class="add-string' not found in '\n\n\n\n\n\n\n\n\n\n\n\n<table class="table table-condensed">\n  <thead>\n    <tr>\n      <th class="unit-state-cell"></th>\n      \n      \n      \n      \n      \n        \n          <th>\n        \n          English\n        \n        </th>\n      \n      \n        <th>Location</th>\n      \n      \n        <th>Assigned screenshots</th>\n      \n      \n        <th>Actions</th>\n      \n    </tr>\n  </thead>\n  \n    <tbody id="loading-screenshots" style="display:none;">\n      <tr><td colspan="5"><span class="icon-spin"  ><svg width="24" height="24" version="1.1" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"><path d="M12,4V2A10,10 0 0,0 2,12H4A8,8 0 0,1 12,4Z"/></svg></span> Loading results…</td></tr>\n    </tbody>\n  \n  <tbody class="unit-listing-body">\n\n    \n      <tr class="warning">\n        <td colspan="5"><em>No matching strings found.</em></td>\n      </tr>\n    \n\n  </tbody>\n</table>\n\n'
[ 9019s] 

Complete build log with the list of all packages used and all steps taken to reproduce.

I already tried

  • I've read and searched the documentation.
  • I've searched for similar issues in this repository.

Steps to reproduce the behavior

  1. Run the test suite (see the attached log)

Expected behavior

All tests passing

Screenshots

test_ocr fails almost always

How do you run Weblate?

Other

Weblate versions

No response

Weblate deploy checks

No response

Additional context

No response

@nijel
Member

nijel commented Mar 17, 2023

Do you have tesseract data installed on the test system? Maybe https://forums.opensuse.org/t/tesseract-ocr-wrong-data-directory/164659/9 is related?

nijel added the question label (This is more a question for the support than an issue.) Mar 17, 2023
@github-actions

This issue looks more like a support question than an issue. We strive to answer these reasonably fast, but purchasing the support subscription is not only more responsible and faster for your business but also makes Weblate stronger.

In case your question is already answered, making a donation is the right way to say thank you!

@mcepl
Author

mcepl commented Mar 17, 2023

> Do you have tesseract data installed on the test system? Maybe https://forums.opensuse.org/t/tesseract-ocr-wrong-data-directory/164659/9 is related?

Hmm, it is in /usr/share/tessdata/eng.traineddata; that should be correct, shouldn’t it?
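A quick way to see which data file tesseract can actually find is to look in the same places it does. The Python sketch below checks the directory named by the TESSDATA_PREFIX environment variable (tried both as the tessdata directory itself and as its parent since, as far as I know, the convention differs between tesseract versions) and then a list of fallback directories. Only /usr/share/tessdata comes from this thread; the other paths and the helper name find_traineddata are assumptions, and tesseract's real search order also depends on how it was compiled.

```python
import os
from pathlib import Path
from typing import Iterable, List, Mapping, Optional

# Assumed fallback locations; only /usr/share/tessdata is taken from this issue.
DEFAULT_DIRS = (
    "/usr/share/tessdata",
    "/usr/share/tesseract-ocr/5/tessdata",
    "/usr/share/tesseract-ocr/4.00/tessdata",
)


def find_traineddata(
    lang: str = "eng",
    env: Optional[Mapping[str, str]] = None,
    candidates: Iterable[str] = DEFAULT_DIRS,
) -> Optional[Path]:
    """Return the first existing <dir>/<lang>.traineddata, or None.

    TESSDATA_PREFIX is checked first, both as the tessdata directory
    and as its parent, before the fallback candidate directories.
    """
    env = os.environ if env is None else env
    dirs: List[Path] = []
    prefix = env.get("TESSDATA_PREFIX")
    if prefix:
        dirs += [Path(prefix), Path(prefix) / "tessdata"]
    dirs += [Path(d) for d in candidates]
    for directory in dirs:
        path = directory / f"{lang}.traineddata"
        if path.is_file():
            return path
    return None
```

Calling find_traineddata() without arguments uses the real environment; a None result would mean none of the assumed locations contains eng.traineddata.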

@github-actions

This issue has been automatically marked as stale because there wasn’t any recent activity.

It will be closed soon if no further action occurs.

Thank you for your contributions!

github-actions bot added the wontfix label (Nobody will work on this.) Mar 28, 2023
@mcepl
Author

mcepl commented Mar 28, 2023

This is not a support question! Does failure of your test suite not matter at all?

github-actions bot removed the wontfix label (Nobody will work on this.) Mar 29, 2023
@nijel
Member

nijel commented Mar 30, 2023

I think tesseract is not working on your system; that's why I've tagged it this way.

@mcepl
Author

mcepl commented Mar 30, 2023

What do you mean by “tesseract not working” inside our build system?

@nijel
Member

nijel commented Mar 30, 2023

It's not recognizing the text. It can be caused by wrongly installed data files, missing dependency on the tesseract data, or something else.

This test has been there for years (introduced in df4a52a); it worked on SUSE before, and it works everywhere else so far. All that makes me think the issue is in your environment, not in Weblate.

@mcepl
Author

mcepl commented Mar 30, 2023

Is there some command-line command or something similar that could show me in more detail what tesseract thinks is wrong with the setup?

@nijel
Member

nijel commented Mar 30, 2023

Running “tesseract weblate/trans/tests/data/screenshot.png -” in the Weblate source tree should produce something like:

$ tesseract weblate/trans/tests/data/screenshot.png  -
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 133
Source string

Hello, world!+

One
Orangutan has %d banana.«!

Other
Orangutan has %d bananas.

Try Weblate at <http://demo.weblate.org/>!«!

Thank you for using Weblate.

Screenshot is shown to add visual context for all listed source strings.
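The same command can be run from Python, for example as an extra diagnostic step in a package build. A minimal sketch, assuming the tesseract binary is on PATH; ocr_image is a hypothetical helper. Only stdout is returned, because with "-" as the output base the recognized text goes there, while diagnostics such as "Estimating resolution as 133" are, as far as I can tell, printed to stderr.

```python
import shutil
import subprocess
from typing import Optional


def ocr_image(image: str, tesseract: str = "tesseract") -> Optional[str]:
    """Run `tesseract <image> -` and return the recognized text.

    Returns None when the binary cannot be found on PATH. stderr is
    captured separately so warnings do not mix with the recognized text;
    a non-zero exit status raises subprocess.CalledProcessError.
    """
    binary = shutil.which(tesseract)
    if binary is None:
        return None
    result = subprocess.run(
        [binary, image, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Run from the Weblate source tree, ocr_image("weblate/trans/tests/data/screenshot.png") should return text similar to the output above.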

 

@github-actions

This issue has been automatically marked as stale because there wasn’t any recent activity.

It will be closed soon if no further action occurs.

Thank you for your contributions!

github-actions bot added the wontfix label (Nobody will work on this.) Apr 10, 2023
@mcepl
Author

mcepl commented Apr 12, 2023

That’s exactly (I believe) what I’ve got:

[   63s] + tesseract weblate/trans/tests/data/screenshot.png -
[   63s] Estimating resolution as 133
[   63s] Source string
[   63s] 
[   63s] Hello, world!+
[   63s] 
[   63s] One
[   63s] Orangutan has %d banana.<!
[   63s] 
[   63s] Other
[   63s] Orangutan has %d bananas.
[   63s] 
[   63s] Try Weblate at <http://demo.weblate.org/>!«!
[   63s] 
[   63s] Thank you for using Weblate.
[   63s] 
[   63s] Screenshot is shown to add visual context for all listed source strings.
[   63s] 

Complete build log

Actually, not exactly: Orangutan has %d banana.«! became Orangutan has %d banana.<! (a guillemet turning into less-than).
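To confirm that the two outputs differ in just this one character, the lines can be compared with Python's standard difflib module. This is only a sketch; the helper name char_diff is hypothetical.

```python
import difflib
from typing import List, Tuple


def char_diff(expected: str, actual: str) -> List[Tuple[str, str, str, int]]:
    """List non-matching spans between two strings as (op, old, new, index)."""
    matcher = difflib.SequenceMatcher(None, expected, actual)
    return [
        (tag, expected[i1:i2], actual[j1:j2], i1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


expected = "Orangutan has %d banana.«!"  # line from nijel's output
actual = "Orangutan has %d banana.<!"    # same line from the openSUSE build log
print(char_diff(expected, actual))  # [('replace', '«', '<', 24)]
print(round(difflib.SequenceMatcher(None, expected, actual).ratio(), 3))  # 0.962
```

A single substitution in a 26-character line still gives a similarity of about 0.96.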

github-actions bot removed the wontfix label (Nobody will work on this.) Apr 13, 2023
nijel added the enhancement label (Adding or requesting a new feature.) and removed the question label (This is more a question for the support than an issue.) Apr 18, 2023
nijel self-assigned this Apr 18, 2023
nijel added this to the 4.18 milestone Apr 18, 2023
@nijel
Member

nijel commented Apr 18, 2023

A few different characters should not matter. Anyway, this test does not really show where the problem is, so I will add a separate test for the OCR recognition itself to make the error easier to diagnose.
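A standalone OCR check along those lines might look like the sketch below. It is not the test that was actually added in c551d1a; the helper names, the 0.9 similarity threshold and the list of expected strings (copied from the tesseract output earlier in this thread) are assumptions. Each expected string passes when some OCR output line is similar enough, so a guillemet read as "<" does not fail the check, while an empty or unrelated OCR result reports every string as missing.

```python
import difflib
from typing import Iterable, List

# Strings visible in weblate/trans/tests/data/screenshot.png (from this thread).
EXPECTED = [
    "Hello, world!",
    "Orangutan has %d banana.",
    "Orangutan has %d bananas.",
    "Thank you for using Weblate.",
]


def best_ratio(needle: str, lines: Iterable[str]) -> float:
    """Highest similarity between `needle` and any single OCR output line."""
    return max(
        (difflib.SequenceMatcher(None, needle, line.strip()).ratio() for line in lines),
        default=0.0,
    )


def missing_strings(
    ocr_text: str, expected: Iterable[str] = EXPECTED, threshold: float = 0.9
) -> List[str]:
    """Return the expected strings that no OCR line matches closely enough."""
    lines = [line for line in ocr_text.splitlines() if line.strip()]
    return [s for s in expected if best_ratio(s, lines) < threshold]
```

In a test suite this could be wrapped in a unittest.TestCase that is skipped when shutil.which("tesseract") returns None, so a failure points directly at OCR rather than at the screenshot view.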

nijel closed this as completed in c551d1a Apr 18, 2023
@github-actions

Thank you for your report; the issue you have reported has just been fixed.

  • In case you see a problem with the fix, please comment on this issue.
  • In case you see a similar problem, please open a separate issue.
  • If you are happy with the outcome, don’t hesitate to support Weblate by making a donation.

nijel modified the milestones: 4.18, 5.0 Jul 21, 2023
nijel added a commit to nijel/weblate that referenced this issue Jul 21, 2023
- use different resolutions instead of scaling image
- load images directly by tesseract
- use iterator instead of manually doing recognition
- measure performance using Sentry
- improves compatibility with tesseract 5.x, fixes WeblateOrg#8931
nijel added a commit that referenced this issue Jul 21, 2023
- use different resolutions instead of scaling image
- load images directly by tesseract
- use iterator instead of manually doing recognition
- measure performance using Sentry
- improves compatibility with tesseract 5.x, fixes #8931
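The first bullet of the commit (trying several resolutions instead of scaling the image) can be illustrated with an engine-agnostic sketch. This is not the code from the commit: the recognize callable stands in for whatever OCR call is used (for example, tesserocr's PyTessBaseAPI.SetSourceResolution followed by recognition), and it, the helper name ocr_at_resolutions and the list of resolutions are hypothetical.

```python
from typing import Callable, Iterable, List

# Hypothetical: recognize(image_path, ppi) -> list of recognized text lines.
Recognizer = Callable[[str, int], List[str]]


def ocr_at_resolutions(
    image: str, recognize: Recognizer, resolutions: Iterable[int] = (70, 150, 300)
) -> List[str]:
    """Run OCR once per assumed source resolution and merge the results.

    Lines are stripped and de-duplicated, keeping the order in which they
    were first seen, so text found at any resolution is reported once.
    """
    seen = set()
    merged: List[str] = []
    for ppi in resolutions:
        for line in recognize(image, ppi):
            text = line.strip()
            if text and text not in seen:
                seen.add(text)
                merged.append(text)
    return merged
```

Passing the recognizer in as a parameter keeps the merging logic testable without tesseract installed.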