
Issue parsing inverted (white on black) text #112

Open
nhoffman opened this issue May 24, 2024 · 2 comments

@nhoffman

Hi there - I'm looking into parsing laboratory test results (unfortunately, results are often received as PDFs), and performance seems great except in one very specific context: a report I'm looking at contains a critical element rendered as white text on a black background. In that case the text is either not detected or read incorrectly. I'm limited in what I can share, so this is lacking context, but here is an example of failure to detect text:

(screenshot)

Incorrect results:

(screenshots)

Any suggestions on settings or pre-processing strategies that might help?

Thanks a lot!

@VikParuchuri
Owner

This is a really interesting edge case. I think the challenge is the "mostly regular text with some inverted" layout. Some ideas:

  • Fine-tune the text detection model with negative examples.
  • Flood fill (I think it's called flood fill) from a corner with black (which will leave just the number 36 white), then invert the colors and run OCR. Separately, OCR the normal page. Merge the two results by blanking out any regions in the normal page where the inverted page has text.
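The flood-fill preprocessing step above can be sketched roughly as follows. This is a minimal, self-contained illustration of the idea (not code from this project): a toy 4-connected flood fill over a numpy grayscale array, assuming white ≈ 255 and black ≈ 0. In practice you'd use something like OpenCV's `cv2.floodFill` or Pillow's `ImageDraw.floodfill` on the real page image, and the OCR passes themselves are left out here.

```python
import numpy as np
from collections import deque

def flood_fill(img, seed, new_val):
    """Fill the 4-connected region containing `seed` with `new_val` (in place)."""
    h, w = img.shape
    old_val = img[seed]
    if old_val == new_val:
        return img
    q = deque([seed])
    img[seed] = new_val
    while q:
        y, x = q.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and img[ny, nx] == old_val:
                img[ny, nx] = new_val
                q.append((ny, nx))
    return img

# Toy grayscale "page": white background with a black box that contains
# a single white pixel standing in for an inverted glyph.
page = np.full((10, 10), 255, dtype=np.uint8)
page[2:8, 2:8] = 0       # black box (the inverted region)
page[4, 4] = 255         # white "glyph" pixel inside the box

inv = flood_fill(page.copy(), (0, 0), 0)  # paint the white background black
inv = 255 - inv                           # invert: only the former white glyph is dark

# `inv` is now a normal dark-on-light image of the inverted region, ready for
# a second OCR pass; merge its detections with OCR of the original `page`.
```

The flood fill only reaches background pixels connected to the corner, so white text enclosed by the black box is untouched; after inversion it becomes ordinary dark-on-light text while everything else (including the regular text, which the fill painted black) washes out to white.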

@nhoffman
Author

Thanks a lot for the suggestions - I'd love to give the fine-tuning approach a shot, but I'm not sure where to start. I know it's a big topic, but can you suggest: a) a general resource describing how to fine-tune the text detection model (e.g., an overview of the process, how many examples you think might be sufficient, and whether I should provide examples cropped to the white-on-black text or examples in context); and b) in the context of this project, where the model is specified (I assume it downloads a model from Hugging Face, but I can't seem to find where this configuration lives) and how to update the configuration to refer to the fine-tuned model? I'd certainly be happy to document the process for anyone else needing something similar.

Thanks a lot for any help!
