Use text instead of OCR to get live transcribe content #9

alexschneider · 2021-11-23T04:39:51Z

I noticed OCR was particularly buggy when trying to use this service listening to podcasts. I also noticed that you can actually get the text content of the live transcribe window using the accessibility service, so I just did that and took the last 10 or so words. That seemed to be enough/not too much in some limited testing to keep up with scrolling but there's probably some more tuning that could be done.
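The "last 10 or so words" step can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name `CaptionTail`, the method `lastWords`, and the word count of 10 are assumptions, and the caption is passed in as a plain string because reading it from the Live Caption window through the accessibility service is not shown here.

```java
import java.util.Arrays;

// Hypothetical helper; the class and method names are illustrative, not taken from the project.
final class CaptionTail {
    // Returns the last maxWords whitespace-separated words of caption, joined by single spaces.
    static String lastWords(String caption, int maxWords) {
        if (caption == null || maxWords <= 0) {
            return "";
        }
        String trimmed = caption.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        String[] words = trimmed.split("\\s+");
        int from = Math.max(0, words.length - maxWords);
        return String.join(" ", Arrays.copyOfRange(words, from, words.length));
    }

    public static void main(String[] args) {
        // Stand-in for text read from the caption window.
        String caption = "so I just did that and took the last ten or so words of the caption";
        System.out.println(lastWords(caption, 10));
    }
}
```

A smaller word count follows the audio more closely but gives the matcher less context, which is presumably the tuning trade-off mentioned above.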

This will partially fix #8 too (at least as far as it can be fixed from this service's side - live caption still has to support the languages) since we're not depending on OCR anymore.

KyleFin · 2021-11-24T05:52:38Z

This is fantastic! Thanks @alexschneider! This is exactly what I was hoping for, but I failed to find how to do it.

I rewrote the same change in #11 (please review and add any comments if you want). The main things I changed:

Truncate caption text before sending it to scrollToText. (I was amazed how long the text is allowed to get! This is awesome for finding a good match, but we don't need to split then discard those huge strings.)
More thorough removal of OCR (gradle / accessibility_service_config.xml / AndroidManifest)
Added comments about numCaptionCharsToLookAt and captionViewScrollsThreshold.
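As a rough sketch of the truncation idea in the first item: keep only the tail of the caption before any splitting or matching happens. Only the name `numCaptionCharsToLookAt` comes from the comment above; the class `CaptionTruncator`, the method `truncateTail`, the value 40, and the choice to drop a leading partial word are assumptions made for illustration, and #11 may do this differently.

```java
// Hypothetical helper; not the implementation from #11.
final class CaptionTruncator {
    // Keeps at most the last numCaptionCharsToLookAt characters of caption.
    // If the cut lands in the middle of a word, that leading fragment is dropped.
    static String truncateTail(String caption, int numCaptionCharsToLookAt) {
        if (caption == null || numCaptionCharsToLookAt <= 0) {
            return "";
        }
        if (caption.length() <= numCaptionCharsToLookAt) {
            return caption.trim();
        }
        int start = caption.length() - numCaptionCharsToLookAt;
        String tail = caption.substring(start);
        boolean cutInsideWord = !Character.isWhitespace(caption.charAt(start - 1))
                && !Character.isWhitespace(caption.charAt(start));
        if (cutInsideWord) {
            // Skip ahead to the first whitespace so the tail starts on a whole word.
            for (int i = 0; i < tail.length(); i++) {
                if (Character.isWhitespace(tail.charAt(i))) {
                    tail = tail.substring(i);
                    break;
                }
            }
        }
        return tail.trim();
    }

    public static void main(String[] args) {
        int numCaptionCharsToLookAt = 40; // assumed value, for illustration only
        String caption = "a very long running caption that keeps growing while the podcast plays";
        System.out.println(truncateTail(caption, numCaptionCharsToLookAt));
    }
}
```

Truncating first means the later word split only ever sees a short string, regardless of how long the caption text has grown.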

Out of curiosity, could you describe the issues you saw with OCR and podcasts? I haven't noticed any OCR issues in my testing.

I wasn't expecting anyone to find/use this yet, especially since I haven't added any instructions. Thanks for checking it out and for the great contribution! I'm happy to hear any other feedback about your experience or suggestions. Thanks!

alexschneider · 2021-11-24T16:02:37Z

Hey Kyle,

Thanks for the improvement. I'll review it separately on your PR.

The main issue I found is that the longest word often tended to be a mistranscription with OCR - it was garbled with the background text. Perhaps it's because my Live Caption window is set up to be semitransparent. I didn't really debug it because I quickly found out that you can just get the text straight from the Live Caption window, and that's a better solution than doing OCR. If you're really interested I could try replicating the issue, but it'd probably have to wait until after Thanksgiving.

I definitely appreciate you writing this as it's something I've been meaning to write myself for years (in various iterations - the last one was a python webview utilizing Aeneas - a long audio aligner) but never got very far. As a deaf person I do really enjoy audio content but I need to pair it with a transcript in order to enjoy it fully, so something like this makes listening to podcasts much less tedious 😄.

The main feedback or suggestions I have are:

Highlighting current position would be nice (already mentioned in Improve user notifications #4 I think)
Figure out a way to hide the live caption window. Not even sure if this is technically possible
Figure out a way to replicate the functionality offline - probably just involves finding an app that can save websites and show them in a compatible web view (see Communicate to users which apps are supported #6)

KyleFin · 2021-11-29T23:13:09Z

Ah transparent background makes sense. I only tested with a black background.

Great suggestions! I added a note about hiding the live caption window and totally agree about highlighting current position. I was able to download a podcast webpage in Chrome and have it scroll offline.

I'm glad you find this project useful too! Thanks!

See alexschneider's #9

Use text intsead of OCR to get live transcribe content

8c3e702

alexschneider force-pushed the alexschneider/use-text-instead-of-ocr branch from 11076b1 to 8c3e702 Compare November 23, 2021 04:45

alexschneider changed the title ~~Use text intsead of OCR to get live transcribe content~~ Use text instead of OCR to get live transcribe content Nov 23, 2021

Remove OCR reference

1dab4f9

alexschneider force-pushed the alexschneider/use-text-instead-of-ocr branch from e4de345 to 1dab4f9 Compare November 23, 2021 05:25

KyleFin added a commit that referenced this pull request Nov 24, 2021

Remove OCR (HUGE thanks to alexschneider's #9 !)

6dc7702

alexschneider closed this Nov 24, 2021

KyleFin mentioned this pull request Nov 29, 2021

Improve user notifications #4

Open

KyleFin mentioned this pull request Nov 29, 2021

Remove OCR (HUGE thanks to alexschneider!) #11

Merged

KyleFin added a commit that referenced this pull request Nov 29, 2021

Remove OCR (HUGE thanks to alexschneider!) (#11)

fd8a21a

See alexschneider's #9
