Use text instead of OCR to get live transcribe content #9

@alexschneider alexschneider commented Nov 23, 2021

I noticed OCR was particularly buggy when trying to use this service listening to podcasts. I also noticed that you can actually get the text content of the live transcribe window using the accessibility service, so I just did that and took the last 10 or so words. That seemed to be enough/not too much in some limited testing to keep up with scrolling but there's probably some more tuning that could be done.
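For reference, a minimal sketch of the "last 10 or so words" step, written in plain Java so it runs without the Android SDK. It assumes the caption text has already been read from the accessibility node (for example as the `CharSequence` returned by `AccessibilityNodeInfo.getText()`); the class name `CaptionTail`, the method `lastWords`, and the sample caption are hypothetical and not taken from this PR.

```java
import java.util.Arrays;

// Hypothetical helper, not the code in this PR: keeps only the most recent
// words of a caption so the search key follows what was just spoken.
final class CaptionTail {

    /** Returns the last maxWords whitespace-separated words of text, joined by single spaces. */
    static String lastWords(CharSequence text, int maxWords) {
        if (text == null || maxWords <= 0) {
            return "";
        }
        String trimmed = text.toString().trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        String[] words = trimmed.split("\\s+");
        // Start index of the tail; 0 when the caption has fewer than maxWords words.
        int from = Math.max(0, words.length - maxWords);
        return String.join(" ", Arrays.copyOfRange(words, from, words.length));
    }

    public static void main(String[] args) {
        // Sample caption text (made up for illustration).
        String caption = "welcome back to the show today we are talking about accessibility services on Android";
        System.out.println(lastWords(caption, 10));
    }
}
```

The word count of 10 is the figure mentioned above; it is a tuning parameter, so it is passed in rather than hard-coded.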

This will partially fix #8 too (at least as far as it can be fixed from this service's side - live caption still has to support the languages) since we're not depending on OCR anymore.

@alexschneider alexschneider force-pushed the alexschneider/use-text-instead-of-ocr branch from 11076b1 to 8c3e702 Compare November 23, 2021 04:45
@alexschneider alexschneider changed the title Use text intsead of OCR to get live transcribe content Use text instead of OCR to get live transcribe content Nov 23, 2021
@alexschneider alexschneider force-pushed the alexschneider/use-text-instead-of-ocr branch from e4de345 to 1dab4f9 Compare November 23, 2021 05:25
KyleFin added a commit that referenced this pull request Nov 24, 2021
@KyleFin
Owner

KyleFin commented Nov 24, 2021

This is fantastic! Thanks @alexschneider! This is exactly what I was hoping for, but I failed to find how to do it.

I rewrote the same change in #11 (please review and add any comments if you want). The main things I changed:

  • Truncate caption text before sending it to scrollToText. (I was amazed how long the text is allowed to get! This is awesome for finding a good match, but we don't need to split and then discard those huge strings.)
  • More thorough removal of OCR (gradle / accessibility_service_config.xml / AndroidManifest)
  • Added comments about numCaptionCharsToLookAt and captionViewScrollsThreshold.
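
A rough sketch of the truncation idea from the first bullet, in plain Java. It is not the code from #11: the parameter name `numCaptionCharsToLookAt` is borrowed from the comment above, while the class `CaptionTruncate`, the method `tailChars`, the choice to drop a partially cut leading word, and the value 20 in `main` are all assumptions made for illustration.

```java
// Hypothetical helper: cut a long caption down to its last N characters
// before any splitting or matching is done on it.
final class CaptionTruncate {

    /** Returns roughly the last numCaptionCharsToLookAt characters of caption, trimmed. */
    static String tailChars(CharSequence caption, int numCaptionCharsToLookAt) {
        if (caption == null || numCaptionCharsToLookAt <= 0) {
            return "";
        }
        int len = caption.length();
        int start = Math.max(0, len - numCaptionCharsToLookAt);
        String tail = caption.subSequence(start, len).toString();

        // If the cut landed in the middle of a word, drop that partial word
        // (assumption: a half word is not useful as a search key).
        boolean cutMidWord = start > 0 && !Character.isWhitespace(caption.charAt(start - 1));
        if (cutMidWord) {
            int firstSpace = indexOfWhitespace(tail);
            if (firstSpace >= 0) {
                tail = tail.substring(firstSpace);
            }
        }
        return tail.trim();
    }

    /** Index of the first whitespace character in s, or -1 if there is none. */
    private static int indexOfWhitespace(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isWhitespace(s.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String caption = "a very long caption that keeps growing while the podcast plays";
        System.out.println(tailChars(caption, 20));
    }
}
```

Truncating first keeps the later split cheap: only the short tail is split into words, instead of splitting the whole caption and discarding most of the result.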

Out of curiosity, could you describe the issues you saw with OCR and podcasts? I haven't noticed any OCR issues in my testing.

I wasn't expecting anyone to find/use this yet, especially since I haven't added any instructions. Thanks for checking it out and for the great contribution! I'm happy to hear any other feedback about your experience or suggestions. Thanks!

@alexschneider
Author

Hey Kyle,

Thanks for the improvement. I'll review it separately on your PR.

The main issue I found is that the longest word often tended to be a mistranscription with OCR - it was garbled with the background text. Perhaps it's because my Live Caption window is set up to be semitransparent. I didn't really debug it because I quickly found out that you can just get the text straight from the live caption window, and that's a better solution than doing OCR. If you're really interested I could try replicating the issue, but it'd probably have to wait until after Thanksgiving.

I definitely appreciate you writing this, as it's something I've been meaning to write myself for years (in various iterations - the last one was a Python webview utilizing Aeneas - a long audio aligner) but never got very far. As a deaf person I do really enjoy audio content, but I need to pair it with a transcript in order to enjoy it fully, so something like this makes listening to podcasts much less tedious 😄.

The main feedback or suggestions I have are:

  • Highlighting current position would be nice (already mentioned in #4, "Improve user notifications", I think)
  • Figure out a way to hide the live caption window. Not even sure if this is technically possible
  • Figure out a way to replicate the functionality offline - probably just involves finding an app that can save websites and show them in a compatible web view (see #6, "Communicate to users which apps are supported")

@KyleFin
Owner

KyleFin commented Nov 29, 2021

Ah, a transparent background makes sense. I only tested with a black background.

Great suggestions! I added a note about hiding the live caption window and totally agree about highlighting current position. I was able to download a podcast webpage in Chrome and have it scroll offline.

I'm glad you find this project useful too! Thanks!

KyleFin added a commit that referenced this pull request Nov 29, 2021
Development

Successfully merging this pull request may close these issues.

Support for non-English languages