Use text instead of OCR to get live transcribe content #9
This is fantastic! Thanks @alexschneider! This is exactly what I was hoping for, but I couldn't figure out how to do it. I rewrote the same change in #11 (please review it and add any comments if you'd like). The main things I changed:
Out of curiosity, could you describe the issues you saw with OCR and podcasts? I haven't noticed any OCR issues in my testing. I wasn't expecting anyone to find/use this yet, especially since I haven't added any instructions. Thanks for checking it out and for the great contribution! I'm happy to hear any other feedback about your experience or suggestions. Thanks!
Hey Kyle, thanks for the improvement. I'll review it separately on your PR. The main issue I found is that the longest word was often an OCR mistranscription - it was garbled with the background text. Perhaps it's because my Live Caption window is set up to be semitransparent. I didn't really debug it because I quickly found out that you can get the text straight from the Live Caption window, and that's a better solution than doing OCR. If you're really interested, I could try to replicate the issue, but it'd probably have to wait until after Thanksgiving. I definitely appreciate you writing this, as it's something I've been meaning to write myself for years (in various iterations - the last one was a Python webview utilizing Aeneas, a long audio aligner) but never got very far. As a deaf person I do really enjoy audio content, but I need to pair it with a transcript in order to enjoy it fully, so something like this makes listening to podcasts much less tedious 😄. The main feedback or suggestions I have are:
Ah, a transparent background makes sense. I only tested with a black background. Great suggestions! I added a note about hiding the Live Caption window and totally agree about highlighting the current position. I was able to download a podcast webpage in Chrome and have it scroll offline. I'm glad you find this project useful too! Thanks!
I noticed OCR was particularly buggy when trying to use this service listening to podcasts. I also noticed that you can actually get the text content of the live transcribe window using the accessibility service, so I just did that and took the last 10 or so words. That seemed to be enough/not too much in some limited testing to keep up with scrolling but there's probably some more tuning that could be done.
This will partially fix #8 too (at least as far as it can be fixed from this service's side - live caption still has to support the languages) since we're not depending on OCR anymore.
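To illustrate the "last 10 or so words" idea from the description, here is a minimal sketch in Python. It is not the code from this PR: the function name `last_words` and the `count` parameter are hypothetical, and it assumes the caption text has already been read from the window as a plain string.

```python
def last_words(caption_text: str, count: int = 10) -> str:
    """Return the last `count` whitespace-separated words of the caption text.

    `caption_text` stands in for whatever string was read from the caption
    window; how that string is obtained is outside this sketch.
    """
    if count <= 0:
        # Guard: words[-0:] would return every word, not none.
        return ""
    words = caption_text.split()  # splits on any run of whitespace
    return " ".join(words[-count:])


if __name__ == "__main__":
    # Hypothetical caption text, for illustration only.
    print(last_words("the quick brown fox jumps over the lazy dog", 3))
    # prints "the lazy dog"
```

A smaller `count` reacts faster but is more likely to match the wrong place in a transcript; a larger one is more specific but lags behind the audio, which is presumably the tuning trade-off mentioned above.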