Forvo scraping is dead (long live forvo?) aka: No results found #29

Open
Rascalov opened this issue Dec 17, 2023 · 0 comments
Labels
enhancement (New feature or request), I have little time

Rascalov (Owner) commented Dec 17, 2023

Before you read

Things seem to have changed for the better since I wrote this post.
The bot check that prevents the extension from working does not always occur.
When I visited the site again in my regular browser, the check did not appear and the extension worked without problems.

What happened?

As explained in #28 (comment), Forvo has made it very difficult for a typical Anki extension to fetch audio reliably. You may notice a brief Cloudflare check when you visit forvo.com.

That check passes without much hassle in your fully functional browser, but the scraper is a simple program that cannot handle the dynamic nature of the browser check. Even if it could, it would not be able to automatically solve the CAPTCHA that may appear when Cloudflare suspects a bot is trying to access the site.

Any way to still get audios?

The extension does support another website: lingualibre.org
In the add-on's config in Anki, you can set this value to true:
[image: the config value to set to true]

Disclaimer

LinguaLibre is far from a replacement for Forvo; it has a very limited selection of words (at least in Russian). Results may vary by language.

What now?

The only real path forward is to implement something that was recommended to me some time ago in #5: a dictionary of audio files (scraped from Forvo) that can be queried for audio lookups. The extension would then no longer use Forvo as a direct source of audio and would instead rely on an audio dictionary downloaded by each user.
Bundling all of these audio dictionaries with the extension is impractical, since the audio for a single language alone can take up several gigabytes.
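As a rough illustration of what a local audio-dictionary lookup could look like (Anki add-ons are written in Python), here is a minimal sketch. It assumes a hypothetical layout where the downloaded dump has been unpacked into one folder per language with one file per word (`<root>/<language>/<word>.mp3`, plus optional `<word>_2.mp3` variants). The real archive format may differ, and `build_index`, `lookup`, and the `_N` variant suffix are illustrative names only, not part of the extension.

```python
from pathlib import Path
import unicodedata

# Hypothetical: audio file extensions we expect to find in the unpacked dump.
AUDIO_EXTENSIONS = (".mp3", ".ogg", ".opus", ".wav")


def normalize(word: str) -> str:
    """Normalize a word so lookups ignore case and Unicode composition form."""
    return unicodedata.normalize("NFC", word.strip()).casefold()


def build_index(root: Path, language: str) -> dict[str, list[Path]]:
    """Scan <root>/<language>/ once and map each normalized word to its audio files.

    Assumes the hypothetical layout <root>/<language>/<word>.<ext>, where
    extra pronunciations are stored as <word>_2.<ext>, <word>_3.<ext>, ...
    """
    index: dict[str, list[Path]] = {}
    lang_dir = root / language
    if not lang_dir.is_dir():
        return index  # language not downloaded: empty index
    for path in sorted(lang_dir.iterdir()):
        if path.suffix.lower() not in AUDIO_EXTENSIONS:
            continue  # skip non-audio files such as readmes
        stem = path.stem
        # Strip a trailing "_N" (digits only) used for extra pronunciations.
        base, sep, tail = stem.rpartition("_")
        word = base if sep and tail.isdigit() else stem
        index.setdefault(normalize(word), []).append(path)
    return index


def lookup(index: dict[str, list[Path]], word: str) -> list[Path]:
    """Return all audio files for a word, or an empty list if none exist."""
    return index.get(normalize(word), [])
```

Building the index once (for example, when the add-on loads) and keeping it in memory avoids rescanning a multi-gigabyte folder on every lookup.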

Will you actually do it?

The thing is, this is basically a rewrite of the entire extension, apart from the user interface.
On top of that, it forces users to take an extra step: downloading the dump of Forvo audio from (for example) this archive:
https://cloud.freemdict.com/index.php/s/pgKcDcbSDTCzXCs

Another major factor is time. I have not been around to fix much of anything in this project for a while. I try to respond to most issues out of courtesy (and a sort of guilt), but that has been the extent of my participation.

I don't plan to kill the project, but I probably won't continue the Forvo route when I eventually have time to work on it.
