Hitting CAPTCHA preventing paid fanclub discovery #107

Open
Aaeeschylus opened this issue Feb 20, 2023 · 7 comments
Labels
enhancement New feature or request

Comments

@Aaeeschylus

Aaeeschylus commented Feb 20, 2023

I'm opening a new issue for this because it differs somewhat from my previous issue (#103), which I can't reopen.

When running fantiadl_v1.8.3.exe -c cookies.txt -p -t -r -m or fantiadl.py -c cookies.txt -p -t -r -m, I get this output:

Collecting paid fanclubs...
Collected 0 fanclubs.

When I print response_page in models.py, the response is a CAPTCHA page instead of the expected page with fanclub links:
responsePageOutput.txt

Sadly, waiting a couple of weeks in the hope that it would resolve itself did not help. Oddly, I can access all of Fantia in both Chrome and Firefox and have never seen the CAPTCHA page there; only fantiadl ever hits it. Even when I open the exact URL that triggers the CAPTCHA (https://fantia.jp/mypage/users/plans?type=not_free&page={1}) in a browser, I am not shown the CAPTCHA.

@Coffeelatte369

I second this; I have the same issue.

@bitbybyte
Owner

It appears to be reCAPTCHA v3, and we can see what the page actually does by loading https://fantia.jp/recaptcha on demand. I know yt-dlp and others have in the past found a way to take the <iframe> source, which lets you paste the URL into a browser, solve the CAPTCHA, and then copy the response hash back to the command line. That might not be necessary, depending on the CAPTCHA score threshold Fantia has configured, since in most cases v3 presents no challenge: https://developers.google.com/recaptcha/docs/v3

reCAPTCHA v3 returns a score (1.0 is very likely a good interaction, 0.0 is very likely a bot). Based on the score, you can take variable action in the context of your site.

The page presents a button that kicks off a set_recaptcha_response() call, which retrieves a response token from POST https://www.recaptcha.net/recaptcha/api2/reload?k=6LfMBeEUAAAAAM0aMGySYnrhwQAx0tB-9Y1Tu_R1. Basically, we submit the form #recaptcha_verify (to https://fantia.jp/recaptcha/verify) after setting recaptcha_site_key (as seen in the API call) and recaptcha_response (the token from the API response):

function set_recaptcha_response(e){window.event.preventDefault(),grecaptcha.ready(function(){var t=document.getElementById("recaptcha_site_key").value;grecaptcha.execute(t,{action:"contact"}).then(function(t){var i=document.getElementById("recaptchaResponse");i.value=t;var n="#"+e;$(n).unbind("submit").submit()})})}

I presume their backend then calls https://www.recaptcha.net/recaptcha/api/siteverify and decides how to proceed based on their score threshold. For me, a POST to https://fantia.jp/recaptcha/verify with the requisite form data returns 302 and redirects to the homepage.

To be frank, I'm not too keen to handle this at the moment, because it only prevents scraping the paid plans page and hasn't affected manual downloading yet, but this is at least a probable path forward. If our verification score is too low for their threshold, we'll probably have to fall back to the <iframe> method I mentioned above. Whatever /recaptcha/verify responds with would probably make that fairly straightforward.
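
A minimal sketch of the form submission described above, using only the standard library. It assumes the form fields are literally named recaptcha_site_key and recaptcha_response (taken from the description above, not verified against the page) and that the session cookie is called _session_id (an assumption). build_verify_request is a hypothetical helper, not part of fantiadl, and it only builds the request without sending it. Obtaining a valid token still requires grecaptcha.execute to run in a browser context, and the real form may carry extra hidden fields that this sketch omits.

```python
from urllib.parse import urlencode
from urllib.request import Request

VERIFY_URL = "https://fantia.jp/recaptcha/verify"


def build_verify_request(site_key: str, token: str, session_id: str) -> Request:
    """Build (but do not send) the POST for the #recaptcha_verify form."""
    # ASSUMPTION: field names follow the description in this thread.
    form = urlencode({
        "recaptcha_site_key": site_key,
        "recaptcha_response": token,
    }).encode("ascii")
    return Request(
        VERIFY_URL,
        data=form,
        method="POST",
        headers={
            # ASSUMPTION: "_session_id" is the name of the session cookie.
            "Cookie": f"_session_id={session_id}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Passing the result to urllib.request.urlopen would send it; per the comment above, a 302 to the homepage would indicate that the score was accepted.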

@bitbybyte bitbybyte added the enhancement New feature or request label Feb 27, 2023
@bitbybyte
Owner

bitbybyte commented Mar 22, 2023

Another alternative: we can take our followed fanclubs from /api/v1/me/fanclubs, fetch each at /api/v1/fanclubs/, then iterate over all plans and check their status.

This should work, but it could be very slow for users who follow many fanclubs.
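
A rough sketch of that alternative, to show where the cost comes from. The two endpoint paths are the ones named above; everything about the JSON shape (the fanclub_ids, fanclub, plans, price, and joined keys) is an assumption made for illustration and would need to be checked against real responses. find_paid_fanclubs and its fetch_json parameter are hypothetical names; fetch_json stands in for whatever authenticated GET-and-decode helper the caller already has.

```python
from typing import Callable, Dict, List

BASE_URL = "https://fantia.jp"


def find_paid_fanclubs(fetch_json: Callable[[str], Dict]) -> List[int]:
    """Return IDs of followed fanclubs with at least one joined, paid plan."""
    followed = fetch_json(BASE_URL + "/api/v1/me/fanclubs")
    paid_ids = []
    # ASSUMPTION: the followed-clubs response lists IDs under "fanclub_ids".
    for club_id in followed.get("fanclub_ids", []):
        # One extra request per followed club: this is the slow part.
        club = fetch_json(f"{BASE_URL}/api/v1/fanclubs/{club_id}")
        # ASSUMPTION: plans live under "fanclub" -> "plans", and each plan
        # has a numeric "price" and a boolean "joined" field.
        plans = club.get("fanclub", {}).get("plans", [])
        if any(p.get("price", 0) > 0 and p.get("joined", False) for p in plans):
            paid_ids.append(club_id)
    return paid_ids
```

The number of requests grows linearly with the number of followed clubs, which is the slowdown mentioned above.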

@KJHJason

KJHJason commented Apr 9, 2023

Their API has recently been protected by reCAPTCHA as well.

API response if flagged as a bot:

{
    "redirect": "/recaptcha"
}
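
A small sketch of how a client might recognize that response before trying to parse it as normal API data. The redirect key and /recaptcha value come from the response shown above; is_captcha_redirect is a hypothetical helper name, and treating any non-JSON body as "not a CAPTCHA redirect" is a simplifying assumption.

```python
import json


def is_captcha_redirect(body: str) -> bool:
    """Return True if an API response body is the reCAPTCHA redirect stub."""
    try:
        data = json.loads(body)
    except ValueError:
        # Not JSON at all (for example an HTML page); handled elsewhere.
        return False
    return isinstance(data, dict) and data.get("redirect") == "/recaptcha"
```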

@bitbybyte
Owner

I was able to reproduce this using the same cookie as my browser session. After I solved the CAPTCHA at /recaptcha in the browser, fantiadl was able to proceed.

Rather than dealing with the above, I think I'll ask the user to solve the CAPTCHA in a browser using the same session, then confirm that it's okay to proceed, with a prompt like:

You must solve a CAPTCHA to continue. Please solve the CAPTCHA at https://fantia.jp/recaptcha using the same session you used to retrieve your session cookie value. When done, enter "Y" to continue:
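
The prompt above could be wired in roughly like this. This is only a sketch: wait_for_captcha, its ask parameter, and the max_attempts retry limit are hypothetical choices, not something fantiadl is known to implement. The ask parameter defaults to input and exists so the function can be exercised without a terminal.

```python
CAPTCHA_URL = "https://fantia.jp/recaptcha"

PROMPT = (
    "You must solve a CAPTCHA to continue. Please solve the CAPTCHA at "
    f"{CAPTCHA_URL} using the same session you used to retrieve your "
    'session cookie value. When done, enter "Y" to continue: '
)


def wait_for_captcha(ask=input, max_attempts: int = 3) -> bool:
    """Ask the user to confirm the CAPTCHA is solved; True once they enter Y."""
    for _ in range(max_attempts):
        # Accept "Y" or "y", ignoring surrounding whitespace.
        if ask(PROMPT).strip().lower() == "y":
            return True
    # ASSUMPTION: give up after a few non-"Y" answers instead of looping forever.
    return False
```

A caller would retry the original request when this returns True and abort otherwise.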

@KJHJason

I believe it's also possible to use Selenium to solve the CAPTCHA automatically. At least, that's what I did with chromedp in my Go CLI program.

You could add an option that lets the user opt in to automatic CAPTCHA solving, with manual solving as the fallback in case the Selenium approach stops working.

@bitbybyte
Owner

Using a headless browser to click the button probably works in most cases, but I don't know how resilient that is, since I don't know whether Fantia ever throws back an actual CAPTCHA for you to solve. That seems to happen rarely, if at all, so I'll keep it in mind.
