Web sites using Google Recaptcha V3 fail in testcafe-hammerhead #2223

Closed
adriancable opened this issue Feb 3, 2020 · 11 comments
Labels
FREQUENCY: level 1, STATE: Auto-locked (issues that were automatically locked by the Lock bot)

Comments

@adriancable

Hi. The title summarises the situation well. This is using testcafe-hammerhead on its own (not via testcafe). Background: Google Recaptcha V3 is an interaction-less bot detector used as part of the login flow on many third-party sites. It runs some logic on the client side to try to tell whether the client is a human or a bot, and returns a confidence score between 0 (almost certainly a bot) and 1 (almost certainly a human). It's up to the website to decide what to do with the score: most will block logins with a score below 0.3. Try it for yourself by visiting this URL in your browser to see what your own score looks like to Recaptcha V3:

https://recaptcha-demo.appspot.com/recaptcha-v3-request-scores.php

Now the issue is that pages served via the testcafe-hammerhead proxy always return a score of 0.1, no matter what. (So try the above URL through testcafe-hammerhead, or testcafe if that's easier.) This means that web sites that demand a higher score will block logins. For example, it is not possible to log into the Nest service (home.nest.com/login/nest) using a Nest Account over testcafe-hammerhead for this reason.
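To make the score check concrete, here is a minimal sketch of how a site might gate a login on the reCAPTCHA v3 result. The response fields follow Google's documented siteverify JSON; the helper name `allowLogin` and the default threshold of 0.3 are assumptions taken from the description above, not any particular site's implementation.

```typescript
// Subset of the JSON that Google's siteverify endpoint returns for reCAPTCHA v3.
interface SiteVerifyResponse {
  success: boolean;
  score?: number; // 0.0 (likely a bot) to 1.0 (likely a human)
  action?: string;
  "error-codes"?: string[];
}

// Hypothetical helper: allow the login only if verification succeeded
// and the score is at or above the site's chosen threshold.
function allowLogin(res: SiteVerifyResponse, minScore: number = 0.3): boolean {
  if (!res.success || typeof res.score !== "number") {
    return false;
  }
  return res.score >= minScore;
}

// A score of 0.1, as seen through the proxy, fails a 0.3 threshold.
console.log(allowLogin({ success: true, score: 0.1 }));
```

With a threshold like this, a constant score of 0.1 means every login attempted through the proxy is rejected.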

My understanding is that testcafe-hammerhead should be completely invisible to the web site being proxied. Clearly something is happening here to make that not true. Google doesn't document what tests Recaptcha V3 actually runs in the browser, so it isn't immediately clear what might be failing. But the general question is: what behaviours are different (from the perspective of the web page) when proxying through hammerhead? How can the web site know?

@LavrovArtem
Contributor

Hello,

I think the difference between reCAPTCHA with and without testcafe-hammerhead is that a browser sends requests over HTTP/2, whereas the proxy sends them over HTTPS. Thus, I suppose this issue is a duplicate of DevExpress/testcafe#7182.

@adriancable
Author

@LavrovArtem - please do not be so eager to close this ticket.

It is not an HTTP/2 vs. HTTPS issue. If I run Chrome (or Chromium) with --disable-http2, and confirm that HTTP/2 is not being used, I see exactly the same behaviour. Score 0.7 without testcafe-hammerhead, compared with 0.1 when using testcafe-hammerhead.

Please try it for yourself.

https://recaptcha-demo.appspot.com/recaptcha-v3-request-scores.php

@LavrovArtem
Contributor

I will research it.

@thuey-nelnet

I am experiencing a similar issue. For me, it's not enough to ignore the score on test environments. The issue is that when I request two captchas within the same session, I get back an error timeout-or-duplicate when I go to validate the token. I'm assuming testcafe-hammerhead is doing or swallowing something that results in Recaptcha producing identical tokens.

Have you found anything out in your research @LavrovArtem ?

@alexey-lin
Contributor

Hi @thuey-nelnet,

Thank you for the additional information. There is no news on this issue yet, though. We'll update this thread once we have anything to share. Please stay tuned.

@Farfurix
Contributor

We examined the issue in detail and decided not to change our internal logic to make the reCAPTCHA test pass. Such changes would affect our event logic and other parts of our sandbox.

Neither TestCafe nor other testing frameworks can pass the reCAPTCHA check, for example:
https://stackoverflow.com/questions/55501524/how-does-recaptcha-3-know-im-using-selenium-chromedriver
https://stackoverflow.com/questions/55493536/how-to-deal-with-the-captcha-when-doing-web-scraping-in-puppeteer

As a workaround, you can use the following recommendations from reCAPTCHA FAQ:

For reCAPTCHA v3, create a separate key for testing environments. Scores may not be accurate as reCAPTCHA v3 relies on seeing real traffic.

For reCAPTCHA v2, use the following test keys. You will always get No CAPTCHA and all verification requests will pass.

Site key: 6LeIxAcTAAAAAJcZVRqyHh71UMIEGNQ_MXjiZKhI
Secret key: 6LeIxAcTAAAAAGG-vFI1TnRWxMZNFuojJ4WifJWe
The reCAPTCHA widget will show a warning message to ensure it's not used for production traffic.

As for reCAPTCHA v3, you can change the "score" threshold in your development build or disable the reCAPTCHA check completely.
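As a rough illustration of that last suggestion, the sketch below keeps a per-environment policy so that a development or test build can lower the threshold or skip the check. The environment names, the `POLICIES` table and the threshold values are all hypothetical; only the idea of relaxing or disabling the check outside production comes from the recommendation above.

```typescript
type Environment = "production" | "development" | "test";

interface RecaptchaPolicy {
  enabled: boolean; // false skips the reCAPTCHA check entirely
  minScore: number; // minimum v3 score accepted when the check is enabled
}

// Hypothetical values: strict in production, relaxed in development,
// disabled for automated test runs that go through a proxy.
const POLICIES: Record<Environment, RecaptchaPolicy> = {
  production: { enabled: true, minScore: 0.3 },
  development: { enabled: true, minScore: 0.0 },
  test: { enabled: false, minScore: 0.0 },
};

// Returns true when a request with the given score should be let through.
function passesRecaptcha(env: Environment, score: number): boolean {
  const policy = POLICIES[env];
  return !policy.enabled || score >= policy.minScore;
}

console.log(passesRecaptcha("production", 0.1), passesRecaptcha("test", 0.1));
```

Keeping the policy in one table makes it harder to ship a relaxed threshold to production by accident.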

@adriancable
Author

@Farfurix - thanks for your comment. I do understand that addressing this would take time and you feel it is better to address other issues in its place.

I do think you should amend the README for testcafe-hammerhead to help reduce confusion. Right now it says that the target web site doesn't know it's being opened under a proxy. This is clearly not true for certain common scenarios, e.g. HTTP/2 or Recaptcha. I would like to see these listed so other people do not spend time scratching their heads over why things do not work with the proxy as they do without.

Since you have examined the issue in detail, it would also be helpful and interesting to hear from you why TestCafe doesn't currently pass the Recaptcha V3 check. I do not think the reason is the same as why Selenium and Puppeteer do not pass it. Recaptcha V3 includes a hard-coded check for objects like window.navigator.webdriver and will return a low score if found, which is what disqualifies Selenium and Puppeteer. But TestCafe does not have window.navigator.webdriver and I am almost certain Recaptcha V3 does not include a hard-coded check for window['%hammerhead%'].
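For readers unfamiliar with this kind of check, here is a minimal sketch of what a page-side probe for the two properties named above could look like. It is purely illustrative: Google does not document what Recaptcha V3 inspects, and the function name `automationHints` and the `WindowLike` type are hypothetical.

```typescript
// Minimal structural type so the probe can run against a real browser
// window or against a plain object in a Node test.
interface WindowLike {
  navigator?: { webdriver?: boolean };
  [key: string]: unknown;
}

// Returns the names of any automation-related properties found on the window.
function automationHints(win: WindowLike): string[] {
  const hints: string[] = [];
  // Set to true by WebDriver-controlled browsers (e.g. Selenium).
  if (win.navigator?.webdriver === true) {
    hints.push("navigator.webdriver");
  }
  // Global that the comment above says hammerhead exposes on proxied pages.
  if ("%hammerhead%" in win) {
    hints.push("window['%hammerhead%']");
  }
  return hints;
}

console.log(automationHints({ navigator: { webdriver: true } }));
```

In a browser this would be called as `automationHints(window as unknown as WindowLike)`; an empty result only means these two specific markers are absent, not that the page cannot tell it is being proxied.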

@Farfurix
Contributor

@adriancable

Hello,

We'll add information about this scenario to our documentation.

> Since you have examined the issue in detail, it would also be helpful and interesting to hear from you why TestCafe doesn't currently pass the Recaptcha V3 check.

We know some of the potential issues associated with our sandbox. Since there is the official reCAPTCHA recommendation, we have no plans to continue our research of this issue.

@thuey-nelnet

@Farfurix Thanks for this insight. As I mentioned in my comment above, I don't think the official reCAPTCHA recommendation will work for me.

The issue I am experiencing is that if I obtain a reCAPTCHA token twice in the same session, the second one results in a timeout-or-duplicate error response from Google when I go to validate it. It's not a matter of lowering the score threshold; I don't even get back a successful response. I have confirmed that the token is not expired, and I am definitely requesting a new token on the second action.
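One way to narrow this down is to log, on the server, whether the second token is literally the same string as the first, and whether siteverify reported `timeout-or-duplicate`. The sketch below is a hypothetical debugging aid, not part of TestCafe or reCAPTCHA; the `SeenTokens` class and `isTimeoutOrDuplicate` helper are illustrative names, and only the `error-codes` field and the `timeout-or-duplicate` value come from Google's siteverify response.

```typescript
// Hypothetical debugging aid: remembers every token the server has received.
class SeenTokens {
  private readonly seen: string[] = [];

  // Returns true if this exact token string was submitted before.
  isDuplicate(token: string): boolean {
    if (this.seen.indexOf(token) !== -1) {
      return true;
    }
    this.seen.push(token);
    return false;
  }
}

// Returns true when siteverify rejected the token as expired or already used.
function isTimeoutOrDuplicate(errorCodes: string[] | undefined): boolean {
  return (errorCodes ?? []).indexOf("timeout-or-duplicate") !== -1;
}

const tokens = new SeenTokens();
tokens.isDuplicate("first-token");
console.log(tokens.isDuplicate("first-token")); // the same string was sent twice
```

If `isDuplicate` returns true for the second request, the client really is resubmitting the old token; if it returns false while Google still answers `timeout-or-duplicate`, the cause lies elsewhere.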

This only happens when I interact with my site through the proxy. The only explanation I can think of is that there's something about the way the proxy is proxying requests that results in an identical token being generated. Do you have any other thoughts about why I would be experiencing this issue? Alternatively, is there some way through TestCafe to not proxy the requests to a specific domain? That way, I could whitelist the google recaptcha domain.

@Farfurix
Contributor

@thuey-nelnet

Hello,

Could you please create a new bug report with your sample project? We'll examine it and check for a suitable solution.

@lock

lock bot commented Jun 24, 2020

This thread has been automatically locked since it is closed and there has not been any recent activity. Please open a new issue for related bugs or feature requests. We recommend you ask TestCafe API, usage and configuration inquiries on StackOverflow.

@lock lock bot added the STATE: Auto-locked Issues that were automatically locked by the Lock bot label Jun 24, 2020
@lock lock bot locked as resolved and limited conversation to collaborators Jun 24, 2020
@LavrovArtem LavrovArtem removed this from Need research in Categorization Nov 27, 2020

5 participants