
Google Captcha #729

Open
sut12 opened this issue Oct 12, 2016 · 45 comments

Comments

@sut12 commented Oct 12, 2016

Hey,

I have the problem that Google keeps giving me CAPTCHAs, so I can't see Google results. My server is not blacklisted anywhere, so I'm wondering what I can do to get rid of the CAPTCHAs. My instance is not hosted under HTTPS. Could that be a reason for the Google CAPTCHAs?

Thanks!

@dalf (Collaborator) commented Oct 12, 2016

If you can, use a browser from the same IP as your server: Google will throw some CAPTCHAs at first, and will usually stop after a few requests. One way to do it: install Firefox on the server, then use SSH with X forwarding, or VNC.

I'm not sure whether you also need to check if your IP is blacklisted (one way to check: http://www.dnsbl.info/dnsbl-database-check.php, though I'm not sure that's the right service to use).
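
For a quick programmatic check, a DNS blacklist can also be queried directly: reverse the octets of the IPv4 address and resolve the result under the blacklist's zone. A minimal sketch; zen.spamhaus.org is just one example zone, not a specific recommendation:

    import socket

    def is_listed(ip, dnsbl='zen.spamhaus.org'):
        # DNSBLs are queried by reversing the IPv4 octets and resolving
        # them under the blacklist zone; any answer means 'listed'.
        reversed_ip = '.'.join(reversed(ip.split('.')))
        try:
            socket.gethostbyname('{0}.{1}'.format(reversed_ip, dnsbl))
            return True
        except socket.gaierror:
            return False

    print(is_listed('127.0.0.2'))  # test address that most DNSBLs list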

@sut12 (Author) commented Oct 13, 2016

Hello,
I used elinks, which can save images, so I saved the CAPTCHA and downloaded it afterwards. For now it is working, but I have already had to enter the CAPTCHA more than once.

I am not listed on any blacklist except on spamcannibal:
generic/anonymous/un-named IP

I need to find out how to get rid of that listing, but I'm not sure whether it has anything to do with the Google problem.

@sut12 (Author) commented Oct 13, 2016

It is so weird. Yesterday evening it worked after I entered the CAPTCHAs. This morning it didn't work.

Now it works, but I haven't entered any CAPTCHAs since yesterday...

@cy8aer (Contributor) commented Oct 24, 2016

Google is dead for me (as is Startpage).

@sut12 (Author) commented Oct 24, 2016

I need Google. But it works sometimes and sometimes not. I even tried to analyze it with Google Webmaster Tools...

Right now it's working; tomorrow maybe not.

@cy8aer (Contributor) commented Oct 24, 2016

It seems that Google sets a GOOGLE_ABUSE_EXEMPTION cookie after the CAPTCHA is solved, and this cookie has an expiry date (sometimes). It would be nice if there were some proxying mechanism for the CAPTCHA so the cookie could be re-set.
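
For background: searx engines prepare outgoing requests in a request(query, params) hook, where params['cookies'] is a plain dict (the same mechanism the google.py snippets later in this thread use). A minimal sketch of what re-using a solved exemption cookie could look like; the placeholder value and the manual copy from a browser session are assumptions, not anything searx ships:

    # Hypothetical stopgap: attach a previously solved exemption cookie to
    # every outgoing Google request. The value has to be copied manually
    # from a browser that solved the CAPTCHA on the same IP, and it
    # expires, so this only postpones the problem.
    EXEMPTION_COOKIE = 'xxxxxxxxxxxx'  # placeholder value

    def request(query, params):
        # searx engine hook: params carries 'url', 'headers', 'cookies', ...
        params['cookies']['GOOGLE_ABUSE_EXEMPTION'] = EXEMPTION_COOKIE
        return params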

@cy8aer (Contributor) commented Nov 2, 2016

What about forwarding the CAPTCHA to the frontend and saving the cookie in the backend, then?
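
A rough sketch of the detection half of that idea, assuming blocked requests get redirected to Google's /sorry/ page (the CaptchaRequired class and check_captcha helper here are hypothetical; the actual google.py engine simply raises RuntimeWarning('CAPTCHA required'), as a traceback later in this thread shows):

    class CaptchaRequired(RuntimeWarning):
        # Hypothetical exception carrying the URL of Google's CAPTCHA page
        # so the frontend could show it to the user; once solved, the
        # backend would store the returned GOOGLE_ABUSE_EXEMPTION cookie.
        def __init__(self, captcha_url):
            super().__init__('CAPTCHA required')
            self.captcha_url = captcha_url

    def check_captcha(resp):
        # resp is a requests.Response; blocked clients are redirected to
        # https://www.google.com/sorry/... before any results are served.
        if resp.url.startswith('https://www.google.com/sorry/'):
            raise CaptchaRequired(resp.url)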

@sut12 (Author) commented Nov 3, 2016

I think that would be the best solution.

@ghost commented Nov 3, 2016

I wonder how searx.me (and other searx admins) handle the Google CAPTCHA. Surely they process 1K+ requests per day to Google without getting "scroogled".

@gszathmari commented Mar 26, 2017

I am having the same issue. Solving the CAPTCHAs from a browser doesn't help, as the GOOGLE_ABUSE_EXEMPTION cookie would have to be sent with my searx queries.

@kvch (Collaborator) commented Mar 26, 2017

@gszathmari What do you mean? What is GOOGLE_ABUSE_EXEMPTION?

@gszathmari commented Mar 26, 2017

@sut12 (Author) commented Mar 26, 2017

That's exactly the point. cy8aer also pointed in the right direction:

"What about forwarding the CAPTCHA to the frontend and saving the cookie in the backend, then?"

I also tried to administer my domain with Google Webmaster Tools, so that Google wouldn't ask for CAPTCHAs anymore. It didn't help.

@cy8aer (Contributor) commented Mar 26, 2017

Yeah, but I did not know the name of the cookie, 'GOOGLE_ABUSE_EXEMPTION'. Thanks to gszathmari!

@prolibre commented Dec 8, 2017

Engines cannot retrieve results:
google (unexpected crash: CAPTCHA required)

Almost every day Google blocks me... it's getting unmanageable for me.

Anthony

@zwnk commented Jan 2, 2018

I recently updated to 0.13.1 and get the same error. I went back to 0.12 and get Google results again.

@sut12 (Author) commented Jan 2, 2018

Hmmm. I am on 0.12.0 and don't get Google results. Maybe I should try to update xD.
Oh wait, right now I'm getting results. But I am sure tomorrow it won't work again.

@kvch (Collaborator) commented Jan 2, 2018

@zwnk Have you tried the latest master? There have been a few changes since the last release that fixed known Google issues.

@zwnk commented Jan 2, 2018

I pulled today and still had the Google CAPTCHA error.

@Pofilo (Collaborator) commented Jan 3, 2018

This is not the first report of this problem with Google.
As it is fixed by a commit since the last release, maybe we can do a new release?

@asciimoo (Owner) commented Jan 3, 2018

@Pofilo We can release a new version (0.13.2), but as far as I can see, these fixes don't solve the problem permanently for everybody.

@sut12 (Author) commented Jan 5, 2018

I was on 0.12.0 and it had just worked for 3 days. This morning it didn't work again: no results from Google. So I updated to 0.13.1, and now it works again. I changed nothing else. Let's see how long it keeps working.

@Dominion0815 commented Feb 5, 2018

Same problem here with 0.13.1:

ERROR:searx.search:engine google : exception : CAPTCHA required
Traceback (most recent call last):
  File "/usr/local/searx/searx/search.py", line 104, in search_one_request_safe
    search_results = search_one_request(engine, query, request_params)
  File "/usr/local/searx/searx/search.py", line 87, in search_one_request
    return engine.response(response)
  File "/usr/local/searx/searx/engines/google.py", line 217, in response
    raise RuntimeWarning(gettext('CAPTCHA required'))
RuntimeWarning: CAPTCHA required

@sachaz commented Feb 13, 2018

Hi,

I just updated https://searx.aquilenet.fr to 0.13.1 and I still get the same issue as Dominion0815.

@sachaz commented Feb 18, 2018

Issue resolved: my instance was being hit by bots. Fixed with filtron:
https://asciimoo.github.io/searx/admin/filtron.html
https://github.com/asciimoo/filtron
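
For anyone setting it up: filtron is driven by a JSON rule file. A minimal sketch of a rate-limiting rule, with illustrative numbers (the filter syntax matches what miicha later quotes from his config; check the filtron README for the authoritative schema):

    [
        {
            "name": "search request",
            "filters": ["Param:q", "Path=^(/|/search)$"],
            "interval": 60,
            "limit": 10,
            "actions": [
                {
                    "name": "block",
                    "params": {"message": "Rate limit exceeded"}
                }
            ]
        }
    ]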

@Pofilo (Collaborator) commented Feb 18, 2018

Interesting, I will install filtron and give feedback!

@steckerhalter commented Mar 10, 2018

I'm using filtron but still get the CAPTCHA. Any tips on configuring filtron, maybe?

@dalf (Collaborator) commented Mar 10, 2018

@steckerhalter: just in case, are you sure that the X-Forwarded-For header is set when the requests are forwarded to filtron?

@steckerhalter commented Mar 10, 2018

@dalf Ah, maybe not; I'm using Apache. Let me check that, thanks.

@sachaz commented Mar 12, 2018

Here is a link describing what I did to make it work (for me):
https://atelier.aquilenet.fr/projects/services/wiki/Searx

@steckerhalter commented Mar 12, 2018

@sachaz Thanks, I'll try that.

@steckerhalter commented May 5, 2018

For the record, I have configured filtron, but Google would still demand the CAPTCHA sometimes. The search is used by quite a few people. In the end we decided to disable Google altogether.

@miicha commented Jun 12, 2018

I have the same problems with Google: I'm now using filtron and a new IP for the searx instance, but after one or two weeks I get the same problem again.
Now I have tried copying all the google.com cookies (_ga, _gid, SNID, NID, DV, 1P_JAR, PAIDCONTENT, ...) from Firefox into my google.py (like in #1121), and Google instantly works again (at least for the moment).
So my question: would it be an idea (privacy?) to set these cookies?

@Pofilo (Collaborator) commented Jun 12, 2018

@miicha Thanks for the feedback!

How did you add the cookies to google.py? Can you give us a copy of that portion of the file? (hiding all sensitive values, of course!)

It seems we have a solution (though I have doubts about how long it will work; maybe the lifetime of the cookie), but it is not a solution we can ship in a PR. If we all use the same cookies, I think Google will block them.

@miicha commented Jun 13, 2018

It seems to be enough to add these 3 lines (for example just before the params['url'] = search_url.format(offset=offset, ... line in google.py's request function):

    params['cookies']['_ga'] = 'xxxxxxxxxxxx'     # 2 years | used to distinguish users
    params['cookies']['_gid'] = 'xxxxxxxxxxxxxx'  # 24 hours | used to distinguish users
    params['cookies']['1P_JAR'] = '2018-6-13-17'  # changes hourly

Lifetime info according to https://developers.google.com/analytics/devguides/collection/analyticsjs/cookie-usage and my own observation. Maybe even 1 or 2 of these lines are sufficient...
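
If you want to try this without hard-coding values into the engine (and without every instance sending identical cookies, per Pofilo's concern above), one possible variant is to read per-instance values from the environment. A minimal sketch; the SEARX_GOOGLE_* variable names are made up for illustration and are not recognized by searx:

    import os

    def request(query, params):
        # Hypothetical: pull per-instance cookie values from the environment
        # so each searx deployment sends its own identifiers.
        for cookie, env_var in (('_ga', 'SEARX_GOOGLE_GA'),
                                ('_gid', 'SEARX_GOOGLE_GID'),
                                ('1P_JAR', 'SEARX_GOOGLE_1P_JAR')):
            value = os.environ.get(env_var)
            if value:
                params['cookies'][cookie] = value
        return params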

@asciimoo (Owner) commented Jun 14, 2018

This solution raises privacy questions: session cookies can be used to build a user profile. It would be good to find a solution that doesn't increase the amount of information sent about the instance.

@sachaz commented Jun 14, 2018

google (unexpected crash: CAPTCHA required)
is back again on our instance after about 4 months of happiness with filtron. :(

@dadosch (Contributor) commented Aug 2, 2018

What about using temporary IPv6 addresses for the Google requests? Would that help, so that Google can't flag us for many requests per IP (if that is indeed the trigger)?
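
A minimal sketch of what rotating source addresses could look like, assuming the host already has several addresses from a routed IPv6 prefix assigned to its interface; the 2001:db8:: addresses are documentation placeholders, and requests_toolbelt is an extra dependency, not something searx itself uses:

    import random

    import requests
    from requests_toolbelt.adapters.source import SourceAddressAdapter

    # Placeholder addresses from the IPv6 documentation range (2001:db8::/32);
    # each one must already be assigned to the host's network interface.
    SOURCE_ADDRESSES = ['2001:db8::10', '2001:db8::11', '2001:db8::12']

    def session_with_random_source():
        # Bind outgoing connections to one of our IPv6 addresses so that
        # consecutive requests do not all originate from the same IP.
        session = requests.Session()
        adapter = SourceAddressAdapter(random.choice(SOURCE_ADDRESSES))
        session.mount('http://', adapter)
        session.mount('https://', adapter)
        return session

    print(session_with_random_source().get('https://www.google.com/').status_code)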

@steckerhalter commented Aug 8, 2018

I'd say the solution is simple: just don't use Google. Alternatively, you can enable Startpage, which gives you the same results. It's just not worth trying to satisfy the big G's algorithms.

@miicha commented Oct 29, 2018

As I wrote earlier, I use filtron but still got the Google CAPTCHA crash.
I had configured filtron according to some tutorial I found online (most probably https://asciimoo.github.io/searx/admin/filtron.html, among others) but did not check whether it was working. Now I found time to check it, and guess what: it wasn't working. I also turned on logging to investigate what was going on.

I found massive bot traffic on my domain, searching mostly for download stuff and requesting JSON output.
After removing "filters": ["Param:q", "Path=^(/|/search)$"], from my filtron config file, it finally worked and I could turn the logging off again.

Unfortunately I could not figure out why the abovementioned filter was not working as intended, but I don't care too much.

If somebody is interested in the rules.json file, I could post it somewhere (maybe in the filtron repo?)

@GitHubGeek commented Nov 7, 2018

Could it be Google tweaking their abuse algorithms to be super stringent? My instance is 100% private (login required) and I'm the only user. Still, the same error comes up after a handful of searches.

@mohe2015 commented Jun 20, 2019

@GitHubGeek If you are using a web hoster, it could also be that Google blocks a whole IP range because somebody in it is a spammer. For me that was the case with a mail server.

@unixfox (Contributor) commented Jul 16, 2019

Hello searx public instance owners,

Since the end of May I have been working on an anti-bot solution to stop being blocked by Google because of the bad bots that try to do ranking manipulation through my public instance.

Today I'm sharing an early version of the program with you. It is available here: https://github.com/unixfox/antibot-proxy
You can try it and give me feedback on the GitHub repository (please open an issue there instead of replying here if you find any problem). If you are currently using filtron, you may remove it and use antibot-proxy exclusively as the protection between your main web server and searx, because antibot-proxy is enough to remove around 99% of the bots.

The program has been running on my personal public searx instance, https://searx.be, for almost two months, and the instance has almost never been blocked by Google since I deployed it. It is not completely bug-free, but it should work just fine for almost every user.

@sachaz commented Aug 9, 2019

Yeah! Thanks unixfox, this is a good idea!
Should we use it on top of filtron?

@unixfox (Contributor) commented Aug 9, 2019

@sachaz No, it's meant to be a better replacement for filtron.
