Scraping Google on the client side #1608

Open
agnelvishal opened this issue May 27, 2019 · 4 comments

@agnelvishal

commented May 27, 2019

Since Google introduces a captcha when there are too many requests from one IP (#729), it might be better to scrape Google on the client side. This would also decrease server load.

@unixfox

Contributor

commented May 28, 2019

This seems quite difficult to implement and probably out of scope for Searx.

It would require that, when the page loads, the client make a request with JavaScript to fetch the Google results, then generate new HTML and finally inject it into the page (roughly as in the sketch below).
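
A minimal sketch of that flow, for illustration only (not from the thread): the `.result-title` selector and `#results` container are hypothetical, and in practice a direct browser fetch of google.com is blocked by CORS, which is part of what makes this hard.

```typescript
// Illustrative sketch only -- not an actual Searx feature.
// Google does not send CORS headers, so a direct browser fetch like
// this is blocked in practice; it would need a CORS-enabled proxy.

async function fetchGoogleResults(query: string): Promise<string[]> {
  const response = await fetch(
    `https://www.google.com/search?q=${encodeURIComponent(query)}`
  );
  const html = await response.text();

  // Parse the returned HTML and pull out result titles.
  // ".result-title" is a hypothetical selector: Google's real markup
  // is obfuscated and changes frequently, another source of fragility.
  const doc = new DOMParser().parseFromString(html, "text/html");
  return Array.from(doc.querySelectorAll(".result-title")).map(
    (el) => el.textContent ?? ""
  );
}

// Generate new HTML from the results and inject it into the page.
async function renderResults(query: string): Promise<void> {
  const titles = await fetchGoogleResults(query);
  const container = document.querySelector("#results"); // hypothetical container
  if (container) {
    container.innerHTML = titles
      .map((title) => `<div class="result">${title}</div>`)
      .join("");
  }
}
```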

I'm pretty sure it would add quite a lot of delay and complexity.

The main issue is not Google's captcha but the bots that abuse public Searx instances. If we got rid of these bots we wouldn't have any issue with Google, as is already the case on my instance: https://searx.be

@eggercomputerchaos

commented May 29, 2019

@unixfox
nice,
how did you get rid of these bots?

@unixfox

Contributor

commented May 29, 2019

@eggercomputerchaos I talked about it here: #1584 (comment) and here: #1034 (comment)

@eggercomputerchaos

commented May 29, 2019

@unixfox
> Meanwhile, you can use my Searx instance: https://searx.be. It works by default with Google without any issue.
thx
