Commit

Update README.md
Athlon1600 committed May 15, 2020
1 parent 699951d commit 7e81b20
Showing 1 changed file with 57 additions and 17 deletions.
74 changes: 57 additions & 17 deletions README.md
@@ -1,12 +1,11 @@
![](https://img.shields.io/github/last-commit/Athlon1600/SerpScraper.svg)

SerpScraper
===========

The purpose of this library is to provide an easy, undetectable, and captcha-resistant way to extract search results
from popular search engines like Google and Bing.

**Captcha solver status:**
still being built...

## Installation

The recommended way to install this is via Composer:
@@ -21,10 +20,12 @@ https://github.com/Athlon1600/SerpScraper/tree/2.x
## Extracting Search Results From Google

```php
<?php

use SerpScraper\Engine\GoogleSearch;

$page = 1;

$google = new GoogleSearch();

// all available preferences for Google
@@ -39,31 +40,71 @@ do {
$response = $google->search("how to scrape google", $page);

// error field must be empty otherwise query failed
if(empty($response->error)){

    $results = array_merge($results, $response->results);
    $page++;

} else if($response->error == 'captcha'){

    // captcha encountered: stop paging here (see the captcha section below)
    break;
}

} while ($response->has_next_page);
```
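
The paging loop above can also be written with a hard page limit, so that an unexpected error cannot keep it spinning. The following is a minimal sketch, not part of SerpScraper: `collectResults` and `$fakeSearch` are hypothetical names, and the stand-in closure only mimics the `error`, `results`, and `has_next_page` fields used in the example above.

```php
<?php

// Hypothetical helper, not part of SerpScraper: $fetchPage is any callable
// that takes a page number and returns an object exposing the same fields
// the README example reads (error, results, has_next_page).
function collectResults(callable $fetchPage, int $maxPages = 10): array
{
    $results = [];

    for ($page = 1; $page <= $maxPages; $page++) {
        $response = $fetchPage($page);

        // Stop on any error (captcha or otherwise) instead of looping again.
        if (!empty($response->error)) {
            break;
        }

        $results = array_merge($results, $response->results);

        // No further pages reported: we are done.
        if (empty($response->has_next_page)) {
            break;
        }
    }

    return $results;
}

// Stand-in for a real search call: three pages of fake results.
$fakeSearch = function (int $page): object {
    return (object) [
        'error' => '',
        'results' => ["result-{$page}-a", "result-{$page}-b"],
        'has_next_page' => $page < 3,
    ];
};

$all = collectResults($fakeSearch, 5);
echo count($all) . PHP_EOL; // prints 6
```

With the real library, the closure would presumably wrap the call shown above, for example `function ($page) use ($google) { return $google->search("how to scrape google", $page); }`.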


## Solve Google Search captchas automatically

For this to work, you need to register with 2captcha.com and obtain an API key.
Using a proxy server is also highly recommended.
Instructions for installing a private proxy server on your own VPS are here:
https://github.com/Athlon1600/useful#squid

```php
<?php

use SerpScraper\Engine\GoogleSearch;
use SerpScraper\GoogleCaptchaSolver;

$google = new GoogleSearch();

$browser = $google->getBrowser();
$browser->setProxy('PROXY:IP');

$solver = new GoogleCaptchaSolver($browser);

while (true) {
    $response = $google->search('famous people born in ' . mt_rand(1500, 2020));

    if ($response->error == 'captcha') {

        echo "Captcha detected!" . PHP_EOL;

        $temp = $solver->solveUsingTwoCaptcha($response, '2CAPTCHA_API_KEY', 90);

        if ($temp->status == 200) {
            echo "Captcha solved successfully!" . PHP_EOL;
        } else {
            echo 'Solving captcha has failed...' . PHP_EOL;
        }

    } else {
        echo "OK. ";
    }

    sleep(2);
}
```
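
The example above loops forever for demonstration purposes. In a real script you may prefer to retry a search only a limited number of times after a captcha. Here is a minimal sketch of that idea using stand-in closures. `searchWithCaptchaRetry`, `$fakeSearch`, and `$fakeSolve` are hypothetical names and not part of SerpScraper; only the `error == 'captcha'` check is taken from the example above.

```php
<?php

// Hypothetical helper, not part of SerpScraper.
// $search returns a response object with an "error" field;
// $solve receives the captcha response and returns true if it was solved.
function searchWithCaptchaRetry(callable $search, callable $solve, int $maxAttempts = 3): ?object
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $response = $search();

        // Success, or a non-captcha error that the caller should inspect.
        if ($response->error !== 'captcha') {
            return $response;
        }

        // Solver failed: give up rather than keep hitting the search engine.
        if (!$solve($response)) {
            return null;
        }
    }

    // Still blocked by a captcha after $maxAttempts tries.
    return null;
}

// Stand-ins: the first call hits a captcha, the second one succeeds.
$calls = 0;
$fakeSearch = function () use (&$calls): object {
    $calls++;
    return (object) ['error' => $calls === 1 ? 'captcha' : '', 'results' => ['ok']];
};
$fakeSolve = function (object $response): bool {
    return true;
};

$response = searchWithCaptchaRetry($fakeSearch, $fakeSolve);
echo ($response !== null ? "solved after {$calls} calls" : 'gave up') . PHP_EOL;
// prints "solved after 2 calls"
```

With the real library, `$solve` would presumably wrap `$solver->solveUsingTwoCaptcha(...)` and return whether its `status` is 200, as in the example above.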



## Extract Search Results from Bing

```php
<?php

use SerpScraper\Engine\BingSearch;

$bing = new BingSearch();
@@ -84,4 +125,3 @@ for($page = 1; $page < 10; $page++){
var_dump($results);

```
