This repository has been archived by the owner on May 9, 2020. It is now read-only.

Merge abcc968 into 8494039
codemanki committed Mar 7, 2019
2 parents 8494039 + abcc968 commit 78f5f54
Showing 29 changed files with 1,606 additions and 810 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -27,4 +27,5 @@ node_modules
# Users Environment Variables
.lock-wscript

test.js
test.js
.nyc_output/
4 changes: 3 additions & 1 deletion .travis.yml
@@ -7,4 +7,6 @@ node_js:
- 8
- 6

sudo: false
sudo: false

after_success: npm run coverage
20 changes: 20 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,25 @@
## Change Log

### v3.0.0 (07/03/2019)
- **BREAKING CHANGE**: `get/post` methods together with their signatures are aligned with corresponding methods from [request](https://github.com/request/request#requestmethod)
- **BREAKING CHANGE**: `cloudscraper.request` method is deprecated in favour of `cloudscraper(options)`
- Promise support has been added by using `request-promise`
- Error objects inherit from `Error` and have additional properties:
* `options` - The request options
* `cause` - An alias for `error`
* `response` - The request response
- Stacktraces are available in error objects
- `cloudflareTimeout` option can be defined to speed up waiting time
- Challenge evaluation is done in a sandbox to avoid potential security issues
- Default [request methods](https://github.com/request/request#requestmethod) are available
- Custom cookie jar can now be passed [#103](https://github.com/codemanki/cloudscraper/issues/102)
- Proxies support [PR#101](https://github.com/codemanki/cloudscraper/pull/101)
- MIT license

### v2.0.1 (02/03/2019)
- Minor documentation changes

### v2.0.0 (09/12/2018)
- [#2943](https://github.com/codemanki/cloudscraper/pull/66) Support recursive challenge solving.
- **BREAKING CHANGE** Before this, when any error has been detected, the callback was called with an incorrect order: `callback(.., body, response);` instead of `return callback(..., response, body);`

20 changes: 0 additions & 20 deletions Gruntfile.js

This file was deleted.

136 changes: 113 additions & 23 deletions README.md
@@ -7,6 +7,9 @@ Node.js library to bypass cloudflare's anti-ddos page.

[![js-semistandard-style](https://cdn.rawgit.com/flet/semistandard/master/badge.svg)](https://github.com/Flet/semistandard)

[![Build status](https://img.shields.io/travis/codemanki/cloudscraper/master.svg?style=flat-square)](https://travis-ci.org/codemanki/cloudscraper)
[![Coverage](https://img.shields.io/coveralls/codemanki/cloudscraper.svg?style=flat-square)](https://coveralls.io/r/codemanki/cloudscraper)

This library is a port of the python module [cloudflare-scrape](https://github.com/Anorov/cloudflare-scrape) with a couple of enhancements and test cases ;). All grats to its author \m/

@@ -26,6 +29,46 @@ __Unfortunately, there is no support for handling a CAPTCHA, if the response con

If you notice that cloudscraper has stopped working for some reason, do not hesitate to get in touch with me (by creating an issue here, for example) so I can update it.

Migration from v2 to v3
============
- Replace `cloudscraper.request(options)` with `cloudscraper(options)`
- `cloudscraper.get()` and `cloudscraper.post()` method signatures are aligned with corresponding methods from [request](https://github.com/request/request#requestmethod):
```
var options = {
uri: 'https://website.com/',
headers: {/*...*/}
};
cloudscraper.get(options, function(error, response, body) {
console.log(body);
});
```
or for **POST**
```
var options = {
uri: 'https://website.com/',
headers: {/*...*/},
formData: { field1: 'value', field2: 2 }
};
cloudscraper.post(options, function(error, response, body) {
console.log(body);
});
```
- If you are using custom promise support workarounds please remove them as cloudscraper now uses [request-promise](https://github.com/request/request-promise):

```
var cloudscraper = require('cloudscraper');
var options = {
uri: 'https://website.com/',
method: 'GET'
};
cloudscraper(options).then(function(body) {
console.log(body);
});
```

Install
============
```javascript
@@ -37,7 +80,7 @@ Usage
```javascript
var cloudscraper = require('cloudscraper');

cloudscraper.get('http://website.com/', function(error, response, body) {
cloudscraper.get('https://website.com/', function(error, response, body) {
if (error) {
console.log('Error occurred');
} else {
@@ -49,51 +92,95 @@ cloudscraper.get('http://website.com/', function(error, response, body) {
or for `POST` action:

```javascript
cloudscraper.post('http://website.com/', {field1: 'value', field2: 2}, function(error, response, body) {
...
var options = {
uri: 'https://website.com/',
formData: { field1: 'value', field2: 2 }
};

cloudscraper.post(options, function(error, response, body) {
console.log(body);
});
```

A generic request can be made with `cloudscraper.request(options, callback)`. The options object should follow [request's options](https://www.npmjs.com/package/request#request-options-callback). Not everything is supported however, for example http methods other than GET and POST. If you wanted to request an image in binary data you could use the encoding option:
A generic request can be made with `cloudscraper(options, callback)`. The options object should follow [request's options](https://www.npmjs.com/package/request#request-options-callback). Not everything is supported however, for example http methods other than GET and POST. If you wanted to request an image in binary data you could use the encoding option:

```javascript
cloudscraper.request({method: 'GET',
url:'http://website.com/image',
encoding: null,
challengesToSolve: 3, // optional, if CF returns challenge after challenge, how many to solve before failing
followAllRedirects: true, // mandatory for successful challenge solution
}, function(err, response, body) {
//body is now a buffer object instead of a string
var options = {
method: 'GET',
url:'http://website.com/',
};

cloudscraper(options, function(err, response, body) {
console.log(response)
});
```

## Error object
Error object has following structure:
```
var error = {errorType: 0, error:...};
## Advanced usage
Cloudscraper wraps request and request-promise, so using cloudscraper is pretty much like using those two libraries.
- Cloudscraper exposes [the same request methods as request](https://github.com/request/request#requestmethod):
`cloudscraper.get(options, callback)`
`cloudscraper.post(options, callback)`
`cloudscraper(uri)`
Please refer to request's documentation for further instructions
- Cloudscraper uses request-promise, promise chaining is done exactly the same as described in [docs](https://github.com/request/request-promise#cheat-sheet):
```
cloudscraper(options)
.then(function (htmlString) {
})
.catch(function (err) {
});
```

## Default options
Cloudscraper exposes the following options, which are required by default but can be changed. Note that keeping the defaults increases the chances of it working correctly.

```
var options = {
uri: 'https://website',
jar: requestModule.jar(), // Custom cookie jar
headers: {
// User agent, Cache Control and Accept headers are required
'User-Agent': 'Ubuntu Chromium/34.0.1847.116 Chrome/34.0.1847.116 Safari/537.36',
'Cache-Control': 'private',
'Accept': 'application/xml,application/xhtml+xml,text/html;q=0.9, text/plain;q=0.8,image/png,*/*;q=0.5'
},
// Cloudflare requires a delay of 5 seconds, so wait for at least 6.
cloudflareTimeout: 6000,
// followAllRedirects - follow non-GET HTTP 3xx responses as redirects
followAllRedirects: true,
// Support only this max challenges in row. If CF returns more, throw an error
challengesToSolve: 3
};
cloudscraper(options, function(error, response, body) {
console.log(body)
});
```
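
How caller-supplied options might combine with these defaults can be sketched as follows. This is only an illustration, not cloudscraper's actual implementation: `DEFAULTS` and `withDefaults` are hypothetical names, the cookie jar is left out so the snippet runs without `request`, and merging headers key by key is an assumption.

```javascript
// Hypothetical defaults mirroring the values listed above (cookie jar omitted).
var DEFAULTS = {
  headers: {
    'User-Agent': 'Ubuntu Chromium/34.0.1847.116 Chrome/34.0.1847.116 Safari/537.36',
    'Cache-Control': 'private',
    'Accept': 'application/xml,application/xhtml+xml,text/html;q=0.9, text/plain;q=0.8,image/png,*/*;q=0.5'
  },
  cloudflareTimeout: 6000,
  followAllRedirects: true,
  challengesToSolve: 3
};

// Hypothetical helper: caller-supplied values win over the defaults,
// and headers are merged key by key instead of being replaced wholesale.
function withDefaults (options) {
  var merged = Object.assign({}, DEFAULTS, options);
  merged.headers = Object.assign({}, DEFAULTS.headers, options.headers);
  return merged;
}

var options = withDefaults({
  uri: 'https://website.com/',
  cloudflareTimeout: 8000,
  headers: { 'Cache-Control': 'no-cache' }
});

console.log(options.cloudflareTimeout);
console.log(options.headers['Cache-Control']);
console.log(options.challengesToSolve);
```

In this sketch the overridden timeout and header replace the defaults, while untouched values such as `challengesToSolve` and the `User-Agent` header are carried over unchanged.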
## Error object
Cloudscraper's error object inherits from `Error` and has the following fields:
* `name` - `RequestError`/`CaptchaError`/`CloudflareError`/`ParserError`
* `options` - The request options
* `cause` - An alias for `error`
* `response` - The request response
* `errorType` - Custom error code
Where `errorType` can be following:
- `0` if request to page failed due to some native reason as bad url, http connection or so. `error` in this case will be error [event](http://nodejs.org/api/http.html#http_class_http_server)
- `1` cloudflare returned captcha. Nothing to do here. Bad luck
- `2` cloudflare returned page with some inner error. `error` will be `Number` within this range `1012, 1011, 1002, 1000, 1004, 1010, 1006, 1007, 1008`. See more [here](https://support.cloudflare.com/hc/en-us/sections/200038216-CloudFlare-Error-Messages)
- `3` this error is returned when library failed to parse and solve js challenge. `error` will be `String` with some details. :warning: :warning: __Most likely it means that cloudflare have changed their js challenge.__
- `4` CF went into a loop and started to return challenge after challenge. If number of solved challenges is greater than `3` and another challenge is returned, throw an error


Running tests
============
Clone this repo, do `npm install` and then just `grunt`

### Unknown error? Library stopped working? ###
Let me know, by opening [issue](https://github.com/codemanki/cloudscraper/issues) in this repo and i will update library asap. Please, provide url and body of page where cloudscraper failed.


CloudScraper uses [Request](https://github.com/request/request) to perform requests.

WAT
===========
Current cloudflare implementation requires browser to respect the timeout of 5 seconds and cloudscraper mimics this behaviour. So everytime you call `cloudscraper.get` you should expect it to return result after min 6 seconds.
The current Cloudflare implementation requires the browser to respect a timeout of 5 seconds, and cloudscraper mimics this behaviour. So every time you call `cloudscraper.get/post`, you should expect it to return a result after a minimum of 6 seconds. If you want to change this behaviour, make a generic request as described above and pass the `cloudflareTimeout` option with your value. But be aware that Cloudflare might track this timeout and use it against you ;)

## TODO
- [x] Check for recaptcha
@@ -102,17 +189,20 @@ Current cloudflare implementation requires browser to respect the timeout of 5 s
- [x] Add proper testing
- [x] Remove manual 302 processing, replace with `followAllRedirects` param
- [ ] Parse out the timeout from challenge page
- [ ] Reorder the arguments in get/post/request methods and allow custom options to be passed in
- [x] Reorder the arguments in get/post/request methods and allow custom options to be passed in
- [ ] Expose solve methods to use them independently
- [ ] Support recaptcha solving
- [ ] Promisification
- [x] Promisification

## Kudos to contributors
- [Dwayne](https://github.com/pro-src) by himself rewrote the whole library, closed bunch of issues and feature requests. Praise him for 3.0.0 version ❤️
- [roflmuffin](https://github.com/roflmuffin)
- [Colecf](https://github.com/Colecf)
- [Jeongbong Seo](https://github.com/jngbng)
- [Kamikadze4GAME](https://github.com/Kamikadze4GAME)

## Dependencies
* request https://github.com/request/request
* [request](https://github.com/request/request)
* [request-promise](https://github.com/request/request-promise)


90 changes: 90 additions & 0 deletions errors.js
@@ -0,0 +1,90 @@
'use strict';

// The purpose of this library is two-fold.
// 1. Have errors consistent with request/promise-core
// 2. Prevent request/promise core from wrapping our errors

// There are two differences between these errors and the originals.
// 1. There is a non-enumerable errorType attribute.
// 2. The error constructor is hidden from the stacktrace.

var EOL = require('os').EOL;
var BUG_REPORT = format([
'### Cloudflare may have changed their technique, or there may be a bug.',
'### Bug Reports: https://github.com/codemanki/cloudscraper/issues',
'### Check the detailed exception message that follows for the cause.'
]);

var original = require('request-promise-core/errors');
var OriginalError = original.RequestError;

var RequestError = create('RequestError', 0);
var CaptchaError = create('CaptchaError', 1);
var CloudflareError = create('CloudflareError', 2);
var ParserError = create('ParserError', 3);
// errorType 4 is a CloudflareError so that constructor is reused.

// The following errors originate from promise-core and its dependents.
// Give them an errorType for consistency.
original.StatusCodeError.prototype.errorType = 5;
original.TransformError.prototype.errorType = 6;

// This replaces the RequestError for all libraries using request/promise-core
// and prevents silent failure.
Object.defineProperty(original, 'RequestError', {
configurable: true,
enumerable: true,
writable: true,
value: RequestError
});

// Export our custom errors along with StatusCodeError, etc.
Object.assign(module.exports, original, {
RequestError: RequestError,
CaptchaError: CaptchaError,
ParserError: ParserError,
CloudflareError: CloudflareError
});

function create(name, errorType) {
function CustomError(cause, options, response) {

// This prevents nasty things e.g. `error.cause.error` and
// is why replacing the original RequestError is necessary.
if (cause instanceof OriginalError) {
return cause;
}

OriginalError.apply(this, arguments);

// Change the name to match this constructor
this.name = name;

if (this instanceof ParserError) {
this.message = BUG_REPORT + this.message;
}

if (Error.captureStackTrace) { // required for non-V8 environments
// Provide a proper stack trace that hides this constructor
Error.captureStackTrace(this, CustomError);
}
}

CustomError.prototype = Object.create(OriginalError.prototype);
CustomError.prototype.constructor = CustomError;
// Keeps things stealthy by defining errorType on the prototype.
// This makes it non-enumerable and safer to add.
CustomError.prototype.errorType = errorType;

Object.setPrototypeOf(CustomError, Object.getPrototypeOf(OriginalError));
Object.defineProperty(CustomError, 'name', {
configurable: true,
value: name
});

return CustomError;
}

function format(lines) {
return EOL + lines.join(EOL) + EOL + EOL;
}
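
The factory pattern used in `errors.js` can be tried in isolation with the sketch below. It is a simplified stand-in rather than the module itself: the built-in `Error` replaces request-promise-core's `RequestError` so the snippet runs without dependencies, and `createError` and `DemoParserError` are hypothetical names.

```javascript
// Simplified version of the factory: builds an Error subclass whose
// errorType lives on the prototype instead of on each instance.
function createError (name, errorType) {
  function CustomError (message) {
    this.name = name;
    this.message = message;

    if (Error.captureStackTrace) {
      // V8 only: leave the constructor frame out of the stack trace.
      Error.captureStackTrace(this, CustomError);
    }
  }

  CustomError.prototype = Object.create(Error.prototype);
  CustomError.prototype.constructor = CustomError;
  // Shared by all instances and never an own property of any of them.
  CustomError.prototype.errorType = errorType;

  return CustomError;
}

var DemoParserError = createError('ParserError', 3);
var err = new DemoParserError('challenge could not be parsed');

console.log(err instanceof Error);
console.log(err.errorType);
console.log(Object.keys(err).indexOf('errorType'));
```

Because `errorType` sits on the prototype, it is readable on every instance but does not show up in `Object.keys` or `JSON.stringify` output, which only consider own properties.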