Skip to content
This repository has been archived by the owner on May 9, 2020. It is now read-only.

Randomize UA #133

Merged
merged 2 commits into from
Mar 30, 2019
Merged

Randomize UA #133

merged 2 commits into from
Mar 30, 2019

Conversation

codemanki
Copy link
Owner

Fixes #77

@pro-src i first created a randomizeUserAgent flag on defaultParams, but then decided to just randomize the UA by default, as that flag was brining more confusion than actual value

@ghost
Copy link

ghost commented Mar 10, 2019

@codemanki I think that if this is going to be on by default then there should be at least a minor version bump. There also needs to be a warning in the README to inform people that sending a cookie that was acquired with a different user agent is a bad idea. The user agent should always be stored with the session cookie. This could very well be a breaking change if anybody is already using v3 and is storing session cookies for reuse.

@codemanki
Copy link
Owner Author

codemanki commented Mar 11, 2019

@pro-src so you are saying that it should be actually a flag that is off by default?
I'm not sure how to store it together with a cookie. Then it should be really explicitly described in the readme that "if the flag is true, you have to manually acquire the UA from the headers, and then pass it with the next requests". And this makes things really dirty 🤔

@ghost
Copy link

ghost commented Mar 11, 2019

@codemanki, what about adding a method to apply a random user agent?

// Somewhere in the custom defaults function
cloudscraper.setUserAgent = setUserAgent;

// Pseudo definition
function setUserAgent(ua) {
  if (ua === 0) {
    ua = this.defaultParam.headers['User-Agent'];
    var old = ua;
    while (ua === old) ua = randomUA();
  }

  if (!ua || typeof ua !== 'string') {
    throw new TypeError('Expected userAgent to be a non-empty string, got ' +
      typeof (ua) + ' instead.';
  }

  this.defaultParam.headers['User-Agent'] = ua;
}

// Usage
var cloudscraper = require('cloudscraper');
cloudscraper.setUserAgent(0);

@ghost
Copy link

ghost commented Mar 11, 2019

@codemanki

so you are saying that it should be actually a flag that is off by default?

I'm saying it should be off by default but it doesn't necessarily have to be a flag. At least a minor bump, preferably a major bump before we switch the default behavior. Maybe set a milestone for v4 that is to change this to on by default?

@codemanki
Copy link
Owner Author

codemanki commented Mar 11, 2019

I'm saying it should be off by default but it doesn't necessarily have to be a flag.

Then I would just leave it as it is and add a notice to the readme, saying that UA randomization is a nice to have but should be handled manually by user.
If user wants to use UA headers in more advanced way, they would want to use something like https://www.npmjs.com/package/random-ua
The suggested by you random UA filling looks like a workaround, that is not very obvious and is limited to the defined list of 10-15 UAs.

@ghost
Copy link

ghost commented Mar 11, 2019

@codemanki Actually, I'd rather have this on by default and release it now in the patch version then to not have it all, lol. No convenience methods necessary. However we could just document the method as only randomly returning one of the ~5 most popular user agents. Because they won't be aware of the method unless they read the readme anyway...

@codemanki
Copy link
Owner Author

@pro-src but if it is on by default, then users should also keep a track of selected UA. But on other hand, if cloudscraper is used in non-advanced way, users won't probably save cookies to reuse them and etc.

So should it be a method as you suggested, or a flag? :)

@ghost
Copy link

ghost commented Mar 11, 2019

@codemanki I think a method would better so we can change the user agent on demand. If getting a captcha, you might be able to circumvent it just by changing the UA. See this issue: Anorov/cloudflare-scrape#190

So, we only really need either a flag or method but if you wanted to you could add both. I'm neutral on it.

@ghost
Copy link

ghost commented Mar 13, 2019

@codemanki Did you decide on what to do with this?

@codemanki
Copy link
Owner Author

@pro-src oh snap. yes, i want to continue with a flag, as it looks like we are currently using this kind of strategy for configuration. 👍 ?

@ghost
Copy link

ghost commented Mar 15, 2019

@codemanki 🤔 If a flag is introduced, the UA would have to be set during the first performRequest and kept for all future calls to cloudscraper? Or would it be the case that you could only enable/disable this by using the defaults function?

@ghost
Copy link

ghost commented Mar 15, 2019

@codemanki See my #62 (comment) about adding a flag that should apply to all requests.

@ghost
Copy link

ghost commented Mar 15, 2019

I'm just noting this here since changing the UA can potentially bypass a captcha.

var cloudscraper = require('cloudscraper');
var errors = require('cloudscraper/errors');

cloudscraper.get('http://example-site.dev')
  .catch(function(error) {
    if (error instanceof errors.CaptchaError) {
      cloudscraper.defaultParams.headers['User-Agent'] = 'new UA';
      return cloudscraper.get('http://example-site.dev');
    }

    throw error;
  })
  .then(doSomethingWithBody)

@codemanki
Copy link
Owner Author

@pro-src hey. sorry, i was out for some time, now i'm back. I will take a look at your comments and will adjust my PR

@ghost
Copy link

ghost commented Mar 19, 2019

@codemanki Thanks for the update!

@codemanki
Copy link
Owner Author

To reopen this PR. Do we want to set a new UA on errors? Or on each request?

@ghost
Copy link

ghost commented Mar 24, 2019

Do we want to set a new UA on errors?

I like the idea of having an option that causes the UA to be changed if we get a captcha. However, you can already do that using the current API.

Or on each request?

Changing the UA on a per-request basis is definitely a bad idea.

I really don't want to delay this PR any further, can we just merge it as is?

I do have a couple concerns:

  • This is changing the default behavior which is not an issue if users have to explicitly enable it.
  • The user should reuse the user-agent if cookies are reused in later sessions which complicates things.

The first concern could normally be resolved by a major version bump but users have to update to the latest version otherwise cloudscraper won't work for the latest challenge. Once the parsing methods are separated from the main library, we should consider backporting so breaking changes aren't forced on users.

The second concern would be addressed by updating the README but maybe later we could add a method to get the cookies and UA together?

@codemanki codemanki force-pushed the feature/randomize_user_agent branch from eff7f15 to 2874089 Compare March 30, 2019 10:43
@codemanki codemanki merged commit 47cd80c into master Mar 30, 2019
@codemanki codemanki deleted the feature/randomize_user_agent branch March 30, 2019 10:48
@ghost
Copy link

ghost commented Apr 29, 2019

@codemanki I just realized that we haven't updated the project markdown to mention saving the UA with the cookies. Probably should do that...

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Randomize default useragent
1 participant