This repository was archived by the owner on May 9, 2020. It is now read-only.

(WIP) Add Accept-Encoding header #175

Closed
ghost wants to merge 1 commit into master from unknown repository


ghost commented Mar 31, 2019

Builds on #174 by adding the 'Accept-Encoding' header for domains that require it. Supports gzip-encoded responses by default and prevents the CAPTCHA from being served.
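For context, here is a minimal sketch of what "supporting gzip-encoded responses" involves in Node.js, using only the built-in `zlib` module. The helper names `withAcceptEncoding` and `decodeBody` and the header value `gzip, deflate` are assumptions for illustration, not the code from #174 or this PR.

```javascript
const zlib = require('zlib');

// Hypothetical helper: add an Accept-Encoding header to caller-supplied headers,
// without overwriting a value the caller already set (header names are case-insensitive).
function withAcceptEncoding(headers = {}) {
  const alreadySet = Object.keys(headers).some(
    (name) => name.toLowerCase() === 'accept-encoding'
  );
  return alreadySet ? { ...headers } : { ...headers, 'Accept-Encoding': 'gzip, deflate' };
}

// Hypothetical helper: decode a raw response body according to its Content-Encoding.
function decodeBody(buffer, contentEncoding = '') {
  switch (contentEncoding.trim().toLowerCase()) {
    case 'gzip':
      return zlib.gunzipSync(buffer);
    case 'deflate':
      // HTTP "deflate" is meant to be zlib-wrapped deflate, which inflateSync expects.
      return zlib.inflateSync(buffer);
    default:
      return buffer; // identity or unrecognized encoding: leave the bytes untouched
  }
}
```

Once a client advertises `gzip`, every response body has to go through a step like `decodeBody` before the challenge page can be parsed; that is the part a client that advertises the encoding without decoding it would get wrong.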

ghost (Author) commented Mar 31, 2019

ghost changed the title from "Add Accept-Encoding header" to "(WIP) Add Accept-Encoding header" on Apr 1, 2019

ghost (Author) commented Apr 1, 2019

See #176; this PR isn't a 100% fix.

codemanki (Owner) commented

@pro-src, would the 100% fix be faking the UA along with the Accept headers?

ghost (Author) commented Apr 2, 2019

Yup, for now at least

codemanki (Owner) commented

@pro-src what about using some library like https://www.npmjs.com/package/useragent-generator with a set of hardcoded Accept headers from https://developer.mozilla.org/en-US/docs/Web/HTTP/Content_negotiation/List_of_default_Accept_values per browser?

ghost (Author) commented Apr 3, 2019

@codemanki I like the API in useragent-generator, but I'm afraid that if we just plug random numbers in, it could produce some very uncommon user agent strings, and that could be a problem. Using popular user agents has its advantages, and if we're going to use specific UAs, then the additional dependency isn't warranted. I do think adding the Accept headers from that MDN article would be a good solution, though. One thing is for sure: that article shows how easy it would be for Cloudflare to ruin our day, e.g. if we were required to change the Accept header depending on the type of resource being requested.

ghost (Author) commented Apr 3, 2019

@codemanki 🤔 Maybe we shouldn't even try to cover this, and should instead require users to supply their own headers if they run into trouble. We could point them to a web service that shows their headers so they can copy and paste them. This one really catches my eye because you can specify the resource to request: https://www.whatsmyip.org/http-headers/ (Hint: disable follow-redirects under advanced options.) It uses custom request headers. There are many similar services, though.
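If users supply their own headers, a copy-and-paste from a header-echo service arrives as raw `Name: value` lines. Below is a minimal sketch for turning that text into a headers object while keeping the order the browser sent them in (string-keyed JS objects preserve insertion order). The name `parseRawHeaders` is hypothetical, and skipping `Host`, `Cookie`, `Content-Length` and `Connection` is an assumption about what should not be replayed verbatim.

```javascript
// Hypothetical: headers that should not be copied from someone else's captured request.
const SKIP = new Set(['host', 'cookie', 'content-length', 'connection']);

// Hypothetical helper: turn pasted "Name: value" lines into an ordered headers object.
function parseRawHeaders(text) {
  const headers = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    const colon = line.indexOf(':');
    if (colon <= 0) continue; // blank line, request line, or pseudo-header like ":authority"
    const name = line.slice(0, colon).trim();
    if (/\s/.test(name)) continue; // header names never contain whitespace
    const value = line.slice(colon + 1).trim(); // values may contain ':' (e.g. URLs)
    if (SKIP.has(name.toLowerCase())) continue;
    headers[name] = value; // insertion order == order in the pasted text
  }
  return headers;
}
```

The resulting object could then be passed as the `headers` option of a request, so the user's own browser headers are sent in their original order (assuming the HTTP client preserves object key order on the wire).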

On our end, besides updating the README, we could adjust the Accept headers for the resource types listed on that MDN page, e.g. cloudscraper.get({ uri, image: true }). We'd have to parse the Accept header to know how to adjust it, and it's probably not even necessary right now.
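Adjusting the Accept header per resource type means parsing it into media ranges and their `q` weights first. A minimal sketch, assuming a simplified grammar (comma-separated ranges, `;`-separated parameters, `q` defaulting to 1). `parseAccept`, `formatAccept` and `acceptForImage` are hypothetical, and the image rule (keep only `image/*` ranges plus the `*/*` fallback) is an illustration, not any browser's or Cloudflare's documented behavior.

```javascript
// Hypothetical: parse an Accept header into [{ type, params, q }] entries, in original order.
function parseAccept(header) {
  return header
    .split(',')
    .map((part) => part.trim())
    .filter(Boolean)
    .map((part) => {
      const [type, ...rawParams] = part.split(';').map((s) => s.trim());
      let q = 1;
      const params = [];
      for (const p of rawParams.filter(Boolean)) {
        const [key, value] = p.split('=').map((s) => s.trim());
        if (key.toLowerCase() === 'q') {
          const n = Number(value);
          if (!Number.isNaN(n)) q = n; // keep the default weight on a malformed q
        } else {
          params.push(p); // e.g. "level=1" or "v=b3"
        }
      }
      return { type, params, q };
    });
}

// Serialize entries back into a header value, omitting q when it is 1.
function formatAccept(entries) {
  return entries
    .map(({ type, params, q }) => [type, ...params, ...(q === 1 ? [] : [`q=${q}`])].join(';'))
    .join(',');
}

// Hypothetical adjustment for a request like cloudscraper.get({ uri, image: true }).
function acceptForImage(baseHeader) {
  const kept = parseAccept(baseHeader).filter(
    ({ type }) => type.startsWith('image/') || type === '*/*'
  );
  return formatAccept(kept);
}
```

With an example base value such as `text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8`, `acceptForImage` yields `image/webp,*/*;q=0.8`. Round-tripping through `parseAccept`/`formatAccept` keeps the original order, which matters given how order-sensitive the later comments in this thread found Cloudflare to be.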

ghost (Author) commented Apr 3, 2019

@VeNoMouS what do you think?

VeNoMouS commented Apr 3, 2019

Well, here's what I found re the UA: you only need to match the UA when requesting and delivering the secret. Past that point, as long as your cookie is set, CF just doesn't care; you can use whatever UA you want.

And yes, you need to disable redirects on the secret, otherwise you end up in a continuous loop if your code doesn't verify the challenge. At least, that was my experience.
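VeNoMouS's observations can be expressed as a rule about which request options each phase needs. A minimal sketch, assuming `request`-style options (the library cloudscraper is built on, where `followRedirect: false` disables redirect following and `jar` holds cookies). The phase names and `optionsForPhase` are hypothetical, and the behavior encoded here is what was observed in this thread, not documented Cloudflare behavior.

```javascript
// Hypothetical phases, as described in this thread:
//   'challenge' - requesting the page that contains the challenge
//   'submit'    - delivering the computed answer (the "secret")
//   'cleared'   - any later request, once the clearance cookie is set
function optionsForPhase(phase, session, overrides = {}) {
  const headers = { ...(overrides.headers || {}) };
  if (phase === 'challenge' || phase === 'submit') {
    // Observed: the UA must be the same while requesting and delivering the secret.
    headers['User-Agent'] = session.userAgent;
  } else if (phase !== 'cleared') {
    throw new Error(`unknown phase: ${phase}`);
  }
  // Observed: once cleared, the UA no longer matters, so a caller-supplied UA is kept.
  const options = { ...overrides, headers, jar: session.jar };
  if (phase === 'submit') {
    // Don't follow the redirect after submitting; if the answer was wrong,
    // following it can bounce the client back into the challenge indefinitely.
    options.followRedirect = false;
  }
  return options;
}
```

Keeping the pinned UA in a session object (rather than regenerating it per request) is the simplest way to guarantee the challenge and submit requests agree.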

ghost (Author) commented Apr 3, 2019

@VeNoMouS Good point, thanks for chiming in.

ghost (Author) commented Apr 3, 2019

@codemanki After looking at the code of useragent-generator, I'm not too worried about the UAs that it generates. I'll give it a 👍 because it's smart.

ghost (Author) commented Apr 3, 2019

Related to what @VeNoMouS said: #177 (comment)

ghost (Author) commented Apr 9, 2019

I haven't forgotten about this. I'm collecting some data for it. Afterwards, I'll see what needs to be done to fix this PR, or close it in favor of a new one. 💡

ghost (Author) commented Apr 12, 2019

I have the data, and that wikipedia article is trash. Here's what I've been able to determine:

- The header order changes depending on the UA.
- The headers that are present change depending on the UA.
- The values of default headers change depending on the UA, etc.

The Accept headers change depending on the browser version, whereas the article implies that they depend on the vendor.

Thus, if you want to properly spoof a browser then you must have the exact headers of that browser version and send them in the exact same order that it would send them.

I have the data that we need to do UA randomization properly and will be sending a new PR.

Edit: If you want a copy of the data, just let me know.
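The conclusion above (exact headers, exact values, exact order, per browser version) suggests randomizing over whole captured profiles instead of generating a UA and bolting headers onto it. A minimal sketch of that idea; the profile shape, `pickProfile` and `buildHeaders` are hypothetical and are not the data or code from #196. Whether the order survives on the wire also depends on the HTTP client, which is an assumption here.

```javascript
// Hypothetical profile shape: one entry per browser *version*, captured from a real browser.
// `headers` is an array of [name, value] pairs in the exact order that browser sends them.

// Pick one whole profile; rng is injectable so the choice can be made deterministic in tests.
function pickProfile(profiles, rng = Math.random) {
  if (!profiles.length) throw new Error('no profiles');
  return profiles[Math.floor(rng() * profiles.length)];
}

// Build an ordered headers object from a profile. Callers may override values
// (matched case-insensitively) without changing the profile's header order.
function buildHeaders(profile, overrides = {}) {
  const pending = new Map(Object.entries(overrides).map(([k, v]) => [k.toLowerCase(), v]));
  const headers = {};
  for (const [name, value] of profile.headers) {
    const key = name.toLowerCase();
    headers[name] = pending.has(key) ? pending.get(key) : value;
    pending.delete(key);
  }
  // Headers the profile doesn't contain go last, since their real position is unknown.
  for (const [k, v] of Object.entries(overrides)) {
    if (pending.has(k.toLowerCase())) headers[k] = v;
  }
  return headers;
}
```

Because the UA, the Accept values and the order all come from the same captured profile, they can never disagree with each other, which is the failure mode of randomizing the UA string alone.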

ghost (Author) commented Apr 13, 2019

Closing in favor of #196.

ghost closed this Apr 13, 2019
ghost deleted the 170_fix_build branch Apr 13, 2019 08:32
This pull request was closed.