
ReferenceError: atob is not defined #215

Closed
Krylanc3lo opened this issue Mar 29, 2019 · 115 comments

Comments

@Krylanc3lo

Hello,

I have been getting the error below for the past couple of days: js2py.internals.simplex.JsException: ReferenceError: atob is not defined

File "/home/maxx/.local/lib/python3.6/site-packages/js2py/base.py", line 1074, in get
return self.prototype.get(prop, throw)
File "/home/maxx/.local/lib/python3.6/site-packages/js2py/base.py", line 1079, in get
raise MakeError('ReferenceError', '%s is not defined' % prop)
js2py.internals.simplex.JsException: ReferenceError: atob is not defined

Is anyone else experiencing the same?

Thank you!

@pawliczka

I think that it is related to #212

@pawliczka

pawliczka commented Mar 29, 2019

I think we have to add this at the beginning of the node -e command:

if (typeof atob === 'undefined') {
  global.atob = function (b64Encoded) {
    // new Buffer() is deprecated; Buffer.from() is the supported replacement
    return Buffer.from(b64Encoded, 'base64').toString('binary');
  };
}

$ node -e "global.Buffer = global.Buffer || require('buffer').Buffer; if (typeof atob === 'undefined') { global.atob = function (str) { return Buffer.from(str, 'base64').toString('binary'); }; } console.log(atob('Hello'));"

ée
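For anyone following along in Python, the browser's atob() can be approximated like this (a sketch; atob returns a "binary string", i.e. one character per decoded byte):

```python
from base64 import b64decode

def atob(b64):
    # Mirror the browser's atob(): decode base64 and expose the raw bytes as a
    # latin-1 string, matching Node's Buffer.from(s, 'base64').toString('binary').
    # Missing padding is tolerated, as browsers and Node are lenient about it.
    return b64decode(b64 + '=' * (-len(b64) % 4)).decode('latin-1')

print(atob('aW5uZXJIVE1M'))  # innerHTML
```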

@Krylanc3lo
Author

Thank you Pawel, I will try it.

You mean adding this code to the base.py script?

@pawliczka

pawliczka commented Mar 29, 2019

We have to edit this line:
js = "console.log(require('vm').runInNewContext('%s', Object.create(null), {timeout: 5000}));" % js
I will create a pull request when I get back home.

@lukastribus
Contributor

lukastribus commented Mar 29, 2019

js2py has not been used here for a long time. First of all, please update your code (and install node).

@Krylanc3lo
Author

Thanks Lukas for the suggestion; using node and updating the code led to the same error, but at least I am now on the latest version:

ReferenceError: atob is not defined
at evalmachine.<anonymous>:1:609
at evalmachine.<anonymous>:1:908
at ContextifyScript.Script.runInContext (vm.js:59:29)
at ContextifyScript.Script.runInNewContext (vm.js:65:15)
at Object.runInNewContext (vm.js:135:38)
at [eval]:1:27
at ContextifyScript.Script.runInThisContext (vm.js:50:33)
at Object.runInThisContext (vm.js:139:38)
at Object.<anonymous> ([eval]-wrapper:6:22)
at Module._compile (module.js:652:30)
ERROR:root:Error executing Cloudflare IUAM Javascript. Cloudflare may have changed their technique, or there may be a bug in the script.

I will try to implement Pawel's suggestion.

@pawliczka

pawliczka commented Mar 29, 2019

It is more complicated than I thought. But we can replace the content between atob("ZG9jdW1l") and atob("aW5uZXJIVE1M") ('document.getElementById(k).innerHTML') with the text defined under the HTML element whose id is given by the k variable.
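A quick Python check (just decoding the two base64 constants quoted above) shows what those atob() calls contribute:

```python
from base64 import b64decode

# The two base64 constants from the challenge decode to the fragments that,
# together with the obfuscated expression between them, build
# "document.getElementById(k).innerHTML"
print(b64decode('ZG9jdW1l').decode('ascii'))      # docume
print(b64decode('aW5uZXJIVE1M').decode('ascii'))  # innerHTML
```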

@pawliczka

#212 is caused by this function, which returns the ASCII code of the letter at t[p]:

(function(p){return eval((true+"")[0]+"."+([]["fill"]+"")[3]+(+(101))["to"+String["name"]](21)[1]+(false+"")[1]+(true+"")[1]+Function("return escape")()(("")["italics"]())[2]+(true+[]["fill"])[10]+(undefined+"")[2]+(true+"")[3]+(+[]+Array)[10]+(true+"")[0]+"("+p+")")}(+((!+[]+!![]+!![]+[]))))

And we get 'Cannot read property 'charCodeAt' of undefined' because we are not passing the t variable to the nodejs call.
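For context (an interpretation, based on the replacement used later in this thread): that jsfuck expression assembles the string "t.charCodeAt(p)" letter by letter via JavaScript type coercion, e.g. (true+"")[0] is 't' and (false+"")[1] is 'a'. A Python sanity check of the assembled name:

```python
# Each entry maps a jsfuck fragment (in the comment) to the character it
# evaluates to under JavaScript's type-coercion rules (V8 native-function
# stringification assumed, e.g. "function fill() { [native code] }")
letters = [
    't',  # (true+"")[0]          -> "true"[0]
    '.',  # the literal "."
    'c',  # ([]["fill"]+"")[3]    -> "function fill() ..."[3]
    'h',  # (+(101))["to"+String["name"]](21)[1] -> (101).toString(21)[1] of "4h"
    'a',  # (false+"")[1]         -> "false"[1]
    'r',  # (true+"")[1]          -> "true"[1]
    'C',  # Function("return escape")()(("")["italics"]())[2] -> escape("<i></i>")[2]
    'o',  # (true+[]["fill"])[10] -> "truefunction ..."[10]
    'd',  # (undefined+"")[2]     -> "undefined"[2]
    'e',  # (true+"")[3]          -> "true"[3]
    'A',  # (+[]+Array)[10]       -> "0function Array..."[10]
    't',  # (true+"")[0]          -> "true"[0]
]
print(''.join(letters))  # t.charCodeAt
```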

@pawliczka

Ok. I finally got it. I will provide code tomorrow.

@Krylanc3lo
Author

Great! Looking forward to it.

Thanks again Pawel!

@VeNoMouS

@pawliczka do you have any pseudo code we can implement in the meantime, for those with projects that rely on bypassing CF?

@pawliczka

pawliczka commented Mar 30, 2019

To solve the undefined atob problem, you have to replace everything from atob("ZG9jdW1l") through atob("aW5uZXJIVE1M"), which for me is:
atob("ZG9jdW1l")+(undefined+"")[1]+(true+"")[0]+(+(+!+[]+[+!+[]]+(!![]+[])[!+[]+!+[]+!+[]]+[!+[]+!+[]]+[+[]])+[])[+!+[]]+(false+[0]+String)[20]+(true+"")[3]+(true+"")[0]+"Element"+(+[]+Boolean)[10]+(NaN+[Infinity])[10]+"Id("+(+(20))["to"+String["name"]](21)+")."+atob("aW5uZXJIVE1M")
with the data under the element identified by the k variable (for me, k = 'cf-dn-lZTYtMjTTnWU';):
<div style="display:none;visibility:hidden;" id="cf-dn-lZTYtMjTTnWU">+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]))/+((+!![]+[])+(+!![])+(+[])+(+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]))</div>
You also have to solve the TypeError: Cannot read property 'charCodeAt' of undefined and the a.value problem; you can do it this way:
js = js.replace('a.value','a')
js = js.replace("; 121",'')
js = "console.log(require('vm').runInNewContext('var a; var t = \"%s\";%s', Object.create(null), {timeout: 5000}));" % (domain, js)
Now it should work fine. I think that CF is serving the new challenge algorithm to only part of its users; some domains are still using the old challenge algorithm, as in #212.
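The string replacements described above can be sketched in Python as follows (the js value here is a made-up stand-in for the real extracted challenge, and example.com is a hypothetical domain):

```python
# Stand-in for the tail of the extracted IUAM challenge (illustrative only)
js = 'a.value = (x + t.length).toFixed(10); 121'
domain = 'example.com'

js = js.replace('a.value', 'a')  # write to a plain variable instead of the form input
js = js.replace('; 121', '')     # drop the trailing constant statement

# Wrap for node's vm module, declaring a and seeding t with the domain
wrapped = "console.log(require('vm').runInNewContext('var a; var t = \"%s\";%s', Object.create(null), {timeout: 5000}));" % (domain, js)
print(wrapped)
```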

@Krylanc3lo
Author

In which file do you find the atob function?

@VeNoMouS

VeNoMouS commented Mar 30, 2019

atob is available for node: https://www.npmjs.com/package/atob. @Krylanc3lo, personally I back-port most of the changes, still use js2py, and avoid node at all costs.

@Krylanc3lo
Author

Thanks @VeNoMouS. Do you know what I have to update on the js2py side?

@pawliczka

pawliczka commented Mar 30, 2019

atob("ZG9jdW1l")+(undefined+"")[1]+(true+"")[0] = document
+(+(+!+[]+[+!+[]]+(!![]+[])[!+[]+!+[]+!+[]]+[!+[]+!+[]]+[+[]])+[])[+!+[]]+(false+[0]+String)[20]+(true+"")[3]+(true+"")[0]+"Element"+(+[]+Boolean)[10]+(NaN+[Infinity])[10]+"Id("+(+(20))["to"+String["name"]](21)+")." = .getElementById(k).
+atob("aW5uZXJIVE1M") = innerHTML
document.getElementById(k).innerHTML

@VeNoMouS

@Krylanc3lo I'm looking into it myself.

@pawliczka

@VeNoMouS Could you please send me a diff or a link to your fork when you are done? I'm going to sleep now.

@VeNoMouS

@pawliczka sweet as mate :)

@VeNoMouS

VeNoMouS commented Mar 30, 2019

lol this jsfuck is really annoying when trying to work out what it's attempting to do...

@ghost

ghost commented Mar 30, 2019

@VeNoMouS I just wrote some code to take the pain out of it. codemanki/cloudscraper#170 (comment)

@VeNoMouS

@pro-src I'm just doing the same in python ;P nice job :)

@ghost

ghost commented Mar 30, 2019

@VeNoMouS Also, here is a node-based definition of atob:

function atob(str) {
  return Buffer.from(str, 'base64').toString('binary');
}

@VeNoMouS

Ah, thanks. I ended up just replacing it with a regex on the base64'd parts till I got it all working.

@pawliczka

pawliczka commented Mar 30, 2019

#206 ^^ @VeNoMouS could you please share your solution?

@ghost

ghost commented Mar 30, 2019

https://www.npmjs.com/package/cf-debug

@VeNoMouS

VeNoMouS commented Mar 30, 2019

so.... my rewrite produced this... it's a bit of a hack atm...

[screenshot: computed challenge answer]

However, it breaks under js2py... I'm trying to work that out.

  File "/usr/local/lib/python2.7/dist-packages/js2py/base.py", line 1001, in callprop
    '%s is not a function' % cand.typeof())
js2py.internals.simplex.JsException: TypeError: 'undefined' is not a function

@VeNoMouS

OK... I found one of the root causes of the "undefined", but I think there's still another issue...

("")["italics"]() is "".italics() in JS... js2py doesn't know how to handle it.

@VeNoMouS

import logging
import random
import re
from pprint import pprint
from base64 import b64decode

from copy import deepcopy
from time import sleep

#from lib import js2py
import js2py
from lib.requests.sessions import Session

try:
    from urlparse import urlparse
except ImportError:
    from urllib.parse import urlparse

__version__ = "1.9.5"

# Originally written by https://github.com/Anorov/cloudflare-scrape
# Rewritten by VeNoMouS - <venom@gen-x.co.nz> for https://github.com/VeNoMouS/Sick-Beard - 24/3/2018 NZDT

DEFAULT_USER_AGENTS = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/65.0.3325.181 Chrome/65.0.3325.181 Safari/537.36",
    "Mozilla/5.0 (Linux; Android 7.0; Moto G (5) Build/NPPS25.137-93-8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.137 Mobile Safari/537.36",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_4 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11B554a Safari/9537.53",
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:59.0) Gecko/20100101 Firefox/59.0",
    "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0"
]

DEFAULT_USER_AGENT = random.choice(DEFAULT_USER_AGENTS)

BUG_REPORT = """\
Cloudflare may have changed their technique, or there may be a bug in the script.
"""

ANSWER_ACCEPT_ERROR = """\
The challenge answer was not properly accepted by Cloudflare. This can occur if \
the target website is under heavy load, or if Cloudflare is experiencing issues. You can
potentially resolve this by increasing the challenge answer delay (default: 8 seconds). \
For example: cfscrape.create_scraper(delay=15)
"""

class CloudflareScraper(Session):
    def __init__(self, *args, **kwargs):
        self.delay = kwargs.pop("delay", 8)
        super(CloudflareScraper, self).__init__(*args, **kwargs)

        if "requests" in self.headers["User-Agent"]:
            # Set a random User-Agent if no custom User-Agent has been set
            self.headers["User-Agent"] = DEFAULT_USER_AGENT

    def is_cloudflare_challenge(self, resp):
        return (
            resp.status_code == 503
            and resp.headers.get("Server", "").startswith("cloudflare")
            and b"jschl_vc" in resp.content
            and b"jschl_answer" in resp.content
        )

    def request(self, method, url, *args, **kwargs):
        resp = super(CloudflareScraper, self).request(method, url, *args, **kwargs)

        # Check if Cloudflare anti-bot is on
        if self.is_cloudflare_challenge(resp):
            resp = self.solve_cf_challenge(resp, **kwargs)

        return resp

    def solve_cf_challenge(self, resp, **original_kwargs):
        sleep(self.delay)  # Cloudflare requires a delay before solving the challenge

        body = resp.text
        
        
        rq = re.search('<div style="display:none;visibility:hidden;" id="(.*?)">(.*?)<\/div>', body,re.MULTILINE | re.DOTALL)
        
        body = re.sub(
            r'function\(p\){var p = eval\(eval\(atob\(".*?"\)\+\(undefined\+""\)\[1\]\+\(true\+""\)\[0\]\+\(\+\(\+!\+\[\]\+\[\+!\+\[\]\]\+\(!!\[\]\+\[\]\)\[!\+\[\]\+!\+\[\]\+!\+\[\]\]\+\[!\+\[\]\+!\+\[\]\]\+\[\+\[\]\]\)\+\[\]\)\[\+!\+\[\]\]\+\(false\+\[0\]\+String\)\[20\]\+\(true\+""\)\[3\]\+\(true\+""\)\[0\]\+"Element"\+\(\+\[\]\+Boolean\)\[10\]\+\(NaN\+\[Infinity\]\)\[10\]\+"Id\("\+\(\+\(20\)\)\["to"\+String\["name"\]\]\(21\)\+"\)."\+atob\(".*?"\)\)\); return \+\(p\)}\(\);',
            "{};".format(rq.group(2)),
            body
        )
        

        parsed_url = urlparse(resp.url)
        domain = parsed_url.netloc
        
        submit_url = "%s://%s/cdn-cgi/l/chk_jschl" % (parsed_url.scheme, domain)

        cloudflare_kwargs = deepcopy(original_kwargs)
        params = cloudflare_kwargs.setdefault("params", {})
        headers = cloudflare_kwargs.setdefault("headers", {})
        headers["Referer"] = resp.url

        try:
            params["jschl_vc"] = re.search(r'name="jschl_vc" value="(\w+)"', body).group(1)
            params["pass"] = re.search(r'name="pass" value="(.+?)"', body).group(1)
            params["s"] = re.search(r'name="s"\svalue="(?P<s_value>[^"]+)', body).group('s_value')

        except Exception as e:
            # Something is wrong with the page.
            # This may indicate Cloudflare has changed their anti-bot
            # technique. If you see this and are running the latest version,
            # please open a GitHub issue so I can update the code accordingly.
            raise ValueError("Unable to parse Cloudflare anti-bots page: %s %s" % (e, BUG_REPORT))

        # Solve the Javascript challenge
        params["jschl_answer"] = self.solve_challenge(body, domain)
        pprint(params)

        # Requests transforms any request into a GET after a redirect,
        # so the redirect has to be handled manually here to allow for
        # performing other types of requests even as the first request.
        method = resp.request.method
        cloudflare_kwargs["allow_redirects"] = False
        redirect = self.request(method, submit_url, **cloudflare_kwargs)
        pprint(redirect.content)
        #exit()

        redirect_location = urlparse(redirect.headers["Location"])
        if not redirect_location.netloc:
            redirect_url = "%s://%s%s" % (parsed_url.scheme, domain, redirect_location.path)
            return self.request(method, redirect_url, **original_kwargs)
        return self.request(method, redirect.headers["Location"], **original_kwargs)

    def solve_challenge(self, body, domain):
        try:
            js = re.search(r"setTimeout\(function\(\){\s+(var "
                        r"s,t,o,p,b,r,e,a,k,i,n,g,f.+?\r?\n[\s\S]+?a\.value =.+?)\r?\n", body).group(1)
            
        except Exception:
            raise ValueError("Unable to identify Cloudflare IUAM Javascript on website. %s" % BUG_REPORT)

        js = re.sub(r"a\.value = ((.+).toFixed\(10\))?", r"\1", js)
        js = re.sub(r"\s{3,}[a-z](?: = |\.).+", "", js).replace("t.length", str(len(domain)))

        js = js.replace('; 121', '')

        js = js.replace('function(p){return eval((true+"")[0]+"."+([]["fill"]+"")[3]+(+(101))["to"+String["name"]](21)[1]+(false+"")[1]+(true+"")[1]+Function("return escape")()(("")["italics"]())[2]+(true+[]["fill"])[10]+(undefined+"")[2]+(true+"")[3]+(+[]+Array)[10]+(true+"")[0]+"("+p+")")}', 't.charCodeAt')

    
        # Strip characters that could be used to exit the string context
        # These characters are not currently used in Cloudflare's arithmetic snippet
        js = re.sub(r"[\n\\']", "", js)
    
        if "toFixed" not in js:
            raise ValueError("Error parsing Cloudflare IUAM Javascript challenge. %s" % BUG_REPORT)

        try:
            js = "a = {}; t = \"" + domain + "\";" + js
            result = js2py.eval_js(js)
        
        except Exception:
            logging.error("Error executing Cloudflare IUAM Javascript. %s" % BUG_REPORT)
            raise

        try:
            float(result)
        except Exception:
            raise ValueError("Cloudflare IUAM challenge returned unexpected answer. %s" % BUG_REPORT)

        return result

    @classmethod
    def create_scraper(cls, sess=None, **kwargs):
        """
        Convenience function for creating a ready-to-go CloudflareScraper object.
        """
        scraper = cls(**kwargs)

        if sess:
            attrs = ["auth", "cert", "cookies", "headers", "hooks", "params", "proxies", "data"]
            for attr in attrs:
                val = getattr(sess, attr, None)
                if val:
                    setattr(scraper, attr, val)

        return scraper


    ## Functions for integrating cloudflare-scrape with other applications and scripts

    @classmethod
    def get_tokens(cls, url, user_agent=None, **kwargs):
        scraper = cls.create_scraper()
        if user_agent:
            scraper.headers["User-Agent"] = user_agent

        try:
            resp = scraper.get(url, **kwargs)
            resp.raise_for_status()
        except Exception as e:
            logging.error("'%s' returned an error. Could not collect tokens." % url)
            raise

        domain = urlparse(resp.url).netloc
        cookie_domain = None

        for d in scraper.cookies.list_domains():
            if d.startswith(".") and d in ("." + domain):
                cookie_domain = d
                break
        else:
            raise ValueError("Unable to find Cloudflare cookies. Does the site actually have Cloudflare IUAM (\"I'm Under Attack Mode\") enabled?")

        return ({
                    "__cfduid": scraper.cookies.get("__cfduid", "", domain=cookie_domain),
                    "cf_clearance": scraper.cookies.get("cf_clearance", "", domain=cookie_domain)
                },
                scraper.headers["User-Agent"]
               )

    @classmethod
    def get_cookie_string(cls, url, user_agent=None, **kwargs):
        """
        Convenience function for building a Cookie HTTP header value.
        """
        tokens, user_agent = cls.get_tokens(url, user_agent=user_agent, **kwargs)
        return "; ".join("=".join(pair) for pair in tokens.items()), user_agent

create_scraper = CloudflareScraper.create_scraper
get_tokens = CloudflareScraper.get_tokens
get_cookie_string = CloudflareScraper.get_cookie_string

I dunno... it seems like it works... but something is wrong...

@VeNoMouS

VeNoMouS commented Mar 31, 2019

Ah... OK... so my code does work... for some reason it doesn't always work on the first go... CF returns a 503... but if I leave it going... it eventually gets it... weird.

Sometimes it gets it on the first go..

@VeNoMouS

@Krylanc3lo ... This is purely for research... and this is in no way ready for a proper release...

But my current development is here

@Krylanc3lo
Author

Thanks a lot @VeNoMouS, I really appreciate it. I will have a look.

@VeNoMouS

VeNoMouS commented Mar 31, 2019

@Krylanc3lo

The request is:

response = cf.get(
    'https://www.deepmovie.ch/tt1187043/',
    headers={
        'Connection': 'keep-alive',  # note: was listed twice; dict keys must be unique
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'en-US,en;q=0.9',
        'DNT': '1',
        'Upgrade-Insecure-Requests': '1'
    },
    timeout=30,
    verify=False
)

@Krylanc3lo
Author

Thank you.

Where should I put the headers?

@VeNoMouS

VeNoMouS commented Apr 1, 2019

OK... I'm happy with the following code... it does not need the added headers, etc. Just call it as you always have.

import logging
import random
import re

from copy import deepcopy
from time import sleep
from collections import OrderedDict

from lib import js2py
from requests.sessions import Session

try:
    from urlparse import urlparse
    from urlparse import urlunparse
except ImportError:
    from urllib.parse import urlparse
    from urllib.parse import urlunparse

__version__ = "1.9.5"

# Originally written by https://github.com/Anorov/cloudflare-scrape
# Rewritten by VeNoMouS - <venom@gen-x.co.nz> for https://github.com/VeNoMouS/Sick-Beard - 24/3/2018 NZDT

DEFAULT_USER_AGENTS = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/65.0.3325.181 Chrome/65.0.3325.181 Safari/537.36",
    "Mozilla/5.0 (Linux; Android 7.0; Moto G (5) Build/NPPS25.137-93-8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.137 Mobile Safari/537.36",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_4 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11B554a Safari/9537.53",
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:59.0) Gecko/20100101 Firefox/59.0",
    "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0",
]

DEFAULT_USER_AGENT = random.choice(DEFAULT_USER_AGENTS)

BUG_REPORT = """\
Cloudflare may have changed their technique, or there may be a bug in the script.
"""

ANSWER_ACCEPT_ERROR = """\
The challenge answer was not properly accepted by Cloudflare. This can occur if \
the target website is under heavy load, or if Cloudflare is experiencing issues. You can
potentially resolve this by increasing the challenge answer delay (default: 8 seconds). \
For example: cfscrape.create_scraper(delay=15)
"""

class CloudflareScraper(Session):
    def __init__(self, *args, **kwargs):
        self.delay = kwargs.pop("delay", 8)
        super(CloudflareScraper, self).__init__(*args, **kwargs)

        if "requests" in self.headers["User-Agent"]:
            # Set a random User-Agent if no custom User-Agent has been set
            self.headers["User-Agent"] = DEFAULT_USER_AGENT

    def is_cloudflare_challenge(self, resp):
        return (
            resp.status_code == 503
            and resp.headers.get("Server", "").startswith("cloudflare")
            and b"jschl_vc" in resp.text
            and b"jschl_answer" in resp.text
        )

    def request(self, method, url, *args, **kwargs):
        self.headers['Accept-Encoding'] = 'gzip, deflate'
        self.headers['Accept-Language'] = 'en-US,en;q=0.9'
        self.headers['DNT'] = '1'
        
        resp = super(CloudflareScraper, self).request(method, url, *args, **kwargs)
        
        # Check if Cloudflare anti-bot is on
        if self.is_cloudflare_challenge(resp):
            resp = self.solve_cf_challenge(resp, **kwargs)

        return resp

    def solve_cf_challenge(self, resp, **original_kwargs):
        body = resp.text

        self.delay = float(re.search(r"submit\(\);\r?\n\s*},\s*([0-9]+)", body).group(1)) / float(1000)
        sleep(self.delay)  # Cloudflare requires a delay before solving the challenge

        parsed_url = urlparse(resp.url)
        domain = parsed_url.netloc
        submit_url = "{}://{}/cdn-cgi/l/chk_jschl".format(parsed_url.scheme, domain)

        cloudflare_kwargs = deepcopy(original_kwargs)
        headers = cloudflare_kwargs.setdefault('headers', {'Referer': resp.url})

        try:
            params = cloudflare_kwargs.setdefault(
                "params", OrderedDict(
                    [
                        ('s', re.search(r'name="s"\svalue="(?P<s_value>[^"]+)', body).group('s_value')),
                        ('jschl_vc', re.search(r'name="jschl_vc" value="(\w+)"', body).group(1)),
                        ('pass', re.search(r'name="pass" value="(.+?)"', body).group(1)),
                    ]
                )
            )

        except Exception as e:
            # Something is wrong with the page.
            # This may indicate Cloudflare has changed their anti-bot
            # technique. If you see this and are running the latest version,
            # please open a GitHub issue so I can update the code accordingly.
            raise ValueError("Unable to parse Cloudflare anti-bots page: %s %s" % (e, BUG_REPORT))

        # Solve the Javascript challenge
        params["jschl_answer"] = self.solve_challenge(body, domain)

        # Requests transforms any request into a GET after a redirect,
        # so the redirect has to be handled manually here to allow for
        # performing other types of requests even as the first request.
        method = resp.request.method
        
        cloudflare_kwargs["allow_redirects"] = False
        
        redirect = self.request(method, submit_url, **cloudflare_kwargs)
        redirect_location = urlparse(redirect.headers["Location"])

        if not redirect_location.netloc:
            redirect_url = urlunparse(
                (
                    parsed_url.scheme,
                    domain,
                    redirect_location.path,
                    redirect_location.params,
                    redirect_location.query,
                    redirect_location.fragment
                )
            )
            return self.request(method, redirect_url, **original_kwargs)
        
        return self.request(method, redirect.headers["Location"], **original_kwargs)

    def solve_challenge(self, body, domain):
        try:
            body = re.sub(
                r'function\(p\){var p = eval\(eval\(atob\(".*?"\)\+\(undefined\+""\)\[1\]\+\(true\+""\)\[0\]\+\(\+\(\+!'
                r'\+\[\]\+\[\+!\+\[\]\]\+\(!!\[\]\+\[\]\)\[!\+\[\]\+!\+\[\]\+!\+\[\]\]\+\[!\+\[\]\+!\+\[\]\]\+\[\+\[\]\]'
                r'\)\+\[\]\)\[\+!\+\[\]\]\+\(false\+\[0\]\+String\)\[20\]\+\(true\+""\)\[3\]\+\(true\+""\)\[0\]\+"Element"'
                r'\+\(\+\[\]\+Boolean\)\[10\]\+\(NaN\+\[Infinity\]\)\[10\]\+"Id\("\+\(\+\(20\)\)\["to"\+String\["name"\]\]'
                r'\(21\)\+"\)."\+atob\(".*?"\)\)\); return \+\(p\)}\(\);',
                '{};'.format(
                    re.search('<div style="display:none;visibility:hidden;" id="(.*?)">(.*?)<\/div>',
                    body,
                    re.MULTILINE | re.DOTALL).group(2)
                ),
                body
            )
            js = re.search(r"setTimeout\(function\(\){\s+(var "
                        r"s,t,o,p,b,r,e,a,k,i,n,g,f.+?\r?\n[\s\S]+?a\.value =.+?)\r?\n", body).group(1)
        except Exception:
            raise ValueError("Unable to identify Cloudflare IUAM Javascript on website. %s" % BUG_REPORT)

        js = re.sub(r"a\.value = ((.+).toFixed\(10\))?", r"\1", js)
        js = re.sub(r"\s{3,}[a-z](?: = |\.).+", "", js).replace("t.length", str(len(domain)))

        js = js.replace('; 121', '')

        js = js.replace(
            'function(p){return eval((true+"")[0]+"."+([]["fill"]+"")[3]+(+(101))["to"+String["name"]](21)[1]+(false+"")'
            '[1]+(true+"")[1]+Function("return escape")()(("")["italics"]())[2]+(true+[]["fill"])[10]+(undefined+"")[2]+'
            '(true+"")[3]+(+[]+Array)[10]+(true+"")[0]+"("+p+")")}',
            't.charCodeAt'
        )

        # Strip characters that could be used to exit the string context
        # These characters are not currently used in Cloudflare's arithmetic snippet
        js = re.sub(r"[\n\\']", "", js)

        if "toFixed" not in js:
            raise ValueError("Error parsing Cloudflare IUAM Javascript challenge. %s" % BUG_REPORT)

        try:
            js = 'a = {{}}; t = "{}";{}'.format(domain, js)
            result = js2py.eval_js(js)
        except Exception:
            logging.error("Error executing Cloudflare IUAM Javascript. %s" % BUG_REPORT)
            raise

        try:
            float(result)
        except Exception:
            raise ValueError("Cloudflare IUAM challenge returned unexpected answer. %s" % BUG_REPORT)

        return result

    @classmethod
    def create_scraper(cls, sess=None, **kwargs):
        """
        Convenience function for creating a ready-to-go CloudflareScraper object.
        """
        scraper = cls(**kwargs)

        if sess:
            attrs = ["auth", "cert", "cookies", "headers", "hooks", "params", "proxies", "data"]
            for attr in attrs:
                val = getattr(sess, attr, None)
                if val:
                    setattr(scraper, attr, val)

        return scraper


    ## Functions for integrating cloudflare-scrape with other applications and scripts

    @classmethod
    def get_tokens(cls, url, user_agent=None, **kwargs):
        scraper = cls.create_scraper()
        if user_agent:
            scraper.headers["User-Agent"] = user_agent

        try:
            resp = scraper.get(url, **kwargs)
            resp.raise_for_status()
        except Exception as e:
            logging.error("'%s' returned an error. Could not collect tokens." % url)
            raise

        domain = urlparse(resp.url).netloc
        cookie_domain = None

        for d in scraper.cookies.list_domains():
            if d.startswith(".") and d in ("." + domain):
                cookie_domain = d
                break
        else:
            raise ValueError("Unable to find Cloudflare cookies. Does the site actually have Cloudflare IUAM (\"I'm Under Attack Mode\") enabled?")

        return (
            {
                "__cfduid": scraper.cookies.get("__cfduid", "", domain=cookie_domain),
                "cf_clearance": scraper.cookies.get("cf_clearance", "", domain=cookie_domain)
            },
            scraper.headers["User-Agent"]
        )

    @classmethod
    def get_cookie_string(cls, url, user_agent=None, **kwargs):
        """
        Convenience function for building a Cookie HTTP header value.
        """
        tokens, user_agent = cls.get_tokens(url, user_agent=user_agent, **kwargs)
        return "; ".join("=".join(pair) for pair in tokens.items()), user_agent

create_scraper = CloudflareScraper.create_scraper
get_tokens = CloudflareScraper.get_tokens
get_cookie_string = CloudflareScraper.get_cookie_string

@Krylanc3lo
Author

Thank you. I am testing right now :)

@Krylanc3lo
Author

Krylanc3lo commented Apr 1, 2019

I got this error message (I am using python 3.6):

krylancelo:~/bin/dl_scripts$ python3 test.py
Traceback (most recent call last):
  File "test.py", line 5, in <module>
    print(scraper.get("https://www.xxxx.to/lecture-en-ligne/xxxx/271/").content)
  File "/home/maxx/.local/lib/python3.6/site-packages/requests/sessions.py", line 546, in get
    return self.request('GET', url, **kwargs)
  File "/home/maxx/.local/lib/python3.6/site-packages/cfscrape/__init__.py", line 73, in request
    if self.is_cloudflare_challenge(resp):
  File "/home/maxx/.local/lib/python3.6/site-packages/cfscrape/__init__.py", line 61, in is_cloudflare_challenge
    and b"jschl_vc" in resp.text
TypeError: 'in <string>' requires string as left operand, not bytes
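The traceback comes down to a bytes-vs-str mix: in Python 3, resp.text is a str, so checking a bytes literal against it raises that TypeError. A minimal illustration (the body string here is made up):

```python
body = 'name="jschl_vc" value="abc"'  # resp.text is a str under Python 3

# b"jschl_vc" in body  ->  TypeError: 'in <string>' requires string as left operand
print("jschl_vc" in body)            # compare a str literal against resp.text
print(b"jschl_vc" in body.encode())  # or compare bytes against resp.content
```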

@VeNoMouS

VeNoMouS commented Apr 1, 2019

@Krylanc3lo let me know how you go :)

@Krylanc3lo
Author

It is working well with most sites, except the one I tested in the above sample. Very weird.

But at least it is working for everything else.

@VeNoMouS

VeNoMouS commented Apr 1, 2019

@Krylanc3lo can you show me test.py?

@Krylanc3lo
Author

Krylanc3lo commented Apr 1, 2019

Sure; for the moment it is very simple, as I am just trying to test the new code:

import cfscrape

scraper = cfscrape.create_scraper()  # returns a CloudflareScraper instance

It works with gktorrent, but not with this site.

@VeNoMouS

VeNoMouS commented Apr 1, 2019

@Krylanc3lo worked fine for me on that website...

_cloudFlare() requested URL - https://www.japscan.to/lecture-en-ligne/giant-killing/271/, encountered CloudFlare DDOS Protection.. Bypassing.
test() CloudFlare DDOS Protection.. Bypassed successfully.
<!DOCTYPE html>
<html lang="fr">
<head>
        <title>Giant Killing 271 VF - Lecture en ligne | JapScan</title>
        <meta charset="utf-8" />
        <meta name="viewport" content="width=device-width, initial-scale=1" />

        <script src="https://code.jquery.com/jquery-3.2.1.slim.min.js" integrity="sha384-KJ3o2DKtIkvYIK3UENzmM7KCkRr/rE9/Qpg6aAZGJwFDMVNA/GpGFF93hXpG5KkN" crossorigin="anonymous"></script>
        <script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.12.9/umd/popper.min.js" integrity="sha384-ApNbgh9B+Y1QKtv3Rn7W3mgPxhU9K/ScQsAP7hUibX39j7fakFPskvXusvfa0b4Q" crossorigin="anonymous"></script>
                <script src="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/4.0.0/js/bootstrap.min.js"></script>
        <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css" integrity="sha384-Gn5384xqQ1aoWXA+058RXPxPg6fy4IWvTNh0E263XmFcJlSAwiGgFAW/dAiS6JXm" crossorigin="anonymous">
#!/usr/bin/python

from lib import cfscrape
from lib import requests
from pprint import pprint
import os
import sys
import re
from base64 import b64decode


class Test():
    def __init__(self):
        self.session = requests.session()
        self.funcName = lambda n=0: sys._getframe(n + 1).f_code.co_name + "()"

        # Load our cached session data to bypass CF delay on every request
        if os.path.exists('session.data'):
            self.session = pickle.load(open('session.data', 'rb'))

    def _cloudFlare(self, response):
        cf = cfscrape.create_scraper(sess=self.session)

        if cf.is_cloudflare_challenge(response):
            print("{} requested URL - {}, encountered CloudFlare DDOS Protection.. Bypassing.".format(self.funcName(), response.url))

            response = cf.get('https://www.japscan.to/lecture-en-ligne/giant-killing/271/', timeout=30)

            if not cf.is_cloudflare_challenge(response):
                return (True, True)

            return (True, False)

        return (False, True)

    def test(self):
        ret = self.session.get('https://www.japscan.to/lecture-en-ligne/giant-killing/271/', timeout=30)
        if (True, True) == self._cloudFlare(ret):
            print("{} CloudFlare DDOS Protection.. Bypassed successfully.".format(self.funcName()))
            ret = self.session.get('https://www.japscan.to/lecture-en-ligne/giant-killing/271/', timeout=30)
            print(ret.content)


Test().test()

I set up the session normally and continue using the same session and cookies for all the requests.
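As a concrete sketch of that session-caching idea, here is a minimal, stdlib-only version (the file name session.data comes from the snippet above; the helper names and the factory argument are my own illustrative choices, not part of cfscrape):

```python
import os
import pickle

SESSION_FILE = 'session.data'  # same cache file name as the snippet above

def load_session(make_session):
    # Reuse the cached session if one exists; otherwise build a fresh one
    # via the supplied factory (in practice: requests.session).
    if os.path.exists(SESSION_FILE):
        with open(SESSION_FILE, 'rb') as f:
            return pickle.load(f)
    return make_session()

def save_session(session):
    # Call this once the Cloudflare challenge has been solved, so later
    # runs reuse the stored cookies instead of re-solving the challenge.
    with open(SESSION_FILE, 'wb') as f:
        pickle.dump(session, f)
```

With requests, a Session object (cookies included) pickles cleanly, which is what makes this caching trick work.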

Krylanc3lo (Author)

I will use your script and test the results. I will share them with you, of course :)

Krylanc3lo (Author)

Is it possible that it is due to Python 3?

For example, I got this on compilation:
"Undefined variable 'pickle'"

VeNoMouS commented Apr 1, 2019

Sorry, just delete:

# Load our cached session data to bypass CF delay on every request
            if os.path.exists('session.data'):
                self.session = pickle.load(open('session.data', 'rb'))

I was using it for something else... it's not needed. It's from one of my Kodi addons ;P so it doesn't hit CF on every detection.

Krylanc3lo (Author)

Thanks, here is the result:

krylancelo:~/bin/dl_scripts$ python3 test.py
Traceback (most recent call last):
  File "test.py", line 40, in <module>
    Test().test()
  File "test.py", line 34, in test
    if (True, True) == self._cloudFlare(ret):
  File "test.py", line 19, in _cloudFlare
    if cf.is_cloudflare_challenge(response):
  File "/home/maxx/.local/lib/python3.6/site-packages/cfscrape/__init__.py", line 61, in is_cloudflare_challenge
    and b"jschl_vc" in resp.text
TypeError: 'in <string>' requires string as left operand, not bytes

So I copied your first script into /home/maxx/.local/lib/python3.6/site-packages/cfscrape/__init__.py
and used the second one in test.py.

I must have done something incorrectly

VeNoMouS commented Apr 1, 2019

Can you add a print(resp.content) before the return in is_cloudflare_challenge, so we can see what the content is, please?
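For anyone following along, that temporary edit would look something like this: a sketch with the method body as it stood at this point (still checking resp.text), plus a fabricated FakeResp stand-in rather than a real requests.Response:

```python
def is_cloudflare_challenge(self, resp):
    print(resp.content)  # temporary debug print, remove once diagnosed
    return (
        resp.status_code == 503
        and resp.headers.get("Server", "").startswith("cloudflare")
        and b"jschl_vc" in resp.text      # bytes checked against str
        and b"jschl_answer" in resp.text
    )

# Fabricated stand-in for a requests.Response, purely for illustration
class FakeResp:
    status_code = 503
    headers = {"Server": "cloudflare"}
    content = b"jschl_vc jschl_answer"
    text = "jschl_vc jschl_answer"

# The debug print fires first, then Python 3 rejects `bytes in str`
# with a TypeError -- exactly the traceback that follows in this thread.
try:
    is_cloudflare_challenge(None, FakeResp())
    raised_type_error = False
except TypeError:
    raised_type_error = True
```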

Krylanc3lo (Author) commented Apr 1, 2019

Of course, here is the result (thanks again for all your help). Maybe my IP is blocked on their side?

b'\n\n\n \n \n \n \n \n <title>Just a moment...</title>\n <style type="text/css">\n html, body {width: 100%; height: 100%; margin: 0; padding: 0;}\n body {background-color: #ffffff; font-family: Helvetica, Arial, sans-serif; font-size: 100%;}\n h1 {font-size: 1.5em; color: #404040; text-align: center;}\n p {font-size: 1em; color: #404040; text-align: center; margin: 10px 0 0 0;}\n #spinner {margin: 0 auto 30px auto; display: block;}\n .attribution {margin-top: 20px;}\n @-webkit-keyframes bubbles { 33%: { -webkit-transform: translateY(10px); transform: translateY(10px); } 66% { -webkit-transform: translateY(-10px); transform: translateY(-10px); } 100% { -webkit-transform: translateY(0); transform: translateY(0); } }\n @Keyframes bubbles { 33%: { -webkit-transform: translateY(10px); transform: translateY(10px); } 66% { -webkit-transform: translateY(-10px); transform: translateY(-10px); } 100% { -webkit-transform: translateY(0); transform: translateY(0); } }\n .bubbles { background-color: #404040; width:15px; height: 15px; margin:2px; border-radius:100%; -webkit-animation:bubbles 0.6s 0.07s infinite ease-in-out; animation:bubbles 0.6s 0.07s infinite ease-in-out; -webkit-animation-fill-mode:both; animation-fill-mode:both; display:inline-block; }\n </style>\n\n <script type="text/javascript">\n //x";\n t = t.firstChild.href;r = t.match(/https?:\\/\\//)[0];\n t = t.substr(r.length); t = t.substr(0,t.length-1); k = \'cf-dn-eqWSy\';\n a = document.getElementById(\'jschl-answer\');\n f = document.getElementById(\'challenge-form\');\n 
;tIOFsse.zUSDWywR-=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]))/+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(+!![])+(!+[]+!![])+(!+[]+!![])+(!+[]+!![])+(+[])+(!+[]+!![]+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]));tIOFsse.zUSDWywR+=function(p){var p = eval(eval(atob("ZG9jdW1l")+(undefined+"")[1]+(true+"")[0]+(+(+!+[]+[+!+[]]+(!![]+[])[!+[]+!+[]+!+[]]+[!+[]+!+[]]+[+[]])+[])[+!+[]]+(false+[0]+String)[20]+(true+"")[3]+(true+"")[0]+"Element"+(+[]+Boolean)[10]+(NaN+[Infinity])[10]+"Id("+(+(20))["to"+String["name"]](21)+")."+atob("aW5uZXJIVE1M"))); return +(p)}();tIOFsse.zUSDWywR-=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]))/+((!+[]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]));tIOFsse.zUSDWywR+=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(+!![]))/+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![]));tIOFsse.zUSDWywR*=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]))/+((!+[]+!![]+!![]+!![]+[
])+(!+[]+!![]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(+[])+(!+[]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]));tIOFsse.zUSDWywR*=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]))/+((!+[]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(+[])+(+[])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(+!![]));tIOFsse.zUSDWywR+=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]))/+((!+[]+!![]+!![]+[])+(!+[]+!![]+!![]+!![])+(+[])+(+[])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(+!![]));tIOFsse.zUSDWywR*=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![])+(+!![])+(+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(+!![]))/+((!+[]+!![]+!![]+!![]+!![]+[])+(+[])+(!+[]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]));tIOFsse.zUSDWywR+=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]))/(+(+((!+[]+!![]+!![]+[])+(!+[]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![])+(!
+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(+[])))+(function(p){return eval((true+"")[0]+"."+([]["fill"]+"")[3]+(+(101))["to"+String["name"]](21)[1]+(false+"")[1]+(true+"")[1]+Function("return escape")()(("")["italics"]())[2]+(true+[]["fill"])[10]+(undefined+"")[2]+(true+"")[3]+(+[]+Array)[10]+(true+"")[0]+"("+p+")")}(+((+!![]+[])+(+!![])))));tIOFsse.zUSDWywR-=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]))/+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![])+(+[])+(+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]));tIOFsse.zUSDWywR+=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]))/+((+!![]+[])+(+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![])+(+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]));tIOFsse.zUSDWywR*=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]))/+((!+[]+!![]+[])+(!+[]+!![]+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![])+(+[])+(!+[]+!![]+!![]));tIOFsse.zUSDWywR*=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]))/+((!+[]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![]+!![]+!
![]+!![]+!![])+(!+[]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]));tIOFsse.zUSDWywR+=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![])+(+!![])+(+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(+!![]))/+((!+[]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]));a.value = (+tIOFsse.zUSDWywR).toFixed(10); \'; 121\'\n f.action += location.hash;\n f.submit();\n }, 4000);\n }, false);\n })();\n //]]>\n</script>\n\n\n\n\n

Please turn JavaScript on and reload the page.

Checking your browser before accessing xxxx.to.

This process is automatic. Your browser will redirect to your requested content shortly.

Please allow up to 5 seconds…
+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]))/+((!+[]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![])+(+[])+(!+[]+!![]+!![]+!![]+!![]))
DDoS protection by Cloudflare
Ray ID: 4c06a97d2cffccc4'
Traceback (most recent call last):
  File "test.py", line 40, in <module>
    Test().test()
  File "test.py", line 34, in test
    if (True, True) == self._cloudFlare(ret):
  File "test.py", line 19, in _cloudFlare
    if cf.is_cloudflare_challenge(response):
  File "/home/maxx/.local/lib/python3.6/site-packages/cfscrape/__init__.py", line 63, in is_cloudflare_challenge
    and b"jschl_vc" in resp.text
TypeError: 'in <string>' requires string as left operand, not bytes

VeNoMouS commented Apr 1, 2019

Sorry, I was playing with keep-alives ages ago and changed the content to text. Try changing is_cloudflare_challenge to the following:

    def is_cloudflare_challenge(self, resp):
        return (
            resp.status_code == 503
            and resp.headers.get("Server", "").startswith("cloudflare")
            and b"jschl_vc" in resp.content
            and b"jschl_answer" in resp.content
        )
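As a quick sanity check, the fixed method can be exercised standalone against a fabricated 503 response (FakeResp and the free function are illustrative; the real method lives on the scraper class and takes a requests.Response):

```python
def is_cloudflare_challenge(resp):
    # Same logic as the fix above, minus `self`, so it runs standalone;
    # resp.content is bytes, so both `in` checks compare bytes with bytes.
    return (
        resp.status_code == 503
        and resp.headers.get("Server", "").startswith("cloudflare")
        and b"jschl_vc" in resp.content
        and b"jschl_answer" in resp.content
    )

# Fabricated stand-in for a requests.Response (illustrative only)
class FakeResp:
    status_code = 503
    headers = {"Server": "cloudflare-nginx"}
    content = b"... jschl_vc ... jschl_answer ..."

assert is_cloudflare_challenge(FakeResp())
```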

Krylanc3lo (Author)

Working like a charm! Thanks a lot @VeNoMouS, it is amazing :)

I will test it in my script and confirm everything is working well.

VeNoMouS commented Apr 1, 2019

Ok, those changes are as follows... this is the latest "production" code from my rewrite:

import logging
import random
import re

from copy import deepcopy
from time import sleep
from collections import OrderedDict

import js2py
from requests.sessions import Session

try:
    from urlparse import urlparse
    from urlparse import urlunparse
except ImportError:
    from urllib.parse import urlparse
    from urllib.parse import urlunparse

__version__ = "1.9.5"

# Originally written by https://github.com/Anorov/cloudflare-scrape
# Rewritten by VeNoMouS - <venom@gen-x.co.nz> for https://github.com/VeNoMouS/Sick-Beard - 24/3/2018 NZDT

DEFAULT_USER_AGENTS = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/65.0.3325.181 Chrome/65.0.3325.181 Safari/537.36",
    "Mozilla/5.0 (Linux; Android 7.0; Moto G (5) Build/NPPS25.137-93-8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.137 Mobile Safari/537.36",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_4 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11B554a Safari/9537.53",
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:59.0) Gecko/20100101 Firefox/59.0",
    "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0",
]

DEFAULT_USER_AGENT = random.choice(DEFAULT_USER_AGENTS)

BUG_REPORT = """\
Cloudflare may have changed their technique, or there may be a bug in the script.
"""

ANSWER_ACCEPT_ERROR = """\
The challenge answer was not properly accepted by Cloudflare. This can occur if \
the target website is under heavy load, or if Cloudflare is experiencing issues. You can
potentially resolve this by increasing the challenge answer delay (default: 8 seconds). \
For example: cfscrape.create_scraper(delay=15)
"""

class CloudflareScraper(Session):
    def __init__(self, *args, **kwargs):
        self.delay = kwargs.pop("delay", 8)
        super(CloudflareScraper, self).__init__(*args, **kwargs)

        if "requests" in self.headers["User-Agent"]:
            # Set a random User-Agent if no custom User-Agent has been set
            self.headers["User-Agent"] = DEFAULT_USER_AGENT

    def is_cloudflare_challenge(self, resp):
        return (
            resp.status_code == 503
            and resp.headers.get("Server", "").startswith("cloudflare")
            and b"jschl_vc" in resp.content
            and b"jschl_answer" in resp.content
        )

    def request(self, method, url, *args, **kwargs):
        self.headers['Accept-Encoding'] = 'gzip, deflate'
        self.headers['Accept-Language'] = 'en-US,en;q=0.9'
        self.headers['DNT'] = '1'
        
        resp = super(CloudflareScraper, self).request(method, url, *args, **kwargs)
        
        # Check if Cloudflare anti-bot is on
        if self.is_cloudflare_challenge(resp):
            resp = self.solve_cf_challenge(resp, **kwargs)

        return resp

    def solve_cf_challenge(self, resp, **original_kwargs):
        body = resp.text

        self.delay = float(re.search(r"submit\(\);\r?\n\s*},\s*([0-9]+)", body).group(1)) / float(1000)
        sleep(self.delay)  # Cloudflare requires a delay before solving the challenge

        parsed_url = urlparse(resp.url)
        domain = parsed_url.netloc
        submit_url = "{}://{}/cdn-cgi/l/chk_jschl".format(parsed_url.scheme, domain)

        cloudflare_kwargs = deepcopy(original_kwargs)
        headers = cloudflare_kwargs.setdefault('headers', {'Referer': resp.url})
        
        try:
            params = cloudflare_kwargs.setdefault(
                "params", OrderedDict(
                    [
                        ('s', re.search(r'name="s"\svalue="(?P<s_value>[^"]+)', body).group('s_value')),
                        ('jschl_vc', re.search(r'name="jschl_vc" value="(\w+)"', body).group(1)),
                        ('pass', re.search(r'name="pass" value="(.+?)"', body).group(1)),
                    ]
                )
            )

        except Exception as e:
            # Something is wrong with the page.
            # This may indicate Cloudflare has changed their anti-bot
            # technique. If you see this and are running the latest version,
            # please open a GitHub issue so I can update the code accordingly.
            # str(e) works on both Python 2 and 3; e.message is Python 2 only
            raise ValueError("Unable to parse Cloudflare anti-bots page: %s %s" % (e, BUG_REPORT))

        # Solve the Javascript challenge
        params["jschl_answer"] = self.solve_challenge(body, domain)

        # Requests transforms any request into a GET after a redirect,
        # so the redirect has to be handled manually here to allow for
        # performing other types of requests even as the first request.
        method = resp.request.method
        
        cloudflare_kwargs["allow_redirects"] = False
        
        redirect = self.request(method, submit_url, **cloudflare_kwargs)
        redirect_location = urlparse(redirect.headers["Location"])

        if not redirect_location.netloc:
            redirect_url = urlunparse(
                (
                    parsed_url.scheme,
                    domain,
                    redirect_location.path,
                    redirect_location.params,
                    redirect_location.query,
                    redirect_location.fragment
                )
            )
            return self.request(method, redirect_url, **original_kwargs)
        
        return self.request(method, redirect.headers["Location"], **original_kwargs)

    def solve_challenge(self, body, domain):
        try:
            # All pattern fragments are raw strings so the regex escapes
            # (\+, \[, \s, \r?\n, ...) reach the regex engine untouched
            body = re.sub(
                r'function\(p\){var p = eval\(eval\(atob\(".*?"\)\+\(undefined\+""\)\[1\]\+\(true\+""\)\[0\]\+\(\+\(\+!'
                r'\+\[\]\+\[\+!\+\[\]\]\+\(!!\[\]\+\[\]\)\[!\+\[\]\+!\+\[\]\+!\+\[\]\]\+\[!\+\[\]\+!\+\[\]\]\+\[\+\[\]\]'
                r'\)\+\[\]\)\[\+!\+\[\]\]\+\(false\+\[0\]\+String\)\[20\]\+\(true\+""\)\[3\]\+\(true\+""\)\[0\]\+"Element"'
                r'\+\(\+\[\]\+Boolean\)\[10\]\+\(NaN\+\[Infinity\]\)\[10\]\+"Id\("\+\(\+\(20\)\)\["to"\+String\["name"\]\]'
                r'\(21\)\+"\)."\+atob\(".*?"\)\)\); return \+\(p\)}\(\);',
                '{};'.format(
                    re.search(r'<div style="display:none;visibility:hidden;" id="(.*?)">(.*?)<\/div>',
                    body,
                    re.MULTILINE | re.DOTALL).group(2)
                ),
                body
            )
            js = re.search(r"setTimeout\(function\(\){\s+(var "
                        r"s,t,o,p,b,r,e,a,k,i,n,g,f.+?\r?\n[\s\S]+?a\.value =.+?)\r?\n", body).group(1)
        except Exception:
            raise ValueError("Unable to identify Cloudflare IUAM Javascript on website. %s" % BUG_REPORT)

        js = re.sub(r"a\.value = ((.+).toFixed\(10\))?", r"\1", js)
        js = re.sub(r"\s{3,}[a-z](?: = |\.).+", "", js).replace("t.length", str(len(domain)))

        js = js.replace('; 121', '')

        js = js.replace(
            'function(p){return eval((true+"")[0]+"."+([]["fill"]+"")[3]+(+(101))["to"+String["name"]](21)[1]+(false+"")'
            '[1]+(true+"")[1]+Function("return escape")()(("")["italics"]())[2]+(true+[]["fill"])[10]+(undefined+"")[2]+'
            '(true+"")[3]+(+[]+Array)[10]+(true+"")[0]+"("+p+")")}',
            't.charCodeAt'
        )

        # Strip characters that could be used to exit the string context
        # These characters are not currently used in Cloudflare's arithmetic snippet
        js = re.sub(r"[\n\\']", "", js)

        if "toFixed" not in js:
            raise ValueError("Error parsing Cloudflare IUAM Javascript challenge. %s" % BUG_REPORT)

        try:
            js = 'a = {{}}; t = "{}";{}'.format(domain, js)
            result = js2py.eval_js(js)
        except Exception:
            logging.error("Error executing Cloudflare IUAM Javascript. %s" % BUG_REPORT)
            raise

        try:
            float(result)
        except Exception:
            raise ValueError("Cloudflare IUAM challenge returned unexpected answer. %s" % BUG_REPORT)

        return result

    @classmethod
    def create_scraper(cls, sess=None, **kwargs):
        """
        Convenience function for creating a ready-to-go CloudflareScraper object.
        """
        scraper = cls(**kwargs)

        if sess:
            attrs = ["auth", "cert", "cookies", "headers", "hooks", "params", "proxies", "data"]
            for attr in attrs:
                val = getattr(sess, attr, None)
                if val:
                    setattr(scraper, attr, val)

        return scraper


    ## Functions for integrating cloudflare-scrape with other applications and scripts

    @classmethod
    def get_tokens(cls, url, user_agent=None, **kwargs):
        scraper = cls.create_scraper()
        if user_agent:
            scraper.headers["User-Agent"] = user_agent

        try:
            resp = scraper.get(url, **kwargs)
            resp.raise_for_status()
        except Exception as e:
            logging.error("'%s' returned an error. Could not collect tokens." % url)
            raise

        domain = urlparse(resp.url).netloc
        cookie_domain = None

        for d in scraper.cookies.list_domains():
            if d.startswith(".") and d in ("." + domain):
                cookie_domain = d
                break
        else:
            raise ValueError("Unable to find Cloudflare cookies. Does the site actually have Cloudflare IUAM (\"I'm Under Attack Mode\") enabled?")

        return (
            {
                "__cfduid": scraper.cookies.get("__cfduid", "", domain=cookie_domain),
                "cf_clearance": scraper.cookies.get("cf_clearance", "", domain=cookie_domain)
            },
            scraper.headers["User-Agent"]
        )

    @classmethod
    def get_cookie_string(cls, url, user_agent=None, **kwargs):
        """
        Convenience function for building a Cookie HTTP header value.
        """
        tokens, user_agent = cls.get_tokens(url, user_agent=user_agent, **kwargs)
        return "; ".join("=".join(pair) for pair in tokens.items()), user_agent

create_scraper = CloudflareScraper.create_scraper
get_tokens = CloudflareScraper.get_tokens
get_cookie_string = CloudflareScraper.get_cookie_string
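To illustrate how solve_cf_challenge derives its delay, here is the same regex from the code above run against a fabricated fragment of a challenge page (the fragment is made up; the pattern and the /1000 conversion are copied verbatim from the method):

```python
import re

# Fabricated snippet shaped like the setTimeout tail of a challenge page
body = "f.submit();\r\n    }, 4000);"

# Pattern copied from solve_cf_challenge above: grab the setTimeout
# interval in milliseconds and convert it to seconds
delay = float(re.search(r"submit\(\);\r?\n\s*},\s*([0-9]+)", body).group(1)) / float(1000)
```

The scraper then sleeps for this many seconds before submitting the answer, mirroring the wait Cloudflare enforces in the browser.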

ghost commented Apr 1, 2019

@VeNoMouS This issue has a lot of comments, and what's worse is that you're not using a pastebin. I see some issues with your code and could comment on specific lines if you would fork this repo to share a link or create a gist.

You've done a great job getting this to work with js2py which makes your library a perfect solution for a lot of projects that are out there. 💯

Do you intend on sending a PR and/or maintaining a fork?

VeNoMouS commented Apr 1, 2019

I'll do a repo for cloudflare-scrape using js2py :)

VeNoMouS commented Apr 1, 2019

Ok guys... for the time being I've created this repo to split off from this thread.

Krylanc3lo (Author)

Thanks again! I am closing this thread then.

THANK YOU ALL for your help :)
