
ReferenceError: atob is not defined #215

Closed
Krylanc3lo opened this issue Mar 29, 2019 · 115 comments

Comments

@Krylanc3lo

Hello,

I have been getting the error below for the past couple of days: js2py.internals.simplex.JsException: ReferenceError: atob is not defined

File "/home/maxx/.local/lib/python3.6/site-packages/js2py/base.py", line 1074, in get
return self.prototype.get(prop, throw)
File "/home/maxx/.local/lib/python3.6/site-packages/js2py/base.py", line 1079, in get
raise MakeError('ReferenceError', '%s is not defined' % prop)
js2py.internals.simplex.JsException: ReferenceError: atob is not defined

Is anyone else experiencing the same?

Thank you!

@pawliczka

I think that it is related to #212

@pawliczka

pawliczka commented Mar 29, 2019

I think we have to add this at the beginning of the node -e command:

if (typeof atob === 'undefined') {
  global.atob = function (b64Encoded) {
    // new Buffer() is deprecated; Buffer.from() is the supported replacement
    return Buffer.from(b64Encoded, 'base64').toString('binary');
  };
}

$ node -e "global.Buffer = global.Buffer || require('buffer').Buffer; if (typeof atob === 'undefined') { global.atob = function (str) { return Buffer.from(str, 'base64').toString('binary'); }; } console.log(atob('Hello'));"

ée
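For anyone following along in Python, the browser's atob() can be approximated like this (a sketch; atob returns a "binary string", i.e. one character per decoded byte):

```python
from base64 import b64decode

def atob(b64):
    # Mirror the browser's atob(): decode base64 and expose the raw bytes as a
    # latin-1 string, matching Node's Buffer.from(s, 'base64').toString('binary').
    # Missing padding is tolerated, as browsers and Node are lenient about it.
    return b64decode(b64 + '=' * (-len(b64) % 4)).decode('latin-1')

print(atob('aW5uZXJIVE1M'))  # innerHTML
```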

@Krylanc3lo
Author

Thank you Pawel, I will try it.

You mean adding this code to the base.py script?

@pawliczka

pawliczka commented Mar 29, 2019

We have to edit this line:
js = "console.log(require('vm').runInNewContext('%s', Object.create(null), {timeout: 5000}));" % js
I will create a pull request when I get back home.

@lukastribus
Contributor

lukastribus commented Mar 29, 2019

js2py has not been used here for a long time. First of all, please update your code (and install node).

@Krylanc3lo
Author

Thanks Lukas for the suggestion; using node and updating the code led to the same error, but at least I am now on the latest version:

ReferenceError: atob is not defined
at evalmachine.<anonymous>:1:609
at evalmachine.<anonymous>:1:908
at ContextifyScript.Script.runInContext (vm.js:59:29)
at ContextifyScript.Script.runInNewContext (vm.js:65:15)
at Object.runInNewContext (vm.js:135:38)
at [eval]:1:27
at ContextifyScript.Script.runInThisContext (vm.js:50:33)
at Object.runInThisContext (vm.js:139:38)
at Object.<anonymous> ([eval]-wrapper:6:22)
at Module._compile (module.js:652:30)
ERROR:root:Error executing Cloudflare IUAM Javascript. Cloudflare may have changed their technique, or there may be a bug in the script.

I will try to implement Pawel's suggestion.

@pawliczka

pawliczka commented Mar 29, 2019

It is more complicated than I thought. But we can replace the content between atob("ZG9jdW1l") and atob("aW5uZXJIVE1M") ('document.getElementById(k).innerHTML') with the text defined under the HTML element whose id is given by the k variable.
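A quick Python check (just decoding the two base64 constants quoted above) shows what those atob() calls contribute:

```python
from base64 import b64decode

# The two base64 constants from the challenge decode to the fragments that,
# together with the obfuscated expression between them, build
# "document.getElementById(k).innerHTML"
print(b64decode('ZG9jdW1l').decode('ascii'))      # docume
print(b64decode('aW5uZXJIVE1M').decode('ascii'))  # innerHTML
```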

@pawliczka

#212 is caused by this function, which returns the ASCII code of the letter at t[p]:

(function(p){return eval((true+"")[0]+"."+([]["fill"]+"")[3]+(+(101))["to"+String["name"]](21)[1]+(false+"")[1]+(true+"")[1]+Function("return escape")()(("")["italics"]())[2]+(true+[]["fill"])[10]+(undefined+"")[2]+(true+"")[3]+(+[]+Array)[10]+(true+"")[0]+"("+p+")")}(+((!+[]+!![]+!![]+[]))))

And we get 'Cannot read property 'charCodeAt' of undefined' because we are not passing the t variable to the nodejs call.
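For context (an interpretation, based on the replacement used later in this thread): that jsfuck expression assembles the string "t.charCodeAt(p)" letter by letter via JavaScript type coercion, e.g. (true+"")[0] is 't' and (false+"")[1] is 'a'. A Python sanity check of the assembled name:

```python
# Each entry maps a jsfuck fragment (in the comment) to the character it
# evaluates to under JavaScript's type-coercion rules (V8 native-function
# stringification assumed, e.g. "function fill() { [native code] }")
letters = [
    't',  # (true+"")[0]          -> "true"[0]
    '.',  # the literal "."
    'c',  # ([]["fill"]+"")[3]    -> "function fill() ..."[3]
    'h',  # (+(101))["to"+String["name"]](21)[1] -> (101).toString(21)[1] of "4h"
    'a',  # (false+"")[1]         -> "false"[1]
    'r',  # (true+"")[1]          -> "true"[1]
    'C',  # Function("return escape")()(("")["italics"]())[2] -> escape("<i></i>")[2]
    'o',  # (true+[]["fill"])[10] -> "truefunction ..."[10]
    'd',  # (undefined+"")[2]     -> "undefined"[2]
    'e',  # (true+"")[3]          -> "true"[3]
    'A',  # (+[]+Array)[10]       -> "0function Array..."[10]
    't',  # (true+"")[0]          -> "true"[0]
]
print(''.join(letters))  # t.charCodeAt
```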

@pawliczka

Ok. I finally got it. I will provide code tomorrow.

@Krylanc3lo
Author

Great! Looking forward to it.

Thanks again Pawel!

@VeNoMouS

@pawliczka do you have any pseudo code we can implement in the meantime, for those with projects that rely on bypassing CF?

@pawliczka

pawliczka commented Mar 30, 2019

To solve the undefined atob problem, you have to replace everything from atob("ZG9jdW1l") through atob("aW5uZXJIVE1M"), which for me is:
atob("ZG9jdW1l")+(undefined+"")[1]+(true+"")[0]+(+(+!+[]+[+!+[]]+(!![]+[])[!+[]+!+[]+!+[]]+[!+[]+!+[]]+[+[]])+[])[+!+[]]+(false+[0]+String)[20]+(true+"")[3]+(true+"")[0]+"Element"+(+[]+Boolean)[10]+(NaN+[Infinity])[10]+"Id("+(+(20))["to"+String["name"]](21)+")."+atob("aW5uZXJIVE1M")
with the data under the element identified by the k variable (for me, k = 'cf-dn-lZTYtMjTTnWU';):
<div style="display:none;visibility:hidden;" id="cf-dn-lZTYtMjTTnWU">+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]))/+((+!![]+[])+(+!![])+(+[])+(+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]))</div>
You also have to solve the TypeError: Cannot read property 'charCodeAt' of undefined and the a.value problem; you can do it this way:
js = js.replace('a.value','a')
js = js.replace("; 121",'')
js = "console.log(require('vm').runInNewContext('var a; var t = \"%s\";%s', Object.create(null), {timeout: 5000}));" % (domain, js)
Now it should work fine. I think that CF is serving the new challenge algorithm to only part of its users; some domains are still using the old challenge algorithm, as in #212.
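The string replacements described above can be sketched in Python as follows (the js value here is a made-up stand-in for the real extracted challenge, and example.com is a hypothetical domain):

```python
# Stand-in for the tail of the extracted IUAM challenge (illustrative only)
js = 'a.value = (x + t.length).toFixed(10); 121'
domain = 'example.com'

js = js.replace('a.value', 'a')  # write to a plain variable instead of the form input
js = js.replace('; 121', '')     # drop the trailing constant statement

# Wrap for node's vm module, declaring a and seeding t with the domain
wrapped = "console.log(require('vm').runInNewContext('var a; var t = \"%s\";%s', Object.create(null), {timeout: 5000}));" % (domain, js)
print(wrapped)
```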

@Krylanc3lo
Author

In which file do you find the atob function?

@VeNoMouS

VeNoMouS commented Mar 30, 2019

atob is available for node: https://www.npmjs.com/package/atob. @Krylanc3lo, personally I back-port most of the changes, still use js2py, and avoid node at all costs.

@Krylanc3lo
Author

Thanks @VeNoMouS. Do you know what I have to update on the js2py side?

@pawliczka

pawliczka commented Mar 30, 2019

atob("ZG9jdW1l")+(undefined+"")[1]+(true+"")[0] = document
+(+(+!+[]+[+!+[]]+(!![]+[])[!+[]+!+[]+!+[]]+[!+[]+!+[]]+[+[]])+[])[+!+[]]+(false+[0]+String)[20]+(true+"")[3]+(true+"")[0]+"Element"+(+[]+Boolean)[10]+(NaN+[Infinity])[10]+"Id("+(+(20))["to"+String["name"]](21)+")." = .getElementById(k).
+atob("aW5uZXJIVE1M") = innerHTML
document.getElementById(k).innerHTML

@VeNoMouS

@Krylanc3lo I'm looking into it myself.

@pawliczka

@VeNoMouS Could you please send me a diff or a link to your fork when you are done? I'm going to sleep now.

@VeNoMouS

@pawliczka sweet as mate :)

@VeNoMouS

VeNoMouS commented Mar 30, 2019

lol this jsfuck is really annoying when trying to work out what it's attempting to do...

@ghost

ghost commented Mar 30, 2019

@VeNoMouS I just wrote some code to take the pain out of it. codemanki/cloudscraper#170 (comment)

@VeNoMouS

@pro-src I'm just doing the same in python ;P nice job :)

@ghost

ghost commented Mar 30, 2019

@VeNoMouS Also, here is a node-based definition of atob:

function atob(str) {
  return Buffer.from(str, 'base64').toString('binary');
}

@VeNoMouS

Ah, thanks. I ended up just replacing it with a regex on the base64'd parts till I got it all working.

@pawliczka

pawliczka commented Mar 30, 2019

#206 ^^ @VeNoMouS could you please share your solution?

@ghost

ghost commented Mar 30, 2019

https://www.npmjs.com/package/cf-debug

@VeNoMouS

VeNoMouS commented Mar 30, 2019

so.... my rewrite produced this... it's a bit of a hack atm...

[screenshot: computed challenge answer]

However, it breaks under js2py... I'm trying to work that out.

  File "/usr/local/lib/python2.7/dist-packages/js2py/base.py", line 1001, in callprop
    '%s is not a function' % cand.typeof())
js2py.internals.simplex.JsException: TypeError: 'undefined' is not a function

@VeNoMouS

OK... I found one of the root causes of the "undefined", but I think there's still another issue...

("")["italics"]() is "".italics() in JS... js2py doesn't know how to handle it.

@VeNoMouS

import logging
import random
import re
from pprint import pprint
from base64 import b64decode

from copy import deepcopy
from time import sleep

#from lib import js2py
import js2py
from lib.requests.sessions import Session

try:
    from urlparse import urlparse
except ImportError:
    from urllib.parse import urlparse

__version__ = "1.9.5"

# Originally written by https://github.com/Anorov/cloudflare-scrape
# Rewritten by VeNoMouS - <venom@gen-x.co.nz> for https://github.com/VeNoMouS/Sick-Beard - 24/3/2018 NZDT

DEFAULT_USER_AGENTS = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/65.0.3325.181 Chrome/65.0.3325.181 Safari/537.36",
    "Mozilla/5.0 (Linux; Android 7.0; Moto G (5) Build/NPPS25.137-93-8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.137 Mobile Safari/537.36",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_4 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11B554a Safari/9537.53",
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:59.0) Gecko/20100101 Firefox/59.0",
    "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0"
]

DEFAULT_USER_AGENT = random.choice(DEFAULT_USER_AGENTS)

BUG_REPORT = """\
Cloudflare may have changed their technique, or there may be a bug in the script.
"""

ANSWER_ACCEPT_ERROR = """\
The challenge answer was not properly accepted by Cloudflare. This can occur if \
the target website is under heavy load, or if Cloudflare is experiencing issues. You can
potentially resolve this by increasing the challenge answer delay (default: 8 seconds). \
For example: cfscrape.create_scraper(delay=15)
"""

class CloudflareScraper(Session):
    def __init__(self, *args, **kwargs):
        self.delay = kwargs.pop("delay", 8)
        super(CloudflareScraper, self).__init__(*args, **kwargs)

        if "requests" in self.headers["User-Agent"]:
            # Set a random User-Agent if no custom User-Agent has been set
            self.headers["User-Agent"] = DEFAULT_USER_AGENT

    def is_cloudflare_challenge(self, resp):
        return (
            resp.status_code == 503
            and resp.headers.get("Server", "").startswith("cloudflare")
            and b"jschl_vc" in resp.content
            and b"jschl_answer" in resp.content
        )

    def request(self, method, url, *args, **kwargs):
        resp = super(CloudflareScraper, self).request(method, url, *args, **kwargs)

        # Check if Cloudflare anti-bot is on
        if self.is_cloudflare_challenge(resp):
            resp = self.solve_cf_challenge(resp, **kwargs)

        return resp

    def solve_cf_challenge(self, resp, **original_kwargs):
        sleep(self.delay)  # Cloudflare requires a delay before solving the challenge

        body = resp.text
        
        
        rq = re.search('<div style="display:none;visibility:hidden;" id="(.*?)">(.*?)<\/div>', body,re.MULTILINE | re.DOTALL)
        
        body = re.sub(
            r'function\(p\){var p = eval\(eval\(atob\(".*?"\)\+\(undefined\+""\)\[1\]\+\(true\+""\)\[0\]\+\(\+\(\+!\+\[\]\+\[\+!\+\[\]\]\+\(!!\[\]\+\[\]\)\[!\+\[\]\+!\+\[\]\+!\+\[\]\]\+\[!\+\[\]\+!\+\[\]\]\+\[\+\[\]\]\)\+\[\]\)\[\+!\+\[\]\]\+\(false\+\[0\]\+String\)\[20\]\+\(true\+""\)\[3\]\+\(true\+""\)\[0\]\+"Element"\+\(\+\[\]\+Boolean\)\[10\]\+\(NaN\+\[Infinity\]\)\[10\]\+"Id\("\+\(\+\(20\)\)\["to"\+String\["name"\]\]\(21\)\+"\)."\+atob\(".*?"\)\)\); return \+\(p\)}\(\);',
            "{};".format(rq.group(2)),
            body
        )
        

        parsed_url = urlparse(resp.url)
        domain = parsed_url.netloc
        
        submit_url = "%s://%s/cdn-cgi/l/chk_jschl" % (parsed_url.scheme, domain)

        cloudflare_kwargs = deepcopy(original_kwargs)
        params = cloudflare_kwargs.setdefault("params", {})
        headers = cloudflare_kwargs.setdefault("headers", {})
        headers["Referer"] = resp.url

        try:
            params["jschl_vc"] = re.search(r'name="jschl_vc" value="(\w+)"', body).group(1)
            params["pass"] = re.search(r'name="pass" value="(.+?)"', body).group(1)
            params["s"] = re.search(r'name="s"\svalue="(?P<s_value>[^"]+)', body).group('s_value')

        except Exception as e:
            # Something is wrong with the page.
            # This may indicate Cloudflare has changed their anti-bot
            # technique. If you see this and are running the latest version,
            # please open a GitHub issue so I can update the code accordingly.
            raise ValueError("Unable to parse Cloudflare anti-bots page: %s %s" % (e, BUG_REPORT))

        # Solve the Javascript challenge
        params["jschl_answer"] = self.solve_challenge(body, domain)
        pprint(params)

        # Requests transforms any request into a GET after a redirect,
        # so the redirect has to be handled manually here to allow for
        # performing other types of requests even as the first request.
        method = resp.request.method
        cloudflare_kwargs["allow_redirects"] = False
        redirect = self.request(method, submit_url, **cloudflare_kwargs)
        pprint(redirect.content)
        #exit()

        redirect_location = urlparse(redirect.headers["Location"])
        if not redirect_location.netloc:
            redirect_url = "%s://%s%s" % (parsed_url.scheme, domain, redirect_location.path)
            return self.request(method, redirect_url, **original_kwargs)
        return self.request(method, redirect.headers["Location"], **original_kwargs)

    def solve_challenge(self, body, domain):
        try:
            js = re.search(r"setTimeout\(function\(\){\s+(var "
                        r"s,t,o,p,b,r,e,a,k,i,n,g,f.+?\r?\n[\s\S]+?a\.value =.+?)\r?\n", body).group(1)
            
        except Exception:
            raise ValueError("Unable to identify Cloudflare IUAM Javascript on website. %s" % BUG_REPORT)

        js = re.sub(r"a\.value = ((.+).toFixed\(10\))?", r"\1", js)
        js = re.sub(r"\s{3,}[a-z](?: = |\.).+", "", js).replace("t.length", str(len(domain)))

        js = js.replace('; 121', '')

        js = js.replace('function(p){return eval((true+"")[0]+"."+([]["fill"]+"")[3]+(+(101))["to"+String["name"]](21)[1]+(false+"")[1]+(true+"")[1]+Function("return escape")()(("")["italics"]())[2]+(true+[]["fill"])[10]+(undefined+"")[2]+(true+"")[3]+(+[]+Array)[10]+(true+"")[0]+"("+p+")")}', 't.charCodeAt')

    
        # Strip characters that could be used to exit the string context
        # These characters are not currently used in Cloudflare's arithmetic snippet
        js = re.sub(r"[\n\\']", "", js)
    
        if "toFixed" not in js:
            raise ValueError("Error parsing Cloudflare IUAM Javascript challenge. %s" % BUG_REPORT)

        try:
            js = "a = {}; t = \"" + domain + "\";" + js
            result = js2py.eval_js(js)
        
        except Exception:
            logging.error("Error executing Cloudflare IUAM Javascript. %s" % BUG_REPORT)
            raise

        try:
            float(result)
        except Exception:
            raise ValueError("Cloudflare IUAM challenge returned unexpected answer. %s" % BUG_REPORT)

        return result

    @classmethod
    def create_scraper(cls, sess=None, **kwargs):
        """
        Convenience function for creating a ready-to-go CloudflareScraper object.
        """
        scraper = cls(**kwargs)

        if sess:
            attrs = ["auth", "cert", "cookies", "headers", "hooks", "params", "proxies", "data"]
            for attr in attrs:
                val = getattr(sess, attr, None)
                if val:
                    setattr(scraper, attr, val)

        return scraper


    ## Functions for integrating cloudflare-scrape with other applications and scripts

    @classmethod
    def get_tokens(cls, url, user_agent=None, **kwargs):
        scraper = cls.create_scraper()
        if user_agent:
            scraper.headers["User-Agent"] = user_agent

        try:
            resp = scraper.get(url, **kwargs)
            resp.raise_for_status()
        except Exception as e:
            logging.error("'%s' returned an error. Could not collect tokens." % url)
            raise

        domain = urlparse(resp.url).netloc
        cookie_domain = None

        for d in scraper.cookies.list_domains():
            if d.startswith(".") and d in ("." + domain):
                cookie_domain = d
                break
        else:
            raise ValueError("Unable to find Cloudflare cookies. Does the site actually have Cloudflare IUAM (\"I'm Under Attack Mode\") enabled?")

        return ({
                    "__cfduid": scraper.cookies.get("__cfduid", "", domain=cookie_domain),
                    "cf_clearance": scraper.cookies.get("cf_clearance", "", domain=cookie_domain)
                },
                scraper.headers["User-Agent"]
               )

    @classmethod
    def get_cookie_string(cls, url, user_agent=None, **kwargs):
        """
        Convenience function for building a Cookie HTTP header value.
        """
        tokens, user_agent = cls.get_tokens(url, user_agent=user_agent, **kwargs)
        return "; ".join("=".join(pair) for pair in tokens.items()), user_agent

create_scraper = CloudflareScraper.create_scraper
get_tokens = CloudflareScraper.get_tokens
get_cookie_string = CloudflareScraper.get_cookie_string

I dunno... it seems like it works... but something is wrong...

@VeNoMouS

VeNoMouS commented Mar 31, 2019

Ah... OK... so my code does work... for some reason it doesn't always work on the first go... CF returns a 503... but if I leave it going... it eventually gets it... weird.

Sometimes it gets it on the first go..

@VeNoMouS

@Krylanc3lo ... This is purely for research... and this is in no way ready for a proper release...

But my current development is here

@Krylanc3lo
Author

Thanks a lot @VeNoMouS, I really appreciate it. I will have a look.

@VeNoMouS

VeNoMouS commented Mar 31, 2019

@Krylanc3lo

The request is:

response = cf.get(
    'https://www.deepmovie.ch/tt1187043/',
    headers={
        'Connection': 'keep-alive',  # note: was listed twice; dict keys must be unique
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'en-US,en;q=0.9',
        'DNT': '1',
        'Upgrade-Insecure-Requests': '1'
    },
    timeout=30,
    verify=False
)

@Krylanc3lo
Author

Thank you.

Where should I put the headers?

@VeNoMouS

VeNoMouS commented Apr 1, 2019

OK... I'm happy with the following code... it does not need the added headers, etc. Just call it as you always have.

import logging
import random
import re

from copy import deepcopy
from time import sleep
from collections import OrderedDict

from lib import js2py
from requests.sessions import Session

try:
    from urlparse import urlparse
    from urlparse import urlunparse
except ImportError:
    from urllib.parse import urlparse
    from urllib.parse import urlunparse

__version__ = "1.9.5"

# Originally written by https://github.com/Anorov/cloudflare-scrape
# Rewritten by VeNoMouS - <venom@gen-x.co.nz> for https://github.com/VeNoMouS/Sick-Beard - 24/3/2018 NZDT

DEFAULT_USER_AGENTS = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/65.0.3325.181 Chrome/65.0.3325.181 Safari/537.36",
    "Mozilla/5.0 (Linux; Android 7.0; Moto G (5) Build/NPPS25.137-93-8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.137 Mobile Safari/537.36",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_4 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11B554a Safari/9537.53",
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:59.0) Gecko/20100101 Firefox/59.0",
    "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0",
]

DEFAULT_USER_AGENT = random.choice(DEFAULT_USER_AGENTS)

BUG_REPORT = """\
Cloudflare may have changed their technique, or there may be a bug in the script.
"""

ANSWER_ACCEPT_ERROR = """\
The challenge answer was not properly accepted by Cloudflare. This can occur if \
the target website is under heavy load, or if Cloudflare is experiencing issues. You can
potentially resolve this by increasing the challenge answer delay (default: 8 seconds). \
For example: cfscrape.create_scraper(delay=15)
"""

class CloudflareScraper(Session):
    def __init__(self, *args, **kwargs):
        self.delay = kwargs.pop("delay", 8)
        super(CloudflareScraper, self).__init__(*args, **kwargs)

        if "requests" in self.headers["User-Agent"]:
            # Set a random User-Agent if no custom User-Agent has been set
            self.headers["User-Agent"] = DEFAULT_USER_AGENT

    def is_cloudflare_challenge(self, resp):
        return (
            resp.status_code == 503
            and resp.headers.get("Server", "").startswith("cloudflare")
            and b"jschl_vc" in resp.text
            and b"jschl_answer" in resp.text
        )

    def request(self, method, url, *args, **kwargs):
        self.headers['Accept-Encoding'] = 'gzip, deflate'
        self.headers['Accept-Language'] = 'en-US,en;q=0.9'
        self.headers['DNT'] = '1'
        
        resp = super(CloudflareScraper, self).request(method, url, *args, **kwargs)
        
        # Check if Cloudflare anti-bot is on
        if self.is_cloudflare_challenge(resp):
            resp = self.solve_cf_challenge(resp, **kwargs)

        return resp

    def solve_cf_challenge(self, resp, **original_kwargs):
        body = resp.text

        self.delay = float(re.search(r"submit\(\);\r?\n\s*},\s*([0-9]+)", body).group(1)) / float(1000)
        sleep(self.delay)  # Cloudflare requires a delay before solving the challenge

        parsed_url = urlparse(resp.url)
        domain = parsed_url.netloc
        submit_url = "{}://{}/cdn-cgi/l/chk_jschl".format(parsed_url.scheme, domain)

        cloudflare_kwargs = deepcopy(original_kwargs)
        headers = cloudflare_kwargs.setdefault('headers', {'Referer': resp.url})

        try:
            params = cloudflare_kwargs.setdefault(
                "params", OrderedDict(
                    [
                        ('s', re.search(r'name="s"\svalue="(?P<s_value>[^"]+)', body).group('s_value')),
                        ('jschl_vc', re.search(r'name="jschl_vc" value="(\w+)"', body).group(1)),
                        ('pass', re.search(r'name="pass" value="(.+?)"', body).group(1)),
                    ]
                )
            )

        except Exception as e:
            # Something is wrong with the page.
            # This may indicate Cloudflare has changed their anti-bot
            # technique. If you see this and are running the latest version,
            # please open a GitHub issue so I can update the code accordingly.
            raise ValueError("Unable to parse Cloudflare anti-bots page: %s %s" % (e, BUG_REPORT))

        # Solve the Javascript challenge
        params["jschl_answer"] = self.solve_challenge(body, domain)

        # Requests transforms any request into a GET after a redirect,
        # so the redirect has to be handled manually here to allow for
        # performing other types of requests even as the first request.
        method = resp.request.method
        
        cloudflare_kwargs["allow_redirects"] = False
        
        redirect = self.request(method, submit_url, **cloudflare_kwargs)
        redirect_location = urlparse(redirect.headers["Location"])

        if not redirect_location.netloc:
            redirect_url = urlunparse(
                (
                    parsed_url.scheme,
                    domain,
                    redirect_location.path,
                    redirect_location.params,
                    redirect_location.query,
                    redirect_location.fragment
                )
            )
            return self.request(method, redirect_url, **original_kwargs)
        
        return self.request(method, redirect.headers["Location"], **original_kwargs)

    def solve_challenge(self, body, domain):
        try:
            body = re.sub(
                r'function\(p\){var p = eval\(eval\(atob\(".*?"\)\+\(undefined\+""\)\[1\]\+\(true\+""\)\[0\]\+\(\+\(\+!'
                r'\+\[\]\+\[\+!\+\[\]\]\+\(!!\[\]\+\[\]\)\[!\+\[\]\+!\+\[\]\+!\+\[\]\]\+\[!\+\[\]\+!\+\[\]\]\+\[\+\[\]\]'
                r'\)\+\[\]\)\[\+!\+\[\]\]\+\(false\+\[0\]\+String\)\[20\]\+\(true\+""\)\[3\]\+\(true\+""\)\[0\]\+"Element"'
                r'\+\(\+\[\]\+Boolean\)\[10\]\+\(NaN\+\[Infinity\]\)\[10\]\+"Id\("\+\(\+\(20\)\)\["to"\+String\["name"\]\]'
                r'\(21\)\+"\)."\+atob\(".*?"\)\)\); return \+\(p\)}\(\);',
                '{};'.format(
                    re.search('<div style="display:none;visibility:hidden;" id="(.*?)">(.*?)<\/div>',
                    body,
                    re.MULTILINE | re.DOTALL).group(2)
                ),
                body
            )
            js = re.search(r"setTimeout\(function\(\){\s+(var "
                        r"s,t,o,p,b,r,e,a,k,i,n,g,f.+?\r?\n[\s\S]+?a\.value =.+?)\r?\n", body).group(1)
        except Exception:
            raise ValueError("Unable to identify Cloudflare IUAM Javascript on website. %s" % BUG_REPORT)

        js = re.sub(r"a\.value = ((.+).toFixed\(10\))?", r"\1", js)
        js = re.sub(r"\s{3,}[a-z](?: = |\.).+", "", js).replace("t.length", str(len(domain)))

        js = js.replace('; 121', '')

        js = js.replace(
            'function(p){return eval((true+"")[0]+"."+([]["fill"]+"")[3]+(+(101))["to"+String["name"]](21)[1]+(false+"")'
            '[1]+(true+"")[1]+Function("return escape")()(("")["italics"]())[2]+(true+[]["fill"])[10]+(undefined+"")[2]+'
            '(true+"")[3]+(+[]+Array)[10]+(true+"")[0]+"("+p+")")}',
            't.charCodeAt'
        )

        # Strip characters that could be used to exit the string context
        # These characters are not currently used in Cloudflare's arithmetic snippet
        js = re.sub(r"[\n\\']", "", js)

        if "toFixed" not in js:
            raise ValueError("Error parsing Cloudflare IUAM Javascript challenge. %s" % BUG_REPORT)

        try:
            js = 'a = {{}}; t = "{}";{}'.format(domain, js)
            result = js2py.eval_js(js)
        except Exception:
            logging.error("Error executing Cloudflare IUAM Javascript. %s" % BUG_REPORT)
            raise

        try:
            float(result)
        except Exception:
            raise ValueError("Cloudflare IUAM challenge returned unexpected answer. %s" % BUG_REPORT)

        return result

    @classmethod
    def create_scraper(cls, sess=None, **kwargs):
        """
        Convenience function for creating a ready-to-go CloudflareScraper object.
        """
        scraper = cls(**kwargs)

        if sess:
            attrs = ["auth", "cert", "cookies", "headers", "hooks", "params", "proxies", "data"]
            for attr in attrs:
                val = getattr(sess, attr, None)
                if val:
                    setattr(scraper, attr, val)

        return scraper


    ## Functions for integrating cloudflare-scrape with other applications and scripts

    @classmethod
    def get_tokens(cls, url, user_agent=None, **kwargs):
        scraper = cls.create_scraper()
        if user_agent:
            scraper.headers["User-Agent"] = user_agent

        try:
            resp = scraper.get(url, **kwargs)
            resp.raise_for_status()
        except Exception as e:
            logging.error("'%s' returned an error. Could not collect tokens." % url)
            raise

        domain = urlparse(resp.url).netloc
        cookie_domain = None

        for d in scraper.cookies.list_domains():
            if d.startswith(".") and d in ("." + domain):
                cookie_domain = d
                break
        else:
            raise ValueError("Unable to find Cloudflare cookies. Does the site actually have Cloudflare IUAM (\"I'm Under Attack Mode\") enabled?")

        return (
            {
                "__cfduid": scraper.cookies.get("__cfduid", "", domain=cookie_domain),
                "cf_clearance": scraper.cookies.get("cf_clearance", "", domain=cookie_domain)
            },
            scraper.headers["User-Agent"]
        )

    @classmethod
    def get_cookie_string(cls, url, user_agent=None, **kwargs):
        """
        Convenience function for building a Cookie HTTP header value.
        """
        tokens, user_agent = cls.get_tokens(url, user_agent=user_agent, **kwargs)
        return "; ".join("=".join(pair) for pair in tokens.items()), user_agent

create_scraper = CloudflareScraper.create_scraper
get_tokens = CloudflareScraper.get_tokens
get_cookie_string = CloudflareScraper.get_cookie_string

@Krylanc3lo
Author

Thank you. I am testing right now :)

@Krylanc3lo
Author

Krylanc3lo commented Apr 1, 2019

I got this error message (I am using python 3.6):

krylancelo:~/bin/dl_scripts$ python3 test.py
Traceback (most recent call last):
  File "test.py", line 5, in <module>
    print(scraper.get("https://www.xxxx.to/lecture-en-ligne/xxxx/271/").content)
  File "/home/maxx/.local/lib/python3.6/site-packages/requests/sessions.py", line 546, in get
    return self.request('GET', url, **kwargs)
  File "/home/maxx/.local/lib/python3.6/site-packages/cfscrape/__init__.py", line 73, in request
    if self.is_cloudflare_challenge(resp):
  File "/home/maxx/.local/lib/python3.6/site-packages/cfscrape/__init__.py", line 61, in is_cloudflare_challenge
    and b"jschl_vc" in resp.text
TypeError: 'in <string>' requires string as left operand, not bytes
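The traceback comes down to a bytes-vs-str mix: in Python 3, resp.text is a str, so checking a bytes literal against it raises that TypeError. A minimal illustration (the body string here is made up):

```python
body = 'name="jschl_vc" value="abc"'  # resp.text is a str under Python 3

# b"jschl_vc" in body  ->  TypeError: 'in <string>' requires string as left operand
print("jschl_vc" in body)            # compare a str literal against resp.text
print(b"jschl_vc" in body.encode())  # or compare bytes against resp.content
```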

@VeNoMouS

VeNoMouS commented Apr 1, 2019

@Krylanc3lo let me know how you go :)

@Krylanc3lo
Author

It is working well with most sites, except the one I tested in the above sample. Very weird.

But at least it is working for everything else.

@VeNoMouS

VeNoMouS commented Apr 1, 2019

@Krylanc3lo can you show me test.py?

@Krylanc3lo
Author

Krylanc3lo commented Apr 1, 2019

Sure; for the moment it is very simple, as I am just trying to test the new code:

import cfscrape

scraper = cfscrape.create_scraper()  # returns a CloudflareScraper instance

It works with gktorrent, but not with this site.

@VeNoMouS

VeNoMouS commented Apr 1, 2019

@Krylanc3lo worked fine for me on that website...

_cloudFlare() requested URL - https://www.japscan.to/lecture-en-ligne/giant-killing/271/, encountered CloudFlare DDOS Protection.. Bypassing.
test() CloudFlare DDOS Protection.. Bypassed successfully.
<!DOCTYPE html>
<html lang="fr">
<head>
        <title>Giant Killing 271 VF - Lecture en ligne | JapScan</title>
        <meta charset="utf-8" />
        <meta name="viewport" content="width=device-width, initial-scale=1" />

        <script src="https://code.jquery.com/jquery-3.2.1.slim.min.js" integrity="sha384-KJ3o2DKtIkvYIK3UENzmM7KCkRr/rE9/Qpg6aAZGJwFDMVNA/GpGFF93hXpG5KkN" crossorigin="anonymous"></script>
        <script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.12.9/umd/popper.min.js" integrity="sha384-ApNbgh9B+Y1QKtv3Rn7W3mgPxhU9K/ScQsAP7hUibX39j7fakFPskvXusvfa0b4Q" crossorigin="anonymous"></script>
                <script src="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/4.0.0/js/bootstrap.min.js"></script>
        <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css" integrity="sha384-Gn5384xqQ1aoWXA+058RXPxPg6fy4IWvTNh0E263XmFcJlSAwiGgFAW/dAiS6JXm" crossorigin="anonymous">
#!/usr/bin/python

from lib import cfscrape
from lib import requests
from pprint import pprint
import os
import sys
import re
from base64 import b64decode


class Test():
    def __init__(self):
        self.session = requests.session()
        self.funcName = lambda n=0: sys._getframe(n + 1).f_code.co_name + "()"

        # Load our cached session data to bypass CF delay on every request
        if os.path.exists('session.data'):
            self.session = pickle.load(open('session.data', 'rb'))

    def _cloudFlare(self, response):
        cf = cfscrape.create_scraper(sess=self.session)

        if cf.is_cloudflare_challenge(response):
            print("{} requested URL - {}, encountered CloudFlare DDOS Protection.. Bypassing.".format(self.funcName(), response.url))

            response = cf.get('https://www.japscan.to/lecture-en-ligne/giant-killing/271/', timeout=30)

            if not cf.is_cloudflare_challenge(response):
                return (True, True)

            return (True, False)

        return (False, True)

    def test(self):
        ret = self.session.get('https://www.japscan.to/lecture-en-ligne/giant-killing/271/', timeout=30)
        if (True, True) == self._cloudFlare(ret):
            print("{} CloudFlare DDOS Protection.. Bypassed successfully.".format(self.funcName()))
            ret = self.session.get('https://www.japscan.to/lecture-en-ligne/giant-killing/271/', timeout=30)
            print(ret.content)


Test().test()

I set up the session normally and continue using the same session and cookies for all the requests.
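As a concrete sketch of that session-caching idea, here is a minimal, stdlib-only version (the file name session.data comes from the snippet above; the helper names and the factory argument are my own illustrative choices, not part of cfscrape):

```python
import os
import pickle

SESSION_FILE = 'session.data'  # same cache file name as the snippet above

def load_session(make_session):
    # Reuse the cached session if one exists; otherwise build a fresh one
    # via the supplied factory (in practice: requests.session).
    if os.path.exists(SESSION_FILE):
        with open(SESSION_FILE, 'rb') as f:
            return pickle.load(f)
    return make_session()

def save_session(session):
    # Call this once the Cloudflare challenge has been solved, so later
    # runs reuse the stored cookies instead of re-solving the challenge.
    with open(SESSION_FILE, 'wb') as f:
        pickle.dump(session, f)
```

With requests, a Session object (cookies included) pickles cleanly, which is what makes this caching trick work.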

Krylanc3lo (Author)

I will use your script and test the results. I will share them with you, of course :)

Krylanc3lo (Author)

Is it possible that it is due to Python 3?

For example, I got this on compilation:
"Undefined variable 'pickle'"

VeNoMouS commented Apr 1, 2019

Sorry, just delete:

# Load our cached session data to bypass CF delay on every request
            if os.path.exists('session.data'):
                self.session = pickle.load(open('session.data', 'rb'))

I was using it for something else... it's not needed. It's from one of my Kodi addons ;P so it doesn't hit CF on every detection.

Krylanc3lo (Author)

Thanks, here is the result:

krylancelo:~/bin/dl_scripts$ python3 test.py
Traceback (most recent call last):
  File "test.py", line 40, in <module>
    Test().test()
  File "test.py", line 34, in test
    if (True, True) == self._cloudFlare(ret):
  File "test.py", line 19, in _cloudFlare
    if cf.is_cloudflare_challenge(response):
  File "/home/maxx/.local/lib/python3.6/site-packages/cfscrape/__init__.py", line 61, in is_cloudflare_challenge
    and b"jschl_vc" in resp.text
TypeError: 'in <string>' requires string as left operand, not bytes

So I copied your first script into /home/maxx/.local/lib/python3.6/site-packages/cfscrape/__init__.py
and used the second one in test.py.

I must have done something incorrectly

VeNoMouS commented Apr 1, 2019

Can you add a print(resp.content) before the return in is_cloudflare_challenge, so we can see what the content is, please?
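For anyone following along, that temporary edit would look something like this: a sketch with the method body as it stood at this point (still checking resp.text), plus a fabricated FakeResp stand-in rather than a real requests.Response:

```python
def is_cloudflare_challenge(self, resp):
    print(resp.content)  # temporary debug print, remove once diagnosed
    return (
        resp.status_code == 503
        and resp.headers.get("Server", "").startswith("cloudflare")
        and b"jschl_vc" in resp.text      # bytes checked against str
        and b"jschl_answer" in resp.text
    )

# Fabricated stand-in for a requests.Response, purely for illustration
class FakeResp:
    status_code = 503
    headers = {"Server": "cloudflare"}
    content = b"jschl_vc jschl_answer"
    text = "jschl_vc jschl_answer"

# The debug print fires first, then Python 3 rejects `bytes in str`
# with a TypeError -- exactly the traceback that follows in this thread.
try:
    is_cloudflare_challenge(None, FakeResp())
    raised_type_error = False
except TypeError:
    raised_type_error = True
```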

Krylanc3lo (Author) commented Apr 1, 2019

Of course, here is the result (thanks again for all your help). Maybe my IP is blocked on their side?

b'\n\n\n \n \n \n \n \n <title>Just a moment...</title>\n <style type="text/css">\n html, body {width: 100%; height: 100%; margin: 0; padding: 0;}\n body {background-color: #ffffff; font-family: Helvetica, Arial, sans-serif; font-size: 100%;}\n h1 {font-size: 1.5em; color: #404040; text-align: center;}\n p {font-size: 1em; color: #404040; text-align: center; margin: 10px 0 0 0;}\n #spinner {margin: 0 auto 30px auto; display: block;}\n .attribution {margin-top: 20px;}\n @-webkit-keyframes bubbles { 33%: { -webkit-transform: translateY(10px); transform: translateY(10px); } 66% { -webkit-transform: translateY(-10px); transform: translateY(-10px); } 100% { -webkit-transform: translateY(0); transform: translateY(0); } }\n @Keyframes bubbles { 33%: { -webkit-transform: translateY(10px); transform: translateY(10px); } 66% { -webkit-transform: translateY(-10px); transform: translateY(-10px); } 100% { -webkit-transform: translateY(0); transform: translateY(0); } }\n .bubbles { background-color: #404040; width:15px; height: 15px; margin:2px; border-radius:100%; -webkit-animation:bubbles 0.6s 0.07s infinite ease-in-out; animation:bubbles 0.6s 0.07s infinite ease-in-out; -webkit-animation-fill-mode:both; animation-fill-mode:both; display:inline-block; }\n </style>\n\n <script type="text/javascript">\n //x";\n t = t.firstChild.href;r = t.match(/https?:\\/\\//)[0];\n t = t.substr(r.length); t = t.substr(0,t.length-1); k = \'cf-dn-eqWSy\';\n a = document.getElementById(\'jschl-answer\');\n f = document.getElementById(\'challenge-form\');\n 
;tIOFsse.zUSDWywR-=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]))/+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(+!![])+(!+[]+!![])+(!+[]+!![])+(!+[]+!![])+(+[])+(!+[]+!![]+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]));tIOFsse.zUSDWywR+=function(p){var p = eval(eval(atob("ZG9jdW1l")+(undefined+"")[1]+(true+"")[0]+(+(+!+[]+[+!+[]]+(!![]+[])[!+[]+!+[]+!+[]]+[!+[]+!+[]]+[+[]])+[])[+!+[]]+(false+[0]+String)[20]+(true+"")[3]+(true+"")[0]+"Element"+(+[]+Boolean)[10]+(NaN+[Infinity])[10]+"Id("+(+(20))["to"+String["name"]](21)+")."+atob("aW5uZXJIVE1M"))); return +(p)}();tIOFsse.zUSDWywR-=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]))/+((!+[]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]));tIOFsse.zUSDWywR+=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(+!![]))/+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![]));tIOFsse.zUSDWywR*=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]))/+((!+[]+!![]+!![]+!![]+[
])+(!+[]+!![]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(+[])+(!+[]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]));tIOFsse.zUSDWywR*=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]))/+((!+[]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(+[])+(+[])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(+!![]));tIOFsse.zUSDWywR+=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]))/+((!+[]+!![]+!![]+[])+(!+[]+!![]+!![]+!![])+(+[])+(+[])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(+!![]));tIOFsse.zUSDWywR*=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![])+(+!![])+(+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(+!![]))/+((!+[]+!![]+!![]+!![]+!![]+[])+(+[])+(!+[]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]));tIOFsse.zUSDWywR+=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]))/(+(+((!+[]+!![]+!![]+[])+(!+[]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![])+(!
+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(+[])))+(function(p){return eval((true+"")[0]+"."+([]["fill"]+"")[3]+(+(101))["to"+String["name"]](21)[1]+(false+"")[1]+(true+"")[1]+Function("return escape")()(("")["italics"]())[2]+(true+[]["fill"])[10]+(undefined+"")[2]+(true+"")[3]+(+[]+Array)[10]+(true+"")[0]+"("+p+")")}(+((+!![]+[])+(+!![])))));tIOFsse.zUSDWywR-=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]))/+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![])+(+[])+(+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]));tIOFsse.zUSDWywR+=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]))/+((+!![]+[])+(+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![])+(+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]));tIOFsse.zUSDWywR*=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]))/+((!+[]+!![]+[])+(!+[]+!![]+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![])+(+[])+(!+[]+!![]+!![]));tIOFsse.zUSDWywR*=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]))/+((!+[]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![]+!![]+!
![]+!![]+!![])+(!+[]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]));tIOFsse.zUSDWywR+=+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![])+(+!![])+(+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(+!![]))/+((!+[]+!![]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]));a.value = (+tIOFsse.zUSDWywR).toFixed(10); \'; 121\'\n f.action += location.hash;\n f.submit();\n }, 4000);\n }, false);\n })();\n //]]>\n</script>\n\n\n\n\n

Please turn JavaScript on and reload the page.

Checking your browser before accessing xxxx.to.

This process is automatic. Your browser will redirect to your requested content shortly.

Please allow up to 5 seconds…
+((!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+[])+(!+[]+!![])+(+[])+(!+[]+!![]+!![]+!![])+(+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![])+(!+[]+!![]+!![]))/+((!+[]+!![]+!![]+!![]+[])+(!+[]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]+!![])+(!+[]+!![])+(+[])+(!+[]+!![]+!![]+!![]+!![]))
DDoS protection by Cloudflare
Ray ID: 4c06a97d2cffccc4'
Traceback (most recent call last):
  File "test.py", line 40, in <module>
    Test().test()
  File "test.py", line 34, in test
    if (True, True) == self._cloudFlare(ret):
  File "test.py", line 19, in _cloudFlare
    if cf.is_cloudflare_challenge(response):
  File "/home/maxx/.local/lib/python3.6/site-packages/cfscrape/__init__.py", line 63, in is_cloudflare_challenge
    and b"jschl_vc" in resp.text
TypeError: 'in <string>' requires string as left operand, not bytes

VeNoMouS commented Apr 1, 2019

Sorry, I was playing with keep-alives ages ago and changed the content to text. Try changing is_cloudflare_challenge to the following:

    def is_cloudflare_challenge(self, resp):
        return (
            resp.status_code == 503
            and resp.headers.get("Server", "").startswith("cloudflare")
            and b"jschl_vc" in resp.content
            and b"jschl_answer" in resp.content
        )
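As a quick sanity check, the fixed method can be exercised standalone against a fabricated 503 response (FakeResp and the free function are illustrative; the real method lives on the scraper class and takes a requests.Response):

```python
def is_cloudflare_challenge(resp):
    # Same logic as the fix above, minus `self`, so it runs standalone;
    # resp.content is bytes, so both `in` checks compare bytes with bytes.
    return (
        resp.status_code == 503
        and resp.headers.get("Server", "").startswith("cloudflare")
        and b"jschl_vc" in resp.content
        and b"jschl_answer" in resp.content
    )

# Fabricated stand-in for a requests.Response (illustrative only)
class FakeResp:
    status_code = 503
    headers = {"Server": "cloudflare-nginx"}
    content = b"... jschl_vc ... jschl_answer ..."

assert is_cloudflare_challenge(FakeResp())
```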

Krylanc3lo (Author)

Working like a charm! Thanks a lot @VeNoMouS, it is amazing :)

I will test it in my script and confirm everything is working well.

VeNoMouS commented Apr 1, 2019

Ok, those changes are as follows... this is the latest "production" code from my rewrite:

import logging
import random
import re

from copy import deepcopy
from time import sleep
from collections import OrderedDict

import js2py
from requests.sessions import Session

try:
    from urlparse import urlparse
    from urlparse import urlunparse
except ImportError:
    from urllib.parse import urlparse
    from urllib.parse import urlunparse

__version__ = "1.9.5"

# Originally written by https://github.com/Anorov/cloudflare-scrape
# Rewritten by VeNoMouS - <venom@gen-x.co.nz> for https://github.com/VeNoMouS/Sick-Beard - 24/3/2018 NZDT

DEFAULT_USER_AGENTS = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/65.0.3325.181 Chrome/65.0.3325.181 Safari/537.36",
    "Mozilla/5.0 (Linux; Android 7.0; Moto G (5) Build/NPPS25.137-93-8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.137 Mobile Safari/537.36",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_4 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11B554a Safari/9537.53",
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:59.0) Gecko/20100101 Firefox/59.0",
    "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0",
]

DEFAULT_USER_AGENT = random.choice(DEFAULT_USER_AGENTS)

BUG_REPORT = """\
Cloudflare may have changed their technique, or there may be a bug in the script.
"""

ANSWER_ACCEPT_ERROR = """\
The challenge answer was not properly accepted by Cloudflare. This can occur if \
the target website is under heavy load, or if Cloudflare is experiencing issues. You can
potentially resolve this by increasing the challenge answer delay (default: 8 seconds). \
For example: cfscrape.create_scraper(delay=15)
"""

class CloudflareScraper(Session):
    def __init__(self, *args, **kwargs):
        self.delay = kwargs.pop("delay", 8)
        super(CloudflareScraper, self).__init__(*args, **kwargs)

        if "requests" in self.headers["User-Agent"]:
            # Set a random User-Agent if no custom User-Agent has been set
            self.headers["User-Agent"] = DEFAULT_USER_AGENT

    def is_cloudflare_challenge(self, resp):
        return (
            resp.status_code == 503
            and resp.headers.get("Server", "").startswith("cloudflare")
            and b"jschl_vc" in resp.content
            and b"jschl_answer" in resp.content
        )

    def request(self, method, url, *args, **kwargs):
        self.headers['Accept-Encoding'] = 'gzip, deflate'
        self.headers['Accept-Language'] = 'en-US,en;q=0.9'
        self.headers['DNT'] = '1'
        
        resp = super(CloudflareScraper, self).request(method, url, *args, **kwargs)
        
        # Check if Cloudflare anti-bot is on
        if self.is_cloudflare_challenge(resp):
            resp = self.solve_cf_challenge(resp, **kwargs)

        return resp

    def solve_cf_challenge(self, resp, **original_kwargs):
        body = resp.text

        self.delay = float(re.search(r"submit\(\);\r?\n\s*},\s*([0-9]+)", body).group(1)) / float(1000)
        sleep(self.delay)  # Cloudflare requires a delay before solving the challenge

        parsed_url = urlparse(resp.url)
        domain = parsed_url.netloc
        submit_url = "{}://{}/cdn-cgi/l/chk_jschl".format(parsed_url.scheme, domain)

        cloudflare_kwargs = deepcopy(original_kwargs)
        headers = cloudflare_kwargs.setdefault('headers', {'Referer': resp.url})
        
        try:
            params = cloudflare_kwargs.setdefault(
                "params", OrderedDict(
                    [
                        ('s', re.search(r'name="s"\svalue="(?P<s_value>[^"]+)', body).group('s_value')),
                        ('jschl_vc', re.search(r'name="jschl_vc" value="(\w+)"', body).group(1)),
                        ('pass', re.search(r'name="pass" value="(.+?)"', body).group(1)),
                    ]
                )
            )

        except Exception as e:
            # Something is wrong with the page.
            # This may indicate Cloudflare has changed their anti-bot
            # technique. If you see this and are running the latest version,
            # please open a GitHub issue so I can update the code accordingly.
            # str(e) works on both Python 2 and 3; e.message is Python 2 only
            raise ValueError("Unable to parse Cloudflare anti-bots page: %s %s" % (e, BUG_REPORT))

        # Solve the Javascript challenge
        params["jschl_answer"] = self.solve_challenge(body, domain)

        # Requests transforms any request into a GET after a redirect,
        # so the redirect has to be handled manually here to allow for
        # performing other types of requests even as the first request.
        method = resp.request.method
        
        cloudflare_kwargs["allow_redirects"] = False
        
        redirect = self.request(method, submit_url, **cloudflare_kwargs)
        redirect_location = urlparse(redirect.headers["Location"])

        if not redirect_location.netloc:
            redirect_url = urlunparse(
                (
                    parsed_url.scheme,
                    domain,
                    redirect_location.path,
                    redirect_location.params,
                    redirect_location.query,
                    redirect_location.fragment
                )
            )
            return self.request(method, redirect_url, **original_kwargs)
        
        return self.request(method, redirect.headers["Location"], **original_kwargs)

    def solve_challenge(self, body, domain):
        try:
            # All pattern fragments are raw strings so the regex escapes
            # (\+, \[, \s, \r?\n, ...) reach the regex engine untouched
            body = re.sub(
                r'function\(p\){var p = eval\(eval\(atob\(".*?"\)\+\(undefined\+""\)\[1\]\+\(true\+""\)\[0\]\+\(\+\(\+!'
                r'\+\[\]\+\[\+!\+\[\]\]\+\(!!\[\]\+\[\]\)\[!\+\[\]\+!\+\[\]\+!\+\[\]\]\+\[!\+\[\]\+!\+\[\]\]\+\[\+\[\]\]'
                r'\)\+\[\]\)\[\+!\+\[\]\]\+\(false\+\[0\]\+String\)\[20\]\+\(true\+""\)\[3\]\+\(true\+""\)\[0\]\+"Element"'
                r'\+\(\+\[\]\+Boolean\)\[10\]\+\(NaN\+\[Infinity\]\)\[10\]\+"Id\("\+\(\+\(20\)\)\["to"\+String\["name"\]\]'
                r'\(21\)\+"\)."\+atob\(".*?"\)\)\); return \+\(p\)}\(\);',
                '{};'.format(
                    re.search(r'<div style="display:none;visibility:hidden;" id="(.*?)">(.*?)<\/div>',
                    body,
                    re.MULTILINE | re.DOTALL).group(2)
                ),
                body
            )
            js = re.search(r"setTimeout\(function\(\){\s+(var "
                        r"s,t,o,p,b,r,e,a,k,i,n,g,f.+?\r?\n[\s\S]+?a\.value =.+?)\r?\n", body).group(1)
        except Exception:
            raise ValueError("Unable to identify Cloudflare IUAM Javascript on website. %s" % BUG_REPORT)

        js = re.sub(r"a\.value = ((.+).toFixed\(10\))?", r"\1", js)
        js = re.sub(r"\s{3,}[a-z](?: = |\.).+", "", js).replace("t.length", str(len(domain)))

        js = js.replace('; 121', '')

        js = js.replace(
            'function(p){return eval((true+"")[0]+"."+([]["fill"]+"")[3]+(+(101))["to"+String["name"]](21)[1]+(false+"")'
            '[1]+(true+"")[1]+Function("return escape")()(("")["italics"]())[2]+(true+[]["fill"])[10]+(undefined+"")[2]+'
            '(true+"")[3]+(+[]+Array)[10]+(true+"")[0]+"("+p+")")}',
            't.charCodeAt'
        )

        # Strip characters that could be used to exit the string context
        # These characters are not currently used in Cloudflare's arithmetic snippet
        js = re.sub(r"[\n\\']", "", js)

        if "toFixed" not in js:
            raise ValueError("Error parsing Cloudflare IUAM Javascript challenge. %s" % BUG_REPORT)

        try:
            js = 'a = {{}}; t = "{}";{}'.format(domain, js)
            result = js2py.eval_js(js)
        except Exception:
            logging.error("Error executing Cloudflare IUAM Javascript. %s" % BUG_REPORT)
            raise

        try:
            float(result)
        except Exception:
            raise ValueError("Cloudflare IUAM challenge returned unexpected answer. %s" % BUG_REPORT)

        return result

    @classmethod
    def create_scraper(cls, sess=None, **kwargs):
        """
        Convenience function for creating a ready-to-go CloudflareScraper object.
        """
        scraper = cls(**kwargs)

        if sess:
            attrs = ["auth", "cert", "cookies", "headers", "hooks", "params", "proxies", "data"]
            for attr in attrs:
                val = getattr(sess, attr, None)
                if val:
                    setattr(scraper, attr, val)

        return scraper


    ## Functions for integrating cloudflare-scrape with other applications and scripts

    @classmethod
    def get_tokens(cls, url, user_agent=None, **kwargs):
        scraper = cls.create_scraper()
        if user_agent:
            scraper.headers["User-Agent"] = user_agent

        try:
            resp = scraper.get(url, **kwargs)
            resp.raise_for_status()
        except Exception as e:
            logging.error("'%s' returned an error. Could not collect tokens." % url)
            raise

        domain = urlparse(resp.url).netloc
        cookie_domain = None

        for d in scraper.cookies.list_domains():
            if d.startswith(".") and d in ("." + domain):
                cookie_domain = d
                break
        else:
            raise ValueError("Unable to find Cloudflare cookies. Does the site actually have Cloudflare IUAM (\"I'm Under Attack Mode\") enabled?")

        return (
            {
                "__cfduid": scraper.cookies.get("__cfduid", "", domain=cookie_domain),
                "cf_clearance": scraper.cookies.get("cf_clearance", "", domain=cookie_domain)
            },
            scraper.headers["User-Agent"]
        )

    @classmethod
    def get_cookie_string(cls, url, user_agent=None, **kwargs):
        """
        Convenience function for building a Cookie HTTP header value.
        """
        tokens, user_agent = cls.get_tokens(url, user_agent=user_agent, **kwargs)
        return "; ".join("=".join(pair) for pair in tokens.items()), user_agent

create_scraper = CloudflareScraper.create_scraper
get_tokens = CloudflareScraper.get_tokens
get_cookie_string = CloudflareScraper.get_cookie_string
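To illustrate how solve_cf_challenge derives its delay, here is the same regex from the code above run against a fabricated fragment of a challenge page (the fragment is made up; the pattern and the /1000 conversion are copied verbatim from the method):

```python
import re

# Fabricated snippet shaped like the setTimeout tail of a challenge page
body = "f.submit();\r\n    }, 4000);"

# Pattern copied from solve_cf_challenge above: grab the setTimeout
# interval in milliseconds and convert it to seconds
delay = float(re.search(r"submit\(\);\r?\n\s*},\s*([0-9]+)", body).group(1)) / float(1000)
```

The scraper then sleeps for this many seconds before submitting the answer, mirroring the wait Cloudflare enforces in the browser.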

ghost commented Apr 1, 2019

@VeNoMouS This issue has a lot of comments, and what's worse is that you're not using a pastebin. I see some issues with your code and could comment on specific lines if you would fork this repo to share a link or create a gist.

You've done a great job getting this to work with js2py which makes your library a perfect solution for a lot of projects that are out there. 💯

Do you intend on sending a PR and/or maintaining a fork?

VeNoMouS commented Apr 1, 2019

I'll do a repo for cloudflare-scrape using js2py :)

VeNoMouS commented Apr 1, 2019

Ok guys... for the time being I've created this repo to split off from this thread.

Krylanc3lo (Author)

Thanks again! I am closing this thread then.

THANK YOU ALL for your help :)
