
cloudflare #371

Open

rnrnstar2 opened this issue Jun 16, 2020 · 1 comment

@rnrnstar2
Before creating an issue, first upgrade cfscrape with pip install -U cfscrape and see if you're still experiencing the problem. Please also confirm your Node version (node --version or nodejs --version) is version 10 or higher.

Make sure the website you're having issues with is actually using anti-bot protection by Cloudflare and not a competitor like Imperva Incapsula or Sucuri. And if you're using an anonymizing proxy, a VPN, or Tor, Cloudflare often flags those IPs and may block you or present you with a captcha as a result.
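One quick way to tell whether Cloudflare (rather than Imperva Incapsula or Sucuri) is serving the challenge is to inspect the response headers: Cloudflare responses normally carry Server: cloudflare and a CF-RAY header. A minimal sketch of that check — the header values below are illustrative placeholders, and in practice you would read them from your actual response object:

```python
# Illustrative headers from a challenge response (sample values, not real).
headers = {
    "Server": "cloudflare",
    "CF-RAY": "5a3c0e1d2b4f0000-NRT",  # hypothetical ray ID
}

def looks_like_cloudflare(headers):
    """Heuristic: Cloudflare responses set Server: cloudflare and a CF-RAY header."""
    return (
        headers.get("Server", "").lower() == "cloudflare"
        or "CF-RAY" in headers
    )

print(looks_like_cloudflare(headers))  # True
```

If neither header is present, the protection is likely from a different vendor and cfscrape is not expected to handle it.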

Please confirm the following statements and check the boxes before creating an issue:

  • I've upgraded cfscrape with pip install -U cfscrape
  • I'm using Node version 10 or higher
  • The site protection I'm having issues with is from Cloudflare
  • I'm not using Tor, a VPN, or an anonymizing proxy

Python version number

Run python --version and paste the output below:


cfscrape version number

Run pip show cfscrape and paste the output below:


Code snippet involved with the issue

2020-06-16 18:42:03 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: scraping)
2020-06-16 18:42:03 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.9, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 20.3.0, Python 3.7.7 (default, May  6 2020, 04:59:01) - [Clang 4.0.1 (tags/RELEASE_401/final)], pyOpenSSL 19.1.0 (OpenSSL 1.1.1g  21 Apr 2020), cryptography 2.9.2, Platform Darwin-19.5.0-x86_64-i386-64bit
2020-06-16 18:42:03 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'scraping', 'CONCURRENT_REQUESTS': 32, 'CONCURRENT_REQUESTS_PER_DOMAIN': 32, 'COOKIES_ENABLED': False, 'DOWNLOAD_DELAY': 2, 'DOWNLOAD_TIMEOUT': 600, 'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter', 'FEED_FORMAT': 'csv', 'FEED_URI': 'results/%(name)s_%(time)s.csv', 'HTTPCACHE_ENABLED': True, 'HTTPCACHE_EXPIRATION_SECS': 43200, 'HTTPCACHE_STORAGE': 'scrapy_splash.SplashAwareFSCacheStorage', 'NEWSPIDER_MODULE': 'scraping.spiders', 'SPIDER_MODULES': ['scraping.spiders'], 'USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36'}
2020-06-16 18:42:03 [scrapy.extensions.telnet] INFO: Telnet Password: e179fe629b29425b
2020-06-16 18:42:03 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.feedexport.FeedExporter',
 'scrapy.extensions.logstats.LogStats']
>>>>>>>>>>>>>>>>>__init__(MODES)<<<<<<<<<<<<<<<<<
2020-06-16 18:42:03 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy_crawlera.CrawleraMiddleware',
 'scrapy_splash.SplashCookiesMiddleware',
 'scrapy_splash.SplashMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats',
 'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware']
2020-06-16 18:42:03 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy_splash.SplashDeduplicateArgsMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2020-06-16 18:42:03 [scrapy.middleware] INFO: Enabled item pipelines:
['scraping.pipelines.ScrapingPipeline']
2020-06-16 18:42:03 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): www.modes.com:443
2020-06-16 18:42:04 [urllib3.connectionpool] DEBUG: https://www.modes.com:443 "GET /jp/shopping/woman HTTP/1.1" 503 None
Unhandled error in Deferred:
2020-06-16 18:42:04 [twisted] CRITICAL: Unhandled error in Deferred:

Traceback (most recent call last):
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/scrapy/crawler.py", line 172, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/scrapy/crawler.py", line 176, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/twisted/internet/defer.py", line 1613, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/twisted/internet/defer.py", line 1529, in _cancellableInlineCallbacks
    _inlineCallbacks(None, g, status)
--- <exception caught here> ---
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/scrapy/crawler.py", line 81, in crawl
    start_requests = iter(self.spider.start_requests())
  File "/Users/rnrnstar/github/Spiders/scraping/spiders/modes.py", line 41, in start_requests
    data = scraper.get("https://www.modes.com/jp/shopping/woman").content
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/requests/sessions.py", line 543, in get
    return self.request('GET', url, **kwargs)
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/cfscrape/__init__.py", line 129, in request
    resp = self.solve_cf_challenge(resp, **kwargs)
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/cfscrape/__init__.py", line 207, in solve_cf_challenge
    answer, delay = self.solve_challenge(body, domain)
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/cfscrape/__init__.py", line 299, in solve_challenge
    % BUG_REPORT
builtins.ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.

Please read https://github.com/Anorov/cloudflare-scrape#updates, then file a bug report at https://github.com/Anorov/cloudflare-scrape/issues."

2020-06-16 18:42:04 [twisted] CRITICAL: 
Traceback (most recent call last):
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/cfscrape/__init__.py", line 259, in solve_challenge
    javascript, flags=re.S
AttributeError: 'NoneType' object has no attribute 'groups'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/scrapy/crawler.py", line 81, in crawl
    start_requests = iter(self.spider.start_requests())
  File "/Users/rnrnstar/github/Spiders/scraping/spiders/modes.py", line 41, in start_requests
    data = scraper.get("https://www.modes.com/jp/shopping/woman").content
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/requests/sessions.py", line 543, in get
    return self.request('GET', url, **kwargs)
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/cfscrape/__init__.py", line 129, in request
    resp = self.solve_cf_challenge(resp, **kwargs)
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/cfscrape/__init__.py", line 207, in solve_cf_challenge
    answer, delay = self.solve_challenge(body, domain)
  File "/Users/rnrnstar/opt/anaconda3/envs/python_modules/lib/python3.7/site-packages/cfscrape/__init__.py", line 299, in solve_challenge
    % BUG_REPORT
ValueError: Unable to identify Cloudflare IUAM Javascript on website. Cloudflare may have changed their technique, or there may be a bug in the script.

Please read https://github.com/Anorov/cloudflare-scrape#updates, then file a bug report at https://github.com/Anorov/cloudflare-scrape/issues."
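The chained AttributeError above shows the root cause: cfscrape's solve_challenge runs a regex over the page body, and when Cloudflare's challenge markup changes, re.search returns None, so the subsequent .groups() call raises before cfscrape converts it into the ValueError. A stdlib-only sketch of that failure mode — the pattern and page text here are placeholders, not cfscrape's actual regex:

```python
import re

# Placeholder stand-ins for cfscrape's challenge regex and the fetched page body.
pattern = r"setTimeout\(function\(\){\s+(var .+?)\r?\n"
body = "<html>no challenge script here</html>"

match = re.search(pattern, body, flags=re.S)
print(match)  # None: the expected challenge markup was not found

try:
    match.groups()  # effectively what cfscrape does next
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'groups'
```

This is why the fix has to come from an updated parser in cfscrape itself rather than from the calling spider.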

Complete exception and traceback

(If the problem doesn't involve an exception being raised, leave this blank)


URL of the Cloudflare-protected page

[LINK GOES HERE]

URL of Pastebin/Gist with HTML source of protected page

[LINK GOES HERE]

@rnrnstar2 rnrnstar2 added the bug label Jun 16, 2020
@Sraq-Zit

Try #373 — I tested it with your link and it worked.
