What I wanted:
Use the `--quota` option so that wpull stops after X bytes have been downloaded to the WARC file.

What I expect:
It stops crawling when it hits the quota, but leaves Wpull hanging (see output below).

The command or website causes the problem:
`wpull http://www.techcrunch.com --quota 1000000 --warc-file techcrunch.com -rH --warc-cdx --level 2 -Dtechcrunch.com --no-check-certificate`

Reducing the quota below this amount works as expected, for example (one less 0):
`wpull http://www.techcrunch.com --quota 100000 --warc-file techcrunch.com -rH --warc-cdx --level 2 -Dtechcrunch.com --no-check-certificate`

Operating system:
10.11.6

Python version:
3.6.1

Wpull version:
2.0.1

Log/Output:
```
[ O ] 6.0 B 0:01:49 -1.1 KiB/s
INFO Fetched ‘https://techcrunch.com/2017/03/28/uber-restarts-self-driving-passenger-pilots-in-arizona-and-pittsburgh/?ncid=mobilenavtrend’: 200 OK. Length: unspecified [text/html; charset=UTF-8].
```
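For anyone trying to reproduce this, here is a rough sketch of a watchdog wrapper that runs the command above and reports whether it exited on its own or had to be killed. The function name `run_with_watchdog`, the constant `WPULL_CMD`, and the timeout value are my own assumptions, not part of Wpull. A timeout cannot tell a genuine hang from a slow crawl, so treat the result as a hint only.

```python
import subprocess
import sys

# The exact command from this report, split into arguments.
WPULL_CMD = [
    "wpull", "http://www.techcrunch.com",
    "--quota", "1000000",
    "--warc-file", "techcrunch.com",
    "-rH", "--warc-cdx", "--level", "2",
    "-Dtechcrunch.com", "--no-check-certificate",
]


def run_with_watchdog(cmd, timeout_seconds):
    """Run cmd and wait up to timeout_seconds (hypothetical helper).

    Returns ("exited", returncode) if the process finished on its own,
    or ("hung", None) if it was still running when the timeout expired.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return ("hung", None)
    return ("exited", completed.returncode)
```

With wpull installed, `run_with_watchdog(WPULL_CMD, 600)` would be expected to return `("hung", None)` if the behaviour described here occurs, assuming ten minutes is long enough for the quota to be reached.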