Releases
v0.3.0
binux released this 11 Jan 05:38
Many bugs fixed.
Make pyspider a single top-level package. (thanks to zbb, iamtew and fmueller from HN)
Python 3 support!
Use click to create a better command line interface.
PostgreSQL supported via SQLAlchemy (through SQLAlchemy, pyspider also supports Oracle, SQL Server, etc.).
Benchmark test.
Documentation & tutorial: http://docs.pyspider.org/
Flake8 cleanup (thanks to @jtwaleson)
Base
Use MessagePack instead of pickle in the message queue.
JSON data is encoded as a base64 string when the content is binary.
RabbitMQ lazy limit for better performance.
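The base64 item above can be illustrated with a minimal sketch. The helper names encode_content and decode_content and the "encoding" marker key are hypothetical, chosen for illustration; they are not pyspider's actual internals.

```python
import base64
import json


def encode_content(content):
    """Wrap content so it can be stored inside a JSON document."""
    if isinstance(content, bytes):
        # Raw bytes are not valid JSON text, so store them as base64.
        encoded = base64.b64encode(content).decode("ascii")
        return {"encoding": "base64", "data": encoded}
    return {"encoding": "text", "data": content}


def decode_content(wrapped):
    """Reverse encode_content, returning bytes or str as originally given."""
    if wrapped["encoding"] == "base64":
        return base64.b64decode(wrapped["data"])
    return wrapped["data"]


# Round-trip a binary payload (the PNG file signature) through JSON.
binary = b"\x89PNG\r\n\x1a\n"
payload = json.dumps(encode_content(binary))
restored = decode_content(json.loads(payload))
```

Text content passes through unchanged; only binary content pays the size overhead of base64.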
Scheduler
Never re-crawl a task with a negative age.
Fetcher
proxy parameter supports ip:port format.
Increase default fetcher poolsize to 100.
PhantomJS will return JS script result in Response.js_script_result.
Processor
Put multiple new tasks in one package, for better performance with RabbitMQ.
Do not store all of the headers on success.
Script
Add an interface, get_taskid, to generate the taskid from the task object.
Tasks are de-duplicated by project and taskid.
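As a rough sketch of how a custom get_taskid and the (project, taskid) de-duplication rule fit together, assuming a task is a dict with a url key and an optional fetch.data payload. HandlerSketch, is_new and the seen set are hypothetical stand-ins for illustration, not pyspider's own classes or scheduler code.

```python
import hashlib
import json


class HandlerSketch:
    """Stand-in for a project script; not pyspider's BaseHandler."""

    def get_taskid(self, task):
        # Hash the URL together with the request data so that two requests
        # to the same URL with different payloads get different task ids.
        data = task.get("fetch", {}).get("data", "")
        key = task["url"] + json.dumps(data, sort_keys=True)
        return hashlib.md5(key.encode("utf-8")).hexdigest()


handler = HandlerSketch()
seen = set()  # (project, taskid) pairs that were already accepted


def is_new(project, task):
    """Return True the first time a (project, taskid) pair is seen."""
    key = (project, handler.get_taskid(task))
    if key in seen:
        return False
    seen.add(key)
    return True


task_a = {"url": "http://example.com/", "fetch": {"data": {"page": 1}}}
task_b = {"url": "http://example.com/", "fetch": {"data": {"page": 2}}}
```

With this sketch, submitting task_a twice to the same project is accepted only once, while task_b, or task_a under a different project name, counts as a new task.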
Webui
Project list sortable.
Return a 404 page when dumping a project that does not exist.
Web preview supports images.