
First PyPI Release

binux released this 11 Jan 05:38
  • Many bugs fixed.
  • Make pyspider a single top-level package. (thanks to zbb, iamtew and fmueller from HN)
  • Python 3 support!
  • Use click to create a better command line interface.
  • PostgreSQL supported via SQLAlchemy (through SQLAlchemy, pyspider also supports Oracle, SQL Server, etc.).
  • Benchmark test.
  • Documentation & tutorial: http://docs.pyspider.org/
  • Flake8 cleanup (thanks to @jtwaleson)

Base

  • Use MessagePack instead of pickle in the message queue.
  • JSON data is encoded as a base64 string when the content is binary.
  • RabbitMQ lazy limit for better performance.
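
As a rough illustration of the binary-content rule above: JSON cannot carry raw bytes, so binary content has to be turned into text first. The sketch below uses only the standard library. The function names `encode_content` and `decode_content` and the `"encoding"` marker field are assumptions made for this example, not pyspider's actual wire format.

```python
import base64
import json


def encode_content(content):
    """Serialize content to JSON; bytes are wrapped as base64 text."""
    if isinstance(content, bytes):
        payload = {
            "encoding": "base64",  # hypothetical marker field
            "content": base64.b64encode(content).decode("ascii"),
        }
    else:
        payload = {"encoding": "text", "content": content}
    return json.dumps(payload)


def decode_content(raw):
    """Reverse encode_content, restoring bytes when base64 was used."""
    payload = json.loads(raw)
    if payload["encoding"] == "base64":
        return base64.b64decode(payload["content"])
    return payload["content"]
```

Text passes through unchanged, while binary content round-trips through base64 without loss.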

Scheduler

  • Never re-crawl a task with a negative age.
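
One way to read the rule above is as a guard in front of the usual expiry check. The following is a minimal sketch under that assumption; `should_recrawl` and its parameters are hypothetical names, not the scheduler's real code.

```python
import time


def should_recrawl(last_crawl_time, age, now=None):
    """Decide whether a previously crawled task is due again.

    age is the validity period in seconds. A negative age is treated
    as "valid forever", so such a task is never re-crawled.
    """
    if age < 0:
        return False
    if now is None:
        now = time.time()
    # The task is due once its validity period has elapsed.
    return last_crawl_time + age < now
```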

Fetcher

  • The proxy parameter supports the ip:port format.
  • Increase the default fetcher poolsize to 100.
  • PhantomJS returns the JS script result in Response.js_script_result.
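
To make the ip:port proxy format concrete, here is a small sketch of how such a string could be split into host and port. `parse_proxy` is a hypothetical helper written for this example; the fetcher's own parsing may differ.

```python
def parse_proxy(proxy):
    """Split an "ip:port" string into a (host, port) tuple."""
    host, sep, port = proxy.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("expected ip:port, got %r" % proxy)
    return host, int(port)
```

For example, `parse_proxy("127.0.0.1:8080")` yields the host string and an integer port, and a value without a port raises `ValueError`.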

Processor

  • Put multiple new tasks in one package, improving performance with RabbitMQ.
  • Do not store all of the headers on success.

Script

  • Add an interface, get_taskid, to generate the taskid from the task object.
  • Tasks are de-duplicated by project and taskid.
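
The two Script items fit together: a custom taskid decides which tasks count as "the same". The sketch below is standalone and only illustrative. In pyspider, get_taskid would be defined on the handler class. The task layout used here (`url`, `project`, `fetch.data`) and the `deduplicate` helper are assumptions for the example.

```python
import hashlib
import json


def get_taskid(task):
    """Derive a taskid from the URL plus any request data.

    Including the data lets two POST requests to the same URL
    count as different tasks.
    """
    data = task.get("fetch", {}).get("data", "")
    key = task["url"] + json.dumps(data, sort_keys=True)
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def deduplicate(tasks):
    """Keep the first task seen for each (project, taskid) pair."""
    seen = set()
    unique = []
    for task in tasks:
        key = (task["project"], get_taskid(task))
        if key not in seen:
            seen.add(key)
            unique.append(task)
    return unique
```

Because the key is the pair of project and taskid, the same URL crawled by two different projects is kept twice.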

Webui

  • Project list is sortable.
  • Return a 404 page when dumping a project that does not exist.
  • Web preview supports images.