pyspider

A Powerful Spider (Web Crawler) System in Python.

Tutorial: http://docs.pyspider.org/en/latest/tutorial/
Documentation: http://docs.pyspider.org/
Release notes: https://github.com/binux/pyspider/releases

Sample Code

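from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    # Default self.crawl options for the whole project (none set here).
    crawl_config = {
    }

    # Entry point: rerun once every 24 hours.
    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://scrapy.org/', callback=self.index_page)

    # Treat a fetched page as fresh for 10 days (age is in seconds).
    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        # Queue every link whose href starts with "http".
        for each in response.doc('a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    def detail_page(self, response):
        # The returned dict is stored as the result for this page.
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }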

Demo

Installation

WARNING: The WebUI is open to the public by default and can be used to execute arbitrary commands that may harm your system. Use it only on an internal network, or enable need-auth for the WebUI.
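
One way to enable need-auth is through a JSON config file. The sketch below builds such a file in Python. It assumes the `webui` section accepts the keys `username`, `password`, and `need-auth`, as described in the pyspider deployment docs; the helper name `build_webui_auth_config` and the credentials are hypothetical placeholders, so check the keys against the version you run.

```python
import json


def build_webui_auth_config(username, password):
    """Return a config dict that turns on WebUI authentication.

    Assumption: pyspider reads these keys from the "webui" section
    of the file passed with --config. Verify against your version.
    """
    if not username or not password:
        raise ValueError("username and password must both be non-empty")
    return {
        "webui": {
            "username": username,
            "password": password,
            "need-auth": True,
        }
    }


if __name__ == "__main__":
    # Placeholder credentials; replace them before real use.
    config = build_webui_auth_config("some_name", "some_passwd")
    print(json.dumps(config, indent=2))
```

Save the printed JSON as `config.json` and start pyspider with `pyspider --config config.json`. Keeping credentials in a file rather than on the command line keeps them out of shell history.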

Quickstart: http://docs.pyspider.org/en/latest/Quickstart/

Contribute

TODO

v0.4.0

  • A visual scraping interface like Portia

License

Licensed under the Apache License, Version 2.0
