jmoiron/arachne

arachne

Arachne is meant to be a next-generation version of hiispider, a flexible web spider written at hiidef for flavors.me. It shares a very similar high-level architecture but implements it differently to achieve a few important objectives:

  • HTTP interfaces should be rich and easily extendable
  • Plugins should be easy to run synchronously
  • DRY-ness in the resultant plugin code
  • Should not depend on undocumented architectural decisions

Asynchronicity is achieved with gevent; users of arachne are expected to apply gevent's monkey patching themselves. Without patching, arachne behaves synchronously, and nearly all of its clients and libraries are usable from the Python shell.
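A minimal sketch of what "patching" looks like in a program built on arachne, assuming the standard `gevent.monkey.patch_all()` call made before anything else is imported. `fetch` and `run_all` are hypothetical names used only for illustration; they are not part of arachne. If gevent is missing, the same code simply runs synchronously.

```python
# Hypothetical entry point for a program built on arachne.
# Patch first, before importing arachne or anything else that opens sockets.
try:
    from gevent import monkey
    monkey.patch_all()  # makes socket, time.sleep, etc. cooperative
    from gevent.pool import Pool
except ImportError:
    Pool = None  # gevent not installed: run everything synchronously

import time


def fetch(n):
    """Stand-in for a plugin method doing blocking I/O (hypothetical)."""
    time.sleep(0.05)  # yields to other greenlets once time is patched
    return n * n


def run_all(items, concurrency=10):
    """Run fetch over items, concurrently if gevent is available."""
    if Pool is None:
        return [fetch(i) for i in items]
    return Pool(concurrency).map(fetch, items)


if __name__ == "__main__":
    print(run_all(range(5)))
```

Either way the results come back in input order; only the wall-clock time differs.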

architectural overview

Arachne is split into three major pieces:

  • A scheduler which puts jobs on a queue
  • A worker which executes scheduled jobs
  • An interface which runs jobs on demand via HTTP
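The division of labour between the three pieces can be sketched with an in-process `queue.Queue`. This is an assumption for illustration only: a real deployment would use a shared queue such as AMQP, and the job tuple, `PLUGINS` table and function names below are hypothetical.

```python
import queue

# Hypothetical job format: (plugin_name, method_name, kwargs)
jobs = queue.Queue()


def echo(**kwargs):
    """Stand-in plugin method that just returns its arguments."""
    return kwargs


# Hypothetical lookup table from (plugin, method) to a callable.
PLUGINS = {("demo", "echo"): echo}


def schedule(plugin, method, **kwargs):
    """Scheduler: put a job on the queue."""
    jobs.put((plugin, method, kwargs))


def work_once():
    """Worker: take one job off the queue and execute it."""
    plugin, method, kwargs = jobs.get()
    try:
        return PLUGINS[(plugin, method)](**kwargs)
    finally:
        jobs.task_done()


def run_on_demand(plugin, method, **kwargs):
    """Interface: run a job immediately, as an HTTP handler would."""
    return PLUGINS[(plugin, method)](**kwargs)
```

The scheduler and worker only share the queue, so either side can be scaled or restarted independently, while the interface bypasses the queue entirely.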

Every job is tied to a method implemented in a plugin. Arachne makes certain basic assumptions and decisions, and takes care of the following:

  • Mapping URLs to plugin methods
  • Basic plugin execution and result storage
  • Registration and lookup for available plugins
  • Associating a run-interval (every n seconds) with each plugin method
  • Daemonization, start/stop/restart & pidfiles
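A rough sketch of how plugin registration, URL-to-method mapping and per-method run intervals might fit together. The `register` and `every` decorators, the `/plugin/method/` URL scheme and `ExamplePlugin` are all hypothetical; arachne's actual plugin API may look quite different.

```python
import re

REGISTRY = {}  # plugin name -> plugin class


def register(cls):
    """Record a plugin class under its `name` attribute."""
    REGISTRY[cls.name] = cls
    return cls


def every(seconds):
    """Attach a run interval (in seconds) to a plugin method."""
    def deco(fn):
        fn.interval = seconds
        return fn
    return deco


@register
class ExamplePlugin:
    name = "example"

    @every(600)
    def timeline(self, user):
        return {"user": user, "items": []}


def resolve(path):
    """Map a URL path like /example/timeline/ to a bound plugin method."""
    m = re.fullmatch(r"/(\w+)/(\w+)/?", path)
    if not m:
        raise LookupError(path)
    plugin_name, method_name = m.groups()
    plugin = REGISTRY[plugin_name]()  # KeyError (a LookupError) if unknown
    method = getattr(plugin, method_name, None)
    # Only methods marked with @every are exposed.
    if method is None or not hasattr(method, "interval"):
        raise LookupError(path)
    return method


def intervals():
    """List (plugin, method, seconds) for every scheduled method."""
    out = []
    for name, cls in REGISTRY.items():
        for attr, val in vars(cls).items():
            if callable(val) and hasattr(val, "interval"):
                out.append((name, attr, val.interval))
    return out
```

A scheduler could iterate over `intervals()` to decide what to enqueue, while the HTTP interface calls `resolve()` on each request path.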

You will have to decide:

  • What a "job" looks like coming on and off the queue
  • Where and how to store plugin results
  • How to schedule those jobs
  • How to store data necessary to run the jobs
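As one possible answer to the first question, a job could be a small JSON document. The `Job` fields below are an assumption made for illustration, not a format arachne prescribes.

```python
import json
import time
from dataclasses import dataclass, field, asdict


@dataclass
class Job:
    """Hypothetical job record that travels on and off the queue."""
    plugin: str
    method: str
    kwargs: dict = field(default_factory=dict)
    scheduled_at: float = field(default_factory=time.time)

    def dumps(self) -> str:
        """Serialize to a JSON string suitable for a queue message body."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def loads(cls, raw: str) -> "Job":
        """Rebuild a Job from a queue message body."""
        return cls(**json.loads(raw))
```

JSON keeps the payload readable from any language; the trade-off is that `kwargs` must stay JSON-serializable.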

batteries

Arachne comes with a number of batteries included:

  • a simple no-magic configuration management system
  • a rich HTTP library based on requests, with:
    • header caching on a pluggable backend (e.g. memcached)
    • JSON/XML parsing driven by response headers, with forced overrides
    • OAuth 1.0a helpers (via requests-oauth)
    • alternate session-style helpers with base-URL support
  • a memcached wrapper based on ultramemcache
  • a MySQL wrapper based on ultramysql
  • an AMQP client based on kombu and amqplib
  • a Cassandra client based on pycassa
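Header-based parsing with a forced override can be illustrated with the standard library alone. This `parse_body` function is a hypothetical stand-in, not arachne's HTTP API: it picks JSON or XML from the `Content-Type` header unless the caller forces a format.

```python
import json
import xml.etree.ElementTree as ET


def parse_body(body, content_type="", force=None):
    """Parse a response body as JSON or XML.

    force: "json" or "xml" overrides whatever the Content-Type header says.
    Unrecognized types are returned unparsed.
    """
    kind = force
    if kind is None:
        # Drop parameters such as "; charset=utf-8" before matching.
        ctype = content_type.split(";", 1)[0].strip().lower()
        if ctype.endswith("json"):  # application/json, application/ld+json
            kind = "json"
        elif ctype.endswith("xml"):  # text/xml, application/atom+xml
            kind = "xml"
    if kind == "json":
        return json.loads(body)
    if kind == "xml":
        return ET.fromstring(body)
    return body
```

The override matters in practice because many APIs serve JSON as `text/plain` or `text/html`.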

All of these clients will attempt to auto-configure with arachne's configuration management system.
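Auto-configuration without magic might look like the following: a plain settings dict populated explicitly, with clients falling back to it when no argument is passed. `settings`, `configure`, `MemcacheClient` and the `memcached_servers` key are hypothetical names; the default `127.0.0.1:11211` is simply memcached's conventional port.

```python
# Hypothetical "no-magic" settings store: a plain dict, populated explicitly.
settings = {}


def configure(**kwargs):
    """Merge explicit configuration values into the global settings."""
    settings.update(kwargs)


class MemcacheClient:
    """Stand-in client; real arachne clients wrap ultramemcache etc."""

    def __init__(self, servers=None):
        # Explicit arguments win; otherwise fall back to global configuration.
        if servers is None:
            servers = settings.get("memcached_servers", ["127.0.0.1:11211"])
        self.servers = servers
```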

About

a complex but scalable web spider
