A Python module that supports another project, bububa.Lego, and provides several advanced web scraping functions.

= About SuperMario =
    SuperMario is an advanced web crawler library written in Python. It
provides a number of methods for mining data from many kinds of sites.

== License ==
BSD License
See 'LICENSE' for details.

== Requirements ==
Platform: *nix-like system (Unix, Linux, Mac OS X, etc.)
Python: 2.5+
Storage: MongoDB
Additional Python modules:
    - simplejson
    - BeautifulSoup
    - eventlet 
    - PIL
    - pycurl
    - chardet
    - feedparser
    - mongokit
    - templatemaker
    - flickrapi
    - pyyaml
    - MySQLdb
    - dateutil

== Features ==
  + robots.txt protocol support;
  + caching of the HTML fetched for each URL;
  + URL normalization;
  + conversion of all content to Unicode;
  + main-text extraction from HTML using a specified *link-threshold*;
  + conversion of partial RSS feeds to full RSS feeds;
  + proxy list support;
  + persistent cookie support;
  + login support.
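
To give a rough idea of what two of the features above involve (URL
normalization and robots.txt checking), here is a minimal sketch. It is
not SuperMario's API: it uses only the Python 3 standard library, and the
function names normalize_url and is_allowed are hypothetical names chosen
for illustration.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Ports that can be dropped because they are implied by the scheme.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url):
    """Hypothetical helper, not part of SuperMario.

    Lower-cases the scheme and host, drops a default port and the
    fragment, and makes sure the path is not empty. Any user:password
    part of the URL is discarded in this simplified version.
    """
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = "%s:%d" % (host, parts.port)
    path = parts.path or "/"
    # The empty last element removes the fragment.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def is_allowed(robots_txt, user_agent, url):
    """Hypothetical helper, not part of SuperMario.

    Parses the text of a robots.txt file that has already been
    downloaded and reports whether user_agent may fetch url.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    url = normalize_url("HTTP://Example.COM:80/private/page#top")
    print(url)
    print(is_allowed(rules, "examplebot", url))
```

A crawler would typically normalize a URL first, so that equivalent URLs
share one cache entry, and then consult robots.txt before fetching it.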