A Python module that supports another project, bububa.Lego, by providing several advanced web-scraping functions.

bububa/SuperMario

= About SuperMario =
    SuperMario is an advanced web crawler library written in Python. It
provides a number of methods for mining data from many kinds of sites.

== License ==
BSD License
See 'LICENSE' for details.

== Requirements ==
Platform: *nix-like system (Unix, Linux, Mac OS X, etc.)
Python: 2.5+
Storage: MongoDB
Other Python modules:
    - simplejson
    - BeautifulSoup
    - eventlet 
    - PIL
    - pycurl
    - chardet
    - feedparser
    - mongokit
    - templatemaker
    - flickrapi
    - pyyaml
    - MySQLdb
    - dateutil

== Features ==
  + robots.txt protocol support;
  + caching of each URL's HTML;
  + URL normalization;
  + conversion of all content to Unicode;
  + main-text extraction from HTML using a specified *link-threshold*;
  + conversion of partial RSS feeds into full RSS feeds;
  + proxy list support;
  + cookie persistence;
  + login support.
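The README does not show how SuperMario checks robots.txt. As a minimal sketch of the idea, assuming Python 3 (the README itself targets Python 2.5+), the standard library's `urllib.robotparser` can evaluate rules that were already downloaded or cached. The helper `allowed_by_robots`, the user agent `SuperMarioBot`, and the sample rules are hypothetical, not part of SuperMario.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check url against already-downloaded robots.txt rules (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so the rules can come from a
    # cache instead of a fresh network request via read().
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: every crawler is kept out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

print(allowed_by_robots(ROBOTS_TXT, "SuperMarioBot", "http://example.com/index.html"))  # True
print(allowed_by_robots(ROBOTS_TXT, "SuperMarioBot", "http://example.com/private/a.html"))  # False
```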
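URL normalization usually means rewriting equivalent URLs into one canonical form so that the cache and the crawl queue do not hold duplicates. SuperMario's exact rules are not documented here; this Python 3 sketch applies a few common ones, chosen as assumptions: lowercase the scheme and host, drop the default port and the fragment, use `/` for an empty path, and sort the query parameters. `normalize_url` is a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed defaults; other schemes keep any explicit port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Return a canonical form of url (hypothetical helper, not SuperMario's API)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    # .hostname is already lowercased; user:password@ credentials are dropped.
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literals need their brackets back
        host = f"[{host}]"
    netloc = host
    if parts.port is not None and DEFAULT_PORTS.get(scheme) != parts.port:
        netloc = f"{host}:{parts.port}"
    path = parts.path or "/"
    # Sorting makes ?b=2&a=1 and ?a=1&b=2 compare equal; values are re-encoded.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # The fragment never reaches the server, so it is discarded.
    return urlunsplit((scheme, netloc, path, query, ""))


print(normalize_url("HTTP://Example.COM:80/a?b=2&a=1#frag"))
# http://example.com/a?a=1&b=2
```

Sorting query parameters is a judgment call: it merges duplicates well but can be wrong for the rare site where parameter order matters.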
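The README does not define the *link-threshold* used for main-text extraction. One common interpretation, assumed here, is link density: split the page into text blocks, measure what fraction of each block's text sits inside `<a>` tags, and keep only blocks at or below the threshold, since navigation bars and link lists are dense with anchors while article paragraphs are not. This Python 3 sketch uses the standard library's `html.parser` instead of BeautifulSoup; `extract_main_text`, `link_threshold`, `min_chars`, and the block-tag set are illustrative choices, not SuperMario's API.

```python
from html.parser import HTMLParser

# Assumed block-level tags that start or end a text block.
BLOCK_TAGS = {"p", "div", "td", "li", "ul", "ol", "table", "tr", "section",
              "article", "header", "footer", "nav", "blockquote", "pre",
              "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style", "noscript"}


class _BlockCollector(HTMLParser):
    """Collects (text, link_chars) pairs, one per block of text."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._parts = []
        self._link_chars = 0
        self._link_depth = 0
        self._skip_depth = 0

    def flush_block(self):
        # Collapse whitespace; empty blocks are ignored.
        text = " ".join("".join(self._parts).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._parts = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            self._link_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush_block()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "a":
            self._link_depth = max(0, self._link_depth - 1)
        elif tag in BLOCK_TAGS:
            self.flush_block()

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script/style contents
        self._parts.append(data)
        if self._link_depth:
            self._link_chars += len(" ".join(data.split()))


def extract_main_text(html: str, link_threshold: float = 0.3, min_chars: int = 20) -> str:
    """Keep blocks whose share of link text is at most link_threshold."""
    collector = _BlockCollector()
    collector.feed(html)
    collector.close()
    collector.flush_block()  # text after the last block tag
    kept = [
        text
        for text, link_chars in collector.blocks
        if len(text) >= min_chars and link_chars / len(text) <= link_threshold
    ]
    return "\n".join(kept)


page = """<html><body>
<div><a href="/">Home</a> | <a href="/news">News</a> | <a href="/about">About us</a></div>
<p>Article paragraphs like this one are mostly plain prose with few links.</p>
<p>Read <a href="/more">more</a> about the topic in this second, ordinary paragraph.</p>
<script>var tracking = 1;</script>
</body></html>"""

print(extract_main_text(page))
```

The navigation `<div>` is dropped because most of its characters are link text, while both paragraphs survive.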
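Turning a partial RSS feed into a full one generally means following each item's `<link>`, extracting the article body from that page, and writing it back into the item. Below is a minimal Python 3 sketch with `xml.etree.ElementTree`, assuming a plain RSS 2.0 feed. The `fetch` and `extract` callables are hypothetical hooks (for example, an HTTP client and a main-text extractor), and SuperMario itself lists feedparser for feed handling, so its approach may differ.

```python
import xml.etree.ElementTree as ET
from typing import Callable


def expand_feed(rss_xml: str,
                fetch: Callable[[str], str],
                extract: Callable[[str], str]) -> str:
    """Replace each item's <description> with the full text of its linked page."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if not link:
            continue
        full_text = extract(fetch(link))  # hypothetical hooks supplied by the caller
        if not full_text:
            continue  # keep the original summary if extraction found nothing
        description = item.find("description")
        if description is None:
            description = ET.SubElement(item, "description")
        description.text = full_text
    return ET.tostring(root, encoding="unicode")


FEED = """<rss version="2.0"><channel><title>Demo</title>
<item><title>Post</title><link>http://example.com/post</link>
<description>Short teaser</description></item>
</channel></rss>"""

# Stand-in hooks for the example; a real crawler would download and extract.
pages = {"http://example.com/post": "<p>The complete article text.</p>"}
strip_p = lambda html: html.replace("<p>", "").replace("</p>", "")
print(expand_feed(FEED, fetch=pages.__getitem__, extract=strip_p))
```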
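Proxy-list and cookie support can be combined with the standard library's `urllib.request`: a shared `http.cookiejar.CookieJar` keeps session cookies (including those set by a login form) across requests, while each new opener routes through the next proxy in the list. This is a Python 3 sketch under assumptions; the proxy addresses and the `RotatingFetcher` class are hypothetical, and SuperMario's own requirements list pycurl, so its implementation may differ entirely.

```python
import itertools
import urllib.request
from http.cookiejar import CookieJar

# Hypothetical proxy list; replace with real proxy URLs.
PROXIES = ["http://127.0.0.1:8080", "http://127.0.0.1:8081"]


class RotatingFetcher:
    """Round-robin over proxies while sharing one cookie jar (a sketch)."""

    def __init__(self, proxies):
        # One jar for every opener, so a login session survives proxy changes.
        self.cookies = CookieJar()
        self._proxies = itertools.cycle(proxies)

    def next_opener(self):
        """Return (opener, proxy) for the next proxy in the list."""
        proxy = next(self._proxies)
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({"http": proxy, "https": proxy}),
            urllib.request.HTTPCookieProcessor(self.cookies),
        )
        return opener, proxy


fetcher = RotatingFetcher(PROXIES)
for _ in range(3):
    opener, proxy = fetcher.next_opener()
    print(proxy)  # the 8080, 8081, then 8080 proxy URLs
```

A login form would then be submitted through one of these openers, for example `opener.open(login_url, data=urllib.parse.urlencode(fields).encode())` with a hypothetical `login_url` and form `fields`; any session cookie the site sets is stored in `fetcher.cookies` and sent on later requests, whichever proxy they use.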
