A Python module that supports another project, bububa.Lego, by providing several advanced web-scraping functions.
bububa/SuperMario
= About SuperMario =

SuperMario is an advanced web crawler library written in Python. It provides a number of methods for mining data from various kinds of sites.

== License ==

BSD License. See 'LICENSE' for details.

== Requirements ==

Platform: *nix-like system (Unix, Linux, Mac OS X, etc.)
Python: 2.5+
Storage: MongoDB

Other Python modules:
- simplejson
- BeautifulSoup
- eventlet
- PIL
- pycurl
- chardet
- feedparser
- mongokit
- templatemaker
- flickrapi
- pyyaml
- MySQLdb
- dateutil

== Features ==

+ robots.txt protocol support;
+ caching of each URL's HTML;
+ URL normalization;
+ conversion of all content to unicode;
+ MainText extraction from HTML using a specified *link-threshold*;
+ conversion of partial RSS feeds to full RSS feeds;
+ proxy list support;
+ cookie persistence;
+ login support.
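
The following is a minimal sketch of three of the ideas in the feature list (URL normalization, a robots.txt check, and a link-density measure that a link threshold could be applied to). It is not SuperMario's API: the names normalize_url, allowed_by_robots and link_density are hypothetical, the reading of "link-threshold" as a ratio of linked text to total text is an assumption, and the code uses only the Python 3 standard library rather than the Python 2 modules listed under Requirements.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url):
    """Lowercase scheme and host, drop default ports and fragments.

    Hypothetical helper; user info (user:pass@) is also dropped.
    """
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = "%s:%d" % (host, port)
    path = parts.path or "/"  # an empty path becomes "/"
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def allowed_by_robots(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class _LinkTextCounter(HTMLParser):
    """Counts text characters overall and inside <a> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # current nesting depth of <a> tags
        self.total = 0   # all non-whitespace-trimmed text characters
        self.linked = 0  # text characters inside links

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        size = len(data.strip())
        self.total += size
        if self.depth:
            self.linked += size


def link_density(html_fragment):
    """Fraction of text that sits inside links (0.0 for empty input)."""
    counter = _LinkTextCounter()
    counter.feed(html_fragment)
    counter.close()
    return counter.linked / counter.total if counter.total else 0.0
```

Under these assumptions, a block whose link_density exceeds a chosen threshold (navigation bars, tag clouds) would be discarded, and the remaining blocks would be treated as candidates for the main text.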