
Crawling - HTML to import

transmogrify.webcrawler crawls HTML and extracts pages and files to serve as a source for your transmogrifier pipeline. transmogrify.webcrawler.typerecognitor helps set '_type' from the mimetype of each crawled item. transmogrify.webcrawler.cache speeds up crawling and reduces memory usage by storing items locally.

These blueprints are designed to work with the funnelweb pipeline, but they can also be used independently of it.
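As a rough illustration of how a crawler source, a type recognitor and a local cache fit together, here is a minimal sketch that models each stage as a Python generator consuming the items of the stage before it. It is not the transmogrify.webcrawler API. The function names `crawled_items`, `recognise_type` and `cache_locally`, the item keys `_path`, `_mimetype`, `_content` and `_cache`, and the mimetype-to-type mapping are all hypothetical; only the `_type` key comes from the description above.

```python
import os
import tempfile
from typing import Dict, Iterable, Iterator

Item = Dict[str, object]

# Hypothetical mapping: the real typerecognitor's rules may differ.
TYPE_BY_MIMETYPE = {"text/html": "Document"}


def crawled_items() -> Iterator[Item]:
    """Hypothetical stand-in for the crawler source: one dict per page or file."""
    yield {"_path": "index.html", "_mimetype": "text/html", "_content": b"<html></html>"}
    yield {"_path": "img/logo.png", "_mimetype": "image/png", "_content": b"\x89PNG"}
    yield {"_path": "docs/report.pdf", "_mimetype": "application/pdf", "_content": b"%PDF"}


def recognise_type(previous: Iterable[Item]) -> Iterator[Item]:
    """Set '_type' from each item's mimetype, then pass the item downstream."""
    for item in previous:
        mimetype = str(item.get("_mimetype", ""))
        if mimetype in TYPE_BY_MIMETYPE:
            item["_type"] = TYPE_BY_MIMETYPE[mimetype]
        elif mimetype.startswith("image/"):
            item["_type"] = "Image"
        else:
            item["_type"] = "File"
        yield item


def cache_locally(previous: Iterable[Item], cache_dir: str) -> Iterator[Item]:
    """Write each item's body to disk and keep only the file path in memory."""
    for item in previous:
        body = item.pop("_content", None)
        if isinstance(body, bytes):
            # Flatten the item path into a single file name inside cache_dir.
            filename = str(item["_path"]).replace("/", "_")
            cached = os.path.join(cache_dir, filename)
            with open(cached, "wb") as fh:
                fh.write(body)
            item["_cache"] = cached
        yield item


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        # Stages are chained like pipeline sections: source -> type -> cache.
        for item in cache_locally(recognise_type(crawled_items()), cache_dir):
            print(item["_path"], item["_type"])
```

Because each stage is a generator, only one item is in flight at a time, and the cache stage drops each body from memory once it has been written to disk.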