Crawling - HTML to import

transmogrify.webcrawler crawls HTML to extract pages and files, which then serve as a source for your transmogrifier pipeline. transmogrify.webcrawler.typerecognitor helps set '_type' based on each crawled item's mimetype. transmogrify.webcrawler.cache speeds up crawling and reduces memory usage by storing items locally.
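
Transmogrifier sections are chained iterators: each one consumes the items yielded by the previous section and passes them on. The sketch below illustrates that idea with a crawler-like source followed by a type-recognizing step that fills in '_type' from a mimetype. It is not the code of transmogrify.webcrawler or its typerecognitor. The function names (crawled_items, recognize_type), the item keys ('_path', '_mimetype') and the mimetype-to-type mapping are all hypothetical assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical mimetype -> content-type mapping (an assumption; the real
# typerecognitor's rules may differ).
MIMETYPE_TO_TYPE = {
    "text/html": "Document",
    "image/png": "Image",
    "image/jpeg": "Image",
    "image/gif": "Image",
}
DEFAULT_TYPE = "File"


def crawled_items() -> Iterator[Dict[str, str]]:
    """Hypothetical stand-in for a crawler source: yields one dict per page or file.

    The '_path' and '_mimetype' keys are assumptions made for this sketch.
    """
    yield {"_path": "index.html", "_mimetype": "text/html"}
    yield {"_path": "logo.png", "_mimetype": "image/png"}
    yield {"_path": "report.pdf", "_mimetype": "application/pdf"}


def recognize_type(previous: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Hypothetical type-recognizing step: sets '_type' from the mimetype.

    Like a pipeline section, it consumes the previous step's items lazily
    and yields each one on to the next step.
    """
    for item in previous:
        mimetype = item.get("_mimetype", "")
        # Drop parameters such as "; charset=utf-8" before the lookup.
        base = mimetype.split(";", 1)[0].strip().lower()
        # Only set '_type' if an earlier step has not already set it.
        item.setdefault("_type", MIMETYPE_TO_TYPE.get(base, DEFAULT_TYPE))
        yield item


if __name__ == "__main__":
    # Chain the steps the way a pipeline chains its sections.
    for item in recognize_type(crawled_items()):
        print(item["_path"], "->", item["_type"])
```

Because every step is a generator, items flow through the chain one at a time rather than being collected into a list first, which keeps memory use independent of the number of crawled pages.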

These blueprints are designed to work with the funnelweb pipeline but can be used independently.

collective/transmogrify.webcrawler

Folders and files

Latest commit

History

Repository files navigation

Crawling - html to import

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 6

Languages

Packages