Transmogrifier blueprints for extracting text or HTML from HTML content. transmogrify.htmlcontentextractor extracts fields from HTML using XPath or TAL expressions. transmogrify.htmlcontentextractor.auto tries to automate this process: it uses cluster analysis to find where sets of pages differ, and from those differences determines each page's title, description and body.
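To make the XPath-based approach concrete, here is a minimal sketch of rule-driven field extraction. It is not this package's code or configuration syntax: the sample page, the rule names and the `extract_fields` helper are all assumptions made for illustration, and it uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and requires well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page. Real-world HTML is rarely this
# tidy and would normally need a more forgiving HTML parser.
PAGE = """<html><body>
<h1>Annual report</h1>
<p class="lead">Summary of the year.</p>
<div id="content"><p>First paragraph.</p><p>Second paragraph.</p></div>
</body></html>"""

# Field name -> XPath rule. Both the names and the rules are illustrative
# assumptions, not options defined by the blueprint.
RULES = {
    "title": ".//h1",
    "description": ".//p[@class='lead']",
    "text": ".//div[@id='content']",
}


def extract_fields(html, rules):
    """Return a dict of field -> text for every rule that matches."""
    root = ET.fromstring(html)
    item = {}
    for field, xpath in rules.items():
        node = root.find(xpath)  # first matching element, or None
        if node is not None:
            # Concatenate all text inside the matched element.
            item[field] = "".join(node.itertext()).strip()
    return item


fields = extract_fields(PAGE, RULES)
print(fields)
```

Rules that match nothing are simply skipped, so a page that lacks a description still yields its title and body text.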
This blueprint extracts the title, description and body from HTML, either via XPath or by automatic cluster analysis.
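The idea of finding where pages differ can be illustrated with a much simpler technique than cluster analysis: diffing two pages that share a template and keeping only the parts that changed. The sketch below uses the standard library's `difflib` for this. It is a stand-in for illustration only, not the algorithm the package implements, and the sample pages and the `tokens` and `differing_text` helpers are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Two hypothetical pages rendered from the same template.
PAGE_A = ("<html><body><div id='nav'>Home | About</div>"
          "<h1>First post</h1><p>Body of the first post.</p></body></html>")
PAGE_B = ("<html><body><div id='nav'>Home | About</div>"
          "<h1>Second post</h1><p>Body of the second post.</p></body></html>")


def tokens(html):
    """Split markup into a flat list of tags and non-empty text runs."""
    return [t for t in re.split(r"(<[^>]+>)", html) if t.strip()]


def differing_text(page_a, page_b):
    """Return (a, b) pairs for every token run that differs between pages."""
    a, b = tokens(page_a), tokens(page_b)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # shared template markup is reported as "equal"
            diffs.append(("".join(a[i1:i2]), "".join(b[j1:j2])))
    return diffs


diffs = differing_text(PAGE_A, PAGE_B)
for old, new in diffs:
    print(repr(old), "->", repr(new))
```

The shared navigation and wrapper markup drop out, leaving the heading and paragraph text as candidates for title and body. Working across many pages rather than two is what makes a clustering approach useful in practice.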
collective/transmogrify.htmlcontentextractor