Added configurable TextExtractor to JSoupParserBolt #678
Provides better text extraction than the ContentFilter, which agglutinated the tokens found within a restricted section (gluing them together without separators). It also allows portions of the text to be excluded.
The TextExtractor class is used only within JSoupParserBolt.
The archetype has been modified to replace the ContentFilter with the TextExtractor, using a similar configuration.
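To illustrate the two behaviours described above (keeping tokens separated and excluding parts of a section), here is a minimal, self-contained sketch. It is not the TextExtractor code from this PR: the real class works on jsoup documents and reads its include/exclude settings from the crawler configuration, whose exact options may differ. The `Node` class, the `extract` method and the plain tag-name sets below are hypothetical simplifications that avoid any external dependency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch only; not the actual TextExtractor API. */
public class TextExtractorSketch {

    /** Minimal stand-in for a parsed HTML node (a real parser provides its own types). */
    static final class Node {
        final String tag;   // element name, or null for a text node
        final String text;  // text content for text nodes, otherwise null
        final List<Node> children = new ArrayList<>();

        Node(String tag, String text) { this.tag = tag; this.text = text; }

        static Node element(String tag, Node... kids) {
            Node n = new Node(tag, null);
            for (Node k : kids) n.children.add(k);
            return n;
        }

        static Node text(String t) { return new Node(null, t); }
    }

    /** Keeps text under the included tags (whole tree if none match), skipping excluded tags. */
    static String extract(Node root, Set<String> includeTags, Set<String> excludeTags) {
        List<Node> sections = new ArrayList<>();
        findIncluded(root, includeTags, sections);
        if (sections.isEmpty()) sections.add(root); // fall back to the whole document
        StringBuilder sb = new StringBuilder();
        for (Node section : sections) appendText(section, excludeTags, sb);
        return sb.toString();
    }

    private static void findIncluded(Node node, Set<String> includeTags, List<Node> out) {
        if (node.tag != null && includeTags.contains(node.tag)) {
            out.add(node);
            return; // do not descend, so nested matches are not extracted twice
        }
        for (Node child : node.children) findIncluded(child, includeTags, out);
    }

    private static void appendText(Node node, Set<String> excludeTags, StringBuilder sb) {
        if (node.tag == null) {
            String t = node.text.trim();
            if (!t.isEmpty()) {
                // Separate tokens coming from different elements so they are not glued together.
                if (sb.length() > 0) sb.append(' ');
                sb.append(t);
            }
            return;
        }
        if (excludeTags.contains(node.tag)) return; // drop the whole excluded subtree
        for (Node child : node.children) appendText(child, excludeTags, sb);
    }

    public static void main(String[] args) {
        Node doc = Node.element("html",
                Node.element("nav", Node.text("Home")),
                Node.element("article",
                        Node.element("p", Node.text("Hello")),
                        Node.element("p", Node.text("world")),
                        Node.element("script", Node.text("var x = 1;"))));
        // Prints "Hello world": only the article is kept, the script is excluded,
        // and the two paragraphs stay separate tokens instead of "Helloworld".
        System.out.println(extract(doc, Set.of("article"), Set.of("script")));
    }
}
```

Returning early from `findIncluded` once a section matches is a deliberate choice in this sketch: it avoids emitting the same text twice when included sections are nested.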