NUTCH-2611: Add line-breaks when parsing HTML block-level elements #354

Merged
sebastian-nagel merged 2 commits into apache:master from YossiTamari:NUTCH-2611 on Jun 28, 2018

Conversation

@YossiTamari
Contributor

When the configuration property parser.html.line.separators contains a list of tags, a newline is added before and after the text content of each listed tag.
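
To illustrate the intended behavior, here is a minimal, self-contained sketch. It is not the code from this patch. The class LineSeparatorSketch and its methods are hypothetical, the comma-separated value format is an assumption, and the input is parsed with the JDK XML parser, so it must be well-formed.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration only; names and value format are assumptions.
final class LineSeparatorSketch {

  // Split a property value such as "h1, p" into lower-cased tag names.
  // Assumption: tags are separated by commas.
  static Set<String> parseTagList(String value) {
    Set<String> tags = new HashSet<>();
    for (String tag : value.split(",")) {
      String t = tag.trim().toLowerCase(Locale.ROOT);
      if (!t.isEmpty()) {
        tags.add(t);
      }
    }
    return tags;
  }

  // Depth-first walk: append text nodes, and wrap the text content of
  // every configured element in a leading and a trailing newline.
  static void appendText(Node node, Set<String> separators, StringBuilder sb) {
    if (node.getNodeType() == Node.TEXT_NODE) {
      sb.append(node.getNodeValue());
      return;
    }
    boolean separate = node.getNodeType() == Node.ELEMENT_NODE
        && separators.contains(node.getNodeName().toLowerCase(Locale.ROOT));
    if (separate) {
      sb.append('\n');
    }
    for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
      appendText(child, separators, sb);
    }
    if (separate) {
      sb.append('\n');
    }
  }

  // Parse a well-formed XHTML fragment and return its plain text.
  static String extractText(String xhtml, Set<String> separators) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xhtml)));
      StringBuilder sb = new StringBuilder();
      appendText(doc.getDocumentElement(), separators, sb);
      return sb.toString();
    } catch (Exception e) {
      throw new IllegalArgumentException("Input is not well-formed", e);
    }
  }

  public static void main(String[] args) {
    String html = "<div><h1>Title</h1><p>First <b>bold</b> paragraph.</p><p>Second.</p></div>";
    System.out.println(extractText(html, parseTagList("h1, p")));
  }
}
```

With an empty tag set the sketch yields "TitleFirst bold paragraph.Second." on a single line. With "h1, p" each heading and paragraph ends up on its own line, while the inline <b> element is left untouched.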

YossiTamari and others added 2 commits June 25, 2018 13:28
@sebastian-nagel
Contributor

sebastian-nagel commented Jun 26, 2018

+1, LGTM. The plain-text layout is now indeed more readable: line breaks after headings, <p>, etc. Will commit soon if there are no objections. Thanks, @YossiTamari!

sebastian-nagel merged commit 3ada351 into apache:master on Jun 28, 2018