Fix initial crowded <pre> output #63

merged 3 commits into from Nov 19, 2012



wking commented Nov 18, 2012

html2text has problems when the HTML to parse starts off with:


It works fine with


This problem was acknowledged in #9 (comment).

html2text's parsing procedure is a bit opaque to me, so this may not be the cleanest fix, but it does work.

wking added some commits Nov 18, 2012

Remove extra newline from before list <pre> blocks
My crowded-pre fix broke <pre> blocks in lists:

  $ diff -u
  @@ -1,5 +1,6 @@
     * Run this command:

           ls -l *.html

     * ?

There is a fair amount of trailing whitespace in html2text output, and
I'm not sure where it all comes from.  This patch removes the extra
newline (fixing the test), but it also tweaks the amount of trailing
whitespace in the expected blank line (probably not a problem).

wking commented Nov 18, 2012

I think a proper fix for this issue would be to restructure the whole output framework to be more line-based (to make it easier to figure out where preceding whitespace comes from, and make it easier to strip trailing whitespace), but that's too big a task for me to commit to at the moment.
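As a rough illustration of the line-based idea in the comment above, here is a minimal sketch that post-processes finished output one line at a time and strips trailing whitespace. The function name `strip_trailing_whitespace` is hypothetical and is not part of html2text; this is not the restructuring proposed above, only a picture of why per-line handling makes trailing whitespace easy to remove.

```python
def strip_trailing_whitespace(text):
    """Strip trailing spaces and tabs from every line of ``text``.

    Hypothetical helper, not part of html2text: it treats the
    output as a sequence of lines, so whitespace-only lines
    collapse to empty lines and line breaks are left untouched.
    """
    # Split on "\n" (rather than splitlines) so that a trailing
    # newline in the input survives the round trip.
    return "\n".join(line.rstrip() for line in text.split("\n"))


if __name__ == "__main__":
    # A list item followed by a whitespace-only "blank" line and
    # an indented <pre> line, similar in shape to the test above.
    sample = "  * Run this command:\n  \n        ls -l *.html\n"
    print(repr(strip_trailing_whitespace(sample)))
```

Leading whitespace is left alone on purpose, since the indentation of list items and preformatted blocks is significant in Markdown output.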

aaronsw added a commit that referenced this pull request Nov 19, 2012

Merge pull request #63 from wking/master
Fix initial crowded <pre> output

@aaronsw aaronsw merged commit 8ae9193 into aaronsw:master Nov 19, 2012

1 check passed: default (the Travis build passed)

pombredanne pushed a commit to pombredanne/html2text that referenced this pull request Oct 10, 2015

Merge pull request #64 from alawibaba/master
Proper handling of anchors with content that starts with tags that
Fixes #63 #24

Thanks Ali Mohammad @alawibaba

jsbackus pushed a commit to rss2email/rss2email that referenced this pull request Sep 9, 2017

test:gmane: add Gmane feed for RSS testing
The weird indentation is because Gmane wraps the descriptions in <pre>
tags (to preserve formatting in the initial email).  html2text is
converting the description to Markdown, so it inserts a leading 4
spaces for preformatted blocks.  The unindented initial line and
following blank are due to a bug in html2text, for which I've
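
The 4-space indentation described in this commit message can be pictured with a small sketch. This assumes only the Markdown convention that preformatted blocks are indented by four spaces; the helper name `indent_preformatted` is hypothetical and this is not html2text's actual implementation.

```python
import textwrap


def indent_preformatted(text, prefix="    "):
    """Indent each non-blank line of ``text`` by ``prefix``.

    Hypothetical helper illustrating the Markdown convention for
    preformatted blocks. ``textwrap.indent`` skips lines that are
    empty or whitespace-only by default, so blank lines do not
    gain trailing whitespace.
    """
    return textwrap.indent(text, prefix)


if __name__ == "__main__":
    pre_body = "ls -l *.html\n\necho done\n"
    print(indent_preformatted(pre_body), end="")
```

With this convention every line of the block is indented, so an unindented first line, as described above, points to a bug rather than to intended Markdown output.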
