Fix initial crowded <pre> output #63

Merged
merged 3 commits into from

2 participants

@wking

html2text has problems when the HTML to parse starts off with:

<pre>stuff...

It works fine with

<pre>
stuff...

This problem was acknowledged in #9 (comment).

html2text's parsing procedure is a bit opaque to me, so this may not be the cleanest fix, but it does work.
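The difference between the two inputs can be sketched without html2text itself. The helper names below are hypothetical and are not part of the library; the sketch only assumes that `<pre>` text is indented by replacing each newline with a newline plus four spaces, which leaves a crowded first line unindented unless a newline is prepended first.

```python
def ensure_leading_newline(pre_text):
    """Hypothetical helper: make a crowded <pre> body start on its own line."""
    if not pre_text.startswith("\n"):
        pre_text = "\n" + pre_text
    return pre_text


def indent_pre(pre_text, indent="    "):
    """Indent every line that follows a newline, as a Markdown code block needs."""
    return ensure_leading_newline(pre_text).replace("\n", "\n" + indent)


# Without the fix, the first line of a crowded <pre> is never indented.
unfixed = "a\nb\nc".replace("\n", "\n    ")

# With the fix, crowded and non-crowded input give the same result.
crowded = indent_pre("a\nb\nc")
spaced = indent_pre("\na\nb\nc")
```

Here `unfixed` keeps `a` at column zero, while `crowded` and `spaced` are identical.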

wking added some commits
@wking wking test/pre: add test showing poor handling of initial crowded <pre> da88a6d
@wking wking Fix initial crowded <pre> output cc02194
@wking wking Remove extra newline from before list <pre> blocks
My crowded-pre fix broke <pre> blocks in lists:

  $ diff -u preformatted_in_list.md preformatted_in_list-module_output.md
  --- preformatted_in_list.md
  +++ preformatted_in_list-module_output.md
  @@ -1,5 +1,6 @@
     * Run this command:

  +
           ls -l *.html

     * ?

There is a fair amount of trailing whitespace in html2text output, and
I'm not sure where it all comes from.  This patch removes the extra
newline (fixing the test), but it also tweaks the amount of trailing
whitespace in the expected blank line (probably not a problem).
eb09b6d
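Since `diff -u` does not make trailing whitespace visible, one way to see what differs on an otherwise blank line is to diff the `repr()` of each line. This is only a sketch using the standard `difflib` module; the function name and the sample strings are made up for illustration and are not the actual test files.

```python
import difflib


def visible_diff(expected, actual):
    """Return a unified diff in which every line is shown through repr()."""
    exp_lines = [repr(line) for line in expected.split("\n")]
    act_lines = [repr(line) for line in actual.split("\n")]
    return list(difflib.unified_diff(
        exp_lines, act_lines,
        fromfile="expected", tofile="actual", lineterm=""))


# Hypothetical sample: the two texts differ only in trailing whitespace
# on the blank line before the code block.
expected = "  * Run this command:\n\n        ls -l *.html\n"
actual = "  * Run this command:\n  \n        ls -l *.html\n"

diff_lines = visible_diff(expected, actual)
print("\n".join(diff_lines))
```

The quoted lines make it obvious whether a "blank" line is empty or contains spaces.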
@wking

I think a proper fix for this issue would be to restructure the whole output framework to be more line-based (to make it easier to figure out where preceding whitespace comes from, and make it easier to strip trailing whitespace), but that's too big a task for me to commit to at the moment.
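As a rough illustration of what "more line-based" could mean, here is a minimal sketch of an output collector that stores one entry per output line and strips trailing whitespace only when rendering. The `LineBuffer` class is hypothetical and does not exist in html2text; it is not a proposal for the actual restructuring.

```python
class LineBuffer:
    """Hypothetical line-based output collector (not part of html2text)."""

    def __init__(self):
        self.lines = [""]

    def write(self, data):
        # Split incoming text on newlines so each output line is stored separately.
        first, *rest = data.split("\n")
        self.lines[-1] += first
        self.lines.extend(rest)

    def render(self):
        # Trailing whitespace is stripped in one place, once per line.
        return "\n".join(line.rstrip() for line in self.lines)


buf = LineBuffer()
buf.write("  * Run this command:  \n")
buf.write("   \n        ls -l *.html")
rendered = buf.render()
```

Because each line is a separate entry, it is easy to inspect where leading whitespace was added, and trailing whitespace never reaches the output.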

@aaronsw aaronsw merged commit 8ae9193 into aaronsw:master
Commits on Nov 18, 2012
12 html2text.py
@@ -593,17 +593,25 @@ def o(self, data, puredata=0, force=0):
 
             if self.startpre:
                 #self.out(" :") #TODO: not output when already one there
-                self.startpre = 0
+                if not data.startswith("\n"): # <pre>stuff...
+                    data = "\n" + data
 
             bq = (">" * self.blockquote)
             if not (force and data and data[0] == ">") and self.blockquote: bq += " "
 
             if self.pre:
-                bq += "    "
+                if not self.list:
+                    bq += "    "
+                #else: list content is already partially indented
                 for i in xrange(len(self.list)):
                     bq += "    "
                 data = data.replace("\n", "\n"+bq)
 
+            if self.startpre:
+                self.startpre = 0
+                if self.list:
+                    data = data.lstrip("\n") # use existing initial indentation
+
             if self.start:
                 self.space = 0
                 self.p_p = 0
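For readers who want to try the patched logic in isolation, the following is a standalone Python 3 approximation of this branch of `o()`. It is a sketch, not the library's code: `format_pre_chunk` is a hypothetical name, the instance attributes are replaced by plain parameters, the chunk is assumed to be inside a `<pre>`, and the `force` handling of blockquotes is simplified away.

```python
def format_pre_chunk(data, startpre, list_depth, blockquote=0):
    """Standalone approximation of the patched <pre> branch of o().

    Returns the re-indented data and the updated startpre flag.
    """
    if startpre and not data.startswith("\n"):
        data = "\n" + data  # crowded case: <pre>stuff...

    bq = ">" * blockquote
    if blockquote:
        bq += " "  # simplified: the original also checks `force`
    if not list_depth:
        bq += "    "  # code-block indent outside lists
    bq += "    " * list_depth  # one level per enclosing list
    data = data.replace("\n", "\n" + bq)

    if startpre:
        startpre = False
        if list_depth:
            data = data.lstrip("\n")  # reuse the indentation already emitted
    return data, startpre


top_level, _ = format_pre_chunk("a\nb\nc", startpre=True, list_depth=0)
in_list, _ = format_pre_chunk("ls -l *.html", startpre=True, list_depth=1)
```

At top level the crowded text gains a leading newline and a four-space indent on every line; inside a list the prepended newline is stripped again so that no extra blank line appears before the block.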
13 test/pre.html
@@ -0,0 +1,13 @@
+<html>
+ <head>
+ <title>initial crowded pre handling test #1</title>
+ </head>
+ <body>
+<pre>a
+b
+c</pre>
+
+ <p>Ensure that HTML that starts with a crowded <code>&lt;pre&gt;</code>
+ is converted to reasonable Markdown.</p>
+ </body>
+</html>
8 test/pre.md
@@ -0,0 +1,8 @@
+
+    a
+    b
+    c
+
+Ensure that HTML that starts with a crowded `<pre>` is converted to reasonable
+Markdown.
+
2  test/preformatted_in_list.md
@@ -1,5 +1,5 @@
   * Run this command:
-
+
         ls -l *.html
 
   * ?