Commit
Ignore <p> and <div> inside a table
peterkajan committed Aug 18, 2022
1 parent a3ed67b commit ad4300a
Showing 4 changed files with 30 additions and 1 deletion.
2 changes: 1 addition & 1 deletion ChangeLog.rst
@@ -11,7 +11,7 @@ UNRELEASED
* Don't wrap tables by default and add a ``--wrap-tables`` config option
* Fix #320 padding empty tables and tables with no </tr> tags.
* Add ``ignore_mailto_links`` config option to ignore ``mailto:`` style links.

* Feature #198: Ignore ``<p>`` and ``<div>`` tags inside table rows.

2020.1.16
=========
2 changes: 2 additions & 0 deletions html2text/__init__.py
@@ -368,6 +368,8 @@ def handle_tag(
                    self.soft_br()
            elif self.astack:
                pass
            elif self.split_next_td:
                pass
            else:
                self.p()
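
The `elif self.split_next_td: pass` branch in the hunk above skips the paragraph break for `<p>` and `<div>` while a table row is being emitted. The sketch below illustrates that idea with the standard-library `html.parser` only. It is not html2text's code: the class `TinyTableConverter` and the function `convert` are hypothetical names, and the way `split_next_td` is set and cleared here is an assumption made for illustration.

```python
from html.parser import HTMLParser


class TinyTableConverter(HTMLParser):
    """Toy converter (hypothetical, not html2text): shows only the flag logic."""

    def __init__(self):
        super().__init__()
        self.out = []
        # Assumption: True once a cell has been opened in the current row.
        self.split_next_td = False

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            if self.split_next_td:
                self.out.append(" | ")  # separator between cells of one row
            self.split_next_td = True
        elif tag in ("p", "div"):
            if self.split_next_td:
                pass  # inside a row: a paragraph break would split the row
            else:
                self.out.append("\n\n")  # ordinary paragraph break

    def handle_endtag(self, tag):
        if tag == "tr":
            self.split_next_td = False  # row finished, breaks allowed again
            self.out.append("\n")

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.out.append(text)


def convert(html):
    """Feed an HTML fragment to the toy converter and return its text."""
    parser = TinyTableConverter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()


row = "<table><tr><td><p>Content 1</p></td><td><p>2</p></td></tr></table>"
print(convert(row))  # Content 1 | 2
```

Without the `split_next_td` check, each `<p>` inside a cell would emit a blank line and the row would no longer fit on a single `a | b` line, which is the layout the expected output in `test/no_p_in_table.md` relies on.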

12 changes: 12 additions & 0 deletions test/no_p_in_table.html
@@ -0,0 +1,12 @@
<!DOCTYPE html> <html>
<head lang="en"> <meta charset="UTF-8"> <title></title> </head>
<body> <h1>This is a test document</h1> With some text, <code>code</code>, <b>bolds</b> and <i>italics</i>. <h2>This is second header</h2> <p style="display: none">Displaynone text</p>
<table>
<tr> <th>Header 1</th> <th>Header 2</th> <th>Header 3</th> </tr>
<tr> <td><p>Content 1</p></td> <td><p>2</p></td> <td><img src="http://lorempixel.com/200/200" alt="200"/> Image!</td> </tr>
<tr> <td><p>Content 1 longer</p></td> <td><p>Content 2</p></td> <td><p>blah</p></td> </tr>
<tr> <td><p>Content </p></td> <td><p>Content 2</p></td> <td><p>blah</p></td> </tr>
<tr> <td><p>t </p></td> <td><p>Content 2</p></td> <td><p>blah blah blah</p></td> </tr>
</table>

</body> </html>
15 changes: 15 additions & 0 deletions test/no_p_in_table.md
@@ -0,0 +1,15 @@
# This is a test document

With some text, `code`, **bolds** and _italics_.

## This is second header

Displaynone text

Header 1 | Header 2 | Header 3
---|---|---
Content 1 | 2 | ![200](http://lorempixel.com/200/200) Image!
Content 1 longer | Content 2 | blah
Content | Content 2 | blah
t | Content 2 | blah blah blah
