BeautifulSoup invocation erroneously alters the resulting html #56

akhmerov · 2016-10-27T20:04:47Z

This bit of code seems to be harmful.

I have observed it producing from this:

line with a line break (so ending with a double space)  
line without a line break

the following erroneous html:

line with a line break (so ending with a double space)<br>
line without a line break</br>

(Note the spurious </br> tag.) Browsers treat a stray </br> end tag as another <br>, so each line break effectively becomes two. This adds a lot of extra vertical space to text with many line breaks.
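One way to catch this kind of corruption in exported HTML is to scan for end tags of void elements such as br, which never legitimately have one. Below is a minimal sketch using only the standard library's html.parser; the helper name find_void_end_tags is hypothetical and not part of this project.

```python
from html.parser import HTMLParser

# HTML void elements never take an end tag, so "</br>" and friends in
# generated output indicate that something rewrote the markup.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class _VoidEndTagFinder(HTMLParser):
    """Collect the names of void elements that appear as end tags."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_startendtag(self, tag, attrs):
        # "<br/>" is valid. The default implementation forwards it to
        # handle_endtag, which would cause a false positive, so ignore it.
        pass

    def handle_endtag(self, tag):
        # HTMLParser passes tag names in lowercase.
        if tag in VOID_ELEMENTS:
            self.found.append(tag)


def find_void_end_tags(html_text):
    """Return the void-element end tags (e.g. ["br"]) found in html_text."""
    finder = _VoidEndTagFinder()
    finder.feed(html_text)
    finder.close()
    return finder.found
```

Running it on the broken output above reports the stray br end tag, while the correct output (without </br>) reports nothing.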


ischurov · 2016-11-27T11:10:29Z

Agree. Moreover, it makes it almost impossible to include HTML code inside code blocks, because soup.decode(formatter=None) replaces all entities (including &lt; and &gt;) with the corresponding characters. So if the source contains something like print("<b>"), I get an actual <b> tag in the output. This is unavoidable in general, as BeautifulSoup converts entities to the corresponding Unicode characters when parsing and therefore loses some information.
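The information loss can be reproduced without BeautifulSoup. The sketch below uses the standard-library html module as a stand-in for the parse/serialize round trip; the helper round_trip is hypothetical. Decoding &lt; and &gt; and writing the text back unescaped (the formatter=None case) yields a live <b> tag. Re-escaping &, < and > on output, which is what BeautifulSoup's default "minimal" formatter does, restores those two, but other entities such as &nbsp; still come back as literal characters.

```python
import html


def round_trip(fragment, escape_on_output):
    """Decode entities as a parser would, then serialize the text again.

    escape_on_output=False mimics serializing with no output formatter;
    escape_on_output=True mimics re-escaping &, < and > on output.
    """
    # Parsing turns character references into plain characters.
    text = html.unescape(fragment)
    if escape_on_output:
        # quote=False: only &, < and > are escaped; quotes are left alone.
        return html.escape(text, quote=False)
    return text
```

For example, round_trip('print("&lt;b&gt;")', escape_on_output=False) returns 'print("<b>")', a string a browser would render with a real bold tag.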

This part of the code is used to remove all cells containing the #ignore marker. Personally, I'm happy to just comment it out, as this feature is not crucial for me. Still, I'm not sure what a better solution to this problem would be.

ischurov · 2016-11-27T11:24:47Z

It is probably better not to tamper with the HTML tree at all, but to remove the corresponding cells from the .ipynb JSON before invoking HTMLExporter.
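Dropping cells at the JSON level could look like the sketch below. It assumes the nbformat 4 on-disk layout (a top-level "cells" list whose "source" is either a string or a list of lines) and treats any cell whose source contains the literal text #ignore as ignored; the function names are hypothetical. The filtered notebook would then be passed to HTMLExporter, so the exported HTML never has to be re-parsed.

```python
import json

# Hypothetical marker: any cell whose source contains this text is dropped.
IGNORE_MARKER = "#ignore"


def cell_source(cell):
    """Return a cell's source as one string.

    In the .ipynb JSON, "source" may be a single string or a list of lines.
    """
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def drop_ignored_cells(nb, marker=IGNORE_MARKER):
    """Return a shallow copy of the notebook dict without marked cells."""
    kept = [c for c in nb.get("cells", []) if marker not in cell_source(c)]
    return {**nb, "cells": kept}


def strip_notebook_file(in_path, out_path, marker=IGNORE_MARKER):
    """Read an .ipynb file, drop marked cells, and write the result."""
    with open(in_path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(out_path, "w", encoding="utf-8") as f:
        # indent=1 mirrors how nbformat itself writes notebooks.
        json.dump(drop_ignored_cells(nb, marker), f, indent=1)
```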

akhmerov · 2016-11-27T18:30:28Z

Seems correct; an nbconvert Preprocessor is the way to go. I didn't need the feature, so I just removed it entirely.
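A Preprocessor doing the same filtering inside nbconvert might look like the sketch below. The class name IgnoreCellsPreprocessor and the "#ignore" marker are assumptions modelled on this discussion; Preprocessor, HTMLExporter, register_preprocessor and from_filename are existing nbconvert APIs. It requires nbconvert (and its dependency nbformat) to be installed.

```python
from nbconvert import HTMLExporter
from nbconvert.preprocessors import Preprocessor


class IgnoreCellsPreprocessor(Preprocessor):
    """Drop every cell whose source contains a marker string.

    The "#ignore" marker is an assumption based on the discussion above.
    """

    marker = "#ignore"

    def preprocess(self, nb, resources):
        # Filter the whole cell list at once: preprocess_cell can only
        # modify cells one at a time, not remove them.
        nb.cells = [cell for cell in nb.cells if self.marker not in cell.source]
        return nb, resources


def export_html(path):
    """Convert the notebook at path to HTML with marked cells removed."""
    exporter = HTMLExporter()
    # Preprocessors are disabled by default, hence enabled=True.
    exporter.register_preprocessor(IgnoreCellsPreprocessor(), enabled=True)
    body, _resources = exporter.from_filename(path)
    return body
```

If cells can be marked with cell tags instead of #ignore text, nbconvert's built-in TagRemovePreprocessor (configured through remove_cell_tags) may be a simpler fit.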
