This repository has been archived by the owner on Apr 6, 2023. It is now read-only.

BeautifulSoup invocation erroneously alters the resulting html #56

Open
akhmerov opened this issue Oct 27, 2016 · 3 comments

Comments

@akhmerov

This bit of code (the BeautifulSoup invocation) seems to be harmful.

I have observed it producing from this:

line with a line break (so ending with a double space)  
line without a line break

the following erroneous html:

line with a line break (so ending with a double space)<br>
line without a line break</br>

(note the closing </br> tag). This adds a large amount of extra empty space to any text with a lot of line breaks.

@ischurov

ischurov commented Nov 27, 2016

Agree. Moreover, it makes it almost impossible to include HTML code inside code blocks, as soup.decode(formatter=None) replaces all entities (including &lt; and &gt;) with their corresponding symbols. So if I have something like print("<b>") in the source, I get an actual <b> tag in the output. And this is unavoidable in general, as BeautifulSoup converts entities to the corresponding Unicode symbols on parsing and therefore loses some information.
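A minimal sketch of the effect described above, assuming the bs4 package is installed and using the built-in html.parser backend (the issue does not say which parser the project used). The `rendered` string is a made-up stand-in for the HTML of a code cell, not output taken from the project.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML for a code cell: the angle brackets are escaped as entities.
rendered = '<pre>print("&lt;b&gt;")</pre>'

soup = BeautifulSoup(rendered, "html.parser")

# After parsing, the tree holds the literal characters, not the entities.
text_inside_pre = soup.pre.string

# formatter=None writes the text back without re-escaping anything,
# so the escaped source text turns into a real <b> tag.
unescaped = soup.decode(formatter=None)

# The default formatter re-escapes &, < and > on output.
escaped = soup.decode()

print(text_inside_pre)
print(unescaped)
print(escaped)
```

The default formatter only restores the escaping of &, < and >, so other entities in the input may still come out as plain Unicode characters.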

This part of the code is used to remove all cells containing the #ignore text. Personally, I'm willing to just comment it out, as this feature is not crucial for me. Nevertheless, I'm not sure how to solve this problem better.

@ischurov

Most probably it is better not to tamper with the HTML tree at all, but to remove the corresponding cells from the ipynb JSON file before invoking HTMLExporter.
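A rough sketch of that idea using only the standard library. It assumes the nbformat 4 layout (a top-level "cells" list whose "source" is either a string or a list of line strings) and that a cell should be dropped whenever its source contains the marker. The function name `drop_ignored_cells` and the sample notebook are hypothetical, not taken from the project.

```python
import copy


def drop_ignored_cells(notebook, marker="#ignore"):
    """Return a copy of a notebook dict without cells whose source contains marker."""

    def source_text(cell):
        src = cell.get("source", "")
        # nbformat 4 stores source as a string or as a list of line strings.
        return "".join(src) if isinstance(src, list) else src

    filtered = copy.deepcopy(notebook)  # leave the caller's dict untouched
    filtered["cells"] = [
        cell for cell in filtered.get("cells", []) if marker not in source_text(cell)
    ]
    return filtered


# Hypothetical in-memory notebook; a real one would come from json.load(...).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["line with a line break  \n", "line without a line break"]},
        {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": None,
         "source": "#ignore\nprint('scratch work')"},
        {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": None,
         "source": ['print("<b>")']},
    ],
}

cleaned = drop_ignored_cells(notebook)
print(len(notebook["cells"]), len(cleaned["cells"]))
```

Because the filtering happens before any HTML exists, the exported markup is never parsed and re-serialized, so entities and line breaks stay as the exporter wrote them.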

@akhmerov
Author

Seems correct: an nbconvert Preprocessor is the way to go. I didn't need the feature, so I just removed it entirely.
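For illustration, such a preprocessor might look roughly like the sketch below. It assumes nbconvert and nbformat are installed. The class name `IgnoreCellsPreprocessor`, its `marker` attribute, and the sample cells are hypothetical, and this is not the code that was removed from the project.

```python
from nbconvert.preprocessors import Preprocessor
from nbformat.v4 import new_code_cell, new_markdown_cell, new_notebook


class IgnoreCellsPreprocessor(Preprocessor):
    """Drop every cell whose source contains the marker text (hypothetical helper)."""

    marker = "#ignore"  # assumption: a substring match is what the feature needs

    def preprocess(self, nb, resources):
        # Filter on the notebook node itself, before any HTML is generated.
        nb.cells = [cell for cell in nb.cells if self.marker not in cell.source]
        return nb, resources


# Small in-memory notebook used only to exercise the preprocessor.
nb = new_notebook(cells=[
    new_markdown_cell("line with a line break  \nline without a line break"),
    new_code_cell("#ignore\nprint('scratch work')"),
    new_code_cell('print("<b>")'),
])

filtered, _ = IgnoreCellsPreprocessor().preprocess(nb, {})
print(len(filtered.cells))
```

To use it during export, the class can be registered on an exporter, for example with HTMLExporter().register_preprocessor(IgnoreCellsPreprocessor, enabled=True), before calling from_notebook_node.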
