Completes pull #1043 which fixes #1040 #1044
Looks good. There is one more test case we should probably include:
Specifically a
I suspect that might cover the scenario there is no test coverage for in your patch. |
Yeah, I'll add that test. I just wanted to make sure the direction was fine first. |
I will most likely address #1045 here as well. I'll just make this the "fix HTML parsing" pull. |
Your approach for handling tails is a little different than mine. In the core parser, I track the tail by setting markdown/markdown/htmlparser.py Lines 150 to 155 in 607a091
... and then checking the status of markdown/markdown/htmlparser.py Lines 123 to 126 in 607a091
The status gets reset on the first newline. markdown/markdown/htmlparser.py Lines 166 to 167 in 607a091
However, you are simply adding a blank line after the end of the block, which is a simpler solution. I don't recall why I didn't use that solution myself. I might not have considered it, or perhaps it was mucking up some edge cases. In any event, in the md_in_html extension I'm less concerned with altering insignificant whitespace than in the default behavior. One thing I do wonder about is what if the blank lines you are checking for also contain whitespace. For example, you check for |
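To make the whitespace concern above concrete, here is a minimal sketch of a blank-line check that tolerates spaces and tabs on the otherwise empty line. The helper name `has_blank_line` and the pattern are hypothetical and are not taken from the patch or from `htmlparser.py`; they only illustrate the difference from a strict `'\n\n'` comparison.

```python
import re

# Hypothetical pattern (an assumption, not the patch's code): a "blank" line
# is two newlines separated only by optional spaces or tabs.
BLANK_LINE_RE = re.compile(r'\n[ \t]*\n')


def has_blank_line(text):
    """Return True if text contains a blank line, ignoring spaces/tabs on it."""
    return BLANK_LINE_RE.search(text) is not None


sample = "</div>\n \t\n**test**"
print('\n\n' in sample)        # strict check misses the whitespace-only line
print(has_blank_line(sample))  # tolerant check finds it
```

Whether the extension should treat a whitespace-only line as blank is a design choice for the maintainers; the sketch only shows what each check would report.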
Fine by me. Regarding that issue, I believe the problem is that I never really fully tested a mix of |
I'm going to have to work things out on We aren't tracking We seem to store raw HTML data in separate structures from other data. I've got some tricky things to work through. |
We are probably going to need to refactor The base class doesn't use So a fairly substantial refactor is in order to handle the complexity of this nested content. We may have to override more of the base class in This definitely won't be as quick a fix as I hoped. |
The order of things in markdown/markdown/extensions/md_in_html.py Lines 230 to 232 in 607a091
However, it sounds like you are suggesting that some elements are being reordered before we ever get that far. Not sure how that is happening. |
It is only happening when I try to fix We keep processing HTML under a block element with |
Okay, I get what treebuilder does now, we were processing start and end tags with I slightly altered the case because I wanted to make sure Markdown parsed under the
import markdown
md = markdown.Markdown(extensions=["markdown.extensions.md_in_html"])
test = """
<div markdown="1">
**test**
<div>
**test**
<img src=""/>
<code>Test</code>
<span>**test**</span>
<p>Test 2</p>
</div>
</div>
"""
print(md.convert(test))
Results:
Unfortunately, it broke other cases 🤦 . |
I have a working fix now. I'm not convinced the I'll still need more coverage cases, and I'd like to run some checks on some more corner cases, but this is a good start I think. |
There shouldn't be elements in the cleandoc list. Those may exist in the stash, but won't turn up directly in cleandoc.
Coverage is now being met. Before merging, I'd like to do some more investigation. I've flagged this pull as a "work in progress" until I've had a chance to gain more confidence in this area. I think I'm quickly coming up to speed on how all the inner workings come together in the new parser. |
I finally had a chance to look at this. Looking good. 👍 |
@waylan So, I added a whitespace case that mixes newlines with spaces. I can't see a need (at least in |
That test seems fine to me. If that’s passing as-is, then that should be fine. |
Cool. I'll try to wrap up my testing by the end of the weekend. If I'm not done by then, I'm probably done anyway 🙃. From what I can tell, the big holes seem to be filled now. |
I'm going to call this done for now. I'm generally feeling good about the changes. They solve the current issues and don't break existing tests. I can't think of specific cases that I'm not covering, but if more are found, I'll gladly take another look. I probably won't have much more time to look at this over the weekend, and I don't want to hold up a fix. |
Completes the tests in #1043 which fixes #1040