Bold/Italic bug #792
Comments
I can confirm I am seeing the same behavior in 3.0.1. However, this behavior does not exist in the latest code here on GitHub, and the fix will be available in the next release.
>>> import markdown
>>> markdown.version
'3.1.dev0'
>>> markdown.markdown('This is text ***bold italic** italic* more text')
'<p>This is text <em><strong>bold italic</strong> italic</em> more text</p>' |
@waylan, I don't think you tested this case properly. It wasn't |
But I will explain a little bit why this happens. In the pattern The 3rd party
>>> markdown.Markdown(extensions=["pymdownx.betterem"], extension_configs={"pymdownx.betterem": {"smart_enable": "all"}}).convert("This is text **bold *italic bold*** with more text")
'<p>This is text <strong>bold <em>italic bold</em></strong> with more text</p>'
Without smart enabled for
>>> markdown.Markdown(extensions=["pymdownx.betterem"]).convert("This is text **bold *italic bold*** with more text")
'<p>This is text <strong>bold *italic bold</strong>* with more text</p>'
Based on how Python Markdown works, this behavior is as I would expect. One sure-fire way to avoid such confusion is to use syntax like this:
If not, something like |
If we find Python Markdown's behavior acceptable here, then we should just close this. |
Oops, apparently I copied the wrong example when I tested that. And Babelmark clearly indicates we are in the minority here. This is a bug. |
I'm not sure how the Markdown parser works, as I haven't been in the code, but I'm currently writing a desktop browser/editor that uses Markdown in the documents for a document store. I added formatting to my editor, as is typically done; it just makes the editor code more readable. I based my bold/italics formatting regex on the assumption that bold is always the inner two asterisks when combined with italics. I'm not sure whether that makes sense for the markdown package, but my editor currently parses italics/bold formatting with asterisks flawlessly. I have yet to implement support for underscore formatting in the editor. I'll take a look at the betterem options to resolve the issue, since I'm already using the package anyway. Thank you for the feedback @facelessuser. Either way, if the bug exists, we probably want to stomp on it. If it doesn't, I apologize for wasting your time. Thank you to the whole team for the effort you put into this project. |
That is not a safe assumption to make. That's why @facelessuser suggests using
|
I like the mixed idea (using As I said I have not written code to format my editor based on the Bold regex Italics regex I'm very new with regex so I'll take the feedback criticism no problem. |
Basically, to fix this, we need to handle the opposite case of I think even
@Dave-ts, there are also whitespace considerations when determining if References: |
Thank you for the references, I'll have a peek right away. |
Right. Of course, Perhaps we can better handle the last group at the end. Currently in the strong pattern I was thinking we might do something like In other words, when matching the closing strong token, given a string of two or more asterisks, we must match the last two in the string. |
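To illustrate the last point only: one way to express "match the last two asterisks in a run" is a negative lookahead. This is a minimal sketch using the standard `re` module, not the pattern proposed above (which did not survive in this thread); `CLOSE_STRONG_RE` and `closing_strong_span` are hypothetical names.

```python
import re

# Hypothetical closing-token pattern (an assumption, not Python-Markdown's
# actual regex): two asterisks that are NOT followed by a third asterisk.
# In a run of three or more, only the final two can satisfy the lookahead.
CLOSE_STRONG_RE = re.compile(r'\*\*(?!\*)')


def closing_strong_span(text):
    """Return the (start, end) span of the closing strong token, or None."""
    m = CLOSE_STRONG_RE.search(text)
    return m.span() if m else None
```

With a run of three asterisks, the span covers the last two, leaving the first one available to close a nested emphasis.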
Another way to solve this might be to simply add another pattern which special cases |
I haven't played with this yet, but I don't think it is completely indistinguishable. It's a matter of order I believe. If you scan for Now I could be completely wrong here. Unfortunately, I don't remember if I tried this before and ran into issues, or like Python Markdown, just didn't think to put this case in.
This might work. I guess we'd have to run some tests to see. |
So I believe I've solved this issue over in I opted for having pattern for |
I think something like this will work for Python Markdown:
# **strong*em***
EM_STRONG2_RE = r'(\*)\1(?!\1)(.+?)\1(?!\1)(.+?)\1{3}'
# __strong _em___
SMART_EM_STRONG2_RE = r'(?<!\w)(\_)\1(?!\1)(.+?)(?<!\w)\1(?!\1)(.+?)\1{3}(?!\w)'
Here we are basically requiring that the content of each doesn't start with the token, so if we had something like With underscore, we have smart enabled by default, so we have the additional requirement that the nested It seems to work with basic testing. I'll upload a pull request once I've tested it more. |
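For anyone who wants to poke at the two expressions outside of Python Markdown, here is a rough standalone check using only the standard `re` module. The regexes are copied verbatim from the comment above; the `render_strong_em` helper and its replacement template are assumptions made purely for illustration and are not how Python Markdown builds its output.

```python
import re

# Regexes copied from the comment above.
EM_STRONG2_RE = r'(\*)\1(?!\1)(.+?)\1(?!\1)(.+?)\1{3}'
SMART_EM_STRONG2_RE = r'(?<!\w)(\_)\1(?!\1)(.+?)(?<!\w)\1(?!\1)(.+?)\1{3}(?!\w)'


def render_strong_em(text, pattern):
    """Hypothetical helper: group 2 is the strong-only text, group 3 the
    nested emphasis text. Unmatched input is returned unchanged."""
    return re.sub(pattern, r'<strong>\2<em>\3</em></strong>', text)


print(render_strong_em(
    "This is text **bold *italic bold*** with more text", EM_STRONG2_RE))
print(render_strong_em("__strong _em___", SMART_EM_STRONG2_RE))
# The nested underscore is preceded by a word character here, so the
# smart variant leaves the text alone.
print(render_strong_em("__strong_em___", SMART_EM_STRONG2_RE))
```

This only exercises the regexes in isolation; it says nothing about how they interact with the other inline patterns or their priorities.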
@waylan creating these new rules will of course throw off the current rule alignment we have going, where each rule is done on a number divisible by 10: 90, 80, 70, etc. We would have to insert between them; is that okay? Or do we want to consolidate some of the rules? |
What did you have in mind? |
Yes, that is what I would expect. |
I'm not sure if it will cause problems to reduce all the We've given the impression that all the standard patterns are on boundaries of I figure we could use the new Pattern format, and in a single pattern loop through our different regex for |
One problem would be that some people still use the deprecated method of inserting relative to a name, so if we consolidated all patterns, we might break some extensions. If we just append this new pattern to an existing pattern, you'd at least not break parity. |
You just reminded me that the deprecation path for the old That said, I understand the concern about breaking the "step by 10" model we have now. In 3.0 we significantly altered the way inline parsing can work, while in practice we made very few actual changes. If you are suggesting taking this to the next step, then that seems like a reasonable approach. I like the idea that all strong processors are combined into one, if that is reasonable to accomplish. |
We could take it to the next step. I was being more conservative, but if we want to go all in and combine them, that could be done quite easily. Initially, we'd just use the new format and loop through our regular expression patterns and output the appropriate element based on what pattern matches. If we wanted to in the future, we could even rewrite to functionally parse the patterns (if there was some advantage), but I see no need to completely rewrite everything. I think the current patterns are probably fine for now, we can just group them into one pattern step. |
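A rough sketch of the "loop through the patterns in one step" idea, using only the standard library so it runs without Python Markdown installed. Everything here is an assumption for illustration: the `PATTERNS` table, the builder functions, and `handle_asterisks` are hypothetical names, and the simple strong/emphasis regexes are stand-ins rather than the library's real ones. Only `EM_STRONG2_RE` comes from earlier in this thread.

```python
import re
import xml.etree.ElementTree as etree

EM_STRONG2_RE = r'(\*)\1(?!\1)(.+?)\1(?!\1)(.+?)\1{3}'  # from this thread
STRONG_RE = r'(\*{2})(.+?)\1'      # stand-in, assumed for illustration
EMPHASIS_RE = r'(\*)([^\*]+)\1'    # stand-in, assumed for illustration


def build_strong_em(m):
    """Build <strong>text<em>text</em></strong> from groups 2 and 3."""
    strong = etree.Element('strong')
    strong.text = m.group(2)
    em = etree.SubElement(strong, 'em')
    em.text = m.group(3)
    return strong


def build_simple(tag):
    """Return a builder that wraps group 2 in a single element."""
    def builder(m):
        el = etree.Element(tag)
        el.text = m.group(2)
        return el
    return builder


# Order matters: the most specific pattern is tried first.
PATTERNS = [
    (re.compile(EM_STRONG2_RE), build_strong_em),
    (re.compile(STRONG_RE), build_simple('strong')),
    (re.compile(EMPHASIS_RE), build_simple('em')),
]


def handle_asterisks(text, pos=0):
    """Try each pattern at `pos`; return (element, start, end) for the
    first one that matches, or (None, None, None) if none do."""
    for regex, builder in PATTERNS:
        m = regex.match(text, pos)
        if m:
            return builder(m), m.start(), m.end()
    return None, None, None
```

The return shape mirrors the (element, start, end) style of the newer inline processors, so a single registered step could own all of the asterisk handling while the ordering inside `PATTERNS` replaces the separate priorities.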
Okay, it seems like the best way to consolidate patterns, if we were to go that direction, would be to maybe take a more functional approach. Trigger off I may give this a shot as it may make a lot more sense and probably clean things up as well. |
I didn't notice this before, but it looks like we don't properly apply smart logic always to underscores:
To fix this, we'd need even more patterns, extras for underscore that employ smart logic. We may need to do this in a better way...or just add more patterns that might conflict with plugins. |
Interesting edge case. For the record, we only apply "smart logic" to single underscores, not double or triple. Presumably the thinking is that single underscores are common enough in plain text that we should special case them, but double and triple underscores are not common enough and we always treat them the same as asterisks. Of course, the specific edge case above is handled by a double/triple regex, so it doesn't follow the "smart" underscore logic. And in the end, I'm not too worried about it, as it only behaves that way because there are double and triple underscores with it. The point is that we only use the "smart underscores" when we are dealing with all single underscore text. I can't say for certain if that was a conscious decision or an accident that it turned out that way. I suppose a way to determine if it was intended behavior would be to find a test for it. If no such test exists, then it was an unforeseen side effect. Either way, I'm okay with it staying the way it is. |
I've actually worked out the logistics and have implemented a prototype over in the BetterEm extension: https://github.com/facelessuser/pymdown-extensions/pull/516/files#diff-81791fadc501f2d0d876f760472c81b4R98. Essentially, a similar inline processor will be used here: one for asterisks and one for underscores. All of the logic will be consolidated into a single processor for each. |
This issue was fixed by #805 |
Some of the recent releases of Python-Markdown, along with some of the recent fixes merged in from release-3.0.x, resulted in some regressions in the expected Markdown rendering results in unit tests. These regressions are, for the most part, not something users will need to worry about. Some of the regressions were in the XSS unit tests. These aren't security regressions, just expected string regressions. Python Markdown 3.0 changed some of the parsing, supporting `"title"` in link URLs, and these captured strings in JavaScript code, turning them into `title=` attributes. Tests were updated for the new strings. Python Markdown 3.2 added `<code>` elements to each line in a code block, which new tests from release-3.0.x weren't accounting for. Since we support 3.1 and 3.3+ (depending on the Python version), these tests now check for the right string for the right version. And finally, the one user-facing change is that nested bold/italic (in the form of `*this is **a** test*`) renders entirely as a series of italic blocks on Python Markdown 3.2. The working alternative is `*this is __a__ test*`. Tests were updated for this as well. Unfortunately, there doesn't seem to be a way to restore prior formatting, but hopefully this doesn't affect many people. See Python-Markdown/markdown#792 for the bold/italic changes in Python Markdown. Testing Done: Unit tests pass on all supported versions of Python, with all supported versions of Python Markdown. Reviewed at https://reviews.reviewboard.org/r/11611/
I think I'm running up against another bold/italics bug. I did some quick searches and it looks like the other issues were considered resolved, sorry if I'm re-reporting on something already fixed that hasn't made it upstream yet.
Installed from pip current Python-Markdown version 3.0.1
The raw markdown line that breaks is:
The output I'm getting is as follows:
However, the following format does seem to work correctly.
The output is