Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SmartyPants: Apostrophes at the start of leading contractions #1305

Closed
feasgal opened this issue Nov 16, 2022 · 7 comments · Fixed by #1348
Closed

SmartyPants: Apostrophes at the start of leading contractions #1305

feasgal opened this issue Nov 16, 2022 · 7 comments · Fixed by #1348
Labels
bug Bug report. confirmed Confirmed bug report or approved feature request. extension Related to one or more of the included extensions.

Comments

@feasgal
Copy link

feasgal commented Nov 16, 2022

As mentioned in the original project here, "SmartyPants will turn the apostrophe into an opening single-quote, when in fact it should be a closing one."

This is specifically presenting an issue for me when smarty collides with abbr. For instance, this code:

Each section of the editor controls an access group's ability to view, and actions within, the listed pages.
Select the checkboxes to allow the appropriate actions.

*[access group]: Access groups allow admin users to<br>assign custom permissions to all<br>users assigned to that group.

is rendering like this:

image

because when access group gets rendered by abbr, the ' is no longer recognized as being mid-word and instead renders as a leading single quote for the possessive s.

I've implemented a stopgap via javascript, but there are a lot of different contraction situations and it would be great to have a real fix.

@mitya57
Copy link
Collaborator

mitya57 commented Nov 17, 2022

Yes, unfortunately smarty does not play well with any markup. I would suggest to include everything, including 's, into abbr, that should help.

@facelessuser
Copy link
Collaborator

Oh, I had not originally noticed that the contraction was split across multiple HTML elements when this was brought up in the MkDocs Material issues. Yes, in order for smarty to deduce that a quote is in a contraction, it must be contained within the same element. It's just the way the Python Markdown parser works.

@waylan
Copy link
Member

waylan commented Nov 17, 2022

I was curious what the reference implementation did so I checked with the filter set to "SmartyPants". The following input:

an <abbr title="...">access group</abbr>'s ability

Results in the correct output:

an <abbr title="...">access group</abbr>’s ability

So this is clearly a bug. However, this is a limitation of building SmartyPants into the Markdown parser directly. The text content of each element is processed separately and we are not always able to take into consideration the larger context.

It occurs to me that perhaps we could special case a quote at the start of a "tail" of an inline element. Although, I don't recall off-hand if all of that info is available or not. But, more importantly, I wonder if this would introduce errors in other edge cases.

@waylan waylan added bug Bug report. extension Related to one or more of the included extensions. confirmed Confirmed bug report or approved feature request. labels Nov 17, 2022
@waylan
Copy link
Member

waylan commented Nov 17, 2022

So I got curios and checked what the reference implementation does with this:

an <abbr title="...">access group </abbr>'s ability

Note that I added a space as the last character of the content of the abbr tag. One would expect that this would cause the quote to not get curled, or to get curled to the right. However, it is curled to the left just like the previous example. This suggests that the reference implementation is not very smart about this scenario and assumes that the first character of the tail of an inline element does not have whitespace to the left of it. That seems like it is perhaps an assumption we could make as well.

@ferdnyc
Copy link
Contributor

ferdnyc commented May 3, 2023

@waylan

This suggests that the reference implementation is not very smart about this scenario and assumes that the first character of the tail of an inline element does not have whitespace to the left of it. That seems like it is perhaps an assumption we could make as well.

I think actually it's (at least trying) to be particularly smart about 's, specifically. Because, while you get the correct apostrophe from all of these:

That's the wrong apostrophe
<b>That</b>'s the wrong apostrophe
That<b>'s</b> the wrong apostrophe

And both of these:

I <b>don't</b> want wrong apostrophes
<i>That'd</i> be a good apostrophe

all of these result in a wrong-way apostrophe:

I <b>don</b>'t want wrong apostrophes
I <b>don </b>'t want wrong apostrophes
<i>That</i>'d be a good apostrophe
That<i>'d</i> be a good apostrophe

It seems 's at the start of an element has been special-cased. (Which probably makes sense, even if it creates inconsistency. 's cases are crazy common, and I can't imagine a situation where you don't want to end up with a close-quote apostrophe even in <tag>foo</tag>'s or <tag>foo </tag>'s cases.)

Edit: (In fact, that special-casing is mentioned as having "always" been there, in the SmartyPants 1.3 blurb from its version history.)

@waylan
Copy link
Member

waylan commented May 3, 2023

@ferdnyc thanks for taking a closer look as this. The link to the reference implementation's change log is especially helpful. Apparently, it has always has a special case for 's. It's interesting that 't and 'd do not behave the same way, which verifies that 's is special cased. Seems reasonable that we should include the same special case. A PR is welcome.

mitya57 added a commit to mitya57/markdown that referenced this issue May 17, 2023
When 's is not preceded by anything, it probably means that it comes
after some HTML tag, so it should be converted to a closing quote.

The reference Perl implementation makes the close_class optional and
adds a lookahead check for (\s|s\b) when close_class was not matched.

Let's copy that behavior by removing closeClass lookbehind check from
closingSingleQuotesRegex2.

Fixes Python-Markdown#1305.
mitya57 added a commit to mitya57/markdown that referenced this issue May 17, 2023
When 's is not preceded by anything, it probably means that it comes
after some HTML tag, so it should be converted to a closing quote.

The reference Perl implementation makes the close_class optional and
adds a lookahead check for (\s|s\b) when close_class was not matched.

Let's copy that behavior by removing closeClass lookbehind check from
closingSingleQuotesRegex2.

Fixes Python-Markdown#1305.
@mitya57
Copy link
Collaborator

mitya57 commented May 17, 2023

I have created PR #1348 which fixes this issue. I hope I didn't break any other cases, but ideas what to add to tests are welcome.

waylan pushed a commit that referenced this issue May 22, 2023
When 's is not preceded by anything, it probably means that it comes
after some HTML tag, so it should be converted to a closing quote.

The reference Perl implementation makes the close_class optional and
adds a lookahead check for (\s|s\b) when close_class was not matched.

Let's copy that behavior by removing closeClass lookbehind check from
closingSingleQuotesRegex2.

Fixes #1305.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bug report. confirmed Confirmed bug report or approved feature request. extension Related to one or more of the included extensions.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants