Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SmartyPants is very slow #1103

Open
kevinrenskers opened this issue Feb 6, 2021 · 6 comments
Open

SmartyPants is very slow #1103

kevinrenskers opened this issue Feb 6, 2021 · 6 comments
Labels
extension Related to one or more of the included extensions. feature Feature request. someday-maybe Approved low priority request.

Comments

@kevinrenskers
Copy link

When I am parsing 64 Markdown articles with Python-Markdown, it was taking a little bit over 4 seconds, 3.1 of which was from the SmartyPants extension. It's so much slower than any other extension. Not sure if this is known or not, if you consider it a bug, but wanted to bring it to your attention and ask if this is expected.

@waylan
Copy link
Member

waylan commented Feb 6, 2021

This does not surprise me. I always objected to Smartypants being a Markdown extension instead of an external post processor. It was never designed to be performant. You have 2 options:

  1. Do the work yourself to improve it and submit a PR.
  2. Use an external post processor which runs on the output of Markdown.

@oprypin
Copy link
Contributor

oprypin commented Mar 31, 2021

Can you provide an example site on which it is slow?

@kevinrenskers
Copy link
Author

No, I ended up removing Python-Markdown because of this problem. But it should be rather trivial to reproduce: try to parse a substantial (2000 word) Markdown document 60 times, and time it with and without using the SmartyPants extension.

@oprypin
Copy link
Contributor

oprypin commented Apr 4, 2021

I have looked into this, and the implementation is indeed fundamentally slow. Some small things can be done, but no real solution in sight.

  1. It takes the whole paragraph, splits it into 3 parts (before, substitution, after), then reconstructs it once per each substitution.
    return "{}{}{}".format(data[:start],
    placeholder, data[end:]), True, 0
  2. After each substitution happens, it goes back to the beginning of the string and tries to apply each of the regexes again. So basically each regex (of which there are around 20, both from Smarty and the core ones) is executed once per substitution.
    for match in pattern.getCompiledRegExp().finditer(data, startIndex):
    node, start, end = pattern.handleMatch(match, data)

Neither of the two is the clear bigger bottleneck.

This behavior is basically how full extensibility via plugins is ensured, and trying to rework or even just trying to optimize the lookbacks risks subtly breaking any of the downstream plugins.


But all I've said so far applies generally to all InlineProcessors (of which there are very many for various core features).
The reason that Smarty is particularly affected by this, then, is two-fold:

  1. Probably just prone to a lot of small replacements? That's minor, though.
  2. Its regular expressions seem to be very numerous and very heavy on the engine.

So, the small things that can be done:

  1. Just plain regex optimization.
  2. Maybe reducing the number of regexes by joining them all into one
    • I have actually tried this and it did not help much, though.
  3. Various micro-optimizations that don't change any logic.

@mitya57
Copy link
Collaborator

mitya57 commented Apr 6, 2021

I am the original author of this extension, but unfortunately I don’t have enough time to dig into this. So the only thing I can say is that PRs are welcome.

@waylan waylan added extension Related to one or more of the included extensions. feature Feature request. someday-maybe Approved low priority request. labels Nov 3, 2021
@raylutz
Copy link

raylutz commented Sep 17, 2022

I find that markdown.markdown (md to html conversion) is bogged down due to embedded html with < or > enclosing it. It seems the algorithm is slow because it does character by character parsing rather than perhaps looking for '<' and '>' in regex and then marking the location in the file so it can be parsed and apparently removed and then reinserted. I am considering working around this issue by inserting the html after the conversion. Not sure if this has anything to do with "SmartyPants", I am speaking about markdown.markdown().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extension Related to one or more of the included extensions. feature Feature request. someday-maybe Approved low priority request.
Projects
None yet
Development

No branches or pull requests

5 participants