Placeholders are not replaced #222

TomLottermann · 2013-06-04T08:52:57Z

Hi!
I have an issue with a plugin I wrote for python-markdown. When I store a placeholder in the htmlStash, it is not replaced afterwards. I would assume the problem is on my side, but it worked in version 2.2.1 of python-markdown.

Here is the code of my extension:

from markdown import Extension
from markdown.preprocessors import Preprocessor
import re

import logging

# Global vars

# this regex looks for characters before or after the $$ $$ wrapping. If there is one: it is an inline formula!
MATHJAX_PATTERN_RE = re.compile( \
    r'(?P<start_char>\S)?(?P<start_whitespace>[^\S\r\n]*)(?P<fence>\${2,})\s*(?P<formula>.*?)\s*(?P=fence)(?P<end_whitespace>[^\S\r\n]*)(?P<end_char>\S)?',
    re.MULTILINE|re.DOTALL
    )

CLEAN_MULTILINE_WRAP = '$$$$%s$$$$'

CLEAN_INLINE_WRAP = '$$$%s$$$'

logger = logging.getLogger(__name__)

class MathJaxExtension(Extension):

    def extendMarkdown(self, md, md_globals):
        """ Add MathJaxPreprocessor to the Markdown instance. """
        md.registerExtension(self)

        md.preprocessors.add('mathjax_block',
                                 MathJaxPreprocessor(md),
                                 "_begin")


class MathJaxPreprocessor(Preprocessor):

    def run(self, lines):
        """ Match and store Fenced Formula Blocks in the HtmlStash.
        Makes sure to make a difference between inline and multiline formulas. """
        text = "\n".join(lines)
        while 1:
            m = MATHJAX_PATTERN_RE.search(text)
            if m:
                formula = self._escape(m.group('formula'))

                start_char = m.group('start_char')
                start_whitespace = m.group('start_whitespace')
                end_whitespace = m.group('end_whitespace')
                end_char = m.group('end_char')

                if start_char is not None or end_char is not None:
                    wrapped_formula = CLEAN_INLINE_WRAP % (formula, )
                else:
                    wrapped_formula = CLEAN_MULTILINE_WRAP % (formula, )

                # Mark the formula as safe
                placeholder = self.markdown.htmlStash.store(wrapped_formula, safe=True)
                text = '%s%s%s%s%s%s%s'% (text[:m.start()], start_char or '', start_whitespace, placeholder, end_whitespace, end_char or '', text[m.end():])
            else:
                break

        return text.split("\n")

    def _escape(self, txt):
        """ basic html escaping """
        txt = txt.replace('&', '&amp;')
        txt = txt.replace('<', '&lt;')
        txt = txt.replace('>', '&gt;')
        txt = txt.replace('"', '&quot;')
        return txt


def makeExtension(configs=None):
    return MathJaxExtension(configs=configs)

The output of using it is the following:

>>> markdown('$$ \sqrt{4} $$', extensions=[mdx_mathjax.MathJaxExtension()])
u'<p>wzxhzdk:0</p>'

What could be the reason for that?

Cheers,
Thomas

mitya57 · 2013-06-15T11:57:36Z

This looks very similar to the problem I ran into while I was trying to implement SmartyPants extension (see comments in #12).

TomLottermann · 2013-06-15T12:33:46Z

@mitya57: I can't seem to find the answer to my question there.
Could the reason be that I pass a plain string to the htmlStash? What should I pass instead? And why did it work before?
The problem with my plugin is that I can't use inline patterns, because formulas can extend over multiple lines...

mitya57 · 2013-06-15T14:06:27Z

I haven't had time to debug my issue yet because of some exams, sorry. Let's wait for @waylan to reply :)

P.S. Inline patterns do work with multiple lines, see how it is implemented in pymarkups for example.

waylan · 2013-06-16T21:32:26Z

@TomLottermann the problem you are having is related to the change made @ 3a1806b. Whitespace normalization was moved to a preprocessor in that commit. As you are setting your preprocessor to be the first one (_begin), yours is running before whitespace normalization - which strips part of the placeholder out. \u0002wzxhzdk:0\u0003 becomes wzxhzdk:0 (\u0002 and \u0003 are whitespace after all) and so when placeholders are later swapped out, your placeholder fails to match.

The fix is to make sure your preprocessor runs after normalize_whitespace.
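
To make the ordering problem concrete, here is a minimal stand-alone sketch. It does not use python-markdown at all: `normalize_whitespace`, `stash_formula`, and `restore` are hypothetical stand-ins that only model the behaviour described above (the normalization step dropping \u0002 and \u0003, and placeholders being swapped back at the end).

```python
STX = "\u0002"  # marker that opens a placeholder
ETX = "\u0003"  # marker that closes a placeholder
PLACEHOLDER = STX + "wzxhzdk:0" + ETX
STASH = {PLACEHOLDER: "$$$$\\sqrt{4}$$$$"}  # placeholder -> stashed text


def normalize_whitespace(text):
    """Simplified model of the normalization step: drops STX/ETX, unifies newlines."""
    text = text.replace(STX, "").replace(ETX, "")
    return text.replace("\r\n", "\n").replace("\r", "\n")


def stash_formula(text):
    """Model of the extension: swap the formula for its placeholder."""
    return text.replace("$$ \\sqrt{4} $$", PLACEHOLDER)


def restore(text):
    """Model of the final step: swap placeholders back for the stashed text."""
    for placeholder, html in STASH.items():
        text = text.replace(placeholder, html)
    return text


source = "$$ \\sqrt{4} $$"

# Extension registered at "_begin": it runs before normalization.
stash_first = restore(normalize_whitespace(stash_formula(source)))

# Extension registered after normalize_whitespace.
normalize_first = restore(stash_formula(normalize_whitespace(source)))

print(repr(stash_first))      # the bare placeholder text leaks into the output
print(repr(normalize_first))  # the stashed formula is restored
```

In the first ordering the markers are gone by the time placeholders are restored, so nothing matches and `wzxhzdk:0` is left behind, which is the symptom reported at the top of this issue.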

As a side note, I would look again at using inline patterns. An inline pattern is matched against the entire text inside any block tag. So as long as your stuff never has a blank line in it (which would split it into multiple blocks), inline patterns should work. If we need to add the re.MULTILINE flag here, let me know and I'll look into it. If doing so breaks something else (I don't recall off hand), you could always subclass markdown.inlinepatterns.Pattern and override the default behavior for your pattern only.
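
A rough stdlib-only sketch of that point, under some assumptions: blocks are modelled by splitting on blank lines, and the pattern is anchored and compiled with `re.DOTALL`, which is how inline patterns are compiled as far as I know. `INLINE_RE` and `find_formulas` are illustrative names, not part of the library.

```python
import re

# Hypothetical inline-style pattern: anchored to the whole block and compiled
# with re.DOTALL so that "." also matches newlines inside one block.
INLINE_RE = re.compile(
    r"^(.*?)(?P<fence>\${2,})\s*(?P<formula>.*?)\s*(?P=fence)(.*)$",
    re.DOTALL,
)


def find_formulas(text):
    """Split text into blocks on blank lines, then match each block separately."""
    found = []
    for block in re.split(r"\n\s*\n", text):
        match = INLINE_RE.match(block)
        if match:
            found.append(match.group("formula"))
    return found


# The formula spans two lines but stays inside a single block.
same_block = "Energy: $$ E =\nmc^2 $$ done"

# A blank line splits the formula across two blocks.
split_block = "Energy: $$ E =\n\nmc^2 $$ done"

print(find_formulas(same_block))
print(find_formulas(split_block))
```

The multi-line formula is found in the first case and missed in the second, because neither block on its own contains both fences.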

As there is no bug in markdown here, I'm closing this issue.

serser · 2017-09-13T03:18:23Z

@waylan, how can I ensure the preprocessor runs after normalize_whitespace? What are the options when we register an extension?

md.preprocessors.add('toc',
                      TocPreprocessor(md),
                      '_begin')

serser · 2017-09-13T03:27:05Z

I did it with this:

md.preprocessors.add('toc',
                      TocPreprocessor(md),
                      '>normalize_whitespace')

However, my headings are rendered as <p> rather than <h1>. I'm going to rethink how to handle this.

Would be good to upgrade much further, but there are various regressions, mostly with markdown internal placeholders showing up in the output, which is no good. This starts at 2.3 and perhaps is because of our own extensions needing update, but also I think due to handling invalid markdown (e.g. markdown within html) differently -- but I don't want that to break. More info at Python-Markdown/markdown#458 and Python-Markdown/markdown#222

nicbou · 2023-02-13T18:08:20Z

I want to confirm that this is still valid in 2022. Thank you for saving me a lot of debugging! However, ">normalize_whitespace" no longer works. I used a hard-coded "29", which if my math is correct, is one below "30".

https://python-markdown.github.io/extensions/api/#registry.register

waylan · 2023-02-13T18:14:17Z

@nicbou while the API for registering processors is different, the basic concept still applies, yes.
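
For readers on Python-Markdown 3.x, a minimal sketch of the priority-based registration mentioned above. It assumes the `markdown` package is installed and that `normalize_whitespace` is registered with priority 30, as stated in the previous comment. `StashDemoPreprocessor`, `StashDemoExtension`, and the `@@demo@@` marker are made-up names for illustration only.

```python
import markdown
from markdown.extensions import Extension
from markdown.preprocessors import Preprocessor


class StashDemoPreprocessor(Preprocessor):
    """Hypothetical preprocessor: stashes every line that is exactly '@@demo@@'."""

    def run(self, lines):
        out = []
        for line in lines:
            if line.strip() == "@@demo@@":
                # store() returns a placeholder that is swapped back later.
                line = self.md.htmlStash.store("<div>stashed</div>")
            out.append(line)
        return out


class StashDemoExtension(Extension):
    def extendMarkdown(self, md):
        # Higher priority runs first. 29 is just below normalize_whitespace
        # (assumed to be 30), so this preprocessor runs after it.
        md.preprocessors.register(StashDemoPreprocessor(md), "stash_demo", 29)


html = markdown.markdown("@@demo@@", extensions=[StashDemoExtension()])
print(html)
```

With this ordering the stashed HTML should appear in the output and no `wzxhzdk` text should remain. Registering the same preprocessor with a priority above 30 would bring back the behaviour from the original report.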


waylan closed this as completed Jun 16, 2013

jamiemcg mentioned this issue Aug 6, 2016

code tag inside HTML tag jamiemcg/Remarkable#30

Open

ShikiOkasaka added a commit to esrille/ibus-hiragana that referenced this issue Jun 8, 2024

Update markdown processor

0a30c52

See Python-Markdown/markdown#222 docs_md/en/md2html.py docs_md/md2html.py
