Placeholders are not replaced #222

Closed
TomLottermann opened this issue Jun 4, 2013 · 8 comments
Labels
bug Bug report.

Comments

@TomLottermann

Hi!
I have an issue with a plugin I wrote for python-markdown. When I store text in the htmlStash, the placeholder it returns is not replaced afterwards. I would assume the problem is on my side, but the same code worked in version 2.2.1 of python-markdown.

Here is the code of my extension:

from markdown import Extension
from markdown.preprocessors import Preprocessor
import re

import logging

# Global vars

# this regex looks for characters before or after the $$ $$ wrapping. If there is one: it is an inline formula!
MATHJAX_PATTERN_RE = re.compile( \
    r'(?P<start_char>\S)?(?P<start_whitespace>[^\S\r\n]*)(?P<fence>\${2,})\s*(?P<formula>.*?)\s*(?P=fence)(?P<end_whitespace>[^\S\r\n]*)(?P<end_char>\S)?',
    re.MULTILINE|re.DOTALL
    )

CLEAN_MULTILINE_WRAP = '$$$$%s$$$$'

CLEAN_INLINE_WRAP = '$$$%s$$$'

logger = logging.getLogger(__name__)

class MathJaxExtension(Extension):

    def extendMarkdown(self, md, md_globals):
        """ Add MathJaxPreprocessor to the Markdown instance. """
        md.registerExtension(self)

        md.preprocessors.add('mathjax_block',
                                 MathJaxPreprocessor(md),
                                 "_begin")


class MathJaxPreprocessor(Preprocessor):

    def run(self, lines):
        """ Match and store Fenced Formula Blocks in the HtmlStash.
        Makes sure to make a difference between inline and multiline formulas. """
        text = "\n".join(lines)
        while 1:
            m = MATHJAX_PATTERN_RE.search(text)
            if m:
                formula = self._escape(m.group('formula'))

                start_char = m.group('start_char')
                start_whitespace = m.group('start_whitespace')
                end_whitespace = m.group('end_whitespace')
                end_char = m.group('end_char')

                if start_char is not None or end_char is not None:
                    wrapped_formula = CLEAN_INLINE_WRAP % (formula, )
                else:
                    wrapped_formula = CLEAN_MULTILINE_WRAP % (formula, )

                # Mark formula as safe
                placeholder = self.markdown.htmlStash.store(wrapped_formula, safe=True)
                text = '%s%s%s%s%s%s%s'% (text[:m.start()], start_char or '', start_whitespace, placeholder, end_whitespace, end_char or '', text[m.end():])
            else:
                break

        return text.split("\n")

    def _escape(self, txt):
        """ basic html escaping """
        txt = txt.replace('&', '&amp;')
        txt = txt.replace('<', '&lt;')
        txt = txt.replace('>', '&gt;')
        txt = txt.replace('"', '&quot;')
        return txt


def makeExtension(configs=None):
    return MathJaxExtension(configs=configs)

The output of using it is the following:

>>> markdown('$$ \sqrt{4} $$', extensions=[mdx_mathjax.MathJaxExtension()])
u'<p>wzxhzdk:0</p>'

What could be the reason for that?

Cheers,
Thomas

@mitya57
Collaborator

mitya57 commented Jun 15, 2013

This looks very similar to the problem I ran into while I was trying to implement SmartyPants extension (see comments in #12).

@TomLottermann
Author

@mitya57: I can't seem to find the answer to my question there.
Could the reason be that I pass a plain string to the htmlStash? What should I pass instead? And why did it work before?
The problem with my plugin is that I can't use inline patterns, because formulas can extend over multiple lines...

@mitya57
Collaborator

mitya57 commented Jun 15, 2013

I didn't yet have time to debug my issue because of some exams, sorry — let's wait on @waylan to reply :)

P.S. Inline patterns do work with multiple lines, see how it is implemented in pymarkups for example.

@waylan
Member

waylan commented Jun 16, 2013

@TomLottermann the problem you are having is related to the change made @ 3a1806b. Whitespace normalization was moved to a preprocessor in that commit. As you are setting your preprocessor to be the first one (_begin), yours is running before whitespace normalization - which strips part of the placeholder out. \u0002wzxhzdk:0\u0003 becomes wzxhzdk:0 (\u0002 and \u0003 are whitespace after all) and so when placeholders are later swapped out, your placeholder fails to match.

The fix is to make sure your preprocessor runs after normalize_whitespace.
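The ordering problem can be reproduced without the library. The following is a minimal sketch, not Python-Markdown's actual code: `ToyStash`, `normalize_whitespace`, `stash_formulas`, `restore`, and `render` are hypothetical stand-ins, and the normalization step is assumed to do nothing except drop the \u0002 and \u0003 markers described above.

```python
import re

STX = "\u0002"  # marker that opens a placeholder
ETX = "\u0003"  # marker that closes a placeholder
PLACEHOLDER = STX + "wzxhzdk:%d" + ETX


class ToyStash:
    """Hypothetical stand-in for the htmlStash: keeps raw text, hands out placeholders."""

    def __init__(self):
        self.blocks = []

    def store(self, html):
        self.blocks.append(html)
        return PLACEHOLDER % (len(self.blocks) - 1)


def normalize_whitespace(text):
    # Assumption for this sketch: normalization removes the STX/ETX markers.
    return text.replace(STX, "").replace(ETX, "")


def stash_formulas(text, stash):
    # Swap every $$...$$ span for a placeholder.
    return re.sub(r"\$\$.*?\$\$", lambda m: stash.store(m.group(0)), text, flags=re.DOTALL)


def restore(text, stash):
    # Swap each complete placeholder back for the stored text.
    for index, html in enumerate(stash.blocks):
        text = text.replace(PLACEHOLDER % index, html)
    return text


def render(text, stash_first):
    stash = ToyStash()
    if stash_first:
        # Order from the report: stash at "_begin", normalize afterwards.
        text = normalize_whitespace(stash_formulas(text, stash))
    else:
        # Suggested order: normalize first, stash afterwards.
        text = stash_formulas(normalize_whitespace(text), stash)
    return restore(text, stash)


broken = render("$$ x $$", stash_first=True)
fixed = render("$$ x $$", stash_first=False)
```

With the stash running first, the markers are gone by the time `restore` looks for them, so the bare `wzxhzdk:0` text leaks into the output; with normalization first, the placeholder survives intact and is swapped back.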

As a side note, I would look again at using inline patterns. An inline pattern is matched against the entire text inside any block tag. So as long as your stuff never has a blank line in it (which would split it into multiple blocks), inline patterns should work. If we need to add the re.MULTILINE flag here, let me know and I'll look into it. If doing so breaks something else (I don't recall off hand), you could always subclass markdown.inlinepatterns.Pattern and override the default behavior for your pattern only.
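To illustrate the point about blocks, here is a rough sketch using only the standard `re` module. It is not how Python-Markdown applies inline patterns internally; `split_blocks`, `find_formulas`, and `INLINE_MATH` are hypothetical names, and blocks are assumed to be separated simply by blank lines.

```python
import re

# re.DOTALL lets "." cross single newlines inside one block.
INLINE_MATH = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)


def split_blocks(text):
    # Assumption for this sketch: a blank line ends a block.
    return [block for block in re.split(r"\n\s*\n", text.strip()) if block]


def find_formulas(text):
    # Match the pattern against each block separately, as described above.
    return [
        match.group(1).strip()
        for block in split_blocks(text)
        for match in INLINE_MATH.finditer(block)
    ]


one_block = "Energy: $$ E =\n m c^2 $$ done."
two_blocks = "Broken: $$ E =\n\n m c^2 $$ done."

found_in_one_block = find_formulas(one_block)
found_in_two_blocks = find_formulas(two_blocks)
```

The formula that only wraps across a single newline is found, while the one containing a blank line is split across two blocks and never matches.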

As there is no bug in markdown here, I'm closing this issue.

@serser

serser commented Sep 13, 2017

@waylan, how can I ensure the preprocessor runs after normalize_whitespace? What are the options when registering an extension?

md.preprocessors.add('toc',
                      TocPreprocessor(md),
                      '_begin')

@serser

serser commented Sep 13, 2017

I did it with this:

md.preprocessors.add('toc',
                      TocPreprocessor(md),
                      '>normalize_whitespace')

However, my headings are rendered as <p> rather than <h1>. I'm going to rethink how to handle this.

asfgit pushed a commit to apache/allura that referenced this issue Oct 12, 2017
Would be good to upgrade much further, but there are various regressions,
mostly with markdown internal placeholders showing up in the output, which is
no good.  This starts at 2.3 and perhaps is because of our own extensions
needing update, but also I think due to handling invalid markdown (e.g.
markdown within html) differently -- but I don't want that to break.  More info
at Python-Markdown/markdown#458 and
Python-Markdown/markdown#222
@nicbou
Contributor

nicbou commented Feb 13, 2023

I want to confirm that this is still valid in 2022. Thank you for saving me a lot of debugging! However, ">normalize_whitespace" no longer works. I used a hard-coded "29", which if my math is correct, is one below "30".

https://python-markdown.github.io/extensions/api/#registry.register
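Why "29" works can be seen with a toy model of a priority-ordered registry. This is a sketch under stated assumptions, not the library's implementation: `ToyRegistry` is hypothetical, items are assumed to run from highest to lowest priority, and the priority of 20 for `html_block` is an assumption added purely for illustration. Only the value 30 for `normalize_whitespace` comes from the comment above.

```python
class ToyRegistry:
    """Hypothetical registry: items with a higher priority run earlier."""

    def __init__(self):
        self._entries = []

    def register(self, item, name, priority):
        self._entries.append((priority, name, item))

    def ordered_names(self):
        # Sort on priority only, highest first.
        ranked = sorted(self._entries, key=lambda entry: entry[0], reverse=True)
        return [name for _priority, name, _item in ranked]


registry = ToyRegistry()
registry.register(object(), "normalize_whitespace", 30)
registry.register(object(), "html_block", 20)     # assumed priority, for illustration
registry.register(object(), "mathjax_block", 29)  # one below normalize_whitespace

order = registry.ordered_names()
```

In this model the custom preprocessor lands directly after `normalize_whitespace`, which is the ordering the fix calls for. In Python-Markdown 3.x the corresponding call would presumably be `md.preprocessors.register(item, name, priority)`, as described in the documentation linked above.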

@waylan
Member

waylan commented Feb 13, 2023

@nicbou while the API for registering processors is different, the basic concept still applies, yes.

ShikiOkasaka added a commit to esrille/ibus-hiragana that referenced this issue Jun 8, 2024
See Python-Markdown/markdown#222

docs_md/en/md2html.py
docs_md/md2html.py