Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CodeHilite: expand option support #334

Closed
mingos777 opened this issue Aug 21, 2014 · 14 comments
Closed

CodeHilite: expand option support #334

mingos777 opened this issue Aug 21, 2014 · 14 comments
Labels
extension Related to one or more of the included extensions. feature Feature request. someday-maybe Approved low priority request.

Comments

@mingos777
Copy link

I would like to use Python Markdown in conjunction with CodeHilite plugin + Pygments to highlight some PHP code snippets, often one-liners. Currently, CodeHilite doesn't allow any lexer configuration (only some formatter configuration).

Pygments' default PHP lexer has the option startinline, which is set to False by default. This means that if I want any PHP code highlighted, I must start it with <?php. This is not very pretty (again: think about one-liners) and I would rather have Pygments instructed to switch this setting to True.

Currently it is not possible to pass in any lexer options to CodeHilite. It would be nice to be able to add lexer configuration to the lexer name, e.g.:

:::php(startinline=True)
@waylan
Copy link
Member

waylan commented Aug 21, 2014

@mingos777, thanks for the report. I wasn't aware of Pygments behavior here.

I suppose Pygments is expecting (X)HTML with <?php ...> interspersed within it. The startinline (strange name for it btw) keyword tells it we have PHP code only. Is this keyword used by any other lexers?

My recollection is that in the past Pygments had multiple lexers for PHP, one which was for PHP only and another for PHP embedded in (X)HTML. That way, you could just specify the different lexer - no need to configure it. But is has been awhile. Is that no longer the case?

I hate to add a confiig option that is only useful for a single lexer, but I can see how this could be useful. What does the PHP lexer do if startinline=False and (X)HTML with <?php ...> embedded in it? In other words, could we always have startinline=False for the PHP lexer? if so, could we pass that into all Lexers (without error even if they don't know about it)?

And for the record, I just checked and the behavior is the same with the fenced code extension as it is with CodeHilite on indented code blocks.

@mitya57
Copy link
Collaborator

mitya57 commented Aug 21, 2014

@waylan The PHP lexer is not the only Pygments lexer that supports configs, most other lexers do.

Some time ago I have written a patch for Docutils to add configs support, you can probably reuse that code

@mingos777
Copy link
Author

@waylan As far as I'm aware, only the (pure) PHP lexer uses this option. Options shared by all lexers are stripnl (strip leading/trailing newlines), stripall (trim the input string), ensurenl (make sure the input ends with a newline), tabsize (self-explanatory and IMO very useful) and encoding. Several lexers have an extra set of options - this is the case for PHP.

So no, setting startinline for all lexers makes little sense, as it's only used by one lexer (and will probably be used by a Hack lexer as soon as it's added, as Hack code starts with <?hh). I thought about such an option initially, but ultimately, I believe that a possibility to manually configure the lexer would be more flexible.

You are right about other PHP lexers - there are four or five more for mixing PHP with other languages, such as JavaScript or XML. I'm talking about the one for PHP only.

@waylan
Copy link
Member

waylan commented Aug 21, 2014

Wait, so the PHP-only Lexer requires either <?php ...> orstartinline=True` setting? That seems like a problem with Pygments that they should fix to me.

So let me try this again: What does the PHP lexer do if startinline=True and the code is wrapped in <?php ...>? In other words, if we special-cased the PHP lexer and passed in startinline=True even when the user didn't ask for it, will that still work?

The point is, I'm very averse to adding more bloat. I'm already unsure that we should have added anything except the lexer name to begin with. It is apparent that doing to has created a slippery slope.

That said, if we really much allow the user to pass in lexer options, an adaptation of @mitya57's Docutils patch seems like the way to go. However, we should never raise an error if a document author types a bad option key, as that could bring down the whole server in some common server setups. Not sure how that should work. After catching the error. do we run the same lexer without the options, or fall back to the TextLexer, or skip highlighting altogether for that block?

@mingos777
Copy link
Author

When using <?php AND setting startinline=True, the behaviour is NOT correct, I'm afraid. < and ? are interpreted as Operators and php becomes a Name.Other. Instead, <?php should be highlighted as a Comment.Preproc. So no, special casing the PHP lexer will fix one situation, but cause a misbehaviour in the other.

I do agree that more bloat is an unwanted addition. However, if using Pygments from within CodeHilite only allows a subset of functionalities of Pygments, feature request such as mine are pretty much inevitable.

Raising errors when bad lexer options are passed in is indeed a no-go. My personal preference would be to fall back to TextLexer because it would immediately indicate that there's an issue with the highlighter.

@waylan
Copy link
Member

waylan commented Aug 22, 2014

I'm still very resistant to adding any option support to the lexers. My knee-jerk reaction is to say this is a problem upstream... But I understand why Pygments is doing it that way.

What if we special-cased the PHP lexer so that when it is detected, we check for <?php at the start of the code block. If it is not found, then we set startinline=True. That way, it just does "the right thing." As this is the only lexer with that option, it shouldn't be too much of a maintenance headache. Perhaps something like (untested):

if lang == "php" and not src.strip().startswith('<?php'):
    lexer = get_lexer_by_name(lang)(startinline=True)
else:
    lexer = get_lexer_by_name(lang)

And to be clear, this would only apply when the language is explicitly defined - so it won't work when guess_lexer() is used.

I realize that this could guess wrong if src.strip().startswith('<?php') fails to detect the use of <?php. Or would this be a problem for people who are using the "php" lexer when they should be using the 'html+php' lexer? Gah! I think I just talked myself out of this. Any input?

@mingos777
Copy link
Author

@waylan This is not the way I imagined it, but I must admit that your solution not only solves the problem with the PHP lexer, but also introduces no new syntax to learn. I believe this is the best approach.

@waylan
Copy link
Member

waylan commented Aug 22, 2014

Or would this be a problem for people who are using the "php" lexer when they should be using the 'html+php' lexer?

I guess we can't force people to use the 'html+php' lexer. Perhaps they want the PHP code to stand out from the HTML and therefore do not want the HTML to be highlighted. Never mind it could break existing documents. For example the following code sample (which may or may not be valid PHP -- it's been a while) would trigger startinline=True with my proposed solution if the "php' lexer was used rather than 'html+php' and that is clearly not correct:

<p>Some HTML with some <?php print('PHP');></p>

The only options appear to be (1) offer general support to lexer options (as per @mitya57's Docutils patch), or (2) do nothing and require all PHP one-liners to include <?php. Personally, I'm leaning toward the later, but if I was writing documents with PHP one-liners I would absolutely insist on the former.

This is one of many reasons why I favor JavaScript highlighting libraries (see this, this and this - Warning! they are all a little outdated).

@mingos777
Copy link
Author

I can see your point (1), and I'm glad to see you understand mine (2).

Maybe it's possible to create an extended plugin for Python Markdown, based on CodeHilite, but with the addition of lexer options support? This way people already using the original plugin will not have an extra (unwanted/potentially dangerous?) feature enabled, but if they need it, they can switch to the new plugin (--extension=codehilite2)?

If you decide against adding any features, my best option will indeed be to use a JS highlighter. Not that I'm against it, but I've yet to see a code highlighter as good as Pygments :).

@waylan waylan added feature Feature request. someday-maybe Approved low priority request. labels Sep 26, 2014
@waylan waylan modified the milestone: Version 3.0 Mar 9, 2015
@waylan waylan changed the title CodeHilite plugin: lexer config CodeHilite: expand option support Mar 9, 2015
@waylan
Copy link
Member

waylan commented Mar 9, 2015

For the record, I have assigned this to the 3.0 milestone (see the roadmap in #391) and am using this as a general issue for CodehHlite. The intention is to expand/refactor the extension to accept any present/future config options of Pygments and simply pass them through. I'm not sure how that will look just yet, but suggests are certainly welcome. I've also updated the title to better reflect this change.

@anlutro
Copy link

anlutro commented Jul 21, 2015

I'm looking for a workaround for this. Is there a way to change the PHP lexer's default options?

@waylan
Copy link
Member

waylan commented Jul 21, 2015

@anlutro at this time the only workaround would be to fork/hack the existing Extension to meet your needs.

@anlutro
Copy link

anlutro commented Jul 21, 2015

Bummer :(

@anlutro
Copy link

anlutro commented Aug 10, 2015

This was a lot of work! I basically gave up on trying to extend existing classes and just copy-pasted the existing extension modules.

https://github.com/autarky/docs/blob/master/generate.py#L10-L13
https://github.com/autarky/docs/tree/master/md_extension
https://github.com/autarky/docs/blob/master/md_extension/codehilite.py#L107-L120

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extension Related to one or more of the included extensions. feature Feature request. someday-maybe Approved low priority request.
Projects
None yet
Development

No branches or pull requests

4 participants