Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor fenced code attributes #816

Merged
merged 26 commits into from
Jun 23, 2020
Merged

Refactor fenced code attributes #816

merged 26 commits into from
Jun 23, 2020

Conversation

waylan
Copy link
Member

@waylan waylan commented Apr 12, 2019

Addresses #775. A work in progress.

The behavior as of the first commit is that a single string (with an optional dot as first character), defines the language of the block:

With dot:

``` .lang

Without dot:

``` lang

If more than one thing is to be defined, you must use an attribute list (wrapped in curly brackets ({}). The first class (startswith a dot) is the language. All other classes are simply set on the rendered code element. The hash tag is the id and all key=value pairs are codehilite config values. You cannot define HTML attributes with key=value pairs.

``` {.lang .otherclass #id hl_lines='1 3'}

You can even set other config values that normally are global (per document) on a per block basis:

```{.lang linenums=true}

Although, actually that might not work (untested at this point) as we don't (yet) convert the string to a Boolean.

However, we have a backward incompatibility in that the existing tests include the following 2 cases where a key=value pair is used, but not in an attribute list:

 ```hl_lines="1 3"

and

 ```.python hl_lines="1 3"

With these changes, those will no longer be valid. In fact, the two tests are failing. This better matches the behavior of the PHP implementation we claim to copy. But it could break existing documents. We can't even claim it worked by accident as we explicitly tested for it.

@waylan waylan added the work-in-progress A partial solution. More changes will be coming. label Apr 12, 2019
@waylan
Copy link
Member Author

waylan commented Dec 30, 2019

I resolved conflicts and added back in support for hl_lines outside of an attr_list for backward-compatibility.

@waylan
Copy link
Member Author

waylan commented Dec 31, 2019

Got as far as I can with this without also modifying the codehilite extension, but that will be addressed in #822.

This still needs documented.

@waylan
Copy link
Member Author

waylan commented Dec 31, 2019

I'm using this for both fenced_code changes and codehilite changes. In other words this replaces/overrides #822. In the latest commit, I've made the changes to codehilite and updated the existing tests. However, I need to add more tests for the new behavior and document everything.

@waylan
Copy link
Member Author

waylan commented Jun 8, 2020

I need to confirm if {class="some string of classes"} works correctly. It does for attr_list so it should here.

@facelessuser
Copy link
Collaborator

Seems the answer is in your own comment:

The hash tag is the id and all key=value pairs are codehilite config values. You cannot define HTML attributes with key=value pairs.

@waylan
Copy link
Member Author

waylan commented Jun 8, 2020

@facelessuser you make a good point. I was thinking that this should reflect the behavior we find in attr_lists, but if key=value pairs are not used for HTML attributes, then that doesn't make sense. So, never mind.

@waylan
Copy link
Member Author

waylan commented Jun 8, 2020

From @wlupton in #977:

I guess hl_lines is special.

Yes, but only for backward compatibility. Originally, only the language was supported. At some point we added support for hl_lines. Now that we are adding support for all of pygments options, hl_lines does not make sense in its current form. So as not to break existing documents, we continue to support it outside of brackets. Of course, you can define the language and hl_lines within brackets like anything else. Therefore, we may deprecate support outside of brackets in the future.

Is there any way that the attribute handling could be more common across extensions, e.g., I think that pandoc attributes are the same wherever they're supported (and one of the places where they're supported is fenced code blocks). From https://pandoc.org/MANUAL.html:

Optionally, you may attach attributes to fenced or backtick code block using this syntax:

~~~~ {#mycode .haskell .numberLines startFrom="100"}
qsort []     = []
qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++
               qsort (filter (>= x) xs)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here mycode is an identifier, haskell and numberLines are classes,
and startFrom is an attribute with value 100. Some output formats can
use this information to do syntax highlighting.

[...] the code block above will appear as follows:

<pre id="mycode" class="haskell numberLines" startFrom="100">
  <code>
  ...
  </code>
</pre>

In fact (and this is probably the last thing you want to hear!) is there any way that the fenced code block extension could depend on the attr_list extension for processing its attribute lists?

Actually the list parser in the attr_list extension is imported and used by fenced-code as of this PR. However, it is only used for anything defined within the brackets. But as only a single-word language and hl_lines are supported outside of brackets (as explained above), for all intents and purposes, it is an attr_list.

However, it does differ from an attr_list in at least one way. As the key=value pairs are used as Pygments options, it does not make sense to include them in the syntax highlighted output. The exception being classes, which can be defined with keys. We need a way to differentiate between values that should be passed to Pygments, and values which should be assigned as attributes. Additionally, Pygments only provides a way to assign classes. Therefore, as many .classnames as are defined will get added to the class attribute. IDs get ignored because Pygments doesn't provide that option. Everything else is a Pygments option which effects the output, but doesn't get included in the output.

Now, that behavior is dependent on syntax highlighting being enabled (codehilite extension is enabled). Without syntax highlighting, then the ID gets set on the pre tag and class gets set on the code tag. I suppose we could also assign key=value pairs. However, in its current state, that doesn't happen. key=value pairs are simply ignored. This was done to match PHP's behavior of only allowing class and id to be set. After all, we claim to imitate that implementation. It also happens to match GitHub's implementation.

However, an argument could be made that including random attributes in the output would allow JavaScript highlighting libs to make use of them. And we have Pandoc as another implementation which already supports that. We also have received bug reports from time to time from users who expected any attribute to work if/when the attr_list is enabled. I suppose we could change behavior based on that by only assigning id and class by default, but assigning all defined attributes if attr_list is also enabled (and code highlighting is off).

@waylan
Copy link
Member Author

waylan commented Jun 8, 2020

We need a way to differentiate between values that should be passed to Pygments, and values which should be assigned as attributes.

I should point out that I considered hard-coding the options that Pygments accepts and passing only those to Pygments, but setting everything else as attributes in the HTML. However, there are a few issues with that:

  1. Pygments doesn't provide an option to include random attributes in its output anyway.
  2. Pygments adds new options from time to time and I don't want to create a situation where I need to update my list of options every them they add a new one.

Given the above, it makes more sense to simply pass everything blindly to Pygments. That way we automatically support any new pygments feature and still don't loose any functionality.

of course, all of that is dependent on code highlighting being turned on. When it's off, we are open to considering other options.

@waylan waylan force-pushed the fence branch 2 times, most recently from bb85631 to 86d547c Compare June 9, 2020 16:01
@waylan
Copy link
Member Author

waylan commented Jun 9, 2020

I just added support for attr_list, which follows the following rules:

  1. All key/value pairs must be defined within curly brackets (except for hl_lines which may be deprecated in the future).
  2. If code highlighting is enabled, the attr_list extension is ignored. All key/value pairs are passed to Pygments as config options.
  3. If code highlighting is disabled and the attr_list extension is enabled, all key/value pairs (except use_pygments; see item 7, below) are assigned as HTML attributes on the code tag.
  4. If both code highlighting and the attr_list extension are disabled, all key/value pairs are ignored. Only #id and .class are used (with or without curry brackets).
  5. You can disable code highlighting in multiple ways:
    1. Do not have Pygments installed.
    2. Set use_pygments=False on the extension config (globally).
    3. Use { use_pygments=False } on an individual code block.
  6. Even if code highlighting is disabled globally (as described in 5.i or 5.ii above), setting { use_pygments=True } on an individual code block will cause key/value pairs to not be included at HTML attributes. It is assumed that they are Pygments config options.
  7. The use_pygments key is always removed from the list after its value is evaluated and is therefore never included in the output.

@facelessuser
Copy link
Collaborator

(except for hl_lines which may be deprecated in the future).

Can you elaborate on this a bit more?

@waylan
Copy link
Member Author

waylan commented Jun 9, 2020

(except for hl_lines which may be deprecated in the future).

Can you elaborate on this a bit more?

@facelessuser, as I explained above:

Yes, but only for backward compatibility. Originally, only the language was supported. At some point we added support for hl_lines. Now that we are adding support for all of pygments options, hl_lines does not make sense in its current form. So as not to break existing documents, we continue to support it outside of brackets. Of course, you can define the language and hl_lines within brackets like anything else. Therefore, we may deprecate support outside of brackets in the future.

Perhaps I should clarify by pointing out that if brackets are being used, then everything must be defined within them. When I say "outside of brackets" I mean no brackets are being used at all. In other words, at this time the following are both valid:

``` .lang hl_lines=3
``` { .lang hl_lines=3 }

In the future the first of those may be deprecated for consistency. In fact, I expect to not document that hl_lines can be defined outside of brackets. Users who already have documents which include it won't notice as their documents will continue to render the same. And everyone else won't even know its an option unless they read the source code and/or this discussion.

Finally, when I say it may be deprecated in the future, I am using "may" because I haven't decided for sure (perhaps "might" would have been a better word choice). If it does happen, it would first be 'pending deprecation' to give users an opportunity to update their documents. I'm not sure how that would work though, as we don't generally raise warnings while parsing a document. Warnings are generally only raised when initializing the Markdown class. And if it later becomes fully deprecated, that would mean the entire code block would not be recognized as one. Maybe some future pending deprecation would mean that hl_lines outside of brackets is simply ignored. I guess I haven't decided, in part, because I haven't figured out how to go about it.

Regardless, at this time I only intend to exclude mentioning the option in the docs. Any additional steps towards deprecation would be in a future release if they happen at all.

@waylan
Copy link
Member Author

waylan commented Jun 11, 2020

I have finished updating the docs for fenced code. However, I just realized that I never expanded the codehilite config options. I only expanded the internal API which fenced_code calls. We need to expand the global options as well.

@wlupton
Copy link

wlupton commented Jun 12, 2020

Sorry for the delayed response, but re 2 above:

  1. If code highlighting is enabled, the attr_list extension is ignored. All key/value pairs are passed to Pygments as config options.

I wondered whether in this case it would be possible for Pygments to consume ("eat") the options that it understands, leaving the remainder to be available on the node, e.g., to be included as HTML attributes. This would seem to be quite a Pythonic approach (ref. Python's super() considered super!, which is referenced from the Python super() docs).

@waylan
Copy link
Member Author

waylan commented Jun 12, 2020

I wondered whether in this case it would be possible for Pygments to consume ("eat") the options that it understands, leaving the remainder to be available on the node, e.g., to be included as HTML attributes.

I have no idea how this would work. Our code has no knowledge of the arguments which Pygments accepts. We simply blindly pass all arguments on to Pygments. It just so happens that Pygments was designed so that it ignores arguments it doesn't understand (rather than raising an error).

Regardless, as I put it in the docs:

The fenced_code extension does not alter the output provided by Pygments. Therefore, only options which Pygments provides can be ultilized. As Pygments does not currently provide a way to define an ID, any ID defined in an attribute list will be ignored when syntax highlighting is enabled. Additionally, any key/value pairs which are not Pygments options will be ignored, regardless of whether the attr_list extension is enabled.

In other words, I am placing the blame squarely on Pygments. And I have no interest in modifying Pygments output. If someone wants to do that, they can create their own third-party extension.

If anyone is going to argue that the current behavior is inconsistent, then the only change I would consider is to only accept Pygments arguments and never accept anything else.

@wlupton
Copy link

wlupton commented Jun 13, 2020

Thanks. Makes sense. Re this:

It just so happens that Pygments was designed so that it ignores arguments it doesn't understand (rather than raising an error).

I have not looked at Pygments but at least in theory it could return the arguments that it hasn't consumed.

@waylan
Copy link
Member Author

waylan commented Jun 13, 2020

I have not looked at Pygments but at least in theory it could return the arguments that it hasn't consumed.

It does not. Perhaps a subclass could address that, but I'm not interested in going down that road.

@waylan
Copy link
Member Author

waylan commented Jun 14, 2020

I think this is done and ready to go.

@waylan waylan removed the work-in-progress A partial solution. More changes will be coming. label Jun 14, 2020
@waylan waylan force-pushed the fence branch 3 times, most recently from e30dbb5 to a09c0cc Compare June 23, 2020 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved The pull request is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support multiple classes for fenced code blocks
4 participants