Change $base to $self #185

Open · kylebarron wants to merge 6 commits into master

@kylebarron commented Jul 2, 2018

Using 'include': '$base' breaks LaTeX syntax highlighting when the LaTeX grammar is embedded in another grammar. In short (per the linked explanation):

$self points to the grammar $self appears in (points to itself), whereas $base points to the base language of the file, which could be anything.

This means that when language-latex is embedded within language-markdown, every time an 'include': '$base' occurs, Markdown highlighting is included instead of LaTeX highlighting. Ugliness ensues.
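To make the distinction concrete, here is a toy Python model of include resolution. It is a sketch under stated assumptions, not Atom's first-mate API: the registry, the `text.md` scope, and all rule names are hypothetical placeholders. It shows that a rule written in text.tex.latex that includes `$base` hands its contents to Markdown when the file is Markdown, while `$self` keeps them in LaTeX.

```python
# Toy model of how a TextMate-style grammar resolves 'include' targets.
# The registry, the "text.md" scope, and the rule names are hypothetical.
REGISTRY = {
    "text.md": ["md.html-block", "md.emphasis", "md.fenced-latex"],
    "text.tex.latex": ["latex.equation", "latex.itemize", "latex.emph"],
}


def resolve_include(target: str, defining_scope: str, base_scope: str) -> str:
    """Return the scope name that an 'include' target refers to."""
    if target == "$self":
        return defining_scope  # the grammar the rule is written in
    if target == "$base":
        return base_scope  # the grammar of the file being highlighted
    return target  # an explicit scope name such as 'text.tex'


def rules_inside(target: str, defining_scope: str, base_scope: str) -> list[str]:
    """Top-level rules offered inside a block whose only pattern is the include."""
    return REGISTRY[resolve_include(target, defining_scope, base_scope)]


# A LaTeX environment rule (defined in text.tex.latex) inside a Markdown file:
print(rules_inside("$base", "text.tex.latex", "text.md"))
# ['md.html-block', 'md.emphasis', 'md.fenced-latex']
print(rules_inside("$self", "text.tex.latex", "text.md"))
# ['latex.equation', 'latex.itemize', 'latex.emph']
```

When the file itself is a LaTeX document, base and self coincide, so both targets yield the same rules; the difference only shows up under embedding.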

Description of the Change

All occurrences of

'include': '$base'

were replaced with

'include': '$self'

Alternate Designs

There is no other way to keep the $base grammar from producing incorrect highlighting. A compromise that could work in theory, but does not in practice, is to include both:

```coffee
{
  'include': '$self'
}
{
  'include': '$base'
}
```

This does not work here because Markdown highlighting is still injected where there should only be LaTeX highlighting. In this example, the HTML injections (part of the Markdown syntax) cause the entire rest of the document to be miscolored.
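Why including both fails can be sketched with a simplified scanner. As I understand TextMate-style tokenizers, at each position the pattern whose match starts earliest wins and list order only breaks ties, so listing `$self` first does not stop a Markdown rule that matches earlier in the text. The regexes below are hypothetical stand-ins (a LaTeX command rule and an HTML opening-tag rule), and the input mimics inline math such as `\(p<0.05\)`.

```python
import re

# Hypothetical, simplified patterns (not the real grammar regexes):
LATEX_RULES = [("latex.command", r"\\[A-Za-z]+")]       # reached via $self
MARKDOWN_RULES = [("md.html-tag-open", r"<[A-Za-z]+")]  # reached via $base


def scan(text: str, rules: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Tokenize the way TextMate-style scanners pick rules: at each position,
    the rule whose match starts earliest wins; ties go to the rule listed first."""
    compiled = [(name, re.compile(rx)) for name, rx in rules]
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        for name, rx in compiled:
            m = rx.search(text, pos)
            # strict '<' keeps the earlier-listed rule on a tie
            if m and (best is None or m.start() < best[1].start()):
                best = (name, m)
        if best is None:
            break  # nothing else matches; the rest is plain text
        name, m = best
        tokens.append((name, m.group()))
        pos = max(m.end(), pos + 1)
    return tokens


body = r"\(p<q\) \sym{*}"
print(scan(body, LATEX_RULES))                   # only the LaTeX command
print(scan(body, LATEX_RULES + MARKDOWN_RULES))  # '<q' is claimed by Markdown first
```

In the real grammar the HTML rule is a begin/end pair whose end may never arrive, which is how one stray `<q` can recolor the rest of the document; this model only records the opening match.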

Benefits

Accurate syntax highlighting for grammars that embed LaTeX. I think this is primarily Markdown; however, due to the popularity of Pandoc and programs that use it, such as knitr/R Markdown, this is an important change that would improve syntax highlighting of math and tables for many users.

Possible Drawbacks

Using $self instead of $base means that recursive includes resolve only within the inner grammar and no longer pick up the rules of the top-level grammar that embeds it. However, with a few accompanying changes (described below), there should be no drawbacks.

Currently, text.tex.latex.beamer and text.tex.latex.memoir include text.tex.latex, and text.tex.latex includes text.tex.

Including text.tex

text.tex uses $base only once: for the contents of arbitrary { ... } blocks.

```coffee
{
  'begin': '\\{'
  'beginCaptures':
    '0':
      'name': 'punctuation.section.group.begin.tex'
  'end': '\\}'
  'endCaptures':
    '0':
      'name': 'punctuation.section.group.end.tex'
  'name': 'meta.group.braces.tex'
  'patterns': [
    {
      'include': '$base'
    }
  ]
}
```

To fix this, I add that rule to the end of text.tex.latex and change both copies to 'include': '$self'. This should give nearly identical highlighting to the current behavior. (I could also add this rule to text.tex.latex.beamer and text.tex.latex.memoir, but that seems unnecessary, since those provide few extra rules and are unlikely to be nested within arbitrary { } blocks.)
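The effect of copying the braces rule into text.tex.latex can be checked with a toy model that flattens includes in order and asks which rules are offered inside the first reachable { ... } rule. Everything here is hypothetical: the rule names, the `text.md` base scope, and the assumption that the copied rule sits before text.tex.latex's `include: text.tex` line (both rules match the same `{`, so whichever is listed first would win).

```python
def resolve(target, defining, base):
    """$self -> grammar the rule is written in; $base -> the file's grammar."""
    return {"$self": defining, "$base": base}.get(target, target)


def flatten(registry, scope, base, seen=()):
    """Ordered (rule, defining_scope) pairs reachable from a grammar's top level."""
    if scope in seen:  # guard against include cycles
        return []
    out = []
    for rule in registry[scope]:
        if "include" in rule:
            target = resolve(rule["include"], scope, base)
            out += flatten(registry, target, base, seen + (scope,))
        else:
            out.append((rule, scope))
    return out


def names_inside_braces(registry, scope, base):
    """Rule names offered inside the first { ... } rule reachable from `scope`
    (on a tie for the same '{', the rule listed first wins)."""
    rules = flatten(registry, scope, base)
    braces, defining = next((r, s) for r, s in rules if r["name"] == "braces")
    inner = resolve(braces["inner"], defining, base)
    return [r["name"] for r, _ in flatten(registry, inner, base)]


def make_registry(tex_inner, latex_has_braces):
    """Hypothetical two-level setup: text.tex.latex includes text.tex."""
    latex = [{"name": "latex.equation"}, {"include": "text.tex"}]
    if latex_has_braces:
        # assumption: the copied rule is listed before the text.tex include
        latex.insert(1, {"name": "braces", "inner": "$self"})
    return {
        "text.md": [{"name": "md.html-block"}],
        "text.tex.latex": latex,
        "text.tex": [{"name": "braces", "inner": tex_inner}, {"name": "tex.command"}],
    }


# LaTeX embedded in Markdown, so the file's base grammar is text.md:
for label, reg in [
    ("original ($base in text.tex)", make_registry("$base", False)),
    ("$self in text.tex only", make_registry("$self", False)),
    ("braces rule copied into text.tex.latex", make_registry("$self", True)),
]:
    print(label, names_inside_braces(reg, "text.tex.latex", "text.md"))
```

In this model, switching text.tex alone to `$self` would leave only TeX rules inside braces, which is why the rule is also copied into text.tex.latex. For a standalone LaTeX file, the original setup offers `latex.equation`, `braces`, `tex.command` inside braces, and the copied-rule version offers the same plus one extra copy of the braces rule.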

Including text.tex.latex

The only other side effects to watch out for come from text.tex.latex.beamer and text.tex.latex.memoir including text.tex.latex. However, these side effects seem rare, if not impossible. The extra rules in text.tex.latex.beamer and text.tex.latex.memoir appear to be top-level only, and they would not be used within any of the environments in text.tex.latex that currently use $base.

For example, you would not have:

```latex
\begin{equation}
\begin{frame}
\end{frame}
\end{equation}
```

So it's fine for each of these text.tex.latex environments to use 'include': '$self' rather than 'include': '$base', because they would never need any special rules from the text.tex.latex.memoir or text.tex.latex.beamer grammars.

Applicable Issues

burodepeper/language-markdown#226

Kyle Barron added 3 commits July 2, 2018 11:51
The use of `'include': '$base'` causes LaTeX syntax highlighting to not work correctly when embedded in another grammar.
- Allows for use of text.tex.latex and text.tex rules within arbitrary brackets without resorting to using `$base`
@kylebarron (author)

@Aerijo

I'm pretty sure I've accounted for all possible drawbacks of switching from $base to $self. I can see why C/C++ would want to use $base, but I don't think it's necessary here.

@kylebarron (author)

With these changes, syntax highlighting works correctly when LaTeX is embedded in Markdown, for example with this input:


```latex
\begin{table}[htbp]\centering
\def\sym#1{\ifmmode^{#1}\else\(^{#1}\)\fi}
\begin{tabular}{l*{5}{c}}
\toprule
&\multicolumn{1}{c}{(1)}&\multicolumn{1}{c}{(2)}&\multicolumn{1}{c}{(3)}&\multicolumn{1}{c}{(4)}&\multicolumn{1}{c}{(5)}\\
&\multicolumn{1}{c}{Dep. Var. }&\multicolumn{1}{c}{Dep. Var. }&\multicolumn{1}{c}{Dep. Var. }&\multicolumn{1}{c}{Dep. Var. }&\multicolumn{1}{c}{Dep. Var. }\\
Variable & 0.00\sym{**} & 0.00\sym{**} & 0.00 & 0.00 & 0.00\sym{**} \\
        & (0) & (0) & (0) & (0) & (0) \\
\addlinespace
Variable & 0.00\sym{***}& 0.00\sym{***}& 0.00\sym{***}& 0.00\sym{***}& 0.00\sym{***}\\
        & (0) & (0) & (0) & (0) & (0) \\
\addlinespace
Constant & 0.00\sym{***}& 0.00\sym{***}& 0.00\sym{***}& 0.00\sym{***}& 0.00\sym{***}\\
        & (0) & (0) & (0) & (0) & (0) \\
\midrule
N       & 0 & 0 & 0 & 0 & 0 \\
\bottomrule
\multicolumn{6}{l}{\footnotesize Standard errors in parentheses}\\
\multicolumn{6}{l}{\footnotesize \sym{*} \(p<0.05\), \sym{**} \(p<0.01\), \sym{***} \(p<0.001\)}\\
\end{tabular}
\end{table}
```

@kylebarron (author)

@Aerijo, any chance this could get a review? I believe that, with my edits, there are only upsides and no drawbacks.

@Aerijo (collaborator) commented Jul 19, 2018

@kylebarron Changing $base to $self introduces some unwanted side effects. For example:

```latex
{
\begin{fboxverbatim}
foo
\end{fboxverbatim}
}
```

When using LaTeX Memoir, the contents were originally highlighted as verbatim, but the change now treats them as a generic LaTeX environment (because LaTeX grabs hold of the contents of {}). Using $base didn't have this issue, because it would point back to memoir.
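This memoir example can be reproduced in a toy model that resolves includes in order and lists the rules offered inside the { ... } rule. The scope names come from this thread; the rule names and the two-rule grammars are simplified, hypothetical stand-ins. With `$base`, the braces contents resolve back to memoir, so `memoir.fboxverbatim` is still available; with `$self`, they resolve to text.tex.latex only, leaving `\begin{fboxverbatim}` to the generic environment rule.

```python
def resolve(target, defining, base):
    """$self -> grammar the rule is written in; $base -> the file's grammar."""
    return {"$self": defining, "$base": base}.get(target, target)


def flatten(registry, scope, base, seen=()):
    """Ordered (rule, defining_scope) pairs reachable from a grammar's top level."""
    if scope in seen:  # guard against include cycles
        return []
    out = []
    for rule in registry[scope]:
        if "include" in rule:
            target = resolve(rule["include"], scope, base)
            out += flatten(registry, target, base, seen + (scope,))
        else:
            out.append((rule, scope))
    return out


def names_inside_braces(registry, scope, base):
    """Rule names offered inside the first { ... } rule reachable from `scope`."""
    rules = flatten(registry, scope, base)
    braces, defining = next((r, s) for r, s in rules if r["name"] == "braces")
    inner = resolve(braces["inner"], defining, base)
    return [r["name"] for r, _ in flatten(registry, inner, base)]


def make_registry(latex_braces_inner):
    """Hypothetical grammars: memoir adds one rule, then falls back to LaTeX."""
    return {
        "text.tex.latex.memoir": [
            {"name": "memoir.fboxverbatim"},
            {"include": "text.tex.latex"},
        ],
        "text.tex.latex": [
            {"name": "braces", "inner": latex_braces_inner},
            {"name": "latex.environment"},  # generic \begin{...} ... \end{...}
        ],
    }


MEMOIR = "text.tex.latex.memoir"  # a memoir document: this is also $base
print(names_inside_braces(make_registry("$base"), MEMOIR, MEMOIR))
# ['memoir.fboxverbatim', 'braces', 'latex.environment']
print(names_inside_braces(make_registry("$self"), MEMOIR, MEMOIR))
# ['braces', 'latex.environment']
```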

How many people use memoir, I don't know. This sort of thing applies to everything else in memoir and beamer as well.

Personally, I believe the better solution is to make a dedicated grammar for embedded LaTeX. It would be bare-bones: highlighting commands and math delimiters (only the delimiters themselves, not the contents), while avoiding any begin/end rules. This way we still get reasonable highlighting, with no risk of breaking anything.

@kylebarron (author) commented Jul 19, 2018

I'll check that out when I get back to my computer. But I just realized that text.tex.latex is the only file that needs to change from $base to $self for embedding to work, because that's the grammar Markdown points to. The other special classes can stay the same.

- Allows for use of `$self` instead of `$base` within text.tex.latex.
@kylebarron (author) commented Jul 19, 2018

@Aerijo
TL;DR: With some minimal repetition, we can add the one to three rules broken by this change to the memoir and beamer files, so that inside nested blocks those files' own rules are tried first, before falling through to the LaTeX and TeX rules.


These are the memoir- and beamer-specific rules:

```coffee
# memoir
'begin': '(?:\\s*)((\\\\)begin)(\\{)(framed|shaded|leftbar)(\\})'
'begin': '(?:\\s*)((\\\\)begin)(\\{)((?:fboxv|boxedv|V)erbatim)(\\})'
'begin': '(?:\\s*)((\\\\)begin)(\\{)(alltt)(\\})'

# beamer
'begin': '(\\\\use(?:color|font|inner|outer)?theme)(?:(\\[)([^\\]]*)(\\]))?(\\{)'
'begin': '(?:\\s*)((\\\\)begin)(\\{)(frame)(\\})'
'match': '((\\\\)frametitle)(\\{)(.*)(\\})'
```

As long as these rules never need to match inside environments that now use $self, there are no drawbacks. Since these are top-level rules, I assert that they would not occur within any environment that currently uses $base, except for { ... } and maybe one or two others (see below). This is trivial to fix by adding a simple { ... } rule to the beamer and memoir files: if highlighting starts from either of those grammars and encounters { ... }, it looks through that grammar's own rules before descending into text.tex.latex (see the latest commit).
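The proposed fix can be sketched with a toy model: give memoir its own { ... } rule using `$self`, listed before its `include: text.tex.latex`, so braces opened in a memoir document resolve to memoir first. The rule names are hypothetical, and the rule ordering is an assumption of the sketch rather than the actual file layout. The last line checks that LaTeX embedded in Markdown is unaffected by memoir's extra rule.

```python
def resolve(target, defining, base):
    """$self -> grammar the rule is written in; $base -> the file's grammar."""
    return {"$self": defining, "$base": base}.get(target, target)


def flatten(registry, scope, base, seen=()):
    """Ordered (rule, defining_scope) pairs reachable from a grammar's top level."""
    if scope in seen:  # guard against include cycles
        return []
    out = []
    for rule in registry[scope]:
        if "include" in rule:
            target = resolve(rule["include"], scope, base)
            out += flatten(registry, target, base, seen + (scope,))
        else:
            out.append((rule, scope))
    return out


def names_inside_braces(registry, scope, base):
    """Rule names offered inside the first { ... } rule reachable from `scope`
    (on a tie for the same '{', the rule listed first wins)."""
    rules = flatten(registry, scope, base)
    braces, defining = next((r, s) for r, s in rules if r["name"] == "braces")
    inner = resolve(braces["inner"], defining, base)
    return [r["name"] for r, _ in flatten(registry, inner, base)]


def make_registry(memoir_has_braces):
    memoir = [{"name": "memoir.fboxverbatim"}, {"include": "text.tex.latex"}]
    if memoir_has_braces:
        # hypothetical copy of the braces rule, listed ahead of the LaTeX include
        memoir.insert(1, {"name": "braces", "inner": "$self"})
    return {
        "text.tex.latex.memoir": memoir,
        "text.tex.latex": [
            {"name": "braces", "inner": "$self"},
            {"name": "latex.environment"},  # generic \begin{...} ... \end{...}
        ],
    }


MEMOIR = "text.tex.latex.memoir"
print(names_inside_braces(make_registry(False), MEMOIR, MEMOIR))
# ['braces', 'latex.environment']
print(names_inside_braces(make_registry(True), MEMOIR, MEMOIR))
# ['memoir.fboxverbatim', 'braces', 'braces', 'latex.environment']
print(names_inside_braces(make_registry(True), "text.tex.latex", "text.md"))
# ['braces', 'latex.environment']
```

The design choice is a small amount of duplication (one extra rule per class grammar) in exchange for keeping text.tex.latex free of `$base`.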

Below, I've grouped the rules in which the package currently uses $base recursively. They fall into four groups:

  1. programming
  2. environments
  3. literal text
  4. math

The memoir and beamer rules above would not appear within math or literal text, which leaves programming and environments.

I can't imagine any of those rules appearing within a named environment or within \ExplSyntaxOn ... \ExplSyntaxOff. They could potentially occur within the generic \begin{\w+} ... \end{$1} rule, or within \ProvidesExplPackage ..., since that rule currently has no end clause.

If you think either of these two rules could contain memoir- or beamer-specific text, I propose adding copies of them, with explanatory comments, to the memoir and beamer files. It's only ~20 lines of repetition, and it would satisfy all users of the package.


Programming:

  • \ExplSyntaxOn ... \ExplSyntaxOff
  • \ProvidesExplPackage ... until the end of the file? The rule has no end regex, which I'm guessing is a mistake. If it is intentional, this would probably be better highlighted as a match than as a begin/end pair, or at the very least the rule could end at \begin{document}.

Environments:

  • \begin{align|equation|multline|split|gather|alignat|aligned|gathered|eqnarray|array|tabular|itemize|enumerate|description|list} ... \end{$1}
  • \begin{...} ... \end{...} for any environment not named above.

Literal text:

  • \marginpar{ ... }
  • \footnote{ ... }
  • \emph{ ... }
  • \textit{ ... }
  • \textbf{ ... }
  • \texttt{ ... }

Math:

  • \( ... \)
  • \[ ... \]

Arbitrary braces clause, currently in text.tex but proposed to be added to text.tex.latex, text.tex.latex.beamer, and text.tex.latex.memoir:

  • { ... }

@kylebarron (author)
Hi @Aerijo, I pushed a commit that catches all \begin{} ... \end{} environments inside latex-memoir and latex-beamer. If some arbitrary environment contains a memoir- or beamer-specific token, it should now highlight correctly (that is, the memoir or beamer rules are tried first before descending to latex.cson).

Could you please take a look?

@Aerijo (collaborator) commented Aug 28, 2018

@kylebarron I can't think of anything this breaks. But then again, it's late right now and I'm tired. I'll check back in tomorrow and (probably) merge.

My biggest concern was changing $base -> $self in the tex.cson math rules, but it seems that behaviour is pretty much broken already. LaTeX commands don't (and never did) work in $...$, because they get captured by the "generic math command" rule.

@Aerijo (collaborator) commented Aug 29, 2018

Yup, the changes that add \begin{\w+} break all environments. For example, the equation environment is no longer scoped as math in a memoir document.
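Why a catch-all `\begin{\w+}` rule in the memoir file breaks `equation` can be sketched with earliest-match, first-listed-wins rule selection (my understanding of how TextMate-style scanners choose). The regexes are simplified, hypothetical stand-ins, and the sketch assumes memoir's own rules are listed before its include of text.tex.latex. The catch-all and LaTeX's `equation` rule match at the same position, so the catch-all wins the tie.

```python
import re

# Hypothetical, simplified regexes standing in for the grammar rules.
MEMOIR_OWN = [("memoir.fboxverbatim", r"\\begin\{(?:fboxv|boxedv|V)erbatim\}")]
CATCH_ALL = [("memoir.any-environment", r"\\begin\{\w+\}")]  # the reverted rule
LATEX = [("latex.equation", r"\\begin\{equation\}")]  # via include text.tex.latex


def first_rule(text, rules):
    """Name of the rule a TextMate-style scanner would pick first in `text`:
    earliest match wins, ties go to the rule listed first."""
    best = None
    for name, rx in rules:
        m = re.search(rx, text)
        if m and (best is None or m.start() < best[1]):
            best = (name, m.start())
    return best[0] if best else None


src = r"\begin{equation} x = 1 \end{equation}"
print(first_rule(src, MEMOIR_OWN + LATEX))              # latex.equation
print(first_rule(src, MEMOIR_OWN + CATCH_ALL + LATEX))  # memoir.any-environment
```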

Personally, I strongly believe we should be embedding a customised subset of the language instead. I've been sitting on this for a while because something was breaking, but I finally found and squashed the bug.

The injecting grammar

The injected rules

This way, we can remove begin/end matches entirely (and mostly prevent leaking scopes). The end result would be similar to this:
[screenshot, 2018-08-29]

It may not highlight math the same way, but it's much safer inside a block (where, IMO, the highlighting matters less anyway). All of the rules here are just match rules.

@kylebarron (author)

I've been on vacation the last week and haven't been able to look at this until now.

You're right that the most recent commit was misguided, but I still strongly believe that it is both possible and desirable to use the standard LaTeX grammar for embedded purposes.

Catching arbitrary \begin ... \end environments in the beamer and memoir files would only be necessary if some beamer- or memoir-specific command appeared inside an unknown \begin ... \end environment. Since the number of beamer- and memoir-specific rules is tiny, and since those commands are logically top-level, this would be exceedingly rare. Any such corner case could be added to the beamer or memoir file later, on request.

I reverted that commit and I contend that the current state is stable without breaking environments.

"It may not highlight math the same"

Your embedded syntax appears to not highlight math at all, which would be a huge step backwards for most users of Markdown, who use LaTeX syntax mostly for math.
