Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHP code blocks require closing tag (?>) to format correctly #221

Closed
g105b opened this issue Nov 18, 2014 · 13 comments
Closed

PHP code blocks require closing tag (?>) to format correctly #221

g105b opened this issue Nov 18, 2014 · 13 comments

Comments

@g105b
Copy link

g105b commented Nov 18, 2014

Here is a snippet of PHP that will be formatted by Github:

<?php
namespace My\Name\Space;

class MyClass extends \Original\ClassName {

public function myFunction() {
    # Do something...
}

}# Notice no closing PHP tags

and this paragraph is no longer formated as PHP.

However, in MarkdownEditing syntax, the subsequent paragraphs after the above snippet will continue to be formatted as code, unless a closing tag (?>) ends the code snippet.

@drKnoxy
Copy link

drKnoxy commented Jan 19, 2015

god i wish this would be fixed

@drKnoxy
Copy link

drKnoxy commented Jan 19, 2015

To help reiterate if you have a <?php at the beginning of the fenced block it won't match the closing fence

```php
<?php
function test()
{
    return 'something';
}
``` \\closing the php fence, delete this comment

# and then add a header down here will won't get styled properly

If you can point me at the right area to fix this i'd be glad to help, but I think it might be something more to the core of subl.

@felixhao28
Copy link
Contributor

@drKnoxy If you really like to solve this problem on your own, search <key>fenced-php</key> in Markdown.tmLanguage. What makes php special is that php accepts back ticks, so sublime cannot determine whether ``` is a part of php code or not.

@ttscoff
Copy link
Contributor

ttscoff commented Feb 26, 2015

IIRC, the complication comes mostly from the fact that the MarkdownEditing tmlanguage syntax inherits from HTML, which includes <?php as a regex trigger for a sub-syntax. It could be possible to set the fenced code regex to match only pairs of backticks at the same indentation with the exact same number of characters, thus allowing a user to use 4 backticks or more when the PHP snippet -- for whatever reason -- contained ``` already. If you really want to dig into it…

@maliayas
Copy link
Member

I'll tell my dirty explanation and possible fix for the bug.

This bug happens because;

  1. When external syntax is imported it continues until it completes it's syntax.
  2. PHP syntax that begins with <?php is not completed until it reaches ?> or EOF. That's a must.
  3. Above rule cuses the closing backticks eaten by the PHP. PHP consumes those backticks and rest of the markdown is broken.

Possible fix:

  1. PHP syntax should be separated into 2 different syntaxes: "php" and say "php-raw"
  2. "php" syntax shoudn't define any php syntax. It should define <?php ?> which imports "php-raw".
  3. MarkdownEditing should import "php-raw" instead of "php".
  4. This is a really disturbing dirty hack but: "php-raw" syntax shouldn't allow more than 3 sequential backticks in the main scope. It's ok to have them in a string, that's not a problem. This rule will help the php flow stop at closing backticks. Otherwise it would still continue after those backticks.

This is what I have in mind at the moment without deep research. :)

@g105b
Copy link
Author

g105b commented Feb 26, 2015

My solution:

  1. Stop parsing the code snippet at the occurrence of three back ticks at the start of a line.

@maliayas
Copy link
Member

@g105b That still requires putting <?php at the beginning. And both solutions require modifying default PHP package which doesn't sound a safe action.

Also this bug has some other faces. For example if you put if () { in the code, you have to put } as well, otherwise the issue will still occur. Inner code always has to be syntax-complete. (For more broken examples, see #96, #188, #172. These are related as well: #215, #214)

I don't know how we can remove the requirement of being "syntax complete". Maybe we should create simplified syntaxes that only highlights keywords and punctuations for that language.

@felixhao28
Copy link
Contributor

So, should this be marked as "won't fix"?

@maliayas
Copy link
Member

It is up to you.

P.S. I was using "impossible" label for such stuff, instead of "wontfix"

@markeissler
Copy link

I'm seeing this as a much larger problem that isn't language specific and suggest creating a new issue to track this bug. For instance, I'm looking at a markdown file right now that has the following fenced block:

```bash
$ git contrib visionmedia
  visionmedia (18):
  Export STATUS_CODES
  Replaced several Array.prototype.slice.call() calls with Array.prototype.unshift.call()
  Moved help msg to node-repl
  Added multiple arg support for sys.puts(), print(), etc.
  Fix stack output on socket error
  ...
```

Leading up to that section markdown hi lighting works correctly. Following the last three backticks, however, markdown hi lighting is broken. It breaks on the for word in the second to last sentence, which is clearly being parsed, incorrectly, as a language. This isn't ideal at all and the problem seems to affect most markdown files I open in Sublime with this plugin installed.

The suggestion by @g105b to simply disable syntax parsing within code blocks makes a lot of sense. @maliayas Not sure why you state that "putting <?php at the beginning" is relevant. The suggestion is that there would be no parsing within the fenced or indented code block, so you cold put any broken code in there whatsoever.

Is it possible to disable syntax hilighting within code blocks?

@felixhao28
Copy link
Contributor

felixhao28 commented Mar 14, 2015

@markeissler , I am going to answer your question by answering the following three instead:

  1. What is the nature of this issue?
  2. What causes this issue?
  3. What can we do about it?

What is the nature of this issue?

You noticed that for breaks the syntax hightlight on the following text. But this issue is not for specific, language specific, not even markdown specific. We have been actually observing this issue from the very beginning of Sublime Text. Look at the following javascript:

//Syntax: Javascript
var a;
var text = "some text";
var b;

What if we remove the trailing quotation mark:

//Syntax: Javascript
var a;
var text = "some text;
var b;

See, var b is colored differently from var a because it is considered as part of the string. Even if we add an explict scope for it:

<!--Syntax: HTML-->
<label>label 1</label>
<script>
    var a;
    var text = "some text;
    var b;
</script>
<label>label 2</label>

Essentially, what you just saw is same as the problem you described. In bash, for always pairs with done,so they are the left and right quotation marks. Missing the right one will of course break the highlight.

What causes this issue?

Sublime Text uses "scope" to decide how text should be highlighted. And the scopes are decide by syntax definition files. When ST sees an "scope opener" like a left quotation mark, it will search for the "scope closer" like a right quotation mark. Anything between them will be swallowed as a whole and assigned to that scope. We cannot simply add three consecutive backticks as a "scope closer" for all scopes. Consider this:

<!--Syntax:Markdown GFM-->
```python
a_long_string = """
There can be many random things in this string,
including three consecutive backticks and an empty line:
```

Take that!
"""
```
<the document continues>

How do you decide Take that! is a part of python multiline string or markdown? You cannot. However, most markdown parsers put a higher priority on three backticks or ~~~ than """. Sublime Text simply does not support "looking back". Its text processng engine is an hungry monster that crawls and devours all the way to abyss till it is satisfied. The only solution that I can think of is by adding three backticks and ~~~ to every scope closer pattern to every supported language, e.g. `var text = "only left quotation mark;```` will become legal in "Markdown Code Block: JavaScript" syntax. This is definitely a costly solution. In addition, MarkdownEditing does not have its own syntax definition files for every language that gfm styled code block supports, and it uses the syntax definition files from ST. The reasons are:

  1. We don't have to make/maintain something that is already there.
  2. Reusing external syntax definition files will make MDE core, modular and extensible.

What can we do about it?

Sadly, not much.

Like you said, adding the support for turning on/off syntax highlight so you can turn off the highlight if you really need to write broken code in a code block sounds like an acceptable workaround. Unfortuanately, neither syntax highlight files (.thTheme) nor syntax definition files (.thLanguage) can be modified dynamically. We have to add three more syntaxes files as "Markdown (without code block highlight)", "Markdown GFM (without code block highlight)" and "MultiMarkdown (without code block highlight)" in addition to the three we already have. Personally, I think this will make the project too messy.

Settings cannot change the behaviour of highlight.

Despite all that I have said, this is still a bug because the way the text gets highlighted is different from how most markdown parsers work. But before a major change happens to ST parsing engine, we have to bear with it.

I hope this ends all questions about this issue.

@kumarharsh
Copy link

Oh the sadness. :'(

What I'm doing right now is writing my markdown with the ?> included, and at the time of publishing, I just do a Ctrl+D (multi-select) for the ?> and remove them. Extremely painful and error-prone, but c'est la vie...

@felixhao28
Copy link
Contributor

Upcoming fix described in #484. Closing it for now. Feel free to reopen it if that solution is not sufficient.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants