2 new hooks specifically for grammars #2081

Closed
RunDevelopment opened this issue Sep 29, 2019 · 1 comment

Motivation
Right now, the before-tokenize and after-tokenize hooks are called by Prism.highlight.

This is a problem because it means these hooks are never run for embedded languages like PHP:

Prism.languages.template = {
    'with-embedded-php': {
        pattern: /(^Begin PHP$)[\s\S]+?(?=^End PHP$)/m,
        lookbehind: true,
        inside: Prism.languages.php
    }
};

In this example grammar, the markup templating hooks for PHP cannot be run, so none of the markup in which the PHP code is embedded will be highlighted.

Description
So let's add 2 new hooks which are run by Prism.tokenize instead. Since inside grammars are also executed by it, it's an easy solution. They will be equivalent to before-tokenize and after-tokenize. Prism.tokenize will then look like this:

function tokenize(text, grammar) {
	var env = { text, grammar };

	// names are still undecided
	Prism.hooks.run('new-before-tokenize', env);
	text = env.text;
	grammar = env.grammar;

	var strarr = [text];
	// handle .rest in grammar
	_.matchGrammar(text, strarr, grammar, 0, 0, false);

	env.tokens = strarr;
	Prism.hooks.run('new-after-tokenize', env);

	return env.tokens;
}

Moving the existing before-tokenize and after-tokenize hooks into Prism.tokenize is not possible because the function doesn't know the name of the grammar and, considering inside grammars, it shouldn't.
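
To illustrate how a grammar-specific listener might use such a hook, here is a rough sketch. It does not use Prism itself: `hooks` is a tiny stand-in for Prism.hooks, `phpGrammar` and `otherGrammar` are hypothetical empty grammars, the placeholder replacement is made up, and the hook names are the undecided ones from above. Because tokenize is not given a language name, the listener recognizes its grammar by object identity.

```javascript
// Minimal stand-in for Prism.hooks so the sketch runs on its own
// (an assumption for illustration, not Prism's actual code).
const hooks = {
	all: {},
	add(name, callback) {
		(this.all[name] = this.all[name] || []).push(callback);
	},
	run(name, env) {
		(this.all[name] || []).forEach(callback => callback(env));
	}
};

// Hypothetical grammars; real grammars would contain token definitions.
const phpGrammar = {};
const otherGrammar = {};

// The listener only acts on its own grammar, compared by identity.
hooks.add('new-before-tokenize', env => {
	if (env.grammar !== phpGrammar) {
		return;
	}
	// Made-up pre-processing step: swap PHP blocks for a placeholder.
	env.text = env.text.replace(/<\?php[\s\S]*?\?>/g, '___PHP___');
});

// Simplified tokenize: only the hook calls from the proposal, no real matching.
function tokenize(text, grammar) {
	const env = { text, grammar };
	hooks.run('new-before-tokenize', env);
	env.tokens = [env.text];
	hooks.run('new-after-tokenize', env);
	return env.tokens;
}
```

Calling `tokenize(code, phpGrammar)` runs the listener's pre-processing, while any other grammar passes through untouched. That also holds when the grammar is reached as an inside grammar.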

Alternatives
Because the use cases of the new hooks are all grammar-specific (they are NOT intended to be used by plugins), we could instead make them two non-enumerable functions in the grammar itself.

function tokenize(text, grammar) {
	const { beforeTokenize, afterTokenize } = grammar;

	var env = { text, grammar };
	beforeTokenize && beforeTokenize(env);
	text = env.text;
	grammar = env.grammar;

	var strarr = [text];
	// handle .rest in grammar
	_.matchGrammar(text, strarr, grammar, 0, 0, false);

	env.tokens = strarr;
	afterTokenize && afterTokenize(env);

	return env.tokens;
}
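
A small sketch of why non-enumerable matters, assuming the matcher walks the grammar's tokens with a for...in loop: a function attached through Object.defineProperty is skipped by that loop, so it is never mistaken for a token. The grammar and the body of the callback here are hypothetical.

```javascript
// Hypothetical grammar with one ordinary token.
const grammar = {
	'keyword': /\b(?:if|else)\b/
};

// Attach the callback as a non-enumerable property
// (enumerable defaults to false with Object.defineProperty).
Object.defineProperty(grammar, 'beforeTokenize', {
	value: function (env) {
		env.text = env.text.trim(); // placeholder pre-processing
	},
	writable: true,
	configurable: true
});

// A for...in loop only sees the real tokens.
const tokenNames = [];
for (const name in grammar) {
	tokenNames.push(name);
}
// tokenNames now contains only 'keyword'
```

The function is still reachable as `grammar.beforeTokenize`, which is all the destructuring in tokenize needs.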

Closing thoughts

For both solutions to actually solve the problem motivating them, markup templating has to be modified. Right now, MT identifies languages by their name and not by their grammar (this also makes aliases impossible but that's another issue).
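
One possible shape for identifying a language by its grammar rather than by its name is a Map keyed on the grammar object. This is only a sketch: `languages`, `handlers`, `handlerFor` and the `php5` alias are hypothetical names, not markup templating's actual API.

```javascript
// Hypothetical language registry with an alias pointing at the same grammar object.
const languages = {};
languages.php = { /* token definitions would go here */ };
languages.php5 = languages.php; // alias: same object, different name

// Handlers are registered per grammar object, not per language name.
const handlers = new Map();
handlers.set(languages.php, { placeholderPrefix: 'PHP' });

// Lookup works for the alias too, because both names resolve to one object.
function handlerFor(grammar) {
	return handlers.get(grammar);
}
```

With this lookup, `handlerFor(languages.php5)` returns the same handler as `handlerFor(languages.php)`, which a name-based lookup would not.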

On the positive side, this will also simplify some code because now, Prism.tokenize is all you need to tokenize with a grammar. No more tokenizeWithHooks.
