2 new hooks specifically for grammars #2081

RunDevelopment · 2019-09-29T20:14:16Z

Motivation
Right now, the before-tokenize and after-tokenize hooks are called by Prism.highlight.

This is a problem because it means that these hooks cannot be run for embedded languages like PHP:

Prism.languages.template = {
    'with-embedded-php': {
        pattern: /(^Begin PHP$)[\s\S]+?(?=^End PHP$)/m,
        lookbehind: true,
        inside: Prism.languages.php
    }
};

In this example grammar, the markup templating hooks for PHP cannot be run, meaning that none of the markup in which the PHP code is embedded will be highlighted.

Description
So let's add 2 new hooks which are run by Prism.tokenize instead. Since inside grammars are also executed by it, it's an easy solution. They will be equivalent to before-tokenize and after-tokenize. Prism.tokenize will then look like this:

function tokenize(text, grammar) {
	var env = { text, grammar };

	// names are still undecided
	Prism.hooks.run('new-before-tokenize', env);
	text = env.text;
	grammar = env.grammar;

	var strarr = [text];
	// handle .rest in grammar
	_.matchGrammar(text, strarr, grammar, 0, 0, false);

	env.tokens = strarr;
	Prism.hooks.run('new-after-tokenize', env);

	return env.tokens;
}

Moving the existing before-tokenize and after-tokenize into Prism.tokenize is not possible because the function doesn't know the name of the grammar and considering inside grammars, it shouldn't.
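
To illustrate how a language definition might use such hooks, here is a minimal, self-contained sketch. Everything in it is an assumption made for illustration: `hooks` is a small stand-in for Prism's hook registry, `tokenize` skips the real matching step, `phpLike` is a hypothetical grammar object, and the hook names are the undecided placeholders from above. The point it shows is that a hook can identify its grammar by reference (`env.grammar`) rather than by name.

```javascript
// Stand-in for Prism's hook registry so the sketch runs on its own.
const hooks = {
	all: {},
	add(name, callback) {
		(this.all[name] = this.all[name] || []).push(callback);
	},
	run(name, env) {
		(this.all[name] || []).forEach(callback => callback(env));
	}
};

// Simplified tokenize: the real function would call _.matchGrammar here.
function tokenize(text, grammar) {
	const env = { text, grammar };
	hooks.run('new-before-tokenize', env);
	// Stand-in for matching: the token stream is just the (possibly rewritten) text.
	env.tokens = [env.text];
	hooks.run('new-after-tokenize', env);
	return env.tokens;
}

// Hypothetical grammar object; the hooks below compare against it by reference.
const phpLike = {};

hooks.add('new-before-tokenize', env => {
	if (env.grammar !== phpLike) return;
	// Swap embedded fragments for placeholders before matching.
	env.fragments = [];
	env.text = env.text.replace(/<\?[\s\S]*?\?>/g, match => {
		env.fragments.push(match);
		return '___PH' + (env.fragments.length - 1) + '___';
	});
});

hooks.add('new-after-tokenize', env => {
	if (env.grammar !== phpLike) return;
	// Put the fragments back (a real hook would insert tokens instead of strings).
	env.tokens = env.tokens.map(token =>
		typeof token === 'string'
			? token.replace(/___PH(\d+)___/g, (_, i) => env.fragments[i])
			: token
	);
});

const result = tokenize('<p><?php echo 1; ?></p>', phpLike);
```

Because the check is `env.grammar !== phpLike`, the same hooks would also fire when the grammar is used as an inside grammar, which is the case the existing name-based hooks miss.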

Alternatives
Because the use cases of the new hooks are all grammar-specific (they are NOT intended to be used by plugins), we could instead make them two non-enumerable functions on the grammar itself.

function tokenize(text, grammar) {
	const { beforeTokenize, afterTokenize } = grammar;

	var env = { text, grammar };
	beforeTokenize && beforeTokenize(env);
	text = env.text;
	grammar = env.grammar;

	var strarr = [text];
	// handle .rest in grammar
	_.matchGrammar(text, strarr, grammar, 0, 0, false);

	env.tokens = strarr;
	afterTokenize && afterTokenize(env);

	return env.tokens;
}
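
As a rough sketch of how a grammar could carry these functions, the snippet below attaches them with `Object.defineProperties`. The grammar and the callback bodies are hypothetical. The functions are made non-enumerable on the assumption that the matching code iterates over the grammar's own enumerable properties to find token patterns, so enumerable functions would be mistaken for tokens.

```javascript
// Hypothetical grammar with a single token pattern.
const grammar = {
	keyword: /\b(?:if|else)\b/
};

// Attach the callbacks as non-enumerable properties so that code iterating
// over the grammar's token names (for...in, Object.keys) does not see them.
Object.defineProperties(grammar, {
	beforeTokenize: {
		value(env) {
			// Illustrative only: normalize the text before matching.
			env.text = env.text.trim();
		},
		enumerable: false
	},
	afterTokenize: {
		value(env) {
			// Illustrative only: drop empty string tokens.
			env.tokens = env.tokens.filter(token => token !== '');
		},
		enumerable: false
	}
});

// Only real token names show up during iteration.
const tokenNames = [];
for (const name in grammar) {
	tokenNames.push(name);
}
```

The callbacks are still reachable by direct property access (`grammar.beforeTokenize`), which is all the destructuring in the function above needs.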

Closing thoughts

For both solutions to actually solve the problem motivating them, markup templating has to be modified. Right now, MT identifies languages by their name and not by their grammar (this also makes aliases impossible but that's another issue).

On the positive side, this will also simplify some code because now, Prism.tokenize is all you need to tokenize with a grammar. No more tokenizeWithHooks.


RunDevelopment · 2022-08-31T18:31:04Z

Closed in favor of #3539.

RunDevelopment added the enhancement label Sep 29, 2019


RunDevelopment closed this as completed Aug 31, 2022
