Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Escape Nunjucks for special tags #1049

Merged
merged 3 commits into from
Mar 21, 2020
Merged

Conversation

crphang
Copy link
Contributor

@crphang crphang commented Feb 20, 2020

Initial attempt:

It escapes the tokens }, }}, {{ in a single pass by adding {%raw%} {%endraw%}.

In the event of {{, it attempts to check whether the content is a variable or not, if it is, it ignores it and goes to the next token. If not, it would escape content to the next token (Always safe to escape).

P.S.

This PR allows {%raw%} block tags that are used for escaping nunjuck syntax to be used for special tags. This allows special tag plugins that rely on these syntax to escape these syntax by providing a way to escape repeatedly called nunjucks render through markbind's repository.

Suggested commit message:

Provide a way to escape mustache syntax for repeatedly called nunjucks render for special tags.

@crphang
Copy link
Contributor Author

crphang commented Feb 22, 2020

I changed the approach from when I first started. This partially address #762. The main value of this PR is that together with #1047 that escapes tags, we can completely escape content that are meant to be parsed by external libraries (diagrams, latex) which commonly uses > and } as tokens.

Previously, the attempt was more ambitious. I attempted to tokenize and capture nunjuck variable tags, and conditionally escape those if they are not a variable. This works generally well for user-defined variables within Markbind.

The issue arrives when the user defines variables that are not within Markbind, for example in Markbind user guide. These variables are very hard to detect, and are not explicitly handled within Nunjucks itself see mozilla/nunjucks#557.

Currently, we allow the user to define and escape it's own {%raw%} content {%endraw%} tags in places they want to escape. However, the escaping is only for plugins and will not be displayed until we managed to deal with other syntax clashes from Vue and Markdown-it-attr. For now, I explicitly remove all the raw tags before we call render because of syntax clash with markdown-it-attrs and how newlines are handled.

For instance:

const MarkdownIt = require('markdown-it');
const md = new MarkdownIt().use(require('markdown-it-attrs'));

console.log(md.render('{%raw%}content{%endraw%}')); // <p %endraw%="">{%raw%}content</p>

// For new lines
console.log(md.render(`<div>{%raw%}

content

{%endraw%}</div>`)); 

// <p %raw%="">&lt;div&gt;</p>
// <p>content</p>
// <p>{%endraw%}&lt;/div&gt;</p>

@crphang crphang changed the title WIP: Escape Nunjucks Escape Nunjucks Feb 22, 2020
@crphang crphang changed the title Escape Nunjucks Escape Nunjucks for plugins Feb 22, 2020
Copy link
Contributor

@ang-zeyu ang-zeyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean solution 👍

src/lib/markbind/nunjuckUtils.js Outdated Show resolved Hide resolved
src/Page.js Outdated Show resolved Hide resolved
Copy link
Contributor

@marvinchin marvinchin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there is still some work pending for this PR. Shall we label is [WIP] in the meantime?

This approach looks promising, hopefully we can iron out the remaining issues and fix the bug :)

src/lib/markbind/nunjuckUtils.js Outdated Show resolved Hide resolved

expect(escapedContent).toBe(escapedString);
});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this nested escapes are a case we need to account for?

test('Escaping nested nunjucks raw tags', () => {
  const escapedString = 'This is a content with nested escaped data {%raw%} {%raw%} CONTENT {%endraw%} {%endraw%}';
  const escapedContent = nunjuckUtils.renderEscaped(nunjucks, escapedString);
  expect(escapedContent).toBe(escapedString);
});

I can imagine this might be useful for documenting how to use to use the escape tags in the first place

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nesting with stuff in between doesn't seem to work correctly 😅

test('Escaping nested nunjucks raw tags', () => {
  const escapedString = 'Nested escaped data: {% raw %}BEFORE{% raw %} CONTENT {% endraw %}AFTER{% endraw %}';
  const escapedContent = nunjuckUtils.renderEscaped(nunjucks, escapedString);
  expect(escapedContent).toBe(escapedString);
});
Expected: "Nested escaped data: {% raw %}BEFORE{% raw %} CONTENT {% endraw %}AFTER{% endraw %}"
Received: "Nested escaped data: {% raw %}BEFORE{% raw %} CONTENT {% endraw %}{% endraw %}AFTER"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like as long as we go with the opentag/closetag approach, we would always run into this issue:
$OPEN$ $CLOSE$ $CLOSE$ <- how do we match the $OPEN to the right $CLOSE?

Do you think having a special tag that escapes the next character would help? This would "break" the syntax so the original special tag wont be treated as such.

src/lib/markbind/nunjuckUtils.js Outdated Show resolved Hide resolved
@crphang crphang changed the title Escape Nunjucks for plugins WIP: Escape Nunjucks for plugins Feb 24, 2020
@crphang
Copy link
Contributor Author

crphang commented Feb 24, 2020

The scope of this PR will not address rendering of displaying {{ }}, but would allow plugins that rely on such syntax to parse properly. would continue the discussion of deconflicting markbind-it-attr and Vue in the issue itself.

I would rebase this PR on top of #1047, will be adding an integration test to ensure that nunjuck syntax are properly escaped for plugins.

@ang-zeyu
Copy link
Contributor

The scope of this PR will not address rendering of displaying {{ }}, but would allow plugins that rely on such syntax to parse properly. would continue the discussion of deconflicting markbind-it-attr and Vue in the issue itself.

I would rebase this PR on top of #1047, will be adding an integration test to ensure that nunjuck syntax are properly escaped for plugins.

#1047 deals with markbind-it-attr already, since the entire special tag is processed as a html block.

For vue, perhaps after this PR we could extend #1047's api with this PR's functionality.

The regex would then also comb for these additional patterns and append {% raw/endraw %} to them.
e.g. for markdown code blocks, we comb for triple backticks and append said tags, then we modify the renderer to add a v-pre attribute to the resulting <pre> tag.

For general use, we could create a special component, something like <escape></escape> that acts the same as a <span v-pre></span> in addition to appending {% raw %} {% endraw %}

The same could be done for plugin defined components sparingly.
( I don't think we currently have plugins needing {{ ... }} at the moment though )

@crphang
Copy link
Contributor Author

crphang commented Feb 29, 2020

#1047 deals with markbind-it-attr already, since the entire special tag is processed as a html block.

Yep. Thanks to good work in #1047, special tags are isolated from markdown-it and markdown-it-attr. I was considering about how to handle other cases when the nunjuck escapes are processed by md.render.

( I don't think we currently have plugins needing {{ ... }} at the moment though )

Right now we probably don't, but potentially plugins like latex will start having {{ }} when we have more complicated compositions.

For general use, we could create a special component, something like <escape></escape> that acts the same as a <span v-pre></span> in addition to appending {% raw %} {% endraw %}

Something we could certainly explore

@crphang crphang changed the title WIP: Escape Nunjucks for plugins Escape Nunjucks for plugins Mar 9, 2020
@crphang
Copy link
Contributor Author

crphang commented Mar 9, 2020

Ready for review.

Integration test added hints at how this can be used with special tags, it naively replaced all mustaches with success. A common use case would be helping to enable escaped special tag which relies of mustaches such as latex or other syntax.

@crphang crphang changed the title Escape Nunjucks for plugins Escape Nunjucks for special tags Mar 9, 2020
Copy link
Contributor

@ang-zeyu ang-zeyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On how this can be used with special tags, would it be better to keep it to a separate PR since it hasn't been implemented?

src/lib/markbind/src/parser.js Outdated Show resolved Hide resolved
@crphang
Copy link
Contributor Author

crphang commented Mar 9, 2020

On how this can be used with special tags, would it be better to keep it to a separate PR since it hasn't been implemented?

For now, the test focuses on ensuring that a special tag plugin could retrieve an untouchedescaped unparsed mustache syntax. Before we have the special component, it would be good to ensure that there will be no regression by having the integration test.

For general use, we could create a special component, something like that acts the same as a in addition to appending {% raw %} {% endraw %}

Would be great to have something like that. When we do, we could add other tests that checks this or even replace the current one.

@ang-zeyu
Copy link
Contributor

ang-zeyu commented Mar 9, 2020

My bad, realised I edited your comment instead of making a new one 😅
Reverted it

For now, the test focuses on ensuring that a special tag plugin could retrieve an escaped unparsed mustache syntax. Before we have the special component, it would be good to ensure that there will be no regression by having the integration test.

Ahh I see the point now, thanks!

For general use, we could create a special component, something like that acts the same as a in addition to appending {% raw %} {% endraw %}

Would be great to have something like that. When we do, we could add other tests that checks this or even replace the current one.

Reason I was confused is that if we are going in this direction, the test would likely be replaced as you mentioned. I'm fine with including it here as well in the meanwhile though.

A brief glance:

  • there's quite a few nunjucks.renderString usages I noticed that wasn't replaced with the wrapper call. Should all such calls before the two nunjucksUtils.removeNunjucksEscapes be wrapped? ( would be good to document otherwise )
  • could move nunjucksUtils into the new lib/markbind/src/utils folder

@crphang
Copy link
Contributor Author

crphang commented Mar 10, 2020

My bad, realised I edited your comment instead of making a new one

No worries 😄.

Thank you for your review! I reviewed some of the existing nunjucks.RenderString that exists, and I agree that it will get confusing when there are 2 ways of using nunjucks. Actually, there is no difference between nunjucks.RenderString and nunjuckUtils.renderEscaped after nunjucksUtils.removeNunjucksEscapes is called.

I have added additional tests to ensure that the behavior of rendering nunjucks to remove brackets is retained even when raw tags is used. This helps to prevent backward compatibility issues where we actually need the mustaches to be removed so that Vue can work properly until we have the component mentioned above.

Copy link
Contributor

@ang-zeyu ang-zeyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your review! I reviewed some of the existing nunjucks.RenderString that exists, and I agree that it will get confusing when there are 2 ways of using nunjucks. Actually, there is no difference between nunjucks.RenderString and nunjuckUtils.renderEscaped after nunjucksUtils.removeNunjucksEscapes is called.

It wasn't confusing per se, more a matter of functionality. ( eg. the renderString in insertHeaderFile wasn't wrapped before, so {% raw %} would be erased before the final nunjucks call for the page's content )

On that note, and {{ still getting erased, shouldn't the final nunjucks call for the page / external dependency's output be the raw call to nunjucks.renderString still?
Perhaps nunjucks.removeNunjucksEscapes is not necessary in this sense, as nunjucks already removes {% raw %}... {% endraw %}

test/functional/test_site/testNunjuckUtils.md Outdated Show resolved Hide resolved
test/functional/test_site/testNunjuckUtils.md Outdated Show resolved Hide resolved
@crphang crphang force-pushed the nunjucks-explore branch 2 times, most recently from a7961c8 to 0bb9ee9 Compare March 10, 2020 15:19
@crphang
Copy link
Contributor Author

crphang commented Mar 10, 2020

On that note, and {{ still getting erased, shouldn't the final nunjucks call for the page / external dependency's output be the raw call to nunjucks.renderString still?

True 👍 , we should remove any additional {% raw %} if it still exists in the final call.

Perhaps nunjucks.removeNunjucksEscapes is not necessary in this sense, as nunjucks already removes {% raw %}... {% endraw %}

We would still require this. As explained above, we still need to ensure {% raw %} tags does not change current behavior by interfering with markdown-it-attr, now that it can be "retained" until the parser stage.

Copy link
Contributor

@ang-zeyu ang-zeyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would still require this.

I see that {% raw %} conflicts with markdown-it-attrs now, and the reason for the scope of this pr.

In this sense, is removeNunjucksEscape's purpose is only to prevent conflict with the md.render call in page.renderFile?
If its the case, and since the scope of this PR targets special tags, should the functional test be removed, and removeNunjucksEscapes along with it? ( since it wouldn't conflict if it is wrapped by a special tag anyway )

Once we have a patch for markdown-it-attrs and markdown-it, we could then reintroduce a similar test that tests the functionality of {% raw %}...{% endraw %} when not wrapped by a special tag, without the need for removeNunjucksEscapes.

@crphang
Copy link
Contributor Author

crphang commented Mar 11, 2020

👍 . I removed removeNunjuckEscape here and moved the test to postRender to test that special tags still works under more rounds of parsing.

Once we have a patch for markdown-it-attrs and markdown-it, we could then reintroduce a similar test that tests the functionality of {% raw %}...{% endraw %} when not wrapped by a special tag, without the need for removeNunjucksEscapes.

Yea we could do that. Removing removeNunjucksEscapes right now means that when user does {% raw %}, they will be parsed as classes due to markdown-it-attr but they are previously not. This is because raw tags are retained until md.render. There will be no key differences in user experience and this can be addressed later on.

Copy link
Contributor

@ang-zeyu ang-zeyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍, thanks for the explanations thus far!

@marvinchin marvinchin self-requested a review March 14, 2020 06:21
Copy link
Contributor

@marvinchin marvinchin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nesting still seems to be an issue I think :( let me know if I'm misunderstanding this!

Also, just to clarify, does this work only for plugins? Would it work if we want to actually escape the variable syntax (#1086)?

Just tested it out, I think I finally get what you mean by the syntax clashing with vue.

Do you happen to know if we currently have any other way to escape the {{ VARIABLE }} syntax?


expect(escapedContent).toBe(escapedString);
});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nesting with stuff in between doesn't seem to work correctly 😅

test('Escaping nested nunjucks raw tags', () => {
  const escapedString = 'Nested escaped data: {% raw %}BEFORE{% raw %} CONTENT {% endraw %}AFTER{% endraw %}';
  const escapedContent = nunjuckUtils.renderEscaped(nunjucks, escapedString);
  expect(escapedContent).toBe(escapedString);
});
Expected: "Nested escaped data: {% raw %}BEFORE{% raw %} CONTENT {% endraw %}AFTER{% endraw %}"
Received: "Nested escaped data: {% raw %}BEFORE{% raw %} CONTENT {% endraw %}{% endraw %}AFTER"

src/Page.js Outdated Show resolved Hide resolved

expect(escapedContent).toBe(escapedString);
});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like as long as we go with the opentag/closetag approach, we would always run into this issue:
$OPEN$ $CLOSE$ $CLOSE$ <- how do we match the $OPEN to the right $CLOSE?

Do you think having a special tag that escapes the next character would help? This would "break" the syntax so the original special tag wont be treated as such.

@crphang
Copy link
Contributor Author

crphang commented Mar 14, 2020

@marvinchin Thank you so much for your review. I updated it and fixed some of the styles to make it consistent. I removed the test that was supporting nested tags because there is no need for the user to input nested escapes in any case. It is much better to remove the test than to suggest to future developers that we are officially supporting and maintaining an unnecessary interaction.

Do you happen to know if we currently have any other way to escape the {{ VARIABLE }} syntax?

There are two main approaches for it:

  1. Automated detection of {{ and }}.
    1.1 If it is a variable, we simply leave as is.
    1.2 If it is not a variable, we could use {% raw %} tags to escape those which is the supported way from nunjucks by default. Alternatively we could try escaping using nunjucks.RenderString, but we inject the text in between {{ and }} as a variable. However, this method is quite buggy as renderString does not work when the text has spaces and it is very limiting. {% raw %} tags is still the most reliable and users who are well-acquainted with nunjucks would be familiar with them and expect it to work out of the box.
    1.3 Automated detection has a lot of edge cases to cover. For example, on top of having to build to cover edge cases like non-wellformed {{ and }} brackets where the numbers are unequal, we also need to know what are all the variables defined in the page. For example, not all variables are defined within markbind. There are some that uses nunjucks syntax to define variable directly: https://mozilla.github.io/nunjucks/templating.html#for which some users could adopt (our own userguide). Fetching those variables to ignore escaping is not officially supported by nunjucks: Get variables from template mozilla/nunjucks#329.

  2. Manual user input of {% raw %}. Users know best what they want to ignore and we leave it as such. (1.2) describes why other ways is not that simple to escape and {%raw%} is simpler and more reliable.
    2.1 The current "decorator" approach of supporting {%raw%} tags still allow us to go to automatic escaping (if we actually want). It helps to maintain the {%raw%} tags throughout if we are able to detect.

Hope that helps to explain!

@crphang
Copy link
Contributor Author

crphang commented Mar 21, 2020

@marvinchin ready for review! Sorry for the double ping 🙇

Copy link
Contributor

@marvinchin marvinchin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :)

@marvinchin marvinchin added this to the v2.12.1 milestone Mar 21, 2020
@marvinchin marvinchin merged commit 7ba15e9 into MarkBind:master Mar 21, 2020
Tejas2805 added a commit to Tejas2805/markbind that referenced this pull request Mar 23, 2020
* 'master' of https://github.com/MarkBind/markbind:
  Update test files (MarkBind#1138)
  Remove OK button from modals (MarkBind#1134)
  Add start from line number functionality to code blocks (MarkBind#1115)
  Allow special tags to be self-closing (MarkBind#1101)
  Simplify baseUrl resolving process (MarkBind#1087)
  Remove redundant file writing (MarkBind#1090)
  Bump acorn from 5.7.3 to 5.7.4 (MarkBind#1121)
  Bump acorn from 7.1.0 to 7.1.1 in /src/lib/markbind (MarkBind#1120)
  Unify markdown-it parser variants (MarkBind#1056)
  Remove dynamic include feature (MarkBind#1037)
  Fix flex shrink not applying in content wrapper (MarkBind#1135)
  Escape Nunjucks for special tags (MarkBind#1049)
  Update documentation on icon slot for boxes (MarkBind#1123)
Tejas2805 added a commit to Tejas2805/markbind that referenced this pull request Mar 23, 2020
…x-pageNav

* 'master' of https://github.com/MarkBind/markbind:
  Update test files (MarkBind#1138)
  Remove OK button from modals (MarkBind#1134)
  Add start from line number functionality to code blocks (MarkBind#1115)
  Allow special tags to be self-closing (MarkBind#1101)
  Simplify baseUrl resolving process (MarkBind#1087)
  Remove redundant file writing (MarkBind#1090)
  Bump acorn from 5.7.3 to 5.7.4 (MarkBind#1121)
  Bump acorn from 7.1.0 to 7.1.1 in /src/lib/markbind (MarkBind#1120)
  Unify markdown-it parser variants (MarkBind#1056)
  Remove dynamic include feature (MarkBind#1037)
  Fix flex shrink not applying in content wrapper (MarkBind#1135)
  Escape Nunjucks for special tags (MarkBind#1049)
  Update documentation on icon slot for boxes (MarkBind#1123)
  Convert code in boxes to code block (MarkBind#1086)
marvinchin pushed a commit that referenced this pull request Apr 10, 2020
* Allow nunjucks that is called multiple times to support escaping for special tags

* Remove the need to prune raw tags, simplify logic of replacement

* Fix style and remove nested nunjucks escape test
@damithc
Copy link
Contributor

damithc commented Apr 18, 2020

@crphang what's the outcome of this PR, from author POV?
I thought this syntax should work now, but doesn't seem to.

{% raw %}
{{ abc }}
{% endraw %}

@crphang
Copy link
Contributor Author

crphang commented Apr 18, 2020

@damithc This works when used in specialTags, such as puml.

This allows curly braces {{ content }} to be retained throughout Parse and PostRender, which can be seen in the integration test added, which allows later plugins that require such syntax to not be conflicted.

To actually show the curly braces: {{ content }}, we need an addition as described: #1049 (comment).

For general use, we could create a special component, something like <escape></escape> that acts the same as a <span v-pre></span> in addition to appending {% raw %} {% endraw %}

This could be a separate plugin (which can be registered as special tag) that also notifies Vue that it is raw with v-pre.

@crphang
Copy link
Contributor Author

crphang commented Apr 18, 2020

From an author POV, nothing should change right now, but it allows developers to define more components that could consist of conflicting syntax, in this case curly braces.

I would raise a seperate issue to track this, for creating a new plugin that allows author to render {{ }} syntax

@damithc
Copy link
Contributor

damithc commented Apr 18, 2020

From an author POV, nothing should change right now, but it allows developers to define more components that could consist of conflicting syntax, in this case curly braces.

I would raise a seperate issue to track this, for creating a new plugin that allows author to render {{ }} syntax

Got it. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants