Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HtmlEncoding consistent with rules in handlebars.js #473

Merged
merged 5 commits into from
Dec 22, 2021

Conversation

tommysor
Copy link
Contributor

@tommysor tommysor commented Nov 25, 2021

The original issue this PR was intended to solve have been fixed in PR #477.
This PR now deals with general rules for encoding in Handlebars.Net vs handlebars.js.

Using this PR since it contains the history of how this change came to be.

[WIP] Configuration.NoEscape inconsistent fix

Regarding issue: HandlebarsDotNet.Handlebars.Configuration.NoEscape Applied Inconsistently #468

Change in commit: Reset SuppressEncoding to configured instead of false. (old) UnencodedStatementVisitor resets value to previous (new) makes both tests for Handlebars_Should_Encode_Chinese pass.

Both tests for Handlebars_Should_Not_Encode_Chinese still fail (output is Chinese characters, rather than expected &#xxxx;).
I don't know if this should be considered correct behavior, or possibly a separate issue.

Looking forward to your comments.

@rexm
Copy link
Member

rexm commented Nov 25, 2021

What’s the behavior in the JS library for the Chinese characters?

@tommysor
Copy link
Contributor Author

What’s the behavior in the JS library for the Chinese characters?

I have no idea. Or any idea of how to figure it out.
Closest I see is this page: http://tryhandlebarsjs.com/
But I don't know what settings they are using and if that matches what's relevant for this case.

On http://tryhandlebarsjs.com/ with "expression with raw HTML"
"<" is shown as &lt;
While non-ascii characters "öä ñ øå 产品" are shown as entered.

Any insight here @zjklee or @tonysneed ?

@tommysor
Copy link
Contributor Author

I had another look, still cant test if I'm correct.
Looks to me like the JS library does not escape anything except this small list of chars.

https://github.com/handlebars-lang/handlebars.js/blob/master/lib/handlebars/utils.js

const escape = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#x27;',
  '`': '&#x60;',
  '=': '&#x3D;'
};

const badChars = /[&<>"'`=]/g,
  possible = /[&<>"'`=]/;

function escapeChar(chr) {
  return escape[chr];
}

[...]

export function escapeExpression(string) {
  if (typeof string !== 'string') {
    // don't escape SafeStrings, since they're already safe
    if (string && string.toHTML) {
      return string.toHTML();
    } else if (string == null) {
      return '';
    } else if (!string) {
      return string + '';
    }

    // Force a string conversion as this will be done by the append regardless and
    // the regex test will do this transparently behind the scenes, causing issues if
    // an object's to string has escaped characters in it.
    string = '' + string;
  }

  if (!possible.test(string)) {
    return string;
  }
  return string.replace(badChars, escapeChar);
}

https://github.com/Handlebars-Net/Handlebars.Net/blob/master/source/Handlebars/IO/HtmlEncoder.cs

[...]

private static void EncodeImpl<T>(T text, TextWriter target) where T: IEnumerator<char>
{
	while (text.MoveNext())
	{
		var value = text.Current;
		switch (value)
		{
			case '"':
				target.Write("&quot;");
				break;
			case '&':
				target.Write("&amp;");
				break;
			case '<':
				target.Write("&lt;");
				break;
			case '>':
				target.Write("&gt;");
				break;

			default:
				if (value > 159)
				{
					target.Write("&#");
					target.Write((int)value);
					target.Write(";");
				}
				else target.Write(value);
				break;
		}
	}
}

@tommysor
Copy link
Contributor Author

Cleaned up branch. The two commits now address 2 separate issues.

Commit: UnencodedStatementVisitor resets value to previous
is a strait forward bugfix, and I believe it should be enough to fix the specific problem in issue #468.

Commit: HtmlEncoder escape chars based on https://github.com/handlebars-lang/…
is a danger zone.
Encoding rules copied from JS based on my understanding.
I left out (with comment) 2 chars ( ' and = ) because encoding them breaks existing tests.
Would likely break real world implementations that rely on current behavior.

@tonysneed
Copy link

@tommysor Thanks for working to fix #468. I'll pull your branch to see if the tests pass in my repro.

@rexm @zjklee I'm looking forward to your review of this PR.

@tonysneed
Copy link

@tommysor I puled your branch and ran my tests. The tests which set Handlebars.Configuration.NoEscape = true all pass.

However, setting Handlebars.Configuration.NoEscape = false has no effect. The Chinese characters are not encoded when they should be. Is this expected behavior?

@tommysor
Copy link
Contributor Author

tommysor commented Dec 6, 2021

@tonysneed That, I believe, is the $1,000 question for this pull request.

Under assumption: Existing encoding rules in Handlebars.Net is correct (https://github.com/Handlebars-Net/Handlebars.Net/blob/master/source/Handlebars/IO/HtmlEncoder.cs).
Then expected result is that the Chinese characters are encoded to &#xxx; when NoEscape = false.
This is what you should get if you cherry pick only the commit UnencodedStatementVisitor resets value to previous.
@tonysneed Your tests will not pass as is. They will pass if you make a change in Class.hbs to double curlys {{> properties}} from triple curlys {{{> properties}}}.

Under assumption: Encoding rules in JavaScript repo is correct (https://github.com/handlebars-lang/handlebars.js/blob/master/lib/handlebars/utils.js).
Then expected result is that the Chinese characters are kept unchanged regardless of NoEscape setting.
This is the current behavior of this branch. Some deviations from js rules would need some cleanup work.

Maintainers (so @rexm and/or @zjklee) should decide the direction for this.
My own suggestion would be that I split the commit UnencodedStatementVisitor resets value to previous out into a new pull request. That commit can be released as a patch with very low chance of breaking anything.
While the change to general handling of encoding would be a separate larger change, or possibly not done, according to the maintainers decision.

@oformaniuk
Copy link
Member

@tommysor , thanks for taking time for creating the PR.
I'd probably go with a safer option - first merge non-breaking changes. In a separate PR introduce the breaking changes but hide them behind feature toggle. The default behaviour should stay the same in current version but can be changed in the next major release.

@tommysor tommysor changed the title [WIP] Configuration.NoEscape inconsistent fix HtmlEncoding consistent with rules in handlebars.js Dec 20, 2021
@tommysor
Copy link
Contributor Author

Add feature toggle Compatibility.UseLegacyHandlebarsNetHtmlEncoding

Copy link
Member

@oformaniuk oformaniuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please also add corresponding documentation to the README?

source/Handlebars/IO/HtmlEncoder.cs Outdated Show resolved Hide resolved
@tommysor
Copy link
Contributor Author

Can you please also add corresponding documentation to the README?

Added section to README.

@oformaniuk
Copy link
Member

Rebase your branch to the latest master and it will merge automatically 👍

Run BasicIntegrationTests with new version of HtmlEncoder.
auto-merge was automatically disabled December 22, 2021 22:05

Head branch was pushed to by a user without write access

@sonarcloud
Copy link

sonarcloud bot commented Dec 22, 2021

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

100.0% 100.0% Coverage
16.6% 16.6% Duplication

@oformaniuk oformaniuk merged commit 02e1794 into Handlebars-Net:master Dec 22, 2021
@tommysor tommysor deleted the NoEscape_inconsistent_fix branch December 22, 2021 22:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HandlebarsDotNet.Handlebars.Configuration.NoEscape Applied Inconsistently
4 participants