Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unicode characters in template expressions #342

Closed
johan-ohrn opened this issue Dec 3, 2018 · 21 comments
Closed

unicode characters in template expressions #342

johan-ohrn opened this issue Dec 3, 2018 · 21 comments

Comments

@johan-ohrn
Copy link

We have some autogenerated code and for whatever reasons we have javascript objects with property names using non english characters such as åäö.
When we want to print a property from such an object we would type something like this in our template:
{{:myObj.someÅpropÄerty}}

This doesn't work because the javascript generated from this template expression is invalid.
+((v=data.myObj.someÅdata.propÄdata.erty)!=null?v:"");

I managed to track down the problem to a couple of regexes using \w character class, which doesn't support anything but ascii characters.

I made the following changing in jsrender.js - we use the separate files
Lines 54-57:

    uW = "\\w\\u00E4\\u00E5\\u00F6\\u00C4\\u00C5\\u00D6",

    //rPath = /^(!*?)(?:null|true|false|\d[\d.]*|([\w$]+|\.|~([\w$]+)|#(view|([\w$]+))?)([\w$.^]*?)(?:[.[^]([\w$]+)\]?)?)$/g,
    rPath = new RegExp("^(!*?)(?:null|true|false|\\d[\\d.]*|([" + uW + "$]+|\\.|~([" + uW + "$]+)|#(view|([" + uW + "$]+))?)([" + uW + "$.^]*?)(?:[.[^]([" + uW + "$]+)\\]?)?)$", "g"),
    //        not                               object     helper    view  viewProperty pathTokens      leafToken


    //rParams = /(\()(?=\s*\()|(?:([([])\s*)?(?:(\^?)(~?[\w$.^]+)?\s*((\+\+|--)|\+|-|~(?![\w$_])|&&|\|\||===|!==|==|!=|<=|>=|[<>%*:?\/]|(=))\s*|(!*?(@)?[#~]?[\w$.^]+)([([])?)|(,\s*)|(\(?)\\?(?:(')|("))|(?:\s*(([)\]])(?=[.^]|\s*$|[^([])|[)\]])([([]?))|(\s+)/g,
    rParams = new RegExp("(\\()(?=\\s*\\()|(?:([([])\\s*)?(?:(\\^?)(~?[" + uW + "$.^]+)?\\s*((\\+\\+|--)|\\+|-|~(?![" + uW + "$_])|&&|\\|\\||===|!==|==|!=|<=|>=|[<>%*:?\\/]|(=))\\s*|(!*?(@)?[#~]?[" + uW + "$.^]+)([([])?)|(,\\s*)|(\\(?)\\\\?(?:(')|(\"))|(?:\\s*(([)\\]])(?=[.^]|\\s*$|[^([])|[)\\]])([([]?))|(\\s+)", "g"),

Seems kinda ugly to hard code the extra non-english characters like this. Perhaps you could make this user configurable or give unicode more generic support?

Even if using non-english characters for property names seem like a bad idea it's still valid javascript and as such I feel the same expression should be valid inside the template.

As a workarround I know we can type
{{:myObj["someÅpropÄerty"]}}
but this is ugly.

What's your take on the whole?

@johan-ohrn johan-ohrn changed the title unicode characters in expressions unicode characters in template expressions Dec 3, 2018
@johan-ohrn
Copy link
Author

I forgot to mention this is on version 0.9.91 but the code is identical in version 1.0.0.

@BorisMoore
Copy link
Owner

It looks like providing generic support for unicode property names would be difficult without significant perf / file size cost, at least judging by this https://mothereff.in/js-variables#%E0%B2%A0_%E0%B2%A0, and the code behind it.

Providing extensibility is possible, so people can add specific unicode characters they want to support, like you did, via

$.views.settings.advanced({addJsNameChars: "\\u00E4\\u00E5\\u00F6\\u00C4\\u00C5\\u00D6");

or similar.

But given the fact that you can do {{:myObj["someÅpropÄerty"]}}, as you say, I'm not sure about adding the setting... It still requires setting charactars manually, character by character...

@johan-ohrn
Copy link
Author

I think a setting would be valuable for people in this situation. I agree that it's clumsy to have to individually add support for each character but at least it's a one time configuration.

In this particular case the offending property names are auto generated resource keys. We have a resource object like this:

resources: {
  för_lågt_antal_angivet: 'Entered quantity is to low',
  ...
}

In our view models we write code such as
displayError(resources.för_lågt_antal_angivet)

and in our templates we do the same:

{{if quantity < 4}}
  {{:~resources.för_lågt_antal_angivet}}
{{/if}}

Having to use one syntax in the javascript files and an other in the template (sometimes) is unintuitive.
Also developers are more comfortable with the dot notation instead of the index notation to access properties.

What's worse is when we need to change a resource. A developer would just copy/paste the new resource key in the template without considering that a property with non english characters need to be typed using the index notation. And because the change is so small it's also likely that it's not tested. It's just expected to work. In this contrived example there is an {{if}} surrounding the expression making it even more likely that this particular code path is not tested.

My 2 cents is that even though there is a viable workaround for the issue a setting would make this kind of code less error prone since you don't have to remember a special rule and more consistent with how you write code in the javascript viewmodel.

Also a setting like in the example you gave would allow for a small plugin/extension such as jsrender.latin1.js or whatever pre defined charset such that users wouldn't have to manually configure this.

@BorisMoore
Copy link
Owner

Yes, I am in two minds about doing this, given the test matrix, need to document etc. but FWIW the feature would be like this:

Additional code at line 164

var uW = value && value.addedJsNameChars;
if (uW) {
	uW = "[\\w" + uW + "$";
	rPath = new RegExp("^(!*?)(?:null|true|false|\\d[\\d.]*|(" + uW + "]+|\\.|~(" + uW + "]+)|#(view|(" + uW + "]+))?)(" + uW + ".^]*?)(?:[.[^](" + uW + "]+)\\]?)?)$", "g");
	rParams = new RegExp("(\\()(?=\\s*\\()|(?:([([])\\s*)?(?:(\\^?)(~?" + uW + ".^]+)?\\s*((\\+\\+|--)|\\+|-|~(?!" + uW + "_])|&&|\\|\\||===|!==|==|!=|<=|>=|[<>%*:?\\/]|(=))\\s*|(!*?(@)?[#~]?" + uW + ".^]+)([([])?)|(,\\s*)|(\\(?)\\\\?(?:(')|(\"))|(?:\\s*(([)\\]])(?=[.^]|\\s*$|[^([])|[)\\]])([([]?))|(\\s+)", "g");
}
return value

(Here it is: jsviews.js.txt)

Usage:

$.views.settings.advanced({addedJsNameChars: "\\u00E4\\u00E5\\u00F6\\u00C4\\u00C5\\u00D6"});
var data = {foo.someÅpropÄerty:444};
...

@johan-ohrn
Copy link
Author

Thanks. I'll try it out and will keep an eye out on this thread to see where it ends up.

@BorisMoore
Copy link
Owner

Ah, I had missed your comment above (the page had not been refreshed when I wrote my response above.)

Yes, I hear you... I'll let it mature a bit before deciding. Meantime, if you can test fully with the update I included above that would be helpful. Let me know if you see any issues/bugs.

@BorisMoore
Copy link
Owner

I have an alternative which I am looking at. I'll probably send you a new patch to test, in the coming days, rather than testing the version above...

@johan-ohrn
Copy link
Author

Sounds good. In the meantime I use this quick fix.

@BorisMoore
Copy link
Owner

BorisMoore commented Dec 6, 2018

Here is an update:
jsviews.js.txt

I managed to figure out a way of allowing all unicode chars, without needing to specifically specify the character codes. The default is not to support unicode chars in names, but you can write

$.views.settings.advanced({unicodeChars: true}); // support all unicode chars in JavaScript names

or else limit to specific ones:

$.views.settings.advanced({unicodeChars: "\\u00E4\\u00E5\\u00F6\\u00C4\\u00C5\\u00D6"});

You can't have unicode characters on helpers, or helper properties, tag names or tag properties, etc. Only on properties of data passed in to the link() or render() method.

The generic unicode version, passing true, is done basically by replacing

/[\w]*/

by

/[^\s~/\x00-\x23\x25-\x2D\x3A-\x40[\]\\`{}|]*/

so word characters are any characters other than the specified ones.

It adds 1% to the filesize (of jsrender.min.js), which I hesitate about, but at least the usage is easy now. (Assuming it works correctly!)

So if you are able to test it , it would be helpful, particularly to make sure it does indeed cover all unicode characters as expected (or at least test with a good range of characters...).

I am still considering whether or not to publish this feature...

@johan-ohrn
Copy link
Author

I was able to create a test html page. Download here (sorry about the adds and wait time to download)

I extracted every character from the regex.
The only character that appears to be invalid in fact in chrome is "ⸯ" which I have commented out in the html file.

I tested this with the jsviews version you provided. It throws an error.

I will test some more to see if I can isolate which characters are the culprit.

@BorisMoore
Copy link
Owner

Excellent. That is very helpful!

I did some testing with that, and all the characters seem to work. There was still an error from JsViews because of {{:aⸯa}} still being in the template markup. So I removed it.

Also, of course you need the line:

$.views.settings.advanced({unicodeChars: true});

before calling $.templates(...)

I also had some issues when using a file that did not have the BOM (with UTF-8 encoding).

But with those changes, the template rendered completely...

@BorisMoore
Copy link
Owner

Here is an updated jsviews.js and jsrender.js and a copy of my version of your test page:

jsviews.js.txt
jsrender.js.txt
testUnicode.html.txt

The test page works for me in Chrome and in Firefox, but not in IE, Edge, Safari... It seems that some of the unicode characters trigger javascript errors in those browsers. With a reduced set of unicode characters it works in all those browsers.

@johan-ohrn
Copy link
Author

I was at the end of my shift and had been messing back and forth with this test page so I missed the obvious $.views.settings.advanced(...) call :)

Good that you made it work!

So we confirmed that the regex does indeed cover all unicode characters.
The fact that it covers more than some browsers doesn't really matter I think.

Also since the regex does cover all unicode characters I don't see the need for a user to explicitly allow specific characters. Unicode on or off is sufficient. This should also result in a lower footprint for the feature.

If the option is to allow unicode or not then I see two possible implementations, well three.

  1. Making unicode opt-in via $.views.settings.advanced({unicodeChars: true});
    This alternative should ensure that no existing templates are broken by an unexpected bug in the new regex. Also adds more bytes to the final js file.
  2. Somehow allow setting the full rPath and rParams regexes from outside jsviews.js. This would allow you to make unicode support opt-in from a jsviews-unicode.js file moving the overhead completely from jsviews.js itself.
  3. Skip the option alltogether and just change the regex thus making unicode supported out of the box.

What's your take on these ideas?

@BorisMoore
Copy link
Owner

BorisMoore commented Dec 9, 2018

Yes, I agree with your analysis.

Option 2 is probably my preference.

Option 3 makes me nervous about possible bugs, especially doing it just after releasing v1.0.0 which is hopefully stable.

So option2 is a more conservative approach, given that this feature is unlikely to be used/needed by many users...

I'll post a version of approach 2 soon.

@BorisMoore
Copy link
Owner

BorisMoore commented Dec 9, 2018

Here is an update where you load the jsrender-unicode.js file after jsrender.js/jsviews.js, before calling $.templates(...) to compile any templates that have unicode characters in them.

Let me know if it works as expected. Thanks!

jsrender-unicode.js.txt
jsrender.js.txt
jsviews.js.txt

@johan-ohrn
Copy link
Author

I gave it a try but run into some issues with jsrender-unicode.js

The regexes in jsrender-unicode.js is not the same as those we used here
Is this intentional?
Modifying jsrender-unicode.js to use those regexes seem to work (comments left out for abbrevity):

$.views.sub.rPath =
    /^(!*?)(?:null|true|false|\d[\d.]*|([^.^\s~/\x00-\x23\x25-\x2D\x3A-\x40[\]\\`{}|]+|\.|~([\w$]+)|#(view|([\w$]+))?)([^\s~/\x00-\x23\x25-\x2D\x3A-\x40[\]\\`{}|]*?)(?:[.[^]([^.^\s~/\x00-\x23\x25-\x2D\x3A-\x40[\]\\`{}|]+)\]?)?)$/g;
$.views.sub.rPrm =
  /(\()(?=\s*\()|(?:([([])\s*)?(?:(\^?)(~?[\w$.^]+)?\s*((\+\+|--)|\+|-|~(?![\w$])|&&|\|\||===|!==|==|!=|<=|>=|[<>%*:?/]|(=))\s*|(!*?(@)?[#~]?[^\s~/\x00-\x23\x25-\x2D\x3A-\x40[\]\\`{}|]+)([([])?)|(,\s*)|(\(?)\\?(?:(')|(\"))|(?:\s*(([)\]])(?=[.^]|\s*$|[^([])|[)\]])([([]?))|(\s+)/g;

I also found some left over code and double comments (line numbers from jsrender.js):
line 114:
// subscription, e.g. JsViews integration
line 161:

				var unicodeChars = value && value.unicodeChars;
				if (unicodeChars) {
					setPathRegEx(unicodeChars);
				}

You might want to double check any changes related to this issue.

@BorisMoore
Copy link
Owner

Ah - sorry, I must have accidentally uploaded an intermediate version. Thanks for pointing out the stale code/comments, too!

So hopefully this is correct now. The regEx expressions are close to yours, but not quite identical. Let me know if you have any concerns, or see any issues.

jsrender-unicode.js.txt
testUnicode.html.txt
jsviews.js.txt
jsrender.js.txt

@johan-ohrn
Copy link
Author

I finally got to implement the fix in our code. Works perfectly so far!
I'll test it some more and see if anything turns up.

@BorisMoore
Copy link
Owner

I'll include this in the next update (v1.0.1)

@johan-ohrn
Copy link
Author

Glad to hear.
No problems so far.

BorisMoore added a commit to BorisMoore/jsviews.com that referenced this issue Dec 23, 2018
Additional documentation topics:

  Cascading <select>s: dynamic selection of subcategory items in child <select>:
  https://www.jsviews.com/#unicode

  Unicode character support:
  https://www.jsviews.com/#link-select@cascade

Minor bug fixes:

  Unicode characters in template expressions
  BorisMoore/jsrender#342

  Data-linked {^{for start= end=}} renders all items on $.observable.refresh()
  BorisMoore/jsviews#410

  Correction to typescript definition files:
  https://www.jsviews.com/#typescript
BorisMoore added a commit to BorisMoore/jsviews.com that referenced this issue Dec 23, 2018
Additional documentation topics:

  Cascading <select>s: dynamic selection of subcategory items in child <select>:
  https://www.jsviews.com/#unicode

  Unicode character support:
  https://www.jsviews.com/#link-select@cascade

Minor bug fixes:

  Unicode characters in template expressions
  BorisMoore/jsrender#342

  Data-linked {^{for start= end=}} renders all items on $.observable.refresh()
  BorisMoore/jsviews#410

  Correction to typescript definition files:
  https://www.jsviews.com/#typescript
BorisMoore added a commit to BorisMoore/jsviews.com that referenced this issue Dec 23, 2018
Feature improvement
  Unicode character support:
  BorisMoore/jsrender#342
  https://www.jsviews.com/#unicode

Additional documentation topics:

  Cascading <select>s: dynamic selection of subcategory items in child <select>:
  https://www.jsviews.com/#link-select@cascade

  Unicode character support:
  https://www.jsviews.com/#unicode

Minor bug fixes:

  Data-linked {^{for start= end=}} renders all items on $.observable.refresh()
  BorisMoore/jsviews#410

  Correction to typescript definition files:
  https://www.jsviews.com/#typescript
BorisMoore added a commit that referenced this issue Dec 23, 2018
Feature improvement
  Unicode character support in template expressions
  https://www.jsviews.com/#unicode
  #342

Minor bug fix:
  Correction to typescript definition files:
  https://www.jsviews.com/#typescript
BorisMoore added a commit to BorisMoore/jsviews that referenced this issue Dec 24, 2018
Feature improvement
  Unicode character support:
  BorisMoore/jsrender#342
  https://www.jsviews.com/#unicode

Additional documentation topics:

  Cascading <select>s: dynamic selection of subcategory items in child <select>:
  https://www.jsviews.com/#link-select@cascade

  Unicode character support:
  https://www.jsviews.com/#unicode

Minor bug fixes:

  Data-linked {^{for start= end=}} renders all items on $.observable.refresh()
  #410

  Correction to typescript definition files:
  https://www.jsviews.com/#typescript
@BorisMoore
Copy link
Owner

Fixed in commit v1.0.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants