Here's how to fix SugarJS parsing of dates with Unicode space characters in them #687

Open
jikamens opened this issue Sep 5, 2023 · 0 comments

Andrew Plummer appears to have gone radio silent on SugarJS for several years now. It's kind of weird since he's active in other repositories and people are clearly still using SugarJS. I hope everything's OK.

In any case, I just had to deal with a SugarJS date-parsing issue that I suspect an increasing number of people are going to run into over time, so even if nothing is ever going to be done by Andrew about this issue, I thought I should post it here to tell others how to fix it.

If you've arrived here because machine-generated dates you're trying to parse with SugarJS are suddenly failing to parse, even though they look perfectly fine, it may be because (you may have already figured this next part out) some of the spaces in the dates are actually Unicode narrow no-break space characters (U+202F, a.k.a. 0x202F or \u202F), which SugarJS doesn't understand. This is happening because the Unicode standards for generating human-readable dates have changed, and various JavaScript platforms are switching to the new standards over time. See, e.g., nodejs/node#45938 and nodejs/node#45171.
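If you aren't sure whether this is what's biting you, a quick check along these lines may help. It is only a sketch: `findUnicodeSpaces` is a hypothetical helper name, not part of SugarJS, and the sample string is hand-written rather than real platform output.

```javascript
// Hypothetical helper (not part of SugarJS): report every whitespace
// character in a string that is not a plain ASCII space.
function findUnicodeSpaces(str) {
  const hits = [];
  for (let i = 0; i < str.length; i++) {
    const ch = str[i];
    if (/\s/.test(ch) && ch !== ' ') {
      const hex = ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0');
      hits.push({ index: i, codePoint: 'U+' + hex });
    }
  }
  return hits;
}

// Hand-written sample with a narrow no-break space before "PM".
const sample = '9/5/2023, 3:45:00\u202FPM';
console.log(findUnicodeSpaces(sample)); // [ { index: 17, codePoint: 'U+202F' } ]
```

An empty result means the spaces really are ASCII spaces and the parsing failure probably has a different cause.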

I dug into the innards of the SugarJS date-parsing code, and this is the change you need so that SugarJS understands these dates:

```diff
--- a/lib/date.js
+++ b/lib/date.js
@@ -2269,7 +2269,7 @@ function getNewLocale(def) {
       function formatToSrc(str) {
 
         // Make spaces optional
-        str = str.replace(/ /g, ' ?');
+        str = str.replace(/ /g, '\\s*');
 
         str = str.replace(/\{([^,]+?)\}/g, function(match, token) {
           var tokens = token.split('|');
```
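To see what that substitution changes, here is a minimal sketch that applies the old and new replacement strings to a regex source. The `format` value is a hypothetical, simplified stand-in; the real SugarJS format strings contain `{token}` placeholders that are expanded later in `formatToSrc`.

```javascript
// Hypothetical, simplified stand-in for a format source (an assumption,
// not an actual SugarJS format string).
const format = '\\d{1,2}:\\d{2} [AP]M';

// The same substitution as the original line and as the patched line.
const oldSrc = format.replace(/ /g, ' ?');
const newSrc = format.replace(/ /g, '\\s*');

const oldRe = new RegExp('^' + oldSrc + '$');
const newRe = new RegExp('^' + newSrc + '$');

console.log(oldRe.test('3:45 PM'));      // true
console.log(oldRe.test('3:45\u202FPM')); // false
console.log(newRe.test('3:45 PM'));      // true
console.log(newRe.test('3:45\u202FPM')); // true
```

The old source only tolerates an optional ASCII space, so the U+202F separator falls through to `[AP]` and the match fails.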

Note that this change actually does two things: makes the code accept all whitespace characters, not just ASCII 32, and makes the code treat multiple adjacent whitespace characters as one. I think this behavior is correct since you never know when people are going to put extra spaces in things, but if you just want the first half of that, then you can do this:

```diff
--- a/lib/date.js
+++ b/lib/date.js
@@ -2269,7 +2269,7 @@ function getNewLocale(def) {
       function formatToSrc(str) {
 
         // Make spaces optional
-        str = str.replace(/ /g, ' ?');
+        str = str.replace(/ /g, '\\s?');
 
         str = str.replace(/\{([^,]+?)\}/g, function(match, token) {
           var tokens = token.split('|');
```
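The difference between `\s?` and `\s*` only shows up when there is more than one whitespace character in a row. A small sketch, using hand-written patterns as an assumption rather than anything SugarJS generates:

```javascript
// Hand-written patterns for illustration only.
const atMostOne = /^\d{1,2}:\d{2}\s?[AP]M$/;
const anyNumber = /^\d{1,2}:\d{2}\s*[AP]M$/;

// Both accept a single Unicode space, or no space at all.
console.log(atMostOne.test('3:45\u202FPM')); // true
console.log(anyNumber.test('3:45\u202FPM')); // true
console.log(atMostOne.test('3:45PM'));       // true
console.log(anyNumber.test('3:45PM'));       // true

// Only \s* accepts a doubled space.
console.log(atMostOne.test('3:45  PM'));     // false
console.log(anyNumber.test('3:45  PM'));     // true
```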

If you want to keep the only-match-one-character behavior and you want to be paranoid and only match the specific Unicode character we're talking about, then you can do:

```diff
--- a/lib/date.js
+++ b/lib/date.js
@@ -2269,7 +2269,7 @@ function getNewLocale(def) {
       function formatToSrc(str) {
 
         // Make spaces optional
-        str = str.replace(/ /g, ' ?');
+        str = str.replace(/ /g, '[ \u202F]?');
 
         str = str.replace(/\{([^,]+?)\}/g, function(match, token) {
           var tokens = token.split('|');
```

But I think the first change above is probably reasonable and safe.
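For anyone weighing the three options, the following sketch runs each replacement string against a few separators. As before, `format` and `accepts` are hypothetical names for illustration, and the format is a simplified stand-in rather than a real SugarJS format string.

```javascript
// Hypothetical, simplified stand-in for a format source.
const format = '\\d{1,2}:\\d{2} [AP]M';

// The three replacement strings from the patches above.
const replacements = ['\\s*', '\\s?', '[ \u202F]?'];

const separators = {
  'ASCII space': ' ',
  'U+202F': '\u202F',
  'U+00A0': '\u00A0',
  'two spaces': '  ',
};

// Hypothetical helper: does the patched pattern accept "3:45<separator>PM"?
function accepts(replacement, separator) {
  const re = new RegExp('^' + format.replace(/ /g, replacement) + '$');
  return re.test('3:45' + separator + 'PM');
}

for (const replacement of replacements) {
  const row = Object.entries(separators)
    .map(([name, sep]) => name + ': ' + accepts(replacement, sep));
  console.log(JSON.stringify(replacement), row.join(', '));
}
```

All three accept the ASCII space and U+202F. Only the `[ \u202F]?` variant rejects other Unicode spaces such as U+00A0, and only `\s*` accepts the doubled space.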
