-
Notifications
You must be signed in to change notification settings - Fork 473
Issue #1292: es6 style template string #1316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This is awesome @dorafmon. Here's my opinion:
|
jscomp/ext/ext_str.mli
Outdated
| (** Compile a regular expression. The syntax for regular expressions | ||
| is the same as in Gnu Emacs. The special characters are | ||
| [$^.*+?[]]. The following constructs are recognized: | ||
| - [. ] matches any character except newline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
str comes with ocaml core distribution, so you don't need copy here. But I would suggest not using it, since you only need one function which can be replaced with a string function easily
jscomp/others/js_json.mli
Outdated
| val test : 'a -> 'b kind -> bool | ||
|
|
||
| external parse : string -> t = "JSON.parse" [@@bs.val] | ||
| external stringify: 'a -> string = "JSON.stringify" [@@bs.val] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about adding on function in Js.String
external ofAny : 'a -> t = "String" [@@bs.val]There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's already Js.String.make, which does exactly that
jscomp/syntax/ast_utf8_string.ml
Outdated
| let rec print_string_list ss = match ss with | ||
| | [] -> () | ||
| | (hd::tl) -> let _ = print_endline hd in print_string_list tl | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
List.iter print_endline
jscomp/test/es6_style_string.ml
Outdated
|
|
||
| let es62 = {j|$str, 君の名は|j} | ||
|
|
||
| let a = "a";; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need add a test case here:
let a = {j| blabla \$(xx) |j} (* should not be interpolated*)
let b = {j| blabla \$xxx |j} (* should not be interpolated *)
jscomp/syntax/ast_utf8_string.ml
Outdated
| } in _transform_individual_expression rexp new_loc (new_exp::nl)) | ||
|
|
||
| let transform_es6_style_template_string s loc = | ||
| let sub_strs = Ext_str.full_split template_string_regex s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should have a Ext_string.split, but you need verify it is utf8 first
jscomp/syntax/ast_utf8_string.ml
Outdated
| | [] -> prev | ||
| | (e::re) -> | ||
| let string_concat_exp:Parsetree.expression = {e with pexp_desc = Parsetree.Pexp_ident {txt = Longident.Lident ("^"); loc = e.pexp_loc}} in | ||
| let new_string_exp = {e with pexp_desc = Parsetree.Pexp_apply (string_concat_exp, [("", prev); ("", e)])} in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To play safe, qualified it as Pervasives.(^) instead of (^)
|
The CI error means you need link |
jscomp/others/js_json.ml
Outdated
|
|
||
| external parse : string -> t = "JSON.parse" [@@bs.val] | ||
| (* TODO: more docs when parse error happens or stringify non-stringfy value *) | ||
| external stringify: 'a -> string = "JSON.stringify" [@@bs.val] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be t or 'a?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I don't need this anymore. Will move this change to another PR later.
|
@bobzhang Hi, I will remove the dependency on |
|
@glennsl Hi thanks I will use |
|
@bobzhang never mind I realized we probably do not want unnecessary dependencies anyway. I will replace it with a function. |
|
@dorafmon let me know when it is good for review ; ) |
|
@bobzhang sure, I am missing out some unit tests, that's why I did not ask for a review. Will work on this and let you know. 😄 |
|
@bobzhang hi I think it's ready for a review. Thanks! |
bobzhang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you use ocp-indent for indenting, I would suggest have it set up, it is a nice tool
jscomp/ext/ext_string.ml
Outdated
|
|
||
|
|
||
|
|
||
| let append s c = s ^ String.make 1 c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the name is confusing. append_char?
| end; | ||
| __LOC__ >:: begin fun _ -> | ||
| let s, new_index = Ast_utf8_string.consume_text "Hello \\$world" 0 in | ||
| let _ = s =~ "Hello \\$world" in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a few more edge cases?
"\$"
"\$x"
"\\$x"
"\\$"
"\\\$"
"\\\$x"
"\\\\$x"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit confused here, what is the expected behavior? Note here "Hello \\$world" in fact equals to {j|Hello \$world|j} since the OCaml parser will do some processing and escape the \\$ to \$. Thanks.
jscomp/syntax/ast_utf8_string.ml
Outdated
| if index = String.length s then new_word, String.length s | ||
| else begin | ||
| match s.[index] with | ||
| | '$' -> if last_char = '\\' then _consume_text s (index+1) '$' (Ext_string.append new_word '$') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is incorrect, see my edge cases above
jscomp/syntax/ast_utf8_string.ml
Outdated
| | '(' -> (if !with_par = false then (with_par := true; _consume_delim s (index+1) ident) else (None, index)) | ||
| | ')' -> (if !with_par = false then (None, index + 1) else (with_par := false; (Some ident, index+1))) | ||
| | '$' -> (_consume_delim s (index+1) ident) | ||
| | c -> if (Char.code c >= Char.code 'a' && Char.code c <= Char.code 'z') || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ocaml support char range patterns
| 'a' ... 'z' | 'A' .. 'Z' | '0' .. '9'| '_'
jscomp/syntax/ast_utf8_string.ml
Outdated
| } in _transform_individual_expression rexp new_loc (new_exp::nl)) | ||
| | Delim p -> (let new_loc = compute_new_loc loc p in | ||
| let ident = Parsetree.Pexp_ident { txt = (Longident.Lident p); loc = loc } in | ||
| let js_to_string = Parsetree.Pexp_ident { txt = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you factor out it into a small function?
jscomp/syntax/ast_utf8_string.ml
Outdated
| | [] -> prev | ||
| | (e::re) -> | ||
| let string_concat_exp:Parsetree.expression = {e with pexp_desc = Parsetree.Pexp_ident | ||
| {txt = Longident.Ldot (Longident.Lident ("Pervasives"), "^"); loc = e.pexp_loc}} in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
| Location.raise_errorf ~loc "{j||j} is reserved for future use" | ||
| else if Ext_string.equal delim Literals.unescaped_j_delimiter then | ||
| let starting_loc = {loc with loc_end = loc.loc_start} in | ||
| let empty_string_concat_exp = {e with pexp_desc = Pexp_constant (Const_string ("", None)); pexp_loc = starting_loc} in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we get rid of empty_string_concat_exp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want to get rid of it? I think it makes the implementation much simpler if we keep it.
jscomp/syntax/ast_utf8_string.ml
Outdated
| if index >= String.length s then List.rev nl | ||
| else begin | ||
| match consume_text s index, consume_delim s index with | ||
| | ("" , str_index) , (None , err_index) -> let new_loc = error_reporting_loc loc index err_index in Location.raise_errorf ~loc:new_loc "Not a valid es6 template string" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of raising directly, can we return an optional for easy unit-testing?
The current unit testing depends on compiler-libs, it would be nice that such utility is self-contained
|
when we do testing, we just use {||} which will not do any escape so that
{|\|} ( which is the same as "\\")
reply@reply.github.com At: 03/14/17 18:49:37" data-digest="From: reply@reply.github.com At: 03/14/17 18:49:37" style="">
From: reply@reply.github.com At: 03/14/17 18:49:37
To: bucklescript@noreply.github.com
Cc: Hongbo Zhang (BLOOMBERG/ 731 LEX), mention@noreply.github.com
Subject: Re: [bloomberg/bucklescript] Issue #1292: es6 style template string (#1316)
@dorafmon commented on this pull request.
In jscomp/ounit_tests/ounit_utf8_test.ml: > @@ -19,5 +43,81 @@ let suites =
__LOC__ >:: begin fun _ ->
Ext_utf8.decode_utf8_string
"" =~ []
- end
+ end;
+ __LOC__ >:: begin fun _ ->
+ Ext_string.append "Hell" 'o' =~ "Hello"
+ end;
+ __LOC__ >:: begin fun _ ->
+ Ast_utf8_string.consume_text "Hello $world" 0 =~ ("Hello ", 6)
+ end;
+ __LOC__ >:: begin fun _ ->
+ let s, new_index = Ast_utf8_string.consume_text "Hello \\$world" 0 in
+ let _ = s =~ "Hello \\$world" in
I am a bit confused here, what is the expected behavior? Note here "Hello \\$world" in fact equals to {j|Hello \$world|j} since the OCaml parser will do some processing and escape the \\$ to '$'. Thanks.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
|
@bobzhang sorry I don't quite get what you mean here, functions like |
|
@dorafmon it seems your code does not work on |
|
It will be compiled to |
|
it should be "\" + x |
|
Just to clarify here, we only need to escape |
bobzhang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note the lexical convention will follow js style, would you take a look at how we do utf8 decoding https://github.com/bloomberg/bucklescript/blob/master/jscomp/syntax/ast_utf8_string.ml#L26
I think you need to that in the first pass, other the location would be wrong, for example
{j|你好$x|j} the offset of x would be 6 if you don't do utf8 decoding
| else (new_word, index) | ||
| | c -> _consume_text s (index + 1) c (Ext_string.append new_word c) | ||
| | '\\' -> (if index + 1 = String.length s then "", index else | ||
| match s.[index+1] with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this may get out of bound
| Char.code c = Char.code '_' | ||
| then _consume_delim s (index+1) (Ext_string.append ident c) | ||
| else if !with_par = false then (Some ident, index) else (None, index + 1) | ||
| | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9'| '_' ->_consume_delim s (index+1) (Ext_string.append_char ident s.[index]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
|
@bobzhang Hi, I rebased my changes to master and have fixed some small issues according to the code review (do a utf8 decoding and OCaml lexical escape first so we get the location correct). As I understand that we know have a different error reporting rather than Location.raise_error? Could you point me to that so I can change my code to use that? Overall this is well tested already, let's fix this and ship it. |
| Format.pp_print_string ppf if_highlight | ||
| else begin | ||
| fprintf ppf "%a%a %s" print loc print_error_prefix () msg; | ||
| List.iter (Format.fprintf ppf "@\n@[<2>%a@]" default_error_reporter) sub |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi, would you update your ocaml submodule? I cherry picked a fix from upstream, thanks
| let starting_loc = {loc with loc_end = loc.loc_start} in | ||
| let empty_string_concat_exp = {e with pexp_desc = Pexp_constant (Const_string ("", None)); pexp_loc = starting_loc} in | ||
| let exps_list = Ast_utf8_string.transform_es6_style_template_string s starting_loc in | ||
| Ast_utf8_string.fold_expression_list_with_string_concat empty_string_concat_exp exps_list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you just export one function in Ast_utf8_string, so it looks simpler here
| @@ -1,5 +1,5 @@ | |||
| (* Copyright (C) 2015-2016 Bloomberg Finance L.P. | |||
| * | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you create an interface file for this module, sorry for my laziness
| | '$' -> _consume_text s (index+2) ' ' (Ext_string.append_char new_word '$') | ||
| | c -> _consume_text s (index+1) '\\' (Ext_string.append_char new_word '\\')) | ||
| | '$' -> (new_word, index) | ||
| | c -> _consume_text s (index + 1) c (Ext_string.append_char new_word c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extend_string.append_char is highly in-efficient. you should record offset, instead of creating new string each time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also there is no utf8 involvement when consuming(decoding)
| if index = String.length s then (if !with_par = true then (None, index) else (Some ident, index)) | ||
| else | ||
| match s.[index] with | ||
| | '(' -> (if !with_par = false then (with_par := true; _consume_delim s (index+1) ident) else (None, index)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unclear about such piece of code, our interpolation is quite simple(no nested interpolation).
it is only $x or $(x)
| let error_reporting_loc (loc:Location.t) start_index end_index = | ||
| let new_loc = | ||
| {loc with loc_start = {loc.loc_start with pos_cnum = loc.loc_start.pos_cnum + start_index}; | ||
| loc_end = {loc.loc_end with pos_cnum = loc.loc_start.pos_cnum + end_index }} in new_loc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should record offset, instead of creating new loc each time, otherwise, it would do lots of allocation
Hi, this is the first implementation. So we can have
compiled to
From what I can see here there are a few issues I will be working on to improve this:
Do not use JSON.stringify to convert any object into a string. This should be done by other means, but my JS knowledge is limited here. Any good suggestions?
More unit testing on how we report error for certain location
To use regex in OCaml I added the dependency on the standard library's Str module, do you have an opinion on this? It's named Ext_str.
It depends on a C file, which I also pulled in, but if you use clang to compile it, it will raise some warnings. I will look into removing them later.
Anyway, any other suggestions are very welcomed!