Can the connection between the syntax and the `Expr`s be clearer? #2786

Closed
o-jasper opened this Issue Apr 7, 2013 · 5 comments

Member

o-jasper commented Apr 7, 2013

There is no need for these cases to differ: `:(a|b|c)` parses as `:(|(|(a,b),c))`, `:(a+b+c)` stays `:(a+b+c)`, and `:(a:b).head` is `symbol(":")`. I also noticed that `:"a$x"` changed from a macrocall, and I cannot fathom why for that either. I suppose it is because it follows this, but it would make more sense to just have the `.head` be `:str`.
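For comparison, the expression shapes in question can be inspected directly. This is a small sketch assuming a current Julia 1.x, whose parser output is not guaranteed to match the 2013 build discussed in this thread; the variable names are made up for illustration.

```julia
# Assumes a current Julia 1.x; the 2013 results reported in this thread may differ.
plus_chain = :(a + b + c)
or_chain   = :(a | b | c)

# A chain of + is one call node holding all three operands,
# while a chain of | is nested two-argument calls, grouped to the left.
println(plus_chain.head, " ", plus_chain.args)
println(or_chain.head, " ", or_chain.args)

# On current versions a range expression is an ordinary call as well.
range_head = :(a:b).head
println(range_head)
```

Comparing `plus_chain.args` with `or_chain.args` shows the asymmetry that the issue is about.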

I have whined about this before, slightly incoherently (link for the record, no need to read that), but it is now much clearer in my head how it could be done. Two things come close to capturing the syntax:

  1. Infix notation. The infixes have some order, but imo they should otherwise all be treated equally.

  2. A list of beginner and ender strings. At any point there is an ender string on the stack; if it is reached, it is popped off and an `Expr` is completed. If any of the beginner strings is reached, a new `Expr` is started, with its corresponding ender pushed on the stack.

Clearly `begin`/`quote`/`function`/`type`/`if`/`for`/`while` would each pair with the ender `end`. Other pairs would be `@` and `typealias` with newline, `[`-`]`, `(`-`)`, `{`-`}`, and `"`-`"` (there is no reason the beginner would not be allowed to equal the ender).

Of course there are weaknesses to this. For instance, `fun_use(args)` would need 'nothing, or just whitespace' to be possible as an infix. `"` would need alterations for escaping (`"stuff$"stuff""` isn't expected to work?), and `if` doesn't work nicely with `else` and `elseif` without some work. There may also be other weaknesses.

I have written something like it here.
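To make the beginner/ender idea concrete, here is a minimal sketch in current Julia. It is not the linked implementation and not how Julia's parser works: the `ENDERS` table and the `nest` function are hypothetical names, the input is assumed to be already split into whitespace-separated tokens, and infixes are ignored entirely.

```julia
# Hypothetical beginner => ender table; a toy subset, not Julia's real grammar.
const ENDERS = Dict("(" => ")", "[" => "]", "begin" => "end", "if" => "end")

function nest(tokens)
    # Each frame pairs the Expr being built with the ender string that completes it.
    stack = Tuple{Expr,String}[(Expr(:toplevel), "")]
    for raw in tokens
        tok = String(raw)
        node, ender = stack[end]
        if tok == ender
            # Ender reached: pop the frame and attach the finished Expr to its parent.
            pop!(stack)
            push!(stack[end][1].args, node)
        elseif haskey(ENDERS, tok)
            # Beginner reached: start a new Expr whose head names the beginner.
            push!(stack, (Expr(Symbol(tok)), ENDERS[tok]))
        elseif tok in values(ENDERS)
            # An ender that does not match the innermost beginner is rejected.
            error("unexpected ender: $tok")
        else
            push!(node.args, Symbol(tok))
        end
    end
    length(stack) == 1 || error("missing ender: $(stack[end][2])")
    return stack[1][1]
end

tree = nest(split("begin f ( a b ) end"))
println(tree.args[1].head, " ", tree.args[1].args)
```

Because the ender check comes before the beginner check, a pair whose beginner equals its ender (such as `"`-`"`) would close correctly if added to the table. The `else`/`elseif` and juxtaposition problems mentioned above are not addressed by this sketch.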

JeffBezanson Apr 7, 2013

Member

There is in fact a need to parse chains of + and * as multi-argument calls; it lets you do many things more efficiently. For example * can do string cat on many arguments at once instead of adding them one at a time.

Parsing interpolated strings with head str and only later lowering it to call string would make sense.

I don't understand your point about beginners and enders. Of course we terminate an expression when we find its closing token. Example perhaps?
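The efficiency argument for multi-argument calls can be illustrated with a small experiment. This is a sketch assuming a current Julia 1.x; `Piece` and `CALLS` are hypothetical names introduced only to count how often each operator method runs.

```julia
# Hypothetical wrapper type, used only to count how often each operator method runs.
struct Piece
    text::String
end

const CALLS = Ref(0)

# Both methods accept any number of operands and join them in one pass.
function Base.:*(head::Piece, rest::Piece...)
    CALLS[] += 1
    return Piece(head.text * join(p.text for p in rest))
end

function Base.:|(head::Piece, rest::Piece...)
    CALLS[] += 1
    return Piece(head.text * join(p.text for p in rest))
end

a, b, c = Piece("a"), Piece("b"), Piece("c")

CALLS[] = 0
star = a * b * c          # parsed as *(a, b, c)
star_calls = CALLS[]

CALLS[] = 0
bar = a | b | c           # parsed as |(|(a, b), c)
bar_calls = CALLS[]

println((star.text, star_calls), " ", (bar.text, bar_calls))
```

Both expressions produce the same text, but the `*` chain reaches the method once with all operands, while the `|` chain reaches it once per pair and builds an intermediate `Piece` along the way.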

JeffBezanson Apr 7, 2013

Member

I should add that parsing `"a$x"` as a macro call hardly makes sense either, so that should be a welcome change.

o-jasper Apr 8, 2013

Member

`"a$x"` now produces `:($(Expr(:top, :string))("a",x,""))`. Barring this having a useful (future) purpose, I'd prefer `:(Expr(:string, "a",x,""))`, which follows the idea that `.head` has to do with how the expression was parsed.

> There is in fact a need to parse chains of + and * as multi-argument calls; it lets you do many things more efficiently. For example * can do string cat on many arguments at once instead of adding them one at a time.

Good point. Of course you could do that after macroexpansion too? Like a compiler macro, or inlining `|(first,second,rest...) = |(|(first,second),rest...)`, though people writing `|(first::Sometypes,second,rest...)` might ruin the performance of that for you if the types aren't clear. One could of course just bludgeon it and hope it doesn't confuse too many people.

The idea with the beginners and enders is that you just have a function where you specify the infixes and 'hooks of various kinds'. Basically, `begin` is also a kind of parenthesis `(`, and instead of ending with `)`, it ends with `end`.

```julia
function julia_subset_parse(input)
    infix_set = ["=", "&&","||", "!","~","==","!=", ">",">=","<=","<",
                 "-","+", "/","*", "::", ":"]

    # Attach the shared infix set to each (beginner, ender) pair.
    add_infix(to) = (to[1],to[2],infix_set)
    return treekenize(input,
                      map(add_infix,
                          {("(",")"), ("[","]"), ("{","}"),
                           ("begin","end"),("function","end"),("if","end"),
                           ("type","end"),("module", "end"),
                           ("typealias","\n"),("export","\n"), ("@","\n")}),
                      add_infix(("\n","\n")), 10,1)
end
```

This is the 'simplest' way of entering. The list you feed as 'begin-enders' has functions like `head_begin`, `head_end`, `head_infix` and `head_seeker`; the latter can return a completely new list for its body (`"` and `#` need to do this), as well as a 'disallowed strings' list. (You can't end `(` with anything other than `)`, so it would disallow all the other enders in its jurisdiction.) (Note: I should probably rename them `expr_..`; also, the infixes in particular are currently much too buggy.)

The point of it is that I can call it a lisp with pretty good certainty if the relationship between code and data is clear. It is fairly clear, but shifting a bit, and could be clearer. I really should have read julia/src/julia-parser.scm, I suppose, but looking at it now, it doesn't use this concept.

JeffBezanson Apr 8, 2013

Member

It's not implemented that way because it wouldn't work. There are many subtleties needed to get the behavior people intuitively expect, and to get line number nodes in the right places. If you want to make simplifications to the parser code, feel free to pull request. If it works I'll merge it.

The string thing I will change, so string interpolations are parsed differently from calls to string.
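The distinction between an interpolated string and an explicit call to `string` can be checked at the parser level. This is a sketch assuming a current Julia 1.x (the 2013 output quoted earlier in the thread looked different); the variable names are illustrative.

```julia
# Assumes a current Julia 1.x; the 2013 parser output quoted in this thread differed.
interp = Meta.parse("\"a\$x\"")            # the source text "a$x"
call   = Meta.parse("string(\"a\", x)")    # an explicit call to string

# The two forms have different heads, so a macro can tell them apart.
println(interp.head, " ", interp.args)
println(call.head, " ", call.args)
```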

o-jasper Apr 10, 2013

Member

What sort of subtleties exactly? I do realize that, unfortunately, this may have edge cases that would have to be fixed in code and that can't be fixed without straying too far from the premise of the change.

I suppose I am fine with the syntax as it is now; I probably shouldn't take this on. Changes in how infix notation is represented in the AST affect macros, so macro writers should just flatten the `Expr` tree themselves before they work with it. I have been asserting everything so far, but I'll leave that approach for macros for a while. (The idea is to always know the range of values things may take.)

Edit: thanks for your time, hope I didn't waste it. I suppose "the behavior people intuitively expect" comes from your experience with this, which is too much work to convey.
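The suggestion that macro writers flatten the tree themselves could look roughly like the following. This is a sketch assuming a current Julia 1.x; `operands!` and `flatten_chain` are hypothetical helper names, and the rewrite is only safe for operators where regrouping does not change the result.

```julia
# Collect the operands of nested calls to `op`, left to right.
function operands!(out, ex, op::Symbol)
    if ex isa Expr && ex.head === :call && !isempty(ex.args) && ex.args[1] === op
        for arg in ex.args[2:end]
            operands!(out, arg, op)
        end
    else
        push!(out, ex)
    end
    return out
end

# Rebuild a nested chain as one multi-argument call.
# Expressions that are not a chain of `op` are returned unchanged.
function flatten_chain(ex, op::Symbol)
    args = operands!(Any[], ex, op)
    return length(args) > 1 ? Expr(:call, op, args...) : ex
end

flat = flatten_chain(:(a | b | (c | d)), :|)
println(flat.args)
```

A macro that calls `flatten_chain` on its argument first sees the same shape whether the parser produced nested or multi-argument calls.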
