Beanshell generics or type parameter support #66

nickl- · 2018-05-19T10:30:00Z

Since last year bsh was happy to consume <identifier> as tokens scattered all over the grammar. There are 7 separate comparisons, thus far we are missing a few, for the parser to negotiate this form.
This is adding a lot more complexity to the grammar and already shows an impact on performance.

So I sat out to implement the diamond operator <> that was easy enough, then I saw wildcards <?> are rejected so we add another definition for this one. Better cater for parameter lists too <? | identifier (, identifier)*>, that does the trick but what about type bounds? <? [(extends | super) identifier] | identifier (, identifier)*> something like that but I'm starting to have my doubts. This is quickly becoming more trouble than it's worth.

Lets see type parameter types <List<String>> so we start with a < that's 1 then find another < so we have 2 open, then we get a > but we still need 1 more so keep going till we get another > and we're done. Then I started wondering, why exactly are we doing this anyway? My immediate goal is just to avoid parser exceptions and make some failing tests work. Will we actually ever implement type safety for beanshell? Will we? There most definitely are far more pressing matters to be concerned with right now that is for sure.

Lets take a step back... if we're not actually enforcing the rules then we don't really need to tokenise the parameters. That makes things much easier, to implement and the parser to process, we only need to look for < and > anywhere and disregard its contents. Less things to match of course means quicker through matching the things we need.

This is what I've got for now:

"<" ("?")? (["a"-"z","A"-"Z","0"-"9",","," ","_","&"])* ("<" (~[">"])* ">")? ">"

The match set [] seems less greedy than the not match one ~[], or maybe I just haven't hit the nail on the head yet. This manages to parse all the examples on the oracle help pages but I'm sure there are other cases I miss. Something to work on...

For the raw types...

            public class Box<T> {
                private T t;
                public void set(T t) {
                    this.t = t;
                }
                public T get();
                    return t;
                }
            }

I just type cast to java.lang.Object if class not found and class name length == 1. Isn't this what java does before inference anyway? Everything I've tried so far seems to work without a hitch, everything except varargs.

Not quite ready to push just yet, still have some uncommitted snippets which I'm still pondering over. just wanted to put this out there and hear your thoughts on the topic.

The text was updated successfully, but these errors were encountered:

stefanofornari · 2018-05-19T13:53:38Z

Hi Nick it sounds a pragmatic approach, and I agree we do not really need to enforce the type. I ran into the grammar used by PMD which at the end seems to do the same. You probably already found it, but I understand they use the follwoing: TOKEN : { < RUNSIGNEDSHIFT: ">>>" > { matchedToken.kind = GT; ((Token.GTToken)matchedToken).realKind = RUNSIGNEDSHIFT; input_stream.backup(2); matchedToken.image = ">"; } | < RSIGNEDSHIFT: ">>" > { matchedToken.kind = GT; ((Token.GTToken)matchedToken).realKind = RSIGNEDSHIFT; input_stream.backup(1); matchedToken.image = ">"; } | < GT: ">" > } (see https://github.com/pmd/pmd/blob/master/pmd-java/etc/grammar/Java.jjt) HTH Ste Ste

…

On Sat, May 19, 2018 at 12:30 PM, Nick Lombard ***@***.***> wrote: Since last year bsh was happy to consume <identifier> as tokens scattered all over the grammar. There are 7 separate comparisons, thus far we are missing a few, for the parser to negotiate this form. This is adding a lot more complexity to the grammar and already shows an impact on performance. So I sat out to implement the diamond operator <> that was easy enough, then I saw wildcards <?> are rejected so we add another definition for this one. Better cater for parameter lists too <? | identifier (, identifier)*>, that does the trick but what about type bounds? <? [(extends | super) identifier] | identifier (, identifier)*> something like that but I'm starting to have my doubts. This is quickly becoming more trouble than it's worth. Lets see type parameter types <List<String>> so we start with a < that's 1 then find another < so we have 2 open, then we get a > but we still need 1 more so keep going till we get another > and we're done. Then I started wondering, why exactly are we doing this anyway? My immediate goal is just to avoid parser exceptions and make some failing tests work. Will we actually ever implement type safety for beanshell? Will we? There most definitely are far more pressing matters to be concerned with right now that is for sure. Lets take a step back... if we're not actually enforcing the rules then we don't really need to tokenise the parameters. That makes things much easier, to implement and the parser to process, we only need to look for < and > anywhere and disregard its contents. Less things to match of course means quicker through matching the things we need. This is what I've got for now: "<" ("?")? (["a"-"z","A"-"Z","0"-"9",","," ","_","&"])* ("<" (~[">"])* ">")? ">" The match set [] seems less greedy than the not match one ~[], or maybe I just haven't hit the nail on the head yet. This manages to parse all the examples on the oracle help pages but I'm sure there are other cases I miss. Something to work on... For the raw types... public class Box<T> { private T t; public void set(T t) { this.t = t; } public T get(); return t; } } I just type cast to java.lang.Object if class not found and class name length == 1. Isn't this what java does before inference anyway? Everything I've tried so far seems to work without a hitch, everything except varargs. Not quite ready to push just yet, still have some uncommitted snippets which I'm still pondering over. just wanted to put this out there and hear your thoughts on the topic. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#66>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AACoghaafnohZ60d-TuoCSd8hVFwrnSOks5tz_QrgaJpZM4UFqVy> .

nickl- · 2018-05-19T16:48:08Z

Hey @stefanofornari! Tx for the feedback! 👍

Yes I also reference pmd often and the EL lexer which covers less but has compatible syntax. I originally went their route... wait I don't think that is it... which is indeed similar to the new approach you are right, well spotted. Their generics tokeniser is called TypeParameters But this seems to be missing quite a bit, do you agree?

This won't even chew diamond tags <> if I'm not mistaken:

void TypeParameters():
{}
{
   "<" {checkForBadGenericsUsage();} TypeParameter() ( "," TypeParameter() )* ">"
}

void TypeParameter():
{Token t;}
{
   (TypeAnnotation())*
   t=<IDENTIFIER> {jjtThis.setImage(t.image);} [ TypeBound() ]
}

void TypeBound():
{}
{
   "extends" (TypeAnnotation())* ClassOrInterfaceType() ( "&" (TypeAnnotation())* ClassOrInterfaceType() )*
}

If you go and search through their grammar for TypeParameters (I think I called ours TypeArguments for some reason), you will see all the points of reference where the parser needs to work at this. Now compare that to the expression which does everything and anything oracle can imagine within diamond tags in the future as well.

Yes TypeArguments This is what we have currently (from last year):

beanshell/src/main/jjtree/bsh.jjt

Lines 1181 to 1185 in c4b098e

    
           void TypeArguments() : 
        
           { } 
        
           { 
        
               "<" <IDENTIFIER> ">" 
        
           }

Just the basics which gets referenced 7 times already in our grammar and takes a significant punch. As you can see from their implementation it is going to get busy quick. That is very expensive stuff.

As far as priorities go implementing type safety is very low on my list. =) On the contrary I am in two minds whether to lax beanshell even more or not. It baffles me why a = 2; is perfectly acceptable variable declaration in bsh but if you want to decorate it you need to add a type. final a = 2; will give you the slightly frustrating annoying Found "final" final...is that really your final word(s) on my syntax. Sure it may take some getting used, not easy on the eyes much, but we already have everything implemented to get it done so why not?

Anyway back to topic... the point I was getting to, why waste resources on something we don't want. Moving on... Theodore S. Norvell suggests using MORE for this instead as it will produce more verbose feedback while he warns of dragons when finding problems with bad expressions, because they are silent. Which is one of the things that attracts me... not the dragons... it is quick! The first few times I ran with it I was confused why unit tests were passing, surely it chewed my whole script that fast. I had to go look, what are these tests actually validating.

// When a /* is seen in the DEFAULT state, skip it and switch to the IN_COMMENT state
MORE : {
"/*": IN_COMMENT }
      
// When any other character is seen in the IN_COMMENT state, skip it.
< IN_COMMENT > MORE : {
<  ~[] > }
      
// When a */ is seen in the IN_COMMENT state, skip it and switch back to the DEFAULT state
< IN_COMMENT > SKIP : {
"*/": DEFAULT }

From What is MORE?- The JavaCC FAQ - Theodore S. Norvell

For block comments, something spanning many lines this probably makes sense but it seems a lot of effort (for the parser) to swap states several times per line for the small generics blocks, doesn't that seem overkill? Another caveat with this approach is that "<" needs to be an IDENTIFIER but we already use it as such for something else.

nickl- · 2018-06-02T22:05:00Z

So after some more tweaking and deliberating... this is what we ended up with:

   "<" ( (~[">",";","|","&","\n","\r"])* ("& "|">,")? )* ([">"])+

Which properly addresses those nested parameters like

   List<List<Map<String,Map<List<String>,Integer>>>> ofNameMaps = new ArrayList<>();`

but also avoids swallowing conditional expressions, without the need to keep a count of opening minus closing braces.

Spot anything I missed?

nickl- · 2018-06-08T05:06:55Z

See 031a635 and 2056ede

This was referenced Jun 22, 2018

Unable to parse generic types #22

Closed

Starting to define a method crashes beanshell #61

Closed

nickl- added the discussion label Jul 3, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Beanshell generics or type parameter support #66

Beanshell generics or type parameter support #66

nickl- commented May 19, 2018

stefanofornari commented May 19, 2018 via email

nickl- commented May 19, 2018

nickl- commented Jun 2, 2018

nickl- commented Jun 8, 2018

Beanshell generics or type parameter support #66

Beanshell generics or type parameter support #66

Comments

nickl- commented May 19, 2018

stefanofornari commented May 19, 2018 via email

nickl- commented May 19, 2018

nickl- commented Jun 2, 2018

nickl- commented Jun 8, 2018