Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Beanshell generics or type parameter support #66

Open
nickl- opened this issue May 19, 2018 · 4 comments
Open

Beanshell generics or type parameter support #66

nickl- opened this issue May 19, 2018 · 4 comments

Comments

@nickl-
Copy link
Member

nickl- commented May 19, 2018

Since last year bsh was happy to consume <identifier> as tokens scattered all over the grammar. There are 7 separate comparisons, thus far we are missing a few, for the parser to negotiate this form.
This is adding a lot more complexity to the grammar and already shows an impact on performance.

So I sat out to implement the diamond operator <> that was easy enough, then I saw wildcards <?> are rejected so we add another definition for this one. Better cater for parameter lists too <? | identifier (, identifier)*>, that does the trick but what about type bounds? <? [(extends | super) identifier] | identifier (, identifier)*> something like that but I'm starting to have my doubts. This is quickly becoming more trouble than it's worth.

Lets see type parameter types <List<String>> so we start with a < that's 1 then find another < so we have 2 open, then we get a > but we still need 1 more so keep going till we get another > and we're done. Then I started wondering, why exactly are we doing this anyway? My immediate goal is just to avoid parser exceptions and make some failing tests work. Will we actually ever implement type safety for beanshell? Will we? There most definitely are far more pressing matters to be concerned with right now that is for sure.

Lets take a step back... if we're not actually enforcing the rules then we don't really need to tokenise the parameters. That makes things much easier, to implement and the parser to process, we only need to look for < and > anywhere and disregard its contents. Less things to match of course means quicker through matching the things we need.

This is what I've got for now:

"<" ("?")? (["a"-"z","A"-"Z","0"-"9",","," ","_","&"])* ("<" (~[">"])* ">")? ">"

The match set [] seems less greedy than the not match one ~[], or maybe I just haven't hit the nail on the head yet. This manages to parse all the examples on the oracle help pages but I'm sure there are other cases I miss. Something to work on...

For the raw types...

            public class Box<T> {
                private T t;
                public void set(T t) {
                    this.t = t;
                }
                public T get();
                    return t;
                }
            }

I just type cast to java.lang.Object if class not found and class name length == 1. Isn't this what java does before inference anyway? Everything I've tried so far seems to work without a hitch, everything except varargs.

Not quite ready to push just yet, still have some uncommitted snippets which I'm still pondering over. just wanted to put this out there and hear your thoughts on the topic.

@stefanofornari
Copy link
Contributor

stefanofornari commented May 19, 2018 via email

@nickl-
Copy link
Member Author

nickl- commented May 19, 2018

Hey @stefanofornari! Tx for the feedback! 👍

Yes I also reference pmd often and the EL lexer which covers less but has compatible syntax. I originally went their route... wait I don't think that is it... which is indeed similar to the new approach you are right, well spotted. Their generics tokeniser is called TypeParameters But this seems to be missing quite a bit, do you agree?

This won't even chew diamond tags <> if I'm not mistaken:

void TypeParameters():
{}
{
   "<" {checkForBadGenericsUsage();} TypeParameter() ( "," TypeParameter() )* ">"
}

void TypeParameter():
{Token t;}
{
   (TypeAnnotation())*
   t=<IDENTIFIER> {jjtThis.setImage(t.image);} [ TypeBound() ]
}

void TypeBound():
{}
{
   "extends" (TypeAnnotation())* ClassOrInterfaceType() ( "&" (TypeAnnotation())* ClassOrInterfaceType() )*
}

If you go and search through their grammar for TypeParameters (I think I called ours TypeArguments for some reason), you will see all the points of reference where the parser needs to work at this. Now compare that to the expression which does everything and anything oracle can imagine within diamond tags in the future as well.

Yes TypeArguments This is what we have currently (from last year):

void TypeArguments() :
{ }
{
"<" <IDENTIFIER> ">"
}

Just the basics which gets referenced 7 times already in our grammar and takes a significant punch. As you can see from their implementation it is going to get busy quick. That is very expensive stuff.

As far as priorities go implementing type safety is very low on my list. =) On the contrary I am in two minds whether to lax beanshell even more or not. It baffles me why a = 2; is perfectly acceptable variable declaration in bsh but if you want to decorate it you need to add a type. final a = 2; will give you the slightly frustrating annoying Found "final" final...is that really your final word(s) on my syntax. Sure it may take some getting used, not easy on the eyes much, but we already have everything implemented to get it done so why not?

Anyway back to topic... the point I was getting to, why waste resources on something we don't want. Moving on... Theodore S. Norvell suggests using MORE for this instead as it will produce more verbose feedback while he warns of dragons when finding problems with bad expressions, because they are silent. Which is one of the things that attracts me... not the dragons... it is quick! The first few times I ran with it I was confused why unit tests were passing, surely it chewed my whole script that fast. I had to go look, what are these tests actually validating.

// When a /* is seen in the DEFAULT state, skip it and switch to the IN_COMMENT state
MORE : {
"/*": IN_COMMENT }
      
// When any other character is seen in the IN_COMMENT state, skip it.
< IN_COMMENT > MORE : {
<  ~[] > }
      
// When a */ is seen in the IN_COMMENT state, skip it and switch back to the DEFAULT state
< IN_COMMENT > SKIP : {
"*/": DEFAULT }

From What is MORE?- The JavaCC FAQ - Theodore S. Norvell

For block comments, something spanning many lines this probably makes sense but it seems a lot of effort (for the parser) to swap states several times per line for the small generics blocks, doesn't that seem overkill? Another caveat with this approach is that "<" needs to be an IDENTIFIER but we already use it as such for something else.

@nickl-
Copy link
Member Author

nickl- commented Jun 2, 2018

So after some more tweaking and deliberating... this is what we ended up with:

   "<" ( (~[">",";","|","&","\n","\r"])* ("& "|">,")? )* ([">"])+

Which properly addresses those nested parameters like

   List<List<Map<String,Map<List<String>,Integer>>>> ofNameMaps = new ArrayList<>();` 

but also avoids swallowing conditional expressions, without the need to keep a count of opening minus closing braces.

Spot anything I missed?

@nickl-
Copy link
Member Author

nickl- commented Jun 8, 2018

See 031a635 and 2056ede

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants