JavaCC 21 workgroup #685
Comments
Wow is it already Christmas again? =) You just made my day and probably the rest of my year!
Even though we have just about resolved all the outstanding parser issues now, with only one outstanding item remaining, I do think the best long-term strategy will be to consider the upstream grammar as master and attempt to port only the variations over from the BeanShell grammar. From my initial and very brief assessment I already noticed some naming alterations, and if we want to streamline future updates with the least amount of friction then it behooves BeanShell to conform to what is defined upstream instead of the other way around. There is going to be work that needs to be done and we might as well suffer some growing pains now to have more pleasure in the future. What do you think @revusky? These are some of the things which we may need to consider. More relaxed statement or expression grammar: beanshell/src/main/jjtree/bsh.jjt Lines 1306 to 1331 in 0258df1
This should allow even basic Being able to specify import and package statements anywhere beanshell/src/main/jjtree/bsh.jjt Lines 1280 to 1286 in 0258df1
The same kind of goes for variables and methods too which can also be declared just about anywhere, and it is loosely typed so wherever there is a Type definition it is optional: beanshell/src/main/jjtree/bsh.jjt Lines 679 to 688 in 0258df1
Formal parameters can also be only an identifier. beanshell/src/main/jjtree/bsh.jjt Lines 712 to 724 in 0258df1
Method declaration can also just be only an identifier followed by a "(" beanshell/src/main/jjtree/bsh.jjt Lines 642 to 660 in 0258df1
Then there are some other conveniences, like array dimensions, which also allow us to just use an arrayinitializer instead of defined dimensions: beanshell/src/main/jjtree/bsh.jjt Lines 1178 to 1191 in 0258df1
We added python style slices beanshell/src/main/jjtree/bsh.jjt Lines 1016 to 1023 in 0258df1
Lets us basically do arrays, maps, lists and general collections all through the same syntax. String and character literals are interchangeable: we deal with a character in the engine, and any character string longer than 1 is considered a string. beanshell/src/main/jjtree/bsh.jjt Lines 398 to 400 in 0258df1
BeanShell numbers have extensive integer and floating point literal definitions, through which we have turned BigInteger and BigDecimal into glorified primitives with auto widening and narrowing support, and just yesterday we turned all primitives into math classes. beanshell/src/main/jjtree/bsh.jjt Lines 372 to 376 in 0258df1
Floating point literals beanshell/src/main/jjtree/bsh.jjt Lines 386 to 391 in 0258df1
For example script source files there are the test scripts and commands which are scripts that can be called as functions. These should give a good idea of what should be parseable. But while all of this fun stuff is true, it does not come at the expense of JAVA compatibility and nothing added must break our ability to process standard JAVA. Additions on top of the JAVA grammar with a liberal helping of freedom from constraints. Our motto or credo is, that while BeanShell works like JAVA does it doesn't also have to break like JAVA does. Welcome on board!! |
Well, to be clear, it's unlikely that I'm going to do much work on the semantics of things or how your runtime works. My goal right now is just to get you on the rails in terms of using JavaCC 21 for parser generation. Then you can better focus on the semantics, since most of the grammar/parsing side will be handled by the included Java grammar, except in the cases where you want to override it. Probably most (or maybe even all) of the BSHXXX.java classes that you have to check in and maintain by hand can be just generated, which really is a big win. As for In any case, my offer is pretty much solely to get the parsing side using JavaCC 21 |
From off topic issue:
That is the plan yes, and your contributions do not go unnoticed. Thank you very much. For as long as we still remain on the legacy parser, wasteful energy expenditure is to be expected, and mustn't be avoided lest the wheels start coming off. The sooner we get this migration to a point the better for everyone. |
Further from off topic issue:
There, now we are on topic and in the correct place to find clarity on the current understanding, where we should report on progress, and where we are free to make new decisions or change prior commitments. As long as the issue remains open it also communicates that the task is not complete. We've had several discussions offline to date, but unless it is recorded here there is really no way to keep track. Let's recap the current understanding to date: ....revusky: Here's the proposal. I'll do it for you. My commitment remains the same: you can be certain that I do not expect you to work alone, I will help. My understanding is that you are at the wheel. While you are driving I can make conversation to ensure you stay awake, point out landmarks or potholes in the road, even take the initiative, like opening a cooldrink so you don't spill or lighting your cigarette to avoid distraction. But ultimately if you want something specific from me, like opening the window a smidge, putting on the heater or finding a place for you on the map, you will need to tell me. I may know where we are going but as the driver only you know how to get there. The expectation remains clear: you are only driving until BeanShell is parsing against the Congo grammar. At which point I will hopefully be able to see what remains of the road left to travel so that I can take the wheel. If I cannot see the road we are going to get lost or worse, have a terrible accident and get stranded without transport. If you recall, you requested a review of what has been done, we had a discussion about it and my conclusion was that I was not yet able to see the road ahead. I humbly admitted in conclusion that unless you can clearly point out the route, like on a map for example, you will have to take the wheel again and drive us to the next milestone. If you leave it with me like this, I am afraid all the work will be for nothing.
This is not me backing out of my commitment, or leaving you in the lurch, but simply a statement of fact. If you leave me here to pick it up on my own, we will not reach our destination. I did do another audit over the last week and will report my findings in the next post.
Hopefully I was successful to mitigate line item b) to satisfaction, however should even the slightest doubt, suspicion, disbelief or confusion remain, please talk to me for my intentions are admirable and true. In light hereof if you are unwilling, unable or simply too busy to recant the above statement please know you are under no obligation to deliver on the proposal or commitment. No harm done so no foul we simply close the issue and forget about it. If like me you'd prefer a win instead, please clearly state your commitment and where we go next. |
Rebased the congocc branch on top of latest master. It required a few minor tweaks to realign with package changes and permission scope, after which the build again completes with all tests ending in success. These are the findings...
Slow build:
After updating to the new congo jar file it reverted the custom jar file which included the proposed fix to ignore generating files that exist which has not been merged, which is what the current solution is capable of. It may only be due to more files being physically generated and written to disk but in the absence of a solution current effects are:
This will negatively impact development costs. Further benchmarks on script parsing and running costs still outstanding.
No congo parsing:
Since all tests are completing with success it has the appearance that everything is done, but it turns out the congocc branch still uses javacc in its entirety, with all files currently generated now being merged under the Since the current parser is still in use there is no change, when for example trying to parse a lambda expression.
Current master branch:
BeanShell 3.0.0-SNAPSHOT.5569
bsh % array = new int[10];
--> $0 = {0I, 0I, 0I, 0I, 0I, 0I, 0I, 0I, 0I, 0I} :int[]
bsh % Arrays.setAll(array, p -> p > 9 ? 0 : p);
// Error: Parser Error: Unable to parse code syntax. Encountered: -> at line 2, column 24
bsh % // Error: Parser Error: Unable to parse code syntax. Encountered: ) at line 1, column 15
From the new congocc branch:
BeanShell 3.0.0-SNAPSHOT.CONGOCC
bsh % array = new int[10];
--> $0 = {0I, 0I, 0I, 0I, 0I, 0I, 0I, 0I, 0I, 0I} :int[]
bsh % Arrays.setAll(array, p -> p > 9 ? 0 : p);
// Error: Parser Error: Unable to parse code syntax. Encountered: -> at line 2, column 24
bsh % // Error: Parser Error: Unable to parse code syntax. Encountered: ) at line 1, column 15
Both equally incapable of parsing the arrow
Where to next:
As it stands now it is unclear what needs to happen next. It would be understandable if BeanShell tests are failing because the new AST has not been completely migrated yet and with an example or two we would be able to pick it up and continue implementing the rest. Unfortunately the proposed solution and work done on the congocc branch for which we are very grateful, has yet to migrate to the new parser. So what are we supposed to do with that? |
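For comparison, here is the same call as it behaves in plain Java, which is the result the REPL session above would be expected to produce once lambdas are parsed. This is only a reference sketch; the class name `LambdaSetAllDemo` and the helper `fill` are made up for illustration and are not part of BeanShell.

```java
import java.util.Arrays;

class LambdaSetAllDemo {
    // Fills a new int array of the given length the same way as the REPL example.
    static int[] fill(int length) {
        int[] array = new int[length];
        // setAll passes each index p to the lambda and stores the result at that index.
        Arrays.setAll(array, p -> p > 9 ? 0 : p);
        return array;
    }

    public static void main(String[] args) {
        // For length 10 every index is <= 9, so each slot holds its own index.
        System.out.println(Arrays.toString(fill(10)));
    }
}
```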
What is required:
As a minimum I think the following two items, if completed, would set us on a clear path ahead.
|
Okay, here is what I would like you guys to do now. There is a file TestHarness.java that I updated just a few minutes ago. (I also updated the congocc-full.jar in the root directory, BTW.) You can run the test harness on any .bsh file and see what it outputs, as in:
Could you try it out a bit and report back? I would also like @opengo to do this as well. Actually, anybody who is interested in development on this project. I'll answer your other points separately after you have tried out the above. Just let me know... Actually, to be clear:
Run this yourself and try to get other people to run it. I'll answer your other points, but after you've verified that you've done the above. |
Output for: $ java -classpath target/classes bsh.TestHarness src/main/resources/bsh/commands/getBshPrompt.bsh |
I notice the new parser has dropped the constructor which takes a Reader, it monitors for input. beanshell/src/main/java/bsh/legacy/Parser.java Lines 7776 to 7805 in 5a9d8d8
We use it for the REPL client, takes input from STDIN. Is there a replacement for this functionality or will we have to write a work around? |
Okay, did you try it on other files? (It would be hard to imagine that you didn't. Curiosity killed the cat and all....) You surely understand what the TestHarness is outputting, right? Also, I'd like to ask: is this the first time that you ran that TestHarness? |
It is true that the parsers generated by CongoCC do not have the constructor that takes a Reader or InputStream as a parameter. This is because of a major refactoring where the thing does not buffer any input. It just slurps in all the input (typically a file) and works on it. It's true that that did (temporarily) screw up this use-case of the interactive interpreter, which is not currently being supported. IOW, we don't have the use-case of blocking I/O working. But it's not a big deal, believe me. I'll do the incremental work (not much) to get that sort of thing working again. You see, the thing is that having a Reader or InputStream as the argument does take advantage of blocking input. So, when the thing tries to get the next character from input and there is none, it just blocks until somebody types some more input. But that's not really very hard to get working again, trust me... So, yeah, it's not very hard to get this working again probably, but I tore that out a good while ago, probably having in the back of my mind, to have a better version of this working later. And I just never had any cause to get that working again. Until now. BUT.... you can certainly use the existing parser to parse/run any non-interactive script. So, IOW, there is no need for you guys to be sitting around on your hands waiting for me to get the REPL thing going again. Just do your end of getting the interpreter working with actual script files and I'll certainly do my end of getting the REPL working. And, you know, now that I think about it, this may be the source of some earlier misunderstanding. I meant to tell you this, that there was some work pending from me regarding supporting interactive interpreters. However, this does not really prevent you from moving forward on this on your end. You can certainly test/tweak the new parser against scripts that are non-interactive, i.e. you just read them in and parse the whole thing and run it.
Meanwhile, I can get the interactive stuff going in parallel, and actually, once I concentrate on that, it will almost certainly be working better than it ever was, because frankly, the whole existing thing is a bit lame really. And, you know, there is also the potential of using the fault-tolerant machinery, which really needs a good test case to get all the kinks out. Have you seen this article? This is basically working actually, but Beanshell could provide a very good use case for getting it really polished. I suppose, needless to say, that you know that legacy JavaCC has NOTHING like this! (And it never will...) So, anyway, to answer your original question, yes, I need to put back in some functionality to support a REPL, because the parser that the tool generates just assumes that it can slurp in all the input at one go at the start. And that is not the case with a REPL. This is totally doable and the fact that that piece is not currently working does NOT present any obstacle to you guys proceeding with this. And again, the way it is working in the legacy code is not that great really. It's just leveraging the fact that you have blocking input, so if there is pending input, like the stream is open, it just sits there waiting for more input. We can do better than that. But look, if you commit to getting it going sans REPL, I'll commit to getting the necessary piece going for the REPL. And, really, I think that if we commit to this, we can certainly set a realistic goal of throwing away the older parser by the end of this month. But my main point I made elsewhere was NOT off-topic. To move forward, you do have to have the courage to throw away the older system at some point. It relates to why Hernan Cortes burned his boats after getting to the New World. He told his men that there was no going back. It's a similar situation. I think you have to burn the boats within the next few weeks. |
I figured, which is perhaps the better route to go.
Don't sweat about that for now. Something that might be more useful is being able to take an instance of the parser and feed that instance lines. But let's see what happens, there might be more pressing stuff. |
This comment was marked as off-topic.
Have a problem, Tokens are not Nodes. =(
Not sure how we're going to treat these ones. The problem is with the public void setChild(int i, Node n) {
throw new UnsupportedOperationException();
}
public void addChild(Node n) {
throw new UnsupportedOperationException();
}
public void addChild(int i, Node n) {
throw new UnsupportedOperationException();
}
public Node removeChild(int i) {
throw new UnsupportedOperationException();
} |
I may also be incorrectly mapping Delimiter, this is not going to be easy.
|
YOU DON'T HAVE TO DO THE REPL!!!! |
Well, you need to get rid of the legacy code as soon as possible. Tokens ARE Nodes in CongoCC. Or, to be more precise, they ARE Nodes in the sense that they implement the Node interface. But the concrete implementation of setChild or addChild etcetera in Token.java is to just throw UnsupportedOperationException, because the assumption is that a Token, being a terminal node, never has any children, so any attempt to add/remove children must be a programming error. And, in fact, that is CORRECT. Tokens are the terminal nodes of the parse tree. At least, that is how any natural treatment of the problem would handle this. The reason that tokens are not nodes in the legacy JJTree thing is because it was implemented as a preprocessor and the guy who did it (I think his name is Rob Duncan) back in 1997 (!) did not feel comfortable changing anything in the core JavaCC, so the Token class was never retrofitted to implement the Node API. So, as for the UnsupportedOperationException being thrown when you try to add/remove a Really, once you get into this stuff, I think you'll see that it all kinda does make sense. But if something doesn't seem to make sense, by all means raise the issue! Are you actually hitting that exception somehow? It is perfectly normal that an attempt to add/remove a child from a Token, i.e. a terminal node, hits an exception.
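To make the "Tokens are terminal nodes" point concrete, here is a minimal, self-contained sketch of the idea. The types `SketchNode`, `SketchBranch` and `SketchToken` are hypothetical stand-ins invented for this illustration; the real CongoCC `Node` interface and `Token` class have many more methods, so treat this as a picture of the design and not as the actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, pared-down node interface (the real one is larger).
interface SketchNode {
    int getChildCount();
    SketchNode getChild(int i);
    void addChild(SketchNode n);
}

// Non-terminal node: owns an ordered list of children.
class SketchBranch implements SketchNode {
    private final List<SketchNode> children = new ArrayList<>();
    public int getChildCount() { return children.size(); }
    public SketchNode getChild(int i) { return children.get(i); }
    public void addChild(SketchNode n) { children.add(n); }
}

// Terminal node: it is a node, but any structural change is a programming error.
class SketchToken implements SketchNode {
    private final String image;
    SketchToken(String image) { this.image = image; }
    String getImage() { return image; }
    public int getChildCount() { return 0; }
    public SketchNode getChild(int i) {
        throw new IndexOutOfBoundsException("tokens have no children");
    }
    public void addChild(SketchNode n) {
        throw new UnsupportedOperationException();
    }
}
```

With this shape, code that walks the tree can treat tokens and non-terminals uniformly, and the exception only appears if something tries to hang a child off a token.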
|
Well, |
Well, Delimiter is returned as the next child so it is happy to be a Node when it is not wanted. |
Well, in principle, I don't absolutely have to. But the fact remains that it is currently a flaw in CongoCC that this use-case isn't really handled. And surely Beanshell won't be the only project that needs something like this, so it makes sense to handle this upstream. But also, I do hope you understand that there is no urgent need to get the REPL functionality working immediately. Getting to the point where you can just read in a .bsh file, parse it and run it, that is really orthogonal to the REPL issue. That can be solved separately at a later point. The basic problem that you can (optionally) have an incomplete input buffer and if you reach the end, you block until there is new input -- well, it's really a pretty well understood problem. The way Beanshell currently handles this is just by counting on the blocking I/O in the core Java class. So the read() method just blocks until there is new input. But it's possible to do something more elegant and flexible than that. And as I have said, legacy JavaCC has ZERO concept of fault-tolerant parsing -- error recovery, backtracking... So you're bound to discover that you're opening up so many interesting possibilities by using the more advanced tool. |
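One way to picture how a REPL could sit on top of a parser that wants all of its input at once is to let the blocking `readLine()` call collect lines until a chunk looks complete, and only then hand the chunk over. The sketch below is an assumption-laden illustration, not how BeanShell or CongoCC does it: `LineFeedSketch`, `looksComplete` and `readChunks` are invented names, and the completeness check is a crude bracket count that ignores string literals and comments.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LineFeedSketch {
    // Crude completeness check: brackets balance and the text ends a statement.
    // A negative depth (stray closer) also counts as complete, so that the
    // parser gets a chance to report the error.
    static boolean looksComplete(String src) {
        int depth = 0;
        for (char c : src.toCharArray()) {
            if (c == '{' || c == '(' || c == '[') depth++;
            else if (c == '}' || c == ')' || c == ']') depth--;
        }
        String t = src.trim();
        return depth <= 0 && (t.endsWith(";") || t.endsWith("}"));
    }

    // Reads until end of stream; readLine() blocks while the stream is open but idle.
    static List<String> readChunks(Reader in) {
        List<String> chunks = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        StringBuilder pending = new StringBuilder();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                pending.append(line).append('\n');
                if (looksComplete(pending.toString())) {
                    // A real REPL would pass this chunk to the whole-input parser here.
                    chunks.add(pending.toString());
                    pending.setLength(0);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }
}
```

Passing `new InputStreamReader(System.in)` would give the interactive behaviour; a `StringReader` makes the same loop usable for non-interactive scripts.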
Well, it could be that your first pass on an interpreter should just be an instance of As you surely saw, you can just walk the tree recursively and execute any node handling methods that are encountered. But, I mean, if you don't want to do anything with Tokens, well, then just:
Or actually, what am I saying? If you don't want to do anything with Tokens, you just don't define a handler! It just recurses into the children, but since there aren't any children.... You can also define a handler solely for the Token subclass you're interested in:
So, well, you can "visit" the Tokens in the tree or not, if you want. With JJTree, you can't put in visit handlers for the Tokens, because, as you point out, Tokens are NOT Nodes! In Congo, they are Nodes when it's convenient, but if you want to ignore them, that's not a problem! Oh, there is also a |
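The "define a handler or don't" behaviour described above can be sketched with a tiny hand-rolled walker. Everything here (`WalkNode`, `WalkToken`, `TreeWalker`, the `on` method) is hypothetical and exists only for this illustration; it is not the CongoCC visitor API. The rule it demonstrates is the one from the comment: a node type with a registered handler gets handled, anything else is simply recursed into.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical tree node with a label and children.
class WalkNode {
    final String label;
    final List<WalkNode> children = new ArrayList<>();
    WalkNode(String label, WalkNode... kids) {
        this.label = label;
        for (WalkNode k : kids) children.add(k);
    }
}

// Hypothetical terminal node: a node that never has children.
class WalkToken extends WalkNode {
    WalkToken(String image) { super(image); }
}

class TreeWalker {
    private final Map<Class<?>, Consumer<WalkNode>> handlers = new HashMap<>();

    // Registers a handler for one exact node class.
    void on(Class<?> type, Consumer<WalkNode> handler) { handlers.put(type, handler); }

    // Calls the handler if one exists; otherwise recurses into the children.
    void visit(WalkNode node) {
        Consumer<WalkNode> handler = handlers.get(node.getClass());
        if (handler != null) {
            handler.accept(node);
        } else {
            for (WalkNode child : node.children) visit(child);
        }
    }
}
```

With no handler registered for `WalkToken`, a walk over the tree passes through the tokens silently; registering one makes every token visible without touching the rest of the walker.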
Looks like this was helpful. Thanx. Ast still all over the place... |
Pushed new grammar with additional mappings added. Here is a modified BaseNode with dump output changes. |
Actually, this probably can't work for this case because I think too much of the logic in the Java grammar assumes that the tokens are in the tree. For example, by default, the tree-building logic will build a node if there is >1 nodes on the stack, and that includes tokens, so it will build a completely different tree if Well, come to think of it, you could change the aforementioned tree-building logic to build a node for every production, i.e.
up top. But I'm not sure offhand if that would set things straight. Probably the better approach is to use the defaults, that the tokens are nodes, and it builds a new node if there is more than one node on the stack. Now, if you really need to massage the tree, you could use the close hook, like:
So there are various options. But, probably, in the end, you're running into problems trying to keep the old system working and the new alongside one another. At some point, it will be simpler just to cut loose from the older system, I suppose. |
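The tree-building rule mentioned above (a production only gets its own node when more than one node was pushed while parsing it) can be illustrated with a small stack model. This is a hedged sketch of the general idea only: `StackNode`, `NodeStackSketch`, `mark` and `close` are invented names, and the generated CongoCC code is certainly organised differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node used only for this illustration.
class StackNode {
    final String name;
    final List<StackNode> children = new ArrayList<>();
    StackNode(String name) { this.name = name; }
}

class NodeStackSketch {
    private final List<StackNode> stack = new ArrayList<>();

    // Remembers where a production started.
    int mark() { return stack.size(); }

    void push(StackNode n) { stack.add(n); }

    int size() { return stack.size(); }

    StackNode top() { return stack.get(stack.size() - 1); }

    // Closes a production: everything pushed since `mark` is wrapped in a new
    // node, but only when more than `threshold` nodes were pushed. Otherwise
    // the nodes are left on the stack for the enclosing production.
    void close(String name, int mark, int threshold) {
        int pushed = stack.size() - mark;
        if (pushed <= threshold) return;
        StackNode parent = new StackNode(name);
        List<StackNode> tail = stack.subList(mark, stack.size());
        parent.children.addAll(tail);
        tail.clear();
        stack.add(parent);
    }
}
```

Because tokens count as pushed nodes, dropping them from the tree changes `pushed` and therefore changes which productions produce nodes, which is the effect described in the comment.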
I solved it by filtering the children collection and using
Yip it's broken =/ |
We need to resolve these grammar issues, please I need your help: Assignment, PrimaryExpression and a missing AmbiguousName: String Here are some more examples. Expected: 1+1;
Actual:
Expected for: System.out.println("Hello Congo");
Actual:
Completed ambiguous names and the operators now, which reminds me I have a question about that; I will make a separate post. Here is the getBshPrompt.bsh now. Expected: src/main/resources/bsh/commands/getBshPrompt.bsh
Actual:
Expected: Integer.parseInt("1");
Actual:
Expected: src/main/resources/bsh/commands/pwd.bsh
Actual:
Expected: src/main/resources/bsh/commands/exit.bsh
Actual:
Expected: src/main/resources/bsh/commands/error.bsh
Actual:
Oops!! =) The major hold up is beanshell/src/main/jjtree/bsh.jjt Lines 789 to 800 in 47cbe83
And the beanshell/src/main/jjtree/bsh.jjt Lines 973 to 978 in 47cbe83
|
RE: I know how to get the enum type constant from an operator |
New commit 6098986 updating changes. |
Okay, well, I assume you know that the Java
can't be parsed because the
I tried to adjust things so the cases where Beanshell is looser than regular Java worked, but I somehow neglected this one. It could be that the solution would be to redefine the
to redefine it in your grammar to make the
However, I have to admit that I'm not 100% sure that this wouldn't have some implications elsewhere! You'd have to try it, I guess.
As for As for
Both the above look feasible, but my tendency would be to think that the second approach will have better (more robust) results. By the way, there is a feature I am considering adding (actually much certainly will add) that could be useful to you. At the moment, you cannot redefine the tree-building annotation on a production without copy-pasting the entire production. So, let's say, for example, you had:
So you build a new node if there are more than 2 nodes on the tree-building stack. Otherwise, you leave them there. Now, suppose you want to change the tree-building annotation, let's say to But that currently isn't implemented. It's on my mental TODO list. Well, I think that's all I can answer for now. I'm glad you're getting into this, Nick. I would say that even if this is a rough slog right now, at some point, the whole thing will unravel, kind of like doing a crossword puzzle or something. You get past some key point and the rest of the thing just kind of falls into place. And I think you will end up seeing that getting this transition done is not the multi-month project that you have thought it is. Though, I could be wrong, but you would have to accept that I have more prior experience with this sort of stuff -- transitioning grammars and adjusting the API and all that. So my point estimate on how much time this entails is likely to be more accurate. But I could still be wrong! |
Well, if you have the operator object, then the string representation of the operator is just |
Wow, we clearly have a huge communication gap; it happened several times yesterday as well. I am always eager to blame myself, as obviously I did not express myself clearly enough, otherwise you would have understood me. Reading my question again and again I am unable to see how saying that I have the token constant and want the operator leads you to understand I have an operator and want an operator. You are trolling me, right? Since I am desperate for an answer I have to repeat the question again. I have a token constant, for example JavaCC provides an array lookup under |
Actually it's the other way around: the classes can already correctly identify and interpret multiple types of similar form; what we need now is to massage the congo grammar to also label the same content for us as before. This still leaves us with a ton of work: all the data injections need to be reimplemented, the node traversal has changed, with or without tokens as nodes, and the data types have changed, of which Operator and TokenType was hopefully the biggest, which is done. Every one of these fixes requires research and investigation to find a solution, which is very time consuming. But at least each class will remain contained within its specific domain along with the flow diagram between the objects. Ripping the logic apart to accommodate the AST will be a futile effort, and humpty dumpty will never be put back together again. Which is why these are raised as show stoppers and why I am begging for your expertise to assist. If there is one thing which this migration can prove, it is that, yes, congo is opinionated and has abandoned backwards compatibility, with many unique and fantastic new features, but it is flexible enough that BeanShell could migrate without changing its existing logic. That seems to me like a worthwhile goal to strive for. Making this an accomplished case study worthy to follow with your own javacc projects. |
No, absolutely not. I found the question perplexing. It honestly doesn't occur to me a context where you have the TokenType but don't have the Token. If you are storing the tokens as terminal child nodes, then it's something like:
You see, you have to understand that I am just so accustomed to the idea that you have the Tokens themselves as terminal children in the Node, so I would always be able to write:
assuming I wanted the string image of the token in question.
Yeah, I remember that now. I guess I removed it at some point because I didn't think it was that useful. BTW, I think that tokenImage thingy was mostly used to generate error messages, and since I wasn't using that mechanism any more, I eventually got rid of it (rightly or wrongly) because it just looked like code bloat. But look, if you really want a lookup for TokenType->String, it's easy to have it. You could just inject this somewhere, most likely into Token or BaseNode:
But, you know, when you have OR... just store the string as a field in the appropriate place.
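A TokenType to String lookup of the kind discussed above could look roughly like the following. This is only a sketch under stated assumptions: `SketchTokenType` is a made-up enum standing in for the generated token type enum, and `TokenImages` and `imageOf` are invented names, so none of this should be read as the code that was actually injected into the grammar.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in for the generated token type enum.
enum SketchTokenType { PLUS, MINUS, STAR, ASSIGN, IDENTIFIER }

class TokenImages {
    // Fixed images for token types whose text never varies (operators, keywords).
    private static final Map<SketchTokenType, String> IMAGES =
            new EnumMap<>(SketchTokenType.class);

    static {
        IMAGES.put(SketchTokenType.PLUS, "+");
        IMAGES.put(SketchTokenType.MINUS, "-");
        IMAGES.put(SketchTokenType.STAR, "*");
        IMAGES.put(SketchTokenType.ASSIGN, "=");
    }

    // Returns the fixed image, or null for types such as IDENTIFIER
    // whose text depends on the input.
    static String imageOf(SketchTokenType type) {
        return IMAGES.get(type);
    }
}
```

An `EnumMap` keeps the lookup type-safe, which fits the move away from `static final int` constants, while giving back the array-style lookup that the legacy `tokenImage` provided.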
|
Here is an alternative pattern:
and then in the appropriate place in the grammar:
Something like that. As you doubtless surmise, the
I mean, this is much more powerful, and you'll see that there are many ways to skin a cat, once you really come to some familiarity with the feature set, anyway. BTW, maybe I mentioned this, I have the new C# 11 raw string literals working. (In the C# grammar obviously.) See here. I don't honestly think that legacy JavaCC is powerful enough to express this. (Particularly the interpolated raw strings.) Well, it might be possible to get it working, but the implementation would just be something horrific. The main implementation with Congo is pretty clean but some messy details are hidden in a TOKEN_HOOK routine. Well, there, I'm just showing off, I guess, since this is not directly relevant to anything you're doing or were asking me about. Well, I guess I do want to convey how much more powerful the tool you'll be using is. It will be an especially good situation if you manage to stay on good terms with the implementor. Like, you know, welcome to real software development. (As opposed to nothingburger-ism.) Oh, and by the way, I did experiment with tweaking the But I mean, if you just insert this:
into the Beanshell.ccc, it does seem to handle certain Beanshell constructs that it didn't handle before. It just makes the |
Well, it's some work for sure. This isn't for the work-shy! But I'd say it's still more or less moderate. You can quantify it. More or less. There are 42 bsh.BSHXXX classes, right? Though, some of them are quite trivial. My scheme for transition, which I thought was pretty clever, providing a basis to migrate, was to generate Node classes that are subclasses of the corresponding BSHXXX classes. So, in principle, they have the legacy functionality inside of them. And so, during an initial stage, you could keep the old system working and gradually get the new parser to replace the old parser... BUT... gradually That's the idea basically. So, as best I can figure, the best approach is that you pick the low-hanging fruit first. For example, the most simple thing to interpret, I guess, is just a plain variable, like:
And that's probably basically just a hash lookup. Or maybe multiple hash lookups. It looks in a hash that represents local variables and then, failing that, checks the more outer scopes. More or less... Right? Well, I guess evaluating But, in principle, the Node generated by the newer parser should be able to do that stuff pretty easily. One would have to think that switch-case or a for-loop is more difficult to deal with. But the basic approach should be clear, I think. But the thing is, I guess, that in an initial stage of the process, you find yourself adding code to the BSHXXX classes to deal with the fact that you either have that class or are in a generated subclass. But then, you'll finally reach an inversion point, where you're not adding code any more. And then when you're confident that you have the new parser working, then a whole bunch of old crufty code just melts away, because so much of what you have is a kind of scaffolding that was necessary to keep the legacy system working. So when you switch entirely to the new system, a whole bunch of scaffolding code isn't needed any more. And, you know, that kind of thing can be very (very very) satisfying in the end, when you reach the point where you're largely getting rid of the old scaffolding. Because, after that, you reach this point where you step back and just look at the code you now have, and you'll see how much better structured it is! That extremely satisfying moment is kind of like orgasm or something. (Except that it lasts longer!) That's what happened when I was refactoring/rewriting all the legacy JavaCC code. There would be some part of it that was just horrific and I'd manage to come up with an incremental process to replace it, but you reach a point where you look at what you now have, and it's just... well... like the last piece I rewrote was the regexp part. And I swear, after I got that rewritten, at times, I'd just look at it and it was like... "man, this is fucking beautiful".
(Sorry for my language.) But the thing is that you can't reach these incredibly satisfying points if you approach the code with some sort of ultra fearful attitude. There has to be the courage and belief in oneself to get in there and hack away. I think that once you hack away at this some more, the fog will clear and the path forward will just get much clearer.
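The variable lookup described earlier in this comment (check the local hash first, then the outer scopes) can be sketched in a few lines. The class `ScopeSketch` below is purely illustrative and makes no claim about how BeanShell's own namespace classes are written; in particular, returning `null` for an unknown name is an assumption made to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

class ScopeSketch {
    private final Map<String, Object> vars = new HashMap<>();
    private final ScopeSketch parent; // null for the outermost scope

    ScopeSketch(ScopeSketch parent) { this.parent = parent; }

    // Defines or overwrites a variable in this scope only.
    void set(String name, Object value) { vars.put(name, value); }

    // Looks in this scope first, then walks outward through the parents.
    Object get(String name) {
        for (ScopeSketch s = this; s != null; s = s.parent) {
            if (s.vars.containsKey(name)) return s.vars.get(name);
        }
        return null; // not found anywhere in the chain
    }
}
```

Evaluating a plain variable node then amounts to a single `get` call on whichever scope is current when the node is executed.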
Well, correct. Basically. But really, the word "opinionated" is a bit loaded. It implies controversy. Actually, the things that Congo is opinionated about are not really very controversial for the most part... fields should be properly encapsulated.... All those static final int things should be type-safe enums, post-editing generated files is a no-no.... That last point I mention should not be controversial, but I guess sometimes it is, because people get so accustomed to a totally f'ed up situation, that it seems normal to them that you post-edit generated files! But, when you have
Well, one point to bear in mind about this is that what we are doing now is attempting to leverage the existing Java grammar that is built into the tool. We're doing BUT... there is a lot more long-term gain if you succeed in (largely) reusing the existing Java grammar. Because, obviously, for one thing, you just immediately have all sorts of things like Lambdas and Annotations and all the rest of it. And also, you free yourself from the need to maintain all these things yourself! But the transition is more difficult. Well, also, I have to admit that Beanshell is much more different syntactically from standard Java than I had initially thought. So, when I initially volunteered to do the initial work on this, I really thought it was much easier than it turned out to be. BUT.. live and learn. And it's a good thing to meet difficult challenges. |
@revusky In the absence of any commits from you, and not wanting to rush to a conclusion, I want to be absolutely clear that everybody understands exactly where we stand: you have no intention of fixing any problems or shortcomings with the work you have delivered? |
Okay, I did a few things since you wrote the above. At that point in time, the parser I gave you parsed about 62% of the scripts in src/**/*.bsh. I honestly thought it was higher than that, but I quantified it. (I certainly knew it was far short of 100%!) But, do understand that I put far more work into this than I ever anticipated when I made my initial offer. Originally, when I made the offer up top, I thought it would involve a day or two. Maybe three days at the most. Then I spent a much longer time on it. The thing is that, when I made the offer to get the new parser working for you, I didn't realize just how much the Beanshell syntax diverged from standard Java. I must have thought that it was just a question of including the standard Java grammar and then making a few adjustments to have a Beanshell grammar. In any case, after the final round of work on the parser, it really does parse just about everything. There seems to be one legitimate parsing failure (relating to the ELVIS operator) but I leave that for you to fix, since you really know how it should work, and really, to be honest, I'm ******* tired of this now. So, here. Do this. (And this is addressed not to just you, Nick, but to everybody in this community.)
Actually, I would suggest that everybody reading these lines execute the above magic incantation in a command-line shell and report back success/failure. So, you or anybody can pose any question about this, including even questions on the order of: WTF is this? Now, I have to say that my position is that this ends my involvement with all this. I have to say (I actually don't have to say it but I will...) that if you guys are handed off this and can't pick up the ball and run with it, then maybe you should find some other branch of activity to devote yourselves to. (Maybe professional tiddlywinks or something...) Now, I don't want to get in some kind of amateur lawyer conversation, but if you look up top in this conversation, I wrote on 4 January (3 months ago +) the following:
With what I just handed you, your parser (I say your, because you now have to take ownership of this) can handle all sorts of things that your older parser does not.
(All the above is just off the top of my head. I might be forgetting a couple of things. Oh, I did forget to mention that the machinery is there to generate a fault-tolerant parser. That's actually a biggie. Legacy JavaCC has basically zero concept of error recovery or backtracking. (And it never will!)) NOTA BENE: When I say above it can "handle" these things, I just mean it can parse these constructs. Obviously, if you want to handle them in a real sense, i.e. the semantics, not just parsing, you have to write the code. This, after all, is just a parser, which is all I ever offered you. Anyway, to answer your question, Nick, I think this is just about it. I am now really just handing this off to you. And you have to run with it now. I've done what I promised to do (the parser) and it is surely pretty obvious that I never promised to adjust all your client code to work against a somewhat different parse tree. I just committed to giving you a parser. Now, I would also add that, though I have no intention of doing any more work on this, our community is certainly available to answer any questions. But really they should be specific questions, where you tried your best to figure something out and need a leg-up. (Nothing wrong with that.) Preferably any such questions should be asked here. That's my preference anyway.... So, this was a belated Christmas gift, I guess. Christmas is now 8 months away, at least by the Gregorian calendar. So, kids, behave yourselves and maybe Santa will bring you something next Christmas. But, for now, I have to get back to my own work. I hope I've made myself clear. |
Wow, ok! Was going for a yes or no answer, I really didn't expect you to jump in and finish up. But much appreciated, many thanks! Awesome!!! You rock!!!! Not sure why you are so adamant to be done with this, in fact you haven't actually answered my question. You say it was more work than expected, and I certainly won't deny that, but that has nothing to do with it. It is not like you are scared of work, nor did it bother you while you were working on it. You did it twice, remember, once on the congo branch attempting a walker, until the light bulb lit up and you started again with the extended model on the congocc branch. You obviously won't be expected to do semantic stuff, you made that clear, but the threat of having to work on BeanShell code is not the reason either. On the congocc branch you reimplemented SimpleNode which is at the very heart of BeanShell, and had no problem doing migrations on several of the nodes you started mapping. That is great but your time is best spent on congo stuff, which you know. I don't know, and even simple migrations like where am I going to get the operator, take trial and error, meaning time. Implementing the new parser, from the TestHarness example, took almost no time. You certainly can't say you are not invested, there are volumes in this thread alone that say otherwise. A bystander might say it is because you already made so much effort but it goes way beyond just the payload. You are actually eager to see it work. I bet it crosses your mind often, something akin to that song you just can't get out of your head, and why shouldn't it? Congo is meant to be implemented and BeanShell is a pretty neat implementation. Something you can be proud of, even if you had no part in this project. I think you were invested long before you even thought of contributing to BeanShell. When I suggested to add you as a committer to the project, it never even crossed your mind.
Lol if I recall it took another week or two for you to come around to the idea that it would probably be easier to have commit access =) The fact that you were invested, is what planted the seed, which grew uncontrollably, prompting you...no compelling you to start this thread with: I'll do it for you. So you are done hey? But in the meantime, will everyone who comes across this post please run the test, and let us know if it breaks. It is certainly not out of hubris, nothing quite like writing code to keep a man humble, if the last commit shows anything it is an ability to accept imperfection. We are getting close though because this "thing" does overwhelm the desire to strive for perfection, at least to some degree. Not more important, something else... One thing is for sure if someone finds a bug you won't be able to resist. =) Whatever it is doesn't really matter, I just don't get it. It is an important milestone for congo, it really can't fail, it is that important. Personally I prefer working with someone, perhaps you don't, we motivate each other, stick it through the grind together, share the trials and tribulations, and someone to celebrate the small things with. Only a teammate understands, you can try and tell others who may even be able to relate, but they just won't get it. I can't say it's fun when you're acting like it doesn't matter: you can try this, or maybe that, it may cause other issues but hey you should try it, it's really up to you, you know what completely rewriting BeanShell over from scratch to work with congo is going to get rid of the scaffolding and produce the best stable code. Ahh go f... Maybe it doesn't matter? It just isn't a priority for me right now, I would make time to work on it with you but on a list of things I want to do myself, it doesn't rank very high. 
I will definitely do it at some point, and we have made enough progress that starting from scratch with something new or even attempting to hack the javacc grammar to fix a bug or add a new feature wouldn't make much sense. It hasn't been a waste, we did accomplish a substantial amount. It will get done eventually, I just wouldn't hold my breath if I were you. |
Output from TestHarness:
Which is probably what everyone gets. Busy running more tests and will report findings... |
BeanShell script source .bsh parser fails on
Can be treated the same as BOOL_OR
Should just work once NULLCOALESCE is understood
Allows an optional
|
Yep, that is no surprise. Thanks for reporting back.
Well, I think it probably is what everybody gets, assuming they try it! But, as seems to be the usual situation, the only person who is reporting back is you. Why is that?
Here is another point about this that I forgot to mention.... This new Congo-based beanshell parser can also parse regular Java. For example, if, in the above, you do:
(i.e. run it over all the Java files instead of all the .bsh files), it parses them all. And that includes at least some new syntax with lambdas and so on. I've tried it over quite a range of test input, by the way, and it is not perfect. It fails in certain cases. For example, there was a case, kind of like:
and that was failing. It took me some effort to figure out why. Here is why. The "formal comment" above, which is the "and now for something completely different" is actually taken to be a Statement! But what that means is that in the case above, the if statement terminates because there is a new "statement" after the first block. And then the I had resolved not to put any more effort into this, but just a little while ago, I addressed a couple of other issues I had become aware of. Due to an oversight on my part, it was not handling hexadecimal floating point literals. (That is something that I never used in my life anyway, but...). It also was not handling I found the above issues running over some Java standard library code. (From JDK 19!).
The |
I know it's failing on those things, but I decided to let you fix it. |
JAVA source .java parser fails in several scenarios
Beanshell.ccc.exp.but.fnd.txt - Was expecting Found report
fail.zip.txt - Zip archive of java source files unable to be parsed with Beanshell grammar |
I'm good thanks, you can treat them like BOOL_OR |
Java needs to parse, even more important than bean script syntax
Not my idea but I did fix it in jjtree grammar.
and
Should let them through...
I knew you wouldn't be able to resist =)
Also binary and octal...
I never knew about the underscores
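For reference, the literal forms that came up in this exchange (hexadecimal floating point, binary, octal, and underscores in numeric literals) are all standard Java, so a parser that claims to accept Java has to handle them. A small sketch; the class and constant names are made up for illustration:

```java
// Hypothetical holder class; only the literal syntax matters here.
final class LiteralForms {
    // Hex floating point: mantissa 0x1.8 is 1 + 8/16 = 1.5, scaled by 2^1.
    static final double HEX_FLOAT = 0x1.8p1;     // 3.0
    // Binary literal (Java 7+).
    static final int BINARY = 0b1010;            // 10
    // Octal literal: a leading zero means base 8.
    static final int OCTAL = 017;                // 15
    // Underscores may separate digits in any numeric literal (Java 7+).
    static final int UNDERSCORES = 1_000_000;    // 1000000
    static final long HEX_LONG = 0xFFFF_FFFFL;   // 4294967295
}
```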
|
I looked at this a bit. For starters, there are 969 .java files in there but 417 of them are module-info.java files. There is no reason to think that the test harness I handed to you deals with those. (It could, since I have the grammar to parse them but it doesn't because the test harness just assumes that the input is a series of Beanshell/Java "statements".) So it's automatically failing on all these module-info.java files! So that leaves 552 Java files. My own (really quite robust) Java parser is failing on 171 of them, so I think those ones are pretty much all invalid Java. I suppose you mean they are supposed to be valid Beanshell. But, you know, as far as I can tell, there is no really formal spec of what is valid (or not) in Beanshell. By the way, you know how to test the Java parser that is built into Congo?
You can see what the original Java parser can parse or not. My point here is that the Beanshell parser I gave you is basically just a hacked/modified version of this Java parser, so it is, by and large, going to fail to parse things that the actual Java parser rejects. This is going to be the case unless there is some specific adjustment for it to accept things that the Java parser does not... Or, to put it another way, I guess you could say it is a design goal that the Beanshell parser should accept standard Java AND in addition, some other things that are not standard Java. But it doesn't seem to be very well specified what it is supposed to accept or reject. What I'm handing off to you is based on the sample of .bsh files that I had to use as a test suite, i.e. what you get from |
Did not look at the exceptions, only ran the parser through all the java files on my system and copied the offending files to a faults directory, each into its own folder so as not to mistakenly overwrite duplicate filenames. It is certainly easier to now only run against the 969 known to be offending files instead of the 10 minute test to parse everything one by one.
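The collection step described above could be scripted roughly as follows. This is only a sketch under stated assumptions: `FaultCollector` is a hypothetical name, and the `parses` predicate is a stand-in for whatever parser invocation is being tested (the actual harness call is not shown in this thread).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; not part of BeanShell or Congo.
final class FaultCollector {
    // Copies every .java file under root that `parses` rejects into
    // faultsDir/<n>/<filename>, so duplicate file names never collide.
    // Returns the number of files copied.
    static int collect(Path root, Path faultsDir, Predicate<Path> parses) throws IOException {
        List<Path> sources;
        try (Stream<Path> walk = Files.walk(root)) {
            sources = walk.filter(Files::isRegularFile)
                          .filter(p -> p.getFileName().toString().endsWith(".java"))
                          .filter(p -> !p.startsWith(faultsDir)) // skip earlier results
                          .sorted()
                          .collect(Collectors.toList());
        }
        int count = 0;
        for (Path source : sources) {
            if (parses.test(source)) {
                continue; // parsed fine, nothing to record
            }
            Path target = faultsDir.resolve(Integer.toString(++count));
            Files.createDirectories(target);
            Files.copy(source, target.resolve(source.getFileName()),
                       StandardCopyOption.REPLACE_EXISTING);
        }
        return count;
    }
}
```

An extra filter on the file name `module-info.java` could be added before the loop if those files should be excluded, since a statement-oriented harness is not expected to handle them.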
Not sure what you consider to be valid JAVA, for BeanShell we require everything which the JAVA compiler accepts to be parseable at the very least, even if not functional. I have no reason to suspect that any of the offending files would not compile based on erroneous syntax, they all come from existing public projects. Certainly not to the degree suggested by 171 of them which you say the standard congo JAVA grammar chokes on, this is disconcerting.
Don't know why you are suggesting this, the only motivation for moving to congo is because it can parse valid JAVA and promises to continue to do so as new language features are added. Do you suggest to say that this is an unreasonable expectation?
I would rather use "valid" instead of "standard" in so far as valid JAVA syntax is accepted by the JAVA compiler, the latter could suggest conforming to a code style and is not readily verifiable. The "other things" are mostly just much looser JAVA
I have to assume you think you are helping but you are not. The purpose of this project is to migrate to congo, and there is enough outstanding without you adding to the workload by thinking up "exercises" for me to do. I can just as easily learn by looking at what would've taken you only a minute to implement, how is me wasting time figuring it out helping? If only avoiding parse errors was the goal it is setting a very low bar. Then it is easy to take shortcuts like ignoring beanshell/src/main/congo/Beanshell.ccc Lines 541 to 543 in 53b0ec6
The bottom line is, as long as we are unable to produce node instances which at least resemble the structure that BeanShell expects to find, this project stays dead in the water. But 10 points for effort, well done. |
New commit b0232a5 Parser now captures PrimaryExpression nodes. Now even parses the syntaxerrors.bsh script lol Expected:
Actual:
Getting closer... |
Here's the proposal. I'll do it for you.
By that, I mean the migration to JavaCC 21, specifically adapting this file. Since most of the file is devoted to parsing what is basically a Java grammar, and with JavaCC 21, there is an up-to-date Java grammar that you can just INCLUDE, this would take quite a bit of burden off of you. For example, issues like supporting lambda expressions, #675, at least as regards the parsing side (not the runtime side, I grant) will be taken care of, because you can automatically parse (and build AST for) anything in the current Java language. In reality, the current situation where you (or any broadly similar project) have to maintain this yourselves just doesn't make much sense. I made that comment in this article: https://javacc.com/2021/03/01/reference-java-grammar/
The only thing I would ask in return is a serious commitment that, once I do the work, you review it and start using it ASAP. You do understand that if I do this for you and then you don't use it or possibly even look at it, then that's kind of a crisis. You'd be putting me in a very uncomfortable position.
Oh, I should mention that another thing that you will get for free moving to JavaCC 21 is a much more serious treatment of fault-tolerant parsing and error recovery. You might want to consider this article: https://javacc.com/2021/02/18/the-promised-land-fault-tolerant-parsing/ and you'll realize that the legacy JavaCC has simply zero concept of error recovery really. Meanwhile, the stuff that that article describes is all implemented in JavaCC 21. That said, there may be some glitches, but for that very reason, I would love to have a demanding "customer" that can help me get all the kinks out.