Some general questions/suggestions #1702
Comments
Some simple issues/bugs to clarify my point:
Issue 1:
    var l_tokenizer = __BRYTHON__.tokenizer("\rtest");
    var a;
    while (a = l_tokenizer.next()) {
        if (a.done) {
            break;
        }
        console.log(a.value);
    }
The 3rd token, which is the name token, prints:
which is wrong; the name should be 'test'. (The fix is quite simple: just remove this line: brython/www/src/python_tokenizer.js, Line 128 in 048601d)
Issue 2: The line property is really broken, as you can see in the previous example. After fixing the bug, the line is still wrong: almost every time, the first character of the line is omitted after the first token on a given line.
Issue 3: Tons of variables are unused but still remain in the code, which is confusing. Lines 10937 to 10955 in 048601d
The tokenizer never throws a $_SyntaxError, so why even test against it?
Issue 4: py2js.js is 99.9% py2js_tokenizer.js with a few debug functions. Why not implement a logger or a simple flag and merge these 2 files into one? Or is there a purpose to doing this? This is all just from a quick review of the project, but overall it is an interesting one. |
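To make the expected behaviour in Issues 1 and 2 concrete, here is a minimal sketch of a tokenizer that treats a lone "\r", a lone "\n" and "\r\n" as line terminators and attaches the full physical line to each token. It is not Brython's python_tokenizer.js: the function name `tokenizeNames` and the token shape are hypothetical, it only recognises names and line ends, and it does not distinguish NL from NEWLINE or emit an ENCODING token as CPython does.

```javascript
// Hypothetical sketch, not Brython's tokenizer: yields NAME and NEWLINE
// tokens and records the full text of the physical line for each token.
function* tokenizeNames(src) {
    let pos = 0;
    let lineNum = 1;
    let lineStart = 0;

    // Full text of the line starting at `start`, terminator included.
    function lineAt(start) {
        let end = start;
        while (end < src.length && src[end] !== "\r" && src[end] !== "\n") {
            end++;
        }
        if (src[end] === "\r" && src[end + 1] === "\n") {
            end += 2;
        } else if (end < src.length) {
            end += 1;
        }
        return src.slice(start, end);
    }

    while (pos < src.length) {
        const ch = src[pos];
        if (ch === "\r" || ch === "\n") {
            // "\r\n" is one terminator; a lone "\r" or "\n" also ends the line
            const len = (ch === "\r" && src[pos + 1] === "\n") ? 2 : 1;
            yield {type: "NEWLINE", string: src.slice(pos, pos + len),
                   start: [lineNum, pos - lineStart], line: lineAt(lineStart)};
            pos += len;
            lineNum++;
            lineStart = pos;
        } else if (/[A-Za-z_]/.test(ch)) {
            const begin = pos;
            while (pos < src.length && /[A-Za-z0-9_]/.test(src[pos])) {
                pos++;
            }
            yield {type: "NAME", string: src.slice(begin, pos),
                   start: [lineNum, begin - lineStart], line: lineAt(lineStart)};
        } else {
            pos++; // everything else is skipped in this sketch
        }
    }
}

const tokens = [...tokenizeNames("\rtest")];
for (const tok of tokens) {
    console.log(tok.type, JSON.stringify(tok.string), JSON.stringify(tok.line));
}
```

With this approach the name token keeps its first character, because the line text is sliced from the recorded start of the line rather than rebuilt character by character.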
From your original post, (1) directly translating the Python tokenizer module to JS, and/or (2) using ANTLR for parsing seem like reasonable ideas to me. But these are big projects. Are you interested in implementing them? @PierreQuentel may have more thoughts, since he just did an overhaul of the tokenization code. |
Interesting questions, thanks !
1 - There are 2 tokenizers in CPython, the pure-Python module
My first idea was to convert the pure-Python module to Javascript using Brython and optimize it. Unfortunately, as often, it has dependencies: 34 other modules need to be imported to make it work... There are probably very few parts of these modules that are actually used, but identifying them and replacing the imported code by Javascript equivalents would have required at least as much work as writing a Javascript implementation from scratch.
Another option would have been to use the C implementation, but, even assuming that it's possible, I don't know C well enough for that. This is why I decided to write the Javascript version in
To sum up, unless someone can come up with a version derived from one of the CPython implementations (you are right, it would certainly be 100% compliant) that is at least as fast as this one, I prefer to keep it and fix the bugs that you or other developers report. Many thanks for the ones you reported in your second post, I will fix them ASAP.
2 - and 3 - I didn't know ANTLR. It is mentioned in PEP 617 as an alternative for a new Python parser and it was not selected. It seems that it's an LL parser, and I wonder if it would be possible to use it for new Python constructs such as the "soft keywords"
Anyway, I actually gave a try at implementing a PEG parser, based on the simplified Python grammar and the left recursion algorithm referenced in PEP 617. The result is in
The next step would have been to generate an AST from the tree produced by this parser, based on the complete grammar. I must admit that the amount of work needed for that frightened me... and I suppose that this step would probably be necessary with ANTLR.
For AST, another option might be to use the tree built by the current implementation by
4 - If I started the project now, the choice between Javascript and Typescript would not be obvious. But it started in 2012, and I'm afraid it's too late to switch to another language, however close to JS it might be. |
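For readers unfamiliar with left recursion in PEG parsers, the following is a minimal sketch of the memoization trick usually described as "seed growing", in the spirit of the technique PEP 617 refers to. It is not the code of the experimental Brython parser: `makeParser`, `leftRec` and the toy grammar `expr: expr '+' number | number` are assumptions chosen for illustration, and only direct left recursion is handled.

```javascript
// Hypothetical sketch of memoization with "seed growing" for direct left recursion.
function makeParser(input) {
    const memo = new Map(); // "rule@pos" -> {node, end}, or null for failure

    // Wrap a rule body so that it may call itself at the same position.
    function leftRec(name, body) {
        return function rule(pos) {
            const key = name + "@" + pos;
            if (memo.has(key)) {
                return memo.get(key);
            }
            // Plant a failing seed: the recursive call fails on the first pass
            memo.set(key, null);
            let best = null;
            while (true) {
                const result = body(pos);
                // Stop when a pass does not consume more input than the last one
                if (result === null || (best !== null && result.end <= best.end)) {
                    break;
                }
                best = result;
                memo.set(key, best); // grow the seed and try again
            }
            memo.set(key, best);
            return best;
        };
    }

    function number(pos) {
        const m = /^[0-9]+/.exec(input.slice(pos));
        return m ? {node: Number(m[0]), end: pos + m[0].length} : null;
    }

    // expr: expr '+' number | number
    const expr = leftRec("expr", function (pos) {
        const left = expr(pos);
        if (left !== null && input[left.end] === "+") {
            const right = number(left.end + 1);
            if (right !== null) {
                return {node: ["+", left.node, right.node], end: right.end};
            }
        }
        return number(pos);
    });

    return expr;
}

const parsed = makeParser("1+2+3")(0);
console.log(JSON.stringify(parsed.node));
```

Each pass re-runs the rule body with the previous result stored in the memo table, so the tree grows to the left and the `+` operator comes out left-associative. Re-running the body several times per position is also one plausible reason why such a parser can be slower than a hand-written one.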
The commits referenced above fix most of the issues you have reported. I didn't understand Issue 3. The tokenizer throws Javascript |
The tokenizer in python_tokenizer.js throws 4 exceptions, 2 of type Lines 10940 to 10943 in 048601d
By the way, I come from a C/C++, C#, Java, and even Python and Lua background. I know Javascript of course, but I prefer Typescript or the newer ECMAScript standards with better OOP, because the dynamism of Javascript can be heaven or hell, nothing in between. For small webpages nothing is wrong with JS, but for larger projects I believe Typescript is better, at least from a structuring point of view: at least it won't compile anything you throw at it. I am finding such bugs because I am analyzing the project as much as I can now, since I need this functionality, not necessarily for the web. I am also finding many other bugs, or possible ones, which I will report soon. If somebody is willing to review PRs, I can take a shot at providing a transpilation of the tokenize module (maybe in some experimental branch with CPython ANTLR stuff). |
Issue3 is fixed - you were right, I probably removed the attribute |
After a little digging, I found this project: Skulpt, which also tries to transpile Python 2/3 to JavaScript. Compared to Brython, most of its libraries are unimplemented, but it has a
So basically each of these projects generates the ast module from an ASDL file and implements a compiler as a node visitor. By doing that they avoid the hassle of maintaining handwritten code (the ast module is auto-generated by a Python script which parses the ASDL file). |
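As a rough illustration of the idea of generating node types from a declarative description rather than writing them by hand, here is a minimal sketch. It does not parse a real ASDL file and is not how Skulpt or CPython does it: the `spec` object is a hand-written, simplified stand-in for an ASDL definition, and `generateNodes` is a hypothetical helper.

```javascript
// Hypothetical, simplified stand-in for an ASDL description:
// node name -> ordered field names (not copied from CPython's Python.asdl).
const spec = {
    Name: ["id"],
    Constant: ["value"],
    BinOp: ["left", "op", "right"],
};

// Generate one factory per node type instead of writing each one by hand.
function generateNodes(spec) {
    const ast = {};
    for (const [name, fields] of Object.entries(spec)) {
        ast[name] = function (...args) {
            if (args.length !== fields.length) {
                throw new TypeError(name + " expects " + fields.length + " field(s)");
            }
            const node = {type: name, _fields: fields};
            fields.forEach((field, i) => { node[field] = args[i]; });
            return node;
        };
    }
    return ast;
}

const ast = generateNodes(spec);
const tree = ast.BinOp(ast.Name("x"), "Add", ast.Constant(1));
console.log(tree.type, tree._fields.join(","));
```

Adding a node type then means adding one line to the description, and a visitor can rely on `_fields` to walk any node generically.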
Thanks for the pointers to the Skulpt scripts, I didn't think of looking for this. What you suggest makes sense, obviously, but it means a very radical change in Brython code - probably an almost complete rewriting of py2js.js. I would suggest that you start a separate project, relying on the new PEG parser, with Javascript code to generate a similar AST to that of CPython 3.10. I and other contributors would follow the project and we could discuss its progress and a possible integration in Brython. What do you think ? |
Yes exactly my thought. This needs to be done in a separate branch or a completely separate project and see how things can go from there. So I have no problem with starting a new project. |
Skulpt is also currently experimenting with the pegen parser which is skulpt agnostic and so it may be useful for Brython. We're very much in experimental phase right now. (Full disclosure - I'm a skulpt contributor and working on the parser) |
It's more complex than that :-( In short :
|
Yes, I see now. By the way, I am more or less transpiling the whole CPython parser/tokenizer and am about to do some tests with it. When I am done I will create a repo and let you know, in case Brython could make use of it somehow. (It is a straightforward port; it is bigger than Brython's parser, but with some obfuscation/minifying its size could be reduced.) |
For your information, the latest Brython version (3.10.4) includes the ast module, which produces the same results as the CPython module. |
Hi, it has been a while. Actually I remember writing a big chunk of a parser using CPython as base code, but for many reasons I had to abandon the project before polishing it. So now the parsing is done internally using a CPython-compliant parser? Because it would be much easier to write a simple visitor around it and generate the JS code. All in all, great news for such a promising library. |
By the way, does |
I am not too sure what you mean by "separating code into modules" but if you mean different Python modules that can import each other, possibly in a package, yes this is supported. I am currently working on an engine that generates Javascript from the AST representation instead of the Brython-specific tree, in module If it succeeds, it will make the Python-to-JS engine much easier to understand : since version 3.9.4 the tokenizer produces the same tokens as CPython, since 3.10.4 the AST tree is also the same, and both are documented in Python docs. The generation engine will also be much shorter (notably, no need for the |
Yeah I meant
Exactly, what I think. It would be much easier to write a visitor which generates |
Since release 3.10.6, the Javascript code is generated from the Python-compliant AST. This is done in
I have written a PEG parser based on the standard Python grammar to generate the AST, as described in PEP 617. It works, but it's much slower than the current Brython parser... More information in this post.
I think we can close this issue. If you prefer to leave it open please add a comment. |
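The visitor approach discussed in this thread can be sketched as follows. This is not Brython's generation engine: `JsGenerator` is a hypothetical class, the nodes are plain objects shaped only loosely like CPython AST nodes (the operator is a string rather than a node), and the output ignores Python semantics entirely, so `+` is emitted as a plain Javascript `+`.

```javascript
// Hypothetical sketch of a visitor that walks an AST-like tree and emits Javascript.
class JsGenerator {
    // Dispatch on node.type, in the manner of Python's ast.NodeVisitor
    visit(node) {
        const method = this["visit_" + node.type];
        if (typeof method !== "function") {
            throw new Error("no visitor for node type " + node.type);
        }
        return method.call(this, node);
    }
    visit_Module(node) {
        return node.body.map((stmt) => this.visit(stmt)).join("\n");
    }
    visit_Assign(node) {
        // Single-target assignment only in this sketch
        return "var " + this.visit(node.targets[0]) + " = " + this.visit(node.value) + ";";
    }
    visit_BinOp(node) {
        const ops = {Add: "+", Sub: "-", Mult: "*"};
        return "(" + this.visit(node.left) + " " + ops[node.op] + " " + this.visit(node.right) + ")";
    }
    visit_Name(node) {
        return node.id;
    }
    visit_Constant(node) {
        return JSON.stringify(node.value);
    }
}

// Tree for the Python statement `x = 1 + 2`, written out by hand
const moduleNode = {
    type: "Module",
    body: [{
        type: "Assign",
        targets: [{type: "Name", id: "x"}],
        value: {
            type: "BinOp",
            left: {type: "Constant", value: 1},
            op: "Add",
            right: {type: "Constant", value: 2},
        },
    }],
};

const js = new JsGenerator().visit(moduleNode);
console.log(js);
```

Because each node type maps to one method, supporting a new construct means adding one `visit_...` method, which is the main appeal of generating code from a documented AST.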
Hi, great and interesting project, just a few questions/suggestions:
1- Why don't you use the default Python tokenizer module converted to Javascript? It is much simpler and battle-tested. The current implementation is a bit bloated, and there are many bugs which I can of course report or contribute fixes for (mainly related to error tokens and the start/end range).
2- Why not use ANTLR to generate the parse tree? The ANTLR Python grammar is already defined and battle-tested, and the Javascript or Python runtimes are also available. It would also make it possible to interface with any Python version, like 2.x, which is out of the scope of this project, and to get inspired by other projects like Jython. The whole project would then consist of a visitor to which all the transform methods would be migrated.
3- Implementing 2 will give an easier out-of-the-box implementation of the ast module, which is currently unimplemented.
4- Why not use a safer typed language like Typescript and divide the transpiler into smaller chunks using better OOP paradigms? (An ultimate goal is the whole project written in Python, able to transpile itself to Javascript.)