A better idea. #1
Thanks for the ping! Looks interesting. I see that it provides some utilities for AST transformation; I'll need to look into that in more detail.
Not exactly. You describe your package as: "Allows one to use/create custom syntax extensions for python as simple as possible." But I would like to do even more general things - to be able, as easily as possible, to import absolutely anything into Python. And I'm making headway with that idea. I hope you're familiar with https://github.com/dabeaz/bitey - it allows importing LLVM bitcode as if it were a Python module. So, here's a quick remake of that in my import framework: https://github.com/pfalcon/ubitey/blob/master/ubitey.py Here's an example of how it works with Pycopy:
As you can see, the hook is installed with:
The import hook itself is:
Some time later, I'll be looking into custom-loading specifically .py files, and it will be interesting to see what kind of AST utils you come up with. But in my version, those would be in a separate package. Because, well, you don't need Python AST utils to load LLVM bitcode.
But isn't this also perfectly possible by adding a new meta path finder? In my understanding, you wanted to actually replace the import of normal modules, which is what this package is supposed to do. I would also strongly discourage an implementation having to manually call down to the next hook. Note that regarding AST utils, I am still trying to decide on how to best allow modification of the AST:
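For reference, a bitey-style hook can indeed be expressed as a meta path finder in plain CPython importlib terms. The sketch below is invented for illustration (the `DataFinder` name and the .txt-as-module behavior are made up; the real ubitey hook loads LLVM bitcode):

```python
import importlib.abc
import importlib.util
import os
import sys

class DataFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder/loader: imports `foo.txt` found on sys.path as a
    module `foo` exposing the file contents as `foo.text` - a stand-in for
    what ubitey does with LLVM bitcode."""

    def find_spec(self, fullname, path, target=None):
        name = fullname.rpartition(".")[2]
        for entry in (path or sys.path):
            candidate = os.path.join(entry or ".", name + ".txt")
            if os.path.exists(candidate):
                return importlib.util.spec_from_loader(
                    fullname, self, origin=candidate)
        return None  # not ours - fall through to the next finder

    def create_module(self, spec):
        return None  # default module creation is fine

    def exec_module(self, module):
        # The "loading" step: stuff the file contents into the module.
        with open(module.__spec__.origin) as f:
            module.text = f.read()

# Installing the hook is a single line:
sys.meta_path.insert(0, DataFinder())
```

After this, `import greeting` picks up a `greeting.txt` on sys.path. Note the return of `None` from `find_spec`: the import machinery itself moves on to the next finder, so no manual chaining is needed.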
Kinda. But as you remember, my primary use case is Pycopy, and that doesn't have any "meta path finders" and similar mumbo-jumbo ;-). Back to a new meta path finder - that's how the original implementation works. The above cases shouldn't happen often, but that only aggravates the situation: when it does happen, an average user will have no idea why it works like that, and will end up just cursing CPython's import hooking system, which is a total mess. And I'm with the average user here ;-). That's why in my implementation it's different.
That's why I posted to https://mail.python.org/archives/list/python-ideas@python.org/thread/NIQQVA3OJFUHL3INBMBUBMTP24W74XEO/ , to get feedback like that. But I got none that was relevant, while you nailed it right on the spot at once ;-). Right, I agree. I actually wanted to implement it like that as a native Pycopy API, but before I even finished implementing it, it was already +0.5K of code size (x86 32-bit) in Pycopy, while the "call yourself recursively" impl was +250 bytes. As import hooking is an ad-hoc feature, I decided to go with the most minimal impl, and provide a "more streamlined API" as a separate Python module on top of it. So, now you know all the dirty details of my plan ;-).
That's exactly what I can't help much with, and what I'm wondering about myself too. So again, I'll be interested to see what you come up with. As you can see, I'm a bit behind on things, still fighting with the basics of how to do import hooking right, so I can't even offer a good discussion now. I would say, however, that I'd definitely start with existing transformation means, e.g. ast.NodeTransformer. If that appears to be too low-level/too cumbersome, then look for "utils" for common use cases (which would require enumerating what these common use cases are!). Another approach, "from the top", would be to see how to let the user define "macros". Reading existing proposals, e.g. https://www.python.org/dev/peps/pep-0638/ , might help with some ideas (including the bad ones - e.g. the "meta path import hooks" above go in the "bad ideas" department for me ;-) ).
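For readers unfamiliar with that starting point: `ast.NodeTransformer` subclasses rewrite an AST in place by overriding `visit_*` methods. A minimal sketch (the `SwapAddToMult` transform is invented here, not from either project):

```python
import ast

class SwapAddToMult(ast.NodeTransformer):
    """Toy transform: rewrite every `a + b` into `a * b`.
    Purely illustrative - no one actually wants this."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # transform children first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node

tree = ast.parse("result = 3 + 4")
tree = SwapAddToMult().visit(tree)
ast.fix_missing_locations(tree)  # new/moved nodes need line/col info
ns = {}
exec(compile(tree, "<example>", "exec"), ns)
print(ns["result"])  # 12
```

This is the "predefined AST" style of transformation: it can only rearrange nodes the stock parser already knows how to produce.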
I now even know how it will be called: Which returns us to
This works perfectly if you already have a predefined AST which can represent the syntax. But this is exactly what I don't want. I want extensions to actually be able to define new syntax structures, like PEP 634 pattern matching. This means I have to somehow let extensions define their own AST nodes and how to get them. The current 'parser' just generates a slightly structured version of a token list.
This is also a problem I will run into. My solution is to simply replace the entire path-based meta finder with my own that does the corresponding preprocessing required for this project. Once that is done, loading non-Python files at the point where the Python file would be loaded is of course not a problem. What you want is a step between meta hooks and import path hooks. (Note that your name is almost already taken, so be careful with naming.) While I do not agree that your solution is perfect for the more general CPython, it is probably the best for Pycopy.
Note that after rereading the import specs, for official Python you just have to subclass
Kinda the same for me, except I do much finer-grained monkey-patching. And you need to do monkey-patching, because importlib isn't really designed with extensibility in mind: it allows you to add some completely new handling, but fine-grained overriding of existing behavior is hardly supported. In my case, I don't want to add or replace any meta finders. I don't even want to replace any path finder. I just want to add a new loader to the existing finder. And that's more or less what I do in https://github.com/pfalcon/pycopy-lib/blob/master/cpython-pycopy/pycopy_imphook.py . If you have a better/cleaner idea how to do that, while preserving the proper import semantics as described above, I'd be interested to look.
How I design these things is: look for the behavior needed, then adjust for code size constraints (one of the big ideas is "what can be done in Python, should be done in Python"). I don't agree that these things are "more general" in CPython. They're clearly overdesigned (in a Java way), and on multiple occasions the CPython core developers spilled the beans that they made, and keep, these things complicated to preclude many people from using them. Which is exactly the opposite of our aim here (well, my aim for sure).
This now has been posted too: This effectively contains the same code (adjusted of course) as linked above for the CPython case - I decided to go this way instead of doing a layered implementation on top of
These seem to be two orthogonal matters; let's go over them one by one.
That's the reason why many people follow the approach of "munge with new syntax at the tokenizer level and convert it to existing syntax, letting the existing hardcoded parser do its job; then pick up the AST and do the 2nd part of the needed transformation (where the actual handling of new features is implemented)". And it doesn't seem that bad. Sure, it's a hack, and I (just like you, it seems) would prefer to have a subclassable parser, but that's a matter of tomorrow, while token-level hacking is a matter of today (and yesterday, i.e. it's fully backward compatible, and compatible across implementations). I personally made myself "feel people's pain" and tried to do various transformations at the token stream level. Again, there's definitely a feeling of "ugly hacks" present, but again, it's not that bad. Likely, just a good toolbox to deal with it is missing. After all, it's effectively parsing too - just not complete parsing of the entire token stream, like with a normal parser, but pattern matching to find interesting syntax pieces, then partial parsing of them. And you need to generate not an AST, as with a normal parser, but a token stream, just like the input. Well, that's what I wanted to ask - do you know good Python token stream manipulation libs/tools?
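A sketch of that token-level munging, using only the stdlib `tokenize` module. The `unless` keyword is a made-up extension for illustration; it gets rewritten into existing `if not` syntax so the stock parser can compile the result:

```python
import io
import tokenize

def translate(source):
    """Token-level munging: rewrite the hypothetical keyword `unless`
    into existing `if not` syntax, leaving everything else untouched."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string == "unless":
            # Emit two replacement tokens in place of one.
            result.append((tokenize.NAME, "if"))
            result.append((tokenize.NAME, "not"))
        else:
            result.append((tok.type, tok.string))
    # untokenize() accepts (type, string) 2-tuples and re-spaces the output.
    return tokenize.untokenize(result)

src = "unless x > 3:\n    y = 1\n"
print(translate(src))
```

The output spacing is approximate (2-tuple "compat" mode lets `untokenize` choose its own whitespace), but the result compiles, which is all a source-transforming import hook needs.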
My solution for import hooks is here: https://github.com com/MegaIng/syntax-extensions-base/blob/master/src/syntax_extensions/activate/import_hooks.py . It is similar to yours, except that it replaces the handling of
No. But I also don't want that. It doesn't really help with my case here. This is just replacing one hack with another hack which is more complicated, more error-prone, and a little more powerful. I want to actually create a subclassable parser, as you called it. And here are the different paths I had in mind:
I will probably go with Option 2, but that means that this project will be dormant for a while.
(Couldn't you use
Thanks. I trust your word that it's "similar", because to me it looks "sufficiently different". Which again just shows what a convoluted mess CPython's importlib is...
But did you know that if you want to override loading of .py files, it's literally 3 lines? Of course, 3 lines of monkey-patching.
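The actual 3-line patch isn't shown in the thread, but one plausible shape for such a monkey-patch is overriding `SourceFileLoader.source_to_code`, which the default machinery calls whenever a .py file actually gets compiled (a fresh .pyc cache would bypass it). Sketch only, not pfalcon's code; the `???` rewrite is a made-up "extension":

```python
import importlib.machinery

_orig = importlib.machinery.SourceFileLoader.source_to_code

def _patched(self, data, path, *args, **kwargs):
    # Hypothetical syntax extension: turn the non-Python token `???`
    # into `None` before the real compiler sees the source.
    if isinstance(data, bytes):
        data = data.replace(b"???", b"None")
    return _orig(self, data, path, *args, **kwargs)

# The monkey-patch itself: every subsequently compiled .py file
# now passes through _patched first.
importlib.machinery.SourceFileLoader.source_to_code = _patched
```

The essence really is three lines (save the original, define the wrapper, assign it back); everything else is the transformation itself.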
Good question. The apparent answer: "that module is not in my usual toolbox". Live and learn, they say (then forget the noise again - and I wonder if the "runpy" module falls under the category of "noise" ;-). We'll see.)
Same for me. I ran across this because I was wondering if there really isn't a somewhat simple way of writing a wrapper script (e.g. like
These lines do almost the same job, but not quite. They also use deprecated features, and might not work on PyPy. My solution should be within the Python spec and PyPy-compatible. (Your solution is also not within the CPython specs.)
We both have the same steps:
I also have a few extra features:
I see - this seems to be half-way between "partial token stream pseudo-parsing" and full-fledged recursive-descent parsing. In my experiments I also concluded that beyond very simple transformations (like replacing a single token with one or more others), the best way is to accumulate the whole line of tokens (until token.NEWLINE) and then match on it. You go beyond that and accumulate a whole "Module" of "Statements" of "Paren" expressions. Of course, that's still rather ad hoc; when you use real recursive descent, you don't need to do such buffering (unless you really need to do "unlimited" look-ahead).
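The line-accumulation trick (buffer tokens up to `token.NEWLINE`, then match on the whole logical line) reads roughly like this. Illustrative stdlib-only sketch, not code from either project:

```python
import io
import tokenize

def logical_lines(source):
    """Yield lists of tokens, one list per logical line, so a whole
    line can be pattern-matched at once. Note that NL tokens inside
    parentheses do NOT end a logical line - only NEWLINE does."""
    buf = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        buf.append(tok)
        if tok.type == tokenize.NEWLINE:
            yield buf
            buf = []
    if buf:
        yield buf  # trailing tokens (DEDENT, ENDMARKER, etc.)

for line in logical_lines("a = 1\nb = (2 +\n     3)\n"):
    print([t.string for t in line if t.string.strip()])
```

The second logical line comes out as one unit (`b = ( 2 + 3 )`) even though it spans two physical lines, which is exactly what makes line-level pattern matching workable.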
Oh, this reminds me of myself from 3 years ago: erezsh/plyplus#46 (that's another project from the same author as Lark). But as you can see, even then the topic was more like "Lark seems to do what I can't [so far], but can I do it my way nonetheless?" - myself being an adept of the sect of Witnesses of Recursive Descent; I'm sure you're familiar with such. The problem with EBNF is that simple things are (relatively) simple, and more complex things are usually impossible. And it's only "relatively" simple, because even if you have an EBNF concept in your head (everyone has), expressing it in the DSL of a particular tool is a separate task still. And there are many such tools, and there's no clear leader. All these problems don't apply to recursive descent - its primitive operations are well-known, and no matter what you call them, they're immediately visible. And simple things are even simpler than with EBNF (no DSL), and many more complex things range from "trivial" to "possible". So, since those noob Lark questions, I wrote my own recursive-descent parser for Python, as part of
Just to make sure that you have these options in the dark corner of your toolbox too as possible sources for Python parsers, in addition to Lark:
The last option would actually be the easiest way to get a Python parser which follows the latest CPython version (including whatever madness they may add to the language based on PEG NP-completeness).
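For contrast with the EBNF/DSL tools listed above, here is the recursive-descent style boiled down to a toy expression grammar. Nothing here is from pfalcon's actual parser; it only shows the well-known primitive operations (peek, expect, one method per grammar rule):

```python
import re

class ExprParser:
    """Minimal recursive-descent evaluator for + and * over integers.
    Grammar:  expr: term ('+' term)*   term: atom ('*' atom)*
              atom: NUMBER | '(' expr ')'"""

    def __init__(self, text):
        self.toks = re.findall(r"\d+|[+*()]", text)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def expect(self, tok):
        assert self.peek() == tok, f"expected {tok!r}, got {self.peek()!r}"
        self.pos += 1

    def parse_atom(self):
        if self.peek() == "(":
            self.expect("(")
            val = self.parse_expr()
            self.expect(")")
            return val
        val = int(self.peek())
        self.pos += 1
        return val

    def parse_term(self):
        val = self.parse_atom()
        while self.peek() == "*":
            self.expect("*")
            val *= self.parse_atom()
        return val

    def parse_expr(self):
        val = self.parse_term()
        while self.peek() == "+":
            self.expect("+")
            val += self.parse_term()
        return val

print(ExprParser("2+3*(4+1)").parse_expr())  # 17
```

Precedence falls directly out of which rule calls which; there is no DSL between the grammar in your head and the code.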
Yeah, I noticed that. That's an interesting (hmm, boring?) practical problem of macro-like processing:
@markshannon doesn't elaborate beyond that how exactly that "version" (versions!) is used in .pyc. Likewise, it's a mystery why it's an "integer version" and not just the automatic mtime of the file containing the macro definition. I'd say, let's wait and see until (C)Python scientific thought comes to the idea that a single .pyc may depend on multiple source files (or should that be "source modules"?), and that changes to them should be tracked similarly to how "the single source" of a .pyc is tracked now.
Only as long as the extensions/the library are in development. Afterwards, only changes in the source file should be relevant, since the extensions are listed in the file. That is why I made it an optional mode, to be used when developing, but deactivated when just using the extensions.
But sadly, this isn't enough. If I had to use such a preexisting tool, I would have to somehow generate the correct grammar (based on the extensions loaded in each module) plus a parser at runtime, then use it. The same goes for stuff like yacc, which is designed to generate a standalone parser, not to use dynamically generated grammars.
I have yet to run into any real-world example of a PL that has problems expressing itself in EBNF. I am also of the belief that if it can't, the language should change its specs, since it is probably too complex to understand anyway (I assume, of course, a smart lexer/postlexer like Lark has, to handle indentation/matching tags).
And I am currently a de-facto co-maintainer of Lark, so I might be biased as well. But I am still not sure if a half-manual recursive-descent implementation is really a bad idea. If I go that way, though, it will probably also be factored out into a separate package.
That's exactly my approach (and why I'm going to use just the 3-line monkey-patch). Why,
Sadly, no. Software comes in "versions". When you ship v2 of your macro module to users, it may generate significantly different code, and so any cached bytecode must be invalidated. The problem is real but, as I said, boring. Let's offload it to specially trained CPython developers. And if they fail, it's not rocket science to add a custom .pyc format either (make-style standalone ".d" files are even easier).
Yeah, but it is a bigger problem when developing an extension, because during that time there is a higher chance that the source files don't get modified. Also, it is probably possible to keep an additional set of files containing info about the versions of the macros and force regeneration based on that, so there's no need to mess with the original pycache format. On the other hand, it might also be possible to extend the default version information to include the macros.
That's what I'm talking about above - this problem has a well-known (human-readable) solution for C, where a particular .c file may depend on any number of .h files (recursively): https://stackoverflow.com/a/19114663/496009 . Of course, it's a bit less efficient than patching that info into the .pyc format.
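Translated to Python terms, the make/.d-style staleness check amounts to comparing the cache's mtime against every recorded dependency, not just the one source file. Sketch only; how the dependency list gets recorded in a sidecar ".d"-style file is left out:

```python
import os

def cache_is_fresh(pyc_path, source_path, dep_paths):
    """Make-style staleness check: cached bytecode is valid only if it is
    newer than the source *and* every macro/extension file it depended on
    (the dep list would come from a sidecar ".d"-style file written at
    compile time)."""
    if not os.path.exists(pyc_path):
        return False
    pyc_mtime = os.path.getmtime(pyc_path)
    for dep in [source_path, *dep_paths]:
        # Missing or equally-new deps are treated as stale, conservatively.
        if not os.path.exists(dep) or os.path.getmtime(dep) >= pyc_mtime:
            return False
    return True
```

This mirrors what make does with `gcc -MD`-generated .d files: one target, many tracked prerequisites, regenerate if any prerequisite is newer.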
I see. I have to admit that I read just the beginning of the ticket you mentioned, lark-parser/lark#803, and got the impression it talks about something like
+1, I always do it like that, and suggest everyone do the same. Remember how our conversation started? I suggested factoring out just the bare syntax-overriding package, and not putting any specific syntax hooks in the same package (those would go into separate package(s)). Or take my
Wanted to brag about a pretty major (for such a simple matter) screw-up I managed to make. The scheme above worked well for when we import something that is not Python source. But it's all different for Python modules, which have toplevel code that must be called with the module
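The Python-module case being described (toplevel code that must run with the module's own namespace, with the module visible in sys.modules before it runs) boils down to the standard import semantics sketched below. This is a generic illustration of those semantics, not the actual fix from either project:

```python
import sys
import types

def load_py_source(fullname, source):
    """Sketch of executing a Python module's toplevel code correctly:
    create the module object, register it in sys.modules *before* exec
    (so recursive imports see the partially initialized module), and run
    the code in the module's own __dict__."""
    mod = types.ModuleType(fullname)
    sys.modules[fullname] = mod  # must happen before exec
    try:
        exec(compile(source, f"<{fullname}>", "exec"), mod.__dict__)
    except BaseException:
        del sys.modules[fullname]  # roll back on failure, as CPython does
        raise
    return mod

m = load_py_source("demo_mod2", "value = 40 + 2\n")
print(m.value)  # 42
```

Getting any of these three details wrong (namespace, registration order, rollback) is exactly the kind of "simple matter" screw-up that only shows up once real Python modules, rather than data-like files, go through the hook.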
This sadly doesn't work either, since the central package needs to let the different extensions talk to each other (e.g. if one adds syntax for expressions and another adds syntax for a statement that uses expression syntax, they need to know of each other). This means I need to put a syntax hook in the base package, and this one needs to be a good one. The rest of the base package is essentially implementation details and can change a lot, but this part should be solid.
I also looked at your imphook after all, and I had also completely forgotten that that is required. Now you've already got two parameters to pass, and the CPython version just has a few more parameters (wrapped in a spec instance) to allow different parts to be a bit more general / to also support stuff like namespace packages.
@pfalcon
Just pinging you to make you aware of the fact that I decided to create a somewhat general system for creating extensions for python.
Currently, a lot is still TODO, but the (very) basic framework is in place:
Saving this as test.py, and either creating an "import syntax_extensions.activate; syntax_extensions.activate.activate_encoding()" entry in sitecustomize.py, or creating a main file that does that and then executes test, will work. (If it works, an additional line will be printed at the end.) What do you think? Is this more in line with your ideas?