A better idea. #1

Open
MegaIng opened this issue Dec 17, 2020 · 30 comments

Comments

@MegaIng
Owner

MegaIng commented Dec 17, 2020

@pfalcon

Just pinging you to make you aware that I decided to create a somewhat general system for creating extensions for Python.

Currently, a lot is still TODO, but the (very) basic framework is in place:

# -*- coding: syntax-extensions -*-

from __syntax_extensions__ import test_base

print(test_base)

Saving this as test.py and either adding an import syntax_extensions.activate; syntax_extensions.activate.activate_encoding() entry to sitecustomize.py, or creating a main file that does that and then executes test, will work. (If it works, an additional line will be printed at the end.)
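
For concreteness, a minimal sketch of the second option - a launcher script - assuming only the activate_encoding() entry point named above:

# main.py - hypothetical launcher script
import syntax_extensions.activate

syntax_extensions.activate.activate_encoding()  # register the codec hook

import test  # test.py is now decoded through the syntax-extensions codec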

What do you think? Is this more in line with your ideas?

@pfalcon

pfalcon commented Dec 19, 2020

Thanks for the ping! Looks interesting; I see that it provides some utilities for AST transformation, I'll need to look into that in more detail.

Is this more in line with your ideas?

Not exactly. You describe your package as: "Allows one to use/create custom syntax extensions for python as simple as possible." But I would like to do even more general things - being able, as easily as possible, to import absolutely anything into Python.

And I'm making headway with that idea. I hope you're familiar with https://github.com/dabeaz/bitey - it allows importing LLVM bitcode as if it were a Python module. So, here's a quick remake of that in my import framework: https://github.com/pfalcon/ubitey/blob/master/ubitey.py

An example of how it works with Pycopy:

$ ls -1
out.bc
ubitey.py

$ pycopy-dev
Pycopy v3.4.2-69-g43c9af91a on 2020-12-20; linux version
Use Ctrl-D to exit, Ctrl-E for paste mode

# Install import hook
>>> import ubitey

# Now just import LLVM bitcode module as if it was a Python module
>>> import out
# Debug output
; ModuleID = 'out.bc'
source_filename = "mod"

define i32 @sum(i32 %a, i32 %b) {
entry:
  %tmp = add i32 %a, %b
  ret i32 %tmp
}

# Call a function from the LLVM bitcode module
>>> out.sum(10, 20)
30
>>> 

As you can see, the hook is installed with:

_old_hook = sys.set_import_hook(import_llvm_bc, ("bc",))

The import hook itself is:

def import_llvm_bc(path):
    my_path = path + ".bc"
    if not os.path.isfile(my_path):
        return _old_hook(path) if _old_hook else None

    ... do the loading ...

Some time later, I'll be looking into custom-loading specifically .py files, and it will be interesting to see what kind of AST utils you come up with. But in my version, those would be in a separate package. Because, well, you don't need Python AST utils to load LLVM bitcode.

@MegaIng
Owner Author

MegaIng commented Dec 19, 2020

But isn't this also perfectly possible by adding a new meta path finder? My understanding was that you wanted to actually replace the import of normal modules, which is what this package is supposed to do.
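
A minimal sketch of what I mean (the finder here is illustrative and intentionally does nothing):

import sys
import importlib.abc

class ExtensionFinder(importlib.abc.MetaPathFinder):
    # A real finder would return a ModuleSpec for the modules it handles.
    def find_spec(self, fullname, path, target=None):
        return None  # None defers to the next finder on sys.meta_path

sys.meta_path.insert(0, ExtensionFinder())  # prepend so it runs first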

I would also strongly discourage an implementation that has to manually call down to the next import hook. That will only lead to problems.


Note that regarding AST utils, I am still trying to decide how to best allow modification of the AST:

  • The current system of each extension having a single transform interface doesn't work that well when more than one extension tries to work on one statement.
  • A more complex system, where each extension defines a collection of rules it implements for different categories of code (statement, expr, pattern, etc.), is on the other hand a lot of work for every single extension, even if it does trivial things.

@pfalcon

pfalcon commented Dec 20, 2020

But isn't this also perfectly possible by adding a new meta path finder?

Kinda. But as you remember, my primary use case is Pycopy, and that doesn't have any "meta path finders" and similar mumbo-jumbo ;-). sys.set_import_hook() is what I settled on as the "native" import hooking in Pycopy. A forward-compatible implementation for CPython exists (I need to clean it up and push; it'll be activated by import pycopy when run in CPython).

Back to a new meta path finder: that's how the original bitey implements it, yeah. But that's not equivalent to loading a module with a different extension instead of a .py module, because meta path hooks are executed in sequence. As a rough example, suppose you have audioop.bc in your current directory (the first entry on sys.path). What I want is that "import audioop" imports it. Instead, bitey imports the stdlib module. That happens because bitey appends its hook to the meta path. Now if it prepended its hook, import would err on the side of .bc (it could import a .bc later in sys.path in preference to a .py earlier in sys.path).

The above cases shouldn't happen often, but that only aggravates the situation, for when it does happen, an average user will have no idea why on earth it works like that, and will end up just cursing CPython's import hooking system, which is a total mess. And I'm with the average user here ;-). That's why in my sys.set_import_hook() implementation for CPython, I don't really install a new hook, but patch the existing hooks installed by CPython itself, so that a hooked file is imported at exactly the point where a corresponding .py file would be imported (but just before it, to allow overriding .py imports too).

@pfalcon

pfalcon commented Dec 20, 2020

I would also strongly discourage an implementation that has to manually call down to the next import hook. That will only lead to problems.

That's why I posted to https://mail.python.org/archives/list/python-ideas@python.org/thread/NIQQVA3OJFUHL3INBMBUBMTP24W74XEO/ - to get feedback like that. But I got none that was relevant, while you nailed it right on the spot at once ;-).

Right, I agree. I actually wanted to implement it like that as a native Pycopy API, but before I had even finished implementing it, it was already +0.5K of code size (x86 32-bit) for Pycopy, while this "call down recursively yourself" impl was +250 bytes. As import hooking is an ad hoc feature, I decided to go with the most minimal impl, and provide a "more streamlined API" as a separate Python module on top of it.

So, now you know all the dirty details of my plan ;-).

@pfalcon

pfalcon commented Dec 20, 2020

Note that regarding AST utils, I am still trying to decide how to best allow modification of the AST:

That's exactly what I can't help much with, and what I'm wondering about myself too. So again, I'll be interested to see what you come up with. As you can see, I'm a bit behind on things, still fighting with the basics of how to do import hooking right, so I can't even offer a good discussion now.

However, I could say that I would definitely start with existing transformation means, e.g. ast.NodeTransformer. If that appears to be too low-level/too cumbersome, then look for "utils" for common use cases (which would require enumerating what these common use cases are!).
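
For illustration, a minimal ast.NodeTransformer of the kind I mean as the baseline (the transformation itself - doubling integer literals - is of course arbitrary):

import ast

class DoubleInts(ast.NodeTransformer):
    def visit_Constant(self, node):
        # Rewrite integer constants; leave bools and everything else alone.
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            return ast.copy_location(ast.Constant(node.value * 2), node)
        return node

tree = ast.parse("x = 2 + 3")
tree = ast.fix_missing_locations(DoubleInts().visit(tree))
exec(compile(tree, "<ast>", "exec"))  # binds x to 10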

Another approach, "from the top", would be to see how to let the user define "macros". Reading existing proposals, e.g. https://www.python.org/dev/peps/pep-0638/, might help with some ideas (including the bad ones - e.g. the "meta path import hooks" above go into the "bad ideas" department for me ;-) ).

@pfalcon

pfalcon commented Dec 20, 2020

and provide a "more streamlined API" as a separate Python module on top of it.

I now even know what it will be called: imphook. It's hard to believe that name wasn't taken, but there it is.

Which brings us back to sys.set_import_hook(). It's named sanely, set_import_hook(), because initially I didn't target it at the sys module, but at something like pycopy.set_import_hook(). But as I went bold with monkey-patching sys, I can just as well follow its own conventions (settrace() and friends) and go with sys.setimphook(). That saves a whole 5 bytes on the Pycopy side too!

@MegaIng
Owner Author

MegaIng commented Dec 20, 2020

However, I could say that I would definitely start with existing transformation means, e.g. ast.NodeTransformer.

This works perfectly if you already have a predefined AST which can represent the syntax. But that is exactly what I don't want. I want extensions to actually be able to define new syntax structures, like PEP 634 pattern matching. This means I have to somehow let extensions define their own AST nodes and how to get them. The current 'parser' just generates a slightly structured version of a token list.

@MegaIng
Owner Author

MegaIng commented Dec 20, 2020

That happens because bitey appends its hook to the meta path

This is also a problem I will run into. My solution is to simply replace the entire path-based meta finder with my own, which does the preprocessing required for this project. Once that is done, loading non-Python files at the same point where the Python file would be loaded is of course not a problem.

What you want is a step between meta hooks and import path hooks. (Note that your name is almost taken already, so be careful with naming.) While I do not agree that your solution is perfect for the more general CPython, it is probably the best for Pycopy.

@MegaIng
Owner Author

MegaIng commented Dec 20, 2020

Note, after rereading the import specs: for official Python you just have to subclass FileFinder and replace the corresponding entry in sys.path_hooks. This actually makes it relatively easy.
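
A minimal sketch of that approach (the loader body is illustrative, and for brevity this version prepends a new hook rather than replacing the existing FileFinder entry):

import sys
import importlib.machinery

class TransformingLoader(importlib.machinery.SourceFileLoader):
    def source_to_code(self, data, path):
        # A real implementation would transform `data` (the source bytes)
        # here before compiling it.
        return super().source_to_code(data, path)

loader_details = [(TransformingLoader, importlib.machinery.SOURCE_SUFFIXES)]
sys.path_hooks.insert(0, importlib.machinery.FileFinder.path_hook(*loader_details))
sys.path_importer_cache.clear()  # drop cached finders so the new hook applies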

@pfalcon

pfalcon commented Dec 25, 2020

This is also a problem I will run into. My solution is to simply replace the entire path-based meta finder with my own, which does the preprocessing required for this project.

Note, after rereading the import specs: for official Python you just have to subclass FileFinder and replace the corresponding entry in sys.path_hooks. This actually makes it relatively easy.

Kinda the same for me, except I do much finer-grained monkey-patching. And you need to do monkey-patching, because importlib isn't really designed with extensibility in mind: it allows you to add some completely new handling, but fine-grained overriding of existing behavior is hardly supported. In my case, I don't want to add or replace any meta finders. I don't even want to replace any path finder. I just want to add a new loader to the existing finder. And that's more or less what I do in https://github.com/pfalcon/pycopy-lib/blob/master/cpython-pycopy/pycopy_imphook.py . If you have a better/cleaner idea of how to do that, while preserving the proper import semantics as described above, I'd be interested to look.

While I do not agree that your solution is perfect for the more general CPython, it is probably the best for Pycopy.

How I design these things is: look for the behavior needed, then adjust for code size constraints (one of the big ideas is "what can be done in Python, should be done in Python"). I don't agree that these things are "more general" in CPython. They're clearly overdesigned (in a Java way), and on multiple occasions the CPython core developers spilled the beans that they make, and keep, these things complicated on purpose, to preclude many people from using them. Which is exactly the opposite of our aim here (well, my aim for sure).

@pfalcon

pfalcon commented Dec 25, 2020

As import hooking is an ad hoc feature, I decided to go with the most minimal impl, and provide a "more streamlined API" as a separate Python module on top of it.
So, now you know all the dirty details of my plan ;-).

This now has been posted too: https://github.com/pfalcon/python-imphook

This effectively contains the same code (adjusted of course) as linked above for the CPython case - I decided to go this way instead of doing a layered implementation on top of setimphook() (which is still used for the Pycopy case, of course), to benefit from the import stat cache that CPython already bloats its (well, the user's) memory with. In other words, in the CPython case, there's no need to do os.path.exists().

@pfalcon

pfalcon commented Dec 25, 2020

However, I could say that I would definitely start with existing transformation means, e.g. ast.NodeTransformer.

This works perfectly if you already have a predefined AST which can represent the syntax. But that is exactly what I don't want. I want extensions to actually be able to define new syntax structures, like PEP 634 pattern matching. This means I have to somehow let extensions define their own AST nodes and how to get them. The current 'parser' just generates a slightly structured version of a token list.

These seem to be two orthogonal matters; let's go over them one by one.

  1. It's a matter of fact that ASTs in Python are represented by ast.AST, and the standard way to transform them is using ast.NodeTransformer. So that should be the baseline of what is supported/recommended. I can give MacroPy as a counter-example - it pushes its "walkers" in your face, which only adds cognitive load when learning it (and learning just the bare ast module is quite a load already - again, the committee purposely keeps it hard, to deter people from using it). Note that it's not that you can't use ast.NodeTransformer with it - they're the same AST trees, after all. Just the fact that MacroPy doesn't start with it (or even mention it at all) makes the learning curve steeper and gives the feeling that the "thing is disconnected from the rest of the ecosystem". (The reason MacroPy has it that way is probably that it predates ast.NodeTransformer.) So, to emphasize the point again - you certainly can do better than ast.NodeTransformer (I'm looking for Scheme-style quasiquotes, for example); it's just what one should start with.

  2. Back to your "This works perfectly if you already have a predefined AST which can represent the syntax. But that is exactly what I don't want. I want extensions to actually be able to define new syntax structures, like PEP 634 pattern matching." Well, everyone wants that. But to do that, you need to implement your own parser. No, that puts it wrongly: you need to have an existing parser which you can easily extend to handle your constructs. Do you have such a parser? I, for one, have my own Python parser (part of Pycopy's ast module implementation), but I haven't yet tried to make it extensible (subclassable/overridable). You can find other pure-Python Python parsers, but the question is the same - do you need to define a complete new grammar for them and generate a complete parser, or can you subclass them to define handlers just for the syntactic elements which interest you?

That's the reason why many people follow "munge the new syntax at the tokenizer level and convert it to existing syntax, to let the existing hardcoded parser do its job, then pick up the AST and do the second part of the needed transformation (where the actual handling of the new features is implemented)".

And it doesn't seem that bad. Sure, it's a hack, and I (just like you, it seems) would prefer to have a subclassable parser, but that's a matter of tomorrow, while token-level hacking is a matter of today (and yesterday, i.e. it's fully backward compatible, and compatible across impls). I personally made myself "feel people's pain" and tried to do various transformations at the token stream level. Again, there's definitely a feeling of "ugly hacks" present, but again, it's not that bad. Likely, just a good toolbox for dealing with it is missing. After all, it's effectively parsing too - just not complete parsing of the entire token stream, as with a normal parser, but pattern matching to find interesting syntax pieces, then partial parsing of them. And you need to generate not an AST, as with a normal parser, but a token stream, just like the input.
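
To illustrate the kind of token-level munging I mean, a toy transform using just the stdlib tokenize module (the made-up ANSWER rewrite is of course arbitrary):

import io
import tokenize

def expand_answer(source):
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Replace the NAME token "ANSWER" with a NUMBER token "42".
        if tok.type == tokenize.NAME and tok.string == "ANSWER":
            tok = tok._replace(type=tokenize.NUMBER, string="42")
        result.append(tok)
    return tokenize.untokenize(result)

print(expand_answer("x = ANSWER + 1"))  # -> x = 42 + 1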

Well, that's what I wanted to ask - do you know good Python token stream manipulation libs/tools?

@MegaIng
Owner Author

MegaIng commented Dec 25, 2020

My solution for import hooks is here: https://github.com/MegaIng/syntax-extensions-base/blob/master/src/syntax_extensions/activate/import_hooks.py . It is similar to yours, except that it replaces the handling of .py files and that I decided to use a different method to find the old FileFinder.

@MegaIng
Owner Author

MegaIng commented Dec 25, 2020

Well, that's what I wanted to ask - do you know good Python token stream manipulation libs/tools?

No. But I also don't want that. It doesn't really help with my case here. It is just replacing one hack with another hack that is more complicated, more error-prone and a little more powerful. I want to actually create a subclassable parser, as you called it. Here are the different paths I have in mind:

  • Use the preparser I have so far and add a new .base.parser module that allows one to analyze a specific line into a real AST, containing whatever extensions want to add as nodes. This is, I believe, the cleanest solution, but it is a lot of work: to work correctly, each extension has to manually define every single syntax element it adds (and I also have to do this for the base Python grammar in a very verbose way). This is what is currently half implemented on my PC, but it really is a lot of work.
  • Use lark, implement these pending features (lark-parser/lark#803) and also add a decent plugin system to lark. This is probably the better solution, since it allows extension writers to just use normal EBNF and a simple Transformer to modify the tree however they want (there is also an almost-finished Python 3 grammar). It does, however, add a dependency, so it's probably not an option for you. I will also need to invest at least a few days in implementing these features.

I will probably go with option 2, but that means that this project will be dormant for a moment.

@MegaIng
Owner Author

MegaIng commented Dec 25, 2020

This now has been posted too: https://github.com/pfalcon/python-imphook

(Couldn't you use runpy to execute the script?)

@pfalcon

pfalcon commented Dec 25, 2020

My solution for import hooks is here: https://github.com/MegaIng/syntax-extensions-base/blob/master/src/syntax_extensions/activate/import_hooks.py . It is similar to yours,

Thanks. I trust your word that it's "similar", because to me it looks "sufficiently different". Which again just shows what a convoluted mess CPython's importlib is...

except that it replaces the handling of .py files and that I decided to use a different method to find the old FileFinder.

But do you know that if you want to override the loading of .py files, it's literally 3 lines? Of course, 3 lines of monkey-patching. A CPython core developer (update: no, the guy is not a core developer, just an advanced user) spills the beans on how the CPython committee runs itself off its legs to keep importlib usage complex for mere humans: https://mail.python.org/pipermail/python-ideas/2016-January/038135.html . That's total nonsense. Just imagine it were the same with another language. Let's take C++ and the Boost libs as an example: imagine that you could use them, but first you had to apply patches to their headers. It's of course cool that CPython allows dynamic patching, but that's not normal programming practice, and not a public interface. In particular, if enough people use it, they'll purposely break it, just you wait.
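
For reference, a sketch of the kind of monkey-patch being alluded to (not the exact lines from the linked post; transform() is a hypothetical source transformer):

import importlib.machinery

def transform(data):
    return data  # hypothetical: rewrite the source bytes here

_orig = importlib.machinery.SourceFileLoader.source_to_code
importlib.machinery.SourceFileLoader.source_to_code = \
    lambda self, data, path, *a, **kw: _orig(self, transform(data), path, *a, **kw)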

@pfalcon

pfalcon commented Dec 25, 2020

(Couldn't you use runpy to execute the script?)

Good question. The apparent answer is "that module is not in my usual toolbox". Live and learn, they say (then forget the noise again - and I wonder if the runpy module falls under the category of "noise" ;-). We'll see.)

@MegaIng
Owner Author

MegaIng commented Dec 25, 2020

The apparent answer is "that module is not in my usual toolbox".

Same for me. I ran across it because I was wondering whether there really isn't a somewhat simple way of writing a wrapper script (e.g. like coverage.py does it), and only then did I discover this stdlib module.
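
E.g., a wrapper script along those lines can be as short as (the hook import and target name are illustrative, reusing the entry point from the first comment):

import runpy
import syntax_extensions.activate

syntax_extensions.activate.activate_encoding()   # install our hooks first
runpy.run_path("test.py", run_name="__main__")   # then run the target script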

@MegaIng
Owner Author

MegaIng commented Dec 25, 2020

Of course, 3 lines of monkey-patching

These lines do almost the same job, but not quite. They also use deprecated features, and might not work on PyPy. My solution should be within the Python spec and PyPy-compatible. (Your solution is also not within the CPython specs.)

I trust your word that it's "similar"

We both have the same steps:

  • Create a Loader (here are the major differences, because our loaders do different jobs)
  • Use FileFinder.path_hook with a list of loaders to use (here I inlined _get_supported_file_loaders because I need to replace an entry)
  • Find the old FileFinder entry in path_hooks. (Here you do a type check, which is somewhat inefficient since you always instantiate the loaders, as the type itself can never be an entry, whereas I am not sure it is guaranteed that the name will be the same)
  • Then we both replace the entry (you with pop and insert, I with []=)
  • Then we clear the cache (which, for example, is missing from the 3-line hack)

I also have a few extra features: no_cache, preventing the usage of __pycache__, and exclude_path, preventing the overriding of some paths. The latter is important for me since I need to not handle the stdlib, so performance doesn't drop too far.

@pfalcon

pfalcon commented Dec 25, 2020

  1. Use the preparser I have so far and add a new .base.parser

I see - this seems to be halfway between "partial token stream pseudo-parsing" and full-fledged recursive-descent parsing. In my experiments I also concluded that beyond very simple transformations (like replacing a single token with one or more others), the best way is to accumulate a whole line of tokens (until token.NEWLINE) and then match on it. You go beyond that and accumulate a whole "Module" of "Statements" of "Paren" expressions. Of course, that's still rather ad hoc; when you use real recursive descent, you don't need to do such buffering (unless you really need "unlimited" look-ahead).

  2. Use lark, implement these pending features and also add a decent plugin system to lark. This is probably the better solution, since it allows extension writers to just use normal EBNF and a simple Transformer

Oh, this reminds me of myself from 3 years ago: erezsh/plyplus#46 (that's another project from the same author as lark). But as you can see, even then the topic was more like "Lark seems to do what I can't [so far], but can I do it my way nonetheless?" I myself am an adept of the sect of the Witnesses of Recursive Descent; I'm sure you're familiar with such. The problem with EBNF is that simple things are (relatively) simple, and more complex things are usually impossible. And it's only "relatively" simple, because even if you have an EBNF concept in your head (everyone has), expressing it in the DSL of a particular tool is still a separate task. And there are many such tools, with no clear leader (well, yacc is the clear leader; I wonder how many of the other tools follow its syntax closely enough. My bet is "not many", as there's always an itch to "improve" some trivial aspect, like making the syntax different - they think "better" ;-) ).

None of these problems apply to recursive descent - its primitive operations are well-known, and no matter what you call them, they're immediately visible. Simple things are even simpler than with EBNF (no DSL), and many more complex things range from "trivial" to "possible". So, since those noob lark questions, I wrote my own recursive-descent parser for Python, as part of the ast.parse() implementation for Pycopy: https://github.com/pfalcon/pycopy-lib/blob/master/ast/ast/parser.py . Well, to not grow too rusty, and to learn something new, I implemented a Pratt parser for expressions (including stuff like lambda and foo if bar else baz, which is a notch above trivial).
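
To show how little machinery Pratt parsing needs, here's a toy version reduced to +, * and parentheses (illustrative only; a real parser is of course far more complete):

import re

BINDING = {"+": 10, "*": 20}  # binding powers: * binds tighter than +

def tokenize(s):
    return re.findall(r"\d+|[+*()]", s) + ["<end>"]

def parse(tokens, pos=0, min_bp=0):
    tok = tokens[pos]
    pos += 1
    if tok == "(":
        left, pos = parse(tokens, pos, 0)
        pos += 1  # skip the closing ")"
    else:
        left = int(tok)
    # Keep consuming operators that bind tighter than our context.
    while BINDING.get(tokens[pos], 0) > min_bp:
        op = tokens[pos]
        right, pos = parse(tokens, pos + 1, BINDING[op])
        left = (op, left, right)
    return left, pos

print(parse(tokenize("1+2*(3+4)"))[0])  # -> ('+', 1, ('*', 2, ('+', 3, 4)))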

I will probably go with option 2, but that means that this project will be dormant for a moment.

Just to make sure that you have these options in the dark corner of your toolbox too, as possible sources of Python parsers in addition to Lark:

The last option would actually be the easiest way to get a Python parser which follows the latest CPython version (including whatever madness they may add to the language based on PEG NP-completeness).

@pfalcon

pfalcon commented Dec 25, 2020

I also have a few extra features: no_cache, preventing the usage of __pycache__

Yeah, I noticed that. That's an interesting (hmm, boring?) practical problem of macro-like processing:

  • Quoting MacroPy:

Although PycExporter automatically does the recompilation of the macro-expanded files when they are modified, it notably does not do recompilation of the macro-expanded files when the macros are modified. (https://macropy3.readthedocs.io/en/latest/export_code.html)

  • Quoting PEP 638:

version is used to track versions of macros, so that generated bytecodes can be correctly cached. It must be an integer.

@markshannon doesn't elaborate beyond that on how exactly that "version" (versions!) is used in .pyc. Likewise, it's a mystery why it's an "integer version" and not just an automatic mtime of the file containing the macro definition. I'd say, let's wait and see until (C)Python scientific thought comes to the idea that a single .pyc may depend on multiple source files (or should that be "source modules"?), and changes to them should be tracked similarly to how "the single source" of a .pyc is tracked now.

@MegaIng
Owner Author

MegaIng commented Dec 25, 2020

That's an interesting (hmm, boring?) practical problem of macro-like processing.

Only as long as the extensions/the library is in development. Afterwards, only changes in the source file should be relevant, since the extensions are listed in the file. That is why I made it an optional mode, to be used when developing, but deactivated when just using the extensions.

@MegaIng
Owner Author

MegaIng commented Dec 25, 2020

The last option would actually be the easiest way to get a Python parser which follows the latest CPython version

But sadly, this isn't enough. If I had to use such a preexisting tool, I would have to somehow generate the correct grammar (based on the extensions loaded in each module) plus a parser at runtime, and then use it. The same goes for tools like yacc, which are designed to generate a standalone parser, not to work with dynamically generated grammars.

The problem with EBNF is that simple things are (relatively) simple, and more complex things are usually impossible

I have yet to run into a real-world example of a PL that has problems expressing itself in EBNF. I am also of the belief that if it can't, the language should change its specs, since it is probably too complex to understand anyway (I assume, of course, a smart lexer/postlexer like Lark has, to handle indentation/matching tags).

I myself am an adept of the sect of the Witnesses of Recursive Descent

And I am currently a de facto co-maintainer of Lark, so I might be biased as well.

But I am still not sure that a half-manual recursive descent implementation is really a bad idea. If I go that way, though, it will probably also be factored out into a separate package.

@pfalcon

pfalcon commented Dec 25, 2020

Only as long as the extensions/the library is in development.

That's exactly my approach (and why I'm going to use just the 3-line monkey-patch). Why, rm -rf __pycache__ is everyone's best friend!

Afterwards ...

Sadly, no. Software comes in "versions". When you ship v2 of your macro module to users, it may generate significantly different code, and so any cached bytecode must be invalidated. The problem is real but, as I said, boring. Let's offload it to the specially trained CPython developers. And if they fail, it's not rocket science to add a custom .pyc format either (make-style standalone ".d" files are even easier).

@MegaIng
Owner Author

MegaIng commented Dec 25, 2020

Software comes in "versions". When you ship v2 of your macro module to users, it may generate significantly different code

Yeah, but it is a bigger problem while developing an extension, because during that time there is a higher chance that the source files don't get modified. Also, it is probably possible to keep an additional set of files containing info about the versions of the macros and to force regeneration based on that, so there's no need to mess with the original __pycache__ format. On the other hand, it might also be possible to extend the default version information to include the macros.
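
A sketch of that idea based on mtimes (file layout and names are illustrative):

import os

def pyc_is_stale(pyc_path, source_path, macro_dep_paths):
    # Regenerate if the .pyc is missing, or if the source or any macro
    # module it depends on is newer than the cached bytecode.
    if not os.path.exists(pyc_path):
        return True
    pyc_mtime = os.path.getmtime(pyc_path)
    return any(os.path.getmtime(p) > pyc_mtime
               for p in [source_path, *macro_dep_paths])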

@pfalcon

pfalcon commented Dec 25, 2020

Also, it is probably possible to keep an additional set of files containing info about the versions of the macros and to force regeneration based on that, so there's no need to mess with the original __pycache__ format.

That's what I'm talking about above - this problem has a well-known (human-readable) solution for C, where a particular .c file may depend on any number of .h files (recursively): https://stackoverflow.com/a/19114663/496009 .

Of course, it's a bit less efficient than patching that info into the .pyc format.

@pfalcon

pfalcon commented Dec 26, 2020

But sadly, this isn't enough. If I had to use such a preexisting tool, I would have to somehow generate the correct grammar (based on the extensions loaded in each module) plus a parser at runtime, and then use it. The same goes for tools like yacc, which are designed to generate a standalone parser, not to work with dynamically generated grammars.

I see. I have to admit that I read just the beginning of the ticket you mentioned, lark-parser/lark#803, and got the impression it talks about something like #include for grammar files. I missed that you're talking about actually patching an already loaded grammar with new "grammar fragment" objects. Well, that sounds pretty cool. Again, I myself still wouldn't do it this way, but it's inspiring to know someone is working on such pretty advanced topics.

But I am still not sure that a half-manual recursive descent implementation is really a bad idea. If I go that way, though, it will probably also be factored out into a separate package.

+1, I always do it like that, and suggest everyone do the same. Remember how our conversation started? I suggested factoring out just the bare syntax-overriding package, and not putting any specific syntax hooks in the same package (those would go into separate package(s)).

Or take my ast package implementation as an example: https://github.com/pfalcon/pycopy-lib/tree/master/ast/ast . To start with, while it's part of the Pycopy stdlib, it's not Pycopy-specific - I run its tests against CPython too. Then you can see that it's neatly structured into submodules: types (autogenerated from Python.asdl); __init__, which contains utilities like NodeVisitor (I could really put those in utils and keep __init__ solely as a package import-all, but decided not to go the breadcrumb way here); and what we're talking about: parser. My parser is also extensible by design; e.g. it's very obvious how to add a new simple statement: just subclass parser.Parser and override match_small_stmt() (https://github.com/pfalcon/pycopy-lib/blob/a2a344d1160f5b2a09ea9403463eaf2d33a9fa06/ast/ast/parser.py#L748), matching whatever you want, and if it didn't match, call the super method. It still requires an extra pass to make it truly flexible. E.g., if you look at match_compound_stmt() (https://github.com/pfalcon/pycopy-lib/blob/a2a344d1160f5b2a09ea9403463eaf2d33a9fa06/ast/ast/parser.py#L826), it starts by matching decorators. Of course, that should be factored out into a separate method, to allow overriding just the part you want.
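
A sketch of that extension pattern (everything here except parser.Parser and match_small_stmt() is hypothetical):

from ast import parser  # pycopy-lib's ast package, as linked above

class MyParser(parser.Parser):
    def match_small_stmt(self):
        # Try to match our hypothetical new statement first...
        stmt = self.try_match_my_stmt()  # try_match_my_stmt is made up
        if stmt is not None:
            return stmt
        # ...and fall back to the stock grammar otherwise.
        return super().match_small_stmt()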

@pfalcon

pfalcon commented Dec 26, 2020

I wanted to brag about a pretty major (for such a simple matter) screw-up I managed to make with imphook (I like finding mistakes in other people's stuff, so I should enjoy doing the same in my own). There was a push to "pass as little as possible" to hooks, to keep things simple; that's why I decided to pass only a file path to hooks, and not a module name. "We can set the module name (__name__) afterwards", I thought (and said in the docs). But I have to admit there was another side to the story - the module name was not readily available where I call the hook in the Pycopy C code.

And the scheme above worked well when importing something other than Python source. But it's all different for Python modules, which have top-level code that must be run with the module's __name__ already set to the right value (because the code may check it), so we can't "set it afterwards". Fixing my crap...

@MegaIng
Owner Author

MegaIng commented Dec 26, 2020

not putting any specific syntax hooks in the same package

This sadly doesn't work either, since the central package needs to let the different extensions talk to each other (e.g. if one adds syntax for expressions and another adds syntax for a statement that uses expression syntax, they need to know of each other). This means I need to put a syntax hook in the base package, and this one needs to be a good one. The rest of the base package is essentially implementation details and can change a lot, but this part should be solid.

@MegaIng
Copy link
Owner Author

MegaIng commented Dec 26, 2020

and not a module name

I looked at your imphook after all, and I had also completely forgotten that this is required. So now you've already got two parameters to pass, and the CPython version just has a few more (wrapped in a spec instance) to allow different parts to be a bit more general and to also support things like namespace packages.
