Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index node finder #64

Draft
wants to merge 12 commits into
base: master
Choose a base branch
from
Draft

Conversation

15r10nk
Copy link
Collaborator

@15r10nk 15r10nk commented Dec 18, 2022

This is still work in progress.

The goal is an NodeFinder with similar capabilities like PositionNodeFinder, but for cpython <= 3.10.

see #42

todo:

@15r10nk 15r10nk force-pushed the index_node_finder branch 5 times, most recently from f148326 to 554504f Compare December 20, 2022 20:45
@15r10nk 15r10nk self-assigned this Jan 26, 2023
@15r10nk 15r10nk force-pushed the index_node_finder branch 2 times, most recently from 28a88ea to 637521e Compare January 26, 2023 21:42
@15r10nk 15r10nk force-pushed the index_node_finder branch 2 times, most recently from 8ad604b to 0a22823 Compare February 14, 2023 22:42
@15r10nk
Copy link
Collaborator Author

15r10nk commented Feb 19, 2023

hi @alexmojaki. I have some good news for you. The new NodeFinder is near completion and works so far, but I need some more time to finish and to optimize it.

Here you can get some impression what it can do:

https://github.com/15r10nk/executing/blob/index_node_finder/Features.md

This list gets generated by pytest and probably still misses some features.

@alexmojaki
Copy link
Owner

We're going to use executing in https://github.com/pydantic/logfire (btw you can see inline-snapshot being used there). But there are cases where logfire rewrites the AST and this prevents executing from working in Python<3.11. If this branch works better in the presence of pytest then I expect it to work better in our case as well.

It sounded like this was going well, can it be revived?

@15r10nk
Copy link
Collaborator Author

15r10nk commented May 8, 2024

cool that inline-snapshot is used in logfire 😄

If this branch works better in the presence of pytest then I expect it to work better in our case as well.

kind of. This branch works not out of the box with pytest. I had to reapply the pytest changes on top of the original ast to make the mapping work. We would have to do the same with logfire / any other library which modifies the ast (like lazy-imports-lite for example).

It sounded like this was going well, can it be revived?

Yes, the concept worked, but I would merge it with the current main and add some comments to make it easier to understand before you review it.

I have also a general question to ast-rewriting with a custom MetaPathFinder.
I see the concept in several libraries (pytest/logfire/lazy-import-lite/ and probably many others). Is there any concept which would allow to combine multiple ast transformations? Like a library which installs one MetaPathFinder and calls multiple "plugins" which transform the ast before it gets converted into bytecode?

The auto_trace feature of logfire can currently not be used in combination with assertion rewriting of pytest for example. And I'm also not sure which rewriting would be used.

@alexmojaki
Copy link
Owner

kind of. This branch works not out of the box with pytest. I had to reapply the pytest changes on top of the original ast to make the mapping work.

Wouldn't that work equally well with the other node finders? It sounds like this branch doesn't actually help with pytest.

Would this branch work better when the statement being executed hasn't been modified, i.e. all the modifications are 'around' the executed statement?

I have also a general question to ast-rewriting with a custom MetaPathFinder.
I see the concept in several libraries (pytest/logfire/lazy-import-lite/ and probably many others). Is there any concept which would allow to combine multiple ast transformations? Like a library which installs one MetaPathFinder and calls multiple "plugins" which transform the ast before it gets converted into bytecode?

I have also thought such a thing would be nice. I imagine it doesn't exist because AST rewriting isn't common enough that people are bothered by them not cooperating.

Another library that does this is birdseye.

I could make executing work with auto-tracing if I did the AST transformation on the tree of an executing.Source. The problem is that @logfire.instrument also uses AST rewriting which depends on arguments, meaning that there is no longer a single Source associated with a file.

@15r10nk
Copy link
Collaborator Author

15r10nk commented May 8, 2024

Wouldn't that work equally well with the other node finders? It sounds like this branch doesn't actually help with pytest.

This branch has almost the same functionality like the PositionNodeFinder.
But I had to introduce a dependency to pytest to make it work.

https://github.com/15r10nk/executing/blob/6e2d947ae80960f5efc432bef358bf7b70526157/executing/_index_node_finder.py#L500

We would have to add something similar for logfire.

The problem is that @logfire.instrument also uses AST rewriting which depends on arguments, meaning that there is no longer a single Source associated with a file.

I hope that I'm correct here, but this should not be a problem. logfire could memorize every ast it has created for a source file and we could check each of them for a matching bytecode.

@15r10nk
Copy link
Collaborator Author

15r10nk commented May 8, 2024

Would this branch work better when the statement being executed hasn't been modified, i.e. all the modifications are 'around' the executed statement?

A modification at root level messes with the bytecode structure and the algo can not map the bytecodes (and child bytecodes) any more.
It could handle changes inside functions and be able to map expressions outside the changed functions, but pytest adds imports at the root level.

@alexmojaki
Copy link
Owner

Ah, after rereading #31 (comment) I see that this method requires the bytecode to match the source code exactly, my mistake for thinking otherwise.

I could store a Source for each modified code object produced by instrument, but at that point it's getting messy. I was hoping that there was a better solution here.

@15r10nk
Copy link
Collaborator Author

15r10nk commented May 8, 2024

I could store a Source for each modified code object produced by instrument, but at that point it's getting messy. I was hoping that there was a better solution here.

It might be enough to store the args. You should be able to generate the Source with this args and the original ast.

I was thinking about something like this:

@executing.register_ast_provider
def get_logfire_ast(path,original_ast):
   for args in all_args_for_file(path):
       yield transform(original_ast,args)

Could this also be used to make the SentinelNodeFinder work?

@15r10nk
Copy link
Collaborator Author

15r10nk commented May 8, 2024

Can you tell me why you need the args to generate the code?
Are there cases where you have to generate different code for different args?

I read some source of logfire but I did not understood why instrument should generate different code for different args.

@alexmojaki
Copy link
Owner

Are there cases where you have to generate different code for different args?

https://github.com/pydantic/logfire/blob/8ce1cfe31208e5dc87f35221ef0d7eb2401348e4/logfire/_internal/instrument.py#L129

It might be possible to always generate the same code every time, but probably with some small performance cost.

I can consider various approaches to solving this problem if I get to it. For now I'm just going to only fully support this feature in 3.11+.

@15r10nk
Copy link
Collaborator Author

15r10nk commented May 8, 2024

It is just an idea and I probably don't know enough about logfire, but you could generate something like this:

@logfire.instrument
def some_function(arg_a,arg_b,__logfire_args__):
    if __logfire_args__.extract_args:
        __logfire_args__.method({"arg_a":arg_a,"arg_b":arg_b})
    ...

The instrument decorator should pass the __logfire_args__ at runtime.

This would also save some time if you have multiple instrumented functions, because you would transform the source only once.

But you have probably similar ideas.

It is ok for me if this branch does not get merged.
The new algo is nice but quiet complex. We can keep the current implementation if we don't need more features.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants