Collect submodules without follow imports in them. #2384

Closed
byehack opened this issue Aug 12, 2023 · 14 comments · May be fixed by #2383

@byehack
Contributor

byehack commented Aug 12, 2023

In standalone mode, it is currently impossible to add a package folder manually once part of that package has been included in the final executable.
What I actually need is a --collect-submodules option that includes a package's other module files without checking them for imports. What --include-package currently does is also include whatever is imported inside the submodules.

This would prevent ending up with multiple import search paths for one package (one in the binary and one in the directory):

Package A:
    a.py -> no depends
    b.py -> depends on large package B

Here app.py only imports A.a, so Nuitka only collects that module.
In the future, app.py would also import A.b dynamically.
If you create a folder A and put b.py into it in the executable directory, you still cannot import it, because the namespace A already exists inside the executable (the first path in sys.path).
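
The shadowing described above can be reproduced with plain CPython, without Nuitka involved at all. The following is a minimal sketch under stated assumptions: the directory names `bundled` and `app_dir` and the helper functions are hypothetical, and `bundled` merely stands in for the modules compiled into the executable. It is not a statement about how Nuitka's own loader is implemented, only an illustration of why the first location that provides package A decides where A.b is searched.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_layout(root: Path) -> tuple[Path, Path]:
    """Create two import locations that both contain a package named A."""
    bundled = root / "bundled"  # hypothetical stand-in for the compiled-in modules
    app_dir = root / "app_dir"  # hypothetical stand-in for the executable directory
    (bundled / "A").mkdir(parents=True)
    (bundled / "A" / "__init__.py").write_text("")
    (bundled / "A" / "a.py").write_text("VALUE = 'a'\n")
    (app_dir / "A").mkdir(parents=True)
    (app_dir / "A" / "b.py").write_text("VALUE = 'b'\n")
    return bundled, app_dir


def try_import(name: str) -> str:
    """Return 'ok' if the import works, else the name of the error."""
    try:
        importlib.import_module(name)
        return "ok"
    except ModuleNotFoundError:
        return "ModuleNotFoundError"


def demo() -> dict[str, str]:
    with tempfile.TemporaryDirectory() as tmp:
        bundled, app_dir = build_layout(Path(tmp))
        added = [str(bundled), str(app_dir)]
        sys.path[:0] = added  # the bundled location comes first
        importlib.invalidate_caches()
        try:
            # A is resolved from the first location, so A.__path__ only
            # points there and A/b.py in the second location is never seen.
            return {"A.a": try_import("A.a"), "A.b": try_import("A.b")}
        finally:
            for entry in added:
                sys.path.remove(entry)
            for name in [m for m in sys.modules if m == "A" or m.startswith("A.")]:
                del sys.modules[name]


if __name__ == "__main__":
    print(demo())
```

In this sketch the import of A.a succeeds while A.b raises ModuleNotFoundError, even though A/b.py exists in a directory that is on sys.path.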

I tried --include-package-data and expected it to include the .py files, but it did not.

byehack added a commit to byehack/Nuitka that referenced this issue Aug 13, 2023
@kayhayen
Member

Code is not data files. Data file options ignore DLLs, extension modules and Python code. You can specify a pattern to force it to include code files, but then you are on your own when you try to use them in any way.

@kayhayen
Member

I am not sure I like the collect approach. But ever since I added no-auto-follow to the Yaml, plugins now get told which package wants to include a module, and the decision is actually limited to that module, and so Nuitka actually is capable of doing what you say you want here. But I somehow feel you are probably just confused, and tell me a technical mechanism that you believe to be the solution, rather than what the actual problem is.

@byehack
Contributor Author

byehack commented Aug 13, 2023

Nuitka actually is capable of doing what you say you want here.

Yes, it is as simple as adding a few lines.

But I somehow feel you are probably just confused, and tell me a technical mechanism that you believe to be the solution, rather than what the actual problem is.

As I said in the description above, this way B does not need to be included in the binary. Later I will be able to put package B into the app directory manually, and the dynamic import of A.b will not fail when app.py tries it.

I did this in the linked PR; currently it is implemented only for the interpreter dependencies.
(The encodings and importlib submodules are auto-included here.)

@kayhayen
Member

The PR changes are not general, they only affect the stdlib packages required for startup. Also I need to review it, but encodings are already included by default, it seemed to be done that way. Not sure, why importlib should be treated that way, so far I never had an issue with it being incomplete.

Maybe you are encountering a bug that needs a resolution? The no-auto-follow is being used in stdlib a lot these days, e.g. we are doing these.

- module-name: 'json.decoder'
  anti-bloat:
    - description: 'avoid _json module usage'
      no-auto-follow:
        '_json': 'may slow down by using fallback implementation'
      when: 'not has_builtin_module("_json")'

So, if I get you correctly, you have something like an external Python application that you want to include, and have or not have its dependencies included. You do not want to, or do not know how to, use multidist for that. And quite generally, I guess, it would be sweet if Nuitka finally allowed including modules as source code rather than as compiled code, since you do not care as much about that other program as about, say, the main one.

@kayhayen
Member

If this is about how only the needed stdlib is included, e.g. with importlib, and the external program is supposed to use that, what I would like to see is that we end up using multidist, doing a full dependency analysis of the second program, and then excluding things that you do not want to follow. But in the meantime, if you included data files with /*.py it will do it, and you just need to add your own --include-package=importlib which may or may not have bugs.

The Yaml package configuration has this, for example:

- module-name: 'pysnmp.smi'
  data-files:
    patterns:
      - 'mibs/**/*.py'

I believe command line patterns do that as well, I recall using it recently like that for a customer. It will however, not work to use * or just directory names, and the patterns in --include-data-dir I believe ought to work, to allow code, but I am not 100% sure of that now.

I do not see, where your change comes in handy in any of that. Not including all of stdlib anymore, seems to be a problem for you, and I can see how that is bad. I could see how we add that as an option back, but I am not fond of doing that, happy to have gotten rid of that.

@byehack
Contributor Author

byehack commented Aug 13, 2023

encodings are already included by default, it seemed to be done that way.

Yes, but only partly: some submodules are excluded by conditions.

Not sure, why importlib should be treated that way, so far I never had an issue with it being incomplete.

Assume I want to put submodules into the app directory manually. The current approach blocks that because of the multiple search paths!

Maybe you are encountering a bug that needs a resolution? The no-auto-follow is being used in stdlib a lot these days, e.g. we are doing these.

I guess you misunderstood, so let me give a more obvious example.
Using Nuitka 1.8rc9 and this app.py:

import mypack.mysub
exec(input("> "))

next to this package:

mypack/
    __init__.py -> empty
    mysub.py -> print("mysub hello")
    extra.py -> print("extra hello")

Create a standalone build of app.py, then put a test.py into app.dist that contains:

import mypack.extra

Run app.exe and enter import test:

E:\app.dist>app
mysub hello
> import test
Traceback (most recent call last):
  File "E:\app.py", line 3, in <module>
  File "<string>", line 1, in <module>
  File "E:\app.dist\test.py", line 1, in <module>
    import mypack.extra
ModuleNotFoundError: No module named 'mypack.extra'

Here test.py was added manually and can be imported, but I cannot add mypack.extra manually in the same way.

@byehack
Contributor Author

byehack commented Aug 13, 2023

It will however, not work to use * or just directory names,

Correct, I tried all of these, and the Yaml file, before. The Yaml file is processed per module; I want the --collect-submodules option to apply to all collected packages. As I do not want to re-compile, I can then put the files into the directory manually.

@byehack
Contributor Author

byehack commented Aug 13, 2023

I do not see, where your change comes in handy in any of that. Not including all of stdlib anymore, seems to be a problem for you, and I can see how that is bad. I could see how we add that as an option back, but I am not fond of doing that, happy to have gotten rid of that.

It is WIP and so far only applied to the interpreter dependencies, to show that it works fine in the base code.
We should expand it elsewhere as well as add CLI options.

@kayhayen
Member

So, do you want to specify what to include, or do you imagine including all modules that you have in your Python installation? You could still do that externally by building a command line, but surely it will explode due to length issues on even the more forgiving OSes, unless you use project options, which do not have that kind of limit.

Plugins can at this time not contribute to the list of root modules, which arguably is an omission, and ought to be easy to add. You would walk there with pkgutil.walk_packages() or whatever does the trick, and add all of those, then decide their compilation mode as well, and maybe that will already work. With the command line options of plugins, you can then control that, and we could even add that plugin.
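
The walk suggested above can be tried outside of Nuitka first. This is a minimal sketch, assuming the goal is just to enumerate submodule names and turn them into one `--include-module=NAME` option per module for an externally built command line; the helper names `list_submodules` and `build_include_options` are hypothetical and not part of Nuitka or its plugin API. Note that `pkgutil.walk_packages()` imports each subpackage it descends into, and that this only decides inclusion, not whether imports inside those modules get followed.

```python
import importlib
import pkgutil


def list_submodules(package_name: str) -> list[str]:
    """Return the full names of all submodules below the given package."""
    package = importlib.import_module(package_name)
    # Only packages have __path__; a plain module raises AttributeError here.
    prefix = package.__name__ + "."
    return sorted(
        info.name for info in pkgutil.walk_packages(package.__path__, prefix=prefix)
    )


def build_include_options(package_name: str) -> list[str]:
    """Build one include option per submodule (hypothetical helper)."""
    return [f"--include-module={name}" for name in list_submodules(package_name)]


if __name__ == "__main__":
    for option in build_include_options("json"):
        print(option)
```

For a large package this list gets long quickly, which is the command line length concern mentioned earlier in this comment.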

I am not sure, what you said so far really requires core changes. Accepting patterns for the inclusion options like * I am sure will be a not so nice new user trap, because something will tell them, yeah, that's solving my issues, where it probably is not the right solution.

@byehack
Contributor Author

byehack commented Aug 15, 2023

I guess with 07140a6 and 474d96e, this is now more understandable. (ModuleName with dont_follow attribute)

So, do you want to specify what to include, or do you imagine including all modules that you have in your Python installation?

We can do both: --sub-collect=[ all | stdlib | specificPackage ]

I am not sure, what you said so far really requires core changes. Accepting patterns for the inclusion options like * I am sure will be a not so nice new user trap, because something will tell them, yeah, that's solving my issues, where it probably is not the right solution.

We do not need * as an option. This option is for preventing multiple module search paths for those who add more packages to the directory manually.
Including all submodules (recursively) without following the imports in them would be enough, so the package name alone is sufficient.
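
To make "all recursive submodules, without following imports" concrete, here is a minimal sketch that derives module names purely from a file system scan, so no module is imported or analysed. It assumes a package made only of .py files (extension modules are ignored), and the function names are hypothetical; this is not the code from the linked PR.

```python
import tempfile
from pathlib import Path


def collect_submodule_names(package_dir: Path, package_name: str) -> list[str]:
    """Map every .py file below package_dir to a dotted module name."""
    names = set()
    for path in package_dir.rglob("*.py"):
        parts = list(path.relative_to(package_dir).with_suffix("").parts)
        if parts[-1] == "__init__":
            parts.pop()  # __init__.py stands for the package itself
        names.add(".".join([package_name, *parts]))
    return sorted(names)


def demo() -> list[str]:
    """Build a throwaway package resembling the mypack example and scan it."""
    with tempfile.TemporaryDirectory() as tmp:
        package_dir = Path(tmp) / "mypack"
        (package_dir / "sub").mkdir(parents=True)
        (package_dir / "__init__.py").write_text("")
        (package_dir / "mysub.py").write_text('print("mysub hello")\n')
        (package_dir / "extra.py").write_text('print("extra hello")\n')
        (package_dir / "sub" / "__init__.py").write_text("")
        (package_dir / "sub" / "deep.py").write_text("")
        return collect_submodule_names(package_dir, "mypack")


if __name__ == "__main__":
    print(demo())
```

Because nothing is imported, a module like extra.py is listed even if its own dependencies are not installed, which is the behaviour asked for here.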

@kayhayen
Member

It seems you are re-implementing nuitka.importing.Recursion.decideRecursion in a worse way (module names are strings, they are not supposed to carry usage information). This is the typical pattern of how it is used:

            # This will get back to all other plugins allowing them to inhibit it though.
            decision, decision_reason = Recursion.decideRecursion(
                using_module_name=module.getFullName(),
                module_filename=module_filename,
                module_name=full_name,
                module_kind=module_kind,
            )

            if decision:
                imported_module = Recursion.recurseTo(
                    module_name=full_name,
                    module_filename=module_filename,
                    module_kind=module_kind,
                    source_ref=module.getSourceReference(),
                    reason="implicit import",
                    using_module_name=module.module_name,
                )

                addUsedModule(
                    module=imported_module,
                    using_module=module,
                    usage_tag="plugin:" + plugin.plugin_name,
                    reason=decision_reason,
                    source_ref=module.source_ref,
                )

At the point where stdlib is scanned, these decisions should be asked for and used. It is actually a bug not to do so, which makes e.g. --nofollow-import-to=textwrap not have an effect.

Including all of stdlib would be an include option that makes the decision function always return yes. Following is not including, so a --include-stdlib could be added to force including all, and --noinclude-stdlib to disable the compromise Nuitka is currently implementing in the hard-coded way, which should be used when neither of those is given.

The stdlib scan has historically had 2 phases: one where it picks technically needed stuff, plus one picking up the stdlib module names for inclusion generally, even without anything else using them, where the latter is based on a file system scan.
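
The three modes described above could be sketched as one decision function. This is only an illustration under assumptions: `--include-stdlib` and `--noinclude-stdlib` are the options proposed in this comment, not existing ones, and the names `StdlibMode` and `decide_stdlib_inclusion` as well as the example module sets are hypothetical. It also assumes that technically needed modules stay included in every mode.

```python
from enum import Enum


class StdlibMode(Enum):
    INCLUDE_ALL = "include"  # proposed --include-stdlib
    NO_INCLUDE = "noinclude"  # proposed --noinclude-stdlib
    DEFAULT = "default"  # neither given: the current compromise


def decide_stdlib_inclusion(
    module_name: str,
    mode: StdlibMode,
    technically_needed: frozenset,
    default_selection: frozenset,
) -> tuple:
    """Return (decision, reason) for one stdlib module name."""
    # Phase 1: modules required for startup (assumed to be kept in every mode).
    if module_name in technically_needed:
        return True, "technically needed"
    # Phase 2: general inclusion, controlled by the mode.
    if mode is StdlibMode.INCLUDE_ALL:
        return True, "all of stdlib requested"
    if mode is StdlibMode.NO_INCLUDE:
        return False, "stdlib inclusion disabled"
    if module_name in default_selection:
        return True, "part of the default selection"
    return False, "not part of the default selection"


if __name__ == "__main__":
    needed = frozenset({"encodings"})
    selected = frozenset({"json"})
    for mode in StdlibMode:
        print(mode.name, decide_stdlib_inclusion("textwrap", mode, needed, selected))
```

Returning a reason next to the decision mirrors the decision and decision_reason pair in the pattern shown earlier in this comment.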

@ArtBIT

ArtBIT commented Jun 7, 2024

I have a similar issue with the pygame package.
All submodules like pygame.sprite, pygame.display, pygame.mixer etc. are missing and result in ModuleNotFoundError.

@KRRT7
Contributor

KRRT7 commented Jun 7, 2024

@ArtBIT Can you open a new issue with a minimal reproducible example? Thank you.

@Nuitka Nuitka locked as resolved and limited conversation to collaborators Jun 8, 2024
@kayhayen
Member

kayhayen commented Jun 8, 2024

This issue never bore any fruit. From my understanding, it attempted to add an extra implementation of how to decide recursion rather than using a plugin.

@kayhayen kayhayen closed this as completed Jun 8, 2024
@kayhayen kayhayen self-assigned this Jun 8, 2024