Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor sub-packages, modules, and imports #45

Closed
brandonwillard opened this issue Sep 26, 2020 · 4 comments
Closed

Refactor sub-packages, modules, and imports #45

brandonwillard opened this issue Sep 26, 2020 · 4 comments
Labels
good first issue Good for newcomers help wanted Extra attention is needed important
Milestone

Comments

@brandonwillard
Copy link
Member

The current package/module structure leads to numerous circular dependencies and an overall unwieldy import process. We need to refactor the imports and module/subpackage layouts entirely.

For instance, it should be possible to import the core graph types (e.g. Node, Apply, Variable) without importing any Ops, Linkers, and other compilation-based classes. Furthermore, there's rampant use of module references like theano.* to access objects within subpackages. This obfuscates the dependency structure when casually reading source file headers and balloons the cross-dependencies when use of theano.* alone perform numerous automatic imports.

@brandonwillard
Copy link
Member Author

brandonwillard commented Sep 28, 2020

One thing I'm noticing: most sub-packages have extensive imports for modules and objects contained within said sub-package's modules and sub-sub-packages. This forces imports of said modules, objects, and sub-sub-packages whenever one attempts to access anything within the original sub-package. It has also allowed indirect references via module objects and imports to grow all throughout the codebase.

We can probably clear up a lot of this entanglement by simply removing most imports from the __init__.py files (and fixing broken references caused by that, of course). As far as I can tell, the only package-level imports we really need are user-level imports (e.g. access to core tensor types, Ops and their helper-functions, grad, function, and perhaps some others). All imports performed within Theano should otherwise be direct and require no package/sub-package-level "shortcuts" via __init__.pys.

More specifically, any import or reference (e.g. import foo followed by a foo.thing, or from foo import thing) to an object in a __init__.py that wasn't expressly defined in said __init__.py (e.g. foo/__init__.py contains a line like from foo.bar import thing) should be replaced with a "direct" reference (e.g. foo.thing turns into foo.bar.thing, or the import changes to from foo.bar import thing).

From there, if we have the user-level import requirements/structure figured out, then completing this refactoring should be easy.

@brandonwillard
Copy link
Member Author

any import or reference (e.g. import foo followed by a foo.thing, or from foo import thing) to an object in a __init__.py that wasn't expressly defined in said __init__.py (e.g. foo/__init__.py contains a line like from foo.bar import thing) should be replaced with a "direct" reference (e.g. foo.thing turns into foo.bar.thing, or the import changes to from foo.bar import thing).

For anyone who might be interested in doing this, I recommend an automated approach using libcst.

Here's a rough outline of such an automation:

Visit each {pkg}/__init__.py and gather their package-level imports and rewrites as follows:

  • find from {pkg}.{sub_pkg} import {mod} and create the rewrite pair ({pkg}.{mod}, {pkg}.{sub_pkg}.{mod})
  • find from {pkg}.{mod} import {obj} and create the rewrite pair ({pkg}.{obj}, {pkg}.{mod}.{obj})
  • find import {pkg}.{mod} as {mod_ref} and create the rewrite pair ({pkg}.{mod_ref}, {pkg}.{mod}.{mod_ref})

After gathering these rewrites, all the project's modules could be visited and searched for these package-level imports (i.e. the first element in each rewrite pair) and replaced with their direct imports/references (i.e. the second element in each rewrite pair).

@brandonwillard
Copy link
Member Author

@brandonwillard
Copy link
Member Author

brandonwillard commented Oct 8, 2020

Also, I've recently noticed that the config system—operating mostly from the file theano.configdefaults—introduces a considerable tangle of import dependencies.

For instance, the theano.config object is created when theano/__init__.py is loaded, and from there it attempts to do all sorts of config value validation, and this validation explicitly (and perhaps unnecessarily) requires that a large amount of actual class objects be loaded immediately (e.g. everything from theano.compile.mode, which itself requires other theano.config values to load, as well as all the linkers objects from theano.gof.*, etc.) This triggers further imports for modules with overly strict theano.config-dependent top-level conditions (e.g. here a class definition itself is dependent on the value of theano.config.cxx!)

If anything, these module-level dependencies on theano.config shouldn't be necessary and/or as pervasive as they currently are.

Furthermore, aren't there well established config-file libraries we can use instead of maintaining our own?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers help wanted Extra attention is needed important
Projects
None yet
Development

No branches or pull requests

2 participants