# Lecture 26 Module Packages

This lecture explains how to import directories containing multiple sub-directories with Python files. Such directories are referred to as module packages. 

We will cover the following topics:
- [Package Import Basics](#section1)
- [Package Relative Imports](#section2)

## Package Import Basics <a id="section1"/>

When we create programs in Python, it is helpful to organize the individual module files related to an application into sub-directories (similar to organizing our files for different courses into sub-directories in My Documents). A directory of Python code is said to be a ***package*** or ***module package***. Importing a directory is known as a package import.

For example, consider the `MyMainPackage` directory that is located in the same directory as this Jupyter notebook.
```
MyMainPackage
    ├── __init__.py
    ├── main_script.py
    ├── MySubPackage
    │   ├── __init__.py
    │   ├── sub_script.py
```

To import the module file `sub_script.py`, which is located inside the directory `MySubPackage`, we can use the dotted syntax in the following cell. In effect, this turns the directory `MyMainPackage` into a Python namespace, which has attributes corresponding to the sub-directories and module files that the directory contains.

<img style="float: left; height:180px;" src="images/pic_1_1.jpg">

In [None]:
import MyMainPackage.MySubPackage.sub_script

As we learned previously, `import` fetches a module as a whole, and the names (variables and functions) that are defined in the module `sub_script.py` become attributes of the imported object.

In [None]:
MyMainPackage.MySubPackage.sub_script.X

In [None]:
MyMainPackage.MySubPackage.sub_script.sub_report()

The dotted path in the cell corresponds to the path through the directory hierarchy that leads to the module file `sub_script.py`, i.e., `MyMainPackage\MySubPackage\sub_script.py`.
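
The correspondence between the dotted name and the file path can be checked with a small self-contained sketch. It builds a throwaway package in a temporary directory, so it does not depend on the lecture files; the names `demo_pkg`, `demo_sub`, and `leaf` are made up for this illustration only.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout, created only for this illustration:
#   demo_pkg/__init__.py
#   demo_pkg/demo_sub/__init__.py
#   demo_pkg/demo_sub/leaf.py
root = Path(tempfile.mkdtemp())
sub_dir = root / "demo_pkg" / "demo_sub"
sub_dir.mkdir(parents=True)
(root / "demo_pkg" / "__init__.py").write_text("")
(sub_dir / "__init__.py").write_text("")
(sub_dir / "leaf.py").write_text("X = 99\n")

sys.path.insert(0, str(root))   # the directory that CONTAINS demo_pkg
importlib.invalidate_caches()

dotted_name = "demo_pkg.demo_sub.leaf"
leaf = importlib.import_module(dotted_name)

# Turn each dot into a directory separator and add the .py suffix
expected_file = root.joinpath(*dotted_name.split(".")).with_suffix(".py")
same_file = Path(leaf.__file__).resolve() == expected_file.resolve()

print(dotted_name, "->", leaf.__file__)
print("Paths match:", same_file)
```

The module's `__file__` attribute shows the platform-specific path, while the import statement itself always uses dots, regardless of the operating system.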

On the other hand, note that a path syntax with backslashes does not work with the `import` statement; the following cell raises a `SyntaxError`.

In [None]:
import MyMainPackage\MySubPackage\sub_script

Just as with individual modules, we can also use the `from` statement with packages to fetch specific names from the `sub_script.py` module.

In [None]:
from MyMainPackage.MySubPackage.sub_script import X

In [None]:
X

In [None]:
from MyMainPackage.MySubPackage.sub_script import sub_report

In [None]:
sub_report()

### Python Path

If the directory `MyMainPackage` is not in the current working directory, then its parent directory (the directory that contains it) may need to be added to the Python search path. To do that, either add the full path of that directory to the `PYTHONPATH` environment variable (as we explained earlier, by setting the Environment Variables on Windows systems), or add the path to a `.pth` file. Note that if the package belongs to the standard library (e.g., modules such as `random`, `time`, `sys`, `os`), or if it is located in the site-packages directory (where third-party libraries are installed), it will be automatically found by Python, and it does not need to be added to the Python search path.

Alternatively, the path can be added manually using `sys.path` (that is, the `path` attribute of the standard library module `sys`). For instance, I can examine `sys.path` on my computer, as shown in the following cell. Since `sys.path` is just a list of directory names, we can add the path of the directory that contains the package by using the list `append` method.

In [None]:
import sys
sys.path

In [None]:
sys.path.append('C:\\Users\\Alex\\Desktop\\python\\Lecture 26 Module Packages')

In [None]:
sys.path
# The appended path is listed last

Notice that the directory containing `MyMainPackage` is now listed in `sys.path`. However, this modification is temporary and valid only for the duration of the current session; `sys.path` is reset every time Jupyter Notebook is restarted or the notebook kernel is shut down. On the other hand, the path configuration in `PYTHONPATH` is permanent, and it persists after the current session is terminated.
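
The effect of appending to `sys.path` can be sketched without relying on any particular folder on disk. The example below is a minimal illustration, assuming a made-up module name `demo_path_module` written into a temporary directory; it also uses `pathlib` to build the path, which avoids typing escaped backslashes by hand.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module stored in a directory that is not on the search path
extra_dir = Path(tempfile.mkdtemp())
(extra_dir / "demo_path_module.py").write_text("GREETING = 'found me'\n")

# Before the directory is on sys.path, the import fails
try:
    importlib.import_module("demo_path_module")
    found_before = True
except ModuleNotFoundError:
    found_before = False

sys.path.append(str(extra_dir))   # lasts only for this interpreter session
importlib.invalidate_caches()
demo = importlib.import_module("demo_path_module")

print("Found before append:", found_before)
print("Found after append: ", demo.GREETING)
```

Restarting the kernel discards the appended entry, so the `append` call has to be run again in every new session.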

### Package `__init__.py` Files

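When using package imports, there is one more convention to follow: each directory named within the path of a package import statement should contain a file named `__init__.py`. In Python versions before 3.3, the package import fails without it. Since Python 3.3, a directory without `__init__.py` can still be imported as a namespace package, but including the file remains the standard way to define a regular package, and it is required if the package has initialization code to run.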

In the example we have been using, note that both the `MyMainPackage` and `MySubPackage` directories contain a file called `__init__.py`. The `__init__.py` names are special, as they declare that a directory is a Python package.

The `__init__.py` files are very often completely empty, and don't contain any code. But, they can also contain Python code, just like other module files. In our `MyMainPackage` example, the `__init__.py` files are empty.  

The `__init__.py` files are run automatically the first time a Python program imports a directory. Because of that, `__init__.py` files can be used to store code to initialize the state required by files in a package (e.g., to create required data files, open connections to databases, and so on).
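
The run-once behavior can be observed with a short sketch. It assumes a made-up package `demo_init_pkg` whose `__init__.py` appends a line to a log file each time it executes; the package, its `worker` module, and the log file name are all hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_init_pkg"
pkg_dir.mkdir()

# Hypothetical initializer: records every execution in a log file
init_code = (
    "from pathlib import Path\n"
    "LOG = Path(__file__).with_name('init_log.txt')\n"
    "with LOG.open('a') as handle:\n"
    "    handle.write('initialized\\n')\n"
)
(pkg_dir / "__init__.py").write_text(init_code)
(pkg_dir / "worker.py").write_text("NAME = 'worker'\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_init_pkg.worker        # first import: __init__.py runs
import demo_init_pkg               # already loaded: nothing runs again
from demo_init_pkg import worker   # still no re-execution

run_count = len((pkg_dir / "init_log.txt").read_text().splitlines())
print("__init__.py executed", run_count, "time(s)")
```

Even though the package is named in three import statements, the log contains a single line, because later imports reuse the module object already stored in `sys.modules`.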

On a separate note, don’t confuse `__init__.py` files in module packages with the `__init__()` class constructor method that we used before for specifying attributes of class instances. Both have initialization roles, but they are otherwise very different.

### Module Packages Reloading

Just like module files, an already imported directory needs to be passed to `reload` to force re-execution of the code. As shown, `reload` accepts a dotted path name to reload nested directories and files. Also, `reload` returns the module object in the displayed output of the cell.

In [None]:
# Repeated import statements do not produce any output
import MyMainPackage.MySubPackage.sub_script

In [None]:
from importlib import reload  # the imp module is deprecated and was removed in Python 3.12
reload(MyMainPackage.MySubPackage.sub_script)
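
To see what reloading changes, the following sketch edits a module file between an import and a reload. It is only an illustration under stated assumptions: the package `demo_reload_pkg` and its `settings` module are made up, and bytecode caching is switched off so that the quick edit is always read from the source file.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True   # skip .pyc files so the edit below is picked up

root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_reload_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
settings_file = pkg_dir / "settings.py"
settings_file.write_text("MODE = 'old'\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_reload_pkg.settings
value_first = demo_reload_pkg.settings.MODE

# Edit the source file on disk while the module is already loaded
settings_file.write_text("MODE = 'changed'\n")

import demo_reload_pkg.settings              # no effect: already imported
value_after_import = demo_reload_pkg.settings.MODE

importlib.reload(demo_reload_pkg.settings)   # re-executes the edited file
value_after_reload = demo_reload_pkg.settings.MODE

print(value_first, value_after_import, value_after_reload)
```

The repeated `import` leaves the old value in place, and only the call to `reload` picks up the edited source.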

Once imported, `sub_script` becomes a module object nested in the object `MySubPackage`, which in turn is nested in the object `MyMainPackage`.

Similarly, `MySubPackage` is a module object that is nested in the object `MyMainPackage`.

In [None]:
MyMainPackage.MySubPackage

### Difference Between `from` and `import` with Packages

The `import` statement can be somewhat inconvenient to use with packages, because we may have to retype the paths to the files and sub-directories frequently in our program. In our example, we must retype and rerun the full path from `MyMainPackage` each time we want to reach the names in the `sub_script.py` file. Otherwise, we will get an error.

In [None]:
sub_script.X

In [None]:
MySubPackage.sub_script.X

In [None]:
MyMainPackage.MySubPackage.sub_script.X

In [None]:
# Use X in our code
print(MyMainPackage.MySubPackage.sub_script.X + 27)
print(MyMainPackage.MySubPackage.sub_script.X % 2)
print((MyMainPackage.MySubPackage.sub_script.X -13)/2)

It is often more convenient to use the `from` statement with packages to avoid retyping the paths at each access. 

In [None]:
from MyMainPackage.MySubPackage.sub_script import X
X

In [None]:
print(X + 27)
print(X % 2)
print((X - 13)/2)

In addition, if we ever restructure or rename the directory tree, the `from` statement requires just one path update in the code, whereas the `import` statement may require updates in many lines in the code. 

However, `import` can be advantageous if there are two modules with the same name that are located in different directories and are used in the same program. With the `from` statement, we can reach only one of the two modules at a time. The same applies to identical names defined in different modules.

For example, in our `MyMainPackage`, there is a function `sub_report` in both `main_script` and `sub_script`. If we use the `from` statement, the name `sub_report` refers to whichever function was imported most recently, either the one from `main_script` or the one from `sub_script`.

<img style="float: left; height:180px;" src="images/pic_1_1.jpg">

<img style="float: left; height:280px;" src="images/pic_1_2.jpg">

In [None]:
from MyMainPackage.MySubPackage.sub_script import sub_report

In [None]:
sub_report()

In [None]:
from MyMainPackage.main_script import sub_report

In [None]:
# Name collision with the sub_report name used in the cell above
sub_report()

But, with the `import` statement, we can use either of the two functions `sub_report`, because their names will involve their full path, and this way, the names will not clash. The only inconvenience is that we need to type the full paths to the two functions.

In [None]:
import MyMainPackage.MySubPackage.sub_script
MyMainPackage.MySubPackage.sub_script.sub_report()

In [None]:
import MyMainPackage.main_script
MyMainPackage.main_script.sub_report()

Another alternative is to use the `as` extension, which will create unique synonyms for the names of the two functions. As we mentioned before, this extension is commonly used to provide short synonyms for longer names, and to avoid name clashes when we are already using a name in a script that would otherwise be overwritten by a regular `import` statement.

In [None]:
from MyMainPackage.MySubPackage.sub_script import sub_report as sub_sub_report
sub_sub_report()

In [None]:
from MyMainPackage.main_script import sub_report as main_sub_report
main_sub_report()
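
The `as` extension also works with the `import` statement, which gives each module a short alias while keeping the two functions in separate namespaces. The sketch below assumes a made-up package `demo_alias_pkg` that mirrors the structure of `MyMainPackage`; the aliases `main_mod` and `sub_mod` are arbitrary names chosen for the illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package with the same function name in two modules
root = Path(tempfile.mkdtemp())
sub_dir = root / "demo_alias_pkg" / "demo_sub"
sub_dir.mkdir(parents=True)
(root / "demo_alias_pkg" / "__init__.py").write_text("")
(sub_dir / "__init__.py").write_text("")
(root / "demo_alias_pkg" / "main_script.py").write_text(
    "def sub_report():\n    return 'report from main_script'\n"
)
(sub_dir / "sub_script.py").write_text(
    "def sub_report():\n    return 'report from sub_script'\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Alias the modules instead of the functions
import demo_alias_pkg.main_script as main_mod
import demo_alias_pkg.demo_sub.sub_script as sub_mod

print(main_mod.sub_report())
print(sub_mod.sub_report())
```

With module aliases, the full path is typed once per module, and every name inside the module stays reachable through the alias without clashing.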

## Package Relative Imports <a id="section2"/>

To illustrate package relative imports in Python, we will use `MyRelativeImportPackage`, which is similar to `MyMainPackage` and contains several simple files.

```
MyRelativeImportPackage
    ├── __init__.py
    ├── relative_import_script_1.py
    ├── relative_import_script_2.py
    ├── relative_import_script_5.py
    ├── relative_import_script_6.py
    ├── script_1.py
    ├── script_2.py
    ├── script_3.py
    ├── script_4.py
    ├── MySubPackage
    │   ├── __init__.py
    │   ├── relative_import_script_3.py
    │   ├── relative_import_script_4.py
    │   ├── sub_script.py
```

When modules within a package need to import other names from other modules in the same package, it is still possible to use the full path syntax for importing, as we did in the above section. This is called an ***absolute import***.

For instance, the `relative_import_script_1.py` in the first line imports `script_1` by using the full name of the directory (i.e., `from MyRelativeImportPackage import script_1`).

<img style="float: left; height:120px;" src="images/pic_2_1.jpg">

<img style="float: left; height:120px;" src="images/pic_2_7.jpg">

In [None]:
import MyRelativeImportPackage.relative_import_script_1

However, package files can also make use of a special syntax to simplify import statements within the same package. Instead of directly using the full path to the directory, Python allows us to use a leading dot `.` to refer to the current directory in the package.

Therefore, instead of using `from MyRelativeImportPackage import script_1`, we can use `from . import script_1`. This is implemented in the `relative_import_script_2.py` to import `script_2`.

This syntax is referred to as a ***relative import*** because the path to the module to be imported is relative to the current directory, that is, the directory in which the importing module is located.

The convenience of using relative imports is that we don't need to write the name or the path of the current directory.

<img style="float: left; height:120px;" src="images/pic_2_2.jpg">

<img style="float: left; height:120px;" src="images/pic_2_8.jpg">

In [None]:
import MyRelativeImportPackage.relative_import_script_2
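
The equivalence of the absolute and the single-dot relative form can be sketched with a throwaway package. All names here (`demo_rel_pkg`, `helper`, `uses_absolute`, `uses_relative`) are made up for illustration and are not files from the lecture.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_rel_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "helper.py").write_text("X = 10\n")

# The same import written twice: absolute form and relative form
(pkg_dir / "uses_absolute.py").write_text(
    "from demo_rel_pkg import helper\nRESULT = helper.X + 1\n"
)
(pkg_dir / "uses_relative.py").write_text(
    "from . import helper\nRESULT = helper.X + 1\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_rel_pkg.uses_absolute as abs_mod
import demo_rel_pkg.uses_relative as rel_mod

# Both forms resolve to the very same module object
same_module = abs_mod.helper is rel_mod.helper
print(abs_mod.RESULT, rel_mod.RESULT, same_module)   # 11 11 True
```

If the package directory were renamed, only the file with the absolute import would need to be edited.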

One more example is presented in the next cell, where the module `relative_import_script_3.py` is located in the directory `MySubPackage` and it imports the module `sub_script` which is located in the same directory by using the `.` syntax.

<img style="float: left; height:120px;" src="images/pic_2_3.jpg">

<img style="float: left; height:120px;" src="images/pic_2_10.jpg">

In [None]:
import MyRelativeImportPackage.MySubPackage.relative_import_script_3

If we use the two-dot syntax `..`, a module can import another module that is located in the parent directory of the current package (i.e., the directory above). For example, `relative_import_script_4.py` is located in the `MySubPackage` directory, and it uses `from .. import script_3` to import the `script_3` module that is located in the parent directory of `MySubPackage`, that is, `MyRelativeImportPackage`.

<img style="float: left; height:120px;" src="images/pic_2_4.jpg">

<img style="float: left; height:120px;" src="images/pic_2_9.jpg">

In [None]:
import MyRelativeImportPackage.MySubPackage.relative_import_script_4
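
A minimal sketch of the one-dot and two-dot forms side by side is given below. It assumes a hypothetical two-level package (`demo_parent_pkg` with a sub-package `demo_child`) created in a temporary directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout:
#   demo_parent_pkg/shared.py
#   demo_parent_pkg/demo_child/sibling.py
#   demo_parent_pkg/demo_child/consumer.py
root = Path(tempfile.mkdtemp())
sub_dir = root / "demo_parent_pkg" / "demo_child"
sub_dir.mkdir(parents=True)
(root / "demo_parent_pkg" / "__init__.py").write_text("")
(root / "demo_parent_pkg" / "shared.py").write_text("X = 3\n")
(sub_dir / "__init__.py").write_text("")
(sub_dir / "sibling.py").write_text("Y = 4\n")
(sub_dir / "consumer.py").write_text(
    "from . import sibling    # same directory as consumer.py\n"
    "from .. import shared    # parent directory of demo_child\n"
    "TOTAL = sibling.Y + shared.X\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_parent_pkg.demo_child.consumer as consumer

print(consumer.TOTAL)                 # 7
print(consumer.sibling.__name__)      # demo_parent_pkg.demo_child.sibling
print(consumer.shared.__name__)       # demo_parent_pkg.shared
```

The `__name__` attributes confirm that one dot resolved inside the sub-package and two dots resolved in its parent.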

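On the other hand, if we tried to use only `import script_3` instead of `from . import script_3`, the import would fail in Python 3, because a plain `import` searches only the directories on the module search path and not the package's own directory. We must use the `from` dotted syntax (or the full absolute path) to import modules located in the same package. This is illustrated in the example in the following cell.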

<img style="float: left; height:120px;" src="images/pic_2_5.jpg">

In [None]:
import MyRelativeImportPackage.relative_import_script_5
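
The failure can be reproduced in isolation with a short sketch. It assumes a made-up package `demo_plain_pkg` in which `importer.py` tries a plain `import` of a module that sits right next to it; only the directory containing the package is placed on the search path.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_plain_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "neighbor_mod_xyz.py").write_text("X = 1\n")
# Plain import of a module in the same directory (no leading dot)
(pkg_dir / "importer.py").write_text("import neighbor_mod_xyz\n")

sys.path.insert(0, str(root))   # the package's parent, not the package itself
importlib.invalidate_caches()

try:
    importlib.import_module("demo_plain_pkg.importer")
    outcome = "imported"
except ModuleNotFoundError as err:
    outcome = f"failed: {err}"

print(outcome)
```

Rewriting the line in `importer.py` as `from . import neighbor_mod_xyz` would make the import succeed, since the leading dot tells Python to look inside the package.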

Another way to use the relative imports is shown in the `relative_import_script_6.py` module, where the syntax `from .script_4 import X` is used to import the name `X` from the `script_4` module which is located in the same directory as the importer module. This way, we can import specific names from modules in the same package.

<img style="float: left; height:120px;" src="images/pic_2_6.jpg">

<img style="float: left; height:120px;" src="images/pic_2_11.jpg">

In [None]:
import MyRelativeImportPackage.relative_import_script_6
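
The `from .module import name` form copies only the requested name into the importing module. A minimal sketch, assuming a hypothetical package `demo_names_pkg` with the modules `values` and `reader`:

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_names_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "values.py").write_text("X = 5\nY = 6\n")
# Fetch a single name from a module in the same package
(pkg_dir / "reader.py").write_text("from .values import X\nDOUBLE = 2 * X\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_names_pkg.reader as reader

has_y = hasattr(reader, "Y")       # Y was not requested, so it is absent
print(reader.X, reader.DOUBLE, has_y)   # 5 10 False
```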

Absolute imports are often preferred because they are straightforward, and it is easy to tell exactly where the imported module or name is located just by looking at the statement. However, they require more typing, since full names and paths must be written in the code.

One clear advantage of relative imports is that they are quite succinct, and they can turn a very long import statement into a simple and short statement. However, relative imports can be messy, particularly for projects where the organization of the directories is likely to change. Relative imports are also not as readable as absolute ones, because the location of the imported names is not evident from the statement alone.