# Packaging

In this lesson we'll learn about
* Modules
* Packages
* Sub packages
* What directory structure to use for your packages
* How to write a setup.py file
* How to install packages
* Adding executables to your packages

## Modules

### What?

Packages and modules are used in python to facilitate modular programming. 

Modular programming is a software design technique that emphasizes separating the functionality of a program into independent, interchangeable modules, such that each contains everything necessary to execute only one aspect of the desired functionality. [Wikipedia](https://en.wikipedia.org/wiki/Modular_programming)

### Why?

This brings several advantages:

1. Simplicity: It's much easier to focus on one small part of a problem rather than the whole program at once. You just need to understand how the section you are looking at works rather than the entire code base.

2. Maintainability: When you break up a problem into many smaller problems, typically each piece of code is logically separated from the others. This means making changes to one module is less likely to impact the operation of the others, so many people can work on a project without affecting each other. You may even have no idea how the rest of the application works, yet still understand the module you are working on.

3. Reusability: Smaller pieces of code that do one (or a small number of) thing can easily be reused elsewhere. This allows you to reuse the code in the application, or even outside of the application in other solutions that are being developed.

4. Scoping: We can reuse variable and function names by using separate namespaces. This means that we have less chance of having name collisions in our application.
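
As a small illustration of the scoping point, the standard library modules `math` and `cmath` both define a function called `sqrt`. Because each lives in its own namespace, the two never collide. This is only a sketch using standard library modules, not something specific to this lesson's code:

```python
import cmath
import math

# Both modules define sqrt; the module prefix keeps the two apart.
real_root = math.sqrt(4)
complex_root = cmath.sqrt(-4)

print(real_root)     # 2.0
print(complex_root)  # 2j
```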

### How can we use them?

There are three types of module that can be used in Python. When you create your own, it will normally be the first.

1. A pure python module
2. A C/C++ module that is compiled and loaded at run time. (e.g. numpy)
3. A built-in module that is part of the interpreter. (e.g. sys)

The implementation is abstracted from us when we use them in python. They are all brought in using the same keyword `import`.
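
If you are curious which kind of module you are dealing with, the interpreter can give some hints. The sketch below assumes CPython and uses `sys` and `json` purely as examples: built-in modules are listed in `sys.builtin_module_names` and have no file on disk, while modules loaded from disk record where they came from in `__file__`.

```python
import json
import sys

# Built-in modules are compiled into the interpreter, so there is no file to point at.
sys_is_builtin = "sys" in sys.builtin_module_names
sys_has_file = hasattr(sys, "__file__")
print(sys_is_builtin, sys_has_file)

# A module loaded from disk (here the standard library's json package) records its location.
print(json.__file__)
```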

To create a pure python module all we need to do is create a file with some python code in it and name it with the `.py` extension.

quotes.py

```python
single_quote = "Easy things make life harder, hard things make life easier"
multi_quote = ["Collect underpants", "?", "Profit"]

def foo():
    print("bar")
    
class Bar():
    pass
```

In the quotes module we have defined a string, a list, a function and a class.

We can then use these later (assuming your setup is correct, which we'll come to).

```python
import quotes

print(quotes.single_quote)
print(quotes.multi_quote)

quotes.foo()

my_class = quotes.Bar()
print(my_class)
```

```
Easy things make life harder, hard things make life easier
['Collect underpants', '?', 'Profit']
bar
<quotes.Bar object at 0x7f6db82c09b0>
```


### sys.path

When you run the command

```python
import quotes
```

The interpreter will search for a file called `quotes.py` in

1. The current directory (or directory in which the script resides).
2. If set, a list of directories in the environment variable `PYTHONPATH`.
3. A list of directories configured at installation time for your python environment. Installation time includes when a new virtual environment is created.

The list of directories can be seen in python using the sys module

```python
import sys
print(sys.path)
```
```bash
['', '/Users/mrobinson/source/work/training/python/venv/lib/python37.zip', '/Users/mrobinson/source/work/training/python/venv/lib/python3.7', '/Users/mrobinson/source/work/training/python/venv/lib/python3.7/lib-dynload', '/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7', '/Users/mrobinson/source/work/training/python/venv/lib/python3.7/site-packages']
```

So the easiest way to get your module recognised is to put it in the same directory as the script you are running.

You *can* modify your sys.path, but it's generally not recommended.
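
To see how the search path drives `import`, here is a minimal sketch that writes a throwaway module into a temporary directory and makes it importable by adding that directory to `sys.path`. The module name `demo_quotes` is made up for the demonstration, and this is for illustration only; as noted above, editing `sys.path` in real code is generally not recommended.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Create a tiny module (hypothetical name) in a directory Python does not search yet.
    Path(tmp, "demo_quotes.py").write_text('single_quote = "hello from a temp dir"\n')

    sys.path.insert(0, tmp)  # the directory is now part of the search path
    try:
        importlib.invalidate_caches()
        demo_quotes = importlib.import_module("demo_quotes")
        quote = demo_quotes.single_quote
    finally:
        sys.path.remove(tmp)  # tidy up so later imports are not affected

print(quote)  # hello from a temp dir
```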

### Using import (and namespaces)
#### Simplest form

There are many ways we can use import, the simplest is 
```python
import some_module
```

When importing in this fashion, the module's symbols (its functions, variables and classes) are not loaded into our namespace. This means that to reference any symbol from the module we must prefix it with the module name and a `.`. This is called *dot notation*.

```python
import some_module

some_module.some_function()
print(some_module.some_variable)
```

If you don't do this you will see a `NameError: name 'something' is not defined` error.
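
A quick way to see this for yourself, using the standard library's `math` module as a stand-in for `some_module`:

```python
import math

# The dotted name works because the name "math" is in our namespace.
print(math.pi)

# The bare name was never added to our namespace, so looking it up fails.
try:
    print(pi)
except NameError as error:
    message = str(error)
    print(f"NameError: {message}")  # NameError: name 'pi' is not defined
```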

It is possible, though *not* recommended to load several modules on the same line by using commas

```python
import re, sys, requests, pandas
```

#### Using the from keyword
This allows us to import specific parts of a module and add them directly to our namespace. Let's go back to our quotes example.

```python
from quotes import multi_quote
print(multi_quote)
print(single_quote)
```

```
['Collect underpants', '?', 'Profit']
NameError: name 'single_quote' is not defined
```


You can see here, we have not used the *dot notation* when referencing multi_quote. But also, we have not been able to print the single_quote as it's not been imported.

It is fine to import multiple things from a single module using commas while using from

```python
from my_lib import some_function, function_some, funct_some_oin
```

When importing modules this way, because we're adding to the global namespace, things can be overwritten. Take this next example.

```python
single_quote = "Work is the curse of the drinking classes."

from quotes import single_quote

print(single_quote)
```

```
Easy things make life harder, hard things make life easier
```


This would not be the case if we had just `import quotes`.

```python
single_quote = "Work is the curse of the drinking classes."

import quotes

print(single_quote)
print(quotes.single_quote)
```

```
Work is the curse of the drinking classes.
Easy things make life harder, hard things make life easier
```


You can also import everything from a module by using a `*`, however this is a *very bad* idea. You can see from the above example that if we imported everything from a module, it would be very easy to overwrite something we already have defined.

There are other reasons you shouldn't `from blah import *`, such as readability and traceability. 
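
To make the overwriting risk concrete, here is a hedged sketch. A wildcard import is only allowed at module level, so the example runs it with `exec` inside a throwaway dictionary that stands in for a module's namespace. The name `pow` is just one example of something `math` happens to export.

```python
import math

# A dictionary standing in for our module's namespace, with a name we defined ourselves.
namespace = {"pow": "my own value"}

# Equivalent to writing "from math import *" at the top of a module.
exec("from math import *", namespace)

# Our own pow has been silently replaced by math.pow.
was_overwritten = namespace["pow"] is math.pow
print(was_overwritten)  # True
```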

#### Using the as keyword

There are two ways this can be used. In both, you are importing something `as` another name, just as the keyword suggests.

This allows you to change the name it is referenced by in your namespace.

This can be useful for packages with long names or if you already have something in the namespace with the same name. 

```python
from quotes import single_quote as single, multi_quote as multi, Bar as Foo

print(f"{single} \n {multi}")
print(Foo())
```

```
Easy things make life harder, hard things make life easier 
 ['Collect underpants', '?', 'Profit']
<quotes.Bar object at 0x7f6db82c0dd8>
```


```python
import quotes as things_someone_else_said

things_someone_else_said.foo()
```

```
bar
```
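
The same idea works with standard library modules. In this sketch `stats` and `join_path` are aliases chosen for the example: the first shortens a long module name, the second avoids a clash with a local function that is also called `join`.

```python
import statistics as stats            # shorter name for a long module name
from os.path import join as join_path  # renamed to avoid clashing with our own join


def join(words):
    """Our own join, which would otherwise be overwritten by os.path.join."""
    return " ".join(words)


average = stats.mean([1, 2, 3])
sentence = join(["no", "clash"])
path = join_path("package", "module1.py")

print(average)   # 2
print(sentence)  # no clash
print(path)
```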


#### dir()

This is a very useful built-in function that can show the current symbol table. It can be executed on an object by passing it in, or executed on the current namespace by running it with no arguments.

```python
dir()
```

```
['Foo',
 'In',
 'Out',
 '_',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_dh',
 '_i',
 '_i1',
 '_i10',
 '_i11',
 '_i12',
 '_i13',
 '_i2',
 '_i3',
 '_i4',
 '_i5',
 '_i6',
 '_i7',
 '_i8',
 '_i9',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 'exit',
 'get_ipython',
 'less_eggs',
 'module4',
 'multi',
 'multi_quote',
 'my_class',
 'package',
 'quit',
 'quotes',
 'single',
 'single_quote',
 'things_someone_else_said']
```

```python
import quotes
dir(quotes)
```

```
['Bar',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'foo',
 'multi_quote',
 'single_quote']
```
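
The double-underscore entries are added by Python itself. If you only care about the names a module is meant to expose, one common approach is to filter out anything starting with an underscore. A small sketch, using the standard library's `math` module as the example:

```python
import math

# Keep only the names that do not start with an underscore.
public_names = [name for name in dir(math) if not name.startswith("_")]

print(len(public_names))
print("sqrt" in public_names)  # True
```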

#### Scripts

Any file that ends in `.py` can also be run as a script. The catch is that when we import a module, all the top-level Python code inside that module is executed at import time.

So if we had a script like this:

```python
local_zealot = 'Wizard'

def main():
    print(f"We're off to see the {local_zealot}")
    
main()
```

and we wanted to import it in another application so we could reuse the `local_zealot` variable, we'd get this printed to screen once it was imported:

```
We're off to see the Wizard
```

You may have seen the following code in places

```python
if __name__ == '__main__':
    main()
```

This guards against code running when we don't want it to. When a file is run as a script, Python sets `__name__` to `'__main__'`; when it is imported, `__name__` is the module's name. So in this case, we could import the module to get the `local_zealot` variable in one app, but we could still run the module (script) to use its original functionality. The finished script would look like this:

```python
local_zealot = 'Wizard'

def main():
    print(f"We're off to see the {local_zealot}")
    
if __name__ == '__main__':
    main()
```

This is of course a trivial example, but if you had a large set of functions and classes in the file you can see how this could be useful. 

Often this is used for unit testing, where the part that is run via the if statement, executes the tests. 
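
To see both behaviours side by side, the sketch below writes the finished script to a temporary file (the file name `oz_demo.py` is made up for the demo), runs it once as a script and then imports it as a module. It assumes the interpreter is allowed to start a subprocess.

```python
import importlib
import subprocess
import sys
import tempfile
from pathlib import Path

SCRIPT = '''\
local_zealot = "Wizard"

def main():
    print(f"We're off to see the {local_zealot}")

if __name__ == "__main__":
    main()
'''

with tempfile.TemporaryDirectory() as tmp:
    script_path = Path(tmp, "oz_demo.py")
    script_path.write_text(SCRIPT)

    # Run as a script: __name__ is "__main__", so main() is called.
    completed = subprocess.run(
        [sys.executable, str(script_path)],
        capture_output=True, text=True, check=True,
    )
    script_output = completed.stdout.strip()

    # Import as a module: __name__ is "oz_demo", so main() is skipped.
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        oz_demo = importlib.import_module("oz_demo")
    finally:
        sys.path.remove(tmp)

print(script_output)                           # We're off to see the Wizard
print(oz_demo.__name__, oz_demo.local_zealot)  # oz_demo Wizard
```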

## Packages

Once the number of modules in a project grows, it can become a bit messy if they're all in one directory. This is where packages come in.

Packages are a way to organise modules and namespaces. For all intents and purposes, a package is a file system directory that contains a number of modules. 

They allow hierarchical structure using *dot notation*. If we have the following: 

```bash
package/
├── module1.py
└── module2.py
```

module1.py
```python
def eggs():
    print('[module1] eggs()')

class Egg:
    pass
```
module2.py
```python
def spam():
    print('[module2] spam()')

class Spam:
    pass
```

This means we can use the import in the same way we have before

```python
import package.module1, package.module2

package.module1.eggs()
package.module2.spam()
```

```
[module1] eggs()
[module2] spam()
```


```python
from package.module1 import eggs
eggs()
```

```
[module1] eggs()
```


```python
from package.module2 import Spam as lunch

print(lunch)
```

```
<class 'package.module2.Spam'>
```


You can import the whole package, but this doesn't really do anything. This is because it *does not add the modules in the package to the local namespace*.

```python
import package

package.module1.eggs()
```

```
AttributeError: module 'package' has no attribute 'module1'
```

There is a way around this though, which we'll talk about next. 

### __init__.py

This is a special file that you may have sometimes seen in a python project. When you import a package, this file is run before anything else happens, i.e. initialisation.

Let's have a look at a new package called, imaginatively, new_package.

```bash
new_package/
├── __init__.py
├── module1.py
└── module2.py
```

The two modules are the same, but inside the `__init__.py` we have

```python
print(f"Running __init__.py from {__name__}")
import new_package.module1, new_package.module2
```


So now when we import it

```python
import new_package

new_package.module1.eggs()
```

```
Running __init__.py from new_package
[module1] eggs()
```


You can see that it runs the code in `__init__`, meaning our two modules have been imported. 

Any arbitrary code can be placed inside this file, though it's generally not recommended to put too much in there. Definitely don't put any application logic in there.
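
If you want to try this without creating files by hand, the sketch below builds a throwaway package in a temporary directory and imports it. The package name `demo_package` is made up for the example, and its `__init__.py` uses the relative form `from . import module1`, which is one alternative to spelling out the full package name.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    package_dir = Path(tmp, "demo_package")
    package_dir.mkdir()
    (package_dir / "module1.py").write_text(
        "def eggs():\n    return '[module1] eggs()'\n"
    )
    # __init__.py runs when the package is imported and pulls in the submodule.
    (package_dir / "__init__.py").write_text("from . import module1\n")

    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        demo_package = importlib.import_module("demo_package")
        # Works because __init__.py has already imported module1 for us.
        result = demo_package.module1.eggs()
    finally:
        sys.path.remove(tmp)

print(result)  # [module1] eggs()
```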

Prior to Python 3.3 it was a requirement to have an `__init__.py` in any package. This is no longer the case.

Now we have the concept of [Implicit Namespace Packages](https://www.python.org/dev/peps/pep-0420/).

You can still include them, and of course if you need to do some actual initialisation you need to include it.

Most of the time I still include them anyway.

### Subpackages

We can create a nested structure to any depth using sub packages. If we look at another package, this time with four modules in it:

```bash
final_package/
├── sub_package1
│   ├── module1.py
│   └── module2.py
└── sub_package2
    ├── module3.py
    └── module4.py
```

we can import in the same manner as before, using the *dot notation*

```python
import final_package.sub_package2.module3
from final_package.sub_package2.module4 import CanOSpam

final_package.sub_package2.module3.more_eggs()
print(CanOSpam())
```

```
[module3] more_eggs()
<final_package.sub_package2.module4.CanOSpam object at 0x7f6db8272668>
```


#### Relative and absolute imports

If you need to reference something in module 1 from module 3 you can use an absolute import

module3.py
```python
from final_package.sub_package1.module1 import eggs
def more_eggs():
    print('[module3] more_eggs()')

class BigEgg:
    pass

def less_eggs():
    eggs()
```

```python
from final_package.sub_package2.module3 import less_eggs

less_eggs()
```

```
[module1] eggs()
```


You can also use relative imports to do the same thing, this time we reference module 2 from module 4

module4.py
```python
from ..sub_package1.module2 import spam
def more_spam():
    print('[module4] more_spam()')

class CanOSpam:
    pass

def one_spam():
    spam()
```

```python
from final_package.sub_package2 import module4

module4.one_spam()
```

```
[module2] spam()
```
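
In a relative import, one leading dot means "the package this module lives in" and each extra dot goes up one more level. The sketch below recreates a cut-down version of the layout in a temporary directory so it can be run anywhere. The name `demo_final` is made up for the demo, and `__init__.py` files are included here although they are optional on modern Python.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package layout, written to a temporary directory for the demo.
FILES = {
    "demo_final/__init__.py": "",
    "demo_final/sub_package1/__init__.py": "",
    "demo_final/sub_package1/module2.py": "def spam():\n    return '[module2] spam()'\n",
    "demo_final/sub_package2/__init__.py": "",
    # ".." climbs from sub_package2 up to demo_final, then down into sub_package1.
    "demo_final/sub_package2/module4.py": (
        "from ..sub_package1.module2 import spam\n"
        "def one_spam():\n"
        "    return spam()\n"
    ),
}

with tempfile.TemporaryDirectory() as tmp:
    for relative_path, source in FILES.items():
        target = Path(tmp, relative_path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(source)

    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        module4 = importlib.import_module("demo_final.sub_package2.module4")
        result = module4.one_spam()
    finally:
        sys.path.remove(tmp)

print(result)  # [module2] spam()
```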


### Creating your own package


So, the first thing to do when you want to make your own package is, more likely than not, to come up with a name for it.

This is not in the interests of vanity, but you're going to have to name the repository after it and some of the directories, so you should probably come up with it first. 

There are a few things to consider though. Python module/package names should generally follow these constraints:

* All lowercase
* Unique on PyPI, even if you don’t want to make your package publicly available (you might want to specify it privately as a dependency later)
* Unique on internal repos (Artifactory)
* Underscore-separated or no word separators at all (don’t use hyphens)
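
The formatting rules in this list (but not the uniqueness ones, which need a lookup against PyPI or your internal repo) can be checked with a short regular expression. This is a rough sketch; the function name and the exact pattern are my own choices for illustration, not an official rule.

```python
import re

# Lowercase letters, digits and underscores only, starting with a letter.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")


def looks_like_good_package_name(name: str) -> bool:
    """Return True if the name is all lowercase with no hyphens or other separators."""
    return NAME_PATTERN.fullmatch(name) is not None


print(looks_like_good_package_name("essentials_cmdb"))  # True
print(looks_like_good_package_name("Essentials-CMDB"))  # False
```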


The next thing you're probably going to do is set out your directory structure. Here is an example of the CMDB library we looked at in a previous lesson:

```bash
essentials_cmdb
├── CHANGELOG.md
├── README.md
├── credentials.example
├── essentials_cmdb
│   ├── __init__.py
│   ├── core.py
│   ├── genders.py
│   ├── hosts.py
│   └── i2csshrc.py
├── other
│   ├── make_defs.sh
│   └── resources.py
├── releaseNotes.sh
├── requirements.txt
├── setup.py
├── tests
│   └── test_core.py
└── tools
    ├── README.md
    ├── change_roles.py
    ├── cmdb-gen-localfiles.py
    ├── cmdb-tk-requirements.txt
    └── cmdb-tk.py
```

This example is probably overkill, but let's have a quick look at it

* The top level `essentials_cmdb` is the root of our git repo
* It has a subdir `essentials_cmdb` which is the actual Python package.
* The README is in the top level
* The requirements are in the top level
* It has separate tools, tests and other directories.

So let's say we wanted the least amount of setup we could get away with. We only have one file that we want to publish. We could have

```bash
my_project
├── my_project
│   ├── __init__.py
│   └── cmdb.py
└── setup.py
```

Now we have our layout, we need a way to tell python how to install the package. This is done using the setup.py. Here's a simple one we could use to install the above package:

```python
from setuptools import setup

setup(name='my_project',
      version='0.1',
      description='The best project ever',
      url='http://github.com/fakeuser/my_project',
      author='Mr. Bean',
      author_email='doc@who.com',
      license='MIT',
      packages=['my_project'],
      zip_safe=False)
```

Now you should be able to install it to your system with

```bash
pip install .
```

or, as I mentioned in the last lesson, you can install it in editable mode. This links the installed package back to your source directory, so you can keep editing the code without reinstalling.

```bash
pip install -e .
```
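
Once the install has finished, one way to confirm that Python can see the package is to ask for its installed version. The sketch below uses the standard library's `importlib.metadata` (available from Python 3.8). The helper name `installed_version` is hypothetical, and `my_project` will only be found if you have actually installed it.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version from setup.py (e.g. 0.1) if installed, otherwise None.
print(installed_version("my_project"))
```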

You should probably be installing these in a virtual environment at this stage. You're going to iterate over the install of this package a lot. If it's done inside a virtual env, you can easily delete the dir and not leave a lot of mess lying around your OS.
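
Virtual environments are normally created from the command line, but the standard library's `venv` module can do roughly the same thing from Python, which makes the "it's just a directory" point easy to see. A minimal sketch, assuming the `venv` module is available on your system; `with_pip=False` is used only to keep the demo quick.

```python
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp, "venv")

    # Create the environment; a real one would normally include pip.
    venv.create(env_dir, with_pip=False)

    # The whole environment lives inside this one directory.
    config_exists = (env_dir / "pyvenv.cfg").is_file()

# Leaving the with block deletes the directory, and the environment with it.
environment_removed = not env_dir.exists()

print(config_exists, environment_removed)
```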