<figure>
   <IMG SRC="https://mamba-python.nl/images/logo_basis.png" WIDTH=125 ALIGN="right">
</figure>

# Modules and Packages
_by Onno Ebbens_
<hr>

This notebook gives a brief introduction to building Python modules and packages. 

### Content<a id="top"></a>
1. [Packages & modules](#1)
2. [Module structure](#structuur_mod)   
3. [Package structure](#structuur_pack)
4. [Package installation](#installatie)
5. [Answers](#Antwoorden)

## 1. [Packages & modules](#top)<a id="1"></a>

The terms packages and modules are often used interchangeably, however there is an important difference:
- A module is a file with Python code and a .py extension.
- A package is a directory with at least an `__init__.py` file and usually other .py files and/or directories. A Python package consists of one or more Python modules.

Both packages and modules can be imported into a Python script using the `import` statement.

### import module
We import the module `example_module.py`. This file contains the code:
```python
def my_add(argument1, argument2):
    """
    adds two input arguments.

    Parameters
    ----------
    argument1 : int, float, str
        input argument 1
    argument2 : int, float, str
        input argument 2

    Returns
    -------
    result : int, float or str
        the two added input arguments
    """
    result = argument1 + argument2
    return result
```

In [5]:
import example_module

Now that we've imported the module we can use the function defined in the module.

In [6]:
example_module.my_add(5,10)

15

If you import a module using `import <module name>`, Python looks for a file named `<module name>.py`. Python always searches a number of predefined directories, which are listed in `sys.path`. The file `example_module.py` is in one of the directories listed in `sys.path`, so Python is able to import it.

You can easily list the directories in `sys.path`.

In [None]:
import sys
sys.path



**Attention!**<br>
If you import a module, Python searches the `sys.path` directories one by one. The moment it finds a .py file with the requested name, Python stops searching and imports that module. If you have multiple modules with the same name, it can be hard to know which module from which directory was imported. Therefore it is advised to choose a unique name for your module. You can also find the path of an imported module using the `__file__` attribute of the module.

In [None]:
print(f'module path is -> {example_module.__file__}')
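The search order described above can be demonstrated with a small self-contained sketch. It creates two temporary directories that each contain a module with the same, hypothetical name `dup_mod`, puts both on `sys.path` and shows which one is imported:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Two temporary directories, each with a module called dup_mod.py (hypothetical name)
dir_a = Path(tempfile.mkdtemp())
dir_b = Path(tempfile.mkdtemp())
(dir_a / "dup_mod.py").write_text("WHERE = 'a'\n")
(dir_b / "dup_mod.py").write_text("WHERE = 'b'\n")

# Python searches sys.path from front to back and stops at the first match
sys.path.insert(0, str(dir_b))
sys.path.insert(0, str(dir_a))  # dir_a now comes before dir_b
importlib.invalidate_caches()   # make sure the freshly written files are noticed

import dup_mod

print(dup_mod.WHERE)     # 'a': the copy in the first matching directory wins
print(dup_mod.__file__)  # shows which file was actually imported
```

Swapping the order of the two `sys.path.insert` calls would make the copy in `dir_b` win instead.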

### Import package
Importing a package is very similar to importing a module. When you import a package using `import <package name>` Python will look for a directory named `<package name>` with a module `__init__.py` inside. Similar to modules, Python will look for the package in the directories listed in `sys.path`.

In the cell below we import a package named `somepackage`. Because a folder named `somepackage` is not available in any of the directories listed in `sys.path`, we first add the directory that contains it (`example_package`) using `sys.path.append('example_package')`. This relative path is resolved against the current working directory, which in Jupyter Notebook is normally the folder of the notebook. After we've imported the package we call the `my_add` function defined in the package.

In [7]:
sys.path.append('example_package')
import somepackage

In [8]:
somepackage.my_add(5, 10)

15
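To see that a package is essentially a directory with an `__init__.py` file, the sketch below builds a tiny package on the fly in a temporary directory (the names `emptypkg` and `plain_mod` are hypothetical). It also shows one practical difference: an imported package has a `__path__` attribute, a plain module does not.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())  # plays the role of the 'example_package' directory
(root / "emptypkg").mkdir()      # hypothetical package directory
(root / "emptypkg" / "__init__.py").write_text("GREETING = 'hello'\n")
(root / "plain_mod.py").write_text("GREETING = 'hi'\n")  # hypothetical plain module

sys.path.append(str(root))
importlib.invalidate_caches()

import emptypkg
import plain_mod

print(emptypkg.__file__)               # .../emptypkg/__init__.py
print(hasattr(emptypkg, '__path__'))   # True: packages have a __path__
print(hasattr(plain_mod, '__path__'))  # False: plain modules do not
```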

## 2. [Module structure](#top)<a id="structuur_mod"></a>

You can define variables, functions and classes in a Python module. The module `example_module` that we've imported earlier only contains a single function. When you create a module it is important to think about its structure, because there are many ways to organise the code inside a module. There are also some conventions regarding the module structure:
- Put all the import statements at the top of the .py file. This way it is clear which packages, classes, functions and submodules will be used.
- Create a [docstring](https://www.python.org/dev/peps/pep-0257) for each function and class inside the module.
- The name of a module should preferably be [short and lower case](https://www.python.org/dev/peps/pep-0008/#package-and-module-names).

#### Exercise 1 <a name="opdr1"></a>

Create a module with a function that uses the `numpy.mean` function to determine the average of every column in this array:

`[[1. 5. 8. 9.]
  [9. 4. 3. 1.]]`
  
Make sure you get this result:

`[5.  4.5 5.5 5. ]`

Import your module and call the function. You should create the module in a text editor (e.g. Notepad) outside of Jupyter Notebook.

In [9]:
import numpy as np
# use this array to test your function.
arr = np.array([[1., 5., 8., 9.],[9., 4., 3., 1.]])

# import module and call your function

<a href="#antw1">Answer exercise 1</a>

When a module is imported, all code inside the module is run. Therefore a module should not contain parts of a script, only variable, function and class definitions. Any script code that remains in the module is executed every time the module is imported.

Below we import the module `example_module2`. This module still contains a bit of script code (`print(my_add(2,4))`), so when we import the module this code is evaluated and the integer 6 is printed.

In [10]:
import example_module2

6


#### Bonus exercise 1 <a name="bonus1"></a>

For this exercise you need to do some research for yourself. You don't need to answer this exercise to understand the rest of this notebook. Nevertheless, it can be useful to gain some background knowledge.

Modify the file `example_module2.py` in such a way that `print(my_add(2,4))` is only evaluated when the module is run as the main program (as a script) and not when it is imported. Use the code `if __name__ == '__main__':` and [this stackoverflow question](https://stackoverflow.com/questions/419163/what-does-if-name-main-do).

Tip: If you try to import a module that has already been imported, Python will not import it again but reuse the previously imported module from memory. The code in the module will not be evaluated again. If you modify the module you should restart the kernel before importing again to see if the modification was done correctly.
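The caching described in the tip can be observed directly. The sketch below writes a hypothetical module `noisy_mod` that only prints a line, imports it twice and then re-executes it with `importlib.reload` from the standard library. `importlib.reload` can be an alternative to restarting the kernel for a single module, but reloading a package does not reload its submodules, so restarting the kernel remains the safer option in this notebook.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

# A hypothetical module whose only top-level code is a print statement
tmp = Path(tempfile.mkdtemp())
(tmp / "noisy_mod.py").write_text("print('top-level code is running')\n")
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    import noisy_mod              # first import: the file is executed
    import noisy_mod              # second import: reused from sys.modules, nothing printed
    importlib.reload(noisy_mod)   # explicit reload: the file is executed again

print(buffer.getvalue().count('top-level code is running'))  # 2
print('noisy_mod' in sys.modules)                            # True
```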

<a href="#antwbonus1">Answer bonus exercise 1</a>

#### Dependencies

In the module that you've created for the first exercise you call a function from the `numpy` package. The module has become dependent on the `numpy` package, so we say that `numpy` is a dependency of the module.

When you create modules and packages it is important to think about the dependencies because:
- Everyone who uses your module/package has to install its dependencies. Some packages are hard to install. If your module/package depends on such a package, your module/package becomes hard to install as well.
- Packages are continuously modified and updated. Changes to your dependencies could break your module/package. So the more dependencies you have, the more time it takes to keep your package working with their newest versions.
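Because of these points it can help to check which versions of the dependencies are actually present in an environment. A minimal sketch using `importlib.metadata` from the standard library (Python 3.8 or newer); the helper name `installed_versions` is our own:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(dependencies):
    """Return a dict {name: installed version, or None if not installed}."""
    result = {}
    for name in dependencies:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None  # not installed in this environment
    return result


print(installed_versions(['numpy', 'wordcloud', 'surely-not-installed-xyz']))
```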

## 3. [Package structure](#top)<a id="structuur_pack"></a>

The structure of a Python package determines how you can call functions and classes from within the package. The package `somepackage` that we've imported earlier has the following structure:

```
somepackage/
    __init__.py
    add.py
    shout.py
    version.py
```

#### `__init__.py`
The `__init__.py` file in the "somepackage" directory is executed when the package is imported; you can think of it as the constructor of the package. It determines which modules, functions, classes and variables are directly available under the package name. Our `__init__.py` file contains this code:

```
01    from .version import __version__
02    from .add import my_add
03    from . import shout
```

This means that the package contains a variable (`__version__`), a function (`my_add`) and a module (`shout`). The leading dot (`.`) makes these relative imports: Python looks for the modules in the same package (the directory containing the `__init__.py` file) instead of searching all the directories listed in `sys.path`. Using the dot (`.`) notation is recommended because it makes explicit that the imported modules belong to this package. Below we explain line by line what happens in the `__init__.py` file.

**`01    from .version import __version__`**

This line imports the `__version__` variable from the `version` module. The variable becomes an attribute of the package and can be accessed using `somepackage.__version__`. This is a widely used convention for storing the version of a package.

In [11]:
print(f'somepackage version is {somepackage.__version__}')

somepackage version is 1.2.3


In [12]:
# Many Python packages use the __version__ attribute
import numpy as np
print(f'numpy version is {np.__version__}')
import re
print(f're version is {re.__version__}')

numpy version is 1.22.3
re version is 2.2.1


**`02    from .add import my_add`**

We import the `my_add` function from the `add` module. Afterwards we can call the function as `somepackage.my_add`.

In [13]:
somepackage.my_add(5,10)

15

**`03    from . import shout`**

We import the `shout` module. The `shout` module becomes a submodule of `somepackage`. The function `shout_and_repeat`, which is defined in the `shout` module, can be called using the code below.

In [15]:
somepackage.shout.shout_and_repeat("what a lovely package it is ")

'WHAT A LOVELY PACKAGE IT IS WHAT A LOVELY PACKAGE IT IS '
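The three kinds of relative imports used in the `__init__.py` of `somepackage` can be reproduced in a self-contained sketch. It writes a hypothetical package `minipkg` with the same layout to a temporary directory (the module contents are simplified stand-ins, not the actual files of `somepackage`), adds that directory to `sys.path` and imports the package:

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "minipkg"  # hypothetical package name
pkg.mkdir()
(pkg / "version.py").write_text("__version__ = '0.1.0'\n")
(pkg / "add.py").write_text("def my_add(a, b):\n    return a + b\n")
(pkg / "shout.py").write_text("def shout(text):\n    return text.upper()\n")
# Same three styles as somepackage: a variable, a function and a whole submodule
(pkg / "__init__.py").write_text(
    "from .version import __version__\n"
    "from .add import my_add\n"
    "from . import shout\n"
)

sys.path.append(str(root))
importlib.invalidate_caches()
import minipkg

print(minipkg.__version__)        # 0.1.0
print(minipkg.my_add(5, 10))      # 15
print(minipkg.shout.shout("hi"))  # HI (function reached through the submodule)
```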

#### Exercise 2 <a name="opdr2"></a>

The `somepackage` directory contains a file named `visualise.py`. In this file we define the function `make_wordcloud`. Add the `visualise` module to `somepackage` so you can create a wordcloud with the code below.

Tip: If you modify a package and want to import it again you should restart the kernel. When you forget to do this the old version of the package remains in memory and the modified code is not available.

In [None]:
%matplotlib inline
a = somepackage.shout.shout_and_repeat("what a lovely package it is ")
tst = somepackage.visualise.make_wordcloud(a);

<a href="#antw2">Answer exercise 2</a>

#### Exercise 3<a name="opdr3"></a>
Repeat exercise 2, but this time add only the `make_wordcloud` function to `somepackage` instead of the full `visualise` module. You should be able to get the same result using the code below.

In [None]:
%matplotlib inline
a = somepackage.shout.shout_and_repeat("what a lovely package it is ")
tst = somepackage.make_wordcloud(a);

<a href="#antw3">Answer exercise 3</a>

#### Functions or modules?

In the previous exercises you've seen that there are several ways to add a function to a package: you can add the whole module as a submodule, or add only a specific function. Which way is best depends on how you want the package to be used.

When you create a package with the goal of performing a single operation, it is useful to import that function explicitly in `__init__.py`. The `wordcloud` package is an example of this.

There are also many packages that provide a set of tools. An example of that is the `numpy` package. In such a package people often add a whole set of functions and submodules in the `__init__.py` file.

In the end it is up to you to create a clear structure for your package. 

## 4. [Installation](#top)<a id="installatie"></a>

Previously we imported `somepackage` without ever installing it. When a package is used more often, from multiple scripts or by multiple people, it is useful to install it. Installing a package means that:
1. The package is added to a central directory with packages. This directory is one of the directories listed in `sys.path`, so that we can later import the package without modifying `sys.path`.
2. The dependencies of the package are installed as well. It is possible to specify a certain version of a dependent package that should be installed.
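A requirement such as `matplotlib>=3.0` means that any installed version of at least 3.0 is accepted (in the output below, matplotlib 3.1.3 satisfies it). The sketch below is a simplified illustration of such a check for plain `X.Y.Z` version strings; pip itself follows the more elaborate PEP 440 rules, including pre-releases and other suffixes that this sketch does not handle. It also shows why versions should not be compared as plain strings:

```python
def parse_version(text):
    """Turn a plain 'X.Y.Z' version string into a tuple of ints (simplified)."""
    return tuple(int(part) for part in text.split('.'))


def satisfies_minimum(installed, minimum):
    """Simplified check of a '>=' requirement such as 'matplotlib>=3.0'."""
    return parse_version(installed) >= parse_version(minimum)


print(satisfies_minimum('3.1.3', '3.0'))     # True
print(satisfies_minimum('1.10.0', '1.9.0'))  # True: 10 > 9 when compared as numbers
print('1.10.0' >= '1.9.0')                   # False: strings compare character by character
```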

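Installation of a package is often done using `pip` (Package Installer for Python). A package that can be installed using pip traditionally contains a `setup.py` file (newer projects may describe the same information in a `pyproject.toml` file instead). The directory `example_package` contains an example of a `setup.py` file. To install `somepackage` using this `setup.py` file, we navigate in the (Anaconda) prompt to the directory `example_package` and type `pip install -e .`. The `-e` flag installs the package in editable (development) mode: pip does not copy the code but makes the source directory importable, so later changes to the code are picked up without reinstalling (after restarting the kernel). If everything was done correctly we get this output in the (Anaconda) prompt: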

```
Obtaining file:///C:/Users/onno_/02_git_repos/course-material/practical_examples/08_modules_and_packages/example_package
Requirement already satisfied: matplotlib>=3.0 in c:\anaconda3\lib\site-packages (from somepackage==1.2.3) (3.1.3)
Requirement already satisfied: wordcloud>=1.8.1 in c:\anaconda3\lib\site-packages (from somepackage==1.2.3) (1.8.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in c:\anaconda3\lib\site-packages (from matplotlib>=3.0->somepackage==1.2.3) (2.4.6)
Requirement already satisfied: python-dateutil>=2.1 in c:\anaconda3\lib\site-packages (from matplotlib>=3.0->somepackage==1.2.3) (2.8.1)
Requirement already satisfied: numpy>=1.11 in c:\anaconda3\lib\site-packages (from matplotlib>=3.0->somepackage==1.2.3) (1.18.1)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\anaconda3\lib\site-packages (from matplotlib>=3.0->somepackage==1.2.3) (1.1.0)
Requirement already satisfied: cycler>=0.10 in c:\anaconda3\lib\site-packages (from matplotlib>=3.0->somepackage==1.2.3) (0.10.0)
Requirement already satisfied: pillow in c:\anaconda3\lib\site-packages (from wordcloud>=1.8.1->somepackage==1.2.3) (7.0.0)
Requirement already satisfied: six>=1.5 in c:\anaconda3\lib\site-packages (from python-dateutil>=2.1->matplotlib>=3.0->somepackage==1.2.3) (1.14.0)
Requirement already satisfied: setuptools in c:\anaconda3\lib\site-packages (from kiwisolver>=1.0.1->matplotlib>=3.0->somepackage==1.2.3) (45.2.0.post20200210)
Installing collected packages: somepackage
  Running setup.py develop for somepackage
Successfully installed somepackage
```

The output above can be explained by having a closer look at the `setup.py` file. In line 27 we define two dependencies, `matplotlib` and `wordcloud`. When installing `somepackage`, pip first checks whether these dependencies are already installed, as we can see in the output in the (Anaconda) prompt. In our case both packages were already installed, so we get the notifications `Requirement already satisfied: matplotlib...` & `Requirement already satisfied: wordcloud...`.

After this we see a number of checks for other packages. These packages are dependencies of `matplotlib` and `wordcloud` (or dependencies of the dependencies of `matplotlib`). So when we install a package, all its dependencies are checked and installed if necessary. In our case all the dependencies were already installed, so we repeatedly get the message `Requirement already satisfied: ...`.

The last package that will be installed is `somepackage` itself. When everything went smoothly we get:

```
Installing collected packages: somepackage
  Running setup.py develop for somepackage
Successfully installed somepackage
```

Now the package is installed and we can import the package from every Python script or Jupyter Notebook.
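To check where Python will import a package from, without actually importing it, you can use `importlib.util.find_spec` from the standard library. A small sketch (the helper name `where_is` is our own); for an editable install of `somepackage` the printed path typically points into the `example_package` directory:

```python
import importlib.util


def where_is(name):
    """Return the file a top-level module/package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


print(where_is('somepackage'))  # path to somepackage/__init__.py, or None if not found
print(where_is('json'))         # a standard library package, for comparison
```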

#### Exercise 4 <a name="opdr4"></a>
Create a new module with the function definition of `check_sentiment` shown below. Add this function to `somepackage`. Add the required dependencies to `setup.py`. Update the package version number to `1.2.4`, install the new package and check if the required dependencies are installed correctly. Finally check if you can run the code below successfully.

In [16]:
from textblob import TextBlob

def check_sentiment(text):
    '''
    checks the polarity and subjectivity of a message,
    a polarity > 0 indicates a positive message, 
    a polarity < 0 indicates a negative message
    
    Parameters
    ----------
    text : str
        text to analyse
        
    Returns
    -------
    textblob.en.sentiments.Sentiment
        sentiment analysis of text
    '''
    
    testimonial = TextBlob(text)
    return testimonial.sentiment

In [17]:
# code to check if the package is modified correctly
import somepackage
print(somepackage.check_sentiment("This package is amazing!"))
print(somepackage.check_sentiment("This package is awful!"))

AttributeError: module 'somepackage' has no attribute 'check_sentiment'

<a href="#antw4">Answer exercise 4</a>

## [Answers](#top)<a id="Antwoorden"></a>

#### <a href="#opdr1">Answer exercise 1</a> <a name="antw1"></a>

Create a text file with a .py extension, e.g. `numpy_func.py`. Make sure this file is in the same directory as this notebook. Create a function in this file that calculates the mean of each column in an array, e.g.:

```python
import numpy as np


def column_mean(arr):
    """Calculate the mean of each column in a 2d array.

    Parameters
    ----------
    arr : np.array
        2d numpy array.

    Returns
    -------
    mean_col : np.array
        1d numpy array with the mean for each column.
    """
    mean_col = np.mean(arr, axis=0)
    return mean_col
```

Now you can import the module in this Jupyter Notebook and call the function.

In [18]:
import numpy as np
# use this array to test.
arr = np.array([[1., 5., 8., 9.],[9., 4., 3., 1.]])

# import module and call function
import numpy_func
numpy_func.column_mean(arr)


#### <a href="#bonus1">Answer bonus exercise 1</a> <a name="antwbonus1"></a>



The file `example_module2.py` should look like this:
    
```python
def my_add(argument1, argument2):
    """
    adds two input arguments.

    Parameters
    ----------
    argument1 : int, float, str
        input argument 1
    argument2 : int, float, str
        input argument 2

    Returns
    -------
    result : int, float or str
        the two added input arguments
    """
    result = argument1 + argument2
    return result


if __name__ == '__main__':
    print(my_add(2, 4))
```

After you've modified `example_module2.py`, restarted the kernel and imported the module again, it no longer prints the number 6. When you run the [example_module2.py](./example_module2.py) file separately, it does print the number 6. If you don't know how to run a .py file separately, have a look at this [Stack Overflow post](https://stackoverflow.com/questions/39995380/how-to-use-anaconda-python-to-execute-a-py-file). 

The `if __name__ == '__main__':` construction is often used to test the functions in a module. If you run the module as a script, the code inside the `if` block (for instance a test call of a function) is executed; if you import the module, the functions are only defined. For simple modules this is often a good solution. For more complex modules it is more useful to keep the code and the tests separate.
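The difference between running and importing can also be reproduced from Python itself with `runpy.run_path` from the standard library, which executes a file under a chosen `__name__`. The sketch writes a copy of the guarded module to a temporary directory (the file name `guarded_mod.py` is hypothetical) and runs it once as `'__main__'` and once under its module name:

```python
import contextlib
import io
import runpy
import tempfile
from pathlib import Path

module_file = Path(tempfile.mkdtemp()) / "guarded_mod.py"
module_file.write_text(
    "def my_add(a, b):\n"
    "    return a + b\n"
    "\n"
    "if __name__ == '__main__':\n"
    "    print(my_add(2, 4))\n"
)


def run_and_capture(run_name):
    """Execute the file with __name__ set to run_name; return (stdout, globals)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        namespace = runpy.run_path(str(module_file), run_name=run_name)
    return buffer.getvalue(), namespace


out_script, _ = run_and_capture('__main__')   # behaves like running the file
out_import, ns = run_and_capture('guarded_mod')  # behaves like importing it
print(repr(out_script))    # '6\n'
print(repr(out_import))    # '': the guarded print is skipped
print(ns['my_add'](2, 4))  # 6: the function is still defined
```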

#### <a href="#opdr2">Answer exercise 2</a> <a name="antw2"></a>

Add the code `from . import visualise` to the file `__init__.py`.

The file will look like this:

```
from .version import __version__
from .add import my_add
from . import shout
from . import visualise
```

Restart the kernel and import `somepackage`. You can now run the code from exercise 2.

In [None]:
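%matplotlib inline
import sys
sys.path.append('example_package')  # not needed once somepackage is installed
import somepackage
a = somepackage.shout.shout_and_repeat("what a lovely package it is ")
tst = somepackage.visualise.make_wordcloud(a);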

#### <a href="#opdr3">Answer exercise 3</a> <a name="antw3"></a>

Add the code `from .visualise import make_wordcloud` to the file `__init__.py`.

The file will look like this:

```
from .version import __version__
from .add import my_add
from . import shout
from .visualise import make_wordcloud
```

Restart the kernel and import `somepackage`. You can now run the code from exercise 3.

In [None]:
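%matplotlib inline
import sys
sys.path.append('example_package')  # not needed once somepackage is installed
import somepackage
a = somepackage.shout.shout_and_repeat("what a lovely package it is ")
tst = somepackage.make_wordcloud(a);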

#### <a href="#opdr4">Answer exercise 4</a> <a name="antw4"></a>

Take the following steps (in any order):
- Create a new module, e.g. `text_analysis.py`. Make sure this file is in the `somepackage` directory. Copy the function definition to this file. The file will look like this:
    ```
    from textblob import TextBlob

    def check_sentiment(text):
        '''
        checks the polarity and subjectivity of a message,
        a polarity > 0 indicates a positive message, 
        a polarity < 0 indicates a negative message

        Parameters
        ----------
        text : str
            text to analyse

        Returns
        -------
        textblob.en.sentiments.Sentiment
            sentiment analysis of text
        '''

        testimonial = TextBlob(text)
        return testimonial.sentiment
    ```
- Add the line `from .text_analysis import check_sentiment` to the `__init__.py` file. Replace `text_analysis` with the name of your module. The file will look like this:
    ```

    from .version import __version__
    from .add import my_add
    from . import shout
    from .visualise import make_wordcloud
    from .text_analysis import check_sentiment
    ```
- Modify the `setup.py` file. Add an extra dependency to line 27: `'textblob>=0.15.3'`. The `setup.py` file will look like this:

    ```
    from setuptools import setup
    import os
    import sys

    _here = os.path.abspath(os.path.dirname(__file__))

    if sys.version_info[0] < 3:
        with open(os.path.join(_here, 'README.rst')) as f:
            long_description = f.read()
    else:
        with open(os.path.join(_here, 'README.rst'), encoding='utf-8') as f:
            long_description = f.read()

    version = {}
    with open(os.path.join(_here, 'somepackage', 'version.py')) as f:
        exec(f.read(), version)

    setup(
        name='somepackage',
        version=version['__version__'],
        description=('Show how to structure a Python project.'),
        long_description=long_description,
        author='Onno Ebbens',
        author_email='onno.ebbens@mamba-python.nl',
        license='MPL-2.0',
        packages=['somepackage'],
        install_requires=['matplotlib>=3.0','wordcloud>=1.8.1', 'textblob>=0.15.3'],
        include_package_data=True,
        classifiers=[
            'Development Status :: 5 - Production/Stable',
            'Intended Audience :: Science/Research',
            'Programming Language :: Python :: 2.7',
            'Programming Language :: Python :: 3.6'],
        )
    ```
- Modify the `version.py` file. Increase the version number from 1.2.3 to 1.2.4; the file will look like this:
    ```
    __version__ = '1.2.4'
    ```
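The `setup.py` file above reads the version number by executing `version.py` with `exec` instead of importing `somepackage`. A likely reason is that importing the package inside `setup.py` would fail on a machine where its dependencies (such as `wordcloud`) are not installed yet. The sketch below reproduces that trick on a temporary copy of `version.py`:

```python
import tempfile
from pathlib import Path

# A temporary stand-in for example_package/somepackage/version.py
pkg_dir = Path(tempfile.mkdtemp()) / "somepackage"
pkg_dir.mkdir()
(pkg_dir / "version.py").write_text("__version__ = '1.2.4'\n")

# Same approach as setup.py: run the file and collect its variables in a dict
version = {}
with open(pkg_dir / "version.py") as f:
    exec(f.read(), version)

print(version['__version__'])  # 1.2.4
```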
    
Finally take these steps (in this order):
1. Navigate in the (Anaconda) prompt to the directory "example_package" and install `somepackage` again using `pip install -e .`. Check that `textblob` appears in the installation output. 
2. Restart the kernel
3. Run the code from the exercise.
4. Celebrate your success!

In [None]:
# code to check if the modified package works
import somepackage
print(somepackage.check_sentiment("This package is amazing!"))
print(somepackage.check_sentiment("This package is awful!"))

## Acknowledgement

The following sources were used to create this notebook:
- https://github.com/bast/somepackage
