# SLU14 - Modules and Packages 

In this notebook we will be covering the following:   
 
- Modules   
- Packages

## 1 Modules Definition

A software project can have tons of components. Some of those components might depend on each other to execute tasks. 
When projects start to get big, if we don't organize our code well, it can get very hard to debug, maintain and scale. A task that would take a day can easily take a week if the code is not well structured, wasting a lot of time and money. 
To avoid such situations, we should plan our project ahead. It should be organized by component. The components should be independent from each other and have no duplicated code. 

One way of organizing our code is to divide it into modules. We can think of a module as a "sub-program". Each module should be responsible for one function of the software, and modules can depend on each other to execute a bigger task. 

Think about this analogy. We want to build a circuit that is able to turn a LED on and off. For that, we need 4 components:   
- __Battery__, our source of electricity;   
- __Resistor__, so that our LED doesn't get burned; believe me, we need that;
- __Switch__,  to turn the light on and off;
- __LED__, to see the light;

<img src="data/circuit.png" width="300" height="150" >

All these pieces are responsible for one distinct function, but all of them combined enable us to achieve a final goal, which is to create light!

<img src="data/light.jpg" width="300" height="150" >

With this, we can think about components of a circuit as distinct modules that can be used and combined with other components/modules, in a circuit/software, to achieve a goal.

## 2 Packages Definition

A package is a set of modules, dedicated to a single functionality, that can be installed and used in other software.    

Imagine that we want to produce battery LED lamps that look like the following: 

<img src="data/lamp.jpg" width="300" height="150" >

In order to produce it, we will need the following elements:   

__Electrical components__    

- Battery;
- Switch;
- LED;
- Resistor;

__Raw Materials__   

- Metal;
- Plastic; 

We have multiple options here. We can make our elements from scratch or we can buy them from providers.

It is far more practical to go to one provider that is an expert in developing `electrical components` and buy, ready to use, all the elements we need to build the electric system: `batteries`, `switches`, `LEDs` and `resistors`. We can then go to a different provider to buy the `raw materials` needed to support the electric system: `metal` and `plastic`. 

Think of the set of tasks that we need to do in order to build our `lamp` as a piece of software that we are developing. In that sense, `electrical components` and `raw materials` are the `packages` that we need to install in our environment in order to use the sets of `modules` {`battery`, `switch`, `LED`, `resistor`} and {`metal`, `plastic`}, respectively. These modules are ready to use; the only code we need to write ourselves is what integrates all these components. 

## 3 Modules in Python

### 3.1 Definition

When our programs start to scale, we should split their logical parts into separate files to keep them organized and easier to maintain. The split should avoid duplicated code and let each part be imported and used in other files.

A python module is nothing more than a file with the `.py` extension, i.e. a python file. 

In order to use the code inside each module in other pieces of software, our code inside the module should be organized in attributes. Attributes of a module are classes, functions and variables.

These attributes of modules can be imported and used in other pieces of software.      

### 3.2 Types of Modules

We can classify modules in 3 types:

#### Type 1: Modules that belong to our current project    
Modules that are stored in the same directory you are working in. Because of that, they are only available to the project you are working on. Normally these are the modules that we develop ourselves and that belong to the scope of the project.  
__Location:__ The current directory of the project where we want to use the module.

#### Type 2: Modules stored on PYTHONPATH
Modules that are stored outside the project, in one of the directories on python's module search path, such as the directories assigned to the shell variable `PYTHONPATH` or the `site-packages` folder of a python environment. In python environments, modules are managed by `pip`. These modules are available in the environment in which they were installed. If you switch projects but keep working in the same environment, these modules are still available. These modules are usually external dependencies and can be used in different types of projects.  
__Location:__ The directories assigned to the shell variable `PYTHONPATH`, or the `site-packages` folder of the active environment.  

#### Type 3: Modules that come with python
Modules that ship with python (the standard library) and are common to all python environments that share the same version. We don't need to do anything to have them available.  
__Location:__ The default directory of python, normally `/usr/local/lib/python`.  

### 3.3 Import Statement

In order to use a module's attributes in our code, whether in notebooks or in other python modules, we need to import those attributes into the file we are working on.

You have seen a lot of `import` statements so far. Import statements are used to find and run a python module so it can be used in the current code.

To maximize clarity and readability, you should always write your import statements at the beginning of the file. 

When we type an import statement, python searches for that module first in the current directory of the file where we have the import, then in the PYTHONPATH and finally in the python default directory.    
The logic of the import statement is the same, regardless of the module's location.
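
The search order described above is stored in the list `sys.path`. A minimal sketch for inspecting it, assuming a standard CPython setup (the exact entries depend on your machine and environment): the first entry is the directory of the running script, or an empty string in an interactive session, followed by any `PYTHONPATH` entries and then the installation defaults.

```python
import sys

# sys.path is an ordinary list of directory strings, searched in order.
for position, directory in enumerate(sys.path):
    print(position, repr(directory))

search_path_is_list = isinstance(sys.path, list)
```

Because it is a plain list, the order of the entries is exactly the order in which python looks for a module.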

We will use the __type 1__ modules to explain import statements.

Let's use the Unix `tree` command to see which folders and files share the directory of this `Learning notebook.ipynb`.

In [54]:
#unix command
!tree

.
├── data
│   ├── circuit.png
│   ├── lamp.jpg
│   └── light.jpg
├── Exercise notebook.ipynb
├── folder_v2
│   └── module_v2.py
├── folder_v3
│   └── sub_folder_v3
│       └── module_v3.py
├── Learning notebook.ipynb
├── library
│   └── shelf
│       └── book.py
├── module_v1.py
├── module_v4.py
├── README.md
├── requirements.txt
├── shapes
│   ├── circle.py
│   ├── __init__.py
│   ├── LICENSE
│   ├── README.md
│   ├── setup.py
│   └── square.py
└── utils.py

7 directories, 19 files


And the code inside files `module_v1.py`, `module_v2.py` and `module_v3.py` is the following (these files have the same content, so we will just print one of them):

In [2]:
# this unix command prints what is inside a file
!cat module_v1.py

class House:

    def __init__(self, price, area, location):
        self.price = price
        self.area = area
        self.location = location

    def calculate_price_square_meter(self):
        return self.price/self.area

def avg_price(list_houses, location):
    count = 0
    sum_price = 0
    for house in list_houses:
        if house.location == location:
            count+=1
            sum_price+=house.calculate_price_square_meter()
    return sum_price/count

location = "Coimbra"  
 


These files have the `.py` extension, which means that they are python modules: `module_v1`, `module_v2` and `module_v3`, respectively.

Those modules have as attributes a class `House`, a function `avg_price` and a variable `location`. 

#### 3.3.1 Import module_name

If we want to use all the attributes from `module_v1` (which is inside the same directory as this notebook), we need to import the module in the following way:

In [3]:
import module_v1

Now we can use the attributes inside the file in the following way:

In [4]:
house_coimbra_n1 = module_v1.House(100000, 100, "Coimbra")
house_coimbra_n2 = module_v1.House(130000, 125, "Coimbra")

In [5]:
# now let's calculate the house prices in Coimbra
houses_coimbra = [house_coimbra_n1, house_coimbra_n2]
module_v1.avg_price(houses_coimbra, module_v1.location)

1020.0

In [6]:
# print location
module_v1.location

'Coimbra'

In [7]:
# let's remove the module so we can import the module again with another syntax
del module_v1
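
One caveat about `del`: it only removes the name from the current namespace. The module object itself stays cached in `sys.modules`, so a later `import` reuses it instead of running the file again. A minimal sketch, using the standard-library `json` module as a stand-in for `module_v1`:

```python
import sys

import json

# Keep a second reference so we can compare objects after re-importing.
first_import = json

# `del` unbinds the name `json` here; it does not unload the module.
del json
still_cached = "json" in sys.modules

# Importing again hands back the cached module object.
import json
same_object = json is first_import

print(still_cached, same_object)
```

This is why deleting and re-importing is enough to switch import syntax in this notebook, but would not pick up changes made to the `.py` file in the meantime.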

#### 3.3.2 Import module_name as other_name

If, for better readability, you prefer to refer to your module as `module` instead of `module_v1`, you can use the `as` keyword to create an __alias__:

In [8]:
import module_v1 as module

In [9]:
module.location

'Coimbra'

As we can see, it still works in the same way, but now the module has a different name.

In [10]:
# let's remove the module so we can import the module again with another syntax
del module

#### 3.3.3 from module_name import attribute

We can also explicitly import only the attributes we want to use. If we only want to use the class `House` in our code, we can simply do:

In [11]:
from module_v1 import House

In [12]:
# now we don't need to write module_v1.House, just House
# (area is passed as a number so that calculate_price_square_meter can divide by it)
house_coimbra_n3 = House(200000, 1200, "Coimbra")
house_coimbra_n4 = House(300000, 2200, "Coimbra")

But if we try to use `location` or `avg_price`, it will fail.

In [13]:
try:
    print(avg_price([house_coimbra_n3, house_coimbra_n4]))
except Exception as e:
    print("Error: " + str(e))

try:
    print(location)
except Exception as e:
    print("Error: " + str(e))

Error: name 'avg_price' is not defined
Error: name 'location' is not defined


From the cell above we can see that `avg_price` and `location` were not defined.

In [14]:
# let's remove the attribute imported from module_v1
del House

#### 3.3.4 from module_name import *

Instead of explicitly importing each attribute of a module, we can also import all the attributes inside the module at once. However, this is considered bad practice: with several of these statements, we lose track of what is being imported and where it comes from.

In [15]:
# bad practice
from module_v1 import *

In [16]:
location

'Coimbra'

In the case above, we are importing all the attributes (classes, functions and variables) that are declared in the `module_v1.py` file. 
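
To see why star imports make it hard to track where a name comes from, here is a small sketch using two standard-library modules, `math` and `cmath`, which both define `sqrt`. They are only stand-ins; the same thing happens with any two modules that share an attribute name.

```python
from math import *   # brings in sqrt, pi, ...
from cmath import *  # silently replaces sqrt with the complex-number version

# Nothing on this line tells us which sqrt we are calling.
result = sqrt(4)
print(type(result).__name__)

# With explicit imports the origin of each function stays visible.
import cmath
import math

explicit_real = math.sqrt(4)
explicit_complex = cmath.sqrt(4)
```

The second star import wins without any warning, which is exactly the kind of surprise that explicit imports avoid.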

#### 3.3.5 from folder_name import module_name

As we saw before, `module_v1` is inside the same directory as this `Learning Notebook.ipynb`. But that is not the case for `module_v2` and `module_v3`. These two modules are inside child directories.
In order to access them, we need to tell python how to get there with the `from` statement and/or `.` notation. 

`module_v2` is inside `folder_v2`. The folder where the module is stored belongs to the directory of the current file (project directory). 

In [17]:
#from notation
from folder_v2 import module_v2

In [18]:
module_v2.location

'Coimbra'

In [19]:
#delete imported module
del module_v2

#### 3.3.6 Import folder_name.module_name

In [20]:
# dot notation
import folder_v2.module_v2

In [21]:
# this way is not very readable
folder_v2.module_v2.location

'Coimbra'

#### 3.3.7 from folder_name.module_name import attribute

In [22]:
# from and dot notation
from folder_v2.module_v2 import location

In [23]:
location

'Coimbra'

In [24]:
# delete location
del location

#### 3.3.8 import folder_name

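Let's now try to import the entire `folder_v2` at once. Note that importing a folder does not, by itself, load the modules inside it; `folder_v2.module_v2` is reachable below only because `module_v2` was already imported in earlier cells of this session.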

In [25]:
import folder_v2

In [26]:
folder_v2.module_v2.location

'Coimbra'

In [27]:
del folder_v2
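
In a fresh session, importing only the folder does not make its modules available as attributes. A minimal sketch of this, assuming a throwaway folder `demo_folder` containing `demo_module.py` (both names are hypothetical and created in a temporary directory, so the project files are not touched):

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: <tmp>/demo_folder/demo_module.py, no __init__.py
    folder = Path(tmp) / "demo_folder"
    folder.mkdir()
    (folder / "demo_module.py").write_text('location = "Coimbra"\n')

    # Make the temporary directory importable for the duration of the demo.
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        import demo_folder
        loaded_before = hasattr(demo_folder, "demo_module")  # False

        import demo_folder.demo_module
        loaded_after = hasattr(demo_folder, "demo_module")   # True
        location = demo_folder.demo_module.location
    finally:
        sys.path.remove(tmp)

print(loaded_before, loaded_after, location)
```

The module only becomes an attribute of the folder once it has been imported explicitly, which is why the dot or `from` notation is the safer habit.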

#### 3.3.9 from folder_name.sub_folder_name import module_name

`module_v3` is inside `sub_folder_v3`, and this folder is inside `folder_v3`. `folder_v3` belongs to the directory of the current file (project directory). 

In [28]:
# from and dot notation
from folder_v3.sub_folder_v3 import module_v3

In [29]:
module_v3.location

'Coimbra'

In [30]:
del module_v3

#### 3.3.10 import folder_name.sub_folder_name.module_name

In [31]:
# dot notation
import folder_v3.sub_folder_v3.module_v3

In [32]:
# again, this way is not very readable
folder_v3.sub_folder_v3.module_v3.location

'Coimbra'

In [33]:
del folder_v3.sub_folder_v3.module_v3

### 3.4 Check where the module is being imported from

If we are not sure what type of module we are dealing with, we can use the `__file__` attribute to check where the module is stored.

#### 3.4.1 Type 1

Let's first check `module_v1`. Like we have seen before, this is a __type 1__ module. 

In [34]:
import module_v1

In [35]:
path = module_v1.__file__ #try to print the path variable 
"/".join(path.split("/")[-3:]) #this is to ignore the path to Week 8 folder

'Week 8/SLU14/module_v1.py'

The module sits in the folder we are currently working in.   
The `path` (i.e. the absolute path) should be something close to `.../ds-prep-course-workspace/Week 8/SLU14/module_v1.py`.

If we check the directory of the file we are working on, we have the confirmation that this module belongs to the directory of our current file.

#### 3.4.2 Type 2

The second type of modules are the ones that are stored on `PYTHONPATH`. This type includes the ones that we manage with pip using the command line.    
__note 1:__ Every time we do `pip install -r requirements.txt`, we are installing in our environment the packages (and respective modules) that are specified in the requirements file. We can also install a package using its name, with `pip install <package name>`.    
__note 2:__ Packages in python are folders with modules, we will look at this in more detail in the following sections.

Let's use the numpy package as an example.

One of the many modules that package `numpy` has is the `polynomial` module:

In [36]:
from numpy import polynomial

In [37]:
path = polynomial.__file__ #try to print the path variable 
"/".join(path.split("/")[-8:])  #this is to ignore the path to .virtualenvs

'.virtualenvs/prep-venv/lib/python3.7/site-packages/numpy/polynomial/__init__.py'

The `polynomial` module lives inside the `site-packages` folder of the virtual environment `prep-venv`.   
The `path` (i.e. the absolute path) should be something close to `/home/user/.virtualenvs/prep-venv/lib/python3.7/site-packages/numpy/polynomial/__init__.py`.

#### 3.4.3 Type 3

The last type of modules are the ones that already come with the current python version. We don't need to have them in our current directory or store them on our python environment. They are stored inside the directory of our python version.

One example is the `json` package:

In [7]:
import json

In [9]:
path = json.__file__ #try to print the path variable 
relative_path = "/".join(path.split("/")[-4:])   
relative_path  #this is to ignore the path to lib

'lib/python3.7/json/__init__.py'

The module lives inside the directory where python 3.7 is installed.   
The `path` (i.e. the absolute path) should be something close to `/home/user/lib/python3.7/json/__init__.py`.
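
Splitting on `"/"` works on Linux and macOS but not on Windows, where paths use a different separator. A small sketch of a more portable way to get the same relative path, using the standard-library `pathlib` module. It also shows that modules compiled into the interpreter, such as `sys`, have no `__file__` attribute at all.

```python
import json
import sys
from pathlib import Path

module_path = Path(json.__file__)

# .parts splits on the platform's own separator.
relative_path = Path(*module_path.parts[-2:])
print(relative_path)  # ends in json/__init__.py on Linux and macOS

# sys is compiled into the interpreter, so there is no file to point to.
sys_file = getattr(sys, "__file__", None)
print(sys_file)
```

Using `getattr` with a default avoids an `AttributeError` when checking modules that may not come from a file.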

### 3.5 Check the attributes of a module

In [40]:
import module_v1

If we want to check what attributes (classes, functions or variables) a module has, we can use the `dir` function.

It can be very handy when we want to debug a project. 

Let's check `module_v1`.

In [41]:
dir(module_v1)

['House',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'avg_price',
 'location']

We now have a list of every attribute that __module_v1__ has. We can see that there are more attributes than the ones set in the code. Those extra attributes are set automatically for every python module, and we already saw one of them: the `__file__` attribute.
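
When debugging, it is often handy to hide the automatically set attributes and group the rest by kind. A minimal sketch using the standard-library `inspect` module, with `json` as a stand-in for `module_v1`:

```python
import inspect
import json

# Names starting with an underscore are the automatic or private ones.
public_names = [name for name in dir(json) if not name.startswith("_")]

classes = [n for n in public_names if inspect.isclass(getattr(json, n))]
functions = [n for n in public_names if inspect.isfunction(getattr(json, n))]

print("classes:", classes)
print("functions:", functions)
```

The same two list comprehensions work on any imported module, including the ones from this project.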

### 3.6 Executing modules as scripts

Until now we have been importing attributes defined inside python files (with the `.py` extension) and using them in other python files. But we can also run modules as standalone files, like scripts. 

In [42]:
import module_v4

Let's see first what is inside `module_v4.py` file with the help of the unix cat command.

In [43]:
# the following unix command prints the content that is inside the module_v4 file
!cat module_v4.py

import numpy as np

class House:

    def __init__(self, price, area, location):
        self.price = price
        self.area = area
        self.location = location

    def calculate_price_square_meter(self):
        return self.price/self.area

def avg_price(list_houses, location):
    count = 0
    sum_price = 0
    for house in list_houses:
        if house.location == location:
            count+=1
            sum_price+=house.calculate_price_square_meter()
    return sum_price/count

location = "Coimbra"  

if __name__ == "__main__":
    
    array_houses_coimbra = np.array([[150000, 200],
                                     [120000, 150], 
                                     [100000, 100],
                                     [250000, 250],
                                     [175000, 200]])   
    
    list_houses = []

    for i in array_houses_coimbra:
        price = i[0]
        area = i[1]
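        list_houses.append(House(price, area, location))

    # NOTE: the listing was cut off at the line above; the remaining lines are
    # reconstructed from the script output shown further down.
    print("Average price of houses in Coimbra per square meter is",
          avg_price(list_houses, location))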

Let's use the attribute `__name__` to see the name under which `module_v4` was imported.

In [44]:
module_v4.__name__

'module_v4'

When we import a module and use its attributes from another file, the value of its `__name__` attribute is the name of the module. If we run the file as a script instead, the value changes to `__main__`.   

This means that anything inside `if __name__ == "__main__"`, which is the entry point, will only be executed if we run it as a script.

In order to execute it as a script, we can use the following unix command.

In [45]:
!python module_v4.py

Average price of houses in Coimbra per square meter is 885.0


As we can see from the output above, the python module was run as a script, including what was inside the `if __name__ == "__main__"` block. This block **is not run** when we import attributes from this file.
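
Both behaviours can be compared side by side. A minimal sketch, assuming a throwaway file named `demo_main_guard.py` (a hypothetical name, written to a temporary directory rather than to this project). It runs the same file once as a script and once through an import:

```python
import importlib
import subprocess
import sys
import tempfile
from pathlib import Path

source = (
    'print("__name__ is", __name__)\n'
    'if __name__ == "__main__":\n'
    '    print("running as a script")\n'
)

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "demo_main_guard.py"
    script.write_text(source)

    # 1) Run the file as a script in a separate python process.
    run_output = subprocess.run(
        [sys.executable, str(script)],
        capture_output=True, text=True, check=True,
    ).stdout

    # 2) Import the very same file as a module.
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        module = importlib.import_module("demo_main_guard")
    finally:
        sys.path.remove(tmp)

print(run_output)
print(module.__name__)
```

Only the script run reaches the guarded block; the import executes the top-level `print` but skips everything under the `if`.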

## 4 Packages in Python

### 4.1 Definition

Packages in python are namespaces (directories) that contain modules.    
We can also have a package of packages.

These packages can be installed in a python environment with the `pip` package manager, or stored in the directory of your project.

In this current directory, we have a package named `shapes`. It has two modules inside, `square` and `circle`.

In [46]:
# unix command that shows the content inside shapes directory
!tree shapes

shapes
├── circle.py
├── __init__.py
├── LICENSE
├── README.md
├── setup.py
└── square.py

0 directories, 6 files


As we can see, we have more files, other than the module files. We will cover those files in the following section.

### 4.2 Package Files

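What distinguishes a python package from an ordinary folder is an `__init__.py` file inside the package's directory. It can be empty, but this file tells python that we are looking at a package. (Since python 3.3, folders without this file can also be imported, as we did with `folder_v2`, but an explicit `__init__.py` is what marks a regular package.) We can import a package the same way we import a module.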

Let's look at our `shapes` folder. It is stored in the same directory as this notebook.

As said before, in order for a directory to be seen as a package, we need an `__init__.py` file.   
There are some other files that are normally added to packages, like: `setup.py`, `README.md` and `LICENSE`, but these are not mandatory.   
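
The `__init__.py` file is executed when the package is imported, so it can also be used to expose attributes of the modules inside. A minimal sketch, assuming a throwaway package `demo_shapes` with a `square` module and a `get_area` function (all hypothetical names created in a temporary directory; this is not the content of the `shapes` package in this folder):

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical package: <tmp>/demo_shapes/{__init__.py, square.py}
    package_dir = Path(tmp) / "demo_shapes"
    package_dir.mkdir()
    (package_dir / "square.py").write_text(
        "def get_area(side):\n    return side * side\n"
    )
    # __init__.py runs on import; here it re-exports get_area.
    (package_dir / "__init__.py").write_text("from .square import get_area\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        import demo_shapes
        area = demo_shapes.get_area(3)
        init_file = Path(demo_shapes.__file__).name
    finally:
        sys.path.remove(tmp)

print(area, init_file)
```

Note that the `__file__` of a package points at its `__init__.py`, which matches what we saw earlier for `numpy.polynomial` and `json`.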

`setup.py` imports [setuptools](https://pypi.org/project/setuptools/), a package that enables us to build our python package. Using the `setup` attribute from this package, we can specify information like the name of our package, version, author, description, packages used and so on. This file, with this information, is needed if we want to build our python package and share it with the world.   
Let's see what is inside `setup.py`.

In [47]:
!cat shapes/setup.py

import setuptools

with open("README.md", "r") as fh:
    long_description = fh.read()

setuptools.setup(
    name="shapes", 
    version="1.0.0",
    author="ldssa",
    author_email="ldssa@ldssa",
    description="modules that represent shapes",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/LDSSA/ds-prep-course",
    packages=setuptools.find_packages(),
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
    python_requires='>=3.7',
)


We can also add a `README.md`. We are already familiar with this kind of file. This is a markdown file that should have information to guide people that want to use our package. 

The `LICENSE` file is important if we want to share our package with the world, that is, build it and upload it to the [Python Package Index](https://pypi.org/), the place where all the packages that we can install by doing `pip install <package name>` are stored. This file tells people the terms of use they should follow when using this package in their code.

This notebook doesn't cover how to build and upload our packages to PyPI. If you are interested in this, you can follow [this](https://packaging.python.org/tutorials/packaging-projects/) tutorial.    
If you only want to install it in your own environments, [this](https://dev.to/umeshdhakar/how-to-create-and-publish-python-package-62o) article also helps you with that.

### 4.3 Installing packages in our environments

We have seen `pip` a lot. It enables us to manage packages in our python environments (install, update, delete...). 

In order to install packages that are indexed on PyPI, we can use the following unix command:    
`pip install <package name>`    

When we execute this command, pip will search on PyPI for a package that matches the name.    

We can also install packages in batches by using a `requirements` file. You are already familiar with the unix command to do that, since you run it every week.   
`pip install -r requirements.txt`

Let's see what this week's `requirements.txt` file looks like.

In [48]:
!cat requirements.txt

nbgrader==0.6.1
matplotlib==3.1.2
numpy==1.18


It has the name of each package we want to install and also the version. `pip` will search for it on `PyPI`.
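
To make the structure of each line explicit, here is a small sketch that splits the pinned lines above into names and versions. It is only an illustration and handles nothing but the `name==version` form; the real format accepted by `pip` supports many more options.

```python
requirements_text = """\
nbgrader==0.6.1
matplotlib==3.1.2
numpy==1.18
"""

pinned = {}
for line in requirements_text.splitlines():
    line = line.strip()
    # Skip blank lines and comments.
    if not line or line.startswith("#"):
        continue
    # Everything before "==" is the package name, everything after is the version.
    name, _, version = line.partition("==")
    pinned[name] = version

print(pinned)
```

Pinning exact versions like this is what makes an environment reproducible from one machine to another.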

If we want to install our private package in an environment, we can install it by doing:   
`pip install <package path>`

### 4.4 Using modules from packages in our code

Modules from packages can be imported following the same logic as modules inside folders. 

Following this analogy, instead of doing:    

`from <folder name> import <module name>`   

We should do:   

`from <package name> import <module name>`

Let's use the `shapes` package, stored in our directory, as an example.

In [49]:
from shapes import circle

In [50]:
this_circle = circle.Circle(10)
this_circle.get_area()

9.934588265796101

Or we can also import in the following way:

In [51]:
from shapes.circle import Circle

In [52]:
this_circle = Circle(10)
this_circle.get_area()

9.934588265796101

Let's go back to the `numpy` package, which was installed with `pip` from `PyPI`.

In [53]:
from numpy import polynomial

`numpy` is the package name and `polynomial` is a module that belongs to this package (strictly speaking a sub-package, since it is a folder with its own `__init__.py`).

## Recap

Modules in python are `.py` files.     
Modules can be imported by other python files so they can use their attributes.       
Packages are a set of modules dedicated to a single functionality.   
There are a lot of packages available on `PyPI` that we can use in our code.    
We can install packages in our environments with `pip`.

Hopefully we learned, in this SLU, how to keep our projects organized, scalable, and easy to maintain, as well as how to use attributes from external sources. We don't need to reinvent the wheel: there are a lot of good packages out there that we can make use of.

## Further Reading

[Python Modules and Packages-Real Python](https://realpython.com/python-modules-packages/)   
[Python Packages-Askpython](https://www.askpython.com/python/python-packages)   
[Packaging Projects-Packaging Python](https://packaging.python.org/tutorials/packaging-projects/)   
[Project Structure and Imports-dev](https://dev.to/codemouse92/dead-simple-python-project-structure-and-imports-38c6)