# Modules

```{note}
This page was not shared with MUDE students in 2023-2024 (year 2).

It may have been a new page, or a modified page from year 1.

There may be pages in year 1 and year 2 that are nearly identical, or have significant modifications. Modifications usually were to reformat the notebooks to fit in a jupyter book framework better.
```

Contents
===
- [Introduction](#Introduction)
- [Modules and classes](#Modules-and-classes)
    - [Storing classes in a module](#Storing-classes-in-a-module)
    - [A number of ways to import modules of classes, functions and variables](#A-number-of-ways-to-import-modules-of-classes,-functions-and-variables)
- [How does module importing work](#How-does-module-importing-work)
- [Modules and PEP8](#Modules-and-PEP8)
    - [Multiple imports](#Multiple-imports)
    - [Ordering imports](#Ordering-imports)
- [Exercises](#Exercises)
- [Optional material](#Optional-material)
    - [Command line arguments with `sys.argv`](#Command-line-arguments-with-sys.argv)
    - [Parsing command line arguments with `argparse`](#Parsing-command-line-arguments-with-argparse)

## Introduction
Programming tasks usually require writing several lines of code which are much better organized in a **modular** fashion, rather than in single, extremely long Jupyter notebooks (or .py script files). Modularization refers to splitting such large programming tasks into smaller, separate, and more manageable subtasks. Python scripts are modularized through **functions**, **classes**, **modules**, and **packages**.

While you should already be familiar with *functions* and *classes*, you should think of a *module* as a `.py` file containing Python functions, classes, definitions and statements. A *package*, on the other hand, is a set of modules, i.e., a collection of `.py` files organized in folders and subfolders. Python accesses the modules in a package by referencing the package name.

Modularity has the added advantage of isolating your code blocks into files that can be used in any number of different programs. Furthermore, if you want to extend their functionality, you do not need to modify multiple files, but only the file they reside in.

You have been using Python modularization all along, maybe without even realizing it.

Here is a quick example:

```python
from matplotlib.pyplot import subplots
``` 

which follows the format:

```python
from package_name.module_name import function_name
``` 


Modules and classes
===


Storing classes in a module
---

A module is simply a file that contains one or more classes or functions, so the Shuttle and Rocket classes can also be in the same file. 
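The `space` module itself is not shown on this page. As a rough sketch only, assuming a `Rocket` that stores `x` and `y` coordinates and a `Shuttle` that inherits from it, a file `space.py` could look like the following. The default values and method bodies are guesses based on how the classes are used in the cells below, not the course's actual file.

```python
# space.py (hypothetical contents, inferred from how the classes are used)
from math import hypot


class Rocket:
    """A rocket that moves on a 2D grid."""

    def __init__(self, x=0, y=0):
        # Assumed default: every rocket starts at the origin.
        self.x = x
        self.y = y

    def move_rocket(self, x_increment=0, y_increment=1):
        # Assumed default: move one unit up when no increments are given.
        self.x += x_increment
        self.y += y_increment

    def get_distance(self, other_rocket):
        # Euclidean distance between this rocket and another one.
        return hypot(self.x - other_rocket.x, self.y - other_rocket.y)


class Shuttle(Rocket):
    """A reusable rocket that also keeps track of completed flights."""

    def __init__(self, x=0, y=0, flights_completed=0):
        super().__init__(x, y)
        self.flights_completed = flights_completed
```

Keeping both classes in one file means a single module provides everything related to spacecraft.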

Now you can import the Rocket and the Shuttle class, and use them both in a clean uncluttered program file:

In [None]:
from space import Rocket, Shuttle

rocket = Rocket()
print(f"The rocket is at ({rocket.x}, {rocket.y}).")

shuttle = Shuttle()
shuttle.move_rocket()
print(f"The shuttle is at ({shuttle.x}, {shuttle.y}).")
print(f"The shuttle has completed {shuttle.flights_completed} flights.")

print(f"The distance between the rocket and the shuttle is ({rocket.get_distance(shuttle)}).")

The first line tells Python to import both the *Rocket* and the *Shuttle* classes from the *space* module. You don't have to import every class in a module; you can pick and choose the classes you care to use, and only those names become available in your program. Note that Python still runs the whole module file the first time it is imported.
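Choosing which names to import controls what becomes available in your program, not how much of the module file is run. The sketch below illustrates this with a throwaway module called `demo_space`, a hypothetical name that is written to a temporary folder only for this demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module source, used only for this demonstration.
module_source = (
    "print('running all of demo_space.py')\n"
    "class Rocket:\n"
    "    pass\n"
    "class Shuttle(Rocket):\n"
    "    pass\n"
)

with tempfile.TemporaryDirectory() as folder:
    Path(folder, "demo_space.py").write_text(module_source)
    sys.path.insert(0, folder)          # make the temporary folder importable
    importlib.invalidate_caches()
    try:
        from demo_space import Rocket   # the print statement in the file runs
    finally:
        sys.path.remove(folder)

# Only the requested name is bound in this program ...
try:
    Shuttle
except NameError:
    print("Shuttle is not available under that name")

# ... but the module object holds everything the file defined.
print(hasattr(sys.modules["demo_space"], "Shuttle"))
```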

A number of ways to import modules of classes, functions and variables
---
There are several ways to import modules, and each has its own merits. We mainly illustrate how to import classes, but you can import functions and variables in exactly the same way:

### from *module_name* import *ClassName*

The syntax for importing classes that was just shown:
```python
from module_name import ClassName
```
is straightforward, and is used quite commonly. It allows you to use the class names directly in your program, so you have very clean and readable code. 

### import *module_name*

Directly using the class names from a module can be a problem if the names you are importing conflict with names that have already been used in the program you are working on. This happens, for example, when a module contains a function or a class with the same name as one you have defined in your notebook. Have a look at the code cell below, where we have a Rocket class in the current cell and a Rocket class in module `space`:

In [None]:
from space import Rocket

class Rocket:
    def __init__(self, name):
        self.name = name

# Instantiate a class from the current file
rocket = Rocket("Ariance")
print(f"The rocket is called {rocket.name}.")

The Rocket class defined in the cell takes precedence over the Rocket class from module `space`, because the later definition rebinds the name. For instance, the Rocket class in the module has no attribute `name`. Thus, it is no longer possible to use the imported class directly. To mitigate this, we can import the module itself and use dot notation:

The general syntax for this kind of import is:
```python
import module_name
```

After this, classes are accessed using dot notation:
```python
module_name.ClassName
```

In [None]:
import space

class Rocket:
    def __init__(self, name):
        self.name = name

# Instantiate a class from the current file
new_rocket = Rocket("Ariance")
print(f"The rocket is called {new_rocket.name}.")

# Instantiate a class from module space
module_rocket = space.Rocket()
print(f"\nThe rocket is at ({module_rocket.x}, {module_rocket.y}).")
print(f"The distance between the same rocket is ({module_rocket.get_distance(module_rocket)}).")

This prevents some name conflicts. If you were reading carefully, however, you might have noticed that the module name is now itself a name in your program. If the module were called `rocket` instead of `space`, a variable named *rocket* would clash with it and would have to be renamed. This is not good, because in a longer program that could mean a lot of renaming.
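The same behaviour can be reproduced without the `space` module. The following sketch uses `math` from the standard library; the local `sqrt` function is made up for illustration.

```python
from math import sqrt      # binds the name sqrt to math.sqrt


def sqrt(value):           # rebinds sqrt: the imported function is no longer reachable by this name
    return f"my own sqrt of {value}"


print(sqrt(16))            # my own sqrt of 16

import math                # dot notation keeps both versions usable

print(math.sqrt(16))       # 4.0
```

With `import math`, the module's function and the local one can live side by side, at the cost of reserving the name `math`.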

### import *module_name* as *local_module_name*

There is another syntax for imports that is quite useful:
```python
import module_name as local_module_name
```
When you import a module into one of your projects, you are free to choose any name you want for the module in your project. The earlier example can therefore be written with a local module name of your choice, leaving the variable name *rocket* free to use:

In [None]:
import space as space_module

rocket = space_module.Rocket()
print(f"The rocket is at ({rocket.x}, {rocket.y}).")

shuttle = space_module.Shuttle()
shuttle.move_rocket()
print(f"The shuttle is at ({shuttle.x}, {shuttle.y}).")
print(f"The shuttle has completed {shuttle.flights_completed} flights.")

print(f"The distance between the rocket and the shuttle is ({rocket.get_distance(shuttle)}).")

This approach is often used to shorten the name of the module, so you don't have to type a long module name before each class name that you want to use. However, it is easy to shorten a name so much that people reading your code have to scroll to the top of your file to see what the shortened name stands for. In this example, you could abbreviate `space` to something like:

In [None]:
import space as s

Of course, there are well-known abbreviations, which you might have already seen:
```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
```

### from *module_name* import *
There is one more import syntax that you should be aware of, but *you should probably avoid using*. This syntax imports **all of the available classes, functions and variables in a module**. Note that names with a leading underscore `_` are excluded from this rule (unless the module lists them explicitly in `__all__`). Similarly to encapsulation in OOP, they are considered private:

```python
from module_name import *
```

This is not recommended, for a couple of reasons. First of all, you may have no idea what all the names of the classes and functions in a module are. If you accidentally give one of your variables the same name as a name from the module, you will have naming conflicts. Also, you may be bringing far more names into your program than you need.

If you really need all the functions and classes from a module, just import the module and use the `module_name.ClassName` syntax in your program.
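To see which names a star import brings in, the sketch below builds a small in-memory module and runs the star import into a separate dictionary so the result can be inspected. The module name `demo_shapes` and its contents are hypothetical.

```python
import sys
import types

# Hypothetical module contents, used only for this demonstration.
source = '''
PI_APPROX = 3.14

def area(radius):
    return PI_APPROX * radius ** 2

def _helper():
    return "internal"
'''

# Build the module in memory and register it so that it can be imported.
demo = types.ModuleType("demo_shapes")
exec(source, demo.__dict__)
sys.modules["demo_shapes"] = demo

# Run the star import in its own namespace to see what it brings in.
namespace = {}
exec("from demo_shapes import *", namespace)

imported = sorted(name for name in namespace if not name.startswith("__"))
print(imported)   # ['PI_APPROX', 'area']
```

The function `_helper` is missing from the result because of its leading underscore.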

You will get a sense of how to write your imports as you read more Python code, and as you write and share some of your own code.

How does module importing work
===
Python ships with a standard library of modules, which are available in every installation. Examples of those are `sys`, `math`, `random`. For a full list, check the following link: https://docs.python.org/3/py-modindex.html

When importing a module, Python searches in the following order:
1. Checks whether the name matches a built-in module, i.e., one compiled into the interpreter itself, such as `sys`.
2. Searches the directory of the script that is being run (or the current working directory in an interactive session).
3. Looks at the directories listed in `PYTHONPATH` - we will not cover this, but you can think of it as a user-defined extension of the search path for modules.
4. Searches the standard library directories and, finally, the installed packages.

As a result, if you have the `numpy` package installed, but you also have a file `numpy.py` in the same working directory, Python will import the local file instead of the installed package. You should avoid naming modules after standard built-in modules or standard packages such as `numpy` or `matplotlib`.
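The search locations can be inspected from Python itself. The short sketch below uses only `sys.builtin_module_names`, `sys.path` and the `__file__` attribute of an imported module; the output differs from one installation to another.

```python
import json
import sys

# Modules compiled into the interpreter are found before any file on disk.
print("sys" in sys.builtin_module_names)

# Directories that are searched, in order, for all other modules.
for position, folder in enumerate(sys.path):
    print(position, folder)

# The __file__ attribute shows which file a module was actually loaded from.
print(json.__file__)
```

Printing `__file__` is a quick way to check whether a local file has accidentally shadowed an installed package.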

Modules and PEP8
===
The PEP8 style guide also contains a list of rules for module imports:

Multiple imports
---
When we wish to import multiple modules, there are some requirements to follow. **Module imports should be done on separate lines**. For instance, if you need the modules `os` and `sys`, it is recommended to use multiple `import` statements. In addition, it is a common convention (although not a PEP8 requirement) to order the modules within every group **alphabetically**:
```python
import os
import sys
```
The discouraged way to do this is to import both of the modules on the same line:
```python
import os, sys
```

Ordering imports
---
Apart from separating every import in a new line, it is also important to group modules depending on their type. The correct order to import modules is the following:
1. Standard library imports
2. Related third party imports
3. Local application/library specific imports

Blank lines should be placed between each group.

Here is an example of correct order of module imports:
```python
import os
import sys

import numpy as np
import pandas as pd

import rocket
import shuttle
```

Exercises
===

In [None]:
from jupyterquiz import display_quiz

display_quiz("https://surfdrive.surf.nl/files/index.php/s/wHKH0oP3SmbZHLP/download")

Optional material
===
The material in the subsections below is optional, so you may skip it if you wish.

Command line arguments with `sys.argv`
---
Python files can also be used as scripts that run specific tasks. For example, the code below runs a Python script (file) that creates an image `weather.png`, which displays temperatures over a period of time:

Currently, we are reading from an old dataset. However, imagine that we were getting the data from a server, which adds more data to the dataset every day or even every hour. Then it would be very convenient to regularly run this script and observe the changes via a graph.

Although this looks easy to use, it is not very flexible, because we need to modify the script every time we want a different period of time or a different plot file name. The `sys` module offers a solution to this setback: command line arguments, which are available in `sys.argv`. Command line arguments can be thought of as arguments you pass to a function.

For example, suppose we could pick different start and end date of observations simply by passing those two values as arguments to the python file:
```bash
python weather_script.py "01/03/2021" "31/05/2021"
```
Here `weather_script.py` can be thought of as a function, which takes 2 arguments - start and end date.

To achieve this, the script reads the dates from `sys.argv`, a list of strings in which `sys.argv[0]` is the script name and the arguments follow from index 1:
```python
import sys
import pandas as pd
import matplotlib.pyplot as plt

...

# set start and end time
start_date = pd.to_datetime(sys.argv[1], dayfirst=True)
end_date = pd.to_datetime(sys.argv[2], dayfirst=True)

# preprocess the data

...

fig.savefig('weather.png')
```

In [None]:
! python weather_script.py "01/03/2021" "31/05/2021"

# Display the generated image
from IPython import display
display.Image("./weather.png")
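The indexing into `sys.argv` can be tried without the weather dataset. The following sketch is not the course's `weather_script.py`: the helper `read_period` is hypothetical, it uses `datetime.strptime` from the standard library instead of pandas, and it assumes dates in day/month/year format. The argument list is passed in explicitly so the example runs in a notebook; in a real script you would pass `sys.argv`.

```python
from datetime import datetime


def read_period(argv):
    """Return (start, end) parsed from a list shaped like sys.argv."""
    # argv[0] is the script name, so two arguments give a list of length 3.
    if len(argv) != 3:
        raise SystemExit(f"usage: python {argv[0]} START_DATE END_DATE")
    start = datetime.strptime(argv[1], "%d/%m/%Y")
    end = datetime.strptime(argv[2], "%d/%m/%Y")
    return start, end


# Simulates: python weather_script.py "01/03/2021" "31/05/2021"
start, end = read_period(["weather_script.py", "01/03/2021", "31/05/2021"])
print(start.date(), end.date())
```

Checking the number of arguments first gives a readable usage message instead of an `IndexError` when an argument is missing.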

Parsing command line arguments with `argparse`
---
We can take command line arguments a step further via the module `argparse`, which provides even more flexibility. For instance, you can have optional arguments with default values, help text for arguments, argument types and much more, as described here: https://docs.python.org/3/library/argparse.html. We will briefly cover the basics of this module in this section.

Remember how Git has the command `git help` and also how we pass arguments using a dash (`-`)? For example, when we are making a commit, we can pass the message in multiple ways:

```bash
git commit -m "My first commit"
git commit --message "My first commit"
```
The `argparse` module adds the same features: built-in help and both short and long argument names. To add `argparse` to our script, we need to rewrite the way we are getting the `start_date` and `end_date` as follows:
```python
import argparse
import pandas as pd
import matplotlib.pyplot as plt

...

parser = argparse.ArgumentParser()
# set start and end time
parser.add_argument('-s', '--start', type=str, default="1/1/2019", help="Start time")
parser.add_argument('-e', '--end', type=str, default="1/1/2021", help="End time")

args = parser.parse_args()

start_date = pd.to_datetime(args.start, dayfirst=True)
end_date = pd.to_datetime(args.end, dayfirst=True)

# preprocess the data

...

fig.savefig('weather.png')
```
Note that if an argument is not passed to the script, its default value is used.
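The parser can be tried on its own, without the dataset, by handing `parse_args` a list of strings instead of letting it read the real command line. In the sketch below, the `-o`/`--output` argument and the `build_parser` helper are assumptions added for illustration; they are not part of the snippet above.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Plot temperatures for a period.")
    parser.add_argument("-s", "--start", type=str, default="1/1/2019", help="Start time")
    parser.add_argument("-e", "--end", type=str, default="1/1/2021", help="End time")
    # Hypothetical extra argument: name of the image file to save.
    parser.add_argument("-o", "--output", type=str, default="weather.png", help="Output image file")
    return parser


parser = build_parser()

# Short and long forms can be mixed; arguments that are left out fall back to their defaults.
args = parser.parse_args(["-s", "01/03/2021", "--end", "31/05/2021"])
print(args.start, args.end, args.output)

# With no arguments at all, every value comes from its default.
defaults = parser.parse_args([])
print(defaults.start, defaults.end, defaults.output)
```

Each value is available as an attribute named after the long form of the argument, for example `args.start` for `--start`.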

**Exercise:** Copy `weather_script` into a new Python file called `weather_script_improved` (note that you need to create that file yourself). Then, modify the new file as described above to make use of `argparse`. The cell below also passes `--output`, which the snippet above does not define, so add an argument for the name of the saved image as well; otherwise `argparse` will reject the command. Finally, run the code cell below to verify your code is working. The expected output is a graph, which contains temperatures only in the range 01/03/2021 - 31/05/2021. Note that we can use both long and short versions of arguments.

In [None]:
! python weather_script_improved.py -s "01/03/2021" --end "31/05/2021" --output "weather.png"

# Display the generated image
from IPython import display
display.Image("./weather.png")

Run the next cell to check what arguments can be passed to the script:

In [None]:
! python weather_script_argparse.py --help

It looks similar to how git shows its help, right?

In [None]:
! git --help

# References and used resources
- http://introtopython.org/
- https://aaltoscicomp.github.io/python-for-scicomp/scripts/#scripts
- https://www.learnpython.org/en/Modules_and_Packages
- https://docs.python.org/3/tutorial/modules.html