### MEDC0106: Bioinformatics in Applied Biomedical Science

<p align="center">
  <img src="../../resources/static/Banner.png" alt="MEDC0106 Banner" width="90%"/>
  <br>
</p>

---------------------------------------------------------------

# 03 - Modules and Packages

*Written by:* Oliver Scott

**This notebook provides a general introduction to using modules and packages in Python.**

Do not be afraid to make changes to the code cells to explore how things work!

### What are modules?

In programming, a **module** is a part of a piece of software containing code that has a specific function. When developing large projects, breaking code into modules helps keep the code **readable** and **maintainable**. For example, if we were building a game, one module may handle physics, while another handles what is rendered on screen.

In Python a module is simply a file with a `.py` extension, containing functions, variables and classes. When we import this code into Python we use the name of the file without the extension. For example, a file called *test.py* defines a module and its corresponding name in Python would be *test*.
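
As a minimal sketch of the "file name becomes module name" rule, the cell below writes a tiny file called `greetings.py` and then imports it as `greetings`. The module name, its contents and the use of a temporary folder are all assumptions made for illustration; adding the folder to `sys.path` is only needed so the example runs anywhere. Normally you would just save the file next to your notebook.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Make a temporary folder to hold a hypothetical module file
module_dir = Path(tempfile.mkdtemp())

# Writing greetings.py creates a module that Python will call "greetings"
(module_dir / "greetings.py").write_text(
    'GREETING = "Hello"\n'
    "\n"
    "def greet(name):\n"
    '    return GREETING + ", " + name + "!"\n'
)

# Tell Python to search this folder too, and clear its cached directory listings
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()

import greetings  # the file name without the .py extension

print(greetings.GREETING)      # Hello
print(greetings.greet("Ada"))  # Hello, Ada!
```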

Python ships with a relatively large collection of useful modules, such as `math`; however, we can also define our own modules or download them from third parties.

### What are packages?

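Packages are collections of modules (and of other packages, known as subpackages) stored in a directory. A directory is marked as a regular package by a special `__init__.py` file. Since Python 3.3, "namespace packages" can omit this file, but we will only consider regular packages here. For example: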

```
audio/
    __init__.py
    formats/
        __init__.py
        mp3.py
        wav.py
    effects/
        __init__.py
        echo.py
        fade.py  
```

The above package structure defines a package `audio` containing two further packages: `formats` and `effects`. The `formats` package contains two modules, `mp3` and `wav`, while the `effects` package defines two modules, `echo` and `fade`. In this notebook we will not dwell on the structure of packages, although it helps to understand how they are laid out. If you would like to learn more about packages and how to create them, I suggest taking a look at the extra resources linked below.
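
One way to see the package/module distinction from inside Python is sketched below, using the standard library's `json` package as an example (in a standard CPython install it is a directory containing an `__init__.py` file plus modules such as `decoder.py`). The key observation is that packages carry a `__path__` attribute listing where their submodules live, while plain modules do not.

```python
import json          # a package from the standard library
import json.decoder  # a module inside that package
import math          # a plain module, not a package

# A package's __file__ points at its __init__.py file (the exact path varies by machine)
print(json.__file__)

# Only packages have a __path__ attribute
print(hasattr(json, "__path__"))          # True
print(hasattr(json.decoder, "__path__"))  # False
print(hasattr(math, "__path__"))          # False
```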

### The Python Standard Library

Python ships with an extensive [collection](https://docs.python.org/3/library/) of modules/packages useful to a programmer. It is important for a programmer to become acquainted with the standard library as it can be used to solve many common programming problems quickly, without having to write and test new code. There is a nice guide to the standard library available on the Python website [here](https://docs.python.org/3.8/tutorial/stdlib.html).
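
As a small illustration of solving common problems without writing new code, the sketch below uses two standard-library modules, `statistics` and `collections`. The read lengths and the DNA sequence are made-up example values, not real data.

```python
import statistics
from collections import Counter

# Hypothetical example data: read lengths (in bases) from a sequencing run
read_lengths = [150, 148, 150, 152, 150]
print("Mean length:", statistics.mean(read_lengths))      # Mean length: 150
print("Median length:", statistics.median(read_lengths))  # Median length: 150

# Count each nucleotide in a short, made-up DNA sequence
sequence = "ATGCGATACGCTTGA"
base_counts = Counter(sequence)
print(base_counts["A"], base_counts["C"], base_counts["G"], base_counts["T"])  # 4 3 4 4
```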

### Third Party Extensions

Of course, the Python standard library is by no means comprehensive, so third-party components can be installed from the [Python Package Index](https://pypi.org/) (PyPI). Here you can find everything from individual programs and modules to packages and entire application development frameworks! The scientific community also often uses [Anaconda](https://www.anaconda.com/products/individual), as it makes it easy to install more complex packages that rely on compiled code (C, C++, Rust, etc.). If you are viewing these notebooks with Binder, it has already installed a number of third-party extensions that we will use in these sessions.

----

## Contents

1. [Importing Modules](#Importing-Modules)
2. [Writing Modules](#Writing-Modules)
3. [Packages](#Packages)
4. [Third-Party Modules/Packages](#Third-Party-Modules/Packages)
5. [Discussion](#Discussion)

----

### Extra Resources

This introduction to Python is by no means comprehensive. Below are some links to resources for learning Python if you are interested.

- [RealPython](https://realpython.com/) - Free Python tutorials from beginner to advanced
- [Codecademy](https://www.codecademy.com/learn/learn-python-3) - Python lessons
- [Cheat-Sheets](https://ehmatthes.github.io/pcc_2e/cheat_sheets/cheat_sheets/) - Python reference sheets

----

## Importing Modules

In Python, modules can be imported using the `import` statement:

```python
import mymodule
```

Attributes contained within the imported module can then be accessed using the 'dot' (`.`) syntax:

```python
mymodule.myvariable
mymodule.myfunction()
```

The examples below show the most common ways to import functionality from modules:

In [None]:
import math

print("Pi:", math.pi)  # Modules can define useful variables

double_pi = math.pi * 2  # We can use the imported variables just like normal variables 

# Modules may also define functions that we can utilise in our own code
print("Degrees in Pi radians:", math.degrees(math.pi))
print("Degrees in 2*Pi radians:", math.degrees(double_pi))

Python's `from` statement allows you to import specific attributes from a module with the syntax:

```python
from mymodule import attr1[, attr2[, attr3, ..., attrN]]
```

See the cell below for an example:

In [None]:
from math import pi, degrees  # Importing two attributes (variable, function)

print("Pi:", pi)
print("Degrees in Pi radians:", degrees(pi))

You may also import attributes using custom names. You may want to do this to shorten a module/function/variable name or avoid naming conflicts with your own code. Changing the name has no impact on the functionality.

```python
import mymodule as module
from mymodule import attr1 as attr
```

In the example below we import pi from math as 'PI':

In [None]:
from math import pi as PI

print("Pi:", PI)
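
To back up the claim that renaming has no impact on functionality, the sketch below checks that an alias is simply a second name for the very same object (`m` and `square_root` are arbitrary names chosen for illustration).

```python
import math
import math as m
from math import sqrt as square_root

# An alias is just another name bound to the same object
print(m is math)                 # True
print(square_root is math.sqrt)  # True
print(square_root(16.0))         # 4.0
```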

**Pro tip:**
Jupyter allows you to see a module's contents in a handy popup window. Just type the module name followed by a dot and hit `TAB`.

<p align="center">
  <img src="../../resources/static/tooltip.png" alt="Tooltip"/>
  <br>
</p>

Try this in the cell block below:

1. Uncomment the line below
2. Move your cursor to after the dot 
3. Hit the `TAB` key

In [None]:
# math.

Using the `help` function also gives us an overview of a module's contents.

***Note:*** *you can also use the help function to give information about functions and classes*

In [None]:
import math  # make sure math is imported even if the earlier cells were skipped

help(math)
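
If you want the list of names as data rather than as a help page, the built-in `dir` function returns them as a sorted list of strings. The sketch below filters out the names starting with an underscore, which by convention are internal.

```python
import math

# dir() returns a sorted list of the names defined in a module
public_names = [name for name in dir(math) if not name.startswith("_")]

print(len(public_names), "public names, for example:", public_names[:5])
print("sqrt" in public_names)  # True
```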

## Writing Modules

As previously mentioned, a module is simply a file with the `.py` extension. If you take a look in the current directory: `workshop/session_1` there should be a file called: `mymodule.py` defining a variable and a function. Take a look at the contents of this file before continuing.

In the cell below we import this module and use its functionality. Of course, there is nothing stopping you from editing this file to add extra functions and variables!

***Note:*** *With our current understanding of Python, any modules that we create should be in the same directory as our notebook/script so that Python knows where to find them.*

In [None]:
import mymodule

print(mymodule.choice_list)

n = 0        # Starting index
samples = 5  # How many samples to take?

while n < samples:  # n takes the values 0 to 4, giving exactly 5 samples
    choice = mymodule.make_choice()
    print('Computer made choice:', choice)
    n += 1  # We must remember to increment n!
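
Based on how it is used above, `mymodule.py` defines a list called `choice_list` and a function called `make_choice`. A file with that shape might look something like the sketch below; the contents (the particular items in the list and the use of `random.choice`) are assumptions, so open the real file to see what it actually contains.

```python
# A hypothetical version of mymodule.py (the real file may differ)
import random

# Module-level variable: the options the "computer" can pick from (assumed)
choice_list = ["rock", "paper", "scissors"]


def make_choice():
    """Return one item from choice_list, chosen at random."""
    return random.choice(choice_list)
```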

## Packages

Packages are a convenient way to compartmentalise a large amount of functionality. The syntax for using packages is very similar to that of modules. We can import a package using the same syntax:

```python
# Import mypackage
import mypackage

# Import a subpackage
from mypackage import subpackage

# Import a module from a subpackage
from mypackage.subpackage import mymodule

# Import a function from mymodule
from mypackage.subpackage.mymodule import myfunction
```

If we refer back to the `audio` package example from earlier, importing the module `wav` would be as simple as:

```python
from audio.formats import wav
```

We could then use functions defined in `wav`:

```python
wav.read_wav_file(...)
```
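
To try these dotted imports for real, the sketch below recreates the `audio` layout from earlier in a temporary folder and imports from it. Every file body is a placeholder: `read_wav_file` is a hypothetical function that only returns a string, and the empty `__init__.py` files are what mark each directory as a package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

# Recreate the audio/ layout from earlier; every file body here is a placeholder
files = {
    "audio/__init__.py": "",
    "audio/formats/__init__.py": "",
    "audio/formats/mp3.py": "",
    "audio/formats/wav.py": (
        "def read_wav_file(path):\n"
        "    return 'pretending to read ' + path\n"
    ),
    "audio/effects/__init__.py": "",
    "audio/effects/echo.py": "",
    "audio/effects/fade.py": "",
}
for relative_path, source in files.items():
    file_path = root / relative_path
    file_path.parent.mkdir(parents=True, exist_ok=True)
    file_path.write_text(source)

# Make the folder that contains audio/ importable
sys.path.insert(0, str(root))
importlib.invalidate_caches()

from audio.formats import wav

print(wav.read_wav_file("song.wav"))  # pretending to read song.wav
```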

Below is an example using a package in the Python standard library to download and print lines in a `.csv` file:

In [None]:
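import os
from urllib.request import urlretrieve  # import a function from the urllib.request module

url = "http://winterolympicsmedals.com/medals.csv"
path = './data/medals.csv'  # workshop/session_1/data

os.makedirs('./data', exist_ok=True)  # create the data directory if it does not exist
urlretrieve(url, path)  # download the csv file
lines_to_read = 10

# Open and print some lines from the file
with open(path, 'r') as file:
    n = 0
    lines = file.readlines()
    while n < lines_to_read and n < len(lines):  # stop early if the file is short
        print(lines[n], end='')  # each line already ends with a newline
        n += 1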

***Note:*** *After running this cell block medals.csv should be present in the `/data` directory!*
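
Printing raw lines shows the commas but does not split the values apart. The standard library's `csv` module can do that for us; the sketch below uses made-up rows held in a string (via `io.StringIO`) so that it runs without the download. To read the real file you would pass `open(path, newline='')` to `csv.DictReader` instead.

```python
import csv
import io

# Hypothetical CSV text standing in for a real file such as data/medals.csv
csv_text = """sample,gene,expression
S1,BRCA1,12.5
S2,BRCA1,9.8
S3,TP53,15.1
"""

# csv.DictReader turns each row into a dictionary keyed by the header names
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

for row in rows:
    # Every value is read as a string, so convert numbers explicitly
    print(row["sample"], row["gene"], float(row["expression"]))
```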

## Third-Party Modules/Packages

We can also import third-party extensions using the `import` statement. Binder has handled the installation for us in this case, but often we will use a package installer like [pip](https://pypi.org/project/pip/) or [conda](https://docs.conda.io/en/latest/) to install packages. In fact, we can install packages from inside Jupyter using pip like so:

```
!pip install numpy
```

If you were to try this you would probably get a message similar to this:

```
Requirement already satisfied: numpy in /srv/conda/envs/notebook/lib/python3.9/site-packages (1.21.2)
```
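
Rather than running `pip install` just to find out whether something is already there, you can ask Python directly. The sketch below uses `importlib.util.find_spec` (can this name be imported?) and `importlib.metadata.version` (which version is installed?). The helper name `describe` is our own; also note that the import name and the installed distribution name usually match, as for `numpy` and `pandas`, but not always (for example, `import sklearn` comes from the `scikit-learn` distribution).

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version


def describe(package_name):
    """Report whether a package can be imported and, if known, its installed version."""
    if importlib.util.find_spec(package_name) is None:
        return f"{package_name}: not installed"
    try:
        return f"{package_name}: version {version(package_name)}"
    except PackageNotFoundError:
        # Standard-library modules such as math have no installation metadata
        return f"{package_name}: available (no version metadata)"


for name in ["math", "numpy", "pandas"]:
    print(describe(name))
```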

Two of the most popular Python packages are [numpy](https://numpy.org/) and [pandas](https://pandas.pydata.org/), which we will learn more about in the next notebook. Below is a brief example of importing these packages and using some of their functionality:

In [None]:
import numpy

# Create a numpy array (similar to a list of numbers)
array = numpy.array([1,2,3,4,5,6])

# We can compute some statistics using numpy
print('Mean:', numpy.mean(array))
print('STD:', numpy.std(array))
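
To check what these two numpy functions compute, the standard library's `statistics` module gives the same numbers for this small list. One detail worth knowing is that `numpy.std` returns the *population* standard deviation by default (`ddof=0`), which corresponds to `statistics.pstdev` rather than `statistics.stdev` (the sample version).

```python
import math
import statistics

values = [1, 2, 3, 4, 5, 6]

mean = statistics.mean(values)  # 3.5
# Population standard deviation: the square root of the mean squared deviation,
# sqrt(35 / 12) for these values, roughly 1.708
population_std = statistics.pstdev(values)

print("Mean:", mean)
print("STD:", population_std)
```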

It is very common to see numpy and pandas shortened to `np` and `pd` respectively:

```python
import pandas as pd
import numpy as np
```

The functionality remains the same while making the code look a little cleaner.

Below is a small example using the pandas package to read the medals data we downloaded in an earlier cell. Notice that pandas formats the data in a much more pleasant way.

***Note:*** *If you have not run the cell in the 'Packages' section, you will need to run it first to download the required file and define the variable `path`.*

In [None]:
import pandas as pd

df = pd.read_csv(path)  # `path` is defined in an above cell
df.head(10)             # Show the first 10 rows of the table

## Discussion

Python has become a very popular programming language in recent years due to its expressiveness and its 'easy-to-read' syntax. The scientific Python stack is now vast and hence it is easy to become productive when using Python in a project. Learning the basics of Python may end up being a major asset in your future career path!

Feel free to add more code cells and experiment with the concepts you have learnt.

In the next session we will learn to use the popular packages [numpy](https://numpy.org/) and [pandas](https://pandas.pydata.org/), which are very powerful tools for data analysis.

Now you should try to complete the **exercises**. Feel free to use these notebooks as references.

If you want to know more there are some extra resources from external sources linked in the beginning section. You can click the link below to go back to the top.

Click [here](#Contents) to go back to the contents.