# Lesson 09: Beyond the Notebook

So far, we have been working within the _Jupyter Notebook_ environment. As you know, Jupyter is an application built on top of Python that allows us to run arbitrary Python code within _cells_.

Now, we are going to learn how to transfer our code written within notebooks into proper Python _modules_. A Python module is _importable_ by other Python code.

### Note: When re-importing a module, you need to restart your Python kernel.

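This is because Python only runs a module's code the first time it is imported, so later edits to the file are not picked up. Alternatively, you can use the Jupyter `autoreload` extension to reload Python modules automatically before each cell runs.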

```python
%load_ext autoreload
%autoreload 2
```
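
A third option is the standard library's `importlib.reload()`, which re-executes a module that has already been imported. The sketch below is only an illustration: it writes a throwaway module called `scratch_mod` (a hypothetical name) into a temporary directory so that the example is self-contained, and it turns off bytecode caching so the edited file is always re-read.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Assumption for this demo only: skip bytecode caching so edits are always re-read.
sys.dont_write_bytecode = True

# Create a throwaway module (hypothetical name) in a temporary directory.
tmp_dir = Path(tempfile.mkdtemp())
module_file = tmp_dir / "scratch_mod.py"
module_file.write_text("VALUE = 1\n")
sys.path.insert(0, str(tmp_dir))

import scratch_mod
first_value = scratch_mod.VALUE

# Edit the file on disk, as you would in a text editor.
module_file.write_text("VALUE = 22\n")

import scratch_mod  # A second import reuses the cached module; nothing changes
value_after_reimport = scratch_mod.VALUE

importlib.reload(scratch_mod)  # Re-executes the file and picks up the edit
value_after_reload = scratch_mod.VALUE

print(first_value, value_after_reimport, value_after_reload)  # 1 1 22
```

Note that objects created before the reload keep pointing at the old code, which is why restarting the kernel remains the most reliable approach.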

## A Basic Module

A _module_ is a single `.py` text file that contains data and/or functions and/or classes.

Let's write a simple function, turn it into a module, and import it.

```python
from typing import List

def create_number_sequence(n: int) -> List[int]:
    """
    Returns a list of consecutive numbers starting from 0 and ending with 'n'.
    """
    return list(range(n + 1)) # Built-in function range() generates consecutive numbers
```

We can run it in a cell, but we will now put it into a new text file, `nums.py`. Note that there is nothing special about this text file other than the file extension being changed from `.txt` to `.py`.

We will save this `nums.py` file in the same directory as our notebook.

__Now__ we can import this code and run it in a cell as though we had written it in a cell above.

```python
import nums
nums.create_number_sequence(24)
```
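
To try the whole round trip in one place, here is a minimal sketch that creates `nums.py` from Python instead of a text editor. It assumes a temporary directory standing in for the notebook's directory, which is why it has to be added to `sys.path` by hand; the notebook's own directory is searched automatically.

```python
import sys
import tempfile
from pathlib import Path

# The module source, as it would appear in nums.py.
NUMS_SOURCE = '''\
from typing import List

def create_number_sequence(n: int) -> List[int]:
    # Built-in function range() generates consecutive numbers
    return list(range(n + 1))
'''

# Assumption: a temporary directory stands in for the notebook's directory.
work_dir = Path(tempfile.mkdtemp())
(work_dir / "nums.py").write_text(NUMS_SOURCE)
sys.path.insert(0, str(work_dir))  # Tell Python where to look for nums.py

import nums

print(nums.create_number_sequence(4))  # [0, 1, 2, 3, 4]
print(Path(nums.__file__).name)        # nums.py
```

The `__file__` attribute confirms that the function now comes from the file on disk and not from a notebook cell.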

### Modules can store functions, classes, and data (variables)

We can add the following to our `nums` module:

```python
from dataclasses import dataclass

@dataclass
class MyClass:
    data: str
        
# Interp. A basic class with generic data
        
MC1 = MyClass("Some Data")
```
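
Once this is saved in `nums.py`, the class and the data are available as `nums.MyClass` and `nums.MC1`. The short sketch below repeats the definition so that it runs on its own, and shows what the `@dataclass` decorator provides without any extra code.

```python
from dataclasses import dataclass

@dataclass
class MyClass:
    data: str

MC1 = MyClass("Some Data")

print(MC1)                          # MyClass(data='Some Data')
print(MC1.data)                     # Some Data
print(MC1 == MyClass("Some Data"))  # True: dataclasses compare field by field
```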

## Sub-modules using directories

We can put our `.py` files into a directory to organize them. In the examples below, `nums.py` has been moved into a directory named `my_package`.

You can import and use the `create_number_sequence` function in a number of ways.

### Simplest

```python
import my_package.nums # Have to import the full "path" to the module
my_package.nums.create_number_sequence(4)
```

**-OR-**

```python
from my_package import nums
nums.create_number_sequence(4)
```

**But the module is not "discoverable" if you do not know the full path**

```python
import my_package
my_package.nums.create_number_sequence(4) # This does not work

dir(my_package) # No 'nums' in this listing
```


### Using an `__init__.py` file
Create an empty text file within your package directory and name it `__init__.py`. 

The presence of an `__init__.py` file tells Python that this directory should be treated as a Python "package" (a directory of modules). Python runs the code in `__init__.py` whenever the package is imported.

However, to make a module discoverable, we need to add an `import` statement to the `__init__.py` file:

```python
from my_package import nums
```

_Now_, you can see the `nums` module in `dir(my_package)`, and you can import just the top-level package, `my_package`, and navigate through to `nums` in the function call:

```python
import my_package

my_package.nums.create_number_sequence(4) # Now this works
```
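
The difference can be seen side by side in the following sketch. It builds two small packages in a temporary directory; the names `pkg_without_init` and `pkg_with_init` are hypothetical and exist only for this comparison.

```python
import sys
import tempfile
from pathlib import Path

NUMS_SOURCE = "def create_number_sequence(n):\n    return list(range(n + 1))\n"

root = Path(tempfile.mkdtemp())

# Package A (hypothetical name): a directory with nums.py but no __init__.py.
bare = root / "pkg_without_init"
bare.mkdir()
(bare / "nums.py").write_text(NUMS_SOURCE)

# Package B (hypothetical name): the same module plus an __init__.py that imports it.
full = root / "pkg_with_init"
full.mkdir()
(full / "nums.py").write_text(NUMS_SOURCE)
(full / "__init__.py").write_text("from pkg_with_init import nums\n")

# Assumption: the temporary directory stands in for the notebook's directory.
sys.path.insert(0, str(root))

import pkg_without_init
import pkg_with_init

print("nums" in dir(pkg_without_init))               # False
print("nums" in dir(pkg_with_init))                  # True
print(pkg_with_init.nums.create_number_sequence(4))  # [0, 1, 2, 3, 4]
```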

### Adding an additional sub-module to `my_package`

In `my_package`, we currently only have `nums`. We can create a new `.py` file, `months.py`, and add data and functions to it:

```python
from typing import List

MONTHS = [
    "January",
    "February",
    "March",
    "April",
    "May",
    "June",
    "July",
    "August",
    "September",
    "October",
    "November",
    "December",
]

def cycle_months(loi: List[int]) -> List[str]:
    """
    Returns a list of the months that correspond to the index numbers
    in `loi`, modulo 12.
    
    All indexes are zero-indexed.
    """
    acc = []
    for idx in loi:
        acc.append(MONTHS[idx % 12])
    return acc
```
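
A quick check of the modulo behaviour is sketched below. To keep the block self-contained, it redefines the function as a list comprehension, which is assumed here to be an equivalent way of writing the loop above.

```python
from typing import List

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

def cycle_months(loi: List[int]) -> List[str]:
    """Same behaviour as the loop version, written as a list comprehension."""
    return [MONTHS[idx % 12] for idx in loi]

print(cycle_months([0, 11, 12, 25]))  # ['January', 'December', 'January', 'February']
print(cycle_months([-1]))             # ['December'], because -1 % 12 == 11 in Python
```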

To make the module, `months.py`, discoverable to our import system, we need to add the following line to the `__init__.py` file:

```python
from my_package import months
```

_Now_, we can access the `months` module by importing `my_package`:

```python
import my_package

dir(my_package)
```

## Many ways to import

You can import modules in various ways:

### 1. Import base level module

```python
import my_package # Requires an __init__.py in 'my_package' that imports nums

my_package.nums.create_number_sequence(3)
```

### 2. Import specific submodule

```python
from my_package import nums

nums.create_number_sequence(3)
```

### 3. Import module or submodule as an alias with `as`

```python
import my_package as pkg

pkg.nums.create_number_sequence(3)

# OR #

import my_package.nums as n

n.create_number_sequence(3)
```
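
The same import forms work for any package. The sketch below uses the standard library's `json` package so that it runs on any machine, and it adds a fourth form not listed above: importing a single name directly from a submodule. The alias `dec` is an arbitrary choice.

```python
import json                           # 1. Top-level package
from json import decoder              # 2. Specific submodule
import json.decoder as dec            # 3. Submodule under an alias
from json.decoder import JSONDecoder  # 4. A single name from a submodule

# Every form leads to the same underlying objects.
print(json.decoder is decoder)                  # True
print(decoder is dec)                           # True
print(JSONDecoder is json.decoder.JSONDecoder)  # True
```

Which form to use is mostly a matter of readability: shorter names are convenient, while longer ones make it obvious where a function comes from.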

# Example (basic) package folder structure

Most Python projects have a folder structure very similar to this:

```
my_package_project/
├─ my_package/
│  ├─ __init__.py
│  ├─ sub_module_1/
│  │  ├─ __init__.py
│  │  ├─ module_1.py
│  ├─ sub_module_2.py
├─ setup.py
├─ tests/
│  ├─ test_sub_module_1.py
│  ├─ test_sub_module_2.py
```

We know about `__init__.py` but what about `setup.py`?

# Converting tests to pytest

Using `cs103` our tests look like this:

```python
start_testing()
# Testing `my_func()`
expect(my_func('some_input'), 'result_value') # Test 1
expect(my_func('another_input'), 'a different value') # Test 2

# Testing `my_other_func()`
expect(my_other_func('some_other_input'), 'result_value2') # Test 1
expect(my_other_func('yet_another_input'), 'a different value2') # Test 2
summary()
```

To use pytest, start by creating a new `.py` file for each module you want to write tests for. Give each file a name that starts with `test_` (pytest looks for files named `test_*.py`) and put it in the `tests` directory of your project.

The above tests for `my_func()` would look like this in your new .py file:

```python
def test_my_func(): # With pytest, tests are functions that start with 'test_...'
    assert my_func('some_input') == 'result_value'
    assert my_func('another_input') == 'a different value'
    
def test_my_other_func():
    assert my_other_func('some_other_input') == 'result_value2'
    assert my_other_func('yet_another_input') == 'a different value2'
```
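
As a concrete case, tests for `create_number_sequence` could look like the sketch below. The file name `tests/test_nums.py` is a hypothetical choice, and the function is redefined inline only so that the block runs on its own; in a real test file you would import it from your package.

```python
from typing import List

# In a real test file, replace this definition with:
#     from my_package.nums import create_number_sequence
def create_number_sequence(n: int) -> List[int]:
    return list(range(n + 1))

def test_create_number_sequence():
    assert create_number_sequence(0) == [0]
    assert create_number_sequence(4) == [0, 1, 2, 3, 4]

def test_create_number_sequence_length():
    # A sequence from 0 to n inclusive has n + 1 items
    assert len(create_number_sequence(24)) == 25
```

Each `assert` plays the role of one `expect(...)` call, and pytest reports every test function as a separate pass or fail.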


## Creating a (basic) setup.py file

Python packages are, by default, designed to be _installed_ and not just sit in a folder somewhere in your workspace.

Once a package is installed, you can import it from anywhere in your file system. If a package is just sitting in a folder on your machine, you can only import it when your notebook is in the same directory as the package (not convenient).

A basic `setup.py` file, saved in the project folder beside the `my_package/` directory, looks like this:

```python
from setuptools import setup

setup(
   name='my_package',
   version='0.0.1',
   description='A useful module',
   author='Connor Ferster',
   author_email='cferster@rjc.ca',
   packages=['my_package'],  #same as name
   install_requires=[], #external packages as dependencies
)
```

Now that you have created a `setup.py` file and you have the correct folder structure, you can _install_ your first Python package!

Open a new terminal, navigate to your project folder (the one that contains `setup.py`), and run the following command:

```bash
pip install -e .
```

-OR-

In a Jupyter cell, run the following (this assumes your notebook is also in the project folder):

```python
!pip install -e .
```

Here is a breakdown of what you just did:

1. `pip`: This is a program installed with Python whose primary purpose is to install and remove packages. By typing the name of the program, you are starting it up.
2. `install`: You are telling the `pip` program that you want to install something.
3. `-e`: This is what is known as a "flag". The `-e` flag tells `pip` that you want to do an _editable_ installation. An editable installation is one where you can continue to work on your package, live, and Python will always link to the latest version. Instead of making a copy of your package, it is like placing a shortcut to it in the Python "site-packages" directory, where new packages are installed.
4. `.`: This tells `pip` to install the package described by the `setup.py` file in the current directory (`.` means "the current directory").
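
After installing, one way to confirm that Python can find your package is sketched below. The helper name `is_importable` is hypothetical; it relies on `importlib.util.find_spec()` from the standard library, which returns `None` when no top-level module with the given name can be located.

```python
import importlib.util

def is_importable(module_name: str) -> bool:
    """Return True if Python can locate a top-level module or package with this name."""
    return importlib.util.find_spec(module_name) is not None

print(is_importable("json"))        # True: part of the standard library
print(is_importable("my_package"))  # True once installed (or if it sits in the current directory)
```

Running this from a notebook in a different folder is a simple check that the installation worked.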

## Running tests in pytest

Unlike the Jupyter environment, pytest is a _command-line_ application. You use pytest through a _terminal_.

Navigate to your project directory by using the `cd` command ("change directory").

List the contents of a directory by using the `ls` command ("list directory").

Once you are in your project directory (the one that contains the `tests` directory), run pytest:

```bash
pytest
```

-OR-

In a Jupyter cell, run the following:

```python
!pytest
```

# Addendum: Useful packages in the Python standard library

Python is often referred to as a "batteries included" language. This is because it has a large "standard library" of pre-installed packages.

These packages allow you to be productive quickly because you do not need to "re-invent the wheel" when you want to complete a common task like, say, composing an email message (just do `import email` to import the `email` package from the standard library).

You can [click here](https://docs.python.org/3/library/index.html) to see the complete list.

Below is a small list of standard library packages that I find to be useful on a semi-regular basis:

* `math` - For math functions (`sqrt()`, `sin()`, etc.)
* `pathlib` - For accessing files and file paths
* `datetime` - For working with dates, times, or getting the time right now
* `csv` - For easy _parsing_ of CSV/TSV files
* `json` - For easy parsing of _JSON_ files into Python dictionaries
* `xml` - For easy parsing of _XML_ files (such as Decon produces)
* `sys` - Accesses parameters of the running Python interpreter (such as the module search path)
* `os` - Accesses your operating system
* `subprocess` - For running other command-line applications from within Python (e.g. spColumn)
* `random` - For generating pseudo-random numbers
* `itertools` - Simple functions for doing efficient loops
* `re` - For creating [regular expressions](https://en.wikipedia.org/wiki/Regular_expression)

In your own programs, you can import any of these standard library packages on any machine that runs Python without ever having to worry about whether or not they are installed or if they will work. They are always installed because they are a part of the standard library.
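
A few of the packages listed above are shown in action in the sketch below. The file name, JSON text, and section label are made-up sample values, not data from a real project.

```python
import itertools
import json
import math
import re
from datetime import date
from pathlib import Path

print(math.sqrt(16))                           # 4.0
print(Path("reports/beam_design.csv").suffix)  # .csv
print(date(2024, 2, 28).isoformat())           # 2024-02-28
print(json.loads('{"span": 6.5}')["span"])     # 6.5
print(list(itertools.combinations("ABC", 2)))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(re.findall(r"\d+", "W310x39"))           # ['310', '39']
```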