# NESM Python Workshop Part 1

## Overview

- Introductions
- Conda
- Jupyter notebook overview
- Pure python - loops and lists
- Numpy
- Plotting with matplotlib
- Image data

## Conda Environments

Python has a standard library for doing many useful tasks but you will often want to install other python libraries. At a surface level, `conda` provides an easy way to install packages.

> Conda is an open source package management system and environment management system that runs on Windows, macOS and Linux. Conda quickly installs, runs and updates packages and their dependencies. Conda easily creates, saves, loads and switches between environments on your local computer. It was created for Python programs, but it can package and distribute software for any language. ([conda docs](https://docs.conda.io))

**How**

1. Open a terminal (mac or linux) or the Anaconda Prompt (windows)
1. Type the following and press enter: `conda create -n nesm python`. This uses `conda` to create a new environment with the name (`-n`) `nesm`. It will install the latest python version in that environment. Whenever you are starting a new project, it is best to create a new environment.
1. `conda info --envs` -- this will list all the environments on your computer. Note that `base` is the default. The `*` next to an environment means that you are currently using that environment.
1. `conda activate nesm` -- Switch to using the newly created `nesm` environment. (If you run `conda info --envs` again you will see that the `*` has moved.)
1. `conda install -c conda-forge numpy scipy matplotlib jupyterlab xarray scikit-image ipympl` -- This line uses `conda` to `install` several common python libraries. We will use all of them in this workshop (time permitting). We also specify to install these libraries from `conda-forge`, which is the community-maintained repository for conda-installable software. This is where the most up-to-date versions of software usually are.


### Launch Jupyter Lab

Type `jupyter lab` in the terminal window and press enter. Eventually a window should open in your browser that looks like this one.


### Working in Jupyter Notebooks

**The Essentials**
- `Shift + Enter` executes a cell
- `Shift + Tab` shows the documentation of a function
- `Tab` will attempt to auto-complete the word you are typing

**Cell Operations**
- There are two modes in a jupyter notebook: *Editing* mode is where you are editing text in a cell. *Command* mode is when you are outside of a cell. `Esc` while in a cell switches to command mode. If you are in command mode, `Enter` will select a cell and enter editing mode there.

- `Esc + a` makes a new cell *above* your current position 
- `Esc + b` makes a new cell *below* your current position
- `Esc + m` converts a cell into a markdown cell
- `Esc + y` converts a cell into a code cell.
- `Esc + d + d` deletes a cell
- `Esc + i + i` interrupts the execution of a cell
- `Esc + 0 + 0` stops execution and restarts the kernel
- If you really get into it, you can make custom keyboard shortcuts in `Settings > Advanced Settings > Keyboard Shortcuts`

There are also jupyter lab extensions that can really improve the experience of using jupyter lab. A very good list of these extensions is called [Awesome Jupyter](https://github.com/markusschanta/awesome-jupyter).

## Pure Python 

We are just going to touch on a few key parts here that will most directly build into image analysis. It is definitely worth spending some time to familiarize yourself with the python standard library.

### Lists

#### Indexing

Lists are ordered groups of objects. You can access the elements of a list with *indexing*.
- Use square brackets to denote indexing operations 
- Python is *zero-indexed* meaning that the first element has index 0
- You can also index relative to the end of the list with negative indexing.

![image.png](attachment:d772d82a-534f-4c36-b236-b79a341b2eac.png)

![image.png](attachment:ab1efb07-cba1-4bf9-9ff0-8e55c33a62b2.png)


In [2]:
my_list = [1,3,7,13,21]

In [None]:
# get the third element from the list

In [None]:
# get the second to last element of the list
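
As a minimal sketch of the indexing rules above, the following uses a made-up list (`letters` is a hypothetical name, not part of the workshop data) to show zero-based and negative indexing side by side.

```python
# Hypothetical example list, chosen only for illustration
letters = ["a", "b", "c", "d", "e"]

first = letters[0]            # zero-indexed: the first element has index 0
third = letters[2]            # so the third element has index 2
last = letters[-1]            # negative indices count back from the end
second_to_last = letters[-2]

print(first, third, last, second_to_last)  # a c e d
```

The same pattern applies to `my_list` defined above.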

#### Essential List Methods
- `len` - return the number of elements in a list
- `.append` - add items to the end of a list

In [None]:
# get the length of the list

In [None]:
# add some items
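
A short sketch of both operations, using a hypothetical list named `scores`. Note that `len` is a built-in function called on the list, while `.append` is a method that changes the list in place.

```python
# Hypothetical example list
scores = [10, 20, 30]

n_before = len(scores)   # number of elements before appending
scores.append(40)        # modifies the list in place and returns None
n_after = len(scores)    # the list is now one element longer

print(n_before, n_after, scores)  # 3 4 [10, 20, 30, 40]
```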

#### Lists of lists (of lists of....)

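Since lists can contain arbitrary python objects, they can also contain more lists. To access specific items, chain indexing operations: the first index selects an inner list and the next index selects an element within it.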

In [3]:
ptriples = [[3,4,5],[5,12,13],[7,24,25],[8,15,17]]

In [6]:
#look at the first element

In [7]:
#look at an individual number
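
A minimal sketch of indexing into nested lists, using a small hypothetical list called `grid` so that it can be run on its own.

```python
# Hypothetical 2 x 3 nested list
grid = [[1, 2, 3],
        [4, 5, 6]]

first_row = grid[0]   # the first inner list
value = grid[1][2]    # second inner list, then its third element

print(first_row)  # [1, 2, 3]
print(value)      # 6
```

Each pair of square brackets peels off one level of nesting, so a list of lists of lists would need three indices.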

## For Loops

For loops are a workhorse of python programming (and many other languages). We will look at three common for loop constructions.

**Range**

In [1]:
for i in range(10): #generate all the integers from 0 to 9
    print(i, i*i)

0 0
1 1
2 4
3 9
4 16
5 25
6 36
7 49
8 64
9 81


**Iterables**

A common thing to do is loop over a list. Naively you would do something like:

```python
for i in range(len(my_list)):
    x = my_list[i]
    #do something with x
```

But there is a more efficient way. Lists are *iterable* so for loops can loop over their elements directly.

In [5]:
for trip in ptriples:
    print(trip, trip[0]**2 + trip[1]**2, trip[-1]**2)

[3, 4, 5] 25 25
[5, 12, 13] 169 169
[7, 24, 25] 625 625
[8, 15, 17] 289 289


**Enumeration**

Sometimes, you also want the index of the list you are iterating over, e.g. you want to refer to a corresponding element in another list.

In [6]:
for i, trip in enumerate(ptriples):
    print(i, trip)

0 [3, 4, 5]
1 [5, 12, 13]
2 [7, 24, 25]
3 [8, 15, 17]
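
As a sketch of the "corresponding element in another list" use case, the example below pairs up two hypothetical lists (`names` and `ages` are made up for illustration) using the index from `enumerate`.

```python
# Two hypothetical lists whose elements correspond by position
names = ["ada", "grace", "linus"]
ages = [36, 45, 28]

pairs = []
for i, name in enumerate(names):
    # i is the position of name, so ages[i] is the matching entry
    pairs.append((name, ages[i]))

print(pairs)  # [('ada', 36), ('grace', 45), ('linus', 28)]
```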


## Breakout exercises

1. Work together!
1. We recommend that one person share their screen so everyone can see a common notebook.
1. We will go around the breakout rooms to answer questions.


### 1. Triangular numbers

The [triangular numbers](https://en.wikipedia.org/wiki/Triangular_number) are given by
$$a(0)=0$$
$$a(1)=1$$
$$a(n-1)+n$$


The first 5 triangular numbers are: `[0, 1, 3, 6, 10]`

Create a list of the first 1000 triangular numbers. If you get this correct then the final item in the list should be `499500`.

You can print out the final item with `print(tri[-1])`.

In [None]:
# seed the first element of the sequence
tri = [0]
# start with n=1 because we already defined the first element.
for i in range(1, 1000):
    # your code here


### 2. Tetrahedral numbers


The sum of the first N triangular numbers gives the Nth [Tetrahedral number](https://en.wikipedia.org/wiki/Tetrahedral_number).

As a concrete example the first few tetrahedral numbers are given by:

$$Tetra_1 = a(0) + a(1) =  0 + 1$$
$$Tetra_2 =  a(0) + a(1) + a(2) = 0 + 1 + 3$$

Create a list of the first 999 Tetrahedral numbers. If you did this correctly then the last element should be `166666500`.

The first 5 elements of this sequence are: `[1, 4, 10, 20, 35, ...]`

Hint: You may need to use two for loops for this but it will be faster if you can do it with a single loop.

### 4. Temperature Conversion

In the next cell we've defined a variable `room_temp` that is a 2D list. This represents the temperature of a surface measured at equally spaced points. The data is visualized below.

<img src="images/surface_temp.png" alt="surface-temperature" width="400"/>

Unfortunately it's been measured in Fahrenheit. So, using `for` loops, make a new list named `room_temp_celsius` and fill it with the values of `room_temp` converted to Celsius. The conversion formula is:
$$C = \frac{F- 32}{1.8}$$

In [7]:
room_temp = [
    [72, 72, 68, 73, 72],
    [68, 73, 70, 67, 68],
    [72, 68, 69, 73, 72],
    [72, 68, 69, 74, 71],
    [70, 73, 72, 68, 71]
]


### 5. More dimensions

Now assume that we measure the temperature at the same points on the surface over a period of time. 

![room-temp-distribution-multi-times.png](attachment:ec2fb39f-903e-4ca0-9a0f-964a636b932d.png)

The variable `room_temp_3d` has three dimensions. Make a new list of this data converted to Celsius.

In [None]:
room_temp_3d = [
    [[72, 72, 68, 73, 72],
    [68, 73, 70, 67, 68],
    [72, 68, 69, 73, 72],
    [72, 68, 69, 74, 71],
    [70, 73, 72, 68, 71]],

    [[67, 71, 73, 68, 68],
    [72, 74, 69, 67, 72],
    [69, 72, 68, 71, 72],
    [71, 67, 70, 68, 67],
    [71, 68, 68, 69, 68]],

    [[67, 68, 67, 70, 69],
    [67, 67, 74, 68, 67],
    [69, 70, 74, 70, 68],
    [74, 68, 70, 74, 71],
    [69, 73, 68, 74, 68]]
]


### 3. Printing out nested slices

Using `for` loops and `if` statements, print out the third column of the second time point of `room_temp_3d`.
![room-temp-distribution-multi-column.png](attachment:0024e64f-733d-479a-83fc-b19b448a6147.png)

## Numpy

### What are Numpy Arrays?

Arrays are like lists that are highly optimized for numerical data. They sacrifice some of the flexibility of python lists for much higher performance.

See https://numpy.org/doc/stable/user/absolute_beginners.html

**Poll**: How much faster do you think arrays are than lists and for loops?

In [8]:
from math import sin
from time import perf_counter # time different operations
import numpy as np

In [9]:
N = 1000000
arr_list = list(range(N))

In [10]:
# time math.sin + for loop
t0 = perf_counter()

tf = perf_counter()
t_loop = tf - t0

In [None]:
t_loop #seconds

In [None]:
# make a numpy array

In [14]:
# time np.sin
t0 = perf_counter()

tf = perf_counter()
t_arr = tf - t0

In [15]:
print(t_loop/t_arr)

0.5396029831103312
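
One way the timing cells above could be filled in is sketched below. This is an illustration under assumptions, not the workshop's reference solution: it uses a smaller `N` so it runs quickly, and the measured ratio will vary from machine to machine.

```python
from math import sin
from time import perf_counter

import numpy as np

N = 100_000  # smaller than the notebook's N so the sketch runs quickly
values = list(range(N))

# time math.sin inside a plain python for loop
t0 = perf_counter()
loop_result = []
for v in values:
    loop_result.append(sin(v))
t_loop = perf_counter() - t0

# time np.sin applied to the whole array at once
arr = np.array(values)
t0 = perf_counter()
arr_result = np.sin(arr)
t_arr = perf_counter() - t0

print(f"loop: {t_loop:.4f} s, numpy: {t_arr:.4f} s")
```

Both approaches compute the same values; only the time taken differs.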


### How to create arrays

**From a list**: we can just call `np.array(list)` to turn that list into a numpy array. But be careful: the nested lists need to be shaped like a valid matrix, i.e. every row must have the same length.


In [None]:
# turn the ptriples list into a numpy array and look at its shape
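
A minimal sketch of the list-to-array conversion, using a small hypothetical nested list called `rows` in place of `ptriples`.

```python
import numpy as np

# Hypothetical nested list: 2 rows, 3 columns, every row the same length
rows = [[1, 2, 3],
        [4, 5, 6]]

arr = np.array(rows)

print(arr.shape)  # (2, 3) -> (number of rows, number of columns)
print(arr[1, 2])  # 6, same element as rows[1][2]
```

The `shape` attribute is usually the first thing to check after creating an array.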


Manually typing out lists sucks.

Numpy provides many functions for initializing common arrays.

- Methods giving you specific numbers
    - [arange](https://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html#numpy.arange) Use this to generate arrays of integers
    - [linspace](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html#numpy.linspace) Use this to generate arrays of evenly spaced numbers, for example to evaluate a function at many values between 0 and 1

- Methods where you give the shape of an array
    - [zeros](https://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html#numpy.zeros) gives an array of zeros in the shape you specify
    - [zeros_like](https://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros_like.html#numpy.zeros_like) gives an array of zeros in the shape of another array
    - [ones](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ones.html#numpy.ones) same as zeros except filled with ones
    - [ones_like](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ones_like.html#numpy.ones_like) same as zeros_like except filled with ones
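
The functions listed above can be sketched as follows. The variable names are chosen only for illustration.

```python
import numpy as np

ints = np.arange(5)            # integers 0, 1, 2, 3, 4 (stop is excluded)
evens = np.arange(0, 10, 2)    # start, stop, step -> 0, 2, 4, 6, 8
grid = np.linspace(0, 1, 5)    # 5 evenly spaced points, both ends included

blank = np.zeros((2, 3))       # shape is given as a tuple
filled = np.ones_like(blank)   # same shape as `blank`, filled with ones

print(ints)
print(evens)
print(grid)          # [0.   0.25 0.5  0.75 1.  ]
print(filled.shape)  # (2, 3)
```

Note the difference in endpoints: `np.arange` stops before `stop`, while `np.linspace` includes it by default.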

### Vectorized operations - No more loops!

Consider the simple problem of multiplying every element in a list/array by 5. For a list we would need to do something like 

```python
data = [3, 10, 0, 4, 8, 1]
new_list = []
for i in range(len(data)):
    new_list.append(data[i] * 5)
```

with Numpy we can do this much more succinctly.

Arrays can also be changed in place.
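
A minimal sketch of the vectorized version of the loop above, followed by two in-place modifications. The name `new_arr` is an assumption made for this illustration.

```python
import numpy as np

data = np.array([3, 10, 0, 4, 8, 1])

# multiply every element at once; this creates a new array
new_arr = data * 5
print(new_arr)  # [15 50  0 20 40  5]
print(data)     # [ 3 10  0  4  8  1] -> the original is unchanged

# in-place operations modify the existing array instead
data *= 5       # same values as new_arr, but stored in `data` itself
data[0] = -1    # assigning to an index also changes the array in place
print(data)     # [-1 50  0 20 40  5]
```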

### Doing math

There are many mathematical functions available to perform on arrays: https://docs.scipy.org/doc/numpy/reference/ufuncs.html#math-operations

For this we will use [`np.linspace`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html#numpy.linspace) to generate arrays of evenly spaced numbers.


In [None]:
# this extends to an arbitrary number of dimensions.



### Using numpy to write math

$$ x^2 \text{ for } x \in [0, 1, 2, ... 9]$$

$$ f(n) = 5 \cdot n  + \frac{6}{n}$$

for $n \in [1, 1.5, 2., ..., 10]$
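
The two expressions above translate almost symbol for symbol into numpy. The sketch below assumes the values of $n$ are spaced by 0.5, which gives 19 points between 1 and 10.

```python
import numpy as np

# x^2 for x in [0, 1, ..., 9]
x = np.arange(10)
squares = x**2
print(squares)  # [ 0  1  4  9 16 25 36 49 64 81]

# f(n) = 5*n + 6/n for n in [1, 1.5, 2, ..., 10]
n = np.linspace(1, 10, 19)   # 19 points -> spacing of 0.5
f = 5 * n + 6 / n
print(f[0], f[-1])           # values at n = 1 and n = 10
```

Every arithmetic operator acts element by element, so no loop is needed.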

## Breakout rooms


### 1. A more complex equation

Using `np.linspace` generate an array of 100 evenly spaced points between $0$ and $2 * \pi$. Then calculate the $sin$ of these values.

Hint: `np.pi` gives you the value of $\pi$


### 2. Triangular numbers

Triangular numbers but with Numpy!


The [triangular numbers](https://en.wikipedia.org/wiki/Triangular_number) are given by
$$T(0)=0$$
$$T(1)=1$$
$$T(n)=T(n-1)+n$$

However there is also an explicit formula. 
$$T_n = \frac{n(n+1)}{2}$$

Calculate the first 1000 triangular numbers.
As a reminder the first 5 triangular numbers are: `[0, 1, 3, 6, 10]` and the 1000th triangular number is `499500`.


Hint: A good start is to make the numbers 0-999 using `np.arange`

### 3. Tetrahedral numbers

The sum of the first N triangular numbers gives the Nth [Tetrahedral number](https://en.wikipedia.org/wiki/Tetrahedral_number).

Using the numpy function [cumsum](https://numpy.org/doc/stable/reference/generated/numpy.cumsum.html) (`np.cumsum`) which stands for cumulative sum calculate the first 999 Tetrahedral numbers.
As a reminder the first 5 elements of this sequence are `[1, 4, 10, 20, 35]` and the 999th element is `166666500`.


**End of Breakout**

If you're done, help other people in your breakout room - or feel free to come back to the main zoom room.

-----

## More powerful indexing 

### Indexing numpy arrays.

In general this is easier and more powerful than indexing lists.

A good tutorial for this is: https://numpy.org/doc/stable/user/basics.indexing.html#basics-indexing

and the definitive reference is: https://numpy.org/doc/stable/reference/arrays.indexing.html

![image.png](attachment:049ac417-a0e8-40cc-ae3d-f7cbdf0c0598.png)

### Taking a whole dimension at once.

It was super annoying to get the third column of the second time point using `for` loops. So let's do it using the `:` symbol when we index.

![image.png](attachment:93501078-176f-4ddd-b296-f4babb96bf01.png)

Various selections of portions of numpy arrays.

![image.png](attachment:89b50cee-774f-4d29-a770-5fdf1a11298f.png)

Let's try:

In [None]:
data = np.array([
    [1, 2],
    [3, 4],
    [5, 6]
])

In [None]:
# when you get confused a good thing to do is always to look at the shape of the array
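
A minimal sketch of slicing with `:`, using the same small `data` array plus a hypothetical 3D array called `cube` to show that the idea carries over to more dimensions.

```python
import numpy as np

data = np.array([
    [1, 2],
    [3, 4],
    [5, 6]
])
print(data.shape)        # (3, 2): 3 rows, 2 columns

first_col = data[:, 0]   # all rows, first column
last_row = data[-1, :]   # last row, all columns
top = data[:2, :]        # first two rows, all columns

# Hypothetical 3D array with axes (time, row, column)
cube = np.arange(24).reshape(2, 3, 4)
col = cube[1, :, 2]      # second time point, all rows, third column

print(first_col)  # [1 3 5]
print(last_row)   # [5 6]
print(col)        # [14 18 22]
```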



### Applying functions along an axis

Many numpy functions accept an `axis` argument. This allows us to control if they act on specific slices of the array, or over the entire array.
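
As a small sketch of the `axis` argument, the example below sums a hypothetical 3 x 2 array in three different ways.

```python
import numpy as np

m = np.array([
    [1, 2],
    [3, 4],
    [5, 6]
])

total = m.sum()           # no axis: aggregate over every element
col_sums = m.sum(axis=0)  # collapse axis 0 (rows): one value per column
row_sums = m.sum(axis=1)  # collapse axis 1 (columns): one value per row

print(total)     # 21
print(col_sums)  # [ 9 12]
print(row_sums)  # [ 3  7 11]
```

A useful way to remember this: the axis you pass is the one that disappears from the shape of the result.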

<img src="images/simple_agg.png" alt="Basic aggregation operation" width="350"> <img src="images/matrix_aggregation.png" alt="Aggregation along " width="1100">


## Breakout Room


### 1. Conversion and indexing

Turn the `room_temp_3d` into a Numpy array. Then use this array along with indexing to print out the third column of the second time point.

![room-temp-distribution-multi-column.png](attachment:77448d4d-69ba-4008-ae73-1705ea17e1df.png)

### 2. Temperature Conversion

Apply the Fahrenheit to Celsius conversion formula to this array using numpy broadcasting. As a reminder the formula is

$$C = \frac{F- 32}{1.8}.$$

### 3. Average Temperature

What is the average temperature over all times and all squares?

### 4. Average temperature using `axis`

Using the `axis` argument to `np.mean` figure out:
1. The average surface temperature at each time point
2. The average surface temperature for each grid point averaged over all three time points.

**End of Breakout**

If you're done, help other people in your breakout room - or feel free to come back to the main zoom room.

-----

### More Advanced Indexing

### Indexing Arrays

In [20]:
# not limited to indexing with single numbers or ranges of numbers. We can also select multiple specific points
x = np.arange(10)**2
idx = np.array([2,2, 5, 8])

In [21]:
x[idx]

array([ 4,  4, 25, 64])

In [22]:
idx2 = np.array([[1, 3],[-2, -1]])

In [23]:
x[idx2]

array([[ 1,  9],
       [64, 81]])

### Boolean Indexing

In [24]:
x = np.arange(25).reshape(5,5)

In [None]:
# generate the elements that are odd

In [None]:
#look at a boolean array

In [None]:
#get the odd numbers
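
One possible way to fill in the cells above is sketched here. The names `is_odd` and `odds` are assumptions made for this illustration.

```python
import numpy as np

x = np.arange(25).reshape(5, 5)

# comparing an array gives a boolean array of the same shape
is_odd = x % 2 == 1
print(is_odd.shape)  # (5, 5)

# indexing with the boolean array keeps only the True positions,
# returned as a flat 1D array
odds = x[is_odd]
print(odds)  # [ 1  3  5  7  9 11 13 15 17 19 21 23]
```

Because `True` counts as 1, `is_odd.sum()` gives the number of selected elements.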

## Breakout

### 1. Indexing odd elements

Calculate the sum of all numbers with an odd index in the given array below.

Hint: Use `np.arange(start, stop, step)` to generate the array to use for indexing

In [None]:
arr = np.array([ 1, 10, 17, 11,  7, 17,  6, 10,  5, 13, 14,  4, 16,  1,  0,  5,  5,
       16, 11,  8,  1,  5,  6,  9, 16,  5,  9,  0, 18,  5, 10, 15,  9, 14,
        2, 18, 13,  7, 13, 16, 16,  0,  0, 14, 15,  5, 18,  5,  2, 13, 12,
       16,  4, 12,  3,  7,  7, 10,  8,  6, 13,  9, 10,  4,  1,  7, 12, 13,
        2,  2,  6, 16,  6,  4,  3, 17,  8, 10, 18,  8, 17, 10,  7,  4,  1,
        0, 10, 13,  7, 12,  8, 12, 12,  9, 12, 19,  1,  4,  9,  3])

### 2. Indexing even elements - multidimensional

The data we load in the next cell (`new_data`) is a multidimensional array with 3 dimensions. If the first axis represents a time axis (like the room temperature data) what is the sum of all the data at the even numbered time points?

Hint 1: Start out by looking at `new_data.shape`  
Hint 2: Use `np.arange(start, stop, step)` to generate the array to use for indexing



In [16]:
new_data = np.load('data/odd-elem-multi-idx.npy')

In [17]:
new_data.shape

(10, 5, 5)

### 3. Indexing and preserving axes.

Now use array-indexing to extract the even numbered time points in `new_data` and then use `np.mean` to get the 2D average of each time point.

**End of Breakout**

If you're done, help other people in your breakout room - or feel free to come back to the main zoom room.

-----

## Basic plotting

**Matplotlib cheatsheets:** https://github.com/matplotlib/cheatsheets#cheatsheets


**Tutorials** https://matplotlib.org/3.3.4/tutorials/index.html  
**Good googling phrase:** "How to make a ___ plot in matplotlib"

**Getting help**

For both of these **make sure** you post a [minimal example](https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports)


1. https://stackoverflow.com/
2. https://discourse.matplotlib.org/
   - More lenient with what is an acceptable question
   - monitored by the Matplotlib devs
   - Slower to get an answer than stackoverflow

### Line Plots

In [None]:
import matplotlib.pyplot as plt

In [48]:
x = np.linspace(0,5, 200) #for making nice plots you probably want to use at least 50 points.

In [49]:
y = np.sin(3*x)*np.exp(-0.5*x)

In [50]:
plt.plot(x, y)
plt.show()

[<matplotlib.lines.Line2D at 0x163c487c0>]

In [90]:
import tifffile as tiff
import glob

In [91]:
files = sorted(glob.glob('Fluo-N3DH-CHO/01/*'))
cho_arr = tiff.imread(files)

In [93]:
cho_arr.shape

(92, 5, 443, 512)

For future reference, these dimensions refer to (Time, Z, Y, X).

In [33]:
single_im = cho_arr[0,0]

In [None]:
plt.imshow(single_im)
plt.show()

## Breakout rooms

### 1. Plotting a mathematical function

Plot  $ y(t) = e^{-5 \cdot t}$ for $t \in [0,1]$

Hint 1: Use `np.linspace` and `np.exp` to do this.  
Hint 2: Call `plt.plot` multiple times

### 2. Plotting many lines

Plot  $ y(t) = e^{-at}$ for $t \in [0,1]$ for these values of $a$: [1, 5, 10]

Hint 1: Use `np.linspace` and `np.exp` to do this.  
Hint 2: Call `plt.plot` multiple times (maybe inside a for loop)

*Optional*: Matplotlib provides many functions to style your plots. Look into some of them and add them to your plot. You might consider adding a [legend](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html), a [title](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.title.html), or labels on the axes as shown below.

<img src='images/exponentials.png' width=40%>

### 3. Displaying Images

Use indexing to select a single time point from the `cho_arr` array. Make one plot showing the *average* projection (where each pixel is averaged over the z values) and another showing the *maximum* projection (where each pixel is the maximum value of its z values). What are the differences between the two projections?

Hint: Think about using the `axis` argument to some numpy functions to accomplish the projections.

**End Breakout Room**


-----

In [28]:
%matplotlib widget
from mpl_interactions import hyperslicer

## `hyperslicer` - Interactive Image Viewer for Jupyter Notebooks

For image analysis projects, the first step is usually to just look at the data you have. If you are coming from an ImageJ/FIJI background, you probably aren't thrilled with just plotting single images. However, we have written code to enable that functionality directly in a jupyter notebook.

First we need to do a few things.

In [95]:
from mpl_interactions import hyperslicer # IHI wrote most of mpl-interactions. You can make all sorts of interactive plots with it, check it out!

In [96]:
# set matplotlib to use the interactive backend
%matplotlib widget

In [97]:
plt.figure()
ctrls = hyperslicer(cho_arr)


In [99]:
# Label the sliders using the names keyword
plt.figure()
ctrls = hyperslicer(cho_arr, names=('Time', 'Z'))


## Labelled Arrays with `xarray`

As you have probably experienced with your own image data, and as we are beginning to see with the example dataset here, it can be annoying to keep track of what all the different dimensions of an array actually mean. In addition, sometimes they have coordinates rather than just integer values (e.g. distances for the XYZ dimensions, names for different channels, time values for time series, etc.).

[`xarray`](http://xarray.pydata.org/en/stable/) adds this capability on top of numpy. Today we will focus on the most direct counterpart to numpy arrays, the `DataArray`.

In [54]:
import xarray as xr

In [65]:
# Voxel size (microns): 0.202 x 0.202 x 1.0  Time step (min): 9.5
coords = {'T':9.5*np.arange(cho_arr.shape[0]), 'Z':1.0*np.arange(cho_arr.shape[1]), 'Y':0.202*np.arange(cho_arr.shape[2]), 'X':0.202*np.arange(cho_arr.shape[3])}
# In real life, you should write code that reads your metadata and produces this dictionary

In [None]:
coords

In [62]:
cho_data_arr = xr.DataArray(cho_arr, dims=list(coords), coords=coords)

In [None]:
cho_data_arr

In [None]:
# max project z

In [71]:
#select the point at the two hour mark

In [None]:
#select an 80x80 micron square
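
The selections described in the cells above can be sketched on a small synthetic array, so the example runs without the image files. Everything here (the random `stack`, its size, and the chosen time and distance values) is an assumption made for illustration. Only the voxel size and time step are taken from the comment above.

```python
import numpy as np
import xarray as xr

# Small synthetic stand-in for the image data, with axes (T, Z, Y, X)
rng = np.random.default_rng(0)
stack = rng.random((4, 3, 5, 6))

coords = {
    "T": 9.5 * np.arange(stack.shape[0]),    # minutes
    "Z": 1.0 * np.arange(stack.shape[1]),    # microns
    "Y": 0.202 * np.arange(stack.shape[2]),  # microns
    "X": 0.202 * np.arange(stack.shape[3]),  # microns
}
da = xr.DataArray(stack, dims=list(coords), coords=coords)

# max projection over z: name the dimension instead of counting axes
max_proj = da.max(dim="Z")

# select the time point closest to 20 minutes (here T = 19.0)
near_20min = da.sel(T=20, method="nearest")

# select a square by physical distance rather than pixel index
corner = da.sel(Y=slice(0, 0.5), X=slice(0, 0.5))

print(max_proj.dims)     # ('T', 'Y', 'X')
print(near_20min.dims)   # ('Z', 'Y', 'X')
print(corner.shape)      # (4, 3, 3, 3)
```

With the real data the same calls would use the two hour mark and an 80 micron range instead of the small values used here.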

## `hyperslicer` supports `xarray`

In [72]:
plt.figure()
ctrls = hyperslicer(cho_data_arr)


### Other resources

**General python introductions**
- [Whirlwind Tour of Python](https://jakevdp.github.io/WhirlwindTourOfPython/) (Also covers numpy and matplotlib)
- [Learn Python](https://www.learnpython.org/)
- [Learn to Code in Python](https://repl.it/talk/learn/Learn-To-Code-In-Python/7485)

**Documentation**

- [Numpy user guide](https://numpy.org/doc/1.20/user/index.html)
- [Xarray docs](http://xarray.pydata.org/en/stable/index.html) -- *NB* `xarray` came out of the earth science community so the documentation is really focused on stuff like global temperature data. It is increasing in popularity among microscopists and hopefully their docs will reflect that eventually.
- [mpl-interactions](https://mpl-interactions.readthedocs.io/en/stable/) -- [hyperslicer tutorial](https://mpl-interactions.readthedocs.io/en/stable/examples/hyperslicer.html)

## Time Permitting: An aside on reusable code

- If you write code to do something useful you should wrap that code into a function. It's worth taking the time to make sure your code is a good balance of flexible and robust before you wrap it into a function. My personal rule about this is that if I end up copying something 3 times it needs to go into a function.

In [76]:
def my_function(x,y,z):
    return (4*x-3*y)/(2*z)

In [78]:
my_function(1,2,5)

-0.2

In [79]:
x = np.linspace(0,5, 50)
y = np.linspace(2, 7, 50)
z = np.linspace(1,6, 50)

In [80]:
my_function(x,y,z)

array([-3.        , -2.67592593, -2.40677966, -2.1796875 , -1.98550725,
       -1.81756757, -1.67088608, -1.54166667, -1.42696629, -1.32446809,
       -1.23232323, -1.14903846, -1.0733945 , -1.00438596, -0.94117647,
       -0.88306452, -0.82945736, -0.77985075, -0.73381295, -0.69097222,
       -0.65100671, -0.61363636, -0.57861635, -0.54573171, -0.5147929 ,
       -0.48563218, -0.45810056, -0.43206522, -0.40740741, -0.38402062,
       -0.36180905, -0.34068627, -0.32057416, -0.30140187, -0.28310502,
       -0.265625  , -0.2489083 , -0.23290598, -0.21757322, -0.20286885,
       -0.18875502, -0.17519685, -0.16216216, -0.14962121, -0.13754647,
       -0.12591241, -0.11469534, -0.10387324, -0.09342561, -0.08333333])

- If you have several functions, you should put them in a `.py` file and import them. (You should also write some documentation so you and others can know what functions do.)
  - With this simple setup, you need to have the `.py` file in the same directory as your notebook. 

In [81]:
from myfuncs import normalize

In [84]:
v = np.random.randn(5)

In [86]:
v

array([-0.87675366,  1.58035457, -0.44198652,  0.71796606,  1.73925941])

In [85]:
normalize(v)

array([-0.33133279,  0.59722966, -0.16703053,  0.27132558,  0.65728118])
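
The contents of `myfuncs.py` are not shown here. The sketch below is an assumed version of `normalize` that is consistent with the output above (each value divided by the Euclidean length of the vector), and it includes a docstring as an example of the documentation mentioned earlier.

```python
import numpy as np


def normalize(v):
    """Scale a vector so that it has unit Euclidean length.

    Parameters
    ----------
    v : array-like
        The input vector. It must not be all zeros.

    Returns
    -------
    numpy.ndarray
        A vector pointing in the same direction as `v` with length 1.
    """
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


unit = normalize([3.0, 4.0])
print(unit)                  # [0.6 0.8]
print(np.linalg.norm(unit))  # length of the result
```

Saving this function in a file named `myfuncs.py` next to the notebook is what makes `from myfuncs import normalize` work.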

- If you want to use your functions across different projects or distribute them to other people, you should make a `module`.

1. Make a directory and put your `.py` files inside.
1. Make a file called `__init__.py` and `import *` from each of your files
1. In the folder that contains the module directory make a file called `setup.py`. In this file you need to 
    1. `from setuptools import setup`
    1. Run the setup command with the relevant info. See example.
1. Finally to install the package navigate to the directory containing `setup.py` and at a command prompt run `pip install -e .`. This installs the module in editable mode so any changes you make to the contents will be reflected in code that imports the module.
1. At this point, you could also post your project on something like github so other people could find and use it.

Here is a [good tutorial](https://betterscientificsoftware.github.io/python-for-hpc/tutorials/python-pypi-packaging/#creating-a-python-package) for making a package installable. It even goes into how to submit your project to [PyPI](https://pypi.org/).