# Modules in Python

One advantage of Python that makes it versatile across a wide range of tasks is its broad ecosystem of tools and packages, which offer more specialized functionality on top of "bare" Python.

## Loading Modules: the ``import`` Statement

For loading built-in and third-party modules, Python provides the ``import`` statement.

#### <font color='green'>Good</font>
import <font color='green'>sys</font>

from os import <font color='green'>path</font>

import statistics <font color='green'>as stats</font>

from custom_package import <font color='green'>mode</font>

from statistics import <font color='green'>mean, median</font>

#### <font color='red'>Bad:</font> wildcard imports can silently overwrite previously imported names
from math import <font color='red'><b>*</b></font>

from pylab import <font color='red'><b>*</b></font>
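The overwrite is easiest to see with two explicit imports of the same name; a wildcard import does the same thing for every public name of a module at once. A minimal sketch, assuming the standard-library `math` and `cmath` modules as stand-ins (they are chosen for illustration only because both define `sqrt`):

```python
from math import sqrt        # sqrt now refers to math.sqrt
real_root = sqrt(4)          # returns a float

from cmath import sqrt       # same name: silently replaces math.sqrt
complex_root = sqrt(4)       # the same call now returns a complex number

print(type(real_root).__name__, real_root)
print(type(complex_root).__name__, complex_root)

# Keeping the module name in front avoids the clash entirely
import math
import cmath
print(math.sqrt(4), cmath.sqrt(4))
```

No error or warning is raised at the second import, which is why the problem is easy to miss.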

You can also use widgets in Python. The example below uses the `tqdm` module to display a live progress bar while your code runs.

In [1]:
from tqdm.notebook import tqdm # import the required package

#then use this package in our code
list(tqdm(range(0, 10000000, 2)))

[0, 2, 4, 6, 8, 10, ...]

Today we will import the **NumPy** module, a powerful and flexible maths package.

In [2]:
import numpy as np # Because python users are too lazy to write numpy every time

# ![](http://www.numpy.org/_static/numpy_logo.png) 
##### NumPy supports arrays, which are very useful for numerical computations
* Arrays are N-dimensional: 1D (vector), 2D (matrix), ..., ND
* Arrays are (generally) faster than lists
* Many packages use NumPy arrays to store data
* Arrays can be used to make calculations in one command, without `for` loops or list comprehensions

### Looking for help?

* Documentation: http://docs.scipy.org/doc/numpy/reference/
* Google is your friend! Stack Overflow answers are especially helpful: try searching for "how do I create an empty array in numpy"
* Use the help function (pressing Tab will show the options available)

In [4]:
help(len)

Help on built-in function len in module builtins:

len(obj, /)
    Return the number of items in a container.



### Creating an array from a list

In [5]:
a1d = np.array([3, 4, 5, 6])
a1d

array([3, 4, 5, 6])

In [6]:
a2d = np.array([[10.,   20, 30], [9, 8, 5]])
a2d

array([[10., 20., 30.],
       [ 9.,  8.,  5.]])

Slicing works much as it does for lists, with the different dimensions of the array separated by commas. Can you guess what the following slices are equal to? Print them to check your understanding.

In [7]:
a2d[0,0]

10.0

In [8]:
a2d[0,1:]

array([20., 30.])

In [9]:
a2d[:,2]

array([30.,  5.])
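As a further illustration of the comma-separated syntax, here is a small sketch on a hypothetical 3x4 array (`grid` and the other variable names are introduced only for this example). A bare `:` selects everything along that axis, and negative indices count from the end, as with lists.

```python
import numpy as np

grid = np.array([[0, 1, 2, 3],
                 [4, 5, 6, 7],
                 [8, 9, 10, 11]])

second_row = grid[1, :]        # all columns of row 1
middle_cols = grid[:, 1:3]     # columns 1 and 2 of every row
last_row_even = grid[-1, ::2]  # every second element of the last row

print(second_row)
print(middle_cols)
print(last_row_even)
```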

**Exercise** Create a 2D NumPy array from the following list and assign it to the variable "a":

In [11]:
a=np.array([[2, 3.2, 5.5, -6.4, -2.2, 2.4],
  [1, 22, 4, 0.1, 5.3, -9],
  [3, 1, 2.1, 21, 1.1, -2]])

**Exercise** Using indexes: calculate the difference between adjacent items in an array **without** using a loop

In [None]:
x = np.array([1, 2, 3, 4, 5])
# Your code here

### Array attributes

In [12]:
a2d

array([[10., 20., 30.],
       [ 9.,  8.,  5.]])

#### ndarray.ndim
The number of axes (dimensions) of the array. Some older NumPy documentation refers to the number of dimensions as the rank.

In [13]:
a2d.ndim

2

#### ndarray.shape
The dimensions of the array: a tuple giving the size of the array along each axis.

In [14]:
a2d.shape

(2, 3)
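To see how `ndim` and `shape` relate, the sketch below prints them for arrays of one, two and three dimensions. The arrays `vector`, `matrix` and `cube` are made up for this example, and `size` and `dtype` are two further standard `ndarray` attributes that the text above does not cover.

```python
import numpy as np

vector = np.array([3, 4, 5, 6])
matrix = np.array([[10., 20, 30], [9, 8, 5]])
cube = np.zeros((2, 3, 4))      # a 3D array filled with zeros

for name, arr in [("vector", vector), ("matrix", matrix), ("cube", cube)]:
    # ndim is always the length of the shape tuple
    print(name, "ndim:", arr.ndim, "shape:", arr.shape,
          "size:", arr.size, "dtype:", arr.dtype)
```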

### Functions for creating arrays
#### ``arange([start,] stop[, step,], dtype=None)``
#### evenly spaced, defined by step

In [15]:
np.arange(1, 9, 2)

array([1, 3, 5, 7])

#### ``linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)``
#### evenly spaced, defined by length

In [16]:
np.linspace(0, 1, 11)   # start, end, number of points

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
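The main practical difference between the two functions is how they treat the end value. A short sketch of this, using the `endpoint` and `retstep` parameters from the signature above (the variable names are chosen only for this example):

```python
import numpy as np

by_step = np.arange(0, 1, 0.25)                     # stop value is excluded
by_count = np.linspace(0, 1, 5)                     # stop value is included by default
open_ended = np.linspace(0, 1, 4, endpoint=False)   # same points as by_step
points, step = np.linspace(0, 1, 5, retstep=True)   # also return the spacing

print(by_step)
print(by_count)
print(open_ended)
print(step)
```

Because `arange` with a non-integer step is subject to floating-point rounding, `linspace` is often the safer choice when the number of points matters.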

**Exercise**

Create arrays of evenly spaced numbers

In [None]:
# Numbers from 1 to 10 in steps of 1


In [None]:
# From 0 to -2 in steps of -0.4


In [None]:
# 100 steps from -pi to pi (hint: use np.pi)


#### Create an array filled with zeros

In [17]:
np.zeros((2, 3))

array([[0., 0., 0.],
       [0., 0., 0.]])

#### Create an array of random numbers

In [20]:
np.random.rand(4)       # From a uniform distribution between 0 and 1

array([0.68206639, 0.37062281, 0.49299833, 0.79441821])

In [21]:
np.random.normal(0,1,size=4)      # Gaussian (mean,std dev, num samples)

array([ 0.50940507, -0.97419311, -1.02140414, -2.10387764])

In [23]:
np.random.randint(-10,10,size=(5,5)) # Random integers from -10 (inclusive) to 10 (exclusive)
# How does this function work? Try uncommenting the next line to read the documentation for this function
# np.random.randint?
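The random numbers above change on every run. When results need to be repeatable, seeding the generator helps. The sketch below uses NumPy's `Generator` interface (`np.random.default_rng`), which is an alternative to the `np.random.rand`-style functions used in this notebook rather than what the cells above call; the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)             # seed chosen arbitrarily

uniform = rng.random(4)                          # uniform on [0, 1)
gaussian = rng.normal(0, 1, size=4)              # mean, std dev, number of samples
integers = rng.integers(-10, 10, size=(5, 5))    # -10 inclusive to 10 exclusive

# A second generator with the same seed reproduces the same numbers
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(uniform, rng_again.random(4)))
```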

#### Grid generation
* A common task is to generate a pair of arrays that represent data coordinates.
* This is useful for interpolation and for mapping contours.
* When orthogonal 1D coordinate arrays already exist, NumPy's `meshgrid` function builds the 2D coordinate arrays from them.
* A great explanation of how `meshgrid` works can be found [here](https://www.geeksforgeeks.org/numpy-meshgrid-function/)

In [24]:
x = np.linspace(-5, 5, 3)
y = np.linspace(10, 40, 4)
print(x)
print(y)

[-5.  0.  5.]
[10. 20. 30. 40.]


In [25]:
x2d, y2d = np.meshgrid(x, y)
print(x2d)

[[-5.  0.  5.]
 [-5.  0.  5.]
 [-5.  0.  5.]
 [-5.  0.  5.]]


In [26]:
print(y2d)

[[10. 10. 10.]
 [20. 20. 20.]
 [30. 30. 30.]
 [40. 40. 40.]]
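The point of having both 2D arrays is that any expression in x and y can then be evaluated over the whole grid at once. A minimal sketch, where the formula for `z` is a made-up example field and not something from this notebook:

```python
import numpy as np

x = np.linspace(-5, 5, 3)
y = np.linspace(10, 40, 4)
x2d, y2d = np.meshgrid(x, y)

# Hypothetical field evaluated at every (x, y) grid point in one expression
z = x2d**2 + y2d

print(z.shape)    # one row per y value, one column per x value
print(z)
```

An array like `z` is what contour-plotting routines typically expect alongside the coordinate arrays.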


Transpose arrays with `.T`:

In [27]:
y2d.T

array([[10., 20., 30., 40.],
       [10., 20., 30., 40.],
       [10., 20., 30., 40.]])

### Statistical methods of arrays

In [28]:
print('array a1d                       :', a1d)
print('Minimum and maximum             :', a1d.min(), a1d.max())
print('Index of minimum and maximum    :', a1d.argmin(), a1d.argmax())
print('Sum and product of all elements :', a1d.sum(), a1d.prod())
print('Mean and standard deviation     :', a1d.mean(), a1d.std())
print('Median and 75th percentile      :', np.median(a1d), np.percentile(a1d,75))

array a1d                       : [3 4 5 6]
Minimum and maximum             : 3 6
Index of minimum and maximum    : 0 3
Sum and product of all elements : 18 360
Mean and standard deviation     : 4.5 1.118033988749895
Median and 75th percentile      : 4.5 5.25


### Operations over a given axis

In [29]:
print(a2d)
print('sum array  :',a2d.sum())
print('sum axis 0  :',a2d.sum(axis=0))
print('sum axis 1 :',a2d.sum(axis=1))

[[10. 20. 30.]
 [ 9.  8.  5.]]
sum array  : 82.0
sum axis 0  : [19. 28. 35.]
sum axis 1 : [60. 22.]
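The `axis` argument names the dimension that is collapsed by the operation, and the same keyword works for most of the statistical methods. A short sketch on the same `a2d` array (the result names are chosen only for this example):

```python
import numpy as np

a2d = np.array([[10., 20, 30], [9, 8, 5]])

col_means = a2d.mean(axis=0)   # collapse the rows: one value per column
row_max = a2d.max(axis=1)      # collapse the columns: one value per row

print(a2d.shape, '->', col_means.shape, col_means)
print(a2d.shape, '->', row_max.shape, row_max)
```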


**Exercise** Using the array 'a' we created earlier, find:
* The maximum value
* The 90th percentile 
* The mean along axis 0
* The sum along axis 1

(If you haven't made 'a' yet, uncomment and run the following cell.)

In [None]:
#a = np.array([[2, 3.2, 5.5, -6.4, -2.2, 2.4],
#              [1, 22, 4, 0.1, 5.3, -9],
#              [3, 1, 2.1, 21, 1.1, -2]])

In [None]:
# Your code here

### Vectorisation: operations on whole arrays

In [30]:
a = np.array([1,2,3,4])
print(a)

[1 2 3 4]


In [31]:
a**2

array([ 1,  4,  9, 16])

In [32]:
np.exp(a) # e raised to the power of a

array([ 2.71828183,  7.3890561 , 20.08553692, 54.59815003])

In [33]:
b = np.array([1, 10, 100, 1000])
a*b

array([   1,   20,  300, 4000])

All the maths we could apply to *ints*, *floats* and *complex* numbers individually we can now apply to arbitrarily large and complex *arrays* of numbers using NumPy. Let's revisit our function for calculating pressure from depth:

In [34]:
def under_pressure(d,rho = 1027.5, g = 9.81):
    P = rho*g*d
    return P

You cannot apply this function to an entire list, because multiplying a list by a float is not defined:

In [35]:
depths_list = [10,20,30,40,50]
pressures = under_pressure(depths_list)
print(pressures)

TypeError: can't multiply sequence by non-int of type 'float'

It is possible to loop through the elements of the list, but for big datasets this would take a long time, so it isn't advisable:

In [36]:
pressures=[] #create an empty list to hold the output
depths_list = [10,20,30,40,50]
for i in depths_list:
    pressures.append(under_pressure(i))
print(pressures)

[100797.75, 201595.5, 302393.25, 403191.0, 503988.75]


Using Python's standard data types we had to pass depths to the function one at a time to get pressures (above). With NumPy we can use vectorisation to get a whole array of pressure values simply by passing an array of depths to the function!

In [37]:
depths_array = np.array([10,20,30,40,50])
pressures = under_pressure(depths_array)
print(pressures)

[100797.75 201595.5  302393.25 403191.   503988.75]
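To get a feel for the speed difference, the two approaches can be timed with the standard-library `timeit` module. This is a rough sketch: the array size and repeat count are arbitrary, the helper names `element_by_element` and `vectorised` are invented for this example, and the timings depend on the machine, so no expected values are shown.

```python
import timeit
import numpy as np

def under_pressure(d, rho=1027.5, g=9.81):
    P = rho * g * d
    return P

depths_list = list(range(1, 10001))     # arbitrary example size
depths_array = np.array(depths_list)

def element_by_element():
    # one function call per depth, as in the loop above
    return [under_pressure(d) for d in depths_list]

def vectorised():
    # a single call on the whole array
    return under_pressure(depths_array)

# Both approaches give the same numbers
print(np.allclose(element_by_element(), vectorised()))

print('element by element:', timeit.timeit(element_by_element, number=20))
print('vectorised        :', timeit.timeit(vectorised, number=20))
```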


## Shape manipulation

In [38]:
b = np.array([[1, 2, 3], [4, 5, 6]])

In [39]:
b.flatten()

array([1, 2, 3, 4, 5, 6])

In [40]:
b.reshape(3,2)

array([[1, 2],
       [3, 4],
       [5, 6]])

In [41]:
b.repeat(3)

array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6])

In [43]:
np.tile(b,(3,2))

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6],
       [1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6],
       [1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])
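A few details of these methods are easy to trip over, so here is a short sketch of them on the same array `b`: `reshape` must keep the total number of elements, `-1` asks NumPy to infer one dimension, and `repeat` duplicates individual elements whereas `tile` duplicates the whole array.

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])

inferred = b.reshape(-1, 2)    # -1 lets NumPy work out the row count (3)
print(inferred.shape)

try:
    b.reshape(4, 2)            # 8 slots for 6 elements: not possible
except ValueError as err:
    print('ValueError:', err)

repeated = np.repeat(b, 2, axis=1)   # each element doubled along the columns
tiled = np.tile(b, 2)                # the whole array placed side by side twice
print(repeated)
print(tiled)
```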

## Pointers revisited
### Copies vs. in-place operations


From help(numpy):

<code>
Most of the functions in `numpy` return a copy of the array argument
(e.g., `np.sort`).  In-place versions of these functions are often
available as array methods, i.e. ``x = np.array([1,2,3]); x.sort()``.
Exceptions to this rule are documented.
</code>

In [44]:
foo = np.array([99,98,97])
bar = foo
# Method sort()
foo.sort()

In [45]:
print(foo)

[97 98 99]


In [46]:
print(bar)

[97 98 99]


Using the built-in method `foo.sort()` has also changed `bar`, because both names refer to the same array.

In [47]:
foo = np.array([99,98,97])
bar = foo
# Function sort
foo = np.sort(foo)

In [48]:
print(foo)

[97 98 99]


In [49]:
print(bar)

[99 98 97]


Using the function `np.sort()`, `bar` remains unchanged, because `np.sort` returns a sorted copy and only `foo` is reassigned to it.

### If you are ever unsure

Prefer **functions**, which take the form `module.function(variable)`, e.g. `np.sort(variable)`,

over **methods**, which take the form `variable.method()`, e.g. `variable.sort()`,

or use `copy` when making copies of variables to be safe.
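A minimal sketch of the `copy` option, reusing the `foo` array from above (`alias` and `backup` are names introduced only for this example). `np.shares_memory` is used here just to confirm that the copy is independent.

```python
import numpy as np

foo = np.array([99, 98, 97])
alias = foo            # second name for the same array
backup = foo.copy()    # independent copy of the data

foo.sort()             # in-place sort

print(alias)           # follows foo
print(backup)          # keeps the original order
print(alias is foo, backup is foo)
print(np.shares_memory(foo, backup))
```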