# Introduction to Scientific Python

This worksheet gives examples and exercises for the `numpy`, `pandas` and `matplotlib` modules covered in the Scientific Python session. Alongside the exercises, you can also experiment with any topics you don't feel you fully understand.

In [2]:
# Display the value of every expression in a cell, not just the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

## Numpy

Numpy is the basis of much of the Scientific Python ecosystem. It provides:
* An efficient array structure
* Fine control of numeric types
* Rapid vectorised computations
* Fast numeric function implementations in C, C++ or Fortran

It is conventionally imported as `np` to save typing, and it is best to stick to this convention.

In [3]:
import numpy as np

### Arrays
`np.array` creates a Numpy array: an ordered data container similar to a list, except that all of its elements must have the same data type and it can have multiple dimensions. Pass it list-like data:

In [5]:
# A 1d array
np.array([1,2,3,4,5])

# A 2d array
np.array([[1,2],
          [3,4]])

# A 3d array
np.array([[[1,2],
           [3,4]],
          [[5,6],
           [7,8]]])

array([1, 2, 3, 4, 5])

array([[1, 2],
       [3, 4]])

array([[[1, 2],
        [3, 4]],

       [[5, 6],
        [7, 8]]])
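Each array also records how many dimensions it has and how large each one is. A minimal sketch using the standard `ndim`, `shape` and `size` attributes (the array `a` is just an illustration):

```python
import numpy as np

# A 2d array with 2 rows and 3 columns
a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.ndim)   # 2: number of dimensions
print(a.shape)  # (2, 3): size along each dimension
print(a.size)   # 6: total number of elements
```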

An array has an associated data type, which can be accessed using its `dtype` attribute.

In [27]:
x = np.array([1,2,3,4])
x.dtype

dtype('int64')

On most 64-bit platforms this defaults to `int64` or `float64` if you give Python `int` or `float` values, which means each number uses 64 bits of memory. Sometimes you would prefer a different type to save memory or time. See [here](https://docs.scipy.org/doc/numpy-1.15.0/user/basics.types.html) for a list and description of types. Take some care when choosing a type: a value outside the type's range can silently wrap around or be converted to something unexpected. **Exercise**: Work out why each of the following results occurs:

In [35]:
# Default array
np.array([-512, -256, -128, -64, -32, -16, -8, -4, -2, -1, 0,
          1, 2, 4, 8, 16, 32, 64, 128, 256, 512])

array([-512, -256, -128,  -64,  -32,  -16,   -8,   -4,   -2,   -1,    0,
          1,    2,    4,    8,   16,   32,   64,  128,  256,  512])

In [36]:
# Boolean Array
np.array([-512, -256, -128, -64, -32, -16, -8, -4, -2, -1, 0,
          1, 2, 4, 8, 16, 32, 64, 128, 256, 512], dtype=np.bool_)

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True, False,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True])

In [37]:
# Signed 16bit Integer array
np.array([-512, -256, -128, -64, -32, -16, -8, -4, -2, -1, 0,
          1, 2, 4, 8, 16, 32, 64, 128, 256, 512], dtype=np.int16)

array([-512, -256, -128,  -64,  -32,  -16,   -8,   -4,   -2,   -1,    0,
          1,    2,    4,    8,   16,   32,   64,  128,  256,  512],
      dtype=int16)

In [38]:
# Unsigned 16-bit integer array
# (cast with astype: recent NumPy versions reject out-of-range ints passed via dtype=)
np.array([-512, -256, -128, -64, -32, -16, -8, -4, -2, -1, 0,
          1, 2, 4, 8, 16, 32, 64, 128, 256, 512]).astype(np.uint16)

array([65024, 65280, 65408, 65472, 65504, 65520, 65528, 65532, 65534,
       65535,     0,     1,     2,     4,     8,    16,    32,    64,
         128,   256,   512], dtype=uint16)

In [39]:
# Signed 8-bit integer array
# (cast with astype: recent NumPy versions reject out-of-range ints passed via dtype=)
np.array([-512, -256, -128, -64, -32, -16, -8, -4, -2, -1, 0,
          1, 2, 4, 8, 16, 32, 64, 128, 256, 512]).astype(np.int8)

array([   0,    0, -128,  -64,  -32,  -16,   -8,   -4,   -2,   -1,    0,
          1,    2,    4,    8,   16,   32,   64, -128,    0,    0],
      dtype=int8)

In [40]:
# Unsigned 8-bit integer array
# (cast with astype: recent NumPy versions reject out-of-range ints passed via dtype=)
np.array([-512, -256, -128, -64, -32, -16, -8, -4, -2, -1, 0,
          1, 2, 4, 8, 16, 32, 64, 128, 256, 512]).astype(np.uint8)

array([  0,   0, 128, 192, 224, 240, 248, 252, 254, 255,   0,   1,   2,
         4,   8,  16,  32,  64, 128,   0,   0], dtype=uint8)

In [41]:
# Complex array
np.array([-512, -256, -128, -64, -32, -16, -8, -4, -2, -1, 0,
          1, 2, 4, 8, 16, 32, 64, 128, 256, 512], dtype=np.complex64)

array([-512.+0.j, -256.+0.j, -128.+0.j,  -64.+0.j,  -32.+0.j,  -16.+0.j,
         -8.+0.j,   -4.+0.j,   -2.+0.j,   -1.+0.j,    0.+0.j,    1.+0.j,
          2.+0.j,    4.+0.j,    8.+0.j,   16.+0.j,   32.+0.j,   64.+0.j,
        128.+0.j,  256.+0.j,  512.+0.j], dtype=complex64)
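As a hint for the exercise, `np.iinfo` reports the range each integer type can hold. A short sketch (the loop variable names are illustrative):

```python
import numpy as np

# np.iinfo reports the smallest and largest value an integer type can hold
for int_type in (np.int8, np.uint8, np.int16, np.uint16):
    info = np.iinfo(int_type)
    print(int_type.__name__, info.min, info.max)
```

This should report -128 to 127 for `int8`, 0 to 255 for `uint8`, -32768 to 32767 for `int16` and 0 to 65535 for `uint16`; compare these with the values that changed above.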

Numpy provides functions to create various common arrays easily:
* Zeros (`np.zeros`)
* Ones (`np.ones`)
* Random numbers (e.g. `np.random.rand` or `np.random.randint`)
* Identity matrices (`np.identity`)
* Evenly spaced arrays (`np.arange` and `np.linspace`)

The first three can create arrays of any shape, identity matrices are always square and 2-dimensional, and evenly spaced arrays have only one dimension:

In [42]:
np.zeros(10)
np.ones((2,3))
np.random.randint(10, size=(3,3))
np.identity(3)
np.arange(2, 5, 0.5)
np.linspace(0,10, 4)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

array([[1., 1., 1.],
       [1., 1., 1.]])

array([[8, 4, 5],
       [1, 0, 0],
       [4, 0, 5]])

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

array([2. , 2.5, 3. , 3.5, 4. , 4.5])

array([ 0.        ,  3.33333333,  6.66666667, 10.        ])
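The two evenly spaced functions differ in their third argument and in whether the end point is included, and `randint` also excludes its upper bound. A small sketch (the names `a`, `b` and `r` are illustrative); the NumPy documentation suggests preferring `np.linspace` when the step would be a non-integer such as 0.1, since floating point rounding can make the length of an `np.arange` result hard to predict:

```python
import numpy as np

# arange(start, stop, step): the stop value is excluded
a = np.arange(0, 1, 0.25)    # 0.0, 0.25, 0.5, 0.75

# linspace(start, stop, num): num points, stop included by default
b = np.linspace(0, 1, 5)     # 0.0, 0.25, 0.5, 0.75, 1.0

# randint(high, size=...): integers from 0 up to, but not including, high
r = np.random.randint(10, size=(3, 3))

print(a, b, r, sep="\n")
```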

You can also specify the data type of arrays you create:

In [21]:
np.array([1,2,3], dtype=np.float16)
np.zeros((2,2), dtype=np.bool_)

array([1., 2., 3.], dtype=float16)

array([[False, False],
       [False, False]])
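Because every element must share one type, NumPy picks a common type when given mixed values, and the `astype` method converts an existing array into a new one. A minimal sketch (the arrays `b` and `c` are illustrative):

```python
import numpy as np

# Mixed ints and floats are converted to a single common type
b = np.array([1, 2.5, 3])
print(b.dtype)  # float64

# astype returns a new array of the requested type;
# converting float to int truncates toward zero, so 2.5 becomes 2
c = b.astype(np.int32)
print(c)        # [1 2 3]
print(c.dtype)  # int32
```

The original array `b` is left unchanged; `astype` always produces a separate array.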

**Exercise:** Create a 3-dimensional array of shape (3, 4, 5), consisting of a 4x5 matrix of 0s, a 4x5 matrix of 1s and a 4x5 matrix of random integers between 0 and 1000. Try making the data type `np.int8` and `np.uint8`: what happens?

### Array indexing
Array indexing works similarly to lists, using square brackets, with the index for each dimension separated by a comma. Slicing with the `:` operator extracts sections of an array; as with lists, the end index of a slice is excluded.

In [47]:
x = np.identity(6)
x
x[2,2]
x[1:4, 2:5]

array([[1., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 1.]])

1.0

array([[0., 0., 0.],
       [1., 0., 0.],
       [0., 1., 0.]])
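Slices also accept negative indices and a step, and, unlike list slices, an array slice is a *view* that shares memory with the original array. A short sketch on a 1d array (`v`, `s` and `t` are illustrative names):

```python
import numpy as np

v = np.arange(10)
print(v[2:5])   # [2 3 4]: the end index is excluded
print(v[-3:])   # [7 8 9]: negative indices count from the end
print(v[::2])   # [0 2 4 6 8]: a step of 2 takes every second element
print(v[::-1])  # [9 8 7 6 5 4 3 2 1 0]: a negative step reverses

# Array slices are views, so changing the slice changes v
s = v[2:5]
s[0] = 100
print(v[2])     # 100

# Use .copy() when you need an independent array
t = v[5:8].copy()
t[0] = -1
print(v[5])     # 5: v is unchanged
```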

If trailing dimensions are omitted, all values along them are selected (equivalent to a bare `:`):

In [51]:
x[3]
x[3,:]
x[:,3]

array([0., 0., 0., 1., 0., 0.])

array([0., 0., 0., 1., 0., 0.])

array([0., 0., 0., 1., 0., 0.])
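`np.identity(6)` is symmetric, so `x[3]` (a row) and `x[:,3]` (a column) happen to print the same values above. A sketch with a non-symmetric array (the name `a` is illustrative) makes the difference visible:

```python
import numpy as np

# 3x4 array holding 0..11, so its rows and columns differ
a = np.arange(12).reshape(3, 4)
print(a)
print(a[1])     # row 1: [4 5 6 7]
print(a[1, :])  # the same row, written explicitly
print(a[:, 1])  # column 1: [1 5 9]
```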

Because `x[3]` returns a whole row, you can also chain square brackets to access one dimension at a time, e.g. `x[3][2]`. However, this is less efficient than using a single pair of brackets (`x[3, 2]`), because each extra pair of brackets creates an intermediate array.
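Chaining also behaves differently from a single pair of brackets once slices are involved. A short sketch (the array `a` is illustrative):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Chained brackets and a single pair reach the same element
print(a[1][2], a[1, 2])  # 6 6

# A common trap: a[:] is the whole array, so a[:][1] selects
# row 1, not column 1 as a[:, 1] does
print(a[:][1])   # [4 5 6 7]
print(a[:, 1])   # [1 5 9]
```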

### Vectorised Operations
Numpy arrays let you apply an operation to every element at once, with the looping done in compiled code rather than in Python. Numpy also provides optimized versions of many maths functions that work elementwise in the same way. Together these can greatly speed up numerical computations:

In [59]:
x = np.arange(0,10)
x
x + 1
x ** 2
np.sin(x)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ,
       -0.95892427, -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849])
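To see the speed-up, you can time a Python loop against the vectorised call. A rough sketch using the standard `timeit` module; `big`, `loop_sin` and `vector_sin` are hypothetical names chosen so the notebook's `x` is not overwritten, and the timings printed will depend on your machine:

```python
import math
import timeit

import numpy as np

big = np.arange(100_000)

# Pure-Python loop: applies math.sin to one element at a time
def loop_sin(values):
    return [math.sin(v) for v in values]

# Vectorised: a single call that loops over the array in compiled code
def vector_sin(values):
    return np.sin(values)

# Both approaches give the same numbers
assert np.allclose(loop_sin(big), vector_sin(big))

# Time 5 runs of each approach
print("loop:      ", timeit.timeit(lambda: loop_sin(big), number=5))
print("vectorised:", timeit.timeit(lambda: vector_sin(big), number=5))
```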

When an operation combines two arrays of the same shape, it is applied elementwise:

In [68]:
y = np.arange(9,-1,-1)
y
x + y
x * y

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

array([9, 9, 9, 9, 9, 9, 9, 9, 9, 9])

array([ 0,  8, 14, 18, 20, 20, 18, 14,  8,  0])
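Note that `*` is always elementwise, even for 2d arrays; the matrix product uses the `@` operator instead. A small sketch with two illustrative 2x2 arrays:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

print(a * b)  # elementwise product: 5, 12, 21, 32
print(a @ b)  # matrix product: 19, 22, 43, 50
```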

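Arrays of different shapes can only be combined if their shapes are compatible under Numpy's *broadcasting* rules. Numpy compares the shapes dimension by dimension, starting from the last (rightmost) one: two dimensions are compatible if they are equal or if one of them is 1, and an array with fewer dimensions is treated as having extra size-1 dimensions at the front. A size-1 dimension is stretched to match the other array, which is essentially what happens when you combine an array with a scalar.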

In [71]:
# Shapes (10,) and (2,) aren't compatible: 10 != 2 and neither is 1
z = np.array([0,1])
x * z

ValueError: operands could not be broadcast together with shapes (10,) (2,) 

In [76]:
# However shapes (10,) and (2,1) are: x is treated as (1,10), giving a (2,10) result
z = np.array([[0],[1]])
x * z

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
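The same rules let you build a whole table from a row and a column. A sketch with illustrative arrays `row` and `col`; `np.broadcast_shapes` (available from NumPy 1.20) applies the rules to shapes alone, which is handy for checking compatibility before doing any arithmetic:

```python
import numpy as np

row = np.arange(4)                # shape (4,)
col = np.arange(3).reshape(3, 1)  # shape (3, 1)

# Comparing from the right: 4 vs 1 -> 4, then (missing) vs 3 -> 3
table = col * 10 + row
print(table.shape)  # (3, 4)
print(table)        # entry [i, j] is 10*i + j

print(np.broadcast_shapes((3, 1), (4,)))  # (3, 4)
```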

**Exercises**: Using the array `x` created below:
* Compute $e^x$
* Compute $|x|$
* Add 1 to the 2nd and 4th rows
* Multiply the 2nd and 4th columns by 0

In [92]:
x = np.random.randn(4,5)
x

array([[ 0.00773649, -0.57822279, -0.15956535,  1.33015332, -1.07134819],
       [ 1.13572138, -1.31065936,  0.56672863, -0.63675607,  0.60453666],
       [-0.02026833, -0.54923799, -0.80499339,  0.55612773,  0.24955838],
       [ 0.60136567,  1.74728499, -0.90063259,  1.39707128, -1.73418691]])

## Pandas

Further details on `pandas` can be found in the [10 Minutes to pandas](http://pandas.pydata.org/pandas-docs/stable/10min.html) tutorial and the wider [documentation](http://pandas.pydata.org/pandas-docs/stable/).

## Matplotlib