# Numerical Python with numpy

NumPy ('Numerical Python') is the de facto standard module for doing numerical work in Python. Its main feature is its array data type, which allows very compact and efficient storage of homogeneous data (data that is all of the same type).


A lot of the material in this section is based on [SciPy Lecture Notes](http://www.scipy-lectures.org/intro/numpy/array_object.html) ([CC-by 4.0](http://www.scipy-lectures.org/preface.html#license)).

As you go through this material, you'll likely find it useful to refer to the [NumPy documentation](https://docs.scipy.org/doc/numpy/), particularly the [array objects](https://docs.scipy.org/doc/numpy/reference/arrays.html) section.

There is a standard convention for importing `numpy`, and that is as `np`:

In [2]:
import numpy as np

Now that we have access to the `numpy` package we can start using its features.

## Creating arrays

In many ways a NumPy array can be treated like a standard Python `list` and much of the way you interact with it is identical. Given a list, you can create an array as follows:

In [3]:
python_list = [1, 2, 3, 4, 5, 6, 7, 8]
numpy_array = np.array(python_list)
print(numpy_array)

[1 2 3 4 5 6 7 8]


In [4]:
numpy_array

array([1, 2, 3, 4, 5, 6, 7, 8])

In [5]:
type(numpy_array)

numpy.ndarray

In [4]:
# The NumPy array object has an attribute called dtype that holds the data type of the array:
    
arr = np.array([1, 2, 3, 4])

print(arr.dtype) 

int32


In [5]:
arr = np.array(['apple', 'banana', 'cherry'])

print(arr.dtype) 

# U - unicode string

<U6
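The data type can also be chosen explicitly rather than inferred from the input. The following is a minimal sketch, assuming the `dtype` argument of `np.array` and the `astype()` method; the variable names `floats` and `ints` are illustrative only.

```python
import numpy as np

# Request a specific dtype at creation time instead of letting NumPy infer it
floats = np.array([1, 2, 3, 4], dtype=np.float64)
print(floats.dtype)   # float64

# astype() returns a new array converted to the requested type
ints = floats.astype(np.int64)
print(ints.dtype)     # int64
```

Fixing the dtype explicitly avoids platform-dependent defaults such as the `int32` shown above.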


#### NumPy Arrays Are Faster Than Lists

NumPy arrays are stored in one contiguous block of memory, unlike lists, whose elements are separate Python objects, so processes can access and manipulate array data very efficiently.
This property is related to what computer science calls locality of reference.
It is one of the main reasons why NumPy is faster than lists. NumPy's operations are also implemented in compiled code that is optimized to work with modern CPU architectures.
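The contiguous layout described above can be inspected directly. This is a small sketch, assuming the `itemsize`, `nbytes` and `flags` attributes of an array; the `int64` dtype is chosen here only so that the element size is the same on every platform.

```python
import numpy as np

arr = np.arange(1000, dtype=np.int64)

# Every element occupies the same number of bytes...
print(arr.itemsize)                # 8 (bytes per int64 element)

# ...so the data buffer is exactly size * itemsize bytes long
print(arr.nbytes)                  # 8000

# A freshly created array is laid out contiguously in memory
print(arr.flags['C_CONTIGUOUS'])   # True
```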

## Dimensions in arrays

### 1-D array

In [6]:
# ndim gives the number of dimensions
numpy_array.ndim

1

In [7]:
# the shape of an array is a tuple of its length in each dimension. In this case it is only 1-dimensional
numpy_array.shape

(8,)

In [8]:
# as in standard Python, len() gives a sensible answer
len(numpy_array)

8

### 2-D array

An array that has 1-D arrays as its elements is called a 2-D array.
These are often used to represent matrices or 2nd-order tensors.

In [9]:
nested_list = [[1, 2, 3], [4, 5, 6]]
two_dim_array = np.array(nested_list)
print(two_dim_array)

[[1 2 3]
 [4 5 6]]


In [11]:
nested_list

[[1, 2, 3], [4, 5, 6]]

In [12]:
two_dim_array.ndim

2

In [13]:
two_dim_array.shape

(2, 3)

## Accessing array elements

You can access an array element by referring to its index number.

The indexes in NumPy arrays start at 0, meaning that the first element has index 0, the second has index 1, and so on.

In [14]:
arr = np.array([1, 2, 3, 4])

print(arr[0]) 

1


In [15]:
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])

print('2nd element on 1st row: ', arr[0, 1]) 

2nd element on 1st row:  2


In [16]:
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])

print('5th element on 2nd row: ', arr[1, 4]) 

5th element on 2nd row:  10


In [17]:
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])

print('Last element of 2nd row: ', arr[1, -1])

Last element of 2nd row:  10


In [16]:
arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[1:5]) 

[2 3 4 5]


In [17]:
arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[4:]) 

[5 6 7]


In [18]:
arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[1:5:2]) 

[2 4]


In [19]:
arr[0]=10

arr

array([10,  2,  3,  4,  5,  6,  7])
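Slicing also works per dimension on 2-D arrays, with one slice for each axis separated by a comma. The sketch below reuses the 2-D array from the indexing cells above; the names `first_column` and `sub_block` are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

# A bare colon selects everything along that axis
first_column = arr[:, 0]     # every row, first column
print(first_column)          # [1 6]

# Row slice and column slice combined
sub_block = arr[0:2, 1:4]    # both rows, columns 1 to 3
print(sub_block)
```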

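## Copying arrays

Assigning an array to a new name does not copy its data: both names refer to the same array, so a change made through one is visible through the other. The `copy()` method creates an independent array instead.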

In [20]:
new_arr = arr

In [21]:
arr[0]=100
arr

array([100,   2,   3,   4,   5,   6,   7])

In [22]:
new_arr

array([100,   2,   3,   4,   5,   6,   7])

In [23]:
arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()
arr[0] = 42

print(arr)
print(x) 

[42  2  3  4  5]
[1 2 3 4 5]
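Slices behave like plain assignment in this respect: a basic slice is a view onto the same data rather than a copy. The following sketch illustrates this, assuming `np.shares_memory` to check whether two arrays overlap; the names `view` and `independent` are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

view = arr[1:4]          # basic slicing returns a view, not a copy
view[0] = 99             # writing through the view changes arr as well
print(arr)               # [ 1 99  3  4  5]
print(np.shares_memory(arr, view))          # True

independent = arr[1:4].copy()               # an explicit copy owns its data
independent[0] = -1                         # does not touch arr
print(np.shares_memory(arr, independent))   # False
```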


## Creating arrays from scratch

It's very common when working with data to not have it already in a Python list but rather to want to create some data from scratch. `numpy` comes with a whole suite of functions for creating arrays. We will now run through some of the most commonly used.

The first is `np.arange` (meaning "array range") which works in a very similar fashion to the standard Python `range()` function, including how it defaults to starting from zero, doesn't include the number at the top of the range and how it allows you to specify a 'step':

In [20]:
np.arange(20) #0 .. n-1  (!)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

In [21]:
np.arange(1, 9, 2) # start, end (exclusive), step

array([1, 3, 5, 7])

Next up is `np.linspace` (meaning "linear space") which generates a given number of evenly spaced floating point numbers starting from the first argument up to the second argument. The third argument defines how many numbers to create:

In [22]:
np.linspace(0, 1, 6)   # start, end, num-points

array([0. , 0.2, 0.4, 0.6, 0.8, 1. ])

Note how it included the end point unlike `arange()`. You can change this feature by using the `endpoint` argument:

In [27]:
np.linspace(0, 1, 5, endpoint=False)

array([0. , 0.2, 0.4, 0.6, 0.8])
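To see the spacing that `np.linspace` actually used, you can ask for it alongside the points. This is a minimal sketch, assuming the `retstep` argument of `np.linspace`; the variable names are illustrative.

```python
import numpy as np

# retstep=True returns the points together with the spacing between them
points, step = np.linspace(0, 1, 6, retstep=True)
print(len(points), step)

# Without the end point, 5 points give the same spacing as 6 points with it
points_open, step_open = np.linspace(0, 1, 5, endpoint=False, retstep=True)
print(len(points_open), step_open)
```

Both calls use a spacing of about `0.2`, which is why the two outputs above differ only in the final `1.`.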

`np.ones` creates an n-dimensional array filled with the value `1.0`. The argument you give to the function defines the shape of the array:

In [23]:
np.ones((3, 3))  # reminder: (3, 3) is a tuple

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

Likewise, you can create an array of any size filled with zeros:

In [29]:
np.zeros((2, 2))

array([[0., 0.],
       [0., 0.]])
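Both functions produce floating point values by default. The sketch below assumes the optional `dtype` argument to change that, and `np.full` for filling with a value other than zero or one; the names `int_zeros` and `sevens` are illustrative.

```python
import numpy as np

# Ask for integer zeros instead of the default floats
int_zeros = np.zeros((2, 2), dtype=int)
print(int_zeros)

# np.full takes a shape and the value to fill it with
sevens = np.full((2, 3), 7)
print(sevens)
```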

`np.eye` (referring to the mathematical identity matrix, commonly labelled as `I`) creates a square matrix of a given size with `1.0` on the diagonal and `0.0` elsewhere:

In [24]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

`np.diag` creates a square matrix with the given values on the diagonal and zeros elsewhere:

In [25]:
np.diag([1, 2, 3, 4])

array([[1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])

Finally, you can fill an array with random numbers:

In [32]:
np.random.rand(4)  # uniform in [0, 1), i.e. 1 itself is never returned

array([0.59139593, 0.79439402, 0.26235044, 0.50934246])

In [33]:
np.random.randn(4)  # Gaussian

array([-0.516018  , -0.1053697 ,  0.25207324, -0.3242579 ])
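Random arrays differ on every run, which can make results hard to reproduce. One way around this, sketched here on the assumption that your NumPy version provides `np.random.default_rng` (1.17 or later), is to create a generator with a fixed seed. The seed `42` is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)        # 42 is an arbitrary seed
uniform = rng.random(4)                # uniform in [0, 1)
gaussian = rng.standard_normal(4)      # standard normal (Gaussian)

# A second generator with the same seed reproduces the same numbers
rng_again = np.random.default_rng(42)
same_uniform = rng_again.random(4)
print(np.array_equal(uniform, same_uniform))  # True
```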

We are going to see more in the next lesson.

## Reshaping arrays

Behind the scenes, a multi-dimensional NumPy `array` is just stored as a linear segment of memory. The fact that it is presented as having more than one dimension is simply a layer on top of that (sometimes called a *view*). This means that we can simply change that interpretive layer and change the shape of an array very quickly (i.e. without NumPy having to copy any data around).

This is mostly done with the `reshape()` method on the array object:

In [34]:
my_array = np.arange(16)
my_array

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [35]:
my_array.shape

(16,)

In [36]:
my_array.reshape((2, 8))

array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15]])

In [37]:
my_array.reshape((4, 4))

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

Note that if you check, `my_array.shape` will still return `(16,)`: `reshape()` returns a *view* on the original data and does not *change* the shape of `my_array` itself. If you want to edit the original object in-place then you can use the `resize()` method.
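The view behaviour can be checked directly. The following is a small sketch, assuming `np.shares_memory` and the convention that a dimension of `-1` asks `reshape()` to infer that length; the names `reshaped` and `inferred` are illustrative.

```python
import numpy as np

my_array = np.arange(16)
reshaped = my_array.reshape((4, 4))

# The reshaped array shares its data with the original
print(np.shares_memory(my_array, reshaped))  # True

# Writing through the view is visible in the original
reshaped[0, 0] = 100
print(my_array[0])       # 100
print(my_array.shape)    # (16,) -- the original shape is untouched

# -1 lets NumPy work out the remaining dimension
inferred = my_array.reshape((2, -1))
print(inferred.shape)    # (2, 8)
```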

You can also transpose an array using the `transpose()` method which mirrors the array along its diagonal:

In [38]:
my_array.reshape((4,4)).transpose()

array([[ 0,  4,  8, 12],
       [ 1,  5,  9, 13],
       [ 2,  6, 10, 14],
       [ 3,  7, 11, 15]])

In [39]:
my_array.reshape((2, 8))

array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15]])

In [40]:
my_array.reshape((2, 8)).transpose()

array([[ 0,  8],
       [ 1,  9],
       [ 2, 10],
       [ 3, 11],
       [ 4, 12],
       [ 5, 13],
       [ 6, 14],
       [ 7, 15]])

## Joining NumPy arrays

In [26]:
arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.concatenate((arr1, arr2))

print(arr) 

[1 2 3 4 5 6]


In [42]:
arr1 = np.array([[1, 2], [3, 4]])

arr2 = np.array([[5, 6], [7, 8]])

arr = np.concatenate((arr1, arr2), axis=1)

print(arr) 

[[1 2 5 6]
 [3 4 7 8]]


In [43]:
arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.stack((arr1, arr2), axis=1)

print(arr) 

[[1 4]
 [2 5]
 [3 6]]
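The difference between the two functions is easiest to see in the resulting shapes: `np.concatenate` joins along an axis that already exists, while `np.stack` creates a new one. This sketch also assumes `np.vstack`, which for 1-D inputs gives the same result as stacking along axis 0.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

joined = np.concatenate((arr1, arr2))          # stays 1-D
print(joined.shape)                            # (6,)

stacked_rows = np.stack((arr1, arr2), axis=0)  # new axis in front
print(stacked_rows.shape)                      # (2, 3)

stacked_cols = np.stack((arr1, arr2), axis=1)  # new axis at the end
print(stacked_cols.shape)                      # (3, 2)

# For 1-D inputs, vstack matches stacking along axis 0
print(np.array_equal(np.vstack((arr1, arr2)), stacked_rows))  # True
```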


## Searching arrays

In [44]:
arr = np.array([1, 2, 3, 4, 5, 4, 4])

x = np.where(arr == 4)

print(x) 

(array([3, 5, 6], dtype=int64),)
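`np.where` with a single argument returns a tuple containing one index array per dimension, which is why the output above is wrapped in parentheses. The sketch below unpacks that tuple and also shows the three-argument form, which picks values element by element; the names `indices` and `replaced` are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 4, 4])

# Take the first (and only) entry of the tuple for a 1-D array
indices = np.where(arr == 4)[0]
print(indices)    # [3 5 6]

# where(condition, x, y): x where the condition holds, y elsewhere
replaced = np.where(arr == 4, 0, arr)
print(replaced)   # [1 2 3 0 5 0 0]
```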


## Sorting arrays

In [45]:
arr = np.array([3, 2, 0, 1])

print(np.sort(arr)) 

[0 1 2 3]
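`np.sort` returns a sorted copy and leaves the original untouched. As a sketch of two related tools, the example below assumes `np.argsort`, which returns the indices that would sort the array, and the array's own `sort()` method, which sorts in place.

```python
import numpy as np

arr = np.array([3, 2, 0, 1])

sorted_copy = np.sort(arr)
print(arr)           # [3 2 0 1] -- the original is unchanged

order = np.argsort(arr)
print(order)         # [2 3 1 0]
print(arr[order])    # [0 1 2 3]

arr.sort()           # in-place: modifies arr and returns None
print(arr)           # [0 1 2 3]
```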


## NumPy operations

In [27]:
my_list = [3, 6, 8, 4, 10]

If you wanted to double every entry you might try simply multiplying the list by `2`:

In [29]:
my_list * 2

[3, 6, 8, 4, 10, 3, 6, 8, 4, 10]

but as you can see, that simply duplicated the elements. Instead you would have to use a `for` loop or a list comprehension:

In [48]:
[i * 2 for i in my_list]

[6, 12, 16, 8, 20]

With a NumPy array, however, you can perform bulk mathematical operations on the whole array in one go:

In [30]:
my_nparray = np.array(my_list)
print(my_nparray)

[ 3  6  8  4 10]


In [31]:
my_nparray * 2

array([ 6, 12, 16,  8, 20])
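The same element-by-element behaviour applies to the other arithmetic operators. A short sketch, reusing the values from `my_list` above; the name `halves` is illustrative.

```python
import numpy as np

my_nparray = np.array([3, 6, 8, 4, 10])

print(my_nparray + 1)    # [ 4  7  9  5 11]
print(my_nparray ** 2)   # [  9  36  64  16 100]

# True division always produces a floating point array
halves = my_nparray / 2
print(halves.dtype)      # float64
```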

## Multi-array operations

It is also possible to perform operations between two NumPy array objects:

In [35]:
s2 = np.array([23,5,34,7,5])
s3 = np.array([7, 6, 5,4,3])

In [36]:
s2 - s3

array([16, -1, 29,  3,  2])

In [37]:
s3*s3

array([49, 36, 25, 16,  9])
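Note that `*` between two arrays multiplies matching elements; it is not a vector or matrix product. As a sketch of the distinction, the example below assumes the `@` operator (Python 3.5 or later), which computes the dot product of two 1-D arrays.

```python
import numpy as np

s2 = np.array([23, 5, 34, 7, 5])
s3 = np.array([7, 6, 5, 4, 3])

# Element-wise product: one result per pair of elements
elementwise = s2 * s3
print(elementwise)   # [161  30 170  28  15]

# Dot product: the sum of the element-wise products
dot = s2 @ s3
print(dot)           # 404
```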

## Selections

In [32]:
s = np.array([-1, -3, 0, -5, 2, 3])

In [34]:
s < 0

array([ True,  True, False,  True, False, False])

As well as bulk modifications, you can perform bulk selections by putting more complex statements in the square brackets:

In [33]:
s[s < 0]  # All negative entries

array([-1, -3, -5])

In [53]:
s[(s * 2) > 4]  # All entries which, when doubled, are greater than 4

array([3])

These operations work because NumPy indexing accepts an array of `True` and `False` values (a boolean mask), which it then uses to filter the result:

In [54]:
(s * 2) > 4

array([False, False, False, False, False,  True])
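Boolean masks can also be combined. The sketch below uses the element-wise operators `&` (and) and `|` (or); each condition needs its own parentheses because these operators bind more tightly than comparisons. The name `mask` is illustrative.

```python
import numpy as np

s = np.array([-1, -3, 0, -5, 2, 3])

# Entries greater than -4 AND less than 1
mask = (s > -4) & (s < 1)
print(s[mask])       # [-1 -3  0]

# True counts as 1, so summing a mask counts the matches
print(mask.sum())    # 3

# Entries less than -2 OR greater than 2
print(s[(s < -2) | (s > 2)])   # [-3 -5  3]
```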