# Tutorial 2: Introducing NumPy

> COM2004/COM3004


*Copyright &copy; 2022 University of Sheffield. All rights reserved*.

## What is NumPy?

NumPy is a core Python package for scientific computing. It offers:

* a powerful N-dimensional array object
* highly optimised linear algebra tools
* tight integration with C/C++ and Fortran code
* a BSD license, i.e., it is freely reusable

## Importing NumPy

By convention, NumPy is imported as follows:

In [1]:
import numpy as np

The Python keyword `as` lets us use `np` as shorthand for the `numpy` module.

## Generating a NumPy ndarray

`numpy.ndarray` is a type for representing N-Dimensional arrays.

Arrays can be generated in any of the following ways:

* from Python lists containing numeric data
* using NumPy array generating functions
* reading data from a file
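
The three routes listed above can be sketched side by side. This is only an illustration: the variable names are made up, and an in-memory `io.StringIO` object stands in for a real text file.

```python
import io

import numpy as np

# 1. From a Python list containing numeric data
from_list = np.array([1.5, 2.5, 3.5])

# 2. From a NumPy array generating function
from_function = np.zeros(3)

# 3. From a file (io.StringIO is a stand-in for a real text file)
fake_file = io.StringIO("1,2,3\n4,5,6")
from_file = np.genfromtxt(fake_file, delimiter=",")

# All three results are ndarrays; only their shapes differ
for arr in (from_list, from_function, from_file):
    print(type(arr).__name__, arr.shape)
```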

## Generating arrays from lists

In [2]:
my_list = [1, 2, 3, 4, 5]
my_array = np.array(my_list)  # create a simple 1-D array
print(my_list)
print(my_array)
print(type(my_list))
type(my_array)

[1, 2, 3, 4, 5]
[1 2 3 4 5]
<class 'list'>


numpy.ndarray

In [3]:
my_2d_array = np.array([[1., 2, 3], [4, 5, 6], [7, 8, 9]])
print(my_2d_array)
type(my_2d_array)

[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]


numpy.ndarray

## The ndarray object's properties

The `ndarray` has various properties that we can access:

In [4]:
my_2d_array.shape

(3, 3)

In [5]:
my_2d_array.size

9

In [6]:
my_2d_array.dtype

dtype('float64')
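
As a compact recap of these properties, the following sketch (with made-up array names) also shows `ndim` and how a single float in the input turns the whole array into `float64`.

```python
import numpy as np

ints = np.array([[1, 2, 3], [4, 5, 6]])
mixed = np.array([[1.0, 2, 3], [4, 5, 6]])  # one float makes every element a float

print(ints.ndim)    # 2, the number of dimensions
print(ints.shape)   # (2, 3), i.e. 2 rows and 3 columns
print(ints.size)    # 6, the total number of elements
print(mixed.dtype)  # float64
```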

## N-dimensional arrays

Note that NumPy generalises arrays to N dimensions.

In [15]:
x2 = np.array([[1, 2], [3, 4]])  # a matrix
x3 = np.array([x2, x2, x2, x2])  # stacking four matrices
x3.shape

(4, 2, 2)

In [8]:
x4 = np.array([x3, x3, x3, x3, x3])  # stacking 5 3-D structures
x4.shape

(5, 4, 2, 2)

In [9]:
x5 = np.array([x4, x4])  # stacking 2 4-D structures!
x5.shape

(2, 5, 4, 2, 2)

But in COM2004/3004 we will only be using N=1 (vectors) and N=2 (matrices).

## Array generating functions

In [10]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [11]:
np.arange(100, 110, 2)  # start, stop, step

array([100, 102, 104, 106, 108])

In [12]:
np.linspace(10, 20, 5)  # start, stop, n-points

array([10. , 12.5, 15. , 17.5, 20. ])
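
One difference between these two functions is easy to miss: `arange` excludes the stop value, whereas `linspace` includes it by default. A small sketch, using arbitrary values:

```python
import numpy as np

stepped = np.arange(0, 1, 0.25)  # start, stop, step: the stop value is excluded
spaced = np.linspace(0, 1, 5)    # start, stop, n-points: the stop value is included

print(stepped)  # 4 values: 0, 0.25, 0.5, 0.75
print(spaced)   # 5 values: 0, 0.25, 0.5, 0.75, 1
```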

In [13]:
np.zeros((3, 3, 3))  # Note, argument is a tuple

array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]]])

In [16]:
np.ones((2, 5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

## More array generating functions

In [17]:
np.diag((4, 5, 3))

array([[4, 0, 0],
       [0, 5, 0],
       [0, 0, 3]])

In [18]:
np.diag((2, 2), k=3)

array([[0, 0, 0, 2, 0],
       [0, 0, 0, 0, 2],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

In [19]:
np.diag((1, 1, 1)) + np.diag((2, 2), k=1) + np.diag((2, 2), k=-1)

array([[1, 2, 0],
       [2, 1, 2],
       [0, 2, 1]])

In [20]:
np.eye(10)

array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])

## Arrays initialised with random numbers

In [21]:
np.random.rand(2, 4)  # uniform distribution between 0 and 1

array([[0.61162342, 0.95058022, 0.67430102, 0.2864488 ],
       [0.38936561, 0.4023055 , 0.68442083, 0.4014147 ]])

In [22]:
np.random.randn(2, 4)  # standard normal distribution

array([[-0.60599879,  0.48680762,  0.25610573, -1.67762703],
       [ 0.11991801,  0.56506709, -1.17806031,  0.38213235]])

## Reading arrays from files

* `genfromtxt` and `savetxt` for reading and writing to text files.
* `load` and `save` for reading and writing in NumPy's native format.
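
A self-contained round trip through both formats might look like the sketch below. The file names `example.csv` and `example.npy` are hypothetical, and a temporary directory is used so that nothing is left on disk.

```python
import os
import tempfile

import numpy as np

data = np.array([[1.5, 2.0], [3.25, 4.0]])

with tempfile.TemporaryDirectory() as tmp_dir:
    # Text format: human-readable, one row per line
    txt_path = os.path.join(tmp_dir, "example.csv")
    np.savetxt(txt_path, data, delimiter=",", fmt="%.5f")
    from_text = np.genfromtxt(txt_path, delimiter=",")

    # NumPy's native binary format: preserves dtype and shape exactly
    npy_path = os.path.join(tmp_dir, "example.npy")
    np.save(npy_path, data)
    from_binary = np.load(npy_path)

print(np.array_equal(data, from_text))
print(np.array_equal(data, from_binary))
```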

In [23]:
%more data/liver_data_20.txt

FileNotFoundError: [Errno 2] No such file or directory: 'data/liver_data_20.txt'

In [24]:
x = np.genfromtxt("data/liver_data_20.txt", delimiter=",")  # for reading a csv file
#x = np.loadtxt("data/liver_data_20.txt", delimiter=",", dtype=float)
print(x)

FileNotFoundError: data/liver_data_20.txt not found.

In [None]:
np.savetxt("data/matrix.csv", x, delimiter=",", fmt="%.5f")  # write as comma-separated values

In [None]:
%more data/matrix.csv

## Array manipulation

Indexing is similar to Python lists.

In [25]:
x = np.array([1, 2, 3, 4, 5, 6, 7])
print(x[0])
print(x[2:5])
print(x[:4])
print(x[4:])

1
[3 4 5]
[1 2 3 4]
[5 6 7]


But it generalises to N dimensions.

In [26]:
x = np.random.rand(5, 5)
print(x[2:4, :2])

[[0.76127591 0.42831979]
 [0.70924863 0.0274972 ]]


## Extracting a row or column vector from a matrix

In [27]:
A = np.genfromtxt("data/test_matrix.txt")
print(A)

FileNotFoundError: data/test_matrix.txt not found.

In [28]:
print(A[2, :])  # extract row 2  (can also write as A[2])
print(A[2, :].shape)

NameError: name 'A' is not defined

In [None]:
print(A[:, 2])  # extract column 2
print(A[:, 2].shape)
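
The same row and column extraction can be tried without a data file. In this sketch a small matrix built with `arange` is a hypothetical stand-in for the test matrix.

```python
import numpy as np

A = np.arange(12).reshape(3, 4)  # stand-in matrix with 3 rows and 4 columns

row = A[2, :]       # row 2 as a 1-D vector: 8, 9, 10, 11
col = A[:, 2]       # column 2 as a 1-D vector: 2, 6, 10
col_2d = A[:, 2:3]  # slicing instead of indexing keeps two dimensions

print(row, row.shape)  # shape (4,)
print(col, col.shape)  # shape (3,)
print(col_2d.shape)    # (3, 1)
```

Note that indexing with a single integer drops that dimension, so both `row` and `col` come back as plain vectors.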

## Data processing

The NumPy ndarray object has many methods.

e.g., `min`, `max`, `sum`, `prod`, `mean`, `var`

In [29]:
x = np.array([1, 2, 3, 4, 5, 6])

In [30]:
x.min(), x.max()

(1, 6)

In [31]:
x.sum(), x.prod()

(21, 720)

In [32]:
x.mean(), x.var()

(3.5, 2.9166666666666665)

## Data processing on matrices

In [33]:
A = np.genfromtxt("data/test_matrix.txt")
print(A)

FileNotFoundError: data/test_matrix.txt not found.

In [None]:
A.min(), A.max()

In [None]:
A.mean(axis=0).shape  #, A.sum(), A.shape
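
The `axis` argument decides which dimension is collapsed. A minimal sketch with a small hand-written matrix (not the test matrix from the file):

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])

overall = A.mean()           # mean of all 6 elements: 3.5
per_column = A.mean(axis=0)  # collapse the rows: 2.5, 3.5, 4.5
per_row = A.mean(axis=1)     # collapse the columns: 2.0, 5.0

print(overall)
print(per_column, per_column.shape)  # shape (3,)
print(per_row, per_row.shape)        # shape (2,)
```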

## Reshaping and resizing

It's sometimes necessary to wrap a vector into a matrix or unwrap a matrix into a vector.

In [34]:
M = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]).reshape(3, 3)
print(M)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [35]:
v = M.reshape(9)
print(v)

[1 2 3 4 5 6 7 8 9]


In [None]:
# The following line will generate an error
# because reshape cannot change the number of elements

#  v = M.reshape(8)
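
A related convenience is that one dimension may be given as `-1`, in which case NumPy infers it from the number of elements. The sketch below also catches the error mentioned above rather than letting it stop the program.

```python
import numpy as np

M = np.arange(1, 10).reshape(3, 3)

flat = M.reshape(-1)                # -1 means "work this dimension out for me"
wide = np.arange(6).reshape(2, -1)  # 6 elements in 2 rows, so 3 columns

try:
    M.reshape(8)  # 9 elements cannot fit into 8 slots
    error_message = None
except ValueError as err:
    error_message = str(err)

print(flat.shape, wide.shape)  # (9,) (2, 3)
print(error_message)
```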

## Adding a new dimension

A 1-D vector can easily be turned into a 2-D matrix.

In [36]:
v = np.array([1, 2, 3, 4, 5])
print(v)
print(v.shape)

[1 2 3 4 5]
(5,)


In [37]:
v_row = v[np.newaxis, :]  # turn a vector into a 1-row matrix
print(v_row)
print(v_row.shape)

[[1 2 3 4 5]]
(1, 5)


In [39]:
v_col = v[:, np.newaxis]  # turn a vector into a 1-column matrix
print(v_col)
print(v_col.shape)

[[1]
 [2]
 [3]
 [4]
 [5]]
(5, 1)


## Stacking arrays

Arrays with compatible dimensions can be joined horizontally or vertically.

In [40]:
x = np.ones((2, 3))
y = np.zeros((2, 2))
z = np.hstack((x, y, x))  # note, arrays passed as a tuple
print(x)
print(y)
print(z)
print(z.shape)

[[1. 1. 1.]
 [1. 1. 1.]]
[[0. 0.]
 [0. 0.]]
[[1. 1. 1. 0. 0. 1. 1. 1.]
 [1. 1. 1. 0. 0. 1. 1. 1.]]
(2, 8)


In [41]:
x = np.ones((2, 2))
y = np.zeros((1, 2))
z = np.vstack((x, y))
print(z)
print(z.shape)

[[1. 1.]
 [1. 1.]
 [0. 0.]]
(3, 2)
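
"Compatible" here means that every dimension other than the one being joined must match. The sketch below illustrates this and shows that, for 2-D arrays, `hstack` gives the same result as `np.concatenate` along axis 1.

```python
import numpy as np

x = np.ones((2, 3))
y = np.zeros((2, 2))

# Both arrays have 2 rows, so they can be joined side by side
joined = np.hstack((x, y))
same_thing = np.concatenate((x, y), axis=1)

# They have different numbers of columns, so vertical stacking fails
try:
    np.vstack((x, y))
    vstack_ok = True
except ValueError:
    vstack_ok = False

print(joined.shape)                       # (2, 5)
print(np.array_equal(joined, same_thing)) # True
print(vstack_ok)                          # False
```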


## Tiling and repeating

In [42]:
x = np.array([[1, 2], [3, 4]])
np.tile(x, 3)

array([[1, 2, 1, 2, 1, 2],
       [3, 4, 3, 4, 3, 4]])

In [43]:
np.tile(x, (2, 4))

array([[1, 2, 1, 2, 1, 2, 1, 2],
       [3, 4, 3, 4, 3, 4, 3, 4],
       [1, 2, 1, 2, 1, 2, 1, 2],
       [3, 4, 3, 4, 3, 4, 3, 4]])

In [44]:
np.repeat(x, 4)

array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4])

In [45]:
np.repeat(x, 4, axis=0)

array([[1, 2],
       [1, 2],
       [1, 2],
       [1, 2],
       [3, 4],
       [3, 4],
       [3, 4],
       [3, 4]])

## Copying

Arrays are handled by reference.

When you do `A = B` you are just copying a reference, not the data itself.

In [47]:
A = np.array([1, 2, 3, 4, 5, 6])
B = A
B[0] = 10
print(A)

[10  2  3  4  5  6]


Note that this is also true for Python lists and other mutable objects.

In [48]:
A = [1, 2, 3, 4]
B = A
B[0] = 10
print(A)

[10, 2, 3, 4]


So how do we make a real copy?

## Deep Copy

To actually copy the data stored in the array, we use the NumPy `copy` method:

In [49]:
A = np.array([1, 2, 3, 4])
B = A.copy()  # can also write, B = np.copy(A)
B[0] = 10
print(A)

[1 2 3 4]
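
One way to check whether two names refer to the same data is sketched below, using the `is` operator and `np.shares_memory`. The names `alias` and `duplicate` are chosen only for this illustration.

```python
import numpy as np

A = np.array([1, 2, 3, 4])
alias = A             # a second name for the same array
duplicate = A.copy()  # a new array with its own data

print(alias is A)                      # True
print(duplicate is A)                  # False
print(np.shares_memory(A, alias))      # True
print(np.shares_memory(A, duplicate))  # False
```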


Note, to make a deep copy of a Python *list*, we can use the standard `copy` module:



In [50]:
import copy

A = [1, 2, 3, 4]
B = copy.deepcopy(A)

(Don't confuse NumPy ndarrays and Python lists...)
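
The distinction between a shallow and a deep copy only matters when a list contains other mutable objects. A small sketch with a nested list, using hypothetical variable names:

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = copy.copy(nested)   # new outer list, but the inner lists are shared
deep = copy.deepcopy(nested)  # new outer list and new inner lists

shallow[0][0] = 10  # also changes nested, because the inner list is shared
deep[1][1] = 40     # leaves nested untouched

print(nested)  # [[10, 2], [3, 4]]
print(deep)    # [[1, 2], [3, 40]]
```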

## Matrix operations

NumPy implements all the common matrix operations:

* addition and subtraction
* transpose
* multiplication
* inverse

## Array addition and subtraction

In [51]:
X = np.array([[1, 2, 3], [4, 5, 6]])
Y = np.ones((2, 3))
print(X)
print(Y)

[[1 2 3]
 [4 5 6]]
[[1. 1. 1.]
 [1. 1. 1.]]


In [52]:
print(X + Y)

[[2. 3. 4.]
 [5. 6. 7.]]


In [53]:
X - 2 * Y  # note, scalar multiplication

array([[-1.,  0.,  1.],
       [ 2.,  3.,  4.]])

In [55]:
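# Uncommenting the following line raises the error shown below,
# because the shapes (2, 3) and (2, 2) are incompatible
# X + np.array([[2, 2], [2, 2]])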

ValueError: operands could not be broadcast together with shapes (2,3) (2,2) 

## Broadcasting

During an operation, NumPy will try to repeat an array so that the dimensions fit. This is called 'broadcasting'. It's convenient but it can be confusing.

In [56]:
X = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([1, 1, 1])
print(X)
print(row)

[[1 2 3]
 [4 5 6]]
[1 1 1]


In [57]:
print(X + row)

[[2 3 4]
 [5 6 7]]


In [58]:
col = np.array([1, 2])
print(col)

[1 2]


In [59]:
print(X + col[:, np.newaxis])

[[2 3 4]
 [6 7 8]]
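
To see why the `np.newaxis` step is needed, the sketch below first tries to add the 1-D `col` directly, which fails, and then checks that the broadcast result matches an explicit repetition built with `np.tile`.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
col = np.array([1, 2])                # shape (2,)

# Shapes are compared from the last dimension backwards: 3 versus 2 does not fit
try:
    X + col
    direct_ok = True
except ValueError:
    direct_ok = False

col_2d = col[:, np.newaxis]             # shape (2, 1): the size-1 dimension can be repeated
broadcast = X + col_2d
explicit = X + np.tile(col_2d, (1, 3))  # the same repetition written out by hand

print(direct_ok)                           # False
print(np.array_equal(broadcast, explicit)) # True
```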


## Transpose

Transposing a matrix swaps the rows and columns.

In [60]:
A = np.array([[1, 2, 3], [4, 5, 6]])
print(A)

[[1 2 3]
 [4 5 6]]


In [61]:
print(A.T)

[[1 4]
 [2 5]
 [3 6]]


In [62]:
A.shape, A.T.shape

((2, 3), (3, 2))

In [63]:
v = np.array([1, 2, 3, 4, 5])
print(v)

[1 2 3 4 5]


In [64]:
print(v.T)  # Vectors only have one dimension. Transpose does nothing.

[1 2 3 4 5]


In [65]:
v.shape, v.T.shape

((5,), (5,))

## Vectors versus 'skinny' matrices

When using NumPy, a vector is **not** the same as a matrix with one column.

In [66]:
v = np.array([1, 2, 3, 4, 5])  # A vector - has 1 dimension
print(v)
print(v.T)
print(v.shape)

[1 2 3 4 5]
[1 2 3 4 5]
(5,)


In [67]:
M_row = np.array([[1, 2, 3, 4, 5]])  # A matrix - has 2 dimensions
print(M_row)
print(M_row.shape)

[[1 2 3 4 5]]
(1, 5)


In [68]:
M_col = np.array([[1, 2, 3, 4, 5]]).T  # A matrix can be transposed
print(M_col)
print(M_col.shape)

[[1]
 [2]
 [3]
 [4]
 [5]]
(5, 1)
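
The distinction has practical consequences. In the sketch below, adding a vector to itself stays elementwise, whereas adding a 1-row matrix to a 1-column matrix broadcasts to a full 5x5 table.

```python
import numpy as np

v = np.array([1, 2, 3, 4, 5])  # vector, shape (5,)
M_row = v[np.newaxis, :]       # 1-row matrix, shape (1, 5)
M_col = M_row.T                # 1-column matrix, shape (5, 1)

vector_sum = v + v.T          # transpose does nothing, so this is just 2 * v
matrix_sum = M_row + M_col    # broadcasts: entry [i, j] is v[i] + v[j]

print(vector_sum.shape)  # (5,)
print(matrix_sum.shape)  # (5, 5)
```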


## Multiplication

The `*` operator performs 'elementwise' multiplication.

In [69]:
A = np.array([[1, 2, 3], [4, 5, 6]])
A * A

array([[ 1,  4,  9],
       [16, 25, 36]])

Standard 'matrix multiplication' is performed using the `dot` function.

In [70]:
np.dot(A, A.T)  # Multiply 2x3 matrix A and 3x2 matrix A.T (AA')

array([[14, 32],
       [32, 77]])

In [71]:
v = np.array([1, 2, 3])
np.dot(A, v)  # Multiply 2x3 matrix A and 3-element vector (Av)

array([14, 32])
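
Python also provides the `@` operator, which for the 1-D and 2-D arrays used here gives the same result as `np.dot`. The sketch below repeats the two products above and shows that mismatched inner dimensions raise an error.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
v = np.array([1, 2, 3])

AAt = A @ A.T  # same as np.dot(A, A.T)
Av = A @ v     # same as np.dot(A, v)

# A is 2x3, so A times A is not defined (3 columns versus 2 rows)
try:
    A @ A
    product_ok = True
except ValueError:
    product_ok = False

print(AAt)         # rows: 14, 32 and 32, 77
print(Av)          # 14, 32
print(product_ok)  # False
```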

## Matrix inverse

The matrix determinant and inverse functions are provided by NumPy's `linalg` submodule.

In [72]:
A = np.array([[2, 1], [3, 2]])
print(A)

[[2 1]
 [3 2]]


In [73]:
np.linalg.det(A)

0.9999999999999998

In [74]:
np.linalg.inv(A)

array([[ 2., -1.],
       [-3.,  2.]])
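
Because these routines work in floating point, results such as the determinant above can be very slightly off from the exact value. A reasonable way to check them, sketched here, is to compare with a tolerance using `np.allclose` and `np.isclose` rather than testing for exact equality.

```python
import numpy as np

A = np.array([[2, 1], [3, 2]])
A_inv = np.linalg.inv(A)

# A multiplied by its inverse should be the identity, up to rounding error
is_identity = np.allclose(np.dot(A, A_inv), np.eye(2))

# The exact determinant is 2*2 - 1*3 = 1
det_is_one = np.isclose(np.linalg.det(A), 1.0)

print(is_identity)  # True
print(det_is_one)   # True
```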

## Summary

* NumPy provides tools for numeric computing.
* With NumPy, Python becomes a usable alternative to MATLAB.
* NumPy's basic type is the `ndarray` -- it can represent vectors, matrices, etc.
* Lots of tools for vector and matrix manipulation.
    - This lecture has only reviewed the most commonly used ones.
* For the full documentation see [http://docs.scipy.org/doc/numpy/](http://docs.scipy.org/doc/numpy/)