# NumPy basics

This notebook introduces NumPy, a powerful Python library
for numerical computation.

### What does NumPy contain?

NumPy provides tools for numerical computation in Python. It makes such computations
faster and more convenient than working with plain Python lists.

Basically, NumPy provides:

- n-dimensional arrays (_ndarrays_)
- functions to manipulate these arrays

By convention, we import NumPy using the _np_ alias.

In [1]:
import numpy as np

In [2]:
some_array = np.array([0,1,2,3,4,5,6])
print(some_array)

[0 1 2 3 4 5 6]


### ndarray

NumPy's multidimensional array class is _ndarray_ (also known by the alias _array_).
It is an array that contains elements of a single type and can have many dimensions.

In [3]:
# example array - np.array() function creates array from a sequence
one_dim_array = np.array([1,2,3,4])
print(one_dim_array)

[1 2 3 4]


In [4]:
# two-dimensional array - a list of lists
two_dim_array = np.array([[11, 12, 13],
                         [21, 22, 23],
                         [31, 32, 33]])
print(two_dim_array)
print('\n')
# more dimensions work analogously
three_dim_array = np.array([
                             [[111, 112, 113],
                             [121, 122, 123],
                             [131, 132, 133]],
                            [[211, 212, 213],
                             [221, 222, 223],
                             [231, 232, 233]]
                           ])
print(three_dim_array)

[[11 12 13]
 [21 22 23]
 [31 32 33]]


[[[111 112 113]
  [121 122 123]
  [131 132 133]]

 [[211 212 213]
  [221 222 223]
  [231 232 233]]]


We have three array objects above - _one_dim_array_ has one dimension, _two_dim_array_ has two dimensions and _three_dim_array_ has three.

An n-dimensional ndarray has n _axes_ (0, 1, ..., n-1); each axis is a _direction_ along one dimension.

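Important: when an array is printed, the _last_ axis runs from left to right and the second-to-last axis runs from top to bottom. Any remaining axes are also printed from top to bottom, with blank lines separating the blocks.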

For example, _three_dim_array_ can be thought of as a stack of two matrices. Axis 0 runs over the matrices, axis 1 runs over the rows of each matrix, and axis 2 (the last axis) runs over the columns of each matrix.

We demonstrate this by calculating sums of elements over each axis.

In [5]:
# sum over axis 2 - each element is the sum of one row of one matrix
# (because axis 2 runs over the column indices)
np.sum(three_dim_array, axis=2)

array([[336, 366, 396],
       [636, 666, 696]])

In [6]:
np.sum(three_dim_array, axis=1)

array([[363, 366, 369],
       [663, 666, 669]])

In [7]:
np.sum(three_dim_array, axis=0)

array([[322, 324, 326],
       [342, 344, 346],
       [362, 364, 366]])
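
A quick way to check which axis a reduction runs over is to look at the shape of the result: the axis that was summed over disappears. The snippet below is a minimal sketch; the array `arr` is a stand-in with the same shape as _three_dim_array_, not part of the original notebook.

```python
import numpy as np

# Stand-in array with the same layout as three_dim_array:
# 2 matrices, each with 3 rows and 3 columns.
arr = np.arange(18).reshape(2, 3, 3)

# Summing over an axis removes that axis from the shape.
shape_axis0 = np.sum(arr, axis=0).shape  # matrices collapsed
shape_axis1 = np.sum(arr, axis=1).shape  # rows collapsed
shape_axis2 = np.sum(arr, axis=2).shape  # columns collapsed

print(shape_axis0, shape_axis1, shape_axis2)  # (3, 3) (2, 3) (2, 3)
```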

We can inspect an array's properties.

In [8]:
# number of dimensions
three_dim_array.ndim

3

In [9]:
# number of elements in each dimension (over each axis)
three_dim_array.shape

(2, 3, 3)

In [10]:
# number of bytes one element occupies
three_dim_array.itemsize

8

In [11]:
# total size in bytes
three_dim_array.nbytes

144
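
These properties are related: the total size in bytes is the number of elements times the size of one element. A small sketch, assuming a stand-in array `arr` with the same shape and dtype as _three_dim_array_ (the `size` attribute, the total element count, is not used elsewhere in this notebook):

```python
import numpy as np

# Stand-in array: shape (2, 3, 3), 64-bit integers.
arr = np.zeros((2, 3, 3), dtype=np.int64)

print(arr.size)      # 18 elements in total (2 * 3 * 3)
print(arr.itemsize)  # 8 bytes per int64 element
print(arr.nbytes)    # 144 bytes = 18 * 8
```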

### Data types

In a NumPy array, all elements have to be of the same type. The type of an array's elements is described by its _dtype_.

In [12]:
# data type
three_dim_array.dtype

dtype('int64')
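
Because all elements share one type, NumPy picks a common dtype when the input mixes types, and an explicit conversion may lose information. The sketch below illustrates both cases; the variable names are illustrative only.

```python
import numpy as np

# Mixing integers and a float: every element is stored as float64.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)  # float64

# Explicit conversion with astype: the fractional part is dropped.
as_int = np.array([1.7, -1.7]).astype('int64')
print(as_int)  # [ 1 -1]
```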

We can choose a data type that fits our purpose.

In [13]:
# 16-bit int
sixteen = np.array([5,3,5,3], dtype='int16')
print(sixteen.dtype)
print(sixteen.itemsize)

int16
2


In [14]:
# 32-bit floating point number
single_prec = np.array([[4.5, 7.86,],
                        [5.0, 1.25]], dtype='float32')
print(single_prec.dtype)
print(single_prec.itemsize)

float32
4


In [15]:
string_array = np.array(['gf', 'hh', 'as'])
string_array.dtype

dtype('<U2')

We can create _structured data types_, which are aggregates of other data types.

In [17]:
# data type object
np.dtype([('f1', np.uint32), ('f2', np.float64)])

dtype([('f1', '<u4'), ('f2', '<f8')])
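
To see what such a data type is good for, here is a minimal sketch of an array that uses it. The field names `f1` and `f2` come from the cell above; the sample values and variable names are made up for illustration.

```python
import numpy as np

record_type = np.dtype([('f1', np.uint32), ('f2', np.float64)])

# Each element is a record with an unsigned integer and a float.
records = np.array([(1, 0.5), (2, 1.5)], dtype=record_type)

# Fields are accessed by name and behave like regular arrays.
print(records['f1'])  # [1 2]
print(records['f2'])  # [0.5 1.5]
```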

### Array indexing

How do we access values at a desired position (or positions) in an array?

Arrays can be indexed similarly to Python lists. The difference is that arrays
can have multiple dimensions, so an index can contain one entry per axis.

In [18]:
for item in (
    two_dim_array,
    two_dim_array[0],
    two_dim_array[0, :],
    two_dim_array[0, 1]
):
    print(item, '\n')

[[11 12 13]
 [21 22 23]
 [31 32 33]] 

[11 12 13] 

[11 12 13] 

12 



An array can also be used to index another array, provided it has an integer type.

In [19]:
print(one_dim_array)
print(
    one_dim_array[
        np.array([0,2,2,1,0], dtype='int8')
    ]
)

[1 2 3 4]
[1 3 3 2 1]


An array can also be indexed using a boolean index (a mask of the same shape).

In [20]:
one_dim_array[
    [True, False, True, False]
]

array([1, 3])
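
In practice the boolean index is rarely written by hand; it usually comes from a comparison on the array itself. A short sketch, with `values` standing in for _one_dim_array_:

```python
import numpy as np

values = np.array([1, 2, 3, 4])

# A comparison produces a boolean array of the same shape...
mask = values > 2
print(mask)  # [False False  True  True]

# ...which can then select the matching elements.
selected = values[mask]
print(selected)  # [3 4]
```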

NumPy provides some utilities for finding the indices of items that satisfy a given condition.

In [21]:
print(two_dim_array)

# find all indices where an element is greater than 22
print(
    np.argwhere(two_dim_array > 22)
)

[[11 12 13]
 [21 22 23]
 [31 32 33]]
[[1 2]
 [2 0]
 [2 1]
 [2 2]]


### Create an array

There are plenty of ways to create a NumPy array.

In [22]:
# np.array() function
arr = np.array([[1,2,3,4], [3,2,1,3]])
arr

array([[1, 2, 3, 4],
       [3, 2, 1, 3]])

In [23]:
# using a function of the indices
np.fromfunction(lambda i, j: i >= j, (3,3), dtype=int)

array([[ True, False, False],
       [ True,  True, False],
       [ True,  True,  True]])

In [24]:
# from an iterable object (1-d only)
# passing count lets NumPy pre-allocate the output, which is faster
np.fromiter((i**2 for i in range(5)), dtype='int32', count=5)

array([ 0,  1,  4,  9, 16], dtype=int32)

In [25]:
# create array filled with zeros...

# ...of desired shape
print(np.zeros((5,2)))

# ...of the same shape as another array
print(np.zeros_like(two_dim_array))

[[0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]]
[[0 0 0]
 [0 0 0]
 [0 0 0]]


In [26]:
# create array filled with ones
print(np.ones((5,2)))
print(np.ones_like(two_dim_array))

[[1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]]
[[1 1 1]
 [1 1 1]
 [1 1 1]]


In [27]:
# or filled with any other number
print(np.full((5,2), 3.14))
print(np.full_like(two_dim_array, 3.14))
# caution: full_like keeps the integer dtype of two_dim_array, so 3.14 is truncated to 3

[[3.14 3.14]
 [3.14 3.14]
 [3.14 3.14]
 [3.14 3.14]
 [3.14 3.14]]
[[3 3 3]
 [3 3 3]
 [3 3 3]]
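
If the fill value should be kept as a float, one option is to override the data type with the `dtype` argument of _np.full_like_. A minimal sketch, with `template` as a hypothetical integer array standing in for _two_dim_array_:

```python
import numpy as np

template = np.array([[11, 12], [21, 22]])  # integer dtype

# Default: the result inherits the integer dtype, so 3.14 becomes 3.
truncated = np.full_like(template, 3.14)

# Overriding dtype keeps the shape of the template but stores floats.
as_float = np.full_like(template, 3.14, dtype=float)

print(truncated)
print(as_float)
```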


In [28]:
# create an uninitialized array - its values are whatever happened to be in memory
# and are meant to be overwritten
np.empty((1,2))  # may vary

array([[3.856e-320, 0.000e+000]])

In [29]:
# numpy has its own `range` function
np.arange(-5, 100, 5)

array([-5,  0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75,
       80, 85, 90, 95])

In [30]:
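# 6 evenly spaced values from 5 to 20 (both ends included)
print(np.linspace(5, 20, 6))
# 6 values from 10**5 to 10**20, evenly spaced on a logarithmic scale
print(np.logspace(5, 20, 6))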

[ 5.  8. 11. 14. 17. 20.]
[1.e+05 1.e+08 1.e+11 1.e+14 1.e+17 1.e+20]


... and there are many more ways to create a new array.

### Maths on arrays

How do we perform mathematical operations on arrays?

In [31]:
ar1 = two_dim_array.copy()
ar1

array([[11, 12, 13],
       [21, 22, 23],
       [31, 32, 33]])

In [32]:
ar2 = np.array([
    [10, 10, 10],
    [20, 20, 20],
    [30, 30, 30]
])
ar2

array([[10, 10, 10],
       [20, 20, 20],
       [30, 30, 30]])

In [33]:
# addition
ar1 + ar2

array([[21, 22, 23],
       [41, 42, 43],
       [61, 62, 63]])

In [34]:
# subtraction
ar1 - ar2

array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])

In [35]:
# elementwise multiplication
ar1 * ar2

array([[110, 120, 130],
       [420, 440, 460],
       [930, 960, 990]])

In [36]:
# elementwise division
ar1 / ar2

array([[1.1       , 1.2       , 1.3       ],
       [1.05      , 1.1       , 1.15      ],
       [1.03333333, 1.06666667, 1.1       ]])

In [37]:
# matrix multiplication
ar1 @ ar2

array([[ 740,  740,  740],
       [1340, 1340, 1340],
       [1940, 1940, 1940]])

In [38]:
# add a column vector to a matrix?
ar1 + np.array([[1], [2], [3]])

array([[12, 13, 14],
       [23, 24, 25],
       [34, 35, 36]])

In [39]:
# ...and a row vector?
ar1 + np.array([1, 2, 3])

array([[12, 14, 16],
       [22, 24, 26],
       [32, 34, 36]])
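
The last two cells rely on _broadcasting_: when shapes differ, NumPy virtually stretches axes of length one (and missing leading axes) so that both operands match. The sketch below uses _np.broadcast_to_ to make the stretched operands visible; the variable names are illustrative only.

```python
import numpy as np

target_shape = (3, 3)
column = np.array([[1], [2], [3]])  # shape (3, 1)
row = np.array([1, 2, 3])           # shape (3,)

# The column is repeated along axis 1, the row along axis 0.
column_stretched = np.broadcast_to(column, target_shape)
row_stretched = np.broadcast_to(row, target_shape)

print(column_stretched)
print(row_stretched)
```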

### Array manipulation

Here are examples of some useful functions and methods to perform manipulation on arrays.

In [40]:
# reshape an array - same elements but differently organized
print(one_dim_array)
print(one_dim_array.reshape(1,4))
print(one_dim_array.reshape(4,1))

[1 2 3 4]
[[1 2 3 4]]
[[1]
 [2]
 [3]
 [4]]


In [41]:
# -1 - infer dimension
one_dim_array.reshape(-1, 1)

array([[1],
       [2],
       [3],
       [4]])

In [42]:
# use ravel to 'flatten' an array
three_dim_array.ravel()

array([111, 112, 113, 121, 122, 123, 131, 132, 133, 211, 212, 213, 221,
       222, 223, 231, 232, 233])

In [43]:
# flatten gives the same 1-d result, but always as a copy
three_dim_array.flatten()

array([111, 112, 113, 121, 122, 123, 131, 132, 133, 211, 212, 213, 221,
       222, 223, 231, 232, 233])
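
The two methods return the same values but differ in memory behaviour: _ravel_ returns a view when it can, while _flatten_ always returns a copy. A small sketch that checks this with _np.shares_memory_ (the array `arr` is a hypothetical example):

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

raveled = arr.ravel()      # a view here, because arr is contiguous
flattened = arr.flatten()  # always a new copy

print(np.shares_memory(arr, raveled))    # True
print(np.shares_memory(arr, flattened))  # False
```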

In [44]:
# transpositions
ar = np.arange(9).reshape(3, 3)
print(ar)
print(ar.transpose())
print(np.all(ar.T == ar.transpose()))

[[0 1 2]
 [3 4 5]
 [6 7 8]]
[[0 3 6]
 [1 4 7]
 [2 5 8]]
True


In [45]:
ar = np.arange(8).reshape(2, 2, 2)
print(ar)
print(ar.transpose())

[[[0 1]
  [2 3]]

 [[4 5]
  [6 7]]]
[[[0 4]
  [2 6]]

 [[1 5]
  [3 7]]]
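
Without arguments, _transpose_ reverses the order of the axes. It also accepts an explicit axis order, which is handy for arrays with more than two dimensions. A minimal sketch with a hypothetical array whose axes all have different lengths, so the effect is visible in the shape:

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)

reversed_axes = arr.transpose()       # same as arr.transpose(2, 1, 0)
swapped_first_two = arr.transpose(1, 0, 2)

print(reversed_axes.shape)      # (4, 3, 2)
print(swapped_first_two.shape)  # (3, 2, 4)
```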


In [46]:
# joining arrays

print(ar1, '\n')
print(ar2)

print(
    "\nconcatenate\n",
    np.concatenate((ar1, ar2))
)
print(
    "\nconcatenate axis=1\n",
    np.concatenate((ar1, ar2), axis=1)
)
print(
    "\nhstack\n",
    np.hstack((ar1, ar2))
)
print(
    "\nvstack\n",
    np.vstack((ar1, ar2))
)
print(
    "\nstack\n",
    np.stack((ar1, ar2))
)

[[11 12 13]
 [21 22 23]
 [31 32 33]] 

[[10 10 10]
 [20 20 20]
 [30 30 30]]

concatenate
 [[11 12 13]
 [21 22 23]
 [31 32 33]
 [10 10 10]
 [20 20 20]
 [30 30 30]]

concatenate axis=1
 [[11 12 13 10 10 10]
 [21 22 23 20 20 20]
 [31 32 33 30 30 30]]

hstack
 [[11 12 13 10 10 10]
 [21 22 23 20 20 20]
 [31 32 33 30 30 30]]

vstack
 [[11 12 13]
 [21 22 23]
 [31 32 33]
 [10 10 10]
 [20 20 20]
 [30 30 30]]

stack
 [[[11 12 13]
  [21 22 23]
  [31 32 33]]

 [[10 10 10]
  [20 20 20]
  [30 30 30]]]


In [47]:
# create new array with additional values
np.append(ar1, np.array([[-1, -2, -3]]), axis=0)

array([[11, 12, 13],
       [21, 22, 23],
       [31, 32, 33],
       [-1, -2, -3]])

In [48]:
# repeating
print(
    np.repeat([1,2,3], 3)
)
print(
    np.tile([1,2,3], 3)
)

[1 1 1 2 2 2 3 3 3]
[1 2 3 1 2 3 1 2 3]


In [49]:
print(
    np.tile(ar1, 3)
)

[[11 12 13 11 12 13 11 12 13]
 [21 22 23 21 22 23 21 22 23]
 [31 32 33 31 32 33 31 32 33]]


In [50]:
# logic functions - useful for checks
ar_with_nan = np.array([[1,2,3], [4,5,np.nan]])
print(ar_with_nan)
print('\n', np.isnan(ar_with_nan))
print('\n', np.any(np.isnan(ar_with_nan)))
print('\n', np.any(np.isnan(ar1)))

[[ 1.  2.  3.]
 [ 4.  5. nan]]

 [[False False False]
 [False False  True]]

 True

 False


In [51]:
# elementwise logical AND (non-zero values count as True)
np.logical_and([0,1,1], [1,1,0])

array([False,  True, False])

In [52]:
# padding
np.pad(ar1, (1,2), 'constant', constant_values=(-1, 1))

array([[-1, -1, -1, -1,  1,  1],
       [-1, 11, 12, 13,  1,  1],
       [-1, 21, 22, 23,  1,  1],
       [-1, 31, 32, 33,  1,  1],
       [-1,  1,  1,  1,  1,  1],
       [-1,  1,  1,  1,  1,  1]])

In [53]:
# clip an array
# (values outside some range are replaced by the range boundary values)
np.clip(ar1, a_min=15, a_max=30)

array([[15, 15, 15],
       [21, 22, 23],
       [30, 30, 30]])

In [54]:
# if you have an array with an axis of length one,
# you can _squeeze_ it

ar = np.arange(6).reshape(2,1,3)
print(ar, '\n')
print(np.squeeze(ar, axis=1))

[[[0 1 2]]

 [[3 4 5]]] 

[[0 1 2]
 [3 4 5]]


and much more...

### "True" copies and views

Consider an example:

In [55]:
array_a = np.array([0, 1, 2, 3])
array_b = array_a
array_b[2] = -100
array_a

array([   0,    1, -100,    3])

Even though we haven't modified _array\_a_ directly, it changed when we modified _array\_b_.
This is because the variables _array\_a_ and _array\_b_ in fact refer to the same object.

In [56]:
id(array_a) == id(array_b)

True

Another example...

In [57]:
array_a = np.array([0, 1, 2, 3])
array_a_subset = array_a[1:3]
np.add.at(array_a_subset, [0, 1], -123)
array_a

array([   0, -122, -121,    3])

What happened?

The _array\_a\_subset_ variable holds the subset of _array\_a_ returned by the slice _array\_a[1:3]_.
Basic indexing always creates views. A _view_ is a way to access the data of an array without physically copying
that data. Because the underlying data is shared, any change made through a view modifies the original array as well.
In the third line we call _np.add.at_, which adds -123 to the elements of the subset at positions 0 and 1.
What matters here is that _np.add.at_ works in place, so it has modified the data accessed through the view.

So, is there a way to physically ('truly') copy an array? Use the _.copy_ method:

In [58]:
array_a_copy = array_a.copy()
id(array_a_copy) == id(array_a)

False
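
Comparing _id_ values only tells us whether two names refer to the same object; a view is a different object that still shares data. A more direct check, sketched below with illustrative variable names, is the `base` attribute or _np.shares_memory_. The sketch also shows that indexing with a list of integers returns a copy, not a view.

```python
import numpy as np

original = np.array([0, 1, 2, 3])

view = original[1:3]         # basic slicing -> view
fancy = original[[1, 2]]     # integer-list indexing -> copy
true_copy = original.copy()  # explicit copy

print(view.base is original)                  # True
print(np.shares_memory(original, fancy))      # False
print(np.shares_memory(original, true_copy))  # False

# Writing through the view changes the original array.
view[0] = -1
print(original)  # [ 0 -1  2  3]
```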

### Randomness

NumPy provides utilities to generate pseudorandom numbers.

The modern way to generate random numbers in NumPy is via an instance of the _Generator_ class.

In [59]:
rng = np.random.default_rng()
rng

Generator(PCG64) at 0x7B3AE3523680
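
Called without arguments, _default_rng_ is seeded from the operating system, so results differ between runs. For reproducible results, a seed can be passed explicitly. A minimal sketch (the seed value 42 is arbitrary):

```python
import numpy as np

# Two generators created with the same seed produce the same stream.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

sample_a = rng_a.random(3)
sample_b = rng_b.random(3)

print(np.array_equal(sample_a, sample_b))  # True
```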

In [60]:
print(
    rng.random()
)
print(
    rng.integers(low=1, high=10, size=(2,2))
)
print(
    rng.standard_exponential(size=5)
)

0.24166992877004856
[[7 2]
 [9 6]]
[0.75659889 0.40993761 0.05687322 0.17129482 0.27265658]


In [61]:
# random choice from array's elements
rng.choice(np.arange(10), 5)

array([8, 3, 5, 2, 9])

In [62]:
# choose 3 columns at random (axis=1); sampling is with replacement by default
rng.choice(np.arange(10).reshape(2, -1), 3, axis=1)

array([[2, 1, 1],
       [7, 6, 6]])

In [63]:
# shuffle an array's elements IN PLACE!
# (along the first axis by default, so whole rows are reordered)

ar = np.arange(15).reshape(5, -1)
print(ar, '\n')

rng.shuffle(ar)

print(ar)

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]
 [12 13 14]] 

[[ 3  4  5]
 [ 9 10 11]
 [12 13 14]
 [ 0  1  2]
 [ 6  7  8]]


### Universal functions (_ufuncs_)

Functions like _np.add_ and _np.subtract_ are examples of NumPy [universal functions](https://numpy.org/doc/stable/reference/ufuncs.html).
They can be called directly, but they also provide some convenient methods, such as _reduce_, _at_, _accumulate_ and _outer_.

In [64]:
ar1

array([[11, 12, 13],
       [21, 22, 23],
       [31, 32, 33]])

In [65]:
ar2

array([[10, 10, 10],
       [20, 20, 20],
       [30, 30, 30]])

In [66]:
np.add(ar1, ar2)

array([[21, 22, 23],
       [41, 42, 43],
       [61, 62, 63]])

In [67]:
np.sin(ar1)

array([[-0.99999021, -0.53657292,  0.42016704],
       [ 0.83665564, -0.00885131, -0.8462204 ],
       [-0.40403765,  0.55142668,  0.99991186]])

In [68]:
# reduce
print(
    np.add.reduce(ar1, axis=0)
)
print(
    np.add.reduce(ar1, axis=(0,1))
)
print(
    np.subtract.reduce(ar1, axis=0)
)

[63 66 69]
198
[-41 -42 -43]


In [68]:
# perform at index (with possible repeats)
ar = np.array([0,1,2])
np.add.at(ar, [0,2,2], 1)  # IN PLACE !!!
ar

array([1, 1, 4])
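
The comment about repeats matters: with ordinary indexed assignment, a repeated index is only applied once, whereas _np.add.at_ applies the operation for every occurrence. A short sketch comparing the two (variable names are illustrative):

```python
import numpy as np

indices = [0, 2, 2]

# Indexed assignment: index 2 appears twice but is incremented once.
with_assignment = np.array([0, 1, 2])
with_assignment[indices] += 1
print(with_assignment)  # [1 1 3]

# np.add.at: index 2 is incremented once per occurrence.
with_add_at = np.array([0, 1, 2])
np.add.at(with_add_at, indices, 1)
print(with_add_at)  # [1 1 4]
```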

In [69]:
# accumulate
np.multiply.accumulate(np.arange(1, 6))

array([  1,   2,   6,  24, 120])
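
Several familiar NumPy functions correspond to these ufunc methods. The sketch below checks two such correspondences, _np.sum_ with _np.add.reduce_ and _np.cumprod_ with _np.multiply.accumulate_, on small example arrays chosen for illustration.

```python
import numpy as np

arr = np.array([[11, 12, 13],
                [21, 22, 23],
                [31, 32, 33]])
numbers = np.arange(1, 6)

# Sum over an axis is a reduction with np.add.
same_sum = np.array_equal(np.add.reduce(arr, axis=0), np.sum(arr, axis=0))

# Cumulative product is an accumulation with np.multiply.
same_cumprod = np.array_equal(np.multiply.accumulate(numbers), np.cumprod(numbers))

print(same_sum, same_cumprod)  # True True
```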

In [70]:
# create matrix A, where
# element A[i, j] = vector_a[i] - vector_b[j]

vector_a = np.array([1,2,3,4,5])
vector_b = np.array([10, 20, 30])
np.subtract.outer(vector_a, vector_b)

array([[ -9, -19, -29],
       [ -8, -18, -28],
       [ -7, -17, -27],
       [ -6, -16, -26],
       [ -5, -15, -25]])
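
The same matrix can be obtained with broadcasting by turning the first vector into a column. This is a sketch of an equivalent formulation, not a replacement for _outer_; it reuses the vectors from the cell above.

```python
import numpy as np

vector_a = np.array([1, 2, 3, 4, 5])
vector_b = np.array([10, 20, 30])

via_outer = np.subtract.outer(vector_a, vector_b)

# vector_a[:, np.newaxis] has shape (5, 1); subtracting a (3,) vector
# broadcasts to shape (5, 3).
via_broadcasting = vector_a[:, np.newaxis] - vector_b

print(np.array_equal(via_outer, via_broadcasting))  # True
```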