# 4.1. NumPy - numerical Python

Module M-227-04: Programming for Data Analytics

Instructor: prof. Dmitry Pavlyuk

## Multidimensional Data

* Geospatial data sets
    * Longitude/latitude (2 dimensions)
    * Time (1 dimension)
    * Indicators - temperature, humidity, pressure, etc. (1 dimension)
* Images
    * Coordinates (2 dimensions)
    * Channels (1 dimension)
* Video (Images + Time)
* OLAP cubes (hypercubes) - multidimensional
* Tensors in machine learning - multidimensional


## NumPy

NumPy is a library that adds support for large multi-dimensional arrays, along with a large collection of high-level mathematical functions that operate on these arrays.

In [1]:
!pip install numpy



In [2]:
import numpy as np

* Efficient multidimensional array (ndarray) providing fast array-oriented arithmetic operations and flexible broadcasting capabilities
* Mathematical functions (linear algebra, statistics) for fast operations on entire arrays of data without having to write loops
* Tools for reading/writing array data to disk and working with memory-mapped files

### One-dimensional arrays - vectors

In [3]:
arr1d = np.zeros(3)
arr1d

array([0., 0., 0.])

### Two-dimensional arrays - matrices

In [4]:
arr2d = np.zeros((3,3))
arr2d

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

### Three-dimensional arrays - cubes

In [5]:
arr3d = np.zeros((3,3,4))
arr3d

array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])

## Efficiency of operations

In [6]:
import timeit

In [7]:
timeit.timeit("np.arange(1000000)", number=100, setup="import numpy as np")

0.09821349999999995

In [8]:
timeit.timeit("list(range(1000000))", number=100)

2.0277386
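
The timings above compare array creation only. A similar gap usually appears for arithmetic. The sketch below is a minimal comparison under assumed settings (the array size `n` and the repeat count are arbitrary choices, and absolute timings depend on the machine):

```python
import timeit

import numpy as np

n = 100_000
values = list(range(n))
arr = np.arange(n)

# Pure Python: one interpreter step per element
loop_time = timeit.timeit(lambda: [v * 2 for v in values], number=20)

# NumPy: a single vectorized multiplication executed in compiled code
vector_time = timeit.timeit(lambda: arr * 2, number=20)

print(f"list comprehension: {loop_time:.4f} s")
print(f"NumPy array:        {vector_time:.4f} s")

# Both approaches produce the same numbers
same_result = [v * 2 for v in values] == (arr * 2).tolist()
print(same_result)
```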

## ndarray - internal structure

__ndarray__ internally consists of the following:

* Pointer to data - a block of data in RAM or in a memory-mapped file
* Data type or dtype describing fixed-size value cells in the array
* Tuple indicating the array's shape
* Tuple of strides - integers indicating the number of bytes to step in order to advance one element along a dimension

In [9]:
arr = np.ones((2,3))
print(arr)
print(arr.shape)
print(arr.dtype)
print(arr.strides)

[[1. 1. 1.]
 [1. 1. 1.]]
(2, 3)
float64
(24, 8)
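
The strides `(24, 8)` follow from the shape and the element size. The sketch below recomputes them for a C-contiguous array (the default layout produced by `np.ones`) and shows that transposing only swaps the strides; the variable names are illustrative:

```python
import numpy as np

arr = np.ones((2, 3))        # float64: 8 bytes per element
itemsize = arr.itemsize

# Moving to the next row skips a whole row of 3 elements,
# moving to the next column skips a single element
expected_strides = (arr.shape[1] * itemsize, itemsize)
print(arr.strides, expected_strides)

# Transposing swaps the strides instead of moving any data
print(arr.T.strides)
```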


## Data types - int, short, uint, ushort

In [10]:
arr = np.ones((2,3), dtype=int)
print(arr)
print(arr.shape)
print(arr.dtype)
print(arr.strides)

[[1 1 1]
 [1 1 1]]
(2, 3)
int32
(12, 4)

Note: `dtype=int` maps to the platform's default integer type, so on other systems this cell may report `int64` with strides `(24, 8)`.


In [11]:
arr = np.ones((2,3), dtype=np.short)
print(arr)
print(arr.shape)
print(arr.dtype)
print(arr.strides)

[[1 1 1]
 [1 1 1]]
(2, 3)
int16
(6, 2)


## Data types - hierarchy

<img src="https://numpy.org/doc/stable/_images/dtype-hierarchy.png" alt="NumPy data types">

## Converting data type

In [12]:
arr.astype(float)

array([[1., 1., 1.],
       [1., 1., 1.]])
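
Two details of `astype` are easy to miss: converting floats to integers discards the fractional part, and the result is a new array rather than a view. A minimal sketch with made-up values:

```python
import numpy as np

floats = np.array([1.7, -2.9, 3.2])

# Float -> int conversion truncates toward zero (no rounding)
ints = floats.astype(int)
print(ints)

# astype returns a new array by default, so the original is unchanged
ints[0] = 100
print(floats)
```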

## Creating arrays

### Standard arrays:
* np.ones - array of ones
* np.zeros - array of zeros
* np.eye, np.identity - identity matrix (ones on the main diagonal, zeros elsewhere)


### Examples

In [13]:
np.ones((2,3,2))

array([[[1., 1.],
        [1., 1.],
        [1., 1.]],

       [[1., 1.],
        [1., 1.],
        [1., 1.]]])

In [14]:
np.zeros((2,3,2))

array([[[0., 0.],
        [0., 0.],
        [0., 0.]],

       [[0., 0.],
        [0., 0.],
        [0., 0.]]])

In [15]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
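
`np.eye` and `np.identity` give the same result for square matrices, but `np.eye` also accepts a column count and a diagonal offset `k`. A short sketch of the difference:

```python
import numpy as np

# For a square matrix the two functions agree
same = np.array_equal(np.eye(3), np.identity(3))
print(same)

# np.eye can build a rectangular matrix ...
rect = np.eye(2, 3)
print(rect)

# ... or place the ones on a shifted diagonal (k=1: first superdiagonal)
shifted = np.eye(3, k=1)
print(shifted)
```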

### From list or tuple:

In [16]:
np.array([1, 2, 3, 4, 5])

array([1, 2, 3, 4, 5])

### From iterable:

In [17]:
np.array(range(5))

array([0, 1, 2, 3, 4])

In [18]:
np.array(np.arange(5))

array([0, 1, 2, 3, 4])

In [19]:
arr = np.array(np.arange(12))
arr.reshape(3,4)

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

### From CSV

In [20]:
iris_data = np.genfromtxt('iris.csv', delimiter=',', skip_header=1, usecols=(0,1,2,3))
iris_data[0:10]

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1]])
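
If `iris.csv` is not at hand, the same call can be tried on in-memory text, since `np.genfromtxt` also accepts file-like objects. The CSV content below is a hypothetical two-row stand-in, not the real data set:

```python
import io

import numpy as np

# Hypothetical inline CSV standing in for a file on disk
csv_text = """sepal_length,sepal_width,species
5.1,3.5,setosa
4.9,3.0,setosa
"""

# skip_header drops the column names; usecols keeps only the numeric columns
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1, usecols=(0, 1))
print(data)
print(data.shape)
```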

## Basic operations

### Reshaping

In [21]:
arr = np.array(np.arange(1,13))
arr

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [22]:
print(arr.reshape(2,6))

[[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]]


In [23]:
print(arr.reshape(6,2))

[[ 1  2]
 [ 3  4]
 [ 5  6]
 [ 7  8]
 [ 9 10]
 [11 12]]


In [24]:
print(arr.reshape(3,4))

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
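
One dimension in `reshape` can be given as `-1`, in which case NumPy infers it from the total number of elements. A small sketch, including the error raised when the sizes do not fit:

```python
import numpy as np

arr = np.arange(1, 13)

# 12 elements in 3 rows: NumPy infers 4 columns
auto_cols = arr.reshape(3, -1)
print(auto_cols.shape)

# 12 elements cannot be split into 5 equal rows
try:
    arr.reshape(5, -1)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print(f"Cannot reshape: {err}")
```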


### Indexing

In [25]:
arr = arr.reshape(3,4)

* First element:

In [26]:
arr[0,0]

1

* First column:

In [27]:
arr[:,0]

array([1, 5, 9])

### Fancy indexing

Fancy indexing selects elements using a sequence of integer indices. Unlike slicing, it returns a copy of the data.

In [28]:
print("Full array:\n",arr)
print("Indexing:\n",arr[[2,0,1]])

Full array:
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
Indexing:
 [[ 9 10 11 12]
 [ 1  2  3  4]
 [ 5  6  7  8]]
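
When index sequences are given for both axes, they are paired element by element rather than combined into a rectangular block. The sketch below contrasts the two behaviours on the same 3x4 array:

```python
import numpy as np

arr = np.arange(1, 13).reshape(3, 4)

# Paired indices: picks the elements at positions (0, 1) and (2, 3)
paired = arr[[0, 2], [1, 3]]
print(paired)

# Chained indexing: rows 2 and 0 first, then columns 3 and 0 of that result
block = arr[[2, 0]][:, [3, 0]]
print(block)
```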


### Boolean indexing

In [29]:
arr[[True,False,True]]

array([[ 1,  2,  3,  4],
       [ 9, 10, 11, 12]])

In [30]:
large = iris_data[:,0]>7
large

array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False,  True, False, False,  True, False,  True,
       False,  True, False, False, False, False, False, False, False,
        True,  True, False, False, False,  True, False, False,  True,
       False, False,

In [31]:
iris_data[large]

array([[7.1, 3. , 5.9, 2.1],
       [7.6, 3. , 6.6, 2.1],
       [7.3, 2.9, 6.3, 1.8],
       [7.2, 3.6, 6.1, 2.5],
       [7.7, 3.8, 6.7, 2.2],
       [7.7, 2.6, 6.9, 2.3],
       [7.7, 2.8, 6.7, 2. ],
       [7.2, 3.2, 6. , 1.8],
       [7.2, 3. , 5.8, 1.6],
       [7.4, 2.8, 6.1, 1.9],
       [7.9, 3.8, 6.4, 2. ],
       [7.7, 3. , 6.1, 2.3]])

### Slicing

For every dimension, the slice has the form __[start:end:step]__:
* If start is omitted, it defaults to 0
* If end is omitted, it defaults to the length of the array in that dimension
* If step is omitted, it defaults to 1
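
A few slices with an explicit step illustrate these defaults. Note that with a negative step the defaults flip, so `[::-1]` walks the array from the last element to the first. The arrays below are illustrative:

```python
import numpy as np

arr = np.arange(10)

every_second = arr[::2]      # start and end omitted, step 2
stepped = arr[1:8:3]         # indices 1, 4, 7
reversed_arr = arr[::-1]     # negative step: last element to first

# Each dimension takes its own slice
grid = np.arange(12).reshape(3, 4)
corners = grid[::2, ::3]     # rows 0 and 2, columns 0 and 3

print(every_second, stepped, reversed_arr)
print(corners)
```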

* First column:

In [32]:
arr[:,0:1]

array([[1],
       [5],
       [9]])

* First 2 columns:

In [33]:
arr[:,0:2]

array([[ 1,  2],
       [ 5,  6],
       [ 9, 10]])

* First row:

In [34]:
arr[0:1,:]

array([[1, 2, 3, 4]])

* First 2 rows:

In [35]:
arr[0:2,:]

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

* First 2 rows of iris data set:

In [36]:
iris_data[0:2,:]

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2]])

## Slicing vs. Copy

### Slicing / View

In [37]:
arr = np.arange(16).reshape(4,4)
print("Full array:\n",arr)

arr_slice = arr[1:3, 1:3]
print("Slice:\n", arr_slice)

Full array:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
Slice:
 [[ 5  6]
 [ 9 10]]


In [38]:
arr_slice[:,:]=999
print("Slice:\n", arr_slice)
print("Full array:\n",arr)

Slice:
 [[999 999]
 [999 999]]
Full array:
 [[  0   1   2   3]
 [  4 999 999   7]
 [  8 999 999  11]
 [ 12  13  14  15]]


### Copying

In [39]:
arr = np.arange(16).reshape(4,4)
arr_slice_copy = arr[1:3, 1:3].copy()
arr_slice_copy[:,:]=999
print("Slice:\n", arr_slice_copy)
print("Full array:\n",arr)

Slice:
 [[999 999]
 [999 999]]
Full array:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
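
Whether two arrays share the same underlying buffer can be checked directly with `np.shares_memory`. The sketch below compares a slice, an explicit copy, and a fancy-indexed selection:

```python
import numpy as np

arr = np.arange(16).reshape(4, 4)

view = arr[1:3, 1:3]             # basic slicing: a view on the same data
copied = arr[1:3, 1:3].copy()    # explicit copy: independent data
fancy = arr[[1, 2]]              # fancy indexing: also a copy

view_shares = np.shares_memory(arr, view)
copy_shares = np.shares_memory(arr, copied)
fancy_shares = np.shares_memory(arr, fancy)
print(view_shares, copy_shares, fancy_shares)
```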


## Element-wise Operations

In [40]:
np.random.seed(0)
arr1 = np.random.randint(1,10,(3,3))
print(f"Array 1:\n {arr1}")

arr2 = np.random.randint(1,10,(3,3))
print(f"Array 2:\n {arr2}")

Array 1:
 [[6 1 4]
 [4 8 4]
 [6 3 5]]
Array 2:
 [[8 7 9]
 [9 2 7]
 [8 8 9]]


* Sum/Difference

In [41]:
arr1+arr2

array([[14,  8, 13],
       [13, 10, 11],
       [14, 11, 14]])

In [42]:
arr1-arr2

array([[-2, -6, -5],
       [-5,  6, -3],
       [-2, -5, -4]])

* Join

In [43]:
np.concatenate((arr1, arr2))

array([[6, 1, 4],
       [4, 8, 4],
       [6, 3, 5],
       [8, 7, 9],
       [9, 2, 7],
       [8, 8, 9]])

In [44]:
np.concatenate((arr1, arr2), axis=1)

array([[6, 1, 4, 8, 7, 9],
       [4, 8, 4, 9, 2, 7],
       [6, 3, 5, 8, 8, 9]])

* Element-wise (Hadamard) Multiplication

In [45]:
print(np.multiply(arr1, arr2))

[[48  7 36]
 [36 16 28]
 [48 24 45]]


### Universal functions

Universal functions (ufuncs) are fast element-wise array functions:
* np.sqrt
* np.exp
* np.isnan
* np.where (vectorized conditional selection; not formally a ufunc)

In [46]:
print(f"Array:\n {arr1}")
print(f"Element-wise square root:\n {np.sqrt(arr1)}")


Array:
 [[6 1 4]
 [4 8 4]
 [6 3 5]]
Element-wise square root:
 [[2.44948974 1.         2.        ]
 [2.         2.82842712 2.        ]
 [2.44948974 1.73205081 2.23606798]]
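
`np.isnan` and `np.where` are often combined to replace missing values before applying another element-wise function. A minimal sketch with made-up data, where the replacement value `0.0` is an arbitrary choice:

```python
import numpy as np

values = np.array([4.0, np.nan, 9.0, np.nan])

# Boolean mask marking the missing entries
missing = np.isnan(values)

# np.where(condition, a, b): take a where the condition holds, b elsewhere
cleaned = np.where(missing, 0.0, values)

roots = np.sqrt(cleaned)
print(missing)
print(cleaned)
print(roots)
```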


## Matrix Operations

### Transposition

In [47]:
print(f"Matrix:\n{arr1}")
print(f"Transposed:\n{np.transpose(arr1)}")
print(f"Transposed:\n{arr1.T}")

Matrix:
[[6 1 4]
 [4 8 4]
 [6 3 5]]
Transposed:
[[6 4 6]
 [1 8 3]
 [4 4 5]]
Transposed:
[[6 4 6]
 [1 8 3]
 [4 4 5]]


### Multiplication by a scalar

In [48]:
print(f"Matrix:\n{arr1}")
print(f"Matrix * 10:\n{arr1 * 10}")

Matrix:
[[6 1 4]
 [4 8 4]
 [6 3 5]]
Matrix * 10:
[[60 10 40]
 [40 80 40]
 [60 30 50]]


### Matrix Multiplication

In [49]:
np.matmul(arr1, arr2)

array([[ 89,  76,  97],
       [136,  76, 128],
       [115,  88, 120]])

In [50]:
arr1 @ arr2

array([[ 89,  76,  97],
       [136,  76, 128],
       [115,  88, 120]])

In [51]:
np.dot(arr1, arr2)

array([[ 89,  76,  97],
       [136,  76, 128],
       [115,  88, 120]])

__Note__

__np.matmul__ or __@__ is preferred for matrices, while __np.dot__ is more general: it also accepts scalar operands, and for arrays with more than two dimensions it follows a different rule than __np.matmul__.

In [52]:
arr_ones = np.ones(3)
print(f"Ones: {arr_ones}")
print(f"Dot product {arr_ones} * {arr_ones} = {np.dot(arr_ones,arr_ones)}")

Ones: [1. 1. 1.]
Dot product [1. 1. 1.] * [1. 1. 1.] = 3.0
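
For 2-D inputs `np.matmul` and `np.dot` agree, but they diverge for higher-dimensional arrays: `np.matmul` treats the leading axes as a batch of matrices, while `np.dot` combines every matrix of the first array with every matrix of the second. The shapes below are arbitrary illustrations:

```python
import numpy as np

a = np.ones((2, 3, 4))   # two 3x4 matrices
b = np.ones((2, 4, 5))   # two 4x5 matrices

# matmul multiplies the matrices pairwise: a[0] @ b[0], a[1] @ b[1]
matmul_shape = np.matmul(a, b).shape
print(matmul_shape)

# dot contracts the last axis of a with the second-to-last axis of b
dot_shape = np.dot(a, b).shape
print(dot_shape)
```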


### Inverse matrix calculation

In [53]:
arr1_inv = np.linalg.inv(arr1)
print(f"Matrix:\n{arr1}")
print(f"Inverse:\n{arr1_inv}")
print(f"Their product:\n{np.matmul(arr1, arr1_inv)}")

Matrix:
[[6 1 4]
 [4 8 4]
 [6 3 5]]
Inverse:
[[ 1.          0.25       -1.        ]
 [ 0.14285714  0.21428571 -0.28571429]
 [-1.28571429 -0.42857143  1.57142857]]
Their product:
[[ 1.00000000e+00  0.00000000e+00 -8.88178420e-16]
 [ 0.00000000e+00  1.00000000e+00 -8.88178420e-16]
 [-2.22044605e-16 -3.33066907e-16  1.00000000e+00]]


### Solving a system of linear equations:

$$8x + 3y - 2z = 9$$

$$-4x + 7y + 5z = 15$$

$$3x + 4y - 12z = 35$$

In [54]:
A = np.array([[8, 3, -2], [-4, 7, 5], [3, 4, -12]])
b = np.array([9, 15, 35])
x = np.linalg.solve(A, b)
print(f"Solution:: {x}")

Solution:: [-0.58226371  3.22870478 -1.98599767]


$$Ax = b$$

$$x = A^{-1}b$$

In [55]:
x= np.linalg.inv(A).dot(b)
print(f"Solution:: {x}")

Solution:: [-0.58226371  3.22870478 -1.98599767]
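
Both approaches can be verified by substituting the solution back into the system. In practice `np.linalg.solve` is generally the recommended route, since it avoids forming the inverse explicitly. A minimal check using the same `A` and `b`:

```python
import numpy as np

A = np.array([[8, 3, -2], [-4, 7, 5], [3, 4, -12]])
b = np.array([9, 15, 35])

x_solve = np.linalg.solve(A, b)
x_inv = np.linalg.inv(A) @ b

# Substituting x back should reproduce b (up to floating-point error)
residual_ok = np.allclose(A @ x_solve, b)

# Both methods should agree on this small, well-conditioned system
methods_agree = np.allclose(x_solve, x_inv)
print(residual_ok, methods_agree)
```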


## Statistics

### Basic functions

In [56]:
iris_data.mean(axis=0)

array([5.84333333, 3.05733333, 3.758     , 1.19933333])

In [57]:
iris_data.min(axis=0)

array([4.3, 2. , 1. , 0.1])

In [58]:
iris_data.max(axis=0)

array([7.9, 4.4, 6.9, 2.5])

### Cumulative functions

In [59]:
print(iris_data[0:5])

[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]


In [60]:
print(iris_data.cumsum(axis=0)[0:5])

[[ 5.1  3.5  1.4  0.2]
 [10.   6.5  2.8  0.4]
 [14.7  9.7  4.1  0.6]
 [19.3 12.8  5.6  0.8]
 [24.3 16.4  7.   1. ]]


## Broadcasting

Broadcasting governs how operations work between arrays of different shapes. It can be a powerful feature, but it can cause confusion, even for experienced users.

In [61]:
arr1*4

array([[24,  4, 16],
       [16, 32, 16],
       [24, 12, 20]])

In [62]:
(iris_data - iris_data.mean(axis=0))[:10]

array([[-0.74333333,  0.44266667, -2.358     , -0.99933333],
       [-0.94333333, -0.05733333, -2.358     , -0.99933333],
       [-1.14333333,  0.14266667, -2.458     , -0.99933333],
       [-1.24333333,  0.04266667, -2.258     , -0.99933333],
       [-0.84333333,  0.54266667, -2.358     , -0.99933333],
       [-0.44333333,  0.84266667, -2.058     , -0.79933333],
       [-1.24333333,  0.34266667, -2.358     , -0.89933333],
       [-0.84333333,  0.34266667, -2.258     , -0.99933333],
       [-1.44333333, -0.15733333, -2.358     , -0.99933333],
       [-0.94333333,  0.04266667, -2.258     , -1.09933333]])
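
Broadcasting compares shapes from the last axis backwards, and two axes are compatible when they are equal or one of them is 1. That is why subtracting column means works directly, while subtracting row means needs an extra axis. A sketch on a small illustrative array:

```python
import numpy as np

data = np.arange(12).reshape(3, 4)

col_means = data.mean(axis=0)    # shape (4,)
row_means = data.mean(axis=1)    # shape (3,)

# (3, 4) - (4,): trailing axes match, so this broadcasts
centered_cols = data - col_means

# (3, 4) - (3,): trailing axes 4 and 3 do not match
try:
    data - row_means
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print(f"Broadcast error: {err}")

# Reshaping the means to (3, 1) makes the shapes compatible
centered_rows = data - row_means[:, np.newaxis]
print(centered_cols)
print(centered_rows)
```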

## Saving/Restoring arrays

In [63]:
np.save("arr1.npy", arr1)
np.load("arr1.npy")

array([[6, 1, 4],
       [4, 8, 4],
       [6, 3, 5]])

In [64]:
np.savez("arrs.npz", arr1 = arr1, arr2 = arr2)
loaded = np.load("arrs.npz")
print(loaded["arr1"])
print(loaded["arr2"])

[[6 1 4]
 [4 8 4]
 [6 3 5]]
[[8 7 9]
 [9 2 7]
 [8 8 9]]
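
A saved `.npy` file can also be opened as a memory-mapped array, so that data is read from disk only when it is accessed. The sketch below assumes a temporary directory and a hypothetical file name `arr.npy`:

```python
import os
import tempfile

import numpy as np

arr = np.arange(12).reshape(3, 4)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "arr.npy")   # hypothetical file name
    np.save(path, arr)

    # mmap_mode="r" maps the file read-only instead of loading it into RAM
    mapped = np.load(path, mmap_mode="r")
    is_memmap = isinstance(mapped, np.memmap)

    # Copy the first row into ordinary memory
    first_row = np.array(mapped[0])

    # Release the mapping before the temporary directory is removed
    del mapped

print(is_memmap)
print(first_row)
```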


# Thank you