**ids-pdl04-tut.ipynb**: This Jupyter notebook is provided by Joachim Vogt for the _Python Data Lab_ of the module _Introduction to Data Science_ offered in Fall 2022 at Jacobs University Bremen. Module instructors are Hilke Brockmann, Adalbert Wilhelm, and Joachim Vogt. Jupyter notebooks and other learning resources are available from a dedicated _module platform_.

# NumPy basics

This tutorial introduces the NumPy module. Follow the instructions below to learn to

- [ ] import Python modules in general, and the NumPy module in particular,
- [ ] create NumPy arrays from Python lists, and select elements through indexing and slicing,
- [ ] change the shape of NumPy arrays using `flatten()`, `ravel()`, and `reshape()`,
- [ ] create uniform NumPy arrays and range expressions,
- [ ] filter NumPy arrays using boolean masks,
- [ ] apply functions to NumPy arrays.

If you wish to keep track of your progress, you may edit this markdown cell, check a box in the list above after having worked through the respective part of this notebook, and save the file.

*Short exercises* are embedded in this notebook. *Sample solutions* can be found at the end of the document.

The NumPy project is hosted at [numpy.org](https://numpy.org).

## Importing the NumPy module

Python modules (also called packages or libraries) are containers providing additional functionality (functions, objects, data structures, etc.). Using the module *NumPy* (Numerical Python) as an example, we demonstrate different ways to make the content of such a module available to our Python code.

The most basic variant is to import the module as follows. Then the full module name must be used as a prefix in order to access its content. The following instruction evaluates the mathematical expression $\sin \left( \frac{\pi}{2} \right) = 1$.

In [None]:
import numpy
print('sin(pi/2) = {}'.format(numpy.sin(numpy.pi/2)))

The prefix can be avoided completely using the construction `from numpy import item1, item2, ...`, but then each item must be explicitly named.

In [None]:
from numpy import sin, pi
print('sin(pi/2) = {}'.format(sin(pi/2)))

Not recommended is the option ``from numpy import *``: importing all NumPy objects without prefix may shadow built-in Python functions of the same name, such as `sum()` or `max()`.

In [None]:
# from numpy import * # not recommended
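
To illustrate why such name conflicts matter, the following sketch compares the built-in `sum()` with `numpy.sum()`. The variable names are chosen for illustration only. Both functions accept a second positional argument, but they interpret it differently: the built-in function treats it as a start value, NumPy treats it as an axis.

```python
import builtins
import numpy as np

values = range(5)  # 0, 1, 2, 3, 4

# Built-in sum(iterable, start): the second argument is the start value.
builtin_result = builtins.sum(values, -1)   # 10 + (-1)

# numpy.sum(a, axis): the second positional argument is the axis.
numpy_result = np.sum(values, -1)           # sum along the last axis

print('built-in sum :', builtin_result)     # 9
print('numpy sum    :', numpy_result)       # 10
```

After `from numpy import *`, the plain name `sum` would refer to the NumPy version, so existing code relying on the built-in behavior could silently change its result.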

The standard option, and the one employed throughout this course, is to import all of NumPy using the shorter prefix `np`.

In [None]:
import numpy as np
print('sin(pi/2) = {}'.format(np.sin(np.pi/2)))

Uncomment the following instruction to obtain information about NumPy and its help utilities. 

In [None]:
#np?

## NumPy arrays from Python lists

NumPy's efficient data processing capabilities rest on the class of N-dimensional arrays or `ndarray` objects. Such an object is a container for elements of the *same type*. NumPy arrays may be created from Python lists using the NumPy function `array()`. The `dtype` attribute of a NumPy array shows the type of the individual array elements.

In [None]:
lst1 = [2,3,5,7]
print(lst1,type(lst1))
arr1 = np.array(lst1)
print(arr1,type(arr1))
print('dtype of arr1   : ',arr1.dtype)

Arithmetic operations on NumPy arrays are applied element-by-element. Note the difference between the Python list concatenation operators `+` and `*` and the NumPy array addition and multiplication operators `+` and `*`.

In [None]:
print('lst1+lst1 : ',lst1+lst1)
print('arr1+arr1 : ',arr1+arr1)
print('3*lst1    : ',3*lst1)
print('3*arr1    : ',3*arr1)
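
As a small sketch of what element-by-element means, the cell below adds two sequences of equal length, once with plain Python lists and once with NumPy arrays. The names `lst_a` and `lst_b` are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

lst_a = [2, 3, 5, 7]
lst_b = [10, 20, 30, 40]

# Element-wise sum of two lists requires an explicit loop or comprehension.
lst_sum = [a + b for a, b in zip(lst_a, lst_b)]

# With NumPy arrays the operator + does the same in a single expression.
arr_sum = np.array(lst_a) + np.array(lst_b)

print(lst_sum)   # [12, 23, 35, 47]
print(arr_sum)   # [12 23 35 47]
```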

Selecting individual array elements and slicing work in the same way as for Python lists. Negative indices and slices are supported.

In [None]:
print('arr1        : ',arr1)
print('arr1[0]     : ',arr1[0])
print('arr1[0:2]   : ',arr1[0:2])
print('arr1[-1]    : ',arr1[-1])
print('arr1[-3:-1] : ',arr1[-3:-1])
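
One difference to Python lists is worth keeping in mind: a slice of a list is a new list, whereas a basic slice of a NumPy array is a view on the same data. The following sketch (with illustrative variable names) shows the effect, and how `copy()` may be used to obtain an independent array.

```python
import numpy as np

lst = [2, 3, 5, 7]
arr = np.array(lst)

lst_slice = lst[0:2]    # new list, independent of lst
arr_slice = arr[0:2]    # view on the data of arr

lst_slice[0] = -1
arr_slice[0] = -1

print(lst)   # [2, 3, 5, 7]  (unchanged)
print(arr)   # [-1  3  5  7] (changed through the view)

# An explicit copy decouples the slice from the original array.
arr_copy = arr[0:2].copy()
arr_copy[1] = 99
print(arr)   # [-1  3  5  7] (unchanged by the copy)
```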

The same syntax can be used to redefine the content of NumPy arrays.

In [None]:
arr1[0] = 17
print(arr1)
arr1[1:3] = [13,11]
print(arr1)

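Once the data type of a NumPy array is set, values of a different type assigned to its elements are converted to that data type if possible (for instance, a float assigned to an integer array loses its fractional part). If the conversion is not possible, an error is raised.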

In [None]:
arr1[0] = '-5'
arr1[1] = 3.14
print(arr1,type(arr1))
print('dtype of arr1   : ',arr1.dtype)
print(arr1[0],type(arr1[0]))
arr1[0] = True
arr1[1] = False
print(arr1,type(arr1))
#arr1[2] = 2.3 + 4.5j #.. uncomment this line to see error message
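
If the fractional part should be kept, one option is to convert the array to a floating-point type first. The sketch below uses the array method `astype()`, which returns a converted copy; the variable names are illustrative.

```python
import numpy as np

int_arr = np.array([2, 3, 5, 7])
int_arr[1] = 3.14                    # stored as integer, fractional part is lost
print(int_arr, int_arr.dtype)

float_arr = int_arr.astype(float)    # converted copy with floating-point elements
float_arr[1] = 3.14                  # now the value is stored as given
print(float_arr, float_arr.dtype)
```

The original integer array is not modified by `astype()`.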

Applying the NumPy function `array()` to a list containing different numerical types yields a NumPy array of the more general numerical type. For instance, mixing integers and floating-point numbers results in a NumPy array of floats.

In [None]:
lst2 = [2,3,5.0,7.0]
print(lst2,type(lst2))
arr2 = np.array(lst2)
print(arr2,type(arr2))
print('dtype of arr2: ',arr2.dtype)
print(arr2[0],type(arr2[0]))

Mixing numbers and strings in a list definition results in a NumPy array of strings.

In [None]:
lst3 = [2,3,'5',7]
print(lst3,type(lst3))
arr3 = np.array(lst3)
print(arr3,type(arr3))
print('dtype of arr3: ',arr3.dtype)
print(arr3[0],type(arr3[0]))

### Exercise: NumPy arrays from Python lists

Inspect the following operations on NumPy arrays. Predict the result and double-check your assessment by uncommenting the line with the respective `print()` function.

In [None]:
### Example 01
lst01 = [7,-5,'3']
arr01 = np.array(lst01)
#print(arr01,arr01.dtype)
### Example 02
lst02 = [7,-5,'three']
arr02 = np.array(lst02)
#print(arr02,arr02.dtype)
### Example 03
lst03 = [7,-5.5,3]
arr03 = np.array(lst03)
#print(arr03,arr03.dtype)
### Example 04
lst04 = [7,-5,3]
arr04 = np.array(lst04)
#print(arr04,arr04.dtype)
### Example 05
lst05 = 2*[7,-5,3]
arr05 = np.array(lst05)
#print(arr05)
### Example 06
lst06 = [7,-5,3]
arr06 = 2*np.array(lst06)
#print(arr06)
### Example 07
lst07 = [7,-6,5,-4,3,-2]
arr07 = np.array(lst07)
#print(arr07[1],arr07[-2],arr07[1:-2])
### Example 08
lst08 = [7,-6,5,-4,3,-2]
arr08 = np.array(lst08)
arr08[-2] = '12'
#print(arr08)

## Multi-dimensional arrays

Using lists of lists (of lists ...) we may create multi-dimensional arrays.

In [None]:
arr4 = np.array([[2,3,5,7],[11,13,17,19],[23,29,31,37]])
print(arr4,type(arr4))

The dimensions of an array are also called *axes*. The `shape` attribute is a tuple giving the number of elements along the individual axes. The current example is a two-dimensional array (matrix). The first axis (`axis=0`) corresponds to the row dimension, and the second axis (`axis=1`) to the column dimension.

In [None]:
print('arr4.shape : ',arr4.shape)
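
Besides `shape`, the attributes `ndim` (number of axes) and `size` (total number of elements) are often useful. The short sketch below prints them for an array with the same content as above; note that the built-in function `len()` returns the number of elements along the first axis only.

```python
import numpy as np

arr = np.array([[2, 3, 5, 7], [11, 13, 17, 19], [23, 29, 31, 37]])

print('shape :', arr.shape)   # (3, 4)
print('ndim  :', arr.ndim)    # 2
print('size  :', arr.size)    # 12
print('len   :', len(arr))    # 3 (length of the first axis)
```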

The function `flatten()` creates a one-dimensional version of a multi-dimensional NumPy array.

In [None]:
arr5 = arr4.flatten()
print(arr5)

The instance created by `flatten()` is a one-dimensional copy, i.e., an array that can be manipulated independently from the original array. In contrast, the function `ravel()` returns a one-dimensional view whenever possible, i.e., another way of accessing the same data, so that changes made through the view also affect the original array.

In [None]:
arr4.flatten()[0] = -42
print(arr4)
arr4.ravel()[0] = -42
print(arr4)
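
Whether two arrays refer to the same data can be checked with the NumPy function `shares_memory()`. The sketch below (illustrative variable names) applies it to `flatten()` and `ravel()`, and also shows a case where `ravel()` has to fall back to a copy because the data of a transposed array are not stored contiguously in the requested order.

```python
import numpy as np

arr = np.array([[2, 3, 5, 7], [11, 13, 17, 19], [23, 29, 31, 37]])

flat_copy = arr.flatten()
flat_view = arr.ravel()
print(np.shares_memory(arr, flat_copy))   # False: flatten() always copies
print(np.shares_memory(arr, flat_view))   # True: ravel() returns a view here

# The transpose is a view with a different memory layout,
# so ravel() must copy the data to return them in row-major order.
transposed = arr.T
flat_transposed = transposed.ravel()
print(np.shares_memory(transposed, flat_transposed))   # False
```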

More generally, rearranging a given NumPy array into a (view of) another shape is achieved by the `reshape()` function.

In [None]:
print(arr5)
print('\nArray reshaped to (2,2,3):')
arr6 = arr5.reshape(2,2,3)
print(arr6)
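
When reshaping, one of the dimensions may be given as `-1`; NumPy then infers its length from the total number of elements. The following sketch uses an illustrative array of twelve integers and also shows the error raised when the requested shape is incompatible.

```python
import numpy as np

arr = np.arange(12)

auto_cols = arr.reshape(3, -1)    # number of columns inferred: 12/3 = 4
auto_rows = arr.reshape(-1, 6)    # number of rows inferred: 12/6 = 2
print(auto_cols.shape)            # (3, 4)
print(auto_rows.shape)            # (2, 6)

# Twelve elements cannot be arranged in five rows of equal length.
try:
    arr.reshape(5, -1)
except ValueError as err:
    print('ValueError:', err)
```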

Indexing and slicing can be applied to individual axes of multi-dimensional arrays.

In [None]:
print(arr6[0,-1,1])
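
A few further combinations of indices and slices are sketched below for a two-dimensional array. The array has the same content as the matrix used earlier in this section; the name `arr` is illustrative.

```python
import numpy as np

arr = np.array([[2, 3, 5, 7], [11, 13, 17, 19], [23, 29, 31, 37]])

print(arr[1])          # second row:                 [11 13 17 19]
print(arr[:, 1])       # second column:              [ 3 13 29]
print(arr[0:2, 1:3])   # rows 0-1, columns 1-2:      [[ 3  5] [13 17]]
print(arr[-1, ::2])    # last row, every 2nd column: [23 31]
```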

### Exercise: Multi-dimensional arrays

Inspect the following operations on NumPy arrays. Predict the result and double-check your assessment by uncommenting the line with the respective `print()` function.

In [None]:
### Example 09
arr09 = np.array([[41,43,47,53],[59,61,67,71]])
#print(arr09.flatten())
### Example 10
arr10 = np.array([[41,43,47,53],[59,61,67,71]])
arr10.flatten()[-2] = 73
#print(arr10)
### Example 11
arr11 = np.array([[41,43,47,53],[59,61,67,71]])
arr11.reshape(4,2)[0,1] = -3
#print(arr11)
### Example 12
arr12 = np.array([[41,43,47,53],[59,61,67,71]])
arr12.reshape(4,2)[-1,0] = -3
#print(arr12)

## Uniform NumPy arrays and range expressions

The NumPy function `full()` creates a uniform `ndarray` object. Important input arguments are shape and fill value.

In [None]:
print(np.full((3,4),2.718))

More specifically, the NumPy functions `zeros()` and `ones()` create uniform arrays filled with zeros and ones, respectively.

In [None]:
print(np.zeros(12))
print(np.ones((3,4)))
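
By default, `zeros()` and `ones()` produce floating-point arrays, whereas `full()` takes the data type from the fill value. A different element type can be requested through the optional `dtype` argument, as in this short sketch with illustrative variable names.

```python
import numpy as np

float_zeros = np.zeros(3)               # default: floating-point elements
int_zeros = np.zeros(3, dtype=int)      # integer elements
bool_ones = np.ones((2, 2), dtype=bool) # boolean elements, all True
filled = np.full((2, 3), 7)             # dtype inferred from the fill value

print(float_zeros, float_zeros.dtype)
print(int_zeros, int_zeros.dtype)
print(bool_ones, bool_ones.dtype)
print(filled, filled.dtype)
```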

The NumPy function `arange()` creates a range of numerical values, with arguments corresponding to the parameters in slices.

In [None]:
start = 4
stop = 40
step = 9
print(np.arange(start,stop,step))

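For floating-point arrays, the result depends sensitively on the parameter `stop`: the value `stop` itself is excluded, so a small change of `stop` (or a rounding error) may decide whether the last element is part of the array.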

In [None]:
start = 4.0
stop = 40.0
step = 9.0
print(np.arange(start,stop,step))
stop = 40.01
print(np.arange(start,stop,step))
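
The sensitivity becomes more apparent for step sizes that cannot be represented exactly as floating-point numbers. According to the NumPy documentation, the number of elements is `ceil((stop - start)/step)`, which rounding errors can push to the next integer. The sketch below uses illustrative values; one possible workaround, shown at the end, is to create an integer range first and scale it afterwards.

```python
import math
import numpy as np

start, stop, step = 1.0, 1.3, 0.1

direct = np.arange(start, stop, step)

# Number of elements as documented for arange(): ceil((stop - start)/step).
expected_length = math.ceil((stop - start) / step)
print(direct, len(direct), expected_length)

# More predictable: fix the number of elements via an integer range,
# then scale and shift the result.
scaled = start + step * np.arange(3)
print(scaled, len(scaled))
```

With the values above, `direct` may contain one more element than the three values 1.0, 1.1, 1.2 one would naively expect, whereas `scaled` has exactly three elements by construction.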

In many cases the NumPy function `linspace()` is more robust: it takes the total number of array elements as the third input argument and, by default, includes the `stop` value as the last element.

In [None]:
start = 4.0
stop = 40.0
nelem = 5
print(np.linspace(start,stop,nelem))
stop = 40.01
print(np.linspace(start,stop,nelem))
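
Two optional arguments of `linspace()` are sketched below: `retstep=True` additionally returns the spacing between neighboring elements, and `endpoint=False` excludes the `stop` value. The variable names are illustrative.

```python
import numpy as np

# Five elements from 4.0 to 40.0 (inclusive), together with the spacing.
grid, spacing = np.linspace(4.0, 40.0, 5, retstep=True)
print(grid)      # [ 4. 13. 22. 31. 40.]
print(spacing)   # 9.0

# Four elements, stop value excluded: same spacing, last element dropped.
half_open = np.linspace(4.0, 40.0, 4, endpoint=False)
print(half_open) # [ 4. 13. 22. 31.]
```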

### Exercise: Uniform NumPy arrays and range expressions

Complete the code cell below according to the instructions included as comments.

In [None]:
### Using np.full(), print a uniform NumPy array of shape (2,5) filled with the string 'xyz'.

### Using np.ones(), print a uniform NumPy array of shape (3,2) filled with the float -4.32.

### Using np.arange(), print the one-dimensional integer array [13,11,9,7,5] (in this order).

### Using np.linspace(), print the one-dimensional floating-point array [9.5,8.0,6.5,5.0] (in this order).


## Array filtering using boolean masks

Array operations are efficiently applied to subsets of elements satisfying certain conditions by means of *boolean masks*. To demonstrate how this concept is applied, the code in the following cell takes a small data set of supposedly positive measurements, checks for negative fill values (`-9.99`) signaling data gaps, then creates a suitable boolean mask indicating data gaps with the truth value `True`, and finally replaces the numerical fill values with NumPy's `NaN` (not a number).

In [None]:
### Define floating-point NumPy array from Python list
lst7 = [2.32,2.52,2.15,-9.99,2.43,-9.99,2.47,2.61]
arr7 = np.array(lst7)
print(arr7,arr7.dtype)
### Check which elements are negative
gap7 = arr7<0
print(gap7,gap7.dtype)
### Replace negative elements by NumPy floating-point NaN (not a number)
arr7[gap7] = np.nan
print(arr7)
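
Once data gaps are marked by `NaN`, they can be located with the NumPy function `isnan()` (a comparison with `==` does not work because `NaN` is not equal to itself). The following sketch, using an illustrative array with the same content as above, also contrasts `mean()` with `nanmean()`, which ignores `NaN` values.

```python
import numpy as np

data = np.array([2.32, 2.52, 2.15, np.nan, 2.43, np.nan, 2.47, 2.61])

gaps = np.isnan(data)            # boolean mask: True where data are missing
print(gaps.sum())                # number of gaps: 2

print(np.mean(data))             # nan: a single NaN spoils the ordinary mean
print(np.nanmean(data))          # mean of the valid measurements only
print(data[~gaps].mean())        # same result using the inverted mask
```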

Here `<` (less than) is an example of a comparison operator. Further comparison operators are `>` (greater than), `>=` (greater than or equal to), `<=` (less than or equal to), `==` (equal to) and `!=` (not equal to).

Boolean arrays can be combined using bitwise logical connectives such as `|` (bitwise logical or) and `&` (bitwise logical and). Note that the Python keywords `or` and `and` are usually unsuitable for creating boolean masks from NumPy array operations as they compare entire objects. The following example checks if array elements are in the interval $[-2,5]$.

In [None]:
lst8 = [1,-2,3,-4,5,-6,7,-8]
arr8 = np.array(lst8)
print(arr8)
msk8 = (arr8>=-2) & (arr8<=5)
print(msk8)
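
The sketch below shows what happens when the keyword `and` is used instead of `&`, and how a mask can be inverted with `~`, used for selecting elements, and counted. The array has the same content as above; the variable names are illustrative.

```python
import numpy as np

arr = np.array([1, -2, 3, -4, 5, -6, 7, -8])

# The keyword 'and' asks for the truth value of a whole array,
# which is ambiguous and therefore raises an error.
try:
    mask = (arr >= -2) and (arr <= 5)
except ValueError as err:
    print('ValueError:', err)

inside = (arr >= -2) & (arr <= 5)   # element-wise logical and
outside = ~inside                   # element-wise negation

print(arr[inside])    # [ 1 -2  3  5]
print(arr[outside])   # [-4 -6  7 -8]
print(inside.sum())   # number of elements in the interval: 4
```

The parentheses around the comparisons are needed because `&` binds more tightly than `>=` and `<=`.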

The NumPy functions `any()` and `all()` allow for checking if any or all array elements along a given axis satisfy a certain condition, respectively.

In [None]:
lst9 = [1,-2,3,-4,5,-6,7,-8]
arr9 = np.array(lst9).reshape(2,4)
print(arr9)
print(np.any(arr9>5,axis=0))
print(np.all(arr9>-5,axis=1))

### Exercise: Array filtering using boolean masks

Complete the code cell below according to the instructions included as comments.

In [None]:
### Consider the following two-dimensional array of integer random numbers in the interval [0,10].
arr13 = np.random.randint(11,size=(3,5))
print(arr13)
### Create a boolean mask indicating odd numbers (hint: modulo operator %).

### Create a boolean mask indicating numbers smaller than 6.

### Create a boolean mask indicating odd numbers smaller than 6.

### Invert sign of odd numbers smaller than 6.


## Functions of NumPy arrays

The NumPy functions `any()` and `all()` evaluate conditions along dedicated axes of an array and return an array of reduced shape. Further functions showing the same behavior are `sum()`, `mean()`, `max()`, `min()`, `argmax()`, and `argmin()`. Check the NumPy documentation for further information.

In [None]:
lst9 = [1,-2,3,-4,5,-6,7,-8]
arr9 = np.array(lst9).reshape(2,4)
print(arr9)
print(np.max(arr9,axis=0))
print(np.argmax(arr9,axis=0))
print(np.sum(arr9,axis=1))
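
If no axis is specified, these functions operate on the entire array. In that case `argmax()` returns an index into the flattened array, which can be translated back into row and column indices with `unravel_index()`. The sketch below uses an illustrative array with the same content as above; the option `keepdims=True` is included to show how the reduced axis can be retained with length one.

```python
import numpy as np

arr = np.array([1, -2, 3, -4, 5, -6, 7, -8]).reshape(2, 4)

print(np.sum(arr))                     # sum of all elements: -4

flat_index = np.argmax(arr)            # position in the flattened array: 6
row, col = np.unravel_index(flat_index, arr.shape)
print(row, col, arr[row, col])         # 1 2 7

col_sums = np.sum(arr, axis=0, keepdims=True)
print(col_sums, col_sums.shape)        # [[  6  -8  10 -12]] (1, 4)
```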

Arithmetic and other mathematical functions in the NumPy package are usually vectorized, i.e., they are applied to all array elements at once and are thus optimized for computational efficiency. Examples are exponential functions, logarithms, trigonometric and hyperbolic functions. The output array is of the same shape as the input array.

In [None]:
lst9 = [1,-2,3,-4,5,-6,7,-8]
arr9 = np.array(lst9).reshape(2,4)
print(arr9)
print(arr9**2)
print(1/arr9)
print(np.exp(arr9))
print(np.cos(arr9*np.pi/4))
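
For comparison, the functions of Python's standard `math` module accept single numbers only. The following sketch (illustrative values and names) computes the exponential function once with an explicit loop and once with the vectorized NumPy function, and checks that both results agree.

```python
import math
import numpy as np

values = [0.0, 0.5, 1.0, 1.5, 2.0]

# Standard library: one function call per element.
loop_result = [math.exp(v) for v in values]

# NumPy: a single call processes the whole array.
vector_result = np.exp(np.array(values))

print(np.allclose(loop_result, vector_result))   # True

# math.exp() cannot handle an array with several elements.
try:
    math.exp(np.array(values))
except TypeError as err:
    print('TypeError:', err)
```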

### Exercise: Functions of NumPy arrays

Complete the code cell below according to the instructions included as comments.

In [None]:
### Consider the two-dimensional array of random numbers.
arr14 = 4 + 2*np.random.rand(4,5)
print(arr14)
### Compute the maxima of individual rows.

### Compute the minima of individual columns.

### Compute the means of individual columns.

### Compute the base-10 logarithm of all array elements.


---
---

## Solutions

### Solution: NumPy arrays from Python lists

In [None]:
### Example 01
lst01 = [7,-5,'3']
arr01 = np.array(lst01)
print(arr01,arr01.dtype)
### Example 02
lst02 = [7,-5,'three']
arr02 = np.array(lst02)
print(arr02,arr02.dtype)
### Example 03
lst03 = [7,-5.5,3]
arr03 = np.array(lst03)
print(arr03,arr03.dtype)
### Example 04
lst04 = [7,-5,3]
arr04 = np.array(lst04)
print(arr04,arr04.dtype)
### Example 05
lst05 = 2*[7,-5,3]
arr05 = np.array(lst05)
print(arr05)
### Example 06
lst06 = [7,-5,3]
arr06 = 2*np.array(lst06)
print(arr06)
### Example 07
lst07 = [7,-6,5,-4,3,-2]
arr07 = np.array(lst07)
print(arr07[1],arr07[-2],arr07[1:-2])
### Example 08
lst08 = [7,-6,5,-4,3,-2]
arr08 = np.array(lst08)
arr08[-2] = '12'
print(arr08)

### Solution: Multi-dimensional arrays

In [None]:
### Example 09
arr09 = np.array([[41,43,47,53],[59,61,67,71]])
print(arr09.flatten())
### Example 10
arr10 = np.array([[41,43,47,53],[59,61,67,71]])
arr10.flatten()[-2] = 73
print(arr10)
### Example 11
arr11 = np.array([[41,43,47,53],[59,61,67,71]])
arr11.reshape(4,2)[0,1] = -3
print(arr11)
### Example 12
arr12 = np.array([[41,43,47,53],[59,61,67,71]])
arr12.reshape(4,2)[-1,0] = -3
print(arr12)

### Solution: Uniform NumPy arrays and range expressions

In [None]:
### Using np.full(), print a uniform NumPy array of shape (2,5) filled with the string 'xyz'.
print(np.full((2,5),'xyz'))
### Using np.ones(), print a uniform NumPy array of shape (3,2) filled with the float -4.32.
print(-4.32*np.ones((3,2)))
### Using np.arange(), print the one-dimensional integer array [13,11,9,7,5] (in this order).
print(np.arange(13,3,-2))
### Using np.linspace(), print the one-dimensional floating-point array [9.5,8.0,6.5,5.0] (in this order).
print(np.linspace(9.5,5,4))

### Solution: Array filtering using boolean masks

In [None]:
### Consider the following two-dimensional array of integer random numbers in the interval [0,10].
arr13 = np.random.randint(11,size=(3,5))
print(arr13)
### Create a boolean mask indicating odd numbers (hint: modulo operator %).
odd = (arr13%2 == 1)
print(odd)
### Create a boolean mask indicating numbers smaller than 6.
small = (arr13<6)
print(small)
### Create a boolean mask indicating odd numbers smaller than 6.
smodd = small & odd
print(smodd)
### Invert sign of odd numbers smaller than 6.
arr13[smodd] = -arr13[smodd]
print(arr13)

### Solution: Functions of NumPy arrays

In [None]:
### Consider the two-dimensional array of random numbers.
arr14 = 4 + 2*np.random.rand(4,5)
print(arr14)
### Compute the maxima of individual rows.
print(arr14.max(axis=1))
### Compute the minima of individual columns.
print(arr14.min(axis=0))
### Compute the means of individual columns.
print(arr14.mean(axis=0))
### Compute the base-10 logarithm of all array elements.
print(np.log10(arr14))

---
---