<img width="800px" src="../fidle/img/00-Fidle-header-01.svg"></img>

# <!-- TITLE --> [NP1] - A short introduction to Numpy
<!-- DESC --> NumPy is an essential tool for scientific Python.
<!-- AUTHOR : Jean-Luc Parouty (CNRS/SIMaP) -->

## Objectives :
 - Understand the main principles of NumPy and its potential

Note : This notebook is strongly inspired by the UGA Python Introduction Course  
See : **https://gricad-gitlab.univ-grenoble-alpes.fr/python-uga/py-training-2017**

## Step 1 - Numpy the beginning

Code using `numpy` usually starts with the import statement:

In [1]:
import numpy as np

NumPy provides the type `np.ndarray`. Such arrays are multidimensional sequences of homogeneous elements. They can be created, for example, with the following commands:

In [2]:
# from a list
l = [10.0, 12.5, 15.0, 17.5, 20.0]
np.array(l)

array([10. , 12.5, 15. , 17.5, 20. ])

In [3]:
# fast but the values can be anything
np.empty(4)

array([4.63658425e-310, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000])

In [4]:
# slower than np.empty but the values are all 0.
np.zeros([2, 6])

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [5]:
# multidimensional array
a = np.ones([2, 3, 4])
print(a.shape, a.size, a.dtype)
a

(2, 3, 4) 24 float64


array([[[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]])

In [6]:
# like range, but produces a 1D numpy array
np.arange(4)

array([0, 1, 2, 3])

In [7]:
# np.arange can produce arrays of floats
np.arange(4.)

array([0., 1., 2., 3.])

In [8]:
# another convenient function to generate 1D arrays
np.linspace(10, 20, 5)

array([10. , 12.5, 15. , 17.5, 20. ])
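
"Homogeneous" means that every element of an array shares a single `dtype`. The short sketch below (the variable names are illustrative, not part of the original notebook) shows how NumPy picks a common type when values are mixed, and how a type can be requested or converted explicitly.

```python
import numpy as np

# Mixing ints and floats: NumPy picks one common dtype for all elements
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)            # float64

# The dtype can also be requested explicitly at creation time
ints = np.arange(4, dtype=np.int32)
print(ints.dtype)             # int32

# astype returns a converted copy; the original array is unchanged
as_float = ints.astype(np.float64)
print(as_float.dtype, ints.dtype)
```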

A NumPy array can be easily converted to a Python list.

In [9]:
a = np.linspace(10, 20 ,5)
list(a)

[10.0, 12.5, 15.0, 17.5, 20.0]

In [10]:
# Or even better
a.tolist()

[10.0, 12.5, 15.0, 17.5, 20.0]
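
One reason `tolist()` is usually preferable: `list(a)` keeps NumPy scalars as elements, whereas `tolist()` converts them to built-in Python types and also handles nested dimensions. A minimal sketch, with illustrative variable names:

```python
import numpy as np

a = np.linspace(10, 20, 5)

via_list = list(a)        # elements stay NumPy scalars
via_tolist = a.tolist()   # elements become built-in Python floats

print(type(via_list[0]))
print(type(via_tolist[0]))

# tolist also handles nested dimensions, producing nested lists
nested = np.arange(6).reshape(2, 3).tolist()
print(nested)  # [[0, 1, 2], [3, 4, 5]]
```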

## Step 2 - Access elements

Elements in a `numpy` array can be accessed using indexing and slicing in any dimension. This syntax offers the same capabilities as array indexing in Fortran or Matlab.

### 2.1 - Indexes and slices
For example, we can create an array `A` and perform any kind of selection operations on it.

In [11]:
A = np.random.random([4, 5])
A

array([[0.90023687, 0.11691004, 0.02323043, 0.93645919, 0.90806378],
       [0.47066661, 0.51619476, 0.6331525 , 0.01788091, 0.22400261],
       [0.00907019, 0.09682113, 0.11148512, 0.46883229, 0.15685269],
       [0.51343916, 0.43011841, 0.29050278, 0.83415623, 0.04388041]])

In [12]:
# Get the element from the second row, first column
A[1, 0]

0.4706666144244742

In [13]:
# Get the first two rows
A[:2]

array([[0.90023687, 0.11691004, 0.02323043, 0.93645919, 0.90806378],
       [0.47066661, 0.51619476, 0.6331525 , 0.01788091, 0.22400261]])

In [14]:
# Get the last column
A[:, -1]

array([0.90806378, 0.22400261, 0.15685269, 0.04388041])

In [15]:
# Get the first two rows and the columns with an even index
A[:2, ::2]

array([[0.90023687, 0.02323043, 0.90806378],
       [0.47066661, 0.6331525 , 0.22400261]])
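
A point worth knowing about slices: basic slicing returns a view on the original data, not a copy, so writing into a slice modifies the source array. The sketch below uses a small deterministic array `M` as a stand-in for the random `A` (an assumption made only to keep the example reproducible).

```python
import numpy as np

M = np.arange(20.0).reshape(4, 5)   # small deterministic stand-in for A

view = M[:2, ::2]          # basic slicing: no data is copied
view[0, 0] = -1.0          # writes through to M
print(M[0, 0])             # -1.0
print(np.shares_memory(M, view))   # True

independent = M[:2, ::2].copy()    # explicit copy when isolation is needed
independent[0, 0] = 99.0
print(M[0, 0])             # still -1.0
```

When a slice must be modified without touching the original, calling `.copy()` as above is the usual choice.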

### 2.2 -  Using a mask to select elements satisfying a condition:

In [16]:
cond = A > 0.5
print(cond)
print(A[cond])

[[ True False False  True  True]
 [False  True  True False False]
 [False False False False False]
 [ True False False  True False]]
[0.90023687 0.93645919 0.90806378 0.51619476 0.6331525  0.51343916
 0.83415623]
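
Masks can also be combined. The sketch below, on a small hand-written array `M` (illustrative values, not the notebook's `A`), shows the element-wise operators `&`, `|` and `~`, along with two common ways to exploit a mask.

```python
import numpy as np

M = np.array([[0.9, 0.1, 0.6],
              [0.4, 0.7, 0.2]])

# Combine conditions with & (and), | (or), ~ (not); parentheses are required
middle = (M > 0.3) & (M < 0.8)
print(M[middle])                 # [0.6 0.4 0.7]

# Count the selected elements, or get their indices
print(middle.sum())              # 3
rows, cols = np.nonzero(middle)
print(rows, cols)                # [0 1 1] [2 0 1]
```

The Python keywords `and` / `or` do not work element-wise on arrays, which is why the bitwise operators are used here.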


The mask is in fact a particular case of the advanced indexing capabilities provided by NumPy. For example, it is even possible to use lists for indexing:

In [17]:
# Selecting only particular columns
print(A)
A[:, [0, 1, 4]]

[[0.90023687 0.11691004 0.02323043 0.93645919 0.90806378]
 [0.47066661 0.51619476 0.6331525  0.01788091 0.22400261]
 [0.00907019 0.09682113 0.11148512 0.46883229 0.15685269]
 [0.51343916 0.43011841 0.29050278 0.83415623 0.04388041]]


array([[0.90023687, 0.11691004, 0.90806378],
       [0.47066661, 0.51619476, 0.22400261],
       [0.00907019, 0.09682113, 0.15685269],
       [0.51343916, 0.43011841, 0.04388041]])
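
Unlike basic slices, indexing with a list (or a mask) produces a new array, and two index lists are paired element by element. A minimal sketch on an illustrative integer array `M`:

```python
import numpy as np

M = np.arange(20).reshape(4, 5)

cols = M[:, [0, 1, 4]]     # advanced indexing: the result is a new array
cols[0, 0] = 100
print(M[0, 0])             # 0, M is untouched

# Two index lists are paired up: this picks M[0, 1], M[2, 3] and M[3, 4]
picked = M[[0, 2, 3], [1, 3, 4]]
print(picked)              # [ 1 13 19]
```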

## Step 3 -  Perform array manipulations
### 3.1 - Apply arithmetic operations to whole arrays (element-wise):

In [18]:
(A+5)**2

array([[34.81279516, 26.18276832, 25.23284396, 35.24154768, 34.90521766],
       [29.92819321, 30.42840458, 31.73240713, 25.17912879, 27.29020329],
       [25.09078418, 25.97758566, 26.12728014, 29.90812659, 26.59312965],
       [30.39801137, 29.48618596, 27.98941962, 34.0373789 , 25.44072955]])
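
Element-wise arithmetic also works between arrays of different but compatible shapes, a mechanism NumPy calls broadcasting. The scalar `5` above is the simplest case; the sketch below (illustrative arrays, not from the original notebook) combines a matrix with a row and with a column.

```python
import numpy as np

M = np.arange(6.0).reshape(2, 3)      # shape (2, 3)
row = np.array([10.0, 20.0, 30.0])    # shape (3,)
col = np.array([[1.0], [2.0]])        # shape (2, 1)

print(M + row)   # row is added to every row of M
print(M * col)   # each row of M is scaled by its own factor
```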

### 3.2 - Apply functions element-wise:

In [19]:
np.exp(A) # With numpy arrays, use the functions from numpy !

array([[2.46018579, 1.1240183 , 1.02350236, 2.55093303, 2.479517  ],
       [1.60106113, 1.67563929, 1.88353909, 1.01804173, 1.25107429],
       [1.00911145, 1.1016633 , 1.11793711, 1.59812695, 1.16982327],
       [1.67102826, 1.53743956, 1.33709958, 2.30287013, 1.04485739]])
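
To illustrate why the NumPy functions should be used: the functions of the standard `math` module expect a single scalar and fail on a whole array. A short sketch with an illustrative array `x`:

```python
import math
import numpy as np

x = np.array([0.0, 1.0, 2.0])

print(np.exp(x))          # works element-wise on the whole array

try:
    math.exp(x)           # math functions expect a single scalar
except TypeError as err:
    print("math.exp failed:", err)

# Equivalent pure-Python loop, much slower on large arrays
looped = np.array([math.exp(v) for v in x])
print(np.allclose(looped, np.exp(x)))   # True
```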

### 3.3 - Setting parts of arrays

In [20]:
A[:, 0] = 0.
print(A)

[[0.         0.11691004 0.02323043 0.93645919 0.90806378]
 [0.         0.51619476 0.6331525  0.01788091 0.22400261]
 [0.         0.09682113 0.11148512 0.46883229 0.15685269]
 [0.         0.43011841 0.29050278 0.83415623 0.04388041]]


In [21]:
# BONUS: Safe element-wise inverse with masks
cond = (A != 0)
A[cond] = 1./A[cond]
print(A)

[[ 0.          8.5535856  43.04698411  1.0678522   1.10124423]
 [ 0.          1.93725331  1.57939832 55.92557835  4.46423365]
 [ 0.         10.32832369  8.96980688  2.1329589   6.37540872]
 [ 0.          2.32494117  3.4423079   1.1988162  22.78921493]]
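
As a possible alternative to the mask assignment, `np.divide` accepts `where=` and `out=` arguments, so the division is only evaluated where the condition holds. This is a sketch on an illustrative array `M`; unlike the cell above, it builds a new array instead of modifying the input in place.

```python
import numpy as np

M = np.array([[0.0, 2.0, 4.0],
              [0.0, 0.5, 8.0]])

# Divide only where M is non-zero; other positions keep the value from `out`
inv = np.divide(1.0, M, out=np.zeros_like(M), where=(M != 0))
print(inv)
```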


## Step 4 - Attributes and methods of `np.ndarray` (see the [doc](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html#numpy.ndarray))

In [22]:
for i,v in enumerate([s for s in dir(A) if not s.startswith('__')]):
    print(f'{v:16}', end='')
    if (i+1) % 6 == 0 :print('')

T               all             any             argmax          argmin          argpartition    
argsort         astype          base            byteswap        choose          clip            
compress        conj            conjugate       copy            ctypes          cumprod         
cumsum          data            diagonal        dot             dtype           dump            
dumps           fill            flags           flat            flatten         getfield        
imag            item            itemset         itemsize        max             mean            
min             nbytes          ndim            newbyteorder    nonzero         partition       
prod            ptp             put             ravel           real            repeat          
reshape         resize          round           searchsorted    setfield        setflags        
shape           size            sort            squeeze         std             strides         
sum             swapaxes      

In [23]:

# Ex1: Get the mean along different axes

print(A)
print('Mean value',  A.mean())
print('Mean per column (axis=0)', A.mean(axis=0))
print('Mean per row (axis=1)', A.mean(axis=1))

[[ 0.          8.5535856  43.04698411  1.0678522   1.10124423]
 [ 0.          1.93725331  1.57939832 55.92557835  4.46423365]
 [ 0.         10.32832369  8.96980688  2.1329589   6.37540872]
 [ 0.          2.32494117  3.4423079   1.1988162  22.78921493]]
Mean value 8.761895407750362
Mean per column (axis=0) [ 0.          5.78602594 14.2596243  15.08130141  8.68252538]
Mean per row (axis=1) [10.75393323 12.78129273  5.56129964  5.95105604]
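
The `axis` argument names the axis that is collapsed by the reduction: `axis=0` runs down the rows and leaves one value per column, while `axis=1` runs across the columns and leaves one value per row. A small sketch with hand-checkable values (the array `M` is illustrative):

```python
import numpy as np

M = np.array([[1.0, 2.0, 3.0],
              [5.0, 6.0, 7.0]])      # 2 rows, 3 columns

print(M.mean(axis=0))   # [3. 4. 5.]  one value per column
print(M.mean(axis=1))   # [2. 6.]     one value per row

# keepdims=True keeps the reduced axis with length 1, handy for broadcasting
centered = M - M.mean(axis=1, keepdims=True)
print(centered)
```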


In [24]:

# Ex2: Convert a 2D array to 1D, keeping all elements

print(A)
print(A.shape)
A_flat = A.flatten()
print(A_flat, A_flat.shape)

[[ 0.          8.5535856  43.04698411  1.0678522   1.10124423]
 [ 0.          1.93725331  1.57939832 55.92557835  4.46423365]
 [ 0.         10.32832369  8.96980688  2.1329589   6.37540872]
 [ 0.          2.32494117  3.4423079   1.1988162  22.78921493]]
(4, 5)
[ 0.          8.5535856  43.04698411  1.0678522   1.10124423  0.
  1.93725331  1.57939832 55.92557835  4.46423365  0.         10.32832369
  8.96980688  2.1329589   6.37540872  0.          2.32494117  3.4423079
  1.1988162  22.78921493] (20,)
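
`flatten` is not the only way to obtain a 1D array. The sketch below (illustrative array `M`) compares it with `ravel` and `reshape(-1)`; the main difference is whether the data is copied.

```python
import numpy as np

M = np.arange(6).reshape(2, 3)

flat_copy = M.flatten()      # always a new array
flat_view = M.ravel()        # a view here, since M is contiguous in memory
also_flat = M.reshape(-1)    # -1 lets NumPy infer the length

print(np.shares_memory(M, flat_copy))   # False
print(np.shares_memory(M, flat_view))   # True

# Going back to 2D only needs a compatible shape
print(flat_copy.reshape(3, 2))
```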


### 4.1 - Remark: dot product

In [25]:
b = np.linspace(0, 10, 11)
c = b @ b
# before Python 3.5 (no @ operator):
# c = b.dot(b)
print(b)
print(c)

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]
385.0


### 4.2 -  For Matlab users

|     ` `       | Matlab | Numpy |
| ------------- | ------ | ----- |
| element wise  |  `.*`  |  `*`  |
|  dot product  |  `*`   |  `@`  |
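
The difference between the two operators is easiest to see on small matrices. A minimal sketch, with illustrative arrays `P`, `Q` and `v`:

```python
import numpy as np

P = np.array([[1, 2],
              [3, 4]])
Q = np.array([[0, 1],
              [1, 0]])
v = np.array([1, 1])

print(P * Q)    # element-wise: [[0 2] [3 0]]
print(P @ Q)    # matrix product: [[2 1] [4 3]]
print(P @ v)    # matrix-vector product: [3 7]
print(P.T)      # transpose, written P' in Matlab
```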

`numpy` arrays can also be sorted, even when they hold structured (record-like) data, provided the type of each field is explicitly declared through the `dtype`.
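
As a sketch of what this can look like, the example below builds a structured array and sorts it on two fields. The records and field names (`name`, `age`, `height`) are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical records: each element has a name, an age and a height
people_dtype = [("name", "U10"), ("age", "i4"), ("height", "f8")]
people = np.array(
    [("Alice", 31, 1.68), ("Bob", 25, 1.80), ("Chloe", 25, 1.62)],
    dtype=people_dtype,
)

# Sort by age first, then by height to break ties
by_age = np.sort(people, order=["age", "height"])
print(by_age["name"])    # ['Chloe' 'Bob' 'Alice']
```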

### 4.3 -  NumPy and SciPy sub-packages:

We already saw `numpy.random` to generate `numpy` arrays filled with random values. This submodule also provides functions related to distributions (Poisson, Gaussian, etc.) and permutations.
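
A brief sketch of these capabilities, written with the `Generator` interface returned by `np.random.default_rng` (a choice made here for reproducibility; the notebook itself uses the older `np.random.random` function). The seed and distribution parameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)    # seeding makes the run reproducible

uniform = rng.random((4, 5))            # same role as np.random.random([4, 5])
gauss = rng.normal(loc=0.0, scale=1.0, size=1000)
counts = rng.poisson(lam=3.0, size=1000)
shuffled = rng.permutation(10)          # the integers 0..9 in random order

print(uniform.shape, gauss.mean(), counts.mean())
print(shuffled)
```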

To perform linear algebra with dense matrices, we can use the submodule `numpy.linalg`. For instance, to compute the determinant of a random matrix, we use the function `det`:

In [26]:
A = np.random.random([5,5])
print(A)
np.linalg.det(A)

[[0.83764052 0.40958368 0.54489936 0.58521968 0.08475294]
 [0.92368861 0.7880218  0.11325541 0.31329903 0.26083491]
 [0.89783234 0.38385572 0.50943619 0.0023744  0.60152127]
 [0.93101479 0.52772295 0.35964418 0.77921721 0.32323606]
 [0.70334195 0.13389007 0.40467299 0.44377009 0.69673332]]


0.0012931102705524067

In [27]:
squared_subA = A[1:3, 1:3]
print(squared_subA)
np.linalg.inv(squared_subA)

[[0.7880218  0.11325541]
 [0.38385572 0.50943619]]


array([[ 1.42311309, -0.31637967],
       [-1.0723033 ,  2.2013437 ]])
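
A quick way to check an inverse is to multiply it back and compare with the identity. The sketch below does this on a small illustrative matrix `M`, and also shows `np.linalg.solve`, which is generally preferred over an explicit inverse when the goal is to solve a linear system.

```python
import numpy as np

M = np.array([[4.0, 1.0],
              [2.0, 3.0]])
rhs = np.array([1.0, 2.0])

M_inv = np.linalg.inv(M)
print(np.allclose(M @ M_inv, np.eye(2)))   # True, up to rounding error

# To solve M x = rhs, solve() avoids forming the inverse explicitly
x = np.linalg.solve(M, rhs)
print(x)                                   # approximately [0.1 0.6]
print(np.allclose(M @ x, rhs))             # True
```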

### 4.4 -  Introduction to Pandas: Python Data Analysis Library

Pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools for Python.

 - [Pandas tutorial](https://pandas.pydata.org/pandas-docs/stable/10min.html)
 - [Grenoble Python Working Session](https://github.com/iutzeler/Pres_Pandas/)
 - [Pandas for SQL Users](http://sergilehkyi.com/translating-sql-to-pandas/)
 - [Pandas Introduction Training HPC Python@UGA](https://gricad-gitlab.univ-grenoble-alpes.fr/python-uga/training-hpc/-/blob/master/ipynb/11_pandas.ipynb)

---
<img width="80px" src="../fidle/img/00-Fidle-logo-01.svg"></img>