In this notebook, we will cover the basics of NumPy.

NumPy is a Python library used for scientific computing and data analysis. It supports multidimensional arrays and matrices, along with a large collection of mathematical functions that operate on these arrays. NumPy is widely used in the scientific community, particularly in fields such as physics, engineering, and data science.

We will cover:
* The ndarray
* Main attributes and methods
* Indexing
* Methods of the ndarray class
* Examples from statistics and linear algebra

To go further, you can consult the official documentation.
* https://numpy.org/doc/stable/


# 1. NumPy: ndarray

In [1]:
import numpy as np

## 1.1 **ndarray** array generators
- default generator: **np.array()**
- 1D generators: **np.linspace()** and **np.arange()**
- ND generators: **np.zeros()**, **np.ones()**, **np.random.randn()** (these are the most useful)

In [2]:
A = np.array([1, 2, 3]) # default generator: converts a list (or another sequence) into an ndarray
A = np.zeros((2, 3)) # array of 0s with dimensions 2x3
B = np.ones((2, 3)) # array of 1s with dimensions 2x3
C = np.random.randn(2, 3) # random array (normal distribution) with dimensions 2x3
D = np.random.rand(2, 3) # random array (uniform distribution)
 
E = np.random.randint(0, 10, [2, 3]) # array of random integers from 0 to 9 (10 is excluded), with dimensions 2x3

In [3]:
A = np.ones((2, 3), dtype=np.float16) # set the element type, and therefore the memory used per element (2 bytes here)
B = np.eye(4, dtype=bool) # identity matrix with boolean elements (the built-in bool is used because the np.bool alias is not available in every NumPy version)

In [4]:
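A = np.linspace(1, 10, 10) # 10 evenly spaced values from 1 to 10 (end point included)
B = np.arange(0, 10, 1) # values from 0 to 9 in steps of 1 (end point excluded); a step of 10 would return only [0]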
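
The two 1D generators differ in how the range is specified: `np.linspace` takes a number of points and includes the end point by default, while `np.arange` takes a step and excludes the end point. A minimal sketch (the variable names `lin` and `steps` are illustrative only):

```python
import numpy as np

# linspace: 5 evenly spaced points from 0 to 1, end point included
lin = np.linspace(0, 1, 5)

# arange: values from 0 up to (but not including) 10, in steps of 2
steps = np.arange(0, 10, 2)

print(lin.tolist())    # [0.0, 0.25, 0.5, 0.75, 1.0]
print(steps.tolist())  # [0, 2, 4, 6, 8]
```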

## 1.2 Important attributes
- size
- shape

In [5]:
A = np.zeros((2, 3)) # creating an array of shape (2, 3)
 
print(A.size) # the number of elements in array A
print(A.shape) # the dimensions of array A (as a tuple)
 
print(type(A.shape)) # here is the proof that the shape is a tuple
 
print(A.shape[0]) # the number of elements in the first dimension of A

6
(2, 3)
<class 'tuple'>
2


## 1.3 Important methods
- **reshape()**: changes the shape of an array (the total number of elements must stay the same)
- **ravel()**: flattens an array into a single dimension
- **squeeze()**: removes every dimension whose size is 1
- **concatenate()**: joins 2 arrays along one axis (**np.vstack()** and **np.hstack()** are shortcuts for axis 0 and axis 1)

In [None]:
A = np.zeros((2, 3)) # creating an array of shape (2, 3)
 
A = A.reshape((3, 2)) # reshape array A (3 rows, 2 columns)
A.ravel() # flattens array A (one dimension only); returns a new array object, A itself is unchanged
A.squeeze() # removes dimensions of size 1 (no effect here, since A has shape (3, 2))
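
Since the array above has no dimension of size 1, `squeeze()` leaves it unchanged. The sketch below builds an array with an explicit singleton dimension to make the effect visible (the name `col` is illustrative):

```python
import numpy as np

A = np.zeros((2, 3))

col = A.reshape((6, 1))           # column shape with a singleton dimension
print(col.shape)                  # (6, 1)
print(col.squeeze().shape)        # (6,)  the size-1 dimension is removed
print(A.ravel().shape)            # (6,)
print(A.reshape((3, -1)).shape)   # (3, 2)  -1 lets NumPy infer that dimension
```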

In [None]:
A = np.zeros((2, 3)) # creating an array of shape (2, 3)
B = np.ones((2, 3)) # creating an array of shape (2, 3)

np.concatenate((A, B), axis=0) # axis 0: equivalent of np.vstack((A, B))

In [None]:
np.concatenate((A, B), axis=1) # axis 1: equivalent of np.hstack((A, B))

---

# 2. NumPy: Slicing and Indexing

## 2.1 Indexing and Slicing
Indexing and slicing use the same syntax as Python lists, with one index or slice per dimension, separated by commas.

In [None]:
A = np.array([[1, 2, 3], [4, 5, 6]])
print(A)

In [None]:
# To access row 0, column 1
A[0, 1] 

In [None]:
# To select the block made of rows 0-1 and columns 0-1
A[0:2, 0:2]

In [None]:
A[0:2, 0:2] = 10
print(A)
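
One difference from Python lists is worth checking: slicing a list makes a copy, whereas basic slicing of an ndarray returns a view onto the same data. A short sketch (the names `view` and `independent` are illustrative):

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])

view = A[0:2, 0:2]                  # basic slicing returns a view on A's data
view[0, 0] = 99                     # writing through the view modifies A
print(A[0, 0])                      # 99

independent = A[0:2, 0:2].copy()    # explicit copy: no longer linked to A
independent[0, 0] = -1
print(A[0, 0])                      # 99 (unchanged by the copy)
```

Calling `.copy()` is the usual way to get a sub-array that can be modified safely.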

## 2.2 Boolean Indexing

In [None]:
A = np.array([[1, 2, 3], [4, 5, 6]])
 
print(A<5) # boolean mask
 
print(A[A < 5]) # subset filtered by boolean mask
 
A[A<5] = 4 # assigns 4 to every element selected by the mask
print(A)
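
Masks can also be combined. The sketch below assumes a condition with two bounds; note that NumPy uses the element-wise operators `&` and `|` (not `and` / `or`), and each comparison needs its own parentheses:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])

mask = (A > 1) & (A < 5)            # element-wise AND of two boolean masks
selected = A[mask]                  # 1D array of the matching values
replaced = np.where(mask, A, 0)     # keep matching values, put 0 elsewhere

print(selected.tolist())   # [2, 3, 4]
print(replaced.tolist())   # [[0, 2, 3], [4, 0, 0]]
```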

---

# 3. NumPy: Mathematics

## 3.1 Basic (most useful) methods of the ndarray class

In [None]:
A = np.array([[1, 2, 3], [4, 5, 6]])

print(A.sum()) # sum all array elements
print(A.sum(axis=0)) # sum along axis 0 (down the rows): one sum per column
print(A.sum(axis=1)) # sum along axis 1 (across the columns): one sum per row
print(A.cumsum(axis=0)) # cumulative sum along axis 0
 
print(A.prod()) # perform the product
print(A.cumprod()) # perform the cumulative product
 
print(A.min()) # find the minimum of the array
print(A.max()) # find the maximum of the array
 
print(A.mean()) # calculate the average
print(A.std()) # calculate the standard deviation
print(A.var()) # calculate the variance
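
The `axis` argument names the dimension that is collapsed by the reduction. A small sketch of the resulting values and shapes (variable names are illustrative):

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)

col_sums = A.sum(axis=0)                # axis 0 collapsed -> shape (3,)
row_sums = A.sum(axis=1)                # axis 1 collapsed -> shape (2,)
kept = A.sum(axis=1, keepdims=True)     # collapsed axis kept with size 1 -> shape (2, 1)

print(col_sums.tolist())   # [5, 7, 9]
print(row_sums.tolist())   # [6, 15]
print(kept.shape)          # (2, 1)
```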

An important method: **argsort()**

In [None]:
A = np.random.randint(0, 10, [5, 5]) # random array
print(A)

In [None]:
print(A.argsort()) # returns the indexes to sort each row of the array

In [None]:
print(A[:,0].argsort()) # returns indexes to sort column 0 of A

In [None]:
A = A[A[:,0].argsort(), :] # reorders the rows of the array so that column 0 is sorted in ascending order
A
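
Because the array above is random, the effect is easier to follow on a small fixed example. A minimal sketch with made-up values:

```python
import numpy as np

A = np.array([[3, 10],
              [1, 30],
              [2, 20]])

order = A[:, 0].argsort()       # indices that sort column 0
sorted_rows = A[order]          # whole rows reordered by column 0
descending = A[order[::-1]]     # same ordering, reversed

print(order.tolist())           # [1, 2, 0]
print(sorted_rows.tolist())     # [[1, 30], [2, 20], [3, 10]]
print(descending.tolist())      # [[3, 10], [2, 20], [1, 30]]
```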

## 3.2 NumPy Statistics
Pearson Correlation:

In [None]:
B = np.random.randn(3, 3) # random numbers 3x3

# returns the correlation matrix of B (each row is treated as a variable)
print(np.corrcoef(B))

In [None]:
# returns the correlation matrix between columns 0 and 1 of B
print(np.corrcoef(B[:,0], B[:, 1]))
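
By default `np.corrcoef` treats each row as a variable; passing `rowvar=False` treats each column as a variable instead. The sketch below checks that the column-wise matrix agrees with the pairwise call (it uses `np.random.default_rng` with an arbitrary seed for reproducibility, which is a choice made here and not part of the notebook):

```python
import numpy as np

rng = np.random.default_rng(0)             # seeded generator (arbitrary seed)
B = rng.standard_normal((3, 3))

by_rows = np.corrcoef(B)                   # rows are the variables
by_cols = np.corrcoef(B, rowvar=False)     # columns are the variables
pair = np.corrcoef(B[:, 0], B[:, 1])       # 2x2 matrix for columns 0 and 1

print(by_rows.shape, by_cols.shape, pair.shape)    # (3, 3) (3, 3) (2, 2)
print(np.isclose(by_cols[0, 1], pair[0, 1]))       # True
```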

np.unique() :

In [None]:
np.random.seed(0)
A = np.random.randint(0, 10, [5,5])
A

In [None]:
np.unique(A)

In [None]:
values, counts = np.unique(A, return_counts=True)

for i, j in zip(values[counts.argsort()], counts[counts.argsort()]):
    print(f'value {i} appears {j} time(s)')

Statistical calculations in the presence of missing data (NaN)

In [None]:
A = np.random.randn(5, 5)
A[0, 2] = np.nan # insert a NaN into the matrix A
 
print('ratio NaN/size:', (np.isnan(A).sum()/A.size)) # computes the proportion of NaN in A
 
print('average without NaN:', np.nanmean(A)) # compute the average of A ignoring the NaN
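
To see why the `nan*` functions are needed, compare them with the regular reductions on a small array with a single NaN. A minimal sketch with made-up values:

```python
import numpy as np

A = np.array([[1.0, 2.0, np.nan],
              [4.0, 5.0, 6.0]])

plain_mean = A.mean()               # NaN propagates through the regular mean
nan_mean = np.nanmean(A)            # mean over the 5 valid values
nan_sum = np.nansum(A)              # NaN counted as 0 in the sum
col_means = np.nanmean(A, axis=0)   # per-column means, ignoring NaN

print(np.isnan(plain_mean))   # True
print(nan_sum)                # 18.0
print(col_means.tolist())     # [2.5, 3.5, 6.0]
```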

## 3.3 Linear Algebra

In [None]:
A = np.ones((2,3))
B = np.ones((3,3))

print(A.T) # transpose of matrix A (this is an attribute of ndarray)

In [None]:
print(A.dot(B)) # matrix product A.B

In [None]:
A = np.random.randint(0, 10, [3, 3])
 
print('det=', np.linalg.det(A)) # compute the determinant of A
print('inv A:\n', np.linalg.inv(A)) # compute the inverse of A (raises LinAlgError if the random matrix happens to be singular)
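
A random integer matrix can occasionally be singular, in which case `np.linalg.inv` raises `np.linalg.LinAlgError`. The sketch below uses two hand-picked matrices (not from the notebook) to check an inverse and to handle the singular case:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
A_inv = np.linalg.inv(A)
inverse_ok = np.allclose(A @ A_inv, np.eye(2))   # A times its inverse is the identity
print(inverse_ok)                                # True

S = np.array([[1.0, 2.0],
              [2.0, 4.0]])                       # second row = 2 x first row: singular
try:
    np.linalg.inv(S)
    singular = False
except np.linalg.LinAlgError:
    singular = True
print(singular)                                  # True
```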

In [None]:
val, vec = np.linalg.eig(A)
print('eigenvalues:\n', val) # eigenvalues of A
print('eigenvectors:\n', vec) # eigenvectors of A (column i is paired with eigenvalue i)
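
The result can be checked against the definition A·v = λ·v. In the sketch below (a hand-picked symmetric matrix, not from the notebook), column `i` of `vec` is paired with `val[i]`, so multiplying `vec` by `val` scales each column by its eigenvalue:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

val, vec = np.linalg.eig(A)

# A @ vec applies A to every eigenvector (column);
# vec * val scales column i by eigenvalue val[i] through broadcasting.
definition_holds = np.allclose(A @ vec, vec * val)
print(definition_holds)              # True
print(sorted(val.round(6).tolist())) # [1.0, 3.0]
```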