# Introduction to NumPy

This material is inspired by different sources:

* https://github.com/SciTools/courses
* https://github.com/paris-saclay-cds/python-workshop/blob/master/Day_1_Scientific_Python/01-numpy-introduction.ipynb

In [51]:
#%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## 1. Creating NumPy arrays

We can create a NumPy array from scratch using the function `np.array`.

In [52]:
arr = np.array([0, 1, 2, 3])

In [53]:
arr.dtype

dtype('int32')

In [54]:
arr = np.array([0, 1, 2, 3.0])
arr.dtype

dtype('float64')

Sometimes we want an array with a particular structure: only zeros (`np.zeros`), only ones (`np.ones`), equally spaced values (`np.linspace`), logarithmically spaced values (`np.logspace`), etc.
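
As a minimal sketch of the constructors named above (the shapes and ranges are arbitrary values chosen for illustration):

```python
import numpy as np

zeros = np.zeros(3)               # three zeros, float64 by default
ones = np.ones((2, 2))            # 2 x 2 array of ones
lin = np.linspace(0.0, 1.0, 5)    # 5 equally spaced points from 0 to 1, endpoints included
log = np.logspace(0, 2, 3)        # 3 points from 10**0 to 10**2, evenly spaced on a log scale

print(zeros)
print(ones)
print(lin)
print(log)
```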

### Exercise

Try out some of these ways of creating NumPy arrays. See if you can:

* create a NumPy array from a list of integer numbers. Use the function [`np.array()`](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.array.html) and pass the Python list. You can refer to the example from the documentation.

In [5]:
# %load solutions/01_solutions.py
x = np.array([1, 2, 3.0, 4, 5])

In [6]:
x.dtype

dtype('float64')

In the documentation of [np.array](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.array.html), one parameter worth paying attention to is ``dtype``. This parameter forces the data type of the elements stored in the array.

In [9]:
arr.dtype

dtype('float64')

In [50]:
#>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>----------------------------------------------
a = np.arange(25).reshape(5,5)
# divby3 = a % 3 == 0
# a[divby3]
a[[0,2,3,3], [2,3,1,4]]

array([ 2, 13, 16, 19])

In [46]:
# n = np.ones(np.shape([2,3]))
# n = np.ones(shape=(2,3))
# n
#a = np.shape((4,10))  # (2,)
#a = np.shape([4,20])  # (2,)
a = np.array([-1, -3, 1, 4, -6, 9,3])
negative = a < 0
print(negative)
b = a[negative]
b
#<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<----------------------------------------------

[ True  True False False  True False False]


array([-1, -3, -6])

In [72]:
# %load solutions/02_solutions.py
arr = np.array([1, 2, 3, 4, 5], dtype=np.float64)
print(arr)
print(arr.dtype)


[1. 2. 3. 4. 5.]
float64


In [74]:
# %load solutions/03_solutions.py
arr_zeros = np.zeros((2, 3, 4))
print(arr_zeros)
print('Shape of the array\n', arr_zeros.shape)
print('Number of dimensions\n', arr_zeros.ndim)
arr_ones = np.ones((2, 3, 4))
print(arr_ones)
print('Shape of the array\n', arr_ones.shape)
print('Number of dimensions\n', arr_ones.ndim)


[[[0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]]

 [[0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]]]
Shape of the array
 (2, 3, 4)
Number of dimensions
 3
[[[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]

 [[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]]
Shape of the array
 (2, 3, 4)
Number of dimensions
 3


* a NumPy array filled with a constant value -- not 0 or 1. (Hint: this can be achieved using the last array you created, or you could use [np.empty](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.empty.html) and find a way of filling the array with a constant value),

In [84]:
# %load solutions/04_solutions.py
arr = np.ones(100) * 0.1
arr

array([0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1,
       0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1,
       0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1,
       0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1,
       0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1,
       0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1,
       0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1,
       0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])

In [85]:
x = np.empty(shape=(3, 2))
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        x[i, j] = 5

In [87]:
x

array([[5., 5.],
       [5., 5.],
       [5., 5.]])
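
The explicit double loop works, but it is probably not the shortest route. As a hedged alternative sketch, `np.full` and the `ndarray.fill` method (neither is used in the solutions above) produce the same constant array without a Python loop:

```python
import numpy as np

# np.full builds the array and fills it in a single call
filled = np.full((3, 2), 5.0)

# ndarray.fill overwrites every element of an existing array in place
buffer = np.empty((3, 2))
buffer.fill(5.0)

print(filled)
print(np.array_equal(filled, buffer))
```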

* a NumPy array of 8 elements with a range of values starting from 0 and a spacing of 3 between each element (Hint: check the function [np.arange](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.arange.html)).

In [90]:
# %load solutions/05_solutions.py
np.arange(0, 3 * 8, 3)
list(range(0, 3 * 8, 3))

[0, 3, 6, 9, 12, 15, 18, 21]

## 2. Manipulating NumPy arrays

### 2.1 Indexing

Note that NumPy arrays are zero-indexed:

In [97]:
data = np.random.randn(10000, 5)
print(data)

[[-0.46651296 -0.72119603 -0.72860743  0.21778252 -0.16485998]
 [ 0.90537976 -0.26444682  0.40316535  0.66360207  0.38278145]
 [ 0.39478413 -1.15142528 -0.07721078  0.57518142  1.82429728]
 ...
 [-0.37096991  0.00387659  2.59560027 -0.81605596 -1.4082213 ]
 [ 1.82216227 -0.78726464 -1.24825134  0.9700939  -0.59881287]
 [-0.92203334  0.25091886 -0.12528921  0.39948569  0.64727874]]


In [102]:
data[1,0]

0.9053797555505331

This means that the third element in the first row has an index of [0, 2]:

In [103]:
data[0, 2]

-0.7286074282163371

We can also assign a new value to an element:

In [104]:
data[0, 2] = 100.
print(data[0, 2])

100.0


NumPy (and Python in general) checks the bounds of the array:

In [105]:
print(data.shape)
data[60, 10]

(10000, 5)


IndexError: index 10 is out of bounds for axis 1 with size 5
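
If an out-of-range index is a possibility in your own code, the error can be caught like any other Python exception. A minimal sketch, using a small hypothetical array rather than `data`:

```python
import numpy as np

small = np.arange(6).reshape(2, 3)   # hypothetical 2 x 3 array

message = None
try:
    small[1, 5]                      # column 5 does not exist (valid columns: 0, 1, 2)
except IndexError as err:
    message = str(err)
    print("Caught:", message)
```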

Finally, we can ask for several elements at once:

In [106]:
data[0, [0, 3]]

array([-0.46651296,  0.21778252])

You can even pass a negative index, which counts from the end of the array.

In [107]:
data[-1, -1]

0.6472787353965579

### 2.2 Slices

We can use the same slicing syntax as for Python lists or Pandas dataframes to get elements along one of the axes.

In [108]:
data[0, 0:2]

array([-0.46651296, -0.72119603])

Note that the returned array does not include the third column (with index 2).

You can skip the first or last index (which means, take the values from the beginning or to the end):

In [109]:
data[0, :2]

array([-0.46651296, -0.72119603])

If you omit both indices in the slice leaving out only the colon (:), you will get all columns of this row:

In [110]:
data[0, :]

array([ -0.46651296,  -0.72119603, 100.        ,   0.21778252,
        -0.16485998])
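
The same slice syntax applies to each axis independently. A short sketch on a small deterministic array (the name `grid` is introduced only for this illustration):

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

first_col = grid[:, 0]       # every row, first column
last_two = grid[1, -2:]      # second row, last two columns
every_other = grid[0, ::2]   # first row, every second column

print(first_col)
print(last_two)
print(every_other)
```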

### 2.3 Filtering data

In [111]:
data

array([[-4.66512958e-01, -7.21196029e-01,  1.00000000e+02,
         2.17782520e-01, -1.64859978e-01],
       [ 9.05379756e-01, -2.64446815e-01,  4.03165353e-01,
         6.63602071e-01,  3.82781452e-01],
       [ 3.94784128e-01, -1.15142528e+00, -7.72107773e-02,
         5.75181424e-01,  1.82429728e+00],
       ...,
       [-3.70969913e-01,  3.87658755e-03,  2.59560027e+00,
        -8.16055962e-01, -1.40822130e+00],
       [ 1.82216227e+00, -7.87264640e-01, -1.24825134e+00,
         9.70093903e-01, -5.98812866e-01],
       [-9.22033344e-01,  2.50918862e-01, -1.25289206e-01,
         3.99485688e-01,  6.47278735e-01]])

We can produce a boolean array when using comparison operators.

In [112]:
data > 0

array([[False, False,  True,  True, False],
       [ True, False,  True,  True,  True],
       [ True, False, False,  True,  True],
       ...,
       [False,  True,  True, False, False],
       [ True, False, False,  True, False],
       [False,  True, False,  True,  True]])

This mask can be used to select some specific data.

In [113]:
data[data > 0]

array([100.        ,   0.21778252,   0.90537976, ...,   0.25091886,
         0.39948569,   0.64727874])

It can also be used to assign new values:

In [115]:
data[data > 0] = np.inf
print(data[data > 0])

[inf inf inf ... inf inf inf]
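
To make the two uses of a mask easier to follow, here is a small sketch on a hand-written array (the values are arbitrary):

```python
import numpy as np

values = np.array([-1.5, 0.2, 3.0, -0.7, 4.1])

mask = values > 0              # boolean array, one entry per element
positives = values[mask]       # selection returns a new array with the matching elements

clipped = values.copy()        # work on a copy so that `values` stays untouched
clipped[clipped < 0] = 0.0     # assignment through a mask modifies the array in place

print(mask)
print(positives)
print(clipped)
```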


### 2.4 Quiz

Answer the following questions:

In [116]:
data = np.random.randn(20, 20)

* Print the element in the $1^{st}$ row and $10^{th}$ column of the data.

In [None]:
# %load solutions/08_solutions.py
data[0, 9]


* Print the elements in the $3^{rd}$ row and the $3^{rd}$ and $15^{th}$ columns.

In [122]:
# %load solutions/09_solutions.py
data[2, [2, 14]]

array([-1.56442531,  0.26495772])

* Print the elements in the $4^{th}$ row and the columns from the $3^{rd}$ to the $15^{th}$.

In [123]:
# %load solutions/10_solutions.py
data[3, 2:15]

array([-0.53298368,  1.33777875, -0.54826008,  0.14256053, -0.36834811,
        0.24544638,  0.280244  ,  1.07621361,  0.12355789, -1.82682985,
       -0.48602711,  0.74422247, -0.07844387])

* Print all the elements in the $15^{th}$ column whose value is above 0.

In [130]:
# %load solutions/11_solutions.py
mask = data[:, 14] > 0
data[mask, 14]

array([0.26495772, 0.64540043, 0.19453746, 1.33949516, 0.98915383])

In [141]:
np.allclose([1, 2, 3], [1, 2, 3])

True

In [142]:
1 - 0.5 + 1

1.5

In [143]:
(1 + 1 + 0.5 - 1) == (1 - 0.5 + 1)

True

## 3. Numerical analysis

Vectorizing code is the key to writing efficient numerical calculations with Python/NumPy. This means that as much of a program as possible should be formulated in terms of matrix and vector operations.
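
As a rough sketch of what this means in practice, the two computations below give the same result; the first iterates in Python, the second expresses the whole calculation as array operations. The formula `x ** 2 + 1` is an arbitrary example.

```python
import numpy as np

x = np.arange(1_000, dtype=np.float64)

# Explicit Python loop: one element at a time
result_loop = np.empty_like(x)
for i in range(x.size):
    result_loop[i] = x[i] ** 2 + 1.0

# Vectorized form: the same formula applied to the whole array at once
result_vec = x ** 2 + 1.0

print(np.allclose(result_loop, result_vec))
```

On large arrays the vectorized form is typically much faster, since the loop runs inside NumPy's compiled code rather than in the Python interpreter.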

### 3.1 Scalar-array operations

We can use the usual arithmetic operators to multiply, add, subtract, and divide arrays with scalar numbers.

In [144]:
v1 = np.arange(0, 5)
v1

array([0, 1, 2, 3, 4])

In [145]:
v1 * 2

array([0, 2, 4, 6, 8])

In [146]:
v1 + 2

array([2, 3, 4, 5, 6])

In [149]:
np.sin([1, 2, 3, 1.57])  # np.log(A), np.arctan(A),...

array([0.84147098, 0.90929743, 0.14112001, 0.99999968])

### 3.2 Element-wise array-array operations

When we add, subtract, multiply and divide arrays with each other, the default behaviour is **element-wise** operations:

In [151]:
A = np.array([[1, 2], [3, 4]])

In [152]:
A * A  # element-wise multiplication

array([[ 1,  4],
       [ 9, 16]])

In [153]:
v1 * v1

array([ 0,  1,  4,  9, 16])
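
For contrast, the sketch below puts the element-wise product next to the matrix product. The `@` operator is not covered in the text above and is shown here only to make the distinction concrete.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

elementwise = A * A    # multiplies matching entries
matrix_prod = A @ A    # matrix multiplication (rows times columns)

print(elementwise)
print(matrix_prod)
```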

* create a 3-dimensional NumPy array filled entirely with zeros or with ones. You can check the documentation of [np.zeros](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.zeros.html) and [np.ones](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ones.html).

### 3.3 Calculations

Often it is useful to store datasets in NumPy arrays. NumPy provides a number of functions to calculate statistics of datasets in arrays. 

In [158]:
a = np.random.random(40)
np.random.randint(1, 6, 10)

array([3, 1, 3, 1, 2, 4, 5, 1, 3, 2])

Several frequently used operations are available:

In [157]:
print('Mean value is', np.mean(a))
print('Median value is', np.median(a))
print('Std is', np.std(a))
print('Variance is', np.var(a))
print('Min is', a.min())
print('Element of minimum value is', a.argmin())
print('Max is', a.max())
print('Sum is', np.sum(a))
print('Prod', np.prod(a))
print('Cumsum is', np.cumsum(a)[-1])
print('CumProd of 5 first elements is', np.cumprod(a)[4])
print('Unique values in this array are:', np.unique(np.random.randint(1, 6, 10)))
print('85% Percentile value is: ', np.percentile(a, 85))

Mean value is 0.6426818587004184
Median value is 0.6601471787973205
Std is 0.26155473542378926
Variance is 0.06841087962260839
Min is 0.0905546242341384
Element of minimum value is 37
Max is 0.9949429576911121
Sum is 25.707274348016735
Prod 1.7524497011819436e-10
Cumsum is 25.707274348016735
CumProd of 5 first elements is 0.040612939178565594
Unique values in this array are: [1 2 3 4 5]
85% Percentile value is:  0.9113052356474332


In [163]:
a = np.random.random(40)
print(a.argsort())
a.sort() #sorts in place!
print(a.argsort())

[15 37  1 26  4 25 17 30 31 39 33  8 18 28 27 20 14 32  7 36 21 24  5 13
 19  0  9  2 38 16 12  6 11 10 35 22  3 29 23 34]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39]


In [164]:
np.sort(a)
# this returns a new copy

array([0.01132955, 0.01330643, 0.01757898, 0.01978456, 0.0313841 ,
       0.0444446 , 0.05655343, 0.08555598, 0.12569939, 0.17659714,
       0.18229187, 0.23210642, 0.26540243, 0.28520816, 0.29460951,
       0.34151789, 0.35067948, 0.38122512, 0.43415223, 0.4634177 ,
       0.55915899, 0.56063031, 0.56836893, 0.56945679, 0.58783835,
       0.58826612, 0.5926974 , 0.63931482, 0.67427863, 0.67592641,
       0.67671295, 0.71886099, 0.73111739, 0.73882173, 0.78425008,
       0.91195533, 0.93785059, 0.94932187, 0.9518457 , 0.97586824])
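
The difference between the in-place method and the copying function is easier to see on a tiny array. A minimal sketch with hand-picked values:

```python
import numpy as np

original = np.array([3, 1, 2])

order = original.argsort()        # indices that would sort the array
sorted_copy = np.sort(original)   # new sorted array; `original` is unchanged
before = original.copy()          # snapshot taken before sorting in place

original.sort()                   # sorts `original` itself and returns None

print(order)
print(sorted_copy)
print(before, original)
```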

#### Calculations with higher-dimensional data

When functions such as `min`, `max`, etc., are applied to a multidimensional array, it is sometimes useful to apply the calculation to the entire array, and sometimes only on a row or column basis. Using the `axis` argument we can specify how these functions should behave:

In [165]:
m = np.random.rand(3, 3)
m

array([[0.00469701, 0.46012379, 0.87026313],
       [0.94492203, 0.10399905, 0.97094434],
       [0.3043054 , 0.92538596, 0.74041024]])

In [166]:
# global max
m.max()

0.9709443430589967

In [167]:
# max in each column
m.max(axis=0)

array([0.94492203, 0.92538596, 0.97094434])

In [168]:
# max in each row
m.max(axis=1)

array([0.87026313, 0.97094434, 0.92538596])

Many other functions and methods in the `array` and `matrix` classes accept the same (optional) `axis` keyword argument.
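
As a small sketch of the same keyword on other reductions (the array `table` is a made-up example): `axis=0` collapses the rows and yields one value per column, while `axis=1` collapses the columns and yields one value per row.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

total = table.sum()               # over the whole array
col_sums = table.sum(axis=0)      # one value per column
row_sums = table.sum(axis=1)      # one value per row
row_means = table.mean(axis=1)    # mean of each row

print(total, col_sums, row_sums, row_means)
```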

## 4. Data reshaping and merging

* How could you change the shape of the 8-element array you created previously to have shape (2, 2, 2)? Hint: this can be done without creating a new array.

In [169]:
arr = np.arange(8)

In [170]:
arr.reshape(2,2,2)

array([[[0, 1],
        [2, 3]],

       [[4, 5],
        [6, 7]]])

In [172]:
# %load solutions/07_solutions.py
arr = np.random.random(8)
print('Shape of the array', arr.shape)
arr_reshaped = arr.reshape((2, 2, 2))
print('Shape of the array', arr_reshaped.shape)


Shape of the array (8,)
Shape of the array (2, 2, 2)
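
Regarding the hint that no new array is needed: for a contiguous array such as the one here, `reshape` normally returns a view on the same memory rather than a copy. A minimal sketch that checks this with `np.shares_memory` (the names `flat` and `cube` are introduced only for this illustration):

```python
import numpy as np

flat = np.arange(8)
cube = flat.reshape(2, 2, 2)

shares = np.shares_memory(flat, cube)   # do the two arrays use the same buffer?
cube[0, 0, 0] = 99                      # write through the reshaped array

print(shares)
print(flat[0])                          # the change is visible in `flat` as well
```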


In [181]:
# case where each image is 8 x 8
# case where we have 1000 images
print(arr.reshape(1, -1))
print(arr.reshape(8))

[[0.81031257 0.57825823 0.34362185 0.69889831 0.31065321 0.31858178
  0.44262001 0.09898785]]
[0.81031257 0.57825823 0.34362185 0.69889831 0.31065321 0.31858178
 0.44262001 0.09898785]
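
In `reshape`, a dimension given as `-1` is inferred from the total number of elements. The sketch below uses a hypothetical 8-element array `arr8`; at most one dimension may be `-1`, and the remaining dimensions must divide the size exactly.

```python
import numpy as np

arr8 = np.arange(8)

two_rows = arr8.reshape(2, -1)    # -1 is inferred as 4
column = arr8.reshape(-1, 1)      # -1 is inferred as 8

error_message = None
try:
    arr8.reshape(3, -1)           # 8 elements cannot be split into 3 equal rows
except ValueError as err:
    error_message = str(err)

print(two_rows.shape, column.shape)
print(error_message)
```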


* Could you reshape the same 8-element array to a column vector? Do the same to get a row vector. You can use `np.reshape` or `np.newaxis`.

In [188]:
# %load solutions/22_solutions.py
column_arr = arr.reshape(-1, 1)
print(column_arr)
column_arr = arr[:, np.newaxis]  # new axis in second position gives shape (8, 1)
print(column_arr)
row_arr = arr.reshape(1, -1)
print(row_arr)
row_arr = arr[np.newaxis, :]  # new axis in first position gives shape (1, 8)
print(row_arr)

[[0.81031257]
 [0.57825823]
 [0.34362185]
 [0.69889831]
 [0.31065321]
 [0.31858178]
 [0.44262001]
 [0.09898785]]
[[0.81031257]
 [0.57825823]
 [0.34362185]
 [0.69889831]
 [0.31065321]
 [0.31858178]
 [0.44262001]
 [0.09898785]]
[[0.81031257 0.57825823 0.34362185 0.69889831 0.31065321 0.31858178
  0.44262001 0.09898785]]
[[0.81031257 0.57825823 0.34362185 0.69889831 0.31065321 0.31858178
  0.44262001 0.09898785]]


* Stack two 1D NumPy arrays of size 10 vertically. Then, stack them horizontally. You can use the functions [np.hstack](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.hstack.html) and [np.vstack](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.vstack.html). Repeat those two operations using the function [np.concatenate](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.concatenate.html) with two 2D NumPy arrays of size 5 x 2.

In [193]:
# %load solutions/20_solutions.py
X = np.random.randn(10)
Y = np.random.randn(10)
print(np.hstack((X, Y)).shape)
print(np.vstack((X, Y)).shape)


(20,)
(2, 10)


In [195]:
# %load solutions/21_solutions.py
X = np.random.randn(5, 2)
Y = np.random.randn(5, 2)
print(np.concatenate((X, Y), axis=0).shape)
print(np.concatenate((X, Y), axis=1).shape)


(10, 2)
(5, 4)
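
For 2D inputs, the stacking helpers and `np.concatenate` should agree: `np.vstack` corresponds to `axis=0` and `np.hstack` to `axis=1`. A short sketch that checks this on small deterministic arrays (`P` and `Q` are names made up for this example):

```python
import numpy as np

P = np.arange(10).reshape(5, 2)
Q = np.arange(10, 20).reshape(5, 2)

vertical = np.vstack((P, Q))      # rows of Q appended below P
horizontal = np.hstack((P, Q))    # columns of Q appended to the right of P

same_vertical = np.array_equal(vertical, np.concatenate((P, Q), axis=0))
same_horizontal = np.array_equal(horizontal, np.concatenate((P, Q), axis=1))

print(vertical.shape, horizontal.shape)
print(same_vertical, same_horizontal)
```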
