# Handling data with numpy

## What you'll learn in this course 🧐🧐

Now that you're familiar with Numpy, we'll look more closely at some of its must-knows, in particular how to manipulate numpy arrays. By the end of this course, you should be able to:

* Understand the usefulness of Numpy in tensor manipulation
* Create and manipulate tensors with Numpy
* Know how to iterate over tensors
* Know how to create Numpy masks
* Understand what the *shape* of a tensor is

### Import libraries

In [86]:
import numpy as np

### Initialize numpy arrays with pre-defined values

#### Create matrices of 0s or 1s

It is often useful to create matrices filled only with 0s or 1s. To do this, you can use `np.zeros()` or `np.ones()`.

In [87]:
# Initialization to 0 (matrix 3x4)
display(np.zeros((3,4)))

# Initialization to 1 (matrix 5x4)
display(np.ones((5,4)))

# Third order tensor (can be seen as 2 matrices 3x4)
np.ones((2,3,4))

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

array([[[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]])
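
As a small complement to the cell above, the sketch below shows two details that the outputs only hint at: the values are floats by default, and the tuple you pass becomes the shape of the result. The `dtype=int` argument and the variable names are illustrative choices, not part of the original notebook.

```python
import numpy as np

# By default, np.zeros and np.ones produce float64 values
zeros_float = np.zeros((3, 4))

# The optional dtype argument requests another element type, here integers
ones_int = np.ones((5, 4), dtype=int)

# The tuple passed as argument is the shape of the resulting array
tensor = np.ones((2, 3, 4))

print(zeros_float.dtype, ones_int.dtype, tensor.shape)
```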

#### Create a list of regularly spaced values
We have seen that Python's `range()` function generates sequences of regularly spaced integers. In the same way, the `np.arange()` function generates arrays of regularly spaced values, which may be decimal numbers:

In [88]:
# Generates an array of numbers from 3.2 (included) to 4.8 (excluded) with a step of 0.2.
np.arange(3.2, 4.8, 0.2)

array([3.2, 3.4, 3.6, 3.8, 4. , 4.2, 4.4, 4.6])
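
The following sketch compares `np.arange()` with integer and decimal steps. With a decimal step, rounding errors can make the number of generated values hard to predict, so `np.linspace()` (which takes the number of values and includes the end point) is often suggested as an alternative. `np.linspace()` is not covered in this course and is shown here only as an assumption about what you may prefer in that case.

```python
import numpy as np

# With integers, np.arange mirrors range() but returns a numpy array
ints = np.arange(0, 10, 2)

# With a decimal step, the stop value (4.8) is excluded
steps = np.arange(3.2, 4.8, 0.2)

# np.linspace takes the number of values instead of a step,
# and includes the end point (4.6 here)
spaced = np.linspace(3.2, 4.6, 8)

print(ints)
print(np.allclose(steps, spaced))
```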

### Initialize an array from a user-defined function

In [89]:
def f(x, y):
    return 10*x+y

# Creating an array from the function from above
np.fromfunction(f,(5,4),dtype=int)

array([[ 0,  1,  2,  3],
       [10, 11, 12, 13],
       [20, 21, 22, 23],
       [30, 31, 32, 33],
       [40, 41, 42, 43]])
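
To make the mechanism more concrete: `np.fromfunction()` does not call `f` once per element. It calls it a single time, passing arrays that contain the row indices and the column indices. The sketch below rebuilds the same result by hand with `np.indices()`; the names `rows`, `cols`, `by_hand` and `built` are illustrative.

```python
import numpy as np

def f(x, y):
    return 10 * x + y

# Arrays holding, for each position, its row index and its column index
rows, cols = np.indices((5, 4))

# Applying f to these index arrays computes every element at once
by_hand = f(rows, cols)

# np.fromfunction performs the same computation
built = np.fromfunction(f, (5, 4), dtype=int)

print(np.array_equal(by_hand, built))
```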

## Accessing items in a multidimensional array

In [90]:
# Creation of a matrix
c = np.array([[  0,  1,  2],
               [ 10, 12, 13],
               [100,101,102],
               [110,112,113]])

In [91]:
c[3, 2] # Access the value contained in the 4th row, 3rd column

113

In [92]:
# Creation of a third order tensor
c = np.array( [[[  0,  1,  2],
                 [ 10, 12, 13]],
                [[100,101,102],
                 [110,112,113]]])

In [93]:
c[0, 1, 2] # We access the value contained in the 2nd row, 3rd column of the first matrix

13
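
The indexing rules used above can be summarized in a short sketch: one index per dimension, negative indices counting from the end, and a sub-array returned when fewer indices than dimensions are given. The variable names are illustrative.

```python
import numpy as np

c = np.array([[[  0,   1,   2],
               [ 10,  12,  13]],
              [[100, 101, 102],
               [110, 112, 113]]])

# One index per dimension: first matrix, 2nd row, 3rd column
value = c[0, 1, 2]

# Chained indexing reaches the same element through intermediate sub-arrays
chained = c[0][1][2]

# Negative indices count from the end of each dimension
last = c[-1, -1, -1]

# With fewer indices than dimensions, a sub-array is returned (here the second matrix)
second_matrix = c[1]

print(value, chained, last, second_matrix.shape)
```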

### Iterate over numpy arrays

In [94]:
# Create matrix
c = np.array([[  0,  1,  2],
               [ 10, 12, 13],
               [100,101,102],
               [110,112,113]])

#### Iterate on arrays

You can use the `.flat` attribute of an array to iterate easily over all of its elements:

In [95]:
# Iterate on the elements of the matrix

# Not recommended (may be slow):
for row in c:
    for e in row:
        print(e)

print()
print('----------')
print()

# Better:
for e in c.flat:
    print(e)

0
1
2
10
12
13
100
101
102
110
112
113

----------

0
1
2
10
12
13
100
101
102
110
112
113
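
If you also need the position of each element while iterating, one possible approach is `np.ndenumerate()`, which yields the index tuple together with the value. This function is not covered in the course; the sketch below is only an illustration, and the threshold of 100 is arbitrary.

```python
import numpy as np

c = np.array([[  0,   1,   2],
              [ 10,  12,  13],
              [100, 101, 102],
              [110, 112, 113]])

# c.flat visits the elements row by row, whatever the number of dimensions
values = [int(e) for e in c.flat]

# np.ndenumerate yields each element together with its index tuple
positions = [idx for idx, e in np.ndenumerate(c) if e > 100]

print(values)
print(positions)
```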


### Slices
As in pandas, you can use slice syntax to select a sub-part of an array.

In [96]:
def f(x, y):
    return 10*x+y

# Creating an array from the function from above
b = np.fromfunction(f,(5,4),dtype=int)
print(b)

[[ 0  1  2  3]
 [10 11 12 13]
 [20 21 22 23]
 [30 31 32 33]
 [40 41 42 43]]


In [97]:
# Slices
print(b[0:5, 1])  # every row of the second column
print()
print(b[:, 1])  # every row of the second column (same result, shorter syntax)

print()
print('----------')
print()

print(b[1:3, :]) # 2nd and 3rd rows, all columns

print()
print('----------')
print()

print(b[2, 1:3]) # 3rd row, 2nd and 3rd columns

print()
print('----------')
print()

print(b[2, 2:4]) # 3rd row, last 2 columns
print()
print(b[2, -2:]) # 3rd row, last 2 columns (negative index counts from the end)

[ 1 11 21 31 41]

[ 1 11 21 31 41]

----------

[[10 11 12 13]
 [20 21 22 23]]

----------

[21 22]

----------

[22 23]

[22 23]
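
One point worth keeping in mind with slices: a slice of a numpy array is a *view* on the original data, not a copy, so modifying the slice also modifies the original array. The sketch below illustrates this, using `.copy()` when an independent array is wanted; the variable names and the values written are illustrative.

```python
import numpy as np

b = np.fromfunction(lambda x, y: 10 * x + y, (5, 4), dtype=int)

# A slice is a view: it shares its data with the original array
view = b[1:3, :]
view[0, 0] = -1            # this also changes b[1, 0]

# .copy() returns an independent array
independent = b[1:3, :].copy()
independent[0, 0] = 999    # b is left untouched

print(b[1, 0])
```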


### Masks
Similarly, you can also use masks to select elements from your arrays based on a condition:

In [98]:
# Create an array
a = np.arange(12).reshape(3,4)
display(a)
# Create a mask
b = a > 4
b

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

array([[False, False, False, False],
       [False,  True,  True,  True],
       [ True,  True,  True,  True]])

In [99]:
# Use the mask to select elements of the matrix
a[b]

array([ 5,  6,  7,  8,  9, 10, 11])

Only the values for which the mask is `True` are returned, as a one-dimensional array.
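
Masks can also be combined and used for assignment. The sketch below is a minimal illustration with arbitrary thresholds: conditions are combined with `&` (and) or `|` (or), each one wrapped in parentheses, and a mask on the left-hand side of an assignment overwrites only the selected elements.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Combine conditions with & (and) or | (or); each condition needs parentheses
middle = a[(a > 4) & (a < 9)]

# A mask can also select the elements to overwrite
clipped = a.copy()
clipped[clipped > 4] = 0

print(middle)
print(clipped)
```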

## Manipulation of the shape of a numpy array

Finally, let's talk about the *shape* of an array. This is especially useful for manipulating images once you start working on deep learning.

### What is the shape of a numpy array?

The *shape* of an array is a tuple that gives the number of elements along each of its dimensions. The length of this tuple is the number of dimensions of the array.

In [100]:
# This is the shape of a matrix

c = np.array([[  0,  1,  2],
               [ 10, 12, 13],
               [100,101,102],
               [110,112,113]])

c.shape

(4, 3)

In [101]:
# A matrix is a 2-dimensional array (a tensor of order 2):
len(c.shape)

2

In [102]:
# A vector is a 1-dimensional array (a tensor of order 1):
v = np.array([1,2,3,4])
len(v.shape)

1
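
Besides `shape`, numpy arrays expose two related attributes, `ndim` and `size`, which avoid computing these quantities by hand. A minimal sketch, with illustrative variable names:

```python
import numpy as np

c = np.array([[  0,   1,   2],
              [ 10,  12,  13],
              [100, 101, 102],
              [110, 112, 113]])

# shape is a plain tuple, so it can be unpacked
n_rows, n_cols = c.shape

print(c.ndim)   # number of dimensions, same as len(c.shape)
print(c.size)   # total number of elements, here n_rows * n_cols
```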

### Shape & Reshape

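Quite often you need to change the shape of a matrix. For example, you might want to regroup its elements into a different number of rows and columns, or *flatten* it into a single dimension. You can do this with `.reshape()`. Note that `.reshape()` keeps the elements in the same order: swapping the rows and columns of a matrix is a transposition (`c.T`), not a reshape.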

In [103]:
# Change array's shape with reshape()
c = np.array([[  0,  1,  2],
               [ 10, 12, 13],
               [100,101,102],
               [110,112,113]])

c.reshape((2,6)) # Be careful: n_rows x n_columns must remain equal to the number of elements in c

array([[  0,   1,   2,  10,  12,  13],
       [100, 101, 102, 110, 112, 113]])

You can also let numpy infer the size of one of the dimensions of your array by setting it to `-1`.

In [104]:
# You can use -1 to let numpy guess the number of rows or the number of columns:
c.reshape((-1,4))

array([[  0,   1,   2,  10],
       [ 12,  13, 100, 101],
       [102, 110, 112, 113]])

In [105]:
c.reshape((2,-1))

array([[  0,   1,   2,  10,  12,  13],
       [100, 101, 102, 110, 112, 113]])
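
The sketch below contrasts reshaping with transposing, two operations that are easy to confuse, and shows flattening with `-1`. Reshaping regroups the elements in their existing reading order, while transposing (`.T`, not covered elsewhere in this course) swaps rows and columns. Variable names are illustrative.

```python
import numpy as np

c = np.array([[  0,   1,   2],
              [ 10,  12,  13],
              [100, 101, 102],
              [110, 112, 113]])

# Same reading order, regrouped into 3 rows of 4 elements
reshaped = c.reshape((3, 4))

# Rows and columns swapped: also a 3x4 array, but with different content
transposed = c.T

# A single -1 flattens the array into one dimension
flat = c.reshape(-1)

print(reshaped[0])
print(transposed[0])
print(flat.shape)
```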

## List of functions grouped by theme

We have covered only the main features of Numpy in this course. If you wish to go further, you can explore the full list of routines, grouped by theme:

[Routines](https://docs.scipy.org/doc/numpy/reference/routines.html#routines)