# Introduction to / review of NumPy

- Objective:
    - Basic NumPy concepts and usage.
    - Preparation for Pandas.

- Plan:
    1. NumPy arrays
    2. Views vs. copies
    3. Initialization
    4. Array manipulation
    5. Indexing and selection masks
    6. Broadcasting

# 1. NumPy arrays

In [1]:
# Typical idiom for importing NumPy
import numpy as np

- The basic NumPy structure is the n-dimensional array.
- Arrays have:
    - a shape (the number of dimensions and the size of each one), and
    - a data type (int32, float32, int64, etc.).

In [2]:
# Create an array of 10 floats (32-bit).
# np.ndarray does not initialize memory, so the zeros shown below are not guaranteed.
a = np.ndarray( shape=(10,), dtype=np.float32 )
a

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)

The shape of an array is available through its shape attribute.

In [3]:
a.shape

(10,)
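
Besides shape, an array exposes a few other descriptive attributes. A minimal sketch, using a hypothetical array m that is not part of the notebook above:

```python
import numpy as np

# Hypothetical example array: 2 rows, 3 columns, 64-bit integers
m = np.zeros((2, 3), dtype=np.int64)

print(m.ndim)   # number of dimensions
print(m.shape)  # size along each dimension
print(m.dtype)  # element type
print(m.size)   # total number of elements
```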

We can create n-dimensional arrays with different types, for example a (4,3) array of complex numbers. The values shown are whatever happened to be in memory, because np.ndarray does not initialize the array.

In [5]:
z = np.ndarray( shape=(4,3), dtype=complex )
z

array([[6.23042070e-307+4.67296746e-307j,
        1.69121096e-306+1.86920192e-306j,
        1.33511562e-306+7.56599807e-307j],
       [8.90104239e-307+9.34593493e-307j,
        6.23059726e-307+9.34607074e-307j,
        1.24610723e-306+8.45593934e-307j],
       [2.78152906e-307+1.11261027e-306j,
        1.11261502e-306+1.42410839e-306j,
        7.56597770e-307+6.23059726e-307j],
       [1.42419530e-306+9.79101761e-307j,
        1.42417629e-306+9.34603679e-307j,
        1.78019761e-306+2.56765117e-312j]])

In [6]:
z.shape

(4, 3)

In [7]:
z[0] = 2 + 3j # overwrites every value in row 0
z

array([[2.00000000e+000+3.00000000e+000j,
        2.00000000e+000+3.00000000e+000j,
        2.00000000e+000+3.00000000e+000j],
       [8.90104239e-307+9.34593493e-307j,
        6.23059726e-307+9.34607074e-307j,
        1.24610723e-306+8.45593934e-307j],
       [2.78152906e-307+1.11261027e-306j,
        1.11261502e-306+1.42410839e-306j,
        7.56597770e-307+6.23059726e-307j],
       [1.42419530e-306+9.79101761e-307j,
        1.42417629e-306+9.34603679e-307j,
        1.78019761e-306+2.56765117e-312j]])
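
The leftover values in the remaining rows come from uninitialized memory. When defined starting values matter, one option is to use the initializing constructors instead. A small sketch (the variable names are illustrative only):

```python
import numpy as np

z_zero = np.zeros((4, 3), dtype=complex)         # every element is 0+0j
z_fill = np.full((4, 3), 2 + 3j, dtype=complex)  # every element is 2+3j
z_raw = np.empty((4, 3), dtype=complex)          # uninitialized: contents are arbitrary

print(z_zero[0])
print(z_fill[0])
```

np.empty is the usual spelling for "allocate without initializing"; it is only safe when every element is written before being read.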

- We often want to create arrays with initial values.
- For example, with a sequence of numbers.
    - The arange function can be seen as the NumPy equivalent of Python's range().

In [8]:
a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
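
Like range(), arange also accepts start, stop and step arguments; unlike range(), the step may be a float. A short sketch:

```python
import numpy as np

evens = np.arange(0, 10, 2)     # start, stop (exclusive), step
halves = np.arange(0, 2, 0.5)   # non-integer steps are allowed

print(evens)
print(halves)
```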

Indexing and slicing.

In [9]:
a[0],a[1],a[-1]

(0, 1, 9)

In [10]:
a[0:5]

array([0, 1, 2, 3, 4])
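
Slices follow the start:stop:step pattern of Python lists. A few more cases, as a sketch on the same kind of array:

```python
import numpy as np

a = np.arange(10)

every_other = a[::2]   # step of 2
last_three = a[-3:]    # negative start counts from the end
reversed_a = a[::-1]   # negative step reverses the order
strided = a[2:8:3]     # elements at positions 2 and 5

print(every_other, last_three, reversed_a, strided)
```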

# 2. Views vs. copies

- The previous section covered how assignment works in Python. With NumPy arrays, assigning to a new name (b = a) does not create a new array: both names refer to the same object.
- NumPy also avoids copying by default in operations such as basic slicing, which return a view of the existing data instead of a new array.

In [11]:
a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [12]:
b = a # Important! b refers to the same object as a
print(id(a),id(b))
assert a is b

1847328934384 1847328934384


In [13]:
a[0],b[0]

(0, 0)

In [14]:
a[0] = 99

- Be careful with this!
- A common mistake when using variables for intermediate computations is to modify the original data unintentionally.

In [15]:
b[0]

99

In [16]:
a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [17]:
b = a.copy() # b is an independent copy of a
b

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [18]:
a[0] = 99

In [19]:
a[0],b[0]

(99, 0)
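
The same distinction applies to slices: basic slicing returns a view that shares memory with the original array, while .copy() produces independent data. A minimal sketch (the names view and copy are illustrative):

```python
import numpy as np

a = np.arange(10)

view = a[2:5]          # basic slicing returns a view of a
copy = a[2:5].copy()   # explicit, independent copy

view[0] = -1           # writes through to a[2]

print(a[2], copy[0])
print(np.shares_memory(a, view), np.shares_memory(a, copy))
print(view.base is a)  # a view keeps a reference to the array it came from
```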

# 3. Array initialization

Evenly spaced samples over an interval (linear interpolation between the endpoints).

In [20]:
a = np.linspace(0,1,10)
a

array([0.        , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
       0.55555556, 0.66666667, 0.77777778, 0.88888889, 1.        ])
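
Note that linspace includes both endpoints by default, which is why the spacing above is 1/9 rather than 1/10. A sketch of the two related options, retstep and endpoint:

```python
import numpy as np

# retstep=True also returns the spacing between samples
pts, step = np.linspace(0, 1, 5, retstep=True)

# endpoint=False leaves out the stop value
open_pts = np.linspace(0, 1, 5, endpoint=False)

print(pts, step)
print(open_pts)
```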

Initialize a matrix of zeros.

In [21]:
a = np.zeros(shape=(2,2))
a

array([[0., 0.],
       [0., 0.]])

Initialize a matrix of ones.

In [22]:
a = np.ones(shape=(2,3))
a

array([[1., 1., 1.],
       [1., 1., 1.]])

Create an identity matrix.

In [23]:
a = np.eye(4)
a

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

# 4. Array manipulation

Transpose. Here a is still the 4x4 identity matrix, which equals its own transpose.

In [24]:
a.T

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])
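
The effect of the transpose is easier to see on a non-square matrix. A small sketch with a hypothetical 2x3 array m:

```python
import numpy as np

m = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
t = m.T                          # rows become columns

print(m.shape, t.shape)
print(t)
print(np.shares_memory(m, t))    # .T is a view, not a copy
```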

In [25]:
a = np.arange(10,20)
b = np.arange(20,30)

Vertical stacking.

In [26]:
c = np.vstack([a,b])
c.shape

(2, 10)

In [27]:
c

array([[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])

Horizontal stacking.

In [28]:
c = np.hstack([a,b])
c.shape

(20,)

In [29]:
c

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
       27, 28, 29])
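
With 1-D inputs, hstack simply joins the arrays end to end. To place them side by side as columns, or to stack 2-D arrays along a chosen axis, column_stack and concatenate are common alternatives. A sketch:

```python
import numpy as np

a = np.arange(10, 20)
b = np.arange(20, 30)

cols = np.column_stack([a, b])           # a and b become the two columns
m = np.vstack([a, b])                    # shape (2, 10)

wide = np.hstack([m, m])                 # joins along the columns
tall = np.concatenate([m, m], axis=0)    # joins along the rows

print(cols.shape, wide.shape, tall.shape)
```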

Flatten and reshape.

In [30]:
a = np.random.randint(4, size=(6, 3))
a.shape

(6, 3)

In [31]:
a_flattened = a.ravel() # Returns a flattened, contiguous 1-D array (a view when possible).
a_flattened

array([3, 1, 1, 1, 1, 3, 3, 1, 2, 3, 2, 3, 2, 1, 0, 0, 2, 3])

In [32]:
a_flattened.shape

(18,)

In [33]:
a.reshape(2,9)

array([[3, 1, 1, 1, 1, 3, 3, 1, 2],
       [3, 2, 3, 2, 1, 0, 0, 2, 3]])
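
Two details are worth knowing here: reshape can infer one dimension when it is given as -1, and ravel returns a view when it can, whereas flatten always copies. A sketch on a deterministic array (instead of the random one above):

```python
import numpy as np

a = np.arange(18).reshape(6, 3)

r = a.reshape(3, -1)      # -1 lets NumPy infer the missing dimension (6 here)

flat_view = a.ravel()     # a view, since a is contiguous
flat_copy = a.flatten()   # always an independent copy

flat_view[0] = 100        # visible through a
flat_copy[1] = 200        # not visible through a

print(r.shape)
print(a[0])
```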

# 5. Indexing and selection masks

It is often useful to select the elements of an array that satisfy some condition, in order to operate only on those elements.

In [34]:
a = np.random.randint(4, size=(6, 3))
print(a.shape)
a

(6, 3)


array([[1, 1, 0],
       [2, 0, 0],
       [0, 3, 3],
       [3, 1, 2],
       [2, 2, 3],
       [3, 0, 3]])

In [35]:
# Row 0
a[0,:]

array([1, 1, 0])

In [36]:
# If no selection is given for a dimension,
# it defaults to ::1 (every element, with step 1)
a[0] # a[0,::1]

array([1, 1, 0])

In [37]:
a[0,0]

1

In [38]:
a[0,2]

0

In [39]:
a[1]

array([2, 0, 0])

In [40]:
a[-1]

array([3, 0, 3])

In [41]:
# All rows, only the last column
a[:,-1]

array([0, 0, 3, 2, 3, 3])

In [42]:
# Random access by index list: rows 0, 3 and 4
random_access_indexes = [ 0, 3, 4 ]

a[random_access_indexes,:]

array([[1, 1, 0],
       [3, 1, 2],
       [2, 2, 3]])
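
Unlike slicing, indexing with a list of positions (often called fancy indexing) returns a copy. Passing one list per axis picks individual elements. A sketch on a deterministic array:

```python
import numpy as np

a = np.arange(18).reshape(6, 3)

rows = a[[0, 3, 4], :]       # copy of rows 0, 3 and 4
rows[0, 0] = -1              # does not modify a

pairs = a[[0, 3], [1, 2]]    # elements at (0, 1) and (3, 2)

print(a[0, 0], rows[0, 0])
print(pairs)
```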

In [43]:
# Applying an expression to an array
# returns an array of the same shape
# holding the result of the element-wise evaluation
condition_is_greater_than_2 = a > 2
condition_is_greater_than_2

array([[False, False, False],
       [False, False, False],
       [False,  True,  True],
       [ True, False, False],
       [False, False,  True],
       [ True, False,  True]])

In [44]:
condition_is_less_than_3 = a < 3
condition_is_less_than_3

array([[ True,  True,  True],
       [ True,  True,  True],
       [ True, False, False],
       [False,  True,  True],
       [ True,  True, False],
       [False,  True, False]])

In [45]:
condition_is_even = (a & 1) == 0
condition_is_even

array([[False, False,  True],
       [ True,  True,  True],
       [ True, False, False],
       [False, False,  True],
       [ True,  True, False],
       [False,  True, False]])

In [46]:
condition_is_odd = (a & 1) == 1
condition_is_odd

array([[ True,  True, False],
       [False, False, False],
       [False,  True,  True],
       [ True,  True, False],
       [False, False,  True],
       [ True, False,  True]])

In [47]:
# Combining conditions (all False here: no integer is both > 2 and < 3)
(condition_is_greater_than_2 & condition_is_less_than_3) & (condition_is_even)

array([[False, False, False],
       [False, False, False],
       [False, False, False],
       [False, False, False],
       [False, False, False],
       [False, False, False]])
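
Once built, a mask can be used to select, count or overwrite the matching elements. The sketch below reuses the values of the array shown in In [34]; the names selected, b and replaced are illustrative:

```python
import numpy as np

a = np.array([[1, 1, 0],
              [2, 0, 0],
              [0, 3, 3],
              [3, 1, 2],
              [2, 2, 3],
              [3, 0, 3]])

mask = a > 2

selected = a[mask]                # 1-D array with the matching elements
count = np.count_nonzero(mask)    # how many elements match

b = a.copy()
b[b > 2] = 0                      # assign only where the condition holds

replaced = np.where(mask, -1, a)  # new array: -1 where mask is True, a elsewhere

# & (and), | (or) and ~ (not) combine masks; the parentheses are required
odd_and_big = (a % 2 == 1) & (a > 2)
not_big = ~mask

print(selected, count)
```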

In [48]:
# Some operations need the indices (i,j,..) of each element
rows = np.arange(3)
cols = np.arange(3)

a = (rows,cols)
a

(array([0, 1, 2]), array([0, 1, 2]))

In [49]:
# np.indices builds these indices for the whole grid:
# grid[0] holds the row index and grid[1] the column index of each position
grid = np.indices((3,3))

i = 0
k = 0
grid[:,i,k]

array([0, 0])

In [50]:
# What is this useful for? Example: i == j (main diagonal)
diag_mask = grid[0,:,:] == grid[1,:,:]
diag_mask

array([[ True, False, False],
       [False,  True, False],
       [False, False,  True]])

In [51]:
# i >= j
lower_diag_mask = grid[0,:,:] >= grid[1,:,:]
lower_diag_mask

array([[ True, False, False],
       [ True,  True, False],
       [ True,  True,  True]])

In [52]:
# i <= j
upper_diag_mask = grid[0,:,:] <= grid[1,:,:]
upper_diag_mask

array([[ True,  True,  True],
       [False,  True,  True],
       [False, False,  True]])

In [53]:
a = np.random.rand(3,3)
a

array([[0.75530901, 0.29508174, 0.94885786],
       [0.34649668, 0.24795653, 0.04044103],
       [0.87144716, 0.68376603, 0.11716173]])

In [54]:
a[diag_mask]

array([0.75530901, 0.24795653, 0.11716173])

In [55]:
a[upper_diag_mask]

array([0.75530901, 0.29508174, 0.94885786, 0.24795653, 0.04044103,
       0.11716173])

In [56]:
a[lower_diag_mask]

array([0.75530901, 0.34649668, 0.24795653, 0.87144716, 0.68376603,
       0.11716173])

In [57]:
np.sum(a[diag_mask])

1.1204272763999434
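
For these particular patterns NumPy also ships ready-made helpers. The sketch below checks, on a hypothetical 3x3 array, that the index masks agree with np.diag, np.tril, np.triu and np.trace:

```python
import numpy as np

a = np.arange(1.0, 10.0).reshape(3, 3)   # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
i, j = np.indices(a.shape)               # row and column index of every position

diag_ok = np.array_equal(a[i == j], np.diag(a))
lower_ok = np.array_equal(np.where(i >= j, a, 0), np.tril(a))
upper_ok = np.array_equal(np.where(i <= j, a, 0), np.triu(a))
trace_ok = a[i == j].sum() == np.trace(a)

print(diag_ok, lower_ok, upper_ok, trace_ok)
```

The mask approach remains the more general tool, since it works for any condition on the indices, not just the diagonal and triangular cases.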

In [58]:
# a_diag is another name for the same array, not a copy
a_diag = a
a_diag

array([[0.75530901, 0.29508174, 0.94885786],
       [0.34649668, 0.24795653, 0.04044103],
       [0.87144716, 0.68376603, 0.11716173]])

In [59]:
a_diag[0, 0] = 1 # also changes a

In [60]:
a[0]

array([1.        , 0.29508174, 0.94885786])

# 6. Broadcasting

> The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. 
> Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have 
> compatible shapes.

Source: Broadcasting, SciPy.org.

> In the context of deep learning, we also use some less conventional notation. 
> We allow the addition of matrix and a vector, yielding another matrix:
> $C = A + b$, where $C_{i,j} = A_{i,j} + b_j$.
> In other words, the vector b is added to each row of the matrix. 
> This shorthand eliminates the need to define a matrix with b copied into each 
> row before doing the addition. This implicit copying of b to many locations 
> is called broadcasting.

Source: Deep Learning (Adaptive Computation and Machine Learning series)

## Broadcasting rule

> In order to broadcast, the size of the trailing axes for both arrays in an operation 
> must either be the same size or one of them must be one.

In [61]:
a = np.array([[ 0.0, 0.0, 0.0],
              [10.0,10.0,10.0],
              [20.0,20.0,20.0],
              [30.0,30.0,30.0]])

In [62]:
b = np.array([1.0,2.0,3.0])
a + b

array([[ 1.,  2.,  3.],
       [11., 12., 13.],
       [21., 22., 23.],
       [31., 32., 33.]])
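
The rule also explains what fails and how to fix it. In the sketch below, a hypothetical vector col of length 4 does not match the trailing axis of a (size 3), so the addition raises an error; giving it shape (4, 1) makes the trailing sizes compatible, and it is then added to each column:

```python
import numpy as np

a = np.array([[ 0.0,  0.0,  0.0],
              [10.0, 10.0, 10.0],
              [20.0, 20.0, 20.0],
              [30.0, 30.0, 30.0]])          # shape (4, 3)

col = np.array([1.0, 2.0, 3.0, 4.0])        # shape (4,)

try:
    a + col                                 # trailing axes are 3 and 4: incompatible
except ValueError as err:
    print("cannot broadcast:", err)

result = a + col[:, np.newaxis]             # (4, 3) + (4, 1) -> (4, 3)
print(result)
```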

## Bibliography and references

- Sci-Py Lectures https://scipy-lectures.org/. Accessed: 25/01/2021
- A Visual Intro to NumPy and Data Representation. http://jalammar.github.io/visual-numpy/. Accessed: 25/01/2021
- [Array Broadcasting in numpy](https://scipy.github.io/old-wiki/pages/EricsBroadcastingDoc)