<img src="logo-spegc.svg" width=30%>

# NumPy

NumPy is a library for scientific computing in Python. It is built mainly around a multidimensional array object (**array**) that supports a large number of operations with high performance.

We can initialize a NumPy array from a Python list:

In [33]:
import numpy as np

a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])   # Create a 2-D array from a nested list
print(a)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


NumPy also provides several functions for creating arrays:

In [2]:
import numpy as np

a = np.zeros((2,2))   # Create an array of all zeros
print(a)              # Prints "[[ 0.  0.]
                      #          [ 0.  0.]]"

b = np.ones((1,2))    # Create an array of all ones
print(b)              # Prints "[[ 1.  1.]]"

c = np.full((2,2), 7)  # Create a constant array
print(c)               # Prints "[[7 7]
                       #          [7 7]]"

d = np.eye(2)         # Create a 2x2 identity matrix
print(d)              # Prints "[[ 1.  0.]
                      #          [ 0.  1.]]"

e = np.random.random((2,2))  # Create an array filled with random values
print(e)                     # Might print "[[ 0.91940167  0.08143941]
                             #               [ 0.68744134  0.87236687]]"

[[0. 0.]
 [0. 0.]]
[[1. 1.]]
[[7 7]
 [7 7]]
[[1. 0.]
 [0. 1.]]
[[0.98834182 0.44593392]
 [0.00428605 0.12188129]]
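
A few other constructors come up often alongside the ones above. The following is a minimal sketch (the variable names are illustrative and not part of the original notebook) showing `np.arange`, `np.linspace`, and how the fill value passed to `np.full` determines the resulting dtype, which is why `c` above prints as integers:

```python
import numpy as np

r = np.arange(0, 10, 2)          # evenly spaced integers: start, stop (exclusive), step
print(r)                         # [0 2 4 6 8]

lin = np.linspace(0.0, 1.0, 5)   # 5 evenly spaced floats, both endpoints included
print(lin)                       # [0.   0.25 0.5  0.75 1.  ]

c_int = np.full((2, 2), 7)       # integer fill value -> integer array
c_float = np.full((2, 2), 7.0)   # float fill value -> float array
print(c_int.dtype.kind, c_float.dtype.kind)  # i f
```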


## Size, dimension, and rank of a NumPy array

- **ndarray.ndim**: the number of axes (dimensions) of the array. In NumPy, the number of dimensions is sometimes called the rank (careful: do not confuse it with the concept of matrix rank in linear algebra).


- **ndarray.shape**: a tuple with the size of the array along each dimension. The length of this tuple is therefore the rank, or number of dimensions, ndim.


- **ndarray.size**: the total number of elements in the array.


- **ndarray.dtype**: the type of the elements in the array. The type (dtype) can be specified using standard Python types or NumPy's own types, such as numpy.int16, numpy.int32, or numpy.float64, among others.


- **ndarray.data**: the buffer that holds the actual elements of the array. Normally we do not need this attribute, because we access the elements of an array through indexing.
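
The relationships between these attributes can be checked directly. The sketch below is an illustration rather than part of the original notebook: it passes an explicit `dtype`, and it also prints `itemsize` and `nbytes`, two further `ndarray` attributes that are not listed above.

```python
import numpy as np

# dtype given explicitly; without it NumPy would pick a default integer type
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], dtype=np.int16)

print(a.ndim == len(a.shape))  # True: ndim is the length of the shape tuple
print(a.size)                  # 12: the product of the entries of shape (3 * 4)
print(a.dtype)                 # int16
print(a.itemsize, a.nbytes)    # 2 24: bytes per element and bytes in total

f = a.astype(np.float64)       # astype returns a converted copy
print(f.dtype)                 # float64
```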

In [10]:
import numpy as np
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

print(a)
print(a.shape)
print(a.ndim)
print(a.dtype)
print(type(a))

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
(3, 4)
2
int64
<class 'numpy.ndarray'>


## Broadcasting

In [16]:
import numpy as np

a = np.array([1,2,3,4,5], dtype="float")
s = 5.
print(a * s)
print("---------------------")
b = np.array([[1,2,3,4,5], [6,7,8,9,10]], dtype="float")
s = 5.
print(b * s)
print("---------------------")
print(b + np.array([[10],[20]]))
print("---------------------")
print(b + np.array([[10,20,30,40,50]]))
print("---------------------")
print(b * np.array([[10,20,30,40,50]]))

[ 5. 10. 15. 20. 25.]
---------------------
[[ 5. 10. 15. 20. 25.]
 [30. 35. 40. 45. 50.]]
---------------------
[[11. 12. 13. 14. 15.]
 [26. 27. 28. 29. 30.]]
---------------------
[[11. 22. 33. 44. 55.]
 [16. 27. 38. 49. 60.]]
---------------------
[[ 10.  40.  90. 160. 250.]
 [ 60. 140. 240. 360. 500.]]
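
The operations above work because NumPy compares the two shapes starting from the trailing dimension: a pair of dimensions is compatible when the sizes are equal or one of them is 1, and a missing leading dimension is treated as 1. The following sketch, with illustrative variable names, shows two compatible cases and one that fails:

```python
import numpy as np

b = np.ones((2, 5))

col = np.array([[10.0], [20.0]])   # shape (2, 1): stretched across the 5 columns
row = np.arange(5, dtype=float)    # shape (5,): treated as (1, 5), stretched across the 2 rows

print((b + col).shape)   # (2, 5)
print((b + row).shape)   # (2, 5)

# Shapes (2, 5) and (2,) do not line up: the trailing dimensions 5 and 2 differ
try:
    b + np.array([10.0, 20.0])
    failed = False
except ValueError as err:
    failed = True
    print("broadcast error:", err)
```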


## Array indexing
https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html

NumPy offers several ways to index arrays.

**Slicing**: NumPy arrays can be sliced in much the same way as Python lists. Because arrays can be multidimensional, a slice must be specified for each dimension of the array.

In [15]:
import numpy as np

# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

print(a[2, 3])
print(a[0, 0])

# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]
b = a[:2, 1:3]
print(b)

# A slice of an array is a view into the same data, so modifying it
# will modify the original array.
print(a[0, 1])   # Prints "2"
b[0, 0] = 77     # b[0, 0] is the same piece of data as a[0, 1]
print(a[0, 1])   # Prints "77"

12
1
[[2 3]
 [6 7]]
2
77


Integer indexing can also be combined with slice indexing. Doing so, however, yields an array of lower rank than the original array.

In [5]:
import numpy as np

# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# Two ways of accessing the data in the middle row of the array.
# Mixing integer indexing with slices yields an array of lower rank,
# while using only slices yields an array of the same rank as the
# original array:
row_r1 = a[1, :]    # Rank 1 view of the second row of a
row_r2 = a[1:2, :]  # Rank 2 view of the second row of a
print(row_r1, row_r1.shape)  # Prints "[5 6 7 8] (4,)"
print(row_r2, row_r2.shape)  # Prints "[[5 6 7 8]] (1, 4)"

# We can make the same distinction when accessing columns of an array:
col_r1 = a[:, 1]
col_r2 = a[:, 1:2]
print(col_r1, col_r1.shape)  # Prints "[ 2  6 10] (3,)"
print(col_r2, col_r2.shape)  # Prints "[[ 2]
                             #          [ 6]
                             #          [10]] (3, 1)"

[5 6 7 8] (4,)
[[5 6 7 8]] (1, 4)
[ 2  6 10] (3,)
[[ 2]
 [ 6]
 [10]] (3, 1)


### Integer array indexing

When you index into NumPy arrays using slicing, the resulting array is always a subarray of the original array. Integer array indexing, by contrast, lets you construct arbitrary arrays from the data of another array.

In [17]:
import numpy as np

a = np.array([[1,2], [3, 4], [5, 6]])
print(a)

# An example of integer array indexing.
# The returned array will have shape (3,)
print(a[[0, 1, 2], [0, 1, 0]])  # Prints "[1 4 5]"

# The above example of integer array indexing is equivalent to this:
print(np.array([a[0, 0], a[1, 1], a[2, 0]]))  # Prints "[1 4 5]"

# When using integer array indexing, you can reuse the same
# element from the source array:
print(a[[0, 0], [1, 1]])  # Prints "[2 2]"

# Equivalent to the previous integer array indexing example
print(np.array([a[0, 1], a[0, 1]]))  # Prints "[2 2]"

[[1 2]
 [3 4]
 [5 6]]
[1 4 5]
[1 4 5]
[2 2]
[2 2]


A useful trick with integer array indexing is selecting or mutating one element from each row of an array:

In [18]:
import numpy as np

# Create a new array from which we will select elements
a = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])

print(a)  # Prints "[[ 1  2  3]
          #          [ 4  5  6]
          #          [ 7  8  9]
          #          [10 11 12]]"

# Create an array of indices
b = np.array([0, 2, 0, 1])

# Select one element from each row of a using the indices in b
print(a[np.arange(4), b])  # Prints "[ 1  6  7 11]"

# Mutate one element from each row of a using the indices in b
a[np.arange(4), b] += 10

print(a)  # Prints "[[11  2  3]
          #          [ 4  5 16]
          #          [17  8  9]
          #          [10 21 12]]"

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[ 1  6  7 11]
[[11  2  3]
 [ 4  5 16]
 [17  8  9]
 [10 21 12]]


### Boolean array indexing

In [27]:
import numpy as np

a = np.array([[1,2], [3, 4], [5, 6]])

bool_idx = (a > 2)   # Find the elements of a that are bigger than 2;
                     # this returns a numpy array of Booleans of the same
                     # shape as a, where each slot of bool_idx tells
                     # whether that element of a is > 2.

print(bool_idx)      # Prints "[[False False]
                     #          [ True  True]
                     #          [ True  True]]"

# We use boolean array indexing to construct a rank 1 array
# consisting of the elements of a corresponding to the True values
# of bool_idx
print(a[bool_idx])  # Prints "[3 4 5 6]"

# We can do all of the above in a single concise statement:
print(a[a > 2])     # Prints "[3 4 5 6]"

# We can also set to zero the values of the elements greater than 3
a[a>3]=0
print(a)

[[False False]
 [ True  True]
 [ True  True]]
[3 4 5 6]
[3 4 5 6]
[[1 2]
 [3 0]
 [0 0]]
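
Boolean masks can also be combined. The sketch below is an addition to the notebook, not part of it: it uses the elementwise operators `&` and `|` to combine conditions, and `np.where` as a way to get the zeroed result as a new array while leaving `a` untouched.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Parentheses are required: & and | bind more tightly than comparisons
mid = a[(a > 2) & (a < 6)]
print(mid)                  # [3 4 5]

outer = a[(a <= 1) | (a >= 6)]
print(outer)                # [1 6]

# np.where builds a new array instead of modifying a in place
clipped = np.where(a > 3, 0, a)
print(clipped.tolist())     # [[1, 2], [3, 0], [0, 0]]
```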


## Array math

Elementwise operations:

In [28]:
import numpy as np

x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

# Elementwise sum; both produce the array
# [[ 6.0  8.0]
#  [10.0 12.0]]
print(x + y)
print(np.add(x, y))

# Elementwise difference; both produce the array
# [[-4.0 -4.0]
#  [-4.0 -4.0]]
print(x - y)
print(np.subtract(x, y))

# Elementwise product; both produce the array
# [[ 5.0 12.0]
#  [21.0 32.0]]
print(x * y)
print(np.multiply(x, y))

# Elementwise division; both produce the array
# [[ 0.2         0.33333333]
#  [ 0.42857143  0.5       ]]
print(x / y)
print(np.divide(x, y))

# Elementwise square root; produces the array
# [[ 1.          1.41421356]
#  [ 1.73205081  2.        ]]
print(np.sqrt(x))

[[ 6.  8.]
 [10. 12.]]
[[ 6.  8.]
 [10. 12.]]
[[-4. -4.]
 [-4. -4.]]
[[-4. -4.]
 [-4. -4.]]
[[ 5. 12.]
 [21. 32.]]
[[ 5. 12.]
 [21. 32.]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[1.         1.41421356]
 [1.73205081 2.        ]]


### Matrix product (dot)

In [30]:
import numpy as np

x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])

v = np.array([9,10])
w = np.array([11, 12])

# Inner product of vectors; both produce 219
print(v.dot(w))
print(np.dot(v, w))

# Matrix / vector product; both produce the rank 1 array [29 67]
print(x.dot(v))
print(np.dot(x, v))

# Matrix / matrix product; both produce the rank 2 array
# [[19 22]
#  [43 50]]
print(x.dot(y))
print(np.dot(x, y))

219
219
[29 67]
[29 67]
[[19 22]
 [43 50]]
[[19 22]
 [43 50]]
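
For 1-D and 2-D arrays like these, the `@` operator (available since Python 3.5) gives the same results as `dot`. A short sketch reusing the same values:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
v = np.array([9, 10])
w = np.array([11, 12])

print(v @ w)    # 219
print(x @ v)    # [29 67]
print(x @ y)    # [[19 22]
                #  [43 50]]

# * is elementwise, so it is not interchangeable with @
print(np.array_equal(x @ y, x * y))  # False
```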


### Sums

In [31]:
import numpy as np

x = np.array([[1,2],[3,4]])

print(np.sum(x))  # Compute sum of all elements; prints "10"
print(np.sum(x, axis=0))  # Compute sum of each column; prints "[4 6]"
print(np.sum(x, axis=1))  # Compute sum of each row; prints "[3 7]"

10
[4 6]
[3 7]


### Transpose

In [32]:
import numpy as np

x = np.array([[1,2], [3,4]])
print(x)    # Prints "[[1 2]
            #          [3 4]]"
print(x.T)  # Prints "[[1 3]
            #          [2 4]]"

[[1 2]
 [3 4]]
[[1 3]
 [2 4]]


## Statistics with NumPy

In [12]:
import numpy as np

data = np.array([[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15]])

print("Max method", data.max())
print("Max function", np.max(data))

print("Min method", data.min())
print("Min function", np.min(data))

print("Mean method", data.mean())
print("Mean function", np.mean(data))

print("Std method", data.std())
print("Std function", np.std(data))

print("Max along first axis", data.max(axis=0))
print("Max along second axis", data.max(axis=1))

print("Max along second axis", np.max(data[:, 0:3] , axis=1))



Max method 15
Max function 15
Min method 1
Min function 1
Mean method 8.0
Mean function 8.0
Std method 4.320493798938574
Std function 4.320493798938574
Max along first axis [11 12 13 14 15]
Max along second axis [ 5 10 15]
Max along second axis [ 3  8 13]
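
Note that `std` computes the population standard deviation by default (it divides by N). If the sample standard deviation is wanted instead, the `ddof` parameter can be set to 1. The sketch below, an addition to the notebook, shows both, and also that `mean` accepts `axis` in the same way as `max`:

```python
import numpy as np

data = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]])

pop_std = data.std()             # ddof=0 (default): divide by N
sample_std = data.std(ddof=1)    # ddof=1: divide by N - 1

print("Population std", pop_std)
print("Sample std", sample_std)

print("Mean along first axis", data.mean(axis=0))   # [ 6.  7.  8.  9. 10.]
```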


## Copies and views

Some functions return a *copy* of the input array, while others return a *view*. When the contents are physically stored in a different location, the result is called a **copy**. If, instead, what is provided is a different view of the same memory contents, it is called a **view**.
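
One way to tell the two apart in practice is `np.shares_memory`, together with the `base` attribute of an array. This is a minimal sketch with illustrative variable names, not part of the original notebook:

```python
import numpy as np

a = np.arange(6)

v = a[1:4]           # basic slicing returns a view
c = a[1:4].copy()    # an explicit copy stores its data elsewhere

print(np.shares_memory(a, v))   # True
print(np.shares_memory(a, c))   # False
print(v.base is a)              # True: the view records the array it was taken from
print(c.base is None)           # True: the copy owns its data
```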

### Assignment without copying

Simple assignment does not make a *copy* of the array object; both names refer to the same object and therefore report the same **id()**. **id()** returns an identifier that is unique to a Python object for its lifetime, similar to a pointer in C.

Moreover, any change made through one name is reflected in the other. For example, changing the shape of one also changes the shape of the other.

In [15]:
import numpy as np 
a = np.arange(6) 

print('Our array is:') 
print(a)  

print('Applying id() function:')
print(id(a))  

print('a is assigned to b:')
b = a 
print(b)  

print('b has same id():')
print(id(b))  

print('Change shape of b:')
b.shape = 3,2 
print(b)  

print('Shape of a also gets changed:')
print(a)

b[0,0] = 99
print(a)

Our array is:
[0 1 2 3 4 5]
Applying id() function:
4426459776
a is assigned to b:
[0 1 2 3 4 5]
b has same id():
4426459776
Change shape of b:
[[0 1]
 [2 3]
 [4 5]]
Shape of a also gets changed:
[[0 1]
 [2 3]
 [4 5]]
[[99  1]
 [ 2  3]
 [ 4  5]]
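
Between plain assignment and a full copy there is a middle case: `ndarray.view()` creates a new array object that looks at the same data. The sketch below is an addition to the notebook; it shows that the view can take a different shape without affecting `a`, while writes to its elements still reach `a`:

```python
import numpy as np

a = np.arange(6)
b = a.view()             # new array object looking at the same data
print(b is a)            # False: unlike plain assignment, b is a distinct object

b = b.reshape(3, 2)      # reshaping the view does not touch a's shape
print(a.shape, b.shape)  # (6,) (3, 2)

b[0, 0] = 99             # but the underlying data is shared
print(a)                 # [99  1  2  3  4  5]
```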


### Deep copy

The **ndarray.copy()** method creates a **deep copy**: a complete copy of the array and its data, which shares nothing with the original array.

In [17]:
import numpy as np 
a = np.array([[1,2], [3,4], [5,6]]) 

print('Array a is:')
print(a)  

print('Create a deep copy of a:')
b = a.copy() 
print('Array b is:')
print(b) 

# b is a separate object; as a copy, it shares no memory with a
print('b is a?')
print(b is a)  

print('Change the contents of b:')
b[0,0] = 100 

print('Modified array b:')
print(b)  

print('a remains unchanged:')
print(a)

Array a is:
[[1 2]
 [3 4]
 [5 6]]
Create a deep copy of a:
Array b is:
[[1 2]
 [3 4]
 [5 6]]
b is a?
False
Change the contents of b:
Modified array b:
[[100   2]
 [  3   4]
 [  5   6]]
a remains unchanged:
[[1 2]
 [3 4]
 [5 6]]


## Reading files

In [18]:
import numpy as np

data = np.genfromtxt('iris.data', delimiter=",")  # load the iris.data file

np.random.shuffle(data)  # shuffle the rows in place

x_data = data[:, 0:4].astype('f4')  # the samples are the first four columns of data
y_data = data[:, 4].astype(int)  # the labels are in the last column; one-hot encoding them would be a separate step

print(x_data[0])
print(y_data[0])

[5.1 3.3 1.7 0.5]
0
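
Integer class labels such as `y_data` are often one-hot encoded before training a model. One common way to do that with NumPy alone is to index the rows of an identity matrix. The sketch below uses a small hypothetical `labels` array in place of the real data and assumes three classes numbered 0, 1 and 2:

```python
import numpy as np

# Hypothetical integer labels standing in for y_data (three classes: 0, 1, 2)
labels = np.array([0, 2, 1, 0])
num_classes = 3

# Row i of the identity matrix is the one-hot vector for class i,
# so integer array indexing picks one row per label
one_hot = np.eye(num_classes, dtype=int)[labels]
print(one_hot)
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]
#  [1 0 0]]
```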
