<font color="#6E6E6E"><h2 align="center">NumPy Python library</h2></font>
<font color="#279D9F"><h6 align="center">Adapted from a notebook by Luis Fernándo Lago Fernández - Universidad Autónoma de Madrid</h6></font>
<h6 align="center">(Translation and adaptation: Gonzalo Martínez Muñoz)</h6>

## NumPy Library

<a href="http://www.numpy.org/">NumPy</a> (Numerical Python) is an easy-to-use package for working with numerical arrays and matrices. It is fast and memory efficient.

The main object in NumPy is the *ndarray*, which represents a multidimensional array.

## 1. Class ndarray

An <a href="http://docs.scipy.org/doc/numpy-1.9.1/reference/generated/numpy.ndarray.html">ndarray</a> object represents a multidimensional array of elements. All elements are of the same type: int, float, boolean, string, etc. The type is generally assigned when the object is created.

In [2]:
# Importing numpy and creating an ndarray to represent a 3x2 matrix:
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6]])
data

array([[1, 2],
       [3, 4],
       [5, 6]])

All ndarray objects have a *shape* attribute to indicate the size of the array, and an attribute *dtype* to indicate the type of the elements of the arrays.

In [2]:
# Size of the array:
print(data.shape)
print("This array has {0} rows and {1} columns".format(data.shape[0], data.shape[1]))

(3, 2)
This array has 3 rows and 2 columns


In [3]:
# Type of the elements of the array:
data.dtype

dtype('int32')

### Array creation

An array can be created from any other sequence (lists or other ndarrays) using the <a href="http://docs.scipy.org/doc/numpy-1.9.1/reference/generated/numpy.array.html#numpy.array">numpy.array</a> function. The type of the elements (dtype) and the minimum number of dimensions (ndmin) can be specified at construction.

Some examples:

In [4]:
# Creating an array from a standard python list:
x = np.array([1, 2, 3])
print(x.shape)
x

(3,)


array([1, 2, 3])

In [5]:
# From other ndarray:
y = np.array(x)
y

array([1, 2, 3])

In [9]:
# The type is selected as the minimum type that covers all data:
x = np.array([1, 2, 3])
print(x.dtype)
print(x)
y = np.array([1, 2, 3.0])
print(y.dtype)
print(y)

int32
[1 2 3]
float64
[1. 2. 3.]


In [7]:
# Setting the number of dimensions:
x = np.array([1, 2, 3], ndmin = 2)
print(x.shape)
x

(1, 3)


array([[1, 2, 3]])

In [8]:
# Setting the type:
x = np.array([1, 2, 3], dtype=float)
x.dtype

dtype('float64')

Other functions to create and initialize arrays/matrices:

- Arrays of zeros: <a href="http://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html">numpy.zeros</a> and <a href="http://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros_like.html">numpy.zeros_like</a> 
- Arrays of ones: <a href="http://docs.scipy.org/doc/numpy/reference/generated/numpy.ones.html">numpy.ones</a> and <a href="http://docs.scipy.org/doc/numpy/reference/generated/numpy.ones_like.html">numpy.ones_like</a>
- Identity matrix: <a href="http://docs.scipy.org/doc/numpy/reference/generated/numpy.eye.html">numpy.eye</a> and <a href="http://docs.scipy.org/doc/numpy/reference/generated/numpy.identity.html">numpy.identity</a>
- The function <a href="http://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html">numpy.arange</a> behaves similarly to the standard python function <tt>range</tt>, but returns an ndarray.
- The function <a href="http://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html">numpy.linspace</a> returns evenly spaced numbers over a specified interval.
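
Some functions in the list above are not exercised in the cells that follow. They can be sketched as below. This is only a minimal illustration; the names `template`, `z`, `e` and `grid` are chosen here for the example and do not come from the original notebook.

```python
import numpy as np

template = np.array([[1, 2, 3], [4, 5, 6]])

# zeros_like copies the shape and dtype of an existing array
z = np.zeros_like(template)
print(z.shape, z.dtype == template.dtype)   # (2, 3) True

# eye accepts a non-square shape and a diagonal offset k
e = np.eye(3, 4, k=1)
print(e)

# arange excludes the stop value, linspace includes it by default
print(np.arange(0, 1, 0.25))   # [0.   0.25 0.5  0.75]
grid = np.linspace(0, 1, 5)
print(grid)                    # [0.   0.25 0.5  0.75 1.  ]
```

The last two calls show the practical difference between the two range functions: `arange` is driven by a step, while `linspace` is driven by the number of points.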

In [10]:
# 3x2 matrix of zeros:
np.zeros((3,2))

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

In [11]:
# Array of ones with the same shape and type as x:
x = np.array([1, 2, 3])
np.ones_like(x)

array([1, 1, 1])

In [12]:
# Identity matrix of 5x5:
np.identity(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [13]:
# Creating different series
y = np.arange(10)
print(y)
# From 1 (included) to 11 (excluded) in steps of 2
x = np.arange(1,11,2)
print(x)

[0 1 2 3 4 5 6 7 8 9]
[1 3 5 7 9]


In [56]:
# Generate an array of 10 evenly spaced elements between -pi and pi (both included)
np.linspace(-np.pi,np.pi,10)
#np.logspace(-np.pi,np.pi,10)

array([-3.14159265, -2.44346095, -1.74532925, -1.04719755, -0.34906585,
        0.34906585,  1.04719755,  1.74532925,  2.44346095,  3.14159265])


### Basic operations with arrays

Arithmetic operations can be performed with ndarrays using the same syntax as the one used for scalars and avoiding the use of explicit loops. 
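
As a rough illustration of what "avoiding explicit loops" means, the sketch below computes the same values twice, once with a Python loop and once with a single array expression. The names `values`, `squares_loop` and `squares_vec` are invented for this example.

```python
import numpy as np

values = np.arange(5)

# Explicit loop over the elements, producing a Python list
squares_loop = [v * v + 1 for v in values.tolist()]

# The same computation written as one array expression
squares_vec = values * values + 1

print(squares_loop)   # [1, 2, 5, 10, 17]
print(squares_vec)    # [ 1  2  5 10 17]
```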



In [15]:
# We create an ndarray:
data = np.array([[1, 2, 3], [4, 5, 6]])
data

array([[1, 2, 3],
       [4, 5, 6]])

#### Arithmetic operations of ndarray and scalars 
The operation is applied to each element of the array.

In [17]:
# Add a scalar to the array:
data + 1

array([[2, 3, 4],
       [5, 6, 7]])

In [18]:
# Multiply by a scalar:
data * 10

array([[10, 20, 30],
       [40, 50, 60]])

#### Arithmetic operations between arrays of the same size
Arithmetic operations between arrays are also performed element by element


In [19]:
# Adding two arrays:
data + data

array([[ 2,  4,  6],
       [ 8, 10, 12]])

In [20]:
# Multiplying two arrays:
data * data

array([[ 1,  4,  9],
       [16, 25, 36]])

#### Boolean operations
All the usual comparison operators, that is '<', '>', '>=', '==', etc., are applied element by element and return an array of booleans.

In [22]:
data = np.array([[1, 2, 3], [4, 5, 6], [4, 5, 6]])
print(data)
data > 4

[[1 2 3]
 [4 5 6]
 [4 5 6]]


array([[False, False, False],
       [False,  True,  True],
       [False,  True,  True]])
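
A boolean array like the one above is often summarized rather than inspected element by element. The following sketch, which assumes the same `data` values as the previous cell and introduces the name `mask` for illustration, shows a few common reductions.

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [4, 5, 6]])
mask = data > 4

# True counts as 1 and False as 0 when summed
print(np.sum(mask))            # 4
print(np.sum(mask, axis=1))    # [0 2 2]  (number of matches in each row)

# any / all summarize the whole mask in a single boolean
print(mask.any(), mask.all())  # True False
```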

## Exercises

1. Build a square matrix of 5x5 elements such that:
    - All elements on the diagonal are 10
    - All elements off the diagonal are 5
2. Create an array containing the names of the days of the week. What is its dtype?

In [54]:
#1
sqr_matrix = np.identity(5)*5+5
print(sqr_matrix)
sqr_matrix2 = (np.identity(5)+1)*5
print(sqr_matrix2)

[[10.  5.  5.  5.  5.]
 [ 5. 10.  5.  5.  5.]
 [ 5.  5. 10.  5.  5.]
 [ 5.  5.  5. 10.  5.]
 [ 5.  5.  5.  5. 10.]]
[[10.  5.  5.  5.  5.]
 [ 5. 10.  5.  5.  5.]
 [ 5.  5. 10.  5.  5.]
 [ 5.  5.  5. 10.  5.]
 [ 5.  5.  5.  5. 10.]]


In [58]:
def diag2(n, f):
    '''Return an n x n matrix with 2*f on the diagonal and f elsewhere.

    n: size of the array
    f: factor'''
    return (np.identity(n) + 1) * f
diag2(5,5)

array([[10.,  5.,  5.,  5.,  5.],
       [ 5., 10.,  5.,  5.,  5.],
       [ 5.,  5., 10.,  5.,  5.],
       [ 5.,  5.,  5., 10.,  5.],
       [ 5.,  5.,  5.,  5., 10.]])

In [33]:
#2

weekdays_matrix = np.array(['Monday', 'Tuesday', 'Wednesday', 
                         'Thursday', 'Friday', 'Saturday', 'Sunday'])
print(weekdays_matrix.dtype)

<U9
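
The dtype `<U9` denotes Unicode strings of at most 9 characters, the length of the longest name ('Wednesday'). One consequence worth knowing is sketched below with a shorter, made-up array (`days` and `wide` are names invented for this example): strings longer than the fixed width are truncated on assignment.

```python
import numpy as np

days = np.array(['Monday', 'Wednesday'])
print(days.dtype)      # <U9

# Assigning a longer string silently truncates it to 9 characters
days[0] = 'Wednesday-night'
print(days[0])         # Wednesday

# Requesting a wider dtype up front avoids the truncation
wide = np.array(['Monday', 'Wednesday'], dtype='<U20')
wide[0] = 'Wednesday-night'
print(wide[0])         # Wednesday-night
```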


## 2. Indexing and slicing

For one-dimensional arrays, slicing works as it does for standard Python lists:

In [59]:
# Create a one-dimensional ndarray with the numbers 0 to 9:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [60]:
# Access to element in position 4 (arrays are 0-indexed):
x[4]

4

In [61]:
# Access to elements from position 2 (included) to 7 (excluded):
x[2:7]

array([2, 3, 4, 5, 6])

In [62]:
# Last three elements:
x[-3:]

array([7, 8, 9])

If a scalar value is assigned to a *slice*, the value is assigned to all elements in the slice:

In [63]:
# Assigning value -1 to slice x[4:7]:
x[4:7] = -1
x

array([ 0,  1,  2,  3, -1, -1, -1,  7,  8,  9])

If an array (or a list) is assigned to a *slice*, its values are assigned one by one to the elements selected by the slice:

In [64]:
# Assigning a list of values to the even positions, slice x[0::2]:
x = np.arange(10)
print(x)
x[0::2] = [19,17,15,13,11]
x

[0 1 2 3 4 5 6 7 8 9]


array([19,  1, 17,  3, 15,  5, 13,  7, 11,  9])

Note that when you take a slice, the elements are not copied: the slice only keeps a reference to the data of the original array. This means that if you later use the slice to modify values, the values change in the original array as well. Let's see an example:

In [73]:
x = np.ones((5,3), dtype=int)
x

array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])

In [72]:
y = x[:5:2,1:3]
y

array([[1, 1],
       [1, 1],
       [1, 1]])

In [70]:
y[:] = 25
x

array([[ 1, 25, 25],
       [ 1,  1,  1],
       [ 1, 25, 25],
       [ 1,  1,  1],
       [ 1, 25, 25]])

If we need a copy of the slice instead of a reference to the values in the original array, we can use the copy() method:

In [74]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [75]:
y = x[:3].copy()
y

array([0, 1, 2])

In [76]:
y[2] = 25
print(x)
print(y)

[0 1 2 3 4 5 6 7 8 9]
[ 0  1 25]
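
One way to check whether two arrays refer to the same data, rather than modifying one and looking at the other, is sketched below. It relies on `np.shares_memory` and the `base` attribute; the names `view` and `copied` are introduced only for this example.

```python
import numpy as np

x = np.arange(10)
view = x[:3]            # basic slice: refers to x's data
copied = x[:3].copy()   # explicit copy: independent data

print(np.shares_memory(x, view))     # True
print(np.shares_memory(x, copied))   # False

# base points to the array that owns the data (None for an owner)
print(view.base is x)                # True
print(copied.base is None)           # True
```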


With multidimensional arrays we use the same ideas for slicing. Commas are used to separate the slices for each dimension:

In [78]:
data = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
data

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])

In [79]:
# Access columns 1 to 3 (excluded) of row 1:
data[1,1:3]

array([6, 7])

In [80]:
# If only the first dimension is given we will access rows. In this case rows 1 and 2
data[1:3]

array([[ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [81]:
# If we want to select complete columns we put ':' in the first dimension to 
# indicate that all elements are selected. For instance this code selects 
# columns 1 and 2:
data[:, 1:3]

array([[ 2,  3],
       [ 6,  7],
       [10, 11],
       [14, 15]])

In [82]:
# Slices with a step of 2 along both dimensions (rows 1 and 3, columns 1 and 3):
data[1:4:2, 1:4:2]

array([[ 6,  8],
       [14, 16]])

### Boolean indexing

It is possible to use an array of boolean values (True/False) to select elements from the ndarray.

In the next example, we first create a matrix of normally distributed random numbers:

In [83]:
# Matrix with normal random numbers of size 10x3:
data = np.random.randn(10, 3)
data

array([[ 1.04641955, -1.45606284,  0.40657132],
       [ 0.26352206, -0.89930644,  0.6182393 ],
       [-0.67780179,  0.32454297,  1.17200945],
       [-0.43900153,  0.42115866,  0.46682241],
       [ 0.23027901,  1.40398617, -0.52401569],
       [-2.21870546,  0.61607108, -1.21852446],
       [-0.2165149 , -1.05522391,  1.79847349],
       [-0.78535649,  2.51108983, -1.02716463],
       [-1.06364468,  3.9569818 ,  0.98471536],
       [ 0.70239547,  0.21571521,  0.47272245]])

In [84]:
# Select positive numbers
data[data>0]

array([1.04641955, 0.40657132, 0.26352206, 0.6182393 , 0.32454297,
       1.17200945, 0.42115866, 0.46682241, 0.23027901, 1.40398617,
       0.61607108, 1.79847349, 2.51108983, 3.9569818 , 0.98471536,
       0.70239547, 0.21571521, 0.47272245])

In [85]:
# Creating the boolean index of the rows whose first element is positive
ix = data[:,0] > 0
ix

array([ True,  True, False, False,  True, False, False, False, False,
        True])

In [86]:
# Now we use the boolean index to select the rows that start with a positive number:
selected = data[ix, :]
selected

array([[ 1.04641955, -1.45606284,  0.40657132],
       [ 0.26352206, -0.89930644,  0.6182393 ],
       [ 0.23027901,  1.40398617, -0.52401569],
       [ 0.70239547,  0.21571521,  0.47272245]])

It is important to note that, unlike slicing, boolean indexing always creates a copy.

In [87]:
# We change some selected elements:
selected[1, :] = 1
selected

array([[ 1.04641955, -1.45606284,  0.40657132],
       [ 1.        ,  1.        ,  1.        ],
       [ 0.23027901,  1.40398617, -0.52401569],
       [ 0.70239547,  0.21571521,  0.47272245]])

In [88]:
# The original array remains the same:
data

array([[ 1.04641955, -1.45606284,  0.40657132],
       [ 0.26352206, -0.89930644,  0.6182393 ],
       [-0.67780179,  0.32454297,  1.17200945],
       [-0.43900153,  0.42115866,  0.46682241],
       [ 0.23027901,  1.40398617, -0.52401569],
       [-2.21870546,  0.61607108, -1.21852446],
       [-0.2165149 , -1.05522391,  1.79847349],
       [-0.78535649,  2.51108983, -1.02716463],
       [-1.06364468,  3.9569818 ,  0.98471536],
       [ 0.70239547,  0.21571521,  0.47272245]])

It is possible to combine several boolean conditions using & (and), | (or) and ~ (negation).

In [89]:
# Select the rows where the first element is positive and the last negative:
data[(data[:, 0] > 0) & (data[:, 2] < 0), :]

array([[ 0.23027901,  1.40398617, -0.52401569]])

In [90]:
# Select the rows where the first element is positive or the last negative:
data[(data[:, 0] > 0) | (data[:, 2] < 0), :]

array([[ 1.04641955, -1.45606284,  0.40657132],
       [ 0.26352206, -0.89930644,  0.6182393 ],
       [ 0.23027901,  1.40398617, -0.52401569],
       [-2.21870546,  0.61607108, -1.21852446],
       [-0.78535649,  2.51108983, -1.02716463],
       [ 0.70239547,  0.21571521,  0.47272245]])

In [91]:
# Select the rows where the first element is not positive:
data[~(data[:, 0] > 0), :]

array([[-0.67780179,  0.32454297,  1.17200945],
       [-0.43900153,  0.42115866,  0.46682241],
       [-2.21870546,  0.61607108, -1.21852446],
       [-0.2165149 , -1.05522391,  1.79847349],
       [-0.78535649,  2.51108983, -1.02716463],
       [-1.06364468,  3.9569818 ,  0.98471536]])
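
The parentheses around each comparison in the cells above are not optional: in Python, `&` and `|` bind more tightly than the comparison operators. The sketch below, using a small made-up array `col`, also shows why the keyword `and` cannot be used in their place.

```python
import numpy as np

col = np.array([-2.0, 0.5, 3.0, -1.0])

# Parentheses make each comparison run before & combines the masks
inside = (col > -1.5) & (col < 1.0)
print(inside)          # [False  True False  True]

# Python's `and` asks for a single truth value, which an array
# with more than one element cannot provide
try:
    (col > -1.5) and (col < 1.0)
except ValueError as err:
    print("ValueError:", err)
```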

Boolean indexing can also be used to assign values to the elements of the array that satisfy a condition:

In [92]:
# Assign 0 to all negative elements
data[data < 0] = 0
data

array([[1.04641955, 0.        , 0.40657132],
       [0.26352206, 0.        , 0.6182393 ],
       [0.        , 0.32454297, 1.17200945],
       [0.        , 0.42115866, 0.46682241],
       [0.23027901, 1.40398617, 0.        ],
       [0.        , 0.61607108, 0.        ],
       [0.        , 0.        , 1.79847349],
       [0.        , 2.51108983, 0.        ],
       [0.        , 3.9569818 , 0.98471536],
       [0.70239547, 0.21571521, 0.47272245]])

### Fancy indexing

It is also possible to index with lists or arrays of integers, which indicate the row or column indexes to select. This type of indexing is useful, for instance, to select rows and columns in an order different from the original one.

In [93]:
# Create a matrix and initialize it:
nrows = 5
ncols = 3
p = 1
data = np.zeros((nrows, ncols))
for i in range(ncols):
    data[:, i] = np.arange(nrows)*p
    p*=10
data

array([[  0.,   0.,   0.],
       [  1.,  10., 100.],
       [  2.,  20., 200.],
       [  3.,  30., 300.],
       [  4.,  40., 400.]])

In [94]:
# Select rows 3 and 1 in this order:
data[[3, 1]]

array([[  3.,  30., 300.],
       [  1.,  10., 100.]])

In [95]:
# Select column 2 and 0 in this order:
data[:, [2, 0]]

array([[  0.,   0.],
       [100.,   1.],
       [200.,   2.],
       [300.,   3.],
       [400.,   4.]])
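
Two properties of integer-array indexing are easy to miss, and the following sketch tries to make them visible on a smaller integer version of the matrix above. The name `picked` is invented here, and `np.ix_` is a helper not used elsewhere in this notebook.

```python
import numpy as np

data = np.array([[0, 0, 0], [1, 10, 100], [2, 20, 200], [3, 30, 300]])

picked = data[[3, 1]]   # rows 3 and 1, returned as a new array
picked[:] = -1          # modifying the result...
print(data[3])          # [  3  30 300]  ...leaves data untouched

# Two index lists are paired element by element: (3, 2) and (1, 0)
print(data[[3, 1], [2, 0]])           # [300   1]

# np.ix_ selects the cross product of rows and columns instead
print(data[np.ix_([3, 1], [2, 0])])
```

In other words, like boolean indexing and unlike slicing, indexing with integer lists returns a copy.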

## Exercises

In [None]:
# Exercise 1.
# Suppose we execute this code:
x = np.array([1, 2, 3, 4, 5])
y = x[:3]
print(id(y))
y = 100*y
#y[:] = 100*y # <-- Check the difference with this
print(id(y))
# What happens when printing x?
print(x)
# and printing y?
print(y)
# Why?

In [101]:
x = np.array([1, 2, 3, 4, 5])
y = x[:3]
print(y)
y = 100*y
print(y)
print(x)
y = x[:3]
y[:] = 100*y
print(y)
print(x)

[1 2 3]
[100 200 300]
[1 2 3 4 5]
[100 200 300]
[100 200 300   4   5]


In [45]:
# Exercise 2.
# The following array contains a list of 100 random grades from 0 to 10:
n = 100
np.random.seed(13)
notas = np.random.rand(n)*10
# Build an array that holds a qualitative mark for each numeric grade,
# according to the following scheme:
# [0, 5) --> SUSPENSO (fail)
# [5, 7) --> APROBADO (pass)
# [7, 9) --> NOTABLE (good)
# [9, 10) --> SOBRESALIENTE (outstanding)

# Your code here
print(notas)

notas_cual = np.array(notas, dtype=str)
notas_cual[notas < 5] = "SUSPENSO"
notas_cual[(notas >= 5) & (notas < 7) ] = "APROBADO"
notas_cual[(notas >= 7) & (notas < 9) ] = "NOTABLE"
notas_cual[notas >= 9] = "SOBRESALIENTE"
        

print(notas_cual)

[7.77702411e+00 2.37541220e+00 8.24278533e+00 9.65749198e+00
 9.72601114e+00 4.53449247e+00 6.09042463e+00 7.75526515e+00
 6.41613345e+00 7.22018230e+00 3.50365241e-01 2.98449471e+00
 5.85124919e-01 8.57060943e+00 3.72854028e+00 6.79847952e+00
 2.56279949e+00 3.47581215e+00 9.41277008e-02 3.58333783e+00
 9.49094182e+00 2.17899009e+00 3.19391366e+00 9.17772386e+00
 3.19036664e-01 6.50845370e-01 6.29828999e+00 8.73813443e+00
 8.71573230e-02 7.46577237e+00 8.12841171e+00 7.57174462e-01
 6.56455335e+00 5.09262200e+00 4.79883391e+00 9.55574145e+00
 1.20335695e-04 2.46978701e+00 7.12232678e+00 3.24582050e+00
 2.76996356e+00 6.95445453e+00 9.18551748e+00 2.44475702e+00
 4.58085817e+00 2.52992683e+00 3.79333291e+00 6.04538829e+00
 7.72378760e+00 6.79174968e-01 6.86085079e+00 5.48260097e+00
 1.37986053e+00 9.87532192e-01 2.45559105e+00 1.51786663e+00
 9.25994479e+00 6.80105016e+00 2.37658922e+00 5.68885253e+00
 5.56632051e+00 7.27372109e-01 8.39708510e+00 4.05319493e+00
 1.44870989e+00 1.909200
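
An alternative to the chain of boolean assignments above, offered only as a sketch and not as the notebook's own solution, is to compute a band index with `np.digitize` and use it to index an array of labels. The names `sample`, `edges`, `labels` and `band` are hypothetical, and the grades are a small hand-made sample rather than the random `notas` array.

```python
import numpy as np

sample = np.array([2.4, 5.0, 6.9, 7.0, 8.9, 9.0, 9.7])

# Lower edges of the second, third and fourth bands
edges = np.array([5, 7, 9])
labels = np.array(["SUSPENSO", "APROBADO", "NOTABLE", "SOBRESALIENTE"])

# digitize returns 0 for x < 5, 1 for 5 <= x < 7, 2 for 7 <= x < 9, 3 otherwise
band = np.digitize(sample, edges)
print(band)            # [0 1 1 2 2 3 3]

# Integer-array indexing maps each band index to its label
print(labels[band])
```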

In [48]:
# Exercise 3. 
# Now count the number of SUSPENSOS, 
# APROBADOS, NOTABLES and SOBRESALIENTES.
print(np.sum(notas_cual == "SUSPENSO"))
print(np.sum(notas_cual == "APROBADO"))
print(np.sum(notas_cual == "NOTABLE"))
print(np.sum(notas_cual == "SOBRESALIENTE"))

54
18
16
12


In [49]:
for value in np.unique(notas_cual):
    print("# of ", value, ":", np.sum(notas_cual == value))

# of  APROBADO : 18
# of  NOTABLE : 16
# of  SOBRESALIENTE : 12
# of  SUSPENSO : 54


## 3. Array operations

### Transpose

To transpose a 2D array you can use the method transpose() or the attribute T. 

In [50]:
# 4x4 matrix:
data = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
data

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])

In [51]:
# Transpose using T:
data.T

array([[ 1,  5,  9, 13],
       [ 2,  6, 10, 14],
       [ 3,  7, 11, 15],
       [ 4,  8, 12, 16]])

In [None]:
# Transpose using the method transpose()
data.transpose()


It is also possible to transpose multidimensional arrays by specifying the indexes of the axes that are to be swapped. More info in: <a href="http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.transpose.html">ndarray.transpose()</a> and <a href="http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.swapaxes.html">ndarray.swapaxes()</a>.
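
For arrays with more than two dimensions, the behaviour of these methods can be sketched as follows. The array `cube` is a made-up example of shape (2, 3, 4), built here with `arange` and `reshape`.

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)

# transpose takes the new order of the axes
print(cube.transpose(1, 0, 2).shape)   # (3, 2, 4)

# with no arguments (or with .T) the order of the axes is reversed
print(cube.T.shape)                    # (4, 3, 2)

# swapaxes exchanges exactly two axes
print(cube.swapaxes(0, 2).shape)       # (4, 3, 2)

# element (i, j, k) becomes element (j, i, k) after transpose(1, 0, 2)
print(cube[1, 2, 3] == cube.transpose(1, 0, 2)[2, 1, 3])   # True
```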

### Change the shape of an array (reshape and ravel)

- The method <a href="http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.reshape.html">ndarray.reshape()</a> changes the shape of an array without changing its content, just rearranging it. The total number of elements must be the same in both shapes.
- The method <a href="http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.ravel.html">ndarray.ravel()</a> flattens an array into a one-dimensional array.
- The function <a href="http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.concatenate.html">numpy.concatenate</a> joins two or more arrays along one axis, provided that their sizes along all other axes are the same.

Some examples:

In [52]:
# One-dimensional array with the numbers 1 to 12:
x = np.arange(1, 13) 
print(x)

[ 1  2  3  4  5  6  7  8  9 10 11 12]


In [53]:
# Change it to 3x4:
y = x.reshape(3, 4)
print(y)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


In [54]:
# Flatten the array again:
z = y.ravel()
print(z)

[ 1  2  3  4  5  6  7  8  9 10 11 12]
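
A few details of `reshape` that the cells above do not show are sketched here, under the assumption of a freshly created, contiguous array like `x`: one dimension can be left for NumPy to infer with -1, an incompatible shape raises an error, and the result shares data with the original when possible.

```python
import numpy as np

x = np.arange(1, 13)

# -1 asks NumPy to infer that dimension from the total size
y = x.reshape(-1, 4)
print(y.shape)         # (3, 4)

# A shape with a different number of elements raises ValueError
try:
    x.reshape(5, 3)
except ValueError as err:
    print("ValueError:", err)

# For a contiguous array like this one, reshape returns a view on x
y[0, 0] = 99
print(x[0])            # 99
```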


In [55]:
# Two arrays of 0's and 1's:
x = np.zeros((3, 4))
print(x)
y = np.ones((3, 2))
print(y)


[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
[[1. 1.]
 [1. 1.]
 [1. 1.]]


In [56]:
# Concatenate along axis 1 (columns):
np.concatenate((x, y), axis = 1)

array([[0., 0., 0., 0., 1., 1.],
       [0., 0., 0., 0., 1., 1.],
       [0., 0., 0., 0., 1., 1.]])

In [57]:
# Concatenate by rows (axis 0) getting only the first two columns of x:
np.concatenate((x[:, :2], y), axis = 0)

array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

In [58]:
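# Sum over axis 0 (one total per column) and over axis 1 (one total per row),
# using the 4x4 matrix data defined in the Transpose section:
print(data)
print(np.sum(data, axis = 0))
print(np.sum(data, axis = 1))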


[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]
[28 32 36 40]
[10 26 42 58]
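
The `axis` argument names the dimension that is collapsed by the reduction. The sketch below, on a smaller made-up 2x4 matrix, illustrates that convention and shows that other reductions such as `max` and `mean` follow it too.

```python
import numpy as np

data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# axis=0 collapses the rows: one result per column
print(data.sum(axis=0))     # [ 6  8 10 12]
# axis=1 collapses the columns: one result per row
print(data.sum(axis=1))     # [10 26]
# no axis: reduce over every element
print(data.sum())           # 36

# other reductions follow the same convention
print(data.max(axis=0))     # [5 6 7 8]
print(data.mean(axis=1))    # [2.5 6.5]
```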
