Study notes on the article [Look Ma, No For-Loops: Array Programming With NumPy](https://realpython.com/numpy-array-programming/)

# NumPy Arrays

The fundamental object of NumPy is its ndarray (usually created with numpy.array), an n-dimensional array.

In [1]:
import numpy as np

In [2]:
arr = np.arange(36).reshape(3, 4, 3)
arr

array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11]],

       [[12, 13, 14],
        [15, 16, 17],
        [18, 19, 20],
        [21, 22, 23]],

       [[24, 25, 26],
        [27, 28, 29],
        [30, 31, 32],
        [33, 34, 35]]])

One intuitive way to think about an array’s shape is to simply “read it from left to right.” arr is a 3 by 4 by 3 array:

In [3]:
arr.shape

(3, 4, 3)

![](imgs/1.png)
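To make the "left to right" reading concrete, here is a minimal sketch that indexes the same 3 by 4 by 3 array one axis at a time. The names `block`, `row`, and `value` are illustrative only and do not come from the article.

```python
import numpy as np

arr = np.arange(36).reshape(3, 4, 3)

# Each index consumes one axis, starting from the leftmost.
block = arr[1]        # second of the 3 outer blocks -> shape (4, 3)
row = arr[1, 2]       # third row inside that block  -> shape (3,)
value = arr[1, 2, 0]  # first element of that row    -> a scalar

print(block.shape)  # (4, 3)
print(row)          # [18 19 20]
print(value)        # 18
```

Each index removes the leftmost remaining dimension, which is why the shape shrinks from (3, 4, 3) to (4, 3) to (3,).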

# What is Vectorization?

Vectorization is a powerful ability within NumPy to express operations as occurring on entire arrays rather than their individual elements.

When looping over an array or any data structure in Python, there’s a lot of overhead involved. Vectorized operations in NumPy delegate the looping internally to highly optimized C and Fortran functions, making for cleaner and faster Python code.
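As a small illustration of the difference (a sketch, not taken from the article), the snippet below squares every element twice: once with an explicit Python loop and once with a single vectorized expression. The variable names are made up for this example.

```python
import numpy as np

values = np.arange(5)  # array([0, 1, 2, 3, 4])

# Element by element: every iteration runs in the Python interpreter.
squared_loop = np.empty_like(values)
for i in range(values.size):
    squared_loop[i] = values[i] ** 2

# Vectorized: one expression over the whole array; the loop runs in compiled code.
squared_vec = values ** 2

print(squared_vec)                                # [ 0  1  4  9 16]
print(np.array_equal(squared_loop, squared_vec))  # True
```

Both versions produce the same result; the vectorized form is shorter and avoids the per-element interpreter overhead.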

In [4]:
np.random.seed(444)

In [6]:
x = np.random.choice([False, True], size=100000)
x

array([ True,  True,  True, ..., False, False, False])

The task is to count the number of “False to True” transitions in the sequence.

In [8]:
import time
# using a for loop
def count_transitions(x) -> int:
    count = 0
    for i, j in zip(x[:-1], x[1:]):
        if j and not i:
            count += 1
    return count

start = time.time()
count = count_transitions(x)
end = time.time()
print(count)
print(end - start)

25025
0.016856670379638672


In [14]:
# Using vectorization with NumPy

start = time.time()
count = np.count_nonzero(x[:-1] < x[1:])
end = time.time()
print(count)
print(end - start)

25025
0.0004584789276123047


In [15]:
from timeit import timeit

setup = 'from __main__ import count_transitions, x; import numpy as np'
num = 1000
t1 = timeit('count_transitions(x)', setup=setup, number=num)
t2 = timeit('np.count_nonzero(x[:-1] < x[1:])', setup=setup, number=num)
print('Speed difference: {:0.1f}x'.format(t1 / t2))

Speed difference: 54.2x
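The vectorized expression works because booleans compare as `False < True`, so `x[:-1] < x[1:]` is True exactly where an element is False and its successor is True. A minimal sketch on a short hand-made array (the names `small`, `prev`, `curr`, and `rising` are illustrative) makes this visible:

```python
import numpy as np

small = np.array([False, True, False, False, True, True, False, True])

prev, curr = small[:-1], small[1:]  # each element paired with its successor
rising = prev < curr                # True only where False is followed by True

print(rising.astype(int))        # [1 0 0 1 0 0 1]
print(np.count_nonzero(rising))  # 3

# An equivalent, more explicit spelling of the same condition.
explicit = np.count_nonzero(~prev & curr)
print(explicit)                  # 3
```

The two slices are views on the same data shifted by one position, so no Python-level pairing with `zip` is needed.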


# Understanding Axes Notation

In NumPy, an axis refers to a single dimension of a multidimensional array:

In [2]:
arr = np.array([[1, 2, 3],
                [10, 20, 30]])

arr.sum(axis=0)

array([11, 22, 33])

In [3]:
arr.sum(axis=1)

array([ 6, 60])

In the documentation for pandas (a library built on top of NumPy), you will often see a parameter described like this:

``` axis : {'index' (0), 'columns' (1)} ```
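One way to read the `axis` argument is as the dimension that gets collapsed by the operation: `axis=0` on a 2-D array sums down the rows, leaving one value per column. The sketch below checks this on a 3-dimensional array by looking only at the resulting shapes; the name `cube` is illustrative.

```python
import numpy as np

cube = np.arange(36).reshape(3, 4, 3)

# Summing along an axis removes that axis from the shape.
print(cube.sum(axis=0).shape)  # (4, 3)
print(cube.sum(axis=1).shape)  # (3, 3)
print(cube.sum(axis=2).shape)  # (3, 4)

# keepdims=True keeps the collapsed axis with length 1.
kept = cube.sum(axis=1, keepdims=True)
print(kept.shape)              # (3, 1, 3)
```

Keeping the collapsed axis with `keepdims=True` is convenient when the result has to line up with the original array in later arithmetic.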

In [5]:
# the number of axes (dimensions) of the array.
arr.ndim

2

In [6]:
# the dimensions of the array.
# for a matrix with n rows and m columns, shape is (n, m).
arr.shape

(2, 3)

In [7]:
# the total number of elements of the array.
arr.size

6
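The three attributes are tied together: `ndim` is the length of `shape`, and `size` is the product of the entries in `shape`. A short check, assuming Python 3.8 or later for `math.prod`:

```python
import math

import numpy as np

arr = np.array([[1, 2, 3],
                [10, 20, 30]])

# ndim counts the axes; shape lists the length of each axis.
print(arr.ndim == len(arr.shape))        # True

# size is the total element count: 2 * 3 = 6.
print(arr.size == math.prod(arr.shape))  # True
```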