# Look Ma, No for Loops: Array Programming with NumPy

Python has several libraries that support statistics in data analysis:

1. statistics is a built-in library for descriptive statistics.
2. NumPy is used for numerical computing and is optimized for working with single- and multi-dimensional arrays. Its core type is the ndarray, and it contains many [routines](https://numpy.org/doc/stable/reference/routines.statistics.html) for statistical analysis.
3. SciPy is used for scientific computing and is based on NumPy. It contains the scipy.stats module for statistical analysis.
4. pandas is used for numerical computing and is based on NumPy. It uses Series objects for 1D data and DataFrame objects for 2D data.
5. Matplotlib is used for data visualization. It works well with NumPy, SciPy, and pandas.
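
As a small illustration of how the first two libraries in the list overlap, the sketch below computes the same descriptive statistics with statistics and with NumPy. The data values are made up for the example. One detail worth knowing: statistics.stdev divides by n - 1, while np.std divides by n unless ddof=1 is passed.

```python
import statistics
import numpy as np

# Hypothetical sample data, chosen so the results are round numbers
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(statistics.mean(data))     # 5.0
print(np.mean(data))             # 5.0

# Population standard deviation (divides by n) in both libraries
print(statistics.pstdev(data))   # 2.0
print(np.std(data))              # 2.0

# Sample standard deviation (divides by n - 1) in both libraries
sample_sd_stats = statistics.stdev(data)
sample_sd_numpy = np.std(data, ddof=1)
print(np.isclose(sample_sd_stats, sample_sd_numpy))  # True
```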

The code below focuses on array programming with NumPy. It follows this [tutorial](https://realpython.com/numpy-array-programming/) from realpython.com.

## Intro to NumPy Arrays

There are three concepts that lend NumPy its power:
1. Vectorization
2. Broadcasting
3. Indexing
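
A minimal sketch of the three concepts, one line each. The array names and values are hypothetical and only serve to make the idea concrete.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0, 40.0])

# Vectorization: one expression operates on every element at once.
half_price = prices * 0.5            # [ 5., 10., 15., 20.]

# Broadcasting: arrays of different shapes are combined by stretching
# the smaller one. A (2, 1) column times a (4,) row gives a (2, 4) grid.
quantities = np.array([[1], [2]])
totals = quantities * prices
print(totals.shape)                  # (2, 4)

# Indexing: a boolean mask selects elements without a loop.
expensive = prices[prices > 25]
print(expensive.tolist())            # [30.0, 40.0]
```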

## Getting into Shape: Intro to NumPy Arrays

The fundamental object of NumPy is the ndarray, typically created with functions such as numpy.array.

Below is code to generate a 3-dimensional array with 36 elements.

In [62]:
from statistics import mean
import numpy as np
#from timeit import timeit

# a 3 by 4 by 3 array
arr = np.arange(36).reshape(3, 4, 3)  # a container with three 4x3 grids
arr

array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11]],

       [[12, 13, 14],
        [15, 16, 17],
        [18, 19, 20],
        [21, 22, 23]],

       [[24, 25, 26],
        [27, 28, 29],
        [30, 31, 32],
        [33, 34, 35]]])

In [63]:
# the shape 
arr.shape

(3, 4, 3)
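
To see how the shape relates to the data, the sketch below rebuilds the same array and indexes into it one axis at a time. Each index removes the leading axis from the shape.

```python
import numpy as np

arr = np.arange(36).reshape(3, 4, 3)

print(arr.ndim)          # 3 axes
print(arr.size)          # 36 elements in total
print(arr[0].shape)      # (4, 3): the first grid
print(arr[0, 1].shape)   # (3,): the second row of the first grid
print(arr[2, 3, 1])      # 34: last grid, last row, middle column

# Reducing along an axis removes that axis from the shape.
print(arr.sum(axis=0).shape)   # (4, 3)
```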

## What is Vectorization

Vectorization means expressing operations as occurring on entire arrays rather than on their individual elements. It replaces explicit loops in your code; the looping still happens, but inside NumPy's compiled routines.
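
As a quick side-by-side, the sketch below converts a few temperatures with an explicit loop and then with a single vectorized expression. The temperature values are arbitrary and the variable names are made up for this example.

```python
import numpy as np

celsius = np.array([0.0, 21.5, 37.0, 100.0])

# Loop version: convert one element at a time.
fahrenheit_loop = np.empty_like(celsius)
for i in range(len(celsius)):
    fahrenheit_loop[i] = celsius[i] * 9 / 5 + 32

# Vectorized version: the same arithmetic applied to the whole array.
fahrenheit_vec = celsius * 9 / 5 + 32

print(np.allclose(fahrenheit_loop, fahrenheit_vec))  # True
```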

### Counting: Easy as 1, 2, 3
Example: Suppose you have a 1-D vector of boolean values and you want to count the number of False-to-True transitions. Using a for loop, you could write a function:

In [64]:
# First get random data
np.random.seed(444)
x = np.random.choice([False, True], size=100000)


In [81]:
def count_transitions(x):
    """Count the False-to-True transitions in a boolean sequence."""
    counter = 0

    for i in range(len(x) - 1):
        # False < True, so this holds only when False is followed by True
        if x[i] < x[i + 1]:
            counter += 1
    return counter

c = count_transitions(x)
c

24984

In vectorized form this could be written as shown in the next cell.

In [82]:
np.count_nonzero(x[:-1] < x[1:])

24984
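
To see why the one-liner works, it can help to run it on a tiny array. The sketch below uses a hypothetical six-element vector: the two slices line up each element with its successor, and because False < True, the comparison is True only where False is followed by True.

```python
import numpy as np

flags = np.array([False, True, True, False, True, False])

left = flags[:-1]    # every element except the last
right = flags[1:]    # every element except the first

# left[i] < right[i] is True only where False is followed by True.
rising = left < right
print(rising)                    # [ True False False  True False]
print(np.count_nonzero(rising))  # 2

# Flipping the comparison counts True-to-False transitions instead.
falling = left > right
print(np.count_nonzero(falling))  # 2
```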

When compared to the count_transitions function, the vectorized version is much quicker. Note that %timeit is an IPython magic command. The -o option makes it return a TimeitResult object, so the result can be stored in a variable.

In [83]:
t1 = %timeit -o count_transitions(x)

56.7 ms ± 111 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [84]:
t2 = %timeit -o np.count_nonzero(x[:-1] < x[1:])

6.62 µs ± 100 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
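
Outside IPython, %timeit is not available. A rough equivalent can be sketched with the standard-library timeit module. The function name count_transitions_loop, the smaller array size, and the loop and repeat counts are all choices made for this example, so the numbers will not match the cells above.

```python
import timeit
import numpy as np

# Smaller random input than above so the sketch runs quickly
rng = np.random.default_rng(444)
x_small = rng.choice([False, True], size=10_000)

def count_transitions_loop(values):
    """Count False-to-True transitions with an explicit loop."""
    count = 0
    for i in range(len(values) - 1):
        if values[i] < values[i + 1]:
            count += 1
    return count

def count_transitions_vec(values):
    """Count False-to-True transitions with array operations."""
    return np.count_nonzero(values[:-1] < values[1:])

loops = 5
# timeit.repeat returns the total time of each run; divide by loops
# to get a per-call time, and take the fastest run as the best time.
loop_best = min(timeit.repeat(lambda: count_transitions_loop(x_small),
                              number=loops, repeat=3)) / loops
vec_best = min(timeit.repeat(lambda: count_transitions_vec(x_small),
                             number=loops, repeat=3)) / loops

print(f"loop: {loop_best:.2e} s, vectorized: {vec_best:.2e} s")
print(f"ratio: {loop_best / vec_best:.0f}x")
```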


In [93]:
# t1.best, in seconds. Dividing it by t2.best shows that the loop version's
# best time is over 8,800 times slower than the vectorized version's best time.
t1.best

0.05652902000001632

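In [97]:
# all_runs holds the total time of each run, so divide by the loops per run
# to get the mean per-loop time of each version before taking the ratio.
# With the means reported above (56.7 ms / 6.62 µs) this is roughly 8,600.
(mean(t1.all_runs) / t1.loops) / (mean(t2.all_runs) / t2.loops)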