
# Computation on NumPy Arrays: Universal Functions

Why is NumPy so important in the Python data science world?

It provides an easy and flexible interface to optimized computation with arrays of data.

Computation on NumPy arrays can be very fast, or it can be very slow.

The key to making it fast is to use
`vectorized` operations, generally implemented through NumPy's `universal functions` (ufuncs).

# The Slowness of Loops

Python's default implementation (known as CPython) does some operations very slowly.

This is in part due to the `dynamic`, `interpreted` nature of the language:
the fact that types are `flexible`, so that sequences of operations `cannot be compiled down` to `efficient machine code` as in languages `like C` and Fortran.

There have been various attempts to address this weakness:
1. The PyPy project, a just-in-time compiled implementation of Python; http://pypy.org/
2. The Cython project, which converts Python code to compilable C code; http://cython.org/
3. The Numba project, which converts snippets of Python code to fast LLVM bytecode; http://numba.pydata.org/




In [3]:
import numpy as np
np.random.seed(0)


def compute_reciprocals(values):
  output = np.empty(len(values))
  for i in range(len(values)):
    output[i] = 1.0 / values[i]
  return output

values = np.random.randint(1, 10, size=5)
print(values)
compute_reciprocals(values)

[6 1 4 4 8]


array([0.16666667, 1.        , 0.25      , 0.25      , 0.125     ])

In [4]:
big_array = np.random.randint(1, 100, size=1000000)
%timeit compute_reciprocals(big_array)

3.1 s ± 40.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


The code above runs slowly because of the
`type-checking` and `function dispatches` that CPython must do at each cycle of the loop.

Each time the reciprocal is computed, Python first examines the object's type and does a dynamic lookup of the correct function to use for that type.

If we were working in compiled code instead, this type specification would be known before the code executes and the result could be computed much more efficiently.
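
The per-element type lookup described above can be made visible with a small sketch. The variable names here (`element_types`, `mixed`, `mixed_types`) are illustrative and not part of the original notebook; the point is only that a NumPy array carries one dtype for all of its elements, while a Python loop sees one object at a time and has to inspect each one.

```python
import numpy as np

values = np.array([6, 1, 4, 4, 8])

# The array records a single dtype for all of its elements,
# known before any loop runs.
print(values.dtype)

# Iterating in Python hands back a separate scalar object per element;
# the interpreter has to look at its type on every pass through the loop.
element_types = {type(v).__name__ for v in values}
print(element_types)

# A plain Python list makes no such promise: each item may have its own type.
mixed = [6, 1.0, "4"]
mixed_types = [type(item).__name__ for item in mixed]
print(mixed_types)
```

Because the list may hold anything, the interpreter cannot skip the check, which is roughly the overhead the timing above is paying for.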

# Introducing UFuncs

`Vectorized` operation:
Pushing the loop into the compiled layer that underlies NumPy, leading to much faster execution.

In [2]:
import numpy as np
np.random.seed(0)


def compute_reciprocals(values):
  output = np.empty(len(values))
  for i in range(len(values)):
    output[i] = 1.0 / values[i]
  return output

values = np.random.randint(1, 10, size=5)

print(compute_reciprocals(values))
print(1.0 / values)

[0.16666667 1.         0.25       0.25       0.125     ]
[0.16666667 1.         0.25       0.25       0.125     ]


In [4]:
big_array = np.random.randint(1, 100, size=1000000)
%timeit (1.0 / big_array)

1.08 ms ± 65.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


Vectorized operations in NumPy are implemented via ufuncs, whose main purpose is to quickly execute repeated operations on values in NumPy arrays. Ufuncs are extremely flexible: before, we saw an operation between a scalar and an array, but we can also operate between two arrays:


In [2]:
import numpy as np
np.arange(5) / np.arange(1, 6)

array([0.        , 0.5       , 0.66666667, 0.75      , 0.8       ])

Ufunc operations are not limited to one-dimensional arrays; they can also act on multi-dimensional arrays:


In [3]:
import numpy as np
x = np.arange(9).reshape((3, 3))
2 ** x

array([[  1,   2,   4],
       [  8,  16,  32],
       [ 64, 128, 256]])

Computations using vectorization through `ufuncs` are nearly always more efficient than their counterparts implemented using Python loops, especially as the arrays grow in size.
Any time you see such a loop in a Python script, you should consider whether it can be replaced with a vectorized expression.
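
As a rough illustration of replacing a loop with a vectorized expression, the sketch below converts temperatures both ways and checks that the results agree. The function names `scale_loop` and `scale_vectorized` and the sample data are made up for this example.

```python
import numpy as np

def scale_loop(values, factor, offset):
    """Element-by-element version: one Python-level iteration per value."""
    out = np.empty(len(values))
    for i in range(len(values)):
        out[i] = values[i] * factor + offset
    return out

def scale_vectorized(values, factor, offset):
    """Same arithmetic, expressed on the whole array at once via ufuncs."""
    return values * factor + offset

celsius = np.array([0.0, 25.0, 100.0])
loop_result = scale_loop(celsius, 1.8, 32.0)
vec_result = scale_vectorized(celsius, 1.8, 32.0)

# Both approaches should produce the same Fahrenheit values.
print(np.allclose(loop_result, vec_result))
```

The vectorized version is shorter as well as faster on large inputs, since the multiplication and addition each run as a single compiled loop.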

# Exploring NumPy's UFuncs
Ufuncs exist in two flavors:
1. `unary ufuncs`, which operate on a single input
2. `binary ufuncs`, which operate on two inputs
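
One way to see the two flavors is to inspect the `nin` attribute that every ufunc object carries, which reports how many inputs it takes. This is a minimal sketch using `np.negative` and `np.add` as representative examples; the variable names are illustrative.

```python
import numpy as np

# Both objects are instances of the same ufunc type.
print(isinstance(np.negative, np.ufunc), isinstance(np.add, np.ufunc))

# nin is the number of input arrays the ufunc accepts.
unary_inputs = np.negative.nin   # 1 for a unary ufunc
binary_inputs = np.add.nin       # 2 for a binary ufunc
print(unary_inputs, binary_inputs)

x = np.arange(4)
negated = np.negative(x)   # one input
doubled = np.add(x, x)     # two inputs
print(negated, doubled)
```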


## Array arithmetic


In [6]:
import numpy as np

x = np.arange(4)
print("x      =", x)
print("x + 5  =", x + 5)
print("x - 5  =", x - 5)
print("x * 2  =", x * 2)
print("x / 2  =", x / 2)
print("x // 2 =", x // 2)
print("-x     =", -x)
print("x ** 2 =", x ** 2)
print("x % 2  =", x % 2)
-(0.5*x + 1) ** 2

x      = [0 1 2 3]
x + 5  = [5 6 7 8]
x - 5  = [-5 -4 -3 -2]
x * 2  = [0 2 4 6]
x / 2  = [0.  0.5 1.  1.5]
x // 2 = [0 0 1 1]
-x     = [ 0 -1 -2 -3]
x ** 2 = [0 1 4 9]
x % 2  = [0 1 0 1]


array([-1.  , -2.25, -4.  , -6.25])

Each of these arithmetic operations is simply a convenient wrapper around a specific function built into NumPy;
for example, the `+` operator is a wrapper for the `add` function:

In [7]:
import numpy as np
x = np.arange(4)
np.add(x, 2)

array([2, 3, 4, 5])

| Operator | Equivalent ufunc | Description |
|---|---|---|
| `+` | `np.add` | Addition (e.g., 1 + 1 = 2) |
| `-` | `np.subtract` | Subtraction (e.g., 3 - 2 = 1) |
| `-` | `np.negative` | Unary negation (e.g., -2) |
| `*` | `np.multiply` | Multiplication (e.g., 2 * 3 = 6) |
| `/` | `np.divide` | Division (e.g., 3 / 2 = 1.5) |
| `//` | `np.floor_divide` | Floor division (e.g., 3 // 2 = 1) |
| `**` | `np.power` | Exponentiation (e.g., 2 ** 3 = 8) |
| `%` | `np.mod` | Modulus/remainder (e.g., 9 % 4 = 1) |
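
The correspondence between operators and ufuncs can be checked directly. The sketch below pairs each operator (taken from Python's standard `operator` module) with its NumPy ufunc and compares the results on a small array; `pairs`, `all_match`, and `negation_matches` are names invented for this example.

```python
import operator
import numpy as np

x = np.arange(1, 5)

# (Python operator, equivalent NumPy ufunc) for each binary operation.
pairs = [
    (operator.add, np.add),
    (operator.sub, np.subtract),
    (operator.mul, np.multiply),
    (operator.truediv, np.divide),
    (operator.floordiv, np.floor_divide),
    (operator.pow, np.power),
    (operator.mod, np.mod),
]

# Applying the operator and calling the ufunc should give identical arrays.
all_match = all(np.array_equal(op(x, 2), ufunc(x, 2)) for op, ufunc in pairs)

# Unary negation is handled separately because it takes a single input.
negation_matches = np.array_equal(-x, np.negative(x))

print(all_match, negation_matches)
```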


## Absolute value

In [8]:
import numpy as np
x = np.array([-2, -1, 0, 1, 2])
abs(x)

array([2, 1, 0, 1, 2])

The corresponding NumPy ufunc is `np.absolute`,
which is also available under the alias `np.abs`:

In [9]:
np.absolute(x)

array([2, 1, 0, 1, 2])

In [10]:
np.abs(x)

array([2, 1, 0, 1, 2])

This ufunc can also handle complex data, in which case the absolute value returns the magnitude:

In [11]:
x = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])
np.abs(x)

array([5., 5., 2., 1.])
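
To connect this with the usual definition, the magnitude of a complex number $a + bi$ is $\sqrt{a^2 + b^2}$. The following sketch computes that by hand from the real and imaginary parts and compares it with `np.abs`; the names `z` and `manual` are illustrative.

```python
import numpy as np

z = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])

# Magnitude from the definition: sqrt(real^2 + imag^2).
manual = np.sqrt(z.real ** 2 + z.imag ** 2)
print(manual)

# np.abs should agree with the hand-computed magnitudes.
print(np.allclose(manual, np.abs(z)))
```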