# Universal Functions for Computation over Arrays

The last notebook dealt with creating and modifying NumPy arrays. So far, however, these arrays have been little more than containers for data. What makes NumPy so important?

Universal functions (ufuncs) allow computation over NumPy arrays to be *vectorized*: a single operation is applied to the entire array at once, rather than element by element in a Python loop. This is a different idea from vectorization in linear algebra, where the vec operator stacks the columns of a matrix into a single column vector.

## Where Python fails and NumPy takes over

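Python's default operations are slow for some tasks, mainly loops. Each iteration of a Python loop is executed by the interpreter, which must check the type of the current element and look up the matching operation before computing anything. That overhead is paid again for every element of an array or list.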

## Introduction to UFuncs

NumPy addresses this with *vectorized* operations. Instead of looping in Python, you apply an operation to the whole array, and NumPy pushes the loop into the compiled layer that underlies it.
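To make the difference concrete, here is a minimal sketch that computes reciprocals once with a Python-level loop and once with a vectorized expression. The helper `compute_reciprocals_loop` and the array size are illustrative choices, not part of NumPy. Timings depend on your machine and NumPy version, so no expected numbers are shown.

```python
import time

import numpy as np


def compute_reciprocals_loop(values):
    """Hypothetical helper: compute 1/v for each element with a Python loop."""
    output = np.empty(len(values))
    for i in range(len(values)):
        # The interpreter checks types and dispatches the division on every pass
        output[i] = 1.0 / values[i]
    return output


values = np.arange(1, 200_001)

start = time.perf_counter()
loop_result = compute_reciprocals_loop(values)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
vectorized_result = 1.0 / values  # the loop runs inside NumPy's compiled code
vectorized_seconds = time.perf_counter() - start

print(f"Python loop: {loop_seconds:.4f} s")
print(f"Vectorized:  {vectorized_seconds:.4f} s")
print("Same result:", np.allclose(loop_result, vectorized_result))
```

Both versions produce the same values; only where the loop runs differs.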

### Array arithmetic

NumPy's arithmetic ufuncs are used through Python's native arithmetic operators: each operator, such as `+`, is shorthand for a ufunc, such as `np.add`.

In [2]:
import numpy as np

In [3]:
x = np.arange(4)
print("x     =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2)  # floor division

x     = [0 1 2 3]
x + 5 = [5 6 7 8]
x - 5 = [-5 -4 -3 -2]
x * 2 = [0 2 4 6]
x / 2 = [ 0.   0.5  1.   1.5]
x // 2 = [0 0 1 1]
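The sketch below checks that each operator used above gives the same result as calling its ufunc directly. The `pairs` dictionary is only a convenient way to list the comparisons; the operator-to-ufunc mapping itself (`+` to `np.add`, `//` to `np.floor_divide`, and so on) is standard NumPy.

```python
import numpy as np

x = np.arange(4)

# Each arithmetic operator is shorthand for a NumPy ufunc
pairs = {
    "x + 5":  (x + 5,  np.add(x, 5)),
    "x - 5":  (x - 5,  np.subtract(x, 5)),
    "-x":     (-x,     np.negative(x)),
    "x * 2":  (x * 2,  np.multiply(x, 2)),
    "x / 2":  (x / 2,  np.divide(x, 2)),
    "x // 2": (x // 2, np.floor_divide(x, 2)),
    "x ** 2": (x ** 2, np.power(x, 2)),
    "x % 2":  (x % 2,  np.mod(x, 2)),
}
for expr, (via_operator, via_ufunc) in pairs.items():
    print(f"{expr:7} matches ufunc: {np.array_equal(via_operator, via_ufunc)}")

# Operators can be combined; normal precedence applies (** binds tighter than unary -)
print(-(0.5 * x + 1) ** 2)
```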


### Absolute Value 
Python's built-in `abs` function works directly on NumPy arrays:

In [4]:
x = np.array([-2, -7, 5, -8])
abs(x)

array([2, 7, 5, 8])

Behind the scenes, `abs` calls the NumPy ufunc `np.absolute`, which is also available under the alias `np.abs`:

In [5]:
np.absolute(x)

array([2, 7, 5, 8])
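`np.absolute` also accepts complex input, in which case it returns the magnitude of each number. A small sketch with illustrative values:

```python
import numpy as np

z = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])

# For complex input, np.abs returns the magnitude sqrt(re**2 + im**2)
magnitudes = np.abs(z)
print(magnitudes)  # magnitudes 5, 5, 2, 1 as floats
```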

### Trigonometric functions

Some of the most useful functions for data scientists are the trigonometric functions.

If we define an array of angles:

In [8]:
theta = np.linspace(0, np.pi, 4)
theta

array([ 0.        ,  1.04719755,  2.0943951 ,  3.14159265])

We can compute trigonometric functions over these values. Remember that each of these operations is vectorized.

In [12]:
print("sin(theta): ", np.sin(theta))
print("cos(theta): ", np.cos(theta))
print("tan(theta): ", np.tan(theta))


sin(theta):  [  0.00000000e+00   8.66025404e-01   8.66025404e-01   1.22464680e-16]
cos(theta):  [ 1.   0.5 -0.5 -1. ]
tan(theta):  [  0.00000000e+00   1.73205081e+00  -1.73205081e+00  -1.22464680e-16]
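Notice that `sin(pi)` came out as about `1.2e-16` rather than exactly 0. That is floating-point rounding, not a bug. The sketch below compares against the exact values with `np.isclose` (default tolerances) and also shows that the inverse trigonometric ufuncs are vectorized the same way; the input list `x` is illustrative.

```python
import numpy as np

theta = np.linspace(0, np.pi, 4)

# Rounding leaves sin(pi) near 1.2e-16, so compare with a tolerance instead of ==
expected_sin = np.array([0.0, np.sqrt(3) / 2, np.sqrt(3) / 2, 0.0])
print("sin(theta) close to expected:", np.isclose(np.sin(theta), expected_sin))

# The inverse trigonometric ufuncs work element-wise as well
x = [-1, 0, 1]
print("arcsin(x) =", np.arcsin(x))
print("arccos(x) =", np.arccos(x))
print("arctan(x) =", np.arctan(x))
```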


### Exponential and Logarithmic functions
These are central to mathematical models of growth and decay.

In [13]:
x = [1, 2, 3]
print("x     =", x)
print("e^x   =", np.exp(x))
print("2^x   =", np.exp2(x))
print("3^x   =", np.power(3, x))

x     = [1, 2, 3]
e^x   = [  2.71828183   7.3890561   20.08553692]
2^x   = [ 2.  4.  8.]
3^x   = [ 3  9 27]


In [14]:
x = [1, 2, 4, 10]
print("x        =", x)
print("ln(x)    =", np.log(x))
print("log2(x)  =", np.log2(x))
print("log10(x) =", np.log10(x))

x        = [1, 2, 4, 10]
ln(x)    = [ 0.          0.69314718  1.38629436  2.30258509]
log2(x)  = [ 0.          1.          2.          3.32192809]
log10(x) = [ 0.          0.30103     0.60205999  1.        ]
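The exponential and logarithm ufuncs are inverses of each other, and bases without a dedicated function can be handled with the change-of-base rule $\log_b x = \ln x / \ln b$. A minimal sketch, using base 3 to match the `np.power(3, x)` example above:

```python
import numpy as np

x = np.array([1, 2, 4, 10])

# exp and log undo each other, up to floating-point rounding
print("log(exp(x)) =", np.log(np.exp(x)))

# There is no log3 ufunc, so apply the change-of-base rule: log_3(x) = ln(x) / ln(3)
log3_x = np.log(x) / np.log(3)
print("log3(x)     =", log3_x)
print("3^log3(x)   =", np.power(3, log3_x))  # recovers x
```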


### Aggregates and Accumulation

Every binary ufunc (one that takes two inputs, such as `np.add` or `np.multiply`) has a `reduce` method, which repeatedly applies the operation to the elements of an array until only a single result remains.

In [15]:
x = np.arange(1,10)
np.add.reduce(x)

45

In [16]:
np.multiply.reduce(x)

362880
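`reduce` behaves like folding the operator over the array from left to right, and for addition and multiplication it matches `np.sum` and `np.prod`. On a multidimensional array it works along one axis (axis 0 by default). A short sketch; the 2 x 3 array `M` is illustrative:

```python
import functools
import operator

import numpy as np

x = np.arange(1, 10)

# Folding the Python operator over the elements gives the same totals
print(np.add.reduce(x) == functools.reduce(operator.add, x) == np.sum(x))
print(np.multiply.reduce(x) == functools.reduce(operator.mul, x) == np.prod(x))

# On a 2-D array, reduce collapses one axis
M = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
print(np.add.reduce(M, axis=0))  # column sums: 3, 5, 7
print(np.add.reduce(M, axis=1))  # row sums: 3, 12
```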

The `accumulate` method works the same way but keeps every intermediate result, so it returns an array of the same length as its input.

In [19]:
np.add.accumulate(x)

array([ 1,  3,  6, 10, 15, 21, 28, 36, 45])

In [20]:
np.multiply.accumulate(x)

array([     1,      2,      6,     24,    120,    720,   5040,  40320,
       362880])
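To show what `accumulate` does step by step, here is a plain-Python sketch. `accumulate_loop` is a hypothetical helper written for illustration, not a NumPy function. The sketch also confirms that the two accumulations above are available as `np.cumsum` and `np.cumprod`.

```python
import numpy as np

x = np.arange(1, 10)


def accumulate_loop(values, op):
    """Hypothetical helper: a plain-Python version of ufunc.accumulate."""
    results = [values[0]]
    for v in values[1:]:
        # Combine the running result with the next element and keep every step
        results.append(op(results[-1], v))
    return np.array(results)


print(np.array_equal(accumulate_loop(x, np.add), np.add.accumulate(x)))
print(np.array_equal(np.add.accumulate(x), np.cumsum(x)))
print(np.array_equal(np.multiply.accumulate(x), np.cumprod(x)))
```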

# Aggregations: Min, Max, and Everything In Between

The ultimate goal of data science is to extract meaningful information from data. When faced with a large amount of data, a good first step is to compute *summary statistics*, such as:

  1. Mean
  2. Standard deviation
  3. Min/max
  4. Median
  5. Sum/product
  6. Quantiles

As the previous section showed, NumPy has fast built-in aggregation functions for working on arrays big and small.
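Each statistic listed above maps onto a NumPy function. A minimal sketch on a small illustrative array. Note that `np.std` computes the population standard deviation by default (`ddof=0`); pass `ddof=1` for the sample version.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8])

print("mean       =", np.mean(data))               # 4.5
print("std        =", np.std(data))                # population std, ddof=0
print("sample std =", np.std(data, ddof=1))        # divides by n - 1 instead of n
print("min/max    =", np.min(data), np.max(data))  # 1 8
print("median     =", np.median(data))             # 4.5
print("sum/prod   =", np.sum(data), np.prod(data)) # 36 40320
print("quartiles  =", np.percentile(data, [25, 75]))
```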


In [43]:
L = np.random.random(100)
sum(L)  # using Python's built-in sum function

44.982527104347589

In [44]:
np.sum(L)  # using NumPy's sum function

44.982527104347596
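The two totals above differ only in the last digits. Python's `sum` adds the values one at a time, while the NumPy documentation notes that `np.sum` often uses partial pairwise summation, so the rounding errors accumulate differently. The sketch below compares the two on a larger array; the seed and size are arbitrary, and timings vary by machine, so none are quoted.

```python
import time

import numpy as np

rng = np.random.default_rng(0)
big = rng.random(1_000_000)

start = time.perf_counter()
py_total = sum(big)     # built-in sum: one Python-level addition per element
py_seconds = time.perf_counter() - start

start = time.perf_counter()
np_total = np.sum(big)  # NumPy's compiled aggregation
np_seconds = time.perf_counter() - start

print(f"built-in sum: {py_total:.6f} in {py_seconds:.4f} s")
print(f"np.sum:       {np_total:.6f} in {np_seconds:.4f} s")
# Different addition order means the totals agree only up to rounding
print("Equal within tolerance:", np.isclose(py_total, np_total))
```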

## More aggregate functions
The following table lists useful aggregation functions in NumPy. Most have a NaN-safe version that ignores missing values (`NaN`) when computing the result.

|Function Name      |   NaN-safe Version  | Description                                   |
|-------------------|---------------------|-----------------------------------------------|
| ``np.sum``        | ``np.nansum``       | Compute sum of elements                       |
| ``np.prod``       | ``np.nanprod``      | Compute product of elements                   |
| ``np.mean``       | ``np.nanmean``      | Compute mean of elements                      |
| ``np.std``        | ``np.nanstd``       | Compute standard deviation                    |
| ``np.var``        | ``np.nanvar``       | Compute variance                              |
| ``np.min``        | ``np.nanmin``       | Find minimum value                            |
| ``np.max``        | ``np.nanmax``       | Find maximum value                            |
| ``np.argmin``     | ``np.nanargmin``    | Find index of minimum value                   |
| ``np.argmax``     | ``np.nanargmax``    | Find index of maximum value                   |
| ``np.median``     | ``np.nanmedian``    | Compute median of elements                    |
| ``np.percentile`` | ``np.nanpercentile``| Compute rank-based statistics of elements     |
| ``np.any``        | N/A                 | Evaluate whether any elements are true        |
| ``np.all``        | N/A                 | Evaluate whether all elements are true        |
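A short sketch of why the NaN-safe versions matter: a single `NaN` makes the plain aggregate return `NaN`, while the `nan*` counterpart skips it. The array `data` is illustrative; the same functions also accept an `axis` argument.

```python
import numpy as np

data = np.array([[1.0, 2.0, np.nan],
                 [4.0, 5.0, 6.0]])

print("np.sum:   ", np.sum(data))     # nan: one missing value spoils the total
print("np.nansum:", np.nansum(data))  # 18.0: NaN entries are ignored
print("column means:", np.nanmean(data, axis=0))          # 2.5, 3.5, 6.0
print("index of max (flattened):", np.nanargmax(data))    # 5, the position of 6.0
print("any NaN?", np.any(np.isnan(data)))
print("all of row 1 positive?", np.all(data[1] > 0))
```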
