# Data Processing with NumPy Arrays
Many operations on __NumPy__ arrays let you replace explicit loops with array expressions, a practice called _vectorization_. Vectorized array operations are often one or two orders of magnitude faster than their pure Python equivalents.
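
As a minimal sketch of what vectorization means in practice, the snippet below computes the same quantity once with an explicit Python loop and once with a single array expression. The array `values` and the formula are made up for illustration.

```python
import numpy as np

values = np.arange(10, dtype=np.float64)

# Pure Python: an explicit loop over the elements.
squares_loop = [v * v + 1.0 for v in values]

# Vectorized: one array expression; the loop runs inside NumPy.
squares_vec = values * values + 1.0

# Both approaches produce the same numbers.
assert np.allclose(squares_loop, squares_vec)
```

The vectorized form is shorter and moves the per-element work out of the Python interpreter, which is where the speedup usually comes from.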

## Expressing Conditional Logic
The `numpy.where` function is a vectorized version of the ternary expression `x if condition else y`.
The example below uses three arrays: for each position, the result takes the value from `xarr` when the corresponding entry of `cond` is `True`, and the value from `yarr` otherwise. First, here is a pure Python implementation.

In [2]:
import numpy as np
xarr = np.array([1, -2, 9, 10, 11])
yarr = np.array([-90, 80, -8, -3, -4])
cond = np.array([True, False, False, True, False])

In [4]:
%timeit result = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)]

The slowest run took 5.57 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.81 µs per loop


In [7]:
%timeit  np.where(cond, xarr, yarr)

The slowest run took 42.26 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 748 ns per loop


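Even on this five-element input, __`numpy.where`__ is about 2.4 times faster than the pure Python list comprehension (748 ns versus 1.81 µs per loop), and the gap typically widens as the arrays grow. The second and third arguments do not need to be arrays; either or both can be scalars.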
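
The scalar form can be sketched as follows. The array `arr` and the replacement values `2` and `-2` are arbitrary choices for illustration.

```python
import numpy as np

arr = np.array([[1.5, -0.3],
                [-2.0, 0.7]])

# Both replacement values are scalars: positives become 2, everything else -2.
signs = np.where(arr > 0, 2, -2)

# Mixing a scalar with an array: positives become 2, other entries keep their value.
clipped = np.where(arr > 0, 2, arr)
```

The scalars are broadcast against the condition array, so the result always has the shape of the condition.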

## Mathematical and Statistical Methods
__NumPy__ provides a set of mathematical functions that compute statistics over an entire array or over the data along one axis. Aggregations such as `mean` can be called either as an array method or as a top-level function.

In [8]:
arr = np.random.randn(5,4)
arr.mean()

-0.012223132836414241

In [10]:
np.mean(arr)

-0.012223132836414241

In [11]:
arr.mean(axis=1)

array([ 0.5446143 , -0.79508737,  0.50499445, -0.07389175, -0.24174529])
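
To make the meaning of `axis` easier to see, here is a small sketch on a deterministic array instead of random data. The array built with `np.arange` is an assumption made for illustration only.

```python
import numpy as np

# A 5x4 array holding 0..19, so the results are easy to check by hand.
arr = np.arange(20, dtype=np.float64).reshape(5, 4)

col_means = arr.mean(axis=0)  # collapses the rows: one mean per column, shape (4,)
row_means = arr.mean(axis=1)  # collapses the columns: one mean per row, shape (5,)
row_sums = arr.sum(axis=1)    # the same axis rule applies to sum

print(col_means)  # [ 8.  9. 10. 11.]
print(row_means)  # [ 1.5  5.5  9.5 13.5 17.5]
```

The axis you pass is the one that disappears from the result, which is why `axis=1` on a 5x4 array yields five values.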

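Methods such as __`cumsum`__ and __`cumprod`__ do not aggregate; they return an array of the intermediate results. Called without an `axis` argument, they operate on the flattened array, which is why the output below has 20 elements.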

In [12]:
arr.cumsum()

array([ 0.75577814, -0.16372114,  0.20294637,  2.17845721,  1.01606677,
        1.48079512,  0.56117763, -1.00189228, -1.09542866, -0.43998607,
        0.01206096,  1.01808554,  1.09788517,  0.63274708,  1.00497615,
        0.72251852,  1.07612614,  0.83398248,  0.3856547 , -0.24446266])

## Methods for Boolean Arrays
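Boolean values are coerced to 1 (`True`) and 0 (`False`) in arithmetic, so __`sum`__ can be used to count the `True` values in a Boolean array. The __`any`__ and __`all`__ methods are also useful with Boolean arrays: `any` tests whether at least one value is `True`, and `all` tests whether every value is `True`.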

In [14]:
arr = np.random.randn(100)
(arr < 0).sum()

44

In [15]:
test = np.array([True, False, True, True, False, False])

In [16]:
test.any()

True

In [17]:
test.all()

False
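
These methods take an `axis` argument as well, which allows per-row or per-column checks. A short sketch, using a hand-written array `data` that is not part of the original example:

```python
import numpy as np

data = np.array([[1, -2, 3],
                 [4, 5, 6],
                 [-7, -8, -9]])

negative = data < 0  # Boolean mask with the same shape as data

total_negative = negative.sum()          # count of negatives in the whole array
negative_per_row = negative.sum(axis=1)  # count of negatives in each row
rows_with_any = negative.any(axis=1)     # does the row contain a negative?
rows_all_negative = negative.all(axis=1) # is every entry in the row negative?

print(total_negative)     # 4
print(negative_per_row)   # [1 0 3]
print(rows_with_any)      # [ True False  True]
print(rows_all_negative)  # [False False  True]
```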

## Unique and Set Logic
__`np.unique`__ returns the sorted unique values in an array.

In [18]:
example = np.array([1, 0, 3, -4, 3, 2])
np.unique(example)

array([-4,  0,  1,  2,  3])

Another function, __`np.in1d`__, tests whether each value of one array is present in another and returns a Boolean array. NumPy 2.0 deprecates `np.in1d` in favor of `np.isin`, which performs the same test.

In [19]:
np.in1d(example, np.arange(4))

array([ True,  True,  True, False,  True,  True], dtype=bool)
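
The same membership test can be written with `np.isin`, and NumPy offers a few other set routines along the same lines. The sketch below reuses the `example` array; the second operands (`[3, 2, 7]`, `[7]`, `[0, 1]`) are arbitrary values picked for illustration.

```python
import numpy as np

example = np.array([1, 0, 3, -4, 3, 2])

# Membership test, element by element.
mask = np.isin(example, np.arange(4))

# Set operations return sorted, unique values.
common = np.intersect1d(example, [3, 2, 7])      # values present in both
combined = np.union1d(example, [7])              # values present in either
only_in_example = np.setdiff1d(example, [0, 1])  # values in example but not in [0, 1]

print(mask)             # [ True  True  True False  True  True]
print(common)           # [2 3]
print(combined)         # [-4  0  1  2  3  7]
print(only_in_example)  # [-4  2  3]
```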

## Linear Algebra
In __NumPy__, linear algebra functions such as the matrix inverse and the QR decomposition can be found in __`numpy.linalg`__.

In [21]:
from numpy.linalg import inv, qr
X = np.random.randn(5, 5)

In [22]:
inv(X)

array([[ 2.26583136, -2.22816814, -0.25767718,  0.80850701,  0.06415524],
       [ 0.61295148, -1.00039421, -0.4641682 ,  0.71008939, -0.35260108],
       [ 0.35574107, -0.22210267,  0.90610954, -0.36452645, -0.62535105],
       [ 0.78869446, -0.74566777,  0.18086962,  0.53819744,  0.06918841],
       [ 0.08194525, -0.3946557 ,  0.25398049, -0.04132269, -0.05536487]])

In [23]:
q, r = qr(X)

In [24]:
q

array([[-0.49071048, -0.10990025, -0.48409234, -0.69556029,  0.17022239],
       [ 0.10239031, -0.26527369, -0.48542508,  0.10689343, -0.81980631],
       [ 0.40236944, -0.53665513, -0.45807092,  0.24884571,  0.52758596],
       [ 0.74735945,  0.07540896,  0.11969924, -0.64348365, -0.08583836],
       [-0.16814319, -0.78985224,  0.55304631, -0.16962986, -0.11500776]])
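
Because the matrices above are random, the printed numbers are hard to interpret on their own. One way to sanity-check the results is to test the defining properties of the inverse and of the QR factors. This is a sketch; the seed and the use of `np.random.default_rng` are choices made here for reproducibility, not part of the original session.

```python
import numpy as np
from numpy.linalg import inv, qr

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
X = rng.standard_normal((5, 5))

# A matrix times its inverse should give the identity, up to rounding error.
assert np.allclose(X @ inv(X), np.eye(5))

q, r = qr(X)

# q and r multiply back to X.
assert np.allclose(q @ r, X)
# q has orthonormal columns.
assert np.allclose(q.T @ q, np.eye(5))
# r is upper triangular.
assert np.allclose(r, np.triu(r))
```

The comparisons use `np.allclose` rather than `==` because floating-point arithmetic leaves small rounding errors in the products.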