<div class="alert alert-block alert-info">
<b>

# Python for Data Science Bootcamp
## Lecture 10 - Part 1
    
## Textbook reference: Python Data Science Handbook 
## Chapter 2

Here are the topics for this lecture:

* Computation on Arrays - Ufuncs, Aggregates, Broadcasting

Let's get started...
</b> 
</div>

<!--BOOK_INFORMATION-->
<img align="left" style="padding-right:10px;" src="figures/PDSH-cover-small.png">

*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*

*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*



# Computation on NumPy Arrays: Universal Functions

<div class="alert alert-block alert-info">
<b>
In the next few sections, we will dive into the reasons that NumPy is so important in the Python data science world. Namely, it provides an easy and flexible interface to optimized computation with arrays of data.

Computation on NumPy arrays can be very fast, or it can be very slow.

The key to making it fast is to use *vectorized* operations, generally implemented through NumPy's *universal functions* (ufuncs).

</b> 
</div>

<div class="alert alert-block alert-info">
<b>


## Introducing UFuncs

For many types of operations, NumPy provides a convenient interface into statically typed, compiled routines. This is known as a *vectorized* operation.
This can be accomplished by simply performing an operation on the array, which will then be applied to each element.
This vectorized approach is designed to push the loop into the compiled layer that underlies NumPy, leading to much faster execution.

</b> 
</div>

<div class="alert alert-block alert-info">
<b>

Vectorized operations in NumPy are implemented via *ufuncs*, whose main purpose is to quickly execute repeated operations on values in NumPy arrays.

</b> 
</div>

In [1]:
import numpy as np
np.arange(5) / np.arange(1, 6)

array([0.        , 0.5       , 0.66666667, 0.75      , 0.8       ])

<div class="alert alert-block alert-info">
<b>

Ufunc operations are not limited to one-dimensional arrays; they can also act on multi-dimensional arrays:

</b> 
</div>


In [3]:
x = np.arange(9).reshape((3, 3)) # 9 elements arranged in 3 x 3 matrix
2 ** x # 2 to the x power

array([[  1,   2,   4],
       [  8,  16,  32],
       [ 64, 128, 256]])

<div class="alert alert-block alert-info">
<b>

NOTE: Computations using vectorization through **ufuncs are nearly always more efficient than their counterparts implemented using Python loops**, especially as the arrays grow in size.
Any time you see such a loop in a Python script, you should consider whether it can be replaced with a vectorized expression.

</b> 
</div>
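
As a rough illustration of the note above, the following sketch computes the same reciprocals twice: once with an explicit Python loop and once with a vectorized expression. The helper name `reciprocals_loop` and the array size are assumptions made for this example only.

```python
import numpy as np

def reciprocals_loop(values):
    """Compute 1/value for each element with an explicit Python loop."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

rng = np.random.default_rng(0)           # seeded generator, for repeatability
values = rng.integers(1, 10, size=1000)  # integers from 1 to 9, so no division by zero

loop_result = reciprocals_loop(values)   # the loop runs in the Python interpreter
vectorized_result = 1.0 / values         # the loop runs inside NumPy's compiled code

print(np.allclose(loop_result, vectorized_result))
```

Both approaches give the same values; timing each one (for example with IPython's `%timeit`) on progressively larger arrays is a simple way to see the difference in speed on your own machine.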

<div class="alert alert-block alert-info">
<b>
    
## Exploring NumPy's UFuncs

Ufuncs exist in two flavors: **unary ufuncs**, which operate on a single input, and **binary ufuncs**, which operate on two inputs.
We'll see examples of both these types of functions here.

</b> 
</div>
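
One way to tell the two flavors apart is to inspect a ufunc's `nin` attribute, which records the number of inputs it expects. The short sketch below uses `np.negative` and `np.add` as representative examples.

```python
import numpy as np

# Every ufunc stores the number of inputs it expects in its `nin` attribute.
print(np.negative.nin)  # number of inputs for a unary ufunc
print(np.add.nin)       # number of inputs for a binary ufunc

x = np.arange(4)
negated = np.negative(x)  # unary ufunc applied to one array
summed = np.add(x, x)     # binary ufunc applied to two arrays
print(negated, summed)
```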

<div class="alert alert-block alert-info">
<b>

### Array arithmetic

NumPy's ufuncs feel very natural to use because they make use of Python's native arithmetic operators.
The standard addition, subtraction, multiplication, and division can all be used:

</b> 
</div>

In [4]:
x = np.arange(4) # Create array of four elements
print("x     =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2)  # floor division

x     = [0 1 2 3]
x + 5 = [5 6 7 8]
x - 5 = [-5 -4 -3 -2]
x * 2 = [0 2 4 6]
x / 2 = [0.  0.5 1.  1.5]
x // 2 = [0 0 1 1]


There is also a unary ufunc for negation, a ``**`` operator for exponentiation, and a ``%`` operator for modulus:

In [4]:
print("-x     = ", -x)
print("x ** 2 = ", x ** 2)
print("x % 2  = ", x % 2)

-x     =  [ 0 -1 -2 -3]
x ** 2 =  [0 1 4 9]
x % 2  =  [0 1 0 1]


<div class="alert alert-block alert-info">
<b>

In addition, these can be strung together however you wish, and the **standard order of operations is respected**:

</b> 
</div>    

In [5]:
-(0.5*x + 1) ** 2

array([-1.  , -2.25, -4.  , -6.25])
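
To make the order of operations explicit, the expression above can be spelled out as nested ufunc calls. This is only a sketch for comparison; the operator form is the one you would normally write.

```python
import numpy as np

x = np.arange(4)

with_operators = -(0.5 * x + 1) ** 2

# ** binds more tightly than unary minus, so the negation is applied last
with_ufuncs = np.negative(np.power(np.add(np.multiply(0.5, x), 1), 2))

print(np.array_equal(with_operators, with_ufuncs))
```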

Each of these arithmetic operations is simply a **convenient wrapper** around a specific function built into NumPy; for example, the ``+`` operator is a wrapper for the ``add`` function:

In [5]:
print(x)
np.add(x, 2) # Add two to each element of x

[0 1 2 3]


array([2, 3, 4, 5])

The following table lists the arithmetic operators implemented in NumPy:

| Operator      | Equivalent ufunc    | Description                           |
|---------------|---------------------|---------------------------------------|
|``+``          |``np.add``           |Addition (e.g., ``1 + 1 = 2``)         |
|``-``          |``np.subtract``      |Subtraction (e.g., ``3 - 2 = 1``)      |
|``-``          |``np.negative``      |Unary negation (e.g., ``-2``)          |
|``*``          |``np.multiply``      |Multiplication (e.g., ``2 * 3 = 6``)   |
|``/``          |``np.divide``        |Division (e.g., ``3 / 2 = 1.5``)       |
|``//``         |``np.floor_divide``  |Floor division (e.g., ``3 // 2 = 1``)  |
|``**``         |``np.power``         |Exponentiation (e.g., ``2 ** 3 = 8``)  |
|``%``          |``np.mod``           |Modulus/remainder (e.g., ``9 % 4 = 1``)|

Additionally there are Boolean/bitwise operators; we will explore these in [Comparisons, Masks, and Boolean Logic](02.06-Boolean-Arrays-and-Masks.ipynb).
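
The operator-to-ufunc correspondence in the table above can be checked directly. The sketch below compares each operator with its ufunc on a small array; the dictionary name `pairs` is just an illustrative choice.

```python
import numpy as np

x = np.arange(1, 5)  # [1 2 3 4]

# Each entry pairs an operator expression with the equivalent ufunc call.
pairs = {
    "+": (x + 2, np.add(x, 2)),
    "-": (x - 2, np.subtract(x, 2)),
    "unary -": (-x, np.negative(x)),
    "*": (x * 2, np.multiply(x, 2)),
    "/": (x / 2, np.divide(x, 2)),
    "//": (x // 2, np.floor_divide(x, 2)),
    "**": (x ** 2, np.power(x, 2)),
    "%": (x % 3, np.mod(x, 3)),
}

all_match = all(np.array_equal(a, b) for a, b in pairs.values())
print(all_match)
```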

<div class="alert alert-block alert-info">
<b>

### Absolute value

Just as NumPy understands Python's built-in arithmetic operators, it also understands Python's built-in absolute value function:

</b> 
</div>

In [7]:
x = np.array([-2, -1, 0, 1, 2])
abs(x)

array([2, 1, 0, 1, 2])

The corresponding NumPy ufunc is ``np.absolute``, which is also available under the alias ``np.abs``:

In [14]:
np.absolute(x)

array([2, 1, 0, 1, 2])

In [8]:
np.abs(x)

array([2, 1, 0, 1, 2])

This ufunc can also handle complex data, in which case the absolute value returns the magnitude:

In [14]:
x = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])
np.abs(x)

array([ 5.,  5.,  2.,  1.])
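
For a complex number $a + bi$, the magnitude is $\sqrt{a^2 + b^2}$. As a quick check, the sketch below computes that formula by hand from the real and imaginary parts and compares it with ``np.abs``.

```python
import numpy as np

z = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])

# Magnitude computed from the real and imaginary parts
magnitude = np.sqrt(z.real ** 2 + z.imag ** 2)

print(np.allclose(np.abs(z), magnitude))
```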

<div class="alert alert-block alert-info">
<b>

### Trigonometric functions

NumPy provides a large number of useful ufuncs, and some of the most useful for the data scientist are the trigonometric functions.
We'll start by defining an array of angles:

</b> 
</div>

In [9]:
import numpy as np
theta = np.linspace(0, np.pi, 3) # Create array of 3 angles between 0 and pi

Now we can compute some trigonometric functions on these values:

In [10]:
print("theta      = ", theta)
print("sin(theta) = ", np.sin(theta))
print("cos(theta) = ", np.cos(theta))
print("tan(theta) = ", np.tan(theta))

theta      =  [0.         1.57079633 3.14159265]
sin(theta) =  [0.0000000e+00 1.0000000e+00 1.2246468e-16]
cos(theta) =  [ 1.000000e+00  6.123234e-17 -1.000000e+00]
tan(theta) =  [ 0.00000000e+00  1.63312394e+16 -1.22464680e-16]


The values are computed to within machine precision, which is why values that should be zero do not always hit exactly zero.
Inverse trigonometric functions are also available:

In [9]:
x = [-1, 0, 1]
print("x         = ", x)
print("arcsin(x) = ", np.arcsin(x))
print("arccos(x) = ", np.arccos(x))
print("arctan(x) = ", np.arctan(x))

x         =  [-1, 0, 1]
arcsin(x) =  [-1.57079633  0.          1.57079633]
arccos(x) =  [3.14159265 1.57079633 0.        ]
arctan(x) =  [-0.78539816  0.          0.78539816]
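
Because trigonometric results are only accurate to within machine precision, exact equality tests against zero can fail. One common approach, sketched here, is to compare with a tolerance using ``np.isclose`` or ``np.allclose`` (with their default tolerances, which may not suit every application).

```python
import numpy as np

theta = np.linspace(0, np.pi, 3)
sin_theta = np.sin(theta)

print(sin_theta[2] == 0)            # exact comparison of sin(pi) with zero
print(np.isclose(sin_theta[2], 0))  # comparison within a small tolerance
print(np.allclose(sin_theta, [0, 1, 0]))
```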


<div class="alert alert-block alert-info">
<b>

### Exponents and logarithms

Another common family of NumPy ufuncs is the exponentials:

</b> 
</div>

In [10]:
x = [1, 2, 3]
print("x     =", x)
print("e^x   =", np.exp(x)) # e to the power of x
print("2^x   =", np.exp2(x)) # 2 to the power of x
print("3^x   =", np.power(3, x)) # 3 to the power of x

x     = [1, 2, 3]
e^x   = [ 2.71828183  7.3890561  20.08553692]
2^x   = [2. 4. 8.]
3^x   = [ 3  9 27]


The inverse of the exponentials, the logarithms, are also available.
The basic ``np.log`` gives the natural logarithm; if you prefer to compute the base-2 logarithm or the base-10 logarithm, these are available as well:

In [18]:
x = [1, 2, 4, 10]
print("x        =", x)
print("ln(x)    =", np.log(x)) # log base e
print("log2(x)  =", np.log2(x)) # log base 2
print("log10(x) =", np.log10(x)) # log base 10

x        = [1, 2, 4, 10]
ln(x)    = [0.         0.69314718 1.38629436 2.30258509]
log2(x)  = [0.         1.         2.         3.32192809]
log10(x) = [0.         0.30103    0.60205999 1.        ]
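
For a base that has no dedicated ufunc, the change-of-base identity $\log_b x = \ln x / \ln b$ can be used. The following sketch applies it for base 3; the variable name `log3` is only illustrative.

```python
import numpy as np

x = np.array([1, 3, 9, 27])

# Change of base: log_3(x) = ln(x) / ln(3)
log3 = np.log(x) / np.log(3)

print(log3)
```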


There are also some specialized versions that are useful for maintaining precision with very small input:

In [13]:
x = [0, 0.001, 0.01, 0.1]
print("exp(x) - 1 =", np.expm1(x)) # exp x minus 1
print("log(1 + x) =", np.log1p(x)) # log 1 plus x

exp(x) - 1 = [0.         0.0010005  0.01005017 0.10517092]
log(1 + x) = [0.         0.0009995  0.00995033 0.09531018]


In [15]:
np.log1p(x)

array([0.        , 0.0009995 , 0.00995033, 0.09531018])

When **``x`` is very small**, these functions give more precise values than if the raw ``np.log`` or ``np.exp`` were to be used.
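
The difference becomes visible for inputs much smaller than those shown above. The sketch below uses `x = 1e-10` (an arbitrary choice for illustration) and takes the first two Taylor terms as reference values, which is an adequate approximation at this scale.

```python
import numpy as np

x = 1e-10

# Reference values from the leading Taylor terms (accurate enough for tiny x)
exp_reference = x + x ** 2 / 2   # exp(x) - 1 ~ x + x^2/2
log_reference = x - x ** 2 / 2   # log(1 + x) ~ x - x^2/2

# Naive forms lose digits when 1 is added to or subtracted from a tiny number
naive_exp_error = abs((np.exp(x) - 1) - exp_reference)
naive_log_error = abs(np.log(1 + x) - log_reference)

# Specialized ufuncs avoid that loss of precision
expm1_error = abs(np.expm1(x) - exp_reference)
log1p_error = abs(np.log1p(x) - log_reference)

print(expm1_error < naive_exp_error, log1p_error < naive_log_error)
```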

### Specialized ufuncs

NumPy has many more ufuncs available, including hyperbolic trig functions, bitwise arithmetic, comparison operators, conversions from radians to degrees, rounding and remainders, and much more.
A look through the NumPy documentation reveals a lot of interesting functionality.

Another excellent source for more specialized and obscure ufuncs is the submodule ``scipy.special``.
If you want to compute some **obscure mathematical function** on your data, chances are it is implemented in ``scipy.special``.
There are far too many functions to list them all, but the following snippet shows a couple that might come up in a statistics context:

In [19]:
from scipy import special # Import special library from scipy

In [17]:
# Gamma functions (generalized factorials) and related functions
x = [1, 5, 10]
print("gamma(x)     =", special.gamma(x))
print("ln|gamma(x)| =", special.gammaln(x))
print("beta(x, 2)   =", special.beta(x, 2))

gamma(x)     = [1.0000e+00 2.4000e+01 3.6288e+05]
ln|gamma(x)| = [ 0.          3.17805383 12.80182748]
beta(x, 2)   = [0.5        0.03333333 0.00909091]


In [18]:
# Error function (integral of Gaussian)
# its complement, and its inverse
x = np.array([0, 0.3, 0.7, 1.0])
print("erf(x)  =", special.erf(x))
print("erfc(x) =", special.erfc(x))
print("erfinv(x) =", special.erfinv(x))

erf(x)  = [0.         0.32862676 0.67780119 0.84270079]
erfc(x) = [1.         0.67137324 0.32219881 0.15729921]
erfinv(x) = [0.         0.27246271 0.73286908        inf]
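
A few well-known identities give a quick sanity check on these functions: $\Gamma(n) = (n-1)!$ for positive integers, $\operatorname{erf}(x) + \operatorname{erfc}(x) = 1$, and ``erfinv`` undoing ``erf``. The sketch below assumes SciPy is installed and uses small inputs chosen for illustration.

```python
import math

import numpy as np
from scipy import special

# Gamma function as a generalized factorial: gamma(n) == (n - 1)!
n = np.arange(1, 8)
factorials = [math.factorial(int(k) - 1) for k in n]
gamma_matches = np.allclose(special.gamma(n), factorials)

# erf and its complement sum to one; erfinv inverts erf
x = np.array([0, 0.3, 0.7])
erf_sum = special.erf(x) + special.erfc(x)
round_trip = special.erfinv(special.erf(x))

print(gamma_matches, erf_sum, round_trip)
```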


There are many, many more ufuncs available in both NumPy and ``scipy.special``.
Because the documentation of these packages is available online, a web search along the lines of "gamma function python" will generally find the relevant information.

## Advanced Ufunc Features

Many NumPy users make use of ufuncs without ever learning their full set of features.
We'll outline a few specialized features of ufuncs here.

<div class="alert alert-block alert-info">
<b>

### Aggregates

For binary ufuncs, there are some **interesting aggregates** that can be computed directly from the object.
For example, if we'd like to *reduce* an array with a particular operation, we can use the ``reduce`` method of any ufunc.
A reduce repeatedly applies a given operation to the elements of an array until only a single result remains.

For example, calling ``reduce`` on the ``add`` ufunc returns the sum of all elements in the array:

</b> 
</div>


In [11]:
x = np.arange(1, 6) # Create a one-dimensional array of the integers 1 through 5
print(x)
np.add.reduce(x) # Add numbers and reduce vector

[1 2 3 4 5]


15
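
To make "repeatedly applies a given operation" concrete, here is a sketch of the same reduction written as an explicit loop. It is only meant to show the idea; ``reduce`` itself runs the loop in compiled code.

```python
import numpy as np

x = np.arange(1, 6)

# Start with the first element, then fold in each remaining element in turn
running = x[0]
for value in x[1:]:
    running = np.add(running, value)

print(running == np.add.reduce(x))
```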

In [12]:
x = np.arange(1, 6)
x

array([1, 2, 3, 4, 5])

Similarly, calling ``reduce`` on the ``multiply`` ufunc results in the product of all array elements:

In [13]:
# Multiply the first element by the second, the result by the third, and so on
x = np.arange(1, 5)
print(x)
np.multiply.reduce(x)  # 1 * 2 * 3 * 4 = 24, i.e. 4 factorial

[1 2 3 4]


24

If we'd like to store all the intermediate results of the computation, we can instead use ``accumulate``. For example, ``np.add.accumulate`` returns the running sum of the elements:

In [26]:
print(x)
np.add.accumulate(x) # Stores intermediate results, no reduction

[1 2 3 4]


array([ 1,  3,  6, 10])

In [30]:
# multiply.accumulate captures sequence of element multiplication
print(x)
np.multiply.accumulate(x) # No reduction

[1 2 3 4]


array([ 1,  2,  6, 24])

Note that for these particular cases, there are dedicated NumPy functions to compute the results (``np.sum``, ``np.prod``, ``np.cumsum``, ``np.cumprod``), which we'll explore in [Aggregations: Min, Max, and Everything In Between](02.04-Computation-on-arrays-aggregates.ipynb).
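
As a quick check of that correspondence, the sketch below compares each ufunc method with its dedicated function on a small array.

```python
import numpy as np

x = np.arange(1, 6)

same_sum = np.add.reduce(x) == np.sum(x)
same_prod = np.multiply.reduce(x) == np.prod(x)
same_cumsum = np.array_equal(np.add.accumulate(x), np.cumsum(x))
same_cumprod = np.array_equal(np.multiply.accumulate(x), np.cumprod(x))

print(same_sum, same_prod, same_cumsum, same_cumprod)
```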

The ``ufunc.at`` and ``ufunc.reduceat`` methods, which we'll explore in [Fancy Indexing](02.07-Fancy-Indexing.ipynb), are very helpful as well.
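
As a brief preview, the following sketch shows one typical use of each method; the array names and index values are chosen purely for illustration.

```python
import numpy as np

# ufunc.at applies the operation in place at the given indices,
# counting repeated indices each time they appear
counts = np.zeros(5, dtype=int)
np.add.at(counts, [0, 0, 2], 1)
print(counts)

# ufunc.reduceat reduces over consecutive slices that start at the given indices
values = np.arange(8)
segment_sums = np.add.reduceat(values, [0, 4])  # sums of values[0:4] and values[4:]
print(segment_sums)
```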

Another extremely useful feature of ufuncs is the ability to operate between arrays of different sizes and shapes, a set of operations known as *broadcasting*.
This subject is important enough that we will devote a whole section to it (see [Computation on Arrays: Broadcasting](02.05-Computation-on-arrays-broadcasting.ipynb)).
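
For a first taste of broadcasting, the minimal sketch below adds a one-dimensional array to a column of the same values; NumPy stretches both operands to a common shape before applying the ufunc.

```python
import numpy as np

row = np.arange(3)                    # shape (3,)
column = np.arange(3)[:, np.newaxis]  # shape (3, 1)

table = np.add(row, column)           # both operands broadcast to shape (3, 3)
print(table.shape)
print(table)
```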

## Ufuncs: Learning More

More information on universal functions (including the full list of available functions) can be found on the [NumPy](http://www.numpy.org) and [SciPy](http://www.scipy.org) documentation websites.

Recall that you can also access information directly from within IPython by importing the packages and using IPython's tab-completion and help (``?``) functionality, as described in [Help and Documentation in IPython](01.01-Help-And-Documentation.ipynb).

