# Advanced ufunc features
-------------
* ``np.info`` prints the documentation for a specific ufunc (or any other NumPy object):


In [None]:
import numpy as np

In [None]:
np.info(np.sin)

### 1. Specifying ufunc output
_______________________

* For **all** ufuncs, the ``out`` argument can be used to write the computation results directly into an existing array, instead of allocating a new one:

In [None]:
x = np.arange(5)
y = np.empty(5)
np.multiply(x, 10, out=y)
print(y)

* This can even be used with **array views**. 

For example, we can write the results of a computation to every other element of a specified array:

In [None]:
y = np.zeros(10)
np.power(2, x, out=y[::2])
print(y)

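* By contrast, writing ``y[::2] = 2 ** x`` first creates a temporary array to hold the results of ``2 ** x``, then performs a second operation that copies those values into ``y``. For small arrays the difference is negligible, but for very large arrays the memory saved by using ``out`` can be significant.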
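
A small sketch comparing the two approaches (the names `y_out`, `y_tmp` and `returned` are illustrative only). Both give the same values, but only the first writes into the target without an intermediate array. The ufunc also returns the array it wrote to, which here is a view sharing memory with `y_out`.

```python
import numpy as np

x = np.arange(5)

# Approach 1: the ufunc writes 2**x straight into every other slot of y_out
y_out = np.zeros(10)
returned = np.power(2, x, out=y_out[::2])

# Approach 2: 2**x is materialized as a temporary array, then copied into the view
y_tmp = np.zeros(10)
y_tmp[::2] = 2 ** x

print(y_out)
print(np.array_equal(y_out, y_tmp))       # True: same values either way
print(np.shares_memory(returned, y_out))  # True: the returned array is the view into y_out
```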

### 2. Aggregates

____________________________________

* Any **binary** ufunc (one taking two inputs) can reduce an array with its operation using the ``reduce`` method:

    * ``reduce`` repeatedly applies a given operation to the elements of an array until **only a single result** remains.

For example, calling ``reduce`` on the ``add`` ufunc returns the sum of all elements in the array:

In [None]:
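# In IPython/Jupyter, typing `np.add.` and pressing Tab lists the ufunc's methods;
# this line lists the same public attributes programmatically
[name for name in dir(np.add) if not name.startswith('_')]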

In [None]:
x = np.arange(1, 6)
np.add.reduce(x)

In [None]:
help(np.add.reduce)

* Similarly, calling ``reduce`` on the ``multiply`` ufunc results in the product of all array elements:

In [None]:
np.multiply.reduce(x)
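
Conceptually, ``reduce`` is a fold over the elements. The sketch below checks this against the standard library's `functools.reduce` (an illustrative comparison, not how NumPy implements it; for floating-point addition NumPy may use pairwise summation, so the order of operations can differ).

```python
import functools
import operator

import numpy as np

x = np.arange(1, 6)  # [1 2 3 4 5]

# Conceptually, ufunc.reduce folds the operation over the elements:
# ((((1 + 2) + 3) + 4) + 5)
total = np.add.reduce(x)
product = np.multiply.reduce(x)

# The same folds written with functools.reduce on a plain Python list
total_py = functools.reduce(operator.add, x.tolist())
product_py = functools.reduce(operator.mul, x.tolist())

print(total, total_py)      # 15 15
print(product, product_py)  # 120 120
```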

* If we'd like to store all the intermediate results of the computation, we can instead use ``accumulate``:

In [None]:
np.add.accumulate(x)

In [None]:
help(np.add.accumulate)
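
A short sketch of ``accumulate``: its last element equals the result of ``reduce``, and it works with any binary ufunc. Here `np.maximum.accumulate` gives a running maximum; the `readings` values are arbitrary sample numbers.

```python
import numpy as np

x = np.arange(1, 6)

# accumulate keeps every intermediate result; the last one equals reduce
running_sum = np.add.accumulate(x)   # 1, 3, 6, 10, 15
assert running_sum[-1] == np.add.reduce(x)

# Any binary ufunc works, for example a running maximum
readings = np.array([3, 1, 4, 1, 5, 9, 2, 6])
running_max = np.maximum.accumulate(readings)
print(running_max)   # [3 3 4 4 5 9 9 9]
```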

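* NumPy also provides dedicated aggregation functions. For a one-dimensional array like ``x``, ``np.sum`` gives the same result as ``np.add.reduce``, and ``np.cumsum`` the same as ``np.add.accumulate``: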

In [None]:
np.sum(x)

In [None]:
np.cumsum(x)

In [None]:
x.sum()  # the array method: the most concise form

In [None]:
np.multiply.accumulate(x)
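
The correspondence between the dedicated functions and the ufunc methods can be checked directly. This sketch covers the four pairs for a one-dimensional array; `np.prod` and `np.cumprod` play the same role for ``multiply`` as `np.sum` and `np.cumsum` do for ``add``.

```python
import numpy as np

x = np.arange(1, 6)

# Each dedicated aggregation function matches the corresponding ufunc method
assert np.sum(x) == np.add.reduce(x) == x.sum()
assert np.array_equal(np.cumsum(x), np.add.accumulate(x))
assert np.prod(x) == np.multiply.reduce(x)
assert np.array_equal(np.cumprod(x), np.multiply.accumulate(x))

print(np.cumprod(x))  # running products: 1, 2, 6, 24, 120
```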

### 3. Class ``ufunc``

__________________________________
All universal functions are instances of ``np.ufunc``: functions that operate element by element on whole arrays.

In [None]:
help(np.ufunc)
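
A quick sketch for inspecting ufuncs: `isinstance` tells whether a function is a ufunc, and the `nin` / `nout` attributes report how many inputs and outputs it takes. Methods such as ``reduce`` need a binary ufunc, so calling it on the unary `np.sin` raises a `ValueError`.

```python
import numpy as np

# Every universal function is an instance of np.ufunc
print(isinstance(np.add, np.ufunc))   # True
print(isinstance(np.sin, np.ufunc))   # True
# np.sum is an ordinary function (built on np.add.reduce), not a ufunc
print(isinstance(np.sum, np.ufunc))   # False

# nin / nout: number of inputs and outputs
print(np.sin.nin, np.sin.nout)        # 1 1
print(np.add.nin, np.add.nout)        # 2 1
print(np.divmod.nin, np.divmod.nout)  # 2 2

# reduce is only supported for binary ufuncs
sin_reduce_failed = False
try:
    np.sin.reduce(np.arange(3))
except ValueError as err:
    sin_reduce_failed = True
    print("ValueError:", err)
```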

### 4. Examples of ``ufunc``
_________________________
#### 4.1. Outer products
____________________

Any binary ufunc can compute its output for all pairs of elements drawn from two inputs using the ``outer`` method:

In [None]:
x = np.arange(1, 6)
np.multiply.outer(x, x)
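
The two inputs to ``outer`` need not be the same or have the same length. In this sketch (the arrays `a` and `b` are arbitrary), `table[i, j]` equals `a[i] + b[j]`, which is exactly what broadcasting a column against a row produces.

```python
import numpy as np

a = np.arange(1, 4)         # [1 2 3]
b = np.arange(10, 50, 10)   # [10 20 30 40]

# table[i, j] == a[i] + b[j]; the result has shape (len(a), len(b))
table = np.add.outer(a, b)
print(table.shape)  # (3, 4)

# Equivalent broadcasting expression: a as a column, b as a row
assert np.array_equal(table, a[:, np.newaxis] + b)
```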

#### 4.2. Summing values 
-------------

* As a first example, consider summing all values in an array. Python itself can do this with the built-in ``sum`` function:

In [None]:
x = np.random.random(100)
type(x)

In [None]:
sum(x)

* NumPy's ``sum`` has very similar syntax, and the result is the same (up to floating-point rounding in the last digits):

In [None]:
np.sum(x)

* However, because it executes the operation in compiled code, NumPy's version of the operation is computed much more quickly:

In [None]:
big_array = np.random.rand(1000000)
%timeit sum(big_array)
%timeit np.sum(big_array)

* Be careful, though: the built-in ``sum`` function and the ``np.sum`` function are not identical.
* In particular, their optional arguments have different meanings, and ``np.sum`` is aware of multiple array dimensions.
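
A small sketch of the difference on a 2 x 3 array (the array `m` is arbitrary). The built-in ``sum`` iterates over the first axis only and treats its second argument as a starting value, while ``np.sum`` reduces over all dimensions and treats its second argument as the axis.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

# Built-in sum iterates over the rows and adds them as arrays;
# its second positional argument is `start`, an initial value
print(sum(m))         # [5 7 9]
print(sum(m, 10))     # [15 17 19]

# np.sum reduces over all dimensions by default;
# its second positional argument is `axis`
print(np.sum(m))      # 21
print(np.sum(m, 1))   # [ 6 15]
```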

#### 4.3. Minimum and Maximum
--------------

* Python has built-in ``min`` and ``max`` functions, used to find the minimum and maximum values of any iterable:

In [None]:
min(big_array), max(big_array)

* NumPy's corresponding functions have similar syntax but operate much more quickly:

In [None]:
np.min(big_array), np.max(big_array)

In [None]:
%timeit min(big_array)
%timeit np.min(big_array)
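
Besides speed, the built-ins differ on multidimensional input. In this sketch (the array `m` is arbitrary), ``np.min`` and ``np.max`` look at every element, while the built-in ``min`` iterates over rows and tries to compare whole rows, which has no single truth value and raises a `ValueError`.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)

# np.min / np.max work over every element of a multidimensional array
print(np.min(m), np.max(m))   # 0 5

# The built-in min compares row arrays, whose truth value is ambiguous
builtin_min_failed = False
try:
    min(m)
except ValueError as err:
    builtin_min_failed = True
    print("ValueError:", err)
```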

* For ``min``, ``max``, ``sum``, and several other NumPy aggregates, a **shorter syntax** is to use methods of the array object itself:

In [None]:
print(big_array.min(), big_array.max(), big_array.sum())

Whenever possible, make sure that you are using the NumPy version of these aggregates when operating on NumPy arrays!

#### 4.4. Multidimensional aggregates
----------------

In [None]:
x = np.random.random((3, 4))
print(x)

* By default, each NumPy aggregation function will return the aggregate over the entire array:

In [None]:
x.sum()

* Aggregation functions take an additional argument, ``axis``, specifying the axis along which the aggregate is computed.

The ``axis`` keyword specifies the dimension of the array that will be collapsed.
So specifying ``axis=0`` means that the first axis will be collapsed: for two-dimensional arrays, this means that values within each column will be aggregated.

For example, we can find the minimum value within each column by specifying ``axis=0``:

In [None]:
x.min(axis=0)

The function returns four values, corresponding to the four columns of numbers.

Similarly, we can find the maximum value within each row:

In [None]:
x.max(axis=1)
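
With a small deterministic array it is easier to see which axis is collapsed. This sketch also uses `keepdims=True`, which keeps the collapsed axis with length 1 so the result broadcasts back against the original array (here to turn each row into fractions of its row total).

```python
import numpy as np

m = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

col_sums = m.sum(axis=0)   # axis 0 collapsed: one value per column -> 12, 15, 18, 21
row_sums = m.sum(axis=1)   # axis 1 collapsed: one value per row    -> 6, 22, 38

# keepdims=True keeps the collapsed axis with length 1: shape (3, 1)
row_sums_2d = m.sum(axis=1, keepdims=True)
fractions = m / row_sums_2d   # broadcasting: each row now sums to 1

# The array method performs the same reduction as the ufunc method
assert np.array_equal(col_sums, np.add.reduce(m, axis=0))
```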

#### 4.5. Other aggregation functions
-----------------

* Most aggregates have a ``NaN``-safe counterpart that computes the result while ignoring missing values, which are marked by the special IEEE floating-point ``NaN`` value.


|Function Name      |   NaN-safe Version  | Description                                   |
|-------------------|---------------------|-----------------------------------------------|
| ``np.sum``        | ``np.nansum``       | Compute sum of elements                       |
| ``np.prod``       | ``np.nanprod``      | Compute product of elements                   |
| ``np.mean``       | ``np.nanmean``      | Compute mean of elements                      |
| ``np.std``        | ``np.nanstd``       | Compute standard deviation                    |
| ``np.var``        | ``np.nanvar``       | Compute variance                              |
| ``np.min``        | ``np.nanmin``       | Find minimum value                            |
| ``np.max``        | ``np.nanmax``       | Find maximum value                            |
| ``np.argmin``     | ``np.nanargmin``    | Find index of minimum value                   |
| ``np.argmax``     | ``np.nanargmax``    | Find index of maximum value                   |
| ``np.median``     | ``np.nanmedian``    | Compute median of elements                    |
| ``np.percentile`` | ``np.nanpercentile``| Compute rank-based statistics of elements     |
| ``np.any``        | N/A                 | Evaluate whether any elements are true        |
| ``np.all``        | N/A                 | Evaluate whether all elements are true        |
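
A brief sketch of the difference (the `values` array is arbitrary sample data containing one missing value). Regular aggregates propagate ``NaN``, while the NaN-safe versions skip it; ``np.any`` and ``np.all`` have no NaN-safe variant, and any comparison with ``NaN`` evaluates to ``False``.

```python
import numpy as np

values = np.array([1.0, 2.0, np.nan, 4.0])

# Regular aggregates propagate NaN...
print(np.sum(values), np.max(values))        # nan nan
# ...while the NaN-safe versions ignore it
print(np.nansum(values), np.nanmax(values))  # 7.0 4.0
print(np.nanmean(values))                    # mean of 1, 2 and 4, about 2.333

# Comparisons with NaN are False, which affects any/all
print(np.any(values > 3))   # True
print(np.all(values > 0))   # False, because nan > 0 is False
```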



#### 4.6. Example: What is the Average Height of US Presidents?
----------------

Aggregates available in NumPy can be extremely useful for summarizing a set of values.
As a simple example, let's consider the heights of all US presidents.
This data is available in the file *president_heights.csv*.

In [None]:
import csv

In [None]:
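data = []  # heights in centimeters, one entry per row of the CSV file
with open('president_heights.csv', 'r', newline='') as csv_file:
    # DictReader maps each row to a dict keyed by the header row
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        data.append(float(line['height(cm)']))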

In [None]:
heights = np.array(data)
print(heights)

Now that we have this data array, we can compute a variety of summary statistics:

In [None]:
print("Mean height:       ", heights.mean())
print("Standard deviation:", heights.std())
print("Minimum height:    ", heights.min())
print("Maximum height:    ", heights.max())

Note that in each case, the aggregation operation reduced the entire array to a single summarizing value, which gives us information about the distribution of values.
We may also wish to compute quantiles:

In [None]:
print("25th percentile:   ", np.percentile(heights, 25))
print("Median:            ", np.median(heights))
print("75th percentile:   ", np.percentile(heights, 75))

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn
seaborn.set_theme()  # set plot style (set_theme is available in seaborn >= 0.11)

In [None]:
plt.hist(heights)
plt.title('Height Distribution of US Presidents')
plt.xlabel('height (cm)')
plt.ylabel('number');

These aggregates are some of the fundamental pieces of exploratory data analysis.