##### Note: DataCamp slides are saved to computer! Notes written on slides.

# Chapter 2 - Intro to NumPy

## TOC:
- [Understanding Data Types in Python](#Understanding-Data-Types-in-Python)
    - [Integers](#Integers)
    - [Lists](#Lists)
        - [Fixed-Type Arrays in Python](#Fixed-Type-Arrays-in-Python)
        - [Creating Arrays from Python Lists](#Creating-Arrays-from-Python-Lists)
        - [Creating Arrays from Scratch](#Creating-Arrays-from-Scratch)
    - [NumPy Standard Data Types](#NumPy-Standard-Data-Types)
- [The Basics of NumPy Arrays](#The-Basics-of-NumPy-Arrays)
    - [NumPy Array Basics](#NumPy-Array-Basics)
    - [Array Indexing: Accessing Single Elements](#Array-Indexing:-Accessing-Single-Elements)
    - [Array Slicing: Accessing Subarrays](#Array-Slicing:-Accessing-Subarrays)
        - [One-Dimensional Subarrays](#One-Dimensional-Subarrays)
        - [Multidimensional Subarrays](#Multidimensional-Subarrays)
            - [Accessing Array Rows and Columns](#Accessing-Array-Rows-and-Columns)
        - [Sub-Arrays as No-Copy Views](#Subarrays-as-No-Copy-Views)
        - [Creating Copies of Arrays](#Creating-Copies-of-Arrays)
    - [Reshaping of Arrays](#Reshaping-of-Arrays)
    - [Array Concatenation and Splitting](#Array-Concatenation-&-Spliting)
        - [Concatenation of Arrays](#Concatenation-of-Arrays)
        - [Splitting of Arrays](#Splitting-of-Arrays)
- [Computation on NumPy Arrays: Universal Functions](#Computation-on-NumPy-Arrays:-Universal-Functions)
    - [The Slowness of Loops](#The-Slowness-of-Loops)
    - [Introducing UFuncs](#Introducing-UFuncs)
    - [Exploring NumPy's Ufuncs](#Exploring-NumPy's-Ufuncs)
        - [Array Arithmetic](#Array-Arithmetic)
        - [Absolute Value](#Absolute-Value)
        - [Trigonometric Functions](#Trigonometric-Functions)
        - [Exponents and Logarithms](#Exponents-and-Logarithms)
        - [Specialized Ufuncs](#Specialized-Ufuncs)
    - [Advanced Ufunc Features](#Advanced-Ufunc-Features)
        - [Specifying Output](#Specifying-Output)
        - [Aggregates](#Aggregates)
        - [Outer Products](#Outer-Products)
    - [Ufuncs: Learning More](#Ufuncs:-Learning-More)
- [Aggregations: Min, Max, and Everything in Between](#Aggregations:-Min,-Max,-and-Everything-in-Between)
    - [Summing the Values in an Array](#Summing-the-Values-in-an-Array)
    - [Minimum and Maximum](#Minimum-and-Maximum)
    - [Multidimensional Aggregates](#Multidimensional-Aggregates)
    - [Other Aggregation Functions](#Other-Aggregation-Functions)

---
- NumPy = Numerical Python
    - provides interface to store & efficiently operate w/(huge amounts of) data
    
[NumPy website](http://www.numpy.org)

In [1]:
import numpy as np        # import numpy
np.__version__        # check version of numpy
#np?        # display built-in documentation

'1.12.1'

## Understanding Data Types in Python
### Integers
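- a Python integer is more than just an integer: it's a C structure w/a header & the value itself
- header contains reference count, type code, size
    - reference count --> helps Python silently handle memory allocation & deallocation
    - type code --> encodes the variable's type
    - size --> specifies the size of the following data members
- digit --> contains the actual integer value we expect the var to represent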
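
A small sketch of that per-object overhead, assuming 64-bit CPython (the exact byte count of a Python `int` is an implementation detail, not a guarantee):

```python
import sys
import numpy as np

# A Python int is a full object: header (ref count, type, size) + digits
py_int_bytes = sys.getsizeof(1)

# A NumPy int64 element is just the raw 8-byte value, no per-element header
np_int_bytes = np.dtype(np.int64).itemsize

print("Python int object:", py_int_bytes, "bytes")   # typically 28 on 64-bit CPython
print("NumPy int64 element:", np_int_bytes, "bytes")  # 8
```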

### Lists
- Python lists can hold items all of the same type or of mixed types
- to allow mixed-type lists, each item must be its own complete Python object
    - each item carries its own type info, ref count, etc.
        - redundant if all items are the same type, & takes extra space
- if a list is all one type, it's more efficient to store the data in a fixed-type array
    - this is a NumPy-style array
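
To make the contrast concrete, a short sketch (the list contents are just illustrative): each list item keeps its own type, while a NumPy array stores one dtype for every element, packed back to back.

```python
import numpy as np

# A Python list can mix types because every element is a full Python object
mixed = [True, "2", 3.0, 4]
print([type(item) for item in mixed])
# [<class 'bool'>, <class 'str'>, <class 'float'>, <class 'int'>]

# A NumPy array has one fixed type; total size is just itemsize * size
ints = np.arange(1000, dtype=np.int64)
print(ints.dtype, ints.itemsize, ints.nbytes)   # int64 8 8000
```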

#### Fixed-Type Arrays in Python
- Python has a built-in `array` module for dense, fixed-type arrays
- NumPy's `ndarray` object adds efficient operations on that data

In [2]:
import array
L = list(range(10))
array.array('i', L)     # i is type code indicating integer data

array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

#### Creating Arrays from Python Lists
- use `np.array`
- NumPy arrays must contain data that's all the same type
    - if types don't match, it'll upcast if possible
- can explicitly set data type using `dtype` keyword
- NumPy arrays can be multidimensional

In [3]:
a1 = np.array([1, 2, 3, 4])     # integer array
a1

array([1, 2, 3, 4])

In [4]:
a2 = np.array([3.14, 4, 5, 6])     # mismatched types, upcast to floating point
a2

array([ 3.14,  4.  ,  5.  ,  6.  ])

In [5]:
a3 = np.array([1,2,3,4], dtype='float32')     # explicitly set data type
a3

array([ 1.,  2.,  3.,  4.], dtype=float32)

In [6]:
a4 = np.array([range(i, i+3) for i in [2,4,6]])     # nested lists resulting in multidimensional array
a4     # result is a 2D array

array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])

#### Creating Arrays from Scratch
- w/larger arrays, it's more efficient to create arrays from scratch using NumPy routines

In [7]:
# length-5 int array filled w/zeros
np.zeros(5, dtype='int')

array([0, 0, 0, 0, 0])

In [8]:
# 3x5 floating-point array of 1s
np.ones((3,5), dtype=float)

array([[ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.]])

In [9]:
# 3x5 filled w/3.14
np.full((3,5), 3.14)

array([[ 3.14,  3.14,  3.14,  3.14,  3.14],
       [ 3.14,  3.14,  3.14,  3.14,  3.14],
       [ 3.14,  3.14,  3.14,  3.14,  3.14]])

In [10]:
# linear sequence starting @ 0, stopping before 20, step by 2
# similar to range()
np.arange(0,20,2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [11]:
# array of 5 values evenly spaced between 0 & 1
np.linspace(0,1,5)

array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ])

In [12]:
# 3x3 array of uniformly distributed random #s between 0 & 1
np.random.random((3,3))

array([[ 0.34295657,  0.02304556,  0.15127279],
       [ 0.9352297 ,  0.13766249,  0.15570051],
       [ 0.95757951,  0.93051372,  0.16765972]])

In [13]:
# 3x3 of random ints on [0, 10)
np.random.randint(0,10,(3,3))

array([[0, 9, 1],
       [9, 5, 1],
       [5, 3, 2]])

In [14]:
# 4x4 identity matrix
np.eye(4)

array([[ 1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.]])

In [15]:
# uninitialized array of 3 floats (np.empty defaults to float64)
# values will be whatever happens to already exist @ that memory location
np.empty(3)

array([  1.72723371e-077,  -4.33017271e-311,   2.15482819e-314])

### NumPy Standard Data Types
- data types are similar to C b/c NumPy is built in C
- when constructing an array, specify data type using `dtype`
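
A quick sketch showing that `dtype` can be given either as a string or as the matching NumPy type object; both produce the same dtype:

```python
import numpy as np

a = np.zeros(10, dtype='int16')    # dtype as a string
b = np.zeros(10, dtype=np.int16)   # dtype as a NumPy type object

print(a.dtype, b.dtype, a.dtype == b.dtype)   # int16 int16 True
print(a.itemsize)                             # 2 (bytes per element)
```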

(Table taken from [NumPy Documentation](https://docs.scipy.org/doc/numpy/user/basics.types.html))
<table border="1" class="docutils">
<colgroup>
<col width="17%" />
<col width="83%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Data type</th>
<th class="head">Description</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td><code class="docutils literal"><span class="pre">bool_</span></code></td>
<td>Boolean (True or False) stored as a byte</td>
</tr>
<tr class="row-odd"><td><code class="docutils literal"><span class="pre">int_</span></code></td>
<td>Default integer type (same as C <code class="docutils literal"><span class="pre">long</span></code>; normally either
<code class="docutils literal"><span class="pre">int64</span></code> or <code class="docutils literal"><span class="pre">int32</span></code>)</td>
</tr>
<tr class="row-even"><td>intc</td>
<td>Identical to C <code class="docutils literal"><span class="pre">int</span></code> (normally <code class="docutils literal"><span class="pre">int32</span></code> or <code class="docutils literal"><span class="pre">int64</span></code>)</td>
</tr>
<tr class="row-odd"><td>intp</td>
<td>Integer used for indexing (same as C <code class="docutils literal"><span class="pre">ssize_t</span></code>; normally
either <code class="docutils literal"><span class="pre">int32</span></code> or <code class="docutils literal"><span class="pre">int64</span></code>)</td>
</tr>
<tr class="row-even"><td>int8</td>
<td>Byte (-128 to 127)</td>
</tr>
<tr class="row-odd"><td>int16</td>
<td>Integer (-32768 to 32767)</td>
</tr>
<tr class="row-even"><td>int32</td>
<td>Integer (-2147483648 to 2147483647)</td>
</tr>
<tr class="row-odd"><td>int64</td>
<td>Integer (-9223372036854775808 to 9223372036854775807)</td>
</tr>
<tr class="row-even"><td>uint8</td>
<td>Unsigned integer (0 to 255)</td>
</tr>
<tr class="row-odd"><td>uint16</td>
<td>Unsigned integer (0 to 65535)</td>
</tr>
<tr class="row-even"><td>uint32</td>
<td>Unsigned integer (0 to 4294967295)</td>
</tr>
<tr class="row-odd"><td>uint64</td>
<td>Unsigned integer (0 to 18446744073709551615)</td>
</tr>
<tr class="row-even"><td><code class="docutils literal"><span class="pre">float_</span></code></td>
<td>Shorthand for <code class="docutils literal"><span class="pre">float64</span></code>.</td>
</tr>
<tr class="row-odd"><td>float16</td>
<td>Half precision float: sign bit, 5 bits exponent,
10 bits mantissa</td>
</tr>
<tr class="row-even"><td>float32</td>
<td>Single precision float: sign bit, 8 bits exponent,
23 bits mantissa</td>
</tr>
<tr class="row-odd"><td>float64</td>
<td>Double precision float: sign bit, 11 bits exponent,
52 bits mantissa</td>
</tr>
<tr class="row-even"><td><code class="docutils literal"><span class="pre">complex_</span></code></td>
<td>Shorthand for <code class="docutils literal"><span class="pre">complex128</span></code>.</td>
</tr>
<tr class="row-odd"><td>complex64</td>
<td>Complex number, represented by two 32-bit floats (real
and imaginary components)</td>
</tr>
<tr class="row-even"><td>complex128</td>
<td>Complex number, represented by two 64-bit floats (real
and imaginary components)</td>
</tr>
</tbody>
</table>

## The Basics of NumPy Arrays
### NumPy Array Basics
`np.random.seed(0)` sets the seed for reproducibility
- view [this page](https://stackoverflow.com/questions/21494489/what-does-numpy-random-seed0-do) for a great explanation on this
- resets the random # generator to the same starting state, so each time you run the code, you get the same "random" numbers
- don't use a fixed seed if you want different numbers each time
- if you want unpredictable numbers, call `np.random.seed()` w/no argument (it reseeds from the OS) or just don't set a seed (see link above for more)
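
A minimal sketch of what reseeding does (`first` and `second` are just illustrative names): setting the same seed twice reproduces the same draws.

```python
import numpy as np

np.random.seed(0)
first = np.random.randint(10, size=5)

np.random.seed(0)   # reset the generator to the same starting state
second = np.random.randint(10, size=5)

print(first, second)
print(np.array_equal(first, second))   # True: same seed -> same "random" numbers
```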

Attributes of NumPy arrays:
- `ndim` (# of dimensions)
- `shape` (size of each dimension)
- `size` (total # of elements in array)
- `dtype` (data type of array)
- `itemsize` (size {in bytes} of each array element)
- `nbytes` (total size {in bytes} of array)
    - in general, equals `itemsize` times `size`

In [16]:
np.random.seed(0)     # see notes above about this, used here to follow book
x1 = np.random.randint(10, size=6)         # 1D array
x2 = np.random.randint(10, size=(3,4))     # 2D array
x3 = np.random.randint(10, size=(3,4,5))   # 3D array

print("x3 ndim:", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size:", x3.size)
print("x3 dtype", x3.dtype)
print("x3 itemsize:", x3.itemsize, "bytes")
print("x3 nbytes:", x3.nbytes, "bytes")

x3 ndim: 3
x3 shape: (3, 4, 5)
x3 size: 60
x3 dtype int64
x3 itemsize: 8 bytes
x3 nbytes: 480 bytes


### Array Indexing: Accessing Single Elements
- just like regular Python list, index counting starts at 0 & desired index is specified w/in []
- to index starting from end of array, use negative indices
    - -1 is the last element in array
- in multidimensional array, access items using comma-separated tuple of indices
    - `x2[2,-1]`
    - can modify values in this way as well
        - `x2[0,0] = 12`
- remember that NumPy arrays are fixed type, so if you insert a mismatched type, it'll be converted to the array's type
    - e.g. inserting 3.14 into an integer array silently truncates it to 3
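
A sketch of these indexing rules on hard-coded arrays (not the random `x1`/`x2` from the cell above), so the printed values are predictable:

```python
import numpy as np

x = np.array([5, 0, 3, 3, 7, 9])
print(x[0], x[-1])   # 5 9  (first & last element)

m = np.array([[3, 5, 2, 4],
              [7, 6, 8, 8],
              [1, 6, 7, 7]])
print(m[2, -1])      # 7  (row 2, last column)

m[0, 0] = 12         # modify a single element in place
print(m[0])          # [12  5  2  4]

x[0] = 3.14159       # float into an int array: silently truncated
print(x)             # [3 0 3 3 7 9]
```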

### Array Slicing: Accessing Subarrays
- can slice NumPy arrays with `:`
- NumPy slicing syntax:
    `x[start:stop:step]`
    - if any of these values is left unspecified, the defaults are:
        - `start = 0`
        - `stop =` _size of dimension_
        - `step = 1`

#### One-Dimensional Subarrays
- `np.arange(#)`
    - `.arange(start, stop, step, dtype=____)`
    - returns evenly spaced values w/in a given interval
    - start value is optional (defaults to 0), interval includes this value
    - stop value is required, interval excludes this value
        - exception: when `step` isn't an `int` (i.e. a `float`), floating-point round-off can affect the length of the output
    - step value is optional (defaults to 1), spacing between values
    - dtype specification is optional
        - if unspecified, it's inferred from the other arguments
- in a slice, a negative `step` swaps the defaults for `start` & `stop`
    - easy way to reverse an array

In [17]:
# help(np.arange)
a1 = np.arange(10)     # stop value of 10
a1

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [18]:
a1[:5]     # first 5 elements

array([0, 1, 2, 3, 4])

In [19]:
a1[5:]     # elements from index 5 onward

array([5, 6, 7, 8, 9])

In [20]:
a1[4:7]     # middle subarray

array([4, 5, 6])

In [21]:
a1[::2]     # every other element

array([0, 2, 4, 6, 8])

In [22]:
a1[1::2]     # every other element starting @ index 1

array([1, 3, 5, 7, 9])

In [23]:
a1[::-1]     # all elements, reversed

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [24]:
a1[5::-2]    # reversed every other from index 5

array([5, 3, 1])

#### Multidimensional Subarrays
- slicing works same as one-dimensional, w/multiple slices separated by commas
- subarray dimensions can be reversed together
- if only working w/rows, can eliminate empty slice for cols (see reversing rows below)

In [25]:
# using x2 from above
x2[:2, :3]     # 2 rows, 3 cols

array([[3, 5, 2],
       [7, 6, 8]])

In [26]:
x2[:3, ::2]     # 3 rows, every other column

array([[3, 2],
       [7, 8],
       [1, 7]])

In [27]:
x2[::-1]     # reverse rows

array([[1, 6, 7, 7],
       [7, 6, 8, 8],
       [3, 5, 2, 4]])

In [28]:
x2[:,::-1]     # reverse cols

array([[4, 2, 5, 3],
       [8, 8, 6, 7],
       [7, 7, 6, 1]])

In [29]:
x2[::-1, ::-1]     # reverse rows & columns together

array([[7, 7, 6, 1],
       [8, 8, 6, 7],
       [4, 2, 5, 3]])

##### Accessing Array Rows and Columns
- accessing single rows/columns is done by combining indexing & slicing using an empty slice (`:`)
- w/row access, empty slice for columns can be omitted

In [30]:
print(x2[:,0])     # first col of x2
print(x2[0])     # first row of x2, same as x2[0,:]

[3 7 1]
[3 5 2 4]


#### Subarrays as No-Copy Views
- array slices return views, not copies, of array data
- if we create a subarray, assign it to a var, then modify it, the original array is modified too
    - this is default behavior & is useful when processing large datasets
    
#### Creating Copies of Arrays
- to create a copy of array (so you don't change original), use the `.copy()` method

In [31]:
print(x2)     # original array
x2_sub = x2[:2, :2]     # 2x2 subarray
x2_sub[0,0] = 99     # modify subarray
print("\n", x2)     # observe change in original

x2_sub_copy = x2[:2, :2].copy()     # create subarray copy
x2_sub_copy[0,0] = 42     # modify copy
print("\n", x2_sub_copy)     # observe changes in copy
print("\n", x2)     # no changes in original

[[3 5 2 4]
 [7 6 8 8]
 [1 6 7 7]]

 [[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]

 [[42  5]
 [ 7  6]]

 [[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]
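
`np.shares_memory` offers a direct check of whether two arrays point at the same data. A sketch (`arr`, `view`, `copy` are illustrative names):

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))

view = arr[:2, :2]          # slice -> view onto arr's data
copy = arr[:2, :2].copy()   # explicit copy -> independent data

print(np.shares_memory(arr, view))   # True
print(np.shares_memory(arr, copy))   # False

view[0, 0] = 99
print(arr[0, 0])   # 99: writing through the view changed the original
copy[0, 0] = 42
print(arr[0, 0])   # still 99: the copy is independent
```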


### Reshaping of Arrays
- use `.reshape()` to change an array's shape
- for this to work, size of initial array must match size of reshaped array
- where possible, this will create a no-copy view of the original
- conversion of 1D array into 2D row/column matrix can be done with `reshape` or by using `newaxis` w/in a slice operation

In [32]:
grid = np.arange(1,10).reshape((3,3))
print(grid)

x = np.array([1,2,3])
x.reshape((1,3))     # row vector using reshape
x[np.newaxis, :]     # row vector using newaxis
x.reshape((3,1))     # column vector using reshape
x[:, np.newaxis]     # column vector using newaxis

[[1 2 3]
 [4 5 6]
 [7 8 9]]


array([[1],
       [2],
       [3]])
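
The cell above only displays its last expression. This sketch prints the shapes explicitly, confirms that reshaping contiguous data gives a view, and shows what happens when sizes don't match:

```python
import numpy as np

x = np.array([1, 2, 3])
row = x[np.newaxis, :]   # same as x.reshape((1, 3))
col = x[:, np.newaxis]   # same as x.reshape((3, 1))
print(row.shape, col.shape)   # (1, 3) (3, 1)

base = np.arange(1, 10)
grid3 = base.reshape((3, 3))            # contiguous data -> reshape returns a view
print(np.shares_memory(base, grid3))    # True

# Sizes must match: 9 elements can't fill a 2x4 array
raised = False
try:
    base.reshape((2, 4))
except ValueError as err:
    raised = True
    print("ValueError:", err)
```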

### Array Concatenation & Spliting
#### Concatenation of Arrays
- concatenation can be accomplished using:
    - `.concatenate()`
        - takes tuple/list of arrays as first argument
        - 2nd argument (`axis`) is the axis to join along (zero-indexed, defaults to 0)
        - can concatenate more than 2 arrays @ once
        - can be used on 2D arrays
    - `.vstack()`
        - stacks vertically
    - `.hstack()`
        - stacks horizontally
    - `.dstack()`
        - stacks along 3rd axis
- when working w/arrays of mixed dimensions, `.vstack()`, `.hstack()`, & `.dstack()` can be easier to use

In [33]:
# use x from above
y = np.array([3,2,1])
z = np.array([99,99,99])
# use grid from above
# use x3 from above

np.concatenate([x,y])     # concatenate 2 arrays
np.concatenate([x,y,z])     # concatenate 3 arrays

np.concatenate([grid, grid])     # concatenate along 1st axis
np.concatenate([grid, grid], axis=1)     # concatenate along 2nd axis

np.vstack([x, grid])     # vertically stack x & grid

z = z.reshape((3,1))     # reshape z

np.hstack([grid, z])     # horizontally stack grid & z

# good idea to look at x3 before looking at this output
np.dstack([x3,x3])     # dstack x3 & x3

array([[[8, 1, 5, 9, 8, 8, 1, 5, 9, 8],
        [9, 4, 3, 0, 3, 9, 4, 3, 0, 3],
        [5, 0, 2, 3, 8, 5, 0, 2, 3, 8],
        [1, 3, 3, 3, 7, 1, 3, 3, 3, 7]],

       [[0, 1, 9, 9, 0, 0, 1, 9, 9, 0],
        [4, 7, 3, 2, 7, 4, 7, 3, 2, 7],
        [2, 0, 0, 4, 5, 2, 0, 0, 4, 5],
        [5, 6, 8, 4, 1, 5, 6, 8, 4, 1]],

       [[4, 9, 8, 1, 1, 4, 9, 8, 1, 1],
        [7, 9, 9, 3, 6, 7, 9, 9, 3, 6],
        [7, 2, 0, 3, 5, 7, 2, 0, 3, 5],
        [9, 4, 4, 6, 4, 9, 4, 4, 6, 4]]])

#### Splitting of Arrays
- splitting is opposite of concatenation
- functions:
    - `.split()`
    - `.hsplit()`
    - `.vsplit()`
    - `.dsplit()`
- these functions work just like the concatenation functions
- provide a list of split points
    - N split points gives N+1 subarrays

In [34]:
x = [1, 2, 3, 99, 99, 3, 2, 1]     # list to split
x0, x1, x2 = np.split(x, [3, 5])     # split @ indices 3 & 5 into 3 subarrays
print(x0, x1, x2, "\n")     # view result of split

grid = np.arange(16).reshape((4, 4))     # reassign grid
upper, lower = np.vsplit(grid, [2])     # split grid vertically & view results
print(upper)
print(lower, "\n")

left, right = np.hsplit(grid, [2])     # split grid horizontally & view results
print(left)
print(right)

[1 2 3] [99 99] [3 2 1] 

[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]] 

[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]


In [35]:
x3d = np.dstack([x3,x3])     # dstack x3 & x3
x31, x32, x33 = np.dsplit(x3d, [4,5])     # split on 3rd axis & view results
print(x31, "\n")
print(x32, "\n")
print(x33)

[[[8 1 5 9]
  [9 4 3 0]
  [5 0 2 3]
  [1 3 3 3]]

 [[0 1 9 9]
  [4 7 3 2]
  [2 0 0 4]
  [5 6 8 4]]

 [[4 9 8 1]
  [7 9 9 3]
  [7 2 0 3]
  [9 4 4 6]]] 

[[[8]
  [3]
  [8]
  [7]]

 [[0]
  [7]
  [5]
  [1]]

 [[1]
  [6]
  [5]
  [4]]] 

[[[8 1 5 9 8]
  [9 4 3 0 3]
  [5 0 2 3 8]
  [1 3 3 3 7]]

 [[0 1 9 9 0]
  [4 7 3 2 7]
  [2 0 0 4 5]
  [5 6 8 4 1]]

 [[4 9 8 1 1]
  [7 9 9 3 6]
  [7 2 0 3 5]
  [9 4 4 6 4]]]
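
A compact sketch of the split rules (N split points -> N+1 pieces), plus a check that stacking the pieces back together recovers the original:

```python
import numpy as np

x = np.array([1, 2, 3, 99, 99, 3, 2, 1])
parts = np.split(x, [3, 5])   # 2 split points -> 3 subarrays
print([p.tolist() for p in parts])   # [[1, 2, 3], [99, 99], [3, 2, 1]]

grid = np.arange(16).reshape((4, 4))
upper, lower = np.vsplit(grid, [2])   # rows 0-1 / rows 2-3
left, right = np.hsplit(grid, [2])    # cols 0-1 / cols 2-3
print(upper.shape, lower.shape, left.shape, right.shape)   # (2, 4) (2, 4) (4, 2) (4, 2)

# Splitting undoes stacking
print(np.array_equal(np.vstack([upper, lower]), grid))   # True
print(np.array_equal(np.hstack([left, right]), grid))    # True
```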


## Computation on NumPy Arrays: Universal Functions
- key to making computations fast is using vectorized operations, generally through NumPy's _universal functions_ (ufuncs)

### The Slowness of Loops
- can time execution using the IPython magic command `%timeit`

In [40]:
np.random.seed(0)

def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

values = np.random.randint(1, 10, size=5)
compute_reciprocals(values)

big_array = np.random.randint(1,100,size=1000000)
%timeit compute_reciprocals(big_array)

- timing output not shown: it varies by machine, but w/a million elements this loop typically takes on the order of a second
- the bottleneck isn't the arithmetic itself but the type-checking & function dispatch Python does on every iteration of the loop


### Introducing UFuncs
- vectorized approach is designed to push loop into compiled layer underlying NumPy, leading to much faster execution
- vectorized operations in NumPy are utilized via _ufuncs_, whose main purpose is to quickly execute repeated operations on vals in NumPy arrays
- ufuncs are really flexible
    - can operate between scalar + array
    - can operate between 2 arrays
        - can be one-dimensional
        - can be multidimensional
    - can operate between arrays of different sizes & shapes
        - this is _broadcasting_, which will be explored later
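
A sketch comparing the loop version (with `return` placed after the loop) to the ufunc version on a small hard-coded array, then showing ufuncs between 2 arrays and on a 2D array:

```python
import numpy as np

def compute_reciprocals(values):
    """Loop version: one Python-level division per element."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

values = np.array([6, 1, 4, 4, 8])

loop_result = compute_reciprocals(values)
ufunc_result = 1.0 / values   # scalar / array: the loop runs in compiled code
print(np.allclose(loop_result, ufunc_result))   # True

# ufuncs also work array-with-array and on multidimensional arrays
print(np.arange(5) / np.arange(1, 6))
x = np.arange(9).reshape((3, 3))
print(2 ** x)
```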

### Exploring NumPy's UFuncs
- _unary ufuncs_ take a single input
- _binary ufuncs_ take 2 inputs

#### Array Arithmetic
- NumPy's arithmetic ufuncs can be called using Python's native math operators
    - standard addition, subtraction, multiplication, division can all be used
        - addition: +
        - subtraction: -
            - negation: -
        - multiplication: \*
            - exponents: \*\*
        - division: /
            - floor division: //
            - modulus: %
    - operations can be strung together whatever way you like & order of operations is followed

<table>
    <tr>
        <th>Operator</th>
        <th>ufunc</th>
        <th>Description</th>
    </tr>
    <tr>
        <td>+</td>
        <td>`np.add`</td>
        <td>addition</td>
    </tr>
    <tr>
        <td>-</td>
        <td>`np.subtract`</td>
        <td>subtraction</td>
    </tr>
    <tr>
        <td>-</td>
        <td>`np.negative`</td>
        <td>unary negation</td>
    </tr>
    <tr>
        <td>*</td>
        <td>`np.multiply`</td>
        <td>multiplication</td>
    </tr>
    <tr>
        <td>**</td>
        <td>`np.power`</td>
        <td>exponentiation</td>
    </tr>
    <tr>
        <td>/</td>
        <td>`np.divide`</td>
        <td>division</td>
    </tr>
    <tr>
        <td>//</td>
        <td>`np.floor_divide`</td>
        <td>floor division</td>
    </tr>
    <tr>
        <td>%</td>
        <td>`np.mod`</td>
        <td>modulus/remainder</td>
    </tr>
</table>
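
A sketch checking the table row by row: each operator gives the same result as its ufunc.

```python
import numpy as np

x = np.arange(4)   # [0 1 2 3]

# Each operator is shorthand for the corresponding ufunc
print(np.array_equal(x + 2, np.add(x, 2)))            # True
print(np.array_equal(x - 2, np.subtract(x, 2)))       # True
print(np.array_equal(-x, np.negative(x)))             # True
print(np.array_equal(x * 2, np.multiply(x, 2)))       # True
print(np.array_equal(x ** 2, np.power(x, 2)))         # True
print(np.array_equal(x / 2, np.divide(x, 2)))         # True
print(np.array_equal(x // 2, np.floor_divide(x, 2)))  # True
print(np.array_equal(x % 2, np.mod(x, 2)))            # True

# Operators can be chained; Python precedence applies (** before unary -)
print(-(0.5 * x + 1) ** 2)
```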

#### Absolute Value
- NumPy understands Python's built-in absolute value function
    - Python: `abs()`
    - NumPy ufunc: `np.absolute()` or `np.abs()`
- when handling complex data, the ufunc returns the magnitude
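
A short sketch of the three spellings on integers, then the magnitude behavior on complex input:

```python
import numpy as np

x = np.array([-2, -1, 0, 1, 2])
print(abs(x))           # [2 1 0 1 2]  Python built-in works on arrays
print(np.absolute(x))   # [2 1 0 1 2]
print(np.abs(x))        # [2 1 0 1 2]  np.abs is an alias of np.absolute

z = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])
print(np.abs(z))        # [5. 5. 2. 1.]  magnitude of each complex number
```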

#### Trigonometric Functions
- `np.pi` is the value of pi
- values are computed to w/in machine precision, which is why values that should be 0 don't always come out exactly at 0
- trig functions that are available include:
<table>
    <tr>
        <td>`np.sin()`</td>
        <td>`np.cos()`</td>
        <td>`np.tan()`</td>
    </tr>
    <tr>
        <td>`np.arcsin()`</td>
        <td>`np.arccos()`</td>
        <td>`np.arctan()`</td>
    </tr>
</table>
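
A sketch of the machine-precision point: `np.sin(np.pi)` is tiny but not exactly 0, so comparisons should use a tolerance (here `np.allclose`).

```python
import numpy as np

theta = np.linspace(0, np.pi, 3)   # [0, pi/2, pi]
sin_vals = np.sin(theta)
print(sin_vals)   # last value is ~1.2e-16, not exactly 0

# Compare with a tolerance instead of exact equality
print(np.allclose(sin_vals, [0, 1, 0]))   # True

# Inverse functions map values back to angles (in radians)
xs = np.array([-1, 0, 1])
print(np.arcsin(xs))   # approximately [-pi/2, 0, pi/2]
```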

#### Exponents and Logarithms
- `np.exp(x)`
    - raise `e` to the x power
- `np.exp2(x)`
    - 2 to the x power
- `np.power(#, x)`
    - \# to the x power
- `np.log(x)`
    - natural log of x [ln(x)]
- `np.log2(x)`
    - log base 2 of x
- `np.log10(x)`
    - log base 10 of x
- there are specialized versions useful for maintaining precision w/very small input:
    - `np.expm1(x)`
    - `np.log1p(x)`
    - when x is very small, these give more precise values than regular log & exp

In [46]:
x = [0, 0.001, 0.01, 0.1]
print("exp(x) =", np.exp(x))
print("exp(x) - 1 =", np.expm1(x))
# print("log(x) =", np.log(x))     # log(0) returns -inf w/a divide-by-zero RuntimeWarning
print("log(1 + x) =", np.log1p(x))

exp(x) = [ 1.          1.0010005   1.01005017  1.10517092]
exp(x) - 1 = [ 0.          0.0010005   0.01005017  0.10517092]
log(1 + x) = [ 0.          0.0009995   0.00995033  0.09531018]
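
A sketch covering the other functions in the list, plus a tiny-input case where `np.expm1` clearly beats `np.exp(x) - 1` (the value `1e-10` is just an illustrative small input):

```python
import numpy as np

x = np.array([1, 2, 3])
print(np.exp(x))        # e**x
print(np.exp2(x))       # [2. 4. 8.]
print(np.power(3, x))   # [ 3  9 27]

print(np.log2([1, 2, 4, 8]))           # log base 2
print(np.log10([1, 10, 100, 1000]))    # log base 10

# For tiny x, exp(x) - 1 loses digits to round-off; expm1 keeps them
tiny = 1e-10
naive = np.exp(tiny) - 1
precise = np.expm1(tiny)
print(naive, precise)
```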


#### Specialized Ufuncs
- many more ufuncs are available
    - hyperbolic trig functions
    - bitwise math
    - comparison operators
    - radian to degree conversion
    - rounding & remainders
    - ... & much more!
- look through the NumPy documentation for more
- more specialized & obscure functions are available through `scipy.special`
    - see the book for (stats) example & documentation for more
        - [SciPy documentation](https://docs.scipy.org/doc/scipy/reference/tutorial/special.html)
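
A sketch touching one ufunc from each category listed above. It sticks to NumPy itself; `scipy.special` is left out so the example runs w/o SciPy installed.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])
print(np.sinh(x), np.cosh(x), np.tanh(x))   # hyperbolic trig

print(np.deg2rad(180.0))       # ~pi: degree -> radian conversion
print(np.rad2deg(np.pi / 2))   # ~90: radian -> degree conversion

print(np.round([0.5, 1.5, 2.5]))   # halves round to the nearest even value
print(np.remainder([7, -7], 3))    # [1 2]: sign follows the divisor

print(np.bitwise_and(12, 10))      # 8 (0b1100 & 0b1010)
print(np.greater([1, 5, 3], 2))    # [False  True  True]
```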

### Advanced Ufunc Features
#### Specifying Output
- for large calculations, it's useful to specify the array where the result should be stored, which every ufunc allows via the `out` argument
    - `out` also works w/array views (see the last cell below)
    - w/o `out`, writing `y[::2] = 2 ** x` first creates a temp array to hold the result of `2 ** x`, then copies those values into `y`
        - memory waster for large amounts of data

In [49]:
x = np.arange(5)
y = np.multiply(x,10)
y

array([ 0, 10, 20, 30, 40])

In [50]:
y = np.empty(5)
np.multiply(x, 10, out=y)
y

array([  0.,  10.,  20.,  30.,  40.])

In [54]:
y = np.zeros(10)    # using out argument on array views
np.power(2,x, out=y[::2])     # done instead of y[::2] = 2 ** x
print(y)

[  1.   0.   2.   0.   4.   0.   8.   0.  16.   0.]


#### Aggregates
- binary ufuncs have methods that compute aggregates directly from the ufunc object
- if we want to reduce an array w/an operation, use the `reduce` method of any ufunc
    - a reduce applies a given operation to the elements of an array until only a single result remains
- if we want to store all intermediate results of computation, use `accumulate` method

In [59]:
ag1 = np.arange(1,6)

add = np.add.reduce(ag1)     # reduce addition
print(add)

add = np.add.accumulate(ag1)     # accumulate addition
print(add)

mult = np.multiply.reduce(ag1)     # reduce multiplication
print(mult)

mult = np.multiply.accumulate(ag1)     # accumulate multiplication
print(mult)

15
[ 1  3  6 10 15]
120
[  1   2   6  24 120]
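
For these particular cases NumPy also has dedicated functions (`np.sum`, `np.prod`, `np.cumsum`, `np.cumprod`). A sketch confirming they match the ufunc methods:

```python
import numpy as np

ag1 = np.arange(1, 6)   # [1 2 3 4 5]

print(np.add.reduce(ag1) == np.sum(ag1))                             # True
print(np.multiply.reduce(ag1) == np.prod(ag1))                       # True
print(np.array_equal(np.add.accumulate(ag1), np.cumsum(ag1)))        # True
print(np.array_equal(np.multiply.accumulate(ag1), np.cumprod(ag1)))  # True
```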


#### Outer Products
- any ufunc can compute output of all pairs of 2 different inputs using `outer` method
    - allows you, in one line, to create things like multiplication tables (see below)

In [60]:
mtable = np.arange(1,6)
np.multiply.outer(mtable,mtable)

array([[ 1,  2,  3,  4,  5],
       [ 2,  4,  6,  8, 10],
       [ 3,  6,  9, 12, 15],
       [ 4,  8, 12, 16, 20],
       [ 5, 10, 15, 20, 25]])

### Ufuncs: Learning More
- [NumPy Universal Function Documentation](https://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs)

## Aggregations: Min, Max, and Everything in Between
### Summing the Values in an Array
- Python's built-in summing function is `sum()`
- NumPy's summing function is `np.sum()`
- NumPy's summing function is much faster
- the functions aren't identical though
    - optional arguments have different meanings
    - NumPy's sum is aware of multiple array dimensions
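
A sketch of both differences: the 2nd positional argument means something different to each `sum`, and `np.sum` can aggregate along an axis.

```python
import numpy as np

np.random.seed(0)
L = np.random.random(100)
python_total = sum(L)     # Python built-in: loops element by element
numpy_total = np.sum(L)   # NumPy: loop runs in compiled code
print(np.isclose(python_total, numpy_total))   # True (may differ in the last bits)

# The 2nd positional argument means different things
print(sum(range(5), -1))      # 9: Python treats -1 as the start value
print(np.sum(range(5), -1))   # 10: NumPy treats -1 as the axis

# NumPy's sum understands dimensions
M = np.arange(6).reshape((2, 3))
print(np.sum(M, axis=0))      # [3 5 7]
```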

### Minimum and Maximum
- Python functions: `min()` and `max()`
- NumPy functions: `np.min()` and `np.max()`
    - operate more quickly
- for many NumPy aggregates, a shorter syntax is to use methods of the array object itself
    - ex: `big_array.min()`
- whenever possible, make sure you're using NumPy version of the aggregates when operating on NumPy arrays
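
A sketch of the three equivalent spellings on a small hard-coded array:

```python
import numpy as np

x = np.array([4, 1, 7, 3])
print(min(x), max(x))              # 1 7  Python built-ins (slower on big arrays)
print(np.min(x), np.max(x))        # 1 7  NumPy functions
print(x.min(), x.max(), x.sum())   # 1 7 15  array methods: shorter syntax
```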

### Multidimensional Aggregates
- common type of aggregation is along a row/column
- default is that each aggregate function operates over entire array
- aggregation functions take an additional `axis` argument
    - specifies the dimension of the array that'll be collapsed (not the dimension that'll be returned)
    - in a 2D array, `axis=0` collapses the rows, so the aggregate is computed for each column
    - in a 2D array, `axis=1` collapses the columns, so the aggregate is computed for each row
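
A sketch on a hard-coded 3x4 array showing which dimension each `axis` value collapses:

```python
import numpy as np

M = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

print(M.sum())         # 78: whole array
print(M.min(axis=0))   # [1 2 3 4]: rows collapsed -> one value per column
print(M.max(axis=1))   # [ 4  8 12]: columns collapsed -> one value per row
```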

### Other Aggregation Functions
- most aggregates have a NaN-safe counterpart that computes result while ignoring missing values, which are marked by the special IEEE floating-point NaN value
- _note: skipped example b/c didn't have data_

<table>
    <tr>
        <th>Function Name</th>
        <th>NaN-Safe Version</th>
        <th>Description</th>
    </tr>
    <tr>
        <td>`np.sum`</td>
        <td>`np.nansum`</td>
        <td>compute sum of elements</td>
    </tr>
    <tr>
        <td>`np.prod`</td>
        <td>`np.nanprod`</td>
        <td>compute product of elements</td>
    </tr>
    <tr>
        <td>`np.mean`</td>
        <td>`np.nanmean`</td>
        <td>compute mean of elements</td>
    </tr>
    <tr>
        <td>`np.std`</td>
        <td>`np.nanstd`</td>
        <td>compute standard deviation</td>
    </tr>
    <tr>
        <td>`np.var`</td>
        <td>`np.nanvar`</td>
        <td>compute variance</td>
    </tr>
    <tr>
        <td>`np.min`</td>
        <td>`np.nanmin`</td>
        <td>find minimum value</td>
    </tr>
    <tr>
        <td>`np.max`</td>
        <td>`np.nanmax`</td>
        <td>find maximum value</td>
    </tr>
    <tr>
        <td>`np.argmin`</td>
        <td>`np.nanargmin`</td>
        <td>find index of minimum value</td>
    </tr>
    <tr>
        <td>`np.argmax`</td>
        <td>`np.nanargmax`</td>
        <td>find index of maximum value</td>
    </tr>
    <tr>
        <td>`np.median`</td>
        <td>`np.nanmedian`</td>
        <td>compute median of elements</td>
    </tr>
    <tr>
        <td>`np.percentile`</td>
        <td>`np.nanpercentile`</td>
        <td>compute rank-based statistics of elements</td>
    </tr>
    <tr>
        <td>`np.any`</td>
        <td>n/a</td>
        <td>evaluate whether any elements are true</td>
    </tr>
    <tr>
        <td>`np.all`</td>
        <td>n/a</td>
        <td>evaluate whether all elements are true</td>
    </tr>
</table>
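
Since the original example's data wasn't available, here's a sketch on a small made-up array w/one missing value, comparing a regular aggregate to its NaN-safe version:

```python
import numpy as np

data = np.array([1.0, 2.0, np.nan, 4.0])   # hypothetical data w/a missing value

print(np.sum(data))       # nan: one NaN spoils the ordinary aggregate
print(np.nansum(data))    # 7.0: NaN is ignored
print(np.nanmean(data))   # mean of the 3 non-NaN values
print(np.nanmax(data), np.nanargmax(data))   # 4.0 3
```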

## Computation on Arrays: Broadcasting
- another means of vectorizing operations is to use _broadcasting_ functionality
- broadcasting is simply a set of rules for applying binary ufuncs (addition, subtraction, multiplication, etc.) on arrays of different sizes

### Introducing Broadcasting
- for arrays of same size, binary operations are performed on element-by-element basis
- broadcasting lets this type of operation be performed on arrays of different sizes
- in the first 2 examples below, only one of the arrays is broadcast onto the other
- more complicated examples (third below) involve broadcasting of both arrays
    - both arrays are stretched to match a common shape

In [2]:
a = np.array([0,1,2])
a + 5      # same as adding the array [5,5,5] to a

array([5, 6, 7])

In [4]:
one = np.ones((3,3))
one + a      # [3x3] + [3]: a is stretched across every row

array([[ 1.,  2.,  3.],
       [ 1.,  2.,  3.],
       [ 1.,  2.,  3.]])

In [7]:
# a from above
b = np.arange(3)[:, np.newaxis]
print(a)
print(b)
a + b      # both arrays are stretched to match common shape

[0 1 2]
[[0]
 [1]
 [2]]


array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])
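
To picture the "stretching", this sketch repeats both arrays by hand w/`np.tile` to the common (3, 3) shape and checks that the result matches broadcasting. It's a conceptual model only; broadcasting itself doesn't actually copy the data.

```python
import numpy as np

a = np.array([0, 1, 2])            # shape (3,)
b = np.arange(3)[:, np.newaxis]    # shape (3, 1)

broadcast = a + b

a_tiled = np.tile(a, (3, 1))   # repeat a as 3 rows    -> shape (3, 3)
b_tiled = np.tile(b, (1, 3))   # repeat b as 3 columns -> shape (3, 3)

print(np.array_equal(broadcast, a_tiled + b_tiled))   # True
print(broadcast.shape)                                # (3, 3)
```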

### Rules of Broadcasting