<DIV ALIGN=CENTER>

# Introduction to NumPy
## Professor Robert J. Brunner
  
</DIV>  
-----
-----

## Introduction 

As we discussed in the [Fundamental Python](2_fundamentalpy.ipynb)
Notebook, the Python programming language provides a rich set of data
structures such as the list, tuple, dictionary, and string, which can
greatly simplify common programming tasks. Of these, all but the string
are heterogeneous, which means they can hold data of different types.
This flexibility comes at a cost, however, as it is more expensive in
both computational time and storage to maintain an arbitrary collection
of data than to hold a pre-defined set of data.

(Advanced) For example, the Python list is implemented (in the
standard CPython implementation) as a variable-length array
that contains pointers to the objects held in the list. While flexible,
it takes time to create, resize, and iterate over, even if the data
contained in the list is homogeneous. In addition, even though you can
create multi-dimensional lists, creating and working with them is
neither simple nor intuitive. Yet many applications require a
multi-dimensional representation; for example, location on the surface
of the Earth or pixel properties in an image.
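
As a rough illustration of the point above, the following sketch stores the same small table as a nested `list` and as a NumPy array, and then selects one column from each. The sample values and variable names are made up for this example.

```python
import numpy as np

# A 3 x 3 table stored as a list of lists (hypothetical sample values).
nested = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

# Selecting the second column of a nested list requires an explicit loop.
list_column = [row[1] for row in nested]

# The same data as a NumPy array supports direct multi-dimensional indexing.
grid = np.array(nested)
array_column = grid[:, 1]

print(list_column)
print(array_column)
```

Both approaches select the same values, but the array version states the intent (all rows, second column) in a single index expression.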

Thus, these data structures are neither designed nor optimized for
data-intensive computing. Scientific and engineering computing
applications have a long history of using optimized data structures and
libraries, including codes written in C, Fortran, or MATLAB. These
applications expect to have vector and matrix data structures and
optimized algorithms that can leverage these structures in a seamless
way. Fortunately, since many data science applications, including
statistical and machine learning, are built on this academic legacy, a
community of open source developers has built the [Numerical Python
(NumPy)](http://numpy.org) library, which is a fundamental numerical
package that facilitates scientific computing in Python.

NumPy is built around a new, n-dimensional array (`ndarray`) data
structure that provides fast support for numerical computations. The
data type of the objects stored in the array can be specified at creation
time, but the array is homogeneous: every element has the same type.
This array can be used to represent a vector (a one-dimensional set of
numerical values) or a matrix (a multi-dimensional set of values).
Furthermore, NumPy provides additional benefits built on top of the
`ndarray` object, including _masked arrays_, _universal functions_,
_sampling from random distributions_, and support for _user-defined,
arbitrary data-types_ that allow the `ndarray` to become an efficient,
multi-dimensional generic data container.

-----

### Is NumPy worth learning?

Despite the discussion in the previous section, you might be curious if
the benefits of learning NumPy are worth the effort of learning to
effectively use a new Python data structure, especially one that is not
part of the standard Python distribution. In the end, you will have to
make this decision, but there are several definite benefits to using
NumPy:

1. NumPy is much faster than using a `list`.
2. NumPy is generally more intuitive than using a `list`.
3. NumPy is used by many other libraries like SciPy, Matplotlib, and pandas.
4. NumPy is part of the standard **data science** Python distribution.

NumPy is a very powerful library, with a rich and detailed [user guide][1].
Every time I look through the documentation, I learn new tricks. The
time you spend learning to use this library properly will be amply
rewarded. In the rest of this IPython notebook we will introduce many of
the basic NumPy features, but to fully master this library you will need
to spend time reading the documentation and trying out the full
capabilities of the library.

To demonstrate the first two points, consider the programming task of
computing basic mathematical functions on a large number of data points.
In the first code block we first import both the `math` library and the
`numpy` library. Second, we define two constants: size, which is the
number of data points to process, and delta, which is a floating point
offset we add to the array elements. You can change these two parameters
in order to see how the performance of the different approaches varies.
Finally, we create the `list` and the NumPy `array` that we will use in
the next few code blocks:

-----

[1]: http://docs.scipy.org/doc/numpy/user/

In [30]:
import math
import numpy as np

size = 100000
delta = 1.0E-2

aList = [(x + delta) for x in range(size)]
anArray = np.arange(size) + delta

print(aList[2:6])
print(anArray[2:6])

[2.01, 3.01, 4.01, 5.01]
[ 2.01  3.01  4.01  5.01]


-----

At this point, we have created and populated both data structures and
you have seen that they are both indexed in the same manner, meaning it
is probably easier to learn and use NumPy arrays than you might have
thought. Next, we can apply several standard mathematical functions to
our `list`, creating new `list`s in the process. To determine how long
these operations take, we use the IPython `%timeit` magic function,
which will, by default, run the code contained on the rest of the line
multiple times and report the best average time per loop.
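
The `%timeit` magic is only available inside IPython. As a minimal sketch of a similar measurement in a plain Python script, the standard-library `timeit` module can be used instead. The problem size, variable names, and the `repeat` and `number` settings below are arbitrary choices for illustration, not values from this notebook.

```python
import math
import timeit

import numpy as np

# Small, hypothetical problem size so the example runs quickly.
size = 10000
values = [x + 0.01 for x in range(size)]
arr = np.arange(size) + 0.01

# timeit.repeat runs the callable `number` times per trial and returns
# one total time (in seconds) per trial.
loops = 10
list_trials = timeit.repeat(lambda: [math.sin(x) for x in values],
                            repeat=3, number=loops)
array_trials = timeit.repeat(lambda: np.sin(arr), repeat=3, number=loops)

# Report the best trial, converted to milliseconds per loop.
best_list_ms = min(list_trials) / loops * 1000
best_array_ms = min(array_trials) / loops * 1000
print("list comprehension: {:.3f} ms per loop".format(best_list_ms))
print("NumPy array:        {:.3f} ms per loop".format(best_array_ms))
```

The absolute numbers will differ from machine to machine, so only the relative difference between the two approaches is meaningful.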

-----

In [31]:
%timeit [math.sin(x) for x in aList]
%timeit [math.cos(x) for x in aList]
%timeit [math.log(x) for x in aList]

100 loops, best of 3: 14.8 ms per loop
100 loops, best of 3: 14.2 ms per loop
100 loops, best of 3: 16.1 ms per loop


-----

As you can see, to create these new `list`s, we apply the mathematical
function to every element in the original `list`. These operations are
fairly fast and all take roughly the same time, demonstrating the overall
speed of _list comprehensions_ in Python. Now let's try doing the same
thing by using the NumPy library.

-----

In [32]:
%timeit np.sin(anArray)
%timeit np.cos(anArray)
%timeit np.log10(anArray)

100 loops, best of 3: 3.4 ms per loop
100 loops, best of 3: 3.2 ms per loop
100 loops, best of 3: 2.76 ms per loop


-----

First, the creation of each of these new arrays was much faster, roughly a
factor of four to six in the runs shown above (actual results will depend
on the host computer and Python version)! Second, the operations themselves were
arguably simpler to both write and to read as the function is applied to
the data structure itself and not each individual element. But perhaps
we should compare the results to ensure they are the same?

-----

In [33]:
l = [math.sin(x) for x in aList]
a = np.sin(anArray)

print("Python List: ", l[2:10:3])
print("NumPy array:", a[2:10:3])

print("Difference = ", a[5:7] - np.array(l[5:7]))

# Now apply a NumPy function directly to a Python list
%timeit np.sin(aList)

Python List:  [0.905090563325201, -0.9560397542711181, 0.9878538030350801]
NumPy array: [ 0.90509056 -0.95603975  0.9878538 ]
Difference =  [ 0.  0.]
100 loops, best of 3: 6.4 ms per loop


-----

As the previous code block demonstrates, the NumPy results agree with
the standard Python results, although the NumPy results are more
conveniently displayed. As a last test, we create a new NumPy `array`
from the original Python `list` by applying the `np.sin` function,
which, while not as fast as the pure NumPy version, is faster than the
Python version and easier to read.

Now let's change gears and actually introduce the NumPy library.


-----

### Creating an Array

[NumPy arrays][i], which are instances of the `ndarray` class, are
statically-typed, homogeneous data structures that can be created in a
number of [different ways][1]. You can create an array from an existing
Python `list` or `tuple`, or use one of the many built-in NumPy
convenience functions:

- `empty`: Create a new array whose elements are uninitialized.
- `zeros`: Create a new array whose elements are initialized to zero.
- `ones`: Create a new array whose elements are initialized to one.
- `empty_like`: Create a new array whose shape and data type match the
input array and whose values are uninitialized.
- `zeros_like`: Create a new array whose shape and data type match the
input array and whose values are initialized to zero.
- `ones_like`: Create a new array whose shape and data type match the
input array and whose values are initialized to one.
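
As a brief sketch of how these functions behave with shapes and data types, the example below passes a shape tuple to `zeros` and uses a small template array with the `_like` variants. The template values and variable names are arbitrary and chosen only for illustration.

```python
import numpy as np

# A 2 x 3 template array of 32-bit integers (arbitrary illustrative values).
template = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

# Creation functions accept a shape tuple for multi-dimensional arrays.
z = np.zeros((2, 3))

# The *_like functions copy the shape and data type of their input.
zl = np.zeros_like(template)
ol = np.ones_like(template)
el = np.empty_like(template)  # contents are arbitrary until assigned

print(z.shape, z.dtype)
print(zl.shape, zl.dtype)
print(ol)
print(el.shape, el.dtype)
```

Because `empty` and `empty_like` skip initialization, their contents should always be overwritten before they are read.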

-----
[i]: http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html
[1]: http://docs.scipy.org/doc/numpy/user/basics.creation.html

In [34]:
# Make and print out simple NumPy arrays

print(np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))

print("\n", np.empty(10))
print("\n", np.zeros(10))
print("\n", np.ones(10))
print("\n", np.ones_like(np.arange(10)))

[0 1 2 3 4 5 6 7 8 9]

 [  0.00000000e+000   4.94065646e-324   9.88131292e-324   1.48219694e-323
   1.97626258e-323   2.47032823e-323   2.96439388e-323   3.45845952e-323
   3.95252517e-323   4.44659081e-323]

 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]

 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]

 [1 1 1 1 1 1 1 1 1 1]


-----

We can also create NumPy arrays that have specific initialization
patterns. For example, the `arange(start, end, stride)` function works
like the built-in Python `range` function, creating an array whose elements
begin with the `start` parameter. Subsequent elements are incremented
successively by the `stride` parameter, stopping when the `end` parameter
would either be reached or surpassed. As was the case with the `range`
function, the `start` and `stride` parameters are optional, defaulting to
zero and one respectively, and the `end` value is not included in the
array.

-----

In [35]:
# Demonstrate the np.arange method

print(np.arange(10))
print(np.arange(3, 10, 2))

[0 1 2 3 4 5 6 7 8 9]
[3 5 7 9]


-----

NumPy also provides two convenience functions that take a similar
form, but in this case they assign the elements values that are evenly
spaced. The first is the `linspace(start, end, num)` function,
which creates `num` elements and assigns values that are linearly spaced
between `start` and `end`.

The second function, `logspace(start, end, num)`, creates `num` elements
and assigns values that are logarithmically spaced between `base**start`
and `base**end`. The `num` parameter is optional and defaults to fifty.
Unlike the `arange` function, these two functions are inclusive by
default, which means both endpoints are included as elements in the new
array. The `logspace` function has an optional parameter called `base`
that you can use to specify the base of the logarithm used to create the
intervals. By default this value is ten, making the intervals `log10`
spaced.
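
To make the role of the `base` parameter concrete, here is a small sketch with endpoints chosen so that the results are easy to check by hand. The specific endpoints and variable names are illustrative assumptions.

```python
import numpy as np

# Five linearly spaced values from 0 to 1, both endpoints included.
lin = np.linspace(0, 1, 5)

# For logspace, start and end are exponents: base**start to base**end.
log10_bins = np.logspace(0, 3, 4)          # default base of ten
log2_bins = np.logspace(0, 3, 4, base=2)   # powers of two

print(lin)
print(log10_bins)
print(log2_bins)
```

With these arguments the base-ten result runs from 1 to 1000 and the base-two result runs from 1 to 8, each in four steps.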

-----

In [36]:
# Demonstrate linear and logarithmic array creation.

print("Linear space bins [0, 10] = {}\n".format(np.linspace(0, 10, 4)))

print("Default linspace bins = {}\n".format(len(np.linspace(0,10))))


print("Log space bins [0, 1] = {}\n".format(np.logspace(0, 1, 4)))

print("Default logspace bins = {}\n".format(len(np.logspace(0,10))))

Linear space bins [0, 10] = [  0.           3.33333333   6.66666667  10.        ]

Default linspace bins = 50

Log space bins [0, 1] = [  1.           2.15443469   4.64158883  10.        ]

Default logspace bins = 50



-----

### Array Attributes

Each NumPy array has several attributes that describe the general
features of the array. These attributes include the following:
- `ndim`: Number of dimensions of the array (the previous examples were all one-dimensional).
- `shape`: The dimensions of the array, so a matrix with n rows and m
columns has `shape` equal to `(n, m)`.
- `size`: The total number of elements in the array. For a matrix with n
rows and m columns, the `size` is equal to the product of $n \times m$.
- `dtype`: The data type of each element in the array.
- `itemsize`: The size in bytes of each element in the array.
- `data`: The buffer that actually holds the array data.
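
The following short sketch prints several of these attributes for a small matrix. The 3 x 4 shape and the variable name `m` are arbitrary choices for illustration.

```python
import numpy as np

# A 3 x 4 matrix of 64-bit floating point values.
m = np.zeros((3, 4), dtype=np.float64)

print("ndim     =", m.ndim)      # number of dimensions
print("shape    =", m.shape)     # (rows, columns)
print("size     =", m.size)      # total number of elements
print("dtype    =", m.dtype)     # element data type
print("itemsize =", m.itemsize)  # bytes per element
```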

-----

### Array Data Types

NumPy arrays are statically-typed, thus their [data type][1] is
specified when they are created. The default data type is `float`, but
a different type can be specified in several ways. First, if you use an
existing `list` (as we did in an earlier code block) or `array` to
initialize the new `array`, the data type is inferred from the existing
data. If a heterogeneous Python `list` is used, the most general data type
will be used in order to guarantee that all values will be safely contained
in the new `array`. If using a NumPy function to create the new `array`,
the data type can be specified explicitly by using the `dtype` argument,
which can either be one of the predefined built-in data types or a
user-defined custom data type.

The full list of built-in data types can be obtained from the
`np.sctypeDict.keys()` method; but for brevity, we list some of the more
commonly used built-in data types below. Each name includes the size of
the type in bits, which constrains the range of values that may be stored
in the new `array`:

- Integer: `int8`, `int16`, `int32`, and `int64` 
- Unsigned Integer: `uint8`, `uint16`, `uint32`, and `uint64` 
- Floating Point: `float16`, `float32`, `float64`, and `float128` (the last is only available on some platforms)

Other data types include complex numbers, byte arrays, character arrays,
and date/time arrays. 

To check the type of an array, you can simply access the array's `dtype`
attribute. A `ValueError` exception will be raised if you try to assign
a value that cannot be converted to the array's data type.
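
The rules above can be sketched as follows. The sample values and variable names are arbitrary, and the `astype` method (not discussed elsewhere in this notebook) is used here only to show an explicit conversion to another data type.

```python
import numpy as np

# Mixing integers and floats promotes every element to floating point.
mixed = np.array([1, 2, 3.5])

# The dtype argument sets the element type explicitly.
small = np.zeros(4, dtype=np.int16)

# astype returns a converted copy; float-to-integer conversion truncates.
truncated = mixed.astype(np.int64)

print(mixed.dtype)
print(small.dtype, small.itemsize)
print(truncated)

# A value that cannot be converted raises ValueError.
caught = False
try:
    small[0] = "Hello!"
except ValueError as err:
    caught = True
    print("ValueError:", err)
```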

-----
[1]: http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html

In [37]:
# Access our previously created array's data type 

a.dtype

dtype('float64')

In [39]:
# Try to assign a string to a floating point array element

a[0] = 'Hello!'

ValueError: could not convert string to float: 'Hello!'

-----

### Multidimensional Arrays

NumPy supports multidimensional arrays, although for simplicity we will
rarely discuss anything other than two- or three-dimensional arrays.
A common way to create a higher dimensional array is to first create a
one-dimensional array with the correct number of elements and then
reshape it into the correct dimensionality. For example, you can create a
NumPy array with one hundred elements and reshape this new array into a
ten by ten matrix.

-----

In [40]:
# Make a 10 x 10 array

data = np.arange(100)

mat = data.reshape(10, 10)

print(mat)

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]
 [50 51 52 53 54 55 56 57 58 59]
 [60 61 62 63 64 65 66 67 68 69]
 [70 71 72 73 74 75 76 77 78 79]
 [80 81 82 83 84 85 86 87 88 89]
 [90 91 92 93 94 95 96 97 98 99]]


-----

Special convenience functions also exist to create special
multidimensional arrays. For example, you can create an identity matrix,
where the diagonal elements are all one and all other elements are zero,
by using the `eye` function (since the identity matrix is often denoted by
the capital letter I). You can also specify the diagonal (or certain
off-diagonal) elements by using the `diag` function, where the input
array is assigned to a set of diagonal elements in the new array. If the
`k` parameter to the `np.diag` function is zero, the main diagonal
will be initialized. If the `k` parameter is a positive (negative)
integer, the diagonal that lies `k` positions above (below) the main
diagonal is initialized. The size of the resulting array will be the
minimum possible size that can hold the input array on the chosen diagonal.

-----

In [41]:
# Create special two-dimensional arrays

print("Matrix will be 4 x 4.\n", np.eye(4))
print("\nMatrix will be 4 x 4.\n", np.diag(np.arange(4), 0))
print("\nMatrix will be 5 x 5.\n", np.diag(np.arange(4), 1))
print("\nMatrix will be 5 x 5.\n", np.diag(np.arange(4), -1))

Matrix will be 4 x 4.
 [[ 1.  0.  0.  0.]
 [ 0.  1.  0.  0.]
 [ 0.  0.  1.  0.]
 [ 0.  0.  0.  1.]]

Matrix will be 4 x 4.
 [[0 0 0 0]
 [0 1 0 0]
 [0 0 2 0]
 [0 0 0 3]]

Matrix will be 5 x 5.
 [[0 0 0 0 0]
 [0 0 1 0 0]
 [0 0 0 2 0]
 [0 0 0 0 3]
 [0 0 0 0 0]]

Matrix will be 5 x 5.
 [[0 0 0 0 0]
 [0 0 0 0 0]
 [0 1 0 0 0]
 [0 0 2 0 0]
 [0 0 0 3 0]]


-----

### Indexing Arrays

NumPy supports many different ways to [access elements][1] in an array.
Elements can be indexed or sliced in the same way a Python list or tuple
can be indexed or sliced, as demonstrated in the following code cell.

------
[1]: http://docs.scipy.org/doc/numpy/user/basics.indexing.html

In [42]:
a = np.arange(9)
print("Original Array = ", a)

a[1] = 3
a[3:5] = 4
a[0:6:2] *= -1

print("\nNew Array = ", a)

Original Array =  [0 1 2 3 4 5 6 7 8]

New Array =  [ 0  3 -2  4 -4  5  6  7  8]


-----

### Slicing Multidimensional Arrays

Multi-dimensional arrays can be sliced; the only trick is to remember
the proper ordering for the dimensions. Each dimension is separated
by a comma in the slicing operation, so a two-dimensional array is
sliced with `[start1:end1, start2:end2]`, while a three-dimensional
array is sliced with `[start1:end1, start2:end2, start3:end3]`,
continuing on with higher dimensions. If only one dimension is
specified, it will default to the first dimension. These concepts are
demonstrated in the following two code cells, first for a
two-dimensional array, followed by a three-dimensional array.

Note that each of these slicing operations (i.e., `start:end`) can also
include an optional `stride` value as well.
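
As a small sketch of the optional stride, the example below slices a 4 x 4 array with a step in each dimension. The array size and variable names are arbitrary choices for illustration.

```python
import numpy as np

b = np.arange(16).reshape((4, 4))

# Every other row (rows 0 and 2), and every other column starting
# at column 1 (columns 1 and 3).
sub = b[::2, 1::2]

# A negative stride reverses a dimension; here the row order is reversed.
flipped = b[::-1, :]

print(sub)
print(flipped[0])
```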

-----

In [43]:
b = np.arange(9).reshape((3,3))

print("3 x 3 array = \n",b)

print("\nSlice in first dimension (row 1): ",b[0])
print("\nSlice in first dimension (row 3): ",b[2])

print("\nSlice in second dimension (col 1): ",b[:,0])
print("\nSlice in second dimension (col 3): ", b[:,2])

print("\nSlice in first and second dimension: ", b[0:1, 1:2])


print("\nDirect Element access: ", b[0,1])

3 x 3 array = 
 [[0 1 2]
 [3 4 5]
 [6 7 8]]

Slice in first dimension (row 1):  [0 1 2]

Slice in first dimension (row 3):  [6 7 8]

Slice in second dimension (col 1):  [0 3 6]

Slice in second dimension (col 3):  [2 5 8]

Slice in first and second dimension:  [[1]]

Direct Element access:  1


In [44]:
c = np.arange(27).reshape((3,3, 3))

print("3 x 3 x 3 array = \n",c)
print("\nSlice in first dimension (first x axis slice):\n",c[0])

print("\nSlice in first and second dimension: ", c[0, 1])

print("\nSlice in first dimension (third x axis slice):\n", c[2])

print("\nSlice in second dimension (first y axis slice):\n", c[:,0])
print("\nSlice in second dimension (third y axis slice):\n", c[:,2])

print("\nSlice in first and second dimension: ", c[0:1, 1:2])

print("\nSlice in first and second dimension:\n", c[0,1])
print("\nSlice in first and third dimension: ", c[0,:,1])
print("\nSlice in first, second, and third dimension: ", c[0:1,1:2,2:])

print("\nDirect element access: ", c[0,1, 2])

3 x 3 x 3 array = 
 [[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]]

Slice in first dimension (first x axis slice):
 [[0 1 2]
 [3 4 5]
 [6 7 8]]

Slice in first and second dimension:  [3 4 5]

Slice in first dimension (third x axis slice):
 [[18 19 20]
 [21 22 23]
 [24 25 26]]

Slice in second dimension (first y axis slice):
 [[ 0  1  2]
 [ 9 10 11]
 [18 19 20]]

Slice in second dimension (third y axis slice):
 [[ 6  7  8]
 [15 16 17]
 [24 25 26]]

Slice in first and second dimension:  [[[3 4 5]]]

Slice in first and second dimension:
 [3 4 5]

Slice in first and third dimension:  [1 4 7]

Slice in first, second, and third dimension:  [[[5]]]

Direct element access:  5


-----

### Special Indexing

NumPy also provides several other _special_ indexing techniques. The
first such technique is the use of an [index array][1], where you use an
array to specify the elements to be selected. The second technique is a
[Boolean mask array][2]. In this case, the Boolean array is the same
shape as the primary NumPy array, and if the element in the mask array
is `True` the corresponding element in the primary array is selected,
while a `False` mask element means the corresponding element is skipped.
These two special indexing techniques are demonstrated in the following
two code cells.

-----
[1]: http://docs.scipy.org/doc/numpy/user/basics.indexing.html#index-arrays
[2]: http://docs.scipy.org/doc/numpy/user/basics.indexing.html#boolean-or-mask-index-arrays

In [45]:
# Demonstration of an index array

a = np.arange(10)

print("\nStarting array:\n", a)
print("\nIndex Access: ", a[np.array([1, 3, 5, 7])])

c = np.arange(10).reshape((2, 5))

print("\nStarting array:\n", c)
print("\nIndex Array access: \n", c[np.array([0, 1]) , np.array([3, 4])])


Starting array:
 [0 1 2 3 4 5 6 7 8 9]

Index Access:  [1 3 5 7]

Starting array:
 [[0 1 2 3 4]
 [5 6 7 8 9]]

Index Array access: 
 [3 9]


In [46]:
# Demonstrate Boolean mask access

# Simple case

a = np.arange(10)
print("Original Array:", a)

print("\nMask Array: ", a > 4)

# Now change the values by using the mask

a[a > 4] = -1.0
print("\nNew Array: ", a)

# Now a more complicated example.

print("\n--------------------")
c = np.arange(25).reshape((5, 5))
print("\n Starting Array: \n", c)

# Build a mask that is True for all even elements with value greater than four
mask1 = (c > 4)
mask2 = (c % 2 == 0)

print("\nMask 1:\n", mask1)
print("\nMask 2:\n", mask2)

# We use the logical_and ufunc here, but it is described later
mask = np.logical_and(mask1, mask2)

print("\nMask :\n", mask)

print("\nMasked Array :\n", c[mask])
# Use floor division: in-place true division is not allowed on an integer array
c[mask] //= -2

print("\nNew Array :\n", c)

Original Array: [0 1 2 3 4 5 6 7 8 9]

Mask Array:  [False False False False False  True  True  True  True  True]

New Array:  [ 0  1  2  3  4 -1 -1 -1 -1 -1]

--------------------

 Starting Array: 
 [[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]

Mask 1:
 [[False False False False False]
 [ True  True  True  True  True]
 [ True  True  True  True  True]
 [ True  True  True  True  True]
 [ True  True  True  True  True]]

Mask 2:
 [[ True False  True False  True]
 [False  True False  True False]
 [ True False  True False  True]
 [False  True False  True False]
 [ True False  True False  True]]

Mask :
 [[False False False False False]
 [False  True False  True False]
 [ True False  True False  True]
 [False  True False  True False]
 [ True False  True False  True]]

Masked Array :
 [ 6  8 10 12 14 16 18 20 22 24]

New Array :
 [[  0   1   2   3   4]
 [  5  -3   7  -4   9]
 [ -5  11  -6  13  -7]
 [ 15  -8  17  -9  19]
 [-10  21 -11  23 -12]]


-----

### Random Data

NumPy has rich support for [random number][1] generation, which can be
used to create and populate an array of a given shape and size. NumPy
provides support for sampling random values from over thirty different
distributions, including the `binomial`, `normal`, and `poisson`
distributions. There are also special convenience functions to simplify
the sampling of random data over the uniform or normal distributions.
These techniques are demonstrated in the following code cell.
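
As a hedged sketch of sampling from the named distributions, the example below uses the `np.random.default_rng` generator interface, which assumes NumPy 1.17 or later. The seed and the distribution parameters are arbitrary values picked for illustration.

```python
import numpy as np

# Seeded generator so repeated runs give the same samples.
# (default_rng assumes NumPy 1.17 or later; the seed is arbitrary.)
rng = np.random.default_rng(42)

heads = rng.binomial(n=10, p=0.5, size=5)            # successes in 10 trials
heights = rng.normal(loc=170.0, scale=10.0, size=5)  # mean 170, std. dev. 10
counts = rng.poisson(lam=3.0, size=5)                # events per interval

print(heads)
print(heights)
print(counts)
```

Fixing the seed is useful when results need to be reproducible, for example in tests or in a notebook that others will rerun.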

-----
[1]: http://docs.scipy.org/doc/numpy/reference/routines.random.html




In [47]:
# Create arrays of random data from the uniform and normal distributions

print("Uniform sampling [0, 1): ", np.random.rand(5))
print("Uniform sampling, integers [0, 10): ", np.random.randint(0, 10, 5))
print("Normal sampling (0, 1) : ", np.random.randn(5))

Uniform sampling [0, 1):  [ 0.19397511  0.43290111  0.91392302  0.98875048  0.81663076]
Uniform sampling, integers [0, 10):  [5 6 9 8 7]
Normal sampling (0, 1) :  [ 0.09471824 -0.21687796  0.27984128 -1.06039074 -1.65826281]


-----

### Basic Operations

NumPy arrays naturally support basic mathematical operations, including
addition, subtraction, multiplication, division, integer division, and
remainder, allowing you to easily combine a scalar (a single number) with
a vector (a one-dimensional array) or a matrix (a multi-dimensional
array). In the next code block, we first create a one-dimensional array,
and subsequently operate on this array to demonstrate how to combine a
scalar with a vector.

-----

In [48]:
# Create and use a vector
a = np.arange(10)

print(a)
print("\n", (2.0 * a + 1)/3)
print("\n", a%2)
print("\n", a//2)

[0 1 2 3 4 5 6 7 8 9]

 [ 0.33333333  1.          1.66666667  2.33333333  3.          3.66666667
  4.33333333  5.          5.66666667  6.33333333]

 [0 1 0 1 0 1 0 1 0 1]

 [0 0 1 1 2 2 3 3 4 4]


-----

These same operations can be used to combine a scalar with a matrix. In
the next code block we create a two-dimensional array and use that array
to demonstrate how to operate on a scalar and a matrix.

-----

In [49]:
# Create a two-dimensional array

b = np.arange(9).reshape((3,3))

print("Matrix = \n", b)

print("\nMatrix + 10.5 =\n", (b + 10.5))

print("\nMatrix * 0.25 =\n", (b * 0.25))

print("\nMatrix % 2 =\n", (b % 2))

print("\n(Matrix - 4.0) / 3.0 =\n", ((b - 4.0) / 3.))

Matrix = 
 [[0 1 2]
 [3 4 5]
 [6 7 8]]

Matrix + 10.5 =
 [[ 10.5  11.5  12.5]
 [ 13.5  14.5  15.5]
 [ 16.5  17.5  18.5]]

Matrix * 0.25 =
 [[ 0.    0.25  0.5 ]
 [ 0.75  1.    1.25]
 [ 1.5   1.75  2.  ]]

Matrix % 2 =
 [[0 1 0]
 [1 0 1]
 [0 1 0]]

(Matrix - 4.0) / 3.0 =
 [[-1.33333333 -1.         -0.66666667]
 [-0.33333333  0.          0.33333333]
 [ 0.66666667  1.          1.33333333]]


----- 

We can also combine arrays as long as their shapes match. In the next
code block we create a one-dimensional and a two-dimensional array and
demonstrate how equal-length slices of these two arrays can be combined.

-----

In [50]:
# Create two arrays

a = np.arange(1, 10)
b = (10. - a).reshape((3, 3))
print("Array = \n",a)
print("\nMatrix = \n",b)

print("\nArray[0:3] + Matrix[0, :] = ", a[:3] + b[0, :])

print("\nArray[0:3] + Matrix[:, 0] = ", a[:3] + b[:, 0])

print("\nArray[3:6] + Matrix[0, :] = ", a[3:6] + b[0, :])

# Now combine scalar operations

print("\n3.0 * Array[3:6] + (10.5 + Matrix[0, :]) = ", 3.0 * a[3:6] + (10.5 + b[0, :]))

Array = 
 [1 2 3 4 5 6 7 8 9]

Matrix = 
 [[ 9.  8.  7.]
 [ 6.  5.  4.]
 [ 3.  2.  1.]]

Array[0:3] + Matrix[0, :] =  [ 10.  10.  10.]

Array[0:3] + Matrix[:, 0] =  [ 10.   8.   6.]

Array[3:6] + Matrix[0, :] =  [ 13.  13.  13.]

3.0 * Array[3:6] + (10.5 + Matrix[0, :]) =  [ 31.5  33.5  35.5]


-----

### Summary Functions

NumPy provides convenience functions that can quickly summarize the
values of an array, which can be very useful for specific data
processing tasks. These functions include basic [statistical
measures][1] (`mean`, `median`, `var`, `std`, `min`, and `max`), the
total sum or product of all elements in the array (`sum`, `prod`), as
well as running sums or products for all elements in the array
(`cumsum`, `cumprod`). The last two functions actually produce arrays
that are of the same size as the input array, where each element is
replaced by the respective running sum/product up to and including the
current element. Another function, `trace`, calculates the trace of an
array, which simply sums up the diagonal elements in the
multi-dimensional array.

-----

[1]: http://docs.scipy.org/doc/numpy/reference/routines.statistics.html

In [51]:
# Demonstrate data processing convenience functions

# Make an array = [1, 2, 3, 4, 5]
a = np.arange(1, 6)

print("Mean value = {}".format(np.mean(a)))
print("Median value = {}".format(np.median(a)))
print("Variance = {}".format(np.var(a)))
print("Std. Deviation = {}\n".format(np.std(a)))

print("Minimum value = {}".format(np.min(a)))
print("Maximum value = {}\n".format(np.max(a)))

print("Sum of all values = {}".format(np.sum(a)))
print("Running cumulative sum of all values = {}\n".format(np.cumsum(a)))

print("Product of all values = {}".format(np.prod(a)))
print("Running product of all values = {}\n".format(np.cumprod(a)))

# Now compute trace of 5 x 5 identity matrix (= 5)
print(np.trace(np.eye(5)))


Mean value = 3.0
Median value = 3.0
Variance = 2.0
Std. Deviation = 1.4142135623730951

Minimum value = 1
Maximum value = 5

Sum of all values = 15
Running cumulative sum of all values = [ 1  3  6 10 15]

Product of all values = 120
Running product of all values = [  1   2   6  24 120]

5.0


-----

### Universal Functions

NumPy also includes functions known as _universal functions_ or
[__ufuncs__][1] that are vectorized and thus operate on each element in
the array, without the need for a loop. You have already seen examples
of some of these functions at the start of this IPython Notebook when we
compared the speed and simplicity of NumPy versus normal Python for
numerical operations. Almost all of these functions include an optional
`out` parameter that allows a pre-defined NumPy array to be used to hold
the results of the calculation, which can often speed up the processing
by eliminating the need for the creation and destruction of temporary
arrays. These functions will all still return the final array, even if
the `out` parameter is used. 
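
A minimal sketch of the `out` parameter follows. The array contents and variable names are arbitrary; the point is only that the pre-allocated array is reused and is also the object that the function returns.

```python
import numpy as np

a = np.arange(5, dtype=np.float64)

# Pre-allocate the result array once and reuse it for each call.
result = np.empty_like(a)

returned = np.multiply(a, 2.0, out=result)   # result = 2 * a
np.add(result, 1.0, out=result)              # result = 2 * a + 1, in place

print(result)
print(returned is result)
```

Writing the same calculation as `2.0 * a + 1.0` would create an intermediate array for `2.0 * a` before the addition, which is the temporary that the `out` parameter avoids.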

NumPy includes over sixty _ufuncs_ that come in several different
categories:

- Math operations, which can be called explicitly or simply implicitly
when the standard math operators are used on NumPy arrays. Example
functions in this category include `add`, `divide`, `power`, `sqrt`,
`log`, and `exp`.
- Trigonometric functions, which assume angles measured in radians.
Example functions include the `sin`, `cos`, `arctan`, `sinh`, and
`deg2rad` functions.
- Bit-twiddling functions, which manipulate integer arrays as if they
are bit patterns. Example functions include the `bitwise_and`,
`bitwise_or`, `invert`, and `right_shift`.
- Comparison functions, which can be called explicitly or implicitly
when using standard comparison operators that compare two arrays,
element-by-element, returning a new array of the same dimension. Example
functions include `greater`, `equal`, `logical_and`, and `maximum`.
- Floating functions, which compute floating point tests or operations,
element-by-element. Example functions include `isfinite`, `isnan`,
`signbit`, and `fmod`.
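
The categories above can be sketched with one or two calls each. The input values and variable names are arbitrary and were chosen so the results are easy to verify by hand.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])
n = np.array([12, 10, 7])

# Math operations: np.add is what the + operator calls on arrays.
sums = np.add(x, x)
roots = np.sqrt(x)

# Trigonometric functions work in radians; deg2rad converts from degrees.
half_turn = np.deg2rad(180.0)

# Bit-twiddling functions treat integers as bit patterns.
masked_bits = np.bitwise_and(n, 6)   # 6 is binary 110
shifted = np.right_shift(n, 1)       # drop the lowest bit

# Comparison functions return element-by-element results.
bigger = np.greater(x, 3.0)
largest = np.maximum(x, 5.0)

# Floating functions test floating point properties.
nan_flags = np.isnan(np.array([1.0, np.nan]))

print(sums, roots, half_turn)
print(masked_bits, shifted)
print(bigger, largest, nan_flags)
```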

For more information on any of these functions, see the official
[NumPy _ufunc_][1] reference guide, which has a full breakdown of each
function along with sample code demonstrating how to use it; the
[isnan][2] page is a typical example.

In the following code block, we demonstrate several of these _ufuncs_.

-----
[1]: http://docs.scipy.org/doc/numpy/reference/ufuncs.html
[2]: http://docs.scipy.org/doc/numpy/reference/generated/numpy.isnan.html#numpy.isnan

In [52]:
b = np.arange(1, 10).reshape(3, 3)

print('original array:\n', b)

c = np.sin(b)

print('\nnp.sin : \n', c)

print('\nnp.log10 and np.abs : \n', np.log10(np.abs(c)))

print('\nnp.mod : \n', np.mod(b, 2))

print('\nnp.logical_and : \n', np.logical_and(np.mod(b, 2), True))



original array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

np.sin : 
 [[ 0.84147098  0.90929743  0.14112001]
 [-0.7568025  -0.95892427 -0.2794155 ]
 [ 0.6569866   0.98935825  0.41211849]]

np.log10 and np.abs : 
 [[-0.07496085 -0.04129404 -0.85041141]
 [-0.12101744 -0.01821569 -0.55374951]
 [-0.18244349 -0.00464642 -0.38497791]]

np.mod : 
 [[1 0 1]
 [0 1 0]
 [1 0 1]]

np.logical_and : 
 [[ True False  True]
 [False  True False]
 [ True False  True]]


In [53]:
# Demonstrate Boolean tests with operators

d = np.arange(9).reshape(3, 3)

print("Greater Than or Equal Test: \n", d >= 5)

# Now combine to form Boolean Matrix

np.logical_and(d > 3, d % 2)

Greater Than or Equal Test: 
 [[False False False]
 [False False  True]
 [ True  True  True]]


array([[False, False, False],
       [False, False,  True],
       [False,  True, False]], dtype=bool)

-----

### Masked Arrays

NumPy provides support for [masked arrays][1], where certain elements
can be _masked_ based on some criterion and ignored during subsequent
calculations (i.e., these elements are masked). Masked arrays are in the
`numpy.ma` package, and simply require a masked array to be created from
a given array and a condition that indicates which elements should be
masked. This new masked array can be used as a normal NumPy array,
except the masked elements are ignored. NumPy provides [operations][2]
for masked arrays, allowing them to be used in a similar manner as
normal NumPy arrays.

You can also impute missing (or bad) values by using a masked array and
replacing the masked elements with a different value, such as the mean
value. Masked arrays can also mask out invalid results in a calculation,
such as division by zero or the logarithm of zero: the `numpy.ma`
operations mask the offending elements instead of propagating `inf` or
`nan`, so the rest of the calculation proceeds gracefully.
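
As a minimal sketch of masking by a condition, the example below uses `ma.masked_where` to hide non-positive values before taking a base-10 logarithm. The sample values and the names `data`, `positive`, and `logs` are assumptions chosen for illustration.

```python
import numpy as np
import numpy.ma as ma

data = np.array([1.0, 0.0, 10.0, -5.0, 100.0])

# Mask every element where the condition (data <= 0) is True.
positive = ma.masked_where(data <= 0, data)

# The masked-array log10 skips masked entries and keeps them masked.
logs = ma.log10(positive)

print(logs)
print("Unmasked count:", logs.count())
print("Mean of unmasked logs:", logs.mean())
```

Only the three positive values contribute to the count and the mean; the masked entries remain masked in the result.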

-----

[1]: http://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html#examples
[2]: http://docs.scipy.org/doc/numpy/reference/routines.ma.html

In [54]:
# Create and demonstrate a masked array

import numpy.ma as ma

x = [0.,1.,-9999.,3.,4.]
print("Original array = :", x)


mx = ma.masked_values(x, -9999.)
print("\nMasked array = :", mx)


print("\nMean value of unmasked elements:", mx.mean())
print("\nOperate on unmasked elements: ", mx - mx.mean())
print("\n Impute missing values (using mean): ", mx.filled(mx.mean())) # Imputation

Original array = : [0.0, 1.0, -9999.0, 3.0, 4.0]

Masked array = : [0.0 1.0 -- 3.0 4.0]

Mean value of unmasked elements: 2.0

Operate on unmasked elements:  [-2.0 -1.0 -- 1.0 2.0]

 Impute missing values (using mean):  [ 0.  1.  2.  3.  4.]


In [55]:
# Create two arrays with masks
x = ma.array([1., -1., 3., 4., 5., 6.], mask=[0,0,0,0,1,0])
y = ma.array([1., 2., 0., 4., 5., 6.], mask=[0,0,0,0,0,1])

# Now take the square root of the ratio. Invalid results (negative ratio,
# division by zero) and already-masked elements are masked in the output.
print(np.sqrt(x/y))

[1.0 -- -- 1.0 -- --]




In [56]:
# Now try some random data

d = np.random.rand(1000)

# Now mask values outside a specified range (0.1 to 0.9) and average the rest
print("Masked array mean value: ", ma.masked_outside(d, 0.1, 0.9).mean())

Masked array mean value:  0.498705593689


-----

### NumPy File Input/Output

NumPy has support for reading data from and writing data to [files][1].
Two of the most useful tools are the [`loadtxt` function][3] and the
[`genfromtxt` function][2], each of which allows you to easily read data
from a text file into a NumPy array. The primary difference is that
`genfromtxt` can handle missing data, while `loadtxt` cannot. Both
functions allow you to specify the column delimiter, skip header rows,
specify which columns should be extracted from each row, and _unpack_
the result so that each column goes into a separate array.

For example, the following code snippet demonstrates how to use the
`loadtxt` function to pull out the second and fourth columns from the
`fin` file handle, where the file is assumed to be in CSV format. The
data are stored in the `a` and `b` NumPy arrays.

```python
a, b = np.loadtxt(fin, delimiter=',', usecols=(1, 3), unpack=True)
```
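
To make the snippet above runnable on its own, the sketch below substitutes a hypothetical in-memory CSV (built with `io.StringIO`) for the `fin` file handle. The column names and values are invented for illustration, and `skiprows=1` is added only because this made-up file has a header row.

```python
import io
import numpy as np

# Hypothetical in-memory CSV standing in for the `fin` file handle.
fin = io.StringIO(
    "id,x,y,z\n"
    "1,10.0,0.5,100.0\n"
    "2,20.0,1.5,200.0\n"
    "3,30.0,2.5,300.0\n"
)

# Skip the header row, keep the second and fourth columns (indices 1 and 3),
# and unpack them into two separate arrays.
a, b = np.loadtxt(fin, delimiter=',', skiprows=1, usecols=(1, 3), unpack=True)

print("a =", a)
print("b =", b)
```

With `unpack=True` the result is transposed, so each selected column can be assigned directly to its own variable.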

We demonstrate the `genfromtxt` function in the following code block, where
we first create the test data file, before reading that data back into a
NumPy array.

-----

[1]: http://docs.scipy.org/doc/numpy/user/basics.io.html
[2]: http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html#defining-the-input
[3]: http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html#numpy.loadtxt

In [57]:
# First write data to a file using Python's built-in file handling.
info = "1, 2, 3, 4, 5 \n 6, 7, 8, 9, 10"
with open("test.csv", 'w') as fout:
    print(info, file=fout)

# Now we can read it back into a NumPy array.
d = np.genfromtxt("test.csv", delimiter=",")

print("New Array = \n", d)

New Array = 
 [[  1.   2.   3.   4.   5.]
 [  6.   7.   8.   9.  10.]]
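
The missing-data handling that distinguishes `genfromtxt` from `loadtxt` can be sketched as follows. The CSV text here is a hypothetical example with one empty field, read from an in-memory `io.StringIO` buffer rather than a file on disk; the names `text`, `with_nan`, and `filled` are chosen for illustration.

```python
import io
import numpy as np

# Hypothetical CSV text with one empty field in the second row.
text = "1,2,3\n4,,6\n7,8,9\n"

# By default, a missing field in a float column is read as nan.
with_nan = np.genfromtxt(io.StringIO(text), delimiter=",")
print("Default handling:\n", with_nan)

# The filling_values argument substitutes a chosen value instead.
filled = np.genfromtxt(io.StringIO(text), delimiter=",", filling_values=-1.0)
print("With filling_values=-1.0:\n", filled)
```

The choice of fill value is application specific; a sentinel such as `-1.0` is only sensible when it cannot be confused with valid data.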


-----

In addition to the normal Python `help` function, NumPy provides a
special `lookfor` function (in NumPy versions before 2.0) that searches
the NumPy library for classes, types, or functions whose documentation
matches the search string passed to the function. This can be useful for
finding specific information given a general concept, or for learning
more about related topics by performing a search.

-----

In [58]:
np.lookfor('masked array')

Search results for 'masked array'
---------------------------------
numpy.ma.array
    An array class with possibly masked values.
numpy.ma.asarray
    Convert the input to a masked array of the given data-type.
numpy.mafromtxt
    Load ASCII data stored in a text file and return a masked array.
numpy.ma.put
    Set storage-indexed locations to corresponding values.
numpy.ma.asanyarray
    Convert the input to a masked array, conserving subclasses.
numpy.ma.diag
    Extract a diagonal or construct a diagonal array.
numpy.ma.dump
    Pickle a masked array to a file.
numpy.ma.isMA
    Test whether input is an instance of MaskedArray.
numpy.ma.masked_all
    Empty masked array with all elements masked.
numpy.ma.count
    Count the non-masked elements of the array along the given axis.
numpy.ma.dumps
    Return a string corresponding to the pickling of a masked array.
numpy.ma.masked_less
    Mask an array where less than a given value.
numpy.ma.mvoid
    Fake a 'void' object to use for ma

### Additional References

1. [NumPy Tutorial][1]
2. [NumPy Cheatsheet][2]
3. [NumPy Demonstration][3]
4. [NumPy Notebook Demo][4]
-----

[1]: http://docs.scipy.org/doc/numpy/user/index.html
[2]: http://pages.physics.cornell.edu/~myers/teaching/ComputationalMethods/python/arrays.html
[3]: http://www.tp.umu.se/~nylen/pylect/intro/numpy/numpy.html
[4]: http://nbviewer.ipython.org/github/jrjohansson/scientific-python-lectures/blob/master/Lecture-2-Numpy.ipynb


### Return to the [Course Index](index.ipynb).

-----