<img src='img/logo.png' />

<img src='img/title.png'>

<img src='img/py3k.png'>

# Table of Contents
* [Learning Objectives:](#Learning-Objectives:)
	* [Some Simple Setup](#Some-Simple-Setup)
* [What is NumPy?](#What-is-NumPy?)
	* [NumPy arrays have:](#NumPy-arrays-have:)
	* [NumPy's Uses and Capabilities](#NumPy's-Uses-and-Capabilities)
	* [NumPy Ecosystem](#NumPy-Ecosystem)
* [NumPy Arrays](#NumPy-Arrays)
	* [Array Shape](#Array-Shape)
	* [Array Type](#Array-Type)
* [Array Creation](#Array-Creation)
	* [`np.zeros` and `np.ones`](#np.zeros-and-np.ones)
	* [`np.empty`](#np.empty)
	* [`np.arange`](#np.arange)
	* [`np.linspace` and `np.logspace`](#np.linspace-and-np.logspace)
	* [Diagonal arrays:  `np.eye` and `np.diag`](#Diagonal-arrays:--np.eye-and-np.diag)
	* [Arrays from Random Distributions](#Arrays-from-Random-Distributions)
		* [Uniform on [0,1)](#Uniform-on-[0,1%29)
		* [Standard Normal](#Standard-Normal)
		* [Uniform Integers](#Uniform-Integers)
	* [Arrays From a Python List](#Arrays-From-a-Python-List)
	* [From row and column stacks:  `np.r_` and `np.c_`](#From-row-and-column-stacks:--np.r_-and-np.c_)
* [Array Shape and Reshaping](#Array-Shape-and-Reshaping)

# Learning Objectives:

After completion of this module, learners should be able to:

* explain relevant distinctions between (`numpy`) ndarrays & lists in Python
* create numerical arrays with specified attributes (e.g., shape, data type, etc.)
* use & describe dtype attributes associated with `numpy` ndarrays

## Some Simple Setup

We're going to run a few quick commands in IPython to shorten a few names and to enable inline graphics in this Jupyter notebook.

If you do not have matplotlib installed in your conda environment, run:
```
% conda install -y matplotlib
```

In [1]:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt
import os.path as osp
import numpy.random as npr
vsep = "\n-------------------\n"

# What is NumPy?

NumPy is a Python library that provides multi-dimensional arrays, matrices, and fast operations on these data structures.

## NumPy arrays have:

* fixed size
* all elements have the same type
  * that type may be compound and/or user-defined
* fast operations from:
  * vectorization (implicit looping)
  * pre-compiled C code using high-quality libraries:
    * NumPy default
    * BLAS/ATLAS
    * Intel's MKL
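To make the list/array distinction concrete, here is a small sketch (the variable names are just for illustration) contrasting a Python list with an ndarray built from it:

```python
import numpy as np

py_list = [1, 2, 3]
arr = np.array(py_list)

# The same operator means different things: lists repeat, arrays compute elementwise.
print(py_list * 2)   # [1, 2, 3, 1, 2, 3]
print(arr * 2)       # [2 4 6]

# One dtype for every element: mixing ints and a float upcasts everything to float.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64

# Fixed size: np.append builds a new, larger array and leaves arr unchanged.
bigger = np.append(arr, 4)
print(arr.size, bigger.size)   # 3 4
```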

## NumPy's Uses and Capabilities

- Image and signal processing
- Linear algebra
- Data transformation and query
- Time series analysis
- Statistical analysis

## NumPy Ecosystem

<center>
![](img/ecosystem.lightbg.scaled-noalpha.png)
</center>

# NumPy Arrays

NumPy arrays (`numpy.ndarray`) are the fundamental data type in NumPy.  They have:
    
  * shape
  * an element type called *dtype*
  
For example:

In [2]:
import numpy as np
M, N = 6, 5
arr = np.zeros(shape=(M,N), dtype=float)
arr

array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])

In [3]:
arrzeros = np.zeros(30)
print(arrzeros.dtype, arrzeros.shape)
arrzeros

float64 (30,)


array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.])

In [4]:
arrzeros2 = np.zeros((30,1))
print(arrzeros2.dtype, arrzeros2.shape)
arrzeros2

float64 (30, 1)


array([[ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.]])

The first array above, `arr` with shape `(M, N) = (6, 5)`, is the NumPy array corresponding to the two-dimensional matrix:

![default](img/numpyzerosdims-noalpha.png)

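NumPy has both a general N-dimensional array (`np.ndarray`) *and* a specific 2-dimensional matrix data type (`np.matrix`, whose use NumPy's documentation now discourages in favor of regular arrays).  NumPy arrays may have an arbitrary number of dimensions.

NumPy arrays support vectorized mathematical operations: an expression such as `arr + 4` is applied to every element without an explicit Python loop.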

In [7]:
arr = np.arange(15).reshape(3,5)
print("original:")
print(arr)

print()

print("elementwise computed:")
print(((arr+4)*2) % 30)
print(arr.dtype)

original:
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]

elementwise computed:
[[ 8 10 12 14 16]
 [18 20 22 24 26]
 [28  0  2  4  6]]
int32
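
Vectorization means NumPy runs the loop for us in compiled code. As a rough sketch, the elementwise expression from the cell above is equivalent to this explicit Python loop, which is typically much slower on large arrays:

```python
import numpy as np

arr = np.arange(15).reshape(3, 5)

# Vectorized: one expression; the looping happens inside NumPy.
vectorized = ((arr + 4) * 2) % 30

# The same computation, one element at a time in Python.
looped = np.empty_like(arr)
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        looped[i, j] = ((arr[i, j] + 4) * 2) % 30

print(np.array_equal(vectorized, looped))   # True
```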


In [8]:
((arr.reshape(15,)+4)*2) % 30

array([ 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28,  0,  2,  4,  6], dtype=int32)

In [12]:
((arr.reshape(5,3)+4)*2) % 30

array([[ 8, 10, 12],
       [14, 16, 18],
       [20, 22, 24],
       [26, 28,  0],
       [ 2,  4,  6]], dtype=int32)

In [11]:
arr.reshape(4, 7)  # fails: 4 * 7 = 28 elements, but arr has only 15

ValueError: total size of new array must be unchanged

We are generally going to visually inspect the `str` representation of arrays.  Here's how `str` and `repr` differ:

In [None]:
arr = np.arange(10).reshape(2,5)
print(arr)
print(repr(arr))

## Array Shape

Many array creation functions take a shape parameter.  For a 1D array, the shape can be an integer.

In [15]:
print(np.zeros(shape=5, dtype=np.float16))

[ 0.  0.  0.  0.  0.]


For nD arrays, the shape needs to be given as a tuple.

In [16]:
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [21]:
print(np.zeros(shape=(4,3,2), dtype=float), end=vsep)
print(repr(np.zeros(shape=(4,3,2), dtype=float)))

[[[ 0.  0.]
  [ 0.  0.]
  [ 0.  0.]]

 [[ 0.  0.]
  [ 0.  0.]
  [ 0.  0.]]

 [[ 0.  0.]
  [ 0.  0.]
  [ 0.  0.]]

 [[ 0.  0.]
  [ 0.  0.]
  [ 0.  0.]]]
-------------------
array([[[ 0.,  0.],
        [ 0.,  0.],
        [ 0.,  0.]],

       [[ 0.,  0.],
        [ 0.,  0.],
        [ 0.,  0.]],

       [[ 0.,  0.],
        [ 0.,  0.],
        [ 0.,  0.]],

       [[ 0.,  0.],
        [ 0.,  0.],
        [ 0.,  0.]]])


In [None]:
print(np.zeros(shape=(2,2,2,2)))

In the `(4,3,2)` example above, the outermost dimension (the first item in the tuple passed to `shape`, 4) is the dimension that varies most slowly, and the innermost dimension (the last item in the tuple, 2) is the dimension that varies most quickly. Visually, this means that for 3D arrays we might describe the shape as (panels, rows, columns). Contrast this with the 2D case, where we usually describe the shape as (rows, columns).
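
One way to see the slow-to-fast ordering is to fill a `(4,3,2)` array with consecutive integers and check which index changes first. This sketch assumes NumPy's default row-major (C) memory layout:

```python
import numpy as np

arr = np.arange(24).reshape(4, 3, 2)   # (panels, rows, columns)

# Consecutive values 0, 1, 2, ... fill the LAST axis first.
print(arr[0, 0, 0], arr[0, 0, 1])   # 0 1  -> next column
print(arr[0, 1, 0])                 # 2    -> next row
print(arr[1, 0, 0])                 # 6    -> next panel (3 rows * 2 columns later)

# Strides: bytes to jump to the next index along each axis.
# They shrink from the outermost axis to the innermost one.
print(arr.strides)
```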

We can always ask an array for its shape by accessing its `shape` attribute (e.g., `arr.shape`).

In [None]:
arr = np.ones(shape=(4,3,2), dtype=float)
print("shape is:", arr.shape)

In [None]:
# unlike reshape, resize changes arr itself, in place
# (4*3*2 and 4*6 are both 24 elements here)
arr.resize((4,6))
arr

In [22]:
# np.concatenate promotes its inputs to a common dtype that can hold both
# (here longdouble wins over int8). Where np.longdouble is just a 64-bit
# double, as on Windows, the result reports float64, as seen below.
arr1 = np.ones(shape=(4,5), dtype=np.longdouble)
arr2 = np.zeros(shape=(4,3), dtype=np.int8)
print(arr1,end=vsep)
print(arr2,end=vsep)
newarr = np.concatenate((arr1,arr2), axis=1)
print(newarr)
newarr.dtype

[[ 1.0  1.0  1.0  1.0  1.0]
 [ 1.0  1.0  1.0  1.0  1.0]
 [ 1.0  1.0  1.0  1.0  1.0]
 [ 1.0  1.0  1.0  1.0  1.0]]
-------------------
[[0 0 0]
 [0 0 0]
 [0 0 0]
 [0 0 0]]
-------------------
[[ 1.0  1.0  1.0  1.0  1.0  0.0  0.0  0.0]
 [ 1.0  1.0  1.0  1.0  1.0  0.0  0.0  0.0]
 [ 1.0  1.0  1.0  1.0  1.0  0.0  0.0  0.0]
 [ 1.0  1.0  1.0  1.0  1.0  0.0  0.0  0.0]]


dtype('float64')

In [None]:
help(newarr.resize)

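The `nbytes` attribute reports how many bytes an array's elements occupy (the number of elements times the `itemsize` of the dtype).  Beware: the next cell allocates roughly 1 GB of memory.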

In [38]:
arr3 = np.arange(1,int(1e9),dtype=np.int8)
arr3.nbytes / 1e9

0.999999999

Arrays can be transposed on the fly with the `.T` attribute, which returns a transposed view of the same data rather than a copy.

In [23]:
arr = np.arange(15).reshape(3,5)  # the 3x5 array from earlier
print(arr)
print(arr.shape,end=vsep)
print(arr.T)
print(arr.T.shape)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
(3, 5)
-------------------
[[ 0  5 10]
 [ 1  6 11]
 [ 2  7 12]
 [ 3  8 13]
 [ 4  9 14]]
(5, 3)


## Array Type

All arrays have a specific type for their elements, and every element in the array shares that type.  The NumPy term for this type is *dtype*.  The basic kinds are *bool*, *int*, *uint*, *float*, and *complex*; the numeric kinds come in several sizes, indicated by a number giving the size in bits (e.g., `np.int8`, `np.float32`, `np.complex128`).  Python's built-in types can be used as dtypes, each standing for a corresponding NumPy type.  Note, the generic NumPy types end with an underscore ("\_") to differentiate the name from the Python built-in.

|Python Type |NumPy dtype|
|------------|-----------|
|bool        |np.bool_   |
|int         |np.int_    |
|float       |np.float_  |
|complex     |np.complex_|

(NumPy 2.0 removed the `np.float_` and `np.complex_` aliases; with NumPy 2.0 or later, use `np.float64` and `np.complex128` instead.)
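
The bit sizes in the dtype names can be checked directly. This sketch uses `np.dtype(...).itemsize` (bytes per element) and `np.iinfo` (the range of an integer type), both standard NumPy APIs:

```python
import numpy as np

# itemsize is in bytes; multiply by 8 to match the number in the name.
for dt in [np.int8, np.uint16, np.float32, np.complex128]:
    d = np.dtype(dt)
    print(d.name, d.itemsize * 8, "bits")

# The size limits which values fit: signed vs. unsigned 8-bit integers.
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)     # -128 127
print(np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)   # 0 255
```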

In [39]:
# NumPy scalar types can be called like constructors: a 16-bit float NaN
np.float16('nan')

nan

Here is one example of specifying a *dtype*:

In [43]:
arr = np.zeros(shape=(5,), dtype=np.float64) # NumPy's default float (spelled np.float_ before NumPy 2.0)
print(arr, "->", arr.dtype)

[ 0.  0.  0.  0.  0.] -> float64


In [49]:
# A large value that still fits in int64 is stored exactly; trouble starts
# when large numbers are forced into smaller types
big = np.array([4192984799048971232, 3, 4], dtype=np.int64)
big

array([4192984799048971232,                   3,                   4], dtype=int64)

In [44]:
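# This value needs more than 64 bits, so NumPy cannot convert it at all
# and raises OverflowError instead of silently truncating
np.array([4192984799048971232000, 3, 4], dtype=np.int16)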

OverflowError: Python int too large to convert to C long


In [None]:
# bit width of NumPy's default integer type on this platform
np.array([1], dtype=int).dtype.itemsize * 8

In [None]:
# Can check the bit length of numbers
nums = [4192984799048971232000, 3, 4]
max(i.bit_length() for i in nums), min(i.bit_length() for i in nums)

In [52]:
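def chop_bits(nums, bitlen=np.dtype(int).itemsize * 8):
    """Lazily keep only the low `bitlen` bits of each Python int.

    The default bitlen is the width of NumPy's default integer type.
    """
    mask = (1<<bitlen) - 1   # bitlen ones in binary
    return (num & mask for num in nums)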

list(chop_bits([4192984799048971232000, 4192984799048971232, 3, 4])), \
list(chop_bits([4192984799048971232000, 4192984799048971232, 3, 4], 32)), \
list(chop_bits([4192984799048971232000, 4192984799048971232, 3, 4], 24)), \
list(chop_bits([4192984799048971232000, 4192984799048971232, 3, 4], 16))

([1989321472, 2548904928, 3, 4],
 [1989321472, 2548904928, 3, 4],
 [9609984, 15545312, 3, 4],
 [41728, 13280, 3, 4])
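
`chop_bits` is a pure-Python model of truncation. As a sketch of the same effect inside NumPy, casting with `astype` to a smaller unsigned type keeps only the low-order bits, since NumPy does not check for overflow in this kind of cast:

```python
import numpy as np

nums = [4192984799048971232, 3, 4]
big = np.array(nums, dtype=np.int64)

# astype does an unchecked cast: only the low 32 (or 16) bits survive.
as_u32 = big.astype(np.uint32)
as_u16 = big.astype(np.uint16)

# Compare against masking the Python ints, which is what chop_bits does.
print(as_u32.tolist() == [n & 0xFFFFFFFF for n in nums])   # True
print(as_u16.tolist() == [n & 0xFFFF for n in nums])       # True
```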

We can also loop over some possible *dtypes*, printing the name we used next to the resulting array and its dtype:

In [None]:
for name, dt in [("float", float), ("np.float64", np.float64),
                 ("np.double", np.double), ("np.uint8", np.uint8),
                 ("np.int_", np.int_), ("int", int)]:
    arr = np.zeros(shape=(5,), dtype=dt)
    print("%10s: %s -> %s" % (name, arr, arr.dtype))

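In [54]:
# With int64, this sum is exact: 49999995000000 is far beyond
# the largest 32-bit integer, 2**31 - 1 = 2147483647
total = np.arange(1,int(1e7),dtype=np.int64).sum()
total

49999995000000

Now we'll take a moment to define a quick helper function that shows us these details in a pretty format.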

In [56]:
def dump_array(arr):
    """Print an array's shape and dtype, then the array itself."""
    print("%s array of %s:" % (arr.shape, arr.dtype))
    print(arr)
    
vsep = "\n---------------------\n"

See also: http://docs.scipy.org/doc/numpy/user/basics.types.html

# Array Creation

NumPy provides a number of ways to create an array.

## `np.zeros` and `np.ones`

In [57]:
zrr = np.zeros(shape=(2,3))
dump_array(zrr)

(2, 3) array of float64:
[[ 0.  0.  0.]
 [ 0.  0.  0.]]


In [58]:
print(np.ones(shape=(2,5)))

[[ 1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.]]


In [59]:
one_arr = np.ones(shape = (2,2), dtype=int)
dump_array(one_arr)

(2, 2) array of int32:
[[1 1]
 [1 1]]


## `np.empty`

`np.empty` is lightning quick because it simply requests some amount of memory from the operating system and then *does nothing with it*.  Thus, the array returned by `np.empty` is *uninitialized*.  Consider yourself warned.  `np.empty` is very useful if you know you are going to fill up all the (used) elements of your array later.

In [67]:
# DANGER!  uninitialized array 
# (re-run this cell and you will very likely see different values)
err = np.empty(shape=(20,30), dtype=int)
dump_array(err)

(20, 30) array of int32:
[[ 108893520          0  108344992          0   50331648          0
    63513317          0   63513309          0   63513301          0
    59965442          0          0          0          0          0
           0          0   60031405          0          0          0
           0          0          0          0   50397823          0]
 [1868408976          0          6          0   96067720          0
    63513198          0   63513190          0   63513182          0
    63513174          0   63513166          0   63513158          0
    63513151          0   63513143          0   63513135          0
    63513127          0   63513119          0   63513111          0]
 [  63513103          0   63513095          0   63513087          0
    63513079          0   63513071          0   63513063          0
    63513055          0   63513047          0   63513039          0
    63513031          0   63513023          0   63513015          0
    63513007         
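

The safe pattern with `np.empty` is to write every element before reading any of them. A minimal sketch, where the loop stands in for whatever computation actually fills the array:

```python
import numpy as np

n = 5
squares = np.empty(n, dtype=np.int64)   # contents are garbage at this point
for i in range(n):
    squares[i] = i * i                   # every slot is written ...
print(squares)                           # ... so reading is now safe: 0, 1, 4, 9, 16

# If some slots may never be written, start from a known value instead.
padded = np.full(n, -1, dtype=np.int64)
padded[:3] = [7, 8, 9]
print(padded)                            # 7, 8, 9, -1, -1
```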

## `np.arange`

`np.arange` generates sequences of numbers, like Python's `range` built-in.  Non-integer step values may lead to unexpected results; in those cases, you may prefer `np.linspace` (see below).  For a quick, mostly practical discussion of the perils of floating-point approximations, see https://docs.python.org/2/tutorial/floatingpoint.html.

  * a single value is a stopping point
  * two values are a starting point and a stopping point
  * three values are a start, a stop, and a step size

As with `range`, the ending point is *not* included.
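
Here is a small illustration of the non-integer step issue, relying only on standard double-precision arithmetic: `np.arange` determines its length from (stop - start) / step, and rounding in that division can add an unexpected element.

```python
import numpy as np

# In binary floating point, (1.3 - 1) / 0.1 comes out slightly ABOVE 3,
# so arange rounds the length up to 4 and includes a value of about 1.3,
# even though the stop value is supposed to be excluded.
print((1.3 - 1) / 0.1 > 3)        # True
stepped = np.arange(1, 1.3, 0.1)
print(len(stepped), stepped)      # 4 elements

# linspace is given the number of points, so its length is never in doubt.
spaced = np.linspace(1, 1.2, 3)
print(len(spaced), spaced)        # 3 elements, ending exactly at 1.2
```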

In [68]:
print("int arg: %s" % np.arange(10), end=vsep)     # cf. range(stop)
print("float arg: %s" % np.arange(10.0), end=vsep) # cf. range(stop)
print("step: %s" % np.arange(0, 12, 2), end=vsep)  # end point excluded
print("neg. step: %s" % np.arange(10, 0, -1.0))

int arg: [0 1 2 3 4 5 6 7 8 9]
---------------------
float arg: [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]
---------------------
step: [ 0  2  4  6  8 10]
---------------------
neg. step: [ 10.   9.   8.   7.   6.   5.   4.   3.   2.   1.]


## `np.linspace` and `np.logspace`

`np.linspace(BEGIN, END, NUMPT)` generates exactly *NUMPT* points, evenly spaced, on $[BEGIN, END]$.  Unlike Python's `range` and `np.arange`, `np.linspace` and `np.logspace` include both BEGIN and END by default (they produce a closed interval).

In [71]:
print("End-points are included:", end=vsep)
print(np.linspace(0, 10, 2), end=vsep)
print(np.linspace(0, 10, 3), end=vsep)
print(np.linspace(0, 10, 4), end=vsep)
print(np.linspace(0, 10, 20), end=vsep)
print(vsep)

End-points are included:
---------------------
[  0.  10.]
---------------------
[  0.   5.  10.]
---------------------
[  0.           3.33333333   6.66666667  10.        ]
---------------------
[  0.           0.52631579   1.05263158   1.57894737   2.10526316
   2.63157895   3.15789474   3.68421053   4.21052632   4.73684211
   5.26315789   5.78947368   6.31578947   6.84210526   7.36842105
   7.89473684   8.42105263   8.94736842   9.47368421  10.        ]
---------------------

---------------------
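
With the end point included, the spacing works out to (END - BEGIN) / (NUMPT - 1). This sketch checks that with the `retstep` keyword and shows `endpoint=False`, which turns the closed interval into a half-open one; both are keyword arguments of `np.linspace`:

```python
import numpy as np

# retstep=True also returns the spacing: (10 - 0) / (5 - 1) = 2.5
pts, step = np.linspace(0, 10, 5, retstep=True)
print(pts, step)          # 0, 2.5, 5, 7.5, 10 and step 2.5

# endpoint=False leaves out END, so the spacing becomes (10 - 0) / 5 = 2.0
half_open = np.linspace(0, 10, 5, endpoint=False)
print(half_open)          # 0, 2, 4, 6, 8
```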



For `np.logspace(BEGIN, END, NUMPT)`, the array is closed on $[10^{BEGIN}, 10^{END}]$, with NUMPT points spread evenly on a logarithmic scale.  The `base` argument replaces the default base of 10, as in the second line below.

In [72]:
# (10^BEGIN, 10^END, NUMPT)
import math
print(np.logspace(0, 3, 4), end=vsep)
print(np.logspace(0, 3, 4, base=math.e))

[    1.    10.   100.  1000.]
---------------------
[  1.           2.71828183   7.3890561   20.08553692]


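In [99]:
# Pitfall: the 4th and 5th positional parameters of np.logspace are
# `endpoint` and `base`, so this call passes endpoint=2 and base='int',
# hence the TypeError. Keywords avoid the mix-up, e.g.
# np.logspace(0, 11, 12, base=2, dtype=int) gives the powers 2**0 ... 2**11.
np.logspace(0,11,10,2,'int')

TypeError: ufunc 'power' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

## Diagonal arrays:  `np.eye` and `np.diag`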

`np.eye(N)` produces an array with shape (N,N) and ones on the diagonal (an NxN identity matrix).

In [None]:
print(np.eye(3))

`np.diag` produces a 2D array with the elements of a 1D array argument on its diagonal (and zeros elsewhere).

In [None]:
print(np.diag(np.arange(1,4)), end=vsep)
print(np.diag([3,2,1]))

It can also be used to extract a diagonal from a given array.

In [None]:
# the diagonal of an identity matrix ...
np.diag(np.eye(3))
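
Both functions also take an offset `k` that moves off the main diagonal. A short sketch:

```python
import numpy as np

# k > 0 picks a diagonal above the main one; k < 0 picks one below.
upper = np.diag([1, 2], k=1)     # 3x3: len([1, 2]) + |k| rows and columns
print(upper)

lower = np.eye(3, k=-1)          # ones just below the main diagonal
print(lower)

# Extraction takes the same offset.
print(np.diag(upper, k=1))       # recovers 1, 2
```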

## Arrays from Random Distributions

It is common to create arrays whose elements are samples from a random distribution.  For the many options, see:

  * help(np.random) 
  * http://docs.scipy.org/doc/numpy/reference/routines.random.html

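In [104]:
# randint is not a top-level NumPy function; it lives in np.random
# (imported above as npr), so this raises an AttributeError
np.randint

AttributeError: module 'numpy' has no attribute 'randint'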

### Uniform on [0,1)

In [106]:
print("Uniform on [0,1):")
dump_array(npr.random((2,5)))

Uniform on [0,1):
(2, 5) array of float64:
[[ 0.21284435  0.70265133  0.99987525  0.45312778  0.40897225]
 [ 0.38686708  0.86633104  0.86861255  0.55173294  0.0449129 ]]


### Standard Normal

`np.random` has some redundancy.  It also has some variation in calling conventions.

  * `standard_normal` takes *one* tuple argument
  * `randn` (which is very common to see in code) takes n arguments where n is the number of dimensions in the result

In [107]:
print("std. normal - N(0,1):")
dump_array(npr.standard_normal((2,5)))
print(vsep)
dump_array(npr.randn(2,5)) # n separate integer arguments, not a tuple

std. normal - N(0,1):
(2, 5) array of float64:
[[-0.89962375  0.4869672  -0.80702097  0.58116866 -0.46055054]
 [-1.03528457  1.39516846 -0.62893603 -0.22774471  1.35913641]]

---------------------

(2, 5) array of float64:
[[ 1.80583223 -0.18128489  1.11706915 -0.73600113  0.20148372]
 [ 0.32189231  0.07744157  1.36002637 -0.17535764 -1.15483561]]


### Uniform Integers

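There are also some differences among the functions for discrete distributions:
    
  * `randint` excludes its upper bound
  * `random_integers` includes its upper bound (it is deprecated, and NumPy suggests `randint(low, high + 1)` for the same closed range)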

In [108]:
print("Uniform ints on [0,5) - upper open:")
dump_array(npr.randint(0, 5, (2,5)))

Uniform ints on [0,5) - upper open:
(2, 5) array of int32:
[[1 2 1 4 2]
 [2 1 2 3 1]]


In [109]:
print("\nUniform ints on [0,5] - upper closed:")
dump_array(npr.random_integers(0, 5, (2,5)))


Uniform ints on [0,5] - upper closed:
(2, 5) array of int32:
[[4 3 4 3 3]
 [2 1 1 2 5]]
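
NumPy 1.17 and later also provide a newer interface, `np.random.default_rng`, which returns a `Generator` object with more uniform calling conventions. This sketch (the seed value is arbitrary) redoes the examples above with it; `integers` takes an `endpoint` flag instead of needing two separate functions:

```python
import numpy as np

rng = np.random.default_rng(seed=12345)   # seeded, so results are reproducible

uniform = rng.random((2, 5))                   # floats on [0, 1)
normal = rng.standard_normal((2, 5))           # N(0, 1); shape as one tuple
open_ints = rng.integers(0, 5, size=(2, 5))    # ints on [0, 5): upper bound excluded
closed_ints = rng.integers(0, 5, size=(2, 5), endpoint=True)   # ints on [0, 5]

# A fresh generator with the same seed reproduces the same draws.
again = np.random.default_rng(seed=12345).random((2, 5))
print(np.array_equal(uniform, again))          # True
```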


## Arrays From a Python List

Note, it is also possible to create NumPy arrays from Python lists and tuples.  While this is a nice capability, remember that building a Python list first can take relatively long compared to directly using NumPy building blocks.  Other containers and iterables, such as sets and generators, generally will not give useful results with `np.array`.
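
As a sketch of the iterable caveat: `np.array` does not consume a generator but wraps it as a single opaque object, whereas `np.fromiter` (which needs an explicit dtype) builds a 1-D array from any iterable:

```python
import numpy as np

# np.array does not iterate a generator: it stores the generator object
# itself in a 0-dimensional array of dtype object.
wrapped = np.array((i * i for i in range(5)))
print(wrapped.shape, wrapped.dtype)   # () object

# np.fromiter consumes the iterable; the dtype must be given explicitly.
squares = np.fromiter((i * i for i in range(5)), dtype=np.int64)
print(squares)                        # 0, 1, 4, 9, 16
```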

In [112]:
dump_array(np.array([1, 2, 3]))

print()
dump_array(np.array([10.0, 20.0, 3+0j]))

(3,) array of int32:
[1 2 3]

(3,) array of complex128:
[ 10.+0.j  20.+0.j   3.+0.j]


Nested lists produce multi-dimensional arrays, one dimension per level of nesting:

In [111]:
dump_array(np.array([[1, 2, 3], 
                    [4, 5, 6]]))

print()
dump_array(np.array([[1.0, 2],
                    [3, 4],
                    [5, 6]]))

(2, 3) array of int32:
[[1 2 3]
 [4 5 6]]

(3, 2) array of float64:
[[ 1.  2.]
 [ 3.  4.]
 [ 5.  6.]]


## From row and column stacks:  `np.r_` and `np.c_`

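`np.r_` and `np.c_` are special objects that, when indexed (think `obj[some, :, indexers]`), return NumPy `array`s constructed from the contents of the index expression.  `np.r_` concatenates its pieces along the first axis, while `np.c_` concatenates along the second axis, so 1-D inputs become the columns of a 2-D result.  These tools can be used to abbreviate some array constructions.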

In [None]:
np.c_[np.arange(8), np.arange(8)*2, np.arange(8)*3]

In [None]:
np.r_[np.arange(8), np.arange(8)*2, np.arange(8)*3]
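
`np.r_` also accepts slice notation and plain scalars, and an imaginary step such as `5j` is read as a number of points (end point included), like `np.linspace`. A short sketch:

```python
import numpy as np

# Slices expand like arange; scalars are appended as-is.
mixed = np.r_[0:3, 7, 8]
print(mixed)                # 0, 1, 2, 7, 8

# An imaginary step means "this many points", end point included.
grid = np.r_[0:1:5j]
print(grid)                 # 0, 0.25, 0.5, 0.75, 1
print(np.allclose(grid, np.linspace(0, 1, 5)))   # True

# c_ stacks 1-D inputs side by side as columns.
cols = np.c_[np.arange(3), np.arange(3) * 10]
print(cols.shape)           # (3, 2)
```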

# Array Shape and Reshaping

We've used the following to get arrays of different shapes:

In [None]:
arr = np.arange(24).reshape((3,4,2))
print(arr.shape)

Let's investigate shapes in a bit more detail.  When modifying shapes, `-1` can be used as a wildcard that will fill in the shape for one dimension, based on the others.

In [None]:
arr = np.arange(1,11)
dump_array(arr)

print(vsep)
# shape of 5 x ? -> 5 x 2
reshaped = arr.reshape(5, -1)
dump_array(reshaped)

In [None]:
print(arr.reshape(-1, 5))

In [None]:
print(arr.reshape(2,5))

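Some other shaping utilities include:  

  * `np.ravel` (or the method `arr.ravel()`)
  * `arr.flatten()` (a method only; there is no `np.flatten` function)
  * `np.squeeze` (or the method `arr.squeeze()`)

`arr.flatten()`, as its name implies, returns a 1-D version of the array, and it always makes a copy.  `np.ravel` behaves similarly, but it returns a view of the original data whenever it can, copying only when necessary.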

In [None]:
arr = np.arange(10).reshape(5,2)
print("arr:")
dump_array(arr)

flat = arr.flatten()
print("\nflattened (my own data? %s):" % flat.flags.owndata)
dump_array(flat)

rav = arr.ravel()
print("\nraveled (my own data? %s)" % rav.flags.owndata)
dump_array(rav)
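
The `owndata` flags above suggest that `ravel` returned a view. This sketch checks that by writing through each result; `np.shares_memory` is a standard NumPy function that reports whether two arrays overlap in memory:

```python
import numpy as np

arr = np.arange(10).reshape(5, 2)

rav = arr.ravel()      # arr is contiguous, so this is a view
rav[0] = 100
print(arr[0, 0])       # 100: the change shows up in arr

flat = arr.flatten()   # always a copy
flat[1] = -1
print(arr[0, 1])       # 1: arr is unaffected

# A transposed array is not laid out in row-major order,
# so ravel has to copy in that case.
print(np.shares_memory(arr.T.ravel(), arr))   # False
```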

`np.squeeze` removes every dimension of length 1.  Such length-1 dimensions often appear as a by-product of selection or reshaping.

In [None]:
arr = np.arange(10).reshape(5,1,2,1)
print("arr:")
print(arr)

print("\nwhittled down:")
print(arr.squeeze())
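
Two related details, sketched below: `squeeze` accepts an `axis` argument to drop only specific length-1 dimensions, and whether a selection leaves a length-1 dimension behind depends on indexing with an integer or with a slice:

```python
import numpy as np

arr = np.arange(10).reshape(5, 1, 2, 1)
print(arr.squeeze().shape)          # (5, 2): every length-1 axis removed
print(arr.squeeze(axis=1).shape)    # (5, 2, 1): only axis 1 removed

m = np.arange(6).reshape(2, 3)
print(m[:, 0].shape)                # (2,): an integer index drops the axis
print(m[:, 0:1].shape)              # (2, 1): a slice keeps it, with length 1
print(m[:, 0:1].squeeze().shape)    # (2,)
```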

<img src='img/copyright.png'>