<img src='img/logo.png' />

<img src='img/title.png'>

<img src='img/py3k.png'>

# Table of Contents
* [Learning Objectives:](#Learning-Objectives:)
* [Set-up](#Set-up)
* [Axis](#Axis)
* [Tile](#Tile)
* [Rows and Columns](#Rows-and-Columns)
* [Broadcasting](#Broadcasting)
	* [What are the rules for broadcasting? ](#What-are-the-rules-for-broadcasting?)
* [Array Joining and Splitting](#Array-Joining-and-Splitting)
* [Array Meta-data and *dtype*](#Array-Meta-data-and-*dtype*)

# Learning Objectives:

After completion of this module, learners should be able to:

* create, manipulate, and examine numerical arrays with specified attributes (axes)
* use and explain *broadcasting* in numpy
* create, manipulate, and examine numerical arrays with specified attributes (shape, join, split)
* explain the metadata and *dtype* of NumPy arrays

# Set-up

In [1]:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt
import os.path as osp
import numpy.random as npr
vsep = "\n-------------------\n"

def dump_array(arr):
    print("%s array of %s:" % (arr.shape, arr.dtype))
    print(arr)

# Axis

In [14]:
np.random.seed(1981)
arr = np.random.randint(0,10,20).reshape((5,4))
arr

array([[2, 9, 7, 2],
       [5, 6, 0, 0],
       [5, 8, 9, 2],
       [5, 5, 3, 1],
       [7, 8, 3, 7]])

In [4]:
arr.sum(axis=None)

94

In [6]:
arr.sum(axis=0)

array([24, 36, 22, 12])

In [7]:
arr.sum(axis=1)

array([20, 11, 24, 14, 25])

In [23]:
arr.reshape(-1,2)

array([[2, 9],
       [7, 2],
       [5, 6],
       [0, 0],
       [5, 8],
       [9, 2],
       [5, 5],
       [3, 1],
       [7, 8],
       [3, 7]])

In [39]:
arr.reshape(-1,2).sum(axis=1).reshape(2,-1)

array([[11,  9, 11,  0, 13],
       [11, 10,  4, 15, 10]])

Methods that reduce (summarize) an array, such as *sum*, take an optional parameter called *axis* that specifies the dimension over which to perform the reduction.

* *axis=None*, the default, reduces over all dimensions
* *axis=0* reduces over the outermost (zeroth) dimension
  * if we think about this dimension as the rows, we can imagine that it produces a new row
* *axis=1* reduces over the next (first) dimension
  * if we think about this dimension as the columns, we can imagine that it produces a new column
    
![](img/axis.none.lightbg.scaled-noalpha.png)

![](img/axis.0.lightbg.scaled-noalpha.png)

![](files/img/axis.1.lightbg.scaled-noalpha.png)
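
As a quick check of the description above, here is a minimal sketch showing that the reduced axis disappears from the shape of the result. The array `demo` is made up for illustration, and `keepdims=True` is an optional argument of `sum` that retains the reduced axis with length one.

```python
import numpy as np

demo = np.arange(20).reshape(5, 4)   # illustrative 5x4 array

total    = demo.sum()                # axis=None: a single scalar
col_sums = demo.sum(axis=0)          # collapses the 5 rows    -> shape (4,)
row_sums = demo.sum(axis=1)          # collapses the 4 columns -> shape (5,)
print(total, col_sums.shape, row_sums.shape)

# keepdims=True keeps the reduced axis with length 1
kept_rows = demo.sum(axis=0, keepdims=True)   # shape (1, 4)
kept_cols = demo.sum(axis=1, keepdims=True)   # shape (5, 1)
print(kept_rows.shape, kept_cols.shape)
```

Keeping the reduced axis is convenient when the result will later be broadcast against the original array.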

In [96]:
a = np.random.randint(1,10,100).reshape(10,10)
a

array([[3, 2, 5, 9, 5, 2, 3, 4, 3, 7],
       [8, 4, 2, 9, 4, 8, 7, 6, 3, 2],
       [6, 2, 7, 3, 4, 8, 8, 1, 6, 2],
       [1, 2, 3, 8, 5, 6, 6, 8, 5, 5],
       [6, 6, 9, 2, 6, 5, 5, 3, 4, 4],
       [6, 4, 8, 8, 9, 2, 2, 3, 6, 9],
       [5, 2, 5, 4, 7, 5, 6, 1, 8, 9],
       [2, 9, 6, 1, 8, 9, 9, 7, 7, 5],
       [8, 7, 9, 5, 9, 2, 1, 7, 6, 9],
       [6, 1, 2, 6, 6, 6, 3, 2, 2, 3]])

In [97]:
a.reshape(10,10).T

array([[3, 8, 6, 1, 6, 6, 5, 2, 8, 6],
       [2, 4, 2, 2, 6, 4, 2, 9, 7, 1],
       [5, 2, 7, 3, 9, 8, 5, 6, 9, 2],
       [9, 9, 3, 8, 2, 8, 4, 1, 5, 6],
       [5, 4, 4, 5, 6, 9, 7, 8, 9, 6],
       [2, 8, 8, 6, 5, 2, 5, 9, 2, 6],
       [3, 7, 8, 6, 5, 2, 6, 9, 1, 3],
       [4, 6, 1, 8, 3, 3, 1, 7, 7, 2],
       [3, 3, 6, 5, 4, 6, 8, 7, 6, 2],
       [7, 2, 2, 5, 4, 9, 9, 5, 9, 3]])

In [152]:
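def my_pyramid(my_number):
    # my_number must be even so the array splits cleanly into 2x2 blocks
    a = np.random.randint(1,10,(my_number**2)).reshape(my_number,my_number)
    print(a)
    half = my_number//2  # integer division: reshape needs integer sizes
    # average horizontal pairs, then vertical pairs, giving the mean of each 2x2 block
    return a.reshape(-1,2).mean(axis=1).reshape(my_number,-1).T.reshape(-1,2).mean(axis=1).reshape(half,half).T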

In [176]:
my_number =int(1e3)
a = np.random.randint(1,10,(my_number**2)).reshape(my_number,my_number)
%timeit a.reshape(-1,2).mean(axis=1).reshape(my_number,-1).T.reshape(-1,2).mean(axis=1).reshape(my_number//2,my_number//2).T



10 loops, best of 3: 37.6 ms per loop
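
The chained reshape/transpose expression timed above averages every non-overlapping 2x2 block of the array. The following is a sketch of one alternative formulation, not the approach used in the cell above: it reshapes the array into four axes and reduces over two of them at once. The helper name `block_mean_2x2` is hypothetical, and both array dimensions are assumed to be even.

```python
import numpy as np

def block_mean_2x2(a):
    """Hypothetical helper: mean of each non-overlapping 2x2 block of a 2-D array."""
    rows, cols = a.shape                 # both assumed to be even
    # axes: (block row, row within block, block column, column within block)
    blocks = a.reshape(rows // 2, 2, cols // 2, 2)
    return blocks.mean(axis=(1, 3))      # average within each block

n = 8
a = np.random.randint(1, 10, size=(n, n))

# the chained expression from the cell above, for comparison
chained = (a.reshape(-1, 2).mean(axis=1).reshape(n, -1).T
            .reshape(-1, 2).mean(axis=1).reshape(n // 2, n // 2).T)

print(np.allclose(block_mean_2x2(a), chained))
```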


In [166]:
a = np.arange(6).reshape((3,2))
a

array([[0, 1],
       [2, 3],
       [4, 5]])

In [136]:
d

array([[6, 2, 4, 1, 2, 8, 4, 5, 6, 3],
       [4, 3, 8, 8, 2, 4, 8, 2, 7, 6],
       [3, 7, 8, 4, 2, 1, 8, 9, 9, 4],
       [8, 9, 1, 6, 2, 2, 9, 4, 1, 5],
       [6, 6, 8, 7, 9, 8, 8, 2, 5, 9],
       [6, 1, 8, 6, 9, 8, 6, 5, 4, 5],
       [3, 2, 3, 4, 2, 6, 4, 8, 6, 4],
       [8, 8, 3, 6, 1, 8, 3, 1, 4, 4],
       [8, 8, 3, 3, 6, 8, 6, 5, 8, 8],
       [9, 4, 5, 8, 6, 6, 4, 8, 1, 2]])

In [64]:
b = np.array([10,20,30])

In [73]:
for i,v in enumerate(b): #harder, slower
    print(a[i]*v)

[ 0 10]
[40 60]
[120 150]


In [71]:
a*b.reshape([3,1]) #easier

array([[  0,  10],
       [ 40,  60],
       [120, 150]])

In [74]:
def my_broadcast(a,b):
    c = np.empty(a.shape)
    for i,v in enumerate(b):
        c[i] = a[i]*v
    return c

In [75]:
a * b.reshape((3,1)) == my_broadcast(a,b)

array([[ True,  True],
       [ True,  True],
       [ True,  True]], dtype=bool)

In [76]:
big_a = np.random.random(size=(int(1e3),3))
big_b = np.random.randint(0,100,int(1e3))

In [77]:
big_a.shape

(1000, 3)

In [78]:
big_b.shape

(1000,)

In [88]:
%timeit my_broadcast(big_a,big_b)

100 loops, best of 3: 3.14 ms per loop


In [87]:
%timeit big_a*big_b[:,np.newaxis]

The slowest run took 11.58 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 12.8 µs per loop


In [82]:
%timeit big_a*big_b.reshape((-1,1))

The slowest run took 6.88 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 12.9 µs per loop


In [85]:
c_slow = my_broadcast(big_a,big_b)

In [86]:
c_new = big_a * big_b[:,np.newaxis]

In [90]:
np.allclose(c_slow, c_new, rtol=1e-5) # compare two arrays element-wise within the specified relative tolerance

True

In [95]:
a


array([[0, 1],
       [2, 3],
       [4, 5]])

In [8]:
arr = np.arange(15).reshape(3,5)
dump_array(arr)

print(vsep)
grandSum = arr.sum()
colSums  = arr.sum(axis=0)
rowSums  = arr.sum(axis=1)

print("grandSum:", grandSum)
print("colSums (a new pseudo-row):", colSums, colSums.shape)
print("rowSums (a new pseudo-col):", rowSums, rowSums.shape)

(3, 5) array of int32:
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]

-------------------

grandSum: 105
colSums (a new pseudo-row): [15 18 21 24 27] (5,)
rowSums (a new pseudo-col): [10 35 60] (3,)


# Tile

One other array creator, `np.tile`, rewards careful examination of some examples.  We'll use the following `arr` as our base array.

In [9]:
arr = np.arange(1,5).reshape(2,2)
dump_array(arr)

(2, 2) array of int32:
[[1 2]
 [3 4]]


Tiling over one axis appends "tiles" of data on the ends of the rows.

In [10]:
print(np.tile(arr,1), end=vsep)
print(np.tile(arr,2), end=vsep)
print(np.tile(arr,4))

[[1 2]
 [3 4]]
-------------------
[[1 2 1 2]
 [3 4 3 4]]
-------------------
[[1 2 1 2 1 2 1 2]
 [3 4 3 4 3 4 3 4]]


In [12]:
mytile = np.tile(arr, 4)
mytile.flags

  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

Tiling over multiple dimensions creates tiles (of the original) in the specified shape.

In [None]:
print("with a (2,2) inner")
print("... and a (2,1) outer")
print(np.tile(arr, (2,1))) # 2x1 tiling of a 2x2 array
print("... and a (1,2) outer")
print(np.tile(arr, (1,2))) # 1x2 tiling of a 2x2 array
print("... and a (2,2) outer")
print(np.tile(arr, (2,2))) # 2x2 tiling of a 2x2 array

Things get slightly different in more dimensions, especially when the dimensions of the tile and the tiling differ.  The tile is first expanded to the same number of dimensions as the tiling.

In [None]:
# original 2x2 promoted to 1x2x2 ... then used for 3x1x1 tiling
print(np.tile(arr, (3,1,1)))

In [None]:
print(np.tile(arr, (1,1,3)))
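
To make the dimension promotion described above concrete, this small sketch prints only the resulting shapes. The array `cube` is made up for illustration; the last call shows the reverse case, where the tiling has fewer dimensions than the tile and is itself padded with leading ones.

```python
import numpy as np

arr = np.arange(1, 5).reshape(2, 2)

# arr (2,2) is treated as (1,2,2); each axis length is multiplied by its rep
stacked = np.tile(arr, (3, 1, 1))
widened = np.tile(arr, (1, 1, 3))
print(stacked.shape)    # (3, 2, 2)
print(widened.shape)    # (1, 2, 6)

# the promotion also works the other way: reps=2 is treated as (1, 1, 2)
cube = np.zeros((2, 3, 4))
doubled = np.tile(cube, 2)
print(doubled.shape)    # (2, 3, 8)
```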

# Rows and Columns

As mentioned earlier, in greater than two dimensions, you need to be very careful thinking in terms of "rows and columns".  Specifically, in the string representation of a 3-D array, the outermost dimension is no longer the visual rows; it's the different panels. Thus, for sums over different axes, we might talk about:

|sum(axis=?)|over which dim?|visual element|added elements|
|-----------|---------------|--------------|--------------|
|axis=0|outer-most|across panels|[1+1+1, 2+2+2, ...]|
|axis=1|middle    |down each column (within a panel)|[1+4+7, 2+5+8, ...]|
|axis=2|inner-most|along each row (within a panel)|[1+2+3, 4+5+6, ...]|

In [None]:
arr = np.tile(np.arange(1,10).reshape(3,3), (3,1,1))
print(arr)

In [None]:
print("axis=0")
print(arr.sum(axis=0), end=vsep)

print("axis=1")
print(arr.sum(axis=1), end=vsep)

print("axis=2")
print(arr.sum(axis=2), end=vsep)

print("shapes: ", arr.sum(axis=0).shape, 
                  arr.sum(axis=1).shape, 
                  arr.sum(axis=2).shape)

# Broadcasting

Broadcasting lets arrays with *different but compatible* shapes be arguments to *ufuncs*.

In [None]:
arr1 = np.arange(5)
print("arr1:\n", arr1, end=vsep)

print("arr1 + scalar:\n", arr1+10, end=vsep)

print("arr1 + arr1 (same shape):\n", arr1+arr1, end=vsep)

arr2 = np.arange(5).reshape(5,1) * 10
arr3 = np.arange(5).reshape(1,5) * 100
print("arr2:\n", arr2)
print("arr3:\n", arr3, end=vsep)

print("arr1 + arr2 [ %s + %s --> %s ]:" % 
      (arr1.shape, arr2.shape, (arr1 + arr2).shape))
print(arr1+arr2, end=vsep)
print("arr1 + arr3 [ %s + %s --> %s ]:" % 
      (arr1.shape, arr3.shape, (arr1 + arr3).shape))
print(arr1+arr3)

In [None]:
arr1 = np.arange(6).reshape(3,2)
arr2 = np.arange(10, 40, 10).reshape(3,1)

print("arr1:")
dump_array(arr1)
print("\narr2:")
dump_array(arr2)
print("\narr1 + arr2:")
print(arr1+arr2)

Here, an array of shape `(3, 1)` is broadcast to an array with shape `(3, 2)`.

![](files/img/broadcasting2D.lightbg.scaled-noalpha.png)

## What are the rules for broadcasting? 

For an operation to broadcast, the two shapes are compared from the trailing (rightmost) dimension backward: each pair of sizes must either be *equal* or one of them must be *one*.  Dimensions of size one, and dimensions that are missing from the "head" of the shorter shape, are (conceptually) duplicated to match the larger size.  So, we have:

|Array             |Shape          |
|:------------------|---------------:|
|A      (1d array)|              3|
|B      (2d array)|          2 x 3|
|Result (2d array)|          2 x 3|

|Array             |Shape          |
|:------------------|-------------:|
|A      (2d array)|          6 x 1|
|B      (3d array)|      1 x 6 x 4|
|Result (3d array)|      1 x 6 x 4|

|Array             |Shape          |
|:-----------------|---------------:|
|A      (4d array)|  3 x 1 x 6 x 1|
|B      (3d array)|      2 x 1 x 4|
|Result (4d array)|  3 x 2 x 6 x 4|

Some other interpretations of compatibility:
    
  *  Tails must be the same, ones are wild.
  

  *  If one shape is shorter than the other, pad the shorter shape on the LHS with `1`s.
    * Now, from the right, the shapes must be identical with ones acting as wild cards.
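
The rule above can be sketched in a few lines of plain Python. The helper `broadcast_shape` is hypothetical (it is not a NumPy function); it is checked here against the shape NumPy actually produces for the three tables above.

```python
import numpy as np
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rule to two shape tuples."""
    result = []
    # walk both shapes from the right; missing leading dimensions count as 1
    for m, n in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if m == n or n == 1:
            result.append(m)
        elif m == 1:
            result.append(n)       # ones are "wild": stretch to the other size
        else:
            raise ValueError("incompatible shapes: %s and %s" % (shape_a, shape_b))
    return tuple(reversed(result))

for sa, sb in [((3,), (2, 3)),
               ((6, 1), (1, 6, 4)),
               ((3, 1, 6, 1), (2, 1, 4))]:
    actual = (np.empty(sa) + np.empty(sb)).shape   # what NumPy really does
    print(sa, "+", sb, "->", broadcast_shape(sa, sb), actual)
```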

In [None]:
a1 = np.array([1,2,3])       # 3 -> 1x3
b1 = np.array([[10, 20, 30], # 2x3
               [40, 50, 60]]) 
print(a1+b1)

In [None]:
result = (np.ones((  6,1)) +  # padded to (1,6,1); last dimension replicated
          np.ones((1,6,4)))
print(result.shape)

result = (np.ones((3,6,1)) +  # last dimension replicated
          np.ones((1,6,4)))   # first dimension replicated
print(result.shape)

Sometimes, it is useful to explicitly insert a new dimension of length one in the shape.  We can do this by indexing with `np.newaxis` alongside ordinary slices.

In [None]:
arr1 = np.arange(6).reshape((2,3))  # 2x3
arr2 = np.array([10, 100])          #   2
arr1 + arr2  # fails with a ValueError: trailing dimensions 3 and 2 do not match

In [None]:
# let's massage the shape
arr3 = arr2[:, np.newaxis] # arr2 -> 2x1
print("arr3 shape:", arr3.shape)
print("arr1 + arr3")
print(arr1+arr3)

In [None]:
arr = np.array([10, 100])
print("original shape:", arr.shape)

arrNew = arr[np.newaxis, :]
print("arrNew shape:", arrNew.shape)
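
For comparison, here is a brief sketch of three equivalent ways to turn a 1-D array into a column. The names `vec`, `col_a`, `col_b`, and `col_c` are made up for illustration; `np.newaxis` is simply an alias for `None`.

```python
import numpy as np

vec = np.array([10, 100])

print(np.newaxis is None)             # newaxis is an alias for None

col_a = vec[:, np.newaxis]            # index with newaxis
col_b = vec.reshape(-1, 1)            # reshape; -1 infers the length
col_c = np.expand_dims(vec, axis=1)   # explicit function call

print(col_a.shape, col_b.shape, col_c.shape)
```

All three produce the same `(2, 1)` array; which one to use is largely a matter of readability.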

In [None]:
arr1 = np.arange(0,6).reshape(2,3)
arr2 = np.arange(10,22).reshape(4,3)
np.tile(arr1, (2,1)) * arr2

# Array Joining and Splitting

Several free functions in the NumPy module combine (concatenate) and split arrays.  The main joining functions (warning: each takes a single sequence of arrays, typically a tuple, as its argument) are:

  * `np.concatenate`
  * `np.vstack`
  * `np.column_stack`
  * `np.hstack`
  
Combining arrays is not ideal because every join has to reallocate memory, which is a slow operation. An alternative is to use Python lists (which append quickly on the right) until all (or sufficient) data is gathered and then convert to a NumPy array.
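
A minimal sketch of that list-then-convert pattern follows. The loop is only a stand-in for data that arrives one row at a time, and the names `rows` and `data` are made up for illustration.

```python
import numpy as np

rows = []                    # plain Python list: cheap to append to
for i in range(5):           # stand-in for data arriving one row at a time
    rows.append([i, i ** 2])

data = np.array(rows)        # a single conversion at the end
print(data.shape)            # (5, 2)
```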
  
Note: If you find yourself constantly needing to restack vectors into 2-D arrays (when doing linear algebra), you may want to look at NumPy's `matrix` class. Essentially, matrices keep their 2-D shape throughout operations applied to them, and they define `*` as matrix multiplication. However, since Python 3.5 the `@` operator is dedicated to matrix multiplication and works on ordinary arrays, and the NumPy documentation no longer recommends `matrix` for new code.
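
To illustrate the difference between the two operators on ordinary arrays, here is a small sketch with made-up 2x2 arrays `A` and `B`:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

elementwise = A * B    # multiplies matching entries
matmul      = A @ B    # true matrix product (Python 3.5+)

print(elementwise)     # [[0 2], [3 0]]
print(matmul)          # [[2 1], [4 3]]
```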

In [None]:
joined = np.concatenate((np.array([1,2,3]),
                         np.array([4,5,6]),
                         np.array([7,8,9])))
print(joined)

In [None]:
arr1 = np.arange(4).reshape(2,2)
arr2 = (np.arange(4).reshape(2,2) + 1) * 10
print("arr1:\n", arr1, end=vsep)
print("arr2:\n", arr2, end=vsep)
print("elementwise addition:\n", arr1 + arr2, end=vsep)
print("concatenate axis 0:\n", 
      np.concatenate((arr1, arr2), axis=0), end=vsep)
print("concatenate axis 1:\n",
      np.concatenate((arr1, arr2), axis=1))

Note, for concatenation, all dimensions (except the dimension being concatenated) must have the same size.

In [None]:
arr1 = np.array([1,   2,  3])
arr2 = np.array([11, 22, 33])

# create a vertical stack
print("vstack:\n", np.vstack((arr1, arr2)), end=vsep)

# creates a column stack (horizontally)
print("column_stack:\n", np.column_stack((arr1, arr2)), end=vsep)

# compare np.column_stack with np.hstack()
print("hstack (1D):")
# the 1-D arrays are not treated as columns
print(np.hstack((arr1, arr2))) 

In [None]:
arr = np.vstack((arr1, arr2))
print("original array:\n", arr, end=vsep)

# hstack joined the 1-D arrays end to end above; with 2-D arrays it places them side by side
print("hstacking:\n", np.hstack((arr, arr)), end=vsep)
print("vstacking:\n", np.vstack((arr, arr)))
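
The cells above cover joining only. As a sketch of the splitting half of this section, the following uses `np.split`, `np.array_split`, and `np.vsplit`; the arrays `flat` and `grid` are made up for illustration.

```python
import numpy as np

flat = np.arange(9)
thirds = np.split(flat, 3)          # three equal pieces of length 3
print(thirds)

# np.split requires an even division; np.array_split tolerates a remainder
sizes = [piece.size for piece in np.array_split(np.arange(10), 3)]
print(sizes)                        # [4, 3, 3]

grid = np.arange(12).reshape(4, 3)
top, bottom = np.vsplit(grid, 2)    # split along axis 0 (rows)
print(top.shape, bottom.shape)      # (2, 3) (2, 3)

# splitting and then re-joining recovers the original array
print(np.array_equal(np.vstack((top, bottom)), grid))
```

Unlike the joining functions, the split functions return a list of arrays rather than a single array.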

# Array Meta-data and *dtype*

Internally, NumPy arrays look like this:

<center>
![](img/memory.lightbg.scaled-noalpha.png)
</center>

Individual elements (scalars) have lightweight wrappers around them that treat them as single-element arrays.
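
A short sketch of what that wrapper looks like in practice (the names `arr` and `elem` are made up for illustration): an element pulled out of an array is a NumPy scalar that carries array-like metadata.

```python
import numpy as np

arr = np.arange(3.0)
elem = arr[1]                  # a NumPy scalar (here np.float64)

print(type(elem))
# the scalar answers the same metadata questions as an array
print(elem.dtype, elem.shape, elem.ndim, elem.size)
```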

In [None]:
# arrays have several pieces of meta-data, driven in part by the dtype of the array
def dump_arrayInfo(arr):
    print("%15s: %s" % ("shape", arr.shape))
    print("%15s: %s" % ("dtype", arr.dtype))
    print("%15s: %s" % ("size", arr.size))
    print("%15s: %s" % ("itemsize", arr.itemsize))
    print("%15s: %s" % ("size * itemsize", arr.size * arr.itemsize))
    
arr = np.arange(10)
dump_array(arr)
print(vsep)
dump_arrayInfo(arr)

In [None]:
arr = np.arange(10.0).reshape(5,2)
dump_arrayInfo(arr)

And the *dtype* itself can be queried for information:

In [None]:
def dumpDtypeInfo(dt):
    print("%15s: %s" % ("name", dt.name))
    print("%15s: %s" % ("byteorder", dt.byteorder))
    print("%15s: %s" % ("itemsize",  dt.itemsize))
    print("%15s: %s" % ("type", dt.type))

dumpDtypeInfo(arr.dtype)

Usually, we can get the right *dtype* through inference:

In [None]:
arr1 = np.array([1,2,3])
arr2 = np.array([1, 2, 3.14150])

print("arr1 type: ", arr1.dtype)
print("arr2 type: ", arr2.dtype)

In [None]:
x = np.zeros(10, dtype=np.longdouble)
x

We can also manually specify a *dtype* in many array creation routines:

In [None]:
arr1 = np.array([1,2,3], dtype=np.float32)
dump_array(arr1)

And we can convert *dtype*s with `array.astype`:

In [None]:
arr = np.arange(10)
dump_array(arr)

converted = arr.astype(np.float64)  # np.float_ was an alias for float64; it was removed in NumPy 2.0
print("\nafter converting types:")
dump_array(converted)

<img src="img/mef_numpy_dtype_hierarchy.png" width="400px"/>

One other set of information about `array`s comes from the `flags` attribute.

In [None]:
arr = np.arange(10)
print(arr.flags)

In [None]:
x = np.zeros(10, dtype=int)
x[0] = 3.1  # silently truncated to 3: the array's dtype stays int
x

When a NumPy array is sliced, the slice is a new array object with its own shape and stride information (the dtype is unchanged).  However, the underlying data is referenced (not copied).  We can determine if an array *owns* its own data via its *array.flags.owndata* attribute.

In [None]:
arr1 = np.arange(10)
arr2 = arr1[3:7]

print("arr1 owndata: ", arr1.flags.owndata)
print("arr2 owndata: ", arr2.flags.owndata)

# and what will happen if we write into arr2?
arr2[:] = 0
print("arr1 (after assigning into arr2):", arr1)

arr3 = arr1.copy()
print("arr3 owndata: ", arr3.flags.owndata)
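
Besides the `owndata` flag, two other checks can help tell views from copies. This is a small sketch with made-up names (`original`, `view`, `dup`), using the array's `base` attribute and `np.shares_memory`:

```python
import numpy as np

original = np.arange(10)
view = original[3:7]           # basic slicing returns a view
dup  = original[3:7].copy()    # an explicit copy owns fresh memory

print(view.base is original)               # the view points back at its source
print(np.shares_memory(original, view))    # True: same underlying buffer
print(dup.base is None)                    # the copy has no base
print(np.shares_memory(original, dup))     # False: separate buffer
```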

<img src='img/copyright.png'>