# 2. Multidimensional Arrays

## 2.1 Dimensionality of arrays

An array is an ordered and structured collection of elements. Arrays are structured around the number of dimensions they contain, as well as how many elements exist along each dimension. Today we will focus on arrays that have more than one dimension.

A one-dimensional array that contains six elements looks like:

  4 5 6 7 8 9

Or we could have a two-dimensional array that contains six elements with two rows and three columns:

  4 5 6
  
  7 8 9

## Exercise 2.1

Import numpy

In [1]:
import numpy

## 2.2 Creating multidimensional arrays from python lists

### 2.2.1 Reviewing one dimensional arrays

Last week we focused on creating arrays that contain one dimension:

In [2]:
b = numpy.array([1, 2, 3])

print( b )

[1 2 3]


Several attributes give insight into the dimensionality of an array `x`:
- `x.ndim` : the number of dimensions.
- `x.shape` : the length of each dimension.
- `x.size` : the total number of elements.

In [3]:
print( b.ndim, b.shape, b.size )

1 (3,) 3


### 2.2.2 Create two dimensional arrays

The simplest way to create a two dimensional array is based on python structures. Specifically, create the numpy array from a python list in which each element is another list. In this case the outer list is a list of the rows and each inner list specifies the elements of that row.

In [4]:
a = numpy.array([[1, 3, 6, 5], [2, 4, 3, 0]])

print( a )
print( a.ndim, a.shape, a.size )

[[1 3 6 5]
 [2 4 3 0]]
2 (2, 4) 8


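A true 2-dimensional array (also called a matrix) is created only if the number of elements in each row is the same across all rows. Otherwise, the best numpy can do is a one-dimensional array of python lists. Older versions of numpy created such an array automatically; recent versions raise a `ValueError` unless you ask for it explicitly with `dtype=object`:

In [5]:
b = numpy.array([[1, 3, 6, 5], [2, 4, 3]], dtype=object)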

print( b )
print( b.ndim, b.shape, b.size )

[list([1, 3, 6, 5]) list([2, 4, 3])]
1 (2,) 2
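
How rows of unequal length are handled depends on the numpy version. As far as we know, releases from 1.24 onwards reject such input with a `ValueError`, while older releases accepted it. The sketch below catches the error so that it runs either way, and then requests an object array explicitly:

```python
import numpy

rows = [[1, 3, 6, 5], [2, 4, 3]]   # rows of unequal length

try:
    numpy.array(rows)
    print( "ragged input was accepted" )
except ValueError as error:
    print( "ragged input was rejected:", type(error).__name__ )

# Asking for an object array explicitly works in both cases:
ragged = numpy.array(rows, dtype=object)
print( ragged.ndim, ragged.shape, ragged.dtype )
```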


### 2.2.3 N dimensional arrays

In general, an array can be $n$-dimensional. One way to think of $n$-dimensional arrays is in terms of a bookshelf analogy:

- 1d array is a single row of a bookshelf, where a book can be identified by its position in the row
- 2d array is the whole bookshelf, where a book can be identified by its row number and its position in the row
- 3d array is a room full of bookshelves, where a book can be identified by the number of the bookshelf, row, and position in the row
- 4d array is a library with rooms with bookshelves, where a book can be identified by the room, bookshelf, row and position in the row
- ...and so on...

N-dimensional arrays can be created in the same way as two-dimensional arrays but with more nested lists:

In [6]:
# create a three dimensional array:
z1 = numpy.array([[[1, 3], [2, 4]], [[11, 13], [12, 14]] ])

# dimensions/shape/size of the array:
print(z1)
print( "Number of dimensions:", z1.ndim )
print( "Length of each dimension:", z1.shape )
print( "The total number of elements:", z1.size )

[[[ 1  3]
  [ 2  4]]

 [[11 13]
  [12 14]]]
Number of dimensions: 3
Length of each dimension: (2, 2, 2)
The total number of elements: 8


## Exercise 2.2

Part A: Create a 2x3x2 three-dimensional array (you decide which values to include)
    
Part B: Create a 2x2x3 three-dimensional array.

Verify the arrays have the same number of dimensions and elements but different shapes.

## 2.3 Creating N-dimensional arrays using functions

Creating arrays by specifying nested lists of elements is tedious. Therefore, there are nice tools for creating multi-dimensional arrays.

### 2.3.1 Creating arrays containing nothing, zeroes, ones

An empty array or an array with only zeros or ones can be created with the `empty`, `zeros` and `ones` functions.
Be careful with the empty array. This is the fastest way to initialize an array, but the content of the empty array can be anything (it will be whatever was already written in that location in memory).

The shape of the array is the input for these functions:


In [8]:
# create a one-dimensional empty array (note the comma: (3,) is a tuple, (3) is just the number 3):
a = numpy.empty( (3,) )
print(a, a.shape)

# create two-dimensional array with only zeros:
b = numpy.zeros( (2, 3) )
print(b, b.shape)

# create a four-dimensional array with only ones:
c = numpy.ones( (2,2,2,2) )
print(c, c.shape)

[  4.94065646e-324   9.88131292e-324   1.48219694e-323] (3,)
[[ 0.  0.  0.]
 [ 0.  0.  0.]] (2, 3)
[[[[ 1.  1.]
   [ 1.  1.]]

  [[ 1.  1.]
   [ 1.  1.]]]


 [[[ 1.  1.]
   [ 1.  1.]]

  [[ 1.  1.]
   [ 1.  1.]]]] (2, 2, 2, 2)
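
The decimal points in the output above show that these functions produce floating point numbers by default. A small sketch of the optional `dtype` argument, which requests a different element type (the variable names `f` and `i` are only for illustration):

```python
import numpy

f = numpy.zeros( (2, 3) )              # default element type
i = numpy.zeros( (2, 3), dtype=int )   # integer zeros instead
print( f.dtype, i.dtype )
print( i )
```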


### 2.3.2 Creating identity matrices

An identity matrix is a two-dimensional square matrix (same number of rows and columns) in which all values are zeros, except for ones along the diagonal. 
The identity matrix can be created with the `eye(n)` function. Since the identity matrix is always square, only one input parameter is needed to create the 2-dimensional matrix.

In [9]:
# create n by n identity matrix:
print( numpy.eye(4) )

[[ 1.  0.  0.  0.]
 [ 0.  1.  0.  0.]
 [ 0.  0.  1.  0.]
 [ 0.  0.  0.  1.]]
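
One way to see why this is called the identity matrix: multiplying a matrix of matching size by it gives back the same matrix. A minimal sketch using the `@` matrix multiplication operator, with an arbitrary example matrix `m`:

```python
import numpy

m = numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
identity = numpy.eye(3)

product = m @ identity
print( product )
print( numpy.array_equal(product, m) )
```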


### 2.3.3 Array stacking

More complex arrays can be created by combining two or more one-dimensional arrays. This is called vector stacking. There are two functions to stack arrays and they produce arrays with different shapes:

- horizontal stack: `numpy.hstack([x, y, z])`
- vertical stack: `numpy.vstack([x, y, z])`

In [10]:
x = numpy.arange(0, 5)                     
y = numpy.arange(5, 10)   
z = numpy.arange(10, 15)

print("Horizontal stack: " )
print( numpy.hstack([x, y, z]) )

print("Vertical stack: ")
print( numpy.vstack([x, y, z]) )

Horizontal stack: 
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
Vertical stack: 
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]


The opposite of stacking is to split a two-dimensional array into a list of smaller arrays. The first argument is the 2D array and the second argument is the number of pieces to create. The number of pieces must divide the number of rows (for a vertical split) or columns (for a horizontal split) evenly. Note that each piece is still a two-dimensional array, as the output below shows. There are two different ways of splitting:
- Vertical split: `numpy.vsplit(x, N)`
- Horizontal split: `numpy.hsplit(x, N)`


In [11]:
a = numpy.arange(0, 5)                     
b = numpy.arange(5, 10)   
c = numpy.arange(10, 15)
d = numpy.vstack([a, b, c])

print(d, d.shape)

print("\nVertical split: ")
print( numpy.vsplit(d, 3) )

print("\nHorizontal split: ")
print( numpy.hsplit(d, 5) )

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]] (3, 5)

Vertical split: 
[array([[0, 1, 2, 3, 4]]), array([[5, 6, 7, 8, 9]]), array([[10, 11, 12, 13, 14]])]

Horizontal split: 
[array([[ 0],
       [ 5],
       [10]]), array([[ 1],
       [ 6],
       [11]]), array([[ 2],
       [ 7],
       [12]]), array([[ 3],
       [ 8],
       [13]]), array([[ 4],
       [ 9],
       [14]])]
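
The requirement that the number of pieces fits the number of rows or columns can be checked with the same array: it has 5 columns, so asking for 2 horizontal pieces fails. A minimal sketch:

```python
import numpy

d = numpy.vstack([numpy.arange(0, 5), numpy.arange(5, 10), numpy.arange(10, 15)])

# 3 rows split into 3 pieces works:
pieces = numpy.vsplit(d, 3)
print( len(pieces), pieces[0].shape )

# 5 columns cannot be divided into 2 equal pieces:
try:
    numpy.hsplit(d, 2)
except ValueError as error:
    print( "Cannot split:", error )
```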


## 2.4 Reshaping arrays

A 1D array can be converted to an $n$ dimensional array using the `reshape()` function. 

Note that the total number of elements in the array has to be the same as the product of the lengths of the dimensions. For example, if the length of the array is 24, then we can reshape it to a 4 by 6 matrix, but also to a 2 by 3 by 4 array.

Let's assume we have a 2 by 3 by 4 array, which we will call `z`. Since the index in Python starts at 0, the first element of the array is `z[0, 0, 0]`, and the last element of the array is not `z[2, 3, 4]` but rather `z[1, 2, 3]`.

In [12]:
# reshape into a two dimensional array:
a = numpy.arange(2, 14, 2).reshape((2, 3))
print( a )
print( a.ndim, a.shape, a.size)

[[ 2  4  6]
 [ 8 10 12]]
2 (2, 3) 6


In [13]:
# reshape into a three dimensional array:
z2 = numpy.arange(24).reshape((2, 3, 4))

# dimensions/shape/size of the array:
print( z2 )
print( "Number of dimensions:", z2.ndim )
print( "Length of each dimension:", z2.shape )
print( "The total number of elements:", z2.size )

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
Number of dimensions: 3
Length of each dimension: (2, 3, 4)
The total number of elements: 24
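
The rule about the number of elements, and the remark about the first and last index, can be checked with a short sketch (the variable names are only for illustration):

```python
import numpy

values = numpy.arange(24)

m = values.reshape((4, 6))       # 4 * 6 = 24
z = values.reshape((2, 3, 4))    # 2 * 3 * 4 = 24
print( m.shape, z.shape )

# first and last element of the three-dimensional array:
print( z[0, 0, 0], z[1, 2, 3] )

# 5 * 5 = 25 does not match the 24 elements:
try:
    values.reshape((5, 5))
except ValueError as error:
    print( "Cannot reshape:", error )
```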


## Exercise 2.4

Redo Exercise 2.2 using reshape.

## 2.5 Creating the other one dimensional array: column vectors

The standard one-dimensional numpy array has one row and multiple columns:

In [14]:
a = numpy.random.random(5)
print( a, a.size )

[ 0.34403669  0.71724853  0.79961649  0.89316333  0.71325027] 5


For some purposes (linear algebra, matrix multiplication, etc.) it is useful to have a one-dimensional array that consists of multiple rows and one column. 

This is called a column vector (instead of the standard row vector). Numpy represents column vectors as a two-dimensional array with only one column. They can be created by using the list of lists construction, by specifying a two-dimensional shape to `numpy.zeros` or other functions, or by using the reshape function:

In [15]:
a = numpy.array([[1], [2], [3]])
print( a )
print( a.ndim, a.shape, a.size, "\n" )

b = numpy.zeros((3,1))
print( b )
print( b.ndim, b.shape, b.size, "\n" )

c = numpy.array([1,2,3]).reshape((3,1))
print( c )
print( c.ndim, c.shape, c.size, "\n" )

[[1]
 [2]
 [3]]
2 (3, 1) 3 

[[ 0.]
 [ 0.]
 [ 0.]]
2 (3, 1) 3 

[[1]
 [2]
 [3]]
2 (3, 1) 3 
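
To see why the distinction matters for matrix multiplication, the sketch below multiplies the same matrix by a standard one-dimensional array and by a column vector, and compares the shapes of the results. The matrix `m` is an arbitrary example, and `@` is Python's matrix multiplication operator:

```python
import numpy

m = numpy.array([[1, 0, 2], [0, 1, 0], [3, 0, 1]])
row = numpy.array([1, 2, 3])       # shape (3,)
column = row.reshape((3, 1))       # shape (3, 1)

print( (m @ row).shape )       # result is one-dimensional
print( (m @ column).shape )    # result is again a column vector
print( m @ column )
```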



## 2.6 Matrix Indexing

For complete information about indexing see
http://docs.scipy.org/doc/numpy/user/basics.indexing.html

Last week we talked about linear indexing, boolean indexing, and indexing by number. Today we will talk about matrix indexing and how it relates to linear indexing.

Linear indexing works when the array is a single dimension:

In [16]:
a = numpy.random.uniform(-0.5, .5, 9)
print( a )
print( a[2] )

[-0.09060926 -0.27400251 -0.04690387 -0.31740315  0.42807189  0.33950993
 -0.3674358   0.49550885  0.10683923]
-0.0469038688989


But it gives slightly confusing results (which we will explain in a bit) when there is more than one dimension:

In [17]:
b = numpy.random.uniform(-0.5, .5, (3,3))
print( b )
print( b[2] )

[[ 0.28170831  0.08316668  0.29155183]
 [-0.27224755  0.11383073  0.47971962]
 [-0.11528719  0.24830377 -0.35826978]]
[-0.11528719  0.24830377 -0.35826978]


Instead, to access a single value in a multidimensional array, we need to specify an index along each dimension. This is referred to as Matrix Indexing:

In [18]:
c = numpy.random.uniform(-0.5, .5, (3,3))
print( c )

# Return a value specified by a matrix index
print( c[1, 1] )

[[-0.25645619  0.0543062  -0.47847811]
 [ 0.2290951   0.44572004 -0.35197097]
 [-0.37078143 -0.12718253  0.38193165]]
0.445720044459
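
The result of indexing a two-dimensional array with a single number follows from the same rule: an index that is left out is treated as `:`, so a single index selects a whole row. A small sketch with fixed values so that the results are easy to check:

```python
import numpy

c = numpy.array([[10, 11, 12], [20, 21, 22], [30, 31, 32]])

print( c[1] )        # one index: a whole row
print( c[1, :] )     # the same row, written out in full
print( c[1, 1] )     # two indices: a single value
print( c[1][1] )     # also works, but indexes in two steps
```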


### 2.6.1 Ranged matrix indexing

As in one-dimensional arrays, when accessing more than one element, the slicing `":"` can be used. As we saw above, if no index is given for a dimension, then the `":"` will be assumed.

As before, if the index is `[a:b]` then indices that are used are `a` up to but not including `b`.

In [19]:
z = numpy.arange(24).reshape((2, 3, 4))
print( z )
print()

# Print a few slices of a 3-dimensional array:
print( "Slices:" )
print( z[0:2, 1:3, 3] )
print()
print( z[:, 2, :] )

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

Slices:
[[ 7 11]
 [19 23]]

[[ 8  9 10 11]
 [20 21 22 23]]
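
One detail worth checking is the shape of the result: a dimension indexed with a single number disappears, while a dimension indexed with a range is kept, even when the range selects only one position. A sketch:

```python
import numpy

z = numpy.arange(24).reshape((2, 3, 4))

print( z[0:2, 1:3, 3].shape )     # last dimension dropped
print( z[0:2, 1:3, 3:4].shape )   # last dimension kept with length 1
print( z[:, 2, :].shape )         # middle dimension dropped
```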


### 2.6.2 Converting linear index to matrix index

Some functions, like `argmax`, return a linear index even if the input is a multidimensional array. To convert from a linear index to a matrix index, use the function `numpy.unravel_index()`. The first argument is the linear index and the second argument is the shape of the array for which you want to transform the index. For example: `numpy.unravel_index(linear_index, (2,3))`. 

In [20]:
z = numpy.arange(24).reshape((2, 3, 4))

# Converting a linear index to a matrix index:
linear_index = 10
matrix_index = numpy.unravel_index(linear_index, z.shape)

print("For a matrix with dimensions (2, 3, 4), the linear index: ", linear_index, " is equal to \
matrix index: ", matrix_index)

print( z )
print( z[matrix_index] )

For a matrix with dimensions (2, 3, 4), the linear index:  10  is equal to matrix index:  (0, 2, 2)
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
10
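
The same conversion applies to the linear index returned by `argmax`. A minimal sketch with a small example matrix `m` whose values are chosen only for illustration:

```python
import numpy

m = numpy.array([[3, 8, 1],
                 [9, 2, 7]])

# argmax returns the position of the largest value in the flattened array
linear_index = numpy.argmax(m)
matrix_index = numpy.unravel_index(linear_index, m.shape)

print( linear_index, matrix_index )
print( m[matrix_index] )
```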


Alternatively, you can convert your multidimensional array to a one-dimensional array by using the `flatten()` method:

In [21]:
z = numpy.arange(24).reshape((2, 3, 4))
linear_index = 10

a = z.flatten()
print( a )
print( a[linear_index] )

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
10


## Exercise 2.6.1

Generate a $5\times 5 \times 5$ 3D array of random numbers between -10.0 and 10.0. Reshape it to a $5 \times 25$ matrix. Extract the last 10 elements of the first two rows of this matrix. 


## Exercise 2.6.2

Create a $4\times3$ matrix of random numbers between $0$ and $1$. 
Find the row and column position of the minimum and the maximum value.

## 2.7 Reading array from file and saving arrays to file

There are two major numpy-based methods for saving arrays: saving them as a text file, or as a numpy-specific binary file. The first option is very similar to a csv file; the second preserves the numpy structure (like python's pickle library). These methods are used as follows:

- `numpy.savetxt(filename, array)` : save an array to a text file. 
- `numpy.save(filename, array)` : save an array to a binary file in numpy `.npy` format.

The corresponding functions for reading these formats are `numpy.loadtxt()` and `numpy.load()`.
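
A minimal sketch of a round trip through both formats. The file names are hypothetical, and a temporary folder is used so that nothing is left behind on disk:

```python
import os
import tempfile

import numpy

a = numpy.array([[1.5, 2.5, 3.5], [4.5, 5.5, 6.5]])

with tempfile.TemporaryDirectory() as folder:
    text_file = os.path.join(folder, "example.txt")     # hypothetical file names
    binary_file = os.path.join(folder, "example.npy")

    # write the array in both formats
    numpy.savetxt(text_file, a)
    numpy.save(binary_file, a)

    # read both files back into new arrays
    from_text = numpy.loadtxt(text_file)
    from_binary = numpy.load(binary_file)

print( numpy.array_equal(a, from_text) )
print( numpy.array_equal(a, from_binary) )
```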

## Exercise 2.7

Part 1: Generate a one dimensional array. Save to disk as a text file, and as a npy file. Create a new array that is based on reading the array back from the text file, and a new array that is based on the npy file. Is the information preserved in both formats?

Part 2: Repeat part 1 with a two-dimensional array and then a three-dimensional array. What works and what does not?

## 2.8 Structured arrays

A structured array is a numpy array in which each element is a record made up of named fields. It can be thought of as a table in which each column can have a different datatype. Structured arrays are particularly useful for loading and working with tabular data with heterogeneous column types.

### 2.8.1 Creating structured arrays

One of the possible ways to specify a structured array is to use a list of tuples as `dtype`. Each tuple specifies the name of a column and the type of data in that column. For example:

In [26]:
dtype = [('Name', 'U10'), ('Country', 'U10'), ('Area', 'float64')]

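The content of the array can then be given as a list of tuples, one tuple per row. Note that `'U10'` stores at most 10 characters, which is why `'Netherlands'` appears as `'Netherland'` in the output: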

In [27]:
city = numpy.array([('Amsterdam', 'Netherlands', 219.3),
                    ('Paris',     'France',      105.4 ),
                    ('Barcelona', 'Spain',       101.9 )],
                     dtype=dtype)
print( city )

[('Amsterdam', 'Netherland',  219.3) ('Paris', 'France',  105.4)
 ('Barcelona', 'Spain',  101.9)]


## Exercise 2.8.1.1

Modify the code above to determine what happens when you use a list of lists instead of a list of tuples to define the types. What happens when you use a list of lists to specify the values in a structured array?

You can also specify a structured array using a list with one entry for each column in the `dtype` option of the `numpy.loadtxt()` function. `loadtxt()` is a very fast and efficient function that assumes your data is nicely formatted.
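
As a sketch of this approach, the example below reads a small made-up table from a string instead of a file. The column names `id` and `weight` and the values are hypothetical and chosen only for illustration:

```python
import io

import numpy

# hypothetical table; the line starting with '#' is skipped as a comment
text = io.StringIO("# id weight\n1 2.5\n2 3.75\n3 1.0\n")

dtype = [('id', 'int'), ('weight', 'float')]
table = numpy.loadtxt(text, dtype=dtype)

print( table )
print( table['weight'] )
```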

## Exercise 2.8.1.2

Uncomment and complete the following code loading the data from file [populations.txt](populations.txt). Load the year column as an `int`, and the other columns as `float`.

In [28]:
#dtype = [('year',  ...
#         ('hare',  ...
#         ...
#          ] 
#population = numpy.loadtxt("populations.txt", dtype=...)
#print( population )

Finally, `numpy.genfromtxt()` is a slower but more powerful tool for building structured arrays. It can handle missing data and has more options than `loadtxt()`:

In [30]:
population = numpy.genfromtxt("populations.txt", 
             names=True,
             dtype=['int','float','float','float'])

# Print the  lynx column
print( population['lynx'] )

[  4000.   6100.   9800.  35200.  59400.  41700.  19000.  13000.   8300.
   9100.   7400.   8000.  12300.  19500.  45700.  51100.  29700.  15800.
   9700.  10100.   8600.]
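
To illustrate the handling of missing data, the sketch below reads a small made-up table in which one value is absent. The data is hypothetical and comma-separated, unlike the file above; by default a missing float is filled in as `nan`:

```python
import io

import numpy

# hypothetical comma-separated data; the hare count for 1901 is missing
text = io.StringIO("year,hare,lynx\n1900,30000,4000\n1901,,6100\n1902,70200,9800\n")

table = numpy.genfromtxt(text,
             delimiter=",",
             names=True,
             dtype=['int', 'float', 'float'])

print( table['hare'] )
print( numpy.isnan(table['hare']) )
```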


### 2.8.2 Dimensionality of structured arrays

Despite structured arrays consisting of rows and columns, structured arrays are treated as one-dimensional arrays by numpy. The data type is a list with one tuple for each column:


In [31]:
# Print information about the array
print( city.shape )
print( city.dtype )

(3,)
[('Name', '<U10'), ('Country', '<U10'), ('Area', '<f8')]


### 2.8.3 Indexing structured arrays

The rows in a structured array can be accessed by standard array indexing. The columns of the array are indexed by using the column names that are specified when the array was created.

In [32]:
# Access first row
print( city[0] )

# Access first two rows
print( city[0:2] )

# Access column by name
print( city['Area'] )

# Access two columns using list of names
print( city[['Name', 'Area']] )

('Amsterdam', 'Netherland',  219.3)
[('Amsterdam', 'Netherland',  219.3) ('Paris', 'France',  105.4)]
[ 219.3  105.4  101.9]
[('Amsterdam',  219.3) ('Paris',  105.4) ('Barcelona',  101.9)]
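
Column access combines with the boolean indexing from last week: a comparison on one column gives a boolean array with one entry per row, which can then select rows. A minimal sketch that rebuilds the `city` array so it can run on its own (the threshold 105.0 is an arbitrary example value):

```python
import numpy

dtype = [('Name', 'U10'), ('Country', 'U10'), ('Area', 'float64')]
city = numpy.array([('Amsterdam', 'Netherlands', 219.3),
                    ('Paris',     'France',      105.4 ),
                    ('Barcelona', 'Spain',       101.9 )],
                     dtype=dtype)

large = city['Area'] > 105.0       # boolean array, one entry per row
print( large )
print( city[large]['Name'] )
```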


You can also access the column names using the `.dtype.names` property, which can also be used to modify the names of the columns:

In [33]:
print( city.dtype.names )

city.dtype.names = ('name', 'country', 'area')
print( city['area'] )

('Name', 'Country', 'Area')
[ 219.3  105.4  101.9]


## Exercise 2.8.3.1

Uncomment and complete the following code to print the years with the smallest number of hares, lynxes and carrots in the populations dataset.

In [34]:
#for species in [....]:
#    index = ...
#    year = ...
#    print("Least # of {} in year {}".format(species, year))

## Exercise 2.8.3.2
Use the population data to

1. Select all the years in which there are more than 50000 lynxes;
2. Select all the years in which there are more lynxes than hares.

# Extras about arrays

### For loops over arrays

Most of the time you want to use arrays to avoid loops, but sometimes you want to use a loop. Iterating over a 1-dimensional Numpy array is the same as in base python:

In [37]:
a = numpy.arange(5, 10)

for element in a:
    print( element )
    
for index, element in enumerate(a):
    print( "Element {} at index {}".format(element, index) )

5
6
7
8
9
Element 5 at index 0
Element 6 at index 1
Element 7 at index 2
Element 8 at index 3
Element 9 at index 4


It is also possible to iterate across a multi-dimensional Numpy array. If only one loop is used, then it will iterate over the first dimension and process a Numpy array in each step of the loop.

In [38]:
z = numpy.arange(24).reshape((2, 3, 4))
print( z )

# for loop over the first dimension:
for firstdimension in z:
    print( "Print separator" )
    print( firstdimension )

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
Print separator
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Print separator
[[12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]


If two nested for loops are used, then the loops iterate over the first two dimensions. 

In [39]:
for firstdimension in z:
    for seconddimension in firstdimension:
        print( "Print separator" )
        print( seconddimension )

Print separator
[0 1 2 3]
Print separator
[4 5 6 7]
Print separator
[ 8  9 10 11]
Print separator
[12 13 14 15]
Print separator
[16 17 18 19]
Print separator
[20 21 22 23]


The same process occurs for multi-dimensional arrays. For loops will iterate over Numpy arrays or elements, depending on the dimensionality.

For example, a single for loop over a 3-dimensional array with shape (2, 3, 4) runs exactly 2 iterations, and each iteration yields a 3 by 4 array.
Two nested for loops over the same array run exactly 2*3=6 inner iterations, and each of those yields an array of length 4.
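
These iteration counts can be verified by counting the steps of each loop. A small sketch, where the counter names are only for illustration:

```python
import numpy

z = numpy.arange(24).reshape((2, 3, 4))

outer_steps = 0
inner_steps = 0
for firstdimension in z:
    outer_steps += 1
    for seconddimension in firstdimension:
        inner_steps += 1

# number of iterations and the shape of what each loop yields:
print( outer_steps, firstdimension.shape )
print( inner_steps, seconddimension.shape )
```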

If you want to loop over all the elements in the array regardless of dimensionality, then you should use the `flat` attribute instead of nested loops:

In [40]:
for element in z.flat:
    print( element )

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
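
If you also need the position of each element, in the same spirit as `enumerate` for one-dimensional arrays, `numpy.ndenumerate` yields the matrix index together with the element. A sketch on a smaller array to keep the output short:

```python
import numpy

z = numpy.arange(6).reshape((1, 2, 3))

# index is a tuple with one entry per dimension
for index, element in numpy.ndenumerate(z):
    print( "Element {} at index {}".format(element, index) )
```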
