![Erudio logo](img/Erudio-logo.png)

---

![NumPy logo](img/numpylogo.svg)

# Joining, Stacking, and Splitting Arrays

Many of the array operations we have seen in earlier modules produce new arrays, but they mostly derive new values from existing ones rather than aggregating them as such.

In [None]:
import numpy as np

## Reductions

In passing we have looked at some reductions.  Often it is useful to take a collection of values and produce an aggregate scalar result.  By default, most reduction operations do that. However, almost all of them also accept an `axis` argument to reduce along only one particular dimension.

In [None]:
# Create a 3-D array
arr = np.arange(0, 12).reshape(2, 2, 3)
print(arr)

In [None]:
# Reduce globally
arr.sum(axis=None)

In [None]:
# Reduce over panels
arr.sum(axis=0)

In [None]:
# Reduce over columns
arr.sum(axis=1)

In [None]:
# Reduce over rows
arr.sum(axis=2)
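
As a rough check on which dimension each call removes, the sketch below records the shape of every single-axis sum of the same 2✕2✕3 array.  The names `shapes` and `total` are only for illustration.

```python
import numpy as np

arr = np.arange(0, 12).reshape(2, 2, 3)

# Each reduction drops the axis it sums along from the shape
shapes = {axis: arr.sum(axis=axis).shape for axis in range(arr.ndim)}
print(shapes)

# With axis=None (the default) every dimension is reduced to one scalar
total = arr.sum()
print(total)
```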

It is more to keep track of, but we can also reduce over multiple dimensions while retaining others.  Past three dimensions, it mostly only makes sense to call these "dimension zero", "dimension one", and so on, as the ordinary words like "rows" and "columns" are not easy to correlate, nor adequate for most dimensions.

In [None]:
# Sum over columns and rows
print(arr.sum(axis=(1,2)))

# Slightly less efficient (and more verbose)
# The intermediate result has 2 not 3 dimensions; index adjusts down
print('-----')
print(arr.sum(axis=1).sum(axis=1))
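
A small sketch of the equivalence above.  It also uses the `keepdims` argument, which is not otherwise covered in this module, to keep the reduced axes as length-one dimensions so the result can broadcast against the original.  The variable names are illustrative.

```python
import numpy as np

arr = np.arange(0, 12).reshape(2, 2, 3)

# One call over two axes versus two chained single-axis calls
both = arr.sum(axis=(1, 2))
chained = arr.sum(axis=1).sum(axis=1)
print(np.array_equal(both, chained))

# keepdims=True leaves the summed axes in place with length 1
kept = arr.sum(axis=(1, 2), keepdims=True)
print(kept.shape)

# Each value as a share of its own panel's total, via broadcasting
fractions = arr / kept
```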

## Combining Arrays

Sometimes we want to combine multiple arrays simply by concatenation of some sort.  The most general operation is `np.concatenate()`, but a number of special functions also exist.  Note that we pass in one collection (e.g. a list or tuple) of arrays to operate on, not separate arguments for each.

In [None]:
arr1 = np.arange(0, 12).reshape(2, 2, 3)
arr2 = np.arange(0, 120, 10).reshape(2, 2, 3)

In [None]:
# By default axis=0, along the panels here
print(np.concatenate([arr1, arr2]))

In [None]:
# Concatenate the columns
print(np.concatenate([arr1, arr2], axis=1))

In [None]:
# Concatenate the rows
print(np.concatenate([arr1, arr2], axis=2))
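
To summarize the three cells above, this sketch collects the result shape for each axis and then tries a mismatched pair of arrays.  Only the concatenated axis may differ between inputs; the sliced array `narrow` is a made-up example of one that does not fit.

```python
import numpy as np

arr1 = np.arange(0, 12).reshape(2, 2, 3)
arr2 = np.arange(0, 120, 10).reshape(2, 2, 3)

# Only the concatenated axis grows; the other dimensions are unchanged
shapes = [np.concatenate([arr1, arr2], axis=ax).shape for ax in range(3)]
print(shapes)

# A (2, 1, 3) array cannot be joined to a (2, 2, 3) array along axis 0,
# because axis 1 differs and is not the axis being concatenated
narrow = arr2[:, :1, :]
try:
    np.concatenate([arr1, narrow], axis=0)
    compatible = True
except ValueError:
    compatible = False
print(compatible)

# Along axis 1 the same pair is fine
joined = np.concatenate([arr1, narrow], axis=1)
print(joined.shape)
```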

### Special forms

We have some shortcut or mnemonic functions for some specific operations.  

`np.r_` and `np.c_` are special objects that can be "sliced" to concatenate.  These are meant to make you think of "copy rows" and "copy columns" in their operation.  

In [None]:
m1 = np.array([[5, 7], [6, 8]])
m2 = np.array([[10, 20], [30, 40]])
print(np.r_[m1, m2])

In [None]:
# Now columns
print(np.c_[m1, m2])

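Technically, `np.r_` concatenates along axis 0 and `np.c_` along the last axis (after promoting 1-D inputs to columns); for 2-D arrays that means axes 0 and 1.  But they become less intuitive for higher dimensions.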

In [None]:
# This is really "concatenate panels"
arr1 = np.arange(0, 12).reshape(2, 2, 3)
arr2 = np.arange(0, 120, 10).reshape(2, 2, 3)
np.r_[arr1, arr2]

You can also use these shortcut objects to combine more than two arrays.

In [None]:
np.c_[np.arange(8), np.arange(8)*2, np.arange(8)*3, np.arange(8)*4]
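
The sketch below compares the shortcut objects with explicit `np.concatenate()` calls, to make the axis each one uses concrete.  It reflects how `np.r_` and `np.c_` behave for plain array arguments; the variable names are illustrative.

```python
import numpy as np

m1 = np.array([[5, 7], [6, 8]])
m2 = np.array([[10, 20], [30, 40]])

# For 2-D inputs: np.r_ is axis 0, np.c_ is axis 1
rows_match = np.array_equal(np.r_[m1, m2], np.concatenate([m1, m2], axis=0))
cols_match = np.array_equal(np.c_[m1, m2], np.concatenate([m1, m2], axis=1))
print(rows_match, cols_match)

# 1-D inputs to np.c_ become the columns of a 2-D result
a = np.arange(4)
stacked = np.c_[a, a * 2]
print(stacked.shape)

# For 3-D inputs, np.c_ joins along the last axis
arr1 = np.arange(0, 12).reshape(2, 2, 3)
arr2 = np.arange(0, 120, 10).reshape(2, 2, 3)
last_axis = np.c_[arr1, arr2]
print(last_axis.shape)
```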

### Tiling versus concatenation

We looked at using `np.tile()` to get arrays of *compatible* shapes for broadcast combination.  In some sense we can do the same thing using concatenation of an array with itself.  Both approaches copy the underlying values into a new buffer; `np.tile()` does not merely remember the shape of the tiling.  What differs in the cells below is ownership: `np.tile()` typically hands back a reshaped view of its freshly repeated data, and `sys.getsizeof()` counts the data buffer only for arrays that own it, so the tiled result *looks* small.  The `.nbytes` attribute reports the size of the data regardless of ownership.

For arrays with tens or hundreds of values, this is insignificant, but when you are working with hundreds of millions of values, it can be important.

In [None]:
from sys import getsizeof
arr = np.arange(8)
print(arr)
print("Size of original array:", getsizeof(arr))
print("Smallest possible array:", getsizeof(np.array(0, dtype=np.byte)))

In [None]:
# Tiling repeats the data, then returns a reshaped view of that copy
t = np.tile(arr, (5, 4))
print(t)
# getsizeof() sees only the view's header; .nbytes counts the repeated data
print("Size of tiling object:", getsizeof(t))
print("Bytes of tiled data:", t.nbytes)

In [None]:
# Concatenate makes copies greedily
cat = np.concatenate([arr, arr, arr, arr]).reshape(1, -1)
cat = np.concatenate([cat, cat, cat, cat, cat])
print(cat)
print("Size of concatenation:", getsizeof(cat))
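
One way to check how much data each result actually holds is `.nbytes`, together with `flags.owndata`, since `sys.getsizeof()` counts the data buffer only for arrays that own it.  The sketch below rebuilds the two arrays from the cells above and, as an aside not covered elsewhere in this module, contrasts them with `np.broadcast_to()`, which returns a read-only view that repeats values without copying them.

```python
import numpy as np

arr = np.arange(8)

# Same 5 x 32 result built two ways
t = np.tile(arr, (5, 4))
row = np.concatenate([arr, arr, arr, arr]).reshape(1, -1)
cat = np.concatenate([row] * 5)

same_values = np.array_equal(t, cat)
print(same_values)

# Both hold a full copy of the repeated data
print(t.nbytes, cat.nbytes)

# Ownership differs, which is what sys.getsizeof() is sensitive to
print(t.flags.owndata, cat.flags.owndata)

# A broadcast view shares the original buffer; a stride of 0 on the
# first axis means every "row" points at the same 8 values
b = np.broadcast_to(arr, (5, 8))
print(np.shares_memory(b, arr), b.strides)
```

Because the broadcast view is read-only and shares memory with `arr`, it suits read-only arithmetic; when an independent, writable array is needed, a copy is unavoidable.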

## More on Shaping

When calling `.reshape()`, `-1` can be used as a wildcard that will fill in the shape for one dimension, based on the others.  For small arrays, finding the factorization of the size is not particularly hard.  But in high dimensions and with large sizes, it is sometimes easier not to bother.

In [None]:
arr = np.arange(1, 11)
print(arr)

In [None]:
# shape of 5 x ? -> 5 x 2
print(arr.reshape(5, -1))

In [None]:
# shape of ? x 5 -> 2 x 5
arr.reshape(-1, 5)
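
The wildcard works the same way with more dimensions, as long as the known dimensions divide the array size evenly.  A minimal sketch, using a 24-element array chosen only for illustration:

```python
import numpy as np

arr = np.arange(24)

# One -1 per call: NumPy infers 24 / (2 * 4) = 3 for the middle axis
cube = arr.reshape(2, -1, 4)
print(cube.shape)

# If the known dimensions do not divide the size evenly, reshape raises
try:
    arr.reshape(5, -1)
    divisible = True
except ValueError:
    divisible = False
print(divisible)
```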

Some other shaping utilities include:  

  * `.ravel()`
  * `.flatten()`
  * `.squeeze()`

`.flatten()`, as its name implies, makes a 1-D copy of the array.  `.ravel()` behaves similarly, but it will try to avoid making a copy (i.e. usually it creates a view).  `.squeeze()` removes any axes of length one.

Run `help(<arr.method>)` to get more information on these functions, or on any functions.
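
A short sketch of the three utilities listed above, using `np.shares_memory()` to tell a view from a copy.  The arrays and names here are made up for illustration.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)

# flatten() always copies; ravel() returns a view when the layout allows
flat_copy = m.flatten()
flat_view = m.ravel()
print(np.shares_memory(flat_copy, m), np.shares_memory(flat_view, m))

# Writing through the view changes the original array
flat_view[0] = 99
print(m[0, 0])

# A transposed array cannot be flattened in row order without copying
transposed_flat = m.T.ravel()
print(np.shares_memory(transposed_flat, m))

# squeeze() drops axes of length one
wrapped = np.arange(6).reshape(1, 6, 1)
squeezed = wrapped.squeeze()
print(wrapped.shape, squeezed.shape)
```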

# Exercises

Each of these exercises will ask you to create new arrays based on existing ones, generally utilizing both what you have learned in this module, and also techniques in earlier modules about slicing, selecting, reshaping, and so on.

In [None]:
from src.numpy_exercises import *

In this exercise, the starting array is 4✕4, with the same value in each "quadrant" of it.  Transform this array into one that is 2✕8 instead, with the numbers arranged in sequential order by 2✕2 block.  

While there are many ways you might create the desired result independently, do so as a transformation of the provided original (i.e. do not assume the values are 1, 2, 3, 4; they might be 16 different values).  Note that `arr.reshape(2, 8)` will create the desired shape, but will not put the numbers in desired positions.

In [None]:
# Transform into 2 by 8 keeping same numbers in sub-blocks
arr = ex5_1.arr.copy()
ex5_1

In [None]:
ex5_1.result

There are a number of ways to obtain the result using techniques we have learned.  Try to do it differently than in the prior exercise.

In [None]:
arr = ex5_1.arr.copy()

---

Reverse the transformation in the prior exercise.  I.e. we start with 2✕8, in the arrangement shown, and we want to get back to the shape and position of numbers in the original 4✕4 array.

In [None]:
arr = ex5_2.arr.copy()
ex5_2

There are a number of ways to obtain the result using techniques we have learned.  Try to do it differently than in the prior exercise.

In [None]:
arr = ex5_2.arr.copy()
arr

---

From the 2✕8 array in the last exercise, transform the array into a 1-D array where each sub-block is contiguous.  In the example, this will be equivalent to monotonic ascending order; but again, do not rely on the specific values in the array used for illustration.

In [None]:
arr = ex5_3.arr.copy()
print(ex5_3.result)

There are a number of ways to obtain the result using techniques we have learned.  Try to do it differently than in the prior exercise.

In [None]:
arr = ex5_3.arr.copy()

---

Similarly to some of the exercises above, reverse the transformation from a 1-D array into a 2✕8 array with 2✕2 sub-blocks taken from contiguous portions of the 1-D array.

In [None]:
arr = ex5_4.arr.copy()
ex5_4

---

Transform a 2-D array of shape 4✕4 into a 3-D array of shape 2✕2✕4, again preserving the 2✕2 sub-blocks in the manner of other exercises in this module.

In [None]:
arr = ex5_5.arr.copy()
ex5_5.result

---

That last exercise should have been easy.  For a more difficult variation, transform the 2-D array of shape 4✕4 into a 3-D array of shape 4✕2✕2, again preserving the 2✕2 sub-blocks.  "Visually" this is each sub-block of the same number on a plane/panel.

In [None]:
arr = ex5_6.arr.copy()
ex5_6.result

---

Transform the 4✕2✕2 3-D array that was the result in the last exercise by multiplying each sub-block by a 2✕2 mask of the following:

```
[[-1  0]
 [ 0 10]]
```

In [None]:
arr = ex5_7.arr.copy()
ex5_7.result

---

Materials licensed under [CC BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/) by the authors