
# Organizing and Manipulating NumPy Arrays

In this workshop we will learn how to organize and manipulate our arrays, covering the topics of:

- Reshaping arrays and ravelling arrays
- Combining and splitting arrays
- Copies and views of arrays

## Shape Manipulation

Sometimes our arrays will have the data that we need but their shape is not convenient for the operation we want to perform. NumPy provides a few important functions to help with this.

We can use the `reshape` method to change the shape of an array. The new shape must hold exactly the same number of elements as the original.

In [None]:
import numpy as np

a = np.arange(0, 6)
# a is one dimensional
print(a)
# a with 2 rows and 3 columns
print(a.reshape((2, 3)))
# a with 3 rows and 2 columns
print(a.reshape((3, 2)))
# a with 6 rows and 1 column
print(a.reshape((6, 1)))
# a with 1 row and 6 columns
print(a.reshape((1, 6)))
# a with three dimensions
print(a.reshape((1, 1, 6)))

[0 1 2 3 4 5]
[[0 1 2]
 [3 4 5]]
[[0 1]
 [2 3]
 [4 5]]
[[0]
 [1]
 [2]
 [3]
 [4]
 [5]]
[[0 1 2 3 4 5]]
[[[0 1 2 3 4 5]]]
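
A reshape only succeeds when the new shape holds exactly as many elements as the original array. The sketch below illustrates that rule; the names `ok` and `failed` are only illustrative.

```python
import numpy as np

a = np.arange(0, 6)

# 2 * 3 == 6, so this shape is compatible with the 6 elements in a
ok = a.reshape((2, 3))
print(ok.shape)

# 4 * 2 == 8 does not match 6 elements, so NumPy raises a ValueError
try:
    a.reshape((4, 2))
    failed = False
except ValueError as err:
    failed = True
    print("reshape failed:", err)
```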


**Challenge:** Create an array of the numbers from 0 up to (but not including) 50 with 10 rows and 5 columns.

In [None]:
print(np.arange(0, 50).reshape((10, 5)))

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]
 [25 26 27 28 29]
 [30 31 32 33 34]
 [35 36 37 38 39]
 [40 41 42 43 44]
 [45 46 47 48 49]]


We can use the `ravel` method on an array to flatten it.


In [None]:
a = np.arange(0, 12).reshape((3, 4))
print(a)
print(a.ravel())
# To get back the original shape after ravel we can use reshape
print(a.ravel().reshape(a.shape))

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[ 0  1  2  3  4  5  6  7  8  9 10 11]
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]



Note that `ravel` is not the same as `flat`: `flat` is just an iterator, while `ravel` returns an actual array. An iterator only yields the elements one at a time in a given order, which is good for looping and avoids building a new array.

In short, use `flat` when you want to loop through the elements of an array, and use `ravel` when you need a flattened version of the array.

In [None]:
print(a.ravel())
print(a.flat)

[ 0  1  2  3  4  5  6  7  8  9 10 11]
<numpy.flatiter object at 0x38ce9cb0>
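
As a small sketch of the difference, the loop below uses `flat` to visit each element, while `ravel` is used where array operations are needed. The variable names `total` and `flattened` are illustrative.

```python
import numpy as np

a = np.arange(0, 12).reshape((3, 4))

# Looping over a.flat visits every element, row by row
total = 0
for value in a.flat:
    total += value
print(total)

# ravel gives an actual array, so array operations are available
flattened = a.ravel()
print(flattened.shape)
print(flattened * 2)
```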


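`reshape` and `ravel` return a new array object and **do not modify the shape of the original array**. (The returned array may still share its data with the original, as covered in the Copies and Views section.) If you want to change the shape of the original in place, you can use `resize` instead.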

In [None]:
a.reshape((4, 3))
# Remains the same
print(a)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [None]:
a.ravel()
# Remains the same
print(a)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [None]:
a.resize((4, 3))
# Modified
print(a)

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
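
Leaving the original's shape alone does not always mean the data is independent. For a contiguous array like the one used here, `reshape` and `ravel` typically return arrays that share memory with the original. The sketch below checks this with `np.shares_memory`; `flatten`, which is not used elsewhere in this workshop, is included for contrast because it always returns a copy.

```python
import numpy as np

a = np.arange(0, 12).reshape((3, 4))

reshaped = a.reshape((4, 3))
flattened = a.ravel()
copied = a.flatten()  # flatten always returns a copy

print(np.shares_memory(a, reshaped))
print(np.shares_memory(a, flattened))
print(np.shares_memory(a, copied))

# Writing through the reshaped array is visible in a,
# even though a keeps its own shape
reshaped[0, 0] = -1
print(a[0, 0], a.shape)
```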


If you only care about some of the dimensions when reshaping an array, you can specify `-1` for one remaining dimension and it will be calculated automatically from the number of elements.

In [None]:
# We want 2 rows and it doesn't matter how many columns
print(a.reshape(2, -1))

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]]


In [None]:
a2 = np.arange(0, 36)

# We want 12 columns and it doesn't matter how many rows
print(a2.reshape(-1, 12))

print('---')
# We want a list of arrays of shape (4, 3), we don't care how many arrays
print(a2.reshape(-1, 4, 3))

[[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]
 [24 25 26 27 28 29 30 31 32 33 34 35]]
---
[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]
  [ 9 10 11]]

 [[12 13 14]
  [15 16 17]
  [18 19 20]
  [21 22 23]]

 [[24 25 26]
  [27 28 29]
  [30 31 32]
  [33 34 35]]]
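
The `-1` placeholder has two limits worth seeing in action: the known dimensions must divide the number of elements evenly, and only one dimension can be left unknown. A minimal sketch, with illustrative flag names:

```python
import numpy as np

a2 = np.arange(0, 36)

# 36 elements / 4 rows -> NumPy infers 9 columns
inferred = a2.reshape(4, -1)
print(inferred.shape)

# 36 is not divisible by 5, so no column count can be inferred
try:
    a2.reshape(5, -1)
    indivisible_failed = False
except ValueError:
    indivisible_failed = True

# Only one dimension may be left as -1
try:
    a2.reshape(-1, -1)
    two_unknowns_failed = False
except ValueError:
    two_unknowns_failed = True

print(indivisible_failed, two_unknowns_failed)
```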


You can also *transpose* an array, which means turning the columns into rows and the rows into columns. When you transpose an array of shape `(m, n)`, you get an array of shape `(n, m)`.

In [None]:
a2 = np.arange(0, 36).reshape(4, 9)
print(a2)

[[ 0  1  2  3  4  5  6  7  8]
 [ 9 10 11 12 13 14 15 16 17]
 [18 19 20 21 22 23 24 25 26]
 [27 28 29 30 31 32 33 34 35]]


In [None]:
a2t = a2.T
# Notice how each row becomes a column at the position of that row,
# and in the process each column becomes a row (or you can look at it the other way around)
print(a2t)

[[ 0  9 18 27]
 [ 1 10 19 28]
 [ 2 11 20 29]
 [ 3 12 21 30]
 [ 4 13 22 31]
 [ 5 14 23 32]
 [ 6 15 24 33]
 [ 7 16 25 34]
 [ 8 17 26 35]]


In [None]:
print(a2.shape)
print(a2t.shape)

(4, 9)
(9, 4)
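
One way to check your understanding of the transpose is to compare individual elements: the value at position `(i, j)` in the original appears at `(j, i)` in the transpose. The sketch below also shows that a 1D array is unchanged by `.T`; the name `v` is illustrative.

```python
import numpy as np

a2 = np.arange(0, 36).reshape(4, 9)
a2t = a2.T

# Element (i, j) of the original sits at (j, i) in the transpose
print(a2[1, 5], a2t[5, 1])

# Transposing twice gives back the original layout
print(np.array_equal(a2t.T, a2))

# A 1D array has only one axis, so transposing leaves it unchanged
v = np.arange(0, 3)
print(v.T.shape)
```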


**Challenge:** Create an array of ones of shape (5, 6, 7), update it so it's flattened, then return it to its original shape.

**Bonus challenge:** Explain the shape of the array.

In [None]:
ones = np.ones((5,6,7))
ones = ones.ravel()
print(ones)
ones = ones.reshape((5,6,7))
print(ones)

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[[[1. 1. 1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1. 1. 1.]]

 [[1. 1. 1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1. 1. 1.]]

 [[1. 1. 1. 1. 1. 1. 1.]
  [1. 1. 1. 1. 1. 1. 1.]
  [1. 

## Stacking and Splitting Arrays

It is sometimes useful to join arrays together or split them apart, for example if we have data split across two different tables.

`hstack` and `vstack` allow us to 'stack' arrays along different axes.

In [None]:
a = np.arange(0, 12).reshape((3, 4))
b = np.arange(12, 24).reshape((3, 4))
print(a)
print(b)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[[12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]


To stack these along the first axis, or along the rows, we can use `vstack`. Stacking along the rows means each row in each array becomes a row in the resulting array. The order of the rows is determined by the order in which the arrays are passed in.

In [None]:
vstack = np.vstack((a, b))
print(vstack)
print(vstack.shape)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]
(6, 4)


To stack these along the second axis, or along the columns, we can use `hstack`. Stacking along the columns means each column in each array becomes a column in the resulting array. The order of the columns is determined by the order in which the arrays are passed in.

In [None]:
hstack = np.hstack((a, b))
print(hstack)
print(hstack.shape)

[[ 0  1  2  3 12 13 14 15]
 [ 4  5  6  7 16 17 18 19]
 [ 8  9 10 11 20 21 22 23]]
(3, 8)


These also work as you would expect for simpler 1D arrays.

In [None]:
a1 = np.array([1, 2, 3])
b1 = np.array([4, 5, 6])
print(np.vstack((a1, b1)))
# For 1D arrays, hstack behaves much like joining two Python lists end to end
print(np.hstack((a1, b1)))

[[1 2 3]
 [4 5 6]]
[1 2 3 4 5 6]


Another function called `column_stack` takes 1D arrays and stacks them as columns into a 2D array. This can be useful for building a table from columns stored as 1D arrays.

In [None]:
print(np.column_stack((a1, b1)))

[[1 4]
 [2 5]
 [3 6]]


Another useful trick to add axes to an array is to use `np.newaxis`.

In [None]:
print(a1[:, np.newaxis])
print(b1[:, np.newaxis])

[[1]
 [2]
 [3]]
[[4]
 [5]
 [6]]
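
Where `np.newaxis` appears in the index decides where the new axis goes. The sketch below puts it in both positions and compares the column version with the equivalent `reshape` call; the names `column` and `row` are illustrative.

```python
import numpy as np

a1 = np.array([1, 2, 3])

column = a1[:, np.newaxis]  # new axis second: shape (3, 1)
row = a1[np.newaxis, :]     # new axis first: shape (1, 3)
print(column.shape, row.shape)

# reshape with -1 produces the same column
print(np.array_equal(column, a1.reshape(-1, 1)))
```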


For 1D arrays, `column_stack` effectively just adds a new axis like this, then `hstack`s the results together.

In [None]:
print(np.column_stack((a1, b1)))
print(np.hstack((a1[:,np.newaxis], b1[:,np.newaxis])))

[[1 4]
 [2 5]
 [3 6]]
[[1 4]
 [2 5]
 [3 6]]


`concatenate` allows for stacking along any axis via its `axis` parameter. This can be useful if you have more than two axes in your arrays.

In [None]:
a3 = a.reshape((-1, 3, 2))
b3 = b.reshape((-1, 3, 2))

print(a3)
print('------')
print(b3)

[[[ 0  1]
  [ 2  3]
  [ 4  5]]

 [[ 6  7]
  [ 8  9]
  [10 11]]]
------
[[[12 13]
  [14 15]
  [16 17]]

 [[18 19]
  [20 21]
  [22 23]]]


We can concatenate along the rows **containing** the inner arrays.

In [None]:
print(np.concatenate((a3, b3), axis=0))

[[[ 0  1]
  [ 2  3]
  [ 4  5]]

 [[ 6  7]
  [ 8  9]
  [10 11]]

 [[12 13]
  [14 15]
  [16 17]]

 [[18 19]
  [20 21]
  [22 23]]]


We can concatenate along the rows **within** the inner arrays.

In [None]:
print(np.concatenate((a3, b3), axis=1))

[[[ 0  1]
  [ 2  3]
  [ 4  5]
  [12 13]
  [14 15]
  [16 17]]

 [[ 6  7]
  [ 8  9]
  [10 11]
  [18 19]
  [20 21]
  [22 23]]]


We can concatenate along the columns **within** the inner arrays.

In [None]:
print(np.concatenate((a3, b3), axis=2))

[[[ 0  1 12 13]
  [ 2  3 14 15]
  [ 4  5 16 17]]

 [[ 6  7 18 19]
  [ 8  9 20 21]
  [10 11 22 23]]]
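
For 2D arrays, `vstack` and `hstack` give the same results as `concatenate` with `axis=0` and `axis=1` respectively. The sketch below checks this and also shows that the arrays must match on every axis except the one being joined. The array `wide` and the flag `mismatch_failed` are illustrative names.

```python
import numpy as np

a = np.arange(0, 12).reshape((3, 4))
b = np.arange(12, 24).reshape((3, 4))

same_as_vstack = np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0))
same_as_hstack = np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1))
print(same_as_vstack, same_as_hstack)

# wide has 3 rows like a, but 5 columns instead of 4
wide = np.arange(0, 15).reshape((3, 5))

# Joining along the columns works because the row counts match
joined = np.concatenate((a, wide), axis=1)
print(joined.shape)

# Joining along the rows fails because the column counts differ
try:
    np.concatenate((a, wide), axis=0)
    mismatch_failed = False
except ValueError:
    mismatch_failed = True
print(mismatch_failed)
```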


The `r_` and `c_` objects allow for greater flexibility in building up arrays and let us build arrays using slice notation for ranges. `r_` stacks along the rows (the first axis) and `c_` stacks along the columns (the second axis).

In [None]:
print(np.r_[1:4, 0, 4])
print(np.c_[1:4, 4:7])
c = np.c_[24:27, 28:31, 32:35, 36:39]
print(c)
print(np.r_[a, b, c])
print(np.c_[a, b, c])

[1 2 3 0 4]
[[1 4]
 [2 5]
 [3 6]]
[[24 28 32 36]
 [25 29 33 37]
 [26 30 34 38]]
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]
 [24 28 32 36]
 [25 29 33 37]
 [26 30 34 38]]
[[ 0  1  2  3 12 13 14 15 24 28 32 36]
 [ 4  5  6  7 16 17 18 19 25 29 33 37]
 [ 8  9 10 11 20 21 22 23 26 30 34 38]]


We can use `hsplit` and `vsplit` to split an array either into `n` equally sized arrays, or at specified column or row indices.

In [None]:
a2 = np.c_[a, b, c]
print(a2)

[[ 0  1  2  3 12 13 14 15 24 28 32 36]
 [ 4  5  6  7 16 17 18 19 25 29 33 37]
 [ 8  9 10 11 20 21 22 23 26 30 34 38]]


In [None]:
# Split it horizontally into 3 parts
splits = np.hsplit(a2, 3)
for split in splits:
    print(split)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[[12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]
[[24 28 32 36]
 [25 29 33 37]
 [26 30 34 38]]


In [None]:
# Split it vertically into 3 parts
splits = np.vsplit(a2, 3)
for split in splits:
    print(split)

[[ 0  1  2  3 12 13 14 15 24 28 32 36]]
[[ 4  5  6  7 16 17 18 19 25 29 33 37]]
[[ 8  9 10 11 20 21 22 23 26 30 34 38]]


In [None]:
# Split after the third and tenth columns
splits = np.hsplit(a2, (3, 10))
for split in splits:
    print(split)

[[ 0  1  2]
 [ 4  5  6]
 [ 8  9 10]]
[[ 3 12 13 14 15 24 28]
 [ 7 16 17 18 19 25 29]
 [11 20 21 22 23 26 30]]
[[32 36]
 [33 37]
 [34 38]]


In [None]:
print(a2.T)
# Split after the fifth and seventh rows
splits = np.vsplit(a2.T, (5, 7))
for split in splits:
    print(split)

[[ 0  4  8]
 [ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]
 [12 16 20]
 [13 17 21]
 [14 18 22]
 [15 19 23]
 [24 25 26]
 [28 29 30]
 [32 33 34]
 [36 37 38]]
[[ 0  4  8]
 [ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]
 [12 16 20]]
[[13 17 21]
 [14 18 22]]
[[15 19 23]
 [24 25 26]
 [28 29 30]
 [32 33 34]
 [36 37 38]]


To split along any axis, you can use `array_split` with its `axis` parameter.
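
Unlike `hsplit` and `vsplit`, `array_split` also accepts a number of parts that does not divide the axis evenly. The sketch below uses a fresh (3, 10) array rather than the arrays above; the names `parts`, `rows` and `even_split_failed` are illustrative.

```python
import numpy as np

a2 = np.arange(0, 30).reshape((3, 10))

# 10 columns do not divide evenly into 3 parts, so hsplit raises a ValueError
try:
    np.hsplit(a2, 3)
    even_split_failed = False
except ValueError:
    even_split_failed = True
print(even_split_failed)

# array_split allows uneven parts along the chosen axis
parts = np.array_split(a2, 3, axis=1)
print([p.shape for p in parts])

# axis=0 splits along the rows instead
rows = np.array_split(a2, 2, axis=0)
print([r.shape for r in rows])
```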

**Challenge:** This is a multi-step challenge

1. Create an array with values ranging from 1 to 30 in two different ways
2. Reshape the array to have a shape of (3, 10)
3. Split the array into 3 different parts in any way you choose
4. Join the array parts back together to get back the array with shape (3, 10)

In [None]:
mya = np.arange(1, 31)
print(mya)
mya = np.r_[1:31]
print(mya)

mya = mya.reshape((3, 10))
print(mya)

mya = np.hsplit(mya, (3, 6))
print(mya)
mya = np.hstack(mya)
print(mya)

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30]
[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30]
[[ 1  2  3  4  5  6  7  8  9 10]
 [11 12 13 14 15 16 17 18 19 20]
 [21 22 23 24 25 26 27 28 29 30]]
[array([[ 1,  2,  3],
       [11, 12, 13],
       [21, 22, 23]]), array([[ 4,  5,  6],
       [14, 15, 16],
       [24, 25, 26]]), array([[ 7,  8,  9, 10],
       [17, 18, 19, 20],
       [27, 28, 29, 30]])]
[[ 1  2  3  4  5  6  7  8  9 10]
 [11 12 13 14 15 16 17 18 19 20]
 [21 22 23 24 25 26 27 28 29 30]]


## Copies and Views

When working with arrays, their data is sometimes copied and sometimes it isn't. It is important to understand this distinction and when copying occurs so that you don't unexpectedly mutate arrays.

When you assign an array to another variable, **no copying occurs**. Both names refer to the same array, so changes made through one name are visible through the other.

In [None]:
a = np.array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11]])
b = a # no new object is created
print(b is a) # a and b are two names for the same ndarray object
b.resize((4, 3))
print(b)
print('---')
print(a) # a is mutated along with b
print('---')
print(a.shape, b.shape)

True
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
---
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
---
(4, 3) (4, 3)


Different array objects can share the same data. The `view` method creates a new array object that looks at the same data as the original. This means that changing the data of the 'view' object will change the data in the original, but changing other properties, such as the `shape`, will not change anything about the original.

In [None]:
c = a.view()
print(c is a)

False


In [None]:
c = c.reshape((2, 6)) # a's shape doesn't change
print(a.shape)

(4, 3)


In [None]:
print(c, c.shape)

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]] (2, 6)


In [None]:
# Here we update the value in the first row and the fifth column, which is
# effectively the fifth value in the array (reading from left to right, top to bottom)
c[0, 4] = 1234 # a's data changes
# Therefore the fifth value in a's data is also updated
print(a)

[[   0    1    2]
 [   3 1234    5]
 [   6    7    8]
 [   9   10   11]]


In [None]:
# We can see the relationship between the data with ravel
print(c.ravel())
print(a.ravel())

[   0    1    2    3 1234    5    6    7    8    9   10   11]
[   0    1    2    3 1234    5    6    7    8    9   10   11]
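
A view also records which array it borrows its data from in its `base` attribute. The sketch below rebuilds a small array so it runs on its own, and uses `np.shares_memory` as an additional check; the name `reshaped` is illustrative.

```python
import numpy as np

a = np.array([[0, 1, 2],
              [3, 4, 5],
              [6, 7, 8],
              [9, 10, 11]])

c = a.view()
print(c.base is a)     # c borrows its data from a
print(a.base is None)  # a owns its data, so it has no base

# Reshaping the view still shares a's data but leaves a's shape alone
reshaped = c.reshape((2, 6))
print(np.shares_memory(a, reshaped), a.shape)

# A write through the reshaped view lands in a
reshaped[0, 4] = 1234
print(a[1, 1])
```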


Views are also returned when you slice an array.

In [None]:
s = a[1:3, :] # s is a view on the middle two rows of a
print(s)

[[   3 1234    5]
 [   6    7    8]]


In [None]:
s[:] = 10 # So modifying the values of s modifies those same values in a
print(a)

[[ 0  1  2]
 [10 10 10]
 [10 10 10]
 [ 9 10 11]]
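
Not every kind of indexing returns a view. Slicing does, but selecting rows with a list of indices returns a copy. The sketch below contrasts the two on a fresh array; the names `sliced` and `picked` are illustrative.

```python
import numpy as np

a = np.arange(0, 12).reshape((4, 3))

sliced = a[1:3, :]     # slicing: a view on rows 1 and 2
picked = a[[1, 2], :]  # indexing with a list of row numbers: a copy

print(np.shares_memory(a, sliced))
print(np.shares_memory(a, picked))

# Writing to the copy leaves a untouched
picked[:] = -1
print(a[1])

# Writing to the view changes a
sliced[:] = 10
print(a[1])
```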


If you want a complete copy of an array with no connection to the original, use `copy`.

In [None]:
d = a.copy()
d[0, 0] = 9999
print(d)
print(a)

[[9999    1    2]
 [  10   10   10]
 [  10   10   10]
 [   9   10   11]]
[[ 0  1  2]
 [10 10 10]
 [10 10 10]
 [ 9 10 11]]


**Challenge:** Write out what you think the following code will print.

In [None]:
ma = np.array([1, 2, 3, 4, 5, 6])
ma = ma.reshape((-1, 2))

mb = ma[1]
mb[:] = 90
mb = mb.reshape((2, 1))

print(mb)
print('---')
print(ma)

[[90]
 [90]]
---
[[ 1  2]
 [90 90]
 [ 5  6]]


## References

- [NumPy Quickstart](https://numpy.org/doc/stable/user/quickstart.html)