![logo](../../img/license_header_logo.png)
> **Copyright &copy; 2021 CertifAI Sdn. Bhd.**<br>
 <br>
This program and the accompanying materials are made available under the
terms of the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). <br>
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations
under the License. <br>
<br>**SPDX-License-Identifier: Apache-2.0**

# 05 - Manipulating ndarray 
Authored by: [Kian Yang Lee](https://github.com/KianYang-Lee) - kianyang.lee@certifai.ai

## <a name="description">Notebook Description</a>

This notebook discusses various methods for manipulating the shape of an `ndarray`.

By the end of this tutorial, you will be able to:

1. Modify shape of a `numpy ndarray`
2. Apply `-1` as argument when performing shape manipulation
3. Stack different `numpy ndarray`s together
4. Split `numpy ndarray` into smaller chunks

## Notebook Outline
Below is the outline for this tutorial:
1. [Notebook Description](#description)
2. [Notebook Configurations](#configuration)
3. [Modifying Shape](#shape)
4. [`-1` Argument](#-1)
5. [Stacking](#stack)
6. [Splitting](#split)
7. [Summary](#summary)
8. [Reference](#reference)

## <a name="configuration">Notebook Configurations</a>
This notebook relies only on the `numpy` module, a popular `python` library for numerical computation. It is conventionally imported under the alias `np`.

In [1]:
### BEGIN SOLUTION
import numpy as np
### END SOLUTION

## <a name="shape">Modifying Shape</a>
There are various ways to modify the shape of an `ndarray`. The common ones are demonstrated below. Each can be called either as a `numpy` function that takes the original `ndarray` as an argument, or as a method (or attribute, in the case of `.T`) of the `ndarray` itself. They return a `view` of the original array whenever possible rather than a `copy`; `ravel` and `reshape` fall back to a copy when the requested layout cannot be expressed as a view.
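Whether a result shares memory with the original array can be checked with `np.shares_memory`. The snippet below is a minimal sketch on a small illustrative array (`arr` and the other variable names are not part of the original notebook); it suggests that `reshape` and `.T` give views here, while `ravel` on a transposed array has to copy.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

# Reshaping a contiguous array can reuse the same buffer (a view).
reshaped = arr.reshape(3, 2)
shares_reshape = np.shares_memory(arr, reshaped)

# Transposing also returns a view, but one that is not C-contiguous.
transposed = arr.T
shares_transpose = np.shares_memory(arr, transposed)

# Flattening the non-contiguous transpose forces NumPy to copy the data.
raveled = transposed.ravel()
shares_ravel = np.shares_memory(arr, raveled)

# Writing through a view changes the original array as well.
reshaped[0, 0] = 99
first_element = arr[0, 0]

print(shares_reshape, shares_transpose, shares_ravel, first_element)
```

Because views share data, it is usually safer to call `.copy()` explicitly before modifying a reshaped result that should stay independent.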

In [2]:
# create an ndarray with random values
### BEGIN SOLUTION
np.random.seed(38)
arr_1 = np.random.rand(3, 5)
arr_1
### END SOLUTION

array([[0.38477312, 0.85970785, 0.94419964, 0.70282489, 0.6336341 ],
       [0.60596128, 0.20012684, 0.38738789, 0.25898316, 0.07460728],
       [0.28095697, 0.43843415, 0.48324904, 0.86848949, 0.52962938]])

In [3]:
# ravel method returns a contiguous flattened ndarray
### BEGIN SOLUTION
np.ravel(arr_1)
arr_1.ravel()
### END SOLUTION

array([0.38477312, 0.85970785, 0.94419964, 0.70282489, 0.6336341 ,
       0.60596128, 0.20012684, 0.38738789, 0.25898316, 0.07460728,
       0.28095697, 0.43843415, 0.48324904, 0.86848949, 0.52962938])

In [4]:
# reshape method returns an array with a modified shape
### BEGIN SOLUTION
arr_1.reshape(5, 3)
np.reshape(arr_1, (5, 3))
### END SOLUTION

array([[0.38477312, 0.85970785, 0.94419964],
       [0.70282489, 0.6336341 , 0.60596128],
       [0.20012684, 0.38738789, 0.25898316],
       [0.07460728, 0.28095697, 0.43843415],
       [0.48324904, 0.86848949, 0.52962938]])

In [5]:
# transpose method returns a transposed matrix
### BEGIN SOLUTION
arr_1.T
np.transpose(arr_1)
### END SOLUTION

array([[0.38477312, 0.60596128, 0.28095697],
       [0.85970785, 0.20012684, 0.43843415],
       [0.94419964, 0.38738789, 0.48324904],
       [0.70282489, 0.25898316, 0.86848949],
       [0.6336341 , 0.07460728, 0.52962938]])

To modify the `ndarray` in place instead, use the `resize` method.

In [6]:
### BEGIN SOLUTION
np.random.seed(38)
arr_2 = np.random.rand(2,5)
arr_2
### END SOLUTION

array([[0.38477312, 0.85970785, 0.94419964, 0.70282489, 0.6336341 ],
       [0.60596128, 0.20012684, 0.38738789, 0.25898316, 0.07460728]])

In [7]:
### BEGIN SOLUTION
arr_2.resize(5, 2) # in-place modifications
arr_2
### END SOLUTION

array([[0.38477312, 0.85970785],
       [0.94419964, 0.70282489],
       [0.6336341 , 0.60596128],
       [0.20012684, 0.38738789],
       [0.25898316, 0.07460728]])
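Note that the `resize` method and the `np.resize` function behave differently when the new shape holds more elements than the original. The following is a small sketch with illustrative arrays (`base` and `padded` are names chosen for this example); `refcheck=False` is passed only as a precaution, since the reference check can raise in some interactive sessions.

```python
import numpy as np

base = np.arange(4)

# np.resize returns a new array and repeats the data to fill extra slots.
repeated = np.resize(base, (2, 3))

# ndarray.resize changes the array in place and pads extra slots with zeros.
# refcheck=False skips the reference-count check; use it with care.
padded = np.arange(4)
padded.resize((2, 3), refcheck=False)

print(repeated)  # [[0 1 2], [3 0 1]]
print(padded)    # [[0 1 2], [3 0 0]]
```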

## <a name="-1">`-1` Argument</a>
Sometimes it is tedious to work out the exact size of every axis by hand. In that case, we can let `numpy` compute one of them: passing `-1` for an axis tells `numpy` to infer its size from the total number of elements and the sizes of the other axes, which must still be specified explicitly.

Example shown below:

In [8]:
# create a ndarray
### BEGIN SOLUTION
arr_3 = np.arange(20)
arr_3
### END SOLUTION

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

In [9]:
# create a ndarray with 2 columns
### BEGIN SOLUTION
arr_3.reshape(-1, 2)
### END SOLUTION

array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13],
       [14, 15],
       [16, 17],
       [18, 19]])

In [10]:
# create a ndarray with 4 columns
### BEGIN SOLUTION
arr_3.reshape(-1, 4)
### END SOLUTION

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

In [11]:
# create a ndarray with 5 rows
### BEGIN SOLUTION
arr_3.reshape(5, -1)
### END SOLUTION

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])
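The size that `-1` resolves to is simply the total number of elements divided by the product of the known sizes. As a rough sketch (the variable names here are illustrative only), the cell below reproduces that arithmetic and shows that the call fails when the sizes do not divide evenly.

```python
import numpy as np

arr = np.arange(20)

# The inferred size is the element count divided by the known sizes.
known_cols = 4
inferred_rows = arr.size // known_cols
same_shape = arr.reshape(-1, known_cols).shape == (inferred_rows, known_cols)

# 20 elements cannot be arranged into rows of 3, so NumPy raises ValueError.
try:
    arr.reshape(-1, 3)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(inferred_rows, same_shape)
print(error_message)
```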

## <a name="stack">Stacking</a>
`ndarray` can also be stacked on each other. Some common stacking methods are as below:

In [12]:
### BEGIN SOLUTION
arr_4 = np.arange(20).reshape(2, 10)
arr_4
### END SOLUTION

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])

In [13]:
### BEGIN SOLUTION
arr_5 = np.arange(start=20, stop=40).reshape(2, 10)
arr_5
### END SOLUTION

array([[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]])

In [14]:
# stacking vertically on first axis
### BEGIN SOLUTION
print(np.vstack((arr_4, arr_5)))
### END SOLUTION

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]]


In [15]:
# stacking horizontally on second axis
### BEGIN SOLUTION
print(np.hstack((arr_4, arr_5)))
### END SOLUTION

[[ 0  1  2  3  4  5  6  7  8  9 20 21 22 23 24 25 26 27 28 29]
 [10 11 12 13 14 15 16 17 18 19 30 31 32 33 34 35 36 37 38 39]]


In order to make sense of how the stacking actually works, one can take a look at the `shape` of `ndarray`. The axis with increased size is the axis that the stacking function is operating on. For example, `numpy.hstack` increases the size of second axis, while `numpy.vstack` increases the size of first axis.

In [16]:
### BEGIN SOLUTION
np.hstack((arr_4, arr_5)).shape
### END SOLUTION

(2, 20)

In [17]:
### BEGIN SOLUTION
np.vstack((arr_4, arr_5)).shape
### END SOLUTION

(4, 10)

`numpy.concatenate` allows for more control as users are able to specify the axis for which concatenation should happen.

In [18]:
# concat along dimension 1
### BEGIN SOLUTION
print(np.concatenate((arr_4, arr_5), 1)) 
### END SOLUTION

[[ 0  1  2  3  4  5  6  7  8  9 20 21 22 23 24 25 26 27 28 29]
 [10 11 12 13 14 15 16 17 18 19 30 31 32 33 34 35 36 37 38 39]]


In [19]:
# concat along dimension 0
### BEGIN SOLUTION
print(np.concatenate((arr_4, arr_5), 0)) 
### END SOLUTION

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]]
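For 2-D inputs such as the ones above, `vstack` and `hstack` should give the same result as `concatenate` along axis 0 and axis 1 respectively. The check below is a minimal sketch that rebuilds the two arrays under the shorter, illustrative names `a` and `b`.

```python
import numpy as np

a = np.arange(20).reshape(2, 10)
b = np.arange(20, 40).reshape(2, 10)

# vstack matches concatenation along the first axis (rows).
v_equal = np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0))

# hstack matches concatenation along the second axis (columns).
h_equal = np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1))

print(v_equal, h_equal)
```

Passing the axis by keyword (`axis=0`) rather than positionally tends to make the intent easier to read.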


## <a name="split">Splitting</a>
Users can also split one `ndarray` into several. `numpy.hsplit` splits an `ndarray` column-wise (along the second axis), while `numpy.vsplit` splits it row-wise (along the first axis). The second argument is either the number of equal-sized `ndarray`s to split into, or a sequence of indices at which the splits should occur.

In [20]:
### BEGIN SOLUTION
arr_6 = np.arange(36).reshape(6,6)
arr_6
### END SOLUTION

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])

In [21]:
# indexing the first ndarray produced by the hsplit function
### BEGIN SOLUTION
np.hsplit(arr_6, 2)[0]
### END SOLUTION

array([[ 0,  1,  2],
       [ 6,  7,  8],
       [12, 13, 14],
       [18, 19, 20],
       [24, 25, 26],
       [30, 31, 32]])

In [22]:
# indexing the second ndarray produced by the hsplit function
### BEGIN SOLUTION
np.hsplit(arr_6, 2)[1]
### END SOLUTION

array([[ 3,  4,  5],
       [ 9, 10, 11],
       [15, 16, 17],
       [21, 22, 23],
       [27, 28, 29],
       [33, 34, 35]])

In [23]:
# split arr_6 column-wise after the second and fourth columns (before indices 2 and 4)
### BEGIN SOLUTION
np.hsplit(arr_6, (2,4))
### END SOLUTION

[array([[ 0,  1],
        [ 6,  7],
        [12, 13],
        [18, 19],
        [24, 25],
        [30, 31]]),
 array([[ 2,  3],
        [ 8,  9],
        [14, 15],
        [20, 21],
        [26, 27],
        [32, 33]]),
 array([[ 4,  5],
        [10, 11],
        [16, 17],
        [22, 23],
        [28, 29],
        [34, 35]])]

In [24]:
# let's recall what arr_6 looks like
### BEGIN SOLUTION
arr_6
### END SOLUTION

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])

In [25]:
# indexing the first ndarray produced by the vsplit function
### BEGIN SOLUTION
print(np.vsplit(arr_6, 2)[0])
### END SOLUTION

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]]


In [26]:
# indexing the second ndarray produced by the vsplit function
### BEGIN SOLUTION
print(np.vsplit(arr_6, 2)[1])
### END SOLUTION

[[18 19 20 21 22 23]
 [24 25 26 27 28 29]
 [30 31 32 33 34 35]]


In [27]:
# split arr_6 row-wise after the second and fourth rows (before indices 2 and 4)
### BEGIN SOLUTION
np.vsplit(arr_6, (2,4))
### END SOLUTION

[array([[ 0,  1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10, 11]]),
 array([[12, 13, 14, 15, 16, 17],
        [18, 19, 20, 21, 22, 23]]),
 array([[24, 25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34, 35]])]
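When a sequence of indices is passed, each index acts as a slice boundary. The sketch below (using an illustrative copy of the array named `arr`) compares `hsplit` with the equivalent manual slices and shows that the boundaries do not have to be evenly spaced.

```python
import numpy as np

arr = np.arange(36).reshape(6, 6)

# Indices (2, 4) act as slice boundaries: [:2], [2:4], [4:].
parts = np.hsplit(arr, (2, 4))
expected = [arr[:, :2], arr[:, 2:4], arr[:, 4:]]
matches = all(np.array_equal(p, e) for p, e in zip(parts, expected))

# Boundaries need not be evenly spaced, so the pieces can differ in width.
uneven = np.hsplit(arr, (1, 5))
widths = [p.shape[1] for p in uneven]

print(matches, widths)
```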

The last splitting method introduced here is `numpy.array_split`. Similar to `concatenate`, it lets users specify the axis along which to split.

In [28]:
# splitting into 3 ndarrays along axis 0
### BEGIN SOLUTION
np.array_split(arr_6, 3, 0)
### END SOLUTION

[array([[ 0,  1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10, 11]]),
 array([[12, 13, 14, 15, 16, 17],
        [18, 19, 20, 21, 22, 23]]),
 array([[24, 25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34, 35]])]

In [29]:
# splitting into 3 ndarrays along axis 1
### BEGIN SOLUTION
np.array_split(arr_6, 3, 1)
### END SOLUTION

[array([[ 0,  1],
        [ 6,  7],
        [12, 13],
        [18, 19],
        [24, 25],
        [30, 31]]),
 array([[ 2,  3],
        [ 8,  9],
        [14, 15],
        [20, 21],
        [26, 27],
        [32, 33]]),
 array([[ 4,  5],
        [10, 11],
        [16, 17],
        [22, 23],
        [28, 29],
        [34, 35]])]
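Another practical difference is that `array_split` accepts a section count that does not divide the axis evenly, whereas `np.split` (on which `hsplit` and `vsplit` are built) raises an error in that case. A brief sketch with an illustrative 7-element array:

```python
import numpy as np

arr = np.arange(7)

# array_split tolerates a length that does not divide evenly;
# the earlier chunks receive the extra elements.
chunks = np.array_split(arr, 3)
sizes = [len(c) for c in chunks]

# np.split insists on an equal division and raises ValueError otherwise.
try:
    np.split(arr, 3)
    split_error = None
except ValueError as exc:
    split_error = str(exc)

print(sizes)
print(split_error)
```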

##  <a name="summary">Summary</a>
To conclude, you should now be able to:
1. Modify shape of a `numpy ndarray`
2. Apply `-1` as argument when performing shape manipulation
3. Stack different `numpy ndarray`s together
4. Split `numpy ndarray` into smaller chunks<br><br>
Congratulations, that concludes this lesson.    

## <a name="reference">Reference</a>
* [NumPy Quickstart](https://numpy.org/doc/stable/user/quickstart.html)