In [None]:
# Copyright (c) 2020-2021 CertifAI Sdn. Bhd.
# 
# This program is part of OSRFramework. You can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# 
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU Affero General Public License for more details.
# 
# You should have received a copy of the GNU Affero General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

# Array Indexing and Slicing

Array indexing and slicing are the ways Python accesses, modifies, or restructures the contents of data structures. <br> Both use square brackets (`[]`). Data structures that can be indexed and sliced include lists, tuples, and NumPy arrays.

In [2]:
import numpy as np
import sklearn.datasets as dataset

## Indexing
Indexing refers to accessing elements by their indices. Python, like most programming languages, uses zero-based indexing: the indices of an array with $n$ elements start at 0 and end at $n-1$. For example, the first element of an array has index 0.

### 1-Dimension Array

Let's access the **second element of the array** and store it in the variable **`elem`**.
<i>Reminder: the 2nd element corresponds to index 1.</i>

In [3]:
array_1d = np.array([1,2,3,4,5])
array_1d.shape
elem = array_1d[1]
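
Indexing can also be used to overwrite a value, not only to read it. The sketch below illustrates both, along with the first and last valid indices (0 and $n-1$). The array `readings` and the other variable names are hypothetical and are used only for this illustration, so that `array_1d` stays unchanged.

```python
import numpy as np

# `readings` is a hypothetical array used only for this illustration.
readings = np.array([10, 20, 30, 40, 50])

n = len(readings)
first = readings[0]       # index 0 is the first element
last = readings[n - 1]    # index n-1 is the last element

# An index on the left-hand side of an assignment overwrites that element.
readings[1] = 99

print(first, last)   # 10 50
print(readings)      # [10 99 30 40 50]
```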

### 2-Dimension Matrix

For arrays with 2 or more dimensions, indices are given from the highest dimension to the lowest. In a matrix, rows are a higher dimension than columns, so we specify the row that contains the element first, and then the column.<br><br>
Let's initialize a $3\times5$ matrix.

In [4]:
# 2d matrix
mat_2d = np.array([[1,4,5,6,7],
                   [2,4,8,5,9],
                   [0,5,4,7,2]])
mat_2d.shape

(3, 5)

Accessing the value **9**, which is the $5^{th}$ element of the $2^{nd}$ row.<br>
<i>Reminder: the $n^{th}$ element has index $n-1$.</i>

In [5]:
mat_2d[1][4]

9
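
NumPy also accepts all indices inside one pair of brackets, separated by commas, which gives the same element as the chained form. A minimal sketch, re-creating `mat_2d` so the block runs on its own; `chained` and `single` are names chosen for illustration.

```python
import numpy as np

mat_2d = np.array([[1, 4, 5, 6, 7],
                   [2, 4, 8, 5, 9],
                   [0, 5, 4, 7, 2]])

chained = mat_2d[1][4]   # select row 1 first, then element 4 of that row
single = mat_2d[1, 4]    # row and column given in one pair of brackets

print(chained, single)   # 9 9
```

The single-bracket form avoids building the intermediate row and is the form that slicing uses as well.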

### N-Dimension Tensor

For higher-dimensional tensors, indexing works the same way, from the highest dimension to the lowest. We can try this on a 3-dimension tensor by accessing the element located at depth = 1, row = 2, column = 3, which has the value `2`.

In [6]:
tensor_3d = np.array([[[1,2,3],
                       [3,2,2],
                       [6,4,1],
                       [3,4,5]],
                      [[4,6,7],
                       [5,6,7],
                       [6,6,6],
                       [8,8,8]],
                      [[6,0,1],
                       [2,3,6],
                       [4,5,1],
                       [2,3,7]]])
tensor_3d.shape

(3, 4, 3)

In [7]:
tensor_3d[0][1][2]

2
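
To see why the order runs from high to low dimensions, it can help to peel off one dimension at a time. The sketch below re-creates `tensor_3d` so it runs on its own; `block`, `row` and `value` are illustrative names, not part of the original notebook.

```python
import numpy as np

tensor_3d = np.array([[[1, 2, 3], [3, 2, 2], [6, 4, 1], [3, 4, 5]],
                      [[4, 6, 7], [5, 6, 7], [6, 6, 6], [8, 8, 8]],
                      [[6, 0, 1], [2, 3, 6], [4, 5, 1], [2, 3, 7]]])

block = tensor_3d[0]   # depth 1: a 4x3 matrix
row = block[1]         # row 2 of that matrix
value = row[2]         # column 3 of that row

print(block.shape)          # (4, 3)
print(row)                  # [3 2 2]
print(value)                # 2
print(tensor_3d[0, 1, 2])   # 2, the same element in one step
```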

## Slicing

Slicing is another way to access values, several at a time. In slicing, we use `[start:stop:step]` to indicate what we want to slice.<br>
1. `start`: the index at which the slice starts. The default is 0 if not specified.
2. `stop`: the index at which the slice stops. This index is not included in the sliced result. The default is `len(array)` if not specified.
3. `step`: the increment between consecutive indices. The default is 1 if not specified.<br>

Let's use the previously defined `array_1d` as an example. We want to slice the values from the $2^{nd}$ to the $4^{th}$ element. These sit at indices 1 to 3, so `stop` is 4.

In [10]:
print(array_1d[1:4])

[2 3 4]
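
The defaults for `start`, `stop` and `step` can be checked by leaving each one out in turn. A small sketch, re-creating `array_1d` so it is self-contained; the result names (`head`, `tail`, `every_other`, `stepped`) are chosen only for this example.

```python
import numpy as np

array_1d = np.array([1, 2, 3, 4, 5])

head = array_1d[:3]          # start omitted, so it begins at index 0
tail = array_1d[2:]          # stop omitted, so it runs to the end
every_other = array_1d[::2]  # start and stop omitted, step of 2
stepped = array_1d[1:5:2]    # indices 1 and 3

print(head)         # [1 2 3]
print(tail)         # [3 4 5]
print(every_other)  # [1 3 5]
print(stepped)      # [2 4]
```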


### 2-Dimension Matrix

In slicing, unlike the chained indexing used above, all dimensions are specified inside a single pair of square brackets. The start-stop-step for each dimension is separated by a `,` <br><br>
`[start1:stop1:step1, start2:stop2:step2]`.<br><br>As with indexing, the dimensions are ordered from high to low.

We'll slice `mat_2d` to obtain the last three elements of the first two rows: `rows = 1->2`, `columns = 3->5` when counting from 1, which become the zero-based slices `0:2` and `2:5`.

In [11]:
# 2D matrix slicing
mat_2d[0:2,2:5]

array([[5, 6, 7],
       [8, 5, 9]])
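
A bare `:` selects an entire dimension, and a slice can be mixed with a plain index or a step. The sketch below shows a few such combinations on a re-created `mat_2d`; the result names are illustrative only.

```python
import numpy as np

mat_2d = np.array([[1, 4, 5, 6, 7],
                   [2, 4, 8, 5, 9],
                   [0, 5, 4, 7, 2]])

first_col = mat_2d[:, 0]       # every row, column 0 (result is 1-D)
second_row = mat_2d[1, :]      # row 1, every column (result is 1-D)
stepped = mat_2d[::2, 1:4]     # rows 0 and 2, columns 1 to 3

print(first_col)    # [1 2 0]
print(second_row)   # [2 4 8 5 9]
print(stepped)
# [[4 5 6]
#  [5 4 7]]
```

Note that using a plain index for one dimension drops that dimension from the result, while using a slice keeps it.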

Slicing techniques are really handy when it comes to handling real datasets. Here, we are going to try and slice a dataset imported from sklearn.

In [19]:
data = dataset.load_breast_cancer()
print(data.DESCR)

.. _breast_cancer_dataset:

Breast cancer wisconsin (diagnostic) dataset
--------------------------------------------

**Data Set Characteristics:**

    :Number of Instances: 569

    :Number of Attributes: 30 numeric, predictive attributes and the class

    :Attribute Information:
        - radius (mean of distances from center to points on the perimeter)
        - texture (standard deviation of gray-scale values)
        - perimeter
        - area
        - smoothness (local variation in radius lengths)
        - compactness (perimeter^2 / area - 1.0)
        - concavity (severity of concave portions of the contour)
        - concave points (number of concave portions of the contour)
        - symmetry
        - fractal dimension ("coastline approximation" - 1)

        The mean, standard error, and "worst" or largest (mean of the three
        worst/largest values) of these features were computed for each image,
        resulting in 30 features.  For instance, field 0 is Mean Radi

`dataset.load_breast_cancer()` returns a dataset containing 569 breast cancer instances and the corresponding attributes/characteristics *(e.g., area, perimeter, and smoothness)* of each tumor.

For demonstration purposes, let's say that a researcher only wants the first 5 attributes of the first 50 samples. Slicing makes this easy.

In [13]:
# data.data holds the features (attributes) of all 569 breast cancer instances
X = data.data 
# data.target holds the label (0 or 1) of each instance,
# showing whether the tumor is malignant (0) or benign (1)
y = data.target

# Check the full shapes before slicing down to 50 samples and the first 5 attributes
X.shape, y.shape

((569, 30), (569,))

In [16]:
X = X[0:50,0:5]
y = y[0:50]

In [17]:
print(X.shape)
print(len(y))

(50, 5)
50
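
When features and labels are sliced separately, both must use the same row range so that each label still belongs to its sample. One way to make this explicit is to store the range in a `slice` object and reuse it. The sketch below uses placeholder arrays with the same shapes as the breast cancer data rather than the real measurements; `features`, `labels`, `rows`, `X_small` and `y_small` are hypothetical names.

```python
import numpy as np

# Placeholder arrays shaped like the breast cancer data:
# 569 samples x 30 features, and 569 labels. The values are not real data.
features = np.arange(569 * 30).reshape(569, 30)
labels = np.arange(569) % 2

rows = slice(0, 50)              # equivalent to writing 0:50 inside the brackets
X_small = features[rows, 0:5]    # first 50 samples, first 5 attributes
y_small = labels[rows]           # labels for the same 50 samples

print(X_small.shape, y_small.shape)   # (50, 5) (50,)
```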


## Exercise

1. Initialize a random 3-dimensional tensor with shape `(5, 3, 4)` as **`t_1`**.<br>
Print out `t_1`.<br>
<i>Hint: use `np.random.rand()`</i>

In [18]:
t_1 = np.random.rand(5,3,4)
print(t_1)

[[[0.61263696 0.53370284 0.74737916 0.71485898]
  [0.28819845 0.12920936 0.56991834 0.99955624]
  [0.69057516 0.02568579 0.48194794 0.52605437]]

 [[0.04873142 0.92735274 0.5287637  0.90389785]
  [0.96709623 0.96066719 0.56698611 0.78494301]
  [0.55177537 0.76763515 0.4399508  0.24230362]]

 [[0.50927244 0.32257385 0.60250987 0.3687013 ]
  [0.53895145 0.14641481 0.75668748 0.05168186]
  [0.27855938 0.76249359 0.33111207 0.14115507]]

 [[0.10657944 0.4451757  0.05375945 0.12566576]
  [0.35420905 0.57058442 0.36785375 0.90403231]
  [0.32689537 0.67290278 0.56058587 0.15957834]]

 [[0.42411558 0.60763741 0.84964214 0.23079858]
  [0.82072962 0.72075187 0.68063451 0.62713932]
  [0.2752349  0.2150857  0.29600081 0.45597582]]]


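2. Index the elements at these positions (counted from 1, so subtract 1 from each to get the index). Because `t_1` is random, the printed values change on every run.
<ul>
    <li> column = 2, row = 2, depth = 1
    <li> column = 3, row = 1, depth = 5
</ul>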

In [14]:
print(t_1[0][1][1])
print(t_1[4][0][2])

0.40096134726406774
0.8789875876196123


3. Load the iris dataset from `sklearn.datasets`. Slice the dataset down to the first 3 attributes and the first 30 instances.

In [15]:
import sklearn.datasets as dataset
data = dataset.load_iris()
X = data.data[:30,:3]

In [16]:
X.shape

(30, 3)

In [17]:
print(X)

[[5.1 3.5 1.4]
 [4.9 3.  1.4]
 [4.7 3.2 1.3]
 [4.6 3.1 1.5]
 [5.  3.6 1.4]
 [5.4 3.9 1.7]
 [4.6 3.4 1.4]
 [5.  3.4 1.5]
 [4.4 2.9 1.4]
 [4.9 3.1 1.5]
 [5.4 3.7 1.5]
 [4.8 3.4 1.6]
 [4.8 3.  1.4]
 [4.3 3.  1.1]
 [5.8 4.  1.2]
 [5.7 4.4 1.5]
 [5.4 3.9 1.3]
 [5.1 3.5 1.4]
 [5.7 3.8 1.7]
 [5.1 3.8 1.5]
 [5.4 3.4 1.7]
 [5.1 3.7 1.5]
 [4.6 3.6 1. ]
 [5.1 3.3 1.7]
 [4.8 3.4 1.9]
 [5.  3.  1.6]
 [5.  3.4 1.6]
 [5.2 3.5 1.5]
 [5.2 3.4 1.4]
 [4.7 3.2 1.6]]
