# Array Indexing and Slicing

Array indexing and slicing are methods used in Python to access, modify, or restructure the contents of data structures. <br> Both use square brackets (`[]`). Data structures that can be indexed and sliced include lists, tuples, and NumPy arrays.

In [1]:
import numpy as np
import sklearn.datasets as dataset

## Indexing
Indexing refers to accessing elements through their indices. Most programming languages, including Python, use zero-based indexing. That is, the indices of an array with $n$ elements start at 0 and end at $n-1$. For example, the first element of an array has index 0.
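As a quick check of the zero-based rule, here is a minimal sketch using a plain Python list. The names `values`, `first`, and `last` are illustrative only and are not used elsewhere in this notebook.

```python
# A plain Python list is enough to show zero-based indexing.
values = [10, 20, 30, 40, 50]
n = len(values)

first = values[0]      # the first element lives at index 0
last = values[n - 1]   # the last element lives at index n - 1

print(first, last)
```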

### 1-Dimension Array

Let's access the **second element of the array** and assign it to the variable **`elem`**.
<i>Reminder: the 2nd element corresponds to index 1.</i>

In [2]:
array_1d = np.array([1,2,3,4,5])
array_1d.shape
elem = array_1d[1]

### 2-Dimension Matrix

For arrays with 2 or more dimensions (2-d tensors and above), indices are given from the highest dimension to the lowest. In a matrix, the row is a higher dimension than the column, so we first specify the row the element is in, and then the column.<br><br>
Let's initialize a $3\times5$ matrix.

In [3]:
# 2d matrix
mat_2d = np.array([[1,4,5,6,7],
                   [2,4,8,5,9],
                   [0,5,4,7,2]])
mat_2d.shape

(3, 5)

Accessing the value **9**, which is the $5^{th}$ element of the $2^{nd}$ row.<br>
<i>Reminder: index = n-1</i>

In [4]:
mat_2d[1][4]

9
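NumPy arrays also accept the row and column indices inside a single pair of brackets, separated by a comma. The sketch below rebuilds the same matrix so that it runs on its own; the names `chained` and `combined` are illustrative.

```python
import numpy as np

mat_2d = np.array([[1, 4, 5, 6, 7],
                   [2, 4, 8, 5, 9],
                   [0, 5, 4, 7, 2]])

chained = mat_2d[1][4]    # select row 1 first, then element 4 of that row
combined = mat_2d[1, 4]   # give the row and column in one pair of brackets

print(chained, combined)
```

Both forms return the same element here. The single-bracket form is the one that extends to slicing several dimensions at once.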

### N-Dimension Tensor

For higher-dimensional tensors, indexing works the same way, from the highest dimension to the lowest. We can try this on a 3-dimensional tensor by accessing the element located at depth = 1, row = 2, column = 3, which has the value `2`.

In [5]:
tensor_3d = np.array([[[1,2,3],
                       [3,2,2],
                       [6,4,1],
                       [3,4,5]],
                      [[4,6,7],
                       [5,6,7],
                       [6,6,6],
                       [8,8,8]],
                      [[6,0,1],
                       [2,3,6],
                       [4,5,1],
                       [2,3,7]]])
tensor_3d.shape

(3, 4, 3)

In [6]:
tensor_3d[0][1][2]

2
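To make the link between the shape and the index order explicit, the following sketch unpacks the shape into `depth`, `rows`, and `cols` (names chosen here for illustration) and indexes the same element with comma-separated indices. It also shows that supplying fewer indices than dimensions returns a sub-array rather than a single value.

```python
import numpy as np

tensor_3d = np.array([[[1, 2, 3], [3, 2, 2], [6, 4, 1], [3, 4, 5]],
                      [[4, 6, 7], [5, 6, 7], [6, 6, 6], [8, 8, 8]],
                      [[6, 0, 1], [2, 3, 6], [4, 5, 1], [2, 3, 7]]])

# The shape is ordered the same way as the indices: depth, then row, then column.
depth, rows, cols = tensor_3d.shape
print(depth, rows, cols)

# depth = 1, row = 2, column = 3 becomes indices 0, 1, 2.
element = tensor_3d[0, 1, 2]
print(element)

# Indexing only the first dimension returns the whole matrix at that depth.
first_matrix = tensor_3d[0]
print(first_matrix.shape)
```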

## Slicing

Slicing is another way to access values, several at a time. In slicing, we use `[start:stop:step]` to indicate what we want to slice.<br>
1. `start`: the index at which slicing starts. The default is 0 if not specified.
2. `stop`: the index at which slicing stops. This index is not included in the sliced result. The default is `len(array)` if not specified.
3. `step`: how many indices to advance at each step. The default is 1 if not specified.<br>

Let's use the predefined `array_1d` as an example. We want to slice the values from the $3^{rd}$ to the $5^{th}$ element, that is, indices 2 to 4. Since `stop` is excluded, we pass 5 as the stop index.

In [7]:
print(array_1d[2:5])

[3 4 5]
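The defaults for `start`, `stop`, and `step` can be checked by leaving each one out in turn. This is a small sketch that redefines `array_1d` so it runs on its own.

```python
import numpy as np

array_1d = np.array([1, 2, 3, 4, 5])

print(array_1d[:3])    # start omitted, so slicing begins at index 0
print(array_1d[2:])    # stop omitted, so slicing runs to the end of the array
print(array_1d[::2])   # step of 2 takes every other element
print(array_1d[1:4])   # indices 1, 2, 3; the stop index 4 is excluded
```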


### 2-Dimension Matrix

In slicing, unlike the chained indexing shown above, all dimensions are specified inside a single pair of square brackets. The start-stop-step for each dimension are separated by a `,` <br><br>
`[start1:stop1:step1, start2:stop2:step2]`.<br><br>As with indexing, the dimensions are ordered from high to low.

We'll slice `mat_2d` to obtain the last three elements of the first two rows: `rows = 1->2`, `columns = 3->5`.

In [8]:
# 2D matrix slicing
mat_2d[0:2,2:5]

array([[5, 6, 7],
       [8, 5, 9]])
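The same defaults apply per dimension, so a bare `:` selects everything along that dimension. The sketch below, with illustrative variable names, pulls out one full column and then every other column of the first two rows.

```python
import numpy as np

mat_2d = np.array([[1, 4, 5, 6, 7],
                   [2, 4, 8, 5, 9],
                   [0, 5, 4, 7, 2]])

# All rows, column index 0: a 1-d array holding the first column.
first_column = mat_2d[:, 0]
print(first_column)

# First two rows, every other column (column indices 0, 2, 4).
every_other_col = mat_2d[0:2, ::2]
print(every_other_col)
```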

Slicing techniques are really handy when it comes to handling real datasets. Here, we are going to try and slice a dataset imported from sklearn.

In [9]:
data = dataset.load_breast_cancer()

For demonstration purposes, let's say that a researcher only wants the first 50 samples and the first 5 attributes. Slicing makes this easy.

In [10]:
X = data.data
y = data.target

# We only want 50 samples and the first 5 attributes of the data
X.shape, y.shape

((569, 30), (569,))

In [11]:
X = X[0:50, 0:5]  # rows 0-49 (50 samples), columns 0-4 (5 attributes)
y = y[0:50]       # the matching 50 labels

In [12]:
print(len(X))
print(len(y))

50
50
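Because the stop index is excluded, a slice `start:stop` with the default step returns `stop - start` items. The sketch below illustrates this on a placeholder array of zeros that only mimics the shape of the breast cancer data; `X_full`, `y_full`, `X_small`, and `y_small` are hypothetical names, and no real dataset is loaded.

```python
import numpy as np

# Placeholder arrays with the same shapes as the feature matrix and target above.
X_full = np.zeros((569, 30))
y_full = np.zeros(569)

# 50 - 0 = 50 rows and 5 - 0 = 5 columns.
X_small = X_full[:50, :5]
y_small = y_full[:50]

print(X_small.shape, y_small.shape)
```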


## Exercise

1. Initialize a random tensor with 3 dimensions with `shape: 5,3,4` as **`t_1`**.<br>
Print out `t_1`.<br>
<i> Hint: use `np.random.rand()`</i>

In [13]:
t_1 = np.random.rand(5,3,4)
print(t_1)

[[[0.3700905  0.14111316 0.54295378 0.02864503]
  [0.72032326 0.40096135 0.61231836 0.85396175]
  [0.18921522 0.43302193 0.01705158 0.37309041]]

 [[0.66907635 0.51175778 0.28471508 0.99631568]
  [0.28098545 0.26556543 0.03030671 0.69184659]
  [0.4496471  0.7299634  0.81131992 0.13790986]]

 [[0.20831324 0.73748997 0.35089648 0.0947607 ]
  [0.47417513 0.97667316 0.97539758 0.67948589]
  [0.2432138  0.39900329 0.56654368 0.92470028]]

 [[0.52392922 0.66920179 0.56307211 0.08973273]
  [0.37106134 0.08180103 0.61906946 0.35747787]
  [0.42385947 0.75037602 0.40110625 0.31725548]]

 [[0.32502916 0.98020135 0.87898759 0.75420282]
  [0.92275826 0.38140878 0.83514057 0.90349204]
  [0.85688373 0.04278598 0.43721119 0.53021002]]]


2. Index the elements at these positions.
<ul>
    <li> column = 2, row = 2, depth = 1</li>
    <li> column = 3, row = 1, depth = 5</li>
</ul>

In [14]:
print(t_1[0][1][1])
print(t_1[4][0][2])

0.40096134726406774
0.8789875876196123


3. Load the iris dataset from `sklearn.datasets`. Slice the dataset down to the first 3 attributes and the first 30 instances.

In [15]:
import sklearn.datasets as dataset
data = dataset.load_iris()
X = data.data[:30,:3]

In [16]:
X.shape

(30, 3)

In [17]:
print(X)

[[5.1 3.5 1.4]
 [4.9 3.  1.4]
 [4.7 3.2 1.3]
 [4.6 3.1 1.5]
 [5.  3.6 1.4]
 [5.4 3.9 1.7]
 [4.6 3.4 1.4]
 [5.  3.4 1.5]
 [4.4 2.9 1.4]
 [4.9 3.1 1.5]
 [5.4 3.7 1.5]
 [4.8 3.4 1.6]
 [4.8 3.  1.4]
 [4.3 3.  1.1]
 [5.8 4.  1.2]
 [5.7 4.4 1.5]
 [5.4 3.9 1.3]
 [5.1 3.5 1.4]
 [5.7 3.8 1.7]
 [5.1 3.8 1.5]
 [5.4 3.4 1.7]
 [5.1 3.7 1.5]
 [4.6 3.6 1. ]
 [5.1 3.3 1.7]
 [4.8 3.4 1.9]
 [5.  3.  1.6]
 [5.  3.4 1.6]
 [5.2 3.5 1.5]
 [5.2 3.4 1.4]
 [4.7 3.2 1.6]]
