![*INTERTECHNICA - SOLON EDUCATIONAL PROGRAMS - TECHNOLOGY LINE*](https://solon.intertechnica.com/assets/IntertechnicaSolonEducationalPrograms-TechnologyLine.png)

# Python for Data Processing - Numpy Advanced Indexing and Slicing

*Numpy allows advanced indexing via integer arrays and boolean conditions. This differs from basic indexing, which uses only integers, slices, or tuples of them.*

## 1. Advanced Indexing

Let's initialize numpy and create some working data.

In [1]:
import numpy as np

x_1d = np.array([ 1,   2,   3,   4,   5,   6,   7,   8,   9,  10])

x_2d = np.array(
      [[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10],
       [ 11,  12,  13,  14,  15,  16,  17,  18,  19,  20],
       [ 21,  22,  23,  24,  25,  26,  27,  28,  29,  30],
       [ 31,  32,  33,  34,  35,  36,  37,  38,  39,  40],
       [ 41,  42,  43,  44,  45,  46,  47,  48,  49,  50],
       [ 51,  52,  53,  54,  55,  56,  57,  58,  59,  60],
       [ 61,  62,  63,  64,  65,  66,  67,  68,  69,  70],
       [ 71,  72,  73,  74,  75,  76,  77,  78,  79,  80],
       [ 81,  82,  83,  84,  85,  86,  87,  88,  89,  90],
       [ 91,  92,  93,  94,  95,  96,  97,  98,  99, 100]])

Numpy uses two types of advanced indexing: integer indexing and boolean indexing.

### Advanced integer indexing

This kind of indexing uses integer arrays to select multiple elements of an array. An array of indexes can be used to obtain a subset of the elements of a one-dimensional array:

In [2]:
odd_elements_indexes = [0,2,4,6,8]

print("The odd elements of the array, selected via the array of indexes {}, are {}".format(
    odd_elements_indexes,
    x_1d[odd_elements_indexes]
))

The odd elements of the array, selected via the array of indexes [0, 2, 4, 6, 8], are [1 3 5 7 9]


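We can also use several arrays of indexes, one per dimension, to access multiple elements of a multi-dimensional array. The index arrays are paired element by element, so row indexes [0, 1, 2] and column indexes [0, 1, 2] select the elements at positions (0, 0), (1, 1) and (2, 2).  
In our case we can apply this to the two-dimensional array created above: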

In [3]:
index_row = [0, 1, 2]
index_column = [0, 1, 2]
print("The elements at index row {} and index column {} are {}".format(
    index_row,
    index_column,
    x_2d[index_row, index_column]
))

The elements at index row [0, 1, 2] and index column [0, 1, 2] are [ 1 12 23]


In [4]:
index_row = [-1, -2, -3]
index_column = [-1, -2, -3]
print("The elements at index row {} and index column {} are {}".format(
    index_row,
    index_column,
    x_2d[index_row, index_column]
))

The elements at index row [-1, -2, -3] and index column [-1, -2, -3] are [100  89  78]
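
Because the index arrays are paired element by element, the examples above return one element per (row, column) pair rather than a rectangular block. As a minimal sketch of the difference, the snippet below compares paired indexing with `np.ix_`, which combines every row index with every column index. The names `rows`, `cols`, `paired` and `block` are illustrative, and the array is rebuilt with `np.arange` so that the snippet runs on its own.

```python
import numpy as np

# Same layout as x_2d in the text: values 1..100 arranged in 10 rows.
x_2d = np.arange(1, 101).reshape(10, 10)

rows = [0, 1, 2]
cols = [0, 1, 2]

# Paired indexing: selects (0, 0), (1, 1), (2, 2), one element per pair.
paired = x_2d[rows, cols]

# np.ix_ builds an open mesh, so every row is combined with every column.
block = x_2d[np.ix_(rows, cols)]

print(paired)  # [ 1 12 23]
print(block)   # a 3 x 3 block taken from the top-left corner
```

For a contiguous block such as this one, the slice `x_2d[0:3, 0:3]` gives the same values; `np.ix_` is mostly useful when the rows or columns are not contiguous.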


### Advanced boolean indexing

This kind of indexing occurs when **using arrays of boolean type elements for indexing**, such as those returned by comparison operators. The array itself can appear in the boolean expression, in which case the condition is evaluated for every element.  
In this case **only the array elements whose index value is True will be selected**.

In [5]:
boolean_index_array = [True, False,  False,  False,  False,  False,  False,  False,  False, True]
x_1d[boolean_index_array]

print("The elements selected by the boolean filter {} are {}".format(
    boolean_index_array,
    x_1d[boolean_index_array]
))

The elements selected by the boolean filter [True, False, False, False, False, False, False, False, False, True] are [ 1 10]


In [6]:
print("The even elements in the array {} are {}".format(
    x_1d,
    x_1d[x_1d % 2 == 0]
))

The even elements in the array [ 1  2  3  4  5  6  7  8  9 10] are [ 2  4  6  8 10]
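
Several conditions can be combined into a single boolean index. The sketch below assumes the same one-dimensional data as above (rebuilt here with `np.arange`) and uses the element-wise operators `&` (and), `|` (or) and `~` (not); the result names are illustrative.

```python
import numpy as np

x_1d = np.arange(1, 11)  # [ 1  2 ... 10], same values as in the text

# Each comparison yields a boolean array; & and | combine them element-wise.
# The parentheses are required because & and | bind tighter than comparisons.
even_and_large = x_1d[(x_1d % 2 == 0) & (x_1d > 4)]
small_or_large = x_1d[(x_1d < 3) | (x_1d > 8)]

# ~ inverts a boolean array.
not_even = x_1d[~(x_1d % 2 == 0)]

print(even_and_large)  # [ 6  8 10]
print(small_or_large)  # [ 1  2  9 10]
print(not_even)        # [1 3 5 7 9]
```

The Python keywords `and`, `or` and `not` do not work here, since they try to reduce the whole array to a single truth value.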


In [7]:
print("The powers of 2 in the array \n {} \n are: \n {}".format(
    x_2d,
    x_2d[np.log2(x_2d).astype(int)  == np.log2(x_2d)]
))

The powers of 2 in the array 
 [[  1   2   3   4   5   6   7   8   9  10]
 [ 11  12  13  14  15  16  17  18  19  20]
 [ 21  22  23  24  25  26  27  28  29  30]
 [ 31  32  33  34  35  36  37  38  39  40]
 [ 41  42  43  44  45  46  47  48  49  50]
 [ 51  52  53  54  55  56  57  58  59  60]
 [ 61  62  63  64  65  66  67  68  69  70]
 [ 71  72  73  74  75  76  77  78  79  80]
 [ 81  82  83  84  85  86  87  88  89  90]
 [ 91  92  93  94  95  96  97  98  99 100]] 
 are: 
 [ 1  2  4  8 16 32 64]
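
The logarithm test above relies on floating point results being exact, which holds for this data. As an alternative sketch that stays in integer arithmetic, a positive integer is a power of 2 exactly when it has a single bit set, that is when `x & (x - 1)` equals 0. The name `is_power_of_2` is illustrative.

```python
import numpy as np

x_2d = np.arange(1, 101).reshape(10, 10)  # same values as x_2d in the text

# A positive integer with a single bit set satisfies x & (x - 1) == 0.
# The x > 0 condition excludes zero and negative values.
is_power_of_2 = (x_2d > 0) & ((x_2d & (x_2d - 1)) == 0)

powers = x_2d[is_power_of_2]
print(powers)  # [ 1  2  4  8 16 32 64]
```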


Boolean selection is **a very powerful tool for filtering and cleaning up data**.

A standard usage is removal of empty elements:

In [8]:
x_1d_with_empty_data = np.array([ None,   2,   None,   4,   5,   None,   7,   None,   None,  10])

print("Non-empty elements in the array \n {} \n are: \n {}".format(
    x_1d_with_empty_data,
    x_1d_with_empty_data[x_1d_with_empty_data != None]
))

Non-empty elements in the array 
 [None 2 None 4 5 None 7 None None 10] 
 are: 
 [2 4 5 7 10]
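
In numeric arrays, missing values are more often represented as NaN than as None. A minimal sketch of the same clean-up for that case, assuming a float array with illustrative names, uses `np.isnan` to build the boolean index:

```python
import numpy as np

# Float array in which np.nan marks the missing entries.
x_with_nan = np.array([np.nan, 2.0, np.nan, 4.0, 5.0,
                       np.nan, 7.0, np.nan, np.nan, 10.0])

# np.isnan is True where the value is missing; ~ inverts the mask.
valid = x_with_nan[~np.isnan(x_with_nan)]

print(valid)  # [ 2.  4.  5.  7. 10.]
```

A comparison such as `x_with_nan != np.nan` would not work here, because NaN is not equal to anything, including itself.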


Another standard usage is **removal of invalid data**, for example non-numeric values from an array of strings:

In [9]:
x_1d_with_invalid_data = np.array([ "1",   "2",   "three",   "4",   "Invalid",   "6",   "7",   "Eight",   "maybe 9",  10])

print("Numeric elements in the array \n {} \n are: \n {}".format(
    x_1d_with_invalid_data,
    x_1d_with_invalid_data[np.char.isnumeric(x_1d_with_invalid_data)]
))

Numeric elements in the array 
 ['1' '2' 'three' '4' 'Invalid' '6' '7' 'Eight' 'maybe 9' '10'] 
 are: 
 ['1' '2' '4' '6' '7' '10']
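
Once the invalid entries are filtered out, the remaining strings can be converted to numbers. The sketch below uses illustrative data and names; it also shows that `np.char.isnumeric` only accepts strings made entirely of numeric characters, so values such as "1.5" or "-3" are rejected.

```python
import numpy as np

raw = np.array(["1", "2", "three", "4", "1.5", "-3", "10"])

# True only for strings consisting entirely of numeric characters.
mask = np.char.isnumeric(raw)

# Keep the valid strings and convert them to integers.
numbers = raw[mask].astype(int)

print(mask)     # [ True  True False  True False False  True]
print(numbers)  # [ 1  2  4 10]
```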


## 2. Slicing

*Numpy allows the usage of sub-arrays via the slice notation. The process of using sub-arrays via the slice notation is called slicing.*

The standard notation for slicing is [start:stop:step] where:

* **start** is the start index of the slice. By default the start is 0;
* **stop** is the stop index of the slice (the element at the stop index is not included). By default the slice runs to the end of the array;
* **step** is the increment from one index to the next. By default the step is 1.

This will allow selecting a sub-array from the start index to end index using a specified step. This slice notation can be applied to multiple dimensions as well.
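
The snippet below is a short sketch of how omitted slice parts fall back to their defaults, and of a slice applied to two dimensions at once. It rebuilds the working arrays with `np.arange` so that it is self-contained; the result names are illustrative.

```python
import numpy as np

x_1d = np.arange(1, 11)                   # [ 1  2 ... 10]
x_2d = np.arange(1, 101).reshape(10, 10)  # values 1..100 in 10 rows

first_three = x_1d[:3]            # start omitted: begins at index 0
from_seventh = x_1d[7:]           # stop omitted: runs to the end
every_third = x_1d[::3]           # start and stop omitted, step 3
explicit = x_1d[slice(1, 8, 2)]   # same as x_1d[1:8:2]

# One slice per dimension: the first two rows, every fifth column.
corner = x_2d[:2, ::5]

print(first_three)   # [1 2 3]
print(from_seventh)  # [ 8  9 10]
print(every_third)   # [ 1  4  7 10]
print(explicit)      # [2 4 6 8]
print(corner)        # [[ 1  6] [11 16]] printed on two rows
```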

In [10]:
elements_length = 3
print("The first\n {} elements from array \n {} \n are: \n{}".format(
    elements_length,
    x_1d,
    x_1d[0:elements_length]
))

The first
 3 elements from array 
 [ 1  2  3  4  5  6  7  8  9 10] 
 are: 
[1 2 3]


In [11]:
print("The array \n {} \n without first and last element is: \n{}".format(
    x_1d,
    x_1d[1:-1]
))

The array 
 [ 1  2  3  4  5  6  7  8  9 10] 
 without first and last element is: 
[2 3 4 5 6 7 8 9]


In [12]:
print("The first row from array \n {} \n is: \n{}".format(
    x_2d,
    x_2d[0,:]    
))

The first row from array 
 [[  1   2   3   4   5   6   7   8   9  10]
 [ 11  12  13  14  15  16  17  18  19  20]
 [ 21  22  23  24  25  26  27  28  29  30]
 [ 31  32  33  34  35  36  37  38  39  40]
 [ 41  42  43  44  45  46  47  48  49  50]
 [ 51  52  53  54  55  56  57  58  59  60]
 [ 61  62  63  64  65  66  67  68  69  70]
 [ 71  72  73  74  75  76  77  78  79  80]
 [ 81  82  83  84  85  86  87  88  89  90]
 [ 91  92  93  94  95  96  97  98  99 100]] 
 is: 
[ 1  2  3  4  5  6  7  8  9 10]


In [13]:
print("The first column from array \n {} \n is: \n{}".format(
    x_2d,
    x_2d[:,0]    
))

The first column from array 
 [[  1   2   3   4   5   6   7   8   9  10]
 [ 11  12  13  14  15  16  17  18  19  20]
 [ 21  22  23  24  25  26  27  28  29  30]
 [ 31  32  33  34  35  36  37  38  39  40]
 [ 41  42  43  44  45  46  47  48  49  50]
 [ 51  52  53  54  55  56  57  58  59  60]
 [ 61  62  63  64  65  66  67  68  69  70]
 [ 71  72  73  74  75  76  77  78  79  80]
 [ 81  82  83  84  85  86  87  88  89  90]
 [ 91  92  93  94  95  96  97  98  99 100]] 
 is: 
[ 1 11 21 31 41 51 61 71 81 91]


In [14]:
center_length = 2  # number of elements taken on each side of the center
row_center = x_2d.shape[0] // 2
column_center = x_2d.shape[1] // 2
print("The center sub-array extending {} elements on each side of the center \n from array \n {} \n is: \n{}".format(
    center_length,
    x_2d,
    x_2d[row_center - center_length: row_center + center_length,
         column_center - center_length: column_center + center_length]
))

The center sub-array extending 2 elements on each side of the center 
 from array 
 [[  1   2   3   4   5   6   7   8   9  10]
 [ 11  12  13  14  15  16  17  18  19  20]
 [ 21  22  23  24  25  26  27  28  29  30]
 [ 31  32  33  34  35  36  37  38  39  40]
 [ 41  42  43  44  45  46  47  48  49  50]
 [ 51  52  53  54  55  56  57  58  59  60]
 [ 61  62  63  64  65  66  67  68  69  70]
 [ 71  72  73  74  75  76  77  78  79  80]
 [ 81  82  83  84  85  86  87  88  89  90]
 [ 91  92  93  94  95  96  97  98  99 100]] 
 is: 
[[34 35 36 37]
 [44 45 46 47]
 [54 55 56 57]
 [64 65 66 67]]
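
One practical difference between slicing and the advanced indexing from section 1 is worth keeping in mind: a slice is a view onto the original data, while advanced indexing returns a copy. The sketch below illustrates this with `np.shares_memory`; the names `sliced` and `picked` are illustrative.

```python
import numpy as np

x = np.arange(1, 11)

sliced = x[1:4]        # basic slicing: a view onto x
picked = x[[1, 2, 3]]  # advanced integer indexing: an independent copy

print(np.shares_memory(x, sliced))  # True
print(np.shares_memory(x, picked))  # False

# Writing through the view changes the original array ...
sliced[0] = -1
# ... while writing to the copy leaves it untouched.
picked[1] = -99

print(x[:4])  # [ 1 -1  3  4]
```

When an independent sub-array is needed from a slice, calling `.copy()` on it avoids modifying the original by accident.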


It is important to note that using a negative step **will select the data from the end to the beginning** (practically reversing the array when start and stop are omitted):

In [15]:
print("The array \n {} \n reversed is: \n {}".format(
    x_1d,
    x_1d[::-1]
))

The array 
 [ 1  2  3  4  5  6  7  8  9 10] 
 reversed is: 
 [10  9  8  7  6  5  4  3  2  1]
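
A negative step can also be combined with explicit start and stop values, in which case the start must lie after the stop. The following sketch, with illustrative result names, shows a few such cases on the same data:

```python
import numpy as np

x_1d = np.arange(1, 11)                   # [ 1  2 ... 10]
x_2d = np.arange(1, 101).reshape(10, 10)  # values 1..100 in 10 rows

# From index 8 down to (but not including) index 2, two at a time.
backwards = x_1d[8:2:-2]

# The last three elements in reverse order.
last_three_reversed = x_1d[:-4:-1]

# Reverse both the rows and the columns of the two-dimensional array.
flipped = x_2d[::-1, ::-1]

print(backwards)            # [9 7 5]
print(last_three_reversed)  # [10  9  8]
print(flipped[0, :3])       # [100  99  98]
```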
