___

<a href='http://www.pieriandata.com'> <img src='../Pierian_Data_Logo.png' /></a>
___

# NumPy Indexing and Selection

In this lecture we will discuss how to select elements or groups of elements from an array.

In [36]:
import numpy as np

In [37]:
#Creating sample array
arr = np.arange(0,11)

In [38]:
#Show
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

## Bracket Indexing and Selection
The simplest way to pick one element or a range of elements from an array looks very similar to indexing a Python list:

In [39]:
#Get a value at an index
arr[8]

# this returns the value at index 8, which happens to be 8 itself
# arrays behave much like lists here, so slice notation (the colon) works too

8

In [40]:
#Get values in a range
arr[1:5]

# start at index 1 (inclusive) and stop before index 5 (exclusive)
# indexing starts at 0, so this returns the elements at indices 1 through 4

array([1, 2, 3, 4])

In [41]:
#Get values in a range
arr[0:5]

array([0, 1, 2, 3, 4])

In [42]:
arr[:6]
# you don't have to specify the starting index; it defaults to 0
# index 0 holds the first element of the array

array([0, 1, 2, 3, 4, 5])

In [43]:
arr[5:]
# you don't have to specify the ending index; the slice runs to the end of the array
# the value 5 is the 6th element, but it sits at index 5 (just as it would in a normal Python list)
# arr[5:] grabs everything from index 5 onward, so the result starts with the value 5 (right after the value 4)

array([ 5,  6,  7,  8,  9, 10])
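The slicing rules above can be summarized as a half-open interval: the start index is included and the stop index is not. The following is a minimal sketch of that rule; the names `middle`, `head`, and `tail` are illustrative and do not appear in the original notebook.

```python
import numpy as np

arr = np.arange(0, 11)

# Slices follow the half-open rule [start, stop): start is included, stop is not
middle = arr[1:5]   # indices 1, 2, 3, 4
head = arr[:6]      # start omitted, so it defaults to 0
tail = arr[5:]      # stop omitted, so the slice runs to the end

# For in-range indices, the length of a slice is stop - start
print(len(middle))  # 4
print(head)
print(tail)
```

One consequence of this rule is that `arr[:n]` and `arr[n:]` never overlap and together cover the whole array.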

## Broadcasting

NumPy arrays differ from normal Python lists in their ability to broadcast:

In [44]:
#Setting a value with index range (Broadcasting)
arr[0:5]=100

#Show
arr

# this will broadcast the value "100" to the first 5 elements of the array

array([100, 100, 100, 100, 100,   5,   6,   7,   8,   9,  10])
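To make the contrast with Python lists concrete, here is a small sketch that tries the same slice assignment on both types. The names `values`, `plain_list`, and `list_error` are illustrative only.

```python
import numpy as np

values = np.arange(0, 11)
values[0:5] = 100            # the scalar 100 is broadcast across the 5-element slice

plain_list = list(range(11))
try:
    plain_list[0:5] = 100    # a list slice can only be assigned an iterable
    list_error = None
except TypeError as exc:
    list_error = exc

print(values)
print(type(list_error).__name__)   # TypeError
```

With a list you would have to build the replacement yourself, for example `plain_list[0:5] = [100] * 5`.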

In [45]:
# Reset the array; we'll see why in a moment
arr = np.arange(0,11)

#Show
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [46]:
slice_of_arr = arr[0:6]

slice_of_arr

array([0, 1, 2, 3, 4, 5])

In [47]:

slice_of_arr[:]=99 

# colon notation with no numbers selects everything in the array, so the value 99 is broadcast to every element

slice_of_arr 
# every number in the slice was changed to 99

array([99, 99, 99, 99, 99, 99])

Now note the changes also occur in our original array!

In [48]:
# the original "arr" held 11 numbers, from 0 to 10 (inclusive)

arr

# assigning to slice_of_arr also changed the first six elements of the original "arr"
# the slice is not a copy of the data; it is a view of the original array
# NumPy does this to avoid memory issues with large arrays, so it does not copy arrays automatically
# if you want an independent copy rather than a reference to the original array, you need to call .copy() explicitly

array([99, 99, 99, 99, 99, 99,  6,  7,  8,  9, 10])

The data is not copied; the slice is a view of the original array. This avoids memory problems with large arrays.

In [49]:
#To get a copy, you need to be explicit
arr_copy = arr.copy()

arr_copy

array([99, 99, 99, 99, 99, 99,  6,  7,  8,  9, 10])

In [50]:
arr

array([99, 99, 99, 99, 99, 99,  6,  7,  8,  9, 10])

In [51]:
# if we broadcast the copy to have 100...

arr_copy[:]=100

In [52]:
# original arr is unaffected
arr

array([99, 99, 99, 99, 99, 99,  6,  7,  8,  9, 10])

In [53]:
arr_copy

# the takeaway: if you grab a slice of an array and assign it to a variable without explicitly asking for a .copy(),
# you are only holding a view onto the original array, so any changes you make through it
# will also affect the original array.

array([100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100])
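If you are ever unsure whether a variable is a view or a copy, NumPy can tell you. The sketch below uses `np.shares_memory` and the `.base` attribute to check; the names `base`, `view`, and `independent` are illustrative and not part of the original notebook.

```python
import numpy as np

base = np.arange(0, 11)
view = base[0:6]                  # basic slicing returns a view
independent = base[0:6].copy()    # an explicit copy owns its own data

print(np.shares_memory(base, view))          # True
print(np.shares_memory(base, independent))   # False
print(view.base is base)                     # True

# Writing through the view changes the original; the copy is untouched
view[:] = 99
print(base)
print(independent)
```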

## Indexing a 2D array (matrices)

The general format is **arr_2d[row][col]** or **arr_2d[row,col]**. I recommend usually using the comma notation for clarity.

In [54]:
arr_2d = np.array([[5,10,15],[20,25,30],[35,40,45]])

#Show
arr_2d

array([[ 5, 10, 15],
       [20, 25, 30],
       [35, 40, 45]])

In [55]:
# if we wanted to grab the value 5

arr_2d[0][0]

# the first index (0) is the row
# the second index (0) is the column

5

In [56]:
# if we wanted to grab the value 25

arr_2d[1][1]

# or you can do arr_2d[1,1]

25

In [57]:
arr_2d[1,1]

25

In [58]:
#Indexing entire row

arr_2d[1]


array([20, 25, 30])

In [59]:
# Format is arr_2d[row][col] or arr_2d[row,col]

# Getting individual element value

arr_2d[1][0]

20

In [60]:
# Getting individual element value

arr_2d[1,0]

# this notation gets rid of the double brackets

20
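The two notations return the same element, but they get there differently. As a rough sketch (the names `grid`, `row`, `chained`, and `direct` are illustrative), the double-bracket form first extracts a whole row and then indexes into it, while the comma form is a single indexing operation:

```python
import numpy as np

grid = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

row = grid[1]          # step 1 of the chained form: a 1D view of row 1
chained = row[0]       # step 2: index into that row
direct = grid[1, 0]    # comma form: row and column in one operation

print(chained == direct)   # True
```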

In [61]:
arr_2d

array([[ 5, 10, 15],
       [20, 25, 30],
       [35, 40, 45]])

In [62]:
# 2D array slicing

# slicing grabs whole sections of the array instead of single elements

#Shape (2,2) from top right corner

arr_2d[:2,1:]

# the ":2" grabs all rows before index 2 -- that is, the first and second rows, but not the third
# the "1:" grabs all columns from index 1 onward -- that is, the second and third columns, but not the first

array([[10, 15],
       [25, 30]])
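Slicing is where the comma notation and the double-bracket notation stop being interchangeable. In the sketch below (variable names are illustrative), the chained form applies the second slice to the rows again rather than to the columns:

```python
import numpy as np

grid = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

comma_form = grid[:2, 1:]    # rows 0-1, then columns 1-2
chained_form = grid[:2][1:]  # rows 0-1, then rows from index 1 of that result

print(comma_form)     # [[10 15], [25 30]]
print(chained_form)   # [[20 25 30]]
```

This is one practical reason to prefer the comma notation when slicing in two dimensions.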

In [63]:
arr_2d[:2]

array([[ 5, 10, 15],
       [20, 25, 30]])

In [64]:
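arr_2d[,1:]

# the position before the comma cannot be left empty, so this raises a SyntaxError
# to keep every row, write a bare colon: arr_2d[:,1:]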

SyntaxError: invalid syntax (4107186277.py, line 1)
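A bare colon stands for "everything along this axis", so it is the way to keep every row while slicing the columns. A minimal sketch, with `grid` and `all_rows` as illustrative names:

```python
import numpy as np

grid = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

# ":" keeps every row; "1:" keeps columns from index 1 onward
all_rows = grid[:, 1:]

print(all_rows.shape)   # (3, 2)
print(all_rows)
```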

In [65]:
#Shape bottom row

arr_2d[2]

array([35, 40, 45])

In [66]:
#Shape bottom row

arr_2d[2,:]

array([35, 40, 45])

## Conditional Selection


In [67]:
arr2 = np.arange(1,16)

In [68]:
arr2

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [71]:
bool_arr = arr2 > 5 # comparison operator
bool_arr

array([False, False, False, False, False,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True])

In [73]:
# can pass this into brackets and conditionally select data

arr2[bool_arr]

# i will only get the results/instances where this boolean array is true

array([ 6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [74]:
# you typically do this all in 1 step:

arr2[arr2>5]

array([ 6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [76]:
arr2[arr2=<3]

# "=<" is not a valid operator; less-than-or-equal-to is written "<="

SyntaxError: invalid syntax (3764936926.py, line 1)

In [78]:
arr2[((arr2<3) or (arr2==3))]

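# Python's "or" tries to reduce each whole array to a single True/False, which is ambiguous
# for an element-by-element "or", use the "|" operator instead: (arr2<3) | (arr2==3)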

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In [80]:
arr2[arr2==3]

array([3])
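For reference, here is a sketch of working versions of the two failing selections above. It uses `<=` for less-than-or-equal-to and the element-wise operators `|` (or) and `&` (and); each condition needs its own parentheses because these operators bind more tightly than comparisons. The name `nums` stands in for `arr2`.

```python
import numpy as np

nums = np.arange(1, 16)

at_most_three = nums[nums <= 3]               # the operator is "<=", not "=<"
either = nums[(nums < 3) | (nums == 3)]       # element-wise "or"
between = nums[(nums > 5) & (nums < 10)]      # element-wise "and"

print(at_most_three)   # [1 2 3]
print(either)          # [1 2 3]
print(between)         # [6 7 8 9]
```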

In [81]:
my_arr = np.arange(50).reshape(4,4)

# 50 elements cannot fill a 4x4 shape, which holds exactly 16; the total sizes must match

ValueError: cannot reshape array of size 50 into shape (4,4)

In [83]:
my_arr = np.arange(50).reshape(5,10)
my_arr

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])
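A reshape only succeeds when the product of the new dimensions equals the number of elements. As a small sketch (names are illustrative), you can pass `-1` for one dimension and let NumPy work it out from the array's size:

```python
import numpy as np

flat = np.arange(50)
print(flat.size)            # 50

# -1 asks NumPy to infer that dimension: 50 / 5 = 10 columns
inferred = flat.reshape(5, -1)
print(inferred.shape)       # (5, 10)

# A shape whose product is not 50 is rejected
try:
    flat.reshape(4, 4)
    reshape_error = None
except ValueError as exc:
    reshape_error = exc
print(type(reshape_error).__name__)   # ValueError
```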

In [84]:
# rows from index 1 up to, but not including, index 3 (skipping row 0), with every column

my_arr[1:3,]


array([[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])

In [87]:
my_arr[1:3,3:5]

array([[13, 14],
       [23, 24]])

## NumPy Operations

In [88]:
import numpy as np
arr3 = np.arange(0,21)

In [89]:
arr3

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20])

In [90]:
# to add arrays together element by element, you can literally "+" them together

arr3 + arr3 

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32,
       34, 36, 38, 40])

In [91]:
arr3 - arr3

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [92]:
arr3 * arr3

array([  0,   1,   4,   9,  16,  25,  36,  49,  64,  81, 100, 121, 144,
       169, 196, 225, 256, 289, 324, 361, 400])

In [104]:
arr3**3

array([   0,    1,    8,   27,   64,  125,  216,  343,  512,  729, 1000,
       1331, 1728, 2197, 2744, 3375, 4096, 4913, 5832, 6859, 8000],
      dtype=int32)

In [93]:
arr3**arr3

array([          1,           1,           4,          27,         256,
              3125,       46656,      823543,    16777216,   387420489,
        1410065408,  1843829075,  -251658240, -1692154371, -1282129920,
        1500973039,           0,  1681328401,   457441280,  -306639989,
                 0], dtype=int32)
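The negative numbers and zeros in the output above are not real powers. The array here has dtype `int32`, and fixed-width integers silently wrap around when a result is too large to store. The sketch below reproduces the effect with an explicit `int32` array and then repeats the calculation in `float64`, which can hold values of this size (approximately, not exactly, for the largest ones). The variable names are illustrative.

```python
import numpy as np

ints = np.arange(0, 21, dtype=np.int32)
wrapped = ints ** ints              # large results overflow int32 and wrap around

floats = ints.astype(np.float64)
approx = floats ** floats           # float64 has enough range for 20**20

print(wrapped[12])                  # -251658240, not the true value of 12**12
print(approx[12])                   # 8916100448256.0
print(approx[20])                   # roughly 1.05e26
```

Which integer dtype `np.arange` picks by default depends on the platform and NumPy version, so the overflow may start at a different element on another machine.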

In [94]:
# this is a scalar operation: one operand is a single number
# NumPy broadcasts that number to every element, so each element is changed by the same amount

arr3 + 100

array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112,
       113, 114, 115, 116, 117, 118, 119, 120])

In [96]:
arr3 - 10

array([-10,  -9,  -8,  -7,  -6,  -5,  -4,  -3,  -2,  -1,   0,   1,   2,
         3,   4,   5,   6,   7,   8,   9,  10])

In [97]:
arr3/2

array([ 0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5,  5. ,
        5.5,  6. ,  6.5,  7. ,  7.5,  8. ,  8.5,  9. ,  9.5, 10. ])

In [98]:
# plain Python cannot divide by 0

0/0

ZeroDivisionError: division by zero

In [100]:
arr3/arr3

# NumPy issues a warning and returns "nan" (not a number) for 0/0; every other element is 1.0
# the code keeps running without raising an error, but the result contains a nan

array([nan,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])

In [101]:
# 1/0 gives infinity ("inf") instead of "nan", again with a warning

1/arr3

array([       inf, 1.        , 0.5       , 0.33333333, 0.25      ,
       0.2       , 0.16666667, 0.14285714, 0.125     , 0.11111111,
       0.1       , 0.09090909, 0.08333333, 0.07692308, 0.07142857,
       0.06666667, 0.0625    , 0.05882353, 0.05555556, 0.05263158,
       0.05      ])

In [102]:
1/0

ZeroDivisionError: division by zero
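Because NumPy warns instead of raising, it helps to know how to detect or avoid the resulting `nan` and `inf` values. The sketch below shows one possible approach, using `np.errstate` to silence the warnings locally, `np.isnan` and `np.isinf` to find the affected elements, and the `where=` argument of `np.divide` to skip the zero entries altogether. The names `vals`, `ratio`, `reciprocal`, and `safe` are illustrative.

```python
import numpy as np

vals = np.arange(0, 21)

# Silence the divide-by-zero and invalid-value warnings inside this block only
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = vals / vals          # 0/0 becomes nan
    reciprocal = 1 / vals        # 1/0 becomes inf

print(np.isnan(ratio[0]))        # True
print(np.isinf(reciprocal[0]))   # True

# Alternative: only divide where the denominator is non-zero; other slots keep the 0.0 from "out"
safe = np.divide(vals, vals, out=np.zeros(len(vals)), where=vals != 0)
print(safe[:3])
```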

### Universal Array Functions

In [105]:
np.sqrt(arr3)

# takes the square root of every element in the array

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ,
       3.16227766, 3.31662479, 3.46410162, 3.60555128, 3.74165739,
       3.87298335, 4.        , 4.12310563, 4.24264069, 4.35889894,
       4.47213595])

In [106]:
np.exp(arr3)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03, 2.20264658e+04, 5.98741417e+04,
       1.62754791e+05, 4.42413392e+05, 1.20260428e+06, 3.26901737e+06,
       8.88611052e+06, 2.41549528e+07, 6.56599691e+07, 1.78482301e+08,
       4.85165195e+08])

In [107]:
np.max(arr3)

20

In [108]:
arr3.max()

20

In [109]:
np.sin(arr3)

# applies sine to every element (values are interpreted as radians)

array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ,
       -0.95892427, -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849,
       -0.54402111, -0.99999021, -0.53657292,  0.42016704,  0.99060736,
        0.65028784, -0.28790332, -0.96139749, -0.75098725,  0.14987721,
        0.91294525])

In [110]:
np.cos(arr3)

# applies cosine to every element (values are interpreted as radians)

array([ 1.        ,  0.54030231, -0.41614684, -0.9899925 , -0.65364362,
        0.28366219,  0.96017029,  0.75390225, -0.14550003, -0.91113026,
       -0.83907153,  0.0044257 ,  0.84385396,  0.90744678,  0.13673722,
       -0.75968791, -0.95765948, -0.27516334,  0.66031671,  0.98870462,
        0.40808206])

In [111]:
np.log(arr3)

# takes the natural logarithm of every element
# log of 0 is negative infinity; NumPy returns -inf and issues a warning

array([      -inf, 0.        , 0.69314718, 1.09861229, 1.38629436,
       1.60943791, 1.79175947, 1.94591015, 2.07944154, 2.19722458,
       2.30258509, 2.39789527, 2.48490665, 2.56494936, 2.63905733,
       2.7080502 , 2.77258872, 2.83321334, 2.89037176, 2.94443898,
       2.99573227])
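If the `-inf` is unwanted, one option is to combine `np.log` with conditional selection and take the logarithm of the positive values only. This is a sketch of that idea, not the only way to handle it; `vals`, `positive`, and `logs` are illustrative names.

```python
import numpy as np

vals = np.arange(0, 21)

positive = vals[vals > 0]    # drop the zero before taking the log
logs = np.log(positive)

print(len(logs))             # 20
print(np.all(np.isfinite(logs)))   # True
```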

### Fancy Indexing

Fancy indexing allows you to select entire rows or columns out of order. To show this, let's quickly build a NumPy array:

In [21]:
#Set up matrix
arr2d = np.zeros((10,10))

In [22]:
#Number of rows (the matrix is square, so shape[1] would give the same value)
arr_length = arr2d.shape[0]

In [23]:
#Fill each row with its own row index

for i in range(arr_length):
    arr2d[i] = i
    
arr2d

array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.],
       [ 4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.],
       [ 5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.],
       [ 6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.],
       [ 7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.],
       [ 8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.],
       [ 9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.]])

Fancy indexing allows the following:

In [24]:
arr2d[[2,4,6,8]]

array([[ 2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.],
       [ 4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.],
       [ 6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.],
       [ 8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.]])

In [25]:
#Rows can be requested in any order
arr2d[[6,4,2,7]]

array([[ 6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.],
       [ 4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.],
       [ 2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.],
       [ 7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.]])

## More Indexing Help
Indexing a 2D matrix can be a bit confusing at first, especially when you start to add in step size. Try a Google image search for NumPy indexing to find useful images, like this one:

<img src= 'http://memory.osu.edu/classes/python/_images/numpy_indexing.png' width=500/>

## Selection

Let's briefly go over how to use brackets for selection based on comparison operators.

In [28]:
arr4 = np.arange(1,11)
arr4

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [30]:
arr4 > 4

array([False, False, False, False,  True,  True,  True,  True,  True,  True], dtype=bool)

In [31]:
bool_arr4 = arr4>4

In [32]:
bool_arr4

array([False, False, False, False,  True,  True,  True,  True,  True,  True], dtype=bool)

In [33]:
arr4[bool_arr4]

array([ 5,  6,  7,  8,  9, 10])

In [34]:
arr4[arr4>2]

array([ 3,  4,  5,  6,  7,  8,  9, 10])

In [37]:
x = 2
arr4[arr4>x]

array([ 3,  4,  5,  6,  7,  8,  9, 10])
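A boolean mask can be used for more than reading values. As a brief sketch (the names `data`, `threshold`, and `mask` are illustrative), the same mask can count the matching elements and assign to them in place:

```python
import numpy as np

data = np.arange(1, 11)
threshold = 4

mask = data > threshold
print(mask.sum())      # 6, since True counts as 1

# Assigning through the mask changes only the selected elements
data[mask] = 0
print(data)            # [1 2 3 4 0 0 0 0 0 0]
```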

# Great Job!
