<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Storing Data (Good)</span></div>

# Subsetting: Indexing and Slicing

There are two ways to subset (select) data:
1. Indexing (Selecting one element)
2. Slicing (Selecting a range of elements)

## Lists & Arrays in 1D | Subsetting & Indexing

In [7]:
import numpy as np

py_list = ["a1", "b2", "c3", "d4", "e5", "f6", "g7", "h8", "i9", "j10"]
np_array = np.array(py_list)

# Pick one
x = py_list  # OR
# x = np_array
print(x)

['a1', 'b2', 'c3', 'd4', 'e5', 'f6', 'g7', 'h8', 'i9', 'j10']
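
To make the difference between indexing and slicing concrete, here is a minimal sketch using the same list as the cell above (re-created so that the snippet runs on its own). The variable names `first`, `last`, `middle`, `stepped` and `backwards` are only illustrative.

```python
# Same list as above, re-created so the sketch runs on its own.
x = ["a1", "b2", "c3", "d4", "e5", "f6", "g7", "h8", "i9", "j10"]

first = x[0]           # indexing: one element
last = x[-1]           # negative indices count from the end
middle = x[1:6]        # slicing: indices 1 to 5 (6 is excluded)
stepped = x[1:6:2]     # every second element of that range
backwards = x[5:2:-1]  # indices 5, 4, 3 (2 is excluded)

print(first, last)     # a1 j10
print(middle)          # ['b2', 'c3', 'd4', 'e5', 'f6']
print(len(middle))     # 5, i.e. 6 - 1
print(stepped)         # ['b2', 'd4', 'f6']
print(backwards)       # ['f6', 'e5', 'd4']
```

The same expressions should work unchanged if `x` is the NumPy array instead of the list.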


| Syntax    | Meaning                         | Result                          | Note                                             |
|-----------|---------------------------------|---------------------------------|--------------------------------------------------|
| x[0]      | First element                   | 'a1'                            |                                                  |
| x[-1]     | Last element                    | 'j10'                           |                                                  |
| x[0:3]    | Index 0 to 2                    | ['a1','b2','c3']                | Gives 3 − 0 = 3 elements                         |
| x[1:6]    | Index 1 to 5                    | ['b2','c3','d4','e5','f6']      | Gives 6 − 1 = 5 elements                         |
| x[1:6:2]  | Index 1 to 5 in steps of 2      | ['b2','d4','f6']                | Gives every second one of the 6 − 1 = 5 elements |
| x[5:]     | Index 5 to the end              | ['f6','g7','h8','i9','j10']     | Gives len(x) − 5 = 5 elements                    |
| x[:5]     | Index 0 to 4                    | ['a1','b2','c3','d4','e5']      | Gives 5 − 0 = 5 elements                         |
| x[5:2:-1] | Index 5 to 3 (i.e., in reverse) | ['f6','e5','d4']                | Gives 5 − 2 = 3 elements                         |
| x[::-1]   | All elements in reverse order   | ['j10','i9','h8',...,'b2','a1'] |                                                  |

Note: If you slice with [i:j], the slice (the extracted subset) starts at i and ends at j-1, giving you a total of j-i elements. <br>
Reverse slicing can be done with [i:j:-1]. The slice starts at index i and moves in the reverse direction, stopping at index j+1, so it includes the elements from i down to j+1.

## Arrays only | Subsetting by masking

In [18]:
np_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
my_mask = np_array > 3
my_mask

array([False, False, False,  True,  True,  True,  True,  True,  True,
        True])

In [19]:
np_array[my_mask] # A Boolean mask keeps only the elements where the mask is True. Same as np_array[np_array > 3]

array([ 4,  5,  6,  7,  8,  9, 10])

In [21]:
np_array[~(np_array > 3)]                 # '~' means 'NOT' (the bitwise NOT operator); it inverts the mask

array([1, 2, 3])

In [25]:
np_array[(np_array > 3) & (np_array < 8)] # '&' means 'AND'. It combines two masks and keeps an element only when BOTH masks are True.

array([4, 5, 6, 7])

In [26]:
np_array[(np_array < 3) | (np_array > 8)] # '|' means 'OR'. It keeps an element if at least one of the masks is True.

array([ 1,  2,  9, 10])
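
As a small sketch of how masks combine, the snippet below stores two masks in variables (the names `big` and `small` are my own) and then mixes them with `&`, `|` and `~`. When the comparisons are written inline instead, each one needs its own parentheses, because `&` and `|` bind more tightly than `>` and `<`.

```python
import numpy as np

np_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

big = np_array > 3      # Boolean array, one True/False per element
small = np_array < 8

both = np_array[big & small]      # elements where both masks are True
either = np_array[~big | ~small]  # elements where at least one mask is False
count = int(big.sum())            # True counts as 1, so this counts the matches

print(both.tolist())    # [4, 5, 6, 7]
print(either.tolist())  # [1, 2, 3, 8, 9, 10]
print(count)            # 7
```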

## Lists & Arrays in 2D | Indexing & Slicing

In [11]:
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]

np_array_2d = np.array(py_list_2d)

In [12]:
np_array_2d[3] # Or py_list_2d[3]; selects the row at position 4 (index 3)

array(['4', 'D'], dtype='<U11')

In [15]:
py_list_2d[3] # A NumPy array returns the row as an array (hence the extra "dtype=" in its output), while a Python list returns the row as a list.

[4, 'D']

In [31]:
np_array_2d[3, 0] # Or py_list_2d[3][0]; selects the FIRST element of the row at position 4 (index 3)

'4'

In [10]:
np_array_2d[:3] # Or py_list_2d[:3]; selects the first three rows.

array([['1', 'A'],
       ['2', 'B'],
       ['3', 'C']], dtype='<U11')

In [45]:
py_list_2d[:3][0]  # Output is the whole first row (index 0), not the first column.
# This is because py_list_2d[:3] extracts the first three sublists (rows) of the 2D list.
# [0] then accesses the first sublist from the result of py_list_2d[:3].

[1, 'A']

In [50]:
np_array_2d[:3, 0] # Output is the first element of each of the first three rows.
# np_array_2d[:3, 0] directly selects the first column of the first three rows from the NumPy array.

array(['1', '2', '3'], dtype='<U11')

In [54]:
py_list_2d[3:6][0]
#py_list_2d[3:6]: This extracts a sublist from index 3 (inclusive) to index 6 (exclusive).
#[0]: This then accesses the first element from the result obtained in the previous step.

[4, 'D']

In [55]:
np_array_2d[3:6, 0]
#3:6 selects rows from index 3 (inclusive) to index 6 (exclusive). It includes rows at positions 3, 4, and 5.
#, 0 selects the first column of these selected rows.

array(['4', '5', '6'], dtype='<U11')

In [57]:
np_array_2d[:, 0]
# : in the first position (: before the comma) selects all rows of the array.
# , 0 selects the elements from the first column of all rows.

array(['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'], dtype='<U11')
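
Since a list of lists has no `[:, 0]` syntax, one common way to pull out a column is a list comprehension. The sketch below uses a shortened version of the data above, and the names `list_col` and `array_col` are only illustrative. It also shows a side effect visible in the outputs above: NumPy stored the numbers as strings because the array mixes numbers and letters.

```python
import numpy as np

py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"]]
np_array_2d = np.array(py_list_2d)

# A list of lists has no [:, 0] syntax; loop over the rows instead.
list_col = [row[0] for row in py_list_2d[:3]]

# The array does it in one step, but note the numbers became strings.
array_col = np_array_2d[:3, 0]

print(list_col)            # [1, 2, 3]
print(array_col.tolist())  # ['1', '2', '3']
```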

## Growing lists

Lists: Easy and efficient to grow. <br>
NumPy arrays: Better for math operations, and the slicing syntax (e.g. [:3, 0]) is more intuitive. <br>
Note: Changing the size of a NumPy array (adding or removing elements) is not recommended, because NumPy destroys the array and recreates it, which is inefficient.
<br>
Other than my CPU ~~burning~~ crashing or looking at Task Manager to see the lag, we can measure this with the good old magic command %time.
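
Outside a notebook, the standard-library `timeit` module can stand in for `%time`. The sketch below is one possible way to compare growing a list with growing an array; the helper functions `grow_list` and `grow_array` are hypothetical names of my own, and the actual timings will depend on the machine.

```python
import timeit

import numpy as np


def grow_list(n):
    """Grow a Python list one element at a time."""
    result = []
    for i in range(n):
        result.append(i)
    return result


def grow_array(n):
    """Grow a NumPy array one element at a time (np.append copies the array)."""
    result = np.array([], dtype=int)
    for i in range(n):
        result = np.append(result, i)
    return result


n = 1_000
list_time = timeit.timeit(lambda: grow_list(n), number=20)
array_time = timeit.timeit(lambda: grow_array(n), number=20)

print(f"list.append: {list_time:.4f} s")
print(f"np.append:   {array_time:.4f} s")
```

If many values must end up in an array, a common pattern is to collect them in a list first and call `np.array()` once at the end.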

In [25]:
x=[1, 2]*5
x

[1, 2, 1, 2, 1, 2, 1, 2, 1, 2]

In [19]:
x=[1]
x+= [2]
x+= [3] #+= performs in-place concatenation. It modifies the existing list x by appending the elements from the right-hand side. 
x+= [4]
x   

[1, 2, 3, 4]

In [8]:
x=[1]
x= x + [2]
x= x + [3] #'+' creates a new list each time it is used for concatenation, so it is less efficient.
x= x + [4]
x 

[1, 2, 3, 4]

In [4]:
x=[1]
x.append(2) #append() does not need to resize the underlying array on every call, so it is more efficient.
x.append(3)
x.append(4)
x 

[1, 2, 3, 4]

In [10]:
x=[1, 2, 3]
x.extend([4, 5, 6]) #extend() adds each element of the iterable to the end of the list, so the list itself gets longer.
x

[1, 2, 3, 4, 5, 6]

In [13]:
x = [1, 2, 3]
x += [4, 5, 6] #+= operator, when used with lists, is similar to the extend method.
x

[1, 2, 3, 4, 5, 6]

In [12]:
x=[1, 2, 3]
x.append([4, 5, 6]) #append adds the entire list [4, 5, 6] as a single element at the end of the list.
x

[1, 2, 3, [4, 5, 6]]
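
The difference between `+=` (in place) and `+` (new list) can be checked with `is`, which tests whether two names refer to the same object. This is a minimal sketch; `original` is just a second name I introduce to keep track of the first list.

```python
x = [1]
original = x            # second name for the same list object

x += [2]                # in place: the existing object is modified
same_after_iadd = x is original

x = x + [3]             # builds a new list and rebinds the name x
same_after_add = x is original

print(same_after_iadd)  # True
print(same_after_add)   # False
print(original)         # [1, 2]
print(x)                # [1, 2, 3]

y = [1, 2, 3]
y.append([4, 5])        # one new element, which is itself a list
y.extend([6, 7])        # two new elements
print(y)                # [1, 2, 3, [4, 5], 6, 7]
print(len(y))           # 6
```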

## Tuples

Tuples are similar to lists, except that they use ( ) and cannot be changed after creation (they are immutable).

In [15]:
a=(1, 2, 3)     # Define tuple
print(a[0])    # Access data

1


In [16]:
# The following will NOT work
a[0]=-1
a[0]+= [10]

TypeError: 'tuple' object does not support item assignment
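
A minimal sketch of the same error caught with `try`/`except`, so that the rest of the code keeps running. It also shows one way to get a "modified" tuple: build a new one from pieces of the old one (the name `b` is only illustrative).

```python
a = (1, 2, 3)

try:
    a[0] = -1               # item assignment is not allowed on a tuple
except TypeError as error:
    message = str(error)
    print(message)

# To "change" a tuple, build a new one from pieces of the old one.
b = (-1,) + a[1:]
print(a)   # (1, 2, 3)
print(b)   # (-1, 2, 3)
```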

## Be VERY careful when copying

In [29]:
x=[1, 2, 3]
y=x.copy()
z=x.copy() #Now, y and z are independent of x. If you modify one, the others remain unchanged
x.extend([4])
y

[1, 2, 3]

Note: For now, you only need to know that you must use copy() to be safe. To understand why, see this discussion on [mutable and immutable objects](https://stackoverflow.com/questions/10951820/why-make-defensive-copies-in-getters-inside-immutable-classes).
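
To see what goes wrong without copy(), here is a small sketch comparing a plain assignment with a copy. The names `alias` and `copied` are my own.

```python
x = [1, 2, 3]
alias = x            # no copy: both names point to the same list
copied = x.copy()    # a new list with the same elements

x.extend([4])

print(alias)         # [1, 2, 3, 4]
print(copied)        # [1, 2, 3]
print(alias is x)    # True
print(copied is x)   # False
```

Note that copy() makes a shallow copy, so any lists nested inside `x` would still be shared; `copy.deepcopy()` from the standard library handles that case.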