<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Storing Data (Good)</span></div>

# What to expect in this chapter

# 1 Subsetting: Indexing and Slicing

In [None]:
# As in mathematics, subsetting means selecting some of the elements of an original set; \
#     here, the set is a list or an array. 
# Indexing selects one single element. 
# Slicing selects the elements within a range of positions.

In [2]:
import numpy as np

## 1.1 Lists & Arrays in 1D | Subsetting & Indexing

In [4]:
x = ["a1", "b2", "c3", "d4", "e5",
     "f6", "g7", "h8", "i9", "j10"]
y = np.array(x)

In [None]:
# We have the following:

![Screenshot%202024-02-11%20at%2018.34.16.png](attachment:Screenshot%202024-02-11%20at%2018.34.16.png)

In [None]:
# The same syntaxes also work for arrays.
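
The screenshot is not reproduced in this text, so the following is only a sketch of the usual index and slice forms, assuming that is roughly what it tabulated. It reuses the `x` and `y` defined above; the other variable names are just for illustration.

```python
import numpy as np

x = ["a1", "b2", "c3", "d4", "e5",
     "f6", "g7", "h8", "i9", "j10"]
y = np.array(x)

first = x[0]          # Indexing: first element, 'a1'
last = x[-1]          # Negative indices count from the end, 'j10'
head = x[0:3]         # Slicing: indices 0, 1, 2 (the stop index is excluded)
middle = x[1:6]       # Indices 1 to 5
stepped = x[1:6:2]    # Indices 1, 3, 5 (every second element)
tail = x[5:]          # Index 5 to the end
reversed_x = x[::-1]  # The whole list in reverse order

# The same expressions work on the array; slices come back as arrays.
y_head = y[0:3]
y_stepped = y[1:6:2]
```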

## 1.2 Arrays only | Subsetting by masking

In [None]:
# Masking is filtering by a condition. 
# Since NumPy operations work directly on the elements of arrays, the filtering condition is also applied to \
#     each individual element of the array. 
# For each element, there are only two possibilities: the element meets the condition OR it does not. 
# Therefore, the mask itself is a Boolean array (an array of only True and False) of the same shape as the original. 
# When the mask is applied, only the elements whose corresponding mask entry is True are returned. 
# Moreover, NumPy treats True as 1 and False as 0 in arithmetic. (This makes summing mask arrays easy, \
#     e.g. to count how many elements meet a condition.)
# Lastly, logic operations can be applied directly to masks. Instead of 'not', 'and', 'or', their element-wise \
#     analogs '~', '&', and '|' are used. 
# Terminology-wise, '~' is called a bitwise NOT, meaning that it flips individual bits; on a Boolean array it \
#     turns True into False and vice versa. 

In [None]:
# An example:

In [5]:
np_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
my_mask = np_array > 3   # Creating the mask.
my_mask

array([False, False, False,  True,  True,  True,  True,  True,  True,
        True])

In [6]:
np_array[my_mask]    # Applying the mask.

array([ 4,  5,  6,  7,  8,  9, 10])

In [7]:
np_array[np_array > 3]     # Shorten the code to one line.

array([ 4,  5,  6,  7,  8,  9, 10])

In [8]:
np_array[~(np_array > 3)]        # Invert the mask.

array([1, 2, 3])

In [9]:
np_array[(np_array > 3) & (np_array < 8)]     # Combining two masks.

array([4, 5, 6, 7])

In [10]:
np_array[(np_array < 3) | (np_array > 8)]     # Taking the disjunction of two masks.

array([ 1,  2,  9, 10])
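
Because True counts as 1 and False as 0, summing a mask counts how many elements meet the condition. A small sketch using the same `np_array` (the names `count`, `fraction`, and `same_shape` are only for illustration):

```python
import numpy as np

np_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
my_mask = np_array > 3

count = my_mask.sum()        # Number of True entries: 7 elements are greater than 3
fraction = my_mask.mean()    # Fraction of elements that meet the condition: 0.7

# The mask has the same shape as the array it came from.
same_shape = my_mask.shape == np_array.shape
```

The parentheses around each condition in the combined masks above matter, because `&` and `|` bind more tightly than comparisons such as `>` and `<`.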

## 1.3 Lists & Arrays in 2D | Indexing & Slicing

In [11]:
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]

np_array_2d = np.array(py_list_2d)

In [12]:
py_list_2d[3]    # Usual list indexing

[4, 'D']

In [13]:
np_array_2d[3]    # Usual array indexing

array(['4', 'D'], dtype='<U21')

In [14]:
# Indexing the second layer
py_list_2d[3][0]

4

In [15]:
np_array_2d[3, 0]

'4'

In [None]:
# Notice that the list uses two separate pairs of brackets, while the array uses a single pair with the indices \
#     separated by a comma.

In [16]:
# Slicing the first layer
py_list_2d[:3]

[[1, 'A'], [2, 'B'], [3, 'C']]

In [17]:
np_array_2d[:3]    # So far so good...

array([['1', 'A'],
       ['2', 'B'],
       ['3', 'C']], dtype='<U21')

In [18]:
# Trying to get the first element of each of the first three rows.
py_list_2d[:3][0]

[1, 'A']

In [19]:
np_array_2d[:3, 0]

array(['1', '2', '3'], dtype='<U21')

In [None]:
# Now we get different values...
# This is because py_list_2d[:3][0] first takes the slice [[1, 'A'], [2, 'B'], [3, 'C']] and then returns \
#     the first element of that list, while np_array_2d[:3, 0] does what we intended and retrieves \
#     the first element from each of ['1', 'A'], ['2', 'B'], and ['3', 'C']. 

In [None]:
# More examples of this:

In [20]:
py_list_2d[3:6][0]

[4, 'D']

In [21]:
np_array_2d[3:6, 0]

array(['4', '5', '6'], dtype='<U21')

In [22]:
np_array_2d[:, 0]

array(['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'], dtype='<U21')
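
A plain list has no comma syntax for picking out a column, so one way to get the same result from `py_list_2d` is a list comprehension. This is a sketch rather than the only approach, and the variable names are illustrative. Note also that the list keeps the integers as integers, whereas the array stored everything as strings.

```python
import numpy as np

py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]
np_array_2d = np.array(py_list_2d)

# First element of the rows at indices 3, 4, 5 (list version of np_array_2d[3:6, 0]).
first_column_part = [row[0] for row in py_list_2d[3:6]]

# First element of every row (list version of np_array_2d[:, 0]).
first_column = [row[0] for row in py_list_2d]

# The array holds strings, so convert if numbers are needed.
first_column_as_int = np_array_2d[:, 0].astype(int)
```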

## 1.4 Growing lists

In [None]:
# Growing a list is much simpler and more computationally efficient than growing an array.
# There are multiple ways of growing a list. 

In [23]:
# Method 1
x=[1, 2]*5
x

[1, 2, 1, 2, 1, 2, 1, 2, 1, 2]

In [24]:
# Method 2
x=[1]
x= x + [2]
x= x + [3]
x= x + [4]
x

[1, 2, 3, 4]

In [25]:
# Method 3
x=[1]
x+= [2]
x+= [3]
x+= [4]
x

[1, 2, 3, 4]

In [26]:
# Method 4
x=[1]
x.append(2)
x.append(3)
x.append(4)
x

[1, 2, 3, 4]
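
Methods 2 to 4 give the same final list, but they do not work the same way underneath. As a rough illustration (the name `original` is just a second reference kept for the check; names and objects are discussed further in section 1.6), `x = x + [2]` builds a new list, while `+=` and `append()` modify the existing one:

```python
x = [1]
original = x          # Keep a second name for the starting list
x = x + [2]           # Method 2: builds a brand-new list and rebinds x
method2_new_list = x is not original    # True: x now refers to a different list

x = [1]
original = x
x += [2]              # Method 3: extends the existing list in place
method3_same_list = x is original       # True

x = [1]
original = x
x.append(2)           # Method 4: also modifies the existing list in place
method4_same_list = x is original       # True
```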

In [None]:
# Of methods 2 to 4, which add one element at a time, method 4 (append()) is generally the fastest. 
# Now notice the difference between append() and extend().

In [28]:
x=[1, 2, 3]
x.extend([4, 5, 6])    # extend() appends each element of its argument to the original list, one by one.
x

[1, 2, 3, 4, 5, 6]

In [29]:
x=[1, 2, 3]
x.append([4, 5, 6])    # append() appends its whole argument as one single element.
x

[1, 2, 3, [4, 5, 6]]

# Some loose ends

## 1.5 Tuples

In [None]:
# A tuple is similar to a list, but faster and more boring...
# The only things you can do with a tuple are create it (storing data) and retrieve data from it (accessing data).
# A list uses [], whereas a tuple uses ().
# The good thing about a tuple is that it is fast computationally, and you don't have to worry about accidentally \
#     messing up the data stored inside a tuple (this is because tuples are immutable objects).
# The bad thing about a tuple is that there are only very limited operations you can perform on a tuple (this is \
#     also because tuples are immutable objects).
# You cannot change, update, or modify a tuple after you have created it. 
# For example:

In [30]:
a=(1, 2, 3)     # Define tuple

In [31]:
print(a[0])    # Access data

1


In [32]:
# The following will NOT work, because a tuple does not allow its elements to be reassigned
a[0] += 10

TypeError: 'tuple' object does not support item assignment
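
Since a tuple cannot be changed in place, the usual workaround is to build a new tuple from the old one. A minimal sketch (the names `b`, `c`, and `a_unchanged` are illustrative):

```python
a = (1, 2, 3)

b = a + (4,)           # Concatenation creates a new tuple; note the comma in (4,)
c = (10,) + a[1:]      # "Replace" the first element by building a new tuple

# a itself is unchanged.
a_unchanged = a == (1, 2, 3)
```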

## 1.6 Be VERY careful when copying

In [None]:
# In contrast to a tuple, a list is a mutable object (meaning that it can be changed and modified). 
# Therefore the following assignment does NOT create a new list:

In [33]:
x=[1, 2, 3]
y=x           # DON'T do this!

In [34]:
# Now if we modify y, x will be automatically modified as well. 
y.append(4)
print(x)

[1, 2, 3, 4]


In [None]:
# This is because y=x merely adds an extra nametag ('y') to the existing list. Now the same \
#    list has two different names ('x' and 'y') pointing to it. 
# To correctly create a NEW (separate) identical list, one can either create it from scratch, or use .copy(). 

In [35]:
x=[1, 2, 3]
y=[1, 2, 3]    # Create a new, identical list from scratch.
y.append(4)
print(x)

[1, 2, 3]


In [36]:
x=[1, 2, 3]
y=x.copy()    # Copy list x and make a separate list out of it. 
y.append(4)
print(x)

[1, 2, 3]
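
One caveat, offered as a sketch: `.copy()` makes a shallow copy, so it copies the outer list only. For a nested list (such as a 2D list), the inner lists are still shared, and `copy.deepcopy()` from the standard library copies every level. The `is` operator checks whether two names point to the same object. The variable names below are illustrative.

```python
import copy

x = [[1, "A"], [2, "B"]]

alias = x                     # Same object, just a second name
shallow = x.copy()            # New outer list, but the inner lists are shared
deep = copy.deepcopy(x)       # New outer list and new inner lists

shallow[0].append("changed")  # Modifies an inner list that x also holds

alias_is_same = alias is x        # True
shallow_is_same = shallow is x    # False: the outer list is a separate object
x_first_row = x[0]                # [1, 'A', 'changed'], affected via the shared inner list
deep_first_row = deep[0]          # [1, 'A'], unaffected
```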
