<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Storing Data (Good)</span></div>

# Chapter Summary

- subset, index, slice definitions  
- slicing = `[i:j]`  
- masking  = `array_name[masking_criteria]`  
    - use of Bitwise NOT `~`, OR `|`, AND `&`
- list & array indexing and slicing differences
- growing list
    - extension
      - `+` or `+=`
      - `extend()`
    - append
      - `append()`
- tuple uses `()`, immutable
- copying = `copy()`  

# Subsetting: Indexing and Slicing

**subset** = select a subset  
**index** = select one element  
**slice** = select a range of elements

## Lists & Arrays in 1D | Subsetting & Indexing

the following applies to both lists & arrays

`[i:j]` = slice starting at index `i` and ending at index `j-1`, giving a total of `j-i` elements

In [6]:
import numpy as np
py_list=["a1", "b2", "c3", "d4", "e5",
         "f6", "g7", "h8", "i9", "j10"]
np_array=np.array(py_list)

# Pick one
x = py_list  # OR
x = np_array

|Syntax|Meaning|Result|Note|
|---|---|---|---|
|`x[0]`|First element|`'a1'`||
|`x[-1]`|Last element|`'j10'`||
|`x[0:3]`|Index 0 to 2|`['a1','b2','c3']`|Gives 3−0=3 elements|
|`x[1:6]`|Index 1 to 5|`['b2','c3','d4','e5','f6']`|Gives 6−1=5 elements|
|`x[1:6:2]`|Index 1 to 5 in steps of 2|`['b2','d4','f6']`|Gives every other one of the 6−1=5 elements, i.e. 3 elements|
|`x[5:]`|Index 5 to the end|`['f6','g7','h8','i9','j10']`|Gives `len(x)`−5=5 elements|
|`x[:5]`|Index 0 to 4|`['a1','b2','c3','d4','e5']`|Gives 5−0=5 elements|
|`x[5:2:-1]`|Index 5 to 3 (i.e., in reverse)|`['f6','e5','d4']`|Gives 5−2=3 elements|
|`x[::-1]`|Reverses the list|`['j10','i9','h8',...,'b2','a1']`||
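
A quick check of the `j-i` rule, as a minimal sketch rather than part of the original notes. It reuses `py_list` and `np_array` from above; `slice_lengths` is a name made up for this illustration.

```python
import numpy as np

py_list = ["a1", "b2", "c3", "d4", "e5",
           "f6", "g7", "h8", "i9", "j10"]
np_array = np.array(py_list)

# slice_lengths is a hypothetical helper name, not from the notes
slice_lengths = {}
for name, x in (("list", py_list), ("array", np_array)):
    slice_lengths[name] = [
        len(x[0:3]),     # 3 - 0 = 3
        len(x[1:6]),     # 6 - 1 = 5
        len(x[1:6:2]),   # every other one of those 5, so 3
        len(x[5:]),      # len(x) - 5 = 5
        len(x[:5]),      # 5 - 0 = 5
        len(x[5:2:-1]),  # 5 - 2 = 3, counted backwards
    ]

print(slice_lengths)
```

Both entries should hold the same lengths, since 1D slicing works the same way for lists and arrays.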


## Arrays only | Subsetting by masking

**!!!** subsetting by masking **only** works with NumPy arrays  
**mask** = an array of `True`/`False` values that allows only a specific subset to be seen  
`array_name[masking_criteria]`

In [7]:
np_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
my_mask = np_array > 3
my_mask

array([False, False, False,  True,  True,  True,  True,  True,  True,
        True])

In [8]:
# to only show those that are True
np_array[my_mask]

array([ 4,  5,  6,  7,  8,  9, 10])

In [9]:
# OR more succinctly
np_array[np_array>3]

array([ 4,  5,  6,  7,  8,  9, 10])
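
To see why masking is array-only, here is a small sketch (an illustration, not from the original notes). It assumes a plain list with the same numbers; the names `list_comparison_failed`, `from_array` and `from_list` are made up for this example. A list comprehension is one possible list-side equivalent.

```python
import numpy as np

py_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
np_array = np.array(py_list)

# A list cannot be compared with a number element by element
try:
    py_list > 3
    list_comparison_failed = False
except TypeError:
    list_comparison_failed = True

from_array = np_array[np_array > 3]                    # masking
from_list = [value for value in py_list if value > 3]  # list comprehension

print(list_comparison_failed)
print(from_array)
print(from_list)
```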

Operators:  
- Bitwise NOT `~` inverts a mask
- Bitwise OR `|` shows an element if either mask is true
- Bitwise AND `&` shows an element only if both masks are true

**!!!** note the use of `()` around each comparison; `~`, `|` and `&` bind more tightly than `<` and `>`

In [11]:
np_array[~(np_array > 3)]                 # '~' means 'NOT'

array([1, 2, 3])

In [12]:
np_array[(np_array < 3) | (np_array > 8)] # '|' means 'OR'

array([ 1,  2,  9, 10])

In [13]:
np_array[(np_array > 3) & (np_array < 8)] # '&' means 'AND'

array([4, 5, 6, 7])
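
What happens when the `()` are left out, as a hedged sketch (not from the original notes). The names `unbracketed_worked` and `bracketed` are made up for this illustration.

```python
import numpy as np

np_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Without (), Python evaluates 3 & np_array first and then treats the rest
# as a chained comparison, which needs a single True/False and so fails.
try:
    np_array[np_array > 3 & np_array < 8]
    unbracketed_worked = True
except ValueError:
    unbracketed_worked = False

# With (), each comparison becomes its own mask before & combines them
bracketed = np_array[(np_array > 3) & (np_array < 8)]

print(unbracketed_worked)
print(bracketed)
```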

## Lists & Arrays in 2D | Indexing & Slicing

In [15]:
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]

np_array_2d = np.array(py_list_2d)

----
single index (e.g. index 3)

In [16]:
py_list_2d[3]

[4, 'D']

In [17]:
np_array_2d[3]

array(['4', 'D'], dtype='<U11')

----
double index (e.g. first element of index 3)  
**!!!** note that the array uses only a single pair of square brackets `[]`, with the two indices separated by a comma

In [18]:
py_list_2d[3][0]

4

In [20]:
np_array_2d[3, 0]

'4'

----
find the first three elements (i.e. the first three rows)

In [21]:
py_list_2d[:3]

[[1, 'A'], [2, 'B'], [3, 'C']]

In [22]:
np_array_2d[:3]

array([['1', 'A'],
       ['2', 'B'],
       ['3', 'C']], dtype='<U11')

----
**!!!** DIFFERENCES  
list gives the **first element** of `py_list_2d[:3]`, i.e. the whole first row  
array gives the **first element of every row** in `np_array_2d[:3]`

In [23]:
py_list_2d[:3][0]

[1, 'A']

In [24]:
np_array_2d[:3, 0]

array(['1', '2', '3'], dtype='<U11')

----
more examples

In [25]:
py_list_2d[3:6][0]

[4, 'D']

In [26]:
np_array_2d[3:6, 0]

array(['4', '5', '6'], dtype='<U11')

use `:` to get **everything**

In [27]:
np_array_2d[:, 0]

array(['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'], dtype='<U11')
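
A possible way to get the same column from the list, sketched here for comparison (not from the original notes). `list_column`, `array_column` and `array_column_as_int` are made-up names. The array holds strings because mixing numbers and letters makes NumPy store everything as text; `astype(int)` is one way to convert back.

```python
import numpy as np

py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]
np_array_2d = np.array(py_list_2d)

# Lists have no [:, 0]; one option is to loop over the rows
list_column = [row[0] for row in py_list_2d]

# Arrays can take the column directly, but the values are strings here
array_column = np_array_2d[:, 0]
array_column_as_int = array_column.astype(int)

print(list_column)
print(array_column)
print(array_column_as_int)
```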

## Growing lists

NumPy arrays - efficient at operating on all elements in one go (see footnote)  
list - easy to grow

creating a larger list from a smaller one (polymer from monomers)

In [28]:
x=[1, 2]*5
x

[1, 2, 1, 2, 1, 2, 1, 2, 1, 2]

three ways to **grow a list**:
1. extension using `+`
2. extension using `+=` shorthand
3. use `append()` as `list_name.append(new_element)`

**!!!**  
`append()` **inserts** the new element as a single item and runs about 1.5 times faster than the other two ways  
`extend()` **adds** each element of another list to the list

In [29]:
x=[1]
x= x + [2]
x= x + [3]
x= x + [4]
x

[1, 2, 3, 4]

In [30]:
x=[1]
x+= [2]
x+= [3]
x+= [4]
x

[1, 2, 3, 4]

In [31]:
x=[1]
x.append(2)
x.append(3)
x.append(4)
x

[1, 2, 3, 4]
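
One way to time the three approaches yourself, as a rough sketch (not from the original notes). The function names, `n` and the repeat count are all assumptions made for this illustration. No output is shown because timings depend on the machine and the Python version.

```python
import timeit

def grow_with_plus(n):
    x = []
    for i in range(n):
        x = x + [i]      # builds a brand-new list each time
    return x

def grow_with_plus_equals(n):
    x = []
    for i in range(n):
        x += [i]         # extends the existing list in place
    return x

def grow_with_append(n):
    x = []
    for i in range(n):
        x.append(i)      # adds one element in place
    return x

n = 1_000                # hypothetical list size
timings = {}
for grow in (grow_with_plus, grow_with_plus_equals, grow_with_append):
    # total seconds for 20 runs of each approach
    timings[grow.__name__] = timeit.timeit(lambda: grow(n), number=20)

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```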

----
differences between `extend()` and `append()`

In [32]:
x=[1, 2, 3]
x.extend([4, 5, 6])
x

[1, 2, 3, 4, 5, 6]

In [33]:
x=[1, 2, 3]
x.append([4, 5, 6])
x

[1, 2, 3, [4, 5, 6]]
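
A related detail, sketched as an illustration (not from the original notes): `+=`, `extend()` and `append()` change the existing list, while `+` builds a new one. `alias`, `in_place` and `rebound` are made-up names.

```python
x = [1, 2, 3]
alias = x                  # a second name for the same list

x += [4]                   # in place, like x.extend([4])
x.extend([5])              # in place
x.append(6)                # in place
in_place = alias is x      # still the same list object

x = x + [7]                # new list; alias is left behind
rebound = alias is not x

print(in_place, rebound)
print(alias)
print(x)
```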

## Tuples

**tuples**
- similar to lists BUT
- use `()`
- CANNOT be changed after creation (i.e. immutable) 

In [34]:
a=(1, 2, 3)     # Define tuple
print(a[0])     # Access data

1


In [35]:
# The following will NOT work
a[0]=-1
a[0]+= [10]

TypeError: 'tuple' object does not support item assignment
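
A minimal sketch of handling this (not from the original notes): catch the error, then build a new tuple if a changed version is needed. `message` and `b` are made-up names.

```python
a = (1, 2, 3)

try:
    a[0] = -1                  # not allowed on a tuple
except TypeError as error:
    message = str(error)

# A "changed" tuple has to be built as a new tuple
b = (-1,) + a[1:]

print(message)
print(a)   # the original tuple is untouched
print(b)
```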

## Be VERY careful when copying

for the reasons to use `copy()`, see [here](https://sps.nus.edu.sg/sp2273/docs/python_basics/03_storing-data/2_storing-data_good.html#sec-python-variables)

In [36]:
x=[1, 2, 3]
y=x           # DON'T do this! y is just another name for the same list
z=x           # DON'T do this!

In [38]:
x=[1, 2, 3]
y=x.copy()    # DO THIS!
z=x.copy()
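
What goes wrong without `copy()`, as a short sketch (an illustration, not from the original notes):

```python
x = [1, 2, 3]
y = x             # y and x are the same list
y.append(4)       # so this changes x as well

z = x.copy()      # z is a separate list
z.append(5)       # x is not affected

print(x)
print(y is x)
print(z)
```

Here `x` ends up as `[1, 2, 3, 4]` even though only `y` was appended to, while the change to `z` stays in `z`.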

# Footnotes

The gains in speed are due to NumPy doing things to all the elements in the array in one go. For this, the data needs to be stored in a specific order in memory. Adding or removing elements hinders this optimization. When you change the size of a NumPy array, NumPy has to create a new array in place of the existing one, which makes growing an array very inefficient.
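
A small sketch of this (an illustration, not from the original notes), using `np.append()`. The names `original`, `grown`, `values` and `final_array` are made up. One common workaround, shown at the end, is to grow a list and convert it to an array once.

```python
import numpy as np

original = np.array([1, 2, 3])
grown = np.append(original, 4)     # returns a new array

print(original)                    # the original array is unchanged
print(grown)
print(np.shares_memory(original, grown))

# Grow a list first, then convert once at the end
values = []
for value in range(5):
    values.append(value)
final_array = np.array(values)
print(final_array)
```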