<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Storing Data (Good)</span></div>

# Comments

**What's good:** All exercises completed. In some parts you tried out your own examples and messed around with the code.

**What's not good:** Same problem as in Storing Data (Need)

"NumPy has converted everything into strings" -- Not entirely correct because (see example below) #1 is not a string but #2 is:

np_array_2d[1] #1  
np_array_2d[1][0] #2

Secondly, an exceedingly large fraction of the work was copied from the 2273 site which makes me hesitant to confirm that you understand all the content in this chapter. While I will pass this notebook once the above error is amended, I strongly encourage trying your own examples in the submitted Jupyter notebook instead of copy-pasting from the website.

**Actionable:** Correct the first problem addressed above.

Good-to-know (Optional): NIL

Jupyter has 2 modes: edit mode, to write stuff, and command mode
- if you need to undo a deleted cell, press Esc, then Z
- you can search the functions under Help

If you want to centre math in a markdown cell, write it as a block ($$)  
other ways of making math look nicer:
- \begin{align}
- \end{align}

$$\sqrt{b^2-4ac}$$

$$
\begin{align}
a=b\\
c=d\\
e=f
\end{align}
$$



- also, where you put the ampersand sign determines the point at which the lines are aligned with each other

$$
\begin{align}
a=&b\\
&c=d\\
e=f
\end{align}
$$
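
For comparison, a common convention (not taken from the notes above) is to put one `&` directly before each `=`, so that all the equals signs line up in one column. A minimal sketch using the same three equations:

```latex
$$
\begin{align}
a &= b\\
c &= d\\
e &= f
\end{align}
$$
```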



# What to expect in this chapter

- accessing and modifying lists, arrays, dictionaries
- this is important because most of what you do with programming is related to accessing and changing data.
- gain a better understanding of the differences and similarities between lists, NumPy arrays and dictionaries.

In [4]:
import numpy as np

# 1 Subsetting: Indexing and Slicing

- **indexing** is picking a single element
- **slicing** is selecting a range of elements
- **subsetting** means "to select"

In [30]:
py_superhero_info = [['Natasha Romanoff', 'Black Widow'],   #0, -3
                     ['Tony Stark', 'Iron Man'],            #1, -2
                     ['Stephen Strange', 'Doctor Strange']] #2, -1

## 1.1 Lists & Arrays in 1D | Subsetting & Indexing

In [34]:
py_list=["a1", "b2", "c3", "d4", "e5",
         "f6", "g7", "h8", "i9", "j10"]

In [35]:
py_list[0]

'a1'

In [36]:
py_list[-1]

'j10'

In [37]:
py_list[0:3] #slice of list: gives (stop - start) elements, the element at the stop index is not included

['a1', 'b2', 'c3']

In [38]:
py_list[1:3]

['b2', 'c3']

In [46]:
py_list[2:-3]

['c3', 'd4', 'e5', 'f6', 'g7']

In [43]:
py_list[1:6]

['b2', 'c3', 'd4', 'e5', 'f6']

In [45]:
py_list[1:6:2] #take steps of 2

['b2', 'd4', 'f6']

In [49]:
py_list[5:] #get everything from index 5 up to the end

['f6', 'g7', 'h8', 'i9', 'j10']

In [48]:
py_list[5::2] #get every 2nd element from index 5 up to the end

['f6', 'h8', 'j10']

In [50]:
py_list[:5] #get from the start up to (not including) index 5, like putting a 0 as the start

['a1', 'b2', 'c3', 'd4', 'e5']

In [51]:
py_list[5:2:-1]

['f6', 'e5', 'd4']

In [52]:
py_list[5:2] #gives an empty list because the default step size is +1, so it cannot go from 5 down to 2

[]

- cool and useful thing: reverse the list
- eg when you are more interested in your last datapoint than the first

In [54]:
py_list[::-1]

['j10', 'i9', 'h8', 'g7', 'f6', 'e5', 'd4', 'c3', 'b2', 'a1']
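
The slicing cells above all follow the same `start:stop:step` pattern. The sketch below collects the rules in one place, reusing `py_list` from above; the `slice()` object at the end is an extra illustration that is not part of the original notes.

```python
# Sketch of the start:stop:step rules, using the same list as above.
py_list = ["a1", "b2", "c3", "d4", "e5",
           "f6", "g7", "h8", "i9", "j10"]

# With the default step of 1, a slice holds (stop - start) elements,
# because the element at the stop index is left out.
first_three = py_list[0:3]
print(first_three, len(first_three))

# A negative step walks backwards, so start must be larger than stop.
backwards = py_list[5:2:-1]
print(backwards)

# With a positive step and start > stop there is nothing to collect.
empty = py_list[5:2]
print(empty)

# A slice() object is a named, reusable version of the colon syntax.
every_other = slice(None, None, 2)   # same as [::2]
print(py_list[every_other])
```

Naming a slice like this can help when the same selection is applied to several lists.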

## 1.2 Arrays only | Subsetting by masking

In [55]:
np_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

In [56]:
np_array > 3

array([False, False, False,  True,  True,  True,  True,  True,  True,
        True])

mask: show me all the true ones and don't show me the false ones
- acts like a mask, allowing only the true subset to be seen
- subsetting by masking only works with NumPy arrays

In [59]:
my_mask = np_array > 3
np_array[my_mask]

array([ 4,  5,  6,  7,  8,  9, 10])

- instead of creating another variable, there is a more succinct way to do it

In [61]:
np_array[np_array > 3]

array([ 4,  5,  6,  7,  8,  9, 10])
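
To see why masking is an arrays-only feature, the sketch below tries the same comparison on a plain list. The variable names are made up for this illustration, and the list comprehension is just one possible list-based alternative.

```python
import numpy as np

py_numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
np_numbers = np.array(py_numbers)

# Comparing a whole list with a number is not defined in Python 3.
try:
    py_numbers > 3
    list_comparison_failed = False
except TypeError:
    list_comparison_failed = True
print(list_comparison_failed)

# A list comprehension gives the same selection for a plain list.
from_list = [n for n in py_numbers if n > 3]

# The NumPy mask does it in one step.
from_array = np_numbers[np_numbers > 3]

print(from_list)
print(from_array)
```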

**Example 1**  
invert our mask by using ~ the Bitwise Not operator

In [63]:
np_array[~(np_array > 3)]                 # '~' means 'NOT'

array([1, 2, 3])

**Example 2**  
combine one mask AND another mask, shows something only if both masks are true
- brackets are essential to clarify what you want the mask to do

In [64]:
np_array[(np_array > 3) & (np_array < 8)] # '&' means 'AND'

array([4, 5, 6, 7])

**Example 3**  
combine one mask OR another mask, shows something if either mask is true

In [65]:
np_array[(np_array < 3) | (np_array > 8)] # '|' means 'OR'

array([ 1,  2,  9, 10])
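
The brackets matter because `&` and `|` bind more tightly than `>` and `<`. The sketch below shows what happens without them; `np.logical_and` is included only as an alternative spelling, not something used in the notes above.

```python
import numpy as np

np_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# With brackets: each comparison makes its own mask, then & combines them.
with_brackets = np_array[(np_array > 3) & (np_array < 8)]
print(with_brackets)

# Without brackets, Python reads the expression as
# np_array > (3 & np_array) < 8, which NumPy refuses to evaluate.
try:
    np_array[np_array > 3 & np_array < 8]
    no_brackets_failed = False
except ValueError:
    no_brackets_failed = True
print(no_brackets_failed)

# np.logical_and takes the two masks as arguments, so precedence is not an issue.
with_function = np_array[np.logical_and(np_array > 3, np_array < 8)]
print(with_function)
```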

## 1.3 Lists & Arrays in 2D | Indexing & Slicing

In [5]:
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]

np_array_2d = np.array(py_list_2d)

In [6]:
py_list_2d

[[1, 'A'],
 [2, 'B'],
 [3, 'C'],
 [4, 'D'],
 [5, 'E'],
 [6, 'F'],
 [7, 'G'],
 [8, 'H'],
 [9, 'I'],
 [10, 'J']]

In [7]:
np_array_2d

array([['1', 'A'],
       ['2', 'B'],
       ['3', 'C'],
       ['4', 'D'],
       ['5', 'E'],
       ['6', 'F'],
       ['7', 'G'],
       ['8', 'H'],
       ['9', 'I'],
       ['10', 'J']], dtype='<U21')

- **amended based on comment**  
NumPy has converted all the elements into the same type (here strings, as shown by dtype='<U21'), because an array stores a single data type.

In [13]:
np_array_2d[1] #this is one row of the NumPy array, so it is itself an array (not a string)

array(['2', 'B'], dtype='<U21')

In [12]:
np_array_2d[1][0] #this is a string

'2'
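
A small sketch to check the types directly, using a shortened version of the list above. The names `mixed_list` and `mixed_array` are made up for this example, and converting the first column back with `astype(int)` is only one possible way to recover the numbers.

```python
import numpy as np

mixed_list = [[1, "A"], [2, "B"], [3, "C"]]
mixed_array = np.array(mixed_list)

# The array as a whole has one dtype; kind 'U' means unicode strings.
print(mixed_array.dtype, mixed_array.dtype.kind)

# A row is still a NumPy array; a single element is a NumPy string.
row = mixed_array[1]
element = mixed_array[1][0]
print(type(row).__name__, type(element).__name__)

# The list keeps the original types.
print(type(mixed_list[1][0]).__name__)

# Convert the first column back to integers so that math works again.
numbers = mixed_array[:, 0].astype(int)
print(numbers, numbers.sum())
```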

**Example 1** What is at position 4 (index 3)

In [76]:
py_list_2d[3]

[4, 'D']

In [77]:
np_array_2d[3]

array(['4', 'D'], dtype='<U21')

**Example 2** What is the FIRST element at position 4 (index 3)

In [82]:
py_list_2d[3][0]

4

In [79]:
np_array_2d[3, 0]

'4'

- the array version is easier to play with

**Example 3** What are the first three elements

In [80]:
py_list_2d[:3]

[[1, 'A'], [2, 'B'], [3, 'C']]

In [81]:
np_array_2d[:3]

array([['1', 'A'],
       ['2', 'B'],
       ['3', 'C']], dtype='<U21')

**Example 4** NumPy arrays work very differently from lists

In [83]:
py_list_2d[:3][0]

[1, 'A']

In [85]:
np_array_2d[:3, 0]

array(['1', '2', '3'], dtype='<U21')
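
The difference comes from how the two index expressions are read. The sketch below breaks each one into steps, using a shortened copy of the data above; the helper name `first_rows` is made up for this illustration.

```python
import numpy as np

py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"]]
np_array_2d = np.array(py_list_2d)

# List: [:3] makes a new list of three rows, then [0] takes its first row.
first_rows = py_list_2d[:3]
print(first_rows[0])

# Array: [:3, 0] means "rows 0 to 2, column 0" in a single step.
print(np_array_2d[:3, 0])

# Chained indexing on the array behaves like the list version.
print(np_array_2d[:3][0])

# The comma syntax is not available for lists.
try:
    py_list_2d[:3, 0]
    comma_failed = False
except TypeError:
    comma_failed = True
print(comma_failed)

# To get a column from the list, loop over the rows instead.
print([row[0] for row in py_list_2d[:3]])
```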

**Example 5** to get everything, use :

In [89]:
py_list_2d[3:6][0]

[4, 'D']

In [94]:
np_array_2d[3:6, 0] #takes the first element (column 0) of every row in the slice

array(['4', '5', '6'], dtype='<U21')

In [95]:
np_array_2d[:, 0]

array(['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'], dtype='<U21')

- understand the features to implement them when it is necessary

### More Examples

**subsetting by masking** for np arrays

In [25]:
ten_integers = list(range(13,23))
int_array = np.array(ten_integers)
print(int_array)

[13 14 15 16 17 18 19 20 21 22]


In [26]:
int_array[(int_array<20) & (int_array>15)] #bitwise AND &

array([16, 17, 18, 19])

In [27]:
int_array[(int_array<18) | (int_array>14)] #bitwise OR |

array([13, 14, 15, 16, 17, 18, 19, 20, 21, 22])

In [28]:
int_array[~(int_array>=16)] #bitwise NOT ~

array([13, 14, 15])

In [30]:
int_array[int_array % 2 == 0] #show only the even numbers

array([14, 16, 18, 20, 22])

- more subsetting by masking

In [43]:
more_integers = list(range(91,210,7))
sevens = np.array(more_integers)
print(sevens)

[ 91  98 105 112 119 126 133 140 147 154 161 168 175 182 189 196 203]


In [51]:
sevens[sevens % 3 == 0] #show numbers divisible by 3

array([105, 126, 147, 168, 189])

In [57]:
sevens[(~(sevens % 3 == 0)) & (sevens % 2 == 0)] #show even numbers not divisible by 3

array([ 98, 112, 140, 154, 182, 196])

In [58]:
sevens[(sevens <= 140) | (sevens % 3 == 0)] #show numbers that are at most 140 or divisible by 3

array([ 91,  98, 105, 112, 119, 126, 133, 140, 147, 168, 189])

**indexing and slicing 2d lists and arrays**

In [62]:
mrt_2d = [[3, "Esplanade"],
          [4, "Promenade"], 
          [5, "Nicoll"], 
          [6, "Stadium"], 
          [7, "Mountbatten"], 
          [8, "Dakota"]]
print(mrt_2d)
print('---')
mrt2d_array = np.array(mrt_2d)
print(mrt2d_array)

[[3, 'Esplanade'], [4, 'Promenade'], [5, 'Nicoll'], [6, 'Stadium'], [7, 'Mountbatten'], [8, 'Dakota']]
---
[['3' 'Esplanade']
 ['4' 'Promenade']
 ['5' 'Nicoll']
 ['6' 'Stadium']
 ['7' 'Mountbatten']
 ['8' 'Dakota']]


In [81]:
stations_arr = mrt2d_array[:, 1] #all the station names, which are in the 2nd column of the 2d array - indexed as column 1
print(stations_arr)
ccnumbers_arr = mrt2d_array[:, 0] #all the station numbers, which are in the 1st column of the 2d array - indexed as column 0
print(ccnumbers_arr)

['Esplanade' 'Promenade' 'Nicoll' 'Stadium' 'Mountbatten' 'Dakota']
['3' '4' '5' '6' '7' '8']


In [92]:
mrt_2d[0:3] #the first element has index of 0

[[3, 'Esplanade'], [4, 'Promenade'], [5, 'Nicoll']]

In [82]:
#using list comprehension (from loops good) to extract the station names from the 2d list

[station[1] for station in mrt_2d]

['Esplanade', 'Promenade', 'Nicoll', 'Stadium', 'Mountbatten', 'Dakota']

## 1.4 Growing lists

- lists are easy and efficient to grow
- NumPy arrays have more intuitive slicing syntax and are useful for fast math operations, as long as you do not change their size

**Example 1** creating a larger list from a smaller one

In [96]:
x=[1, 2]*5
x

[1, 2, 1, 2, 1, 2, 1, 2, 1, 2]

**Example 2** 3 ways to grow the list 

In [98]:
x=[1]
x= x + [2]
x= x + [3]
x= x + [4]
x

[1, 2, 3, 4]

In [99]:
x=[1]
x+= [2]
x+= [3]
x+= [4]
x

[1, 2, 3, 4]

- the three ways have different execution speeds
- append() is reported to run about 1.5 times faster than the other two

In [100]:
x=[1]
x.append(2)
x.append(3)
x.append(4)
x

[1, 2, 3, 4]
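
The speed difference can be checked with the standard `timeit` module. This is only a rough sketch: the function names and the list size are arbitrary choices for illustration, and the timings (and the ratio between them) will vary with the machine and Python version.

```python
import timeit

def grow_with_plus(n):
    x = [1]
    for i in range(n):
        x = x + [i]      # builds a brand-new list each time
    return x

def grow_with_plus_equals(n):
    x = [1]
    for i in range(n):
        x += [i]         # extends the existing list in place
    return x

def grow_with_append(n):
    x = [1]
    for i in range(n):
        x.append(i)      # adds one element in place
    return x

n = 1_000  # arbitrary size chosen for illustration
for func in (grow_with_plus, grow_with_plus_equals, grow_with_append):
    seconds = timeit.timeit(lambda: func(n), number=100)
    print(f'{func.__name__}: {seconds:.4f} s')
```

All three functions return the same list, so only the time taken differs.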

**Example 3** 3 ways of incorporating multiple elements.
- Notice the difference between the effects of extend() and append().

In [102]:
x = [1, 2, 3]
x += [4, 5, 6]
x

[1, 2, 3, 4, 5, 6]

In [103]:
x=[1, 2, 3]
x.extend([4, 5, 6])
x

[1, 2, 3, 4, 5, 6]

In [104]:
x=[1, 2, 3]
x.append([4, 5, 6])
x

[1, 2, 3, [4, 5, 6]]

In [105]:
x=[1,2,3]
[1,4,5] + x #add to the front of the list but may be slow

[1, 4, 5, 1, 2, 3]

- computer memory is like boxes, so adding things at the front may be slow: `[...] + x` writes a new list and `x.insert(0, ...)` has to shift every existing element (lists have no .prepend() method)
- write in a way that makes sense, speed considerations later
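
Python lists do not have a `.prepend()` method, so here is a sketch of three ways to add to the front. The `deque` option comes from the standard `collections` module and is not covered in the notes above; it is shown only as a possibility for when front insertions happen a lot.

```python
from collections import deque

x = [1, 2, 3]

# Option 1: concatenation builds a new list with the new items first.
front_concat = [4, 5] + x

# Option 2: insert(0, ...) changes the list in place, shifting every element.
front_insert = x.copy()
front_insert.insert(0, 0)

# Option 3: a deque supports fast additions at both ends.
d = deque(x)
d.appendleft(0)

print(front_concat)
print(front_insert)
print(list(d))
```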

# Some loose ends

## 1.5 Tuples

- another data storage structure
- **Tuples** are similar to lists, except they use () and cannot be modified after creation (**immutable**)
  - we can access data in the same way but CANNOT CHANGE data

In [106]:
a=(1, 2, 3)     # Define tuple

In [107]:
print(a[0])    # Access data

1


In [108]:
# The following will NOT work
a[0]=-1
a[0]+= [10]

TypeError: 'tuple' object does not support item assignment

In [110]:
c=[1,2,3,4,5]

In [114]:
c[0]=-100 #change first element to -100

In [115]:
c

[-100, 2, 3, 4, 5]

In [117]:
c_tuple=(1,2,3,4,5)

In [119]:
c_tuple[0] =-100

TypeError: 'tuple' object does not support item assignment

- good to have immutable structures like tuples available as a failsafe for data that should not change
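
A short sketch of working with a tuple that needs a "changed" version. The name `measurements` and its values are hypothetical. The idea is that the original stays untouched and the change goes into a new object.

```python
measurements = (1, 2, 3)   # hypothetical data that should not change by accident

# Trying to change the tuple raises a TypeError that can be caught.
try:
    measurements[0] = -100
    changed = True
except TypeError:
    changed = False
print(changed)

# To "edit", build a new tuple and leave the original untouched.
updated = (-100,) + measurements[1:]
print(measurements, updated)

# Converting to a list gives an editable copy.
editable = list(measurements)
editable[0] = -100
print(editable)
```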

## 1.6 Be VERY careful when copying

- be particularly mindful when making copies of lists and arrays.
- use copy() to be safe

In [127]:
x=[1, 2, 3]
y=x           # DON'T do this!
z=x           # DON'T do this!

print(f'{id(x)=}, {id(y)=}, {id(z)=}')

id(x)=4436357888, id(y)=4436357888, id(z)=4436357888


- NEVER do this^^ as far as Python is concerned, x, y and z are all names for the same list, so changing one changes them all
- use the copy() method when you want an independent copy

In [129]:
x=[1, 2, 3]
y=x.copy()
z=x.copy()

print(f'{id(x)=}, {id(y)=}, {id(z)=}')

id(x)=4436517312, id(y)=4436367616, id(z)=4436369088


- different ids, so these are three separate lists, working as intended
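
The sketch below shows what goes wrong in practice when `y = x` is used, and one extra caveat: `copy()` makes a shallow copy, so the inner lists of a 2D list are still shared. `copy.deepcopy` from the standard library is shown as one way around that; the name `grid` is made up for this example.

```python
import copy

x = [1, 2, 3]
y = x            # y is another name for the same list
z = x.copy()     # z is a separate list

x[0] = -100
print(y)         # follows x
print(z)         # unchanged

# copy() is shallow: the inner lists of a 2D list are still shared.
grid = [[1, "A"], [2, "B"]]
shallow = grid.copy()
deep = copy.deepcopy(grid)

grid[0][0] = 999
print(shallow[0][0])   # inner list is shared with grid
print(deep[0][0])      # fully independent
```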

# Exercises & Self-Assessment

**Growing lists**

In [108]:
threes_list = list(range(0,30,3))
print(threes_list)

[0, 3, 6, 9, 12, 15, 18, 21, 24, 27]


In [98]:
[0, 3, 6, 9, 12, 15, 18, 21, 24, 27] * 2

[0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 0, 3, 6, 9, 12, 15, 18, 21, 24, 27]

In [109]:
threes_list = threes_list + [2,4,6,8,10]
threes_list

[0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 2, 4, 6, 8, 10]

In [110]:
threes_list+=[7,14]
threes_list

[0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 2, 4, 6, 8, 10, 7, 14]

In [115]:
threes_list.append(100) #ran the cell 4 times
threes_list

[0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 2, 4, 6, 8, 10, 7, 14, 100, 100, 100, 100]

**multiple elements when growing lists**

In [118]:
randomlist = list(range(15,20))

In [119]:
randomlist

[15, 16, 17, 18, 19]

In [120]:
randomlist+=['hello', 20, 22]
randomlist

[15, 16, 17, 18, 19, 'hello', 20, 22]

In [121]:
randomlist.extend([23, 24])

In [122]:
randomlist

[15, 16, 17, 18, 19, 'hello', 20, 22, 23, 24]

In [127]:
randomlist.append([26,27,28])

In [128]:
randomlist

[15, 16, 17, 18, 19, 'hello', 20, 22, 23, 24, [26, 27, 28]]

- extend and append only work on the end of the list

**other edits that can be made to lists**
- insert()
    - first number is the index, next is whatever element you want inserted
- remove() -- I made a mistake counting the index when I was inserting 25, so I used it to remove the wrongly placed number
    - removes the first occurrence of that thing in the list (if it exists)
- can replace element by assigning a new thing to its index in the list

In [135]:
randomlist.insert(10,25)

In [136]:
randomlist

[15, 16, 17, 18, 19, 'hello', 20, 22, 23, 24, 25, [26, 27, 28]]

In [137]:
randomlist[5]= "goodbye"
randomlist

[15, 16, 17, 18, 19, 'goodbye', 20, 22, 23, 24, 25, [26, 27, 28]]
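
A compact sketch of the edits described above on a fresh list, so each step can be followed in isolation. The value 99 and the `pop()` call at the end are additions for illustration only.

```python
numbers = [15, 16, 17, 18, 19]

numbers.insert(2, 99)      # put 99 at index 2; later elements shift right
print(numbers)

numbers.remove(99)         # removes the first 99 found
print(numbers)

numbers[0] = "goodbye"     # replace by assigning to an index
print(numbers)

# remove() raises ValueError if the value is not in the list,
# so checking with `in` first is one way to stay safe.
if 1000 in numbers:
    numbers.remove(1000)

# pop() removes by index instead of by value and returns the element.
last = numbers.pop()
print(last, numbers)
```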
