<div style="text-align:left;font-size:2em"><span style="font-weight:bolder;font-size:1.25em">SP2273 | Learning Portfolio</span><br><br><span style="font-weight:bold;color:darkred">Storing Data (Good)</span></div>

# What to expect in this chapter

- more details on accessing and modifying lists, arrays and dictionaries.
- better understanding of the differences and similarities between lists, NumPy arrays and dictionaries.

# 1 Subsetting: Indexing and Slicing

- Selecting a subset of the data from a list or an array is called **subsetting**.
- **Indexing** is subsetting that selects a single element.
- **Slicing** is subsetting that selects a range of elements.

## 1.1 Lists & Arrays in 1D | Indexing & Slicing

Since slicing gives us a range of elements, we must **specify two indices to indicate where to start and end**. The various syntaxes for these are shown in the table below.

The following applies to **both lists and arrays**.

In [1]:
import numpy as np

In [2]:
py_list=["a1", "b2", "c3", "d4", "e5",
         "f6", "g7", "h8", "i9", "j10"]
np_array=np.array(py_list)

# Pick one
x = py_list  # OR
x = np_array

| Syntax    | Result                     | Display                         | Note                                |
| :---------:| :-------------------------- | :-------------------------------: | :-----------------------------------: |
| x[0]      | First element              | 'a1'                            |                                     |
| x[-1]     | Last element               | 'j10'                           |                                     |
| x[0:3]    | Index 0 to 2               | ['a1', 'b2', 'c3']              | Gives 3-0=3 elements                |
| x[1:6]    | Index 1 to 5               | ['b2', 'c3', 'd4', 'e5', 'f6']  |  Gives 6-1=5 elements               |
| x[1:6:2]  | Index 1 to 5 in steps of 2 | ['b2', 'd4', 'f6']              | Gives every other of 6-1=5 elements |
| x[5:]     | Index 5 to the end         | ['f6', 'g7', 'h8', 'i9', 'j10'] | Gives len(x)-5=10-5=5 elements      |
| x[:5]     | Index 0 to 4               | ['a1', 'b2', 'c3', 'd4', 'e5']  | Gives 5-0=5 elements                |
| x[5:2:-1] | Index 5 to 3 in reverse    | ['f6', 'e5', 'd4']              | Gives 5-2=3 elements                |
| x[::-1]   | Reverse the list           | ['j10', 'i9', 'h8',...'a1']     |                                     |

### Remember

If you slice with [a:b], the slice starts at index a and ends at index b-1 (b itself is excluded), giving a total of b-a elements.
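The b-a rule can be checked directly. The following is a minimal sketch that rebuilds the `py_list` and `np_array` from above; the names `slice_list` and `slice_array` are only used here for illustration.

```python
import numpy as np

py_list = ["a1", "b2", "c3", "d4", "e5",
           "f6", "g7", "h8", "i9", "j10"]
np_array = np.array(py_list)

for x in (py_list, np_array):
    # [a:b] starts at index a and stops just before index b
    assert len(x[1:6]) == 6 - 1
    assert x[1:6][0] == "b2" and x[1:6][-1] == "f6"
    # Omitted bounds default to the start and the end
    assert len(x[:5]) == 5 and len(x[5:]) == len(x) - 5

slice_list = py_list[1:6]      # slicing a list gives a new list
slice_array = np_array[1:6]    # slicing an array gives a NumPy array
print(slice_list)
print(slice_array)
```

The same checks pass for both the list and the array, which is why the table applies to both.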

## 1.2 Arrays only | Subsetting by masking

In [3]:
np_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
my_mask = np_array > 3
my_mask # less than or equal to 3 is False, more than 3 is True

array([False, False, False,  True,  True,  True,  True,  True,  True,
        True])

In [5]:
np_array[my_mask] # the FALSE elements are not included in the new array (FALSE elements have been MASKED)

array([ 4,  5,  6,  7,  8,  9, 10])

**Subset masking only works with NumPy arrays**


In [6]:
# or we can achieve the same result more succinctly
np_array[np_array > 3]

array([ 4,  5,  6,  7,  8,  9, 10])

In [7]:
# or you can invert the mask with ~
# ~ means NOT
np_array[~(np_array>3)]

array([1, 2, 3])

<font color='blue'> ~ is called the **Bitwise Not** operator. </font>
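To see what `~` does to a mask, here is a small sketch comparing the inverted mask with the opposite comparison written out by hand. The variable names `inverted` and `same_as_le` are illustrative only.

```python
import numpy as np

np_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
mask = np_array > 3

inverted = ~mask               # flips True <-> False, element by element
same_as_le = np_array <= 3     # the opposite comparison, written directly

print(np.array_equal(inverted, same_as_le))
print(np_array[inverted])
```

The parentheses in `~(np_array>3)` matter: `~` binds more tightly than `>`, so `~np_array > 3` would first apply `~` to the integers themselves and then compare the result with 3, which is a different question.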

In [8]:
# combining one mask with another mask
# an element is kept only if BOTH masks are True
np_array[(np_array > 3) & (np_array < 8)] # '&' means 'AND'

array([4, 5, 6, 7])

<font color='blue'> & is called the **Bitwise AND** operator. </font>

In [9]:
# combining one mask OR another mask
# an element is kept as long as at least one mask is True
np_array[(np_array < 3) | (np_array > 8)] # '|' means 'OR'

array([ 1,  2,  9, 10])
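Since masking is array-only, it may help to see what the list version of the same selection could look like. This is a sketch, not the only way to do it: a list comprehension with a condition plays the role of the mask. The names `from_list`, `from_array` and `list_comparison_failed` are made up for this example.

```python
import numpy as np

np_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
py_list = np_array.tolist()

# Lists do not support masks: comparing a list with a number raises TypeError
try:
    py_list > 3
    list_comparison_failed = False
except TypeError:
    list_comparison_failed = True

# With a list, the condition goes inside a list comprehension (using plain 'or')
from_list = [n for n in py_list if n < 3 or n > 8]

# With an array, the two masks are combined with | and each needs its own parentheses
from_array = np_array[(np_array < 3) | (np_array > 8)]

print(list_comparison_failed, from_list, from_array)
```

Note that the list version uses the keyword `or`, while the array version uses the `|` operator (Bitwise OR), which works element by element.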

## 1.3 Lists & Arrays in 2D | Indexing & Slicing

The **differences between lists and arrays** become even **more apparent with higher dimensional lists and arrays**, especially when you try **indexing and slicing in higher dimensions**.

Let’s consider the following 2D list.

In [10]:
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]

np_array_2d = np.array(py_list_2d)

**What is at Index 3 (position 4)?**

In [11]:
py_list_2d[3]

[4, 'D']

In [12]:
np_array_2d[3]

array(['4', 'D'], dtype='<U11')
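Notice that the array shows `'4'` (a string) where the list shows `4` (an integer). The sketch below, on a shortened version of the same data, illustrates why: an array stores all its elements as one type, so the numbers are converted to strings. The exact width in the dtype (the `11` in `<U11`) can differ between machines, so only the kind of dtype is checked here. `first_number` is an illustrative name.

```python
import numpy as np

py_list_2d = [[1, "A"], [2, "B"], [3, "C"]]
np_array_2d = np.array(py_list_2d)

# The list keeps each element's own type
print(type(py_list_2d[0][0]), type(py_list_2d[0][1]))

# The array stores everything as one type, here a Unicode string type
print(np_array_2d.dtype.kind)      # 'U' stands for Unicode string

# Convert back explicitly when a number is needed
first_number = int(np_array_2d[0, 0])
print(first_number + 1)
```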

**What is the FIRST element at Index 3 (position 4)?**

Lists and arrays use different syntax for this.

In [13]:
py_list_2d[3][0]

4

In [14]:
np_array_2d[3,0] #only need a single pair of []

'4'
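A small sketch of how far the two syntaxes are interchangeable, using a shortened version of the data: chained brackets work on both, but the comma form only works on arrays. The flag `comma_works_on_list` is a made-up name for this check.

```python
import numpy as np

py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"]]
np_array_2d = np.array(py_list_2d)

# Chained indexing works for both lists and arrays
print(py_list_2d[3][0], np_array_2d[3][0])

# The comma form is array-only; a list rejects it with a TypeError
try:
    py_list_2d[3, 0]
    comma_works_on_list = True
except TypeError:
    comma_works_on_list = False

print(comma_works_on_list)
```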

**What are the FIRST THREE positions?**

In [15]:
py_list_2d[:3]

[[1, 'A'], [2, 'B'], [3, 'C']]

In [16]:
np_array_2d[:3]

array([['1', 'A'],
       ['2', 'B'],
       ['3', 'C']], dtype='<U11')

**What are the FIRST ELEMENTS of the first three positions?**

In [17]:
# this does not give the first elements: [:3] returns a list of three rows,
# and [0] then picks the first of those rows
py_list_2d[:3][0]


[1, 'A']

In [18]:
np_array_2d[:3, 0] # this works

array(['1', '2', '3'], dtype='<U11')

In [34]:
py_list_2d[3:6][0] # does not work as expected: gives the first row of the slice


[4, 'D']

In [36]:
first_elements = [sublist[0] for sublist in py_list_2d]
print(first_elements)
# list comprehension: collects the first element of every sublist

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


In [39]:
first_elements = []
for sublist in py_list_2d:
    first_elements.append(sublist[0])
print(first_elements)
# another way to collect all the first elements, with an explicit loop

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


In [20]:
np_array_2d[3:6, 0] # first elements of position 4 to 6

array(['4', '5', '6'], dtype='<U11')

In [21]:
np_array_2d[:, 0] # first elements of all positions

array(['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'], dtype='<U11')
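For lists, the equivalent of `np_array_2d[3:6, 0]` needs two steps. The following is one possible sketch: slice the rows first, then pick element 0 from each row. It also shows `zip(*...)` as an optional way to split a 2D list into columns. The names `first_of_rows_3_to_5`, `numbers` and `letters` are illustrative.

```python
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]

# Slice the rows first, then take element 0 of each remaining row
first_of_rows_3_to_5 = [row[0] for row in py_list_2d[3:6]]

# zip(*rows) regroups the rows into columns (each column is a tuple)
numbers, letters = zip(*py_list_2d)

print(first_of_rows_3_to_5)
print(numbers[:3], letters[:3])
```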

## 1.4 Growing lists

**Advantages of lists**
- Lists can grow easily. NumPy arrays cannot: their size is fixed once they are created.
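A short sketch of what this difference looks like in practice. `np.append()` exists, but it does not grow the array in place; it builds and returns a new one. The name `bigger` is illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3])
bigger = np.append(arr, 4)   # builds and returns a NEW array; arr is unchanged

py_list = [1, 2, 3]
py_list.append(4)            # changes the list itself

print(arr, bigger, py_list)
```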

**Multiplication**: create a larger list from a smaller one

In [22]:
x=[1, 2]*5
x

[1, 2, 1, 2, 1, 2, 1, 2, 1, 2]

**Three different ways to grow the list by appending one element at a time**

In [23]:
# first solution
x=[1]
x= x + [2]
x= x + [3]
x= x + [4]
x

[1, 2, 3, 4]

In [24]:
# second solution
x=[1]
x+= [2]
x+= [3]
x+= [4]
x

[1, 2, 3, 4]

In [25]:
# third solution with the .append() method
x=[1]
x.append(2)
x.append(3)
x.append(4)
x

[1, 2, 3, 4]

<font color='blue'> **.append()** runs the fastest, about 1.5 times faster than the other two. </font>
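The speed difference can be measured with the standard `timeit` module. This is only a rough sketch: the three helper functions are hypothetical names written for this comparison, and the actual timings (and ratio) will vary with the machine, the Python version and the list size.

```python
import timeit

def grow_with_plus(n):
    x = []
    for i in range(n):
        x = x + [i]      # builds a new list on every step
    return x

def grow_with_plus_equals(n):
    x = []
    for i in range(n):
        x += [i]         # extends the existing list in place
    return x

def grow_with_append(n):
    x = []
    for i in range(n):
        x.append(i)      # adds one element in place
    return x

# Time 100 runs of each approach, growing a list to 1000 elements
for func in (grow_with_plus, grow_with_plus_equals, grow_with_append):
    seconds = timeit.timeit(lambda: func(1_000), number=100)
    print(f"{func.__name__}: {seconds:.4f} s")
```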

**Three different ways to grow the list with multiple elements (note that the third one nests the new list)**

In [26]:
# First solution
x = [1, 2, 3]
x += [4, 5, 6]
x

[1, 2, 3, 4, 5, 6]

In [27]:
# Second solution
x=[1, 2, 3]
x.extend([4, 5, 6])
x


[1, 2, 3, 4, 5, 6]

In [28]:
# Third solution
x=[1, 2, 3]
x.append([4, 5, 6])
x

[1, 2, 3, [4, 5, 6]]

In [29]:
x[3] # with .append(), the inserted [4,5,6] is ONE element (a nested list), not three

[4, 5, 6]

In [30]:
x[3][0] # the first element of index 3

4
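To make the difference between `.append()` and `.extend()` concrete, the sketch below compares the resulting lengths, and shows that appending in a loop gives the same result as a single `.extend()`. The variable names are illustrative.

```python
appended = [1, 2, 3]
appended.append([4, 5, 6])   # adds ONE element, which happens to be a list

extended = [1, 2, 3]
extended.extend([4, 5, 6])   # adds each of the three elements separately

print(len(appended), len(extended))

# Appending one element at a time in a loop matches .extend()
looped = [1, 2, 3]
for value in [4, 5, 6]:
    looped.append(value)

print(looped == extended)
```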

# Some loose ends

## 1.5 Tuples

Tuples are similar to lists, but they are **immutable**: they cannot be changed after they are created.

In [31]:
a=(1,2,3) # [] for lists, () for tuples

In [32]:
# access the data
a[0]

1

In [33]:
# but you cannot change the data
# The following will NOT work: the first line already raises the error below,
# so the second line is never reached
a[0]=-1
a[0]+= [10]

TypeError: 'tuple' object does not support item assignment
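If a "changed" tuple is needed, one common approach is to build a new tuple rather than modify the old one. A minimal sketch, where `b` and `assignment_failed` are names chosen for this example:

```python
a = (1, 2, 3)

try:
    a[0] = -1                # item assignment is not allowed on a tuple
    assignment_failed = False
except TypeError as err:
    assignment_failed = True
    print(err)

# Build a new tuple instead of changing the old one
b = (-1,) + a[1:]            # (-1,) is a one-element tuple; the comma is required
print(a, b)
```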

But you can change the data in a list.

In [None]:
b=[1,2,3]
b[0]=-10
b

In [None]:
b=[4,5,6]
b[0]+=[10]
b # this will not work because the element at index 0 is an integer, not a list


In [None]:
b=[[4],5,6]
b[0]+=[10]
b # now that index 0 holds a list, we can add the elements of another list to it

## 1.6 Be VERY careful when copying

In [None]:
x=[1, 2, 3]
y=x           # DON'T do this if you want a copy!
z=x           # DON'T do this if you want a copy!
y             # This runs without error, but x, y and z are now three names for the same list

In [None]:
y+=[100]
y

In [None]:
x

In [None]:
print(id(x), id(y), id(z)) # the three ids are identical: x, y and z are the same object
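The `is` operator gives the same information as comparing `id()` values. The sketch below also shows a subtle point: `y += [100]` changes the shared list in place, while `z = z + [200]` builds a new list and leaves `x` alone.

```python
x = [1, 2, 3]
y = x                 # y is a second name for the SAME list
y += [100]            # in-place change, so x sees it too

print(x is y)         # identity check, same idea as id(x) == id(y)
print(x)

z = x
z = z + [200]         # builds a NEW list and rebinds z; x is untouched

print(z is x)
print(x, z)
```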

### Build duplicate copies with .copy() instead

In [None]:
x=[1, 2, 3]
y=x.copy()
z=x.copy()
print(x, y, z)

In [None]:
print(id(x), id(y), id(z)) # Now they are NOT linked
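One caveat worth knowing for nested lists such as `py_list_2d`: `.copy()` only copies the outer list, so the inner lists are still shared. The sketch below contrasts this with `copy.deepcopy()` from the standard library; the names `nested`, `shallow` and `deep` are illustrative.

```python
import copy

nested = [[1, "A"], [2, "B"]]
shallow = nested.copy()          # new outer list, but the SAME inner lists
deep = copy.deepcopy(nested)     # new outer list and new inner lists

nested[0][0] = 999               # change an inner list through the original

print(shallow[0][0])             # the shallow copy sees the change
print(deep[0][0])                # the deep copy does not
```

For a flat list of numbers or strings like `[1, 2, 3]`, `.copy()` is enough.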
